How a macOS app achieves broadcast-grade audio quality by prioritizing SRT over WebRTC—and why the industry standard might have it backwards.
The Architecture Challenge
Remote DJ streaming presents an interesting engineering problem: how do you deliver uncompressed audio to multiple venues simultaneously, maintaining broadcast-grade quality, without requiring a $15,000 broadcast encoder at each endpoint?
The conventional approach combines video and audio into a single RTMP or HLS stream, relies on adaptive bitrate to handle network fluctuations, and accepts the 15-30 second latency that comes with segment-based delivery. DJing Stream, a macOS application designed for professional DJ-to-venue streaming, takes a radically different approach worth examining from a protocol architecture perspective.

Separate Streams, Separate Protocols
The core architectural decision is treating audio and video as fundamentally different media requiring different protocols:
| Stream | Protocol | Bitrate | Priority |
|---|---|---|---|
| Audio | SRT | ~2,304 kbps | Primary |
| Video | WebRTC | ~1,500 kbps | Secondary |
This inversion—audio bitrate higher than video—is virtually unheard of in streaming. Most platforms allocate 5-10x more bandwidth to video than audio. Here's the reasoning: for professional venue deployment, audio quality is the only thing that matters. A bar's sound system will expose every compression artifact. The video feed showing the DJ? That's supplementary—nice to have on screens, but not critical to the customer experience.
Why SRT for Audio?
SRT (Secure Reliable Transport) provides several properties essential for professional audio delivery. It also sidesteps a hard constraint of segment-based alternatives: HLS does not carry LPCM (Linear PCM) audio at all, expecting encoded codecs such as AAC or AC-3 instead, which rules it out for uncompressed delivery from the start.
Ordered delivery with retransmission: Unlike WebRTC's real-time model, which favors timeliness and will conceal or drop late packets rather than wait for them, SRT guarantees ordered delivery with automatic retransmission of lost packets. For audio, a dropped packet means an audible glitch. SRT's ARQ mechanism ensures that data lost in transit is retransmitted before the receive buffer drains.
Configurable latency/reliability trade-off: SRT exposes a latency parameter that directly controls the retransmission window. Higher latency = more time for packet recovery = higher reliability. DJing Stream exposes this as a user-facing slider:
Latency Configuration by Use Case:
├── Live venue deployment: 4-5 seconds (maximum reliability)
├── Interactive sessions: 2-3 seconds (accept occasional dropouts)
├── Home listening: 4-6 seconds (prioritize quality)
└── Challenging networks: 8-10 seconds (international, mobile, congested)
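As a rough illustration of how a user-facing slider can map onto SRT's latency setting, here is a minimal Swift sketch. The scenario type, the specific millisecond values, and the relay hostname are illustrative assumptions rather than DJing Stream's actual code; `latency` is the query parameter understood by libsrt-style SRT URIs.

```swift
import Foundation

// Hypothetical mapping from use case to SRT latency window (milliseconds),
// mirroring the guidance above. Not DJing Stream's actual implementation.
enum StreamingScenario {
    case liveVenue, interactive, homeListening, challengingNetwork

    var srtLatencyMs: Int {
        switch self {
        case .liveVenue:          return 4_500   // maximum reliability
        case .interactive:        return 2_500   // accept occasional dropouts
        case .homeListening:      return 5_000   // prioritize quality
        case .challengingNetwork: return 9_000   // international, mobile, congested
        }
    }
}

/// Builds a caller-side SRT URI. The `latency` query parameter is assumed to be
/// interpreted in milliseconds, per libsrt's URI convention (some tools differ).
func srtPublishURL(host: String, port: Int, scenario: StreamingScenario) -> URL? {
    var components = URLComponents()
    components.scheme = "srt"
    components.host = host
    components.port = port
    components.queryItems = [URLQueryItem(name: "latency",
                                          value: String(scenario.srtLatencyMs))]
    return components.url
}

// Example output: srt://relay.example.com:9000?latency=4500
if let url = srtPublishURL(host: "relay.example.com", port: 9000, scenario: .liveVenue) {
    print(url.absoluteString)
}
```

Whatever the transport library, the design point stands: the latency window is a first-class, user-visible parameter rather than a hidden tuning constant.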
Constant bitrate: there is no adaptive bitrate ladder. The stream maintains consistent quality and relies on SRT's latency buffer and retransmission to absorb network variation. This is critical for audio, where adaptive bitrate means audible quality fluctuations.
Why WebRTC for Video?
WebRTC remains the right choice for video, for different reasons:
- Real-time feedback: DJs want to see the crowd; venues may want to display the DJ performing. This requires low latency even at the cost of quality.
- NAT traversal: WebRTC's ICE/STUN/TURN infrastructure handles the complexity of peer-to-peer video between DJs and venues behind NATs.
- Acceptable degradation: Video quality fluctuations are visually tolerable in a way audio glitches are not.
The key insight: if video stutters, audio stays perfect. The streams are completely independent. Toggle video off entirely to save resources without affecting audio.
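The independence claim is easiest to see as structure. Below is a minimal sketch, assuming hypothetical `AudioTransport` and `VideoTransport` wrappers around the SRT and WebRTC pipelines (these types are not DJing Stream's actual API):

```swift
// Two independent pipelines behind separate protocols: video can be torn down
// or fail without touching audio, because the two share no state.
protocol AudioTransport { func start() throws; func stop() }
protocol VideoTransport { func start(); func stop() }

final class StreamController {
    private let audio: AudioTransport   // e.g. an SRT publisher
    private var video: VideoTransport?  // e.g. a WebRTC sender, optional by design

    init(audio: AudioTransport, video: VideoTransport? = nil) {
        self.audio = audio
        self.video = video
    }

    func goLive() throws {
        try audio.start()   // audio always runs; failure here is fatal
        video?.start()      // video is best-effort and never blocks audio
    }

    /// Disabling video frees encoder and bandwidth resources without touching
    /// the SRT audio stream.
    func setVideoEnabled(_ enabled: Bool) {
        if enabled { video?.start() } else { video?.stop() }
    }
}
```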
Uncompressed PCM Over SRT
Where most streaming platforms use AAC or Opus at 128-320 kbps, DJing Stream transmits 24-bit PCM audio:
Audio Specifications:
├── Format: Uncompressed 24-bit PCM
├── Sample rate: 44.1 kHz or 48 kHz (auto-detected)
├── Bitrate: ~2,304 kbps
├── Container: MPEG-TS
└── Transport: SRT
For context, Spotify's highest quality tier streams at 320 kbps using lossy compression. DJing Stream delivers more than seven times that bitrate with zero compression artifacts. The trade-off is bandwidth: each listener consumes approximately 2.5 Mbps for audio alone.
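For the curious, the headline number falls straight out of the PCM parameters. A quick back-of-the-envelope (stereo and the ~8% overhead factor are assumptions for illustration, not published figures):

```swift
// 24-bit PCM at 48 kHz, stereo:
let sampleRate = 48_000.0                               // samples per second
let bitDepth   = 24.0                                   // bits per sample
let channels   = 2.0                                    // stereo assumed

let pcmKbps = sampleRate * bitDepth * channels / 1_000  // 2,304 kbps
// (At 44.1 kHz the same math gives ≈ 2,117 kbps.)

// MPEG-TS packetization plus SRT/UDP/IP framing adds overhead, which is where
// the ~2.5 Mbps per-listener figure comes from (8% is a rough estimate).
let wireMbps = pcmKbps * 1.08 / 1_000                   // ≈ 2.49 Mbps
```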
Hub-and-Spoke Distribution
The network architecture uses a relay model rather than peer-to-peer:
DJ Mixer
│
▼ USB/Thunderbolt
macOS (AVFoundation capture)
│
▼ MPEG-TS/SRT
SRT Relay Server
│
├──────────────────┬──────────────────┐
▼                  ▼                  ▼
Venue 1            Venue 2            Venue N
(SRT Subscriber)   (SRT Subscriber)   (SRT Subscriber)
The DJ publishes a single stream regardless of listener count. The relay server handles fan-out distribution. This keeps upload bandwidth requirements constant for the DJ while enabling simultaneous multi-venue delivery.
Each venue then routes the received SRT stream through AVAudioEngine to its sound system or AirPlay endpoints.
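On the venue side, a minimal AVAudioEngine playback path might look like the sketch below. It assumes the SRT receive and MPEG-TS demux stages (not shown) deliver PCM buffers already converted to the engine's native float format; the class and method names are illustrative, not DJing Stream's actual code.

```swift
import AVFoundation

final class VenuePlayer {
    private let engine = AVAudioEngine()
    private let player = AVAudioPlayerNode()
    private let format: AVAudioFormat

    enum PlayerError: Error { case unsupportedFormat }

    init(sampleRate: Double = 48_000, channels: AVAudioChannelCount = 2) throws {
        // The engine mixes in Float32 internally; the stream's 24-bit integer
        // samples are assumed to be converted upstream.
        guard let fmt = AVAudioFormat(standardFormatWithSampleRate: sampleRate,
                                      channels: channels) else {
            throw PlayerError.unsupportedFormat
        }
        format = fmt

        engine.attach(player)
        engine.connect(player, to: engine.mainMixerNode, format: format)
        try engine.start()
        player.play()
    }

    /// Called for each decoded chunk of PCM pulled from the SRT receive buffer.
    func enqueue(_ buffer: AVAudioPCMBuffer) {
        player.scheduleBuffer(buffer, completionHandler: nil)
    }
}
```

From mainMixerNode the audio reaches the default output device, which the venue points at its sound system or an AirPlay destination.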
Apple Silicon as Broadcast Infrastructure
Traditional broadcast contribution encoders from manufacturers like Comrex or Tieline cost $3,000-$15,000 per endpoint. They achieve slightly lower latency (1-2 seconds) but operate point-to-point—requiring separate hardware for each venue connection.
DJing Stream runs on consumer Macs. Apple Silicon's unified memory architecture and hardware-accelerated media processing enable what previously required dedicated broadcast equipment:
- AVFoundation for low-latency audio capture from any USB/Thunderbolt interface
- Hardware-accelerated encoding for video (when enabled)
- Efficient SRT processing for reliable transport
A refurbished Mac mini M1 ($250-300) handles broadcast-grade streaming without breaking a sweat. The barrier to entry drops from thousands of dollars to existing Mac hardware.
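To make the capture bullet above concrete: a minimal AVAudioEngine input tap is enough to pull PCM from a class-compliant USB or Thunderbolt interface. Device selection and the MPEG-TS packetizer/SRT sender are assumed to live elsewhere; the names here are illustrative, not DJing Stream's actual code.

```swift
import AVFoundation

final class MixerCapture {
    private let engine = AVAudioEngine()

    /// `onPCM` receives raw buffers ready to be packetized into MPEG-TS and
    /// handed to the SRT sender (both outside this sketch).
    func start(onPCM: @escaping (AVAudioPCMBuffer) -> Void) throws {
        let input = engine.inputNode
        let format = input.outputFormat(forBus: 0)   // follows the interface's sample rate

        // A small tap buffer keeps capture granularity fine relative to the
        // multi-second SRT latency window downstream.
        input.installTap(onBus: 0, bufferSize: 512, format: format) { buffer, _ in
            onPCM(buffer)
        }
        try engine.start()
    }

    func stop() {
        engine.inputNode.removeTap(onBus: 0)
        engine.stop()
    }
}
```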
Comparison with Consumer Platforms
Why not just use Mixcloud Live, Twitch, or YouTube Live? Beyond the audio quality limitations (lossy compression, adaptive bitrate), there's a licensing consideration that streaming engineers should understand:
Consumer streaming platforms are licensed for personal listening—they hold public performance licenses for their platform delivery. However, venues playing that content through their sound systems create a secondary public performance that requires the venue's own PRO licensing (ASCAP, BMI, SESAC, SACEM, etc.). Many venues operating in this grey area don't realize the distinction.
DJing Stream positions itself as transport infrastructure for venues that already hold appropriate public performance licenses—the same licensing they need for any live DJ or background music system.
Technical Specifications Summary
| Parameter | Value |
|---|---|
| Audio format | Uncompressed 24-bit PCM |
| Audio sample rate | 44.1 kHz / 48 kHz (auto) |
| Audio bitrate | ~2,304 kbps |
| Audio transport | SRT (MPEG-TS container) |
| Video format | H.264 720p |
| Video transport | WebRTC |
| Default latency | 4-6 seconds E2E |
| Configurable range | 2-10 seconds |
| Platform | macOS 15+ (Sequoia) |
| Architecture | Apple Silicon recommended |
Implementation Considerations
For streaming engineers evaluating similar architectures, several design decisions are worth noting:
Protocol independence: Separating audio and video lets each stream use the optimal protocol without compromise. The architectural complexity is higher, but the quality benefits are substantial. Tight audio/video sync is not essential for DJ streaming, but real-time visual feedback is; segment-based protocols like HLS add 15-30 seconds of latency, which makes visual monitoring impractical. WebRTC solves this for video while SRT handles the audio quality requirements.
User-exposed latency control: Rather than hiding latency behind "low latency mode" toggles, exposing the actual parameter with use-case guidance lets operators make informed trade-offs.
Relay architecture vs. P2P: The hub-and-spoke model adds a relay hop but dramatically simplifies multi-destination delivery and keeps source bandwidth constant. For any application requiring one-to-many distribution, this is likely the correct choice.
Audio-first bitrate allocation: For any application where audio quality is the primary value proposition, consider whether the standard video-heavy bandwidth allocation makes sense for your use case.
Conclusion
DJing Stream represents an interesting departure from conventional streaming architecture: prioritizing SRT reliability over WebRTC speed for audio, allocating more bandwidth to audio than video, and leveraging Apple Silicon to democratize broadcast-grade transport.
Whether you're building venue streaming systems, remote production workflows, or any application where audio fidelity is critical, the architectural patterns here—separate protocols for separate media types, configurable latency trade-offs, and hub-and-spoke distribution—offer a template worth considering.
The application is available on the Mac App Store. More information at djing.com.