In this blog post I shall discuss the architecture and protocols powering WebRTC. This post is part of a series written for my upcoming WebRTC workshop at the Web Summer Camp, Croatia 2017.
WebRTC's architecture spans over a dozen different standards. They cover both the browser APIs, developed by the W3C WEBRTC Working Group, and the underlying protocols, developed by the IETF RTCWEB Working Group. While WebRTC's primary purpose is to enable real-time communication between browsers, it is also designed to integrate with existing communication systems such as voice over IP (VoIP) services, SIP clients, and even the public switched telephone network (PSTN), to name just a few.
WebRTC brings the capabilities of the Web to the telecommunications world, a trillion-dollar industry!
Voice and Video Engines
Enabling RTC requires that the browser be able to access the system hardware to capture both voice and video. Raw voice and video streams are not sufficient on their own, though. Each stream has to be:
- Processed to enhance quality, including noise reduction and echo cancellation
- Automatically encoded with one of the optimized narrowband or wideband audio codecs
- Paired with a special error-concealment algorithm that hides the negative effects of network jitter and packet loss
- Synchronized, with its bitrate adjusted to match the continuously fluctuating bandwidth and latency between the clients
On the receiving end, the process is reversed, and the client has to:
- Decode the received stream in real time
- Adjust the decoded stream to network jitter and latency delays
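Adjusting to network jitter starts with measuring it. As a hedged illustration (not WebRTC's actual jitter-buffer code), the sketch below implements the interarrival jitter estimate that RTP defines in RFC 3550, section 6.4.1. It assumes send and arrival timestamps are expressed in the same units (milliseconds here), and the sample packet times are made up; `interarrival_jitter` is an illustrative name.

```python
"""Sketch of the RTP interarrival jitter estimate (RFC 3550, section 6.4.1).

Assumption: send and arrival timestamps share the same units. Real WebRTC
jitter buffers do much more than this single running estimate.
"""
from typing import Iterable, Optional, Tuple


def interarrival_jitter(packets: Iterable[Tuple[float, float]]) -> float:
    """Return the running jitter estimate J for (send_ts, arrival_ts) pairs.

    For consecutive packets i and j, D = (R_j - R_i) - (S_j - S_i),
    and J is smoothed as J += (|D| - J) / 16.
    """
    jitter = 0.0
    prev: Optional[Tuple[float, float]] = None
    for send_ts, arrival_ts in packets:
        if prev is not None:
            prev_send, prev_arrival = prev
            # Difference between arrival spacing and send spacing.
            d = (arrival_ts - prev_arrival) - (send_ts - prev_send)
            jitter += (abs(d) - jitter) / 16.0
        prev = (send_ts, arrival_ts)
    return jitter


if __name__ == "__main__":
    # Packets sent every 20 ms; the third one arrives 16 ms later than expected.
    samples = [(0, 100), (20, 120), (40, 156), (60, 160)]
    print(interarrival_jitter(samples))
```

The 1/16 gain smooths out one-off spikes, so a single late packet nudges the estimate up instead of dominating it; a receiver can then size its playout delay from this running value.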
VoiceEngine is a framework for the audio media chain, from sound card to the network.
VideoEngine is a framework for the video media chain, from camera to the network, and from network to the screen.
Audio Codecs (iSAC, iLBC, Opus)
- iSAC: A wideband and super wideband audio codec for VoIP and streaming audio. iSAC uses a 16 kHz or 32 kHz sampling frequency with an adaptive and variable bitrate of 12 to 52 kbps.
- iLBC: A narrowband speech codec for VoIP and streaming audio. iLBC uses an 8 kHz sampling frequency with a bitrate of 15.2 kbps for 20 ms frames and 13.33 kbps for 30 ms frames.
- Opus: Supports constant and variable bitrate encoding from 6 kbps to 510 kbps. Opus supports frame sizes from 2.5 ms to 60 ms and sampling rates from 8 kHz (4 kHz bandwidth) to 48 kHz (20 kHz bandwidth, enough to reproduce the entire hearing range of the human auditory system).
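The bitrates above follow directly from frame size and frame duration. The sketch below reproduces the iLBC figures, assuming the encoded frame sizes from the iLBC specification (RFC 3951): 38 bytes per 20 ms frame and 50 bytes per 30 ms frame. It also shows how many PCM samples one Opus frame covers at 48 kHz. The helper names are illustrative only.

```python
"""Back-of-the-envelope arithmetic for the audio codecs above."""


def bitrate_kbps(frame_bytes: int, frame_ms: float) -> float:
    """Bitrate of a stream of fixed-size encoded frames."""
    # Bits per millisecond is numerically equal to kilobits per second.
    return frame_bytes * 8 / frame_ms


def samples_per_frame(sample_rate_hz: int, frame_ms: float) -> int:
    """PCM samples the encoder consumes to produce one frame."""
    return round(sample_rate_hz * frame_ms / 1000)


if __name__ == "__main__":
    print(f"iLBC 20 ms: {bitrate_kbps(38, 20):.2f} kbps")  # 304 bits per frame
    print(f"iLBC 30 ms: {bitrate_kbps(50, 30):.2f} kbps")  # 400 bits per frame
    for ms in (2.5, 20, 60):
        print(f"Opus {ms} ms @ 48 kHz: {samples_per_frame(48_000, ms)} samples")
```

The same relationship explains why longer frames are cheaper per second for a fixed-size codec like iLBC: header-like overhead is amortized over more audio, at the cost of extra latency.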
Video Codecs (VP8)
- VP8, the codec used for video encoding, requires 100–2,000+ kbps of bandwidth; the exact bitrate depends on the quality of the streams.
- It is well suited for RTC because it is designed for low latency.
- VP8 is a video codec from the WebM project.
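To get a feel for what that bandwidth range means per frame, the sketch below converts a target bitrate into an average encoded-frame budget. It assumes a constant 30 frames per second, which is an illustrative choice rather than anything VP8 mandates, and `frame_budget_bytes` is a hypothetical helper.

```python
def frame_budget_bytes(bitrate_kbps: float, fps: float) -> float:
    """Average encoded size per frame that keeps a stream at bitrate_kbps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    # kbit/s -> bit/s -> byte/s -> bytes per frame
    return bitrate_kbps * 1000 / 8 / fps


if __name__ == "__main__":
    # Ends of the bandwidth range quoted above, at an assumed 30 fps.
    for kbps in (100, 2000):
        print(f"{kbps} kbps @ 30 fps -> {frame_budget_bytes(kbps, 30):.0f} bytes/frame")
```

This is only an average: keyframes are typically far larger than the inter frames between them, so an encoder has to spend less on the frames that follow a keyframe to stay within the budget.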
Real-Time Network Transports
Unlike other browser communication, which uses the Transmission Control Protocol (TCP), WebRTC transports its data over the User Datagram Protocol (UDP).
Real-time data puts timeliness ahead of reliability, and that requirement is the primary reason UDP is the preferred transport for delivering it.
- TCP delivers a reliable, ordered stream of data. If an intermediate packet is lost, then TCP buffers all the packets after it, waits for a retransmission, and then delivers the stream in order to the application.
- UDP offers no promises on reliability or order of the data, and delivers each packet to the application the moment it arrives. In effect, it is a thin wrapper around the best-effort delivery model offered by the IP layer of our network stacks.
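The difference between the two delivery models can be seen in a toy simulation. This is a hedged sketch, not a model of real TCP (no windows, ACKs, or timers): it assumes packets are numbered 0..n-1, and a lost packet is represented simply by the arrival time of its retransmission. The function names are illustrative.

```python
"""Toy model: TCP-style in-order delivery vs. UDP-style immediate delivery.

Illustrative assumption: `arrivals` maps sequence numbers 0..n-1 to the time
(ms) each packet reaches the receiver; a lost packet is represented by the
arrival time of its retransmission.
"""
from typing import Dict


def udp_delivery(arrivals: Dict[int, float]) -> Dict[int, float]:
    """UDP hands each datagram to the application the moment it arrives."""
    return dict(arrivals)


def tcp_delivery(arrivals: Dict[int, float]) -> Dict[int, float]:
    """Packet k is released only after packets 0..k have all arrived."""
    delivered: Dict[int, float] = {}
    release_time = 0.0
    for seq in sorted(arrivals):
        # Nothing can be released before every earlier packet has been.
        release_time = max(release_time, arrivals[seq])
        delivered[seq] = release_time
    return delivered


if __name__ == "__main__":
    # Packets are sent every 20 ms; packet 2 is lost and its retransmission
    # only arrives at t = 200 ms.
    arrivals = {0: 20, 1: 40, 2: 200, 3: 80, 4: 100}
    print("UDP:", udp_delivery(arrivals))
    print("TCP:", tcp_delivery(arrivals))
```

Under the TCP model, packets 3 and 4 sit in the receive buffer until t = 200 ms even though they arrived at 80 and 100 ms. Under UDP the application gets them immediately and can decide for itself whether packet 2 is still worth waiting for, which is exactly the choice a real-time audio or video stream needs to make.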
UDP is the foundation for real-time communication in the browser. In order to meet all the requirements of WebRTC, the browser needs a large supporting cast of protocols and services above it to traverse the many layers of NATs and firewalls, negotiate the parameters for each stream, provide encryption of user data, implement congestion and flow control, and more!
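One member of that supporting cast is STUN, which ICE uses to discover a client's public address behind a NAT (both appear in the protocol stack listed below). As a rough sketch following RFC 5389, a STUN Binding Request with no attributes is just a 20-byte header: the message type, the attribute length, the fixed magic cookie `0x2112A442`, and a random 96-bit transaction ID. `build_binding_request` is a hypothetical helper; browsers build these messages internally.

```python
"""Minimal STUN Binding Request header, following RFC 5389."""
import os
import struct
from typing import Optional

BINDING_REQUEST = 0x0001      # message type: Binding, class Request
MAGIC_COOKIE = 0x2112A442     # fixed value defined by RFC 5389


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """Return a 20-byte STUN Binding Request carrying no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    if len(transaction_id) != 12:
        raise ValueError("STUN transaction IDs are 96 bits (12 bytes)")
    # Network byte order: type (16 bits), attribute length (16 bits, 0 here),
    # magic cookie (32 bits), transaction ID (96 bits).
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, transaction_id)


if __name__ == "__main__":
    request = build_binding_request()
    print(len(request), request.hex())
```

Sent over UDP to a STUN server (port 3478 by default), such a request should come back as a Binding Success Response whose XOR-MAPPED-ADDRESS attribute carries the public IP address and port that the server observed.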
The WebRTC Protocol Stack
- ICE: Interactive Connectivity Establishment
- STUN: Session Traversal Utilities for Network Address Translation (NAT)
- TURN: Traversal Using Relays around NAT
- SDP: Session Description Protocol
- DTLS: Datagram Transport Layer Security
- SCTP: Stream Control Transmission Protocol
- SRTP: Secure Real-Time Transport Protocol
- ICE, STUN, and TURN are necessary to establish and maintain a peer-to-peer connection over UDP.
- DTLS is used to secure all data transfers between peers; encryption is a mandatory feature of WebRTC.
- SCTP and SRTP are the application protocols used to multiplex the different streams, provide congestion and flow control, and provide partially reliable delivery and other additional services on top of UDP.
- Session Description Protocol (SDP) is a data format used to negotiate the parameters of the peer-to-peer connection. The SDP "offer" and "answer" are exchanged out of band, over a signaling channel the application chooses, which is why SDP is missing from the protocol diagram.
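To make the offer/answer idea concrete, here is a minimal sketch that reads which codecs each media section of an SDP offer proposes. The offer below is hand-written and heavily abbreviated: real browser offers (for example the `sdp` field of the description returned by `RTCPeerConnection.createOffer()`) also carry ICE credentials and candidates, DTLS fingerprints, and many more attributes. The payload type numbers 111 and 96 are arbitrary dynamic values, and `codecs_by_media` is a hypothetical helper, not a browser API.

```python
"""Toy SDP reader: list the codecs proposed in each m= section."""
from typing import Dict, List, Optional

# Hand-written, abbreviated offer for illustration only.
SAMPLE_OFFER = """\
v=0
o=- 0 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
c=IN IP4 0.0.0.0
a=rtpmap:111 opus/48000/2
m=video 9 UDP/TLS/RTP/SAVPF 96
c=IN IP4 0.0.0.0
a=rtpmap:96 VP8/90000
"""


def codecs_by_media(sdp: str) -> Dict[str, List[str]]:
    """Map each media type to the encoding names in its rtpmap lines.

    Simplification: repeated media types (e.g. two audio sections) are merged.
    """
    result: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            current = line[2:].split()[0]
            result.setdefault(current, [])
        elif line.startswith("a=rtpmap:") and current is not None:
            # a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
            parts = line.split(maxsplit=1)
            if len(parts) == 2:
                result[current].append(parts[1].split("/")[0])
    return result


if __name__ == "__main__":
    print(codecs_by_media(SAMPLE_OFFER))
```

The answering peer performs essentially the same reading on the offer, keeps the codecs it also supports, and returns them in its answer; the parameters agreed this way then govern the SRTP and SCTP traffic described above.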