Posted in 2017, conference, JavaScript, webrtc, websc

WebRTC – behind the browser

In this blog post I shall discuss how WebRTC works in the browser. This post is part of a blog series on WebRTC.

The browser abstracts most of the WebRTC complexity behind three primary JavaScript APIs:

  • MediaStream: acquisition of audio and video streams
  • RTCPeerConnection: communication of audio and video data
  • RTCDataChannel: communication of arbitrary application data

Connections between two peers are created using the RTCPeerConnection interface. Once a connection has been established and opened, media streams (MediaStreams) and/or data channels (RTCDataChannels) can be added to the connection. The above APIs are just the tip of the iceberg: signaling, peer discovery, connection negotiation and security are just a few of the components required to bring it all together.
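The three APIs can be sketched together as follows. This is a minimal sketch, not a complete application: it assumes `mediaDevices` is `navigator.mediaDevices` and `pc` is a `new RTCPeerConnection()`, and passes both in as parameters (a choice made here so the logic can also run against stand-ins). The function name `startLocalMedia` and the `"chat"` channel label are hypothetical.

```javascript
// Minimal sketch: capture local media and attach it, plus a data channel, to a peer connection.
// In a browser: startLocalMedia(navigator.mediaDevices, new RTCPeerConnection())
async function startLocalMedia(mediaDevices, pc) {
  // MediaStream: ask the user for microphone and camera access.
  const stream = await mediaDevices.getUserMedia({ audio: true, video: true });

  // RTCPeerConnection: send every captured track to the remote peer.
  for (const track of stream.getTracks()) {
    pc.addTrack(track, stream);
  }

  // RTCDataChannel: a channel for arbitrary application data ("chat" is a hypothetical label).
  const channel = pc.createDataChannel("chat");
  return { stream, channel };
}
```

In practice, tracks are often added before the first offer is created; adding a track to an already established connection triggers renegotiation.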

Peer-to-peer connection setup

The RTCPeerConnection interface is responsible for managing the full life cycle of each peer-to-peer connection.

Figure: RTCPeerConnection API

RTCPeerConnection –

  • Manages the full ICE workflow for NAT traversal
  • Sends automatic (STUN) keep-alives between peers
  • Keeps track of local and remote streams
  • Triggers automatic stream renegotiation as required
  • Provides necessary APIs to –
    • generate the connection offer
    • accept the answer
    • query the connection for its current state, and more!
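Querying the state is done through read-only properties and events on the connection. A minimal sketch, assuming `pc` is an `RTCPeerConnection` and `log` is any logging function (for example `console.log`); `watchConnection` is a hypothetical helper name.

```javascript
// Minimal sketch: report the connection's current state and react when it changes.
function watchConnection(pc, log) {
  // Snapshot of the current signaling, ICE and overall connection state.
  log(`signaling=${pc.signalingState} ice=${pc.iceConnectionState} connection=${pc.connectionState}`);

  // Fires whenever pc.connectionState changes ("new", "connecting", "connected",
  // "disconnected", "failed" or "closed").
  pc.addEventListener("connectionstatechange", () => {
    log(`connection=${pc.connectionState}`);
  });
}
```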

In order to establish a successful peer-to-peer connection, the browser must –

  1. Notify the other peer of the intent to open a peer-to-peer connection, such that it knows to start listening for incoming packets.
  2. Identify potential routing paths for the peer-to-peer connection on both sides of the connection and relay this information between peers.
  3. Exchange the necessary information about the parameters of the different media and data streams, protocols, encodings used, and so on.

The built-in ICE protocol performs the necessary routing and connectivity checks (Step 2). However, the delivery of notifications (signaling, Step 1) and the initial session negotiation (Step 3) are left to the application.

Signaling is not defined by WebRTC.

The thinking behind WebRTC call setup has been to fully specify and control the media plane, but to leave the signaling plane up to the application as much as possible. Why? Different applications may prefer to use different protocols, such as –

  • Session Initiation Protocol (SIP) – Application-level signaling protocol, widely used for voice over IP (VoIP) and videoconferencing over IP networks.
  • Jingle – Signaling extension for the XMPP protocol, used for session control of voice over IP and videoconferencing over IP networks.
    • Extensible Messaging and Presence Protocol (XMPP) is an open XML technology for real-time communication, which powers a wide range of applications including instant messaging, presence and collaboration.
  • ISDN User Part (ISUP) – Signaling protocol used for setup of telephone calls in many public switched telephone networks around the globe.
    • Integrated Services Digital Network (ISDN) is a set of communication standards for simultaneous digital transmission of voice, video, data, and other network services over the traditional circuits of the public switched telephone network.

What is Signaling?

Before any connectivity checks or session negotiation can occur, we must find out if the other peer is reachable and if it is willing to establish the connection.

Signaling is the process of exchanging the information the peers need before a connection can be set up.

The caller extends an offer, and the callee returns an answer.

SDP – Session Description Protocol

SDP is a standard for describing the multimedia content of the connection, such as resolution, formats, codecs and encryption, so that both peers can understand each other once data is being transferred.

WebRTC uses SDP to define the media characteristics of a call.
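SDP is plain, line-oriented text: each `m=` line starts a media section, and `a=rtpmap:` lines map an RTP payload type to a codec and clock rate. The hypothetical helper `summarizeSdp` below is a minimal sketch that pulls out just those two pieces; the example SDP is hand-written and heavily abbreviated, not captured from a real browser.

```javascript
// Minimal sketch: list the media sections and codecs declared in an SDP blob.
function summarizeSdp(sdp) {
  const media = [];
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith("m=")) {
      // m=<media> <port> <proto> <fmt list>
      const kind = line.slice(2).split(" ")[0];
      media.push({ kind, codecs: [] });
    } else if (line.startsWith("a=rtpmap:") && media.length > 0) {
      // a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
      const encoding = line.slice("a=rtpmap:".length).split(" ")[1];
      media[media.length - 1].codecs.push(encoding);
    }
  }
  return media;
}

// Hand-written, heavily abbreviated example; real browser SDP is much longer.
const exampleSdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

console.log(summarizeSdp(exampleSdp).map((m) => `${m.kind}: ${m.codecs.join(", ")}`).join("; "));
// prints "audio: opus/48000/2; video: VP8/90000"
```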

JSEP – JavaScript Session Establishment Protocol

Signaling methods and protocols are not specified by WebRTC standards. This approach is outlined by JSEP – JavaScript Session Establishment Protocol. JSEP’s architecture also avoids a browser having to save state: that is, to function as a signaling state machine. This would be problematic if, for example, signaling data was lost each time a page was reloaded. Instead, signaling state can be saved on a server.

Figure: JSEP architecture

JSEP’s handling of session descriptions is simple and straightforward.

  1. Whenever an offer/answer exchange is needed, the initiating side creates an offer by calling the createOffer() API.
  2. The application optionally modifies that offer, and then uses it to set up its local config via the setLocalDescription() API.
  3. The offer is then sent off to the remote side over its preferred signaling mechanism (e.g., WebSockets).
  4. Upon receipt of that offer, the remote party installs it using the setRemoteDescription() API.
  5. When the call is accepted, the callee uses the createAnswer() API to generate an appropriate answer, applies it using setLocalDescription(), and sends the answer back to the initiator over the signaling channel.
  6. When the offerer gets that answer, it installs it using setRemoteDescription(), and initial setup is complete.
  7. This process can be repeated for additional offer/answer exchanges.
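The offer/answer steps above map onto RTCPeerConnection calls roughly as follows. This is a minimal sketch, assuming `pc` is an `RTCPeerConnection` on each side and `sendToPeer` is a hypothetical function that delivers a message over whatever signaling channel the application chose (for example, a WebSocket); the names `makeCall` and `handleSignal` are also illustrative. ICE candidate exchange and error handling are left out.

```javascript
// Caller: steps 1-3.
async function makeCall(pc, sendToPeer) {
  const offer = await pc.createOffer();                         // 1. create the offer
  await pc.setLocalDescription(offer);                          // 2. apply it locally
  sendToPeer({ type: "offer", sdp: pc.localDescription.sdp });  // 3. signal it
}

// Both sides: handle a signaling message from the other peer.
async function handleSignal(pc, message, sendToPeer) {
  if (message.type === "offer") {
    await pc.setRemoteDescription({ type: "offer", sdp: message.sdp });  // 4. install the offer
    const answer = await pc.createAnswer();                               // 5. generate an answer,
    await pc.setLocalDescription(answer);                                 //    apply it locally,
    sendToPeer({ type: "answer", sdp: pc.localDescription.sdp });        //    and send it back
  } else if (message.type === "answer") {
    await pc.setRemoteDescription({ type: "answer", sdp: message.sdp }); // 6. install the answer
  }
}
```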

Figure: Offer/Answer exchange between peers

What is ICE?

Interactive Connectivity Establishment (ICE) is a framework that allows the web browser to connect with peers. There are many reasons why a direct connection from Peer A to Peer B simply won’t work. The connection has to:

  • Bypass firewalls that would prevent opening connections
  • Get a unique address if, as in most situations, the device doesn’t have a public IP address
  • Relay data through a server if the router doesn’t allow the peers to connect directly

WebRTC’s ICE framework manages most of this complexity:

  • Each RTCPeerConnection object contains an “ICE agent.”
  • The ICE agent is responsible for gathering local (IP, port) tuples, called candidates.
  • The ICE agent is responsible for performing connectivity checks between peers.
  • The ICE agent is responsible for sending connection keepalives.

Once a session description (local or remote) is set, the local ICE agent automatically begins the process of discovering all the possible candidate (IP, port) tuples for the local peer:

  1. The ICE agent queries the operating system for local IP addresses.
  2. If configured, the ICE agent queries an external STUN server to retrieve the public (IP, port) tuple of the peer.
  3. If configured, the ICE agent appends the TURN server as a last-resort candidate. If the peer-to-peer connection fails, the data will be relayed through the specified intermediary.
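The STUN and TURN servers are supplied when the connection is created, and the candidates found by the ICE agent are passed to the other peer over signaling. A minimal sketch, assuming placeholder server URLs and credentials (not real servers) and a hypothetical `sendToPeer` signaling function; `trickleCandidates` and `onRemoteCandidate` are illustrative names.

```javascript
// Placeholder STUN/TURN servers; replace with real ones.
const rtcConfig = {
  iceServers: [
    { urls: "stun:stun.example.org:3478" },
    { urls: "turn:turn.example.org:3478", username: "demo-user", credential: "demo-secret" },
  ],
};

// Send each local candidate to the other peer as soon as the ICE agent finds it.
function trickleCandidates(pc, sendToPeer) {
  pc.addEventListener("icecandidate", (event) => {
    if (event.candidate) {
      sendToPeer({ type: "candidate", candidate: event.candidate });
    }
    // A null candidate means local gathering has finished.
  });
}

// Hand candidates received from the other peer to the local ICE agent.
async function onRemoteCandidate(pc, message) {
  if (message.type === "candidate") {
    await pc.addIceCandidate(message.candidate);
  }
}

// In a browser: const pc = new RTCPeerConnection(rtcConfig);
```

Sending each candidate as soon as it is found, rather than waiting for gathering to finish, lets connectivity checks start earlier.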

ICE and Signaling

ICE is part of WebRTC, but Signaling isn’t

  • JSEP decouples the ICE state machine from the overall signaling state machine.
  • The ICE state machine must remain in the browser, because only the browser has the necessary knowledge of candidates and other transport info.
  • Through its abstraction of signaling, the JSEP approach does require the application to be aware of the signaling process.

What are STUN, NAT & TURN?

Session Traversal Utilities for NAT (STUN) is a protocol to discover your public address and determine any restrictions in your router that would prevent a direct connection with a peer. The client sends a request to a STUN server on the internet, which replies with the client’s public address and whether or not the client is accessible behind the router’s NAT.
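Each candidate the ICE agent gathers is described by a text line that records its type: `host` (a local address from the operating system), `srflx` (server reflexive, the public address reported by a STUN server) or `relay` (an address on a TURN server). The hypothetical `parseCandidate` helper below is a minimal sketch of reading that line; the example candidate is hand-written using documentation and private IP ranges.

```javascript
// Minimal sketch: parse an ICE candidate line.
// Format: candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> [name value]...
function parseCandidate(text) {
  const fields = text.replace(/^a=/, "").replace(/^candidate:/, "").trim().split(/\s+/);
  const [foundation, component, transport, priority, address, port] = fields;
  const info = {
    foundation,
    component: Number(component),
    transport: transport.toLowerCase(),
    priority: Number(priority),
    address,
    port: Number(port),
  };
  // The rest are name/value pairs such as "typ srflx", "raddr 192.168.1.20", "rport 54400".
  for (let i = 6; i + 1 < fields.length; i += 2) {
    info[fields[i]] = fields[i + 1];
  }
  return info;
}

// A server-reflexive candidate: the public address a STUN server reported,
// with the private address it maps to in raddr/rport.
const example = parseCandidate(
  "candidate:842163049 1 udp 1694498815 203.0.113.7 54400 typ srflx raddr 192.168.1.20 rport 54400"
);
console.log(`${example.typ}: public ${example.address}:${example.port}, private ${example.raddr}:${example.rport}`);
// prints "srflx: public 203.0.113.7:54400, private 192.168.1.20:54400"
```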

Network Address Translation (NAT) is used to give the device a public IP address. A router has a public IP address, and every device connected to the router has a private IP address. Requests are translated from the device’s private IP to the router’s public IP with a unique port. That way, each device does not need its own unique public IP, but can still be discovered on the internet.

Some routers restrict who can connect to devices on the network. This means that even though we have the public IP address found by the STUN server, not just anyone can create a connection. In this situation we need to turn to TURN. Some routers using NAT employ a restriction called ‘Symmetric NAT’, which means the router will only accept connections from peers you’ve previously connected to.
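The NAT behaviour described above can be illustrated with a toy model. `ToyNat` is a hypothetical class written for illustration, not a real library: it rewrites private source addresses to one public IP with a unique port, and with `restricted: true` it also drops packets from hosts the device has not contacted before, which is the kind of restriction that makes TURN necessary. Real routers differ in both their mapping and filtering behaviour, so treat it as a simplification.

```javascript
// Toy model of a NAT router, for illustration only.
class ToyNat {
  constructor(publicIp, { restricted = false } = {}) {
    this.publicIp = publicIp;
    this.restricted = restricted; // if true, only accept packets from hosts already contacted
    this.nextPort = 40000;        // arbitrary first public port
    this.mappings = new Map();    // "privateIp:privatePort" -> public port
    this.devices = new Map();     // public port -> { ip, port, contacted }
  }

  // Outgoing packet: rewrite the private source address to the router's public one.
  send(privateIp, privatePort, remoteIp) {
    const key = `${privateIp}:${privatePort}`;
    if (!this.mappings.has(key)) {
      const newPort = this.nextPort++;
      this.mappings.set(key, newPort);
      this.devices.set(newPort, { ip: privateIp, port: privatePort, contacted: new Set() });
    }
    const publicPort = this.mappings.get(key);
    this.devices.get(publicPort).contacted.add(remoteIp);
    return { ip: this.publicIp, port: publicPort };
  }

  // Incoming packet: map the public port back to the device, or drop it (null).
  receive(publicPort, fromIp) {
    const device = this.devices.get(publicPort);
    if (!device) return null;
    if (this.restricted && !device.contacted.has(fromIp)) return null;
    return { ip: device.ip, port: device.port };
  }
}

const nat = new ToyNat("198.51.100.1", { restricted: true });
// The device talks to a STUN server, which sees (and reports) the public address.
console.log(nat.send("192.168.1.20", 5000, "203.0.113.10")); // { ip: '198.51.100.1', port: 40000 }
// The STUN server's reply gets through...
console.log(nat.receive(40000, "203.0.113.10")); // { ip: '192.168.1.20', port: 5000 }
// ...but a new peer that was never contacted is dropped, even with the right public address.
console.log(nat.receive(40000, "192.0.2.99")); // null
```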

Traversal Using Relays around NAT (TURN) is meant to bypass the Symmetric NAT restriction by opening a connection with a TURN server and relaying all information through that server. You create a connection with a TURN server and tell all peers to send packets to the server, which then forwards them to you. This comes with some overhead, so it is only used when there are no other alternatives.
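Why TURN is a last resort shows up in how ICE ranks candidates. RFC 8445 recommends computing each candidate’s priority as 2^24 × type preference + 2^8 × local preference + (256 - component ID), with type preferences of 126 for host, 110 for peer-reflexive, 100 for server-reflexive and 0 for relayed candidates. The sketch below applies that formula, assuming a local preference of 65535 and component 1; browsers are free to choose their own preference values, and `candidatePriority` is a hypothetical helper name.

```javascript
// ICE candidate priority, using the formula recommended in RFC 8445:
//   priority = 2^24 * typePreference + 2^8 * localPreference + (256 - componentId)
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

function candidatePriority(type, localPreference = 65535, componentId = 1) {
  return 2 ** 24 * TYPE_PREFERENCE[type] + 2 ** 8 * localPreference + (256 - componentId);
}

const ranked = ["relay", "srflx", "host"]
  .map((type) => ({ type, priority: candidatePriority(type) }))
  .sort((a, b) => b.priority - a.priority);

for (const { type, priority } of ranked) console.log(type, priority);
// host 2130706431
// srflx 1694498815
// relay 16777215
```

Relayed (TURN) candidates get type preference 0, so they always rank below any host or STUN-derived candidate for the same component.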

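In summary:

  • STUN discovers the device’s public address and whether the router’s NAT allows a direct connection.
  • NAT lets many devices with private IP addresses share the router’s single public IP address.
  • TURN relays all traffic through a server when a direct connection is not possible, at the cost of extra overhead.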


  1. WebRTC – Browser APIs and Protocols
  2. WebRTC Infrastructure
  3. JSEP
  4. WebRTC Acronyms 

PS: Images used in this post are copied from the internet, from one of the above links. I don’t intend to violate any copyright laws; this blog post is a compilation of my notes for my upcoming workshop.

WebRTC – architecture & protocols

In this blog post I shall discuss the architecture & protocols powering WebRTC. This blog series is for my upcoming WebRTC workshop at the Web Summer Camp, Croatia 2017.

While WebRTC has greatly simplified real-time communication on the web through the browser, its underlying technology comprises a collection of standards, protocols, and JavaScript APIs! The power of WebRTC is such that with only a dozen lines of JavaScript code, any web application can enable peer-to-peer audio, video, and data sharing between browsers (peers).

The Architecture

Figure: WebRTC Architecture

WebRTC architecture consists of over a dozen different standards, covering both the application and browser APIs, developed jointly by the W3C WEBRTC Working Group and the IETF RTCWEB Working Group. While its primary purpose is to enable real-time communication between browsers, it is also designed to be integrated with existing communication systems: voice over IP (VoIP), various SIP clients, and even the public switched telephone network (PSTN), just to name a few.

WebRTC brings all the capabilities of the Web to the telecommunications world, a trillion-dollar industry!

Voice and Video Engines

Enabling RTC requires that the browser be able to access the system hardware to capture both voice and video. Raw voice and video streams are not sufficient on their own. The audio stream has to be –

  1. Processed for noise reduction and echo cancellation
  2. Automatically encoded with one of the optimized narrowband or wideband audio codecs
  3. Used with a special error-concealment algorithm to hide the negative effects of network jitter and packet loss

On the sending side, each stream must also be –

  1. Processed to enhance quality
  2. Synchronized and adjusted to match the continuously fluctuating bandwidth and latency between the clients

On the receiving side, the process is reversed, and the client must –

  1. Decode the received stream in real time
  2. Adjust the decoded stream to network jitter and latency delays

Figure: Voice and Video Engines
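The receive-side adjustment to network jitter can be illustrated with a toy, fixed-delay jitter buffer. This is a hypothetical sketch for illustration: the `playout` function and the packet timings are made up, and real engines adapt the buffer delay continuously rather than using a fixed value.

```javascript
// Toy fixed-delay jitter buffer. Each packet is scheduled for playout `delayMs` after it
// was sent, in sequence order; a packet that arrives after its playout time counts as
// lost, and error concealment would have to cover the gap.
function playout(packets, delayMs) {
  const played = [];
  const late = [];
  for (const p of [...packets].sort((a, b) => a.seq - b.seq)) {
    const playAt = p.sentAt + delayMs;
    if (p.arrivedAt <= playAt) played.push(p.seq);
    else late.push(p.seq);
  }
  return { played, late };
}

// Hypothetical 20 ms packets; packet 2 is delayed by jitter and arrives after packet 3.
const packets = [
  { seq: 1, sentAt: 0, arrivedAt: 30 },
  { seq: 3, sentAt: 40, arrivedAt: 55 },
  { seq: 2, sentAt: 20, arrivedAt: 90 },
];

console.log(playout(packets, 50)); // { played: [ 1, 3 ], late: [ 2 ] }
console.log(playout(packets, 80)); // { played: [ 1, 2, 3 ], late: [] }
```

A larger buffer lets late packets play, at the cost of extra latency; a smaller one keeps latency low but leaves more gaps for error concealment to hide.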

The fully featured audio and video engines of WebRTC take care of all the signal processing. While all of this processing is done directly by the browser, the web application receives the optimized media stream, which it can then forward to its peers using one of the JavaScript APIs!

VoiceEngine is a framework for the audio media chain, from sound card to the network.

VideoEngine is a framework for the video media chain, from camera to the network, and from network to the screen.

Audio Codecs

Figure: Wideband audio

  1. iSAC: A wideband and super wideband audio codec for VoIP and streaming audio. iSAC uses 16 kHz or 32 kHz sampling frequency with an adaptive and variable bit rate of 12 to 52 kbps.
  2. iLBC: A narrowband speech codec for VoIP and streaming audio. iLBC uses 8 kHz sampling frequency with a bitrate of 15.2 kbps for 20ms frames and 13.33 kbps for 30ms frames.
  3. Opus: Supports constant and variable bitrate encoding from 6 kbit/s to 510 kbit/s. Opus supports frame sizes from 2.5 ms to 60 ms, and various sampling rates from 8 kHz (with 4 kHz bandwidth) to 48 kHz (with 20 kHz bandwidth, where the entire hearing range of the human auditory system can be reproduced).
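The iLBC bitrates above follow directly from its frame sizes: 38 bytes every 20 ms and 50 bytes every 30 ms. The same arithmetic, run in reverse, gives the average payload per frame for a chosen bitrate. The helper names are hypothetical, and the 32 kbit/s Opus figure is just an example value within Opus’s 6 to 510 kbit/s range.

```javascript
// Bits per millisecond is numerically equal to kbit/s.
function bitrateKbps(bytesPerFrame, frameMs) {
  return (bytesPerFrame * 8) / frameMs;
}

// Average payload size per frame for a given bitrate and frame duration.
function bytesPerFrame(kbps, frameMs) {
  return (kbps * frameMs) / 8;
}

// iLBC: 38-byte frames every 20 ms, 50-byte frames every 30 ms.
console.log(bitrateKbps(38, 20));            // 15.2
console.log(bitrateKbps(50, 30).toFixed(2)); // 13.33
// An Opus stream averaging 32 kbit/s with 20 ms frames carries about 80 bytes per frame.
console.log(bytesPerFrame(32, 20));          // 80
```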

Video Codecs (VP8)

  • The VP8 codec used for video encoding requires 100–2,000+ kbit/s of bandwidth, and the bitrate depends on the quality of the streams.
  • VP8 is well suited for RTC as it is designed for low latency.
  • VP8 is a video codec from the WebM project.

Real-Time Network Transports

Unlike most other browser communication, which uses Transmission Control Protocol (TCP), WebRTC transports its data primarily over User Datagram Protocol (UDP).

The requirement for timeliness over reliability is the primary reason why the UDP protocol is a preferred transport for delivery of real-time data.

  • TCP delivers a reliable, ordered stream of data. If an intermediate packet is lost, then TCP buffers all the packets after it, waits for a retransmission, and then delivers the stream in order to the application. 
  • UDP offers no promises on reliability or order of the data, and delivers each packet to the application the moment it arrives. In effect, it is a thin wrapper around the best-effort delivery model offered by the IP layer of our network stacks.
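The difference between the two delivery models can be made concrete with a toy simulation of head-of-line blocking. The `deliveryTimes` function and all timings are hypothetical, and the model ignores everything except ordering: with in-order delivery a packet cannot be handed to the application before the one in front of it.

```javascript
// Toy model: `arrivals` maps a sequence number to the time (ms) that packet reached the
// receiver; the result maps each sequence number to when the application receives it.
function deliveryTimes(arrivals, inOrder) {
  const seqs = Object.keys(arrivals).map(Number).sort((a, b) => a - b);
  const delivered = {};
  let previous = 0;
  for (const seq of seqs) {
    // In-order delivery (TCP): wait for every earlier packet.
    // Unordered delivery (UDP): hand each packet over the moment it arrives.
    delivered[seq] = inOrder ? Math.max(previous, arrivals[seq]) : arrivals[seq];
    previous = delivered[seq];
  }
  return delivered;
}

// Packets 1-4 arrive every 10 ms, but packet 2 is lost.
// Over TCP it is retransmitted and arrives at 120 ms; over UDP it never arrives.
const tcpArrivals = { 1: 10, 2: 120, 3: 30, 4: 40 };
const udpArrivals = { 1: 10, 3: 30, 4: 40 };

console.log(JSON.stringify(deliveryTimes(tcpArrivals, true)));  // {"1":10,"2":120,"3":120,"4":120}
console.log(JSON.stringify(deliveryTimes(udpArrivals, false))); // {"1":10,"3":30,"4":40}
```

Over TCP, one lost packet delays packets 3 and 4 as well; over UDP they arrive on time and the application decides how to handle the gap, which is what a real-time audio or video stream wants.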

Figure: WebRTC network protocol stack

UDP is the foundation for real-time communication in the browser. In order to meet all the requirements of WebRTC, the browser needs a large supporting cast of protocols and services above it to traverse the many layers of NATs and firewalls, negotiate the parameters for each stream, provide encryption of user data, implement congestion and flow control, and more!

The Protocol Stack

  1. ICE: Interactive Connectivity Establishment
  2. STUN: Session Traversal Utilities for Network Address Translation (NAT)
  3. TURN: Traversal Using Relays around NAT
  4. SDP: Session Description Protocol
  5. DTLS: Datagram Transport Layer Security
  6. SCTP: Stream Control Transmission Protocol
  7. SRTP: Secure Real-Time Transport Protocol

  • ICE, STUN, and TURN are necessary to establish and maintain a peer-to-peer connection over UDP.
  • DTLS is used to secure all data transfers between peers; encryption is a mandatory feature of WebRTC.
  • SCTP and SRTP are the application protocols used to multiplex the different streams, provide congestion and flow control, and provide partially reliable delivery and other additional services on top of UDP.
  • Session Description Protocol (SDP) is a data format used to negotiate the parameters of the peer-to-peer connection. However, the SDP “offer” and “answer” are communicated out of band, which is why SDP is missing from the protocol diagram.
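SCTP’s partial reliability is exposed through the options of `createDataChannel()`. A minimal sketch, assuming `pc` is an `RTCPeerConnection`; the channel labels and the 500 ms limit are hypothetical choices, and `openChannels` is an illustrative name. Note that `maxRetransmits` and `maxPacketLifeTime` cannot both be set on the same channel.

```javascript
// Minimal sketch: data channels with different delivery guarantees, all carried over SCTP.
function openChannels(pc) {
  // Default: reliable and ordered, similar to TCP semantics (e.g. chat, file transfer).
  const reliable = pc.createDataChannel("chat");

  // Unordered with no retransmissions: closer to plain UDP (e.g. frequent state updates).
  const unreliable = pc.createDataChannel("positions", { ordered: false, maxRetransmits: 0 });

  // Partially reliable by time: retransmit lost messages for at most 500 ms.
  const timed = pc.createDataChannel("telemetry", { ordered: false, maxPacketLifeTime: 500 });

  return { reliable, unreliable, timed };
}
```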

WebRTC – a detailed history

The Past

The need to connect virtually and hold video conferences and communications on the web has been around for a while. In the past, Flash was one of the popular ways to achieve this. The alternatives were plug-ins or an installable application on the PC. From a user’s perspective, all these methods required additional installations. From a developer’s perspective, they required studying a complex stack of protocols.

The birth of WebRTC

WebRTC technology was first developed by Global IP Solutions (or GIPS), a company founded around 1999 in Sweden. In 2010 GIPS was acquired by Google, and in 2011 the W3C started work on a standard for WebRTC. Since then Google and other major players in the web-browser market, such as Mozilla and Opera, have been showing great support for WebRTC.

The newly formed Chrome WebRTC team focused on open sourcing all the low-level RTC components, such as codecs and echo cancellation techniques. The team added an additional layer: a JavaScript API as an integration layer for web browsers. Combining these audio and video components with a JS interface spurred innovation in the RTC market.

A few lines of JS code, with no licensing, no integration of components and no deep knowledge of RTC required!

WebRTC – A Standard

WebRTC is a standard for real-time, plugin-free video, audio and data communication maintained by –

  • IETF – defines the formats and protocols used to communicate between browsers
  • W3C – defines the APIs that a Web application can use to control this communication

WebRTC is a standard that has different implementations

WebRTC is a standard that has different implementations, such as OpenWebRTC, whose initial version was developed internally at Ericsson Research, and the implementation maintained by the Google Chrome team.

Cover image for this post is from here.

Web of Things – Peer to Peer Web

At the end of this month, I am attending the Web Summer Camp in Rovinj, Croatia, and I will be running a half-day workshop on 01.09.2017 about Web of Things – Peer to Peer Web.

Here is a little abstract about the workshop:

The web today is a growing universe. Over the years, web technologies have evolved to give web developers the ability to create new generations of useful web experiences. One such feature is WebRTC, which provides browsers and mobile applications with Real Time Communication (RTC) capabilities via simple JavaScript APIs. In this hands-on workshop you will learn to build applications to support real time communication on the web. You will build an app to get video and take snapshots with your webcam and share them peer-to-peer via WebRTC. Along the way, you’ll learn how to use the core WebRTC APIs and set up a messaging server using Node.

The focus of this workshop is hands-on coding exercises to build simple and fun WebRTC applications. WebRTC is a huge topic, and its technicalities plus hands-on coding cannot be covered entirely in a 3-hour session. This blog post series is meant to help participants learn a bit more about WebRTC.

In this post I shall discuss the title: Web of Things – Peer to Peer Web.

The Web of Things (WoT) is a term used to describe approaches, software architectural styles and programming patterns that allow real-world objects to be part of the World Wide Web. The Web of Things reuses existing and well-known web standards used in the programmable web (e.g., REST, HTTP, JSON), the semantic web (e.g., JSON-LD, Microdata, etc.), the real-time web (e.g., WebSockets) and the social web (e.g., OAuth or social networks).

Peer to Peer Web refers to WebRTC, which enables peer-to-peer audio, video, and data sharing between browsers (peers). Instead of relying on third-party plug-ins or proprietary software, WebRTC turns real-time communication into a standard feature that any web application can leverage via a simple JavaScript API.

WebRTC is P2P?

This is the traditional definition of the term peer to peer in the context of networks: Each computer acts as both the client and the server, communicating directly with the other computers.

A peer-to-peer network is often compared with a client-server network, and the obvious difference between the two is this: a client-server network involves multiple clients, or workstations, connecting to at least one central server, where most data and applications are installed.

WebRTC enables peer to peer communication. But, WebRTC still needs servers!

  • For clients to exchange metadata to coordinate communication. This is called Signaling.
  • To cope with network address translators (NATs) and firewalls.

WebRTC is not only a standard specification with a default implementation in browsers, but also an open source media engine.