I don’t know where to begin, but I promised Zainab Bawa I would write up my experience of speaking at tech conferences and getting into the open source world. Sorry for the late post. Here is the speaking journey!
The ‘first’ proposal
The ‘first’ talk
I worked on the feedback and submitted my second talk proposal, which was accepted, and I gave my first tech talk at JSConfAsia Singapore in 2015.
The ‘first’ international conference
Before I submitted my first-ever talk proposal (for JSFoo), I woke up one fine morning to this fantastic email.
This email was the biggest game changer in my tech life. I had stumbled upon the diversity scholarship from JSConfEU and decided to take a bold step and apply. When I was applying, little did I know that I would bag the scholarship. My joy knew no bounds when I read this email. I can still recall that morning when I had the first dose of confidence++ 🙂
This was just the beginning…. Below are the places where I have spoken to date:
JSConf Asia 2015, Singapore
Grace Hopper Conference 2015, India
JSConfBP 2016, Hungary
JSUnConf 2016, Germany
CSVConf 2016, Germany
Frontend Union Conf 2016, Lithuania
Web Summer Camp 2017, Croatia
MozFest 2017, London (Upcoming this year in October)
Women Who Code Bangalore
Women Who Code Berlin
Zalando Tech Meetups
PS: I have a long-standing to-do to collect all of the above links in one place on my website princiya.com (I hope to do it soon ^_^)
In short, this has been my wonderful journey (and counting) into the speaker’s world.
Success is when preparation meets opportunity
Words of wisdom
I didn’t achieve all of this overnight. I have constantly persevered and put in a lot of blood and sweat. The best thing is, I never gave up! And the most important thing – ‘One doesn’t need to be an expert to speak.’
Why should you consider speaking at conferences?
Speak – we need diversity!
Speak – you will definitely learn a lot
Speak – you will teach others what you have learned
Speak – to get a confidence boost
Speak – you know you are awesome
Speak – the community needs new people and new ideas
In this blog post I shall discuss how WebRTC works in the browser. Here is the full blog series.
MediaStream: acquisition of audio and video streams
RTCPeerConnection: communication of audio and video data
RTCDataChannel: communication of arbitrary application data
Connections between two peers are created using the RTCPeerConnection interface. Once a connection has been established and opened, media streams (MediaStreams) and/or data channels (RTCDataChannels) can be added to the connection. The above APIs are just the tip of the iceberg: signaling, peer discovery, connection negotiation, and security are just a few of the components required to bring it all together.
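As a minimal, hypothetical sketch of these three APIs working together in the browser (the media constraints and channel name below are placeholders):

```javascript
// Sketch: acquire a local media stream and attach it, plus a data
// channel, to a new RTCPeerConnection (browser-only APIs).
async function setUpConnection() {
  const pc = new RTCPeerConnection();

  // MediaStream: ask the user for camera and microphone access.
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: true,
    video: true,
  });

  // Add each media track to the peer connection.
  stream.getTracks().forEach((track) => pc.addTrack(track, stream));

  // RTCDataChannel: a channel for arbitrary application data.
  const channel = pc.createDataChannel('chat');
  channel.onopen = () => channel.send('hello, peer!');

  return pc;
}
```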
Peer-to-peer connection setup
The RTCPeerConnection interface is responsible for managing the full life cycle of each peer-to-peer connection.
Manages the full ICE workflow for NAT traversal
Sends automatic (STUN) keep-alives between peers
Keeps track of local and remote streams
Triggers automatic stream renegotiation as required
Provides necessary APIs to –
generate the connection offer
accept the answer
query the connection for its current state, and more!
In order to establish a successful peer-to-peer connection, the browser must –
Notify the other peer of the intent to open a peer-to-peer connection, such that it knows to start listening for incoming packets.
Identify potential routing paths for the peer-to-peer connection on both sides of the connection and relay this information between peers.
Exchange the necessary information about the parameters of the different media and data streams, protocols, encodings used, and so on.
The built-in ICE protocol performs the necessary routing and connectivity checks (step 2). However, the delivery of notifications (signaling, step 1) and the initial session negotiation (step 3) are left to the application.
Signaling is not defined by WebRTC.
The thinking behind WebRTC call setup has been to fully specify and control the media plane, but to leave the signaling plane up to the application as much as possible. Why? Because different applications may prefer different protocols, such as –
Session Initiation Protocol (SIP) – Application-level signaling protocol, widely used for voice over IP (VoIP) and videoconferencing over IP networks.
Jingle – Signaling extension for the XMPP protocol, used for session control of voice over IP and videoconferencing over IP networks.
Extensible Messaging and Presence Protocol (XMPP) is an open XML technology for real-time communication, which powers a wide range of applications including instant messaging, presence and collaboration.
ISDN User Part (ISUP) – Signaling protocol used for setup of telephone calls in many public switched telephone networks around the globe.
Integrated Services Digital Network (ISDN) is a set of communication standards for simultaneous digital transmission of voice, video, data, and other network services over the traditional circuits of the public switched telephone network.
What is Signaling?
Before any connectivity checks or session negotiation can occur, we must find out if the other peer is reachable and if it is willing to establish the connection.
Signaling is the process of exchanging information between peers before setting up a connection.
The caller extends an offer, and the callee returns an answer.
SDP – Session Description Protocol
SDP is a standard for describing the multimedia content of the connection, such as resolution, formats, codecs, and encryption, so that both peers can understand each other once data starts flowing.
WebRTC uses SDP to define the media characteristics of a call.
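For illustration, here is a heavily abridged, hypothetical SDP fragment; each line is a terse key=value field describing the session or one of its media streams:

```
v=0
o=- 4611731400430051336 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2
m=video 9 UDP/TLS/RTP/SAVPF 96
a=rtpmap:96 VP8/90000
```

The m= lines each describe a media stream, and the a=rtpmap attributes map RTP payload types to codecs such as Opus and VP8.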
JSEP and SDP
JSEP’s handling of session descriptions is simple and straightforward.
Whenever an offer/answer exchange is needed, the initiating side creates an offer by calling a createOffer() API.
The application optionally modifies that offer, and then uses it to set up its local config via the setLocalDescription() API.
The offer is then sent off to the remote side over its preferred signaling mechanism (e.g., WebSockets).
Upon receipt of that offer, the remote party installs it using the setRemoteDescription() API.
When the call is accepted, the callee uses the createAnswer() API to generate an appropriate answer, applies it using setLocalDescription(), and sends the answer back to the initiator over the signaling channel.
When the offerer gets that answer, it installs it using setRemoteDescription(), and initial setup is complete.
This process can be repeated for additional offer/answer exchanges.
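The steps above can be sketched roughly as follows; `signaling` is a placeholder for whatever channel the application chooses (e.g. a WebSocket):

```javascript
// Hedged sketch of the JSEP offer/answer exchange described above.
async function startCall(pc, signaling) {
  // Initiator: create an offer and install it as the local description.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send(JSON.stringify({ type: 'offer', sdp: offer.sdp }));
}

async function onOfferReceived(pc, signaling, offer) {
  // Callee: install the remote offer, then generate and send an answer.
  await pc.setRemoteDescription(offer);
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  signaling.send(JSON.stringify({ type: 'answer', sdp: answer.sdp }));
}

async function onAnswerReceived(pc, answer) {
  // Initiator: install the remote answer; initial setup is complete.
  await pc.setRemoteDescription(answer);
}
```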
To establish a direct connection, WebRTC needs to –
Bypass firewalls that would prevent opening connections
Obtain a unique address when, as in most cases, the device doesn’t have a public IP address
Relay data through a server when the router doesn’t allow direct connections with peers
WebRTC’s ICE framework manages most of this complexity:
Each RTCPeerConnection object contains an “ICE agent.”
The ICE agent is responsible for gathering local IP and port tuples (candidates).
The ICE agent is responsible for performing connectivity checks between peers.
The ICE agent is responsible for sending connection keepalives.
Once a session description (local or remote) is set, the local ICE agent automatically begins the process of discovering all possible candidate IP and port tuples for the local peer:
The ICE agent queries the operating system for local IP addresses.
If configured, the ICE agent queries an external STUN server to retrieve the public IP and port tuple of the peer.
If configured, the ICE agent appends the TURN server as a last-resort candidate. If the peer-to-peer connection fails, the data is relayed through the specified intermediary.
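A hedged sketch of wiring this up: the STUN/TURN URLs and credentials below are placeholders, and `signaling` again stands for the app’s own channel:

```javascript
// Sketch: configure the ICE agent with STUN/TURN servers and relay
// each gathered candidate to the other peer over signaling.
function createPeerConnection(signaling) {
  const pc = new RTCPeerConnection({
    iceServers: [
      { urls: 'stun:stun.example.org:3478' },
      {
        urls: 'turn:turn.example.org:3478',
        username: 'user',
        credential: 'secret',
      },
    ],
  });

  // The ICE agent surfaces each candidate it discovers; the app
  // forwards them to the remote peer.
  pc.onicecandidate = (event) => {
    if (event.candidate) {
      signaling.send(JSON.stringify({ candidate: event.candidate }));
    }
  };

  return pc;
}
```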
ICE and Signaling
ICE is part of WebRTC, but Signaling isn’t
JSEP decouples the ICE state machine from the overall signaling state machine.
The ICE state machine must remain in the browser, because only the browser has the necessary knowledge of candidates and other transport info.
Although JSEP abstracts the signaling protocol away, it does require the application to be aware of the signaling process.
What is STUN, NAT & TURN?
Session Traversal Utilities for NAT (STUN) is a protocol for discovering your public address and determining any restrictions in your router that would prevent a direct connection with a peer. The client sends a request to a STUN server on the internet, which replies with the client’s public address and whether or not the client is accessible behind the router’s NAT.
Network Address Translation (NAT) is used to give a device a public IP address. A router has a public IP address, and every device connected to the router has a private IP address. Requests from a device are translated from its private IP to the router’s public IP, with a unique port. That way a unique public IP isn’t needed for each device, yet each device can still be reached on the internet.
Some routers have restrictions on who can connect to devices on the network. This can mean that even though we have the public IP address found by the STUN server, not just anyone can create a connection. In this situation we need to turn to TURN. Some routers using NAT employ a restriction called ‘Symmetric NAT’, which means the router will only accept connections from peers you’ve previously connected to.
Traversal Using Relays around NAT (TURN) is meant to bypass the Symmetric NAT restriction by opening a connection with a TURN server and relaying all information through that server. You would create a connection with a TURN server and tell all peers to send packets to the server which will then be forwarded to you. This obviously comes with some overhead so is only used if there are no other alternatives.
PS: Images used in this post are copied from the internet from one of the above links. I don’t intend to violate any copyright laws, this blog post is a compilation of my notes for my upcoming workshop.
In this blog post I shall discuss the architecture & protocols powering WebRTC. This blog series is for my upcoming WebRTC workshop at the Web Summer Camp, Croatia 2017.
The WebRTC architecture consists of over a dozen different standards, covering both the application and browser APIs, jointly maintained by the W3C WEBRTC Working Group and the IETF RTCWEB Working Group. While its primary purpose is to enable real-time communication between browsers, it is also designed so that it can be integrated with existing communication systems: voice over IP (VoIP), various SIP clients, and even the public switched telephone network (PSTN), just to name a few.
WebRTC brings with it all the capabilities of the Web to the telecommunications world, a trillion dollar industry!
Voice and Video Engines
Enabling RTC requires that the browser be able to access the system hardware to capture both voice and video. However, raw voice and video streams are not sufficient on their own. They have to be –
Processed for noise reduction and echo cancellation
Automatically encoded with one of the optimized narrowband or wideband audio codecs
Used with a special error-concealment algorithm to hide the negative effects of network jitter and packet loss
Process the raw stream to enhance quality
Synchronize and adjust the stream to match the continuously fluctuating bandwidth and latency between the clients
Decode the received stream in real-time
Adjust the decoded stream to network jitter and latency delays
VoiceEngine is a framework for the audio media chain, from sound card to the network.
VideoEngine is a framework for the video media chain, from camera to the network, and from network to the screen.
iSAC: A wideband and super wideband audio codec for VoIP and streaming audio. iSAC uses 16 kHz or 32 kHz sampling frequency with an adaptive and variable bit rate of 12 to 52 kbps.
iLBC: A narrowband speech codec for VoIP and streaming audio. iLBC uses 8 kHz sampling frequency with a bitrate of 15.2 kbps for 20ms frames and 13.33 kbps for 30ms frames.
Opus: Supports constant and variable bitrate encoding from 6 kbit/s to 510 kbit/s. Opus supports frame sizes from 2.5 ms to 60 ms, and various sampling rates from 8 kHz (with 4 kHz bandwidth) to 48 kHz (with 20 kHz bandwidth, where the entire hearing range of the human auditory system can be reproduced).
Video Codecs (VP8)
The VP8 codec used for video encoding requires 100–2,000+ Kbit/s of bandwidth, and the bitrate depends on the quality of the streams.
This is well suited for RTC as it is designed for low latency.
Unlike most other browser communication, which uses the Transmission Control Protocol (TCP), WebRTC transports its data over the User Datagram Protocol (UDP).
The requirement for timeliness over reliability is the primary reason why the UDP protocol is a preferred transport for delivery of real-time data.
TCP delivers a reliable, ordered stream of data. If an intermediate packet is lost, then TCP buffers all the packets after it, waits for a retransmission, and then delivers the stream in order to the application.
UDP offers no promises on reliability or order of the data, and delivers each packet to the application the moment it arrives. In effect, it is a thin wrapper around the best-effort delivery model offered by the IP layer of our network stacks.
UDP is the foundation for real-time communication in the browser. In order to meet all the requirements of WebRTC, the browser needs a large supporting cast of protocols and services above it to traverse the many layers of NATs and firewalls, negotiate the parameters for each stream, provide encryption of user data, implement congestion and flow control, and more!
The RTP Stack
ICE: Interactive Connectivity Establishment
STUN: Session Traversal Utilities for Network Address Translation (NAT)
TURN: Traversal Using Relays around NAT
SDP: Session Description Protocol
DTLS: Datagram Transport Layer Security
SCTP: Stream Control Transmission Protocol
SRTP: Secure Real-Time Transport Protocol
ICE, STUN, and TURN are necessary to establish and maintain a peer-to-peer connection over UDP.
DTLS is used to secure all data transfers between peers; encryption is a mandatory feature of WebRTC.
SCTP and SRTP are the application protocols used to multiplex the different streams, provide congestion and flow control, and provide partially reliable delivery and other additional services on top of UDP.
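SCTP’s partial-reliability options surface in the data channel API; as an illustrative sketch (channel name is a placeholder), here is an unordered channel that never retransmits lost messages, giving UDP-like semantics:

```javascript
// Sketch: an unordered, "fire and forget" data channel. Lost
// messages are never retried and arrival order is not enforced.
function createLossyChannel(pc) {
  return pc.createDataChannel('telemetry', {
    ordered: false,    // deliver messages as they arrive
    maxRetransmits: 0, // do not retry lost messages
  });
}
```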
Session Description Protocol (SDP) is a data format used to negotiate the parameters of the peer-to-peer connection. However, the SDP “offer” and “answer” are communicated out of band, which is why SDP is missing from the protocol diagram.
Here is a GIF from my latest A-Frame experiments for Lightbeam.
For MozFest 2017, I submitted the following proposal – ‘Lightbeam, an immersive experience‘. While the proposal is still being reviewed, I have been experimenting with A-Frame, and the above GIF is an initial proof of concept 🙂
Here is the excerpt from the proposal:
What will happen in your session?
Lightbeam is a key tool for Mozilla to educate the public about privacy. Using interactive visualisations, Lightbeam’s main goal is to show web tracking, aka, show the first and third party sites you interact with on the Web.
In this session, the participants will get to interact with the trackers in the VR world thus creating an immersive Lightbeam experience. With animated transitions and well-crafted interfaces, this unique Lightbeam experience can make exploring trackers feel more like playing a game. This can be a great medium for engaging an audience who might not otherwise care about web privacy & security.
What is the goal or outcome of your session?
The ultimate goal of this session is for the audience to know and understand web tracking.
While web tracking isn’t 100% evil (cookies can help your favourite websites stay in business), its workings remain poorly understood. Your personal information is valuable and it’s your right to know what data is being collected about you. The trick is in taking this data and shacking up with third parties to help them come up with new ways to convince you to spend money and give up more information. It would be fine if you decided to give up this information for a tangible benefit, but no one is including you in the decision.
Read this blog post to understand Lightbeam’s migration from SVG to Canvas.
Ignore the transforms and the inversions (this.transform.invert) in this post. Those are part of d3-zoom and explaining the math of this and d3-force is beyond the scope of this blog post.
mousemove event is registered on the canvas element itself.
The mouse <clientX, clientY> positions are re-calculated with respect to the canvas’s bounding rectangle. This ensures the mouse coordinates are confined to the canvas’s area.
getNodeAtCoordinates(x, y) returns a node, if a node is present at the given <x, y> values.
D3’s force layout has simulation.find(x, y[, radius]), which returns the node closest to the position <x, y> within the given search radius. I chose to write isPointInsideCircle() instead, to find out if a node exists at the given <x, y> values. The intention here is to keep the logic as independent of D3 as possible.
When you hover over the canvas, and if the mouse coordinates are inside any circle, then there is a node present at these coordinates.
The point <x, y> is
inside the circle if d < r
on the circle if d = r
outside the circle if d > r
where d is the distance between <x, y> and the circle’s centre. Square roots are expensive, so in practice the squared distance dx*dx + dy*dy is compared with r*r!
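A minimal sketch of the check and the mousemove wiring (the node shape and tooltip helpers are my own assumptions, not Lightbeam’s actual code):

```javascript
// Compare the squared distance with the squared radius so that no
// square root is needed.
function isPointInsideCircle(x, y, cx, cy, r) {
  const dx = x - cx;
  const dy = y - cy;
  return dx * dx + dy * dy < r * r;
}

// Hypothetical mousemove handler: translate the pointer into
// canvas-local coordinates before testing the nodes. `nodes`,
// `showTooltip`, and `hideTooltip` are assumed helpers.
function onMouseMove(canvas, nodes, event) {
  const rect = canvas.getBoundingClientRect();
  const x = event.clientX - rect.left;
  const y = event.clientY - rect.top;
  const node = nodes.find((n) => isPointInsideCircle(x, y, n.x, n.y, n.r));
  if (node) {
    showTooltip(node, x, y);
  } else {
    hideTooltip();
  }
}
```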
The tooltip has position: absolute.
Because of this property, we need to check that the tooltip’s left value doesn’t exceed the canvas’s right edge, else there will be a horizontal scrollbar on the parent container because of overflow-x.
The check x+tooltipWidth >= canvasRight takes care of the overflow and sets left to x-tooltipWidth.
Otherwise, setting left to x-tooltipWidth/2 ensures the tooltip arrow is centre-aligned with the node.
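A minimal sketch of that placement rule (function and parameter names are my own, not Lightbeam’s actual code):

```javascript
// Flip the tooltip to the left when it would overflow the canvas,
// otherwise centre it on the node.
function tooltipLeft(x, tooltipWidth, canvasRight) {
  if (x + tooltipWidth >= canvasRight) {
    // Would overflow to the right: place the tooltip left of the node.
    return x - tooltipWidth;
  }
  // Centre the tooltip (and its arrow) on the node.
  return x - tooltipWidth / 2;
}
```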
If a favicon exists for a given node, then it is drawn.
The favicon is drawn at the centre of the circle (firstParty) or triangle (thirdParty).
A square that fits exactly in a circle has a side length of sqrt(2) * radius.
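A sketch of that sizing rule (the drawing context and image come from the surrounding Lightbeam code; the names here are assumptions):

```javascript
// Draw a favicon inside the largest square that fits the node's
// circle: the square's side is sqrt(2) * radius, centred on the
// circle's centre <cx, cy>. `ctx` is a 2D canvas context.
function drawFavicon(ctx, img, cx, cy, radius) {
  const side = Math.SQRT2 * radius;
  ctx.drawImage(img, cx - side / 2, cy - side / 2, side, side);
}
```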
firstParty & thirdParty nodes
Given that we are drawing on a canvas, a firstParty node is drawn as a circle and a thirdParty node as an equilateral triangle.
Given the centre of the circle is at <x, y>, r is the radius of the circumcircle and dr is the radius of the incircle.
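As an illustrative sketch (names are my own), the three vertices of a thirdParty triangle can be computed from the circumcircle radius r; for an equilateral triangle the incircle radius dr works out to r / 2:

```javascript
// Vertices of an equilateral triangle with circumradius r, centred
// at <x, y>, one vertex pointing up (canvas y grows downwards).
function triangleVertices(x, y, r) {
  return [0, 1, 2].map((i) => {
    const angle = -Math.PI / 2 + (i * 2 * Math.PI) / 3;
    return { x: x + r * Math.cos(angle), y: y + r * Math.sin(angle) };
  });
}
```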
zoom and drag
d3-zoom and d3-drag are used to achieve the zoom and drag behaviours respectively. It is quite complex when the two are combined. If you click and drag on the background, the view pans; if you click and drag on a circle, it moves.
d3-drag requires a dragSubject. I use the same getNodeAtCoordinates(x, y) function that shows the tooltips; the logic remains the same. This is how drag and zoom are combined for Lightbeam: if there is a node (the dragSubject), it is dragged; otherwise the view pans.
Here is the d3-zoom implementation.
The tricky part here is the need to distinguish between two coordinate spaces: the world coordinates used to position the nodes and links, and the pointer coordinates representing the mouse or touches. The drag behaviour doesn’t know the view is being transformed by the zoom behaviour, so we must convert between the two coordinate spaces.
This is where transform.invert or transform.apply come into play.
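A sketch of the conversion, assuming `transform` is the current d3-zoom transform: its invert() method maps pointer (screen) coordinates to world coordinates, and apply() maps world coordinates back to the screen.

```javascript
// Pointer (screen) space -> world space used by the force layout.
function pointerToWorld(transform, px, py) {
  const [wx, wy] = transform.invert([px, py]);
  return { x: wx, y: wy };
}

// World space -> pointer (screen) space.
function worldToPointer(transform, wx, wy) {
  const [px, py] = transform.apply([wx, wy]);
  return { x: px, y: py };
}
```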
I hope I have done justice to the math in this post!
The need to connect virtually and have video conferences and communication on the web has been around for a while. In the past, Flash was one of the popular ways to achieve this; the alternatives were plug-ins or an installable application on the PC. From a user’s perspective, all these methods required additional installations. From a developer’s perspective, they required studying a complex stack of protocols.
The birth of WebRTC
WebRTC technology was first developed by Global IP Solutions (GIPS), a company founded around 1999 in Sweden. In 2011 GIPS was acquired by Google, and the W3C started to work on a standard for WebRTC. Since then, Google and other major players in the web browser market, such as Mozilla and Opera, have shown great support for WebRTC.
A few lines of JS code, and no licensing, component integration, or deep knowledge of RTC required!
WebRTC – A Standard
WebRTC is a standard for real-time, plugin-free video, audio and data communication maintained by –
IETF – defines the formats and protocols used to communicate between browsers
W3C – defines the APIs that a Web application can use to control this communication
WebRTC is a standard that has different implementations, such as OpenWebRTC and webrtc.org. The initial version of OpenWebRTC was developed internally at Ericsson Research, while webrtc.org is maintained by the Google Chrome team.