SIP stands for Session Initiation Protocol and its main reason for existence is to allow people to make telephone calls over the internet. Making telephone calls over the internet is commonly referred to as VoIP (Voice over Internet Protocol). SIP is one way to make VoIP calls but there are others such as Skype.
So Skype and SIP are the same thing?
No, although at a technical level they are similar. Skype is a commercial company that has developed its own VoIP technology. The parts of Skype’s technology that handle making and receiving voice and video calls are roughly equivalent to SIP. However, there are technical and other differences between the two. Skype’s protocols are largely proprietary whereas SIP is an open standard. Skype’s design relies heavily on peer-to-peer connections between Skype users, whereas SIP is typically deployed in a centralised manner where calls are between an end user and their SIP provider’s server. There are more differences, but that gives you a bit of an idea. Incidentally, there are services that allow calls between SIP and Skype; see the SIPSorcery recipe “Skype calls with ippi” as one example.
OK so SIP is not Skype but they do sort of the same thing. But what about Google Voice? That gives me free calls from a web browser so is that using SIP or Skype?
None of the above, at least not at the edges of its network. Google Voice uses yet another protocol called XMPP, which stands for Extensible Messaging and Presence Protocol. XMPP is fairly similar to SIP but its roots are in instant messaging: it evolved from a protocol called Jabber, which was originally used to run an instant message chat network. Google Voice has the additional complication that to place a call you need to provide a number for Google to call you on, and after you answer they place the call to your requested destination. It is actually possible to dispense with that callback mechanism and place calls directly with Google Voice’s XMPP gateway, but it’s not officially supported. There are products on the market, such as the Obihai adapters, that currently work with the gateway, but Google could update it or shut it down at any point.
So Skype and Google Voice are VoIP but they are not SIP. Where does that leave SIP?
It leaves SIP as the main choice used by just about every other VoIP provider (there are other VoIP protocols such as IAX and H.323, but they are either very specialised in the case of IAX or declining in use in the case of H.323). There are thousands of SIP-based VoIP providers spread around the world, ranging from one-man shops up to big corporations such as Vonage. SIP is the protocol of choice amongst all these providers because of the wide support it enjoys from the manufacturers of VoIP phones and adapters and because of the wide availability of SIP server software.
Got it. So how does SIP work?
The two most important functions in SIP are making calls and registering. To make a call, a SIP device sends a special type of request, called an invite request, to another SIP device. If that device answers the call then typically audio or video will flow between them, in the same way as someone ringing and having their call answered with a traditional telephone. Invite requests are very special in SIP and there are a lot of extra mechanisms specifically for them to do things like ensure reliable transmission, allow progress indications, change the characteristics of the audio or video stream and much more.
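To give a feel for what an invite request looks like on the wire, here is a minimal sketch in Python that builds one as plain text. The function name build_invite, the addresses and the identifier values are all made up for illustration, and a real invite would normally also carry a body describing the audio or video on offer, which is left out here.

```python
def build_invite(caller, callee, local_addr, call_id, branch, tag):
    """Return a bare-bones SIP INVITE request as text (no body).

    All argument values are supplied by the caller of this sketch;
    nothing here is taken from a real SIP library.
    """
    user = caller.split("@")[0]
    lines = [
        f"INVITE sip:{callee} SIP/2.0",                    # request line
        f"Via: SIP/2.0/UDP {local_addr};branch={branch}",  # where responses go
        "Max-Forwards: 70",                                # hop limit
        f"From: <sip:{caller}>;tag={tag}",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",                             # identifies the call
        "CSeq: 1 INVITE",                                  # sequence number + method
        f"Contact: <sip:{user}@{local_addr}>",             # direct address of caller
        "Content-Length: 0",                               # no body in this sketch
    ]
    # SIP headers are separated by CRLF and end with a blank line.
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical example values.
invite = build_invite(
    caller="alice@example.com",
    callee="bob@example.org",
    local_addr="192.0.2.10:5060",
    call_id="a84b4c76e66710@192.0.2.10",
    branch="z9hG4bK776asdhds",
    tag="1928301774",
)
print(invite)
```

The request is just text, much like an HTTP request, which is one reason SIP traffic is fairly easy to read in a packet capture.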
The second important type of request is a register request. To get an idea of the relative importance of requests: if invites and their related requests rated 10, registers would rate 3 and all other requests 1 or less. Back to register requests. They are used to tell a SIP server that a particular SIP device exists and that it’s available to receive calls. Unlike invite request processing, register request processing is simple: the client periodically sends a register request to the server, which acknowledges it and stores the address where the client device can be reached in case it needs to forward a call to it. Most SIP client devices provide some kind of status indication to let users know whether or not they are registered. A flashing orange light on a VoIP adapter will typically be because a SIP register request was not able to reach the SIP server or because the server was not able to process it.
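The server side of registration can be sketched in a few lines of Python. This is only a toy illustration of the idea of storing where a device can be reached and forgetting it if the device stops re-registering; the Registrar class, its method names and the addresses are invented for the example, and a real SIP server does a good deal more (authentication for a start).

```python
class Registrar:
    """Toy in-memory store mapping a SIP address to where it can be reached."""

    def __init__(self):
        self._bindings = {}  # address -> (contact, expiry time in seconds)

    def register(self, address, contact, expires, now):
        """Record (or refresh) a binding that is valid for `expires` seconds."""
        self._bindings[address] = (contact, now + expires)

    def lookup(self, address, now):
        """Return the contact for an address, or None if unknown or expired."""
        entry = self._bindings.get(address)
        if entry is None:
            return None
        contact, expiry = entry
        if now >= expiry:
            # The device did not re-register in time, so forget it.
            del self._bindings[address]
            return None
        return contact


# Hypothetical example: times are plain seconds to keep the sketch simple.
registrar = Registrar()
registrar.register("sip:alice@example.com", "sip:alice@192.0.2.10:5060",
                   expires=3600, now=0)

still_there = registrar.lookup("sip:alice@example.com", now=1800)
gone = registrar.lookup("sip:alice@example.com", now=4000)
print(still_there, gone)
```

The expiry is why the client has to send register requests periodically: a binding that is never refreshed eventually disappears and calls can no longer be forwarded to the device.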
Got that, invite requests are the bread and butter of SIP, register requests are important and no other request types matter much.
Yes, but with one clarification. There are three other types of SIP requests (ack, bye and cancel) that are needed by invite requests and are therefore also essential. The combination of the invite request, the other three types and the SIP response messages is referred to as an invite transaction, and to be correct it is the invite transaction, rather than just the invite request, that should be considered the bread and butter of SIP.
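One way to picture how invite, ack, bye and cancel fit together is as a small state machine for a single call. The Python sketch below is a deliberately simplified illustration: the state names and the run_call function are made up, only the "180 Ringing" and "200 OK" responses are modelled, and the real SIP specification defines more states, timers and responses than this.

```python
# (current state, message seen) -> next state
TRANSITIONS = {
    ("idle", "INVITE"): "calling",
    ("calling", "180 Ringing"): "ringing",
    ("calling", "200 OK"): "answered",
    ("ringing", "200 OK"): "answered",
    ("calling", "CANCEL"): "cancelled",   # caller gives up before an answer
    ("ringing", "CANCEL"): "cancelled",
    ("answered", "ACK"): "in_call",       # caller confirms the answer
    ("in_call", "BYE"): "ended",          # either side hangs up
}


def run_call(messages):
    """Walk a list of SIP messages and return the final call state."""
    state = "idle"
    for message in messages:
        key = (state, message)
        if key not in TRANSITIONS:
            raise ValueError(f"{message!r} not expected in state {state!r}")
        state = TRANSITIONS[key]
    return state


answered_call = run_call(["INVITE", "180 Ringing", "200 OK", "ACK", "BYE"])
abandoned_call = run_call(["INVITE", "180 Ringing", "CANCEL"])
print(answered_call, abandoned_call)
```

Note that cancel only makes sense before the call is answered and bye only after it, which is why both are needed.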
What about these other terms RTP and codecs? Are they part of SIP?
No, but they are closely related. RTP stands for Real-time Transport Protocol and a SIP call will typically result in RTP packets being exchanged to do the interesting part of the call, which is the audio or video. In fact SIP’s job is largely done as soon as the audio and video start flowing. At that point the SIP invite request has been accepted and acknowledged and the RTP part of the call has taken over. SIP will be called on again when the call is hung up, but during the time the call is active there will typically be no SIP packets involved at all. It’s worth noting that the XMPP protocol mentioned earlier also uses RTP in the same way SIP does.
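Unlike SIP, RTP is a binary protocol. Each packet starts with a 12-byte fixed header followed by the audio or video data. As a rough sketch, this is how that header could be unpacked in Python; the function name parse_rtp_header and the sample packet are invented for the example, and optional header extensions are ignored.

```python
import struct


def parse_rtp_header(packet):
    """Decode the 12-byte fixed RTP header at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two single bytes, a 16-bit and two 32-bit integers.
    b0, b1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,            # top two bits, 2 for current RTP
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,     # which codec the payload uses
        "sequence": sequence,          # increases by one per packet
        "timestamp": timestamp,        # position of the payload in the stream
        "ssrc": ssrc,                  # identifies the sender of the stream
    }


# A made-up packet: version 2, payload type 0, followed by 160 bytes of audio.
sample_packet = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x12345678) + bytes(160)
header = parse_rtp_header(sample_packet)
print(header)
```

The sequence number and timestamp are what let the receiving end notice lost or late packets and play the audio back at the right pace.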
Codec stands for coder-decoder. It’s not actually a protocol like SIP or RTP; instead it is a general term for the algorithm that converts an analogue audio or video stream from a microphone or webcam into bits and bytes that are suitable for transmission over the internet and then converts them back at the other end. RTP packets are what is used to transport the bits and bytes that the codec algorithm spits out.
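To make the coder-decoder idea concrete, here is a small Python sketch of a codec that squeezes each audio sample into a single byte and expands it again at the other end. It uses the mu-law companding formula, which is the idea behind one of the common telephony codecs, but it is only an approximation written for illustration and is not the bit-exact algorithm any real codec uses. The function names encode and decode are made up.

```python
import math

MU = 255  # companding constant used by mu-law


def encode(sample):
    """Compress one audio sample in the range -1.0..1.0 into a byte (0..255)."""
    sample = max(-1.0, min(1.0, sample))
    # Logarithmic compression: quiet sounds get more of the available codes.
    compressed = math.copysign(
        math.log1p(MU * abs(sample)) / math.log1p(MU), sample
    )
    return round((compressed + 1) / 2 * 255)


def decode(byte):
    """Expand a byte produced by encode() back into an approximate sample."""
    compressed = byte / 255 * 2 - 1
    return math.copysign(
        math.expm1(abs(compressed) * math.log1p(MU)) / MU, compressed
    )


original = [-1.0, -0.5, -0.01, 0.0, 0.01, 0.5, 1.0]
encoded = [encode(s) for s in original]
decoded = [decode(b) for b in encoded]
print(encoded)
print(decoded)
```

What comes out of decode is close to the original but not identical. That small loss is the price paid for making the audio compact enough to send.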
So, putting it all together: SIP is responsible for the setting up, or the signalling portion, of a VoIP call. RTP is responsible for carrying the audio or video once SIP has successfully done its job and can tell the RTP where it needs to be sent to and received from. The codecs at either end of the call are responsible for making sense of the audio or video data that comes out of the RTP packets so that it can be heard on a speaker or displayed as a video, and also for the reverse.
Wow, that seems pretty straightforward. I thought there was much more to it than that!
Unfortunately there is. In fact, last time I counted there were over 50 different standards that could now be considered to be part of the “SIP standard”. Some of the additional standards add useful features like call transfers or message waiting indications for voicemail. Some fix oversights made in the original SIP standard, such as no support for NAT, which is a pretty incredible omission given that very few people have the luxury of a dedicated IP address on the internet. In many ways it’s actually amazing that SIP works as well as it does.
Hopefully this guide has left you a little bit wiser about SIP and, you never know, maybe it will help you survive without losing too many hairs the next time your SIP adapter goes belly up and refuses to play ball.