Last September we presented the technological aspect of the secure calling project at LinuxCon 2009. This was an important milestone in showing how the project will let anyone create and deploy a network-scalable, secure VoIP/collaboration solution that enables privacy without the need for a central service provider or proprietary software. Our overall vision is to facilitate two kinds of solutions: those that are privately built, such as for organizations that wish to have secure communication as a foundation, and especially those that can be autonomously assembled over the public Internet as a full public alternative to Skype. The latter would use only free software, depend purely on existing DNS for user lookup rather than on a service provider, and eliminate the use of secret-source clients, which can of course be compromised.
Background of architecture:
A SIP user agent (UA) is a front-end application that supports a standard set of protocols for registering with a directory service (a SIP registrar) and a routing server in order to establish calls by SIP URI. Some user agents can also connect directly if you know each party's IP address, though some will not allow that, because in the SIP standard a UA is supposed to accept only calls received at its published "contact" URI that was sent to a registrar, and not calls from any arbitrary client that reaches it directly (such as by IP address) without looking that URI up first.
Some use this behavior as a security measure: the UA generates a UUID or some other kind of token for the contact URI it publishes with a registrar, so that unless a call is resolved through that registrar there is no way to know directly what URI the agent will respond to. Most UAs use it as a means to distinguish which "identity" a call is being received as, since a UA can register itself with multiple registrars, which may represent different service providers, and each registration would have a different and unique contact URI.
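The contact URI idea can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not code from any particular UA: the `UserAgent` class, its method names, and the example addresses are all hypothetical, and no real SIP messages are sent.

```python
import uuid


def make_contact_uri(host, port=5060):
    """Build a contact URI whose user part is a random, hard-to-guess token."""
    token = uuid.uuid4().hex
    return f"sip:{token}@{host}:{port}"


class UserAgent:
    """Toy model of a UA that only answers calls sent to a published contact."""

    def __init__(self, host):
        self.host = host
        self.contacts = {}  # contact URI -> identity (registrar account)

    def register(self, identity):
        # A fresh contact per registration; a real UA would send this value
        # to the registrar in a REGISTER request.
        contact = make_contact_uri(self.host)
        self.contacts[contact] = identity
        return contact

    def accept(self, request_uri):
        # Return the identity the call is addressed to, or None to refuse
        # a call that did not arrive at a published contact URI.
        return self.contacts.get(request_uri)


ua = UserAgent("192.0.2.10")
work = ua.register("sip:alice@provider-a.example")
home = ua.register("sip:alice@provider-b.example")
print(ua.accept(work))              # the identity registered with provider A
print(ua.accept("sip:192.0.2.10"))  # None: direct call by bare IP address
```

The same lookup serves both purposes mentioned above: an unknown request URI is refused, and a known one tells the UA which registration the call belongs to.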
Many VoIP providers offer themselves as a "backend" service for SIP. This means your UA is tethered to that provider, and your call peering goes through them. The result looks like a standard telephone service simply conducted over TCP/IP rather than something new. It is also very convenient for regulatory and intercept regimes, since all call control and routing happens at the provider's end.
One can run a local Asterisk server as a backend SIP registrar and routing service, but it (like Bayonne) makes several assumptions. First, the call must connect to the server before the destination is even determined. This means all audio is established with the server first, and then relayed by the server to the final destination, converted as necessary. In one sense this is convenient, but since the audio session is established with and must be decoded by the server first, it obviously cannot pass encrypted audio end-to-end. It also means that the server has to have every codec that will be used, including proprietary or patent-encumbered ones if calls using them are to be supported. It means call capacity is compute-bound, and the extra hop induces latency. Finally, in the case of Asterisk, it was never designed for arbitrary URI routing, but rather for resolving things that are purely telephone numbers in form.
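The codec constraint of the relay model can be illustrated with a small comparison. This is a sketch under simplifying assumptions: codecs are reduced to plain names, the function names are made up for illustration, and real negotiation happens through SDP offers and answers rather than set arithmetic.

```python
def relay_can_bridge(caller, callee, server):
    """Relay model: the server terminates both call legs, so it must share
    at least one codec with the caller and one with the callee."""
    server = set(server)
    return bool(set(caller) & server) and bool(set(callee) & server)


def direct_can_connect(caller, callee):
    """Peer-to-peer model: only the two endpoints need a codec in common."""
    return bool(set(caller) & set(callee))


server_codecs = {"PCMU", "GSM"}  # what the relay happens to have installed

# The relay can convert between endpoints that share nothing with each other...
print(relay_can_bridge({"PCMU"}, {"GSM"}, server_codecs))
# ...but it blocks two endpoints that agree on a codec it does not have.
print(relay_can_bridge({"G729"}, {"G729"}, server_codecs))
print(direct_can_connect({"G729"}, {"G729"}))
```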
Skype is in effect a kind of user agent that integrates code for specific routing and network connection logic, but it also depends on the Skype backend to find users. It is of course proprietary, and the protocols it uses are undocumented and proprietary as well.
SIP Witch operates by keeping the network routing layer separate from the user agent rather than merging them as the Skype application does, hence any standards-compliant SIP client can be used with it. It also peers calls by URI using DNS lookup, without the need for a central directory service. It performs destination routing: the final destination is determined first, and the calling user agent is then directed to connect directly to the final destination's IP address, in contrast to the Asterisk/Bayonne model where the user agent is directed to connect to the server. This means all media connections are established peer-to-peer, which can support an end-to-end encrypted media channel such as one set up with ZRTP. It also means all codecs are negotiated between the endpoints, so conducting calls does not require having patent-licensed codecs, though the UAs may have and certainly use them if they choose. That depends on the user's decision and circumstances of course, but at least it is not a burden forced on the software used for conveyance as well.
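To make "peering by URI using DNS lookup" a little more concrete, the sketch below derives the DNS names a resolver might query for a SIP URI. It assumes the conventional SRV naming from RFC 3263 (`_sip._udp`, `_sip._tcp`, `_sips._tcp`); whether SIP Witch queries exactly these records is an assumption here, and the helper names are hypothetical. No network lookup is performed, and IPv6 literals are not handled.

```python
def parse_sip_uri(uri):
    """Split a SIP URI into (scheme, user, host, port); port may be None."""
    scheme, _, rest = uri.partition(":")
    if scheme not in ("sip", "sips") or not rest:
        raise ValueError(f"not a SIP URI: {uri!r}")
    rest = rest.split(";", 1)[0]            # drop URI parameters
    user, _, hostport = rest.rpartition("@")
    host, _, port = hostport.partition(":")
    return scheme, user or None, host, int(port) if port else None


def srv_query_names(uri):
    """List the SRV names to try for the domain part of a SIP URI."""
    scheme, _user, host, port = parse_sip_uri(uri)
    if port is not None:
        return []                           # explicit port: plain address lookup
    if scheme == "sips":
        return [f"_sips._tcp.{host}"]
    return [f"_sip._udp.{host}", f"_sip._tcp.{host}"]


print(srv_query_names("sip:alice@example.org"))
# ['_sip._udp.example.org', '_sip._tcp.example.org']
```

The point of the sketch is that the domain in the callee's URI is enough to find where to send the call, so no shared directory operated by a provider is needed.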