The Media Edge
The edge is a SIP node: the part of SIP.IO that actually carries signaling and audio. It’s designed to be thin and replaceable, so it asks the brain for every decision and executes the commands the brain pushes to it. Each node runs three open-source workhorses plus a small control sidecar.
SIP signaling & registration
The SIP signaling layer is the SIP front door: a high-performance SIP proxy and registrar. On the SIP.IO edge it:
- Terminates REGISTER and stores contacts in the location store (after the edge runtime /auth endpoint validates the digest).
- Receives INVITEs and asks the edge runtime /route endpoint what to do, then executes the returned directive (proxy to a registered device, hand off to the media engine for a flow, or send outbound to a trunk/carrier).
- Handles NAT traversal and engages the media relay for the media path.
- Feeds dialog events (call start/end) and register/qualify events to the edge runtime /presence/event endpoint, which keeps presence and CAC counters honest.
- Runs sipguard, the abuse defense layer (see Security).
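The three directive kinds named above can be modeled as a discriminated union. This is a minimal sketch only: the type name `RouteDirective`, its field names, and the `describeDirective` helper are hypothetical, since the actual /route response format is not specified here.

```typescript
// Hypothetical directive shapes: the actual /route response format is not
// specified in this document. Only the three outcomes come from the text.
type RouteDirective =
  | { action: "proxy"; contact: string }    // ring a registered device
  | { action: "flow"; flowId: string }      // hand off to the media engine
  | { action: "outbound"; target: string }; // send to a trunk or carrier

// Turns a directive into a description of what the signaling layer would do.
function describeDirective(d: RouteDirective): string {
  switch (d.action) {
    case "proxy":
      return `relay INVITE to ${d.contact}`;
    case "flow":
      return `hand call to media engine for flow ${d.flowId}`;
    case "outbound":
      return `send INVITE to ${d.target}`;
    default: {
      // Compile-time exhaustiveness check: adding a new action without
      // handling it here becomes a type error.
      const unreachable: never = d;
      throw new Error(`unknown directive: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Keeping the directive a closed set of simple actions fits the thin-edge design: the proxy only needs to know how to execute each action, not why it was chosen.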
The SIP signaling layer listens on UDP, TCP, TLS, and WSS (WSS being the WebRTC signaling transport). It is a pure proxy/registrar: no back-to-back user agent (B2BUA). All the “smarts” live in the brain.
Media relay
The media relay is the media plane for proxied calls. Its superpower is the kernel packet path: once a call is set up, RTP is relayed in-kernel (via iptables/nftables), which scales to tens of thousands of concurrent streams per box at minimal CPU. It also handles:
- NAT traversal for media,
- SRTP and WebRTC bridging (DTLS-SRTP, ICE, rtcp-mux): the media relay is the public media anchor, so no separate TURN server is needed,
- transcoding and recording when required (these run in userspace and are the real sizing constraint, not plain relay).
When a queued caller is bridged to an agent, the media engine releases the media and the talk path collapses back to the media relay's in-kernel path.
Media engine: IVR & queues
The media engine is the media application engine. It is inserted only when the platform must answer: play prompts, run an IVR, collect digits, hold a queue caller, take a voicemail, or host a conference. It is driven by the brain: the per-call CallSessionDO feeds it the next command through the /flow loop (a poll-based loop).
Crucially, the media engine on the edge holds no registration, routing, ACD, or account state; all of that is the brain’s. It is a stateless, on-demand media worker. Its load scales with active media-application time, not total call minutes.
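A stateless command loop of this kind might look like the sketch below. Everything named here is an assumption for illustration: the command set (`play`, `collect`, `bridge`, `hangup`), the `FlowReport` shape, and the injected `brain` callback, which stands in for the HTTP round trip to /flow so the example stays self-contained. A real implementation would be asynchronous.

```typescript
// Hypothetical command and report shapes; the real /flow payloads are not
// documented here.
type FlowCommand =
  | { op: "play"; prompt: string }
  | { op: "collect"; maxDigits: number }
  | { op: "bridge"; target: string }
  | { op: "hangup" };

interface FlowReport {
  callId: string;
  lastOp: FlowCommand["op"] | null; // null on the first request for a call
  digits?: string | undefined;      // digits gathered by the last command, if any
}

type Brain = (report: FlowReport) => FlowCommand;         // stand-in for /flow
type Executor = (cmd: FlowCommand) => string | undefined; // runs one command

// The worker keeps no call logic of its own: it reports what just happened,
// asks for the next command, and executes it until told to leave the call.
function runFlow(callId: string, brain: Brain, execute: Executor, maxSteps = 50): string[] {
  const log: string[] = [];
  let report: FlowReport = { callId, lastOp: null };
  for (let step = 0; step < maxSteps; step++) {
    const cmd = brain(report);
    log.push(cmd.op);
    if (cmd.op === "hangup" || cmd.op === "bridge") break; // media engine exits
    const digits = execute(cmd);
    report = { callId, lastOp: cmd.op, digits };
  }
  return log;
}
```

Because each request carries only the call ID and the outcome of the last command, any media worker could serve the call, which is the property the paragraph above relies on.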
node-agent: the control channel
The brain mostly responds to the edge, but some actions are pushed the other way: “an agent just freed, dispatch the waiting caller now,” or “announce position 3 to this caller.” The node-agent is a small sidecar on the node that exposes an /op endpoint. The edge runtime reaches it over a secure tunnel (with Access service-token auth plus a per-node agent token), looked up from a node registry in the edge SQL database.
Two ops drive the real-time experience:
- dispatch: push a queued caller to an agent the instant the agent becomes available (event-driven; no polling).
- position: push a position announcement when, and only when, a caller’s place in line actually changes.
This push channel is what makes hold music gapless: the brain only interrupts the audio when there’s a real, changed announcement.
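The "only when the place in line changes" rule can be sketched as a diff between two queue snapshots. The `PositionOp` shape and the `positionOps` function are hypothetical; the actual /op payload is not described here. As an added assumption, a caller who is new to the queue is treated as having a changed position.

```typescript
// Hypothetical payload for a position announcement pushed to the node-agent.
interface PositionOp {
  op: "position";
  callId: string;
  position: number; // 1-based place in line
}

// Compares the queue before and after an event (dispatch, abandon, new
// arrival) and emits an announcement only for callers whose place changed.
function positionOps(before: string[], after: string[]): PositionOp[] {
  const previous = new Map<string, number>();
  before.forEach((callId, i) => previous.set(callId, i + 1));

  const ops: PositionOp[] = [];
  after.forEach((callId, i) => {
    const position = i + 1;
    if (previous.get(callId) !== position) {
      ops.push({ op: "position", callId, position });
    }
  });
  return ops;
}
```

For example, if the queue is a, b, c and caller b abandons, only c moves (from 3 to 2), so only c's hold music is interrupted; a keeps listening without a gap.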
Outbound termination
For outbound calls, the brain selects a route (a customer trunk or the default wholesale carrier) and the SIP signaling layer sends the call there. The default termination is the wholesale carrier (our termination partner), where each SIP.IO customer is a verifiable subaccount, which is what enables proper STIR/SHAKEN attestation on outbound. See Outbound & Trunking.
Security at the edge
Each node runs sipguard, a defense-in-depth abuse layer with three ban tiers:
- nft: network-level packet-flood bans (auto-ban ~1 hour),
- pike: SIP-request flood detection (~5 minutes),
- ua: known-scanner User-Agent signatures.
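As a rough illustration of how tiered ban lifetimes could be evaluated, the sketch below uses the approximate durations listed above. The `Ban` record and `isBanActive` function are hypothetical and do not describe sipguard's internals. No lifetime is stated for the ua tier, so the sketch assumes it has no automatic expiry.

```typescript
type BanTier = "nft" | "pike" | "ua";

// Approximate lifetimes taken from the tier list. null means "no automatic
// expiry", which is an assumption for the ua tier (no duration is documented).
const BAN_TTL_MS: Record<BanTier, number | null> = {
  nft: 60 * 60 * 1000, // ~1 hour
  pike: 5 * 60 * 1000, // ~5 minutes
  ua: null,
};

// Hypothetical ban record; timestamps are milliseconds since the epoch.
interface Ban {
  ip: string;
  tier: BanTier;
  bannedAt: number;
}

// A ban is active until its tier's lifetime has elapsed.
function isBanActive(ban: Ban, now: number): boolean {
  const ttl = BAN_TTL_MS[ban.tier];
  return ttl === null || now - ban.bannedAt < ttl;
}
```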
Nodes report bans to the edge runtime every ~30s; operators can list and lift bans via the security endpoints. Sensitive control-plane endpoints are additionally IP-gated at the application layer (only node and operator IPs may reach /auth, /route, /flow, /presence, /agent, /cac, /security, /media, /calls).
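Application-layer IP gating of this kind can be sketched as a path-prefix check followed by an allowlist lookup. The prefix list comes from the paragraph above; the function names, the exact matching rule (whole path segments), and the way the allowlist is supplied are assumptions for illustration.

```typescript
// Control-plane path prefixes listed in the text.
const GATED_PREFIXES = [
  "/auth", "/route", "/flow", "/presence", "/agent",
  "/cac", "/security", "/media", "/calls",
];

// A path is gated if it equals a prefix or sits beneath it, so that
// "/presence/event" is gated but "/authx" is not (assumed matching rule).
function isGated(path: string): boolean {
  return GATED_PREFIXES.some((p) => path === p || path.startsWith(p + "/"));
}

// Gated paths are reachable only from known node or operator IPs;
// everything else passes through to the normal handlers.
function allowRequest(path: string, sourceIp: string, allowedIps: ReadonlySet<string>): boolean {
  return !isGated(path) || allowedIps.has(sourceIp);
}
```

This check sits on top of, not instead of, the token-based authentication described earlier: a request from an allowed IP still has to authenticate.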
Next: the Data Model.