Edge-Native Architecture

SIP.IO is built on one architectural bet: keep the SIP edge thin and replaceable, and put all the logic in a globally-distributed brain. This page explains that split, why it matters, and how a call flows through it.

THE BRAIN: edge runtime + stateful edge objects
routes: /auth · /route · /flow · /presence · /agent · /cac
state: the edge SQL database (config) · the key-value store · object storage (media + CDR)
PresenceDO · CallSessionDO · BalanceDO
▲ decision (node → edge runtime, direct HTTPS)
▼ dispatch (edge runtime → node-agent, over a secure tunnel)
THE EDGE: SIP node (horizontally scalable, replaceable)
the SIP signaling layer: SIP signaling + registrar
the media relay: kernel media relay
the media engine: IVR / queues / voicemail / conference
node-agent : receives pushed control commands
  • The brain is an edge runtime plus a set of stateful edge objects. It owns authentication, routing, the call-flow engine, presence/ACD, and concurrency control. It is the single source of truth for decisions.
  • The edge is a SIP node: the SIP signaling layer for signaling and registration, the media relay for kernel-level media forwarding, the media engine for media applications, and a small node-agent that receives pushed commands. The edge carries packets and executes; it does not decide.

The dividing line: “does the platform answer the call?”

The single question that determines how a call is handled:

Does the platform need to answer the call itself, to play a prompt, collect digits, queue, record, or mix audio?

  • No → the call is proxied. The SIP signaling layer routes it; the media relay relays the media in the kernel (an iptables/nftables packet path that scales to tens of thousands of streams per box). The media engine is never touched. This is the bulk of traffic: user↔PSTN, PSTN→DID→user, extension↔extension.
  • Yes → the call is answered by the media engine as a media application: IVR, queue, voicemail, conference. When a queued caller is then bridged to an agent, the media engine performs a media release (re-INVITE) so the talk path reverts to the media relay’s kernel relay, and the media engine steps back out of the audio.
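The dividing line can be sketched as a small classification function. This is an illustration only: the names `Handling`, `RouteRequest`, `ANSWER_FEATURES`, and `classify` are hypothetical and not part of the SIP.IO API, and the feature labels simply mirror the list in the question above.

```python
from dataclasses import dataclass
from enum import Enum


class Handling(Enum):
    PROXY = "proxy"          # signaling layer routes, media relay carries audio
    MEDIA_APP = "media_app"  # media engine answers the call


# Hypothetical labels for work that requires the platform to answer the call.
ANSWER_FEATURES = {
    "prompt", "collect_digits", "queue", "record", "conference", "voicemail",
}


@dataclass
class RouteRequest:
    target: str
    # What the matched route needs to do; empty for a plain call.
    features: frozenset = frozenset()


def classify(req: RouteRequest) -> Handling:
    """Return MEDIA_APP only when the platform must answer the call itself."""
    if req.features & ANSWER_FEATURES:
        return Handling.MEDIA_APP
    return Handling.PROXY
```

Under these assumptions, `classify(RouteRequest("ext-101"))` yields `Handling.PROXY`, while a route carrying the `queue` feature yields `Handling.MEDIA_APP`.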

The consequence is a very different cost curve: media engine load scales with active media-application time, not with total call minutes. A million minutes of plain calls cost the media engine almost nothing; only the IVR/queue/voicemail seconds land on it.
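To make the cost curve concrete, the following sketch totals media engine time for a handful of calls. The call records and their durations are invented for illustration; they are not measurements from the platform.

```python
def total_call_seconds(calls):
    """Total duration across all calls, whether proxied or answered."""
    return sum(c["total_seconds"] for c in calls)


def media_engine_seconds(calls):
    """Only the seconds spent inside a media application load the media engine."""
    return sum(c["media_app_seconds"] for c in calls)


# Illustrative, made-up call mix.
calls = [
    {"kind": "ext-to-ext", "total_seconds": 600, "media_app_seconds": 0},
    {"kind": "pstn-outbound", "total_seconds": 300, "media_app_seconds": 0},
    # 90 s of IVR + queue, then media release to the relay for the talk time.
    {"kind": "ivr-then-queue", "total_seconds": 480, "media_app_seconds": 90},
]

share = media_engine_seconds(calls) / total_call_seconds(calls)
print(f"media engine share of call time: {share:.1%}")
```

In this made-up mix, the media engine sees 90 of 1,380 call seconds; the remainder stays on the kernel relay path.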

Putting the brain on the edge runtime + stateful edge objects buys several properties that are hard to get from a regional application server:

  • Global, low-latency decisions. The edge runtime runs close to the node handling the call, so /auth, /route, and /flow round-trips stay short wherever your callers are.
  • Single-threaded correctness where it counts. A stateful edge object processes one message at a time. That makes the ACD reservation fence, CAC admission, and per-call flow state atomic without locks: there is no double-dispatch and no torn counter.
  • Ship without restarting the edge. Because logic lives in the edge runtime, most product changes deploy by publishing a new edge runtime build, with zero SBC restarts, no dialplan reloads, and no media-server bounce.
  • Elastic, replaceable edge. Nodes are stateless-ish carriers of packets. Add capacity by adding nodes; lose one and the brain re-homes work. The edge is a commodity.
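The one-message-at-a-time property can be illustrated with a toy mailbox object. This is a sketch of the general pattern, not the platform's implementation: `PresenceObject`, its message shapes, and the `admit`/`reserve` operations are hypothetical names chosen for the example.

```python
import asyncio


class PresenceObject:
    """Toy stand-in for a per-account stateful object (hypothetical API)."""

    def __init__(self, max_calls: int, agents: list):
        self.max_calls = max_calls
        self.active_calls = 0
        self.free_agents = list(agents)
        self.mailbox: asyncio.Queue = asyncio.Queue()

    async def run(self):
        # Single consumer: each message is fully handled before the next starts,
        # so the counter and the agent list need no locks.
        while True:
            msg, reply = await self.mailbox.get()
            if msg is None:  # shutdown sentinel
                break
            reply.set_result(self._handle(msg))

    def _handle(self, msg: dict) -> dict:
        if msg["op"] == "admit":  # CAC admission
            if self.active_calls >= self.max_calls:
                return {"admitted": False}
            self.active_calls += 1
            return {"admitted": True}
        if msg["op"] == "reserve":  # ACD reservation
            if not self.free_agents:
                return {"agent": None}
            return {"agent": self.free_agents.pop(0)}
        raise ValueError(f"unknown op: {msg['op']}")

    async def send(self, msg: dict) -> dict:
        reply = asyncio.get_running_loop().create_future()
        await self.mailbox.put((msg, reply))
        return await reply


async def demo():
    obj = PresenceObject(max_calls=2, agents=["alice"])
    worker = asyncio.create_task(obj.run())
    # Many concurrent senders, one sequential handler.
    admits = await asyncio.gather(*(obj.send({"op": "admit"}) for _ in range(5)))
    reserves = await asyncio.gather(*(obj.send({"op": "reserve"}) for _ in range(3)))
    await obj.mailbox.put((None, None))
    await worker
    return admits, reserves
```

Running `asyncio.run(demo())` with five concurrent admission requests against a limit of two admits exactly two, and three concurrent reservation requests for a single free agent hand that agent out exactly once.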

Five canonical paths, each a short conversation between the edge and the brain:

  1. Register. A phone sends REGISTER. The SIP signaling layer POSTs the digest params to the edge runtime's /auth route, which validates them against the device’s ha1 and returns the account context. The SIP signaling layer then stores the contact in the location store.
  2. Internal call (ext↔ext). The SIP signaling layer asks /route; the brain resolves the target extension, checks concurrency, and returns a proxy directive. Media goes straight through the media relay. The media engine is untouched.
  3. Inbound DID. The SIP signaling layer asks /route; the brain matches the DID, runs CAC, applies any attached business hours, and returns the route target, often a flow directive that sends the call to the media engine.
  4. Call flow. The media engine drives the flow through the /flow command loop (a poll-based loop): the brain advances a per-call CallSessionDO one node at a time and returns the next command. Pure-logic nodes (conditions, time conditions, HTTP) resolve inside the session object with no extra round-trip.
  5. Queue dispatch. When a caller enqueues, the per-account PresenceDO runs the ACD. The moment an agent frees, the brain pushes a dispatch to the caller’s node via the node-agent over a secure tunnel. This path is event-driven, not polled.
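For the Register path, the digest check against a stored ha1 can be sketched as follows. The hashing follows standard HTTP Digest authentication (RFC 2617), which SIP reuses; the function names and the `params` field names are assumptions for this example, since the actual /auth payload is not specified here.

```python
import hashlib
import hmac


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def compute_ha1(username: str, realm: str, password: str) -> str:
    """HA1 per RFC 2617: MD5(username:realm:password). Stored instead of the password."""
    return md5_hex(f"{username}:{realm}:{password}")


def expected_response(ha1, method, uri, nonce, qop=None, nc=None, cnonce=None) -> str:
    """Digest response the client should have sent for this request."""
    ha2 = md5_hex(f"{method}:{uri}")
    if qop == "auth":
        return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    # Legacy form without qop.
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def validate_digest(ha1: str, params: dict) -> bool:
    """Compare the client's response to the expected one in constant time."""
    expected = expected_response(
        ha1,
        params["method"],
        params["uri"],
        params["nonce"],
        params.get("qop"),
        params.get("nc"),
        params.get("cnonce"),
    )
    return hmac.compare_digest(expected, params["response"].lower())
```

For a REGISTER, `method` would be `"REGISTER"` and `uri` the digest URI from the Authorization header. A production check would also need to verify nonce freshness and replay protection, which this sketch omits.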

The architecture is opinionated about where each kind of state belongs:

| State | Store | Notes |
| --- | --- | --- |
| Configuration | the edge SQL database (SQLite) | Accounts, numbers, queues, schedules, trunks, policies: the relational source of truth. |
| Live call/agent state | stateful edge objects | PresenceDO (per account: presence, ACD, CAC, dashboard), CallSessionDO (per call: flow interpreter), BalanceDO (per account: retail wallet). |
| SIP registrations | the registrar's location store | Raw SIP contacts; the registrar’s job. |
| Steering & secrets | the key-value store | STEERING (per-call routing keys) and SECRETS (encrypted trunk passwords, tokens, PINs). |
| Media | object storage | System prompts, MOH, voicemail recordings, TTS cache. |
| CDR & event trace | the Apache Iceberg data lake | High-volume, columnar; see Observability. |

The principle: the location store holds raw SIP contacts; stateful edge objects hold derived call and agent state; the edge SQL database holds configuration. Each store does the one job it’s best at.

A single brain is the source of truth, but the design extends to multiple regions:

  • The control plane is pinned to one region per account (its agent-densest region), giving one authoritative source for reservations and counters.
  • Read-heavy lookups are served from regional replicas; the reservation compare-and-set always happens at the source of truth, and that’s the correctness gate.
  • Cross-region reachability uses a location service with SRV regional failover. Far-region setup costs a few hundred milliseconds once (the enqueue/reserve/confirm round-trips), never the audio path.
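The reservation compare-and-set can be sketched with a version counter: a region may read a possibly stale view from a replica, but the write only succeeds at the source of truth if the version it read is still current. `AgentSlot`, `SourceOfTruth`, and the version scheme are hypothetical names for this illustration, not the platform's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentSlot:
    state: str = "free"            # "free" or "reserved"
    version: int = 0               # bumped on every successful write
    call_id: Optional[str] = None  # call holding the reservation, if any


class SourceOfTruth:
    """Authoritative store for one account's agent reservations (toy model)."""

    def __init__(self):
        self.slots = {}

    def read(self, agent_id: str):
        """What a regional replica might serve: state plus the version seen."""
        slot = self.slots[agent_id]
        return slot.state, slot.version

    def reserve(self, agent_id: str, call_id: str, expected_version: int) -> bool:
        """Compare-and-set: succeed only if nothing changed since the read."""
        slot = self.slots[agent_id]
        if slot.version != expected_version or slot.state != "free":
            return False
        slot.state = "reserved"
        slot.call_id = call_id
        slot.version += 1
        return True


sot = SourceOfTruth()
sot.slots["alice"] = AgentSlot()

# Two regions read the same snapshot, then race to reserve the agent.
_, seen_by_region_a = sot.read("alice")
_, seen_by_region_b = sot.read("alice")
first = sot.reserve("alice", "call-1", seen_by_region_a)
second = sot.reserve("alice", "call-2", seen_by_region_b)
```

Here `first` succeeds and `second` is rejected because its version is stale, so the losing region has to re-read and pick another agent.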

Continue to The Control Plane for what the edge runtime and stateful edge objects actually do.