VoIP Technology Fundamentals

Voice over IP (VoIP) digitizes voice audio and transmits it as IP packets rather than over traditional circuit-switched telephone networks. Voice is sampled (at 8 kHz for narrowband codecs such as G.711, or 16 kHz for wideband codecs such as G.722), encoded with a codec, packetized, transmitted over IP, reassembled, decoded, and played out at the receiving end. All of this happens in real time, and one-way end-to-end latency should stay below 150 ms (per ITU-T Recommendation G.114) to avoid perceptible echo and unnatural conversation rhythm.

VoIP codec selection affects both call quality and bandwidth requirements. G.711 (PCMU/PCMA) is companded PCM audio (mu-law or A-law) with no further compression, providing excellent narrowband quality at 64 kbps per direction. At the common 20 ms packetization interval this comes to 80 kbps including RTP/UDP/IP headers, or approximately 87 kbps on the wire once Ethernet framing is added. G.729 compresses audio to 8 kbps through predictive coding, providing acceptable quality at one-eighth the payload bitrate; because the per-packet headers stay the same size, the saving in total bandwidth is smaller than 8:1. Wideband codecs including G.722 (64 kbps, 50-7000 Hz range vs. 300-3400 Hz for narrowband) and Opus (variable bitrate, 6-510 kbps, sampling rates up to 48 kHz for fullband audio) provide significantly better voice quality, particularly for music and high-frequency speech sounds. Enterprise deployments should use wideband codecs for internal calls where bandwidth is not constrained, and G.711 or G.729 for external SIP trunk calls depending on carrier capabilities.
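The per-call bandwidth figures above follow from simple packet arithmetic. The sketch below is illustrative only: it assumes IPv4 without header compression, a 20 ms packetization interval by default, and untagged Ethernet framing counted as 18 bytes (header plus FCS, ignoring preamble and inter-frame gap). The function and constant names are made up for this example.

```python
# Per-direction bandwidth of one RTP voice stream.
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, no header compression
ETHERNET_OVERHEAD_BYTES = 18           # 14-byte header + 4-byte FCS, no VLAN tag


def voip_bandwidth_kbps(codec_kbps, ptime_ms=20, l2_overhead_bytes=0):
    """Return the bandwidth in kbps for one direction of a call.

    codec_kbps        -- codec payload bitrate (64 for G.711, 8 for G.729)
    ptime_ms          -- milliseconds of audio carried in each packet
    l2_overhead_bytes -- optional layer-2 framing added to every packet
    """
    payload_bytes = codec_kbps * ptime_ms / 8      # kbps * ms = bits per packet
    packets_per_second = 1000 / ptime_ms
    packet_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES + l2_overhead_bytes
    return packet_bytes * 8 * packets_per_second / 1000


g711_ip = voip_bandwidth_kbps(64)                                # IP layer only
g711_eth = voip_bandwidth_kbps(64, l2_overhead_bytes=ETHERNET_OVERHEAD_BYTES)
g729_ip = voip_bandwidth_kbps(8)
```

With these assumptions G.711 works out to 80 kbps at the IP layer and 87.2 kbps on Ethernet, while G.729 needs 24 kbps at the IP layer, which shows how much of a low-bitrate stream is header overhead.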

SIP Trunking

SIP (Session Initiation Protocol) trunking replaces traditional PRI (T1/E1) or analog telephone lines with IP-based connections to a carrier for external PSTN calling. SIP trunks carry voice, video, and messaging over IP to the carrier's SBC (Session Border Controller), which interconnects with the PSTN. Advantages over PRI include lower cost, elastic capacity (burst calling without purchasing additional trunk capacity), support for geographic number portability, and a single provider for local, long-distance, and international calling.

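SIP trunk sizing uses Erlang traffic engineering. An Erlang is a dimensionless unit of traffic intensity: one Erlang corresponds to one channel occupied continuously, for example for a full hour during the busy hour. For most businesses, 1 concurrent call per 3-5 users is the design assumption during the busy hour. As a heavier-usage example, a 100-person office with an average call duration of 3 minutes and 10 calls per person per hour generates 30 call-minutes per person per hour (0.5 Erlangs per person), or approximately 50 Erlangs of traffic in total. Using Erlang B tables at 1% blocking probability, this translates to approximately 60-70 SIP channels (about 64). That is well above the 1-per-3-to-5-users rule of thumb, because 10 calls per person per hour is closer to contact-center usage than to a typical office.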
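The Erlang B lookup can also be computed directly instead of read from a table. The following is a minimal sketch using the standard Erlang B recurrence, B(0) = 1 and B(n) = A*B(n-1) / (n + A*B(n-1)), where A is the offered traffic in Erlangs. The function names are illustrative and not taken from any particular library.

```python
def erlang_b(traffic_erlangs, channels):
    """Blocking probability for the offered traffic on the given channel count."""
    blocking = 1.0  # with zero channels every call is blocked
    for n in range(1, channels + 1):
        blocking = traffic_erlangs * blocking / (n + traffic_erlangs * blocking)
    return blocking


def channels_needed(traffic_erlangs, max_blocking=0.01):
    """Smallest channel count whose blocking probability is at or below the target."""
    channels = 0
    blocking = 1.0
    while blocking > max_blocking:
        channels += 1
        blocking = traffic_erlangs * blocking / (channels + traffic_erlangs * blocking)
    return channels


# The worked example from the text: 100 users at 0.5 Erlangs each, 1% blocking.
offered = 100 * (10 * 3 / 60)
required = channels_needed(offered, max_blocking=0.01)
```

Running this for 50 Erlangs at 1% blocking should land inside the 60-70 channel range quoted above. Erlang B assumes blocked calls are dropped rather than queued or retried, so it is a planning estimate rather than a guarantee.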

Quality of Service for VoIP

VoIP traffic is extremely sensitive to network impairments that matter little for data traffic. One-way latency above 150 ms degrades conversation quality; above 400 ms it becomes unacceptable. Jitter (variation in packet arrival time) above 20-30 ms causes perceptible audio degradation even with de-jitter buffers. Packet loss above 1-2% causes audible artifacts; above 5% it makes voice unintelligible with most codecs. QoS (Quality of Service) configuration is therefore a requirement for enterprise VoIP, not an optional extra.
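To make the jitter figure concrete, the sketch below computes a running interarrival jitter estimate in the style of RFC 3550: for each packet, the change in transit time relative to the previous packet is smoothed with a gain of 1/16. As a simplification for illustration, it works on send and receive times in milliseconds rather than RTP timestamp units, and the function name is hypothetical.

```python
def interarrival_jitter_ms(send_times_ms, recv_times_ms):
    """Smoothed jitter estimate over a sequence of packets.

    Only differences between transit times are used, so the sender and
    receiver clocks do not need to be synchronized.
    """
    jitter = 0.0
    previous_transit = None
    for sent, received in zip(send_times_ms, recv_times_ms):
        transit = received - sent
        if previous_transit is not None:
            delta = abs(transit - previous_transit)
            jitter += (delta - jitter) / 16.0  # RFC 3550 smoothing gain of 1/16
        previous_transit = transit
    return jitter


# Packets sent every 20 ms; the third one is delayed by an extra 16 ms.
sent = [0, 20, 40, 60]
received = [50, 70, 106, 110]
estimate = interarrival_jitter_ms(sent, received)
```

Because of the 1/16 gain, a single late packet moves the estimate only slightly; sustained variation is what pushes it toward the 20-30 ms range where audio quality suffers.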

QoS for VoIP uses Differentiated Services (DiffServ) marking with DSCP (Differentiated Services Code Point) values. Voice bearer traffic (RTP streams) should be marked DSCP EF (Expedited Forwarding, DSCP 46) which receives strict-priority queuing at network devices. SIP signaling traffic should be marked DSCP CS3 (Class Selector 3, DSCP 24) which receives high-priority queuing but not strict-priority. End-to-end QoS requires: IP phones or softphones to mark outgoing packets with the correct DSCP values (or the switch to remark them at ingress); all network devices in the path (switches, routers, WAN links) to honor and propagate DSCP markings; and adequate bandwidth allocation in priority queues to handle the expected concurrent call volume without dropping EF-marked packets.
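For software endpoints, marking is typically done by setting the IP TOS byte on the socket. The sketch below shows the relationship between a DSCP value and the TOS byte (DSCP occupies the upper six bits, so it is shifted left by two) and applies it with the standard `IP_TOS` socket option. The helper names are illustrative. Whether the operating system honors the option varies: some platforms ignore or override application-set markings unless policy allows them, so treat this as a sketch for IPv4 on a Unix-like host.

```python
import socket

DSCP_EF = 46   # Expedited Forwarding, voice bearer (RTP)
DSCP_CS3 = 24  # Class Selector 3, SIP signaling


def dscp_to_tos(dscp):
    """Convert a 6-bit DSCP value to the 8-bit TOS / Traffic Class byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp << 2  # the low two bits are reserved for ECN


def open_marked_udp_socket(dscp):
    """Create an IPv4 UDP socket whose outgoing packets request the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

EF therefore appears as TOS 0xB8 (184) and CS3 as 0x60 (96) in packet captures, which is a quick way to verify that markings survive each hop.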

Unified Communications Platforms

Modern enterprises typically standardize on a unified communications (UC) platform that integrates voice, video, messaging, presence, and collaboration tools. Microsoft Teams Phone is a common choice for organizations already using Microsoft 365, providing calling via Microsoft Calling Plans (Microsoft as the carrier) or Direct Routing (connecting existing SIP trunks through a certified Session Border Controller). Cisco Webex Calling provides similar capabilities on Cisco infrastructure. Zoom Phone offers cloud PBX functionality for Zoom-centric organizations. On-premises or hybrid PBX solutions from Cisco (Cisco Unified Communications Manager), Avaya (Aura), and Mitel remain common in enterprises with existing investments or specific compliance requirements.

UC platform selection criteria include: integration with existing directory services (Active Directory, LDAP) for unified identity; mobile client quality for remote and mobile workers; contact center integration requirements; API extensibility for custom workflows; compliance recording capability for regulated industries; and total cost of ownership including licensing, SIP trunk costs, maintenance, and support.

E911 Compliance

Kari's Law and Section 506 of RAY BAUM'S Act, together with the associated FCC rules, set the E911 requirements for multi-line telephone systems (MLTS) in the US. Kari's Law requires that 911 can be dialed directly from any extension, without a prefix such as 9, and that an on-site notification be generated at a security desk or central alert point when 911 is called. RAY BAUM'S Act requires that dispatchable location information be provided to 911 with the call. The dispatchable location must identify more than the building address: it should include details such as the floor, suite, or room where the 911 call originated, because emergency responders need to know which suite or room to enter.

Implementing E911 for VoIP requires mapping individual IP phone MAC addresses or extension numbers to physical locations (building, floor, room) in a database that is queried when a 911 call is placed. This database must be maintained as phones move. Most enterprise UC platforms include E911 management modules or integrate with dedicated E911 platforms like 911inform, RedSky, or Intrado. The PSAP (Public Safety Answering Point) receives the call plus the location information and dispatches responders accordingly.
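The mapping described above is essentially a keyed lookup that has to be updated whenever a phone moves. The sketch below is a deliberately simplified in-memory version keyed on MAC address; the class, field, and function names are hypothetical and do not reflect the data model of any E911 product, and the sample data is invented.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DispatchableLocation:
    street_address: str
    floor: str
    room: str


def normalize_mac(mac: str) -> str:
    """Strip common separators so differently formatted MACs compare equal."""
    return mac.replace(":", "").replace("-", "").replace(".", "").lower()


class LocationDirectory:
    """Maps phone MAC addresses to their current physical location."""

    def __init__(self) -> None:
        self._by_mac: Dict[str, DispatchableLocation] = {}

    def register(self, mac: str, location: DispatchableLocation) -> None:
        # Re-registering the same MAC overwrites the old entry (phone moved).
        self._by_mac[normalize_mac(mac)] = location

    def lookup(self, mac: str) -> Optional[DispatchableLocation]:
        # None signals an unmapped phone, which should be treated as an error.
        return self._by_mac.get(normalize_mac(mac))


directory = LocationDirectory()
directory.register(
    "00:11:22:AA:BB:CC",
    DispatchableLocation("100 Example St", floor="3", room="312"),
)
found = directory.lookup("0011.22aa.bbcc")
```

A production system would also need handling for softphones and remote workers, whose location cannot be derived from a fixed MAC-to-room mapping, and an audit process for entries that return no location.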