PBXAI - Real-time Voice AI Orchestration Engine
Architected a low-latency Node.js system to bridge traditional VoIP telephony (Asterisk) with generative AI brains, enabling real-time, bidirectional voice interactions.
// class lineage
Read this project as a specialized implementation built on a reusable engineering base.
Real-time Media Gateway
A high-concurrency runtime that bridges protocol boundaries, manages bidirectional media streams, and maintains stateful channel lifecycles with built-in resilience patterns.
PBXAI - Real-time Voice AI Orchestration Engine
A specialized voice AI orchestrator that bridges legacy telephony with generative AI through real-time audio processing and intelligent turn-taking.
The Challenge
Traditional telephony systems and modern AI models operate on vastly different protocols. Bridging them requires keeping end-to-end latency low, ideally within a few hundred milliseconds per conversational turn, to prevent conversation lag and 'unnatural' pauses in voice interactions.
The system needed to handle complex SIP call states (ringing, answered, bridged, hung up) while simultaneously managing high-bandwidth audio streaming, silence detection, and jitter buffering.
Ensuring high availability was critical; any failure in the orchestration layer would lead to dropped calls or broken AI logic, requiring a robust recovery and circuit-breaker mechanism.
The Solution
I engineered a custom media gateway using the Asterisk REST Interface (ARI) to capture raw audio streams and route them to AI 'Brains' via high-concurrency WebSockets. The architecture utilizes a state-machine pattern to manage call lifecycles predictably.
To ensure a natural conversation flow, I implemented an advanced media processing layer that includes a silence detector for intelligent turn-taking and a jitter buffer to handle network-induced audio artifacts.
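The jitter-buffer idea can be illustrated with a minimal sketch. It assumes each audio frame carries a monotonically increasing sequence number and ignores sequence wrap-around. `AudioFrame`, `JitterBuffer`, and the `depth` parameter are hypothetical names chosen for illustration, not the project's actual code.

```typescript
// Hypothetical frame shape: a sequence number plus the raw audio payload.
interface AudioFrame {
  seq: number;
  payload: Uint8Array;
}

class JitterBuffer {
  private frames = new Map<number, AudioFrame>();
  private nextSeq: number | null = null;

  // depth: number of frames to collect before playout starts (assumed tunable).
  constructor(private readonly depth: number) {}

  push(frame: AudioFrame): void {
    // Drop frames that arrive after their playout slot has already passed.
    if (this.nextSeq !== null && frame.seq < this.nextSeq) return;
    this.frames.set(frame.seq, frame);
  }

  // Returns the next frame in sequence order, null when that slot was lost
  // (the caller may substitute silence), or undefined when nothing is
  // ready yet (still pre-filling, or the buffer has run dry).
  pop(): AudioFrame | null | undefined {
    let seq = this.nextSeq;
    if (seq === null) {
      if (this.frames.size < this.depth) return undefined;
      seq = Math.min(...Array.from(this.frames.keys()));
    }
    if (this.frames.size === 0) return undefined;
    const frame = this.frames.get(seq) ?? null;
    this.frames.delete(seq);
    this.nextSeq = seq + 1;
    return frame;
  }
}
```

A larger `depth` absorbs more network reordering at the cost of added playout delay, so the value is a trade-off against the latency budget.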
Bidirectional Audio Routing
High-performance media router that handles full-duplex audio streaming between PSTN channels and WebSocket-based AI clients.
Intelligent Silence Detection
Real-time audio analysis engine that detects user speech end-points to trigger AI response generation with minimal latency.
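One common way to classify a frame as silent is an energy threshold on its root-mean-square (RMS) amplitude. The sketch below assumes signed 16-bit little-endian PCM input. `EnergySilenceDetector` and `rmsThreshold` are illustrative names, and this is not necessarily the analysis the project's detector performs.

```typescript
class EnergySilenceDetector {
  // rmsThreshold: RMS amplitude below which a frame counts as silent
  // (assumed to be tuned per deployment against line noise).
  constructor(private readonly rmsThreshold: number) {}

  // Returns true when the frame's RMS amplitude is below the threshold.
  analyze(frame: Uint8Array): boolean {
    const sampleCount = Math.floor(frame.byteLength / 2);
    if (sampleCount === 0) return true; // an empty frame carries no speech
    const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
    let sumSquares = 0;
    for (let i = 0; i < sampleCount; i++) {
      const sample = view.getInt16(i * 2, true); // true = little-endian
      sumSquares += sample * sample;
    }
    return Math.sqrt(sumSquares / sampleCount) < this.rmsThreshold;
  }
}
```

Because a Node.js `Buffer` is a `Uint8Array`, frames received from a socket can be passed in directly. A pure energy threshold is cheap but can misfire on background noise, which is why end-of-turn decisions usually also require the silence to persist for some minimum time.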
Stateful Call Management
A robust state machine that tracks Asterisk channel events and ensures the AI context remains synchronized with the telephony state.
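As a rough illustration of the state-machine pattern, the sketch below encodes the call states named earlier (ringing, answered, bridged, hung up) in a transition table and rejects any move the table does not list. The allowed transitions are an assumption made for this example; the real Asterisk event flow has more states and edge cases.

```typescript
type CallState = 'ringing' | 'answered' | 'bridged' | 'hungup';

// Assumed transition table: a call can hang up from any live state,
// and otherwise only moves forward.
const allowedTransitions: Record<CallState, readonly CallState[]> = {
  ringing: ['answered', 'hungup'],
  answered: ['bridged', 'hungup'],
  bridged: ['hungup'],
  hungup: [],
};

class CallStateMachine {
  private current: CallState = 'ringing';

  get state(): CallState {
    return this.current;
  }

  // Applies the transition if the table allows it; returns false otherwise
  // so the caller can log or ignore out-of-order telephony events.
  transition(next: CallState): boolean {
    if (allowedTransitions[this.current].indexOf(next) === -1) return false;
    this.current = next;
    return true;
  }
}
```

Keeping the rules in a single table makes illegal sequences, such as bridging a call that has already hung up, easy to detect and test.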
Resilient AI Clients
Integration layer featuring circuit breakers and error recovery handlers to maintain call stability during AI service interruptions.
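A circuit breaker of the kind described here can be sketched in a few lines. This version counts consecutive failures, opens after a threshold, and allows trial calls again after a cool-down. `CircuitBreaker`, `guarded`, and their parameters are hypothetical names for illustration. The sketch also simplifies the half-open phase by not limiting it to a single trial call.

```typescript
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null; // null means the circuit is closed

  constructor(
    private readonly failureThreshold: number,
    private readonly resetTimeoutMs: number,
  ) {}

  // True if a call may be attempted at time nowMs.
  allowRequest(nowMs: number): boolean {
    if (this.openedAt === null) return true;
    // Open: only allow a trial call once the cool-down has elapsed.
    return nowMs - this.openedAt >= this.resetTimeoutMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(nowMs: number): void {
    this.failures += 1;
    // A failed trial call re-opens the circuit and restarts the cool-down.
    if (this.failures >= this.failureThreshold) this.openedAt = nowMs;
  }
}

// Wraps an async AI call; returns the fallback instead of throwing when the
// circuit is open or the call fails, so the phone call itself stays up.
async function guarded<T>(
  breaker: CircuitBreaker,
  action: () => Promise<T>,
  fallback: T,
): Promise<T> {
  if (!breaker.allowRequest(Date.now())) return fallback;
  try {
    const result = await action();
    breaker.recordSuccess();
    return result;
  } catch {
    breaker.recordFailure(Date.now());
    return fallback;
  }
}
```

In a voice context the fallback might be a pre-recorded "please hold" prompt, so a struggling AI service degrades the conversation rather than dropping the call.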
Technical Implementation
The orchestrator is built on Fastify for its high throughput and low overhead. It leverages an ARI Gateway to interface with Asterisk, while a dedicated Media Manager handles the intricacies of audio chunking, buffering, and protocol transformation.
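To make the audio chunking step concrete, the sketch below re-frames an arbitrary byte stream into fixed-size frames. It assumes 8 kHz, 16-bit mono PCM in the usage note, which gives 320 bytes per 20 ms frame. The sample rate, frame duration, and the names `FrameChunker` and `frameSizeBytes` are assumptions for illustration; the source does not state the project's actual audio format.

```typescript
// Bytes in one frame of PCM audio, e.g. 8000 Hz * 20 ms * 2 bytes = 320.
function frameSizeBytes(sampleRateHz: number, frameMs: number, bytesPerSample: number): number {
  return ((sampleRateHz * frameMs) / 1000) * bytesPerSample;
}

class FrameChunker {
  private pending: Uint8Array = new Uint8Array(0);

  constructor(private readonly frameBytes: number) {
    if (!(frameBytes > 0)) throw new Error('frameBytes must be positive');
  }

  // Appends incoming bytes and returns every complete frame now available.
  // Leftover bytes are kept until the next call completes the frame.
  push(chunk: Uint8Array): Uint8Array[] {
    const merged = new Uint8Array(this.pending.length + chunk.length);
    merged.set(this.pending, 0);
    merged.set(chunk, this.pending.length);

    const frames: Uint8Array[] = [];
    let offset = 0;
    while (merged.length - offset >= this.frameBytes) {
      frames.push(merged.slice(offset, offset + this.frameBytes));
      offset += this.frameBytes;
    }
    this.pending = merged.slice(offset);
    return frames;
  }
}
```

A chunker created with `new FrameChunker(frameSizeBytes(8000, 20, 2))` would emit 320-byte frames regardless of how the transport happened to split the incoming data.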
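// Audio Receiver implementation with silence detection
// SilenceDetector and ReceiverConfig are defined elsewhere in the project.
import { EventEmitter } from 'node:events';

class AudioReceiver extends EventEmitter {
  private silenceDetector: SilenceDetector;
  private isUserSpeaking = false;
  private lastSpeechTimestamp = 0;

  // wsClient and config.turnThresholdMs are assumed here; the excerpt omits them.
  constructor(private config: ReceiverConfig, private wsClient: { send(data: Buffer): void }) {
    super();
    this.silenceDetector = new SilenceDetector(config.threshold);
  }

  public handleAudioFrame(frame: Buffer): void {
    const isSilent = this.silenceDetector.analyze(frame);
    if (!isSilent) {
      // Speech detected: stream to AI Brain and mark the turn as active
      this.wsClient.send(frame);
      this.isUserSpeaking = true;
      this.lastSpeechTimestamp = Date.now();
    } else if (this.isUserSpeaking && this.isTurnThresholdMet()) {
      // Silence has lasted long enough: the user's turn is over
      this.emit('user_finished_speaking');
      this.isUserSpeaking = false;
    }
  }

  // True once silence has lasted at least the configured turn threshold
  private isTurnThresholdMet(): boolean {
    return Date.now() - this.lastSpeechTimestamp >= this.config.turnThresholdMs;
  }
}
Results & Impact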
The implementation of the Node Orchestrator successfully transformed legacy telephony into a modern AI-ready platform. By optimizing the media path and implementing a rigorous state machine, the system now supports thousands of concurrent voice sessions with natural, low-latency AI interaction.