This project demonstrates how to integrate Twilio's Programmable Voice API with OpenAI's Realtime API to build real-time voice agents. Callers dial a Twilio phone number, and the app proxies the call audio between Twilio and the OpenAI Realtime API. The flow works as follows:

- The `/incoming-call` endpoint responds to Twilio's incoming-call webhook with TwiML containing the `<Stream>` noun.
- Twilio opens a Media Stream to the app's WebSocket endpoint.
- Audio packets from the voice call are forwarded to OpenAI's Realtime API.
- OpenAI responds with audio packets, which are forwarded to Twilio.
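The steps above amount to two message translations plus one TwiML response. The following is a minimal sketch under stated assumptions, not the project's code: the function names and the `/media-stream` WebSocket path are hypothetical, the TwiML uses `<Connect><Stream>` (Twilio's form for bidirectional streams), and the OpenAI event names (`input_audio_buffer.append`, `response.audio.delta`) follow the Realtime API beta. It also assumes the OpenAI session is configured with `g711_ulaw` input and output audio formats, so Twilio's base64 mu-law payloads can be passed through unchanged.

```typescript
// Sketch only: names and the "/media-stream" path are hypothetical;
// OpenAI event types follow the Realtime API beta.

// TwiML returned from /incoming-call. <Connect><Stream> asks Twilio to open a
// bidirectional Media Stream to the app's WebSocket endpoint.
function buildIncomingCallTwiml(hostname: string): string {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <Stream url="wss://${hostname}/media-stream" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}

// Subset of the messages Twilio sends over the Media Stream WebSocket.
type TwilioMessage =
  | { event: "start"; start: { streamSid: string; callSid: string } }
  | { event: "media"; media: { payload: string } } // base64 mu-law audio
  | { event: "stop" };

// Twilio -> OpenAI: wrap each audio payload in an input_audio_buffer.append event.
// Returns null for messages that carry no audio.
function twilioToOpenAI(msg: TwilioMessage): string | null {
  if (msg.event !== "media") return null;
  return JSON.stringify({ type: "input_audio_buffer.append", audio: msg.media.payload });
}

// OpenAI -> Twilio: forward audio deltas as Twilio "media" messages, tagged with
// the streamSid captured from Twilio's "start" message.
function openAIToTwilio(
  event: { type: string; delta?: string },
  streamSid: string,
): string | null {
  if (event.type !== "response.audio.delta" || event.delta === undefined) return null;
  return JSON.stringify({ event: "media", streamSid, media: { payload: event.delta } });
}
```

Under these assumptions the proxy never decodes audio; it only re-wraps base64 payloads, which keeps per-packet latency low.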

Prerequisites:

- A Twilio account with a phone number
- An OpenAI Platform account and an API key (`OPENAI_API_KEY`)
- ngrok installed globally

Clone the repository and install its dependencies:

- `git clone https://github.com/pBread/twilio-openai-realtime-minimalist`
- `cd twilio-openai-realtime-minimalist`
- `npm install`

The app needs to know the public domain it is served from in order to work correctly. Set this domain in the `HOSTNAME` environment variable before starting the app.
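Because the app depends on `HOSTNAME` being set before startup, a startup check can catch a missing or malformed value early. This is a hypothetical helper, not part of the project: `loadConfig` and `AppConfig` are illustrative names. It assumes `HOSTNAME` should be a bare domain, so it strips a scheme or trailing slash in case a full ngrok URL was pasted.

```typescript
// Hypothetical startup check; not taken from the project.
interface AppConfig {
  openaiApiKey: string;
  hostname: string;
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const openaiApiKey = env.OPENAI_API_KEY?.trim();
  const rawHostname = env.HOSTNAME?.trim();
  if (!openaiApiKey) throw new Error("OPENAI_API_KEY is not set");
  if (!rawHostname) throw new Error("HOSTNAME is not set");
  // Accept a pasted URL such as "https://abc123.ngrok.app/" by removing the
  // scheme and any trailing slashes, leaving only the domain.
  const hostname = rawHostname.replace(/^[a-z]+:\/\//i, "").replace(/\/+$/, "");
  return { openaiApiKey, hostname };
}
```

In Node this would be called as `loadConfig(process.env)` before the server starts listening, so a bad configuration fails immediately instead of on the first call.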

Start ngrok to expose local port 3000: `ngrok http 3000`

Then copy the forwarding domain that ngrok displays (for example, `your-ngrok-domain.ngrok.app`).

Note: ngrok offers static domains to all users. If you provision your own static domain, `HOSTNAME` stays the same and you no longer need to update it every time ngrok restarts.

Set the following environment variables:

- `OPENAI_API_KEY=your-openai-api-key`
- `HOSTNAME=your-ngrok-domain.ngrok.app`

Then start the app with `npm run dev`. This command starts the Express server, which handles incoming Twilio webhook requests and media streams.

Next, go to your Twilio Console and configure the Voice webhooks for your Twilio phone number:
- Incoming Call Webhook: select `POST` and set the URL to `https://your-ngrok-domain.ngrok.app/incoming-call`
- Call Status Update Webhook: select `POST` and set the URL to `https://your-ngrok-domain.ngrok.app/call-status-update`
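
Twilio delivers these webhooks as `application/x-www-form-urlencoded` POST bodies, and status callbacks include standard parameters such as `CallSid` and `CallStatus`. The following sketch shows what a `/call-status-update` handler might extract; `parseForm` and `describeCallStatus` are hypothetical helpers, not the project's code.

```typescript
// Hypothetical helpers for a status-callback body; not from the project.

// Decode one form-encoded component: "+" means space, then percent-decode.
function decodeFormComponent(s: string): string {
  return decodeURIComponent(s.replace(/\+/g, " "));
}

// Parse "a=1&b=2" into { a: "1", b: "2" }, splitting each pair on its first "=".
function parseForm(body: string): Record<string, string> {
  const fields: Record<string, string> = {};
  for (const pair of body.split("&")) {
    if (pair === "") continue;
    const eq = pair.indexOf("=");
    const key = eq === -1 ? pair : pair.slice(0, eq);
    const value = eq === -1 ? "" : pair.slice(eq + 1);
    fields[decodeFormComponent(key)] = decodeFormComponent(value);
  }
  return fields;
}

// One-line log entry for a status callback, e.g. "CA123: completed".
function describeCallStatus(body: string): string {
  const fields = parseForm(body);
  const sid = fields["CallSid"] ?? "unknown call";
  const status = fields["CallStatus"] ?? "unknown status";
  return `${sid}: ${status}`;
}
```

In an Express app, the built-in `express.urlencoded()` middleware performs this parsing and exposes the fields on `req.body`; the manual version only makes the wire format explicit.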

You're all set. Call your Twilio phone number, and you should see the real-time transcript logged in your local terminal.
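
The source does not say how the transcript is produced. One possible approach, assuming the Realtime API beta event names and that input audio transcription is enabled for the session, is to map transcript events from OpenAI to log lines. `transcriptLine` and `RealtimeEvent` are hypothetical names used only for this sketch.

```typescript
// Sketch assuming Realtime API beta event names:
// - "conversation.item.input_audio_transcription.completed" carries the caller's
//   words (only emitted when input transcription is enabled for the session);
// - "response.audio_transcript.done" carries the assistant's full spoken reply.
type RealtimeEvent = { type: string; transcript?: string };

function transcriptLine(event: RealtimeEvent): string | null {
  if (event.transcript === undefined) return null;
  switch (event.type) {
    case "conversation.item.input_audio_transcription.completed":
      return `caller: ${event.transcript.trim()}`;
    case "response.audio_transcript.done":
      return `assistant: ${event.transcript.trim()}`;
    default:
      return null; // audio deltas and other events are not logged
  }
}
```

Logging only the completed transcripts, rather than every partial delta, keeps the terminal output to one line per turn.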
