Azure Voice Live is powerful but unforgiving. A misconfigured chunk size, a missing error handler, or a silent session timeout can destroy the user experience entirely. This guide covers every issue you’re likely to encounter, from first connection to production incident.


Issue 1: WebSocket Connection Drops Mid-Session

Symptoms: The session disconnects after 30–60 seconds without user action. The browser console shows WebSocket closed: 1006 (Abnormal Closure).

Causes:

  1. Azure session timeout: Voice Live sessions have a maximum duration of roughly 10 minutes (see Issue 6), after which the service closes the connection.
  2. Cloudflare/load balancer timeout: HTTP proxies often close idle WebSocket connections after 60–90 seconds.
  3. Browser idle detection: Chromium throttles timers in background tabs, which can delay outgoing messages long enough for the connection to look idle.

Fix: Implement Automatic Reconnection

// In useVoiceLive.ts

const MAX_RECONNECT_ATTEMPTS = 3;
const RECONNECT_DELAY_MS = 1500;

const reconnectAttempts = useRef(0);
const conversationHistory = useRef<string>('');

const connectWithRetry = useCallback(async () => {
  try {
    await connect();
    reconnectAttempts.current = 0;
  } catch (err) {
    if (reconnectAttempts.current < MAX_RECONNECT_ATTEMPTS) {
      reconnectAttempts.current++;
      console.log(`[VoiceLive] Reconnecting (${reconnectAttempts.current}/${MAX_RECONNECT_ATTEMPTS})...`);
      setTimeout(connectWithRetry, RECONNECT_DELAY_MS);
    } else {
      updateStatus('error');
    }
  }
}, [connect, updateStatus]);

// On unexpected close (not user-initiated):
ws.onclose = (event) => {
  if (event.code === 1000) {
    // Normal, user-initiated closure
    updateStatus('idle');
  } else if (reconnectAttempts.current < MAX_RECONNECT_ATTEMPTS) {
    // conversationHistory.current lives in a ref, so it survives the reconnect
    setTimeout(connectWithRetry, RECONNECT_DELAY_MS);
  } else {
    // Retries exhausted after an abnormal close
    updateStatus('error');
  }
};
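
The snippet above retries at a fixed RECONNECT_DELAY_MS. A common alternative, which is not part of the original hook, is exponential backoff with jitter, so that many clients dropped at the same moment do not reconnect in lockstep. The following is a minimal sketch; `reconnectDelayMs`, `shouldReconnect`, and the 15-second cap are illustrative assumptions.

```typescript
// Hypothetical helpers: names and defaults are illustrative, not from the original hook.
function reconnectDelayMs(
  attempt: number, // 1 for the first retry, 2 for the second, ...
  baseMs: number = 1500, // mirrors RECONNECT_DELAY_MS
  maxMs: number = 15000, // assumed upper bound on the wait
  random: () => number = Math.random, // injectable so the function can be tested
): number {
  // Double the delay on every attempt, capped at maxMs.
  const exponential = Math.min(maxMs, baseMs * 2 ** (attempt - 1));
  // "Equal jitter": keep half the delay fixed and randomize the other half.
  const half = exponential / 2;
  return Math.round(half + random() * half);
}

// Decide whether a close event warrants a retry. Code 1000 is a normal closure.
function shouldReconnect(closeCode: number, attempts: number, maxAttempts: number = 3): boolean {
  return closeCode !== 1000 && attempts < maxAttempts;
}
```

In the hook, the fixed delay could then be swapped for something like `setTimeout(connectWithRetry, reconnectDelayMs(reconnectAttempts.current + 1))`.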

Fix: Restore Conversation Context After Reconnect

When reconnecting, inject the conversation history into the system prompt:

async function connectWithHistory(history: string = conversationHistory.current) {
  await connect(); // assumes connect() resolves once the socket is open

  if (history) {
    ws.send(JSON.stringify({
      type: 'session.update',
      session: {
        instructions: `${systemPrompt}

Previous conversation context:
${history}

Continue the interview from where it left off.`,
      },
    }));
  }
}
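
A long interview can produce more history than is sensible to paste into the instructions. The sketch below shows one way to build the same prompt while keeping only the most recent part of the transcript. The helper name and the character budget are assumptions; a character count is only a rough stand-in for a token limit.

```typescript
// Hypothetical helper; maxHistoryChars is an assumed budget, not an Azure limit.
function buildInstructionsWithHistory(
  systemPrompt: string,
  history: string,
  maxHistoryChars: number = 4000,
): string {
  const trimmed = history.trim();
  if (trimmed.length === 0) return systemPrompt;

  // Keep the most recent part of the transcript when it exceeds the budget.
  const recent =
    trimmed.length > maxHistoryChars
      ? '[earlier turns omitted]\n' + trimmed.slice(-maxHistoryChars)
      : trimmed;

  return [
    systemPrompt,
    '',
    'Previous conversation context:',
    recent,
    '',
    'Continue the interview from where it left off.',
  ].join('\n');
}
```

The returned string can be passed as `instructions` in the `session.update` message shown above.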

Issue 2: Choppy / Glitchy Audio Playback

Symptoms: The AI voice stutters, pops, or has gaps, most noticeably at the start of each AI response.

Causes:

  1. Audio buffer underrun — the buffer empties before the next frames arrive
  2. Main thread blocking — React re-renders or heavy JS blocking the audio thread
  3. Mixed sample rates — browser and Azure operating at different rates
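
For the third cause, browsers often capture at 44.1 or 48 kHz, while the playback code in this guide assumes 24 kHz. Where the browser honors it, creating the AudioContext with a matching `sampleRate` option avoids conversion entirely. Otherwise the samples need resampling before they are sent. The function below is a minimal linear-interpolation sketch under that assumption; the name is hypothetical, and it applies no anti-aliasing filter, so treat it as a starting point, not a production resampler.

```typescript
// Hypothetical helper: linear-interpolation resampler for mono float samples.
// No low-pass filter is applied, so downsampling can alias high frequencies.
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate) return input.slice();

  const ratio = fromRate / toRate; // input samples per output sample
  const outLength = Math.floor(input.length / ratio);
  const out = new Float32Array(outLength);

  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio; // fractional position in the input
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    // Blend the two neighbouring input samples.
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}
```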

Fix: Implement a Jitter Buffer

// In the AudioWorklet processor
class PCMPlayerProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.buffer = [];
    this.minBufferSize = 480; // 20ms at 24kHz — pre-fill before playing
    this.playing = false;
    
    this.port.onmessage = (e) => {
      this.buffer.push(...e.data);
    };
  }

  process(inputs, outputs) {
    const output = outputs[0][0];
    
    // Don't start playing until we have enough buffer (prevents startup pop)
    if (!this.playing && this.buffer.length < this.minBufferSize) {
      output.fill(0);
      return true;
    }
    this.playing = this.buffer.length > 0;
    
    for (let i = 0; i < output.length; i++) {
      output[i] = this.buffer.length > 0 ? this.buffer.shift() : 0;
    }
    return true;
  }
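}

// A processor must be registered before the main thread can create a node for it.
// The name 'pcm-player' is a placeholder; use whatever your AudioWorkletNode expects.
registerProcessor('pcm-player', PCMPlayerProcessor);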
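
The processor above stores samples in a plain array and removes them with `shift()`, which does work proportional to the array length on every sample. One possible refinement, not taken from the original code, is a fixed-size ring buffer that reads and writes in constant time and never allocates inside `process()`. The class name and its overwrite-oldest policy are assumptions made for this sketch.

```typescript
// Illustrative ring buffer for mono float samples; not the original processor's storage.
class SampleRingBuffer {
  private readonly data: Float32Array;
  private readIndex = 0;
  private writeIndex = 0;
  private count = 0;

  constructor(capacity: number) {
    this.data = new Float32Array(capacity);
  }

  // Number of samples currently buffered.
  get length(): number {
    return this.count;
  }

  // Append samples; when the buffer is full, the oldest samples are overwritten.
  write(samples: ArrayLike<number>): void {
    for (let i = 0; i < samples.length; i++) {
      this.data[this.writeIndex] = samples[i];
      this.writeIndex = (this.writeIndex + 1) % this.data.length;
      if (this.count < this.data.length) {
        this.count++;
      } else {
        // Full: drop the oldest sample by advancing the read position.
        this.readIndex = (this.readIndex + 1) % this.data.length;
      }
    }
  }

  // Fill `out` from the buffer and pad with silence on underrun.
  // Returns the number of real samples copied.
  read(out: Float32Array): number {
    const n = Math.min(out.length, this.count);
    for (let i = 0; i < n; i++) {
      out[i] = this.data[this.readIndex];
      this.readIndex = (this.readIndex + 1) % this.data.length;
    }
    out.fill(0, n);
    this.count -= n;
    return n;
  }
}
```

Inside the worklet, `port.onmessage` would call `write(e.data)` and `process()` would call `read(outputs[0][0])`, keeping the same pre-fill check on `length`.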

Fix: Move Audio Processing Off Main Thread

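ScriptProcessorNode is deprecated and runs its callback on the main thread, so it competes directly with React renders. Prefer an AudioWorklet for capture and playback, and keep the UI work triggered by voice events cheap:

// Debounce UI updates triggered by voice events (debounce from lodash or similar)
const handleServerMessage = useMemo(() => 
  debounce((msg: ServerMessage) => {
    // Update transcript or status state here. Do not route audio deltas
    // through a debounce: it drops every call in a burst except the last.
  }, 0), // 0ms defers the update to a later main-thread task
[]);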

Issue 3: VAD False Positives (Background Noise Talks to AI)

Symptoms: The AI responds to keyboard sounds, fan noise, or room noise as if the user spoke.

Fix: Increase VAD Threshold

turn_detection: {
  type: 'server_vad',
  threshold: 0.65, // Increase from default 0.5
  silence_duration_ms: 450,
  prefix_padding_ms: 300,
},
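
The fragment above belongs inside a `session.update` message, the same message type used earlier in this guide to set instructions. As a sketch of how the full payload might be assembled, the hypothetical helper below builds it and clamps the threshold to the 0 to 1 range; the helper name and the decision to clamp are assumptions.

```typescript
// Hypothetical helper: builds the JSON payload for a server-VAD update.
interface ServerVadConfig {
  type: 'server_vad';
  threshold: number;
  silence_duration_ms: number;
  prefix_padding_ms: number;
}

function buildVadSessionUpdate(
  threshold: number,
  silenceDurationMs: number = 450,
  prefixPaddingMs: number = 300,
): string {
  const turnDetection: ServerVadConfig = {
    type: 'server_vad',
    // Keep the threshold inside 0..1; higher values need louder speech to trigger.
    threshold: Math.min(1, Math.max(0, threshold)),
    silence_duration_ms: silenceDurationMs,
    prefix_padding_ms: prefixPaddingMs,
  };
  return JSON.stringify({
    type: 'session.update',
    session: { turn_detection: turnDetection },
  });
}
```

Usage would be along the lines of `ws.send(buildVadSessionUpdate(0.65))` once the socket is open.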

Fix: High-Pass Filter on Input

Add a high-pass filter to cut low-frequency ambient noise before sending to Azure:

// In startMicCapture()
const highPassFilter = ctx.createBiquadFilter();
highPassFilter.type = 'highpass';
highPassFilter.frequency.value = 80; // Cut below 80Hz

source.connect(highPassFilter);
highPassFilter.connect(processor);
processor.connect(ctx.destination);

Fix: Push-to-Talk Mode for Noisy Environments

Disable VAD and let users control turn-taking:

// Disable server-side VAD
turn_detection: null,

// User presses button to signal end of turn
function sendManualTurnEnd(ws: WebSocket) {
  ws.send(JSON.stringify({ type: 'input_audio_buffer.commit' }));
  ws.send(JSON.stringify({ type: 'response.create' }));
}

Issue 4: CORS Errors

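Symptoms: The browser shows Access to XMLHttpRequest at ... has been blocked by CORS policy.

Root Cause: CORS applies to HTTP requests made with fetch or XMLHttpRequest, so this error usually comes from a REST call that your frontend sends to a different origin. The WebSocket handshake itself is not subject to CORS, although a server or proxy can still reject it based on the Origin header. In both cases the cause is the same: the frontend and the backend run on different origins.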

Fix: Same-Origin Proxy

Always route WebSocket connections through your Next.js server (same origin as the frontend). Never let the browser connect directly to Azure:

// ✅ Correct: same origin
const ws = new WebSocket(`wss://${window.location.host}/api/voice`);

// ❌ Wrong: cross-origin, and it exposes the API key to the browser
// const ws = new WebSocket(`wss://interview-voice-openai.openai.azure.com/...?api-key=...`);
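
Browsers send an Origin header with the WebSocket handshake but do not enforce CORS on it, so a proxy that wants to accept only its own frontend has to check that header itself. The following is a minimal sketch of such a check. The function name and the allow-list are hypothetical, and rejecting requests without an Origin header is a policy choice made for this example.

```typescript
// Hypothetical helper for the proxy's upgrade handler.
function isAllowedOrigin(
  originHeader: string | undefined,
  allowedOrigins: readonly string[],
): boolean {
  // Non-browser clients may omit Origin; this sketch rejects them.
  if (!originHeader) return false;

  // Scheme and host are case-insensitive, so compare in lower case.
  const normalized = originHeader.trim().toLowerCase();
  return allowedOrigins.some((allowed) => allowed.toLowerCase() === normalized);
}
```

In an upgrade handler this might be called as `isAllowedOrigin(request.headers.origin, ['https://app.example.com'])`, where the URL is a placeholder for your own frontend origin.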

Issue 5: Azure Rate Limiting and Quota Errors

Symptoms: Connections fail with 429 Too Many Requests or error.type: 'rate_limit_exceeded'.

Azure Limits for GPT-4o Realtime Audio:

| Limit | Default Value |
| --- | --- |
| Concurrent sessions | 10 per deployment |
| Tokens per minute | 100K (configurable) |
| Max session duration | ~10 minutes |
| Audio input per minute | No stated limit |

Fix: Implement a Session Queue

// In your Next.js API
const activeSessions = new Map<string, WebSocket>();

function canStartSession(): boolean {
  return activeSessions.size < parseInt(process.env.MAX_CONCURRENT_SESSIONS || '10');
}

// In the WebSocket upgrade handler
server.on('upgrade', (request, socket, head) => {
  if (!canStartSession()) {
    socket.write('HTTP/1.1 503 Service Unavailable\r\n\r\n');
    socket.destroy();
    return;
  }
  // ... proceed
});
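
The excerpt above checks `activeSessions.size` but does not show where entries are added or removed. One way to keep the count accurate is to wrap the map in a small registry that admits a session only while capacity remains and releases the slot when the socket closes. The class below is a sketch under that assumption; its name and methods are hypothetical.

```typescript
// Hypothetical registry that enforces a concurrent-session cap.
class SessionRegistry<T> {
  private readonly sessions = new Map<string, T>();
  private readonly maxSessions: number;

  constructor(maxSessions: number) {
    this.maxSessions = maxSessions;
  }

  get size(): number {
    return this.sessions.size;
  }

  // Returns true and stores the session if a slot is free; false otherwise.
  tryAcquire(id: string, session: T): boolean {
    if (this.sessions.has(id)) return false; // id already in use
    if (this.sessions.size >= this.maxSessions) return false; // at capacity
    this.sessions.set(id, session);
    return true;
  }

  // Frees the slot; call this from the socket's close handler.
  release(id: string): boolean {
    return this.sessions.delete(id);
  }
}
```

The upgrade handler would call `tryAcquire` in place of `canStartSession()` and respond with 503 when it returns false.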

Fix: Request Quota Increase

For production, request a quota increase via the Azure portal:

  1. Go to your Azure OpenAI resource → Quotas
  2. Request increase for gpt-4o-realtime-preview
  3. Standard increases to 300K TPM are approved automatically

Issue 6: Session Timeout (10-Minute Hard Limit)

Azure Voice Live sessions have a hard maximum duration of approximately 10 minutes. An interview may legitimately need 30+ minutes.

Fix: Graceful Session Rotation

const SESSION_DURATION_LIMIT_MS = 9 * 60 * 1000; // 9 minutes (1 min buffer)

useEffect(() => {
  if (!wsConnected) return; // only arm the timer while a session is live

  const sessionTimer = setTimeout(() => {
    // Silently rotate the session
    performSilentReconnect();
  }, SESSION_DURATION_LIMIT_MS);
  
  return () => clearTimeout(sessionTimer);
}, [wsConnected]);

async function performSilentReconnect() {
  // 1. Save conversation summary
  const summary = await generateConversationSummary();
  
  // 2. Disconnect quietly
  wsRef.current?.close(1000, 'Session rotation');
  
  // 3. Reconnect with history
  await connectWithHistory(summary);
}

Issue 7: Microphone Permission Denied

Symptoms: Users see a browser permission error and the interview can’t start.

Fix: Permission-First UX

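// Check permission before starting, so the browser prompt does not surprise users
async function checkMicPermission(): Promise<PermissionState> {
  try {
    const result = await navigator.permissions.query({ name: 'microphone' as PermissionName });
    return result.state;
  } catch {
    // Some browsers do not support querying 'microphone'; fall back to prompting
    return 'prompt';
  }
}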

// In the UI — check before showing Start button
const [micState, setMicState] = useState<PermissionState>('prompt');

useEffect(() => {
  checkMicPermission().then(setMicState);
}, []);

// Show appropriate UI
{micState === 'denied' && (
  <div className="alert-error">
    Microphone access is blocked. Please allow it in your browser settings to continue.
  </div>
)}
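
The permission query only covers the "denied" case up front. When `getUserMedia` itself rejects, the error's `name` distinguishes a denied permission from a missing or busy device, and each deserves a different message. The mapping below is a sketch; the function name and the wording of the messages are assumptions.

```typescript
// Hypothetical helper: turn a getUserMedia error name into user-facing text.
function describeMicError(errorName: string): string {
  switch (errorName) {
    case 'NotAllowedError': // user or browser policy denied access
      return 'Microphone access is blocked. Please allow it in your browser settings to continue.';
    case 'NotFoundError': // no audio input device present
      return 'No microphone was found. Connect one and try again.';
    case 'NotReadableError': // device busy or hardware failure
      return 'The microphone is in use by another application or could not be started.';
    default:
      return 'The microphone could not be started. Please try again.';
  }
}
```

A caller might use it as `catch (err) { setError(describeMicError((err as DOMException).name)); }`, where `setError` stands in for your own state setter.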

Debugging Tools

Real-Time Audio Visualizer

// Visualize what the microphone is capturing
function AudioMeter({ stream }: { stream: MediaStream | null }) {
  const canvasRef = useRef<HTMLCanvasElement>(null);
  
  useEffect(() => {
    if (!stream || !canvasRef.current) return;
    const ctx = new AudioContext();
    const analyser = ctx.createAnalyser();
    const source = ctx.createMediaStreamSource(stream);
    source.connect(analyser);
    
    const data = new Uint8Array(analyser.frequencyBinCount);
    const canvas = canvasRef.current;
    const cCtx = canvas.getContext('2d')!;
    
    let animId: number;
    function draw() {
      analyser.getByteTimeDomainData(data);
      cCtx.clearRect(0, 0, canvas.width, canvas.height);
      cCtx.beginPath();
      data.forEach((v, i) => {
        const x = (i / data.length) * canvas.width;
        const y = (v / 128.0) * canvas.height / 2;
        i === 0 ? cCtx.moveTo(x, y) : cCtx.lineTo(x, y);
      });
      cCtx.stroke();
      animId = requestAnimationFrame(draw);
    }
    draw();
    return () => { cancelAnimationFrame(animId); ctx.close(); };
  }, [stream]);
  
  return <canvas ref={canvasRef} width={300} height={60} className="border rounded" />;
}
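
A waveform is useful for eyeballing the signal, but a single number is easier to log or compare against a threshold. The sketch below computes an RMS level from the same byte array that `getByteTimeDomainData` fills, where 128 represents silence. The function name is hypothetical.

```typescript
// Hypothetical helper: RMS level in the range 0..1 from byte time-domain data.
function rmsLevel(data: Uint8Array): number {
  if (data.length === 0) return 0;

  let sumSquares = 0;
  for (let i = 0; i < data.length; i++) {
    // Bytes are unsigned with 128 as the zero line; map them to about -1..1.
    const centered = (data[i] - 128) / 128;
    sumSquares += centered * centered;
  }
  return Math.sqrt(sumSquares / data.length);
}
```

Calling `rmsLevel(data)` inside `draw()` gives a per-frame loudness estimate that stays near 0 in a quiet room and rises with speech.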

Azure Monitor Logs

Enable diagnostic logging in the Azure portal:

  1. Azure OpenAI resource → Diagnostic settings
  2. Add setting → Select allLogs
  3. Send to Log Analytics workspace

Query for errors:

AzureDiagnostics
| where ResourceType == "OPENAI"
| where resultType_s == "Failed"
| order by TimeGenerated desc
| project TimeGenerated, operationName_s, resultDescription_s

Complete Error Reference

| Error Type | Message | Fix |
| --- | --- | --- |
| session_expired | Session exceeded time limit | Implement session rotation |
| rate_limit_exceeded | Too many concurrent sessions | Session queue + quota increase |
| invalid_audio_format | Wrong PCM encoding | Verify Int16 encoding, not Float32 |
| model_not_deployed | Deployment not found | Check deployment name in Azure portal |
| content_filter | Content policy violation | Review and adjust prompts |
| connection_error | WebSocket upgrade failed | Check setNoDelay(true) is set |
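
For invalid_audio_format, the usual culprit is sending the Web Audio API's Float32 samples as they are. A minimal conversion to 16-bit PCM might look like the sketch below; the function name is hypothetical, and how the bytes are then framed or encoded for the socket is outside its scope.

```typescript
// Hypothetical helper: convert Web Audio float samples (-1..1) to 16-bit PCM.
function float32ToInt16(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    // Clamp first so out-of-range samples saturate instead of wrapping around.
    const s = Math.max(-1, Math.min(1, input[i]));
    // Negative values scale to -32768, positive values to 32767.
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}
```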

Next: Part 7 — Deploy, Scale & Pricing →

Part 5 — Audio Quality | This is Part 6 of the Azure Voice Live series.
