In Part 8, we added video analysis to the stack. We built Gemini Live multimodal sessions, frame sampling pipelines, and context injection. The technical side is largely done. Now comes the part that most builders skip until a lawyer sends them an email — compliance.
Recording interviews puts you in a different category than just running a voice pipeline. You’re creating persistent records of people speaking candidly, often sharing sensitive information about their work history, current employer, and career vulnerabilities. Getting this right is not optional. This post covers what you actually need to do before you record a single session.
LiveKit Egress for Session Recording
LiveKit Egress is the mechanism for capturing room audio and video to persistent storage. It runs as a separate service alongside your LiveKit server.
There are three egress types relevant to interviews:
Audio-only egress — captures the full mix of all participants as an OGG or MP3 file. Best for voice-only interviews where you want a lightweight audio record.
Video composite egress — captures the full room as an MP4, compositing all video streams into a single output. Used when you have webcam or screen share tracks and want a reviewable recording.
Track egress — captures individual participant tracks as separate files. Useful when you need to analyze the interviewer and candidate audio separately for post-processing.
```python
# recording_service.py
from livekit import api
import os


class InterviewRecordingService:
    def __init__(self):
        self.lk_api = api.LiveKitAPI(
            url=os.environ["LIVEKIT_URL"],
            api_key=os.environ["LIVEKIT_API_KEY"],
            api_secret=os.environ["LIVEKIT_API_SECRET"],
        )
        self.active_recordings: dict[str, str] = {}  # room_name → egress_id

    async def start_recording(
        self,
        room_name: str,
        session_id: str,
        include_video: bool = False,
    ) -> str:
        """Start recording a LiveKit room. Returns egress ID."""
        # Path within the bucket; the bucket itself comes from the S3Upload config
        output_path = f"interviews/{session_id}/"

        if include_video:
            # Composite recording — audio + video
            request = api.RoomCompositeEgressRequest(
                room_name=room_name,
                layout="speaker",
                audio_only=False,
                file_outputs=[
                    api.EncodedFileOutput(
                        file_type=api.EncodedFileType.MP4,
                        filepath=f"{output_path}recording.mp4",
                        s3=api.S3Upload(
                            access_key=os.environ["AWS_ACCESS_KEY_ID"],
                            secret=os.environ["AWS_SECRET_ACCESS_KEY"],
                            region=os.environ["AWS_REGION"],
                            bucket=os.environ["S3_BUCKET"],
                        ),
                        disable_manifest=True,
                    )
                ],
                segment_outputs=[],
            )
        else:
            # Audio-only recording
            request = api.RoomCompositeEgressRequest(
                room_name=room_name,
                audio_only=True,
                file_outputs=[
                    api.EncodedFileOutput(
                        file_type=api.EncodedFileType.OGG,
                        filepath=f"{output_path}audio.ogg",
                        s3=api.S3Upload(
                            access_key=os.environ["AWS_ACCESS_KEY_ID"],
                            secret=os.environ["AWS_SECRET_ACCESS_KEY"],
                            region=os.environ["AWS_REGION"],
                            bucket=os.environ["S3_BUCKET"],
                        ),
                    )
                ],
            )

        response = await self.lk_api.egress.start_room_composite_egress(request)
        egress_id = response.egress_id
        self.active_recordings[room_name] = egress_id
        return egress_id

    async def stop_recording(self, room_name: str) -> dict:
        """Stop recording and return egress info."""
        egress_id = self.active_recordings.pop(room_name, None)
        if not egress_id:
            raise ValueError(f"No active recording for room {room_name}")

        response = await self.lk_api.egress.stop_egress(
            api.StopEgressRequest(egress_id=egress_id)
        )
        return {
            "egress_id": egress_id,
            "status": response.status,
            "file_results": [f.location for f in response.file_results],
        }

    async def get_recording_status(self, egress_id: str) -> str:
        response = await self.lk_api.egress.list_egress(
            api.ListEgressRequest(egress_id=egress_id)
        )
        if response.items:
            return response.items[0].status.name
        return "NOT_FOUND"
```
Real-Time Transcription: Streaming vs Batch
You have two approaches to transcription, and the right choice depends on your use case.
Streaming Transcription with Deepgram
If you need the transcript during the interview (for live captioning, real-time coaching, or immediate post-interview analysis), use Deepgram’s streaming API. You already have Deepgram connected for STT in your voice pipeline — transcription is a side effect of that same connection:
```python
# transcript_collector.py
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TranscriptSegment:
    speaker: str       # "candidate" or "agent"
    text: str
    start_time: float  # seconds from interview start
    end_time: float
    confidence: float
    is_final: bool


@dataclass
class InterviewTranscript:
    session_id: str
    segments: list[TranscriptSegment] = field(default_factory=list)
    started_at: datetime = field(default_factory=datetime.utcnow)

    def add_segment(self, segment: TranscriptSegment):
        self.segments.append(segment)
        # Merge consecutive segments from same speaker
        self._coalesce_segments()

    def _coalesce_segments(self, gap_threshold: float = 2.0):
        """Merge segments from same speaker with small gaps."""
        if len(self.segments) < 2:
            return
        merged = [self.segments[0]]
        for seg in self.segments[1:]:
            last = merged[-1]
            if (
                seg.speaker == last.speaker
                and seg.start_time - last.end_time < gap_threshold
                and last.is_final
                and seg.is_final
            ):
                # Merge into last segment
                merged[-1] = TranscriptSegment(
                    speaker=last.speaker,
                    text=last.text + " " + seg.text,
                    start_time=last.start_time,
                    end_time=seg.end_time,
                    confidence=min(last.confidence, seg.confidence),
                    is_final=True,
                )
            else:
                merged.append(seg)
        self.segments = merged

    def to_json(self) -> dict:
        return {
            "session_id": self.session_id,
            "started_at": self.started_at.isoformat(),
            "segments": [
                {
                    "speaker": s.speaker,
                    "text": s.text,
                    "start_time": round(s.start_time, 2),
                    "end_time": round(s.end_time, 2),
                    "confidence": round(s.confidence, 3),
                }
                for s in self.segments
                if s.is_final
            ],
        }

    def to_text(self) -> str:
        """Plain text format for human reading."""
        lines = []
        for s in self.segments:
            if s.is_final:
                timestamp = f"[{int(s.start_time // 60):02d}:{int(s.start_time % 60):02d}]"
                lines.append(f"{timestamp} {s.speaker.upper()}: {s.text}")
        return "\n".join(lines)
```
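The merge rule in `_coalesce_segments` is easy to exercise in isolation. A standalone sketch of the same logic, using plain `(speaker, text, start, end)` tuples instead of the dataclass:

```python
# Standalone sketch of the coalescing rule: merge consecutive final
# segments from the same speaker when the gap between them is small.

def coalesce(segments, gap_threshold=2.0):
    if len(segments) < 2:
        return list(segments)
    merged = [segments[0]]
    for speaker, text, start, end in segments[1:]:
        last_speaker, last_text, last_start, last_end = merged[-1]
        if speaker == last_speaker and start - last_end < gap_threshold:
            merged[-1] = (speaker, last_text + " " + text, last_start, end)
        else:
            merged.append((speaker, text, start, end))
    return merged

segments = [
    ("candidate", "I used Postgres", 0.0, 1.5),
    ("candidate", "with read replicas.", 2.0, 3.5),  # 0.5 s gap → merged
    ("agent", "Why replicas?", 6.0, 7.0),            # speaker change → new segment
]
print(coalesce(segments))  # two segments: one merged candidate turn, one agent turn
```

The 2-second threshold is the same default as above; tune it against real transcripts, since STT providers differ in how aggressively they split utterances.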
Batch Transcription with Whisper
For post-session transcription where real-time latency doesn’t matter, Whisper via the OpenAI API gives excellent accuracy, particularly for technical vocabulary:
```python
# post_session_transcriber.py
from typing import Optional

import aiofiles
from openai import AsyncOpenAI

from transcript_collector import InterviewTranscript, TranscriptSegment


class PostSessionTranscriber:
    def __init__(self):
        self.client = AsyncOpenAI()

    async def transcribe_recording(
        self,
        audio_file_path: str,
        session_id: str,
        prompt: Optional[str] = None,
    ) -> InterviewTranscript:
        """
        Transcribe a recorded interview audio file using Whisper.
        Uses word-level timestamps for accurate speaker attribution.
        """
        # Build domain-specific prompt to help Whisper with technical terms
        whisper_prompt = prompt or (
            "Technical interview. May include terms like: "
            "REST API, GraphQL, Kubernetes, PostgreSQL, Redis, "
            "TypeScript, Python, microservices, CI/CD, Docker."
        )

        async with aiofiles.open(audio_file_path, "rb") as f:
            audio_data = await f.read()

        response = await self.client.audio.transcriptions.create(
            model="whisper-1",
            file=("audio.ogg", audio_data, "audio/ogg"),
            response_format="verbose_json",
            timestamp_granularities=["word", "segment"],
            prompt=whisper_prompt,
        )
        return self._parse_whisper_response(response, session_id)

    def _parse_whisper_response(self, response, session_id: str) -> InterviewTranscript:
        transcript = InterviewTranscript(session_id=session_id)
        for segment in response.segments:
            # Whisper doesn't do speaker diarization natively.
            # Use the agent's known segments from the voice pipeline to label speakers.
            speaker = self._identify_speaker(segment.start, segment.end)
            transcript.add_segment(
                TranscriptSegment(
                    speaker=speaker,
                    text=segment.text.strip(),
                    start_time=segment.start,
                    end_time=segment.end,
                    confidence=1.0 - segment.no_speech_prob,
                    is_final=True,
                )
            )
        return transcript

    def _identify_speaker(self, start: float, end: float) -> str:
        """
        Use agent speech timeline (recorded separately) to determine speaker.
        If the agent was speaking during this time, label as 'agent'; otherwise 'candidate'.
        """
        # This requires storing agent speech timestamps during the session
        # See AgentSpeechTracker below
        return "candidate"  # Simplified — real implementation uses timeline
```
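The speech tracker behind `_identify_speaker` is simple in principle: record `(start, end)` intervals whenever the agent's TTS is playing, then label any Whisper segment that mostly overlaps an agent interval as "agent". A minimal sketch of that idea (the class shape and 50% overlap threshold are illustrative choices, not a fixed API):

```python
from dataclasses import dataclass, field

# Sketch of an agent speech tracker: record intervals while the agent's
# TTS is playing, then label a transcript segment "agent" if more than
# half of it overlaps agent speech.

@dataclass
class AgentSpeechTracker:
    intervals: list[tuple[float, float]] = field(default_factory=list)

    def agent_started_speaking(self, t: float):
        self.intervals.append((t, t))  # open interval; end set on stop

    def agent_stopped_speaking(self, t: float):
        start, _ = self.intervals[-1]
        self.intervals[-1] = (start, t)

    def identify_speaker(self, start: float, end: float) -> str:
        overlap = sum(
            max(0.0, min(end, b) - max(start, a)) for a, b in self.intervals
        )
        duration = max(end - start, 1e-6)
        return "agent" if overlap / duration > 0.5 else "candidate"

tracker = AgentSpeechTracker()
tracker.agent_started_speaking(0.0)
tracker.agent_stopped_speaking(4.0)
print(tracker.identify_speaker(1.0, 3.0))  # agent
print(tracker.identify_speaker(5.0, 8.0))  # candidate
```

In practice you would feed the start/stop events from your voice pipeline's TTS playback callbacks and persist the intervals alongside the recording so batch transcription can use them later.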
GDPR Compliance
GDPR applies if you’re recording EU residents — which in practice means most B2B SaaS products need to take it seriously. Here’s what it actually requires for interview recording.
The Six Requirements
1. Legal basis for processing. Recording an interview requires explicit consent (Article 6(1)(a)) or legitimate interests (Article 6(1)(f)). Consent is cleaner for most cases — it’s clear, auditable, and revocable. Document which basis you’re using.
2. Purpose limitation. The recording can only be used for the stated purpose. If you collected consent for “reviewing interview performance,” you cannot later use those recordings to train your AI model without separate consent.
3. Data minimization. Collect only what you need. Audio-only transcripts are less invasive than video recordings. Frame-level analysis data from Part 8 should be discarded after the session (analyze-and-discard) unless there’s a specific need.
4. Right to erasure. A candidate can request deletion of all their data, including recordings and transcripts. You need a workflow that actually implements this.
5. Data portability. Candidates can request a copy of their data in machine-readable format. Your transcript JSON format serves this purpose.
6. DPA (Data Processing Agreement). If you use third-party processors that handle personal data — your cloud storage provider, Deepgram, OpenAI — you need DPAs in place with each of them. Most major providers have standard DPAs available in their legal documentation.
Consent Management Implementation
```python
# consent_manager.py
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConsentRecord:
    session_id: str
    candidate_email: str
    consent_given: bool
    consent_timestamp: datetime
    ip_address: str
    consent_text_hash: str  # Hash of the exact consent text shown
    recording_consent: bool
    video_analysis_consent: bool
    training_data_consent: bool
    withdrawal_timestamp: Optional[datetime] = None

    def to_audit_log(self) -> dict:
        return {
            "session_id": self.session_id,
            "candidate_email_hash": hashlib.sha256(
                self.candidate_email.encode()
            ).hexdigest()[:16],  # Partial hash for audit without storing PII
            "consent_given": self.consent_given,
            "consent_timestamp": self.consent_timestamp.isoformat(),
            "recording_consent": self.recording_consent,
            "video_analysis_consent": self.video_analysis_consent,
            "training_data_consent": self.training_data_consent,
            "consent_text_hash": self.consent_text_hash,
        }


class ConsentManager:
    # The exact consent text — if this changes, existing consents are invalid
    CONSENT_TEXT_VERSION = "v2026.02.1"

    RECORDING_CONSENT_TEXT = """
    By proceeding, you consent to:
    1. Audio recording of this interview session
    2. Generation of an automated transcript
    3. Storage of the recording for [RETENTION_DAYS] days

    You may request deletion of your data at any time by contacting privacy@yourcompany.com.
    You may proceed without consenting to recording — voice analysis will still occur
    but no recording will be retained.
    """

    def record_consent(
        self,
        session_id: str,
        candidate_email: str,
        recording_consent: bool,
        video_analysis_consent: bool,
        training_data_consent: bool,
        ip_address: str,
    ) -> ConsentRecord:
        record = ConsentRecord(
            session_id=session_id,
            candidate_email=candidate_email,
            consent_given=recording_consent,
            consent_timestamp=datetime.now(timezone.utc),
            ip_address=ip_address,
            consent_text_hash=hashlib.sha256(
                self.RECORDING_CONSENT_TEXT.encode()
            ).hexdigest(),
            recording_consent=recording_consent,
            video_analysis_consent=video_analysis_consent,
            training_data_consent=training_data_consent,
        )
        # Persist to your audit log (immutable append-only store)
        self._persist_consent_record(record)
        return record

    def _persist_consent_record(self, record: ConsentRecord):
        # Store in append-only audit log — never delete, never modify
        pass  # Implementation depends on your storage layer

    async def handle_erasure_request(self, candidate_email: str) -> dict:
        """
        GDPR Article 17: Right to Erasure
        Delete all recordings, transcripts, and personal data for this candidate.
        Returns a deletion report for audit purposes.
        """
        deletion_report = {
            "request_timestamp": datetime.now(timezone.utc).isoformat(),
            "candidate_email_hash": hashlib.sha256(candidate_email.encode()).hexdigest(),
            "items_deleted": [],
            "items_retained": [],
        }

        # Find all sessions for this candidate
        sessions = await self._find_sessions_by_email(candidate_email)

        for session_id in sessions:
            # Delete recording files from S3
            recording_deleted = await self._delete_s3_recordings(session_id)
            if recording_deleted:
                deletion_report["items_deleted"].append(f"recording:{session_id}")

            # Delete transcript
            transcript_deleted = await self._delete_transcript(session_id)
            if transcript_deleted:
                deletion_report["items_deleted"].append(f"transcript:{session_id}")

            # Delete evaluation data
            eval_deleted = await self._delete_evaluation(session_id)
            if eval_deleted:
                deletion_report["items_deleted"].append(f"evaluation:{session_id}")

            # Retain: consent audit log (legal obligation to prove consent was obtained/revoked)
            deletion_report["items_retained"].append(
                f"consent_audit_log:{session_id} (retained per legal obligation)"
            )

        return deletion_report

    # Storage-layer lookups and deletes — implementations depend on your
    # database and S3 layout.
    async def _find_sessions_by_email(self, email: str) -> list[str]: ...
    async def _delete_s3_recordings(self, session_id: str) -> bool: ...
    async def _delete_transcript(self, session_id: str) -> bool: ...
    async def _delete_evaluation(self, session_id: str) -> bool: ...
```
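One practical use of `consent_text_hash`: before starting a recording, verify that the stored hash still matches the consent text you currently show. If the text changed after the candidate consented, the old consent does not cover the new terms. A small stdlib sketch (the function name and the shortened consent text are illustrative):

```python
import hashlib

# Illustrative stand-in for the consent text the candidate was shown.
CURRENT_CONSENT_TEXT = (
    "By proceeding, you consent to audio recording of this interview "
    "session and generation of an automated transcript."
)

def consent_still_valid(stored_hash: str, current_text: str = CURRENT_CONSENT_TEXT) -> bool:
    # Consent only covers the exact text the candidate saw; any edit to
    # the consent language requires re-consent.
    return stored_hash == hashlib.sha256(current_text.encode()).hexdigest()

stored = hashlib.sha256(CURRENT_CONSENT_TEXT.encode()).hexdigest()
print(consent_still_valid(stored))                          # True
print(consent_still_valid(stored, current_text="changed"))  # False
```

This check belongs in the same code path that starts egress: if it fails, fall back to the no-recording interview flow rather than silently recording under stale consent.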
The Penalties Are Real
GDPR fines for serious violations can reach €20 million or 4% of global annual turnover, whichever is higher. For an early-stage company, a significant fine is often existential. The regulators that matter most (Germany’s BfDI, France’s CNIL, Ireland’s DPC) have all issued fines against companies that failed to obtain proper consent or implement data subject rights.
More practically: a single high-profile candidate who asks “where’s my interview recording?” and can’t get an answer is a PR incident waiting to happen on LinkedIn.
HIPAA Considerations
If your platform serves healthcare employers — hospitals, pharma companies, health tech startups — and the candidate interview touches Protected Health Information (PHI), you’re in HIPAA territory. This is less common for standard tech interviews but comes up when interviewing clinical staff, medical writers, or healthcare compliance roles where candidates discuss patient scenarios.
What Triggers HIPAA
PHI is individually identifiable health information. In an interview context, PHI exposure usually happens when a clinical candidate uses real patient examples (even anonymized ones are risky), discusses a specific medical case from their work, or shares details about healthcare system vulnerabilities they’ve managed.
Business Associate Agreement (BAA)
If you’re a Business Associate — meaning you process PHI on behalf of a Covered Entity (the healthcare employer) — you need a BAA with the employer, and the employer needs BAAs with your sub-processors (AWS, your transcription provider, your LLM provider).
OpenAI offers a HIPAA BAA for ChatGPT Enterprise and their API at higher tiers. Deepgram offers HIPAA-compliant plans. AWS has BAAs available for all HIPAA-eligible services. Check your actual contracts — “HIPAA compliant” as marketing language does not mean a BAA exists.
Encryption Requirements
HIPAA requires:
- Encryption in transit: TLS 1.2 minimum, TLS 1.3 recommended. LiveKit uses TLS for WebRTC signaling and DTLS for media — this is handled for you.
- Encryption at rest: AES-256 for stored recordings and transcripts. AWS S3 supports this via SSE-S3 or SSE-KMS.
```python
# s3_hipaa_uploader.py
import boto3


class HIPAACompliantS3Uploader:
    """
    S3 upload configuration for HIPAA-eligible storage.
    Requires SSE-KMS with a customer-managed key.
    """

    def __init__(self, kms_key_id: str):
        self.s3 = boto3.client("s3")
        self.kms_key_id = kms_key_id

    def upload_recording(
        self, file_path: str, bucket: str, key: str, session_id: str
    ):
        with open(file_path, "rb") as f:
            self.s3.put_object(
                Bucket=bucket,
                Key=key,
                Body=f,
                ServerSideEncryption="aws:kms",
                SSEKMSKeyId=self.kms_key_id,
                # Tag for lifecycle management and audit
                Tagging=(
                    f"session_id={session_id}"
                    f"&data_classification=phi"
                    f"&retention_days=90"
                ),
            )
```
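Setting SSE-KMS at upload time only protects the uploads that go through this code path. A bucket policy can additionally reject any object that arrives without KMS encryption, so a misconfigured client cannot silently store plaintext PHI. A sketch (the bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnencryptedUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::your-recordings-bucket/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "aws:kms"
        }
      }
    }
  ]
}
```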
Key Management
For HIPAA, storing encryption keys alongside data is not acceptable. Use AWS KMS or HashiCorp Vault to separate key management from data storage. Rotate keys regularly. Audit key usage — KMS CloudTrail logs show every key use event.
SOC 2 Audit Considerations
If you’re selling to enterprise customers, SOC 2 Type II will come up in security questionnaires. Here are the controls most relevant to interview recording:
Availability (CC6.1): LiveKit rooms have health checks. Your recording pipeline needs health monitoring — alert if egress stops mid-session.
Confidentiality (CC6.7): Restrict access to interview recordings to authorized personnel. Implement RBAC: hiring managers see their candidates’ recordings, not others’. Log every access.
Access Control (CC6.2): Multi-factor authentication for any admin interface that can access recordings or transcripts. Service accounts for the recording pipeline should use IAM roles, not long-lived credentials.
Audit Logging (CC7.2): Every action on a recording — creation, access, deletion — should be logged immutably. AWS CloudTrail handles this for S3 access if you enable it.
```python
# audit_logger.py
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class AuditAction(Enum):
    RECORDING_STARTED = "recording_started"
    RECORDING_STOPPED = "recording_stopped"
    RECORDING_ACCESSED = "recording_accessed"
    RECORDING_DOWNLOADED = "recording_downloaded"
    RECORDING_DELETED = "recording_deleted"
    TRANSCRIPT_GENERATED = "transcript_generated"
    TRANSCRIPT_ACCESSED = "transcript_accessed"
    CONSENT_RECORDED = "consent_recorded"
    ERASURE_REQUESTED = "erasure_requested"
    ERASURE_COMPLETED = "erasure_completed"


class AuditLogger:
    def log(
        self,
        action: AuditAction,
        session_id: str,
        actor_id: str,
        details: Optional[dict] = None,
    ):
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "action": action.value,
            "session_id": session_id,
            "actor_id": actor_id,
            "details": details or {},
        }
        # Write to append-only audit log.
        # Never delete audit log entries — regulators may ask for them.
        self._write_to_audit_store(entry)

    def _write_to_audit_store(self, entry: dict):
        # Options: CloudWatch Logs (append-only), immutable S3 bucket with Object Lock,
        # dedicated audit log service like Sumo Logic
        pass
```
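If you don't want to depend on a managed append-only store, you can make entries tamper-evident by hash chaining: each entry commits to the hash of the previous one, so any retroactive edit breaks every subsequent entry. A stdlib sketch of the idea (not a substitute for access controls, just cheap integrity evidence):

```python
import hashlib
import json

def chain_entry(entry: dict, prev_hash: str) -> dict:
    # Each entry includes the previous entry's hash; editing any past
    # entry changes its hash and invalidates the rest of the chain.
    body = {**entry, "prev_hash": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "entry_hash": digest}

def verify_chain(entries: list[dict]) -> bool:
    prev = "0" * 64
    for e in entries:
        body = {k: v for k, v in e.items() if k != "entry_hash"}
        if body.get("prev_hash") != prev:
            return False
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if expected != e["entry_hash"]:
            return False
        prev = e["entry_hash"]
    return True

log, prev = [], "0" * 64
for action in ["recording_started", "recording_accessed", "recording_deleted"]:
    e = chain_entry({"action": action, "session_id": "abc"}, prev)
    log.append(e)
    prev = e["entry_hash"]

print(verify_chain(log))      # True
log[1]["action"] = "tampered"
print(verify_chain(log))      # False
```

S3 Object Lock or CloudWatch Logs still make better primary stores; the chain is a complement that lets an auditor verify nothing was rewritten after the fact.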
Data Retention Policies
Storing recordings indefinitely is both expensive and a compliance liability. Set retention policies and automate deletion.
```python
# retention_policy.py
from datetime import timedelta
from enum import Enum


class RetentionTier(Enum):
    # Standard hiring process completion
    STANDARD = "standard"          # 90 days after interview date
    # Active candidates in pipeline
    ACTIVE_CANDIDATE = "active"    # Until rejection or hire + 30 days
    # EU/GDPR-constrained
    GDPR_STRICT = "gdpr_strict"    # 30 days, explicit consent required for longer
    # Healthcare / HIPAA
    HIPAA = "hipaa"                # 6 years (HIPAA requires 6-year minimum)
    # Legal hold (active litigation or investigation)
    LEGAL_HOLD = "legal_hold"      # Indefinite until legal hold lifted


RETENTION_PERIODS = {
    RetentionTier.STANDARD: timedelta(days=90),
    RetentionTier.ACTIVE_CANDIDATE: timedelta(days=180),
    RetentionTier.GDPR_STRICT: timedelta(days=30),
    RetentionTier.HIPAA: timedelta(days=365 * 6),
    RetentionTier.LEGAL_HOLD: None,  # No automatic deletion
}

# AWS S3 Lifecycle Rule (via Terraform)
LIFECYCLE_RULE_TEMPLATE = """
resource "aws_s3_bucket_lifecycle_configuration" "interview_recordings" {
  bucket = aws_s3_bucket.recordings.id

  rule {
    id     = "standard_retention"
    status = "Enabled"

    filter {
      tag {
        key   = "retention_tier"
        value = "standard"
      }
    }

    expiration {
      days = 90
    }

    noncurrent_version_expiration {
      noncurrent_days = 7
    }
  }

  rule {
    id     = "gdpr_strict_retention"
    status = "Enabled"

    filter {
      tag {
        key   = "retention_tier"
        value = "gdpr_strict"
      }
    }

    expiration {
      days = 30
    }
  }
}
"""
```
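The tiers above boil down to a simple "deletion due" computation, where legal hold maps to "never auto-delete". A self-contained sketch using the same periods (string keys instead of the enum, for brevity):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Same retention periods as above; None means no automatic deletion.
RETENTION_PERIODS = {
    "standard": timedelta(days=90),
    "active": timedelta(days=180),
    "gdpr_strict": timedelta(days=30),
    "hipaa": timedelta(days=365 * 6),
    "legal_hold": None,
}

def deletion_due(recorded_at: datetime, tier: str) -> Optional[datetime]:
    """Return when a recording becomes eligible for deletion, or None."""
    period = RETENTION_PERIODS[tier]
    return None if period is None else recorded_at + period

recorded = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(deletion_due(recorded, "gdpr_strict"))  # 2026-01-31 00:00:00+00:00
print(deletion_due(recorded, "legal_hold"))   # None
```

Even with S3 lifecycle rules doing the actual deletion, computing this date in application code lets you show candidates a concrete "your data is deleted on" date and detect sessions the lifecycle rules missed.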
Transcript Anonymization for Training Data
Your interview transcripts are potentially valuable training data for fine-tuning your AI models. But using them requires careful treatment of PII.
The challenge: interview transcripts are full of PII. Candidate names, current employer names, project names, salary history, location. You cannot use raw transcripts for training without consent specifically for that purpose.
The solution: anonymization before any training use.
```python
# transcript_anonymizer.py
import re

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig


class TranscriptAnonymizer:
    """
    Removes PII from interview transcripts using Microsoft Presidio.
    Suitable for creating training data from consented recordings.
    """

    def __init__(self):
        self.analyzer = AnalyzerEngine()
        self.anonymizer = AnonymizerEngine()

        # Patterns specific to interview context
        self.custom_patterns = [
            # Salary figures
            r"\$[\d,]+(?:k|K)?(?:\s*(?:per year|annually|\/year|\/yr))?",
            # Company-specific project names (usually proper nouns in context)
            # GitHub usernames
            r"github\.com/[\w-]+",
            # LinkedIn profiles
            r"linkedin\.com/in/[\w-]+",
        ]

    def anonymize(self, transcript_text: str) -> str:
        """
        Replace PII with type-consistent placeholders.
        Example: "I worked at Google" → "I worked at [COMPANY]"
        """
        # Presidio analysis
        results = self.analyzer.analyze(
            text=transcript_text,
            language="en",
            entities=[
                "PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER",
                "LOCATION", "ORGANIZATION", "URL",
                "CREDIT_CARD", "IBAN_CODE", "IP_ADDRESS",
            ],
        )

        # Anonymize with meaningful placeholders
        anonymized = self.anonymizer.anonymize(
            text=transcript_text,
            analyzer_results=results,
            operators={
                "PERSON": OperatorConfig("replace", {"new_value": "[CANDIDATE_NAME]"}),
                "ORGANIZATION": OperatorConfig("replace", {"new_value": "[COMPANY]"}),
                "LOCATION": OperatorConfig("replace", {"new_value": "[LOCATION]"}),
                "EMAIL_ADDRESS": OperatorConfig("replace", {"new_value": "[EMAIL]"}),
                "PHONE_NUMBER": OperatorConfig("replace", {"new_value": "[PHONE]"}),
                "URL": OperatorConfig("replace", {"new_value": "[URL]"}),
            },
        )

        # Apply custom patterns
        result = anonymized.text
        for pattern in self.custom_patterns:
            result = re.sub(pattern, "[REDACTED]", result, flags=re.IGNORECASE)
        return result

    def validate_anonymization(self, anonymized_text: str) -> list[str]:
        """
        Verify no PII remains. Returns list of warnings if any PII found.
        """
        results = self.analyzer.analyze(
            text=anonymized_text,
            language="en",
            score_threshold=0.6,
        )
        warnings = []
        for result in results:
            snippet = anonymized_text[result.start:result.end]
            warnings.append(
                f"Potential PII ({result.entity_type}, confidence {result.score:.2f}): '{snippet}'"
            )
        return warnings
```
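The custom regexes are worth testing in isolation, and they also make a fast pre-pass that works without Presidio installed. A stdlib-only sketch over the same patterns:

```python
import re

# Same interview-specific patterns as in TranscriptAnonymizer above.
CUSTOM_PATTERNS = [
    r"\$[\d,]+(?:k|K)?(?:\s*(?:per year|annually|\/year|\/yr))?",  # salary figures
    r"github\.com/[\w-]+",                                         # GitHub profiles
    r"linkedin\.com/in/[\w-]+",                                    # LinkedIn profiles
]

def redact_custom(text: str) -> str:
    """Apply the interview-specific redaction patterns to a transcript."""
    for pattern in CUSTOM_PATTERNS:
        text = re.sub(pattern, "[REDACTED]", text, flags=re.IGNORECASE)
    return text

sample = "I made $150k per year; my code is at github.com/some-user."
print(redact_custom(sample))
```

Regex alone is not sufficient for anonymization: it catches formatted identifiers (salaries, profile URLs) but misses names and employers, which is exactly what the NER-based Presidio pass is for.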
Interview Data as Training Data: The Legal and Ethical Line
This deserves a frank section on its own. The commercial temptation is strong: you have thousands of interview transcripts showing how candidates answer technical questions, and that data could make your AI interviewer substantially better.
Here’s the line:
What you can do without additional consent:
- Use anonymized transcripts to improve question phrasing and interview structure
- Use aggregated patterns (not individual transcripts) to calibrate evaluation rubrics
- Use your own internal practice interviews where you explicitly consented participants
What requires separate, explicit consent:
- Fine-tuning your LLM on actual candidate transcripts
- Using recordings for any purpose beyond what was stated in the original consent
- Sharing transcripts with third-party AI providers for training (even anonymized)
What you should never do regardless of consent:
- Use interview data to train systems that screen out candidates based on protected characteristics
- Create synthetic candidates based on real interview data without clear disclosure
- Use early interview transcripts from before your consent workflow was solid
The GDPR concept of purpose limitation is strict here: “improving our services” buried in a privacy policy is not specific enough to cover using individual interview data for model training. You need a specific, granular consent option for training use — and you should make it genuinely optional, with no impact on the candidate’s interview if they decline.
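Operationally, this means the training pipeline filters on the specific training-consent flag, not on the blanket recording consent, and honors withdrawal immediately. A minimal sketch (the field names mirror the `ConsentRecord` shown earlier; `withdrawn` stands in for a non-null withdrawal timestamp):

```python
from dataclasses import dataclass

# Field names mirror the ConsentRecord from the consent manager above.
@dataclass
class Consent:
    session_id: str
    recording_consent: bool
    training_data_consent: bool
    withdrawn: bool = False

def eligible_for_training(consents: list[Consent]) -> list[str]:
    # Recording consent alone is NOT enough: training use requires its own
    # granular opt-in, and withdrawal removes the session immediately.
    return [
        c.session_id
        for c in consents
        if c.training_data_consent and not c.withdrawn
    ]

consents = [
    Consent("s1", recording_consent=True, training_data_consent=True),
    Consent("s2", recording_consent=True, training_data_consent=False),
    Consent("s3", recording_consent=True, training_data_consent=True, withdrawn=True),
]
print(eligible_for_training(consents))  # ['s1']
```

Run this filter at training-set build time, not once at collection time, so a withdrawal between collection and the next training run is respected.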
The Pre-Launch Compliance Checklist
Here’s what you need to have in place before recording your first real interview:
Legal foundation
- Privacy policy updated to describe interview recording and data handling
- DPAs signed with: cloud storage provider, STT provider, LLM provider, LiveKit Cloud (if used)
- Legal basis documented for each type of processing (consent vs legitimate interest)
- GDPR Article 30 records of processing activities completed
Consent and candidate rights
- Pre-session consent modal with granular options (recording, video analysis, training data)
- Opt-out path that allows interview without recording
- Data subject request workflow (erasure, portability, access)
- Designated privacy contact email
- Retention period clearly stated to candidates
Technical controls
- Encryption at rest: AES-256 for all stored recordings and transcripts (S3 SSE-KMS)
- Encryption in transit: TLS 1.3 for all data paths
- Access control: RBAC so only authorized users see specific candidate data
- Audit logging: immutable log of all access, creation, and deletion events
- Automated deletion: S3 lifecycle rules matching stated retention periods
- Backup encryption: if you backup recordings, backups are also encrypted
Operations
- Incident response plan for data breach (GDPR requires 72-hour notification)
- Employee training on data handling procedures
- Data Protection Officer appointed if processing at scale in EU
- Annual review date scheduled for privacy practices
HIPAA (if applicable)
- BAAs in place with employer client and all sub-processors
- PHI classification applied to recordings from healthcare employer interviews
- Extended retention period (6 years) configured for HIPAA-classified sessions
- HIPAA breach notification procedure documented
What We Built
This was the least glamorous post in the series and arguably the most important. Specifically we covered:
- LiveKit Egress for audio-only and composite video recording, with S3 output
- Streaming transcription with Deepgram and batch transcription with Whisper, including speaker labeling
- Structured transcript format with timestamps and confidence scores
- GDPR consent management: explicit per-category consent, audit-proof logging, right to erasure implementation
- HIPAA encryption requirements: SSE-KMS for storage, TLS 1.3 in transit, key management separation
- SOC 2 audit controls: access logging, RBAC, immutable audit trails
- Automated retention policies via S3 lifecycle rules
- Transcript anonymization with Microsoft Presidio for training data use
- The legal boundary on using interview data for model training
The compliance work is not optional overhead. It’s what separates a product that can close enterprise contracts from one that stalls in procurement. Get it right before you scale, not after.
In Part 10, we shift focus to the infrastructure that lets you run thousands of concurrent voice sessions without the per-session architecture becoming a bottleneck.
This is Part 9 of a 12-part series: The Voice AI Interview Playbook.
Series outline:
- Why Real-Time Voice Changes Everything — The landscape, the vision, and the reference architecture (Part 1)
- Cascaded vs. Speech-to-Speech — Choosing your pipeline architecture (Part 2)
- LiveKit vs. Pipecat vs. Direct — Picking your framework (Part 3)
- STT, LLM, and TTS That Actually Work — Building the voice pipeline (Part 4)
- Multi-Role Agents — Interviewer, coach, and evaluator personas (Part 5)
- Knowledge Base and RAG — Making your voice agent an expert (Part 6)
- Web and Mobile Clients — Cross-platform voice experiences (Part 7)
- Video Interview Integration — Multimodal analysis with Gemini Live (Part 8)
- Recording, Transcription, and Compliance — GDPR, HIPAA, and getting it right (this post)
- Scaling to Thousands — Architecture for concurrent voice sessions (Part 10)
- Cost Optimization — From $0.14/min to $0.03/min (Part 11)
- Multi-Provider Support — OpenAI Realtime, Bedrock Nova, Grok, and the adapter pattern (Part 12)