In Part 8, we added video analysis to the stack. We built Gemini Live multimodal sessions, frame sampling pipelines, and context injection. The technical side is largely done. Now comes the part that most builders skip until a lawyer sends them an email — compliance.
Recording interviews puts you in a different category than just running a voice pipeline. You’re creating persistent records of people speaking candidly, often sharing sensitive information about their work history, current employer, and career vulnerabilities. Getting this right is not optional. This post covers what you actually need to do before you record a single session.
LiveKit Egress for Session Recording
LiveKit Egress is the mechanism for capturing room audio and video to persistent storage. It runs as a separate service alongside your LiveKit server.
There are three egress types relevant to interviews:
Audio-only egress — captures the full mix of all participants as an OGG or MP3 file. Best for voice-only interviews where you want a lightweight audio record.
Video composite egress — captures the full room as an MP4, compositing all video streams into a single output. Used when you have webcam or screen share tracks and want a reviewable recording.
Track egress — captures individual participant tracks as separate files. Useful when you need to analyze the interviewer and candidate audio separately for post-processing.
```python
# recording_service.py
from livekit import api
import os


class InterviewRecordingService:
    def __init__(self):
        self.lk_api = api.LiveKitAPI(
            url=os.environ["LIVEKIT_URL"],
            api_key=os.environ["LIVEKIT_API_KEY"],
            api_secret=os.environ["LIVEKIT_API_SECRET"],
        )
        self.active_recordings: dict[str, str] = {}  # room_name → egress_id

    async def start_recording(
        self,
        room_name: str,
        session_id: str,
        include_video: bool = False,
    ) -> str:
        """Start recording a LiveKit room. Returns egress ID."""
        # Path within the bucket; the bucket itself comes from the S3Upload config
        output_path = f"interviews/{session_id}/"

        if include_video:
            # Composite recording — audio + video
            request = api.RoomCompositeEgressRequest(
                room_name=room_name,
                layout="speaker",
                audio_only=False,
                file_outputs=[
                    api.EncodedFileOutput(
                        file_type=api.EncodedFileType.MP4,
                        filepath=f"{output_path}recording.mp4",
                        s3=api.S3Upload(
                            access_key=os.environ["AWS_ACCESS_KEY_ID"],
                            secret=os.environ["AWS_SECRET_ACCESS_KEY"],
                            region=os.environ["AWS_REGION"],
                            bucket=os.environ["S3_BUCKET"],
                        ),
                        disable_manifest=True,
                    )
                ],
                segment_outputs=[],
            )
        else:
            # Audio-only recording
            request = api.RoomCompositeEgressRequest(
                room_name=room_name,
                audio_only=True,
                file_outputs=[
                    api.EncodedFileOutput(
                        file_type=api.EncodedFileType.OGG,
                        filepath=f"{output_path}audio.ogg",
                        s3=api.S3Upload(
                            access_key=os.environ["AWS_ACCESS_KEY_ID"],
                            secret=os.environ["AWS_SECRET_ACCESS_KEY"],
                            region=os.environ["AWS_REGION"],
                            bucket=os.environ["S3_BUCKET"],
                        ),
                    )
                ],
            )

        response = await self.lk_api.egress.start_room_composite_egress(request)
        egress_id = response.egress_id
        self.active_recordings[room_name] = egress_id
        return egress_id

    async def stop_recording(self, room_name: str) -> dict:
        """Stop recording and return egress info."""
        egress_id = self.active_recordings.pop(room_name, None)
        if not egress_id:
            raise ValueError(f"No active recording for room {room_name}")

        response = await self.lk_api.egress.stop_egress(
            api.StopEgressRequest(egress_id=egress_id)
        )
        return {
            "egress_id": egress_id,
            "status": response.status,
            "file_results": [f.location for f in response.file_results],
        }

    async def get_recording_status(self, egress_id: str) -> str:
        response = await self.lk_api.egress.list_egress(
            api.ListEgressRequest(egress_id=egress_id)
        )
        if response.items:
            return response.items[0].status.name
        return "NOT_FOUND"
```
Real-Time Transcription: Streaming vs Batch
You have two approaches to transcription, and the right choice depends on your use case.
Streaming Transcription with Deepgram
If you need the transcript during the interview (for live captioning, real-time coaching, or immediate post-interview analysis), use Deepgram’s streaming API. You already have Deepgram connected for STT in your voice pipeline — transcription is a side effect of that same connection:
```python
# transcript_collector.py
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TranscriptSegment:
    speaker: str       # "candidate" or "agent"
    text: str
    start_time: float  # seconds from interview start
    end_time: float
    confidence: float
    is_final: bool


@dataclass
class InterviewTranscript:
    session_id: str
    segments: list[TranscriptSegment] = field(default_factory=list)
    started_at: datetime = field(default_factory=datetime.utcnow)

    def add_segment(self, segment: TranscriptSegment):
        self.segments.append(segment)
        # Merge consecutive segments from same speaker
        self._coalesce_segments()

    def _coalesce_segments(self, gap_threshold: float = 2.0):
        """Merge segments from same speaker with small gaps."""
        if len(self.segments) < 2:
            return
        merged = [self.segments[0]]
        for seg in self.segments[1:]:
            last = merged[-1]
            if (
                seg.speaker == last.speaker
                and seg.start_time - last.end_time < gap_threshold
                and last.is_final
                and seg.is_final
            ):
                # Merge into last segment
                merged[-1] = TranscriptSegment(
                    speaker=last.speaker,
                    text=last.text + " " + seg.text,
                    start_time=last.start_time,
                    end_time=seg.end_time,
                    confidence=min(last.confidence, seg.confidence),
                    is_final=True,
                )
            else:
                merged.append(seg)
        self.segments = merged

    def to_json(self) -> dict:
        return {
            "session_id": self.session_id,
            "started_at": self.started_at.isoformat(),
            "segments": [
                {
                    "speaker": s.speaker,
                    "text": s.text,
                    "start_time": round(s.start_time, 2),
                    "end_time": round(s.end_time, 2),
                    "confidence": round(s.confidence, 3),
                }
                for s in self.segments
                if s.is_final
            ],
        }

    def to_text(self) -> str:
        """Plain text format for human reading."""
        lines = []
        for s in self.segments:
            if s.is_final:
                timestamp = f"[{int(s.start_time // 60):02d}:{int(s.start_time % 60):02d}]"
                lines.append(f"{timestamp} {s.speaker.upper()}: {s.text}")
        return "\n".join(lines)
```
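The merge rule in `_coalesce_segments` is easy to exercise in isolation. A standalone sketch of the same logic, using plain `(speaker, text, start, end)` tuples instead of the dataclass:

```python
# Standalone sketch of the coalescing rule: merge consecutive final
# segments from the same speaker when the gap between them is small.

def coalesce(segments, gap_threshold=2.0):
    if len(segments) < 2:
        return list(segments)
    merged = [segments[0]]
    for speaker, text, start, end in segments[1:]:
        last_speaker, last_text, last_start, last_end = merged[-1]
        if speaker == last_speaker and start - last_end < gap_threshold:
            merged[-1] = (speaker, last_text + " " + text, last_start, end)
        else:
            merged.append((speaker, text, start, end))
    return merged

segments = [
    ("candidate", "I used Postgres", 0.0, 1.5),
    ("candidate", "with read replicas.", 2.0, 3.5),  # 0.5 s gap → merged
    ("agent", "Why replicas?", 6.0, 7.0),            # speaker change → new segment
]
print(coalesce(segments))  # two segments: one merged candidate turn, one agent turn
```

The 2-second threshold is the same default as above; tune it against real transcripts, since STT providers differ in how aggressively they split utterances.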
Batch Transcription with Whisper
For post-session transcription where real-time latency doesn’t matter, Whisper via the OpenAI API gives excellent accuracy, particularly for technical vocabulary:
```python
# post_session_transcriber.py
from typing import Optional

import aiofiles
from openai import AsyncOpenAI

from transcript_collector import InterviewTranscript, TranscriptSegment


class PostSessionTranscriber:
    def __init__(self):
        self.client = AsyncOpenAI()

    async def transcribe_recording(
        self,
        audio_file_path: str,
        session_id: str,
        prompt: Optional[str] = None,
    ) -> InterviewTranscript:
        """
        Transcribe a recorded interview audio file using Whisper.
        Uses word-level timestamps for accurate speaker attribution.
        """
        # Build domain-specific prompt to help Whisper with technical terms
        whisper_prompt = prompt or (
            "Technical interview. May include terms like: "
            "REST API, GraphQL, Kubernetes, PostgreSQL, Redis, "
            "TypeScript, Python, microservices, CI/CD, Docker."
        )

        async with aiofiles.open(audio_file_path, "rb") as f:
            audio_data = await f.read()

        response = await self.client.audio.transcriptions.create(
            model="whisper-1",
            file=("audio.ogg", audio_data, "audio/ogg"),
            response_format="verbose_json",
            timestamp_granularities=["word", "segment"],
            prompt=whisper_prompt,
        )
        return self._parse_whisper_response(response, session_id)

    def _parse_whisper_response(self, response, session_id: str) -> InterviewTranscript:
        transcript = InterviewTranscript(session_id=session_id)
        for segment in response.segments:
            # Whisper doesn't do speaker diarization natively.
            # Use the agent's known segments from the voice pipeline to label speakers.
            speaker = self._identify_speaker(segment.start, segment.end)
            transcript.add_segment(
                TranscriptSegment(
                    speaker=speaker,
                    text=segment.text.strip(),
                    start_time=segment.start,
                    end_time=segment.end,
                    confidence=1.0 - segment.no_speech_prob,
                    is_final=True,
                )
            )
        return transcript

    def _identify_speaker(self, start: float, end: float) -> str:
        """
        Use agent speech timeline (recorded separately) to determine speaker.
        If the agent was speaking during this time, label as 'agent'; otherwise 'candidate'.
        """
        # This requires storing agent speech timestamps during the session
        # See AgentSpeechTracker below
        return "candidate"  # Simplified — real implementation uses timeline
```
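The speech tracker behind `_identify_speaker` is simple in principle: record `(start, end)` intervals whenever the agent's TTS is playing, then label any Whisper segment that mostly overlaps an agent interval as "agent". A minimal sketch of that idea (the class shape and 50% overlap threshold are illustrative choices, not a fixed API):

```python
from dataclasses import dataclass, field

# Sketch of an agent speech tracker: record intervals while the agent's
# TTS is playing, then label a transcript segment "agent" if more than
# half of it overlaps agent speech.

@dataclass
class AgentSpeechTracker:
    intervals: list[tuple[float, float]] = field(default_factory=list)

    def agent_started_speaking(self, t: float):
        self.intervals.append((t, t))  # open interval; end set on stop

    def agent_stopped_speaking(self, t: float):
        start, _ = self.intervals[-1]
        self.intervals[-1] = (start, t)

    def identify_speaker(self, start: float, end: float) -> str:
        overlap = sum(
            max(0.0, min(end, b) - max(start, a)) for a, b in self.intervals
        )
        duration = max(end - start, 1e-6)
        return "agent" if overlap / duration > 0.5 else "candidate"

tracker = AgentSpeechTracker()
tracker.agent_started_speaking(0.0)
tracker.agent_stopped_speaking(4.0)
print(tracker.identify_speaker(1.0, 3.0))  # agent
print(tracker.identify_speaker(5.0, 8.0))  # candidate
```

In practice you would feed the start/stop events from your voice pipeline's TTS playback callbacks and persist the intervals alongside the recording so batch transcription can use them later.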
GDPR Compliance
GDPR applies if you’re recording EU residents — which in practice means most B2B SaaS products need to take it seriously. Here’s what it actually requires for interview recording.
The Six Requirements
1. Legal basis for processing. Recording an interview requires explicit consent (Article 6(1)(a)) or legitimate interests (Article 6(1)(f)). Consent is cleaner for most cases — it’s clear, auditable, and revocable. Document which basis you’re using.
2. Purpose limitation. The recording can only be used for the stated purpose. If you collected consent for “reviewing interview performance,” you cannot later use those recordings to train your AI model without separate consent.
3. Data minimization. Collect only what you need. Audio-only transcripts are less invasive than video recordings. Frame-level analysis data from Part 8 should be discarded after the session (analyze-and-discard) unless there’s a specific need.
4. Right to erasure. A candidate can request deletion of all their data, including recordings and transcripts. You need a workflow that actually implements this.
5. Data portability. Candidates can request a copy of their data in machine-readable format. Your transcript JSON format serves this purpose.
6. DPA (Data Processing Agreement). If you use third-party processors that handle personal data — your cloud storage provider, Deepgram, OpenAI — you need DPAs in place with each of them. Most major providers have standard DPAs available in their legal documentation.
Consent Management Implementation
```python
# consent_manager.py
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConsentRecord:
    session_id: str
    candidate_email: str
    consent_given: bool
    consent_timestamp: datetime
    ip_address: str
    consent_text_hash: str  # Hash of the exact consent text shown
    recording_consent: bool
    video_analysis_consent: bool
    training_data_consent: bool
    withdrawal_timestamp: Optional[datetime] = None

    def to_audit_log(self) -> dict:
        return {
            "session_id": self.session_id,
            "candidate_email_hash": hashlib.sha256(
                self.candidate_email.encode()
            ).hexdigest()[:16],  # Partial hash for audit without storing PII
            "consent_given": self.consent_given,
            "consent_timestamp": self.consent_timestamp.isoformat(),
            "recording_consent": self.recording_consent,
            "video_analysis_consent": self.video_analysis_consent,
            "training_data_consent": self.training_data_consent,
            "consent_text_hash": self.consent_text_hash,
        }


class ConsentManager:
    # The exact consent text — if this changes, existing consents are invalid
    CONSENT_TEXT_VERSION = "v2026.02.1"

    RECORDING_CONSENT_TEXT = """
    By proceeding, you consent to:
    1. Audio recording of this interview session
    2. Generation of an automated transcript
    3. Storage of the recording for [RETENTION_DAYS] days

    You may request deletion of your data at any time by contacting privacy@yourcompany.com.
    You may proceed without consenting to recording — voice analysis will still occur
    but no recording will be retained.
    """

    def record_consent(
        self,
        session_id: str,
        candidate_email: str,
        recording_consent: bool,
        video_analysis_consent: bool,
        training_data_consent: bool,
        ip_address: str,
    ) -> ConsentRecord:
        record = ConsentRecord(
            session_id=session_id,
            candidate_email=candidate_email,
            consent_given=recording_consent,
            consent_timestamp=datetime.now(timezone.utc),
            ip_address=ip_address,
            consent_text_hash=hashlib.sha256(
                self.RECORDING_CONSENT_TEXT.encode()
            ).hexdigest(),
            recording_consent=recording_consent,
            video_analysis_consent=video_analysis_consent,
            training_data_consent=training_data_consent,
        )
        # Persist to your audit log (immutable append-only store)
        self._persist_consent_record(record)
        return record

    def _persist_consent_record(self, record: ConsentRecord):
        # Store in append-only audit log — never delete, never modify
        pass  # Implementation depends on your storage layer

    async def handle_erasure_request(self, candidate_email: str) -> dict:
        """
        GDPR Article 17: Right to Erasure
        Delete all recordings, transcripts, and personal data for this candidate.
        Returns a deletion report for audit purposes.
        """
        deletion_report = {
            "request_timestamp": datetime.now(timezone.utc).isoformat(),
            "candidate_email_hash": hashlib.sha256(candidate_email.encode()).hexdigest(),
            "items_deleted": [],
            "items_retained": [],
        }

        # Find all sessions for this candidate
        sessions = await self._find_sessions_by_email(candidate_email)

        for session_id in sessions:
            # Delete recording files from S3
            recording_deleted = await self._delete_s3_recordings(session_id)
            if recording_deleted:
                deletion_report["items_deleted"].append(f"recording:{session_id}")

            # Delete transcript
            transcript_deleted = await self._delete_transcript(session_id)
            if transcript_deleted:
                deletion_report["items_deleted"].append(f"transcript:{session_id}")

            # Delete evaluation data
            eval_deleted = await self._delete_evaluation(session_id)
            if eval_deleted:
                deletion_report["items_deleted"].append(f"evaluation:{session_id}")

            # Retain: consent audit log (legal obligation to prove consent was obtained/revoked)
            deletion_report["items_retained"].append(
                f"consent_audit_log:{session_id} (retained per legal obligation)"
            )

        return deletion_report

    # Storage-layer lookups and deletes — implementations depend on your
    # database and S3 layout.
    async def _find_sessions_by_email(self, email: str) -> list[str]: ...
    async def _delete_s3_recordings(self, session_id: str) -> bool: ...
    async def _delete_transcript(self, session_id: str) -> bool: ...
    async def _delete_evaluation(self, session_id: str) -> bool: ...
```
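One practical use of `consent_text_hash`: before starting a recording, verify that the stored hash still matches the consent text you currently show. If the text changed after the candidate consented, the old consent does not cover the new terms. A small stdlib sketch (the function name and the shortened consent text are illustrative):

```python
import hashlib

# Illustrative stand-in for the consent text the candidate was shown.
CURRENT_CONSENT_TEXT = (
    "By proceeding, you consent to audio recording of this interview "
    "session and generation of an automated transcript."
)

def consent_still_valid(stored_hash: str, current_text: str = CURRENT_CONSENT_TEXT) -> bool:
    # Consent only covers the exact text the candidate saw; any edit to
    # the consent language requires re-consent.
    return stored_hash == hashlib.sha256(current_text.encode()).hexdigest()

stored = hashlib.sha256(CURRENT_CONSENT_TEXT.encode()).hexdigest()
print(consent_still_valid(stored))                          # True
print(consent_still_valid(stored, current_text="changed"))  # False
```

This check belongs in the same code path that starts egress: if it fails, fall back to the no-recording interview flow rather than silently recording under stale consent.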
The Penalties Are Real
GDPR fines for serious violations can reach €20 million or 4% of global annual turnover, whichever is higher. For an early-stage company, a significant fine is often existential. The regulators that matter most (Germany’s BfDI, France’s CNIL, Ireland’s DPC) have all issued fines against companies that failed to obtain proper consent or implement data subject rights.
More practically: a single high-profile candidate who asks “where’s my interview recording?” and can’t get an answer is a PR incident waiting to happen on LinkedIn.
HIPAA Considerations
If your platform serves healthcare employers — hospitals, pharma companies, health tech startups — and the candidate interview touches Protected Health Information (PHI), you’re in HIPAA territory. This is less common for standard tech interviews but comes up when interviewing clinical staff, medical writers, or healthcare compliance roles where candidates discuss patient scenarios.
What Triggers HIPAA
PHI is individually identifiable health information. In an interview context, PHI exposure usually happens when a clinical candidate uses real patient examples (even anonymized ones are risky), discusses a specific medical case from their work, or shares details about healthcare system vulnerabilities they’ve managed.
Business Associate Agreement (BAA)
If you’re a Business Associate — meaning you process PHI on behalf of a Covered Entity (the healthcare employer) — you need a BAA with the employer, and the employer needs BAAs with your sub-processors (AWS, your transcription provider, your LLM provider).
OpenAI offers a HIPAA BAA for ChatGPT Enterprise and their API at higher tiers. Deepgram offers HIPAA-compliant plans. AWS has BAAs available for all HIPAA-eligible services. Check your actual contracts — “HIPAA compliant” as marketing language does not mean a BAA exists.
Encryption Requirements
HIPAA requires:
- Encryption in transit: TLS 1.2 minimum, TLS 1.3 recommended. LiveKit uses TLS for WebRTC signaling and DTLS for media — this is handled for you.
- Encryption at rest: AES-256 for stored recordings and transcripts. AWS S3 supports this via SSE-S3 or SSE-KMS.
```python
# s3_hipaa_uploader.py
import boto3


class HIPAACompliantS3Uploader:
    """
    S3 upload configuration for HIPAA-eligible storage.
    Requires SSE-KMS with a customer-managed key.
    """

    def __init__(self, kms_key_id: str):
        self.s3 = boto3.client("s3")
        self.kms_key_id = kms_key_id

    def upload_recording(
        self, file_path: str, bucket: str, key: str, session_id: str
    ):
        with open(file_path, "rb") as f:
            self.s3.put_object(
                Bucket=bucket,
                Key=key,
                Body=f,
                ServerSideEncryption="aws:kms",
                SSEKMSKeyId=self.kms_key_id,
                # Tag for lifecycle management and audit
                Tagging=(
                    f"session_id={session_id}"
                    f"&data_classification=phi"
                    f"&retention_days=90"
                ),
            )
```
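Setting SSE-KMS at upload time only protects the uploads that go through this code path. A bucket policy can additionally reject any object that arrives without KMS encryption, so a misconfigured client cannot silently store plaintext PHI. A sketch (the bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnencryptedUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::your-recordings-bucket/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "aws:kms"
        }
      }
    }
  ]
}
```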
Key Management
For HIPAA, storing encryption keys alongside data is not acceptable. Use AWS KMS or HashiCorp Vault to separate key management from data storage. Rotate keys regularly. Audit key usage — KMS CloudTrail logs show every key use event.
SOC 2 Audit Considerations
If you’re selling to enterprise customers, SOC 2 Type II will come up in security questionnaires. Here are the controls most relevant to interview recording:
Availability (CC6.1): LiveKit rooms have health checks. Your recording pipeline needs health monitoring — alert if egress stops mid-session.
Confidentiality (CC6.7): Restrict access to interview recordings to authorized personnel. Implement RBAC: hiring managers see their candidates’ recordings, not others’. Log every access.
Access Control (CC6.2): Multi-factor authentication for any admin interface that can access recordings or transcripts. Service accounts for the recording pipeline should use IAM roles, not long-lived credentials.
Audit Logging (CC7.2): Every action on a recording — creation, access, deletion — should be logged immutably. AWS CloudTrail handles this for S3 access if you enable it.
```python
# audit_logger.py
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class AuditAction(Enum):
    RECORDING_STARTED = "recording_started"
    RECORDING_STOPPED = "recording_stopped"
    RECORDING_ACCESSED = "recording_accessed"
    RECORDING_DOWNLOADED = "recording_downloaded"
    RECORDING_DELETED = "recording_deleted"
    TRANSCRIPT_GENERATED = "transcript_generated"
    TRANSCRIPT_ACCESSED = "transcript_accessed"
    CONSENT_RECORDED = "consent_recorded"
    ERASURE_REQUESTED = "erasure_requested"
    ERASURE_COMPLETED = "erasure_completed"


class AuditLogger:
    def log(
        self,
        action: AuditAction,
        session_id: str,
        actor_id: str,
        details: Optional[dict] = None,
    ):
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "action": action.value,
            "session_id": session_id,
            "actor_id": actor_id,
            "details": details or {},
        }
        # Write to append-only audit log.
        # Never delete audit log entries — regulators may ask for them.
        self._write_to_audit_store(entry)

    def _write_to_audit_store(self, entry: dict):
        # Options: CloudWatch Logs (append-only), immutable S3 bucket with Object Lock,
        # dedicated audit log service like Sumo Logic
        pass
```
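If you don't want to depend on a managed append-only store, you can make entries tamper-evident by hash chaining: each entry commits to the hash of the previous one, so any retroactive edit breaks every subsequent entry. A stdlib sketch of the idea (not a substitute for access controls, just cheap integrity evidence):

```python
import hashlib
import json

def chain_entry(entry: dict, prev_hash: str) -> dict:
    # Each entry includes the previous entry's hash; editing any past
    # entry changes its hash and invalidates the rest of the chain.
    body = {**entry, "prev_hash": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "entry_hash": digest}

def verify_chain(entries: list[dict]) -> bool:
    prev = "0" * 64
    for e in entries:
        body = {k: v for k, v in e.items() if k != "entry_hash"}
        if body.get("prev_hash") != prev:
            return False
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if expected != e["entry_hash"]:
            return False
        prev = e["entry_hash"]
    return True

log, prev = [], "0" * 64
for action in ["recording_started", "recording_accessed", "recording_deleted"]:
    e = chain_entry({"action": action, "session_id": "abc"}, prev)
    log.append(e)
    prev = e["entry_hash"]

print(verify_chain(log))      # True
log[1]["action"] = "tampered"
print(verify_chain(log))      # False
```

S3 Object Lock or CloudWatch Logs still make better primary stores; the chain is a complement that lets an auditor verify nothing was rewritten after the fact.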
Data Retention Policies
Storing recordings indefinitely is both expensive and a compliance liability. Set retention policies and automate deletion.
```python
# retention_policy.py
from datetime import timedelta
from enum import Enum


class RetentionTier(Enum):
    # Standard hiring process completion
    STANDARD = "standard"          # 90 days after interview date
    # Active candidates in pipeline
    ACTIVE_CANDIDATE = "active"    # Until rejection or hire + 30 days
    # EU/GDPR-constrained
    GDPR_STRICT = "gdpr_strict"    # 30 days, explicit consent required for longer
    # Healthcare / HIPAA
    HIPAA = "hipaa"                # 6 years (HIPAA requires 6-year minimum)
    # Legal hold (active litigation or investigation)
    LEGAL_HOLD = "legal_hold"      # Indefinite until legal hold lifted


RETENTION_PERIODS = {
    RetentionTier.STANDARD: timedelta(days=90),
    RetentionTier.ACTIVE_CANDIDATE: timedelta(days=180),
    RetentionTier.GDPR_STRICT: timedelta(days=30),
    RetentionTier.HIPAA: timedelta(days=365 * 6),
    RetentionTier.LEGAL_HOLD: None,  # No automatic deletion
}

# AWS S3 Lifecycle Rule (via Terraform)
LIFECYCLE_RULE_TEMPLATE = """
resource "aws_s3_bucket_lifecycle_configuration" "interview_recordings" {
  bucket = aws_s3_bucket.recordings.id

  rule {
    id     = "standard_retention"
    status = "Enabled"

    filter {
      tag {
        key   = "retention_tier"
        value = "standard"
      }
    }

    expiration {
      days = 90
    }

    noncurrent_version_expiration {
      noncurrent_days = 7
    }
  }

  rule {
    id     = "gdpr_strict_retention"
    status = "Enabled"

    filter {
      tag {
        key   = "retention_tier"
        value = "gdpr_strict"
      }
    }

    expiration {
      days = 30
    }
  }
}
"""
```
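The tiers above boil down to a simple "deletion due" computation, where legal hold maps to "never auto-delete". A self-contained sketch using the same periods (string keys instead of the enum, for brevity):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Same retention periods as above; None means no automatic deletion.
RETENTION_PERIODS = {
    "standard": timedelta(days=90),
    "active": timedelta(days=180),
    "gdpr_strict": timedelta(days=30),
    "hipaa": timedelta(days=365 * 6),
    "legal_hold": None,
}

def deletion_due(recorded_at: datetime, tier: str) -> Optional[datetime]:
    """Return when a recording becomes eligible for deletion, or None."""
    period = RETENTION_PERIODS[tier]
    return None if period is None else recorded_at + period

recorded = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(deletion_due(recorded, "gdpr_strict"))  # 2026-01-31 00:00:00+00:00
print(deletion_due(recorded, "legal_hold"))   # None
```

Even with S3 lifecycle rules doing the actual deletion, computing this date in application code lets you show candidates a concrete "your data is deleted on" date and detect sessions the lifecycle rules missed.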
Transcript Anonymization for Training Data
Your interview transcripts are potentially valuable training data for fine-tuning your AI models. But using them requires careful treatment of PII.
The challenge: interview transcripts are full of PII. Candidate names, current employer names, project names, salary history, location. You cannot use raw transcripts for training without consent specifically for that purpose.
The solution: anonymization before any training use.
```python
# transcript_anonymizer.py
import re

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig


class TranscriptAnonymizer:
    """
    Removes PII from interview transcripts using Microsoft Presidio.
    Suitable for creating training data from consented recordings.
    """

    def __init__(self):
        self.analyzer = AnalyzerEngine()
        self.anonymizer = AnonymizerEngine()

        # Patterns specific to interview context
        self.custom_patterns = [
            # Salary figures
            r"\$[\d,]+(?:k|K)?(?:\s*(?:per year|annually|\/year|\/yr))?",
            # Company-specific project names (usually proper nouns in context)
            # GitHub usernames
            r"github\.com/[\w-]+",
            # LinkedIn profiles
            r"linkedin\.com/in/[\w-]+",
        ]

    def anonymize(self, transcript_text: str) -> str:
        """
        Replace PII with type-consistent placeholders.
        Example: "I worked at Google" → "I worked at [COMPANY]"
        """
        # Presidio analysis
        results = self.analyzer.analyze(
            text=transcript_text,
            language="en",
            entities=[
                "PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER",
                "LOCATION", "ORGANIZATION", "URL",
                "CREDIT_CARD", "IBAN_CODE", "IP_ADDRESS",
            ],
        )

        # Anonymize with meaningful placeholders
        anonymized = self.anonymizer.anonymize(
            text=transcript_text,
            analyzer_results=results,
            operators={
                "PERSON": OperatorConfig("replace", {"new_value": "[CANDIDATE_NAME]"}),
                "ORGANIZATION": OperatorConfig("replace", {"new_value": "[COMPANY]"}),
                "LOCATION": OperatorConfig("replace", {"new_value": "[LOCATION]"}),
                "EMAIL_ADDRESS": OperatorConfig("replace", {"new_value": "[EMAIL]"}),
                "PHONE_NUMBER": OperatorConfig("replace", {"new_value": "[PHONE]"}),
                "URL": OperatorConfig("replace", {"new_value": "[URL]"}),
            },
        )

        # Apply custom patterns
        result = anonymized.text
        for pattern in self.custom_patterns:
            result = re.sub(pattern, "[REDACTED]", result, flags=re.IGNORECASE)
        return result

    def validate_anonymization(self, anonymized_text: str) -> list[str]:
        """
        Verify no PII remains. Returns list of warnings if any PII found.
        """
        results = self.analyzer.analyze(
            text=anonymized_text,
            language="en",
            score_threshold=0.6,
        )
        warnings = []
        for result in results:
            snippet = anonymized_text[result.start:result.end]
            warnings.append(
                f"Potential PII ({result.entity_type}, confidence {result.score:.2f}): '{snippet}'"
            )
        return warnings
```
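The custom regexes are worth testing in isolation, and they also make a fast pre-pass that works without Presidio installed. A stdlib-only sketch over the same patterns:

```python
import re

# Same interview-specific patterns as in TranscriptAnonymizer above.
CUSTOM_PATTERNS = [
    r"\$[\d,]+(?:k|K)?(?:\s*(?:per year|annually|\/year|\/yr))?",  # salary figures
    r"github\.com/[\w-]+",                                         # GitHub profiles
    r"linkedin\.com/in/[\w-]+",                                    # LinkedIn profiles
]

def redact_custom(text: str) -> str:
    """Apply the interview-specific redaction patterns to a transcript."""
    for pattern in CUSTOM_PATTERNS:
        text = re.sub(pattern, "[REDACTED]", text, flags=re.IGNORECASE)
    return text

sample = "I made $150k per year; my code is at github.com/some-user."
print(redact_custom(sample))
```

Regex alone is not sufficient for anonymization: it catches formatted identifiers (salaries, profile URLs) but misses names and employers, which is exactly what the NER-based Presidio pass is for.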
Interview Data as Training Data: The Legal and Ethical Line
This deserves a frank section on its own. The commercial temptation is strong: you have thousands of interview transcripts showing how candidates answer technical questions, and that data could make your AI interviewer substantially better.
Here’s the line:
What you can do without additional consent:
- Use anonymized transcripts to improve question phrasing and interview structure
- Use aggregated patterns (not individual transcripts) to calibrate evaluation rubrics
- Use your own internal practice interviews where you explicitly consented participants
What requires separate, explicit consent:
- Fine-tuning your LLM on actual candidate transcripts
- Using recordings for any purpose beyond what was stated in the original consent
- Sharing transcripts with third-party AI providers for training (even anonymized)
What you should never do regardless of consent:
- Use interview data to train systems that screen out candidates based on protected characteristics
- Create synthetic candidates based on real interview data without clear disclosure
- Use early interview transcripts from before your consent workflow was solid
The GDPR concept of purpose limitation is strict here: “improving our services” buried in a privacy policy is not specific enough to cover using individual interview data for model training. You need a specific, granular consent option for training use — and you should make it genuinely optional, with no impact on the candidate’s interview if they decline.
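Operationally, this means the training pipeline filters on the specific training-consent flag, not on the blanket recording consent, and honors withdrawal immediately. A minimal sketch (the field names mirror the `ConsentRecord` shown earlier; `withdrawn` stands in for a non-null withdrawal timestamp):

```python
from dataclasses import dataclass

# Field names mirror the ConsentRecord from the consent manager above.
@dataclass
class Consent:
    session_id: str
    recording_consent: bool
    training_data_consent: bool
    withdrawn: bool = False

def eligible_for_training(consents: list[Consent]) -> list[str]:
    # Recording consent alone is NOT enough: training use requires its own
    # granular opt-in, and withdrawal removes the session immediately.
    return [
        c.session_id
        for c in consents
        if c.training_data_consent and not c.withdrawn
    ]

consents = [
    Consent("s1", recording_consent=True, training_data_consent=True),
    Consent("s2", recording_consent=True, training_data_consent=False),
    Consent("s3", recording_consent=True, training_data_consent=True, withdrawn=True),
]
print(eligible_for_training(consents))  # ['s1']
```

Run this filter at training-set build time, not once at collection time, so a withdrawal between collection and the next training run is respected.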
The Pre-Launch Compliance Checklist
Here’s what you need to have in place before recording your first real interview:
Legal foundation
- Privacy policy updated to describe interview recording and data handling
- DPAs signed with: cloud storage provider, STT provider, LLM provider, LiveKit Cloud (if used)
- Legal basis documented for each type of processing (consent vs legitimate interest)
- GDPR Article 30 records of processing activities completed
Consent and candidate rights
- Pre-session consent modal with granular options (recording, video analysis, training data)
- Opt-out path that allows interview without recording
- Data subject request workflow (erasure, portability, access)
- Designated privacy contact email
- Retention period clearly stated to candidates
Technical controls
- Encryption at rest: AES-256 for all stored recordings and transcripts (S3 SSE-KMS)
- Encryption in transit: TLS 1.3 for all data paths
- Access control: RBAC so only authorized users see specific candidate data
- Audit logging: immutable log of all access, creation, and deletion events
- Automated deletion: S3 lifecycle rules matching stated retention periods
- Backup encryption: if you backup recordings, backups are also encrypted
Operations
- Incident response plan for data breach (GDPR requires 72-hour notification)
- Employee training on data handling procedures
- Data Protection Officer appointed if processing at scale in EU
- Annual review date scheduled for privacy practices
HIPAA (if applicable)
- BAAs in place with employer client and all sub-processors
- PHI classification applied to recordings from healthcare employer interviews
- Extended retention period (6 years) configured for HIPAA-classified sessions
- HIPAA breach notification procedure documented
What We Built
This was the least glamorous post in the series and arguably the most important. Specifically we covered:
- LiveKit Egress for audio-only and composite video recording, with S3 output
- Streaming transcription with Deepgram and batch transcription with Whisper, including speaker labeling
- Structured transcript format with timestamps and confidence scores
- GDPR consent management: explicit per-category consent, audit-proof logging, right to erasure implementation
- HIPAA encryption requirements: SSE-KMS for storage, TLS 1.3 in transit, key management separation
- SOC 2 audit controls: access logging, RBAC, immutable audit trails
- Automated retention policies via S3 lifecycle rules
- Transcript anonymization with Microsoft Presidio for training data use
- The legal boundary on using interview data for model training
The compliance work is not optional overhead. It’s what separates a product that can close enterprise contracts from one that stalls in procurement. Get it right before you scale, not after.
In Part 10, we shift focus to the infrastructure that lets you run thousands of concurrent voice sessions without the per-session architecture becoming a bottleneck.
This is Part 9 of a 12-part series: The Voice AI Interview Playbook.
Series outline:
- Why Real-Time Voice Changes Everything — The landscape, the vision, and the reference architecture (Part 1)
- Cascaded vs. Speech-to-Speech — Choosing your pipeline architecture (Part 2)
- LiveKit vs. Pipecat vs. Direct — Picking your framework (Part 3)
- STT, LLM, and TTS That Actually Work — Building the voice pipeline (Part 4)
- Multi-Role Agents — Interviewer, coach, and evaluator personas (Part 5)
- Knowledge Base and RAG — Making your voice agent an expert (Part 6)
- Web and Mobile Clients — Cross-platform voice experiences (Part 7)
- Video Interview Integration — Multimodal analysis with Gemini Live (Part 8)
- Recording, Transcription, and Compliance — GDPR, HIPAA, and getting it right (this post)
- Scaling to Thousands — Architecture for concurrent voice sessions (Part 10)
- Cost Optimization — From $0.14/min to $0.03/min (Part 11)
- Multi-Provider Support — OpenAI Realtime, Bedrock Nova, Grok, and the adapter pattern (Part 12)