voice-system-expert by aiskillstore
Use when working with Quetrex's voice interface, OpenAI Realtime API, WebRTC, or echo cancellation. Knows Quetrex's specific voice architecture decisions and patterns. CRITICAL - prevents breaking working voice system.
Content & Writing
85 Stars
2 Forks
Updated Jan 19, 2026, 04:39 AM
Why Use This
This skill provides specialized capabilities for aiskillstore's codebase.
Use Cases
- Developing new features in the aiskillstore repository
- Refactoring existing code to follow aiskillstore standards
- Understanding and working with aiskillstore's codebase structure
Install Guide
2 steps- 1
Skip this step if Ananke is already installed.
- 2
Skill Snapshot
Auto scan of skill assets. Informational only.
Valid SKILL.md
Checks against SKILL.md specification
Source & Community
Skill Stats
SKILL.md 319 Lines
Total Files 1
Total Size 0 B
License NOASSERTION
---
name: voice-system-expert
description: Use when working with Quetrex's voice interface, OpenAI Realtime API, WebRTC, or echo cancellation. Knows Quetrex's specific voice architecture decisions and patterns. CRITICAL - prevents breaking working voice system.
allowed-tools: Read
---
# Quetrex Voice System Expert
## CRITICAL: Read This First
Quetrex's voice system architecture is **extensively documented and battle-tested**. Before making ANY changes to voice-related code, you MUST read:
1. **ADR-001-VOICE-ECHO-CANCELLATION.md** (definitive architectural decision)
2. **docs/architecture/VOICE-SYSTEM.md** (technical implementation)
3. **docs/features/voice-interface.md** (user-facing features)
**Location:** `src/lib/openai-realtime.ts`
## Core Architecture Decision
### ALWAYS-ON MICROPHONE + BROWSER AEC
This is **Decision 4** from VOICE-SYSTEM.md and the definitive approach documented in ADR-001.
```typescript
// ✅ CORRECT: Always-on microphone
const mediaStream = await navigator.mediaDevices.getUserMedia({
audio: {
echoCancellation: true, // CRITICAL - browser handles echo cancellation
noiseSuppression: true,
autoGainControl: true,
},
})
// Microphone track stays ENABLED throughout conversation
// Browser's native AEC prevents feedback loops
// Server-side VAD (Voice Activity Detection) handles turn detection
```
## How It Works
### Audio Pipeline
```
User speaks
↓
Microphone (always enabled, echoCancellation: true)
↓
WebRTC → OpenAI Realtime API
↓
Server-side VAD detects speech
↓
OpenAI processes and responds
↓
Audio response via WebRTC
↓
HTMLAudioElement playback (stays in browser pipeline)
↓
Browser AEC compares mic input + speaker output
↓
Echo automatically canceled (no feedback loop)
```
### Why This Works
**Browser Echo Cancellation Requirements:**
1. Both microphone input AND speaker output must be in browser's audio graph
2. Audio must flow through browser's WebRTC stack
3. No manual intervention needed (browser handles it)
**This is the industry standard:**
- ChatGPT voice mode
- Google Meet
- Zoom
- Discord
- Microsoft Teams
They ALL use always-on microphone + browser AEC.
## DO NOT DO THESE THINGS
### ❌ DON'T: Toggle Microphone Track
```typescript
// ❌ WRONG - This breaks echo cancellation
async function pauseRecording() {
microphone.enabled = false // DON'T DO THIS
}
async function resumeRecording() {
microphone.enabled = true // DON'T DO THIS
}
```
**Why this fails:**
- Not industry standard
- Causes audio glitches
- Can break echo cancellation
- Adds artificial delays
- More complex state management
- No real benefit
### ❌ DON'T: Route Audio Outside Browser
```typescript
// ❌ WRONG - AudioWorklet bypass to native audio
const workletNode = new AudioWorkletNode(audioContext, 'bypass-processor')
workletNode.port.postMessage({ cmd: 'route-to-native' })
```
**Why this fails:**
- Breaks browser AEC (audio leaves browser pipeline)
- Causes echo/feedback loops
- Requires complex native audio handling
- Platform-specific implementations
- Already tried and failed (see abandoned-approaches/)
### ❌ DON'T: Implement Custom Echo Cancellation
```typescript
// ❌ WRONG - Custom AEC implementation
class CustomEchoCanceller {
cancelEcho(input: AudioBuffer, output: AudioBuffer) {
// Complex DSP code...
}
}
```
**Why this fails:**
- Reinventing the wheel
- Browser AEC is battle-tested by billions of users
- Extremely complex to implement correctly
- Requires deep DSP knowledge
- Performance issues
- Platform-specific tuning needed
### ❌ DON'T: Add Artificial Delays
```typescript
// ❌ WRONG - Delays for echo prevention
await new Promise(resolve => setTimeout(resolve, 500))
await playAudio()
await new Promise(resolve => setTimeout(resolve, 500))
resumeRecording()
```
**Why this fails:**
- Not necessary (browser AEC handles it)
- Degrades user experience
- Adds latency
- Still doesn't prevent echo if implementation is wrong
## What You CAN Change
### ✅ DO: Adjust Audio Settings
```typescript
// ✅ Can tweak these settings
const constraints = {
audio: {
echoCancellation: true, // MUST be true
noiseSuppression: true, // Can adjust
autoGainControl: true, // Can adjust
sampleRate: 24000, // Can change for quality
channelCount: 1, // Mono is fine for voice
},
}
```
### ✅ DO: Handle Connection States
```typescript
// ✅ Can manage connection lifecycle
async function connectVoice() {
// Setup WebRTC connection
// Start streaming
// Handle connection events
}
async function disconnectVoice() {
// Clean up WebRTC connection
// Stop streaming
// Release microphone
}
```
### ✅ DO: Add UI Feedback
```typescript
// ✅ Can add visual indicators
function onVoiceActivity(active: boolean) {
if (active) {
// Show "listening" indicator
// Animate microphone icon
} else {
// Show "idle" indicator
}
}
```
### ✅ DO: Handle Errors
```typescript
// ✅ Can improve error handling
try {
const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
} catch (error) {
if (error.name === 'NotAllowedError') {
// Show permission request UI
} else if (error.name === 'NotFoundError') {
// Show "no microphone found" error
}
}
```
## Implementation Location
**Primary file:** `src/lib/openai-realtime.ts`
**Key functions:**
- `setupMediaStream()` - Initializes microphone with correct constraints
- `connectToOpenAI()` - Establishes WebRTC connection
- `handleAudioResponse()` - Plays AI responses via HTMLAudioElement
**DO NOT modify:**
- Microphone enable/disable logic (should stay always-on)
- Echo cancellation settings (must be true)
- Audio routing (must stay in browser pipeline)
**CAN modify:**
- UI feedback and visual indicators
- Error handling and user messages
- Connection retry logic
- Audio quality settings (sample rate, etc.)
## Why This Architecture Was Chosen
From ADR-001:
**Option A: Trust Industry Pattern (CHOSEN)**
- ✅ Used by ChatGPT, Google Meet, Zoom, Discord
- ✅ Browser vendors optimize for this pattern
- ✅ Works across all platforms (macOS, Windows, Linux, mobile)
- ✅ No custom implementation needed
- ✅ Proven reliability
**Options Rejected:**
- ❌ Manual microphone toggling (not industry standard)
- ❌ AudioWorklet native bypass (breaks AEC)
- ❌ Custom echo cancellation (reinventing wheel)
- ❌ Native Rust WebRTC (2-4 weeks work for no benefit)
## Platform Context: Web Application
**IMPORTANT:** Quetrex is now a **pure web application**, not a Tauri desktop app.
**Why this matters for voice:**
- Browser echo cancellation works perfectly in all browsers
- No WKWebView limitations (that was the Tauri problem)
- Universal compatibility (Chrome, Safari, Firefox, Edge)
- No platform-specific audio handling needed
- Just works™️
**The WKWebView Problem (Historical):**
Quetrex was originally a Tauri desktop app. On macOS, Tauri uses WKWebView, which has a bug: **WebRTC audio playback doesn't work**. This forced us to try workarounds (AudioWorklet bypass, native audio routing), which all broke echo cancellation.
**Solution:** Convert to web application. Now echo cancellation works perfectly everywhere.
## Testing Voice Changes
If you must modify voice code:
1. **Test on real browsers** (not just dev tools)
2. **Test the echo scenario:**
- Turn up speaker volume
- Start voice conversation
- Verify no feedback loop/echo
3. **Test across platforms:**
- Chrome (most common)
- Safari (macOS, iOS)
- Firefox
- Edge
4. **Test edge cases:**
- Poor network connection
- Microphone permission denied
- Mid-conversation disconnect
## When to Consult Documentation
**Before any voice changes, read:**
1. `docs/decisions/ADR-001-VOICE-ECHO-CANCELLATION.md`
2. `docs/architecture/VOICE-SYSTEM.md`
3. `docs/development/abandoned-approaches/`
**If you see mentions of:**
- Tauri → Ignore (Quetrex is web app now)
- WKWebView → Ignore (not relevant anymore)
- AudioWorklet bypass → Don't implement (already tried, failed)
- Manual mic toggling → Don't implement (not industry standard)
## Summary
**The Golden Rule:** Trust browser echo cancellation. Always-on microphone + `echoCancellation: true` + audio in browser pipeline = Perfect echo cancellation.
**DO:**
- Keep microphone enabled throughout conversation
- Use `echoCancellation: true`
- Keep audio in browser (HTMLAudioElement)
- Let OpenAI Realtime API handle VAD
- Trust the industry pattern
**DON'T:**
- Toggle microphone track
- Route audio outside browser
- Implement custom echo cancellation
- Add artificial delays
- Try to "improve" what already works
**If in doubt:** Read ADR-001 and VOICE-SYSTEM.md. The architecture is thoroughly documented for a reason.
Name Size