How fast is 'real-time'?
Latency varies by tool and implementation. Voco publishes an end-to-end latency figure of about 500ms: half a second from speech to text appearing on an attendee's screen. LiveVoice claims 0.2 seconds (200ms). Glossa describes its speed as 'fractions of a second' without publishing a figure. At sub-second latency the translation feels genuinely simultaneous, a clear improvement on the multi-second delays of older systems.
What can go wrong with latency
Latency degrades when: (1) the WiFi connection drops and the system doesn't recover gracefully (the 'minutes behind' problem); (2) the translation server is under heavy load; (3) the audio pipeline adds delay before the audio reaches the translation engine. Well-designed systems like Voco handle drops with auto-reconnect and backfill, so even if latency briefly increases, the content catches up rather than falling permanently behind.
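The auto-reconnect-and-backfill idea can be sketched roughly as follows. This is a minimal, hypothetical illustration, not Voco's actual implementation. It assumes the server keeps every translated segment with a sequence number, and that a reconnecting client asks for everything after the last sequence number it displayed. `CaptionServer`, `CaptionClient`, and their methods are invented names, and the network is simulated in memory.

```python
from dataclasses import dataclass, field


@dataclass
class CaptionServer:
    """Hypothetical server that keeps every translated segment in order."""
    segments: list[str] = field(default_factory=list)

    def publish(self, text: str) -> int:
        self.segments.append(text)
        return len(self.segments) - 1  # sequence number of the new segment

    def since(self, last_seq: int) -> list[tuple[int, str]]:
        """Return every segment newer than last_seq (the backfill)."""
        start = last_seq + 1
        return list(enumerate(self.segments[start:], start=start))


@dataclass
class CaptionClient:
    """Hypothetical attendee client that tracks the last segment it showed."""
    server: CaptionServer
    connected: bool = True
    last_seq: int = -1
    screen: list[str] = field(default_factory=list)

    def on_live_segment(self, seq: int, text: str) -> None:
        """Handle a segment pushed live by the server."""
        if not self.connected or seq <= self.last_seq:
            return  # offline, or a duplicate we already showed
        if seq > self.last_seq + 1:
            self.catch_up()  # gap detected: fetch the missing segments
        else:
            self._show(seq, text)

    def drop(self) -> None:
        self.connected = False  # simulated WiFi drop

    def reconnect(self) -> None:
        self.connected = True
        self.catch_up()  # backfill everything missed while offline

    def catch_up(self) -> None:
        for seq, text in self.server.since(self.last_seq):
            self._show(seq, text)

    def _show(self, seq: int, text: str) -> None:
        self.screen.append(text)
        self.last_seq = seq


server = CaptionServer()
client = CaptionClient(server)


def broadcast(text: str) -> None:
    seq = server.publish(text)
    client.on_live_segment(seq, text)


broadcast("Welcome, everyone.")
client.drop()
broadcast("Please turn to page ten.")  # missed while offline
broadcast("Let us begin.")             # missed while offline
client.reconnect()                     # backfills both missed segments
broadcast("Thank you.")
print(client.screen)
# ['Welcome, everyone.', 'Please turn to page ten.', 'Let us begin.', 'Thank you.']
```

Because the missed segments arrive in one batch on reconnect, the attendee's screen is current again immediately instead of trailing the speaker, which is the difference between a brief latency spike and falling permanently behind.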
Real-time vs near-real-time vs human interpretation
- Human simultaneous interpretation: 0–2 second lag, highest accuracy, very expensive
- AI real-time translation (Voco, LiveVoice): 200–600ms, very good accuracy for common languages
- AI near-real-time translation: 1–3 seconds, good accuracy; adequate for most services
- Human consecutive interpretation (spoken aloud after each paragraph): 30–90 second delay, typically used in small groups only