Multilingual meeting AI, measured in the open.
Beesiness transcribes multilingual business conversations at 8.9% WER on our internal evaluation set. That is comparable to the best vendor numbers we've seen published and roughly a third of the error rate of the open-source baseline (25.4%). The sections below show the numbers, the methodology, and five real code-switching examples across English, Spanish, German, and French.
Multilingual ASR Word Error Rate (lower is better)
| Model | WER % | Note |
|---|---|---|
| Beesiness production pipeline (ours) | 8.9% | Averaged across EN, ES, DE, FR, PT, AR business-meeting samples |
| Industry baseline (zero-shot, multilingual) | 25.4% | Publicly available open-source model, evaluated by us on the same evaluation set |
| Beesiness next-gen (Q4 2026 target) | 6.5% | In-house fine-tune target, not yet shipped |
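
The comparison between the rows can be reproduced from the three table values alone. A minimal sketch, using only those figures; the function names are illustrative and not part of any Beesiness tooling:

```python
# WER values copied from the table above (percent).
BASELINE_WER = 25.4
PRODUCTION_WER = 8.9
TARGET_WER = 6.5


def relative_reduction(old: float, new: float) -> float:
    """Fraction of errors removed when moving from `old` WER to `new` WER."""
    return (old - new) / old


def error_ratio(old: float, new: float) -> float:
    """How many times more errors `old` makes than `new`."""
    return old / new


if __name__ == "__main__":
    print(f"baseline / production: {error_ratio(BASELINE_WER, PRODUCTION_WER):.2f}x")
    print(f"reduction vs baseline: {relative_reduction(BASELINE_WER, PRODUCTION_WER):.1%}")
    print(f"reduction still needed for target: {relative_reduction(PRODUCTION_WER, TARGET_WER):.1%}")
```

On these figures the baseline makes about 2.85 times as many word errors as the production pipeline, which is a relative error reduction of about 65%.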
Methodology.
- Evaluation set: 120 minutes of held-out business meetings recorded with consent across EN, ES, DE, FR, PT, AR — code-switching present in ~30% of clips. Sales, product, engineering, and education domains represented. Available under NDA to qualified enterprise buyers reviewing procurement; not yet released publicly.
- Reference transcripts: dual human annotators with disagreement resolution by a third reviewer. Standard NIST WER scoring; case-insensitive, punctuation stripped.
- Baseline: open-source ASR (zero-shot, large multilingual model) evaluated on the same audio. We re-run the comparison whenever we ship a pipeline update.
- Independent replication: contact benchmarks@beesiness.com for the evaluation set under an NDA. We're working toward a public release that doesn't expose customer audio.
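
For readers who want to sanity-check the scoring on their own transcripts, the following is a minimal sketch of word error rate with case-insensitive, punctuation-stripped normalisation. It is not the Beesiness scoring code or the NIST tooling. Two choices here are assumptions: punctuation is deleted rather than replaced by spaces, and errors are pooled over all clips rather than averaged per language.

```python
import unicodedata


def normalize(text: str) -> list[str]:
    """Lowercase, delete Unicode punctuation, and split on whitespace."""
    kept = "".join(
        ch for ch in text.lower()
        if not unicodedata.category(ch).startswith("P")
    )
    return kept.split()


def word_errors(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                          # deletion
                curr[j - 1] + 1,                      # insertion
                prev[j - 1] + (ref_word != hyp_word), # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(pairs: list[tuple[str, str]]) -> float:
    """Pooled WER over (reference, hypothesis) pairs: total errors / total reference words."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        ref_tokens = normalize(reference)
        errors += word_errors(ref_tokens, normalize(hypothesis))
        words += len(ref_tokens)
    if words == 0:
        raise ValueError("reference transcripts contain no words")
    return errors / words


if __name__ == "__main__":
    pairs = [
        ("Necesitamos actualizar el deal stage en HubSpot.",
         "necesitamos actualizar el deal stage en hubspot"),
        ("Hay un agujero serio en el pipeline.",
         "hay un agujero serio en la tubería"),
    ]
    print(f"WER: {wer(pairs):.1%}")
```

In the usage example the first pair scores zero errors after normalisation, and the second has two substitutions against seven reference words, so the pooled result is 2 errors over 14 words.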
Code-switching · five real meeting examples
Modern corporate speech mixes English business jargon inside any sentence frame — 'runway', 'churn', 'pipeline', 'rollback', 'SOC 2'. Most ASR + summarisation tools "translate" these, destroying the meaning. Beesiness preserves them verbatim across the language switch.
"We need to extend the runway this quarter — Series A is off the table until churn drops below 3%."
✓ Beesiness
Beesiness preserves 'runway', 'churn', and 'Series A' verbatim. Action item captured as "Extend Q4 runway plan — bring churn under 3% before Series A pitch".
✗ Naive translation
Naive translation pipelines convert 'runway' → 'airport runway' and 'churn' → 'butter churn', destroying the business meaning.
"Necesitamos actualizar el deal stage en HubSpot — hay un agujero serio en el pipeline este mes."
✓ Beesiness
Recognises HubSpot as a product name, keeps 'deal stage' and 'pipeline' as English even mid-Spanish. Action item: "Cerrar el gap en el HubSpot pipeline — este mes".
✗ Naive translation
Naive: HubSpot transliterated, 'pipeline' rendered as 'tubería' (literal plumbing), 'deal stage' translated word-by-word.
"Vor dem Deploy in Production müssen die E2E-Tests grün sein — letztes Mal mussten wir einen rollback machen."
✓ Beesiness
Decision row captured as "No production deploy without green E2E tests — rollback risk from last time". German wrapper preserved with English DevOps terms intact.
✗ Naive translation
Naive: 'rollback' → 'Rückwärtsbewegung', losing the deploy-pipeline-specific meaning.
"On va faire un showcase du sprint demo à la product team, et on iterate sur le feedback tout de suite."
✓ Beesiness
Action item: "Sprint demo showcase for product team — iterate on feedback the same day". Five English business terms preserved across the French sentence frame.
✗ Naive translation
Naive: 'showcase', 'feedback', 'iterate' all translated into clumsy native equivalents, losing the product-management precision.
"To close the enterprise deal we need SOC 2 Type II — let's accelerate the audit."
✓ Beesiness
Action item: "Accelerate SOC 2 Type II audit to close enterprise deal". Compliance terms preserved exactly across the speaker switch.
✗ Naive translation
Naive: 'enterprise', 'audit', 'Type II' translated when followed by non-English context — compliance reviewers would reject the summary.
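
The failure mode in the examples above can be tested mechanically: given a glossary of terms that must survive verbatim, flag any term that appears in the source sentence but not in the output. This is a minimal sketch, not a description of how Beesiness works internally; `missing_terms` and the glossary contents are hypothetical.

```python
import re


def contains_term(text: str, term: str) -> bool:
    """Case-insensitive whole-term match, so 'churn' does not match 'churning'."""
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def missing_terms(source: str, output: str, glossary: list[str]) -> list[str]:
    """Glossary terms that occur in `source` but were lost in `output`."""
    return [
        term for term in glossary
        if contains_term(source, term) and not contains_term(output, term)
    ]


if __name__ == "__main__":
    glossary = ["rollback", "churn"]  # hypothetical glossary
    source = "letztes Mal mussten wir einen rollback machen."
    naive = "letztes Mal mussten wir eine Rückwärtsbewegung machen."
    print(missing_terms(source, source, glossary))  # nothing lost
    print(missing_terms(source, naive, glossary))   # 'rollback' was translated away
```

A check like this only covers terms listed in the glossary; it says nothing about whether the surrounding sentence was transcribed or summarised correctly.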
See it on your own meeting
60 minutes free, no credit card. The bot joins your next Meet, Zoom, or Teams call, so you can check these results against your own meeting.