Uzbek speech data. Verified. Proven.
50+ hours of natural conversational Uzbek — real speech from podcasts and talk shows, not scripted reading. Every low-confidence word reviewed by 2+ independent native speakers. Train ASR models to 7–8% WER. Build TTS voices with natural emotion and regional accent. Download, train, deploy — no cleaning needed.
Verified Hours
50+
Transcript Accuracy
99.2%
Unique Speakers
50+
Uzbek Dialects
10
Emotion Labels
✓
The Uzbek Data Problem
Three reasons why Uzbek ASR & TTS still fail in production.
Crowd-Sourced = Low Quality
Common Voice has Uzbek audio — but it's read-aloud sentences by random volunteers. Not how real people talk. Models trained on it: 14–24% WER.
Scraped = Unverified
Some providers scrape 1,000+ hours of Uzbek audio. But nobody checks the transcripts. Garbage in = garbage out. 16.7% WER.
Nothing For Voice AI
No publicly available Uzbek speech dataset is both verified and suitable for ASR and TTS. Until now.
We built the solution: a dataset where every word is verified by humans.
What You Get
Production-ready Uzbek speech data for ASR & TTS. Download, train, deploy. No cleaning needed.
Natural Conversation
Real speech from podcasts and talk shows — not people reading scripts into a microphone.
IAA Verified
Every low-confidence word reviewed by 2+ independent native Uzbek speakers. Conflicts resolved by a senior adjudicator.
Audio Quality Scored
DNSMOS quality scoring on every segment. Only MOS ≥ 3.5 included. Background noise removed with DeepFilterNet.
Speaker Diarized
50+ unique speakers labeled. Train/Val/Test split by speaker — zero data leakage.
Word-Level Timestamps
Millisecond-accurate start/end times for every word. Confidence scores included.
10 Dialects Tagged
Toshkent, Farg'ona, Samarqand, Xorazm, Buxoro, Namangan, Andijon, Qashqadaryo, Navoiy, Surxondaryo.
Code-Switching
Natural Uzbek ↔ Russian switching included. Critical for call center applications.
Emotion Labeled
Every segment classified by emotion (neutral, happy, sad, angry, surprised) with confidence scores. Build expressive TTS voices.
TTS-Ready Flags
Each segment includes sentence_complete, is_clean_speech, and speaking_rate_wpm — filter instantly for TTS training data.
Ready to Train
HuggingFace-compatible JSONL + WAV. Works with PyTorch, Transformers, faster-whisper, VITS, and Coqui TTS out of the box.
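As a sketch of what "no cleaning needed" means in practice, here is a minimal filter over such JSONL segments for TTS training. The field names and values below are illustrative assumptions based on the descriptions above, not a published schema:

```python
# Illustrative sketch: filtering Biruniy-style JSONL segments for TTS training.
# Field names (sentence_complete, is_clean_speech, speaking_rate_wpm, mos, ...)
# are assumptions from this page's feature descriptions, not a published schema.
# In practice the records would come from the delivered file, e.g.:
#   records = [json.loads(line) for line in open("biruniy_gold.jsonl")]

records = [
    {"audio": "seg_0001.wav", "text": "Assalomu alaykum", "speaker_id": "spk_07",
     "dialect": "Toshkent", "mos": 3.9, "sentence_complete": True,
     "is_clean_speech": True, "speaking_rate_wpm": 142,
     "emotion": {"label": "neutral", "confidence": 0.91}},
    {"audio": "seg_0002.wav", "text": "Yaxshimisiz", "speaker_id": "spk_12",
     "dialect": "Xorazm", "mos": 3.6, "sentence_complete": False,
     "is_clean_speech": True, "speaking_rate_wpm": 188,
     "emotion": {"label": "happy", "confidence": 0.74}},
]

def tts_ready(seg, min_mos=3.5, max_wpm=180):
    """Keep only complete, clean, moderately paced segments for TTS training."""
    return (seg["sentence_complete"]
            and seg["is_clean_speech"]
            and seg["mos"] >= min_mos
            and seg["speaking_rate_wpm"] <= max_wpm)

tts_set = [seg for seg in records if tts_ready(seg)]
```

The same predicate should drop straight into `Dataset.filter(tts_ready)` if you load the JSONL with the HuggingFace `datasets` library.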
Benchmarks
Proof: what models achieve when trained on our datasets.
We trained etamin/biruniy-v1 on our dataset to prove its quality. Here's how models built on our data compare to every other Uzbek speech AI model — including ones trained on 30× more scraped data.
| Benchmarks | Academic (USC / ISSAI) | Uzinfocom (Nutq.uz) | jmshd/whisper-uz | Kotib AI | islomov/rubaistt | BlueRaccoon | SyncAll AI | Biruniy Gold v1.0 |
|---|---|---|---|---|---|---|---|---|
| Lab WER (clean read-speech test sets) | — | ~12.5% | 14.0% | 6–11% | — | 23.7% | Claims 2%* | 5–6% |
| Real-World WER (conversational, with noise & dialects) | 11.6–17.4% | ~18–22% | ~25–30% | 16.7% | ~17% | ~30–35% | Unverified | 7–8% |
| Dialect Tagging (regions covered) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ 10 |
| Speaker Diarization (per-speaker labels) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | Unknown | ✅ Included |
| Human Verified (transcript verification method) | Partial | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ IAA (2+) |
| Data Volume (hours of audio) | ~105 hrs | Proprietary | ~200 hrs | 1,600 hrs | 475 hrs | ~15 hrs | Unknown | 50 hrs |
* SyncAll AI claims 2% WER — unverified, likely intent recognition accuracy.
Lab WER = clean read speech. Real-World WER = conversational with noise & dialects.
Biruniy results measured on held-out Biruniy Gold Test Set.
Why dialects & diarization matter
Most Uzbek datasets treat the language as a monolith. Biruniy data is tagged across 10 distinct regions and carries per-speaker labels with millisecond-accurate segment boundaries. Build models that know who is speaking and where they are from.
Quality > Quantity: 50 hr vs 1,600 hr
Kotib AI trained on 1,600 hours of scraped audio. Their Real-World WER: 16.7%. Biruniy trained on 50 hours of IAA-verified data. Our Real-World WER: 7–8%. Every hour of Biruniy data is worth 32 hours of noisy, scraped data.
TTS Training Ready
Every dataset we deliver includes emotion labels, speaker isolation (50+ unique voices), and sentence-complete filtering — everything you need to train natural-sounding Uzbek TTS without months of data preparation. Ready to use, day one.
Pipeline
How we build the dataset.
6 stages. 3 AI models. 2+ human reviewers per segment.
1. Audio from Talabam.com — Uzbekistan's largest podcast and video platform. 100% native speakers. Real conversations.
2. DeepFilterNet v3 noise removal. DNSMOS quality scoring. Only MOS ≥ 3.5 passes.
3. Pyannote 3.1 speaker diarization. VAD filtering (≥50% speech). 5–30 second segments with speaker labels.
4. Whisper ASR with word-level timestamps. Confidence scoring on every word. Low-confidence words auto-flagged for review.
5. Every flagged word reviewed by 2+ independent native Uzbek speakers (Inter-Annotator Agreement). Conflicts resolved by a senior adjudicator. This is what makes Biruniy data gold-standard.
6. HuggingFace-compatible JSONL + WAV. Speaker-based train/val/test splits. Full metadata: dialect, MOS, speaker ID, timestamps.
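The speaker-based split in the final stage can be sketched as follows. Speaker IDs and split sizes here are illustrative assumptions; the point is that splitting happens at the speaker level, so no voice heard in training appears in validation or test:

```python
import random

# Illustrative sketch of a speaker-disjoint train/val/test split.
# Speaker IDs and split sizes are assumptions for the example.
speakers = [f"spk_{i:02d}" for i in range(50)]
rng = random.Random(42)          # fixed seed for a reproducible split
rng.shuffle(speakers)

n_test, n_val = 5, 5
splits = {
    "test":  set(speakers[:n_test]),
    "val":   set(speakers[n_test:n_test + n_val]),
    "train": set(speakers[n_test + n_val:]),
}

def split_of(speaker_id):
    """Route a segment to a split by its speaker, never by the segment itself."""
    return next(name for name, ids in splits.items() if speaker_id in ids)

# Disjointness is the whole point: no speaker belongs to two splits,
# so no voice can leak from training into evaluation.
assert splits["train"].isdisjoint(splits["val"] | splits["test"])
```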
Use Cases
Who uses Biruniy data.
Banking & Finance
Build voice agents for Uzbek banking customers. Understand real conversational Uzbek — not scripted prompts. Code-switching support for Uzbek-Russian bilingual users.
Call Centers
Automate call transcription and quality monitoring. 10 dialect regions means your model works across all of Uzbekistan, not just Tashkent.
AI Labs & Researchers
Train or fine-tune Uzbek ASR and TTS models. Production-ready format. Zero preprocessing needed. Speaker-based splits prevent data leakage.
Voice Assistants & TTS
Build Uzbek voice assistants that sound natural. Our speaker-isolated, emotion-labeled data trains TTS models that speak with the right tone, pace, and regional accent.
Voice Cloning & Dubbing
50+ unique speakers with consistent audio quality. Perfect for voice cloning, audiobook narration, and video dubbing in Uzbek.
Dataset Access
Get the data that powers 7–8% WER.
Research License
- 50+ hours verified conversational Uzbek
- HuggingFace format (JSONL + WAV)
- Speaker-based train/val/test splits
- Full metadata (dialect, MOS, timestamps)
- TTS quality flags (sentence_complete, emotion, clean_speech)
- Topic classification per segment
- Commercial use allowed
- Email support
Enterprise
- Everything in Research License
- Custom data collection (your domain vocabulary)
- Scale to 200+ hours on demand
- Domain-specific: banking, medical, legal, telecom
- Custom TTS voice data collection
- Single-speaker datasets on demand
- On-premise delivery option
- Dedicated account manager
- SLA guarantee
Open Source
etamin/biruniy-v1 — Coming Soon
We're open-sourcing the model we built to prove our dataset quality. Free for everyone. Apache 2.0 license.
The model is free. The data that makes it possible is what we sell. Want results like this for your own model? Get the dataset.
Model Card
- Architecture: Whisper Large-v3 + LoRA
- Parameters: 1.55B + 27M adapters
- WER (conversational): 7–8%
- WER (clean): 5–6%
- Training data: 50 hrs Biruniy Gold Dataset
- License: Apache 2.0
Contact
Get the data that powers 7–8% WER and natural-sounding Uzbek TTS.
Tell us what you're building. We'll respond within 24 hours.
Biruniy Gold v1.0