Uzbek speech data. Verified. Proven.

50+ hours of natural conversational Uzbek — real speech from podcasts and talk shows, not scripted reading. Every low-confidence word reviewed by 2+ independent native speakers. Train ASR models to 7–8% WER. Build TTS voices with natural emotion and regional accent. Download, train, deploy — no cleaning needed.

Verified Hours: 50+
Transcript Accuracy: 99.2%
Unique Speakers: 50+
Uzbek Dialects: 10
Emotion Labels: every segment

The Uzbek Data Problem

Three reasons why Uzbek ASR & TTS still fail in production.

Crowd-Sourced = Low Quality

Common Voice has Uzbek audio — but it's read-aloud sentences by random volunteers. Not how real people talk. Models trained on it: 14–24% WER.

Scraped = Unverified

Some providers scrape 1,000+ hours of Uzbek audio — but nobody checks the transcripts. Garbage in, garbage out: 16.7% real-world WER.

Nothing For Voice AI

No verified Uzbek speech dataset suitable for both ASR and TTS has ever been publicly available. Until now.

We built the solution: a dataset where every word is verified by humans.

What You Get

Production-ready Uzbek speech data for ASR & TTS. Download, train, deploy. No cleaning needed.

Natural Conversation

Real speech from podcasts and talk shows — not people reading scripts into a microphone.

IAA Verified

Every low-confidence word reviewed by 2+ independent native Uzbek speakers. Conflicts resolved by senior adjudicator.

Audio Quality Scored

DNSMOS quality scoring on every segment. Only MOS ≥ 3.5 included. Background noise removed with DeepFilterNet.

Speaker Diarized

50+ unique speakers labeled. Train/Val/Test split by speaker — zero data leakage.
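A speaker-disjoint split can be sketched in a few lines. This is an illustrative sketch, not the dataset's published tooling: the `speaker_id` field name and the 80/10/10 fractions are assumptions.

```python
import random

def split_by_speaker(segments, val_frac=0.1, test_frac=0.1, seed=42):
    """Assign every segment from a given speaker to exactly one split,
    so no voice appears in both train and test (no speaker leakage)."""
    speakers = sorted({s["speaker_id"] for s in segments})
    random.Random(seed).shuffle(speakers)
    n_test = max(1, int(len(speakers) * test_frac))
    n_val = max(1, int(len(speakers) * val_frac))
    test_ids = set(speakers[:n_test])
    val_ids = set(speakers[n_test:n_test + n_val])
    splits = {"train": [], "val": [], "test": []}
    for seg in segments:
        spk = seg["speaker_id"]
        key = "test" if spk in test_ids else "val" if spk in val_ids else "train"
        splits[key].append(seg)
    return splits
```

Splitting by speaker rather than by segment is what guarantees the test set contains only voices the model has never heard.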

Word-Level Timestamps

Millisecond-accurate start/end times for every word. Confidence scores included.

10 Dialects Tagged

Toshkent, Farg'ona, Samarqand, Xorazm, Buxoro, Namangan, Andijon, Qashqadaryo, Navoiy, Surxondaryo.

Code-Switching

Natural Uzbek ↔ Russian switching included. Critical for call center applications.

Emotion Labeled

Every segment classified by emotion (neutral, happy, sad, angry, surprised) with confidence scores. Build expressive TTS voices.

TTS-Ready Flags

Each segment includes sentence_complete, is_clean_speech, and speaking_rate_wpm — filter instantly for TTS training data.
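Filtering on those flags might look like this. The field names (`sentence_complete`, `is_clean_speech`, `speaking_rate_wpm`) come from the description above; the WPM bounds are illustrative assumptions, not dataset defaults.

```python
import json

def tts_ready(segment, min_wpm=90, max_wpm=220):
    """Keep only complete, clean sentences spoken at a natural rate.
    WPM bounds are illustrative, not part of the dataset spec."""
    return (
        segment.get("sentence_complete", False)
        and segment.get("is_clean_speech", False)
        and min_wpm <= segment.get("speaking_rate_wpm", 0) <= max_wpm
    )

def filter_manifest(jsonl_path):
    """Read a JSONL manifest and keep only TTS-ready segments."""
    with open(jsonl_path, encoding="utf-8") as f:
        return [seg for seg in map(json.loads, f) if tts_ready(seg)]
```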

Ready to Train

HuggingFace-compatible JSONL + WAV. Works with PyTorch, Transformers, faster-whisper, VITS, and Coqui TTS out of the box.
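Reading a JSONL manifest needs only the standard library; the record fields shown in the comment are hypothetical examples of the kind of metadata described above, not the published schema. The same file can also be consumed with HuggingFace's `datasets.load_dataset("json", data_files=...)`.

```python
import json

def load_manifest(path):
    """Read a JSONL manifest: one JSON object per line, one object per segment."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Each record pairs a WAV path with its transcript and metadata, e.g.
# {"audio": "wavs/spk03_0001.wav", "text": "...", "speaker_id": "spk03",
#  "dialect": "Toshkent", "mos": 3.9}   # field names illustrative
```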

Benchmarks

Proof: what models achieve when trained on our datasets.

We trained etamin/biruniy-v1 on our dataset to prove its quality. Here's how models built on our data compare to every other Uzbek speech AI model — including ones trained on 32× more scraped data.

| Benchmarks | Academic (USC / ISSAI) | Uzinfocom (Nutq.uz) | jmshd/whisper-uz | Kotib AI | islomov/rubaistt | BlueRaccoon | SyncAll AI | Biruniy Gold v1.0 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Lab WER (clean read-speech test sets) | ~12.5% | 14.0% | 6–11% | — | 23.7% | — | Claims 2%* | 5–6% |
| Real-World WER (conversational, noise & dialects) | 11.6–17.4% | ~18–22% | ~25–30% | 16.7% | ~17% | ~30–35% | Unverified | 7–8% |
| Dialect Tagging (regions covered) | — | — | — | — | — | — | — | ✅ 10 |
| Speaker Diarization (individual speaker labels) | — | — | — | — | — | — | Unknown | ✅ Included |
| Human Verified (transcript verification) | Partial | — | — | — | — | — | — | ✅ IAA (2+) |
| Data Volume (hours of audio) | ~105 hrs | Proprietary | ~200 hrs | 1,600 hrs | 475 hrs | ~15 hrs | Unknown | 50 hrs |

Cells marked — : no published figure.

* SyncAll AI claims 2% WER — unverified, likely intent recognition accuracy.

Lab WER = clean read speech. Real-World WER = conversational with noise & dialects.

Biruniy results measured on held-out Biruniy Gold Test Set.
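For reference, WER is word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. A minimal sketch:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard Levenshtein dynamic-programming table over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(1, len(ref))
```

A 7–8% WER means roughly 7–8 word errors per 100 reference words on conversational test audio.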

Why dialects & diarization matter

Most Uzbek datasets treat the language as a monolith. Biruniy data is tagged by 10 distinct regions and includes millisecond-accurate speaker labels. Build models that know who is speaking and where they are from.

Quality > Quantity: 50 hr vs 1,600 hr

Kotib AI trained on 1,600 hours of scraped audio. Their Real-World WER: 16.7%. Biruniy trained on 50 hours of IAA-verified data. Our Real-World WER: 7–8%. Every hour of Biruniy data is worth 32 hours of noisy, scraped data.

TTS Training Ready

Every dataset we deliver includes emotion labels, speaker isolation (50+ unique voices), and sentence-complete filtering — everything you need to train natural-sounding Uzbek TTS without months of data preparation. Ready to use, day one.

Pipeline

How we build the dataset.

6 stages. 3 AI models. 2+ human reviewers per segment.

01 · Source

Audio from Talabam.com — Uzbekistan's largest podcast and video platform. 100% native speakers. Real conversations.

02 · Clean

DeepFilterNet v3 noise removal. DNSMOS quality scoring. Only MOS ≥ 3.5 passes.

03 · Segment

Pyannote 3.1 speaker diarization. VAD filtering (≥50% speech). 5–30 second segments with speaker labels.
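The gate applied after diarization can be summarized as a single predicate; the thresholds are the ones stated in the stage descriptions above (5–30 s duration, ≥50% voiced, MOS ≥ 3.5).

```python
def keep_segment(duration_s, speech_ratio, mos):
    """Post-diarization gate: segment length, voice-activity ratio,
    and DNSMOS quality threshold, per the pipeline description."""
    return 5.0 <= duration_s <= 30.0 and speech_ratio >= 0.5 and mos >= 3.5
```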

04 · Transcribe

Whisper ASR with word-level timestamps. Confidence scoring on every word. Low-confidence words auto-flagged for review.
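The auto-flagging step reduces to a confidence threshold. A sketch, where the 0.85 cutoff is an illustrative assumption, not the pipeline's published value:

```python
def flag_low_confidence(words, threshold=0.85):
    """Return the words whose ASR confidence falls below the review threshold.
    `words` is a list of (word, confidence) pairs."""
    return [w for w, p in words if p < threshold]
```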

05 · Human Verify

Every flagged word reviewed by 2+ independent native Uzbek speakers (Inter-Annotator Agreement). Conflicts resolved by senior adjudicator. This is what makes Biruniy data gold-standard.
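The adjudication logic described here is essentially: agreement between the two reviewers is final, disagreement escalates. A minimal sketch (function and parameter names are illustrative):

```python
def resolve_word(asr_word, review_a, review_b, adjudicate):
    """Two independent reviewers check a flagged word; if they agree,
    their answer is final, otherwise a senior adjudicator decides."""
    if review_a == review_b:
        return review_a
    return adjudicate(asr_word, review_a, review_b)
```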

06 · Export

HuggingFace-compatible JSONL + WAV. Speaker-based train/val/test splits. Full metadata: dialect, MOS, speaker ID, timestamps.

Use Cases

Who uses Biruniy data.

Banking & Finance

Build voice agents for Uzbek banking customers. Understand real conversational Uzbek — not scripted prompts. Code-switching support for Uzbek-Russian bilingual users.

Call Centers

Automate call transcription and quality monitoring. 10 dialect regions means your model works across all of Uzbekistan, not just Tashkent.

AI Labs & Researchers

Train or fine-tune Uzbek ASR and TTS models. Production-ready format. Zero preprocessing needed. Speaker-based splits prevent data leakage.

Voice Assistants & TTS

Build Uzbek voice assistants that sound natural. Our speaker-isolated, emotion-labeled data trains TTS models that speak with the right tone, pace, and regional accent.

Voice Cloning & Dubbing

50+ unique speakers with consistent audio quality. Perfect for voice cloning, audiobook narration, and video dubbing in Uzbek.

Dataset Access

Get the data that powers 7–8% WER.

Contact for pricing

Research License

  • 50+ hours verified conversational Uzbek
  • HuggingFace format (JSONL + WAV)
  • Speaker-based train/val/test splits
  • Full metadata (dialect, MOS, timestamps)
  • TTS quality flags (sentence_complete, emotion, clean_speech)
  • Topic classification per segment
  • Commercial use allowed
  • Email support
Custom contract

Enterprise

  • Everything in Research License
  • Custom data collection (your domain vocabulary)
  • Scale to 200+ hours on demand
  • Domain-specific: banking, medical, legal, telecom
  • Custom TTS voice data collection
  • Single-speaker datasets on demand
  • On-premise delivery option
  • Dedicated account manager
  • SLA guarantee

Open Source

etamin/biruniy-v1 — Coming Soon

We're open-sourcing the model we built to prove our dataset quality. Free for everyone. Apache 2.0 license.

The model is free. The data that makes it possible is what we sell. Want results like this for your own model? Get the dataset.

Model Card

Architecture: Whisper Large-v3 + LoRA
Parameters: 1.55B + 27M adapters
WER (conversational): 7–8%
WER (clean): 5–6%
Training data: 50 hrs Biruniy Gold Dataset
License: Apache 2.0

Contact

Get the data that powers 7–8% WER and natural-sounding Uzbek TTS.

Tell us what you're building. We'll respond within 24 hours.