Uzbek speech data. Verified. Proven.
50+ hours of natural conversational Uzbek — real speech from podcasts and talk shows, not scripted reading. Every low-confidence word reviewed by 2+ independent native speakers. Train ASR models to 7–8% WER. Build TTS voices with natural emotion and regional accent. Download, train, deploy — no cleaning needed.
Verified Hours
50+
Transcript Accuracy
99.2%
Unique Speakers
50+
Uzbek Dialects
10
Emotion Labels
✓
The Uzbek Data Problem
Three reasons why Uzbek ASR & TTS still fail in production.
Crowd-Sourced = Low Quality
Common Voice has Uzbek audio — but it's read-aloud sentences by random volunteers. Not how real people talk. Models trained on it: 14–24% WER.
Scraped = Unverified
Some providers scrape 1,000+ hours of Uzbek audio. But nobody checks the transcripts. Garbage in = garbage out. 16.7% WER.
Nothing For Voice AI
No publicly available Uzbek speech dataset is both verified and suitable for ASR and TTS. Until now.
We built the solution: a dataset where every word is verified by humans.
What You Get
Production-ready Uzbek speech data for ASR & TTS. Download, train, deploy. No cleaning needed.
Natural Conversation
Real speech from podcasts and talk shows — not people reading scripts into a microphone.
IAA Verified
Every low-confidence word reviewed by 2+ independent native Uzbek speakers. Conflicts resolved by a senior adjudicator.
Audio Quality Scored
DNSMOS quality scoring on every segment. Only MOS ≥ 3.5 included. Background noise removed with DeepFilterNet.
Speaker Diarized
50+ unique speakers labeled. Train/Val/Test split by speaker — zero data leakage.
Word-Level Timestamps
Millisecond-accurate start/end times for every word. Confidence scores included.
10 Dialects Tagged
Toshkent, Farg'ona, Samarqand, Xorazm, Buxoro, Namangan, Andijon, Qashqadaryo, Navoiy, Surxondaryo.
Code-Switching
Natural Uzbek ↔ Russian switching included. Critical for call center applications.
Emotion Labeled
Every segment classified by emotion (neutral, happy, sad, angry, surprised) with confidence scores. Build expressive TTS voices.
TTS-Ready Flags
Each segment includes sentence_complete, is_clean_speech, and speaking_rate_wpm — filter instantly for TTS training data.
Ready to Train
HuggingFace-compatible JSONL + WAV. Works with PyTorch, Transformers, faster-whisper, VITS, and Coqui TTS out of the box.
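As a sketch of what "no cleaning needed" means in practice, here is a minimal filter over such JSONL segments for TTS training. The field names and values below are illustrative assumptions based on the descriptions above, not a published schema:

```python
# Illustrative sketch: filtering Biruniy-style JSONL segments for TTS training.
# Field names (sentence_complete, is_clean_speech, speaking_rate_wpm, mos, ...)
# are assumptions from this page's feature descriptions, not a published schema.
# In practice the records would come from the delivered file, e.g.:
#   records = [json.loads(line) for line in open("biruniy_gold.jsonl")]

records = [
    {"audio": "seg_0001.wav", "text": "Assalomu alaykum", "speaker_id": "spk_07",
     "dialect": "Toshkent", "mos": 3.9, "sentence_complete": True,
     "is_clean_speech": True, "speaking_rate_wpm": 142,
     "emotion": {"label": "neutral", "confidence": 0.91}},
    {"audio": "seg_0002.wav", "text": "Yaxshimisiz", "speaker_id": "spk_12",
     "dialect": "Xorazm", "mos": 3.6, "sentence_complete": False,
     "is_clean_speech": True, "speaking_rate_wpm": 188,
     "emotion": {"label": "happy", "confidence": 0.74}},
]

def tts_ready(seg, min_mos=3.5, max_wpm=180):
    """Keep only complete, clean, moderately paced segments for TTS training."""
    return (seg["sentence_complete"]
            and seg["is_clean_speech"]
            and seg["mos"] >= min_mos
            and seg["speaking_rate_wpm"] <= max_wpm)

tts_set = [seg for seg in records if tts_ready(seg)]
```

The same predicate should drop straight into `Dataset.filter(tts_ready)` if you load the JSONL with the HuggingFace `datasets` library.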
Benchmarks
Proof: what models achieve when trained on our datasets.
We trained etamin/biruniy-v1 on our dataset to prove its quality. Here's how models built on our data compare to every other Uzbek speech AI model — including ones trained on 30× more scraped data.
| Benchmarks | Academic (USC / ISSAI) | Uzinfocom (Nutq.uz) | jmshd/whisper-uz | Kotib AI | islomov/rubaistt | BlueRaccoon | SyncAll AI | Biruniy Gold v1.0 |
|---|---|---|---|---|---|---|---|---|
| Lab WER (clean read-speech test sets) | — | ~12.5% | 14.0% | 6–11% | — | 23.7% | Claims 2%* | 5–6% |
| Real-World WER (conversational, with noise & dialects) | 11.6–17.4% | ~18–22% | ~25–30% | 16.7% | ~17% | ~30–35% | Unverified | 7–8% |
| Dialect Tagging (regions covered) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ 10 |
| Speaker Diarization (per-speaker labels) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | Unknown | ✅ Included |
| Human Verified (transcript verification method) | Partial | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ IAA (2+) |
| Data Volume (hours of audio) | ~105 hrs | Proprietary | ~200 hrs | 1,600 hrs | 475 hrs | ~15 hrs | Unknown | 50 hrs |
* SyncAll AI claims 2% WER — unverified, likely intent recognition accuracy.
Lab WER = clean read speech. Real-World WER = conversational with noise & dialects.
Biruniy results measured on held-out Biruniy Gold Test Set.
Why dialects & diarization matter
Most Uzbek datasets treat the language as a monolith. Biruniy data is tagged across 10 distinct regions and carries per-speaker labels with millisecond-accurate segment boundaries. Build models that know who is speaking and where they are from.
Quality > Quantity: 50 hr vs 1,600 hr
Kotib AI trained on 1,600 hours of scraped audio. Their Real-World WER: 16.7%. Biruniy trained on 50 hours of IAA-verified data. Our Real-World WER: 7–8%. Every hour of Biruniy data is worth 32 hours of noisy, scraped data.
TTS Training Ready
Every dataset we deliver includes emotion labels, speaker isolation (50+ unique voices), and sentence-complete filtering — everything you need to train natural-sounding Uzbek TTS without months of data preparation. Ready to use, day one.
Pipeline
How we build the dataset.
6 stages. 3 AI models. 2+ human reviewers per segment.
1. Audio from Talabam.com — Uzbekistan's largest podcast and video platform. 100% native speakers. Real conversations.
2. DeepFilterNet v3 noise removal. DNSMOS quality scoring. Only MOS ≥ 3.5 passes.
3. Pyannote 3.1 speaker diarization. VAD filtering (≥50% speech). 5–30 second segments with speaker labels.
4. Whisper ASR with word-level timestamps. Confidence scoring on every word. Low-confidence words auto-flagged for review.
5. Every flagged word reviewed by 2+ independent native Uzbek speakers (Inter-Annotator Agreement). Conflicts resolved by a senior adjudicator. This is what makes Biruniy data gold-standard.
6. HuggingFace-compatible JSONL + WAV. Speaker-based train/val/test splits. Full metadata: dialect, MOS, speaker ID, timestamps.
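The speaker-based split in the final stage can be sketched as follows. Speaker IDs and split sizes here are illustrative assumptions; the point is that splitting happens at the speaker level, so no voice heard in training appears in validation or test:

```python
import random

# Illustrative sketch of a speaker-disjoint train/val/test split.
# Speaker IDs and split sizes are assumptions for the example.
speakers = [f"spk_{i:02d}" for i in range(50)]
rng = random.Random(42)          # fixed seed for a reproducible split
rng.shuffle(speakers)

n_test, n_val = 5, 5
splits = {
    "test":  set(speakers[:n_test]),
    "val":   set(speakers[n_test:n_test + n_val]),
    "train": set(speakers[n_test + n_val:]),
}

def split_of(speaker_id):
    """Route a segment to a split by its speaker, never by the segment itself."""
    return next(name for name, ids in splits.items() if speaker_id in ids)

# Disjointness is the whole point: no speaker belongs to two splits,
# so no voice can leak from training into evaluation.
assert splits["train"].isdisjoint(splits["val"] | splits["test"])
```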
Use Cases
Who uses Biruniy data.
Banking & Finance
Build voice agents for Uzbek banking customers. Understand real conversational Uzbek — not scripted prompts. Code-switching support for Uzbek-Russian bilingual users.
Call Centers
Automate call transcription and quality monitoring. 10 dialect regions means your model works across all of Uzbekistan, not just Tashkent.
AI Labs & Researchers
Train or fine-tune Uzbek ASR and TTS models. Production-ready format. Zero preprocessing needed. Speaker-based splits prevent data leakage.
Voice Assistants & TTS
Build Uzbek voice assistants that sound natural. Our speaker-isolated, emotion-labeled data trains TTS models that speak with the right tone, pace, and regional accent.
Voice Cloning & Dubbing
50+ unique speakers with consistent audio quality. Perfect for voice cloning, audiobook narration, and video dubbing in Uzbek.
Dataset Access
Get the data that powers 7–8% WER.
Research License
- 50+ hours verified conversational Uzbek
- HuggingFace format (JSONL + WAV)
- Speaker-based train/val/test splits
- Full metadata (dialect, MOS, timestamps)
- TTS quality flags (sentence_complete, emotion, clean_speech)
- Topic classification per segment
- Commercial use allowed
- Email support
Enterprise
- Everything in Research License
- Custom data collection (your domain vocabulary)
- Scale to 200+ hours on demand
- Domain-specific: banking, medical, legal, telecom
- Custom TTS voice data collection
- Single-speaker datasets on demand
- On-premise delivery option
- Dedicated account manager
- SLA guarantee
Open Source
etamin/biruniy-v1 — Coming Soon
We're open-sourcing the model we built to prove our dataset quality. Free for everyone. Apache 2.0 license.
The model is free. The data that makes it possible is what we sell. Want results like this for your own model? Get the dataset.
Model Card
- Architecture: Whisper Large-v3 + LoRA
- Parameters: 1.55B + 27M adapters
- WER (conversational): 7–8%
- WER (clean): 5–6%
- Training data: 50 hrs Biruniy Gold Dataset
- License: Apache 2.0
Contact
Get the data that powers 7–8% WER and natural-sounding Uzbek TTS.
Tell us what you're building. We'll respond within 24 hours.
Biruniy Gold v1.0