Open infrastructure for Ethiopian language AI

The future of AI
speaks Ethiopian

Dataset.ET is an open dataset initiative building speech, text, and translation datasets for 80+ Ethiopian languages to enable AI research and language technology.

Our Mission

Preserving languages.
Powering intelligence.

Every language carries a universe of knowledge. No language should be left behind as the world moves toward AI.

Safeguard 80+ Languages

Every Ethiopian language carries centuries of knowledge, poetry, and identity. We create high-quality datasets that ensure no language faces digital extinction.

80+ languages at risk

Train Smarter AI

From Amharic's Ge'ez script to Oromo grammar, our corpus enables AI to understand, speak, and reason in Ethiopian languages with true fluency.

Zero languages left behind

Community-Owned Data

Open infrastructure built by researchers, developers, and language enthusiasts across Ethiopia. Your data, your rules, your future.

Open source

Languages

Teaching AI to understand Ethiopia

We are building high-quality datasets to train the next generation of Ethiopian AI models, with more languages added every quarter.

01

Amharic

500k+

አማርኛ

Semitic, Ge'ez script

02

Afaan Oromo

450k+

Afaan Oromoo

Cushitic, Latin script

03

Somali

200k+

Af Soomaali

Cushitic, Latin script

04

Tigrinya

200k+

ትግርኛ

Semitic, Ge'ez script

05

Sidamo

Coming soon

Sidaamu Afoo

Cushitic, Latin script

06

Wolaytta

Coming soon

Wolaytta

Omotic, Latin script

+ 74 more languages on our roadmap

How It Works

From contribution to AI

Every contribution, no matter how small, helps bridge the gap between Ethiopian languages and artificial intelligence.

1

Contribute

Submit sentences, paragraphs, or documents in your native Ethiopian language through our portal.

2

Validate

Community reviewers verify every contribution for quality, accuracy, and linguistic structure.

3

Structure

Validated data is cleaned, annotated, and stored in our open-access corpus with full attribution.

4

Power AI

Researchers and developers use the corpus to train models that understand Ethiopian languages.
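
For developers curious how the four steps above might fit together, here is a minimal sketch in Python. It is an illustration only, not Dataset.ET's actual pipeline or schema: the `Contribution` fields, the `validate` and `structure` helpers, and the two-reviewer approval threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical record type; field names are illustrative, not Dataset.ET's schema.
@dataclass
class Contribution:
    text: str
    language: str         # e.g. "Amharic"
    contributor: str      # kept so the corpus can carry attribution
    approvals: int = 0    # how many community reviewers approved it
    validated: bool = False


def validate(item: Contribution, min_approvals: int = 2) -> Contribution:
    """Step 2 (Validate): mark an item validated once enough reviewers approve.

    The threshold of 2 is an assumption chosen for this sketch.
    """
    item.validated = item.approvals >= min_approvals
    return item


def structure(items: List[Contribution]) -> List[dict]:
    """Step 3 (Structure): keep validated items, tidy whitespace, keep attribution."""
    corpus = []
    for item in items:
        if not item.validated:
            continue
        corpus.append({
            "text": " ".join(item.text.split()),  # collapse runs of whitespace
            "language": item.language,
            "attribution": item.contributor,
        })
    return corpus
```

In this sketch, a sentence approved by two reviewers lands in the corpus with its contributor recorded, while a sentence with a single approval is held back until it receives more review.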

Community

Built by
Ethiopians,
for everyone.

Volunteers, researchers, linguists, and developers who believe in the power of language technology for Ethiopia.

Contributors

Donate your voice, translate text, and help expand our datasets across multiple Ethiopian languages.

Validators

Ensure quality by reviewing and verifying submitted audio and text contributions.

Developers

Build innovative AI tools and models using our open-source datasets and APIs.

Linguists

Provide expert guidance on grammar, phonetics, and cultural nuances of Ethiopian languages.

Get Involved

Your language matters.
Help us teach AI.

Every sentence you contribute builds the open infrastructure that will power Ethiopian AI for generations. No technical skills required.

By contributing, you agree to our Privacy Policy