Open infrastructure for Ethiopian language AI

The future of AI
speaks Ethiopian

Dataset.ET is an open dataset initiative building speech, text, and translation datasets for 80+ Ethiopian languages to enable AI research and language technology.

Start Contributing | Explore the Mission

Languages

Sentences

Our Mission

Preserving languages.
Powering intelligence.

Every language carries a universe of knowledge. No language should be left behind as the world moves toward AI.

Safeguard 80+ Languages

Every Ethiopian language carries centuries of knowledge, poetry, and identity. We create high-quality datasets that ensure no language faces digital extinction.

80+ languages at risk

Train Smarter AI

From Amharic's Ge'ez script to Oromo grammar, our corpus enables AI to understand, speak, and reason in Ethiopian languages with true fluency.

Zero languages left behind

Community-Owned Data

Open infrastructure built by researchers, developers, and language enthusiasts across Ethiopia. Your data, your rules, your future.

Open source

Languages

Teaching AI to understand Ethiopia

Building high-quality datasets to train the next generation of Ethiopian AI models, and expanding to more languages every quarter.

Amharic

500k+

አማርኛ

Semitic, Ge'ez script

Afaan Oromo

450k+

Afaan Oromoo

Cushitic, Latin script

Somali

200k+

Af Soomaali

Cushitic, Latin script

Tigrinya

200k+

ትግርኛ

Semitic, Ge'ez script

Sidamo

Coming soon

Sidaamu Afoo

Cushitic, Latin script

Wolaytta

Coming soon

Wolaytta

Omotic, Latin script

+ 74 more languages on our roadmap

How It Works

From contribution to AI

Every contribution, no matter how small, helps bridge the gap between Ethiopian languages and artificial intelligence.

Contribute

Submit sentences, paragraphs, or documents in your native Ethiopian language through our portal.

Validate

Community reviewers verify every contribution for quality, accuracy, and linguistic structure.

Structure

Validated data is cleaned, annotated, and stored in our open-access corpus with full attribution.

Power AI

Researchers and developers use the corpus to train models that understand Ethiopian languages.
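
For developers, the four steps above can be pictured as a small data pipeline. The sketch below is only an illustration under stated assumptions: the record fields, the function names, and the two-approval threshold are all hypothetical and do not describe the actual Dataset.ET portal or schema.

```python
from dataclasses import dataclass


@dataclass
class Contribution:
    """One submitted sentence (hypothetical schema, not the project's own)."""
    text: str
    language: str      # language label, e.g. "Amharic"
    contributor: str   # kept so the corpus entry can carry attribution
    approvals: int = 0


def contribute(text: str, language: str, contributor: str) -> Contribution:
    # Step 1 (Contribute): a contributor submits text in their language.
    return Contribution(text=text, language=language, contributor=contributor)


def validate(item: Contribution, votes: list[bool], required: int = 2) -> bool:
    # Step 2 (Validate): reviewers vote; the threshold of 2 is an assumption.
    item.approvals = sum(votes)
    return item.approvals >= required


def structure(item: Contribution) -> dict:
    # Step 3 (Structure): collapse stray whitespace and keep attribution.
    return {
        "text": " ".join(item.text.split()),
        "language": item.language,
        "attribution": item.contributor,
    }


def build_corpus(submissions) -> list[dict]:
    # Only validated, structured records reach the corpus used in step 4.
    corpus = []
    for item, votes in submissions:
        if validate(item, votes):
            corpus.append(structure(item))
    return corpus


submissions = [
    (contribute("  ሰላም   ዓለም ", "Amharic", "contributor-1"), [True, True]),
    (contribute("Akkam jirta?", "Afaan Oromo", "contributor-2"), [True, False]),
]
corpus = build_corpus(submissions)
print(len(corpus), "record(s) accepted")
```

In this sketch the second submission is held back because it received only one approval; step 4 (Power AI) would then read from `corpus` as training data.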

Community

Built by
Ethiopians,
for everyone.

Volunteers, researchers, linguists, and developers who believe in the power of language technology for Ethiopia.

Join our Telegram Group

Contributors

Donate your voice, translate text, and help expand our datasets across multiple Ethiopian languages.

Validators

Ensure quality by reviewing and verifying submitted audio and text contributions.

Developers

Build innovative AI tools and models using our open-source datasets and APIs.

Linguists

Provide expert guidance on grammar, phonetics, and cultural nuances of Ethiopian languages.

Get Involved

Your language matters.
Help us teach AI.

Every sentence you contribute builds the open infrastructure that will power Ethiopian AI for generations. No technical skills required.

Start Contributing Now

By contributing, you agree to our Privacy Policy
