AnotaNusa - Preserving Indonesian Local Languages Through Data Annotation

Inspiration

With over 700 local languages spoken across its archipelago, Indonesia is a nation of remarkable linguistic diversity, an essential part of its national identity embodied in the motto "Bhinneka Tunggal Ika" (Unity in Diversity). Yet in the digital age, this heritage is increasingly at risk: most of these languages are critically underrepresented in technology and media, widening a digital divide that threatens both their preservation and the cultural connections they sustain.

Pioneering initiatives such as NusaCrowd, IndoNLG, NusaX, and NusaWrites have made valuable contributions, but they also show how difficult and resource-intensive it is to create high-quality, labeled datasets for local languages. That shortage of labeled data is the core bottleneck for training advanced AI and NLP systems like Cendol and SahabatAI: for the vast majority of local languages, labeled data is extremely limited or entirely absent, making it nearly impossible to build robust systems that can preserve and promote these languages in the digital era.

What it does

AnotaNusa is a collaborative, crowdsourced web platform built to drive the creation of high-quality, culturally aware NLP datasets for all of Indonesia's languages. It operates as a two-sided platform that directly addresses the critical data bottleneck hindering the development of truly localized AI systems.

For Creators (Researchers and Developers):

  • Easily design and launch annotation projects for various AI and NLP tasks
  • Access qualified, native local language speakers without logistical burden
  • Obtain high-quality, culturally authentic datasets

For Contributors/Annotators (Indonesian Speakers):

  • Register and select tasks in their native local language(s)
  • Contribute linguistic expertise through intuitive interfaces
  • Receive fair, direct compensation for their work

Key Features:

1. Core NLP Tasks Annotation

  • Text Classification: Simple interface for sentiment and emotion labeling with majority voting for quality assurance
  • Text-to-Text Generation: Human-powered translation, summarization, and conversational AI training data creation
  • Text Ranking: Drag-and-drop interface for ranking AI outputs from best to worst
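The majority-voting quality check for classification labels can be sketched as follows. The names (`AnnotationVote`, `resolveLabel`) and the agreement threshold are illustrative assumptions, not the platform's actual API:

```typescript
// Minimal sketch of majority-vote label aggregation, assuming each
// classification item collects labels from several independent annotators.
// Type and function names here are hypothetical.

interface AnnotationVote {
  annotatorId: string;
  label: string; // e.g. "positive" | "negative" | "neutral"
}

// Returns the winning label if it exceeds the agreement threshold,
// otherwise null (the item goes back for more annotations or review).
function resolveLabel(
  votes: AnnotationVote[],
  minAgreement = 0.5
): string | null {
  if (votes.length === 0) return null;

  // Tally votes per label.
  const counts = new Map<string, number>();
  for (const v of votes) {
    counts.set(v.label, (counts.get(v.label) ?? 0) + 1);
  }

  // Find the most frequent label.
  let best: string | null = null;
  let bestCount = 0;
  for (const [label, count] of counts) {
    if (count > bestCount) {
      best = label;
      bestCount = count;
    }
  }

  // Only accept a label with strict-majority agreement; ties resolve to null.
  return bestCount / votes.length > minAgreement ? best : null;
}
```

For example, two "positive" votes against one "negative" resolve to "positive", while a 1-1 split resolves to null and triggers further annotation.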

2. Direct Contributor Reward System

  • Transparent per-task payment system
  • Fair compensation for local language speakers
  • Sustainable, community-driven data creation model
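A transparent per-task payout could be computed along these lines; the field names and the approval flow are assumptions for illustration, not the deployed billing logic:

```typescript
// Illustrative sketch of per-task contributor payout, assuming each
// project sets a fixed reward per approved annotation.

interface CompletedTask {
  projectId: string;
  approved: boolean;  // passed the project's quality check
  rewardIdr: number;  // reward in Indonesian rupiah, set by the project creator
}

// A contributor's payout is the sum of rewards for approved tasks,
// so the amount is simple enough for contributors to verify themselves.
function contributorPayout(tasks: CompletedTask[]): number {
  return tasks
    .filter((t) => t.approved)
    .reduce((total, t) => total + t.rewardIdr, 0);
}
```

Keeping the rule this simple (a flat, published rate per approved task) is what makes the system transparent: the contributor can recompute their own earnings from their task history.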

How we built it

Our development process leveraged modern tooling and AI-assisted development.

Tech Stack:

  • Frontend & Backend: Next.js (full-stack framework)
  • Database: Firebase Firestore (NoSQL database)
  • Hosting: Vercel

AI-Assisted Development Tools:

  • Cursor: AI-powered code editor for enhanced productivity
  • v0.dev: AI-powered UI component generation and rapid prototyping
  • Various generative AI models: code generation, debugging, and development assistance

The platform features specialized annotation workflows designed to support key NLP tasks and mobilize a nationwide effort to create high-quality, labeled datasets. We implemented intuitive interfaces for text classification, text-to-text generation, and text ranking, each optimized for a different type of linguistic annotation work.
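The annotation workflows revolve around a few core entities. As a rough sketch of how the Firestore documents might be shaped (field names and types are assumptions, not the deployed schema):

```typescript
// Hypothetical document shapes for the two main Firestore collections,
// expressed as TypeScript interfaces.

type TaskType = "classification" | "text2text" | "ranking";

interface Project {
  id: string;
  creatorId: string;
  language: string;          // e.g. "jv" (Javanese), "su" (Sundanese)
  taskType: TaskType;
  rewardPerTaskIdr: number;  // per-task reward shown to contributors
}

interface Annotation {
  projectId: string;
  annotatorId: string;
  itemId: string;             // the text item being annotated
  payload: string | string[]; // a label, generated text, or a ranked order
  submittedAt: number;        // Unix timestamp (ms)
}

// Example documents under the assumed schema:
const project: Project = {
  id: "p1",
  creatorId: "u1",
  language: "jv",
  taskType: "ranking",
  rewardPerTaskIdr: 750,
};

const annotation: Annotation = {
  projectId: project.id,
  annotatorId: "u2",
  itemId: "item-42",
  payload: ["output-b", "output-a", "output-c"], // best to worst
  submittedAt: Date.now(),
};
```

A single `payload` field keeps the three task types in one collection: classification stores a label string, text-to-text stores generated text, and ranking stores an ordered array of output IDs.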

Challenges we ran into

1. Time Management

  • Working within tight hackathon deadlines while building a comprehensive platform
  • Balancing feature development with quality assurance

2. Team Constraints (-1 member)

  • One team member fell sick and couldn't attend, reducing our team to just 3 members
  • Had to redistribute workload and responsibilities on the fly

3. Continuous Brainstorming and Alignment

  • Ensuring all team members were aligned on the vision and implementation
  • Avoiding time waste through effective communication and decision-making
  • Balancing ambitious goals with realistic implementation timelines

Accomplishments that we're proud of

Despite working with only 3 team members, we successfully delivered a comprehensive solution to preserve low-resource languages in Indonesia. Our key accomplishments include:

1. Complete Platform Development

  • Built a fully functional two-sided platform connecting researchers with native speakers
  • Implemented all core NLP annotation features (text classification, text-to-text generation, text ranking)

2. Scalable Architecture

  • Created a sustainable, community-driven workflow for data generation
  • Designed transparent reward systems to motivate contributor participation

3. Cultural Authenticity Focus

  • Ensured the platform captures not just grammatical correctness but cultural authenticity
  • Built tools that leverage deep cultural knowledge of native speakers

4. Technical Innovation

  • Successfully integrated modern AI-assisted development tools
  • Created intuitive interfaces that make complex annotation tasks accessible

While our solution isn't perfect (no hackathon project is), we're proud of delivering a platform with all the core features needed to address Indonesia's linguistic diversity challenge.

What we learned

1. The Power of AI-Assisted Development

  • Modern tools like Cursor and v0.dev can significantly accelerate development
  • Generative AI can help bridge skill gaps and enhance productivity

2. Community-Driven Solutions Work

  • Crowdsourcing can be an effective approach to large-scale data collection
  • Fair compensation and transparent processes are crucial for sustainable participation

3. Cultural Sensitivity in Tech

  • Building for linguistic diversity requires deep understanding of cultural contexts
  • Technology solutions must respect and preserve cultural authenticity

4. Team Resilience

  • Small, dedicated teams can achieve significant results with proper planning
  • Effective communication becomes even more critical with reduced team size

What's next for AnotaNusa

1. Community Building

  • Launch pilot programs with specific language communities
  • Partner with universities and research institutions
  • Create educational resources about language preservation

2. Data Quality Enhancement

  • Implement advanced validation and verification systems
  • Develop contributor training programs
  • Create quality metrics and feedback systems

3. Integration and Partnerships

  • Connect with existing Indonesian AI initiatives (Cendol, SahabatAI)
  • Partner with government agencies focused on cultural preservation
  • Collaborate with international organizations working on endangered languages

4. Sustainability Model

  • Develop long-term funding strategies
  • Create enterprise solutions for commercial NLP development
  • Establish partnerships with tech companies needing Indonesian language data

Our ultimate goal is to ensure that every Indonesian language has the digital representation it deserves, preserving our nation's linguistic heritage for future generations while enabling the development of truly inclusive AI systems.


#BahasaUntukBangsa - Preserving Indonesian Local Languages Through Data Annotation
