A collaborative platform for building a comprehensive Kurdish-Kurmanji language dataset through community contributions. This project provides a user-friendly interface that allows non-technical users to contribute to the dataset by submitting and reviewing Kurdish text content.
The project aims to create a large-scale Kurdish-Kurmanji dataset by enabling contributions from both technical and non-technical users. The platform features:
- A user-friendly web interface for text submission
- PDF-to-text conversion with manual review capabilities
- Admin panel for content moderation
- Automated integration with Hugging Face datasets

Key features:
- Easy Submission: Simple form interface for submitting Kurdish text content
- PDF Processing: Automatic conversion of PDF files to editable text
- Content Review: Built-in text editor for reviewing and editing converted content
- Admin Moderation: Comprehensive admin panel for content approval
- Dataset Integration: Automatic pushing of approved content to Hugging Face
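
The flow described by the features above (submit, review, approve, export) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual implementation: the `Submission` class, the `Status` values, the `review` and `to_jsonl` helpers, and the `"form"`/`"pdf"` source labels are all hypothetical names chosen for the example, and the upload to Hugging Face itself is not shown.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Status(str, Enum):
    """Hypothetical moderation states for a submission."""
    PENDING = "pending"    # submitted, awaiting an admin decision
    APPROVED = "approved"  # accepted for the dataset
    REJECTED = "rejected"  # declined by an admin


@dataclass
class Submission:
    text: str
    source: str = "form"  # assumed labels: "form" or "pdf"
    status: Status = Status.PENDING


def review(submission: Submission, edited_text: str, approve: bool) -> Submission:
    """Store the reviewer's edited text and record the admin decision."""
    submission.text = edited_text.strip()
    submission.status = Status.APPROVED if approve else Status.REJECTED
    return submission


def to_jsonl(submissions: list[Submission]) -> str:
    """Serialize only approved submissions, one JSON object per line."""
    lines = [
        # ensure_ascii=False keeps Kurdish letters (ê, î, û, ç, ş) readable
        json.dumps({"text": s.text, "source": s.source}, ensure_ascii=False)
        for s in submissions
        if s.status is Status.APPROVED
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    first = review(Submission("  Rojbaş  "), "Rojbaş", approve=True)
    second = review(Submission("not Kurdish text", source="pdf"), "not Kurdish text", approve=False)
    print(to_jsonl([first, second]))
```

JSON Lines is used here because the Hugging Face `datasets` library can load that format directly; how the platform actually packages and pushes approved content may differ.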

Tech stack:
- Backend: Django
- Database: SQLite
- Storage: Supabase
- Dataset Hosting: Hugging Face Hub

We welcome contributions to this project! Whether you're a developer or a Kurdish language enthusiast, you can help in several ways:
- Submit Kurdish text content through the web interface
- Review and validate submitted content
- Report issues or suggest improvements
- Contribute code improvements

This project is licensed under the MIT License; see the LICENSE file for details.