HappyHackingSpace/Kurdish-Dataset

Kurdish Dataset

A collaborative platform for building a comprehensive Kurdish-Kurmanji language dataset through community contributions. This project provides a user-friendly interface that allows non-technical users to contribute to the dataset by submitting and reviewing Kurdish text content.

Overview

The project aims to create a large-scale Kurdish-Kurmanji dataset by enabling contributions from both technical and non-technical users. The platform features:

  • A user-friendly web interface for text submission
  • PDF-to-text conversion with manual review
  • Admin panel for content moderation
  • Automated integration with Hugging Face datasets
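
Text extracted from a PDF usually needs tidying before a reviewer sees it. The sketch below is one possible cleanup pass, not the project's actual implementation: the function name `clean_extracted_text` and the specific rules (Unicode normalization, rejoining words hyphenated across line breaks, unwrapping hard line breaks) are assumptions chosen for illustration.

```python
import re
import unicodedata


def clean_extracted_text(raw: str) -> str:
    """Tidy text extracted from a PDF before a human reviews it.

    Hypothetical helper: the rules below are illustrative assumptions.
    """
    # Normalize to NFC so letters such as ê, î, û, ç, ş use one consistent form.
    text = unicodedata.normalize("NFC", raw)
    # Rejoin words hyphenated across a line break: "kur-\ndî" -> "kurdî".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Treat blank lines as paragraph breaks; unwrap single line breaks.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)
```

Rejoining hyphenated words can occasionally merge a genuinely hyphenated compound, which is one reason the converted text still goes through manual review.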

Features

  • Easy Submission: Simple form interface for submitting Kurdish text content
  • PDF Processing: Automatic conversion of PDF files to editable text
  • Content Review: Built-in text editor for reviewing and editing converted content
  • Admin Moderation: Comprehensive admin panel for content approval
  • Dataset Integration: Automatic pushing of approved content to Hugging Face
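
The features above imply a simple lifecycle: a submission starts as pending, an admin approves or rejects it, and only approved content reaches the dataset. The following is a minimal, framework-free sketch of that lifecycle. The names `Status`, `Submission`, `moderate`, and `approved_rows`, and the `source` field, are hypothetical and do not reflect the project's actual Django models.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Submission:
    text: str
    source: str = "form"  # hypothetical field: "form" or "pdf"
    status: Status = Status.PENDING


def moderate(submission: Submission, approve: bool) -> Submission:
    """Record an admin decision; only pending items can be moderated."""
    if submission.status is not Status.PENDING:
        raise ValueError("submission has already been moderated")
    submission.status = Status.APPROVED if approve else Status.REJECTED
    return submission


def approved_rows(submissions: list[Submission]) -> list[dict[str, str]]:
    """Collect approved submissions as plain rows ready for export."""
    return [
        {"text": s.text, "source": s.source}
        for s in submissions
        if s.status is Status.APPROVED
    ]
```

Keeping the decision in a single function makes it harder for rejected or unreviewed text to slip into the exported rows.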

Technical Stack

  • Backend: Django
  • Database: SQLite
  • Storage: Supabase
  • Dataset Hosting: Hugging Face Hub

Contributing

We welcome contributions to this project! Whether you're a developer or a Kurdish language enthusiast, you can help in several ways:

  1. Submit Kurdish text content through the web interface
  2. Review and validate submitted content
  3. Report issues or suggest improvements
  4. Contribute code improvements

License

This project is licensed under the MIT License; see the LICENSE file for details.