add hugging face
aeltorio committed Nov 3, 2024
1 parent f4f9944 commit a9036fa
Showing 1 changed file with 21 additions and 0 deletions.
21 changes: 21 additions & 0 deletions README.md
@@ -2,6 +2,7 @@
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
[![Rust](https://img.shields.io/badge/Rust-1.70%2B-blue.svg)](https://www.rust-lang.org)
[![Cargo](https://img.shields.io/badge/Cargo-1.70%2B-blue.svg)](https://doc.rust-lang.org/cargo/)

# French Names Database Extractor

A Rust-based tool that creates a comprehensive database of French first names and last names by processing death records from INSEE (French National Institute of Statistics and Economic Studies).
@@ -96,6 +97,26 @@ lastnames.json

The death records data is sourced from INSEE's public database: <https://www.insee.fr/fr/information/4769950>

## Machine Learning Dataset

The extracted data can be used to build a machine learning dataset for training models that generate realistic French names.
Two datasets are published on the Hugging Face Hub and can be loaded with the `datasets` library:

- https://huggingface.co/datasets/eltorio/french_first_names_insee_2024

```python
from datasets import load_dataset

ds = load_dataset("eltorio/french_first_names_insee_2024")
```

- https://huggingface.co/datasets/eltorio/french_last_names_insee_2024

```python
from datasets import load_dataset

ds = load_dataset("eltorio/french_last_names_insee_2024")
```
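
Once either dataset is loaded, a small helper can draw a few names from it. The sketch below is only an illustration: the `sample_names` helper, the `"name"` column, and the in-memory `demo_rows` are hypothetical stand-ins, since the actual split and column names are not documented here and should be checked on the loaded dataset first (for example with `ds["train"].column_names`, assuming a `train` split exists).

```python
import random
from typing import Any, Iterable, List, Mapping, Optional


def sample_names(
    rows: Iterable[Mapping[str, Any]],
    column: str,
    k: int = 5,
    seed: Optional[int] = None,
) -> List[str]:
    """Return up to k distinct, non-empty names drawn at random from `column`."""
    # Deduplicate while keeping first-seen order, so a fixed seed gives a stable result.
    names = list(
        dict.fromkeys(str(row[column]).strip() for row in rows if row.get(column))
    )
    rng = random.Random(seed)
    return rng.sample(names, min(k, len(names)))


# Hypothetical in-memory rows standing in for a loaded split.
# The "name" column is an assumption, not the dataset's documented schema.
demo_rows = [{"name": "Camille"}, {"name": "Louis"}, {"name": "Camille"}, {"name": ""}]
picked = sample_names(demo_rows, column="name", k=2, seed=0)
```

With the real data, the same function should accept a loaded split directly, because iterating over a `datasets` split yields one dictionary per row.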

## License

This project is licensed under the GNU Affero General Public License v3.0 - see the LICENSE.md file for details.