🤵 Mann eller Kvinne? 💃

A website that guesses whether you're a man or a woman based on what you type

🇳🇴 Norsk

This repo is the backend for the mann-eller-kvinne project. You can find the repo for the frontend by clicking this link.

The website makes its guess using simple machine learning algorithms. The model is trained on 3000+ book reviews featured in Norwegian media, and its goal is to highlight differences in the way men and women write.
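To give a feel for what a "simple machine learning algorithm" for text can look like, here is a minimal sketch of a word-count Naive Bayes classifier using only the Python standard library. This is an illustration under stated assumptions, not the model this project actually uses: the `train` and `predict` helpers, the `writer_a`/`writer_b` labels and the toy samples are all made up for the example.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Count words per label. `samples` is a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in samples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocabulary


def predict(text, model):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, label_counts, vocabulary = model
    total_samples = sum(label_counts.values())
    scores = {}
    for label, sample_count in label_counts.items():
        total_words = sum(word_counts[label].values())
        # Start from the log prior of the label.
        score = math.log(sample_count / total_samples)
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        scores[label] = score
    return max(scores, key=scores.get)


# Invented placeholder data, NOT taken from the real book-review corpus.
toy_samples = [
    ("alpha beta beta", "writer_a"),
    ("beta gamma beta", "writer_a"),
    ("delta epsilon delta", "writer_b"),
    ("epsilon delta alpha", "writer_b"),
]
model = train(toy_samples)
print(predict("beta beta alpha", model))
```

With so little data the prediction only reflects which words happened to appear under which label, which is also why results from a small training set deserve a critical eye.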

The concept of this website is inspired by a controversy in Norway that stemmed from a criminal case in which threatening letters had been sent. There was discussion about whether the letters were written by a man or a woman, because they used the word "tisse" instead of "pisse". You can read more about the case here.

You're welcome to contribute if you feel that something is missing or could be improved. If you have any experience with machine learning, we would highly appreciate your contributions. Before contributing, though, check out the contribution guide. If you know React or have any experience with web development and want to contribute to the frontend, head over to its repo.

A quick note

This website's purpose is to explore the differences between how men and women write. Note, though, that the models used to make predictions are not optimized and are trained on a fairly small dataset. In other words, you should think critically about the results. This project is not meant for any serious purpose or research.

Get started

Docker
  1. Run the backend
docker run -d -p 5000:5000 --name mann-eller-kvinne-backend ghcr.io/lblend/mann-eller-kvinne:latest

We recommend using the command supplied above. You are, however, free to change the variables as you please if you know what you're doing.

Note that changing the internal port will break the application unless you also change the code.

Also note that if you use our prebuilt images, you won't be able to change the internal port at all.
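Once the container is running, a quick way to confirm that something is listening on the published port is a plain TCP connection attempt. The sketch below assumes the container was started with the command above, so the host port is 5000; `is_port_open` is a hypothetical helper written for this example and does not call any endpoint of the API.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 5000 is the host side of `-p 5000:5000` in the docker run command.
    print("backend reachable:", is_port_open("127.0.0.1", 5000))
```

A successful connection only shows that the port is open, not that the application behind it is healthy.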

Manual
  1. Clone this repo and install the dependencies
  • Python 3.10+
  • pip
  2. Run the build/installation script: sh build.sh

Note that this script assumes that python3 on your PATH points to your Python installation. If this isn't the case, you have to modify the script or change your PATH accordingly.

  3. Run the API
    uvicorn src.main:app --host 0.0.0.0 --port 5000 --proxy-headers
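The `src.main:app` argument in the command above is an import string: the part before the colon names a Python module (`src.main`) and the part after it names an attribute in that module (`app`). As a rough sketch of how such a string can be resolved, and not uvicorn's actual implementation, the hypothetical `load_target` helper below does the same lookup with `importlib`.

```python
import importlib


def load_target(import_string: str):
    """Resolve a "module.path:attribute" string to the object it names."""
    module_path, _, attribute = import_string.partition(":")
    if not module_path or not attribute:
        raise ValueError(f"expected 'module:attribute', got {import_string!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attribute)


# Demonstrated with a standard-library target so the snippet runs anywhere;
# inside a checkout of this repo, "src.main:app" would be the string to pass.
dumps = load_target("json:dumps")
print(dumps({"port": 5000}))
```

This also explains why the command has to be run from the repository root: the `src` package must be importable from the current working directory.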
    

In addition to these options, you can use the docker-compose file included in this repo to run both the frontend and the backend through Docker. Note, however, that this option may limit your ability to configure the images as you wish. See the frontend repo for more information.

Thank you

We want to thank LtgOslo for making the corpus/dataset that is used to train the machine learning model.

You can find the source for this corpus here

@inproceedings{touileb-etal-2020-gender,
    title = "Gender and sentiment, critics and authors: a dataset of {N}orwegian book reviews",
    author = "Touileb, Samia and {\O}vrelid, Lilja and Velldal, Erik",
    booktitle = "Proceedings of the Second Workshop on Gender Bias in Natural Language Processing",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.gebnlp-1.11",
    pages = "125--138"
}