
hectoramirez/Language-localization_FIFA


Language localization for the FIFA videogame

End-to-end project involving exploratory data analysis, social media analysis and natural language processing.

This is an end-to-end project in which we aim to choose a localization language for the FIFA videogame using only publicly available data sets and data we collected ourselves.

  • A Medium Story of this project was featured in the Data Science and Programming topic pages, and was published in Towards Data Science.

  • Another Medium Story covering the process in Tweets processing and sentiment.ipynb was featured in the Data Science and Machine Learning topic pages, and was published in Towards Data Science.


Language localization is the process of adapting a product's translation to a specific country or region. It is the second phase of a larger process of product translation and cultural adaptation (for specific countries, regions, cultures or groups) to account for differences in distinct markets, a process known as internationalization and localization.

In this project we aim to find the best possible language into which to translate the next versions of Electronic Arts's videogame, FIFA. We have no official data for this, only the game's specifications and a full dataset of the attributes and skills of the players in the game. In addition, we use a collection of Twitter tweets that mention the game, each with a sentiment score. These data, however, are enough to reach an insightful conclusion about what would be a good language candidate for future versions.

Notebooks

  • FIFA_localization.ipynb: Main notebook with the analyses. The contents include:

    1. FIFA languages included in the game
    2. The FIFA 20 dataset
      1. Adding languages to the dataset
      2. Player's international reputations
    3. Twitter analysis
      1. Tweets with coordinates (exact location)
      2. Tweets by tweet language
      3. Tweets by user location
      4. Sentiment of the top languages by international reputation
    4. Conclusions
  • Tweets processing and sentiment.ipynb: Here we clean, process and translate the tweets and compute their sentiment. A copy of this notebook is also provided as a Python script, Tweets processing and sentiment.py.

  • Countries_dataset.ipynb: We construct a country/language CSV dataset from several public JSON files.
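The core join behind "Adding languages to the dataset" and "Player's international reputations" can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the JSON shape (`name`, `languages`), the player fields (`nationality`, `international_reputation`), the sample rows and the helper names `build_country_languages` and `reputation_by_language` are all hypothetical. The sketch maps each player's nationality to that country's languages and sums international reputation per language.

```python
from __future__ import annotations

import json
from collections import Counter

# Hypothetical country records in an assumed JSON shape: a list of
# {"name": ..., "languages": [...]} objects (not the real source files).
COUNTRIES_JSON = """
[
  {"name": "Spain", "languages": ["Spanish"]},
  {"name": "Brazil", "languages": ["Portuguese"]},
  {"name": "Switzerland", "languages": ["German", "French", "Italian"]}
]
"""

# Hypothetical player rows: nationality plus an international reputation score.
PLAYERS = [
    {"name": "Player A", "nationality": "Spain", "international_reputation": 4},
    {"name": "Player B", "nationality": "Brazil", "international_reputation": 5},
    {"name": "Player C", "nationality": "Switzerland", "international_reputation": 2},
    {"name": "Player D", "nationality": "Atlantis", "international_reputation": 3},
]


def build_country_languages(raw_json: str) -> dict[str, list[str]]:
    """Map each country name to the list of languages spoken there."""
    return {c["name"]: c["languages"] for c in json.loads(raw_json)}


def reputation_by_language(players, country_languages) -> Counter:
    """Sum international reputation per language.

    Players whose nationality is missing from the mapping are skipped;
    a player from a multilingual country counts toward every language.
    """
    totals = Counter()
    for p in players:
        for lang in country_languages.get(p["nationality"], []):
            totals[lang] += p["international_reputation"]
    return totals


if __name__ == "__main__":
    totals = reputation_by_language(PLAYERS, build_country_languages(COUNTRIES_JSON))
    print(totals.most_common(2))  # [('Portuguese', 5), ('Spanish', 4)]
```

Counting a multilingual country fully toward each of its languages is only one possible design choice; weighting by the share of speakers would be a reasonable alternative.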

Twitter/: This folder contains Python scripts used to collect the tweets processed in Tweets processing and sentiment.ipynb.
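The cleaning step of Tweets processing and sentiment.ipynb and the per-language sentiment comparison of the main notebook could look roughly like this. It is a hedged sketch using only the standard library: the regular-expression rules, the `lang` and `sentiment` keys and the helpers `clean_tweet` and `mean_sentiment_by_language` are assumptions for illustration. The notebook's own cleaning rules, translation step and sentiment model are not reproduced here; the sentiment score is taken as already computed.

```python
import re
from collections import defaultdict

# Assumed cleaning rules for common tweet noise (not the notebook's exact ones).
RT_PREFIX = re.compile(r"^RT\s+@\w+:\s*")  # leading "RT @user: "
URL = re.compile(r"https?://\S+")          # links
MENTION = re.compile(r"@\w+")              # @mentions
WHITESPACE = re.compile(r"\s+")            # runs of spaces/newlines


def clean_tweet(text: str) -> str:
    """Strip the retweet prefix, URLs, mentions and '#' signs, then normalize spaces."""
    text = RT_PREFIX.sub("", text)
    text = URL.sub("", text)
    text = MENTION.sub("", text)
    text = text.replace("#", "")  # keep the hashtag word, drop the symbol
    return WHITESPACE.sub(" ", text).strip()


def mean_sentiment_by_language(tweets):
    """Average a precomputed sentiment score over tweets grouped by language tag."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t in tweets:
        sums[t["lang"]] += t["sentiment"]
        counts[t["lang"]] += 1
    return {lang: sums[lang] / counts[lang] for lang in sums}


if __name__ == "__main__":
    print(clean_tweet("RT @ea: Loving #FIFA20 https://t.co/abc"))  # Loving FIFA20
    sample = [
        {"lang": "es", "sentiment": 0.5},
        {"lang": "es", "sentiment": 0.25},
        {"lang": "en", "sentiment": -0.5},
    ]
    print(mean_sentiment_by_language(sample))  # {'es': 0.375, 'en': -0.5}
```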

Note: Please keep in mind that Twitter does not allow data obtained from its API to be made publicly available. Therefore, some parts of the notebooks are not reproducible in Binder or Google Colab.

Results

The results are summarized in Plotly maps in the Results/ folder.