Gleb Abroskin edited this page Jun 11, 2017 · 3 revisions

The goal

The goal is to create a library that provides an easy-to-use interface to all of Dota 2's internal data, such as hero stats, ability specs and item descriptions. It can be used to explore and analyse the data, or to build a site around it.

Difference from other projects

Every ML project connected with Dota 2 that I have seen so far uses the same approach: take a huge number of matches and build a model around the pick data. With this approach the model has no information about a hero except one number: its id. But there is a lot of data about almost every hero, from attributes to hero type and laning preferences, so it can probably help build a better model and predict results more precisely.

The data

There is a lot of information in Dota 2, and I have tried to extract, clean and structure it.

Extraction

All the data is stored in txt files with special formatting. Examples of in-game files can be found in the data/702 folder. preprocessing/txt2json.py (a standalone script) contains functions that transform the game's txt files into JSON, which is much handier to use.
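
To give an idea of what such a conversion involves, here is a minimal sketch. It assumes the files follow the quoted key/value, brace-nested layout used by Valve's KeyValues files, with `//` line comments. The function names and the sample entry are made up for illustration and are not taken from preprocessing/txt2json.py.

```python
import json
import re

# Matches a quoted string, an opening brace, or a closing brace.
TOKEN_RE = re.compile(r'"((?:[^"\\]|\\.)*)"|(\{)|(\})')


def tokenize(text):
    """Yield braces and ("str", value) pairs from KeyValues-style text."""
    for line in text.splitlines():
        # Naive comment removal: would also cut a "//" inside a quoted value.
        line = line.split("//", 1)[0]
        for match in TOKEN_RE.finditer(line):
            string, open_brace, close_brace = match.groups()
            if open_brace:
                yield "{"
            elif close_brace:
                yield "}"
            else:
                yield ("str", string)


def parse(tokens):
    """Build a nested dict from the token stream."""
    root = {}
    stack = [root]
    key = None
    for token in tokens:
        if token == "{":
            # The previous string was the name of a nested section.
            child = {}
            stack[-1][key] = child
            stack.append(child)
            key = None
        elif token == "}":
            stack.pop()
        elif key is None:
            key = token[1]
        else:
            # Second string in a row: it is the value for the pending key.
            stack[-1][key] = token[1]
            key = None
    return root


def txt_to_json(text):
    """Convert KeyValues-style text to a JSON string."""
    return json.dumps(parse(tokenize(text)), indent=2)


# Hypothetical sample, not copied from the game files.
SAMPLE = '''
"DOTAHeroes"
{
    // illustrative entry
    "npc_dota_hero_example"
    {
        "HeroID"        "1"
        "ArmorPhysical" "-1"
    }
}
'''

if __name__ == "__main__":
    print(txt_to_json(SAMPLE))
```

All values stay strings in this sketch; converting them to numbers is a separate step and depends on the field.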

Cleaning

The information about abilities is quite messy: there is no fixed set of variables or columns in which skills are described. A variable name can easily contain the name of the ability (arow_stun), max_damage and damage_max are the same thing under different names, and so on. I tried to clean up that mess without losing any information, and ended up with 837 variables instead of the 1400+ I started with. preprocessing/abilities.py contains all the functions needed to perform this cleaning, or, I would say, mining.
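
The two problems above (ability names leaking into variable names, and the same words in a different order) can be illustrated with a small sketch. This is not the code from preprocessing/abilities.py: the helper names, the ability `frost_bolt` and the variable `bolt_stun` are all hypothetical, and the rule "drop tokens shared with the ability name, then sort the rest" is only one possible heuristic that could merge unrelated variables in some cases.

```python
from collections import defaultdict


def canonical_name(variable, ability_name):
    """Return a canonical form of a raw ability variable name.

    Assumptions (illustrative only):
    * tokens that also occur in the ability name are redundant and dropped;
    * token order carries no meaning, so the remaining tokens are sorted.
    """
    ability_tokens = set(ability_name.lower().split("_"))
    all_tokens = [t for t in variable.lower().split("_") if t]
    tokens = [t for t in all_tokens if t not in ability_tokens]
    # Fall back to the full name if every token was stripped.
    if not tokens:
        tokens = all_tokens
    return "_".join(sorted(tokens))


def group_variables(pairs):
    """Group raw variable names by canonical name.

    `pairs` is an iterable of (ability_name, variable_name) tuples.
    """
    groups = defaultdict(set)
    for ability, variable in pairs:
        groups[canonical_name(variable, ability)].add(variable)
    return dict(groups)


if __name__ == "__main__":
    raw = [
        ("frost_bolt", "bolt_stun"),    # ability name inside the variable
        ("frost_bolt", "max_damage"),
        ("other_spell", "damage_max"),  # same meaning, different order
    ]
    for canonical, originals in sorted(group_variables(raw).items()):
        print(canonical, sorted(originals))
```

Keeping the original names in each group means no information is lost: the mapping back to the raw files is still available.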

Storing

I wanted to learn some SQLAlchemy, so I chose it as the ORM, which is used instead of raw SQL queries. All the models are stored in the 'models/' folder, one file per model. The schemas are stored in 'data/db_schemas.json' and wrapped by a set of functions in 'db/schemas.py'. It is done this way because I wanted to separate the schemas from the code and hide the implementation behind a set of similar functions like get_<tablename>_scheme.
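
A rough sketch of that idea, using only the standard library: the JSON string stands in for 'data/db_schemas.json', and the table names, column names and type strings in it are invented for illustration. The real layout of the schema file and of 'db/schemas.py' may differ.

```python
import json

# Stand-in for the contents of data/db_schemas.json (hypothetical layout).
SCHEMAS_JSON = '''
{
    "heroes": {"id": "Integer", "name": "String"},
    "items":  {"id": "Integer", "cost": "Integer"}
}
'''

_SCHEMAS = json.loads(SCHEMAS_JSON)


def get_scheme(table_name):
    """Return a copy of the schema for one table.

    A copy is returned so callers cannot mutate the shared definition.
    """
    return dict(_SCHEMAS[table_name])


# Thin wrappers following the get_<tablename>_scheme naming pattern.
def get_heroes_scheme():
    return get_scheme("heroes")


def get_items_scheme():
    return get_scheme("items")


if __name__ == "__main__":
    print(get_heroes_scheme())
```

Callers only ever see the get_<tablename>_scheme functions, so the storage format can change (another file, another structure) without touching the code that builds the models.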

Structuring

All the information is structured in classes, and the classes are organised as follows: there are 2 interfaces (as I call them), Member and Group.

Member

A descendant of this class contains information about a single member of some kind (hero, ability, item). It has only 2 required fields: id and model. Everything in the game has an id, so it is stored; the model is one of the classes from 'models/', to which queries about this object are sent.
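
As a minimal sketch of this structure, assuming only the two fields described above: `HeroModel` is a stand-in for a class from 'models/' backed by a plain dict instead of a database, and the `get` and `fetch` methods are hypothetical names used for illustration.

```python
class HeroModel:
    """Stand-in for one of the classes in models/ (hypothetical)."""

    # Fake table contents; the real model would query the database.
    _rows = {1: {"name": "example_hero", "primary_attribute": "agility"}}

    @classmethod
    def get(cls, id_):
        return cls._rows[id_]


class Member:
    """Base class: subclasses set `model`, instances carry `id`."""

    model = None

    def __init__(self, id_):
        if self.model is None:
            raise NotImplementedError("subclasses must set `model`")
        self.id = id_

    def fetch(self):
        """Send a query about this object to its model."""
        return self.model.get(self.id)


class Hero(Member):
    model = HeroModel


if __name__ == "__main__":
    hero = Hero(1)
    print(hero.id, hero.fetch()["name"])
```

Because the base class only depends on `id` and `model`, an ability or item class would be defined the same way as `Hero`, pointing at its own model.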
