Home
The goal is to create a library that provides an easy-to-use interface to all of DotA2's internal data, such as hero stats, ability specifications and item descriptions. This can be used to explore and analyse the data, or to build a site around it.
Every ML project connected with DotA2 that I have seen so far uses the same approach: take a huge number of matches and build a model around the pick data. In this approach, the model knows nothing about a hero except one number, its id. But a lot of data exists for almost every hero, from attributes to hero type and laning preferences. It could probably help build a better model and predict results more precisely.
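Here is a minimal sketch of both encodings, to make the contrast concrete. `pick_features` is the common pick-only encoding: +1 for a Radiant pick and -1 for a Dire pick, indexed directly by hero id. `match_features` shows one possible way to use hero data: it appends the Radiant-minus-Dire difference of summed numeric hero attributes. The function names, the attribute keys and the `hero_stats` values are hypothetical illustrations, not part of the library.

```python
from typing import Dict, List, Sequence

def pick_features(radiant: Sequence[int], dire: Sequence[int], n_heroes: int) -> List[float]:
    """Pick-only encoding: the model sees each hero only through its id (its index)."""
    x = [0.0] * n_heroes
    for hero_id in radiant:
        x[hero_id] += 1.0
    for hero_id in dire:
        x[hero_id] -= 1.0
    return x

def team_totals(team: Sequence[int], hero_stats: Dict[int, Dict[str, float]],
                keys: Sequence[str]) -> List[float]:
    """Sum each numeric attribute over the heroes picked by one team."""
    return [sum(hero_stats[h][k] for h in team) for k in keys]

def match_features(radiant, dire, hero_stats, keys, n_heroes):
    """Pick encoding followed by Radiant-minus-Dire attribute differences."""
    radiant_totals = team_totals(radiant, hero_stats, keys)
    dire_totals = team_totals(dire, hero_stats, keys)
    diffs = [r - d for r, d in zip(radiant_totals, dire_totals)]
    return pick_features(radiant, dire, n_heroes) + diffs

# Hypothetical stats for three heroes; real values would come from the game files.
hero_stats = {
    1: {"base_str": 20, "melee": 1},
    2: {"base_str": 25, "melee": 1},
    3: {"base_str": 18, "melee": 0},
}
print(match_features([1, 2], [3], hero_stats, ["base_str", "melee"], n_heroes=4))
# [0.0, 1.0, 1.0, -1.0, 27, 2]
```

Summing attributes per team is only one option. What matters is that each hero now contributes more to the model than its id.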
There is a lot of information in DotA2, and I have tried to extract, clean and structure it.
All the data is stored in .txt files with a special formatting; examples of in-game files can be found in the data/702 folder. preprocessing/txt2json.py (a standalone script) contains functions that transform the game's .txt files into JSON, which is much handier to use.
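The game's .txt files use Valve's KeyValues layout. Each quoted key is followed either by a quoted value or by a `{ ... }` block, and `//` starts a line comment. Below is a minimal sketch of such a conversion, assuming every key and value is quoted. It is not the code in preprocessing/txt2json.py, and `tokenize`, `parse_kv` and `kv_to_json` are hypothetical names. Duplicate keys simply overwrite earlier ones here; the real files may need that handled differently.

```python
import json
import re
from typing import Any, Dict, List, Tuple

# One match per step: a quoted string, "{", "}", a // comment, or whitespace.
_TOKEN_RE = re.compile(r'"([^"]*)"|(\{)|(\})|//[^\n]*|\s+')

def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split KeyValues text into ("str" | "open" | "close", value) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = _TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        if m.group(1) is not None:
            tokens.append(("str", m.group(1)))
        elif m.group(2):
            tokens.append(("open", "{"))
        elif m.group(3):
            tokens.append(("close", "}"))
        # comments and whitespace produce no token
        pos = m.end()
    return tokens

def _parse_block(tokens, i, nested):
    """Parse key/value pairs starting at tokens[i]; return (dict, next index)."""
    result: Dict[str, Any] = {}
    while i < len(tokens):
        kind, value = tokens[i]
        if kind == "close":
            if not nested:
                raise ValueError("unmatched '}'")
            return result, i + 1
        if kind != "str" or i + 1 >= len(tokens):
            raise ValueError(f"expected a key/value pair at token {i}")
        next_kind, next_value = tokens[i + 1]
        if next_kind == "str":        # "key" "value"
            result[value] = next_value
            i += 2
        elif next_kind == "open":     # "key" { ... }
            result[value], i = _parse_block(tokens, i + 2, True)
        else:
            raise ValueError(f"key {value!r} is followed by '}}'")
    if nested:
        raise ValueError("missing '}'")
    return result, i

def parse_kv(text: str) -> Dict[str, Any]:
    data, _ = _parse_block(tokenize(text), 0, False)
    return data

def kv_to_json(text: str) -> str:
    return json.dumps(parse_kv(text), indent=2)

sample = '''
// illustrative snippet in the same layout as the game files
"DOTAHeroes"
{
    "npc_dota_hero_axe"
    {
        "HeroID"    "2"
    }
}
'''
print(kv_to_json(sample))
```
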
The information about abilities is quite strange: there is no fixed set of variables or columns that describes skills. A variable name can contain the name of the ability (arow_stun), max_damage and damage_max are the same thing but named differently, and so on. I tried to clean all that mess up without losing any information, and in the end I got 837 variables instead of the 1400+ I started with. preprocessing/abilities.py contains all the functions needed to perform this cleaning, or, I would say, mining.
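One way to merge variables that differ only in word order, such as max_damage and damage_max, is to sort the underscore-separated parts of each name. Names that embed the ability, like arow_stun, can be handled with an explicit alias table. The sketch below shows that idea. It is an assumption about how such cleaning could work, not the contents of preprocessing/abilities.py, and the alias target `stun_duration` is a hypothetical choice.

```python
from typing import Dict, Mapping, Set

# Hypothetical alias table: ability-specific names mapped to a generic name.
ALIASES: Dict[str, str] = {"arow_stun": "stun_duration"}

def normalize(name: str, aliases: Mapping[str, str] = ALIASES) -> str:
    """Canonical variable name: apply aliases, then sort the underscore parts."""
    name = aliases.get(name, name).lower()
    return "_".join(sorted(name.split("_")))

def clean_ability(variables: Mapping[str, str],
                  aliases: Mapping[str, str] = ALIASES) -> Dict[str, str]:
    """Rename one ability's variables; refuse to merge names holding different values."""
    cleaned: Dict[str, str] = {}
    for name, value in variables.items():
        key = normalize(name, aliases)
        if key in cleaned and cleaned[key] != value:
            raise ValueError(f"{name!r} conflicts with an existing {key!r}")
        cleaned[key] = value
    return cleaned

def all_columns(abilities: Mapping[str, Mapping[str, str]],
                aliases: Mapping[str, str] = ALIASES) -> Set[str]:
    """Set of canonical variable names across all abilities."""
    return {normalize(name, aliases) for variables in abilities.values() for name in variables}

abilities = {
    "ability_a": {"max_damage": "100", "radius": "300"},
    "ability_b": {"damage_max": "150", "arow_stun": "2"},
}
print(sorted(all_columns(abilities)))  # ['damage_max', 'duration_stun', 'radius']
```

Sorting the parts is aggressive. It would also merge names whose word order carries meaning, so a real cleaner would likely need to review each merge.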
I wanted to learn some SQLAlchemy, so I chose it as the ORM, which is used instead of raw SQL queries. All the models are stored in the 'models/' folder, one file per model. The schemas are stored in 'data/db_schemas.json' and wrapped by a set of functions in 'db/schemas.py'. It is done this way because I wanted to separate the schemas from the code and hide how they are implemented behind a set of similar functions like get_<tablename>_scheme.
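The idea of hiding JSON-stored schemas behind uniform get_<tablename>_scheme functions can be sketched with a small factory. The JSON layout here (table name mapped to column name and type string) and the table names are assumptions, since the structure of data/db_schemas.json is not described. Turning the type strings into SQLAlchemy column types is left out, so the example needs only the standard library.

```python
import json
from typing import Callable, Dict

# Assumed layout of a schemas file: {table: {column: type name}}.
SCHEMAS_JSON = """
{
  "heroes": {"id": "Integer", "name": "String"},
  "items": {"id": "Integer", "name": "String", "cost": "Integer"}
}
"""

def make_scheme_getters(
    schemas: Dict[str, Dict[str, str]]
) -> Dict[str, Callable[[], Dict[str, str]]]:
    """Build one get_<tablename>_scheme function per table."""
    getters = {}
    for table, columns in schemas.items():
        # Bind `columns` as a default argument so each function keeps its own table.
        def getter(_columns: Dict[str, str] = columns) -> Dict[str, str]:
            return dict(_columns)  # a copy, so callers cannot modify the shared schema
        getter.__name__ = f"get_{table}_scheme"
        getters[getter.__name__] = getter
    return getters

# Expose get_heroes_scheme(), get_items_scheme(), ... at module level.
globals().update(make_scheme_getters(json.loads(SCHEMAS_JSON)))
print(get_heroes_scheme())  # {'id': 'Integer', 'name': 'String'}
```

Note the default-argument binding. A plain closure over the loop variable would make every getter return the last table's columns.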
All the information is structured in classes, and the classes are organised as follows: there are 2 interfaces (as I call them), Member and Group.
A descendant of Member contains information about a single member of some kind (hero, ability, item). It has only 2 required fields: id and model. Everything in the game has an id, so it is stored, and model is the class from 'models/' to which queries about this object are sent.
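Below is a minimal sketch of the Member interface as described: an id plus a model to send queries to. It is paired with a Group that simply holds Members of one kind. Group's behaviour is not described in this section, so it is an assumption here, as are HeroModel, Hero, Heroes and FakeSession. `load` relies on SQLAlchemy's `Session.get(entity, ident)` (available since 1.4). FakeSession only mimics that method so the example runs without a database.

```python
from typing import Any, ClassVar, Iterable, Iterator, List, Type

class Member:
    """A single game object (hero, ability, item): its id plus the ORM model to query."""
    model: ClassVar[Type[Any]]  # each descendant sets this to one of the models/ classes

    def __init__(self, id: int) -> None:
        self.id = id

    def load(self, session: Any) -> Any:
        # SQLAlchemy 1.4+: Session.get(Model, primary_key) returns the row or None.
        return session.get(self.model, self.id)


class Group:
    """Assumed design: an iterable collection of Members of one kind."""
    member_class: ClassVar[Type[Member]]

    def __init__(self, ids: Iterable[int]) -> None:
        self.members: List[Member] = [self.member_class(i) for i in ids]

    def __iter__(self) -> Iterator[Member]:
        return iter(self.members)

    def __len__(self) -> int:
        return len(self.members)


class HeroModel:  # hypothetical stand-in for a class from models/
    pass

class Hero(Member):
    model = HeroModel

class Heroes(Group):
    member_class = Hero

class FakeSession:
    """Mimics Session.get() with an in-memory dict keyed by (model, id)."""
    def __init__(self, rows: dict) -> None:
        self.rows = rows

    def get(self, model: Any, ident: int) -> Any:
        return self.rows.get((model, ident))

session = FakeSession({(HeroModel, 1): "hero row 1"})
heroes = Heroes([1, 2])
print(len(heroes), [h.id for h in heroes])  # 2 [1, 2]
print(Hero(1).load(session))                # hero row 1
```

Keeping `model` as a class attribute means every Hero queries the same table, while each instance carries only its own id.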
Contacts: