Clustering Twitter conversations
This repository contains code for examining clusters of conversation on Twitter and how they evolve over time.
Relevant files include:
- Conversational Cluster Detection and Transition Likelihood Calculation Python Version
- Conversational Cluster Detection and Transition Likelihood Calculation Unicage Version
- Performance Comparison between Python and Unicage
- Data collection scripts
- Visualization
To configure the locations of the data, working, and tools directories, edit the config.json.template file to include the relative paths for:
- data_dir: the location of raw Twitter message dumps
- python_working_dir: where intermediate Python files should be stored
- unicage_working_dir: where intermediate Unicage working files should be stored
- cos-parallel: the path to the cos-parallel tool
- maximal_cliques: the path to cos-parallel's 'maximal cliques' utility
For example:
{"maximal_cliques": "tools/cosparallel-0.99/extras/./maximal_cliques",
"cos-parallel": "tools/cosparallel-0.99/extras/./maximal_cliques",
"python_working_dir": "../PYTHON/working/",
"unicage_working_dir": "../UNICAGE/working/",
"data_dir": "../data/" }
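A script that consumes this configuration might load it roughly as follows. This is a minimal sketch, not the repository's own loader: the function name load_config, the file name config.json (a filled-in copy of the template), and the choice to resolve relative paths against the directory containing the config file are all assumptions made for illustration.

```python
import json
from pathlib import Path

# The five keys described in this README.
REQUIRED_KEYS = (
    "data_dir",
    "python_working_dir",
    "unicage_working_dir",
    "cos-parallel",
    "maximal_cliques",
)


def load_config(config_path):
    """Read the JSON config and return a dict of absolute paths.

    Hypothetical helper: relative paths are resolved against the folder
    that holds the config file, which is an assumption of this sketch.
    """
    config_path = Path(config_path)
    with config_path.open() as f:
        raw = json.load(f)

    # Fail early if any expected key is absent.
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise KeyError("config is missing keys: " + ", ".join(missing))

    base = config_path.resolve().parent
    return {key: (base / raw[key]).resolve() for key in REQUIRED_KEYS}
```

With a completed config file in place, `config = load_config("config.json")` would give, for instance, `config["data_dir"]` as an absolute path to the raw message dumps.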