flowerpower

This software was created for and used in the paper: Jaspreet Singh, Sergej Zerr, and Stefan Siersdorfer. 2017. Structure-Aware Visualization of Text Corpora. In Proceedings of the 2017 Conference on Conference Human Information Interaction and Retrieval (CHIIR '17).

A very short guide (a more detailed one will hopefully follow later):

Create a main folder for your dataset
Every subfolder in that folder is assumed to contain text files from a particular category
Every line in a text file is one document
The flower creator class is FlowerPower/src/main/java/de/l3s/analysis/topicflower/TopicFlowerCreator.java. Start it with --help and it will tell you which options are required.
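
The folder layout described above can be illustrated with a small sketch. This is not code from the FlowerPower package: the class `DatasetLayoutSketch` and the method `readCorpus` are hypothetical names, and reading files as UTF-8 and skipping blank lines are assumptions made only for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, NOT part of FlowerPower. It only illustrates the layout:
//   mainFolder/<category>/<text file>, one document per line.
final class DatasetLayoutSketch {

    /** Maps each category (subfolder name) to the documents found in its text files. */
    static Map<String, List<String>> readCorpus(Path mainFolder) throws IOException {
        Map<String, List<String>> documentsByCategory = new TreeMap<>();
        for (Path categoryDir : listSorted(mainFolder)) {
            if (!Files.isDirectory(categoryDir)) {
                continue; // only subfolders count as categories
            }
            List<String> documents = new ArrayList<>();
            for (Path file : listSorted(categoryDir)) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                // Assumption: files are UTF-8 encoded.
                for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                    // Assumption: blank lines are not documents.
                    if (!line.trim().isEmpty()) {
                        documents.add(line);
                    }
                }
            }
            documentsByCategory.put(categoryDir.getFileName().toString(), documents);
        }
        return documentsByCategory;
    }

    /** Lists the direct children of a folder in a stable (sorted) order. */
    private static List<Path> listSorted(Path folder) throws IOException {
        try (Stream<Path> entries = Files.list(folder)) {
            return entries.sorted().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: pass the path of the main dataset folder as the first argument.
        Map<String, List<String>> corpus = readCorpus(Paths.get(args[0]));
        for (Map.Entry<String, List<String>> entry : corpus.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue().size() + " documents");
        }
    }
}
```

Running this on your main folder prints the number of documents per category, which is a quick way to check that the layout is what you intended before creating a flower.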
If your flower is garbage, there could be several reasons:

a) Not enough text: the corpus is too small (topics make no sense)

b) Too much overlapping information between the categories (same topics in different categories)

c) The number of topics k is too high or too low (all topics look the same, or are too general)

d) The text needs cleaning (strange or meaningless terms in topic labels)

The package can help with some, but not all, of these problems:

a) It is possible to provide an additional folder with text as background information

b, d) There is a class that helps to identify corpus-specific stopwords: https://github.com/sergejzr/flowerpower/blob/master/FlowerPower/src/main/java/de/l3s/analysis/topicflower/StatisticsAnalysis.java Run it with -h and it will tell you the details

d) There is a class for text cleaning. It creates a copy of the folder structure and writes cleaned versions of all files from the main folder into it: https://github.com/sergejzr/flowerpower/blob/master/FlowerPower/src/main/java/de/l3s/analysis/topicflower/TextFileCleaner.java

For c) the user is responsible. Try different models and see which one makes sense. A tool for estimating the optimal k is in our research pipeline, but it is not expected soon.
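
To give a rough idea of what "corpus-specific stopwords" can mean for the overlap and cleaning problems above, here is a minimal sketch. It does not reproduce what StatisticsAnalysis.java does. The class `StopwordCandidatesSketch`, the method `termsSharedByAllCategories`, the simple tokenizer, and the heuristic itself (a term that occurs in every category is a stopword candidate) are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch, NOT the algorithm used by FlowerPower's StatisticsAnalysis.
final class StopwordCandidatesSketch {

    /**
     * Returns the terms that occur in at least one document of every category.
     * Assumed heuristic: such terms do not help to tell categories apart,
     * so they are candidates for a corpus-specific stopword list.
     */
    static SortedSet<String> termsSharedByAllCategories(Map<String, List<String>> documentsByCategory) {
        SortedSet<String> shared = null;
        for (List<String> documents : documentsByCategory.values()) {
            Set<String> termsInCategory = new HashSet<>();
            for (String document : documents) {
                // Assumed tokenizer: lowercase, split on anything that is not a letter.
                for (String token : document.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
                    if (!token.isEmpty()) {
                        termsInCategory.add(token);
                    }
                }
            }
            if (shared == null) {
                shared = new TreeSet<>(termsInCategory); // first category seeds the set
            } else {
                shared.retainAll(termsInCategory);       // keep only terms seen in this category too
            }
        }
        return shared == null ? new TreeSet<String>() : shared;
    }

    public static void main(String[] args) {
        // Made-up toy corpus: category name -> documents (one string per document).
        Map<String, List<String>> corpus = new LinkedHashMap<>();
        corpus.put("sports", Arrays.asList("The match was great", "the team won"));
        corpus.put("politics", Arrays.asList("The vote was close"));
        System.out.println(termsSharedByAllCategories(corpus));
    }
}
```

On a real corpus you would review the candidates by hand before removing them, since a term shared by all categories can still be meaningful.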
