# LTM (Lifelong Topic Model)

LTM is an open-source Java package, created by Zhiyuan (Brett) Chen, that implements the algorithm proposed in (Chen and Liu, ICML 2014). For more details, please refer to that paper.

If you use this package, please cite the paper: Zhiyuan Chen and Bing Liu. Topic Modeling using Topics from Many Domains, Lifelong Learning and Big Data. In Proceedings of ICML 2014, pages 703-711.

If you have any questions or bug reports, please send them to Zhiyuan (Brett) Chen ([email protected]).

## Table of Contents

  * Quick Start
  * Commandline Arguments
  * Input and Output
  * Efficiency
  * Contact Information

## Quick Start

First, clone the repo: git clone https://github.com/czyuan/LTM.git.

Then, two quick-start options are available:

  1. Import the directory into Eclipse (recommended).

If you get the exception java.lang.OutOfMemoryError, please increase the Java heap memory for Eclipse: http://www.mkyong.com/eclipse/eclipse-java-lang-outofmemoryerror-java-heap-space/.

  2. Use Maven

a. Change the current working directory to Src.

cd LTM/Src

b. Build the package.

mvn clean package

c. Increase the Java heap memory for Maven.

export MAVEN_OPTS=-Xmx1024m

d. Run the program.

mvn exec:java -Dexec.mainClass="launch.MainEntry"

## Commandline Arguments

The command-line arguments are defined in the file "global/CmdOption.java". If no argument is provided, the program uses the default values. There are several arguments that you may want to change (an example invocation follows the list):
  1. -i: the path of the input domains directory.
  2. -o: the path of the output model directory.
  3. -nthreads: the number of threads used by the program; the domains are processed in parallel with multithreading.
  4. -nTopics: the number of topics used in the topic model for each domain.
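
For example, when running through Maven as in the quick start above, the arguments can be passed via the standard exec.args property (the paths and values below are placeholders, not paths shipped with the package):

```
mvn exec:java -Dexec.mainClass="launch.MainEntry" -Dexec.args="-i ./InputDomains/ -o ./OutputModels/ -nthreads 4 -nTopics 15"
```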

## Input and Output

### Input

The input directory should contain the domain files. For each domain, there are two plain-text files (a small made-up example follows the list):
  1. domain.docs: each line (representing a document) contains a list of word ids. Here, a document is a sentence in a review after preprocessing.
  2. domain.vocab: mapping from word id (starting from 0) to word, separated by ":".
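
A minimal illustration of the two formats; the word ids and words below are invented for this example and are not taken from the actual data.

domain.docs (one document, i.e., one preprocessed review sentence, per line, as space-separated word ids):

```
0 2 5 1
3 4 0
```

domain.vocab (word id to word, separated by ":"):

```
0:battery
1:life
2:screen
3:price
4:quality
5:charge
```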

### Output

The output directory contains the topic model results for each learning iteration (a learning iteration is different from a Gibbs sampling iteration; see the paper for details). LearningIteration 0 is always LDA, i.e., run without any knowledge. LearningIteration i with i > 0 is the LTM model, and the knowledge used in LearningIteration i is extracted from LearningIteration i - 1.
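
Assuming the iteration folders are named after their index (the exact folder names in your output may differ), the top level of the output directory looks roughly like this:

```
<output dir>/
  LearningIteration0/   <- LDA, no knowledge
  LearningIteration1/   <- LTM, knowledge extracted from iteration 0
  LearningIteration2/   <- LTM, knowledge extracted from iteration 1
  ...
```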

Under each learning iteration folder, in the sub-folder "DomainModels", there is one folder per domain holding that domain's topic model results. Each domain folder contains 8 plain-text files (a small reading example follows the list):

  1. domain.docs: each line (representing a document) contains a list of word ids.
  2. domain.dtopicdist: document-topic distribution.
  3. domain.knowl: record the knowledge (for LTM only).
  4. domain.param: parameter settings.
  5. domain.tassign: topic assignment for each word in each document.
  6. domain.twdist: topic-word distribution.
  7. domain.twords: top words under each topic. Columns are separated by '\t'; each column corresponds to one topic.
  8. domain.vocab: mapping from word id (starting from 0) to word.
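
As a quick sanity check of a trained model, the domain.twords file can be inspected outside the package. The small standalone class below is only an illustrative sketch based on the format described above (one word per topic per line, columns separated by '\t'); it is not part of LTM, and the exact file layout (e.g., possible header rows) may differ.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Illustrative reader for a <domain>.twords file: column j collects the top words of topic j.
public class TopWordsReader {
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        if (lines.isEmpty()) {
            return;
        }
        // The number of tab-separated columns on a line equals the number of topics.
        int nTopics = lines.get(0).split("\t", -1).length;
        List<List<String>> topics = new ArrayList<>();
        for (int t = 0; t < nTopics; t++) {
            topics.add(new ArrayList<>());
        }
        // Collect each column's words into the corresponding topic's list.
        for (String line : lines) {
            String[] cols = line.split("\t", -1);
            for (int t = 0; t < nTopics && t < cols.length; t++) {
                if (!cols[t].trim().isEmpty()) {
                    topics.get(t).add(cols[t].trim());
                }
            }
        }
        for (int t = 0; t < nTopics; t++) {
            System.out.println("Topic " + t + ": " + String.join(" ", topics.get(t)));
        }
    }
}
```

Compiled and run with the path to a domain.twords file as its single argument, it prints one line per topic listing that topic's top words.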

## Efficiency

The program and its parameters are set to achieve the best topic coherence quality rather than the best running time. There are several ways to improve efficiency (from the simplest to the hardest):
  1. Increase the number of threads used by the program (specified by -nthreads in the file "global/CmdOption.java"). The topic models of the different domains are executed in parallel using multithreading.
  2. Reduce the frequency of updating knowledge in Gibbs sampling (i.e., knowledgeUpdatelag in the file "model/ModelParameters.java"). The default setting is 1, meaning the knowledge is updated in each Gibbs sampling iteration. Setting this value to any number from 10 to 50 will greatly reduce the execution time while slightly deteriorating the topic quality.
  3. Use a better implementation of the Apriori algorithm, or a faster frequent itemset mining algorithm such as FP-growth.

## Contact Information

  * Author: Zhiyuan (Brett) Chen
  * Affiliation: University of Illinois at Chicago
  * Research Area: Text Mining, Machine Learning, Statistical Natural Language Processing, and Data Mining
  * Email: [email protected]
  * Homepage: http://www.cs.uic.edu/~zchen/
