🍅 tomoto - high performance topic modeling - for Ruby
Add this line to your application’s Gemfile:
gem "tomoto"
Train a model
model = Tomoto::LDA.new(k: 2)
model.add_doc("text from document one")
model.add_doc("text from document two")
model.add_doc("text from document three")
model.train(100) # iterations
Get the summary
model.summary
Get topic words
model.topic_words
Save the model to a file
model.save("model.bin")
Load the model from a file
model = Tomoto::LDA.load("model.bin")
Get topic probabilities for a document
doc = model.docs[0]
doc.topics
Get the number of words for each topic
model.count_by_topics
Get the vocab
model.vocabs
Get the log likelihood per word
model.ll_per_word
Perform inference for unseen documents
doc = model.make_doc("unseen doc")
topic_dist, ll = model.infer(doc)
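Once you have a topic distribution, you will often want the most likely topics for the document. The sketch below assumes topic_dist is a plain array of probabilities indexed by topic id; top_topics is a hypothetical helper, not part of tomoto, and the sample values are made up for illustration.

```ruby
# Hypothetical helper (not part of tomoto): returns the n most probable
# topics as [topic_id, probability] pairs, highest probability first.
# Assumes topic_dist is an array of floats indexed by topic id.
def top_topics(topic_dist, n = 1)
  topic_dist
    .each_with_index
    .sort_by { |prob, _id| -prob }
    .first(n)
    .map { |prob, id| [id, prob] }
end

# Made-up distribution standing in for the value returned by model.infer(doc)
sample_dist = [0.15, 0.85]
best_id, best_prob = top_topics(sample_dist).first
```

With a real model, you would pass the topic_dist returned by model.infer(doc) instead of sample_dist.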
Supports:
- Latent Dirichlet Allocation (LDA)
- Labeled LDA (LLDA)
- Partially Labeled LDA (PLDA)
- Supervised LDA (SLDA)
- Dirichlet Multinomial Regression (DMR)
- Generalized Dirichlet Multinomial Regression (GDMR)
- Hierarchical Dirichlet Process (HDP)
- Hierarchical LDA (HLDA)
- Multi Grain LDA (MGLDA)
- Pachinko Allocation (PA)
- Hierarchical PA (HPA)
- Correlated Topic Model (CT)
- Dynamic Topic Model (DT)
This library follows the tomotopy API. There are a few changes to make it more Ruby-like:
- The get_ prefix has been removed from methods (topic_words instead of get_topic_words)
- Methods that return booleans use ? instead of is_ (live_topic? instead of is_live_topic)
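The two naming rules can be sketched as a small lookup helper for translating names from the tomotopy docs. ruby_method_name is a hypothetical function written for illustration and is not part of tomoto. It only applies the two rules stated above, so individual methods may still differ.

```ruby
# Hypothetical helper (not part of tomoto): maps a tomotopy (Python) method
# name to the Ruby-style name implied by the two rules above.
def ruby_method_name(python_name)
  case python_name
  when /\Aget_(.+)\z/ then Regexp.last_match(1)        # drop the get_ prefix
  when /\Ais_(.+)\z/  then "#{Regexp.last_match(1)}?"  # is_ prefix becomes a ? suffix
  else python_name                                     # other names are unchanged
  end
end

ruby_method_name("get_topic_words") # => "topic_words"
ruby_method_name("is_live_topic")   # => "live_topic?"
```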
If a method or option you need isn’t supported, feel free to open an issue.
Documents are tokenized by whitespace by default, or you can perform your own tokenization.
model.add_doc(["tokens", "from", "document", "one"])
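As a minimal sketch of custom tokenization, the helper below lowercases the text, splits on punctuation and whitespace, and drops a few stop words. The tokenize method and the STOP_WORDS list are assumptions made for illustration and are not part of tomoto.

```ruby
require "set"

# Hypothetical stop word list for illustration; not provided by tomoto.
STOP_WORDS = Set.new(%w[a an and from of the to])

# Lowercase the text, split on any run of characters that is not a letter,
# digit, or apostrophe, then drop empty strings and stop words.
def tokenize(text)
  text.downcase
      .split(/[^a-z0-9']+/)
      .reject { |token| token.empty? || STOP_WORDS.include?(token) }
end

tokens = tokenize("Text from document one, with punctuation!")
# tokens can then be passed to the model: model.add_doc(tokens)
```

Tokenizing up front like this keeps punctuation and casing variants from becoming separate vocabulary entries, which whitespace splitting alone would not do.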
tomoto uses AVX2, AVX, or SSE2 instructions to increase performance on machines that support them. Check which instruction set architecture it’s using with:
Tomoto.isa
Choose a parallelism algorithm with:
model.train(parallel: :partition)
Supported values are :default, :none, :copy_merge, and :partition.
View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone --recursive https://github.com/ankane/tomoto-ruby.git
cd tomoto-ruby
bundle install
bundle exec rake compile
bundle exec rake test