A small Flask API, written in Python, for topic modeling with Latent Dirichlet Allocation (LDA) on a collection of documents.

leonardomra/topic-modelling

topic-modelling

The API offers two endpoints:

  • POST /topics: suggests topics for the uploaded dataset, each with a list of its most representative terms.
      • Data param: file (the dataset CSV, e.g. dataset.csv)
      • Example of success response:
        [{"title": "Suggested Topic", "terms": ["cell", "epithelial", "type", "epithelium", "airway", "human", "tissue", "cf", "ifn", "expression"]}, {"title": "Suggested Topic", "terms": ["patient", "hospital", "study", "group", "icu", "care", "\u00b1", "result", "day", "ed"]},...]
  • POST /count: returns a count for each term in the uploaded dataset.
      • Data param: file (the dataset CSV, e.g. dataset.csv)
      • Example of success response:
        {"count": {"analysis": 81773, "case": 83358, "include": 84623, "group": 85330, "human": 85863, "gene": 92091, ...}}
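A client for these endpoints might look like the minimal sketch below, written with only the Python standard library. It assumes the server runs on Flask's default development address (http://localhost:5000) and that the CSV is sent as a multipart/form-data upload in a field named "file"; both are assumptions inferred from "Data param: file", not confirmed by this README. The helper names (`build_upload_request`, `post_dataset`, `summarize_topics`, `most_frequent`) are hypothetical.

```python
import json
import uuid
import urllib.request

# Assumption: Flask's default dev-server address; adjust to your deployment.
BASE_URL = "http://localhost:5000"


def build_upload_request(endpoint, csv_bytes, filename="dataset.csv"):
    """Build (but do not send) a multipart/form-data POST.

    Assumption: the API reads the CSV from a form field named "file".
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: text/csv\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=head + csv_bytes + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def post_dataset(endpoint, path):
    """Upload the CSV at `path` to `endpoint` ("/topics" or "/count") and decode the JSON reply."""
    with open(path, "rb") as fh:
        req = build_upload_request(endpoint, fh.read())
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def summarize_topics(topics):
    """Turn a /topics response into one comma-separated line of terms per topic."""
    return [", ".join(topic["terms"]) for topic in topics]


def most_frequent(count_response, n=5):
    """Return the n (term, count) pairs with the highest counts from a /count response."""
    counts = count_response["count"]
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

With a server running, `post_dataset("/count", "dataset.csv")` would return a dict shaped like the /count example above, which `most_frequent` then ranks; `summarize_topics` does the same job for a /topics reply. Building the request separately from sending it keeps the upload format easy to inspect without a live server.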

When running for the first time, the model has to be trained, which can take quite some time. For this reason, the trained model can be stored and later retrieved, avoiding retraining. To train the model, set shouldUseDump to False in topicmodeller.py (this will be fixed later on).
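The store-and-reuse idea can be sketched with Python's `pickle` module as below. This is a hypothetical helper, not the code in topicmodeller.py: `load_or_train`, `train_fn`, and `dump_path` are illustrative names, and the fallback to training when no dump exists yet is this sketch's own choice. Here `should_use_dump=False` mirrors the README's instruction of setting shouldUseDump to False to force training.

```python
import os
import pickle
import tempfile


def load_or_train(train_fn, dump_path, should_use_dump=True):
    """Return a model, reusing a pickled copy on disk when allowed.

    train_fn: zero-argument callable that trains and returns a picklable model.
    should_use_dump=False forces retraining and overwrites any existing dump.
    """
    if should_use_dump and os.path.exists(dump_path):
        with open(dump_path, "rb") as fh:
            return pickle.load(fh)  # fast path: skip training entirely
    model = train_fn()  # slow path: full training
    with open(dump_path, "wb") as fh:
        pickle.dump(model, fh)  # store for the next run
    return model


# Usage with a stand-in "training" function that records how often it runs.
calls = []


def fake_train():
    calls.append(1)
    return {"trained": True}


path = os.path.join(tempfile.mkdtemp(), "lda_model.pkl")
first = load_or_train(fake_train, path)    # no dump yet: trains and saves
second = load_or_train(fake_train, path)   # dump exists: loads from disk
third = load_or_train(fake_train, path, should_use_dump=False)  # forced retrain
```

Only unpickle dumps you created yourself: `pickle.load` can execute arbitrary code embedded in a malicious file.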


The dataset can either be generated or downloaded from GitHub; that repository also documents the dataset's format and structure. This API is based on this notebook. For more information on topic modelling with Latent Dirichlet Allocation, check this article!
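To illustrate where a topic's "terms" list comes from, here is a minimal, self-contained sketch of LDA trained with collapsed Gibbs sampling in plain Python. It is not this API's implementation (the README does not say which library the notebook uses). The function names `lda_gibbs` and `top_terms`, the toy corpus, and the hyperparameters (`alpha`, `beta`, `iterations`) are illustrative assumptions. Each topic's terms are simply the words with the highest counts in that topic, formatted like the POST /topics response.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling. docs: list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and tokens per topic.
    doc_topic = [[0] * K for _ in docs]
    topic_word = [[0] * V for _ in range(K)]
    topic_total = [0] * K

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the current token from all counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # p(z = t | all other assignments), up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the resampled topic and add the token back.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, topic_word


def top_terms(vocab, topic_word, n=10):
    """Format each topic's n highest-count words like a /topics entry."""
    topics = []
    for counts in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda v: counts[v], reverse=True)
        topics.append({"title": "Suggested Topic", "terms": [vocab[v] for v in ranked[:n]]})
    return topics


# Toy corpus (illustrative only); real input would be tokenized documents.
docs = [
    "cell epithelial airway tissue cell expression".split(),
    "airway epithelium cell human tissue".split(),
    "patient hospital icu care day patient".split(),
    "hospital care group patient study".split(),
]
vocab, topic_word = lda_gibbs(docs, num_topics=2)
topics = top_terms(vocab, topic_word, n=4)
print(topics)
```

On a corpus this small the topics are noisy. Real pipelines usually add tokenization, stop-word removal, and lemmatization, and use an optimized library rather than a pure-Python sampler.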
