bdotloh/textanalysis

text analysis

A simple workflow for processing text. It applies a series of sequential transformations to a set of raw, unprocessed documents.

The order of transformations, as of 25/05/2023:

  1. Preprocess: remove contractions, fix encoding issues.
  2. Obtain a document-term matrix with a user-specified n-gram size.
  3. Embed the text with Sentence-Transformers.
  4. Reduce the embedding dimensions with UMAP.
  5. Cluster the reduced-dimension embeddings with HDBSCAN.
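
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: the function names, the tiny contraction table, the lowercasing, the model name `all-MiniLM-L6-v2`, and every UMAP/HDBSCAN parameter are hypothetical choices made for the example. Steps 1 and 2 use only the standard library and treat "remove contractions" as expanding them; steps 3 to 5 assume the `sentence-transformers`, `umap-learn` and `hdbscan` packages are installed.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical, deliberately tiny contraction table (assumption for this sketch).
CONTRACTIONS = {"don't": "do not", "can't": "cannot", "it's": "it is", "i'm": "i am"}
_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")\b"
)


def preprocess(text):
    """Step 1: normalise Unicode, lowercase, and expand the contractions above."""
    text = unicodedata.normalize("NFKC", text)
    text = text.replace("\u2019", "'").lower()  # curly apostrophe -> straight
    return _CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(0)], text)


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_term_matrix(docs, n=1):
    """Step 2: count n-grams per document; returns (vocabulary, rows of counts)."""
    counts = [Counter(ngrams(re.findall(r"[a-z0-9]+", d), n)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]


def embed_reduce_cluster(docs, model_name="all-MiniLM-L6-v2"):
    """Steps 3-5: embed, reduce, cluster. Returns one cluster label per document.

    Third-party imports are local so that steps 1-2 run without them.
    All parameter values here are assumptions, not the repository's settings.
    """
    from sentence_transformers import SentenceTransformer
    import umap
    import hdbscan

    embeddings = SentenceTransformer(model_name).encode(docs)
    reduced = umap.UMAP(n_components=5, metric="cosine").fit_transform(embeddings)
    return hdbscan.HDBSCAN(min_cluster_size=5).fit_predict(reduced)


if __name__ == "__main__":
    raw = ["I don\u2019t like rain.", "It's rain, rain again."]
    docs = [preprocess(d) for d in raw]
    vocab, matrix = document_term_matrix(docs, n=1)
    print(vocab)
    print(matrix)
```

`embed_reduce_cluster` needs a reasonably large collection to be meaningful: UMAP and HDBSCAN both rely on neighbourhood sizes, so a handful of documents will either fail or be labelled as noise (`-1`).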
