English Corpus Text Visualization using Word2Vec Model

A machine learning approach to visualizing an English text corpus with the Word2Vec model from the Gensim library. This project was done to test the accuracy of the Word2Vec model on an English corpus.

Library requirements:

  1. Sklearn: Used for data preprocessing, model selection, classification, regression, and clustering.
  2. Matplotlib: Used for 2D and 3D plotting, such as histograms and bar charts.
  3. Gensim: Open-source library for text analysis, including Word2Vec and Doc2Vec.
  4. Font and data: The Melon Honey font is used, and the sample texts were collected from the Internet.

Word2Vec

The Word2Vec model is used for word embedding. Here I used the Gensim library and Matplotlib's pyplot for 2D visualization of the corpus.

Methodology:

  1. First, I took an English corpus and applied a punctuation remover.
  2. Split the data and visualized the corpus.
  3. Repeated the process with a larger corpus.
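The first two steps could look roughly like the sketch below, which uses only the Python standard library. The function names and the sample text are hypothetical, lowercasing is an added assumption, and the split is assumed to mean one token list per line, which is the input format Gensim's Word2Vec expects.

```python
import string


def remove_punctuation(text):
    """Delete ASCII punctuation characters from the text."""
    # string.punctuation covers ASCII only; curly quotes and similar
    # characters would need to be handled separately.
    return text.translate(str.maketrans("", "", string.punctuation))


def tokenize(text):
    """Split the text into one lowercase token list per non-empty line."""
    sentences = []
    for line in text.splitlines():
        tokens = remove_punctuation(line).lower().split()
        if tokens:
            sentences.append(tokens)
    return sentences


# Hypothetical sample text, standing in for the real corpus.
sample = "Hello, world! This is a test.\nWord2Vec learns word vectors."
tokens = tokenize(sample)
print(tokens)
```

For a larger corpus, the same `tokenize` function can be applied to the contents of a text file read with `open(path, encoding="utf-8").read()`.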

Tools:

  1. Google Colab/Jupyter Notebook
  2. Language: Python
  3. Word2Vec from Gensim
  4. Matplotlib | Pyplot

Mentor

Prof. Sandipan Ganguly, HIT-K.

Developer

Rajdeep Das

Thank you