Document-Feature-Identification-Vasavi

This project takes text documents on assorted topics and sorts them into their respective domain areas using a classification algorithm. The user first uploads the files to be classified, and the algorithm loads all input files into an array. Chunking and part-of-speech tagging are applied to the data, a parse tree is constructed, and all nouns are appended to an array. A frequency distribution is then calculated for the first file in the array, and the top 100 most frequent words are taken from it (this number can be changed). The top 100 words (also adjustable) are likewise taken from each category being compared. Every word from the document's frequency distribution is compared against the words of each relevant category, and that category's count variable is incremented on each match. The file is then moved to the folder of the category with the most word matches.
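
The frequency-and-matching steps described above can be sketched roughly as follows. This is a minimal illustration and not the code in `run.py`: the names `extract_candidate_words`, `top_words`, `classify`, and `move_to_category` are hypothetical, and the chunking, part-of-speech tagging, and parse-tree steps are replaced by a naive regex tokenizer so the sketch runs with only the standard library. A real noun extractor would be swapped in at that point.

```python
import re
import shutil
from collections import Counter
from pathlib import Path

TOP_N = 100  # number of frequent words to keep; adjustable


def extract_candidate_words(text):
    """Stand-in for the noun-extraction step (assumption, not the original).

    The description uses chunking and POS tagging to collect nouns; here we
    simply take lowercase alphabetic tokens of three or more letters.
    """
    return re.findall(r"[a-z]{3,}", text.lower())


def top_words(words, n=TOP_N):
    """Return the n most frequent words from a list of words."""
    return [word for word, _ in Counter(words).most_common(n)]


def classify(text, category_words, n=TOP_N):
    """Count word matches per category and return (best_category, counts).

    category_words maps a category name to a list of words for that category.
    Ties resolve to the first category in dictionary order.
    """
    doc_top = top_words(extract_candidate_words(text), n)
    counts = {}
    for category, words in category_words.items():
        category_top = set(top_words(words, n))
        counts[category] = sum(1 for word in doc_top if word in category_top)
    best = max(counts, key=counts.get)
    return best, counts


def move_to_category(path, category, out_root):
    """Move the file into out_root/<category>/, creating the folder if needed."""
    dest_dir = Path(out_root) / category
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(path), str(dest_dir / Path(path).name)))
```

For example, with the made-up categories `{"programming": ["compiler", "function", "variable"], "cooking": ["recipe", "oven", "flour"]}`, calling `classify("The compiler turns each function into bytecode.", categories)` selects `"programming"`, since two of the document's words match that category and none match the other. Note that when no category matches at all, this sketch still returns the first category, so a real implementation would probably want a minimum-match threshold.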

Repository contents: .idea, src, PL_corpora.txt, README.md, run.py

CSC-ORG/Document-Feature-Identification-Vasavi
