CSC-ORG/Document-Feature-Identification-Vasavi

Document-Feature-Identification-Vasavi

This project takes arbitrary text documents on different topics and groups them into their respective domain areas using a classification algorithm. First, the user uploads the files that need to be classified. The algorithm reads all input files into an array, then performs chunking and part-of-speech tagging on that array. Next, it constructs a parse tree and appends all the nouns to an array. A frequency distribution is calculated for the first file in the array, and the top 100 most frequent words are taken from it (this number can be changed). The top 100 words (also configurable) are likewise taken from each category to compare against. Every word from the file's frequency distribution is then compared with the words of each relevant category, and that category's count variable is incremented on each match. Finally, the file is moved to the folder of the category with the highest number of word matches.
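The comparison step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names (`tokenize`, `top_words`, `classify`), the sample category texts, and the use of `collections.Counter` are all assumptions. In particular, `tokenize` is a plain regex stand-in for the chunking, part-of-speech tagging, and noun-extraction steps, so it keeps every word rather than only nouns.

```python
import re
from collections import Counter

TOP_N = 100  # number of frequent words to keep; configurable, as the README notes


def tokenize(text):
    """Hypothetical stand-in for noun extraction: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_words(tokens, n=TOP_N):
    """Return the set of the n most frequent tokens."""
    return {word for word, _ in Counter(tokens).most_common(n)}


def classify(doc_text, category_texts, n=TOP_N):
    """Count top-word overlaps per category and pick the largest.

    category_texts maps a category name to reference text for that category
    (an assumed input format). Returns (best_category, match_counts).
    """
    doc_top = top_words(tokenize(doc_text), n)
    counts = {
        name: len(doc_top & top_words(tokenize(text), n))
        for name, text in category_texts.items()
    }
    best = max(counts, key=counts.get)
    return best, counts


# Made-up sample data, for illustration only.
categories = {
    "sports": "football match goal team player score league team goal",
    "finance": "bank loan interest market stock bank money market",
}
best, counts = classify("The team scored a late goal to win the match", categories)
print(best, counts)
```

With the best category in hand, the final step of moving the file could be done with something like `shutil.move(path, category_folder)`, where both paths are placeholders for whatever folder layout the project uses.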
