This project takes arbitrary text documents on different topics and groups them into their respective domain areas using a classification algorithm. First, the user uploads the files to be classified, and the algorithm reads all input files into an array. Chunking and part-of-speech tagging are applied to the data, a parse tree is constructed, and all nouns are appended to an array. A frequency distribution is then computed for the first file in the array, and its 100 most frequent words are selected (this number can be changed). The 100 most frequent words (also adjustable) are likewise taken from each category to compare against. Every word from the file's frequency distribution is compared with the words of each relevant category, and that category's count variable is incremented on each match. The file is then moved to the folder of the category with the most word matches.
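The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the nouns have already been extracted by the tagging step, and the names `top_words`, `classify`, `categories`, and `document` are hypothetical, as is the toy data.

```python
from collections import Counter


def top_words(words, n=100):
    """Return the n most frequent words (lower-cased) as a set."""
    counts = Counter(w.lower() for w in words)
    return {word for word, _ in counts.most_common(n)}


def classify(doc_nouns, category_nouns, n=100):
    """Pick the category whose top-n words overlap most with the document's top-n words."""
    doc_top = top_words(doc_nouns, n)
    # One match count per category, mirroring the "count variable" in the description.
    scores = {
        name: len(doc_top & top_words(words, n))
        for name, words in category_nouns.items()
    }
    # On a tie, max() keeps the first category encountered.
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy data: nouns assumed to be already extracted upstream.
categories = {
    "sports": ["match", "team", "goal", "player", "team", "coach"],
    "finance": ["bank", "loan", "market", "stock", "bank", "interest"],
}
document = ["team", "player", "goal", "market", "team"]

best, scores = classify(document, categories)
print(best, scores)
```

In a full pipeline, the returned category name would decide which folder the file is moved to, for example with `shutil.move`.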
CSC-ORG/Document-Feature-Identification-Vasavi