Skip to content

An algorithm developed for counting words from documents in Python using pandas and textract. REGex pattern is tweaked to identify Latin characters all together (such as enzyme, protein names)

Notifications You must be signed in to change notification settings

OzgenOzan/word-counter-py

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 

Repository files navigation

An algorithm developed for counting words from documents in Python using pandas and textract. REGex pattern is tweaked to identify Latin characters all together (such as enzyme, protein names).

Our test data consists of 10 examples articles (Articles folder). We can identify word combinations written with dash, e.g. deep-learning, real-value... Also enzyme names such as ɑ-secretase, beta-secretase etc. (composed of Latin letters).

It can be useful to clear out the unnecessary words using excel. Some algorithms which can be used in VBA:

Delete Rows with a Specific Word/Value

Sub DeleteRowswithSpecificValue()
For i = Selection.Rows.Count To 1 Step -1
If Cells(i, 2).Value = "Certain Text or Value" Then
Cells(i, 2).EntireRow.Delete
End If
Next i
End Sub

Delete Words from cells that exceed certain character count

=IF(LEN(A2)>35,"Yes","")

I want to develop this algorithm into a web aplication.

About

An algorithm developed for counting words from documents in Python using pandas and textract. REGex pattern is tweaked to identify Latin characters all together (such as enzyme, protein names)

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages