An API tweaker program, written in Python, for an API that provides the list of upcoming technical conferences. It performs the following tasks:
- Extract the contents from the given API link.
- Print the contents in a human-readable format, e.g.:
  “San Antonio CyberSecurity Conference”, November 6th, 2019, San Antonio, TX, US, Paid. https://futureconevents.com/events/san-antonio/
- Identify exact duplicates (if any).
- Identify semantic duplicates, i.e., entries that describe the same conference with slightly different details (e.g., “React Conference 2019” in one entry and “ReactConf ’19” in another, while the other fields are the same or similar).
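The first three tasks can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the field names (confName, confStartDate, city, state, country, entryType, confUrl), the ISO "YYYY-MM-DD" date format, and the assumption that the API returns a JSON array are all hypothetical, as are the helper names fetch_conferences, ordinal, format_conference and find_exact_duplicates. Adjust them to the real response.

```python
import json
import urllib.request
from collections import defaultdict
from datetime import datetime


def fetch_conferences(url):
    """Download the conference list (assumes the API returns a JSON array)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def ordinal(day):
    """Return the day with its English ordinal suffix, e.g. 6 -> '6th'."""
    if 10 <= day % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def format_conference(entry):
    """Render one entry as a single human-readable line.

    The dictionary keys are hypothetical placeholders for the real API fields.
    """
    start = datetime.strptime(entry["confStartDate"], "%Y-%m-%d")
    date_text = f"{start.strftime('%B')} {ordinal(start.day)}, {start.year}"
    return (
        f"\u201c{entry['confName']}\u201d, {date_text}, {entry['city']}, "
        f"{entry['state']}, {entry['country']}, {entry['entryType']}. {entry['confUrl']}"
    )


def find_exact_duplicates(entries):
    """Group the indices of entries whose fields are all identical."""
    positions = defaultdict(list)
    for index, entry in enumerate(entries):
        # Serialize with sorted keys so that key order does not affect equality.
        positions[json.dumps(entry, sort_keys=True)].append(index)
    return [indices for indices in positions.values() if len(indices) > 1]
```

Typical use would be `entries = fetch_conferences(api_url)` followed by printing `format_conference(e)` for each entry, where `api_url` stands for the given API link. Hashing a sorted-key JSON serialization keeps the exact-duplicate check linear in the number of entries instead of comparing every pair.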
The semantic-duplicate task is only partly completed. Extracting semantic duplicates with high accuracy would require NLP techniques, which are out of scope for this task. Therefore, the Python library fuzzywuzzy is used here for approximate string matching, which gives better results than exact comparison.
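Four fuzzywuzzy scorers were compared: FuzzyWuzzyRatio (fuzz.ratio), FuzzyWuzzyPartialRatio (fuzz.partial_ratio), FuzzyWuzzyTokenSortRatio (fuzz.token_sort_ratio) and FuzzyWuzzyWRatio (fuzz.WRatio). Among these four, FuzzyWuzzyRatio produces decent results for this particular API with a threshold of ratio >= 71.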
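The threshold rule above can be sketched as follows, assuming each entry has already been rendered as one human-readable line and that whole lines are compared (the README does not say which fields were fed to the scorer). The sketch uses fuzz.ratio when fuzzywuzzy is installed and otherwise falls back to difflib.SequenceMatcher, the pure-Python matcher fuzzywuzzy itself uses when python-Levenshtein is absent, so scores may differ slightly between the two setups. The names similarity and find_semantic_duplicates are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

try:
    from fuzzywuzzy import fuzz
except ImportError:  # fuzzywuzzy not installed: use the stdlib fallback below
    fuzz = None


def similarity(a, b):
    """Return a 0-100 similarity score between two strings."""
    if fuzz is not None:
        return fuzz.ratio(a, b)
    # Same formula fuzz.ratio uses when python-Levenshtein is unavailable.
    return round(100 * SequenceMatcher(None, a, b).ratio())


def find_semantic_duplicates(lines, threshold=71):
    """Return (i, j, score) for pairs of non-identical lines scoring >= threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(lines), 2):
        if a == b:
            continue  # exact duplicates are reported separately
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs
```

With lines such as "React Conference 2019, June 1st, 2019, Austin, TX, US, Paid." and "ReactConf '19, June 1st, 2019, Austin, TX, US, Paid.", the shared date, location and type push the score well above 71, while an unrelated conference line stays below it. Comparing every pair is quadratic, which is acceptable for a conference list of modest size.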