This project uses the Twitter streaming API to stream tweets, store them in curated form, and filter them. The task was divided into three parts:
1. Stream data from Twitter based on a particular keyword (for example, "modi"), curate it, and store it in MongoDB (via PyMongo) in two collections, "User" and "Tweet".
2. Filter tweets by name, screen name, text, favourite count, retweet count, tweet mentions, language, and created date.
3. Return the API response in CSV format for selected columns such as name, screen name, language, and counts.
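
As an illustration of the curation step, here is a minimal sketch that splits one raw tweet into a "User" document and a "Tweet" document. The input keys follow the standard Twitter tweet JSON; the output field names (`scrname`, `rtcount`, `favcount`, `mentions`, `created_date`) and the function name `curate` are assumptions chosen to match the filter keywords, not necessarily what `mongo.py` stores.

```python
from datetime import datetime
from typing import Tuple


def curate(raw: dict) -> Tuple[dict, dict]:
    """Split a raw tweet dict into (user_doc, tweet_doc)."""
    user = raw["user"]
    user_doc = {
        "_id": user["id"],
        "name": user["name"],
        "scrname": user["screen_name"],
    }
    mentions = raw.get("entities", {}).get("user_mentions", [])
    tweet_doc = {
        "_id": raw["id"],
        "user_id": user["id"],  # link back to the User document
        "text": raw["text"],
        "lang": raw.get("lang"),
        # Twitter sends e.g. "Wed Oct 10 20:19:24 +0000 2018"
        "created_date": datetime.strptime(
            raw["created_at"], "%a %b %d %H:%M:%S %z %Y"
        ),
        "rtcount": raw.get("retweet_count", 0),
        "favcount": raw.get("favorite_count", 0),
        "mentions": [m["screen_name"] for m in mentions],
    }
    return user_doc, tweet_doc
```

Storing the created date as a `datetime` rather than a string is a design choice in this sketch; it lets date range filters compare values directly.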
- Python/Flask Framework
- PyMongo (mLab)
- Tweepy library
- Clone the project
- `cd` into the project directory
- Create a virtual environment with `virtualenv venv` and activate it with `source venv/bin/activate`
- Install the requirements with `pip install -r requirements.txt`
- Run `python mongo.py`
To trigger the stream and store the results in MongoDB for a given keyword:
API = http://localhost:8000/firstapi/<string:keyword>
This API runs for 60 seconds, because time_limit = 60.
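
One way to enforce such a time limit can be sketched as a generator that stops consuming a stream once the limit has elapsed. The name `take_for` and the injectable `clock` parameter are hypothetical; the project may enforce `time_limit` differently, for example inside its Tweepy listener.

```python
import time
from typing import Callable, Iterable, Iterator


def take_for(
    items: Iterable,
    time_limit: float,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator:
    """Yield items from a stream until time_limit seconds have passed."""
    deadline = clock() + time_limit
    for item in items:
        # The deadline is only checked when an item arrives, so a quiet
        # stream can overrun the limit until the next item shows up.
        if clock() >= deadline:
            break
        yield item
```

With a real stream this would be used as `for tweet in take_for(stream, 60): ...`, where `stream` is any iterable of incoming tweets.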
sw = starts with
ew = ends with
co = contains
lt = less than
gt = greater than
eq = equal to
| Filter Keyword | Filter Parameters |
|---|---|
| name | sw,ew,co |
| text | sw,ew,co |
| scrname | sw,ew,co |
| mention | sw,ew,co |
| rtcount | lt,gt,eq |
| favcount | lt,gt,eq |
| lang | |
| datestart | yyyy-mm-dd |
| dateend | yyyy-mm-dd |
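
The table above can be read as a small query language: the first two characters of a value select the operator and the rest is the operand. The following is a minimal sketch of how such values might be translated into a MongoDB query document. The function name `build_query`, the stored field names (assumed here to equal the filter keywords), the case-insensitive matching, and the exact match for `lang` are all assumptions rather than the project's actual implementation.

```python
import re

# Regex templates for the text operators; "{}" receives the escaped operand.
TEXT_OPS = {"sw": "^{}", "ew": "{}$", "co": "{}"}
# MongoDB comparison operators for the numeric operators.
COUNT_OPS = {"lt": "$lt", "gt": "$gt", "eq": "$eq"}
TEXT_KEYS = ("name", "text", "scrname", "mention")
COUNT_KEYS = ("rtcount", "favcount")


def build_query(params: dict) -> dict:
    """Translate request parameters such as {"rtcount": "lt5"} into a query."""
    query = {}
    for key, raw in params.items():
        op, value = raw[:2], raw[2:]
        if key in TEXT_KEYS and op in TEXT_OPS:
            pattern = TEXT_OPS[op].format(re.escape(value))
            # Case-insensitive matching is an assumption of this sketch.
            query[key] = {"$regex": pattern, "$options": "i"}
        elif key in COUNT_KEYS and op in COUNT_OPS:
            query[key] = {COUNT_OPS[op]: int(value)}
        elif key == "lang":
            # No operator is listed for lang, so an exact match is assumed.
            query[key] = raw
        # Unknown keys or operators are ignored in this sketch.
    return query
```

Escaping the operand with `re.escape` keeps characters such as `.` or `*` in a search term from being interpreted as regex syntax.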
API = "http://localhost:8000/secondapi/?text=covisit&scrname=swDev&rtcount=lt5"
Returns tweets whose text contains "visit", whose screen name starts with "Dev", and whose retweet count is less than 5.
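
For callers, a request URL like the one above can be assembled from keyword arguments. This is a small client-side sketch; `filter_url` is a hypothetical helper, and the base URL is taken from the example.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000/secondapi/"


def filter_url(**filters: str) -> str:
    """Build a second-API URL from filter keyword/value pairs."""
    return BASE_URL + "?" + urlencode(filters)


url = filter_url(text="covisit", scrname="swDev", rtcount="lt5")
```

`urlencode` also percent-encodes values, which matters once a search term contains spaces or other reserved characters.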
API = "http://localhost:8000/secondapi/?datestart=2018-02-28"
Returns tweets created after 28 February 2018.
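
A date range filter of this kind might be built as follows. This is a sketch only: the field name `created_date`, the function name `date_filter`, and the use of strict `$gt`/`$lt` bounds are assumptions.

```python
from datetime import datetime
from typing import Optional

DATE_FORMAT = "%Y-%m-%d"  # matches the yyyy-mm-dd format in the filter table


def date_filter(datestart: Optional[str] = None,
                dateend: Optional[str] = None) -> dict:
    """Return a MongoDB condition on created_date, or {} if no bound is given."""
    bounds = {}
    if datestart:
        bounds["$gt"] = datetime.strptime(datestart, DATE_FORMAT)
    if dateend:
        bounds["$lt"] = datetime.strptime(dateend, DATE_FORMAT)
    return {"created_date": bounds} if bounds else {}
```

Because each date parses to midnight, `datestart=2018-02-28` in this sketch also matches tweets created later on 28 February itself.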
| Sort By | Parameter |
|---|---|
| name | sort=name |
| screenname | sort=scrname |
| text | sort=text |
API = "http://localhost:8000/secondapi/?rtcount=lt4&sort=name"
API = "http://localhost:8000/secondapi/?rtcount=lt4&sort=name&page=2"
If no page parameter is given, page=1 is returned.
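
Sorting and pagination can be sketched as a translation from the `sort` and `page` parameters into the `sort`, `skip`, and `limit` arguments that PyMongo's `find` accepts. The page size of 10 and the name `page_args` are assumptions; the source does not state how many tweets a page holds.

```python
from typing import Optional

SORT_FIELDS = {"name", "scrname", "text"}
PAGE_SIZE = 10  # assumption: the actual page size is not documented


def page_args(sort: Optional[str] = None, page: str = "1") -> dict:
    """Map request parameters to keyword arguments for a find() call."""
    number = max(int(page), 1)  # treat page numbers below 1 as page 1
    return {
        # Ascending order is assumed; unknown sort keys are ignored.
        "sort": [(sort, 1)] if sort in SORT_FIELDS else None,
        "skip": (number - 1) * PAGE_SIZE,
        "limit": PAGE_SIZE,
    }
```

With a PyMongo collection, the result could be passed on as `collection.find(query, **page_args("name", "2"))`.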
Returns a CSV response containing the filtered tweets. The filters are the same as in the second API.
Example
API = "http://localhost:8000/firstCSVfile/?rtcount=lt4&sort=name"
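
Producing the CSV body for selected columns can be sketched with the standard `csv` module. The function name `to_csv` and the column names in the example are hypothetical, and the project may build its CSV differently.

```python
import csv
import io
from typing import Iterable, List


def to_csv(rows: Iterable[dict], columns: List[str]) -> str:
    """Serialise the selected columns of each row as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # drop fields that were not selected
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


body = to_csv(
    [{"name": "Alice", "scrname": "alice1", "text": "hello"}],
    ["name", "scrname"],
)
```

In a Flask view, the resulting string could be returned as `Response(body, mimetype="text/csv")`.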