Data Fields fetched by the Twitter Scraper #14

mlkorra · 2021-12-01T06:37:32Z

mlkorra
Dec 1, 2021

The following are the data fields fetched for each tweet by the Twitter scraper (twint):

{"id":,
"conversation_id",
"created_at":,
"date":,
"time":,
"timezone":,
"user_id":,
"username":,
"name":,
"place":,
"tweet":,
"language":,
"mentions":,
"urls":,
"photos":,
"replies_count":,
"retweets_count":,
"likes_count":,
"hashtags":,
"cashtags":,
"link": "",
"retweet":,
"quote_url":,
"video": ,
"thumbnail":,
"near":,
"geo":,
"source":,
"user_rt_id":,
"user_rt":,
"retweet_id":,
"reply_to":,
"retweet_date":,
"translate":,
"trans_src":,
"trans_dest":}

Note:

Fields in bold are currently stored in the database.

Additional fields in the database:

"timestamp_of_scraping",
"type" -> whether the tweet was fetched using a keyword, hashtag, or user handle,
"search" -> search term used to scrape,
"content_type" -> whether the tweet contains text, an image, a GIF, or a video,
"s3_url" -> URL of the media uploaded to the S3 bucket

Please go through the data fields and mention or discuss whether any of them are helpful for the task.

rn-v · 2021-12-02T11:05:24Z

rn-v
Dec 2, 2021
Collaborator

@mlkorra Do you have the field descriptions as well? I couldn't find them in the Twint docs.

3 replies

mlkorra Dec 9, 2021
Author

@rn-v Sorry for the late response.
Here is a sample record for the #RheaChakraborty hashtag.
Let me know if you still need any clarification.

{"id": 1265478926396280832, "conversation_id": "1265478926396280832", "created_at": "2020-05-27 08:34:22 IST", "date": "2020-05-27", "time": "08:34:22", "timezone": "+0530", "user_id": 4274342847, "username": "mkishan1105", "name": "Mishra Kishan (MK)", "place": "", "tweet": "My recent artwork of @Tweet2Rhea Please check out my Instagram page @mkhiddenart for more progress shot of this artwork. https://t.co/mZDHB8RkOi #mkhiddenart #RheaChakraborty #art #sketch #bollywoodactress https://t.co/OxUjB6m7sS", "language": "en", "mentions": [{"screen_name": "tweet2rhea", "name": "rhea chakraborty", "id": "50544680"}, {"screen_name": "mkhiddenart", "name": "mkhiddenart", "id": "1385164911576190978"}], "urls": ["https://instagram.com/mkhiddenart?igshid=1pi3awiihzmnk"], "photos": ["https://pbs.twimg.com/media/EY_iXXRXQAIeE6J.jpg"], "replies_count": 0, "retweets_count": 0, "likes_count": 0, "hashtags": ["mkhiddenart", "rheachakraborty", "art", "sketch", "bollywoodactress"], "cashtags": [], "link": "https://twitter.com/mkishan1105/status/1265478926396280832", "retweet": false, "quote_url": "", "video": 1, "thumbnail": "https://pbs.twimg.com/media/EY_iXXRXQAIeE6J.jpg", "near": "", "geo": "", "source": "", "user_rt_id": "", "user_rt": "", "retweet_id": "", "reply_to": [], "retweet_date": "", "translate": "", "trans_src": "", "trans_dest": ""}

rn-v Dec 9, 2021
Collaborator

I see, so "mentions" is just the handles tagged in the tweet. I have no idea what cashtags are supposed to be.

dennyabrain Dec 9, 2021
Maintainer

@rn-v Twitter uses cashtags to group conversations around ticker symbols, and with the rise of crypto talk on Twitter they are now common for currencies too. If you mention a currency prefixed by a '$' sign, it's called a cashtag; for instance, $ETH for Ethereum.
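A rough Python sketch for spotting cashtags in tweet text, e.g. to sanity-check what ends up in the "cashtags" field. The pattern ("$" followed by 1-6 letters, optionally a "." or "_" and 1-2 more letters) is a simplified approximation chosen for illustration, not Twitter's or twint's exact extraction rule.

```python
import re

# Simplified cashtag pattern (assumption): "$" + 1-6 letters, optionally
# followed by "." or "_" and 1-2 letters (e.g. $ETH, $BRK.A).
# The lookbehind rejects "US$5"-style prefixes; the lookahead rejects
# symbols that run on into more word characters (e.g. "$ETHEREUM").
CASHTAG_RE = re.compile(r"(?<![\w$])\$([A-Za-z]{1,6}(?:[._][A-Za-z]{1,2})?)(?!\w)")


def extract_cashtags(text: str) -> list[str]:
    """Return cashtag symbols (without the "$") in order of appearance."""
    return CASHTAG_RE.findall(text)


if __name__ == "__main__":
    print(extract_cashtags("Buying $ETH and $BTC, not $100 or US$5"))  # ['ETH', 'BTC']
```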

rn-v · 2021-12-02T11:29:23Z

rn-v
Dec 2, 2021
Collaborator

I wanted to clarify the output of some fields (like "mentions", "urls", etc.).
Otherwise, I can see the benefits of scraping "username", "replies_count", "retweets_count", and "likes_count".

1 reply

rn-v Dec 9, 2021
Collaborator

Also "hashtags" and "urls".
Pardon me if these are already in bold; the bold rendering in GitHub's dark mode isn't clear for me.

tarunima · 2021-12-10T10:57:07Z

tarunima
Dec 10, 2021
Maintainer

Based on a discussion with the CIS folks, we've decided to scrape all data fields other than cashtags for descriptive research.
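Since "cashtags" is the one excluded field, here is a minimal sketch of stripping that key from each JSON line before storage, assuming line-delimited JSON output as in the sample record above. `EXCLUDED_FIELDS`, `drop_excluded_fields`, and `filter_json_lines` are hypothetical names, not part of twint.

```python
import json
from typing import Iterable, Iterator

# Fields we decided not to keep (assumption: matched by exact key name).
EXCLUDED_FIELDS = frozenset({"cashtags"})


def drop_excluded_fields(record: dict, excluded: frozenset = EXCLUDED_FIELDS) -> dict:
    """Return a copy of the record without the excluded keys."""
    return {key: value for key, value in record.items() if key not in excluded}


def filter_json_lines(lines: Iterable[str]) -> Iterator[str]:
    """Re-serialize each non-empty JSON line with the excluded fields removed."""
    for line in lines:
        if line.strip():
            record = json.loads(line)
            yield json.dumps(drop_excluded_fields(record), ensure_ascii=False)
```

Keeping the exclusion list as a single constant makes it easy to revisit if more fields are dropped later.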

0 replies
