Retrieves CDX data for archived tweets from the Wayback Machine, parses it (see Field Options), and saves the results in HTML (for easy viewing of the tweets through the iframe tag), CSV, and JSON formats.
pip install waybacktweets
waybacktweets [OPTIONS] USERNAME
waybacktweets --from 20150101 --to 20191231 --limit 250 jack
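The date range and limit in the command above correspond to standard query parameters of the Wayback Machine CDX server. As a rough sketch of what such a request might look like, assuming the tool queries the public CDX endpoint with a `twitter.com/USERNAME/status/*` pattern (the helper name `build_cdx_url` is hypothetical and not part of the package):

```python
from urllib.parse import urlencode

# Public Wayback Machine CDX server endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(username, from_date=None, to_date=None, limit=None):
    """Build a CDX query URL for a user's archived tweets.

    Hypothetical helper for illustration only; the URL pattern is an
    assumption about how the tool queries the archive.
    """
    params = {
        "url": f"twitter.com/{username}/status/*",  # assumed match pattern
        "output": "json",
    }
    # Optional filters are only sent when the caller provides them.
    if from_date:
        params["from"] = from_date  # yyyyMMdd, as in --from
    if to_date:
        params["to"] = to_date      # yyyyMMdd, as in --to
    if limit:
        params["limit"] = str(limit)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


print(build_cdx_url("jack", "20150101", "20191231", 250))
```

Building the URL separately from fetching it makes the query easy to inspect in a browser before running a larger collection.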
Open the application, a prototype written in Python with the Streamlit framework and hosted on Streamlit Cloud.
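from waybacktweets import WaybackTweets, TweetsParser, TweetsExporter

USERNAME = "jack"

# Request the archived tweets CDX data for the user
api = WaybackTweets(USERNAME)
archived_tweets = api.get()

# Continue only if the request returned data
if archived_tweets:
    # Fields to include in the parsed output (see Field Options)
    field_options = [
        "archived_timestamp",
        "original_tweet_url",
        "archived_tweet_url",
        "archived_statuscode",
    ]

    # Parse the CDX data into the selected fields
    parser = TweetsParser(archived_tweets, USERNAME, field_options)
    parsed_tweets = parser.parse()

    # Save the parsed tweets as a CSV file
    exporter = TweetsExporter(parsed_tweets, USERNAME, field_options)
    exporter.save_to_csv()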
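To make the field names above more concrete, here is a minimal sketch of how CDX JSON output could map onto them. It assumes the standard CDX JSON layout (a header row followed by data rows) and the usual `https://web.archive.org/web/<timestamp>/<original>` replay URL; the function `rows_to_records` and the sample row are illustrative only and do not reflect the package's actual parser.

```python
def rows_to_records(cdx_rows):
    """Convert CDX JSON output (header row + data rows) into field dicts.

    Illustrative helper; the mapping to field names is an assumption.
    """
    if not cdx_rows:
        return []
    header, *data = cdx_rows
    records = []
    for row in data:
        entry = dict(zip(header, row))
        timestamp = entry["timestamp"]
        original = entry["original"]
        records.append({
            "archived_timestamp": timestamp,
            "original_tweet_url": original,
            # Replay URL: archive prefix + capture timestamp + original URL.
            "archived_tweet_url": f"https://web.archive.org/web/{timestamp}/{original}",
            "archived_statuscode": entry["statuscode"],
        })
    return records


# Made-up sample in the CDX JSON shape, not real archive data.
sample = [
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
    ["com,twitter)/jack/status/20", "20150101123000",
     "https://twitter.com/jack/status/20", "text/html", "200", "EXAMPLEDIGEST", "1234"],
]

for record in rows_to_records(sample):
    print(record["archived_tweet_url"])
```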
- Tristan Lee (Bellingcat's Data Scientist) for the idea of the application.
- Jessica Smith (Snowflake's Marketing Specialist) and Streamlit/Snowflake teams for the additional server resources on Streamlit Cloud.
- OSINT Community for recommending the application.
Note: If the Streamlit application is down, please check the Streamlit Cloud Status.