Commit d5abea2

Merge pull request #36 from claromes/resumption_key
Major Release: v1.0
2 parents fc9de72 + 05d320a commit d5abea2

77 files changed: +4753 -1090 lines changed

.github/FUNDING.yml

Lines changed: 0 additions & 1 deletion
This file was deleted.

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -1,6 +1,9 @@
 *.csv
 *.json
 *.html
+*.txt
+
+test.py
 
 waybacktweets/__pycache__
 waybacktweets/api/__pycache__

CITATION.cff

Lines changed: 9 additions & 13 deletions
@@ -14,29 +14,25 @@ authors:
 identifiers:
   - type: doi
     value: 10.5281/zenodo.12528447
-    description: The concept DOI of the work.
+    description: Retrieves archived tweets from Wayback Machine in HTML, CSV, and JSON.
   - type: url
     value: "https://pypi.org/project/waybacktweets/"
     description: Python Package Index.
   - type: url
-    value: "https://claromes.github.io/waybacktweets/"
+    value: "https://waybacktweets.claromes.com/"
     description: Documentation.
 repository-code: "https://github.com/claromes/waybacktweets"
-url: "https://claromes.github.io/waybacktweets"
+url: "https://waybacktweets.claromes.com/"
 abstract: >-
-  Retrieves archived tweets CDX data from the Wayback
-  Machine, performs necessary parsing, and saves the data in
-  HTML (for easy viewing of the tweets using the iframe
-  tag), CSV, and JSON formats.
+  Retrieves archived tweets CDX data from the Wayback Machine, performs necessary parsing, and saves the data in HTML, for easy viewing of the tweets using the iframe tags, CSV, and JSON formats.
 keywords:
   - Twitter
-  - Wayback Machine
+  - X
   - Tweets
-  - Python
+  - Wayback Machine
   - OSINT
   - SOCMINT
-  - X
+  - Python
 license: GPL-3.0
-commit: 16f9997a8e2e2b87932ca061bf5731cd65d1d588
-version: 1.0a5
-date-released: "2024-06-24"
+version: 1.0
+date-released: "2025-05-26"

README.md

Lines changed: 74 additions & 22 deletions
@@ -1,31 +1,61 @@
 # Wayback Tweets
 
-[![PyPI](https://img.shields.io/pypi/v/waybacktweets)](https://pypi.org/project/waybacktweets) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.12528447.svg)](https://doi.org/10.5281/zenodo.12528447) [![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://waybacktweets.streamlit.app) [![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tnaM3rMWpoSHBZ4P_6iHFPjraWRQ3OGe?usp=sharing)
+[![PyPI](https://img.shields.io/pypi/v/waybacktweets)](https://pypi.org/project/waybacktweets) [![PyPI Downloads](https://static.pepy.tech/badge/waybacktweets)](https://pepy.tech/projects/waybacktweets)
 
-
-Retrieves archived tweets CDX data from the Wayback Machine, performs necessary parsing (see [Field Options](https://claromes.github.io/waybacktweets/field_options.html)), and saves the data in HTML, for easy viewing of the tweets using the iframe tags, CSV, and JSON formats.
+Retrieves archived tweets CDX data from the Wayback Machine, performs necessary parsing (see [Field Options](https://waybacktweets.claromes.com/field_options)), and saves the data in HTML, for easy viewing of the tweets using the iframe tags, CSV, and JSON formats.
 
 ## Installation
 
+It is compatible with Python versions 3.10 and above. [See installation options](https://waybacktweets.claromes.com/installation).
+
 ```shell
-pip install waybacktweets
+pipx install waybacktweets
 ```
 
-## Quickstart
-
-### Using Wayback Tweets as a standalone command line tool
-
-waybacktweets [OPTIONS] USERNAME
+## CLI
 
 ```shell
-waybacktweets --from 20150101 --to 20191231 --limit 250 jack
+Usage:
+  waybacktweets [OPTIONS] USERNAME
+  USERNAME: The Twitter username without @
+
+Options:
+  -c, --collapse [urlkey|digest|timestamp:xx]
+                                  Collapse results based on a field, or a
+                                  substring of a field. XX in the timestamp
+                                  value ranges from 1 to 14, comparing the
+                                  first XX digits of the timestamp field. It
+                                  is recommended to use from 4 onwards, to
+                                  compare at least by years.
+  -f, --from DATE                 Filtering by date range from this date.
+                                  Format: YYYYmmdd
+  -t, --to DATE                   Filtering by date range up to this date.
+                                  Format: YYYYmmdd
+  -l, --limit INTEGER             Query result limits.
+  -rk, --resumption_key TEXT      Allows for a simple way to scroll through
+                                  the results. Key to continue the query from
+                                  the end of the previous query.
+  -mt, --matchtype [exact|prefix|host|domain]
+                                  Results matching a certain prefix, a certain
+                                  host or all subdomains.
+  -v, --verbose                   Shows the log.
+  --version                       Show the version and exit.
+  -h, --help                      Show this message and exit.
+
+Examples:
+  waybacktweets jack
+  waybacktweets --from 20200305 --to 20231231 --limit 300 --verbose jack
+
+Repository:
+  https://github.com/claromes/waybacktweets
+
+Documentation:
+  https://waybacktweets.claromes.com
 ```
 
-### Using Wayback Tweets as a Web App
-
-[Open the application](https://waybacktweets.streamlit.app), a prototype written in Python with the Streamlit framework and hosted on Streamlit Cloud.
+## Module
 
-### Using Wayback Tweets as a Python Module
+[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tnaM3rMWpoSHBZ4P_6iHFPjraWRQ3OGe?usp=sharing)
 
 ```python
 from waybacktweets import WaybackTweets, TweetsParser, TweetsExporter
@@ -37,29 +67,51 @@ archived_tweets = api.get()
 
 if archived_tweets:
     field_options = [
+        "archived_urlkey",
         "archived_timestamp",
-        "original_tweet_url",
+        "parsed_archived_timestamp",
         "archived_tweet_url",
+        "parsed_archived_tweet_url",
+        "original_tweet_url",
+        "parsed_tweet_url",
+        "available_tweet_text",
+        "available_tweet_is_RT",
+        "available_tweet_info",
+        "archived_mimetype",
         "archived_statuscode",
+        "archived_digest",
+        "archived_length",
+        "resumption_key",
     ]
 
     parser = TweetsParser(archived_tweets, USERNAME, field_options)
     parsed_tweets = parser.parse()
 
     exporter = TweetsExporter(parsed_tweets, USERNAME, field_options)
     exporter.save_to_csv()
+    exporter.save_to_json()
+    exporter.save_to_html()
 ```
 
+## Web App
+
+[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://waybacktweets.streamlit.app)
+
+A prototype written in Python with the Streamlit framework and hosted on Streamlit Cloud.
+
+Important: Starting from version 1.0, the web app will no longer receive all updates from the official package. To access all features, prefer using the package from PyPI.
+
 ## Documentation
 
-- [Wayback Tweets documentation](https://claromes.github.io/waybacktweets)
-- [Wayback CDX Server API (Beta) documentation](https://archive.org/developers/wayback-cdx-server.html)
+- [Wayback Tweets documentation](https://waybacktweets.claromes.com/).
+- [Wayback CDX Server API (Beta) documentation](https://archive.org/developers/wayback-cdx-server.html).
 
 ## Acknowledgements
 
-- Tristan Lee (Bellingcat's Data Scientist) for the idea of the application.
-- Jessica Smith (Snowflake's Community Growth Specialist) and Streamlit/Snowflake team for the additional server resources on Streamlit Cloud.
-- OSINT Community for recommending the application.
+- Tristan Lee (Bellingcat's Data Scientist) for the idea.
+- Jessica Smith (Snowflake's Community Growth Specialist) and Streamlit team for the additional server resources on Streamlit Cloud.
+- OSINT Community for recommending the package and the application.
+
+## License
 
-> [!NOTE]
-> If the Streamlit application is down, please check the [Streamlit Cloud Status](https://www.streamlitstatus.com/).
+[GPL-3.0](LICENSE.md)
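
The README help text in this commit says that `--collapse timestamp:XX` compares the first XX digits of the 14-digit timestamp field. Here is a minimal Python sketch of that comparison. It assumes, following the Wayback CDX Server API documentation, that collapsing keeps the first capture of each run of adjacent captures with the same prefix. `collapse_by_timestamp_prefix` is a hypothetical helper for illustration, not a function from waybacktweets.

```python
from itertools import groupby


def collapse_by_timestamp_prefix(timestamps, digits):
    """Keep the first capture of each run of adjacent timestamps sharing a prefix.

    Hypothetical helper: it mimics the documented behaviour of
    collapse=timestamp:XX, it is not part of the waybacktweets package.
    """
    if not 1 <= digits <= 14:
        raise ValueError("digits must be between 1 and 14")
    # groupby merges only adjacent items with equal keys, matching the
    # adjacent-line collapsing described for the CDX server.
    return [next(run) for _, run in groupby(timestamps, key=lambda ts: ts[:digits])]


captures = ["20200305120000", "20200305180000", "20210101000000"]
# With 4 digits, captures are compared by year, so one capture per year remains.
print(collapse_by_timestamp_prefix(captures, 4))
# prints ['20200305120000', '20210101000000']
```

Using fewer than 4 digits would compare by decade or century, which is why the help text recommends starting from 4.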

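The CLI options added to the README (`--from`, `--to`, `--limit`, `--collapse`, `--matchtype`, `--resumption_key`) correspond to query parameters of the Wayback CDX Server API. The sketch below shows how such options could be turned into a query URL. It is an illustration under stated assumptions, not the package's implementation: `build_cdx_url` and the `twitter.com/<username>/status/*` URL pattern are hypothetical, and the parameter names (`from`, `to`, `limit`, `collapse`, `matchType`, `resumeKey`, `showResumeKey`) are taken from the CDX Server API documentation.

```python
from datetime import datetime
from urllib.parse import parse_qs, urlencode, urlsplit

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def check_date(value):
    """Validate a YYYYmmdd date string, the format the CLI help asks for."""
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"expected YYYYmmdd, got {value!r}")
    datetime.strptime(value, "%Y%m%d")  # rejects impossible dates like 20231301
    return value


def build_cdx_url(username, date_from=None, date_to=None, limit=None,
                  collapse=None, matchtype=None, resumption_key=None):
    """Hypothetical helper: map CLI-style options onto CDX query parameters."""
    params = {
        # Assumed URL pattern; the real package may build it differently.
        "url": f"twitter.com/{username}/status/*",
        "output": "json",
        "showResumeKey": "true",  # ask the server to return a resumption key
    }
    if date_from is not None:
        params["from"] = check_date(date_from)
    if date_to is not None:
        params["to"] = check_date(date_to)
    if limit is not None:
        params["limit"] = str(limit)
    if collapse is not None:
        params["collapse"] = collapse
    if matchtype is not None:
        params["matchType"] = matchtype
    if resumption_key is not None:
        params["resumeKey"] = resumption_key  # continue from the previous query
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


url = build_cdx_url("jack", date_from="20200305", date_to="20231231", limit=300)
query = parse_qs(urlsplit(url).query)
print(query["from"], query["to"], query["limit"])
```

Passing the key returned by one query as `resumption_key` in the next is what lets a caller page through results, as the `-rk` help text describes.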