This repository has been archived by the owner on Feb 28, 2023. It is now read-only.

Commit

Closes #32 and use better user-agents per OS
Mincka committed Nov 10, 2017
1 parent a8fe118 commit 7843435
Showing 3 changed files with 39 additions and 4 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -229,7 +229,7 @@ $ /Library/Frameworks/Python.framework/Versions/3.4/bin/dmarchiver
 Not at all. Unlike online backup services, everything happens locally on your computer. Your username and password are sent only once, to Twitter, over a secure connection. Your messages are downloaded over your own connection and written to your computer at the end of the script's execution, as are the images and GIFs if you chose to download them.

 ### I received an e-mail from Twitter saying a suspicious connection occurred on Twitter, should I be worried about it?
-Not at all. The tool simulates a Firefox browser on Windows 10. Consequently, if you do not usually use this configuration, Twitter warns you about it. You can safely ignore this message if you received it at the same time the tool was used.
+Not at all. The tool simulates a Chrome (Windows or Linux) or Safari (macOS) browser on your current operating system. Because the tool does not keep any cookies locally, Twitter will warn you each time you use it. You can safely ignore this message if you received it at the same time the tool was used.

 ### macOS says the application is blocked because it is not from an identified developer, what should I do?
 I am not able to sign the macOS executable. You will have to unblock the application if you want to use it. Go to the "Security & Privacy" settings and click on the "Open Anyway" button.
3 changes: 3 additions & 0 deletions dmarchiver/cmdline.py
@@ -110,6 +110,9 @@ def main():
     except KeyboardInterrupt:
         print('Script execution interruption requested. Exiting.')
         sys.exit()
+    except Exception as ex:
+        print(ex)
+        sys.exit(1)

 if __name__ == "__main__":
     main()
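The new `except Exception` clause turns any unexpected error, including the ones raised by the parsing checks added to `core.py`, into a printed message and a non-zero exit status, while Ctrl+C still prints its message and exits with status 0. The sketch below isolates that pattern in a hypothetical `run()` helper that returns the exit status instead of calling `sys.exit()`, so it can be exercised without ending the interpreter. `run`, `failing_task` and `interrupted_task` are illustrative names, not part of DMArchiver.

```python
def run(task):
    """Run task() and return the exit status main() would use.

    Hypothetical helper: the real main() calls sys.exit() directly.
    """
    try:
        task()
    except KeyboardInterrupt:
        # main() calls sys.exit() with no argument here, i.e. status 0.
        print('Script execution interruption requested. Exiting.')
        return 0
    except Exception as ex:
        # Last-resort handler: show the message, signal failure to the shell.
        print(ex)
        return 1
    return 0


def failing_task():
    raise Exception('Stopping execution due to parsing error while retrieving the tweets.')


def interrupted_task():
    raise KeyboardInterrupt
```

Because `KeyboardInterrupt` and `SystemExit` derive from `BaseException` rather than `Exception`, the broad clause does not swallow Ctrl+C or an explicit `sys.exit()`, whatever order the two handlers are written in.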
38 changes: 35 additions & 3 deletions dmarchiver/core.py
@@ -17,6 +17,7 @@
 import os
 import re
 import shutil
+from sys import platform
 import time
 import lxml.html
 import requests
@@ -250,10 +251,16 @@ class Crawler(object):
     """

     _twitter_base_url = 'https://twitter.com'
+    _user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36'
+    if platform == 'darwin':
+        _user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13) AppleWebKit/603.1.13 (KHTML, like Gecko) Version/10.1 Safari/603.1.13'
+    elif platform == 'linux' or platform == 'linux2':
+        _user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3184.0 Safari/537.36'
+
     _http_headers = {
-        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0'}
+        'User-Agent': _user_agent}
     _ajax_headers = {
-        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0',
+        'User-Agent': _user_agent,
         'Accept': 'application/json, text/javascript, */*; q=0.01',
         'X-Requested-With': 'XMLHttpRequest'}
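The class body picks the user agent once, when `core.py` is imported, by comparing `sys.platform`. A minimal sketch of the same selection written as plain functions, which makes each branch testable by passing a platform string explicitly. `pick_user_agent` and `build_headers` are hypothetical names, and `startswith('linux')` is a swapped-in shorthand for the commit's explicit `'linux'`/`'linux2'` comparison. As in the commit, every other platform, Windows (`'win32'`) included, falls back to the Windows Chrome string.

```python
import sys

# User-agent strings copied from the Crawler class in this commit.
UA_WINDOWS = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36')
UA_MACOS = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13) AppleWebKit/603.1.13 '
            '(KHTML, like Gecko) Version/10.1 Safari/603.1.13')
UA_LINUX = ('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
            '(KHTML, like Gecko) Chrome/62.0.3184.0 Safari/537.36')


def pick_user_agent(platform_name=sys.platform):
    """Return the user agent matching a sys.platform value (hypothetical helper)."""
    if platform_name == 'darwin':
        return UA_MACOS
    if platform_name.startswith('linux'):  # 'linux' on Python 3, 'linux2' on Python 2
        return UA_LINUX
    return UA_WINDOWS  # default, also used for 'win32', 'cygwin', etc.


def build_headers(user_agent):
    """Build the page and AJAX header dicts so both share one user agent."""
    http_headers = {'User-Agent': user_agent}
    ajax_headers = {
        'User-Agent': user_agent,
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'X-Requested-With': 'XMLHttpRequest'}
    return http_headers, ajax_headers
```

Sharing one value between both header dicts is the point of the commit: a page request and an AJAX request that advertise different browsers would look even more suspicious.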

@@ -316,6 +323,17 @@ def get_threads(self, delay, raw_output):

             json = response.json()

+            if 'errors' in json:
+                print('An error occurred while parsing the conversations.\n')
+                if json['errors'][0]['code'] == 326:
+                    print('''DMArchiver was identified as suspicious and your account has been temporarily locked by Twitter.
+Don\'t worry, you can unlock your account by following the instructions on the Twitter website.
+Maybe it\'s the first time you use it or maybe you have a lot of messages.
+You can unlock your account and try again, and possibly use the -d option to slow down the tool.\n''')
+                print('''Twitter error details below:
+Code {0}: {1}\n'''.format(json['errors'][0]['code'], json['errors'][0]['message']))
+                raise Exception('Stopping execution due to parsing error while retrieving the conversations')
+
             try:
                 if first_request is False:
                     first_request = True
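`get_threads()` and `crawl()` each gain a copy of the same error check, differing only in the word "conversations" versus "tweets". One way to factor them is a helper that raises a dedicated exception, sketched here under the assumption that Twitter's JSON reports failures as an `errors` list of objects with `code` and `message` keys (the shape the added code indexes into). `check_twitter_errors`, `TwitterAPIError` and `ACCOUNT_LOCKED_CODE` are hypothetical names; 326 is the code the commit treats as a temporary account lock.

```python
# Error code the commit treats as "account temporarily locked".
ACCOUNT_LOCKED_CODE = 326


class TwitterAPIError(Exception):
    """Raised when a JSON response contains an 'errors' list (hypothetical)."""

    def __init__(self, code, message, what):
        super().__init__(
            'Stopping execution due to parsing error while retrieving the {0}.'.format(what))
        self.code = code
        self.message = message


def check_twitter_errors(payload, what):
    """Raise TwitterAPIError if payload carries Twitter's 'errors' key.

    what: the resource being fetched, e.g. 'conversations' or 'tweets'.
    Returns None when the payload has no errors.
    """
    if 'errors' not in payload:
        return None
    first = payload['errors'][0]
    code = first.get('code')
    message = first.get('message', '')
    print('An error occurred while parsing the {0}.\n'.format(what))
    if code == ACCOUNT_LOCKED_CODE:
        print('DMArchiver was identified as suspicious and your account has been '
              'temporarily locked by Twitter. Follow the instructions on the Twitter '
              'website to unlock it, then try again, possibly with the -d option '
              'to slow down the tool.\n')
    print('Twitter error details below:\nCode {0}: {1}\n'.format(code, message))
    raise TwitterAPIError(code, message, what)
```

Subclassing `Exception` keeps the new `except Exception` handler in `cmdline.py` working unchanged, while callers that care can catch `TwitterAPIError` specifically and inspect its `code`.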
@@ -341,7 +359,10 @@ def get_threads(self, delay, raw_output):

             except KeyError as ex:
                 print(
-                    'Unable to fully parse the list of the conversations. Maybe your account is locked or Twitter has updated the HTML code. Use -r to get the raw output and post an issue on GitHub. Exception: {0}'.format(str(ex)))
+                    'Unable to fully parse the list of the conversations. '
+                    'Maybe your account is locked or Twitter has updated the HTML code. '
+                    'Use -r to get the raw output and post an issue on GitHub. '
+                    'Exception: {0}'.format(str(ex)))
                 break

             time.sleep(delay)
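This hunk splits a long message across several source lines. In Python, a backslash at the end of a line inside a string literal removes the newline but keeps the indentation of the following source line as part of the string, whereas adjacent string literals are joined at compile time and the whitespace between them is ordinary source layout, not string content. The two hypothetical functions below contrast the approaches; the second also shows that a trailing `.format()` call applies to the whole joined literal.

```python
def backslash_message():
    # The backslash-newline pair is dropped, but the next line's leading
    # spaces become part of the string.
    return 'Unable to fully parse the list of the conversations. \
        Maybe your account is locked.'


def concatenated_message(ex):
    # Adjacent literals are merged into one string before .format() runs.
    return ('Unable to fully parse the list of the conversations. '
            'Maybe your account is locked. '
            'Exception: {0}'.format(ex))
```

Printing `backslash_message()` shows a run of spaces after "conversations.", which is why implicit concatenation (or `textwrap.dedent` for triple-quoted text) is usually preferred for indented code.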
@@ -657,6 +678,17 @@ def crawl(

             json = response.json()

+            if 'errors' in json:
+                print('An error occurred while parsing the tweets.\n')
+                if json['errors'][0]['code'] == 326:
+                    print('''DMArchiver was identified as suspicious and your account has been temporarily locked by Twitter.
+Don\'t worry, you can unlock your account by following the instructions on the Twitter website.
+Maybe it\'s the first time you use it or maybe you have a lot of messages.
+You can unlock your account and try again, and possibly use the -d option to slow down the tool.\n''')
+                print('''Twitter error details below:
+Code {0}: {1}\n'''.format(json['errors'][0]['code'], json['errors'][0]['message']))
+                raise Exception('Stopping execution due to parsing error while retrieving the tweets.')
+
             if 'max_entry_id' not in json:
                 print('Begin of thread reached')
                 break
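Together with the error check in front of it, the `max_entry_id` test drives the paging loop: keep requesting entries until Twitter stops returning a `max_entry_id`. A minimal sketch, assuming a hypothetical `fetch_page(max_entry_id)` callable that returns the decoded JSON for one page (`None` meaning the most recent page) and that `max_entry_id` is passed back as the cursor for the next request. It follows the order visible in this hunk: error check first, then the stop condition, then the page is kept. The `delay` parameter mirrors the `time.sleep(delay)` call in `get_threads()`, which is presumably what the `-d` option controls.

```python
import time


def crawl_thread(fetch_page, delay=0.0):
    """Collect the JSON pages of one conversation in the order fetched.

    fetch_page(max_entry_id) is a hypothetical callable returning a dict;
    None asks for the most recent page.
    """
    pages = []
    max_entry_id = None
    while True:
        payload = fetch_page(max_entry_id)
        if 'errors' in payload:
            raise Exception('Stopping execution due to parsing error while retrieving the tweets.')
        if 'max_entry_id' not in payload:
            print('Begin of thread reached')
            break
        pages.append(payload)
        # Assumed paging cursor: hand it back to request the next page.
        max_entry_id = payload['max_entry_id']
        # Throttle between requests to reduce the risk of an account lock.
        time.sleep(delay)
    return pages
```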