WebDL is a project that aims to make the periodic collection and consumption of online media easy. It is not meant to replace technologies such as RSS; rather, it is an end-to-end framework that uses whatever is appropriate to extract and deliver what matters to the user.
There are three Python scripts in the project:
- redditDL downloads linked media or the content of self posts from a specific subreddit using Selenium and urllib. Content is sorted by vote rank in a user-specified time period.
- imgManips can detect and delete duplicate images in a directory using downscaled difference hashing.
- wallUpdate creates or updates a wallpaper folder using multiple web sources (currently subreddits). It uses redditDL functionality.
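
To illustrate what "downscaled difference hashing" means, here is a minimal sketch of the general technique. It is not the code in imgManips: the function names are hypothetical, and it works on plain nested lists of grayscale values instead of OpenCV images so that it runs without any dependencies. The image is shrunk to a 9x8 grid, each pixel is compared with its right-hand neighbour to produce a 64-bit hash, and two images count as duplicates when their hashes differ in at most `max_distance` bits.

```python
def downscale(gray, width, height):
    """Nearest-neighbour resize of a 2D list of grayscale values."""
    src_h, src_w = len(gray), len(gray[0])
    return [
        [gray[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(gray, hash_size=8):
    """Difference hash: one bit per pair of horizontally adjacent pixels."""
    small = downscale(gray, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            # Append a 1 bit when brightness drops from left to right.
            bits = (bits << 1) | (left > right)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def find_duplicates(images, max_distance=0):
    """Return (name, earlier_name) pairs whose hashes are within max_distance.

    `images` maps a name to a 2D list of grayscale values (an assumption of
    this sketch; a real script would load files from a directory).
    """
    seen = []  # (hash, name) of the images kept so far
    duplicates = []
    for name, gray in images.items():
        h = dhash(gray)
        match = next((n for s, n in seen if hamming(h, s) <= max_distance), None)
        if match is None:
            seen.append((h, name))
        else:
            duplicates.append((name, match))
    return duplicates
```

With OpenCV, the grayscale input and the downscaling step would typically come from `cv2.imread(path, cv2.IMREAD_GRAYSCALE)` and `cv2.resize`. A small `max_distance` above zero also catches near-duplicates such as recompressed copies, at the cost of occasional false positives.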
- redditDL is a good working prototype to quickly collect data for consumption or archival storage, albeit from a single website. I would like to generalize its approach so that more sources (and website structures) are easily supported.
- Like wallUpdate, there are other tasks related to structured data collection that I'd like to automate. Other singleton scripts or a generalized Task approach are possible avenues to follow.
- Text media should also have smart* text-to-speech functionality to further increase accessibility while screen access is limited.

*Smart as in: Read only what matters without me having to tell you.
Fully automated luxury space organized ranked media consumption, baby!
In boring terms, I want to reduce search and discovery costs for appropriate periodic tasks. These include aggregating and ranking news and research articles (by date or topic), collecting wallpapers, figuring out what matters in a given article (a search cost within the object), etc.
- General:
  - Python 3.8
  - pip
- redditDL:
  - Selenium - needed for headless-browser web scraping. Run `pip install selenium` in the virtual environment for redditDL.py, or install it system-wide by running the same command outside a virtual environment (add `-U` to upgrade an existing installation).
  - Some webdriver, e.g. the Firefox webdriver (geckodriver) - place it in the virtual environment directory.
- wallUpdate:
  - shutil - part of the Python standard library; no installation needed
  - subprocess - part of the Python standard library; no installation needed
- imgManips:
  - OpenCV (cv2) - install with `pip install opencv-python`, which also installs its dependencies
- redditDL: `python_path redditDL.py subreddit_name max_media_download_count sort_period OPTIONAL_flags`
  - flags:
    - `-f` for flat folder structure when using multiple download sources
- wallUpdate: `python_path wallUpdate.py abs_or_rel_wallpaper_folder_path count_per_sr time_period subreddit_list`
- imgManips: `python_path imgManips.py operation relative_directory`
  - operations:
    - `cleanup` for deleting duplicate images
- Automation: For scheduled usage, such as periodically updating a wallpaper folder with online images, you can run the scripts with the same commands using `cron` (UNIX) or the Task Scheduler (Windows).
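
As a rough illustration of the scheduling step, the sketch below assembles a wallUpdate command in the argument order given in the usage line and formats it as a crontab entry. The helper names, the example path, the `week` period value, and passing each subreddit as a separate argument are all assumptions made for this example; check them against how wallUpdate actually parses its arguments.

```python
import shlex
import sys


def build_wallupdate_command(folder, count_per_sr, time_period, subreddits,
                             python_path=sys.executable,
                             script="wallUpdate.py"):
    """Return the wallUpdate invocation as an argument list.

    The order mirrors the usage line: folder, count per subreddit, time
    period, then the subreddits. Passing one argument per subreddit is an
    assumption about how subreddit_list is read.
    """
    return [python_path, script, folder, str(count_per_sr), time_period,
            *subreddits]


def crontab_line(schedule, command):
    """Format a crontab entry: the five schedule fields, then the command."""
    return f"{schedule} {shlex.join(command)}"


if __name__ == "__main__":
    cmd = build_wallupdate_command(
        folder="/home/me/Pictures/wallpapers",   # hypothetical path
        count_per_sr=5,
        time_period="week",                      # assumed period value
        subreddits=["EarthPorn", "wallpapers"],  # example subreddits
        python_path="/usr/bin/python3",
    )
    # "0 9 * * 1" is 09:00 every Monday in standard cron syntax.
    print(crontab_line("0 9 * * 1", cmd))
```

Because cron does not start jobs in the project directory, an absolute path to wallUpdate.py (passed as `script`) is usually the safer choice. The same argument list can also be handed to `subprocess.run` to launch the script from another Python program.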
