ridiculouswaffle/mail-harvester

Mail Harvester

Note

This project is still under active development. The latest release is ready to use, but it may not yet include every feature you need.

A Clojure app that scrapes emails and links from any website. Mail Harvester is:

  • Free and Open Source
  • Cross-platform
  • Usable without any programming knowledge

Browsers supported:

  • Chrome
  • Firefox
  • Safari

How to use?

Setup for Safari

Note

If you are using Chrome or Firefox, skip this section.

To use Safari with this application, you must first enable Safari's Remote Automation feature.

For macOS Sonoma:

  • Open Safari > Settings from the menu bar (or use the shortcut Command + ,)
  • Go to the Advanced section
  • Check the "Show features for web developers" checkbox
  • Go to the Developer section
  • Check the "Allow Remote Automation" checkbox

For macOS Ventura and below:

  • Open Safari > Preferences from the menu bar (named Settings on Ventura; the shortcut is Command + , in both cases)
  • Go to the Advanced section
  • Check the "Show Develop menu in menu bar" checkbox
  • Choose Develop > Allow Remote Automation from the menu bar

Prerequisites & Installation

Before using this app, you need to install Java.

After you have installed Java, download the latest release from the Releases section on the right side of this page.

After you have downloaded the archive, unzip it and double-click the .jar file to start the app.

Usage

When you open the .jar file, you will see a window like this:

An image of the application

The URL field is where you enter the address of the website you want to scrape emails or links from.

The Browser dropdown is where you choose the browser to use. Pick one that is installed on your system.

After you enter a URL, you can scrape links or emails by clicking the buttons at the bottom.

Filters

You can add filters to exclude certain emails/links from export.

A filter is typically a text file containing content like this:

https://google.com
[email protected]

Each excluded link or email must be on its own line for the filter to work as intended.
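
For developers, the one-entry-per-line format can be illustrated with a minimal Clojure sketch. This is not the application's actual code: the function names parse-filter and apply-filter, the sample addresses, and the exact-match comparison are all assumptions made for illustration.

```clojure
(require '[clojure.string :as str])

(defn parse-filter
  "Hypothetical helper: turn the text of a filter file into a set of
  excluded entries, one per line. Blank lines and surrounding
  whitespace are ignored."
  [text]
  (->> (str/split-lines text)
       (map str/trim)
       (remove str/blank?)
       set))

(defn apply-filter
  "Hypothetical helper: drop every scraped result that appears
  verbatim in the excluded set (exact match is an assumption)."
  [excluded results]
  (vec (remove excluded results)))

;; Made-up filter content and scraped results, for illustration only.
(def example-filter
  "https://google.com\nspam@example.com\n")

(def scraped
  ["https://google.com"
   "https://clojure.org"
   "spam@example.com"
   "hello@example.com"])

(prn (apply-filter (parse-filter example-filter) scraped))
```

A set is used because it doubles as a membership predicate in Clojure, so removing excluded entries is a single call to remove. In a real run the filter text would come from the chosen file, for example via slurp.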

Planned features

  • Go through a list of links to scrape more links/emails
  • Filters
  • Notifying the user when scraping is done
  • Dark mode

For Developers

Setup for development

When running locally, this project expects the browser drivers to be in a folder named drivers. For users' convenience, these drivers are packaged in the Releases, but not in the repository.

If you run the project with the clj tool, the drivers folder should be in the root of the repository. If you run it after compiling it into a JAR, the folder needs to be in the target directory (or wherever the JAR is).
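
To check that the folder is where the app will look for it, a small REPL sketch like the following can help. It assumes the drivers folder is resolved relative to the current working directory, which the text above suggests but does not state. The helper list-drivers is hypothetical and not part of the project.

```clojure
(require '[clojure.java.io :as io])

(defn list-drivers
  "Hypothetical helper: names of the files found in <base-dir>/drivers,
  or an empty vector when that folder does not exist."
  [base-dir]
  (let [^java.io.File dir (io/file base-dir "drivers")]
    (if (.isDirectory dir)
      (vec (sort (map (fn [^java.io.File f] (.getName f))
                      (.listFiles dir))))
      [])))

;; Inspect the current working directory, e.g. the repository root
;; when the REPL was started there with the clj tool.
(prn (list-drivers (System/getProperty "user.dir")))
```

An empty vector means no drivers folder was found at that location, or that the folder is empty.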

The browsers this application supports are:

  • Chrome
  • Firefox
  • Safari

You can download the drivers for them at:

How to run

To run from the clj tool, use:

clj -M -m mail-harvester.core

To compile a JAR, use:

clj -T:build uber

License

This project is licensed under the MIT License.