Breadth-First Search Web Crawler

This program lists all the pages and static resources reachable from a given URL. Only pages within the same domain as the starting URL are crawled; static assets that point to outside domains are listed but not followed.
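The breadth-first traversal described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name BfsCrawlSketch, the crawl method, and the linksOf function (a stand-in for fetching a page and extracting its links) are all hypothetical, and the sketch does not distinguish pages from static assets. It uses an in-memory link map so it runs without network access.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;

public class BfsCrawlSketch {

    /**
     * Returns every URL discovered, in order of discovery.
     * linksOf is a hypothetical stand-in for "download the page and extract its links".
     */
    static Set<String> crawl(String start, Function<String, List<String>> linksOf) {
        String startHost = URI.create(start).getHost();
        Set<String> seen = new LinkedHashSet<>(); // remembers insertion order
        Queue<String> queue = new ArrayDeque<>(); // FIFO queue gives breadth-first order
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            for (String link : linksOf.apply(page)) {
                if (!seen.add(link)) {
                    continue; // already listed, do not visit twice
                }
                // Only same-host URLs are followed; other hosts are listed but not crawled.
                if (startHost != null && startHost.equals(URI.create(link).getHost())) {
                    queue.add(link);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Made-up site used purely for illustration.
        Map<String, List<String>> site = Map.of(
            "https://example.com/",
            List.of("https://example.com/a", "https://cdn.example.net/app.js"),
            "https://example.com/a",
            List.of("https://example.com/", "https://example.com/b"));
        crawl("https://example.com/", url -> site.getOrDefault(url, List.of()))
            .forEach(System.out::println);
    }
}
```

The seen set doubles as the result list and as the guard against revisiting pages, which is what keeps the crawl from looping on pages that link back to each other.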

How to build

./gradlew build

Usage

java -jar build/libs/webcrawler-1.0-SNAPSHOT.jar https://elixir-lang.org/

Dependencies

The only dependency is jsoup.

Caveats

All errors are simply printed to standard output and the crawl moves on. Examples of such errors are links to PDFs, mailto: links, and 404 (page not found) responses.
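The "print and move along" behaviour could look something like the sketch below. It is an assumption about the general shape, not the project's actual code: SkipOnErrorSketch, Fetcher, and fetchAll are hypothetical names, and the fake fetcher in main only simulates the failure cases listed above rather than calling jsoup.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SkipOnErrorSketch {

    /** Hypothetical stand-in for whatever downloads and parses a page. */
    interface Fetcher {
        void fetch(String url) throws IOException;
    }

    /** Tries every URL; failures are printed to stdout and skipped. Returns the URLs that succeeded. */
    static List<String> fetchAll(List<String> urls, Fetcher fetcher) {
        List<String> ok = new ArrayList<>();
        for (String url : urls) {
            try {
                fetcher.fetch(url);
                ok.add(url);
            } catch (IOException | RuntimeException e) {
                // Report the problem and continue with the next URL.
                System.out.println("Skipping " + url + ": " + e.getMessage());
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        // Simulated failures, chosen to mirror the examples in the caveat above.
        Fetcher fake = url -> {
            if (url.startsWith("mailto:")) {
                throw new IllegalArgumentException("not an http(s) URL");
            }
            if (url.endsWith(".pdf")) {
                throw new IOException("unsupported content type");
            }
            if (url.endsWith("/missing")) {
                throw new IOException("HTTP 404");
            }
        };
        List<String> ok = fetchAll(
            List.of("https://example.com/",
                    "mailto:someone@example.com",
                    "https://example.com/paper.pdf",
                    "https://example.com/missing"),
            fake);
        System.out.println("Fetched: " + ok);
    }
}
```

Catching the failure per URL rather than around the whole loop is what lets one bad link be reported without stopping the rest of the crawl.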