Muhammad Ali Hassan edited this page Apr 20, 2016 · 7 revisions

crawler4j

crawler4j is an open source web crawler for Java that provides a simple interface for crawling the Web. With it, you can set up a multi-threaded web crawler in a few minutes.

Installation

To use the latest release of crawler4j, add the following dependency to your pom.xml:

<dependency>
    <groupId>edu.uci.ics</groupId>
    <artifactId>crawler4j</artifactId>
    <version>4.2</version>
</dependency>

Without Maven

crawler4j JARs are available on the release page and at Maven Central.

If you use crawler4j without Maven, be aware that the crawler4j JAR file has a couple of external dependencies. On the release page, you can find a file named crawler4j-X.Y-with-dependencies.jar that bundles crawler4j with all of its dependencies. Download it and add it to your classpath to cover all the dependencies.
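
One way to confirm that the JAR actually ended up on your classpath is a small probe like the sketch below. It is not part of crawler4j: the class name ClasspathCheck and the helper isOnClasspath are hypothetical, and the probe assumes that edu.uci.ics.crawler4j.crawler.CrawlController is present in the release you downloaded.

```java
public class ClasspathCheck {

    // Assumption: a core crawler4j class used only as a probe target.
    static final String CRAWLER4J_PROBE =
            "edu.uci.ics.crawler4j.crawler.CrawlController";

    // Returns true if the named class can be located by this class's loader.
    // The class is looked up without being initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class that could not be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        if (isOnClasspath(CRAWLER4J_PROBE)) {
            System.out.println("crawler4j found on the classpath");
        } else {
            System.out.println("crawler4j NOT found; check your -cp setting");
        }
    }
}
```

After compiling, you could run it with something like `java -cp .:crawler4j-X.Y-with-dependencies.jar ClasspathCheck` (use `;` instead of `:` as the separator on Windows).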
