A crawler to find URLs that need to be examined

viegelinsch/crawler

README v1.0 / 2015-08-17

Crawler

Introduction

We needed a crawler to find files on our website that are not of the "usual" file types (pictures, slides and office documents). Anything that is not "usual" and might cause "problems" should be found and put on a list for review. This is exactly what this crawler does.

Special thanks to Johannes Lorenz for allowing us to reuse his code.
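
The idea described above can be illustrated with a minimal sketch. It is not the code in Crawler.groovy; the extension list, the closure names (usualExtensions, extensionOf, needsReview) and the example.org URLs are assumptions made purely for illustration.

```groovy
// Hypothetical allow-list of "usual" extensions; the real crawler may use a different one.
def usualExtensions = ['jpg', 'jpeg', 'png', 'gif', 'pdf',
                       'ppt', 'pptx', 'doc', 'docx', 'xls', 'xlsx'] as Set

// Returns the lower-cased file extension of the URL's path, or '' if there is none.
def extensionOf = { String url ->
    def path = new URI(url).path ?: ''
    def name = path.substring(path.lastIndexOf('/') + 1)
    def dot = name.lastIndexOf('.')
    dot >= 0 ? name.substring(dot + 1).toLowerCase() : ''
}

// A URL goes on the review list if it has an extension that is not "usual".
def needsReview = { String url ->
    def ext = extensionOf(url)
    ext && !(ext in usualExtensions)
}

// Placeholder URLs standing in for what a crawl might discover.
def found = [
    'https://www.example.org/img/logo.png',
    'https://www.example.org/files/setup.exe',
    'https://www.example.org/about/',
    'https://www.example.org/download/archive.TAR'
]

def reviewList = found.findAll(needsReview)
reviewList.each { println it }
```

In this sketch, URLs without a file extension (such as directory-style pages) are ignored, and the comparison is case-insensitive. Whether the actual crawler treats these cases the same way is not stated in this README.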

Usage

crawler$ groovy src/de/fau/rrze/pp/crawler/Crawler.groovy

Contributing

Issue a pull request. It will be evaluated and in all likelihood merged.

Help

Currently there is no help besides knowledge and understanding ... ☹

Installation

Requirements

Clone this repository

git clone https://github.com/RRZE-PP/crawler.git

Configuration

Edit the seedUrls.add("") entries in src/de/fau/rrze/pp/crawler/Crawler.groovy (starting at line 43) so that the list contains the URLs of the website to be examined.

Make sure to use proper URLs!
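
As a rough sketch of what the edited entries might look like, assuming seedUrls is a plain list of strings (as the seedUrls.add("") calls suggest): the example.org URLs are placeholders, and the isProperUrl check is a hypothetical helper that is not part of the original script.

```groovy
// Assumption: seedUrls is a simple list of URL strings.
def seedUrls = []
seedUrls.add("https://www.example.org/")            // placeholder seed URL
seedUrls.add("https://www.example.org/downloads/")  // placeholder seed URL

// Hypothetical sanity check: a "proper" URL here means an absolute
// http(s) URL with a host that java.net.URI can parse.
def isProperUrl = { String s ->
    try {
        def uri = new URI(s)
        uri.scheme in ['http', 'https'] && uri.host != null
    } catch (URISyntaxException e) {
        false
    }
}

// Collect and report any entries that do not pass the check.
def invalid = seedUrls.findAll { !isProperUrl(it) }
invalid.each { println "Not a proper URL: ${it}" }
```

With this check, an empty string or a scheme-less entry such as "www.example.org" would be reported, which is one plausible reading of "proper URLs".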

Credits

Contact

License

This project is licensed under the GNU GPL v3. See LICENSE for details.
