Crawl your own website with various clients for SEO and indexing purposes.

mediamonks/crawler

MediaMonks Crawler

This tool lets you crawl a website and get a DOM object for every URL that was found. We use it to crawl our own site pages through the Prerender.io client, regardless of whether their content is rendered on the server, on the client, or both. The resulting data can be used to build a full site search and/or to improve SEO for single-page applications.

Highlights

  • Ships with Prerender & Prerender.io clients, uses Goutte by default
  • Supports any Symfony BrowserKit client
  • Supports both whitelisting and blacklisting of URLs
  • Supports URL normalization, which lets you prevent duplicates caused by minor URL differences
  • Implements the PSR-3 Logger Interface
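
To make the whitelisting, blacklisting and normalization highlights more concrete, here is a minimal standalone sketch of the idea. It does not use this library's API: `normalizeUrl()` and `isAllowed()` are hypothetical helpers written for illustration only, and the normalization rules shown (lowercase scheme and host, strip trailing slash and fragment, sort query parameters) are assumptions about what "minor URL differences" might cover. See the /doc folder for how the crawler itself is configured.

```php
<?php

// Hypothetical helper (not part of mediamonks/crawler): rewrites a URL so that
// trivially different spellings of the same page compare equal.
function normalizeUrl($url)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return $url; // leave anything we cannot parse untouched
    }

    $scheme = isset($parts['scheme']) ? strtolower($parts['scheme']) : 'http';
    $host = strtolower($parts['host']);
    $port = isset($parts['port']) ? ':' . $parts['port'] : '';
    $path = isset($parts['path']) ? rtrim($parts['path'], '/') : '';

    // Sort query parameters so that their order does not matter.
    $query = '';
    if (isset($parts['query'])) {
        parse_str($parts['query'], $params);
        ksort($params);
        $query = '?' . http_build_query($params);
    }

    // The fragment is dropped on purpose: it is never sent to the server.
    return $scheme . '://' . $host . $port . $path . $query;
}

// Hypothetical helper: a blacklist match always rejects; an empty whitelist
// accepts everything else. Patterns are PCRE regular expressions.
function isAllowed($url, array $whitelist, array $blacklist)
{
    foreach ($blacklist as $pattern) {
        if (preg_match($pattern, $url)) {
            return false;
        }
    }
    if (empty($whitelist)) {
        return true;
    }
    foreach ($whitelist as $pattern) {
        if (preg_match($pattern, $url)) {
            return true;
        }
    }
    return false;
}

// Example input: five discovered links that point to only three distinct pages.
$found = [
    'https://Example.com/about/',
    'https://example.com/about#team',
    'https://example.com/search?b=2&a=1',
    'https://example.com/search?a=1&b=2',
    'https://example.com/admin/login',
];

$queue = [];
foreach ($found as $url) {
    $normalized = normalizeUrl($url);
    if (isAllowed($normalized, ['~^https://example\.com(/|$)~'], ['~/admin/~'])) {
        $queue[$normalized] = true; // array keys deduplicate for us
    }
}
$urls = array_keys($queue);
```

With this input, `$urls` ends up holding two entries, `https://example.com/about` and `https://example.com/search?a=1&b=2`: the duplicates collapse after normalization and the `/admin/` page is rejected by the blacklist.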

Documentation

Documentation and examples can be found in the /doc folder.

System Requirements

To use the library you need:

  • PHP >= 5.5.0

Install

Install this package with Composer:

$ composer require mediamonks/crawler

Security

If you discover any security-related issues, please email [email protected] instead of using the issue tracker.

License

The MIT License (MIT). Please see License File for more information.