
Distribution Support for Scrapy & Gerapy using RabbitMQ


Gerapy/GerapyRabbitMQ


Gerapy RabbitMQ

This package adds distributed crawling support to Scrapy by using RabbitMQ as the shared request queue. It is also one of the modules of Gerapy.

Installation

You can install with this command:

pip3 install gerapy-rabbitmq

Usage

Required configuration (in your Scrapy project's settings):

# Use RabbitMQ for queue
SCHEDULER = "gerapy_rabbitmq.scheduler.Scheduler"
SCHEDULER_QUEUE_KEY = '%(spider)s_requests'

# RabbitMQ Connection Parameters, see https://pika.readthedocs.io/en/stable/modules/parameters.html
RABBITMQ_CONNECTION_PARAMETERS = {
    'host': 'localhost'
}

# Use Redis for dupefilter (this class comes from the gerapy_redis module, which must be importable)
DUPEFILTER_CLASS = "gerapy_redis.dupefilter.RFPDupeFilter"
SCHEDULER_DUPEFILTER_KEY = '%(spider)s:dupefilter'
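The `%(spider)s` placeholders in the key settings above are Python %-style format fields. The sketch below shows the names that result, assuming the scheduler fills the placeholder with the spider's `name` (the convention scrapy-redis uses). `expand_key` is a hypothetical helper for illustration and is not part of gerapy-rabbitmq.

```python
# Key templates copied from the required configuration above.
SCHEDULER_QUEUE_KEY = '%(spider)s_requests'
SCHEDULER_DUPEFILTER_KEY = '%(spider)s:dupefilter'


def expand_key(template: str, spider_name: str) -> str:
    """Hypothetical helper: fill the %(spider)s placeholder with a spider name.

    Assumes %-style formatting with a 'spider' key, as in scrapy-redis.
    """
    return template % {'spider': spider_name}


queue_name = expand_key(SCHEDULER_QUEUE_KEY, 'quotes')
dupefilter_key = expand_key(SCHEDULER_DUPEFILTER_KEY, 'quotes')
print(queue_name)      # quotes_requests
print(dupefilter_key)  # quotes:dupefilter
```

Under this assumption, every worker running a spider with the same `name` reads from and writes to the same RabbitMQ queue. Sharing that queue is what lets several Scrapy processes split one crawl.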

Optional configuration:

# RabbitMQ Queue Configuration
SCHEDULER_QUEUE_DURABLE = True
SCHEDULER_QUEUE_MAX_PRIORITY = 100
SCHEDULER_QUEUE_PRIORITY_OFFSET = 30
SCHEDULER_QUEUE_FORCE_FLUSH = True
SCHEDULER_PERSIST = False
SCHEDULER_IDLE_BEFORE_CLOSE = 0
SCHEDULER_FLUSH_ON_START = False
SCHEDULER_PRE_ENQUEUE_ALL_START_REQUESTS = True
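RabbitMQ only treats a queue as a priority queue when the queue is declared with the `x-max-priority` argument, and message priorities must be non-negative. Scrapy request priorities, by contrast, can be negative. One plausible reading of `SCHEDULER_QUEUE_MAX_PRIORITY` and `SCHEDULER_QUEUE_PRIORITY_OFFSET` is sketched below. It is an assumption, not confirmed by this README: the first becomes `x-max-priority`, and the second shifts Scrapy priorities into RabbitMQ's range. Both functions are hypothetical illustrations, not gerapy-rabbitmq's API.

```python
# Values copied from the optional configuration above.
SCHEDULER_QUEUE_DURABLE = True
SCHEDULER_QUEUE_MAX_PRIORITY = 100
SCHEDULER_QUEUE_PRIORITY_OFFSET = 30


def queue_declare_kwargs(queue_name: str) -> dict:
    """Hypothetical: keyword arguments one might pass to pika's
    channel.queue_declare() for a durable priority queue."""
    return {
        'queue': queue_name,
        'durable': SCHEDULER_QUEUE_DURABLE,
        # x-max-priority is the RabbitMQ queue argument that enables priorities.
        'arguments': {'x-max-priority': SCHEDULER_QUEUE_MAX_PRIORITY},
    }


def to_message_priority(scrapy_priority: int) -> int:
    """Hypothetical mapping: shift a Scrapy request priority by the offset,
    then clamp it into the range [0, SCHEDULER_QUEUE_MAX_PRIORITY]."""
    shifted = scrapy_priority + SCHEDULER_QUEUE_PRIORITY_OFFSET
    return max(0, min(SCHEDULER_QUEUE_MAX_PRIORITY, shifted))


print(queue_declare_kwargs('quotes_requests'))
print(to_message_priority(0))    # 30
print(to_message_priority(-50))  # 0
print(to_message_priority(100))  # 100
```

With an offset of 30, requests at Scrapy's default priority of 0 land in the middle of the range. Requests with negative priorities down to -30 still keep distinct RabbitMQ priorities instead of all collapsing to 0. RabbitMQ accepts `x-max-priority` values up to 255.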

More

For more detail, refer to the example in the repository.

RabbitMQ Preview
