class based harvester creation

Making a class-based harvester

If the service you want to harvest uses OAI-PMH, you can build your harvester from scrapi classes; it will harvest data and send normalized data through the SHARE pipeline.

Your harvester will live in the scrapi harvesters directory along with the other harvesters.

This class-based harvester calls the specified OAI-PMH service using the ListRecords verb and the oai_dc metadata prefix, requesting records from the past day.
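
As a rough illustration of the kind of request described above, here is a minimal sketch that builds a ListRecords URL covering the previous day. The helper name build_list_records_url and the example.org endpoint are hypothetical, and the exact parameters scrapi sends (for instance, whether it includes until) are an assumption; only verb, metadataPrefix, from, and until are standard OAI-PMH arguments.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def build_list_records_url(base_url, today=None):
    """Build an OAI-PMH ListRecords URL covering the previous day.

    Hypothetical helper for illustration; not part of scrapi.
    """
    today = today or date.today()
    start = today - timedelta(days=1)
    params = [
        ('verb', 'ListRecords'),
        ('metadataPrefix', 'oai_dc'),
        ('from', start.isoformat()),   # one day in the past
        ('until', today.isoformat()),  # assumed upper bound
    ]
    # base_url is everything before the "?" in the request URL
    return base_url + '?' + urlencode(params)


# Placeholder endpoint and a fixed date so the result is reproducible
example_url = build_list_records_url('http://example.org/oai',
                                     today=date(2015, 3, 2))
print(example_url)
```

Passing a fixed today value makes the output predictable; leaving it out uses the current date.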

The base class for OAI-PMH harvesters, OAIHarvester, is defined in scrapi.base in the scrapi code.

To create a class-based harvester, follow the detailed instructions on the SHARE OSF Wiki.

- Fork the scrapi repo and, under the scrapi/harvesters directory, create a folder named after your harvester.
- Within your new harvester folder, create a file named __init__.py, where you will create an instance of the harvester class.

Your __init__.py will have 3 main parts:
- The imports section at the top, where you'll import the base OAI harvester class.
- The schema transformer, which defines each main element and where in the source API that item can be found.
- Your instance of the harvester class, with some key areas defined:
  - name - the name of your provider (as it will show up in the source field).
  - base_url - the base URL where you will make your OAI requests. It should include everything before the ? in the request URL.
  - approved_sets - a list of "approved sets". If only items with certain "setSpec" entries should make their way into the Notification Service, list those "setSpec" values here. Only entries in the approved setSpec list will be normalized and sent to the Notification Service.
  - timeout - the time in seconds to wait between subsequent requests to gather resources.
  - timezone_granularity - how much time detail to include in the OAI request. Setting timezone_granularity to True will add 'T00:00:00Z' to the date in the request.
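
To make the last two settings more concrete, here is a minimal sketch of what timezone_granularity and the approved-set check amount to. Both helper functions and the record layout (a dict with a 'setSpec' list) are hypothetical and chosen only for illustration; scrapi's internal implementation may differ.

```python
from datetime import date


def format_oai_date(day, timezone_granularity=False):
    """Format a date for an OAI request, optionally with a time part."""
    text = day.isoformat()
    if timezone_granularity:
        text += 'T00:00:00Z'
    return text


def filter_by_approved_sets(records, approved_sets):
    """Keep only records that belong to at least one approved set."""
    approved = set(approved_sets)
    return [r for r in records if approved.intersection(r['setSpec'])]


# Made-up records; the 'setSpec' key mirrors the OAI-PMH header field
records = [
    {'id': 'a', 'setSpec': ['aerosp']},
    {'id': 'b', 'setSpec': ['physics']},
    {'id': 'c', 'setSpec': ['physics', 'acct_fac']},
]
kept = filter_by_approved_sets(records, ['aerosp', 'acct_fac'])
print([r['id'] for r in kept])
print(format_oai_date(date(2015, 3, 1), timezone_granularity=True))
```

A record is kept as soon as any one of its setSpec values appears in the approved list, so a record that belongs to several sets needs only one match.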

Here's an example of what your __init__.py file might look like:
```
from __future__ import unicode_literals

from scrapi.base import OAIHarvester


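calpoly = OAIHarvester(
    # Provider name, as it will show up in the source field
    name='calpoly',
    # Everything before the "?" in the OAI request URL
    base_url='http://digitalcommons.calpoly.edu/do/oai/',
    property_list=['type', 'source', 'publisher', 'format', 'date'],
    # Add 'T00:00:00Z' to the dates in the request
    timezone_granularity=True,
    # Seconds to wait between subsequent requests
    timeout=5,
    # Only records with these setSpec values are normalized
    approved_sets=[
        'csusymp2009',
        'acct_fac',
        'aerosp',
        'aero_fac',
    ]
)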
```

- Add your provider's favicon to the favicon folder.
- From the root directory, run invoke provider_map.
- Test your harvester locally by running invoke harvester harvester_name_here.
- Create a pull request to add your new harvester to the scrapi repo.


SHARE is a project of the ARL, AAU, and APLU. Development of the SHARE Notification Service is being carried out in partnership with the Center for Open Science and is supported by generous funding from The Institute of Museum and Library Services (IMLS) and the Alfred P. Sloan Foundation.
