class based harvester creation
If you're creating a harvester for a service that uses OAI-PMH, you can create a harvester using scrapi classes that will harvest data and send normalized data through the SHARE pipeline.
Your harvester will live in the scrapi harvesters directory along with the other harvesters.
This class-based harvester will make calls to the specified OAI-PMH service using the ListRecords verb and the oai_dc metadata format, with a date range of one day in the past.
You can find the base class definition for the OAI-PMH harvester in the scrapi code, available here.
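To make the request shape concrete, here is a minimal standalone sketch of the kind of URL such a harvest produces. It is not scrapi's own code: the helper name `build_list_records_url` and the `http://example.org/oai/` endpoint are hypothetical, and using both `from` and `until` to express the one-day window is an assumption. Only the `verb`, `metadataPrefix`, `from`, and `until` argument names come from the OAI-PMH protocol itself.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def build_list_records_url(base_url, today=None):
    """Build a ListRecords URL covering the previous day (hypothetical helper)."""
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    # A list of pairs keeps the query arguments in a predictable order.
    params = [
        ('verb', 'ListRecords'),
        ('metadataPrefix', 'oai_dc'),
        ('from', yesterday.isoformat()),
        ('until', today.isoformat()),
    ]
    return base_url + '?' + urlencode(params)


# Placeholder endpoint; substitute your provider's OAI base URL.
print(build_list_records_url('http://example.org/oai/', today=date(2015, 3, 2)))
# prints: http://example.org/oai/?verb=ListRecords&metadataPrefix=oai_dc&from=2015-03-01&until=2015-03-02
```

Pasting a URL like this into a browser is a quick way to confirm that a provider's OAI-PMH endpoint responds before you write the harvester.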
To create a class-based harvester, follow the detailed instructions on the SHARE OSF Wiki.
-
Fork the scrapi repo, and create your own harvester in a folder with the same name under the scrapi/harvesters directory.
-
Within your new harvester folder, create a file named `__init__.py`, where you will create an instance of the harvester class. Your `__init__.py` will have 3 main parts:
- The imports section at the top, where you'll import the base OAI harvester class.
- The schema transformer, which defines each main element and where in the source API that item can be found.
- Your instance of the harvester class, with some key areas defined:
  - name - the name of your provider (as it will show up in the source field).
  - base_url - the base URL where you will make your OAI requests. It should include everything before the ? in the request URL.
  - approved_sets - a list of "approved sets". If your provider has a certain set of items with a particular "setSpec" entry that should make their way into the Notification Service, list the approved "setSpec" items here. Only those entries that are in the approved setSpec list will be normalized and sent to the Notification Service.
  - timeout - time in seconds to wait between subsequent requests to gather resources.
  - timezone_granularity - how much time detail to include in the OAI request. Setting timezone_granularity to True will add 'T00:00:00Z' to the date request.
Here's an example of what your `__init__.py` file might look like:

```python
from __future__ import unicode_literals

from scrapi.base import OAIHarvester

calpoly = OAIHarvester(
    name='calpoly',
    base_url='http://digitalcommons.calpoly.edu/do/oai/',
    property_list=['type', 'source', 'publisher', 'format', 'date'],
    timezone_granularity=True,
    timeout=5,
    approved_sets=[
        'csusymp2009',
        'acct_fac',
        'aerosp',
        'aero_fac',
    ]
)
```
-
Add your provider's favicon to the favicon folder
-
From the root directory, run `invoke provider_map`
-
Test your harvester locally by running `invoke harvester harvester_name_here`
-
Create a pull request to add your new harvester to the scrapi repo
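As a rough illustration of two of the settings described in step 2, the sketch below shows how an approved-sets filter and the `timezone_granularity` date suffix could behave. It is an assumption about the behavior as described on this page, not the scrapi implementation: the helper names `format_oai_date` and `in_approved_sets` and the sample records are hypothetical.

```python
from datetime import date


def format_oai_date(day, timezone_granularity=False):
    """Format a date for an OAI request (hypothetical helper).

    With timezone_granularity=True, 'T00:00:00Z' is appended to the date.
    """
    stamp = day.isoformat()
    if timezone_granularity:
        stamp += 'T00:00:00Z'
    return stamp


def in_approved_sets(set_specs, approved_sets):
    """Return True if any of a record's setSpec values is approved."""
    return any(spec in approved_sets for spec in set_specs)


# Made-up records for illustration; real records come from the OAI response.
records = [
    {'id': 'oai:example:1', 'setSpec': ['aerosp']},
    {'id': 'oai:example:2', 'setSpec': ['unlisted_set']},
]
approved = ['csusymp2009', 'acct_fac', 'aerosp', 'aero_fac']

kept = [r['id'] for r in records if in_approved_sets(r['setSpec'], approved)]
print(kept)  # prints: ['oai:example:1']
print(format_oai_date(date(2015, 3, 1), timezone_granularity=True))  # prints: 2015-03-01T00:00:00Z
```

If your test harvest returns fewer records than expected, checking each record's setSpec values against your approved_sets list in this way is a reasonable first debugging step.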
SHARE is a project of the ARL, AAU, and APLU. Development of the SHARE Notification Service is being carried out in partnership with the Center for Open Science and is supported by generous funding from The Institute of Museum and Library Services (IMLS) and the Alfred P. Sloan Foundation.