Used to schedule recurring data imports into Apache Solr (e.g. from an RDBMS, JSON files, ...).
Written for Solr 3.1 and migrated from Google Code.
I'll gladly accept a pull request that refactors it to work with the latest version of Solr.
Solr is a very popular, blazing-fast open source enterprise search platform built on Apache Lucene, the search library created by Doug Cutting. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.
I wrote DIH Scheduler (DIH being Solr's DataImportHandler) for myself because I needed to periodically index changes made in an MS SQL Server database.
The Solr app was deployed on Windows Server (I know, it wasn't up to me), so I didn't have the option of using a simple cron job.
In 2010 I published the original source on the Solr Wiki, and soon more and more people started asking for a compiled version so they could just drop in a JAR file and be done with it.
- a working DIH configuration already in place
Important! The prebuilt JAR file currently has a bug, so until I get some free time to fix it, you'll have to build it yourself from the provided source.
- download the JAR file (it's in the release folder).
- place the JAR file into the WEB-INF/lib folder inside your WAR (before deployment), or into the lib folder inside the root of an already deployed Solr
- copy the contents of the dataimport.properties file (everything below last_index_time) into your existing dataimport.properties. Regardless of whether you run single- or multi-core Solr, make sure you use the dataimport.properties located in your solr.home/conf (NOT solr.home/core/conf)
- customize the synchronization schedule and other mandatory params inside dataimport.properties
- add the following snippet to your WAR/EAR's web.xml:
<listener>
<listener-class>org.apache.solr.handler.dataimport.scheduler.ApplicationListener</listener-class>
</listener>
Restart the Solr web app to apply changes.
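The customization step comes down to a few key/value pairs in dataimport.properties. The sketch below shows one way to read such settings with java.util.Properties. Only the syncEnabled key and its "1 means active" rule come from this README. The interval key name, the 30-minute default and the SchedulerSettings class are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

// Minimal sketch: only "syncEnabled" comes from the README; the "interval" key
// and the 30-minute default are assumptions made for illustration.
final class SchedulerSettings {
    static final int DEFAULT_INTERVAL_MINUTES = 30; // assumed default

    final boolean syncEnabled;
    final int intervalMinutes;

    private SchedulerSettings(boolean syncEnabled, int intervalMinutes) {
        this.syncEnabled = syncEnabled;
        this.intervalMinutes = intervalMinutes;
    }

    // Reads dataimport.properties content; meant to be called again before every run
    // so that edits to the file are picked up without a redeploy.
    static SchedulerSettings load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        return fromProperties(props);
    }

    static SchedulerSettings fromProperties(Properties props) {
        // Sync is active only when syncEnabled is exactly "1"; anything else disables it.
        boolean enabled = "1".equals(props.getProperty("syncEnabled"));
        int interval = DEFAULT_INTERVAL_MINUTES;
        String raw = props.getProperty("interval", "").trim(); // hypothetical key
        if (!raw.isEmpty()) {
            try {
                interval = Integer.parseInt(raw);
            } catch (NumberFormatException e) {
                // keep the default on malformed input
            }
        }
        if (interval <= 0) {
            interval = DEFAULT_INTERVAL_MINUTES; // reject zero or negative intervals
        }
        return new SchedulerSettings(enabled, interval);
    }
}
```

Re-reading the file on each run, rather than caching it at startup, lets the schedule be changed without restarting the web app.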
- Enables scheduling DIH delta or full imports
- Uses Solr's REST API to send a POST request to DIH
- Successfully tested on Apache Tomcat v6 (should work on any other servlet container)
- Hasn't landed upstream yet (see the Jira ticket for the targeted release)
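The scheduler triggers imports by sending an HTTP POST to DIH. The sketch below shows such a request using only the JDK's HttpURLConnection. It is not the scheduler's actual code, and it rests on several assumptions. The handler is assumed to be registered at /dataimport, the usual name in solrconfig.xml. It uses DIH's standard full-import and delta-import commands. The host, port, webapp and core values are placeholders, and DihTrigger is a hypothetical name.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;

// Minimal sketch, not the scheduler's actual code. Assumes DIH is registered
// under /dataimport; host, port, webapp and core are placeholders.
final class DihTrigger {
    // Standard DIH commands for full and incremental imports.
    enum Command {
        FULL_IMPORT("full-import"),
        DELTA_IMPORT("delta-import");

        final String value;

        Command(String value) {
            this.value = value;
        }
    }

    // An empty or null core means a single-core deployment.
    static String buildUrl(String host, int port, String webapp, String core, Command command) {
        StringBuilder url = new StringBuilder("http://")
            .append(host).append(':').append(port).append('/').append(webapp);
        if (core != null && !core.isEmpty()) {
            url.append('/').append(core);
        }
        return url.append("/dataimport?command=").append(command.value).toString();
    }

    // Sends a POST with an empty body and returns the HTTP status code.
    static int post(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setFixedLengthStreamingMode(0);
            try (OutputStream body = conn.getOutputStream()) {
                // nothing to write: the command travels in the query string
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

For example, DihTrigger.post(DihTrigger.buildUrl("localhost", 8983, "solr", "", DihTrigger.Command.DELTA_IMPORT)) would request a delta import on a single-core instance. A status other than 200 is a natural point for a scheduler to react, for instance by reloading its settings.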
Feel free to ask questions or make suggestions by filing an issue.
- enable users to create multiple scheduled tasks (List<DataImportScheduler>)
- add cancel functionality, so that the DIHScheduler background thread can be completely disabled without stopping the app/server. Currently, sync can be disabled by setting the syncEnabled param in dataimport.properties to anything other than "1", but the background thread remains active and reloads the properties file on every run (so that sync can be hot-redeployed).
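One possible approach to the cancel item uses a ScheduledExecutorService. Its future can be cancelled and its thread shut down without touching the servlet container. The sketch below is an assumption about how this could look, not a description of how DIH Scheduler currently schedules its runs. CancellableSchedule is a hypothetical name.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a cancellable background schedule, not the project's current code.
final class CancellableSchedule {
    private final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "dih-scheduler");
            t.setDaemon(true); // do not keep the JVM alive on its own
            return t;
        });
    private ScheduledFuture<?> future;

    // Runs the task repeatedly, waiting `interval` between the end of one run and the next.
    synchronized void start(Runnable task, long interval, TimeUnit unit) {
        if (future != null) {
            throw new IllegalStateException("already started");
        }
        future = executor.scheduleWithFixedDelay(task, interval, interval, unit);
    }

    // Stops further runs and releases the thread; an in-flight run is allowed to finish.
    synchronized void cancel() {
        if (future != null) {
            future.cancel(false);
        }
        executor.shutdown();
    }

    synchronized boolean isCancelled() {
        return future != null && future.isCancelled();
    }
}
```

In a servlet context, start(...) could be called from the listener's contextInitialized and cancel() from contextDestroyed. scheduleWithFixedDelay is chosen so that a slow import never overlaps with the next run.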
- became core-aware (now works regardless of whether single or multi-core Solr is deployed)
- parameterized the schedule interval (in minutes)
- use SolrResourceLoader to get solr.home (instead of System properties, as in v1.0)
- force reloading of the properties file if the response code is not 200
- use slf4j for logging
- initial release
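Being core-aware essentially means sending one request per configured core, or a single core-less request on a single-core deployment. The sketch below relies on two assumptions: the cores are listed as a comma-separated string (the exact property name is not given here), and DIH is mapped to /dataimport. CoreAwareUrls is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the comma-separated core list and the URL layout are assumptions.
final class CoreAwareUrls {
    // Returns one DIH URL per configured core, or a single core-less URL when the list is empty.
    static List<String> forCores(String baseUrl, String coreList, String command) {
        List<String> urls = new ArrayList<>();
        if (coreList != null) {
            for (String core : coreList.split(",")) {
                String trimmed = core.trim();
                if (!trimmed.isEmpty()) {
                    urls.add(baseUrl + "/" + trimmed + "/dataimport?command=" + command);
                }
            }
        }
        if (urls.isEmpty()) {
            urls.add(baseUrl + "/dataimport?command=" + command); // single-core deployment
        }
        return urls;
    }
}
```

Falling back to the core-less URL when no cores are listed keeps one configuration format working for both single- and multi-core deployments.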