A rake task that indexes CRAN packages with a basic web interface and cron scheduler for running the task periodically.
CRAN is a network of ftp and web servers around the world that store identical, up-to-date, versions of code and documentation for R. The use these CRAN Servers to store the R packages.
A CRAN server is just a simple Apache Dir with a bunch of tar.gz
files.
Every CRAN server contains a plain file listing all packages in that server.
A package can be accessed via
http://cran.rproject.org/src/contrib/[PACKAGE_NAME]_[PACKAGE_VERSION].tar.gz
A package contains a description file, that contains info about the package.
- Extract useful info from the
PACKAGES
file of the CRAN server. - Extract certain info from the
DESCRIPTION
file of each package. - On every run of the
task
, for the first 50 packages, store info about each package, author(s) and maintainers(s). - Prevents duplicate indexes of a version of same package.
- I used Whenever for scheduling the task. If you use Heroku, Heroku scheduler should do just fine.
rails packages:index
bundle exec rspec
- Pagination for listing packages.
- I currently use DCF for converting the content of the PACKAGES & DESCRIPTION file to hashes. Maybe I could use a faster library but this does the job as well :)
- More, more and more