shawnliujw/node-crawler-server

Latest commit

6046328 · Sep 23, 2014

casper-crawler

A web crawler based on CasperJS and PhantomJS, with a job queue. It can load AJAX content and behaves as if you were using a browser.

##How to install

    npm install casper-crawler

##How to use

###scrape

Prepare the script file you want to execute (for example `script.js`) in the following format:

    exports.details = function (casper, callback) {
        // casper is an instance of casperjs
        // doSomething();
        var json = {}; // the object that will be returned

        callback();
    };
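Before handing a script to the crawler, it can help to check its shape locally. The sketch below is not part of casper-crawler: `fakeCasper` and `runScript` are hypothetical stand-ins used only for illustration, and passing the result object to `callback` is an assumption, since the format above calls `callback()` without arguments.

```javascript
// Hypothetical stand-in for the casper instance; the real object comes from casperjs.
var fakeCasper = {
    getTitle: function () { return "Example title"; }
};

// A script module in the format described above.
var script = {
    details: function (casper, callback) {
        var json = { title: casper.getTitle() }; // the object to be returned
        // Assumption: the result is handed back through the callback.
        callback(json);
    }
};

// Hypothetical helper: call the named export and record what it reports.
function runScript(mod, name, casper) {
    var out = { called: false, result: undefined };
    if (typeof mod[name] !== "function") {
        throw new Error("script does not export " + name);
    }
    mod[name](casper, function (result) {
        out.called = true;
        out.result = result;
    });
    return out;
}

var outcome = runScript(script, "details", fakeCasper);
console.log(outcome.called, outcome.result);
```

The name passed to `runScript` plays the same role as the second argument of `scrape`: it selects which exported function of the script file is executed.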

Then call `scrape` with the page and the name of the function exported by the script:

    var casperCrawler = require("casper-crawler");
    var page = {
        "url": "http url",
        "script": "script.js"
    };
    // if expiration is set to 0, the cache in the DB is dropped
    casperCrawler.scrape(page, "details", 0)
        .then(function (result) {
            console.log(result);
        });

###clearCache

`clearCache` takes two parameters:

1. the function name you are using in your script
2. the URLs whose cached results should be removed

    var urls = ["url1", "url2"];
    casperCrawler.clearCache("details", urls);
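The caching behaviour implied by `scrape` and `clearCache` can be pictured with a small in-memory model. This is only a sketch under assumptions: the real module keeps its cache in a DB, and `MemoryCache`, its key format (function name plus URL), and the treatment of `expiration` as a lifetime in seconds are all hypothetical.

```javascript
// Hypothetical in-memory model; casper-crawler itself keeps its cache in a DB.
function MemoryCache() {
    this.entries = {};
}

// Assumed key format: function name plus URL.
MemoryCache.prototype.key = function (name, url) {
    return name + "|" + url;
};

// Return the cached value, or call fetch() and store its result.
// expiration === 0 drops any cached entry first and stores nothing.
// Assumption: expiration is a lifetime in seconds.
MemoryCache.prototype.get = function (name, url, expiration, fetch, now) {
    var k = this.key(name, url);
    now = now === undefined ? Date.now() : now;
    if (expiration === 0) {
        delete this.entries[k];
    }
    var hit = this.entries[k];
    if (hit && hit.expiresAt > now) {
        return hit.value;
    }
    var value = fetch();
    if (expiration > 0) {
        this.entries[k] = { value: value, expiresAt: now + expiration * 1000 };
    }
    return value;
};

// Mirrors clearCache(name, urls): remove the entries for the given URLs.
MemoryCache.prototype.clear = function (name, urls) {
    var self = this;
    urls.forEach(function (url) {
        delete self.entries[self.key(name, url)];
    });
};

var cache = new MemoryCache();
var calls = 0;
function fetchPage() { calls += 1; return { n: calls }; }

cache.get("details", "url1", 60, fetchPage); // miss: fetches and stores
cache.get("details", "url1", 60, fetchPage); // hit: no fetch
cache.clear("details", ["url1"]);
cache.get("details", "url1", 60, fetchPage); // miss again after clear
console.log(calls);
```

In this model, clearing a URL and scraping with an expiration of 0 both force the next request to fetch the page again; the difference is that `clear` can drop many URLs at once without scraping them.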

##Test
