This repository has been archived by the owner on Jan 13, 2023. It is now read-only.

Change puppeteer to puppet-scraper (#10)
Griko Nibras authored May 8, 2020
1 parent fea1a77 commit 356e339
Showing 12 changed files with 68 additions and 68 deletions.
2 changes: 1 addition & 1 deletion .all-contributorsrc
@@ -1,5 +1,5 @@
{
"projectName": "scrappeteer",
"projectName": "puppet-scraper",
"projectOwner": "grikomsn",
"repoType": "github",
"repoHost": "https://github.com",
46 changes: 23 additions & 23 deletions README.md
@@ -2,10 +2,10 @@

<div align="center">

[![scrappeteer](./header.png)](.)
[![puppet-scraper](./header.png)](.)

![github release](https://badgen.net/github/release/grikomsn/scrappeteer?icon=github)
![npm version](https://badgen.net/npm/v/scrappeteer?icon=npm)
![github release](https://badgen.net/github/release/grikomsn/puppet-scraper?icon=github)
![npm version](https://badgen.net/npm/v/puppet-scraper?icon=npm)

</div>

@@ -25,13 +25,13 @@

---

**Scrappeteer is an opinionated wrapper library for utilizing [Puppeteer](https://github.com/puppeteer/puppeteer) to scrape pages easily, bootstrapped using [Jared Palmer's tsdx](https://github.com/jaredpalmer/tsdx).**
**PuppetScraper is an opinionated wrapper library for utilizing [Puppeteer](https://github.com/puppeteer/puppeteer) to scrape pages easily, bootstrapped using [Jared Palmer's tsdx](https://github.com/jaredpalmer/tsdx).**

Most people start a new scraping project by `require`-ing Puppeteer and writing their own logic to scrape pages, and that logic gets more complicated when it has to juggle multiple pages.

Scrappeteer allows you to just pass the URLs to scrape, the function to evaluate (the scraping logic), and how many pages (or tabs) to open at a time. Basically, Scrappeteer abstracts away creating multiple page instances and retrying the evaluation logic.
PuppetScraper allows you to just pass the URLs to scrape, the function to evaluate (the scraping logic), and how many pages (or tabs) to open at a time. Basically, PuppetScraper abstracts away creating multiple page instances and retrying the evaluation logic.

**Version 0.1.0 note**: Scrappeteer was initially made as a project template rather than a wrapper library, but the core logic is still the same, with various improvements and no extra libraries, so you can easily include Scrappeteer in your project using `npm` or `yarn`.
**Version 0.1.0 note**: PuppetScraper was initially made as a project template rather than a wrapper library, but the core logic is still the same, with various improvements and no extra libraries, so you can easily include PuppetScraper in your project using `npm` or `yarn`.
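As a rough illustration of what "abstracting multiple page instances and retrying the evaluation logic" means in practice — and emphatically not the library's actual implementation — a concurrency-limited map with retries can be sketched in plain Node, where each worker plays the role of one open page:

```js
// Sketch of a "concurrent pages + retries" abstraction.
// NOT PuppetScraper's real code; names and defaults are assumptions.

async function mapWithConcurrency(items, workerFn, { concurrency = 3, maxRetries = 10 } = {}) {
  const results = new Array(items.length);
  let next = 0; // shared cursor: workers pull the next unclaimed item

  async function runOne() {
    while (next < items.length) {
      const index = next++;
      let lastError;
      for (let attempt = 0; attempt <= maxRetries; attempt++) {
        try {
          results[index] = await workerFn(items[index]);
          lastError = undefined;
          break; // success: stop retrying this item
        } catch (err) {
          lastError = err;
        }
      }
      if (lastError) throw lastError; // exhausted retries
    }
  }

  // Spawn up to `concurrency` workers draining the shared cursor.
  const workers = Array.from({ length: Math.min(concurrency, items.length) }, runOne);
  await Promise.all(workers);
  return results;
}
```

In the library itself, `workerFn` would correspond to navigating a Puppeteer page to a URL and running the evaluation function.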

## Brief example

@@ -40,11 +40,11 @@ Here's a [basic example](./examples/hn.js) on scraping the entries on [first pag
```js
// examples/hn.js

const { Scrappeteer } = require('scrappeteer');
const { PuppetScraper } = require('puppet-scraper');

const sc = await Scrappeteer.launch();
const ps = await PuppetScraper.launch();

const data = await sc.scrapeFromUrl({
const data = await ps.scrapeFromUrl({
url: 'https://news.ycombinator.com',
evaluateFn: () => {
let items = [];
@@ -62,7 +62,7 @@ const data = await sc.scrapeFromUrl({

console.log({ data });

await sc.close();
await ps.close();
```
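The diff view collapses part of `evaluateFn` above. Its general shape — query nodes, map them to plain serializable objects, return the array — can be exercised against a stub `document`, since the real function runs inside the page context. The `.storylink` selector and the stubbed nodes below are assumptions, not taken from the collapsed code:

```js
// Stub document standing in for the real page DOM (illustrative only).
const document = {
  querySelectorAll: (selector) =>
    selector === '.storylink'
      ? [{ innerText: 'First story' }, { innerText: 'Second story' }]
      : [],
};

// The evaluation function must return serializable data, so it maps
// DOM nodes to plain objects rather than returning the nodes themselves.
const evaluateFn = () => {
  let items = [];
  document.querySelectorAll('.storylink').forEach((node) => {
    items.push({ title: node.innerText });
  });
  return items;
};

console.log(evaluateFn());
```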

View more examples on the [`examples` directory](./examples).
@@ -71,12 +71,12 @@ View more examples on the [`examples` directory](./examples).

### Installing dependency

Install `scrappeteer` via `npm` or `yarn`:
Install `puppet-scraper` via `npm` or `yarn`:

```console
$ npm install scrappeteer
$ npm install puppet-scraper
--- or ---
$ yarn add scrappeteer
$ yarn add puppet-scraper
```

Install peer dependency `puppeteer` or Puppeteer equivalent ([`chrome-aws-lambda`](https://github.com/alixaxel/chrome-aws-lambda), untested):
@@ -89,33 +89,33 @@ $ yarn add puppeteer

### Instantiation

Create the Scrappeteer instance by launching a new browser, connecting to an existing browser, or using an existing browser instance:
Create the PuppetScraper instance by launching a new browser, connecting to an existing browser, or using an existing browser instance:

```js
const { Scrappeteer } = require('scrappeteer');
const { PuppetScraper } = require('puppet-scraper');
const Puppeteer = require('puppeteer');

// launches a new browser instance
const instance = await Scrappeteer.launch();
const instance = await PuppetScraper.launch();

// connect to an existing browser instance
const external = await Scrappeteer.connect({
const external = await PuppetScraper.connect({
browserWSEndpoint: 'ws://127.0.0.1:9222/devtools/browser/...',
});

// use an existing browser instance
const browser = await Puppeteer.launch();
const existing = await Scrappeteer.use({ browser });
const existing = await PuppetScraper.use({ browser });
```

### Customize options

`launch` and `connect` accept the same options as `Puppeteer.launch` and `Puppeteer.connect`, plus two extra properties, `concurrentPages` and `maxEvaluationRetries`:

```js
const { Scrappeteer } = require('scrappeteer');
const { PuppetScraper } = require('puppet-scraper');

const instance = await Scrappeteer.launch({
const instance = await PuppetScraper.launch({
concurrentPages: 3,
maxEvaluationRetries: 10,
headless: false,
@@ -174,7 +174,7 @@ const urls = Array.from({ length: 5 }).map(
(_, i) => `https://news.ycombinator.com/news?p=${i + 1}`,
);

const data = await sc.scrapeFromUrls({
const data = await ps.scrapeFromUrls({
urls,
evaluateFn: () => {
let items = [];
@@ -201,7 +201,7 @@ await instance.close();
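The `urls` construction in the multiple-URLs example is a handy pattern for paginated sites; on its own it is plain JavaScript, building one 1-indexed page URL per array slot:

```js
// Build one URL per Hacker News front-page index (pages are 1-indexed).
const urls = Array.from({ length: 5 }).map(
  (_, i) => `https://news.ycombinator.com/news?p=${i + 1}`,
);

console.log(urls);
```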

### Access the browser instance

Scrappeteer also exposes the browser instance if you want to do things manually:
PuppetScraper also exposes the browser instance if you want to do things manually:

```js
const browser = instance.___internal.browser;
Expand All @@ -216,7 +216,7 @@ Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/d
<!-- markdownlint-disable -->
<table>
<tr>
<td align="center"><a href="https://griko.id"><img src="https://avatars1.githubusercontent.com/u/8220954?v=4" width="100px;" alt=""/><br /><sub><b>Griko Nibras</b></sub></a><br /><a href="https://github.com/grikomsn/scrappeteer/commits?author=grikomsn" title="Code">💻</a> <a href="#maintenance-grikomsn" title="Maintenance">🚧</a></td>
<td align="center"><a href="https://griko.id"><img src="https://avatars1.githubusercontent.com/u/8220954?v=4" width="100px;" alt=""/><br /><sub><b>Griko Nibras</b></sub></a><br /><a href="https://github.com/grikomsn/puppet-scraper/commits?author=grikomsn" title="Code">💻</a> <a href="#maintenance-grikomsn" title="Maintenance">🚧</a></td>
</tr>
</table>

8 changes: 4 additions & 4 deletions examples/hn-custom-browser.js
@@ -1,13 +1,13 @@
const { Scrappeteer } = require('..');
const { PuppetScraper } = require('..');

async function hnCustomBrowser() {
const sc = await Scrappeteer.launch({
const ps = await PuppetScraper.launch({
executablePath:
'C:\\Program Files (x86)\\Microsoft\\Edge Dev\\Application\\msedge.exe',
headless: false,
});

const data = await sc.scrapeFromUrl({
const data = await ps.scrapeFromUrl({
url: 'https://news.ycombinator.com',
evaluateFn: () => {
let items = [];
@@ -25,7 +25,7 @@ async function hnCustomBrowser() {

console.log({ data });

await sc.close();
await ps.close();
}

hnCustomBrowser();
8 changes: 4 additions & 4 deletions examples/hn-multiple.js
@@ -1,15 +1,15 @@
const { Scrappeteer } = require('..');
const { PuppetScraper } = require('..');

async function hnMultiple() {
const sc = await Scrappeteer.launch({
const ps = await PuppetScraper.launch({
concurrentPages: 5,
});

const urls = Array.from({ length: 5 }).map(
(_, i) => `https://news.ycombinator.com/news?p=${i + 1}`,
);

const data = await sc.scrapeFromUrls({
const data = await ps.scrapeFromUrls({
urls,
evaluateFn: () => {
let items = [];
@@ -27,7 +27,7 @@ async function hnMultiple() {

console.log({ data });

await sc.close();
await ps.close();
}

hnMultiple();
8 changes: 4 additions & 4 deletions examples/hn.js
@@ -1,9 +1,9 @@
const { Scrappeteer } = require('..');
const { PuppetScraper } = require('..');

async function hn() {
const sc = await Scrappeteer.launch();
const ps = await PuppetScraper.launch();

const data = await sc.scrapeFromUrl({
const data = await ps.scrapeFromUrl({
url: 'https://news.ycombinator.com',
evaluateFn: () => {
let items = [];
@@ -21,7 +21,7 @@ async function hn() {

console.log({ data });

await sc.close();
await ps.close();
}

hn();
Binary file modified header.png
6 changes: 3 additions & 3 deletions package.json
@@ -1,15 +1,15 @@
{
"name": "scrappeteer",
"name": "puppet-scraper",
"description": "Scraping using Puppeteer the sane way 🤹🏻‍♂️",
"version": "0.2.0",
"repository": "https://github.com/grikomsn/scrappeteer.git",
"repository": "https://github.com/grikomsn/puppet-scraper.git",
"author": "Griko Nibras <[email protected]>",
"files": [
"dist",
"src"
],
"main": "dist/index.js",
"module": "dist/scrappeteer.esm.js",
"module": "dist/puppet-scraper.esm.js",
"typings": "dist/index.d.ts",
"scripts": {
"prepare": "tsdx build",
2 changes: 1 addition & 1 deletion src/index.ts
@@ -1,4 +1,4 @@
export * from './defaults';
export * from './types';

export { Scrappeteer } from './scrappeteer';
export { PuppetScraper } from './puppet-scraper';
18 changes: 9 additions & 9 deletions src/scrappeteer.ts → src/puppet-scraper.ts
@@ -7,15 +7,15 @@ import {
DEFAULT_PAGE_OPTIONS,
} from './defaults';
import {
ScBootstrap,
ScConnect,
ScLaunch,
PSBootstrap,
PSConnect,
PSLaunch,
PSUse,
ScrapeFromUrl,
ScrapeFromUrls,
ScUse,
} from './types';

const bootstrap: ScBootstrap = async ({
const bootstrap: PSBootstrap = async ({
browser,
concurrentPages = DEFAULT_CONCURRENT_PAGES,
maxEvaluationRetries = DEFAULT_EVALUATION_RETRIES,
@@ -117,7 +117,7 @@ const bootstrap: ScBootstrap = async ({
};
};

const connect: ScConnect = async ({
const connect: PSConnect = async ({
concurrentPages,
maxEvaluationRetries,
...opts
@@ -126,7 +126,7 @@ const connect: ScConnect = async ({
return bootstrap({ browser, concurrentPages, maxEvaluationRetries });
};

const launch: ScLaunch = async ({
const launch: PSLaunch = async ({
concurrentPages,
maxEvaluationRetries,
...opts
@@ -135,11 +135,11 @@ const launch: ScLaunch = async ({
return bootstrap({ browser, concurrentPages, maxEvaluationRetries });
};

const use: ScUse = (opts) => {
const use: PSUse = (opts) => {
if (!opts.browser) {
throw new Error('browser is not defined');
}
return bootstrap(opts);
};

export const Scrappeteer = { connect, launch, use };
export const PuppetScraper = { connect, launch, use };
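The rename above shows that `connect`, `launch`, and `use` all funnel into a single `bootstrap` function that wires the shared options to a browser. That factory shape can be sketched in plain JavaScript with a stubbed browser — the real module delegates to Puppeteer, and every name below is illustrative:

```js
// Minimal sketch of the connect/launch/use factory pattern.
// The stub browser stands in for a Puppeteer Browser instance.

const bootstrap = async ({ browser, concurrentPages = 3 }) => ({
  // Real scrapeFromUrl would open pages and run evaluateFn; stubbed here.
  scrapeFromUrl: async ({ url }) => ({ url, via: browser.name }),
  close: () => browser.close(),
  ___internal: { browser, concurrentPages },
});

const launch = async (opts = {}) => {
  // Would be `await Puppeteer.launch(opts)` in the real module.
  const browser = { name: 'launched', close: async () => {} };
  return bootstrap({ browser, ...opts });
};

const use = (opts) => {
  if (!opts.browser) {
    throw new Error('browser is not defined'); // mirrors the guard in the diff
  }
  return bootstrap(opts);
};
```

The payoff of this shape is that every entry point returns the same instance object, so `scrapeFromUrl`, `close`, and `___internal` behave identically regardless of how the browser was obtained.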
20 changes: 10 additions & 10 deletions src/types.ts
@@ -33,9 +33,9 @@ export type ScrapeFromUrls = <T>(

// #endregion

// #region Scrappeteer
// #region PuppetScraper

export interface ScInstance {
export interface PSInstance {
scrapeFromUrl: ScrapeFromUrl;
scrapeFromUrls: ScrapeFromUrls;
close: () => Promise<void>;
@@ -45,24 +45,24 @@ export interface ScInstance {
};
}

export type ScBootstrapProps = {
export type PSBootstrapProps = {
browser?: Browser;
concurrentPages?: number;
maxEvaluationRetries?: number;
};

export type ScBootstrap = (props?: ScBootstrapProps) => Promise<ScInstance>;
export type PSBootstrap = (props?: PSBootstrapProps) => Promise<PSInstance>;

export type ScLaunchOptions = ScBootstrapProps & LaunchOptions;
export type PSLaunchOptions = PSBootstrapProps & LaunchOptions;

export type ScLaunch = (opts?: ScLaunchOptions) => ReturnType<ScBootstrap>;
export type PSLaunch = (opts?: PSLaunchOptions) => ReturnType<PSBootstrap>;

export type ScConnectOptions = ScBootstrapProps & ConnectOptions;
export type PSConnectOptions = PSBootstrapProps & ConnectOptions;

export type ScConnect = (opts: ScConnectOptions) => ReturnType<ScBootstrap>;
export type PSConnect = (opts: PSConnectOptions) => ReturnType<PSBootstrap>;

export type ScUseOptions = ScBootstrapProps & { browser: Browser };
export type PSUseOptions = PSBootstrapProps & { browser: Browser };

export type ScUse = (opts: ScUseOptions) => ReturnType<ScBootstrap>;
export type PSUse = (opts: PSUseOptions) => ReturnType<PSBootstrap>;

// #endregion
14 changes: 7 additions & 7 deletions test/scrappeteer.test.ts → test/puppet-scraper.test.ts
@@ -1,4 +1,4 @@
import { ScInstance, Scrappeteer } from '..';
import { PSInstance, PuppetScraper } from '..';
import { sleep } from './utils';
import { launchOptions, timeout } from './variables';

@@ -8,10 +8,10 @@ describe('launching instance', () => {
it(
'should launch and close without errors',
() => {
let instance: ScInstance;
let instance: PSInstance;

const instantiateAndClose = async () => {
instance = await Scrappeteer.launch(launchOptions);
instance = await PuppetScraper.launch(launchOptions);
await instance.close();
};

@@ -22,10 +22,10 @@ describe('launching instance', () => {
});

describe('scrape hacker news from single url', () => {
let instance: ScInstance;
let instance: PSInstance;

beforeAll(() => {
const creating = Scrappeteer.launch(launchOptions);
const creating = PuppetScraper.launch(launchOptions);
return creating.then((created) => (instance = created));
}, timeout);

@@ -84,10 +84,10 @@ describe('scrape hacker news from single url', () => {
});

describe('scrape hacker news from multiple urls', () => {
let instance: ScInstance;
let instance: PSInstance;

beforeAll(async () => {
const creating = Scrappeteer.launch(launchOptions);
const creating = PuppetScraper.launch(launchOptions);
return creating.then((created) => (instance = created));
}, timeout);

4 changes: 2 additions & 2 deletions test/variables.ts
@@ -1,6 +1,6 @@
import { ScLaunchOptions } from '..';
import { PSLaunchOptions } from '..';

export const launchOptions: ScLaunchOptions = {
export const launchOptions: PSLaunchOptions = {
...(process.env.CI
? { executablePath: 'google-chrome-stable', args: ['--no-sandbox'] }
: {}),
