Merge pull request #353 from extractus/7.2.18
v7.2.18
ndaidong authored Jul 5, 2023
2 parents 3e47e87 + a84705a commit 4c1a49c
Showing 7 changed files with 53 additions and 29 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/ci-test.yml
@@ -23,6 +23,8 @@ jobs:
      node-version: ${{ matrix.node_version }}

  - name: run npm scripts
+   env:
+     PROXY_SERVER: ${{ secrets.PROXY_SERVER }}
    run: |
      npm install
      npm run lint
12 changes: 7 additions & 5 deletions README.md
@@ -192,6 +192,10 @@ await extract(url, {}, {

Passing requests through a proxy is useful when running `@extractus/article-extractor` in the browser. See [examples/browser-article-parser](examples/browser-article-parser) for a reference example.

+ For more information about proxy authentication, please refer to [HTTP authentication](https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication).

+ For deeper customization, consider using [Proxy](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Proxy) to replace `fetch` behaviors with your own handlers.
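
The suggestion above can be sketched roughly as follows. This is an illustration only, not the library's API: `wrapFetch`, `recordingFetch`, `viaRelay`, and the `proxy.example` host are hypothetical names, and how `@extractus/article-extractor` would pick up a replaced `fetch` is not shown in this commit. The `apply` trap of a `Proxy` intercepts each call, so a handler can rewrite the arguments before delegating to the original function.

```js
// Hypothetical helper: wrap a fetch-like function in a Proxy whose `apply`
// trap lets `onRequest` rewrite the arguments before the real call happens.
const wrapFetch = (fetchImpl, onRequest) => {
  return new Proxy(fetchImpl, {
    apply (target, thisArg, args) {
      const [input, init = {}] = args
      // `onRequest` must return the new argument list as [input, init]
      return Reflect.apply(target, thisArg, onRequest(input, init))
    },
  })
}

// Stand-in for a real fetch: it only records its arguments, so the sketch
// runs without network access.
const calls = []
const recordingFetch = (input, init) => {
  calls.push({ input, init })
  return Promise.resolve({ ok: true })
}

// Example handler: route every request through an assumed relay endpoint
// and attach an extra header.
const viaRelay = wrapFetch(recordingFetch, (input, init) => [
  'https://proxy.example/?url=' + encodeURIComponent(String(input)),
  { ...init, headers: { ...(init.headers || {}), 'x-demo': '1' } },
])
```

Because the wrapper keeps the call signature of the function it wraps, it could stand in wherever a `fetch`-compatible function is accepted.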

Another way to work with a proxy is to use the `agent` option instead of `proxy`, as shown below:

```js
@@ -201,19 +205,17 @@ import { HttpsProxyAgent } from 'https-proxy-agent'

const proxy = 'http://abc:[email protected]:31113'

const url = 'https://www.cnbc.com/2022/09/21/what-another-major-rate-hike-by-the-federal-reserve-means-to-you.html'

const article = await extract(url, {}, {
  agent: new HttpsProxyAgent(proxy),
})
console.log('Run article-extractor with proxy:', proxy)
- console.log(art)
+ console.log(article)
```

For more info about [https-proxy-agent](https://www.npmjs.com/package/https-proxy-agent), check [its repo](https://github.com/TooTallNate/proxy-agents).
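
The proxy URL in the example above carries credentials in its `user:password@host` part. As a rough illustration of the Basic scheme covered by the HTTP authentication guide linked in this section, the sketch below derives a `Proxy-Authorization` header value from such a URL. `basicProxyAuth` and the `proxy.example` host are hypothetical, and whether you need to set this header yourself depends on the agent or proxy client in use.

```js
// Hypothetical helper: build a Basic `Proxy-Authorization` value from the
// credentials embedded in a proxy URL. Returns null when there are none.
const basicProxyAuth = (proxyUrl) => {
  const { username, password } = new URL(proxyUrl)
  if (!username && !password) {
    return null
  }
  // URL keeps credentials percent-encoded, so decode them before encoding
  const credentials = decodeURIComponent(username) + ':' + decodeURIComponent(password)
  return 'Basic ' + Buffer.from(credentials).toString('base64')
}
```

For example, `basicProxyAuth('http://user:pass@proxy.example:8080')` yields a value that starts with `Basic ` followed by the base64 form of `user:pass`.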

- For more information about proxy authentication, please refer to [HTTP authentication](https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication).

- For deeper customization, consider using [Proxy](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Proxy) to replace `fetch` behaviors with your own handlers.


### `extractFromHtml()`

35 changes: 18 additions & 17 deletions dist/article-extractor.esm.js

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions dist/cjs/article-extractor.js

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion dist/cjs/package.json
@@ -1,5 +1,5 @@
  {
    "name": "@extractus/article-extractor",
-   "version": "7.2.17",
+   "version": "7.2.18",
    "main": "./article-extractor.js"
  }
9 changes: 5 additions & 4 deletions package.json
@@ -1,5 +1,5 @@
  {
-   "version": "7.2.17",
+   "version": "7.2.18",
    "name": "@extractus/article-extractor",
    "description": "To extract main article from given URL",
    "homepage": "https://github.com/extractus/article-extractor",
@@ -33,15 +33,16 @@
    "dependencies": {
      "@mozilla/readability": "^0.4.4",
      "bellajs": "^11.1.2",
-     "cross-fetch": "^3.1.6",
+     "cross-fetch": "^4.0.0",
      "linkedom": "^0.14.26",
      "sanitize-html": "2.11.0"
    },
    "devDependencies": {
      "@types/sanitize-html": "^2.9.0",
-     "esbuild": "^0.18.10",
+     "esbuild": "^0.18.11",
      "eslint": "^8.44.0",
-     "jest": "^29.5.0",
+     "https-proxy-agent": "^7.0.0",
+     "jest": "^29.6.0",
      "nock": "^13.3.1"
    },
    "keywords": [
18 changes: 18 additions & 0 deletions src/main.test.js
@@ -3,6 +3,8 @@

import { readFileSync } from 'fs'

+ import { HttpsProxyAgent } from 'https-proxy-agent'

import nock from 'nock'

import {
@@ -13,6 +15,9 @@ import {
removeTransformations
} from './main'

+ const env = process.env || {}
+ const PROXY_SERVER = env.PROXY_SERVER || ''

const parseUrl = (url) => {
const re = new URL(url)
return {
@@ -142,3 +147,16 @@ describe('test extract with modified sanitize-html options', () => {
expect(result.content).toEqual(expect.stringContaining('code class="lang-js"'))
})
})

+ if (PROXY_SERVER !== '') {
+   describe('test extract live article API via proxy server', () => {
+     test('check if extract method works with proxy server', async () => {
+       const url = 'https://www.cnbc.com/2022/09/21/what-another-major-rate-hike-by-the-federal-reserve-means-to-you.html'
+       const result = await extract(url, {}, {
+         agent: new HttpsProxyAgent(PROXY_SERVER),
+       })
+       expect(result.title).toEqual(expect.stringContaining('Federal Reserve'))
+       expect(result.source).toEqual('cnbc.com')
+     }, 10000)
+   })
+ }
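
The guard above drops the suite entirely when `PROXY_SERVER` is unset, so the test report gives no hint that a live proxy test exists. One possible alternative, sketched here under the assumption that a skipped entry in the report is wanted, is to choose between `describe` and Jest's `describe.skip`. `proxyFromEnv` and `suiteFor` are hypothetical helpers and are not part of this commit.

```js
// Hypothetical helper: read the proxy address from an env-like object,
// treating a missing or blank value as "no proxy configured".
const proxyFromEnv = (env = {}) => (env.PROXY_SERVER || '').trim()

// Hypothetical helper: return `describeFn` when a proxy is configured and
// `describeFn.skip` otherwise, so the suite is reported as skipped.
const suiteFor = (describeFn, env) => {
  return proxyFromEnv(env) !== '' ? describeFn : describeFn.skip
}

// Intended use inside a Jest test file (commented out so the sketch runs
// on its own, outside Jest):
// suiteFor(describe, process.env)('test extract live article API via proxy server', () => {
//   test('check if extract method works with proxy server', async () => { /* as above */ })
// })
```

The trade-off is cosmetic: both approaches avoid hitting the network without a proxy, but the skipped variant keeps the suite visible in CI output.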
