Commit

Merge pull request #275 from ndaidong/6.0.5
6.0.5
ndaidong authored Jul 4, 2022
2 parents fb32038 + 5074746 commit 0a70987
Showing 10 changed files with 28 additions and 16 deletions.
13 changes: 9 additions & 4 deletions README.md
@@ -121,7 +121,7 @@ extract('https://bad-website.domain/page/article')
addQueryRules([
{
patterns: [
{ hostname: 'bad-website.domain' }
'*://bad-website.domain/*'
],
selector: '#noop_article_locates_here',
unwanted: [
@@ -137,11 +137,16 @@ addQueryRules([
extract('https://bad-website.domain/page/article')
````

Regarding pattern structure, please refer [URL Pattern API](https://developer.mozilla.org/en-US/docs/Web/API/URL_Pattern_API).
#### Query Rule

A query rule is an object with the following properties:

While adding rules, you can specify a `transform()` function to fine-tune article content more thoroughly.
- `patterns`: required, list of [URLPattern](https://developer.mozilla.org/en-US/docs/Web/API/URLPattern) objects. See [the syntax for patterns](https://developer.mozilla.org/en-US/docs/Web/API/URL_Pattern_API).
- `selector`: optional, where to find the HTMLElement which contains main article content
- `unwanted`: optional, list of selectors to filter unwanted HTML elements from the last result
- `transform`, optional, function to fine-tune article content more thoroughly

Example rule with transformation:
Here is an example using rule with transformation:

```js
import { addQueryRules } from 'article-parser'
4 changes: 2 additions & 2 deletions dist/article-parser.browser.js

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions dist/article-parser.browser.js.map

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions dist/cjs/article-parser.js

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions dist/cjs/article-parser.js.map

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion dist/cjs/package.json
@@ -1,5 +1,5 @@
{
"name": "article-parser-cjs",
"version": "6.0.4",
"version": "6.0.5",
"main": "./article-parser.js"
}
2 changes: 1 addition & 1 deletion package.json
@@ -1,5 +1,5 @@
{
"version": "6.0.4",
"version": "6.0.5",
"name": "article-parser",
"description": "To extract main article from given URL",
"homepage": "https://ndaidong.github.io/article-parser-demo/",
2 changes: 1 addition & 1 deletion src/rules.js
@@ -73,7 +73,7 @@ export const rules = [
},
{
patterns: [
'*://zingnews.vn/*'
{ hostname: 'zingnews.vn' }
],
unwanted: [
'.the-article-category',
2 changes: 1 addition & 1 deletion src/utils/findRulesByUrl.js
@@ -12,7 +12,7 @@ export default (urls = []) => {
const rules = getQueryRules()
for (const rule of rules) {
const { patterns } = rule
const matched = urls.some((url) => patterns.some((pattern) => new URLPattern(pattern).exec(url)))
const matched = urls.some((url) => patterns.some((pattern) => new URLPattern(pattern).test(url)))
if (matched) {
return rule
}
7 changes: 7 additions & 0 deletions src/utils/findRulesByUrl.test.js
@@ -11,6 +11,13 @@ describe('test findRulesByUrl()', () => {
urls: [{}, ''],
expectation: {}
},
{
urls: ['https://zingnews.vn/article-slug.html'],
expectation: (result, expect) => {
expect(result).toBeTruthy()
expect(result.unwanted).toContain('.the-article-category')
}
},
{
urls: [1209, 'https://vietnamnet.vn/path/to/article'],
expectation: (result, expect) => {
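
Note on the `src/utils/findRulesByUrl.js` change: `URLPattern.prototype.exec()` returns a match-result object or `null`, while `test()` returns a boolean, so `test()` states the intent of the `some()` predicate more directly. The sketch below mirrors that lookup loop under a few assumptions: `findFirstRule` and `sampleRules` are hypothetical names, the empty-object fallback is inferred from the test case's `expectation: {}`, and a global `URLPattern` is assumed to exist (otherwise a polyfill has to be loaded first, or an implementation passed as the third argument).

```js
// Sketch only: first-match rule lookup in the style of findRulesByUrl.js.
// `PatternImpl` defaults to the global URLPattern, which is undefined in
// runtimes that do not ship it; pass a polyfilled class in that case.
const findFirstRule = (urls, rules, PatternImpl = globalThis.URLPattern) => {
  for (const rule of rules) {
    // test() yields true/false, which is all that some() needs
    const matched = urls.some((url) =>
      rule.patterns.some((pattern) => new PatternImpl(pattern).test(url))
    )
    if (matched) {
      return rule
    }
  }
  // Assumed fallback when nothing matches
  return {}
}

// Hypothetical rules showing both pattern forms that appear in this commit:
// a pattern string and a { hostname } object.
const sampleRules = [
  { patterns: ['*://bad-website.domain/*'], selector: '#noop_article_locates_here' },
  { patterns: [{ hostname: 'zingnews.vn' }], unwanted: ['.the-article-category'] }
]

// Only run the demo where URLPattern is actually available.
if (typeof URLPattern !== 'undefined') {
  const rule = findFirstRule(['https://zingnews.vn/article-slug.html'], sampleRules)
  console.log(rule.unwanted)
}
```

Taking the pattern class as a parameter is a choice made for this sketch so that it can run with a polyfill; the committed code calls `new URLPattern(pattern)` directly.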
