feed-extractor

Read and normalize RSS/Atom/JSON feed data.

Attention

feed-reader has been renamed to @extractus/feed-extractor since v6.1.4

Demo

Install & Usage

Node.js

npm i @extractus/feed-extractor

# pnpm
pnpm i @extractus/feed-extractor

# yarn
yarn add @extractus/feed-extractor

// es6 module
import { read } from '@extractus/feed-extractor'

// CommonJS
const { read } = require('@extractus/feed-extractor')

// you can specify the exact path to the CommonJS version
const { read } = require('@extractus/feed-extractor/dist/cjs/feed-extractor.js')

// extract an RSS feed
const result = await read('https://news.google.com/rss')
console.log(result)

Deno

// deno < 1.28
import { read } from 'https://esm.sh/@extractus/feed-extractor'

// deno >= 1.28
import { read } from 'npm:@extractus/feed-extractor'

Browser

import { read } from 'https://unpkg.com/@extractus/feed-extractor@latest/dist/feed-extractor.esm.js'

Please check the examples for reference.

APIs

`read()`

Loads and extracts feed data from the given RSS/Atom/JSON source. Returns a Promise.

Syntax

read(String url)
read(String url, Object options)
read(String url, Object options, Object fetchOptions)

Parameters

`url` required

URL of a valid feed source

Feed content must be accessible and conform to one of the following standards: RSS, Atom, or JSON Feed.

For example:

import { read } from '@extractus/feed-extractor'

const result = await read('https://news.google.com/atom')
console.log(result)

Without any options, the result should have the following structure:

{
  title: String,
  link: String,
  description: String,
  generator: String,
  language: String,
  published: ISO Datetime String,
  entries: Array[
    {
      title: String,
      link: String,
      description: String,
      published: ISO Datetime String
    },
    // ...
  ]
}
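
As a minimal sketch of working with this structure, the helper below picks the newest entries from a result object. `latestEntries` and `sampleFeed` are hypothetical names used for illustration only; they are not part of the library, and the sample data is hand-written rather than real feed output.

```javascript
// Hypothetical helper: return the `limit` newest entries, newest first.
// Assumes `published` holds ISO datetime strings (the default output).
const latestEntries = (feed, limit = 5) => {
  const entries = Array.isArray(feed.entries) ? feed.entries : []
  // Copy before sorting so the original result object is left untouched.
  return [...entries]
    .sort((a, b) => new Date(b.published) - new Date(a.published))
    .slice(0, limit)
}

// Hand-written sample in the documented shape, not real feed output.
const sampleFeed = {
  title: 'Sample feed',
  link: 'https://example.com/',
  description: '',
  generator: '',
  language: 'en',
  published: '2023-01-03T00:00:00.000Z',
  entries: [
    {
      title: 'Older',
      link: 'https://example.com/1',
      description: '',
      published: '2023-01-01T00:00:00.000Z'
    },
    {
      title: 'Newer',
      link: 'https://example.com/2',
      description: '',
      published: '2023-01-03T00:00:00.000Z'
    }
  ]
}

console.log(latestEntries(sampleFeed, 1).map((entry) => entry.title))
```

In real use, the object passed to `latestEntries` would be the value resolved by `read()`.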

`options` optional

Object with all or several of the following properties:

normalization: Boolean, normalize feed data or keep original. Default true.
useISODateFormat: Boolean, convert datetime to ISO format. Default true.
descriptionMaxLen: Number, to truncate description. Default 210 (characters).
xmlParserOptions: Object, used by the XML parser; see fast-xml-parser's docs.
getExtraFeedFields: Function, to get more fields from feed data
getExtraEntryFields: Function, to get more fields from feed entry data
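
The documented defaults can be summarized in code. The sketch below merges caller options over those defaults; `DEFAULT_OPTIONS` and `withDefaults` are hypothetical names for illustration and do not claim to mirror the library's internals.

```javascript
// Defaults as documented in the list above. The other options
// (xmlParserOptions, getExtraFeedFields, getExtraEntryFields)
// have no documented default there, so they are left out.
const DEFAULT_OPTIONS = {
  normalization: true,
  useISODateFormat: true,
  descriptionMaxLen: 210
}

// Hypothetical helper: caller-supplied values override the defaults.
const withDefaults = (options = {}) => ({ ...DEFAULT_OPTIONS, ...options })

console.log(withDefaults({ descriptionMaxLen: 80 }))
```

Passing only the properties you want to change, as in the examples that follow, should leave the remaining ones at their defaults.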

For example:

import { read } from '@extractus/feed-extractor'

await read('https://news.google.com/atom', {
  useISODateFormat: false
})

await read('https://news.google.com/rss', {
  useISODateFormat: false,
  getExtraFeedFields: (feedData) => {
    return {
      subtitle: feedData.subtitle || ''
    }
  },
  getExtraEntryFields: (feedEntry) => {
    // enclosure and category may be missing from an entry
    const {
      enclosure = {},
      category = ''
    } = feedEntry
    return {
      enclosure: {
        url: enclosure['@_url'],
        type: enclosure['@_type'],
        length: enclosure['@_length']
      },
      // category is either a plain string or an object with attributes
      category: typeof category === 'string' ? category : {
        text: category['@_text'],
        domain: category['@_domain']
      }
    }
  }
})
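
Because `getExtraEntryFields` is a plain function, it can be tried on a hand-written entry before it is passed to `read()`. The sketch below is a variant of the callback above with guards for missing fields; `sampleEntry` is a hypothetical object that reuses the `@_` attribute keys from that example and is not real parser output.

```javascript
// Variant of the callback above that tolerates missing fields.
const getExtraEntryFields = (feedEntry) => {
  const { enclosure = {}, category = '' } = feedEntry
  return {
    enclosure: {
      url: enclosure['@_url'],
      type: enclosure['@_type'],
      length: enclosure['@_length']
    },
    // category is either a plain string or an object with attributes
    category: typeof category === 'string' ? category : {
      text: category['@_text'],
      domain: category['@_domain']
    }
  }
}

// Hand-written entry for illustration only.
const sampleEntry = {
  title: 'Episode 1',
  enclosure: {
    '@_url': 'https://example.com/ep1.mp3',
    '@_type': 'audio/mpeg',
    '@_length': '1024'
  },
  category: 'podcast'
}

console.log(getExtraEntryFields(sampleEntry))
console.log(getExtraEntryFields({ title: 'No extras' }))
```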

`fetchOptions` optional

Use this parameter to set the request headers used when fetching the feed.

For example:

import { read } from '@extractus/feed-extractor'

const url = 'https://news.google.com/rss'
await read(url, null, {
  headers: {
    'user-agent': 'Opera/9.60 (Windows NT 6.0; U; en) Presto/2.1.1'
  }
})

You can also specify a proxy endpoint to load remote content, instead of fetching directly.

For example:

import { read } from '@extractus/feed-extractor'

const url = 'https://news.google.com/rss'

await read(url, null, {
  headers: {
    'user-agent': 'Opera/9.60 (Windows NT 6.0; U; en) Presto/2.1.1'
  },
  proxy: {
    target: 'https://your-secret-proxy.io/loadXml?url=',
    headers: {
      'Proxy-Authorization': 'Bearer YWxhZGRpbjpvcGVuc2VzYW1l...'
    }
  }
})

Routing requests through a proxy is useful when running @extractus/feed-extractor in the browser. See examples/browser-feed-reader for a reference example.
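
How the proxy target and the feed URL are combined is not spelled out here. Since the target in the example ends with `?url=`, one plausible reading is that the feed URL is appended as an encoded query value. The sketch below illustrates that assumption only; `buildProxyUrl` is a hypothetical helper, not a library function.

```javascript
// Assumption: the proxy endpoint receives the feed URL as an
// encoded query value appended to `target`.
const buildProxyUrl = (target, feedUrl) => target + encodeURIComponent(feedUrl)

console.log(buildProxyUrl(
  'https://your-secret-proxy.io/loadXml?url=',
  'https://news.google.com/rss'
))
```

A proxy endpoint written under this assumption would decode the `url` query parameter, fetch that address server-side, and return the feed body unchanged.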

Test

git clone https://github.com/extractus/feed-extractor.git
cd feed-extractor
npm i
npm test

Quick evaluation

git clone https://github.com/extractus/feed-extractor.git
cd feed-extractor
npm install

npm run eval https://news.google.com/rss

License

The MIT License (MIT)
