FeedIron TT-RSS Plugin

Reforge your feeds

Recipes moved to separate Repository

About

This is a plugin for Tiny Tiny RSS (tt-rss). It allows you to replace an article's contents by the contents of an element on the linked URL's page, i.e. create a "full feed".

Keep up to date by subscribing to the Release Feed.

Table Of Contents

Installation
Configuration tab
Usage
Filters
General Options
Global Options
Testing Tab
Full configuration example
Xpath General Information

Installation

Check out the repository into your plugins folder like this (from the TT-RSS root directory):

$ cd /var/www/ttrss
$ git clone https://github.com/m42e/ttrss_plugin-feediron.git plugins.local/feediron

Then enable the plugin in TT-RSS preferences.

Optional

Install Readability.php using composer. Assuming composer is installed, navigate to the FeedIron plugin filter folder filters/fi_mod_readability, which contains composer.json, and run:

$ composer install

Layout

After installation you will find a new tab called FeedIron in the Tiny Tiny RSS preferences menu. Under this tab you have access to the FeedIron Configuration tab and the FeedIron Testing tab.

Configuration tab

The configuration for FeedIron is done in JSON format and will be displayed in the large configuration text field. Use the large field to enter/modify the configuration data and click the Save button to store it.

Additionally, you can load predefined rules submitted by the community or export your own rules. To share your own rules, submit a pull request through GitHub.

Usage

There are filters, general options and global options. Note: The rule type must be defined and has to be one of the following: xpath, split or readability.

The best way to understand FeedIron is to read the Full configuration example.

Basic Configuration:

A Basic Configuration must define:

The site string. e.g. example.com
- Use the same configuration for multiple URLs by separating them with the | delimiter. e.g. "example.com|example.net"
- The configuration will be applied when the site string matches the <link> or <author> tag of the RSS feed item.
 - The <link> takes precedence over the <author>
 - <author> based configurations will NOT automatically show in the Testing Tab
The Filter type. e.g. "type":"xpath"
The Filter config. e.g. "xpath":"div[@id='content']" or the array "xpath": [ "div[@id='article']", "div[@id='footer']"]

Example:

{
  "example.com":{
    "type":"xpath",
    "xpath":"div[@id='content']"
  },
  "secondexample.com":{
    "type":"xpath",
    "xpath": [
      "div[@id='article']",
      "div[@id='footer']"
    ]
  }
}

Note: Values are separated by a , (comma), but a trailing , (comma) after the last value is not valid.
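The matching rules and the trailing-comma restriction described above can be sketched in Python. This is only an illustration, not the plugin's code: the helper names load_config and find_rule are hypothetical, and the sketch assumes that "matches" means a plain substring test and that non-object entries (such as global options) are skipped.

```python
import json

CONFIG_TEXT = """
{
  "example.com|example.net": {"type": "xpath", "xpath": "div[@id='content']"},
  "Jane Doe": {"type": "readability"}
}
"""

def load_config(text):
    # json.loads raises json.JSONDecodeError on a trailing comma.
    return json.loads(text)

def find_rule(config, link, author=""):
    # Assumption: substring match; every <link> match is tried before <author>.
    for field in (link, author):
        for site, rule in config.items():
            if not isinstance(rule, dict):
                continue  # skip global options such as "debug"
            if any(part and part in field for part in site.split("|")):
                return rule
    return None

config = load_config(CONFIG_TEXT)
print(find_rule(config, "https://example.net/story")["type"])
print(find_rule(config, "https://other.org/story", "Jane Doe")["type"])
```

Passing a configuration such as {"example.com": {"type": "xpath",}} to load_config fails with a JSONDecodeError, which is the kind of error a trailing comma produces in any strict JSON parser.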

Filters:

xpath - "type":"xpath"
- xpath - "xpath":"xpath str" / [ "array of xpath str" ]
- index - "index":int
- multipage - "multipage":{options}
  - xpath - "xpath":"xpath str"
  - append - "append":bool
  - recursive - "recursive":bool
- start_element - "start_element":"str"
- join_element - "join_element":"str"
- end_element - "end_element":"str"
- cleanup - "cleanup":"xpath str" / [ "array of xpath str" ]
split - "type":"split"
- steps - "steps":[ array of steps ]
  - after - "after":"str"
  - before - "before":"str"
- cleanup - "cleanup":"/regex str/" / [ "/array of regex str/" ]
readability - "type":"readability" Note: Also accepts all Xpath type options
1. PHP-Readability
2. Readability.php (Optionally installed)
  - relativeurl - "relativeurl":"str"
  - removebyline - "removebyline":bool
  - normalize - "normalize":bool
  - prependimage - "prependimage":bool
  - mainimage - "mainimage":bool
  - appendimages - "appendimages":bool
  - allimages - "allimages":bool
- cleanup - "cleanup": "/regex str/" / [ "/array of regex str/" ]
tags - "tags":"{options}"
- xpath - "type": "xpath"
  - xpath - "xpath":"xpath str" / [ "array of xpath str" ]
- regex - "type": "regex"
  - pattern - "pattern": "/regex str/" / [ "/array of regex str/" ]
  - index - "index":int
- search - "type": "search"
  - pattern - "pattern": "/regex str/" / [ "/array of regex str/" ]
  - match - "match": "str" / [ "array of str" ]
- replace-tags - "replace-tags":bool
- cleanup - "cleanup":"xpath str" / [ "array of xpath str" ]
- split - "split":"str"

Xpath Filter

The xpath value is the actual Xpath-element to fetch from the linked page. Omit the leading // - they will get prepended automatically.

xpath - `"xpath":"xpath str" / [ "array of xpath str" ]`

Xpath string or Array of xpath strings

Single xpath string:

"example.com":{
  "type":"xpath",
  "xpath":"div[@id='content']"
}

Array of xpath strings:

"example.com":{
  "type":"xpath",
  "xpath":[
    "div[@id='footer']",
    "div[@class='content']",
    "div[@class='header']"
  ]
}

Xpaths are evaluated in the order they are given in the array, and the results are concatenated. In the above example the output would be in the order Footer -> Content -> Header instead of the normal page order Header -> Content -> Footer. See also Concatenation Elements.

index - `"index": int`

Integer - Every xpath can also be an object consisting of an xpath element and an index element.

Selecting the 3rd Div in a page:

"example.com":{
	"type":"xpath",
	"xpath":[
		{
			"xpath":"div",
			"index":3
		}
	]
}

multipage - `"multipage":{[options]}`

This option indicates that the article may be split into two or more pages. FeedIron can combine all the parts into the content of the article.

You have to specify an xpath which identifies the links (<a>) to the pages.

"example.com":{
	"type": "xpath",
	"multipage": {
		"xpath": "a[contains(@data-ga-category,'Pagination') and text() = 'Next']",
		"append": true,
		"recursive": true
	}
}

append - `"append":bool`

Boolean - If false, only the links found using the xpath expression are used and the original link is ignored. If true, the found links are added to the original page link.

recursive - `"recursive":bool`

Boolean - If true, every following page is parsed for more links. To avoid infinite loops, fetching stops if a URL is added twice.
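The interplay of append and recursive can be sketched as follows. This is a simplified illustration under stated assumptions, not the plugin's implementation: collect_pages is a hypothetical name, and next_links is a stand-in dictionary that replaces actually fetching each page and running the multipage xpath on it.

```python
def collect_pages(start_url, next_links, append=True, recursive=True):
    # next_links maps a URL to the page links found on it (hypothetical
    # stand-in for fetching the page and evaluating the multipage xpath).
    pages = [start_url] if append else []
    seen = {start_url}
    queue = list(next_links.get(start_url, []))
    while queue:
        url = queue.pop(0)
        if url in seen:
            break  # a URL showed up twice: stop to avoid an infinite loop
        seen.add(url)
        pages.append(url)
        if recursive:
            # follow the links found on the page that was just added
            queue.extend(next_links.get(url, []))
    return pages

LINKS = {"page1": ["page2"], "page2": ["page3"], "page3": ["page1"]}
print(collect_pages("page1", LINKS))
print(collect_pages("page1", LINKS, append=False))
print(collect_pages("page1", LINKS, recursive=False))
```

In this sketch the link from page3 back to page1 ends the collection, which mirrors the duplicate check described above.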

Concatenation Elements

start_element - `"start_element":"str"`

String - Prepends string to the start of content

"example.com":{
  "type":"xpath",
  "xpath":[
    "div[@id='footer']"
  ],
  "start_element":"The Footer is >"
}

Result: The Footer is >Footer Text

join_element - `"join_element":"str"`

String - Joins xpath array content together with string

"example.com":{
	"type":"xpath",
	"xpath":[
		"div[@id='footer']",
		"div[@class='header']"
	],
	"join_element":"<br><br>"
}

Result: Footer Text<br><br>Header Text

end_element - `"end_element":"str"`

String - Appends string to the end of content

"example.com":{
	"type":"xpath",
	"xpath":[
		"div[@class='header']"
	],
	"end_element":"< The Header was"
}

Result: Header Text< The Header was

Full Example of Concatenation Elements:

"example.com":{
	"type":"xpath",
	"xpath":[
		"div[@id='footer']",
		"div[@class='content']",
		"div[@class='header']"
	],
	"start_element":"The Footer is >",
	"join_element":"<br><br>",
	"end_element":"< The Header was"
}

Result: The Footer is >Footer Text<br><br>Content Text<br><br>Header Text< The Header was
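To make the evaluation order and the concatenation elements concrete, here is a minimal Python sketch using the standard library's ElementTree. It is an approximation only: build_content is a hypothetical helper, the sample page is invented, ".//" stands in for the automatically prepended "//", and only the text of each matched node is kept (the plugin itself works on the fetched HTML).

```python
import xml.etree.ElementTree as ET

PAGE = """<html><body>
<div class="header">Header Text</div>
<div class="content">Content Text</div>
<div id="footer">Footer Text</div>
</body></html>"""

RULE = {
    "type": "xpath",
    "xpath": ["div[@id='footer']", "div[@class='content']", "div[@class='header']"],
    "start_element": "The Footer is >",
    "join_element": "<br><br>",
    "end_element": "< The Header was",
}

def build_content(page, rule):
    root = ET.fromstring(page)
    parts = []
    # Xpaths are evaluated in the order given, not in page order.
    for xp in rule["xpath"]:
        for node in root.findall(".//" + xp):
            parts.append("".join(node.itertext()))
    joined = rule.get("join_element", "").join(parts)
    return rule.get("start_element", "") + joined + rule.get("end_element", "")

print(build_content(PAGE, RULE))
```

Although the header comes first in the sample page, the footer text leads the output because its xpath is listed first.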

cleanup - `"cleanup":"xpath str" / [ "array of xpath str" ]`

An array of xpath elements (relative to the fetched node) to remove from the fetched node.

"example.com":{
	"type":"xpath",
	"xpath":"div[@id='content']",
	"cleanup" : [ "script" ]
}

split - `"type":"split"`

steps - `"steps":[ array of steps ]`

The steps value is an array of actions performed in the given order. If after is given, the content is split using its value and the second half is used. If before is given, the first half is used. preg_split is used for this action.

"example.com":{
  "type":"split",
  "steps":[{
    "after": "/article-section clearfix\"\\W*>/",
    "before": "/<div\\W*class=\"module-box home-link-box/"
  },
  {
    "before": "/<div\\W*class=\"btwBarInArticles/"
  }
]
}
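The step logic described above can be approximated in Python with re.split in place of PHP's preg_split. This is a sketch under assumptions: apply_steps is a hypothetical name, the patterns are written without the PHP "/" delimiters, each pattern is assumed to match at most once, and leaving the content unchanged when after does not match is a choice made for this sketch.

```python
import re

def apply_steps(html, steps):
    for step in steps:
        if "after" in step:
            parts = re.split(step["after"], html, maxsplit=1)
            # keep the second half; unchanged if the pattern did not match
            html = parts[1] if len(parts) > 1 else html
        if "before" in step:
            # keep the first half
            html = re.split(step["before"], html, maxsplit=1)[0]
    return html

SAMPLE = (
    '<div class="article-section clearfix">Body text'
    '<div class="module-box home-link-box">Links</div></div>'
)
STEPS = [
    {
        "after": r'article-section clearfix"\W*>',
        "before": r'<div\W*class="module-box home-link-box',
    }
]
print(apply_steps(SAMPLE, STEPS))
```

Within one step, after is applied before before, so the two patterns together cut out the span between them.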

cleanup `"cleanup":[ "array of regex" ]`

Optional - An array of regex that are removed using preg_replace.

"example.com":{
  "type":"split",
  "steps":[{
    "after": "/article-section clearfix\"\\W*>/",
    "before": "/<div\\W*class=\"module-box home-link-box/"
  },
  {
    "before": "/<div\\W*class=\"btwBarInArticles/"
  }
],
"cleanup" : [ "~<script([^<]|<(?!/script))*</script>~msi" ]
}

Readability

The Readability modules are an automated method that attempts to isolate the relevant article text and images.

Basic Usage:

"example.com":{
	"type":"readability"
}

PHP-Readability

Built-in default. This option makes use of php-readability, which is a fork of the original. All the extraction is performed within this module, and it has no configuration options.

Readability.php

Optionally installed via composer. Readability.php is a PHP port of Mozilla's Readability.js. All the extraction is performed within this module.

relativeurl - `"relativeurl":"str"`

Converts relative URLs to absolute ones, e.g. /test to http://host/test

"example.com":{
	"type":"readability",
	"relativeurl":"http:\/\/example.com\/"
}

removebyline - `"removebyline":bool`

Default value false

"example.com":{
	"type":"readability",
	"removebyline":true
}

normalize - `"normalize":bool`

Default value false

Converts UTF-8 characters to their HTML entity equivalents. Useful to parse HTML with mixed encoding.

"example.com":{
	"type":"readability",
	"normalize":true
}

prependimage - `"prependimage":bool`

Default value false

Returns the main image of the article prepended to the article.

"example.com":{
	"type":"readability",
	"prependimage":true
}

mainimage - `"mainimage":bool`

Default value false

Returns the main image of the article.

"example.com":{
	"type":"readability",
	"mainimage":true
}

appendimages - `"appendimages":bool`

Default value false

Returns all images in article appended after the article.

"example.com":{
	"type":"readability",
	"appendimages":true
}

allimages - `"allimages":bool`

Default value false

Returns all images in article without the article.

"example.com":{
	"type":"readability",
	"allimages":true
}

Tags Filter

FeedIron can fetch text from a page and save it as article tags. This can be used to improve the filtering options found in TT-RSS. Note: The tags filter can use all the options available to the xpath filter, as well as the modify option.

The order of execution for tags is:

1. Fetch Tag HTML.
2. Perform Cleanup tags individually.
3. Split Tags.
4. Modify Tags individually.
5. Strip any remaining HTML from Tags.

Usage Example:

"tags": {
    "type": "xpath",
    "replace-tags":true,
    "xpath": [
        "p[@class='topics']"
    ],
    "split":",",
    "cleanup": [
        "strong"
    ],
    "modify":[
      {
        "type": "replace",
        "search": "-",
        "replace": " "
      }
    ]
}

tags type xpath - `"type": "xpath"`

tags xpath - `"xpath":"xpath str" / [ "array of xpath str" ]`

"tags":{
	"type":"xpath",
  "xpath":"p[@class='topics']"
}

tags type regex - `"type": "regex"`

Uses PHP preg_match() in order to find and return a string from the article. Requires at least one pattern.

tags regex pattern - `"pattern": "/regex str/" / [ "/array of regex str/" ]`

"tags":{
	"type":"regex",
  "pattern": "/The quick.*fox jumped/"
}

tags regex index - `"index":int`

Specifies the number of the entry in article to return. Default value 1

"tags":{
	"type":"regex",
  "pattern": "/The quick.*fox jumped/",
  "index": 2
}

tags type search - `"type": "search"`

Search article using regex, if found it returns a pre-defined matching tag.

"tags":{
	"type":"search",
  "pattern": [
    "/feediron/",
    "/ttrss/"
  ],
  "match": [
    "FeedIron is here",
    "TT-RSS is here"
  ]
}

tags search pattern - `"pattern": "/regex str/" / [ "/array of regex str/" ]`

Must have corresponding match entries

tags search match - `"match": "str" / [ "array of str" ]`

Must have corresponding pattern entries. A match entry can be inverted by putting the ! symbol at its beginning, so that the tag is returned when NO match is found.

"tags":{
	"type":"search",
  "pattern": [
    "/feediron/",
    "/ttrss/"
  ],
  "match": [
    "!FeedIron is not here",
    "TT-RSS is here"
  ]
}
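The pairing of pattern and match entries, including the ! inversion, can be sketched like this. The function name search_tags is hypothetical, the patterns are written without the PHP "/" delimiters, and stripping the leading ! from the returned tag is an assumption of this sketch.

```python
import re

def search_tags(article, patterns, matches):
    tags = []
    # each pattern is paired with the match entry at the same position
    for pattern, match in zip(patterns, matches):
        found = re.search(pattern, article) is not None
        if match.startswith("!"):
            if not found:
                tags.append(match[1:])  # inverted: tag only when NOT found
        elif found:
            tags.append(match)
    return tags

ARTICLE = "News about ttrss plugins"
PATTERNS = ["feediron", "ttrss"]
MATCHES = ["!FeedIron is not here", "TT-RSS is here"]
print(search_tags(ARTICLE, PATTERNS, MATCHES))
```

With the sample article, the first entry fires because "feediron" is absent and the second because "ttrss" is present.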

replace-tags - `"replace-tags":bool`

Default value false

Replace the article tags with fetched ones. By default tags are merged.

"tags":{
	"type":"xpath",
  "xpath":"p[@class='topics']",
  "replace-tags": true
}

split - `"split":"str"`

String - Splits tags using a delimiter

"tags":{
	"type":"xpath",
  "xpath":"p[@class='topics']",
  "split":"-"
}

Input: Tag1-Tag2-Tag3

Result: Tag1, Tag2, Tag3

General Options:

reformat / modify - "reformat":[array of options] "modify":[array of options]
- regex - "type":"regex"
  - pattern - "pattern":"/regex str/"
  - replace - "replace":"str"
- replace - "type":"replace"
  - search - "search":"search str" / [ "array of search str" ]
  - replace - "replace":"str"
force_charset - "force_charset":"charset"
force_unicode - "force_unicode":bool
tidy-source - "tidy-source":bool
tidy - "tidy":bool

reformat / modify - `"reformat":[array of options]` `"modify":[array of options]`

Reformat is an array of formatting rules for the URL of the full article. The rules are applied before the full article is fetched. Modify, in contrast, is an array of formatting rules for the article content, using the same options.

regex - `"type":"regex"`

regex takes a regex in an option called pattern and the replacement in replace. For details see preg_replace in the PHP documentation.

pattern - `"pattern":"/regex str/"`

A regular expression or regex string.

replace - `"replace":"str"`

String to replace regex match with

Example reformat golem.de url:

"golem0Bde0C":{
  "type":"xpath",
  "xpath":"article",
  "reformat": [
    {
      "type": "regex",
      "pattern": "/(?:[a-z0-9A-Z\\/.\\:]*?)golem0Bde0C(.*)0Erss0Bhtml\\/story01.htm/",
      "replace": "http://www.golem.de/$1.html"
    }
  ]
}

replace - `"type":"replace"`

Uses the PHP function str_replace, which takes either a string or an array as search and replace value.

search - `"search":"search str" / [ "array of search str" ]`

String to search for. If an array is given, the order of its entries matches the order of the replacement strings.

replace - `"replace":"str" / [ "array of str" ]`

String to replace the search match with. If an array is given, it must have the same number of entries as the search array.

Example search and replace instances of srcset with null:

{
  "type": "xpath",
  "xpath": "img",
  "modify": [
    {
      "type": "replace",
      "search": "srcset",
      "replace": "null"
    }
  ]
}

Example search and replace h1 and h2 tags with h3 tags:

"example.tld":{
  "type": "xpath",
  "xpath": "article",
  "modify": [
    {
      "type": "replace",
      "search": [
        "<h1>",
        "<\/h1>",
        "<h2>",
        "<\/h2>"
      ],
      "replace": [
        "<h3>",
        "<\/h3>",
        "<h3>",
        "<\/h3>"
      ]
    }
  ]
}
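The two rule types can be illustrated with a short Python sketch that mimics preg_replace with re.sub and str_replace with sequential str.replace calls. It rests on assumptions: apply_rules and strip_delimiters are hypothetical names, patterns are assumed to use "/" delimiters without trailing flags, and the sample feed URL is invented.

```python
import re

def strip_delimiters(pattern):
    # Assumption: PHP-style "/.../" delimiters with no trailing flags.
    return pattern[1:-1]

def apply_rules(text, rules):
    for rule in rules:
        if rule["type"] == "regex":
            # translate PHP "$1" references into Python "\g<1>"
            repl = re.sub(r"\$(\d+)", r"\\g<\1>", rule["replace"])
            text = re.sub(strip_delimiters(rule["pattern"]), repl, text)
        elif rule["type"] == "replace":
            search, replace = rule["search"], rule["replace"]
            if isinstance(search, str):
                search = [search]
            if isinstance(replace, str):
                replace = [replace] * len(search)
            # pairs are applied one after another, in array order
            for old, new in zip(search, replace):
                text = text.replace(old, new)
    return text

GOLEM_RULE = {
    "type": "regex",
    "pattern": r"/(?:[a-z0-9A-Z\/.\:]*?)golem0Bde0C(.*)0Erss0Bhtml\/story01.htm/",
    "replace": "http://www.golem.de/$1.html",
}
HEADING_RULE = {
    "type": "replace",
    "search": ["<h1>", "</h1>", "<h2>", "</h2>"],
    "replace": ["<h3>", "</h3>", "<h3>", "</h3>"],
}
FEED_URL = "http://rss.example/l/0L0Sgolem0Bde0Cnews0Cexample0Estory0Erss0Bhtml/story01.htm"
print(apply_rules(FEED_URL, [GOLEM_RULE]))
print(apply_rules("<h1>Title</h1><h2>Sub</h2>", [HEADING_RULE]))
```

Because the replace pairs run sequentially, an earlier replacement can be altered by a later pair, so the order of the arrays matters.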

force_charset - `"force_charset":"charset"`

force_charset allows you to override automatic charset detection. If it is omitted, the charset will be parsed from the HTTP headers or loadHTML() will decide on its own.

"example.tld":{
  "type": "xpath",
  "xpath": "article",
  "force_charset": "utf-8"
}

force_unicode - `"force_unicode":bool`

force_unicode performs a UTF-8 character set conversion on the html via iconv.

"example.tld":{
  "type": "xpath",
  "xpath": "article",
  "force_unicode": true
}

tidy-source - `"tidy-source":bool`

Requires the optionally installed php-tidy. Default - false

Uses tidy::cleanrepair to attempt to fix the fetched article source. Useful when improperly closed tags interfere with xpath queries.

Note: If the character set of the page cannot be detected, tidy will not be executed. In this case force_charset is required.

tidy - `"tidy":bool`

Requires the optionally installed php-tidy. Default - true

Uses tidy::cleanrepair to attempt to fix the modified article. Useful for unclosed tags such as iframes.

Note: If the character set of the page cannot be detected, tidy will not be executed. In this case force_charset is required.

Global options

debug - `"debug":bool`

Activates debugging information (Note: not for the Testing tab). Default - false

At the moment there is not much debug information to be activated. This option must be placed at the same level as the site configs.

Example:

{
  "example.com":{
    "type":"xpath",
    "xpath":"div[@id='content']"
  },
  "secondexample.com":{
    "type":"xpath",
    "xpath": [
      "div[@id='article']",
      "div[@id='footer']"
    ]
  },
  "debug":false
}

tidy-source - `"tidy-source":bool`

Allows you to globally disable the use of php-tidy on the fetched HTML source. Default - true

Uses tidy::cleanrepair to attempt to fix fetched article source, useful for improperly closed tags interfering with xpath queries.

Example:

{
  "example.com":{
    "type":"xpath",
    "xpath":"div[@id='content']"
  },
  "secondexample.com":{
    "type":"xpath",
    "xpath": [
      "div[@id='article']",
      "div[@id='footer']"
    ]
  },
  "tidy-source":false
}

Testing tab

The Testing tab is where you can create and debug your configurations and view a preview of the filter results. The configuration in the Testing tab is identical to that of the Configuration tab, except that the domain/URL is omitted.

{
  "type":"xpath",
  "xpath":"article"
}

Not

"example.tld":{
  "type":"xpath",
  "xpath":"article"
}

Full configuration example

{
  "heise.de": {
    "name": "Heise Newsticker",
    "url": "http://heise.de/ticker/",
    "type": "xpath",
    "xpath": "div[@class='meldung_wrapper']",
    "force_charset": "utf-8"
  },
  "berlin.de/polizei": {
    "type": "xpath",
    "xpath": "div[@class='bacontent']"
  },
  "n24.de": {
    "type": "readability"
  },
  "www.dorkly.com": {
    "type": "xpath",
    "multipage": {
      "xpath": "a[contains(@data-ga-category,'Pagination') and text() = 'Next']",
      "append": true,
      "recursive": true
    },
    "xpath": "div[contains(@class,'post-content')]"
  },
  "golem0Bde0C": {
    "type": "xpath",
    "xpath": "article",
    "multipage": {
      "xpath": "ol/li/a[contains(@id, 'atoc_')]",
      "append": true
    },
    "reformat": [
      {
        "type": "regex",
        "pattern": "/(?:[a-z0-9A-Z\\/.\\:]*?)golem0Bde0C(.*)0Erss0Bhtml\\/story01.htm/",
        "replace": "http://www.golem.de/$1.html"
      },
      {
        "type": "replace",
        "search": [
          "0A",
          "0C",
          "0B",
          "0E"
        ],
        "replace": [
          "0",
          "/",
          ".",
          "-"
        ]
      }
    ]
  },
  "oatmeal": {
    "type": "xpath",
    "xpath": "div[@id='comic']"
  },
  "blog.beetlebum.de": {
    "type": "xpath",
    "xpath": "div[@class='entry-content']",
    "cleanup": [ "header", "footer" ]
  },
  "sueddeutsche.de": {
    "type": "xpath",
    "xpath": [
      "h2/strong",
      "section[contains(@class,'authors')]"
    ],
    "join_element": "<p>",
    "cleanup": [
      "script"
    ]
  },
  "www.spiegel.de": {
    "type": "split",
    "steps": [
      {
        "after": "/article-section clearfix\"\\W*>/",
        "before": "/<div\\W*class=\"module-box home-link-box/"
      },
      {
        "before": "/<div\\W*class=\"btwBarInArticles/"
      }
    ],
    "cleanup" : [ "~<script([^<]|<(?!/script))*</script>~msi" ],
    "force_unicode": true
  },
  "debug": false

}

Xpath General Information

XPath is a query language for selecting nodes from an XML/HTML document.

Xpath Tools

To test your XPath expressions, you can use these Chrome extensions:

XPath Helper
xPath Viewer
xpathOnClick

Xpath Examples

Some XPath expressions you might need (the // is automatically prepended and must be omitted in the FeedIron configuration):

HTML5 <article> tag

<article>…article…</article>

//article

DIV inside DIV

<div id="content"><div class="box_content">…article…</div></div>

//div[@id='content']/div[@class='box_content']

Multiple classes

<div class="post-body entry-content xh-highlight">…article…</div>

//div[starts-with(@class ,'post-body')]

or

//div[contains(@class, 'entry-content')]

Image tag

<a><img src='test.png' /></a>

img/..

Special Thanks

Thanks to mbirth, who wrote af_feedmod, which gave me a starting base.
