installation

Installing and tuning tvgrabpyAPI

Requirements

Python 2.7.9 or higher (currently not python 3.x)
The pytz module
The requests module
The DataTreeGrab module
Connection with the Internet
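
A quick way to confirm that the three third-party modules are present is to try importing them. The following is only a minimal sketch and not part of tvgrabpyAPI: the helper name missing_modules is made up for illustration, and the module names are simply the ones listed above. Run it with the same Python interpreter you will use for the grabber.

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Module names as listed under Requirements.
    absent = missing_modules(["pytz", "requests", "DataTreeGrab"])
    if absent:
        print("Missing modules: " + ", ".join(absent))
    else:
        print("All required modules are importable.")
```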

Installation

Especially under Windows, make sure Python 2.7.9 or higher is installed. You will find Python 2.7.12 for Windows here. I regularly test the package under Windows, but I do not use it there, so issues may escape my notice. Be sure to raise them!
Make sure the Python 2 packages mentioned above are installed on your system.
Download the latest release and unpack it into a directory
Run:
- under Linux: sudo ./setup.py install from that directory
  The frontend script(s) and developer test scripts will install into /usr/bin.
- under Windows depending on how you installed Python:
  - setup.py install from that directory
  - Or: Python setup.py install from that directory
  (the frontend script(s) and developer test scripts will install into C:\Program Files\Python27\Scripts)
Run the frontend (presently only tv_grab_nl3.py) with --configure to create a configuration and a database file. You should do this as the user that will run the script every night, as those files are placed in .xmltv in the user's HOME directory. Running the script as root will fail except with --configure. The configuration is then placed in /etc/tvgrabpyAPI/ as a fallback configuration. If you want the files there to be writable by other users, you must at present arrange this yourself.
If you want the help text in your configuration and your log to be in a language other than English (currently only Dutch), add the --language option. Once you have a configuration you can also set the language there. This also works with the help screen: tv_grab_nl3.py --help --language nl will give you the Dutch help page.
Check the created configuration file ~/.xmltv/tv_grab_nl3.conf and activate the desired channels by removing the leading #.

Using a non standard configuration location

By default all configuration files, the database and the logfiles are placed in the user's HOME directory in .xmltv. This should ensure that sufficient rights exist and that the database is not accessed by two users simultaneously, which would cause the second user to run without cache.

If you want to create and/or access a configuration in a different location and/or with a different name, use the --config_file option. You can use a path relative to your current location. Be sure you have sufficient rights or a fatal error will follow. The log-file will always be placed in the same location and with the same name, but with a .log extension. MythTV uses this option to place the configuration in the ~/.mythtv directory, naming it after the corresponding videosource with an .xmltv extension.

For the sqlite database there is a similar --cache option. A .db extension will be added after the given name. Be careful if you want to share this database between multiple users. If for whatever reason the database cannot be accessed, an error is given and the run continues without cache functionality. You can force this (without an error) by giving None as the database name.
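
The behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the grabber's own code: the function name open_cache is hypothetical, and the only facts taken from the text are that a .db extension is appended, that None disables the cache silently, and that an inaccessible database leads to an error message and a run without cache.

```python
import sqlite3
import sys


def open_cache(name):
    """Return a sqlite3 connection for a --cache name, or None for no cache."""
    if name == "None":
        # Explicitly requested: run without cache and without an error.
        return None
    db_file = name + ".db"  # the extension is appended to the given name
    try:
        conn = sqlite3.connect(db_file)
        conn.execute("PRAGMA user_version")  # force real access to the file
        return conn
    except sqlite3.Error as err:
        sys.stderr.write(
            "Cannot access %s (%s), continuing without cache\n" % (db_file, err)
        )
        return None
```

A caller would then treat a return value of None as "no cache available" for the rest of the run.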

The logfile

Every time you run the script, a logfile is created in the same place and with the same name as the configuration file, but with a .log extension. If that logfile already exists, it is renamed to .old and a new file is opened. You should regularly check at least the top of the logfile, as any messages on lineup changes or program updates are placed there.
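
As a rough picture of this rotation, here is a minimal sketch. It assumes that ".old" is appended to the full log name (as with ttvdb.log.old further down); rotate_log is a hypothetical helper name, not a function of the API.

```python
import os


def rotate_log(log_path):
    """Rename an existing log to <log_path>.old and open a fresh log file."""
    old_path = log_path + ".old"
    if os.path.exists(log_path):
        if os.path.exists(old_path):
            # os.rename does not overwrite an existing file on Windows.
            os.remove(old_path)
        os.rename(log_path, old_path)
    return open(log_path, "w")
```

Only one previous log is kept this way, so a problem has to be checked before the next run overwrites it.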

You can set what to log through log_level and match_log_level in your configuration. Levels 8 and 16 of log_level control the same output, with 8 sending it to the screen and 16 to the log. Apart from those two and the report on the actual settings, everything going to the log is by default also printed to the screen. Use quiet or verbose to disable or enable output to the screen.

You can also choose to mail the log after completion. In your configuration you find the following:

mail_log = False
mail_log_address = postmaster
mailserver = localhost
mailport = 25

Set mail_log to True to activate the mailing. If you have a default Linux mailserver running on your machine, as many Linux distributions do, the above settings should work, as postmaster is a mandatory address. However, you can use any mailserver that does not need SSL/startTLS or authentication to send a mail. Many provider mailservers qualify, as they restrict uncontrolled access by only allowing clients from within their own subnet. Gmail, Hotmail etc. won't work.
You are advised to run from a console the first time, as any errors on sending the mail won't make it to the log: by then it has already been closed for sending!
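
To test whether your mailserver accepts such a plain, unauthenticated connection, something like the sketch below can be used. It is not the grabber's own mailing code: the function names, the subject line and the sender address tvgrab@localhost are assumptions made for illustration. The defaults mirror the four settings shown above.

```python
import smtplib
from email.mime.text import MIMEText


def build_log_mail(log_text, address, sender="tvgrab@localhost"):
    """Wrap the log text in a plain-text mail message."""
    msg = MIMEText(log_text, "plain", "utf-8")
    msg["Subject"] = "tv_grab_nl3.py log"  # made-up subject
    msg["From"] = sender                   # made-up sender address
    msg["To"] = address
    return msg


def send_log(log_text, address="postmaster", server="localhost", port=25):
    """Send the log over plain SMTP: no SSL/startTLS, no authentication."""
    msg = build_log_mail(log_text, address)
    smtp = smtplib.SMTP(server, port)
    try:
        smtp.sendmail(msg["From"], [address], msg.as_string())
    finally:
        smtp.quit()
```

Calling send_log("test") from a console shows any connection or relay error directly, which is exactly what the log cannot show you.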

Alternatively, most cron daemons allow for mailing the output from cron-jobs. Check the manual of yours if you want to try this. The advantage is that you receive the complete log, including any follow-up from for instance mythfilldatabase. If you use the quiet option this will not show much, as all cron can do is send the screen output!

The Database

The default name and location for the database is ~/.xmltv/program_cache3.db. It is used to store three groups of data:

All lineup data retrieved on running --configure and used to create and update the configuration file. Not all this data is also stored in the configuration file. Especially for sites like humo.be that group channels together on their own pages, this grouping information is retrieved from the database. So if for whatever reason you delete the database-file, first run --configure to recreate the database and fill it with the lineup data.
All previously fetched programme listings dating back no further than 1 day. On running, all programme listings from before midnight the previous day are cleared from the database. All programme data is stored as it was retrieved, before integrating the data from the respective sources, but after applying some name and genre translations to unify the data from the respective sources.
You can clear this data with the --clear_cache option
theTVDB.com lookup results.
You can clear this data with the --clear_ttvdb option.
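
The clean-up rule for programme listings can be illustrated as below. This sketch assumes that "before midnight the previous day" means before 00:00 at the start of yesterday, and it ignores timezones; the function names are hypothetical and the real implementation may draw the line differently.

```python
from datetime import datetime, time, timedelta


def cache_cutoff(now):
    """Return midnight at the start of the day before `now`."""
    yesterday = now.date() - timedelta(days=1)
    return datetime.combine(yesterday, time.min)


def keep_listing(start_time, now):
    """True if a cached programme starting at `start_time` survives clean-up."""
    return start_time >= cache_cutoff(now)
```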

On start-up, after verifying its sanity, a backup copy is made of the database. If the database fails this sanity check, a previous backup copy is restored. Sqlite can run in a very safe journaling mode, but with the heavy use that is made of the database, this would make the program extremely slow. Therefore we have chosen a slightly less safe mode, falling back to the backup in the unlikely event of database damage due to a write operation interrupted by a crash. So at most the results of a run in progress can get lost. In over a year I have not needed to fall back to this backup. The size of the database file can grow to over a gigabyte! When only containing lineup data it will be several megabytes, depending on the number of sources and channels covered by your frontend.
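
The check-then-backup idea can be sketched like this. Note the assumptions: SQLite's PRAGMA integrity_check stands in for whatever sanity check the API really performs, the ".bak" suffix is a made-up backup name, and both function names are hypothetical.

```python
import os
import shutil
import sqlite3


def is_sane(db_path):
    """Return True if SQLite reports the file as an intact database."""
    if not os.path.exists(db_path):
        return False
    try:
        conn = sqlite3.connect(db_path)
        try:
            row = conn.execute("PRAGMA integrity_check").fetchone()
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        return False
    return row is not None and row[0] == "ok"


def check_database(db_path):
    """Back up a sane database; restore the last backup if it is damaged."""
    backup_path = db_path + ".bak"  # hypothetical backup name
    if is_sane(db_path):
        shutil.copy2(db_path, backup_path)
        return "backed up"
    if os.path.exists(backup_path):
        shutil.copy2(backup_path, db_path)
        return "restored"
    return "no backup available"
```

Because the backup is taken at start-up, a restore loses at most what the interrupted run had written, which matches the trade-off described above.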

About channels and sources

sourceid: the ID of a source. You can find it by running --show-sources.
chanid: the unique, source-independent ID for a channel. By default it is also the xmltvID. This is item 3 in a channel-configuration-line.
channelid: the ID for a channel that is unique within a source. Often the ID used by that source, but sometimes constructed from the channel name. Together with the sourceid it gives a default chanid. This is item <sourceid> + 4 in a channel-configuration-line.
logo-sourceid: This is the second item from the end in a channel-configuration-line. Check --show-logo-sources for a list.
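
To make the item positions concrete, here is a small sketch that picks these fields out of a channel-configuration-line. It rests on two assumptions not stated above: that items are counted from 1 and that they are separated by semicolons. The sample line in the usage note is invented purely for illustration; check your own configuration file for the real layout.

```python
def split_channel_line(line, sep=";"):
    """Split a channel-configuration-line into its items (separator assumed)."""
    return line.strip().split(sep)


def chanid(items):
    """Item 3, counting from 1."""
    return items[3 - 1]


def channelid(items, sourceid):
    """Item <sourceid> + 4, counting from 1."""
    return items[sourceid + 4 - 1]


def logo_sourceid(items):
    """The second item from the end."""
    return items[-2]
```

With the invented line "Example TV;1;0-1;1;example-tv;;2;example.png", chanid gives "0-1", channelid for source 1 gives "example-tv" and logo_sourceid gives "2".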

A grabber frontend extracts data from several sources; the Dutch/Flemish grabber uses eleven, plus a twelfth to combine listings for time-share channels. They can range from cable and broadcasting company sites to news sites and public JSON services. Every source supplies listings for a number of channels (fewer than ten to over a hundred), for a number of days (4 to more than 14; one actually has data for months into the future), with a varying degree of detail, completeness and accuracy. All this data overlaps to some degree, and this API combines it into one single set of listings that is as accurate and as complete as possible. Most of this process is managed by settings in the JSON data-files, but as a user you can also adjust many settings to your own preferences. Use --show-sources and --show-detail-sources to get a list of available sources. For individual channels, check the available channelids in the configuration-line.

Most important is the prime_source: the source that delivers the best start (and stop) times for that channel. First all sources are ordered by decreasing dependability, and for some channel-groups a specific prime_source is set as default. Sometimes a prime_source is set for a single channel, as you can also do yourself.
Next you can set a prefered_description. By default the longest description available from any of the sources is used, but that is not necessarily the best. I for instance prefer the descriptions from the Dutch vpro.nl (7) source.
You can disable a source either globally or for single channels using the disable_source = <sourceid> option (or --disable_source <sourceid> on the commandline). Once you disable it globally you cannot enable it for a single channel. To disable only detail-pages use: disable_detail_source = <sourceid>. If a source is showing problems we can also (temporarily) disable it in the JSON datafile.
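
The description rule can be pictured with a short sketch. The function name and the dictionary layout (sourceid mapped to description text) are hypothetical, and falling back to the longest description when the preferred source has none is an assumption, not documented behaviour.

```python
def choose_description(descriptions, preferred=None):
    """Pick a description from a dict mapping sourceid to description text."""
    if preferred is not None and descriptions.get(preferred):
        return descriptions[preferred]
    candidates = [text for text in descriptions.values() if text]
    if not candidates:
        return ""
    # Default rule: the longest available description wins.
    return max(candidates, key=len)
```

For example, choose_description(texts, preferred=7) would return the vpro.nl text whenever source 7 supplied one.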

<under construction>

theTVDB.com

The API can do a lookup for programmes on theTVDB.com. Whenever a lookup is successful and a ttvdbID is retrieved, data for all known episodes on theTVDB.com is fetched and stored in the database. If no ttvdbID could be found, the failure is also stored. Whenever a programme is found in the database (either with data or as a failure) and that data is not older than 30 days, no request to theTVDB.com is made and only the available data is used. Once the data gets older than 30 days, a check for updates is done on theTVDB.com and the 'last_request_date' is updated. By default data is fetched and stored for several languages as set in the grabber-datafile, always including English. Mostly this will be the grabber language and the channel language.
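
The 30-day rule amounts to a simple date comparison, sketched here. The function name is hypothetical, and using None for a title that has never been looked up is an assumption about how such a case would be represented.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=30)


def needs_ttvdb_request(last_request_date, today):
    """True if theTVDB.com should be contacted for this title."""
    if last_request_date is None:
        # Never looked up before, neither as a match nor as a failure.
        return True
    return today - last_request_date > MAX_AGE
```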

theTVDB.com lookup (database or internet) is only attempted for programmes with a genre marked as being a series genre. Through the ttvdb_lookup_level you can further specify which series will qualify. You can set this both globally and for individual channels:

0: This is effectively the same as setting disable_ttvdb to True, but with one difference. Once you disable ttvdb globally you cannot enable it for an individual channel. Setting ttvdb_lookup_level globally to 0 leaves room to set a different level for individual channels.
1: Only do a lookup if an episode title and an empty season number are found. This is what was used in tv_grab_nl.py version 2 and is intended to find a missing season/episode numbering or to convert an absolute episode number to a season/episode number.
2: Do a lookup for all series that already have an episode title.
3: Do a lookup on all series. This setting potentially will give a lot of failures.

Some channel-groups (with radio or local stations) are always excluded.
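
The four levels can be summarised in a small decision function. This is a sketch of the rules as described above, not the API's code: the function names are hypothetical, an "empty season number" is represented by None, and the always-excluded channel-groups are left out.

```python
def effective_level(global_level, channel_level=None):
    """A level set for an individual channel overrides the global one."""
    return global_level if channel_level is None else channel_level


def qualifies_for_lookup(level, is_series_genre, episode_title=None, season=None):
    """Decide whether a programme qualifies for a theTVDB.com lookup."""
    if not is_series_genre or level <= 0:
        return False
    if level == 1:
        # Episode title present and season number empty.
        return bool(episode_title) and season is None
    if level == 2:
        # Any series that already has an episode title.
        return bool(episode_title)
    return True  # level 3: all series
```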

It can happen that a wrong ttvdbID is associated with a title. Whenever a title that already exists is added to theTVDB.com, the premiere year is appended to that title. So with 'Castle' being an old British series about old castles, the series most of us know as 'Castle' is stored as 'Castle (2009)'. To fix this we have added a ttvdb_alias table to the database, which we have already filled for you with 'Castle (2009)', 'Father Brown (2013)' and 'The Player (2015)', but others might occur, or you might get an airing of the original series.
You can manage this table with the --add-ttvdb-title <programme name> [<two letter language code>] [<ttvdbid>] commandline interface. If the name contains spaces you must enclose it in quotes; the language and ttvdbid are optional. The language can be different from the interface language set with the --language option. Running tv_grab_nl3.py --add-ttvdb-title castle returns:

The series "castle" is not jet known!

theTVDB Search Results:
  1 ->     82607: (en) Castle
  2 ->     83462: (nl) Castle (2009)
  3 ->    261384: (en) An Englishman's Castle
  4 ->    251457: (en) The Hero Yoshihiko and the Devil King's Castle
  5 ->     83462: (en) Castle (2009)
  6 ->    266240: (en) Dani's Castle
  7 ->    277084: (en) Mysteries at the Castle
  8 ->     74533: (en) Queenie's Castle
  9 ->     78403: (en) Eureeka's Castle
 10 ->    184991: (en) Noah's Castle
 11 ->    247852: (en) The Queen's Castle
 12 ->     74787: (en) Takeshi's Castle
 13 ->    251944: (en) The Castle of Sand (2011)
 14 ->    251872: (en) The Castle of Sand (2004)
Enter choice (first number, q to abort):

But you could for instance retrieve the French data using the Dutch interface:

tv_grab_nl3.py --add-ttvdb-title castle fr --language nl

So here you select the ttvdb_alias and ttvdbID that should be used. After selecting a value the alias is stored and all episode data for that series is retrieved and stored. Selecting 2 will return (after 1 or 2 seconds):

Removing old instance
Adding "Castle (2009)" under alias "castle" as ttvdbID: 83462 to the database for lookups!

If data is already found when querying the database, then instead of The series "castle" is not jet known! you will for instance see:

The series "castle" is already saved under ttvdbID: 83462 -> Castle (2009)
    for the languages: (fr, en, nl, de)

Followed by the list to select from. If you are satisfied with this press q to leave the database as is.
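
Conceptually the alias table maps the lowercased programme name to the series it should resolve to. The sketch below uses a plain dictionary to show the idea; the real table lives in the sqlite database, its column layout is not documented here, and the names used are hypothetical. The one entry is taken from the example output above.

```python
# alias (lowercased programme name) -> (title on theTVDB.com, ttvdbID)
ALIASES = {
    "castle": ("Castle (2009)", 83462),
}


def resolve_alias(title, aliases):
    """Return (ttvdb title, ttvdbID) for a programme name, or None if unknown."""
    return aliases.get(title.lower())
```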

Known languages on theTVDB.com are: ("cs", "da", "de", "el", "en", "es", "fi", "fr", "he", "hr", "hu", "it", "ja", "ko", "nl", "no", "pl", "pt", "ru", "sl", "sv", "tr", "zh"). However, not all will be available for all series. A missing language is then substituted by the English version.
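
The fallback to English can be expressed in a few lines. This is an illustrative sketch only: pick_language is a hypothetical name, and rejecting unknown codes with an exception is an assumption about how such input might be handled.

```python
KNOWN_LANGUAGES = (
    "cs", "da", "de", "el", "en", "es", "fi", "fr", "he", "hr", "hu", "it",
    "ja", "ko", "nl", "no", "pl", "pt", "ru", "sl", "sv", "tr", "zh",
)


def pick_language(requested, available):
    """Return the language to use for a series, falling back to English."""
    if requested not in KNOWN_LANGUAGES:
        raise ValueError("unknown language code: %s" % requested)
    return requested if requested in available else "en"
```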

Sometimes the right series is not shown. In that case you can supply the right ttvdbid after the series title. For instance for "Secrets and Lies" there are two versions:

277531 "Secrets & Lies" from Network Ten in 2014
279214 "Secrets and Lies (2015)" from ABC in 2015

If you query "Secrets and Lies" only the first one is shown; if you query "Secrets & Lies" both are listed. So to attach the alias "Secrets and Lies" to 279214 "Secrets and Lies (2015)" you use --add-ttvdb-title "Secrets and Lies" 279214 and it will show for instance:

The series "secrets and lies" is already saved under ttvdbID: 277531 -> Secrets & Lies
    for the languages: (fr, en, nl, de)
	
	Do you want to replace it with 279214 ("Secrets and Lies (2015)")? <y/n>
	
Please press y or n.

You can clear all theTVDB.com data with the --clear_ttvdb option.
You can disable ttvdb-lookup either globally or for single channels using the disable_ttvdb option. Once you disable it globally you can not enable it for a single channel. If the cache function is disabled, ttvdb-lookup is disabled automatically too!

At all times you can check on what ttvdb matches were made in ~/.xmltv/ttvdb.log. Here all matches and failures from the last run are logged in detail. The previous log is stored as ~/.xmltv/ttvdb.log.old.

Both the ttvdb seriesid and the ttvdb episodeid are stored in the xmltv output to be used by the application you use the data with.

Detail Pages

Sources are very different in how they deliver their data. Aside from JSON or HTML they can have a single (detailed) page for every channel and every day, a single page containing all channels and days, or anything in between. Also, some sources only have basic (time and title) data on their main page(s), with a detailed page for every programme, sometimes grouping all airings of a single show for one day together. These pages often have the most data, but fetching them also takes up most of the time of any run. With the old version 2 Dutch/Flemish grabber, fetching all data for all days and all channels would take more than 24 hours, and that with all sources fetching in parallel. This API already reduces the time needed by reusing more data from the cache, but as we all have our preferred channels and shows, we will never use a lot of this data. So you can set the amount of detail to fetch based on the channel, the source, the genre and the nearness in time, reducing total fetch time further.

The fastest fetch can be achieved with the --use-only-cache option. Adding this option on the commandline excludes fetching any new data from the sources and creates xmltv output based only on the data already present in the database, in essence recreating the output from the last normal run. As I understand it, TVheadend does not buffer listing data like MythTV does. By linking the grabber with this option to TVheadend and running the grabber normally once or twice a day through a cron-job to update the database, you can improve the reaction time in TVheadend significantly. I have no real experience with TVheadend, but you might have to encapsulate the script in a small shell script, like the one included with tv_grab_nl.py, if you cannot add your own options to the TVheadend call.
To do this, move the frontend from the path (/usr/bin) where it is normally placed to for instance your home directory and place a shell script like:

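#!/bin/bash

# File the grabber writes its xmltv output to
XML_OUT="${HOME}/tv_out.xml"

# Remove any output left over from a previous run
[ -f "$XML_OUT" ] && rm -f "$XML_OUT"

# Build the listings from the cache only, passing on any options from the caller
"${HOME}/tv_grab_nl3.py" --use-only-cache "$@" --output "$XML_OUT"

# Print the result only if the grabber succeeded and produced the file
[ $? -eq 0 ] && [ -f "$XML_OUT" ] && cat "$XML_OUT"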

in /usr/bin, naming it for instance tv_grab_nl3.sh.
Moving the frontend is needed to prevent TVheadend from detecting both the frontend and the shell script. In your daily cron-job you include the path to the new location in your home directory, as in the above shell script. Be sure to run both as the same user, as otherwise a different database is used!

There are several options that regulate whether detail-pages are fetched:

You can fetch in fast mode, for all channels or for selected channels. This means that for those channels no detail-pages are fetched at all.
You can set slowdays to a value other than None, again globally for all channels or for selected channels. Detail-pages are then fetched for up to the number of days set, but not for the rest. Setting slowdays to the same value as days or higher is the same as setting fast to False. Setting it to 0 turns fast mode on. Be aware that if slowdays is set to a value other than None, the value of fast is discarded!
You can disable detail fetching from any of the sources by adding disable_detail_source = <sourceid>, globally or for selected channels. With the --show-detail-sources option you can get a list of available detail-sources.
Another option is to select for which genres to fetch details. In tv_grab_xx_py.set you can either list the genres you DO want to fetch details for, or add "all" and list the genres you do NOT want to fetch details for. If you see a lot of programmes without a genre, either add "none" to the list to include those without a genre, or use "all" and exclude all genres you are not interested in. The genres you should add here are the ones from before cattrans. You can also find those in tv_grab_xx_py.set.
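
How fast and slowdays interact can be summarised in a short sketch. The function name is hypothetical and day_offset is assumed to count from 0 for the first day fetched; the per-source and per-genre filters described above are left out.

```python
def fetch_details_for_day(day_offset, fast=False, slowdays=None):
    """Decide whether detail-pages are fetched for a given day."""
    if slowdays is not None:
        # slowdays overrides fast: details only for the first `slowdays` days.
        return day_offset < slowdays
    return not fast
```

So with slowdays = 2, details are fetched for the first two days whatever fast is set to, and with slowdays = 0 nothing is fetched, just as in fast mode.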

Any previously fetched detail-page stored in the database is always used and merged. On any fetch a maximum of one detail-page for every programme is fetched. Any other detail-pages from another source will be fetched on a next run.

Further Tuning

<under construction>

Retrieving your data

<under construction>

The Options

add_hd_id
add-ttvdb-title
always_use_json
cache
capabilities
cattrans
clean_cache
clear_cache
clear_ttvdb
compat
config_file
configure
days
desc_length
description
disable_detail_source
disable_source
disable_ttvdb
fast
global_timeout
group_active_channels
help
legacy_xmltvids
language
log_level
logos
long_descr
mark_HD
mail_log
mail_log_address
mailport
mailserver
match_log_level
max_overlap
max_simultaneous_fetches
nocattrans
nologos
offset
output_file
output_tz
output-windows-codeset
overlap_strategy
prefered_description
preferredmethod
prime_source
quiet
ratingstyle
save_options
show-detail-sources
show-logo-sources
show-sources
slow
slowdays
ttvdb_lookup_level as of 1.0.5
use-only-cache
utc
use_split_episodes
verbose
version
xmltvid_alias

tv_grab_xx_py.set
