Yacoby edited this page Jul 19, 2011 · 2 revisions

The parsers live in application/parsers, with the parser settings in application/parsers/parsers.ini.

A parser contains a class for parsing a page. The class inherits from Search_Parser_Site_Page and should contain the methods that extract mod details when the page is a mod page.

In the most basic case, all a parser needs to define are methods that check whether a given URL is a valid page or a page that contains a mod, and methods that extract mod details.

The parser must contain the following two methods:

protected function doIsValidModPage($url);
protected function doIsValidPage($url);

The first checks whether the page is a mod page, in other words whether the page should be parsed to extract mod details. The second checks whether the page should be visited to look for more mods. There is no need to check for mod pages in doIsValidPage: this is taken into account automatically, and mod pages are visited by default.
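As a rough illustration of how the two checks divide the work, here is a minimal sketch. It does not use the real Search_Parser_Site_Page: Stub_Site_Page, its public isValidModPage/isValidPage wrappers, the Example_Page class and the www.example.com URL patterns are all hypothetical stand-ins, included only so the snippet runs on its own.

```php
<?php
// Hypothetical stand-in for Search_Parser_Site_Page (an assumption, not
// the real class). It only mimics the rule that mod pages are visited
// by default.
abstract class Stub_Site_Page {
    abstract protected function doIsValidModPage($url);
    abstract protected function doIsValidPage($url);

    public function isValidModPage($url) {
        return $this->doIsValidModPage($url);
    }

    // A mod page counts as visitable without doIsValidPage saying so.
    public function isValidPage($url) {
        return $this->doIsValidModPage($url) || $this->doIsValidPage($url);
    }
}

// Hypothetical parser for a made-up site layout.
class Example_Page extends Stub_Site_Page {
    // Pages such as http://www.example.com/mods/42 hold a single mod.
    protected function doIsValidModPage($url) {
        return preg_match('#^http://www\.example\.com/mods/\d+$#', $url) === 1;
    }

    // Category listings are worth visiting to discover more mods.
    // Mod pages are deliberately not matched here.
    protected function doIsValidPage($url) {
        return preg_match('#^http://www\.example\.com/category/\w+$#', $url) === 1;
    }
}
```

With this stub, a mod URL is reported as visitable even though doIsValidPage does not match it, which mirrors the behaviour described above.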

By default, when parsing a mod page, the superclass calls each of the methods getITEM that exists, where ITEM is one of Game, Name, Author, Description, Category or Version. Hence, to parse the name you would just define the method

public function getName(){
    //Do parsing here
}

The page HTML can be accessed through $this->_html, which is an instance of simple_html_dom. See the simple_html_dom documentation for details.

If a getITEM method returns null, it is assumed that an error has occurred and an exception is thrown.

It is expected that getGame will return either 'MW' for a Morrowind mod or 'OB' for an Oblivion mod.
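The getITEM convention and the null check can be sketched as follows. This is not the real superclass: Stub_Mod_Page, extractDetails, the $_raw property and Example_Mod_Page are hypothetical, and a regular expression on a raw string stands in for simple_html_dom so that the snippet has no dependencies.

```php
<?php
// Hypothetical stand-in for the dispatch done by the superclass; the real
// Search_Parser_Site_Page may differ in names and details.
abstract class Stub_Mod_Page {
    protected $_raw; // raw HTML string (the real class exposes $_html instead)

    public function __construct($raw) {
        $this->_raw = $raw;
    }

    public function extractDetails() {
        $items = array('Game', 'Name', 'Author', 'Description', 'Category', 'Version');
        $details = array();
        foreach ($items as $item) {
            $method = 'get' . $item;
            if (!method_exists($this, $method)) {
                continue; // items the parser does not define are skipped
            }
            $value = $this->$method();
            if ($value === null) {
                // A null result is treated as a parsing error.
                throw new Exception("$method returned null");
            }
            $details[$item] = $value;
        }
        return $details;
    }
}

// Hypothetical parser for a site that only hosts Morrowind mods.
class Example_Mod_Page extends Stub_Mod_Page {
    public function getGame() {
        return 'MW';
    }

    public function getName() {
        // Assumes the mod name is the first <h1> on the page.
        if (preg_match('#<h1>(.*?)</h1>#s', $this->_raw, $m) === 1) {
            return trim($m[1]);
        }
        return null; // signals an error to the caller
    }
}

$page = new Example_Mod_Page('<html><h1> Better Bodies </h1></html>');
$details = $page->extractDetails();
// $details is array('Game' => 'MW', 'Name' => 'Better Bodies')
```

A page without an `<h1>` makes getName return null, so extractDetails throws instead of silently storing an incomplete mod.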

When complete, the parser needs to be added to application/parsers/parsers.ini. A description of all the settings can be found in application/parsers/defaults.ini

The format of the section for your site should be [www.host.com:site] (the name after the colon indicates that the section inherits its details from site).
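A section for a new site might start out like the fragment below. The host name is a placeholder, and no setting keys are shown because the available keys are only documented in defaults.ini; anything the section does not override is taken from site.

```ini
; Hypothetical host name; replace with the host of the site being parsed.
; Settings not overridden here are inherited from the [site] section.
[www.example.com:site]
```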

For a more complex example involving logging in, AJAX requests and an alternative method of parsing HTML (e.g. RegEx), see application/parsers/TesNexus.php
