[names] Use OpenUp! database for caching #15

re1 · 2020-04-18T14:01:24Z

The caching database is required to work independently from the JACQ services and has to be used remotely. A working database for the OpenUp! project exists and will also be used by the main JACQ services.

The existing OpenUp! database has to be fully implemented to Match OpenUp! functionality.
In order to prevent data corruption by new service features especially sources without endpoint are to be preserved!

Caching process

Queries are currently cached in the OpenUp! project as described in Confluence (Common Names Webservice Caching System) and are to be implemented the same way from the JACQ Common Names Webservice.

ℹ️ Queries are identified by their SHA1 hash.

Implementation

The implementation process can roughly be split into 5 steps for reference:

Define interfaces for all existing tables and prepare for remote caching
Check incoming requests against the remote cache and compare timestamps
Retrieve data from source caches where no source endpoint is available
Write uncached and outdated queries to source caches
Compare local web service cache against remote web service cache and decide on the final implementation.

Step 5 largely depends on the development process as further insight might change the scope of this issue.

re1 · 2020-05-19T10:44:29Z

The following chart shows the currently documented caching system using the OpenUp! database.

Source: JACQ Common Names Webservice Caching System

re1 · 2020-05-27T07:36:16Z

In this following chart additional information was added to better explain the process of caching and using web service results.

The web service cache might already include responses from sources no longer available which are still to be included in the results.

re1 · 2020-05-28T13:48:36Z

The following Web service sources were originally used in the OpenUp! project and therefore still cached. Although not all sources are still online, service implementations are to be defined in order to query cached responses.

id	url
1	http://www.catalogueoflife.org/col/webservice?response=full&format=php&name=
2	http://131.130.131.9/taxamatch/jsonRPC/json_rpc_taxamatchMdld.php
3	http://www.eu-nomen.eu/portal/soap.php?wsdl=1
4	http://webtjenester.artsdatabanken.no/Artsnavnebase.asmx?WSDL
5	http://wboe.oeaw.ac.at/api/taxonid/
6	http://ws.luomus.fi/triplestore/search?predicate=dwc:scientificName&object=
7	https://taxon.artdatabankensoa.se/TaxonService.svc?wsdl
8	http://www.meertens.knaw.nl/pland/rest/?format=php&plantnaam=
9	http://localhost:8080/jacq-names/rest/names/common/?query=

re1 · 2020-06-02T09:51:41Z

The following services are to be added to the OpenUp! Web service cache after they are implementing WebServiceCache (ids can change!):

id	url
10	http://www.dnp.go.th
11	http://ylist.info/

INSERT INTO tbl_service
VALUES (10, 'http://www.dnp.go.th'),
       (11, 'http://ylist.info/');

re1 · 2020-06-02T13:07:54Z

WBÖ / DBÖ source (5) uses a similar response format as the JACQ Legacy source and in the OpenUp! implementation it also uses the JACQ Legacy (NHMW) service interface to fetch the original web service response.

The following example uses the query hash 40ce6cd2e9e0bf2aee00bd81c569b83ee4509749 (likely some form of Homogyne discolor Cass.).

s:856:"[
  {
    "taxon_id": "2892",
    "geography": "aut",
    "scope": 100,
    "dboe_id": "beleg:127439",
    "name": "Rahmblödschal ",
    "language": "bars",
    "match": true,
    "reference": "http:\/\/dboema.acdh.oeaw.ac.at\/dboe\/beleg\/127439",
    "type": ["\/name\/common"]
  }, { ... }, { ... }
]

The JACQ CNS will only use cached results for now unless stated otherwise.
So far no working queries were found to test this source.

re1 · 2020-07-20T16:01:12Z

The caching strategy has been updated with Web Service response caching functionality including incremental caching as described in [names] Incremental Web service caching and timeouts (#21).

The following chart shows how common names are collected from both cached and uncached static and Web Service sources.

re1 · 2020-09-23T12:31:29Z

The existing OpenUp! Database has Web Service responses cached in serialized PHP format. Most already cached responses are therefore hard to read and require a considerable amount of extra work. It might be a good idea to reevaluate the serialization or format of responses.

re1 · 2020-10-28T10:38:21Z

The OpenUp database table tbl_webservice_cache contains many values not relevant to the application. Such values can be removed to improve search times. The following values are to be removed:

N;
b:0;
s:0:"";
s:3:"[] ";
a:1:{i:0;O:8:"stdClass":0:{}}
a:2:{s:5:"error";s:0:"";s:6:"result";a:0:{}}
O:8:"stdClass":1:{s:13:"ArtssokResult";O:8:"stdClass":0:{}}
a:2:{s:5:"error";s:22:"no database connection";s:6:"result";a:0:{}}
O:8:"stdClass":1:{s:29:"GetTaxaBySearchCriteriaResult";O:8:"stdClass":0:{}}
a:1:{i:0;O:8:"stdClass":3:{s:10:"vernacular";s:14:"Chinese Chives";s:13:"language_code";s:2:"eu";s:8:"language";s:21:"English-United States";}}
a:2:{i:0;O:8:"stdClass":3:{s:10:"vernacular";s:13:"gerani cineri";s:13:"language_code";N;s:8:"language";s:7:"Catalan";}i:1;O:8:"stdClass":3:{s:10:"vernacular";s:11:"silkesnäva";s:13:"language_code";s:2:"sv";s:8:"language";s:7:"Swedish";}}
a:2:{s:5:"error";s:0:"";s:6:"result";a:1:{i:0;a:7:{s:10:"searchtext";s:2:"Os";s:19:"searchtextNearmatch";s:0:"";s:11:"rowsChecked";i:34709;s:4:"type";s:3:"uni";s:8:"database";s:5:"freud";s:18:"includeCommonNames";b:1;s:12:"searchresult";a:0:{}}}}

In addition there are about 500 queries with unusable responses including the query text with the format like so:

a:2:{s:5:"error";s:0:"";s:6:"result";a:1:{i:0;a:7:{s:10:"searchtext";s:2:"Os";s:19:"searchtextNearmatch";s:0:"";s:11:"rowsChecked";i:35410;s:4:"type";s:3:"uni";s:8:"database";s:5:"freud";s:18:"includeCommonNames";b:1;s:12:"searchresult";a:0:{}}}}

Those values where found using the following query and checking it's result:

SELECT COUNT(response) AS c, response
FROM tbl_webservice_cache
GROUP BY response
HAVING c > 1
ORDER BY LENGTH(response);

As deleting by comparing to the query takes considerably long, therefore values for deletion are added manually:

DELETE
FROM tbl_webservice_cache
WHERE response IN ('N;', 'b:0;', 's:0:"";', 's:3:"[]
";', 'a:1:{i:0;O:8:"stdClass":0:{}}',
                   'a:2:{s:5:"error";s:0:"";s:6:"result";a:0:{}}',
                   'O:8:"stdClass":1:{s:13:"ArtssokResult";O:8:"stdClass":0:{}}',
                   'a:2:{s:5:"error";s:22:"no database connection";s:6:"result";a:0:{}}',
                   'O:8:"stdClass":1:{s:29:"GetTaxaBySearchCriteriaResult";O:8:"stdClass":0:{}}',
                   'a:1:{i:0;O:8:"stdClass":3:{s:10:"vernacular";s:14:"Chinese Chives";s:13:"language_code";s:2:"eu";s:8:"language";s:21:"English-United States";}}',
                   'a:2:{i:0;O:8:"stdClass":3:{s:10:"vernacular";s:13:"gerani cineri";s:13:"language_code";N;s:8:"language";s:7:"Catalan";}i:1;O:8:"stdClass":3:{s:10:"vernacular";s:11:"silkesnäva";s:13:"language_code";s:2:"sv";s:8:"language";s:7:"Swedish";}}',
                   'a:2:{s:5:"error";s:0:"";s:6:"result";a:1:{i:0;a:7:{s:10:"searchtext";s:2:"Os";s:19:"searchtextNearmatch";s:0:"";s:11:"rowsChecked";i:34709;s:4:"type";s:3:"uni";s:8:"database";s:5:"freud";s:18:"includeCommonNames";b:1;s:12:"searchresult";a:0:{}}}}')
   OR response LIKE
      'a:2:{s:5:"error";s:0:"";s:6:"result";a:1:{i:0;a:7:{s:10:"searchtext";s:%:"%";s:19:"searchtextNearmatch";s:0:"";s:11:"rowsChecked";i:%;s:4:"type";s:3:"uni";s:8:"database";s:5:"freud";s:18:"includeCommonNames";b:1;s:12:"searchresult";a:0:{}}}}';

re1 · 2020-10-29T12:08:46Z

Many rows of the tbl_webservice_cache table have the same column values and can be removed. A simple SQL script for deduplication would be

DELETE w1
FROM tbl_webservice_cache w1
         INNER JOIN tbl_webservice_cache w2
WHERE w1.id < w2.id
  AND w1.service_id = w2.service_id
  AND w1.query = w2.query
  AND w1.response = w2.response;

As many rows are locked in the process there might be a an error stating "The total number of locks exceeds the lock table size". This can be avoided by manually setting the service_id value to compare.

...
  AND w1.service_id = 1
  AND w2.service_id = 1
...

re1 added the enhancement New feature or request label Apr 18, 2020

re1 self-assigned this Apr 18, 2020

re1 mentioned this issue Apr 18, 2020

[names] Match OpenUp! functionality #1

Open

30 tasks

re1 pinned this issue May 3, 2020

re1 mentioned this issue May 27, 2020

[names] Remove PHP serialization and hashing for Web service cache #18

Open

re1 mentioned this issue Jul 20, 2020

[names] Incremental Web service caching and timeouts #21

Closed

re1 removed their assignment Mar 3, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[names] Use OpenUp! database for caching #15

[names] Use OpenUp! database for caching #15

re1 commented Apr 18, 2020 •

edited

Loading

re1 commented May 19, 2020 •

edited

Loading

re1 commented May 27, 2020

re1 commented May 28, 2020

re1 commented Jun 2, 2020 •

edited

Loading

re1 commented Jun 2, 2020 •

edited

Loading

re1 commented Jul 20, 2020

re1 commented Sep 23, 2020

re1 commented Oct 28, 2020 •

edited

Loading

re1 commented Oct 29, 2020 •

edited

Loading

[names] Use OpenUp! database for caching #15

[names] Use OpenUp! database for caching #15

Comments

re1 commented Apr 18, 2020 • edited Loading

Caching process

Implementation

re1 commented May 19, 2020 • edited Loading

re1 commented May 27, 2020

re1 commented May 28, 2020

re1 commented Jun 2, 2020 • edited Loading

re1 commented Jun 2, 2020 • edited Loading

re1 commented Jul 20, 2020

re1 commented Sep 23, 2020

re1 commented Oct 28, 2020 • edited Loading

re1 commented Oct 29, 2020 • edited Loading

re1 commented Apr 18, 2020 •

edited

Loading

re1 commented May 19, 2020 •

edited

Loading

re1 commented Jun 2, 2020 •

edited

Loading

re1 commented Jun 2, 2020 •

edited

Loading

re1 commented Oct 28, 2020 •

edited

Loading

re1 commented Oct 29, 2020 •

edited

Loading