Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[names] Use OpenUp! database for caching #15

Open
5 tasks done
re1 opened this issue Apr 18, 2020 · 9 comments
Open
5 tasks done

[names] Use OpenUp! database for caching #15

re1 opened this issue Apr 18, 2020 · 9 comments
Labels
enhancement New feature or request

Comments

@re1
Copy link
Owner

re1 commented Apr 18, 2020

The caching database is required to work independently from the JACQ services and has to be used remotely. A working database for the OpenUp! project exists and will also be used by the main JACQ services.

The existing OpenUp! database has to be fully implemented to Match OpenUp! functionality.
In order to prevent data corruption by new service features especially sources without endpoint are to be preserved!

Caching process

Queries are currently cached in the OpenUp! project as described in Confluence (Common Names Webservice Caching System) and are to be implemented the same way from the JACQ Common Names Webservice.

ℹ️ Queries are identified by their SHA1 hash.

Implementation

The implementation process can roughly be split into 5 steps for reference:

  1. Define interfaces for all existing tables and prepare for remote caching
  2. Check incoming requests against the remote cache and compare timestamps
  3. Retrieve data from source caches where no source endpoint is available
  4. Write uncached and outdated queries to source caches
  5. Compare local web service cache against remote web service cache and decide on the final implementation.

Step 5 largely depends on the development process as further insight might change the scope of this issue.

@re1 re1 added the enhancement New feature or request label Apr 18, 2020
@re1 re1 self-assigned this Apr 18, 2020
@re1 re1 pinned this issue May 3, 2020
@re1
Copy link
Owner Author

re1 commented May 19, 2020

The following chart shows the currently documented caching system using the OpenUp! database.

Source: JACQ Common Names Webservice Caching System

@re1
Copy link
Owner Author

re1 commented May 27, 2020

In this following chart additional information was added to better explain the process of caching and using web service results.

webservice-caching

The web service cache might already include responses from sources no longer available which are still to be included in the results.

@re1
Copy link
Owner Author

re1 commented Jun 2, 2020

The following services are to be added to the OpenUp! Web service cache after they are implementing WebServiceCache (ids can change!):

id url
10 http://www.dnp.go.th
11 http://ylist.info/
INSERT INTO tbl_service
VALUES (10, 'http://www.dnp.go.th'),
       (11, 'http://ylist.info/');

@re1
Copy link
Owner Author

re1 commented Jun 2, 2020

WBÖ / DBÖ source (5) uses a similar response format as the JACQ Legacy source and in the OpenUp! implementation it also uses the JACQ Legacy (NHMW) service interface to fetch the original web service response.

The following example uses the query hash 40ce6cd2e9e0bf2aee00bd81c569b83ee4509749 (likely some form of Homogyne discolor Cass.).

s:856:"[
  {
    "taxon_id": "2892",
    "geography": "aut",
    "scope": 100,
    "dboe_id": "beleg:127439",
    "name": "Rahmblödschal ",
    "language": "bars",
    "match": true,
    "reference": "http:\/\/dboema.acdh.oeaw.ac.at\/dboe\/beleg\/127439",
    "type": ["\/name\/common"]
  }, { ... }, { ... }
]

The JACQ CNS will only use cached results for now unless stated otherwise.
So far no working queries were found to test this source.

@re1
Copy link
Owner Author

re1 commented Jul 20, 2020

The caching strategy has been updated with Web Service response caching functionality including incremental caching as described in [names] Incremental Web service caching and timeouts (#21).

The following chart shows how common names are collected from both cached and uncached static and Web Service sources.

webservice-caching

@re1
Copy link
Owner Author

re1 commented Sep 23, 2020

The existing OpenUp! Database has Web Service responses cached in serialized PHP format. Most already cached responses are therefore hard to read and require a considerable amount of extra work. It might be a good idea to reevaluate the serialization or format of responses.

@re1
Copy link
Owner Author

re1 commented Oct 28, 2020

The OpenUp database table tbl_webservice_cache contains many values not relevant to the application. Such values can be removed to improve search times. The following values are to be removed:

  • N;
  • b:0;
  • s:0:"";
  • s:3:"[] ";
  • a:1:{i:0;O:8:"stdClass":0:{}}
  • a:2:{s:5:"error";s:0:"";s:6:"result";a:0:{}}
  • O:8:"stdClass":1:{s:13:"ArtssokResult";O:8:"stdClass":0:{}}
  • a:2:{s:5:"error";s:22:"no database connection";s:6:"result";a:0:{}}
  • O:8:"stdClass":1:{s:29:"GetTaxaBySearchCriteriaResult";O:8:"stdClass":0:{}}
  • a:1:{i:0;O:8:"stdClass":3:{s:10:"vernacular";s:14:"Chinese Chives";s:13:"language_code";s:2:"eu";s:8:"language";s:21:"English-United States";}}
  • a:2:{i:0;O:8:"stdClass":3:{s:10:"vernacular";s:13:"gerani cineri";s:13:"language_code";N;s:8:"language";s:7:"Catalan";}i:1;O:8:"stdClass":3:{s:10:"vernacular";s:11:"silkesnäva";s:13:"language_code";s:2:"sv";s:8:"language";s:7:"Swedish";}}
  • a:2:{s:5:"error";s:0:"";s:6:"result";a:1:{i:0;a:7:{s:10:"searchtext";s:2:"Os";s:19:"searchtextNearmatch";s:0:"";s:11:"rowsChecked";i:34709;s:4:"type";s:3:"uni";s:8:"database";s:5:"freud";s:18:"includeCommonNames";b:1;s:12:"searchresult";a:0:{}}}}

In addition there are about 500 queries with unusable responses including the query text with the format like so:

  • a:2:{s:5:"error";s:0:"";s:6:"result";a:1:{i:0;a:7:{s:10:"searchtext";s:2:"Os";s:19:"searchtextNearmatch";s:0:"";s:11:"rowsChecked";i:35410;s:4:"type";s:3:"uni";s:8:"database";s:5:"freud";s:18:"includeCommonNames";b:1;s:12:"searchresult";a:0:{}}}}

Those values where found using the following query and checking it's result:

SELECT COUNT(response) AS c, response
FROM tbl_webservice_cache
GROUP BY response
HAVING c > 1
ORDER BY LENGTH(response);

As deleting by comparing to the query takes considerably long, therefore values for deletion are added manually:

DELETE
FROM tbl_webservice_cache
WHERE response IN ('N;', 'b:0;', 's:0:"";', 's:3:"[]
";', 'a:1:{i:0;O:8:"stdClass":0:{}}',
                   'a:2:{s:5:"error";s:0:"";s:6:"result";a:0:{}}',
                   'O:8:"stdClass":1:{s:13:"ArtssokResult";O:8:"stdClass":0:{}}',
                   'a:2:{s:5:"error";s:22:"no database connection";s:6:"result";a:0:{}}',
                   'O:8:"stdClass":1:{s:29:"GetTaxaBySearchCriteriaResult";O:8:"stdClass":0:{}}',
                   'a:1:{i:0;O:8:"stdClass":3:{s:10:"vernacular";s:14:"Chinese Chives";s:13:"language_code";s:2:"eu";s:8:"language";s:21:"English-United States";}}',
                   'a:2:{i:0;O:8:"stdClass":3:{s:10:"vernacular";s:13:"gerani cineri";s:13:"language_code";N;s:8:"language";s:7:"Catalan";}i:1;O:8:"stdClass":3:{s:10:"vernacular";s:11:"silkesnäva";s:13:"language_code";s:2:"sv";s:8:"language";s:7:"Swedish";}}',
                   'a:2:{s:5:"error";s:0:"";s:6:"result";a:1:{i:0;a:7:{s:10:"searchtext";s:2:"Os";s:19:"searchtextNearmatch";s:0:"";s:11:"rowsChecked";i:34709;s:4:"type";s:3:"uni";s:8:"database";s:5:"freud";s:18:"includeCommonNames";b:1;s:12:"searchresult";a:0:{}}}}')
   OR response LIKE
      'a:2:{s:5:"error";s:0:"";s:6:"result";a:1:{i:0;a:7:{s:10:"searchtext";s:%:"%";s:19:"searchtextNearmatch";s:0:"";s:11:"rowsChecked";i:%;s:4:"type";s:3:"uni";s:8:"database";s:5:"freud";s:18:"includeCommonNames";b:1;s:12:"searchresult";a:0:{}}}}';

@re1
Copy link
Owner Author

re1 commented Oct 29, 2020

Many rows of the tbl_webservice_cache table have the same column values and can be removed. A simple SQL script for deduplication would be

DELETE w1
FROM tbl_webservice_cache w1
         INNER JOIN tbl_webservice_cache w2
WHERE w1.id < w2.id
  AND w1.service_id = w2.service_id
  AND w1.query = w2.query
  AND w1.response = w2.response;

As many rows are locked in the process there might be a an error stating "The total number of locks exceeds the lock table size". This can be avoided by manually setting the service_id value to compare.

...
  AND w1.service_id = 1
  AND w2.service_id = 1
...

@re1 re1 removed their assignment Mar 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant