[names] Match OpenUp! functionality #1

Open
24 of 30 tasks
re1 opened this issue Aug 15, 2019 · 11 comments
Labels
enhancement New feature or request

Comments

@re1
Owner

re1 commented Aug 15, 2019

@re1 re1 added the enhancement New feature or request label Aug 15, 2019
@re1 re1 self-assigned this Aug 15, 2019
@re1
Owner Author

re1 commented Aug 20, 2019

The /commonNames/ route from OpenUp! is currently mapped to /names/common/. The documentation suggests using /commonNames/, while the type argument is always set to /name/common/. The OpenRefine Reconciliation Service API describes the type argument as:

specifying the types of result e.g., person, product, ... The actual format of each type depends on the service (e.g., "Q515" as a Wikidata type)

For consistency, the route will be set to /commonNames/ unless there is a reason to do otherwise. One valid reason might be a shared path for the names service under /names/, leading to /names/common/ for common names and /names/scientific/ for scientific names.
Another option is to implement both paths by using a regular expression in the @Path annotation (https://stackoverflow.com/a/17002237/7826291).
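
A minimal sketch of the regular-expression option, written with plain java.util.regex so it runs without a JAX-RS container. The class name CommonNamesRoute and the exact pattern are assumptions for illustration. The JAX-RS template in the comment follows the `{name: regex}` syntax from the linked answer and has not been tested against this service.

```java
import java.util.regex.Pattern;

class CommonNamesRoute {
    // Assumed JAX-RS equivalent (not verified against this project):
    //   @Path("/{route: commonNames|names/common}")
    // The same alternation is checked here with java.util.regex.
    static final Pattern ROUTE = Pattern.compile("/(commonNames|names/common)/?");

    /** Returns true if the request path should be served by the common names resource. */
    static boolean matches(String path) {
        return ROUTE.matcher(path).matches();
    }

    public static void main(String[] args) {
        String[] paths = {"/commonNames/", "/names/common/", "/names/scientific/"};
        for (String path : paths) {
            System.out.println(path + " -> " + matches(path));
        }
    }
}
```

With a single pattern, both the OpenUp! route and the nested /names/ route reach the same handler, while /names/scientific/ stays free for a separate resource.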

@re1
Owner Author

re1 commented Aug 24, 2019

OpenUp! throws a BadRequest error if the type parameter is not /name/common. This is not helpful to the user but might be closer to the original OpenRefine specification. Feedback may be needed.
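
The two behaviours under discussion can be sketched as follows. This is only an illustration: the class and method names are hypothetical, IllegalArgumentException stands in for whatever the service would translate into a BadRequest response, and the lenient variant is one possible compromise, not something OpenUp! does.

```java
class TypeParameter {
    static final String COMMON_NAME_TYPE = "/name/common";

    /** Strict variant: any value other than /name/common is rejected, as in OpenUp!. */
    static String requireCommonName(String type) {
        if (!COMMON_NAME_TYPE.equals(type)) {
            // Stand-in for a BadRequest response.
            throw new IllegalArgumentException("Unsupported type: " + type);
        }
        return type;
    }

    /** Lenient variant (assumption): a missing or blank type falls back to /name/common. */
    static String orDefault(String type) {
        if (type == null || type.trim().isEmpty()) {
            return COMMON_NAME_TYPE;
        }
        return requireCommonName(type.trim());
    }
}
```

The lenient variant still rejects a type that is present but wrong, so a client that explicitly asks for something else is not silently given common names.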

@re1
Owner Author

re1 commented Aug 30, 2019

Sources might be deprecated or unreachable. Web services are prioritized because they are easier to implement and usually more up to date.

@re1
Owner Author

re1 commented Sep 4, 2019

Found some documentation for Artdatabanken's code and API.

@re1
Owner Author

re1 commented Sep 6, 2019

JACQ internal common names are to be included as a data source. They are currently available from http://131.130.131.9/taxamatch/jsonRPC/json_rpc_taxamatchMdld.php using legacy code. This code will likely be migrated later.

The data source uses JSON-RPC and receives POST requests in the format of

{
  "id": 1,
  "method": "getMatchesService",
  "params": [
    "vienna",
    "Cynodon dactylon",
    {
      "includeCommonNames": true
    }
  ]
}

The result for this particular request looks like this (only one common name is included here):

{
    "id": 1,
    "result": {
        "error": "",
        "result": [
            {
                "searchtext": "Cynodon dactylon",
                "searchtextNearmatch": "",
                "rowsChecked": 33900,
                "type": "multi",
                "database": "freud",
                "includeCommonNames": true,
                "searchresult": [
                    {
                        "genus": "Cynodon",
                        "distance": "0",
                        "ratio": 1,
                        "taxon": "Cynodon Rich. (Poaceae)",
                        "ID": "12747",
                        "taxonID": "21179",
                        "family": "Poaceae",
                        "species": [
                            {
                                "name": "dactylon",
                                "distance": 0,
                                "ratio": 1,
                                "taxon": "Cynodon dactylon (L.) Pers.",
                                "taxonID": "1753",
                                "family": "Poaceae",
                                "syn": "",
                                "synID": 0,
                                "commonNames": [
                                    {
                                        "id": "11127",
                                        "name": "مرغ",
                                        "language": "fas",
                                        "geography": "Islamic Republic of Iran, Iran (independent political entity: country, state, region,...), (, Iran,IR, , 00)",
                                        "period": "recent",
                                        "reference": "Mozaffarian, V. (2007): 1-671; index."
                                    }
                                ]
                            }
                        ]
                    }
                ]
            }
        ]
    },
    "error": null
}
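
A minimal sketch of building the request shown above with the Java 11 HTTP client. The class name TaxamatchRequest, the parameter name source (the thread does not say what "vienna" stands for) and the Content-Type header are assumptions. The request is only constructed here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class TaxamatchRequest {
    static final URI ENDPOINT =
        URI.create("http://131.130.131.9/taxamatch/jsonRPC/json_rpc_taxamatchMdld.php");

    /** Builds the JSON-RPC body by hand; a JSON library would be safer for arbitrary input. */
    static String body(int id, String source, String scientificName) {
        return "{\"id\":" + id
            + ",\"method\":\"getMatchesService\""
            + ",\"params\":[\"" + escape(source) + "\",\"" + escape(scientificName) + "\""
            + ",{\"includeCommonNames\":true}]}";
    }

    // Escapes only backslashes and double quotes, which is enough for plain names.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Creates the POST request; the Content-Type value is an assumption. */
    static HttpRequest build(int id, String source, String scientificName) {
        return HttpRequest.newBuilder(ENDPOINT)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body(id, source, scientificName)))
            .build();
    }
}
```

Sending the request would then be a call such as HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), with the common names read from result.result[].searchresult[].species[].commonNames in the response.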

@re1
Owner Author

re1 commented Sep 20, 2019

Meertens KNAW (Pland) returns its results either as a web page or as a serialized PHP array. Libraries for deserializing PHP in Java are rare and mostly outdated. The most popular seems to be Pherialize, which has not been updated in 6 years.
Alternatively, the parser could be written from scratch.
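
To get a feeling for the effort of writing the parser from scratch, here is a minimal sketch for a subset of PHP's serialize() format (null, boolean, integer, double, string and array). Objects and references are not handled. The class name PhpUnserializer is hypothetical and the sample input is invented, since the actual Pland response is not shown in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class PhpUnserializer {
    private final byte[] data;
    private int pos = 0;

    private PhpUnserializer(String input) {
        // PHP string lengths count bytes, so the parser works on UTF-8 bytes.
        this.data = input.getBytes(StandardCharsets.UTF_8);
    }

    static Object parse(String input) {
        return new PhpUnserializer(input).readValue();
    }

    private Object readValue() {
        char type = (char) data[pos];
        pos += 2; // skip the type letter and the following ':' (or ';' for N)
        switch (type) {
            case 'N': return null;                                // N;
            case 'b': return readUntil(';').equals("1");          // b:1;
            case 'i': return Long.parseLong(readUntil(';'));      // i:42;
            case 'd': return Double.parseDouble(readUntil(';'));  // d:0.5;
            case 's': return readString();                        // s:3:"foo";
            case 'a': return readArray();                         // a:1:{...}
            default:
                throw new IllegalArgumentException("Unsupported type '" + type + "'");
        }
    }

    // Reads ASCII characters up to the delimiter and skips the delimiter itself.
    private String readUntil(char delimiter) {
        int start = pos;
        while (data[pos] != delimiter) {
            pos++;
        }
        String token = new String(data, start, pos - start, StandardCharsets.US_ASCII);
        pos++;
        return token;
    }

    private String readString() {
        int length = Integer.parseInt(readUntil(':'));
        pos++; // opening quote
        String value = new String(data, pos, length, StandardCharsets.UTF_8);
        pos += length + 2; // string bytes, closing quote and semicolon
        return value;
    }

    // PHP arrays are ordered maps; keys are integers or strings.
    private Map<Object, Object> readArray() {
        int count = Integer.parseInt(readUntil(':'));
        pos++; // opening brace
        Map<Object, Object> result = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            Object key = readValue();
            result.put(key, readValue());
        }
        pos++; // closing brace
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("a:2:{i:0;s:3:\"foo\";s:4:\"name\";a:1:{s:4:\"lang\";s:3:\"nld\";}}"));
    }
}
```

Integer keys come back as Long and nested arrays as nested maps. The sketch does no error recovery on malformed input, which a real implementation would need.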

@re1
Owner Author

re1 commented Sep 22, 2019

Both Artsdatabanken and Dyntaxa provide SOAP endpoints, which are hard to use and, it seems, poorly documented as well. Generating code from the WSDL files is the recommended approach but adds a huge amount of complexity to an otherwise simple task.

@re1
Owner Author

re1 commented Apr 18, 2020

Sources without an endpoint are to be served exclusively from the cache, as stated in #15.

@re1 re1 pinned this issue May 3, 2020
@re1
Owner Author

re1 commented May 4, 2020

Dyntaxa endpoint was implemented as follows in OpenUp!: https://github.com/wkollernhm/openup/blob/master/protected/components/Sources/DyntaxaSe.php

@re1
Owner Author

re1 commented May 26, 2020

Many static source tables already used in OpenUp! do not have a unique combination of columns to use for identification. In order to use those tables for JPA mapping an id column should be added:

alter table tbl_source_{table_name}
	add id int not null auto_increment primary key;

The text in braces {table_name} is only a placeholder for the name of the source table without its prefix (tbl_source_).

This change is required for the following OpenUp! tables:

  • tbl_source_azerbaijan
  • tbl_source_czech_jiri_bezo1
  • tbl_source_czech_jiri_roztoci
  • tbl_source_czech_jiri_vacnatci
  • tbl_source_hungarian_peregovits
  • tbl_source_linnaeus_projects
  • tbl_source_ukrainian_kobiv
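
Since the tables have to be altered again after every static source update, the statements for the tables listed above could be generated instead of typed by hand. This is a minimal sketch with a hypothetical class name AddIdColumns. It only prints the SQL and does not connect to the database.

```java
import java.util.List;

class AddIdColumns {
    // Table names from the list above, without the tbl_source_ prefix.
    static final List<String> TABLES = List.of(
        "azerbaijan",
        "czech_jiri_bezo1",
        "czech_jiri_roztoci",
        "czech_jiri_vacnatci",
        "hungarian_peregovits",
        "linnaeus_projects",
        "ukrainian_kobiv");

    /** Fills the {table_name} placeholder of the alter statement. */
    static String alterStatement(String tableName) {
        return "alter table tbl_source_" + tableName
            + " add id int not null auto_increment primary key;";
    }

    public static void main(String[] args) {
        TABLES.forEach(table -> System.out.println(alterStatement(table)));
    }
}
```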

Although these changes should not cause any data loss, it might be a good idea to back up the OpenUp! database before using its sources and caches directly.

mysqldump --user={user} --password={password} --host {host} {database} > {database}.sql

2020-05-27: The remote tables have been altered accordingly. In case of a static source update they will have to be updated again manually.

@re1
Owner Author

re1 commented Jun 2, 2020

Regarding tests, it might be a good idea to compile a list of query parameters for testing specific common name sources. For static sources, they can be looked up in the database. The following scientific names are found in multiple Web services:

Scientific name | Source IDs
Eriophorum      | 1, 2, 3, 8

@re1 re1 removed their assignment Mar 3, 2023