Releases · opensanctions/yente

09 Oct 13:16

pudo

v3.7.3

404b4b3

v3.7.3

Improvements to matching of company names
Disable phonetic matching on names that do not use a Western-style alphabet
Fix a race condition in the indexer which can delete the active index

Full Changelog: v3.7.2...v3.7.3


05 Oct 10:10

pudo

v3.7.2

3816157

v3.7.2

This release focuses on improving the scoring quality of the matcher system. Four areas in particular have seen work:

Improvements to the candidate generation system which finds possible matches using ElasticSearch. Candidate generation is the step before result scoring: it pre-selects possible matches from the OpenSanctions database. It has been re-worked to assign higher scores to literal name matches, and to weight the individual terms in a company or person name in more detail (in particular, considering company type information less strongly).
We've made the logic-v1 matching implementations for Jaro-Winkler and Metaphone more precise in their ratings, meaning they score higher for close matches but also decrease in score for invalid candidates.
We've introduced a method to assign custom weights to the features in the logic-v1 algorithm, allowing API users to fine-tune the scoring system to their needs. More information: https://www.opensanctions.org/docs/api/scoring/#tuning
We've re-introduced the Jaro-Winkler and Soundex implementations from yente 3.6.1 and frozen those in place, providing stability to any adopters.
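
As a rough illustration of the custom weights mentioned above, the following sketch builds a request body for the /match API. The `weights` key, the feature names and the shape of the `queries` object are assumptions made for this example and are not confirmed by these release notes; the tuning documentation linked above is the authority on the actual names.

```python
import json
from typing import Dict


def build_match_payload(name: str, weights: Dict[str, float]) -> dict:
    """Build a /match request body with custom feature weights.

    Assumption: the body carries a top-level "weights" mapping next to
    "queries". Check the scoring docs for the real field and feature names.
    """
    return {
        "weights": weights,
        "queries": {
            # "q1" is an arbitrary, caller-chosen query identifier.
            "q1": {"schema": "Company", "properties": {"name": [name]}},
        },
    }


# Hypothetical feature names, used only to show the structure.
payload = build_match_payload(
    "Acme Holdings Ltd",
    {"name_literal_match": 1.0, "person_name_phonetic_match": 0.0},
)
print(json.dumps(payload, indent=2))
```

Keeping the weights in one mapping makes it easy to version them alongside the rest of a screening configuration.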

What's Changed

Add schema facet and option to specify which facets are included in the response by @jbothma in #332
Bump jellyfish from 1.0.0 to 1.0.1 by @dependabot in #333
Bump elasticsearch[async] from 8.9.0 to 8.10.0 by @dependabot in #334
Bump fastapi from 0.103.1 to 0.103.2 by @dependabot in #336

New Contributors

@jbothma made their first contribution in #332

Full Changelog: v3.7.0...v3.7.2

Contributors

jbothma and dependabot


18 Sep 08:50

pudo

v3.7.0

9a72c47

v3.7.0

Introduces an improved scoring system for the /match API, see: https://www.opensanctions.org/articles/2023-09-18-scoring-rules/
Limits phonetic name searches to searches targeted at people
Fixes a bug regarding the preview popups shown in OpenRefine

What's Changed

Bump followthemoney from 3.5.3 to 3.5.4 by @dependabot in #331

Full Changelog: v3.6.2...v3.7.0

Contributors

dependabot


13 Sep 12:27

pudo

v3.6.2

9f8bf07

v3.6.2

This is mainly a maintenance release that updates software components. It introduces two new features:

The changed_since query parameter on both the /match and /search endpoints constrains results to only entities which have changed since the given ISO timestamp.
The API now has CORS access enabled, which is used by the OpenRefine reconciliation API.
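
A minimal sketch of how a client might use changed_since for incremental scans. The base URL, the `/search/{dataset}` path layout and the `q` parameter are assumptions for this example; only the changed_since parameter name comes from the notes above.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumption: a locally running instance; adjust to your deployment.
BASE_URL = "http://localhost:8000"


def search_url(query: str, changed_since: datetime, dataset: str = "default") -> str:
    """Build a /search URL limited to entities changed since a timestamp."""
    params = {
        "q": query,
        # ISO 8601 timestamp, e.g. 2023-09-01T00:00:00
        "changed_since": changed_since.isoformat(timespec="seconds"),
    }
    return f"{BASE_URL}/search/{dataset}?{urlencode(params)}"


print(search_url("acme", datetime(2023, 9, 1)))
```

A client that stores the time of its previous scan can pass that value on the next run and only re-screen against entities that changed in between.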

What's Changed

Bump aiofiles from 23.1.0 to 23.2.1 by @dependabot in #309
Bump fingerprints from 1.1.0 to 1.1.1 by @dependabot in #313
Bump fastapi from 0.101.0 to 0.101.1 by @dependabot in #314
Bump orjson from 3.9.4 to 3.9.5 by @dependabot in #316
Update types-aiofiles requirement from <23.2,>=23.1.0.4 to >=23.1.0.4,<23.3 by @dependabot in #317
support since parameter for incremental scans by @everplays in #315
Bump fastapi from 0.101.1 to 0.103.0 by @dependabot in #319
Bump fastapi from 0.103.0 to 0.103.1 by @dependabot in #321
Bump actions/checkout from 3 to 4 by @dependabot in #322
Bump orjson from 3.9.5 to 3.9.7 by @dependabot in #324
Bump followthemoney from 3.5.2 to 3.5.3 by @dependabot in #325
Bump docker/setup-qemu-action from 2 to 3 by @dependabot in #326
Bump docker/metadata-action from 4 to 5 by @dependabot in #327
Bump docker/login-action from 2 to 3 by @dependabot in #328
Bump docker/setup-buildx-action from 2 to 3 by @dependabot in #329
Bump docker/build-push-action from 4 to 5 by @dependabot in #330

New Contributors

@everplays made their first contribution in #315

Full Changelog: v3.6.1...v3.6.2

Contributors

everplays and dependabot


08 Aug 14:04

pudo

v3.6.1

4b81ba0

v3.6.1

This version includes a lot of small changes based on customer feedback. In particular:

Introduce an exclude_dataset query parameter to /match and /search to remove a single dataset from results.
Make the maximum result count of /match configurable via the server variable YENTE_MAX_MATCHES.
The index freshness check now tests if the new index has the given alias assigned, not just if it exists. This should handle partial indexing more gracefully.
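
The difference between "the index exists" and "the index has the alias assigned" can be sketched as follows. This is not yente's actual code: the function name and the index and alias names are hypothetical, and the input is assumed to have the shape of an ElasticSearch get-alias response (`{index: {"aliases": {alias: {}}}}`).

```python
from typing import Any, Dict


def index_is_live(alias_response: Dict[str, Any], index: str, alias: str) -> bool:
    """Return True only if `index` exists AND has `alias` assigned to it."""
    entry = alias_response.get(index)
    if entry is None:
        # The index does not exist at all.
        return False
    # A partially built index exists but has not been given the alias yet.
    return alias in entry.get("aliases", {})


# Hypothetical state: the newer index was created but indexing never finished.
response = {
    "entities-20230801": {"aliases": {"entities": {}}},
    "entities-20230808": {"aliases": {}},
}
print(index_is_live(response, "entities-20230801", "entities"))
print(index_is_live(response, "entities-20230808", "entities"))
```

Under an existence-only check, the half-built index would be treated as up to date and never re-indexed; checking the alias avoids that.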

What's Changed

Bump elasticsearch[async] from 8.8.2 to 8.9.0 by @dependabot in #302
Bump uvicorn[standard] from 0.23.1 to 0.23.2 by @dependabot in #305
Bump fastapi from 0.100.0 to 0.100.1 by @dependabot in #303
Bump countrynames from 1.15.1 to 1.15.2 by @dependabot in #304
Bump fastapi from 0.100.1 to 0.101.0 by @dependabot in #307
Bump orjson from 3.9.2 to 3.9.3 by @dependabot in #306
Bump orjson from 3.9.3 to 3.9.4 by @dependabot in #308

Full Changelog: v3.6.0...v3.6.1

Contributors

dependabot


24 Jul 09:14

pudo

v3.6.0

168722c

v3.6.0

This release includes improved metadata handling for datasets, introduces some new entity types in the followthemoney data model and allows for less performance-heavy matching queries using the fuzzy flag. In detail:

We've introduced several new entity types in the followthemoney data model which will be used to provide more detailed information regarding politically exposed persons. We advise all users to update the API now so that the new entity types will be reflected correctly.
Using the /match API on a very large dataset can cause heavy load on the ElasticSearch index because of the Levenshtein-based fuzzy matching it uses. In this version, we've introduced a fuzzy= query parameter, which lets users disable that functionality. Please note that this doesn't affect the scores generated by the API, but it may lead to lower recall on very specific queries.
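
For readers unfamiliar with the term, Levenshtein distance counts the single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is a plain dynamic-programming version for illustration only; it is not how ElasticSearch implements fuzzy queries internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, computed row by row."""
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute, or keep if equal
                )
            )
        previous = current
    return previous[-1]


print(levenshtein("Vladimir", "Wladimir"))  # 1
print(levenshtein("kitten", "sitting"))     # 3
```

Tolerating even one or two edits per term greatly widens the set of index terms a query has to consider, which is why disabling it reduces load at the cost of missing some misspelled candidates.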

What's Changed

Pydantic 2 by @pudo in #287
Bump elasticsearch[async] from 8.8.0 to 8.8.2 by @dependabot in #280
Bump uvicorn[standard] from 0.23.0 to 0.23.1 by @dependabot in #293

Full Changelog: v3.5.0...v3.6.0

Contributors

pudo and dependabot


04 Jul 10:25

pudo

v3.5.0

ed3faaa

v3.5.0

This is a simple maintenance release that should improve performance and memory consumption of the application.

What's Changed

Bump orjson from 3.9.0 to 3.9.1 by @dependabot in #269
Bump fastapi from 0.96.0 to 0.97.0 by @dependabot in #270
Bump fastapi from 0.97.0 to 0.98.0 by @dependabot in #273
Bump types-aiofiles from 23.1.0.3 to 23.1.0.4 by @dependabot in #268
Bump fastapi from 0.98.0 to 0.99.1 by @dependabot in #278
Bump nomenklatura from 3.0.3 to 3.1.0 by @dependabot in #279

Full Changelog: v3.4.1...v3.5.0

Contributors

dependabot


05 Jun 11:46

pudo

v3.4.1

ea2d93f

v3.4.1

This release aims to improve error handling and to avoid some situations in which slow blocking operations stall the async parts of the application.

The default manifest file included in the container now indexes the default collection instead of all.

What's Changed

Bump orjson from 3.8.11 to 3.8.12 by @dependabot in #253
Bump countrynames from 1.14.3 to 1.15.0 by @dependabot in #254
Bump fastapi from 0.95.1 to 0.95.2 by @dependabot in #255
Bump types-aiofiles from 23.1.0.2 to 23.1.0.3 by @dependabot in #257
Bump followthemoney from 3.3.0 to 3.4.0 by @dependabot in #258
Bump ubuntu from 23.04 to 23.10 by @dependabot in #256
Bump orjson from 3.8.12 to 3.9.0 by @dependabot in #264
Bump asyncstdlib from 3.10.7 to 3.10.8 by @dependabot in #265
Bump fastapi from 0.95.2 to 0.96.0 by @dependabot in #266
Bump elasticsearch[async] from 8.7.0 to 8.8.0 by @dependabot in #260
Bump nomenklatura from 2.11.0 to 2.14.0 by @dependabot in #267

Full Changelog: v3.4.0...v3.4.1

Contributors

dependabot


08 May 10:11

pudo

v3.4.0

cd2849b

v3.4.0

This release completely re-works the way in which the OpenSanctions API will score matches in the /match API.

Until now, the API has used a simple statistical model to assign a match quality score to each result it has returned. With the new release of yente 3.4, we've made that mechanism more flexible: clients can now select one of a set of supported algorithms to optimise the behaviour of the API for their use case.

With the new release, we've added three new scoring systems to augment the existing model (now called regression-v1):

regression-v2 is a new statistical model for matching people and companies. Unlike regression-v1, it uses pronunciation-based (phonetic/Soundex) comparison for entity names, and it reduces the impact of birth dates as a decision criterion. The new model will generally produce much lower scores for results, so you may want to reduce your matching threshold parameter in the API to 0.5 or 0.6.
name-based is a simple scoring mechanism based on name similarity only. It uses two criteria, the Jaro-Winkler string distance mechanism and the Soundex phonetic algorithm. This can be a useful tool to conduct matching on data where you only have entity names, and no other details such as birth dates, nationalities, etc.
name-qualified uses the score from the name-based mechanism but then considers other criteria, such as birth dates, nationalities, tax and registration identifiers. If any of these mismatch between the query and the result, the score is lowered. This attempts to anticipate a simple review process that a human analyst might otherwise undertake when a result is found.
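
The idea behind name-qualified can be sketched in a few lines: start from a name similarity score and lower it for every qualifying field on which the query and the result disagree. This is an illustration of the principle only. The penalty value, the field names and the function name are hypothetical and do not reflect the values used by yente.

```python
from typing import Dict, Optional

# Hypothetical penalty per mismatching field; the real values are not
# stated in these release notes.
MISMATCH_PENALTY = 0.2

# Hypothetical field names standing in for birth dates, nationalities,
# tax and registration identifiers.
QUALIFIERS = ("birth_date", "nationality", "tax_number", "registration_number")


def qualify_score(
    name_score: float,
    query: Dict[str, Optional[str]],
    result: Dict[str, Optional[str]],
) -> float:
    """Lower a name-based score for each qualifier that conflicts."""
    score = name_score
    for field in QUALIFIERS:
        query_value = query.get(field)
        result_value = result.get(field)
        # Only penalise when both sides have a value and they disagree;
        # missing data is not treated as a mismatch.
        if query_value is not None and result_value is not None:
            if query_value != result_value:
                score -= MISMATCH_PENALTY
    return max(score, 0.0)


query = {"birth_date": "1980-01-01", "nationality": "de"}
result = {"birth_date": "1975-05-05", "nationality": "de"}
print(round(qualify_score(0.9, query, result), 2))  # 0.7
```

Treating missing values as neutral mirrors what an analyst would do: a conflicting birth date is evidence against a match, while an absent one is not.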

What's Changed

Bump asyncstdlib from 3.10.6 to 3.10.7 by @dependabot in #250
Bump types-aiofiles from 23.1.0.1 to 23.1.0.2 by @dependabot in #249
Bump orjson from 3.8.10 to 3.8.11 by @dependabot in #246
Bump uvicorn[standard] from 0.21.1 to 0.22.0 by @dependabot in #247
Multiple scoring algorithms by @pudo in #251
Stable patches by @pudo in #248

Full Changelog: v3.3.1...v3.4.0

Contributors

pudo and dependabot


25 Apr 13:27

pudo

v3.3.1

2c715bb

v3.3.1

Updates nomenklatura to a new version that is fully statement-based.

What's Changed

Bump aiocsv from 1.2.3 to 1.2.4 by @dependabot in #238
Bump orjson from 3.8.8 to 3.8.10 by @dependabot in #237
Bump elasticsearch[async] from 8.6.2 to 8.7.0 by @dependabot in #235
Bump types-aiofiles from 23.1.0.0 to 23.1.0.1 by @dependabot in #232
Bump structlog from 22.3.0 to 23.1.0 by @dependabot in #236

Full Changelog: v3.3.0...v3.3.1

Contributors

dependabot
