v3.4.0

@pudo pudo released this 08 May 10:11
· 526 commits to main since this release

This release completely reworks how the OpenSanctions API scores the results returned by the /match API.

Until now, the API has used a simple statistical model to assign a match quality score to each result it returns. With yente 3.4, we've made that mechanism more flexible: clients can now select one of a set of supported algorithms to optimise the behaviour of the API for their use case.
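As a rough illustration of selecting an algorithm per request, the sketch below builds (but does not send) a /match request. The `algorithm` and `threshold` query parameter names, the `/match/{dataset}` path, the `default` dataset name, the local base URL, and the shape of the JSON body are assumptions made for this example and are not stated in this release note; check the API documentation of your yente instance before relying on them.

```python
import json
from urllib.parse import urlencode


def build_match_request(base_url, dataset, algorithm, threshold, queries):
    """Return (url, body) for a hypothetical POST to the /match API.

    Assumption: the algorithm and threshold are passed as query
    parameters named `algorithm` and `threshold`.
    """
    params = urlencode({"algorithm": algorithm, "threshold": threshold})
    url = f"{base_url.rstrip('/')}/match/{dataset}?{params}"
    # Assumption: queries are sent as a JSON object keyed by a client-chosen id.
    body = json.dumps({"queries": queries})
    return url, body


url, body = build_match_request(
    "http://localhost:8000",  # hypothetical local yente instance
    "default",                # hypothetical dataset name
    "name-based",             # one of the algorithm names from this release
    0.7,
    {"q1": {"schema": "Person", "properties": {"name": ["Jane Doe"]}}},
)
```

Keeping the algorithm a per-request argument makes it easy to run the same query against two algorithms and compare the returned scores before settling on a threshold.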

This release adds three new scoring systems alongside the existing model (now called regression-v1):

  • regression-v2 is a new statistical model for matching people and companies. Unlike regression-v1, it uses pronunciation-based (phonetic/Soundex) comparison for entity names, and it gives less weight to birth dates as a decision criterion. The new model generally produces much lower scores for results, so you may want to reduce your matching threshold parameter in the API to 0.5 or 0.6.

  • name-based is a simple scoring mechanism based on name similarity only. It uses two criteria: the Jaro-Winkler string distance and the Soundex phonetic algorithm. This is useful for matching data where you only have entity names and no other details such as birth dates or nationalities.

  • name-qualified uses the score from the name-based mechanism but then considers other criteria, such as birth dates, nationalities, and tax and registration identifiers. If any of these mismatch between the query and the result, the score is lowered. This attempts to anticipate the simple review that a human analyst might otherwise undertake when a result is found.
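To make the name-based and name-qualified ideas above concrete, here is a minimal, self-contained sketch. It is not yente's implementation: the equal weighting of Jaro-Winkler and Soundex, the fixed 0.1 penalty per mismatch, and the field names (`birth_date`, `nationality`, `tax_id`, `registration_id`) are all assumptions chosen for illustration.

```python
def jaro(a: str, b: str) -> float:
    """Jaro similarity between two strings (0.0 to 1.0)."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, scaling: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return base + prefix * scaling * (1.0 - base)


_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
_CODES = {c: digit for digit, letters in _GROUPS.items() for c in letters}


def soundex(name: str) -> str:
    """American Soundex code; non-letters (including spaces) are ignored."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            out += code
        if c not in "HW":  # H and W do not separate equal codes
            prev = code
    return (out + "000")[:4]


def name_based(query: str, result: str) -> float:
    """Assumed combination: equal weight on Jaro-Winkler and Soundex equality."""
    q, r = query.lower(), result.lower()
    phonetic = 1.0 if soundex(q) == soundex(r) else 0.0
    return 0.5 * jaro_winkler(q, r) + 0.5 * phonetic


# Hypothetical field names for the qualifying criteria.
QUALIFIERS = ("birth_date", "nationality", "tax_id", "registration_id")


def name_qualified(query: dict, result: dict, penalty: float = 0.1) -> float:
    """Start from the name score and lower it for each conflicting field."""
    score = name_based(query["name"], result["name"])
    for key in QUALIFIERS:
        q, r = query.get(key), result.get(key)
        if q and r and q != r:  # only penalise when both sides have a value
            score -= penalty
    return max(score, 0.0)
```

In this sketch a missing field never lowers the score; only an explicit conflict does, which mirrors the "if any of these mismatch" wording above.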

What's Changed

Full Changelog: v3.3.1...v3.4.0