feat: update vaa-matching
Update `vaa-matching` on a variety of counts:

- Remove the types that are defined in `vaa-shared` as of commit
  324fdd8
- Rewrite handling of subdimensions so that they can be used universally
  with all distance metrics. This involves restructuring the `metric`
  functions in a composable format
- Implement a Euclidean distance metric
- Remove the `AbsoluteMaximum` missing value imputation method because
  it does not fit the general mathematical matching model
- Create an example implementation of `CategoricalQuestion` for multiple
  choice questions whose values cannot be ordered
- Change both question implementations to adhere more closely to the
  upcoming `vaa-data` model
- Implement passing of question weights to the matching algorithm
- Abbreviate and rename some types
- Apply the new TS style guide requirements to the code
- Rewrite many of the tests
- Update the README

Also apply the required changes to consumers of `vaa-matching`.
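To picture what a composable metric format with universal subdimension support can look like, here is a minimal TypeScript sketch. It is not the `vaa-matching` code: `flattenPairs`, `makeMetric`, the [0, 1] coordinate range and the rule that a dimension's subdimensions share its weight equally are all assumptions made for illustration. A metric is built from a per-coordinate kernel plus a final transform, so the Manhattan and Euclidean metrics reuse the same subdimension handling.

```typescript
// Hypothetical sketch of composable distance metrics with subdimensions.
// None of these names are the vaa-matching API; coordinates are assumed to lie in [0, 1].

/** A dimension is a single coordinate or an array of subdimension coordinates. */
type Dimension = number | Array<number>;
type Position = Array<Dimension>;

/** Per-coordinate distance function. */
type Kernel = (a: number, b: number) => number;

interface WeightedPair {
  a: number;
  b: number;
  weight: number;
}

/** Pair up coordinates. A dimension with k subdimensions gives each of them the
 *  weight 1 / k, so every top-level dimension has a total weight of 1. */
function flattenPairs(a: Position, b: Position): Array<WeightedPair> {
  if (a.length !== b.length) throw new Error('Positions must have the same number of dimensions');
  const pairs: Array<WeightedPair> = [];
  a.forEach((dimA, i) => {
    const dimB = b[i];
    const subA = Array.isArray(dimA) ? dimA : [dimA];
    const subB = Array.isArray(dimB) ? dimB : [dimB];
    if (subA.length !== subB.length) throw new Error(`Dimension ${i} has mismatched subdimensions`);
    subA.forEach((x, j) => {
      pairs.push({a: x, b: subB[j], weight: 1 / subA.length});
    });
  });
  return pairs;
}

/** Compose a metric from a kernel and a final transform of the weighted mean. */
function makeMetric(kernel: Kernel, finalize: (weightedMean: number) => number) {
  return (a: Position, b: Position): number => {
    const pairs = flattenPairs(a, b);
    const totalWeight = pairs.reduce((sum, p) => sum + p.weight, 0);
    const weightedSum = pairs.reduce((sum, p) => sum + p.weight * kernel(p.a, p.b), 0);
    return finalize(weightedSum / totalWeight);
  };
}

// Manhattan: weighted mean of absolute differences
const manhattan = makeMetric((a, b) => Math.abs(a - b), (m) => m);
// Euclidean: square root of the weighted mean of squared differences
const euclidean = makeMetric((a, b) => (a - b) ** 2, Math.sqrt);
```

With this split, adding a further metric means supplying only a new kernel and finalizer, while the subdimension weighting stays in one place.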
kaljarv committed Sep 5, 2024
1 parent e66167d commit 56dfdf7
Showing 45 changed files with 1,777 additions and 1,217 deletions.
16 changes: 11 additions & 5 deletions frontend/src/lib/utils/matching/LikertQuestion.ts
@@ -1,20 +1,26 @@
import {MultipleChoiceQuestion} from '$voter/vaa-matching';
import type {CoordinateOrMissing, Id} from 'vaa-shared';
import {OrdinalQuestion} from '$voter/vaa-matching';

/**
* A dummy question object for matching.
*/
export class LikertQuestion extends MultipleChoiceQuestion {
export class LikertQuestion extends OrdinalQuestion {
public readonly category: QuestionCategoryProps;
constructor({id, values, category}: LikertQuestionOptions) {
super(id, values);
super({id, values});
this.category = category;
}

normalizeValue(value: number): CoordinateOrMissing {
// The current frontend implementation of questions uses numbers for choice keys
return super.normalizeValue(`${value}`);
}
}
/**
* Options for a dummy question object for matching.
*/
interface LikertQuestionOptions {
id: ConstructorParameters<typeof MultipleChoiceQuestion>[0];
values: ConstructorParameters<typeof MultipleChoiceQuestion>[1];
id: Id;
values: ConstructorParameters<typeof OrdinalQuestion>[0]['values'];
category: QuestionCategoryProps;
}
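`LikertQuestion` now extends `OrdinalQuestion` and stringifies numeric keys before normalizing them. A minimal sketch of how an ordinal question might map choice ids to evenly spaced coordinates follows. `SketchOrdinalQuestion`, `COORD_MIN` and `COORD_MAX` are hypothetical, and the real `OrdinalQuestion` may use a different coordinate range.

```typescript
// Hypothetical sketch, not the actual OrdinalQuestion implementation.
// Assumes coordinates span [COORD_MIN, COORD_MAX] = [0, 1] and missing = undefined.

const COORD_MIN = 0;
const COORD_MAX = 1;

interface Choice {
  id: string;
  value: number;
}

class SketchOrdinalQuestion {
  constructor(
    readonly id: string,
    readonly choices: ReadonlyArray<Choice>
  ) {
    if (choices.length < 2) throw new Error('An ordinal question needs at least two choices');
  }

  /** Build a Likert question with choices 1..scale, ids being the stringified values. */
  static fromLikert(id: string, scale: number): SketchOrdinalQuestion {
    return new SketchOrdinalQuestion(
      id,
      Array.from({length: scale}, (_, i) => ({id: `${i + 1}`, value: i + 1}))
    );
  }

  /** Map a choice id to a coordinate, spacing the choices evenly in their listed order. */
  normalizeValue(value: unknown): number | undefined {
    const index = this.choices.findIndex((c) => c.id === value);
    // Unknown or missing answer
    if (index === -1) return undefined;
    return COORD_MIN + (index / (this.choices.length - 1)) * (COORD_MAX - COORD_MIN);
  }
}
```

Because the ids are strings, passing a raw number such as `3` would not match any choice, which is why the frontend wrapper converts keys with `${value}` first.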
2 changes: 1 addition & 1 deletion frontend/src/lib/utils/matching/imputePartyAnswers.ts
@@ -1,5 +1,5 @@
import {error} from '@sveltejs/kit';
import {MISSING_VALUE} from '$voter/vaa-matching';
import {MISSING_VALUE} from 'vaa-shared';
import {logDebugError} from '$lib/utils/logger';
import {mean} from './mean';
import {median} from './median';
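The imputation helper imports `mean` and `median`. A hypothetical sketch of imputing a party's answer from its candidates' answers follows. The function name `imputePartyAnswer` and the use of `undefined` for a missing answer are assumptions, not the actual `imputePartyAnswers` signature.

```typescript
// Hypothetical sketch, not the actual imputePartyAnswers implementation.
// Assumes a missing answer is represented by undefined.

type MaybeAnswer = number | undefined;

function mean(values: Array<number>): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function median(values: Array<number>): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

/** Impute a party's answer to one question from its candidates' answers. */
function imputePartyAnswer(
  candidateAnswers: Array<MaybeAnswer>,
  method: 'mean' | 'median' = 'median'
): MaybeAnswer {
  const present = candidateAnswers.filter((a): a is number => a !== undefined);
  // No candidate answered: the party's answer stays missing
  if (present.length === 0) return undefined;
  return method === 'mean' ? mean(present) : median(present);
}
```

The median is less sensitive to a single outlying candidate, which is one reason to prefer it as a default in such a sketch.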
17 changes: 11 additions & 6 deletions frontend/src/lib/utils/matching/match.ts
@@ -7,8 +7,8 @@ import {error} from '@sveltejs/kit';
import {
type MatchingOptions,
MatchingAlgorithm,
DistanceMetric,
MissingValueDistanceMethod,
DISTANCE_METRIC,
MISSING_VALUE_METHOD,
type MatchableQuestionGroup
} from '$voter/vaa-matching';
import {LikertQuestion} from './LikertQuestion';
@@ -33,9 +33,9 @@ export async function match<E extends EntityProps>(
): Promise<RankingProps<E>[]> {
// Create the algorithm instance
const algorithm = new MatchingAlgorithm({
distanceMetric: DistanceMetric.Manhattan,
distanceMetric: DISTANCE_METRIC.Manhattan,
missingValueOptions: {
missingValueMethod: MissingValueDistanceMethod.AbsoluteMaximum
method: MISSING_VALUE_METHOD.RelativeMaximum
}
});

@@ -51,7 +51,7 @@ export async function match<E extends EntityProps>(
questions.push(
new LikertQuestion({
id: q.id,
values: q.values.map((o) => ({value: o.key})),
values: q.values.map((o) => ({id: `${o.key}`, value: o.key})),
category: q.category
})
);
@@ -85,7 +85,12 @@ export async function match<E extends EntityProps>(
}

// Perform the matching
return algorithm.match(questions, voter, entities, matchingOptions);
return algorithm.match({
questions,
reference: voter,
targets: entities,
options: matchingOptions
});
}

/**
2 changes: 1 addition & 1 deletion frontend/src/lib/utils/tests/matching.test.ts
@@ -1,7 +1,7 @@
import {expect, test} from 'vitest';
import {imputePartyAnswers, mean, median} from '../matching';
import {MockCandidate, MockParty} from './mock-objects';
import {MISSING_VALUE} from '$voter/vaa-matching';
import {MISSING_VALUE} from 'vaa-shared';

test('Mean and median', () => {
expect(mean([1, 2, 2, 2, 10]), 'Mean').toEqual((1 + 2 + 2 + 2 + 10) / 5);
84 changes: 56 additions & 28 deletions frontend/src/lib/voter/vaa-matching/README.md
@@ -1,13 +1,35 @@
# Matching algorithms (`vaa-matching`)

The `vaa-matching` module provides a generic implementation for matching algorithms used in VAAs.

In order to use the algorithms provided, the inputs must fulfil certain requirements defined by the [`MatchableQuestion`](/shared/src/matching/matchableQuestion.type.ts) and [`HasAnswers`](/shared/src/matching/hasAnswers.type.ts) interfaces.

## Features

- Supports any type of question
- Sample implementations provided for ordinal questions (including Likert questions of any scale) and categorical questions
- For custom question implementations, the only requirement is that answers to them can be represented as positions in uni- or multidimensional space
- Manhattan, directional and Euclidean distance metrics
- Assign arbitrary weights to questions
- Computing matches for subgroups of questions, such as for specific themes
- Projecting positions to a lower-dimensional space, such as a 2D map, using a custom projector
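To illustrate how the Manhattan and directional metrics listed above can disagree, here is a per-question sketch. The directional formula below (the product of both deviations from a neutral midpoint of 0.5, rescaled to [0, 1]) is one common formulation and an assumption here, not necessarily the exact kernel `vaa-matching` uses.

```typescript
// Hypothetical per-question kernels on coordinates in [0, 1] with 0.5 as neutral.

const NEUTRAL = 0.5;

/** Manhattan: plain absolute difference. */
function manhattanDistance(a: number, b: number): number {
  return Math.abs(a - b);
}

/** Directional: agreement is the product of both deviations from neutral,
 *  which lies in [-0.25, 0.25], rescaled so that the distance lies in [0, 1]. */
function directionalDistance(a: number, b: number): number {
  const agreement = (a - NEUTRAL) * (b - NEUTRAL);
  return 0.5 - 2 * agreement;
}
```

Two identical but mild answers (0.75 each) are a perfect Manhattan match, yet their directional distance is 0.375, because the directional model rewards agreeing strongly on the same side of neutral rather than giving the same answer.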

See also [Future developments](#future-developments)

## Dependencies

`vaa-shared`: Definitions related to matching space distances, matchable questions and entities having answers to these are shared between this and other `vaa` modules.

## Quick start

1. Create question objects that implement [`MatchableQuestion`](./src/question/matchableQuestion.ts)
2. Create candidate objects and a voter passing that implement [`HasMatchableAnswers`](./src/answer/hasMatchableAnswers.ts). With regard to the matching algorithm, there's no difference between the voter and candidates
3. Instantiate a [`MatchingAlgorithm`](./src/algorithms/matchingAlgorithm.ts) object with suitable options
4. Call the algorithm's [`match`](./src/algorithms/matchingAlgorithm.ts) method passing as arguments the questions to match for, the voter and the candidates as well as optional [`MatchingOptions`](./src/algorithms/matchingAlgorithm.type.ts). The algorithm returns an array of [`Match`](./src/match/match.ts) objects for each candidate.
1. You can get submatches for question categories by supplying `questionGroups` in the options. These are objects implementing [`MatchableQuestionGroup`](./src/question/matchableQuestionGroup.ts). The submatches are contained in the [`subMatches`](./src/match/subMatch.ts) array of the [`Match`](./src/match/match.ts) objects.
2. Note that only those questions that the voter has answered will be considered in the matching.
1. Create question objects that implement [`MatchableQuestion`](/shared/src/matching/matchableQuestion.type.ts)
2. Create candidate objects and a voter that implement [`HasAnswers`](/shared/src/matching/hasAnswers.type.ts). With regard to the matching algorithm, there’s no difference between the voter and candidates
3. Instantiate a [`MatchingAlgorithm`](./src/algorithms/matchingAlgorithm.ts) with suitable options
4. Call the algorithm’s [`match`](./src/algorithms/matchingAlgorithm.ts) method, passing it an object containing the questions to match for (`questions`), the voter (`reference`), the candidates (`targets`) and optional [`MatchingOptions`](./src/algorithms/matchingAlgorithm.type.ts) (`options`). The algorithm returns an array of [`Match`](./src/match/match.ts) objects, one for each candidate, ordered by ascending distance
1. The `Match` objects contain a reference to the `entity` they target
2. You can get submatches for question categories by supplying `questionGroups` in the options. These are objects implementing [`MatchableQuestionGroup`](./src/question/matchableQuestionGroup.ts). The submatches are contained in the [`subMatches`](./src/match/subMatch.ts) array of the [`Match`](./src/match/match.ts) objects
3. You can weight the questions by supplying `questionWeights` in the options
4. Only those questions that the voter has answered will be considered in the matching
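The quick-start steps can be condensed into a self-contained sketch. It deliberately does not use the `vaa-matching` classes: `SketchQuestion`, `SketchEntity`, `matchSketch` and its crude missing-value rule are hypothetical stand-ins that mirror the flow described above (keep only questions the reference answered, apply optional question weights, sort by ascending distance).

```typescript
// Hypothetical miniature of the quick-start flow; not the vaa-matching API.
// Coordinates are assumed to lie in [0, 1] and missing answers to normalize to undefined.

interface SketchQuestion {
  id: string;
  normalizeValue(value: unknown): number | undefined;
}

interface SketchEntity {
  name: string;
  answers: Record<string, unknown>;
}

interface SketchMatch {
  entity: SketchEntity;
  distance: number;
}

function matchSketch(
  questions: Array<SketchQuestion>,
  reference: SketchEntity,
  targets: Array<SketchEntity>,
  questionWeights: Partial<Record<string, number>> = {}
): Array<SketchMatch> {
  // Only questions the reference has answered are considered
  const answered = questions.filter((q) => q.normalizeValue(reference.answers[q.id]) !== undefined);
  if (answered.length === 0) throw new Error('The reference has not answered any questions');
  const weightOf = (q: SketchQuestion): number => questionWeights[q.id] ?? 1;
  const totalWeight = answered.reduce((sum, q) => sum + weightOf(q), 0);

  const matches = targets.map((entity) => {
    let weightedSum = 0;
    for (const q of answered) {
      const r = q.normalizeValue(reference.answers[q.id]) as number;
      // Placeholder imputation for this sketch: use the end of the range furthest from r
      const t = q.normalizeValue(entity.answers[q.id]) ?? (r < 0.5 ? 1 : 0);
      weightedSum += weightOf(q) * Math.abs(r - t);
    }
    return {entity, distance: weightedSum / totalWeight};
  });
  // Best matches first
  return matches.sort((a, b) => a.distance - b.distance);
}

// Usage: a 5-point Likert question normalizes the answers 1..5 to 0..1
const likert = (id: string): SketchQuestion => ({
  id,
  normalizeValue: (v) => (typeof v === 'number' && v >= 1 && v <= 5 ? (v - 1) / 4 : undefined)
});
const questions = ['q1', 'q2', 'q3'].map(likert);
const voter: SketchEntity = {name: 'voter', answers: {q1: 5, q2: 1}};
const candidates: Array<SketchEntity> = [
  {name: 'C', answers: {q2: 5}},
  {name: 'A', answers: {q1: 5, q2: 1, q3: 3}},
  {name: 'B', answers: {q1: 1, q2: 1}}
];
const matches = matchSketch(questions, voter, candidates);
```

Here `q3` drops out because the voter left it unanswered, and candidate A, who agrees on both remaining questions, comes first.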

## Trying it out

@@ -23,43 +45,49 @@ To that end, the module is built following these principles:

1. Enable mixing of different kinds of questions
2. Provide flexible methods of dealing with missing values
3. Do not expect question values to fall into a specific numeric range by default
3. Do not expect answers to questions to be inherently numeric
4. Prefer reliability even at the cost of performance
5. Prefer verbosity if it can help avoid confusion

## Paradigm

The general operational paradigm of the algorithm is to treat the voter and the candidate as positions in a multidimensional space. To compute a match we measure the distance between the two in that space and return a value, which is normalized so that it represents the fraction of the maximum possible distance in that space, i.e. 0–100 %.

The algorithm also supports a more complicated method of matching, in which the positions are project to a lower-dimensional space before calculating the distance. This method is used often to place the persons into a two or three-dimensional space with axes such as 'Economical left–right' and 'Value conservative–liberal'. In this kind of projection, different weights are given to the dimensions in source space (that is, the questions) in calculating the positions in the target space. In the basic method, on the contrary, all of the questions have equal weights in calculating the distance.
The algorithm also supports a more complicated method of matching, in which the positions are projected to a lower-dimensional space before calculating the distance. This method is often used to place persons in a two- or three-dimensional space with axes such as ‘Economic left–right’ and ‘Value conservative–liberal’. In this kind of projection, different weights are given to the dimensions in source space (that is, the questions) in calculating the positions in the target space. In the basic method, by contrast, all of the questions have equal weights in calculating the distance unless `questionWeights` are supplied.

In order to do this, you need to supply a [`MatchingSpaceProjector`](./src/algorithms/matchingSpaceProjector.ts) when instantiating the algorithm.
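A projection to a lower-dimensional space can be sketched as a linear map in which each target axis is a weighted average of question coordinates. The `project` and `projectedDistance` helpers and the axis weights are hypothetical; a real `MatchingSpaceProjector` may work differently (for example, by allowing reversed items with negative loadings).

```typescript
// Hypothetical linear projection, not the MatchingSpaceProjector interface.
// Each target axis is a weighted average of the source coordinates (one per question),
// with coordinates assumed to lie in [0, 1] and non-negative weights.

type SourcePosition = Array<number>;
type TargetPosition = Array<number>;

/** axisWeights[k][i] is the weight of question i on target axis k. */
function project(position: SourcePosition, axisWeights: Array<Array<number>>): TargetPosition {
  return axisWeights.map((weights, k) => {
    if (weights.length !== position.length) throw new Error(`Axis ${k}: weight count mismatch`);
    const total = weights.reduce((sum, w) => sum + w, 0);
    if (total <= 0) throw new Error(`Axis ${k}: weights must sum to a positive value`);
    return weights.reduce((sum, w, i) => sum + w * position[i], 0) / total;
  });
}

/** Manhattan distance in the target space, normalized to [0, 1]. */
function projectedDistance(a: TargetPosition, b: TargetPosition): number {
  return a.reduce((sum, x, k) => sum + Math.abs(x - b[k]), 0) / a.length;
}

// Two axes: questions 0 and 1 load on the first, questions 2 and 3 on the second
const axes = [
  [1, 1, 0, 0],
  [0, 0, 1, 1]
];
const voter2d = project([1, 0.5, 0, 0], axes); // [0.75, 0]
const candidate2d = project([0.5, 0.5, 1, 1], axes); // [0.5, 1]
```

The distance between `voter2d` and `candidate2d` is then (0.25 + 1) / 2 = 0.625, computed in the two-dimensional target space rather than across the four questions.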

### Process

The process of computing a match goes roughly as follows. (This is a simplified example that does not use subcategory matching or projection into a lower-dimensional space.)
The process of computing a match goes roughly as follows. (This is a simplified example that does not use subcategory matching, question weighting or projection into a lower-dimensional space.)

1. [`MatchingAlgorithm.match()`](./src/algorithms/matchingAlgorithm.ts) is called and passed an array of [`MatchableQuestion`s](./src/question/matchableQuestion.ts)[^1], a `Voter` and an array of `Candidates`, which we'll refer to as `Entities` below. Both of these must implement the [`HasMatchableAnswers`](./src/answer/hasMatchableAnswers.ts) interface.
2. The `Voter` is queried for the questions they have answered by calling the getter `Voter.answers` and the questions they have not answered are removed from matching. If no questions remain, an error will be thrown.
3. Before using the actual answers, we create the [`MatchingSpace`](./src/space/matchingSpace.ts) in which we position the `Entities`. Usually this is simply a space with `N` dimensions of equal weight, where `N` is the number of questions.[^2]
4. To place them in the space, we iterate over each `Entity` and over each [`MatchableQuestion`](./question/mathcableQuestion.ts).
- The position is represented by an `N`-length array of numbers ([`MatchingSpacePosition`](./src/space/position.ts)).
- We build that get by getting the `Entity`'s answer to each question by querying their `answers` record with the question id.
- The values we have are still of an unknown type, so we pass them to the question's [`normalizeValue()`](./src/question/matchableQuestion.ts) method, which returns either a [`MatchingSpaceCoordinate`](./src/space/matchingSpace.ts) (`number` in range 0–1 or `undefined`).[^3]
5. Now that the `Entities` are positioned in a normalized space, we can finally calculate the matches, that is, the normalized distances of the `Candidates` to the `Voter`.
6. To do that, we iterate over each `Entity` and calculate their distance to the `Voter` in each dimension of the [`MatchingSpace`](./src/space/matchingSpace.ts), sum these up and take their average.[^4]
- Recall that some coordinates of the `Candidates`' positions may be `undefined`.[^5] To calculate distances between these and the `Voter`'s respective coordinates, we need to impute a value in place of the missing one.[^6]
7. Finally, we use the distances to create a [`Match`](./src/match/match.ts) object for each of the `Candidates`, which includes a reference to the `Candidate` and the distance. The [`Matches`](./src/match/match.ts) also contain getters for presenting the distance as a percentage score etc.[^7]
8. The [`Matches`](./src/match/match.ts) are returned and may be passed on to UI components. They are ordered by ascending match distance.
1. [`MatchingAlgorithm.match()`](./src/algorithms/matchingAlgorithm.ts) is called and passed an array of [`MatchableQuestion`](/shared/src/matching/matchableQuestion.type.ts)s[^1], a `reference` (usually the voter) and a `targets` array (usually the candidates or parties), which we’ll refer to as entities below. Both the `reference` and the `targets` must implement the [`HasAnswers`](/shared/src/matching/hasAnswers.type.ts) interface. [Options](./src/algorithms/matchingAlgorithm.type.ts) may also be provided.
2. The `reference` is queried for the questions they have answered by calling the [`answers`](/shared/src/matching/hasAnswers.type.ts) getter. The questions they have not answered are removed from matching. If no questions remain, an error will be thrown.
3. Before using the actual answers, we create the [`MatchingSpace`](./src/space/matchingSpace.ts) in which we position the entities. Usually this is simply a space with `N` dimensions of equal weight, where `N` is the number of questions.[^2]
4. To place them in the space, we iterate over each entity and over each question.
- The position is represented by an `N`-length array of numbers ([`Position`](./src/space/position.ts)) with possible subdimensions.
- We build the position by getting the entity’s answer to each question, querying their [`answers`](/shared/src/matching/hasAnswers.type.ts) record with the question [id](/shared/src/matching/id.type.ts).
- The values we have are still of an unknown type, so we pass them to the question’s [`normalizeValue(value: unknown)`](/shared/src/matching/matchableQuestion.type.ts) method, which returns either a [`CoordinateOrMissing`](/shared/src/matching/distance.type.ts) (`number` or `undefined`) or an array of these.[^3]
5. Now that the entities are positioned in a normalized space, we can finally calculate the matches, that is, the normalized distances of the `targets` to the `reference`.
6. To do that, we iterate over each `target` and calculate their distance to the `reference` in each dimension of the [`MatchingSpace`](./src/space/matchingSpace.ts), sum these up and normalize.[^4]
- Recall that some coordinates of the `targets`’ positions may be `undefined`. To calculate distances between these and the `reference`’s respective coordinates, we need to impute a value in place of the missing one.[^5]
7. Finally, we use the distances to create a [`Match`](./src/match/match.ts) object for each of the `targets`, which includes a reference to the `target` entity and the distance. The [`Match`es](./src/match/match.ts) also contain getters for presenting the distance as, e.g., a percentage score.[^6]
8. The [`Match`es](./src/match/match.ts) are returned and may be passed on to UI components. They are ordered by ascending match distance.
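Step 6 mentions imputing missing target coordinates. Below is a minimal sketch of one plausible reading of the `RelativeMaximum` method: replace a missing coordinate with the end of the range furthest from the reference’s coordinate. The rule, the [0, 1] range and the helper names are assumptions; check `MISSING_VALUE_METHOD` in the source for the exact definition.

```typescript
// Hypothetical sketch of "relative maximum" imputation, not the vaa-matching code.
// A missing target coordinate is replaced by the value furthest from the reference's
// coordinate. Coordinates are assumed to lie in [MIN, MAX] = [0, 1].

const MIN = 0;
const MAX = 1;

function imputeRelativeMaximum(reference: number): number {
  // The furthest point is whichever end of the range the reference is further from
  return reference - MIN >= MAX - reference ? MIN : MAX;
}

function dimensionDistance(reference: number, target: number | undefined): number {
  const t = target ?? imputeRelativeMaximum(reference);
  return Math.abs(reference - t);
}

/** Manhattan distance normalized to [0, 1], imputing missing target coordinates. */
function normalizedManhattan(reference: Array<number>, target: Array<number | undefined>): number {
  if (reference.length !== target.length) throw new Error('Positions must have the same length');
  const sum = reference.reduce((s, r, i) => s + dimensionDistance(r, target[i]), 0);
  return sum / (reference.length * (MAX - MIN));
}
```

This penalizes a missing answer as heavily as possible relative to the reference’s own answer, so an entity cannot improve its match by skipping questions.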

[^1]: The questions have to implement a [`normalizeValue()`](./src/question/matchableQuestion.ts) method, because otherwise we don't know how to compare the answer values, whose type is unspecified, to each other.
## Future developments

[^2]: Some question types, such as ranked preference questions, create multiple subdimensions with a total weight of 1.
1. Distances are now measured using combinations of [`kernels`](./src/distance/metric.ts) and other helper functions. It seems likely that we could simplify all of these to simple distance measurements in a vector space.
2. Provide an example implementation of a ranked preference question, which involves creating `f(n) / (2 * f(n-2))` subdimensions, one for each pairwise preference, where `n` is the number of options and `f(•)` is the factorial.
3. Implement Manhattan-directional hybrid and Mahalanobis distance metrics.
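The subdimension count in item 2 equals the number of unordered option pairs, n(n − 1)/2. The sketch below checks that identity and shows one hypothetical pairwise encoding of a ranking; `rankingToSubdimensions` is illustrative only, and no ranked preference question exists in this commit.

```typescript
// Hypothetical sketch: one subdimension per unordered pair of options.

function factorial(n: number): number {
  let result = 1;
  for (let k = 2; k <= n; k++) result *= k;
  return result;
}

/** f(n) / (2 * f(n - 2)), which simplifies to n * (n - 1) / 2. */
function pairCount(n: number): number {
  if (n < 2) return 0;
  return factorial(n) / (2 * factorial(n - 2));
}

/** `ranking` lists option ids from most to least preferred. For each pair
 *  (options[i], options[j]) with i < j, the coordinate is 1 if options[i]
 *  is ranked higher and 0 otherwise. */
function rankingToSubdimensions(options: Array<string>, ranking: Array<string>): Array<number> {
  const rank = new Map(ranking.map((id, i): [string, number] => [id, i]));
  const coords: Array<number> = [];
  for (let i = 0; i < options.length; i++) {
    for (let j = i + 1; j < options.length; j++) {
      const ri = rank.get(options[i]);
      const rj = rank.get(options[j]);
      if (ri === undefined || rj === undefined) throw new Error('The ranking must include every option');
      coords.push(ri < rj ? 1 : 0);
    }
  }
  return coords;
}
```

For options `a`, `b`, `c` ranked `c > a > b`, the pairs (a, b), (a, c), (b, c) encode as `[1, 0, 0]`.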

[^3]: Actually, [`normalizeValue()`](./src/question/matchableQuestion.ts) can also return an array of numbers if the question is such that creates multiple subdimensions.
[^1]: The questions have to implement a [`normalizeValue()`](/shared/src/matching/matchableQuestion.type.ts) method, because otherwise we don’t know how to compare the answer values, whose type is unspecified, to each other.

[^2]: Some question types, such as ranked preference questions, create multiple subdimensions with a total weight of 1.

[^4]: To be precise, it's the average weighted by the weights of the dimensions, which may differ from 1 in case of questions creating subdimensions.
[^3]: [`normalizeValue()`](/shared/src/matching/matchableQuestion.type.ts) returns an array of numbers if the question is one that creates multiple subdimensions, such as a [`CategoricalQuestion`](./src/question/categoricalQuestion.ts).

[^5]: The `Voter`'s coordinates cannot be `undefined`.
[^4]: To be precise, it’s the average weighted by the weights of the dimensions, which may differ from 1 in case of questions creating subdimensions.

[^6]: There are several methods to this, which are defined in the constructor options to the [`MatchingAlgorithm`](./src/algorithms/matchingAlgorithm.ts).
[^5]: There are several methods to this, which are defined in the constructor options to the [`MatchingAlgorithm`](./src/algorithms/matchingAlgorithm.ts).

[^7]: And in case of subcategory matching, they also contain [`subMatches`](./src/match/subMatch.ts) for each category.
[^6]: And in case of subcategory matching, they also contain [`subMatches`](./src/match/subMatch.ts) for each category.
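A `CategoricalQuestion` normalizes a value into several subdimensions because its choices cannot be ordered. A hypothetical one-hot sketch follows, assuming one subdimension per choice, coordinates in [0, 1], and a missing answer yielding `undefined` in every subdimension; the actual class in this commit may differ.

```typescript
// Hypothetical one-hot sketch, not the CategoricalQuestion in this commit.

class SketchCategoricalQuestion {
  constructor(
    readonly id: string,
    readonly choiceIds: ReadonlyArray<string>
  ) {}

  /** One subdimension per choice: 1 for the chosen option, 0 for the others. */
  normalizeValue(value: unknown): Array<number | undefined> {
    const index = typeof value === 'string' ? this.choiceIds.indexOf(value) : -1;
    // Missing or unknown answers are missing in every subdimension
    if (index === -1) return this.choiceIds.map(() => undefined);
    return this.choiceIds.map((_, i) => (i === index ? 1 : 0));
  }
}
```

Since no choice is "between" two others, every pair of different answers ends up equally far apart, which is the property an unordered question needs.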
29 changes: 14 additions & 15 deletions frontend/src/lib/voter/vaa-matching/examples/example.ts
@@ -1,14 +1,12 @@
import type {AnswerDict, HasAnswers, MatchableQuestion} from 'vaa-shared';
import {
DistanceMetric,
DISTANCE_METRIC,
MatchingAlgorithm,
MissingValueDistanceMethod,
MultipleChoiceQuestion,
type HasMatchableAnswers,
type MatchableQuestion,
MISSING_VALUE_METHOD,
OrdinalQuestion,
type MatchableQuestionGroup,
type MatchingOptions
} from '..';
import type {AnswerDict} from '../src/entity/hasMatchableAnswers';

/**
* Simple example.
@@ -25,7 +23,7 @@ function main(
): void {
// Create dummy questions
const questions = Array.from({length: numQuestions}, (_, i: number) =>
MultipleChoiceQuestion.fromLikert(`q${i}`, likertScale)
OrdinalQuestion.fromLikert({id: `q${i}`, scale: likertScale})
);

// Create answer subgroup
@@ -54,23 +52,24 @@ function main(

// Manhattan matching algorithm
const manhattan = new MatchingAlgorithm({
distanceMetric: DistanceMetric.Manhattan,
distanceMetric: DISTANCE_METRIC.Manhattan,
missingValueOptions: {
missingValueMethod: MissingValueDistanceMethod.AbsoluteMaximum
method: MISSING_VALUE_METHOD.RelativeMaximum
}
});

// Directional matching algorithm
const directional = new MatchingAlgorithm({
distanceMetric: DistanceMetric.Directional,
distanceMetric: DISTANCE_METRIC.Directional,
missingValueOptions: {
missingValueMethod: MissingValueDistanceMethod.AbsoluteMaximum
method: MISSING_VALUE_METHOD.RelativeMaximum
}
});

// Get matches for both methods
const manhattanMatches = manhattan.match(questions, voter, candidates, matchingOptions);
const directionalMatches = directional.match(questions, voter, candidates, matchingOptions);
const args = {questions, reference: voter, targets: candidates, options: matchingOptions};
const manhattanMatches = manhattan.match(args);
const directionalMatches = directional.match(args);

// Generate output string
let output = `Questions: ${numQuestions} • Likert scale ${likertScale}\n`;
@@ -93,7 +92,7 @@ function main(
* @returns Answer dict
*/
function createAnswers(
questions: MatchableQuestion[],
questions: Array<MatchableQuestion>,
answerValue: number | ((index: number) => number),
missing = 0
): AnswerDict {
@@ -110,7 +109,7 @@ function createAnswers(
/**
* A dummy candidate object for matching.
*/
class Candidate implements HasMatchableAnswers {
class Candidate implements HasAnswers {
constructor(
public readonly name: string,
public answers: AnswerDict
1 change: 0 additions & 1 deletion frontend/src/lib/voter/vaa-matching/index.ts
@@ -4,7 +4,6 @@
*/

export * from './src/algorithms';
export * from './src/entity';
export * from './src/distance';
export * from './src/match';
export * from './src/missingValue';