# Evaluation of metrics for Speech Translation
To install from source:

```sh
git clone https://github.com/alan-turing-institute/ARC-M4ST
cd ARC-M4ST
python -m pip install .
```
The code used to generate the results for the main experiments, on the WMT, DEMETR, and CallHome datasets, is in the `scripts` directory; each experiment has a README describing its data sources and main scripts. The directory also contains a fourth, exploratory experiment on the potential impact of transcription errors.
Metric classes for ChrF, BLEU, COMET, MetricX, and BLASER-2.0 are defined in `src/m4st/metrics`. All of them provide a `get_scores` method that takes an instance of `TranslationDataset`, which wraps lists of predictions and, optionally, sources, references, and languages (see the docstring of the `TranslationDataset` class).
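As a minimal sketch of how that interface fits together: the `TranslationDataset` and `get_scores` names come from this repo, but the stand-in dataset class and the toy character-overlap metric below are illustrative only, not the package's actual implementations (use the real classes in `src/m4st/metrics` instead).

```python
from __future__ import annotations

from dataclasses import dataclass


# Illustrative stand-in for m4st's TranslationDataset: predictions are
# required; sources, references, and languages are optional.
@dataclass
class TranslationDataset:
    predictions: list[str]
    sources: list[str] | None = None
    references: list[str] | None = None
    languages: list[str] | None = None


# Toy reference-based metric with the same get_scores shape as the repo's
# metric classes: it returns one score per prediction/reference pair.
class CharOverlapMetric:
    def get_scores(self, dataset: TranslationDataset) -> list[float]:
        scores = []
        for pred, ref in zip(dataset.predictions, dataset.references):
            pred_chars, ref_chars = set(pred), set(ref)
            # Jaccard similarity over character sets, as a crude
            # placeholder for a real metric such as ChrF.
            overlap = len(pred_chars & ref_chars)
            union = len(pred_chars | ref_chars)
            scores.append(overlap / max(union, 1))
        return scores


dataset = TranslationDataset(
    predictions=["the cat sat"],
    references=["a cat sat"],
)
print(CharOverlapMetric().get_scores(dataset))
```

With the real metric classes, the call pattern is the same: build a `TranslationDataset` from your predictions (plus whatever inputs the metric needs) and pass it to `get_scores`.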
The notes (`notes.typ`) can be compiled with Typst:

```sh
brew install typst
typst compile notes.typ
```
To use the Ollama client, which is one way to corrupt sentences randomly, install Ollama (https://ollama.com) and run the server.
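For orientation, a request to a locally running Ollama server uses its documented REST API (`/api/generate` on port 11434 by default). The sketch below only builds such a request; the prompt wording and model name are illustrative and not taken from this repo's client.

```python
import json
import urllib.request

# Ollama's one-shot generation endpoint; the server listens on
# localhost:11434 by default.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_corruption_request(
    sentence: str, model: str = "llama3"
) -> urllib.request.Request:
    """Build (but do not send) a request asking a model to corrupt a sentence.

    The prompt and default model name here are illustrative, not the repo's.
    """
    payload = {
        "model": model,
        "prompt": f"Introduce one random error into this sentence: {sentence}",
        "stream": False,  # ask for a single JSON response, not a stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_corruption_request("The quick brown fox jumps over the lazy dog.")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` additionally requires the Ollama server to be running and the chosen model to have been pulled (e.g. `ollama pull llama3`).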
Distributed under the terms of the MIT license. Some of the datasets and metric implementations used are covered by different permissive licenses; see the relevant files and their documentation.