audio-embedding

A simple Python script for extracting audio embeddings.

Requirements

PyTorch
Transformers
fairseq

Model

Download the provided model and place it in the ./model folder.

Installation

pip install -r requirements.txt

Usage

Here's an example of how to run audio_embedding.py from the command line:

python audio_embedding.py -i demo/sample_audio.wav -o outputs/short.npy -b 1280 -f 16000
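To inspect the result from Python, here is a minimal sketch. It assumes the file written via -o (outputs/short.npy above) is a standard NumPy .npy array; if save_embeddings writes a different format, np.load will raise. load_embeddings is a hypothetical helper, not part of the repository.

```python
import numpy as np


def load_embeddings(path: str) -> np.ndarray:
    """Load an embedding matrix stored as a NumPy .npy file and report its shape."""
    emb = np.load(path)
    print(f"{path}: shape={emb.shape}, dtype={emb.dtype}")
    return emb
```

For example, load_embeddings("outputs/short.npy") after running the command above.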

usage: audio_embedding.py [-h] [-i INPUT] [-o OUTPUT] [-b BLOCK] [-f FREQ]

Audio embedding CLI

optional arguments:
  -h, --help                        show this help message and exit
  -i INPUT, --input INPUT           Input audio file path, such as ./sample.wav
  -o OUTPUT, --output OUTPUT        Output file path, such as output.csv
  -b BLOCK, --block BLOCK           Block length
  -f FREQ, --freq FREQ              Audio sample rate in Hz, such as 16000
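The CLI takes one input file per run. To embed every .wav file in a folder, one option is to call it once per file. This is a minimal sketch, assuming it is run from the repository root; build_command and embed_directory are hypothetical helpers, and the defaults 1280 and 16000 are simply copied from the example command.

```python
import subprocess
import sys
from pathlib import Path


def build_command(input_path, output_path, block=1280, freq=16000):
    """Return the argv list for one audio_embedding.py run using the -i/-o/-b/-f flags."""
    return [
        sys.executable, "audio_embedding.py",
        "-i", str(input_path),
        "-o", str(output_path),
        "-b", str(block),
        "-f", str(freq),
    ]


def embed_directory(input_dir, output_dir, block=1280, freq=16000, dry_run=False):
    """Run the CLI once per .wav file in input_dir, writing <stem>.npy into output_dir.

    With dry_run=True the commands are only collected and returned; nothing is executed.
    """
    commands = []
    for wav in sorted(Path(input_dir).glob("*.wav")):
        cmd = build_command(wav, Path(output_dir) / f"{wav.stem}.npy", block, freq)
        commands.append(cmd)
        if not dry_run:
            Path(output_dir).mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # raises CalledProcessError if a run fails
    return commands
```

For example, embed_directory("demo", "outputs", dry_run=True) returns the commands it would run without executing them.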

Functional way

import uuid

from audio_embedding import extract_embeddings
from model_engine import get_model, get_processor
from utils import concat_and_rescale, save_embeddings

AUDIO_PATH = r"./demo/sample_audio.wav"
# A random UUID keeps repeated runs from overwriting each other's output
OUTPUT_PATH = f"./outputs/embedding_{uuid.uuid4()}"

BLOCK_LENGTH = 1280  # same value as -b in the CLI example
TARGET_SR = 16000  # target sample rate in Hz, same as -f in the CLI example

model = get_model()
processor = get_processor()

# Extract Embeddings
raw_embeddings = extract_embeddings(
    audio_path=AUDIO_PATH,
    model=model,
    processor=processor,
    block_length=BLOCK_LENGTH,
    target_sr=TARGET_SR,
)

# Embedding post-processing
embeddings = concat_and_rescale(raw_embeddings)
print(embeddings.shape)

# Save Embeddings
save_embeddings(OUTPUT_PATH, embeddings)
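The README does not describe what concat_and_rescale does internally. Purely as an illustration, here is a hypothetical stand-in that assumes raw_embeddings is a list of per-block NumPy arrays sharing one embedding dimension: it stacks them row-wise and min-max scales each column to [0, 1]. The real function may behave differently.

```python
import numpy as np


def concat_and_rescale_sketch(blocks, eps=1e-8):
    """Hypothetical stand-in for utils.concat_and_rescale.

    Stacks per-block embeddings into one (rows, dim) matrix, then min-max scales
    each column to [0, 1]; eps avoids division by zero for constant columns.
    """
    stacked = np.concatenate([np.atleast_2d(b) for b in blocks], axis=0)
    col_min = stacked.min(axis=0)
    col_max = stacked.max(axis=0)
    return (stacked - col_min) / (col_max - col_min + eps)
```

Per-column scaling keeps dimensions with large raw ranges from dominating distance computations; a global or per-row scheme would be equally plausible.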

To-Do

Clipping silence
Model downloader
Drop duplicate rows and columns
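These To-Do items are not implemented yet. The sketch below shows one possible approach to two of them, under stated assumptions: silence is treated as leading and trailing samples whose absolute amplitude is below a fixed threshold (0.01, assuming float audio in [-1, 1]), and "duplicate rows and columns" is read as exact duplicates in the 2-D embedding matrix. Both function names are hypothetical.

```python
import numpy as np


def clip_silence(signal, threshold=0.01):
    """Trim leading/trailing samples whose absolute amplitude is below threshold."""
    loud = np.flatnonzero(np.abs(signal) >= threshold)
    if loud.size == 0:
        return signal[:0]  # the whole signal counts as silence
    return signal[loud[0]: loud[-1] + 1]


def drop_duplicate_rows_and_columns(matrix):
    """Remove exact duplicate rows, then exact duplicate columns, keeping first occurrences in order."""
    _, row_idx = np.unique(matrix, axis=0, return_index=True)
    matrix = matrix[np.sort(row_idx)]
    _, col_idx = np.unique(matrix, axis=1, return_index=True)
    return matrix[:, np.sort(col_idx)]
```

A per-sample threshold is the simplest choice; frame-energy based trimming is usually more robust to brief noise spikes.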

License

This project is licensed under the Apache License 2.0.

Repository: cobanov/audio-embedding

Folders and files

Latest commit

History

Repository files navigation

audio-embedding

Requirements

Model

Installation

Usage

Functional way

To-Do

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Uh oh!

Languages