Publish new blog post, site improvements
sevagh committed Nov 24, 2023
1 parent e36107d commit 235d317
Showing 21 changed files with 328 additions and 72 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -5,6 +5,7 @@
*.bin.gz
*.wav
!/test/data/*.wav
!/test/*.wav
*.json
!models/*.bin.gz
docs/_site
4 changes: 1 addition & 3 deletions README.md
@@ -1,8 +1,6 @@
# free-music-demixer

A free client-side static website for music demixing (aka music source separation) using the AI model Open-Unmix (with UMX-L weights):
<br>
<img src="docs/assets/images/music-demix.png" width="50%"/>
A [free static website](https://freemusicdemixer.com) for client-side music demixing (aka music source separation) using the AI model Open-Unmix (with UMX-L weights).

I transliterated the original PyTorch model Python code to C++ using Eigen. It compiles to WebAssembly with Emscripten. The UMX-L weights are quantized (mostly uint8, uint16 for the last 4 layers) and saved with the ggml binary file format. They are then gzipped. This reduces the 425 MB of UMX-L weights down to 45 MB, while achieving similar performance (verified empirically using BSS metrics).
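
As a rough illustration of why 8-bit quantization shrinks the weights, here is a minimal Python sketch of min-max affine quantization. It is an assumption made for illustration only: the names `quantize_uint8` and `dequantize` are hypothetical, and the scheme actually used by this project's C++ code and ggml files may differ.

```python
def quantize_uint8(weights):
    """Map floats to integer codes in [0, 255] with a min-max affine scheme.

    Assumed scheme for illustration; not taken from the project's code.
    """
    lo, hi = min(weights), max(weights)
    # Fall back to a scale of 1.0 when every weight is identical.
    scale = (hi - lo) / 255 or 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate float weights from the integer codes."""
    return [c * scale + lo for c in codes]


weights = [-0.51, -0.02, 0.0, 0.13, 0.49]
codes, scale, lo = quantize_uint8(weights)
restored = dequantize(codes, scale, lo)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

If the original weights are stored as 32-bit floats (also an assumption), each uint8 code takes a quarter of the space, before gzip compresses the file further.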

3 changes: 2 additions & 1 deletion docs/_config.yml
@@ -7,7 +7,7 @@ plugins:
favicon: assets/favicon.ico

# dont mess with SEO here
human_description: "AI-based music demixer running for free in your browser"
human_description: "Split songs, demix music, and separate stems with our AI-based tool: free, private, and unlimited use directly in your browser"
description: ""

url: "https://freemusicdemixer.com"
@@ -22,3 +22,4 @@ social:
- https://github.com/sevagh

google_analytics: 'G-B0BF5H94FM'
future: false
2 changes: 1 addition & 1 deletion docs/_layouts/default.html
@@ -26,7 +26,7 @@ <h2>{{ site.human_description | default: site.github.project_tagline }}</h2>
<a href="{{ '/about' | relative_url }}" class="btn btn-github">About</a>
<a href="{{ '/blog' | relative_url }}" class="btn btn-github">Blog</a>
<a href="{{ '/sponsors' | relative_url }}" class="btn btn-github">Sponsors</a>
<a href="{{ site.github.repository_url }}" class="btn btn-github"><span class="icon"></span>View on GitHub</a>
<a href="{{ site.github.repository_url }}" class="btn btn-github"><span class="icon"></span>Source code</a>
</section>
</div>
</header>
92 changes: 92 additions & 0 deletions docs/_posts/2023-11-24-Music-demixing-terminology.md
@@ -0,0 +1,92 @@
---
layout: post
title: "Music demixing, song splitting, stem separation: what's the difference?"
category: getting-started
tags: [stems, music-demixer, history, literature, academic]
header_class: post
description: "An overview of the different synonyms and terms for music demixing such as song splitting, stem separation, etc."
keywords: music demixing, song splitting, song splitter, stem separation
intro: "What's in a name? An analysis of the different names used to refer to the task of separating a mixed song into its isolated stems."
---

<h2>Table of contents</h2>
* Table of contents
{:toc}

{{ page.intro }}

## Who defined Music demixing?

I technically coined the term music demixing. OK, that's a heavy exaggeration. What happened is that in 2021, I participated in the [Music Demixing AI Challenge 2021](https://www.aicrowd.com/challenges/music-demixing-challenge-ismir-2021). After the challenge ended, a [paper was published](https://arxiv.org/abs/2108.13559) describing the challenge and the winning systems.

I participated in the challenge as part of my [master's thesis](https://sevag.xyz/thesis), whose topic was the time-frequency uncertainty principle and its role in music demixing.

I needed to write about music demixing in my thesis, and my advisor told me that I needed an academic citation that defined such a term before I could use it. The funny thing is neither the summary paper on MDX 21, nor the participant papers, nor the organizers, provided a formal, citable definition of the term "music demixing."

At the end of the day, I had to be the one! I included a definition of music demixing in [my submission to the challenge](https://github.com/sevagh/xumx-sliCQ/tree/v1#xumx-slicq):

>Music source separation is the task of extracting an estimate of one or more isolated sources or instruments (for example, drums or vocals) from musical audio. The task of music demixing or unmixing considers the case where the musical audio is separated into an estimate of all of its constituent sources that can be summed back to the original mixture.

I would say that the vast majority of the sources and references in my papers and thesis, which come from digital signal processing, engineering, and software conferences and publications (e.g. DAFx/Digital Audio Effects, ISMIR/Music Information Retrieval, ICASSP/IEEE Acoustic Speech Signal Processing), use the term **music source separation.**

In this post, I want to explore the different terms used for this task.

## Math/science/engineering: source separation

(**parts of this are copied verbatim from my thesis!** [which you can read in full if you're curious](https://escholarship.mcgill.ca/concern/theses/3197xr696))

Typical music recordings are mono or stereo mixtures, with multiple sound objects (drums, vocals, etc.) sharing the same track. To manipulate the individual sound objects, the stereo audio mixture needs to be separated into a track for each different sound source, in a process called audio source separation [[1]](#1).

The paper on the Music Demixing Challenge 2021 [[2]](#2) provides a summary of why the audio source separation problem has been interesting to researchers:
>Audio source separation has been studied extensively for decades as it brings benefits in our daily life, driven by many practical applications, e.g., hearing aids, speech diarization, etc. In particular, music source separation (MSS) attracts professional creators because it allows the remixing or reviving of songs to a level never achieved with conventional approaches such as equalizers. Suppressing vocals in songs can also improve the experience of a karaoke application, where people can enjoy singing together on top of the original song (where the vocals were suppressed), instead of relying on content developed specifically for karaoke applications

Computational source separation has a history of at least 50 years [[1]](#1), originating from the tasks of computational auditory scene analysis (CASA) and blind source separation (BSS). In CASA, the goal is to computationally extract individual streams from recordings of an acoustic scene [[3]](#3), based on the definition of ASA (auditory scene analysis) [[4]](#4). BSS [[5]](#5) solves a subproblem of CASA which aims to recover the sources of a "mixture of multiple, statistically independent sources that are received with separate sensors" [[3]](#3). The term "blind" refers to there being no prior knowledge of what the sources are, and how they were mixed together. <span class="highlight">In CASA and BSS, therefore, the mixed audio contains unknown sources combined in unknown ways that must be separated.</span>

By contrast, in music source separation and music demixing, the sources are typically known, or have known characteristics. That is to say, in music source separation, the task is not to separate all of the distinct sources in the mixture, but to extract a predefined set of sources, e.g.: harmonic and percussive sources, or the common four sources defined by the MUSDB18-HQ dataset [[6]](#6): vocals, drums, bass, and other. Music demixing can be considered as the reverse of a simple (no effects) mixing process of *stems* in a recording studio:
<img src="/assets/blog/post3/mixdemix.webp" width="65%" alt="mixing-demixing-diagram"/>

A stem is a grouping of individually recorded instrument tracks that have been combined together in a common category. For example, a drum stem could include all of the tracks of a drum kit (e.g., snare, tom, hihat), and a vocal stem could include all of the vocal tracks from the different singers in the song. [Izotope](https://www.izotope.com/en/learn/stems-and-multitracks-whats-the-difference.html) and [LANDR](https://blog.landr.com/stems-in-music/), two music tech companies, have written about stems and their history.

<span class="highlight">In this light we can see that music demixing is simply a combination of multiple music source separation subproblems for all of the desired target stems.</span>
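
To make that concrete, here is a toy Python sketch, written for illustration rather than taken from any real separation system. The stems are short made-up lists of samples, and `separate` is a hypothetical "oracle" that cheats by using the known stems; a real model would have to estimate each stem from the mixture alone.

```python
# Toy stems: a few made-up samples per stem, not real audio.
stems = {
    "vocals": [0.10, 0.20, -0.10, 0.00],
    "drums": [0.30, -0.30, 0.30, -0.30],
    "bass": [0.05, 0.05, 0.05, 0.05],
    "other": [0.00, 0.10, 0.00, -0.10],
}


def mix(stems):
    """Simple (no effects) mixing: sample-wise sum of all stems."""
    return [sum(samples) for samples in zip(*stems.values())]


def separate(mixture, stems, target):
    """Hypothetical oracle separator for a single target stem.

    It subtracts the other known stems from the mixture, which a real
    system cannot do; it only stands in for one separation subproblem.
    """
    others = [s for name, s in stems.items() if name != target]
    return [m - sum(o) for m, o in zip(mixture, zip(*others))]


def demix(mixture, stems):
    """Demixing: one source separation subproblem per target stem."""
    return {name: separate(mixture, stems, name) for name in stems}


mixture = mix(stems)
estimates = demix(mixture, stems)
remixed = mix(estimates)

# The estimated stems sum back to the original mixture (up to rounding).
max_error = max(abs(a - b) for a, b in zip(mixture, remixed))
```

The point of the sketch is the structure: `demix` is just `separate` called once per target, and summing the estimates reconstructs the mixture.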

## Music industry: stems and splitters

The theoretical underpinnings of modern AI and deep learning techniques were [beginning to be discovered by 1960](https://people.idsia.ch/~juergen/firstdeeplearner.html), but the computational power available was too low to take advantage of those ideas (nowadays this is reversed; the insane levels of compute power available in the world have led to huge and powerful AI models like ChatGPT).

Being neither a musician, a music producer, nor an audio engineer, I can't speak with authority on the landscape and history of how people or products have approached stem isolation. All I know is that each time I talked about some new algorithm or piece of code I discovered with one of my musician friends, they'd always come back with "oh yeah I have an iZotope plugin for that." <span class="highlight">Theory and practice are related but not strictly dependent on one another: real-world products can be created before there is a mathematical proof for how they work.</span>

Here's a nice story of the [journey of the HitnMix RipX DAW](https://hitnmix.com/2023/07/17/history-of-audio-separation/); they describe how they had been working in the space of commercial music separation offerings since 2001, when I was not yet 10 years old. [Another story from Wired](https://www.wired.com/story/upmixing-audio-recordings-artificial-intelligence/) discusses the industry and how various academics have over time created startups or products for practical uses in the music industry, such as salvaging or cleaning up old Beatles recordings.

However, when it comes to product offerings, the terminology ends up being different from that of the academic papers, by necessity, since it's aimed at a different audience. Let's check some Google search results:
* **Song splitters:** <https://vocalremover.org/splitter-ai>, <https://www.bandlab.com/splitter>, <https://www.lalal.ai/>, <https://voice.ai/tools/stem-splitter>, <https://splitter.ai/>, <https://songdonkey.ai/>, ...
* **Stem separators:** actually the same results as the above
* **Music demixers:** <https://freemusicdemixer.com>, <https://www.demixer.com/>, <https://demixor.com/>, <https://www.audioshake.ai/>
* **Music instrument isolator:** significant overlap with 'song splitters', and some more e.g. <https://vocalremover.org/>, <https://moises.ai/>, <https://www.jamorphosia.com/>

These websites and products are all operating in the same space as the academic research papers, with perhaps subtle differences in their outputs. Their customers are different, with papers on music source separation written for fellow academics, and commercial products for stem separation aimed at musicians and music producers.

Commercial offerings and products in the space include LALAL.ai, XTRAX Stems by Audionamix, RX 10 by iZotope, Spleeter by Deezer, Moises.ai, Stem remover by Wavesfactory, Audioshake.ai, etc. So many choices! What's your favorite?

## Conclusion

This isn't comprehensive, but it gathers all of the synonyms and terms for music demixing that I've encountered over the years in one place. Hope this helps!

## References

<a id="1">[1]</a>
Rafii, Zafar, Antoine Liutkus, Fabian-Robert Stöter, Stylianos Ioannis Mimilakis, Derry Fitzgerald, and Bryan Pardo. 2018. “An overview of lead and accompaniment separation in music.” IEEE/ACM Transactions on Audio, Speech, and Language Processing; <https://arxiv.org/abs/1804.08300>

<a id="2">[2]</a>
Mitsufuji, Yuki, Giorgio Fabbro, Stefan Uhlich, and Fabian-Robert Stöter. 2021. “Music demixing challenge at ISMIR 2021.” arXiv preprint arXiv:2108.13559; <https://arxiv.org/abs/2108.13559>

<a id="3">[3]</a>
Wang, DeLiang, and Guy J. Brown. 2006. “Fundamentals of computational auditory scene analysis.” In Computational auditory scene analysis: Principles, algorithms, and applications, edited by DeLiang Wang and Guy J. Brown. Wiley-IEEE-Press.

<a id="4">[4]</a>
Bregman, Albert S. 1994. Auditory scene analysis: The perceptual organization of sound. MIT Press.

<a id="5">[5]</a>
Jutten, Christian, and Jeanny Hérault. 1991. “Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture.” Signal Processing 24 (1): 1–10.

<a id="6">[6]</a>
Rafii, Zafar, Antoine Liutkus, Fabian-Robert Stöter, Stylianos Ioannis Mimilakis, and Rachel Bittner. (2017) “The MUSDB18 corpus for music separation”; (2019) “MUSDB18-HQ: an uncompressed version of MUSDB18.”
Binary file added docs/assets/blog/post3/mixdemix.webp
Binary file not shown.
Binary file added docs/assets/clips/paranoid_jaxius_bass.mp3
Binary file not shown.
Binary file removed docs/assets/clips/paranoid_jaxius_bass.wav
Binary file not shown.
Binary file added docs/assets/clips/paranoid_jaxius_drums.mp3
Binary file not shown.
Binary file removed docs/assets/clips/paranoid_jaxius_drums.wav
Binary file not shown.
Binary file added docs/assets/clips/paranoid_jaxius_melody.mp3
Binary file not shown.
Binary file removed docs/assets/clips/paranoid_jaxius_melody.wav
Binary file not shown.
Binary file added docs/assets/clips/paranoid_jaxius_vocals.mp3
Binary file not shown.
Binary file removed docs/assets/clips/paranoid_jaxius_vocals.wav
Binary file not shown.
101 changes: 100 additions & 1 deletion docs/assets/css/style.scss
@@ -220,7 +220,6 @@ div.mdx-container-batch {
padding: 2%;
overflow: auto; /* Will add a scrollbar if the content overflows */
z-index: 1;
opacity: 0.9;
background: url('../images/mixer_batch.webp') no-repeat center center/cover;
}

@@ -519,3 +518,103 @@ div.tag-cloud {
//font-weight: bold; /* for a bold appearance like headings */
//font-size: 1.2em; /* adjust the size if needed */
}

// spinner for loading weights

.overlay {
position: absolute; /* Changed to absolute within the container */
display: flex;
align-items: center;
justify-content: center;
width: 100%;
height: 100%;
top: 0;
left: 0;
background-color: rgba(0,0,0,0.9); /* semi-transparent overlay */
z-index: 2;
cursor: pointer;
}

.loader {
display: none;
position: absolute;
top: 50%;
left: 46.5%;
border: 5px solid #f3f3f3;
border-radius: 50%;
border-top: 5px solid #3498db;
width: 50px;
height: 50px;
-webkit-animation: spin 2s linear infinite; /* Safari */
animation: spin 2s linear infinite;
}

/* Safari */
@-webkit-keyframes spin {
0% { -webkit-transform: rotate(0deg); }
100% { -webkit-transform: rotate(360deg); }
}

@keyframes spin {
0% { transform: rotate(0deg); }
100% { transform: rotate(360deg); }
}

#load-weights, #load-weights-2 {
position: relative; /* Needed to use z-index */
z-index: 3; /* Higher than the overlay's z-index */
}

#load-weights, #load-weights-2 {
font-size: 20px; /* Larger font size for better visibility */
padding: 16px 24px; /* Adjust padding to maintain visual balance with larger font */
margin-bottom: 16px; /* Extra space below the button */
cursor: pointer; /* Cursor indicates it's clickable */
color: #94ffff; /* Light cyan text color for better contrast */
background-color: #c71585; /* Medium violet red background to make it stand out */
font-family: "Courier New", monospace;
border: none; /* Remove border for a modern look */
border-radius: 5px; /* Rounded corners */
text-transform: uppercase; /* Uppercase text for emphasis */
font-weight: bold; /* Bold font weight */
transition: background-color 0.3s; /* Smooth background color transition for hover effect */
}
//color: #94ffff; /* Light Cyan */
//background: #c71585; /* Medium Violet Red */

#load-weights:hover, #load-weights-2:hover {
background-color: #8b008b; /* Slightly darker background on hover */
}

#load-weights:disabled, #load-weights-2:disabled {
background-color: #cccccc; /* Greyed out background for disabled state */
color: #b7b7b7; // darker grey for the text
cursor: not-allowed; /* 'Not allowed' cursor for disabled button */
}

.centered-text {
text-align: center;
margin-bottom: 10px; /* Adjust this value as needed for spacing */
}

.centered-text p {
margin-top: 5px; /* Adjust top margin */
margin-bottom: 5px; /* Adjust bottom margin */
/* Other styles, if needed */
}

blockquote {
color: #8b008b; /* Dark Magenta */
border-left: 3px solid #8b008b; /* Optional: adds a left border to your blockquote */
margin: 10px 20px; /* Adjusts top and bottom margin */
padding-left: 15px; /* Adds padding to the left of the blockquote */
font-style: italic; /* Optional: italicizes the quote text */
}

.highlight {
background-color: #e1f77e; /* Light yellow background */
color: #333; /* Darker text color for contrast */
padding: 2px 5px; /* Padding around the text */
border-radius: 3px; /* Optional: rounded corners */
font-size: 18px;
}
Binary file modified docs/assets/images/mixer_batch.webp
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/blog.md
@@ -1,7 +1,7 @@
---
header_class: blog
description: Freemusicdemixer's blog, with content on music demixing, stem separation, source separation, neural networks, C++, webassembly, and more
keywords: music demixing, AI model, Open-Unmix, UMX-L, free music demixer, stem separation, free stem separation, stems, music demixer, isolate stems, isolate sources
keywords: music demixing, stem separation, song splitting, AI model, Open-Unmix, UMX-L, free music demixer, isolate stems, private, unlimited use
---

# Blog