Content Based Singing Voice Extraction From a Musical Mixture

Audio examples for the paper: Content Based Singing Voice Extraction From a Musical Mixture (ICASSP 2020)

Pritish Chandna, Merlijn Blaauw, Jordi Bonada, Emilia Gómez

Music Technology Group, Universitat Pompeu Fabra, Barcelona

Examples of unison choir singing. The songs and singers shown here were not used for training.

	Unison Mixture	Single voice synthesized by SCM model
Example 1 (2 singers in unison)

Example 2 (3 singers in unison)

Example 3 (3 singers in unison)

Example 4 (4 singers in unison)

Examples of vocal effects removal on real-world recordings.

	Vocals with effects	Clean voice synthesized by SCM model
Example 1 (Reverb)

Example 2 (Growling)

Examples from a proprietary dataset. The songs and singers shown here were not used for training.

	Mixture of vocals and backing track	Vocals output by Wave-U-Net [1]	Vocals Synthesized Using the SCS model, with the target singer set to one of the singers in the training set with the same gender.	Vocals Synthesized Using the SCM model	Vocals Synthesized Using the SS [3] model	Vocals Synthesized Using the UNET [2] model
Example 1

Example 2

Example 3

Example 4

Example 5

Example 6

Examples from MedleyDB [4]. The songs and singers shown here were not used for training.

	Mixture of vocals and backing track	Original Vocals, Re-synthesized using WORLD vocoder	Vocals Synthesized Using the SCS model, with the target singer set to one of the singers in the training set with the same gender.	Vocals Synthesized Using the SCM model	Vocals Synthesized Using the SS model	Vocals Synthesized Using the UNET model
Example 1

Example 2

Example 3

Example 4

[1] Stoller, Daniel, Sebastian Ewert, and Simon Dixon. "Wave-U-Net: A multi-scale neural network for end-to-end audio source separation." arXiv preprint arXiv:1806.03185 (2018).

[2] Jansson, Andreas, et al. "Singing voice separation with deep U-Net convolutional networks." ISMIR. 2017.

[3] Chandna, Pritish, et al. "A Vocoder Based Method for Singing Voice Extraction." ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019.

[4] Bittner, Rachel M., et al. "MedleyDB: A multitrack dataset for annotation-intensive MIR research." ISMIR. Vol. 14. 2014.