Audio examples for the paper: Content Based Singing Voice Extraction From a Musical Mixture (ICASSP 2020)

Pritish Chandna, Merlijn Blaauw, Jordi Bonada, Emilia Gómez

Music Technology Group, Universitat Pompeu Fabra, Barcelona

Examples of unison choir singing. The songs and singers shown here were not used for training.

Unison mixture | Single voice synthesized by the SCM model
Example 1 (2 singers in unison)
Example 2 (3 singers in unison)
Example 3 (3 singers in unison)
Example 4 (4 singers in unison)

Examples of vocal effects removal on real-world recordings.

Vocals with effects | Clean voice synthesized with the SCM model
Example 1 (Reverb)
Example 2 (Growling)

Examples from a proprietary dataset. The songs and singers shown here were not used for training.

Mixture of vocals and backing track | Vocals output by Wave-U-Net [1] | Vocals synthesized using the SCS model, with the target singer set to one of the singers in the training set with the same gender | Vocals synthesized using the SCM model | Vocals synthesized using the SS [2] model | Vocals synthesized using the UNET [3] model
Example 1
Example 2
Example 3
Example 4
Example 5
Example 6

Examples from MedleyDB [4]. The songs and singers shown here were not used for training.

Mixture of vocals and backing track | Original vocals, re-synthesized using the WORLD vocoder | Vocals synthesized using the SCS model, with the target singer set to one of the singers in the training set with the same gender | Vocals synthesized using the SCM model | Vocals synthesized using the SS model | Vocals synthesized using the UNET model
Example 1
Example 2
Example 3
Example 4

[1] Stoller, Daniel, Sebastian Ewert, and Simon Dixon. "Wave-U-Net: A multi-scale neural network for end-to-end audio source separation." arXiv preprint arXiv:1806.03185 (2018).

[2] Jansson, Andreas, et al. "Singing voice separation with deep U-Net convolutional networks." (2017).

[3] Chandna, Pritish, et al. "A vocoder based method for singing voice extraction." IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019.

[4] Bittner, Rachel M., et al. "MedleyDB: A multitrack dataset for annotation-intensive MIR research." ISMIR. Vol. 14. 2014.