Unsupervised Single-Channel Audio Separation
with Diffusion Source Priors

Runwu Shi¹, Chang Li², Jiang Wang¹, Rui Zhang³, Nabeela Khan¹,
Benjamin Yen¹, Takeshi Ashizawa¹, Kazuhiro Nakadai¹

¹Department of Systems and Control Engineering, Institute of Science Tokyo
²University of Science and Technology of China, ³University of Hong Kong

Task 1: 1 Speech + 1 Sound Event Separation

Audio Example 1

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 and 2, ground truth]

Audio Example 2

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 and 2, ground truth]

Audio Example 3

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 and 2, ground truth]

Audio Example 4

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 and 2, ground truth]

Audio Example 5

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 and 2, ground truth]

Task 1: 1 Speech + 2 Sound Event Separation

Audio Example 1

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 to 3, ground truth]

Audio Example 2

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 to 3, ground truth]

Audio Example 3

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 to 3, ground truth]

Audio Example 4

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 to 3, ground truth]

Audio Example 5

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 to 3, ground truth]

Task 2: 2 Sound Event Separation

Audio Example 1

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 and 2, ground truth]

Audio Example 2

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 and 2, ground truth]

Audio Example 3

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 and 2, ground truth]

Audio Example 4

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 and 2, ground truth]

Audio Example 5

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 and 2, ground truth]

Task 2: 3 Sound Event Separation

Audio Example 1

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 to 3, ground truth]

Audio Example 2

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 to 3, ground truth]

Audio Example 3

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 to 3, ground truth]

Audio Example 4

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 to 3, ground truth]

Audio Example 5

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 to 3, ground truth]

Task 3: 2 Speech Separation

Audio Example 1

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 and 2, ground truth]

Audio Example 2

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 and 2, ground truth]

Audio Example 3

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 and 2, ground truth]

Audio Example 4

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 and 2, ground truth]

Audio Example 5

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 and 2, ground truth]

Failure Case 1 (Speaker Ambiguity)

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 and 2, ground truth]

Failure Case 2 (Speaker Ambiguity)

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 and 2, ground truth]

Failure Case 3 (Speaker Ambiguity)

Mixture

Separation Results

Ground Truth

[Waveform plots: mixture, separation results 1 and 2, ground truth]