Listening Tests for "Multi-Aspect Conditioning for Diffusion-Based Multi-Instrument Music Synthesis"

1 International Audio Laboratories Erlangen, Germany
2 Tel Aviv University
We provide here both of the listening tests described in the paper: one to assess realism, and one to assess version similarity.

Listening Test 1 - Realism

We provide the first listening test, for realism, which was conducted according to the MUSHRA protocol using the webMUSHRA implementation by Schoeffler et al. (see paper). We synthesize the same score excerpt using various methods and ask the listener to rate the realism of each result. We also provide a reference sample from a real musical performance of the same score excerpt.
The following were compared in each question: Real, Vocoded, GM, Fluid, Hawth., Uncond., and Cond.

Question 1

Real
Vocoded
GM
Fluid

Hawth.
Uncond.
Cond.


Question 2

Real
Vocoded
GM
Fluid

Hawth.
Uncond.
Cond.


Question 3

Real
Vocoded
GM
Fluid

Hawth.
Uncond.
Cond.


Question 4

Real
Vocoded
GM
Fluid

Hawth.
Uncond.
Cond.


Question 5

Real
Vocoded
GM
Fluid

Hawth.
Uncond.
Cond.


Question 6

Real
Vocoded
GM
Fluid

Hawth.
Uncond.
Cond.


Question 7

Real
Vocoded
GM
Fluid

Hawth.
Uncond.
Cond.


Question 8

Real
Vocoded
GM
Fluid

Hawth.
Uncond.
Cond.


Question 9

Real
Vocoded
GM
Fluid

Hawth.
Uncond.
Cond.


Question 10

Real
Vocoded
GM
Fluid

Hawth.
Uncond.
Cond.


Listening Test 2 - Similarity

We provide the second listening test, for version similarity. The goal of this test is to measure how effectively version conditioning reproduces the perceptual characteristics of the reference version, including acoustics, timbre, and style.
We randomly choose a reference version and provide the listener with an audio excerpt from the corresponding recording. We then use our model to synthesize each score excerpt with three different version conditions of the same instrumentation: one is the reference version, and the other two are randomly sampled.
We ask the listener to rate the similarity of each synthesized score excerpt to the reference audio excerpt.
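The way one similarity question is assembled can be sketched roughly as follows. This is only an illustrative sketch, not the code used for the test: the function name `build_similarity_trial`, the catalog layout (a mapping from instrumentation to version identifiers), and the version names are all hypothetical. The presentation order is shuffled here, which is our assumption, motivated by the reference version appearing in different positions across the questions below.

```python
import random


def build_similarity_trial(versions_by_instrumentation, instrumentation, rng,
                           n_distractors=2):
    """Pick a reference version and the version conditions for one question.

    Hypothetical helper: returns (reference, conditions), where `conditions`
    holds the reference version plus `n_distractors` other versions of the
    same instrumentation, in shuffled order.
    """
    candidates = list(versions_by_instrumentation[instrumentation])
    if len(candidates) < n_distractors + 1:
        raise ValueError("not enough versions with this instrumentation")

    # Randomly choose the reference version.
    reference = rng.choice(candidates)

    # Randomly sample the other version conditions, excluding the reference.
    others = [v for v in candidates if v != reference]
    distractors = rng.sample(others, n_distractors)

    # Shuffle so the position of the reference condition varies (assumption).
    conditions = [reference] + distractors
    rng.shuffle(conditions)
    return reference, conditions


# Usage with made-up version identifiers.
rng = random.Random(0)
catalog = {"string quartet": ["version_a", "version_b", "version_c", "version_d"]}
reference, conditions = build_similarity_trial(catalog, "string quartet", rng)
print("reference:", reference)
print("conditions to synthesize:", conditions)
```

Each entry in `conditions` would then be used as the version condition when synthesizing the score excerpt, and the listener rates every result against the audio excerpt from the reference recording.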

Question 1

Reference

Synthesized, version 1 (Reference)
Synthesized, version 2
Synthesized, version 3


Question 2

Reference

Synthesized, version 1
Synthesized, version 2 (Reference)
Synthesized, version 3


Question 3

Reference

Synthesized, version 1
Synthesized, version 2 (Reference)
Synthesized, version 3


Question 4

Reference

Synthesized, version 1 (Reference)
Synthesized, version 2
Synthesized, version 3


Question 5

Reference

Synthesized, version 1
Synthesized, version 2
Synthesized, version 3 (Reference)


Question 6

Reference

Synthesized, version 1 (Reference)
Synthesized, version 2
Synthesized, version 3


Question 7

Reference

Synthesized, version 1
Synthesized, version 2
Synthesized, version 3 (Reference)


Question 8

Reference

Synthesized, version 1 (Reference)
Synthesized, version 2
Synthesized, version 3


Question 9

Reference

Synthesized, version 1
Synthesized, version 2 (Reference)
Synthesized, version 3


Question 10

Reference

Synthesized, version 1
Synthesized, version 2 (Reference)
Synthesized, version 3