Converting the same WAV file to 16 kHz twice produces different output. This is a problem when you use sox as part of a pipeline for a recognition task. For instance, I'm working on speaker recognition, and my results are not consistent even though I use the same data.
I'm working on Mac OS X.
How to reproduce using the attachment:
⇒ soxi 1_1_2_1.wav
Input File : '1_1_2_1.wav'
Channels : 1
Sample Rate : 8000
Precision : 16-bit
Duration : 00:00:06.84 = 54722 samples ~ 513.019 CDDA sectors
File Size : 109k
Bit Rate : 128k
Sample Encoding: 16-bit Signed Integer PCM
⇒ sox 1_1_2_1.wav -r 16000 -b 16 default.wav
⇒ sox 1_1_2_1.wav -r 16000 -b 16 default2.wav
⇒ diff default.wav default2.wav
Binary files default.wav and default2.wav differ
It seems that the problem occurs when you start from an 8 kHz sampling rate and try to convert to 16 kHz.
That’s working as expected; the dither noise is (pseudo-)random, so the output will be different each time.
If you need bit-identical output, use -R (or turn dithering off with -D, but audio quality will then be slightly worse).
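To check this on your own data, here is a minimal sketch of a helper that runs the conversion from the report twice and compares the results. The function name convert_twice and the output names out_a.wav and out_b.wav are made up for illustration; -R and -D are SoX global options, so they are passed before the input file.

```shell
# Run the same SoX conversion twice and report whether the outputs match.
# $1: input file; any remaining arguments are SoX global options (e.g. -R or -D).
# convert_twice, out_a.wav and out_b.wav are illustrative names, not part of SoX.
convert_twice() {
    input=$1
    shift
    # Same conversion as in the report: resample to 16 kHz, 16-bit output.
    sox "$@" "$input" -r 16000 -b 16 out_a.wav || return 1
    sox "$@" "$input" -r 16000 -b 16 out_b.wav || return 1
    # cmp -s is silent and only sets the exit status.
    if cmp -s out_a.wav out_b.wav; then
        echo "identical"
    else
        echo "differ"
    fi
}
```

With the file from the report, `convert_twice 1_1_2_1.wav -R` should print "identical", while `convert_twice 1_1_2_1.wav` with no extra option would be expected to print "differ", as in the original diff.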