Let's say I want to trim a sound file from the beginning up to its peak, so that I keep the rest of the file, from the peak to the end. Normally,
sox "file.flac" "trimmed.flac" gain -n -1 silence 1 1s -1d
should work for any file, given what it does: normalize the file so the peak is at -1 dB, then remove "silence", defined as everything before the first sample that reaches -1 dB. Instead, it currently produces an empty audio file: the silence threshold is never met.
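If the goal is only to find where the peak is, one workaround sketch that sidesteps the silence effect entirely is to dump the samples in SoX's text `.dat` format and pick the largest absolute value with awk. The helper name `peak_time` is hypothetical, and the parsing assumes the `.dat` layout of `;` header lines followed by a time column and one sample column per channel. It looks at the instantaneous sample peak, which is the same peak that `gain -n` normalizes to.

```shell
# Hypothetical helper: print the time (seconds) of the first sample with the
# largest absolute value, reading SoX's text ".dat" format on stdin.
# ASSUMPTION: header lines start with ';' and each data line is
# "time sample [sample ...]" with one sample column per channel.
peak_time() {
  awk '
    BEGIN { max = -1 }
    /^[ \t]*;/ { next }                     # skip ".dat" header lines
    NF >= 2 {
      for (c = 2; c <= NF; c++) {           # check every channel column
        v = ($c < 0) ? -$c : $c + 0         # absolute sample value
        if (v > max) { max = v; t = $1 }    # keep the first maximum only
      }
    }
    END { printf "%.6f\n", t }              # fixed notation, usable by "trim"
  '
}

# Usage sketch (needs sox on PATH):
#   t=$(sox hum.flac -t dat - | peak_time)
#   sox hum.flac trimmed.flac trim "$t"
```

Dumping a long track as text is slow, but awk streams the input, so memory use stays small.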
One might think the threshold problem is a precision issue, and that a slightly more forgiving threshold would work. It doesn't.
Look at this batch of commands, which reproduces the issue on my laptop:
> sox -r48000 -n -b16 -c1 hum.flac synth 0:30t sin 432 vol -1dB fade h 14 0:30t 14 && soxi -d hum.flac
00:00:30.00
This creates a 30-second sound file with a peak level of -1 dB around the middle (the fade-in and fade-out each last 14 seconds, leaving 2 seconds at full level).
> sox -V1 hum.flac trimmed.flac gain -n -1 silence 1 1s -1d && soxi -d trimmed.flac
00:00:00.00
> sox -V1 hum.flac trimmed.flac gain -n -1 silence 1 1s -3.95d && soxi -d trimmed.flac
00:00:16.07
Every threshold from -1d down to -3.94d (tested at 0.01 dB resolution) gives me an empty audio file. That is a huge gap from what it should be. It gets worse with the actual audio files in my music library (where I use the commands above to locate the peak): in some cases I have had to go down to a threshold of -12.61d to get a non-empty file, which is just... wow, why? There is definitely something off with how the silence effect handles its threshold parameter.
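One plausible explanation, not confirmed here against SoX's source: the silence effect may compare the threshold against the RMS level of a short sliding window (assumed below to be 1/50 s, i.e. 960 samples at 48 kHz) rather than against individual sample peaks. For a sine wave the RMS sits about 3.01 dB below the peak, and a window holding a non-integer number of cycles (8.64 cycles of 432 Hz in 20 ms) wobbles around that value. The hypothetical function `sine_window_rms_range` below computes, in plain awk, the lowest and highest windowed RMS level of a 432 Hz sine peaking at -1 dBFS. It ignores the fades and the 16-bit quantization of the real file.

```shell
# Hypothetical helper: range of sliding-window RMS levels (dBFS) of a pure
# sine. ASSUMPTION: a 1/50 s window, which the silence effect is assumed
# (not verified) to use.
sine_window_rms_range() {
  # $1 = frequency in Hz, $2 = sample rate in Hz, $3 = peak level in dBFS
  awk -v f="$1" -v sr="$2" -v pk="$3" 'BEGIN {
    pi = atan2(0, -1)
    amp = 10 ^ (pk / 20)            # dBFS -> linear peak amplitude
    win = int(sr / 50)              # 1/50 s window (960 samples at 48 kHz)
    n = win + sr                    # first full window plus 1 s of slides
    sum = 0; min = 1e9; max = -1e9
    for (i = 0; i < n; i++) {
      s[i] = amp * sin(2 * pi * f * i / sr)
      sum += s[i] * s[i]                               # add newest sample
      if (i >= win) sum -= s[i - win] * s[i - win]     # drop oldest sample
      if (i >= win - 1) {                              # window is full
        db = 10 * log(sum / win) / log(10)             # RMS level in dBFS
        if (db < min) min = db
        if (db > max) max = db
      }
    }
    printf "%.2f %.2f\n", min, max   # lowest and highest windowed RMS
  }'
}

sine_window_rms_range 432 48000 -1
```

By hand calculation this should print about `-4.07 -3.95`. If the assumption holds, that would line up with -3.95d being the first threshold that works on hum.flac. Music has a much higher crest factor than a sine, so its short-window RMS can sit many dB below the sample peak, which would be consistent with needing thresholds as low as -12.61d on real files.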
Splitting the command in two and using -V -V only for the silence step, I get:
> sox "hum.flac" "gain.flac" gain -n -1
> sox -V -V "gain.flac" "trimmed.flac" silence 1 1s -2d
sox: SoX v14.4.2
time: Oct 28 2023 17:52:23
issue: Gentoo
uname: Linux Zarielle 6.1.57-gentoo #1 SMP PREEMPT_DYNAMIC Sun Oct 22 14:06:17 -05 2023 x86_64
compiler: gcc 13.2.1 20230826
arch: 1288 48 88 L OMP
sox INFO formats: detected file format type `flac'
sox DBUG flac: API version 13
Input File : 'gain.flac'
Channels : 1
Sample Rate : 48000
Precision : 16-bit
Duration : 00:00:30.00 = 1440000 samples ~ 2250 CDDA sectors
File Size : 639k
Bit Rate : 170k
Sample Encoding: 16-bit FLAC
Endian Type : little
Reverse Nibbles: no
Reverse Bits : no
Comment : 'Comment=Processed by SoX'
sox INFO sox: Overwriting `trimmed.flac'
sox INFO flac: encoding at 16 bits per sample
Output File : 'trimmed.flac'
Channels : 1
Sample Rate : 48000
Precision : 16-bit
Sample Encoding: 16-bit FLAC
Endian Type : little
Reverse Nibbles: no
Reverse Bits : no
Comment : 'Comment=Processed by SoX'
sox INFO sox: effects chain: input 48000Hz 1 channels (multi) 16 bits 00:00:30.00
sox INFO sox: effects chain: silence 48000Hz 1 channels (multi) 16 bits unknown length
sox INFO sox: effects chain: output 48000Hz 1 channels (multi) 16 bits unknown length
sox DBUG sox: start-up time = 0.000651
Yes, silence has other problems too, like trimming from 0.02 seconds after the silence starts and emitting a random selection of garbage after the trimmed audio (the second of these is fixed in sox_ng).
Thanks for the exhaustive analysis.
https://codeberg.org/sox_ng/sox_ng/issues/395