When using the '-l' option with the 'silence' effect, some parts of the recording get repeated at the end of playback. The repeated section can last as long as a second, and it seems to consist of random fragments from the recording, ending with the very end of the recording being played again.
I'm working with recordings of speech, and this command reliably reproduces the bug in every recording I looked at:
play recording.wav silence -l 1 0.2 0.1% -1 1.0 3%
Here's the output with '-V -V' added to the options:
play DBUG formats: opening format plugin 'lsx_alsa_format_fn': library 0x5ff7ddc83be0, entry point 0x716ee9e6e960
play DBUG formats: opening format plugin 'lsx_amr_nb_format_fn': library 0x5ff7ddc856e0, entry point 0x716ee9e67ae0
play DBUG formats: opening format plugin 'lsx_amr_wb_format_fn': library 0x5ff7ddc86d80, entry point 0x716ee9e62730
play DBUG formats: opening format plugin 'lsx_caf_format_fn': library 0x5ff7ddc87b10, entry point 0x716ee9c37550
play DBUG formats: opening format plugin 'lsx_fap_format_fn': library 0x5ff7ddc8a8c0, entry point 0x716ee9987550
play DBUG formats: opening format plugin 'lsx_flac_format_fn': library 0x5ff7ddc8b060, entry point 0x716ee9980570
play DBUG formats: opening format plugin 'lsx_gsm_format_fn': library 0x5ff7ddc8b8a0, entry point 0x716ee9979820
play DBUG formats: opening format plugin 'lsx_lpc10_format_fn': library 0x5ff7ddc8c390, entry point 0x716ee963f6f0
play DBUG formats: opening format plugin 'lsx_mat4_format_fn': library 0x5ff7ddc8caf0, entry point 0x716ee963a550
play DBUG formats: opening format plugin 'lsx_mat5_format_fn': library 0x5ff7ddc8d420, entry point 0x716ee9633550
play DBUG formats: opening format plugin 'lsx_paf_format_fn': library 0x5ff7ddc8dc10, entry point 0x716ee962c550
play DBUG formats: opening format plugin 'lsx_pvf_format_fn': library 0x5ff7ddc8e3f0, entry point 0x716ee9625550
play DBUG formats: opening format plugin 'lsx_sd2_format_fn': library 0x5ff7ddc8ecf0, entry point 0x716ee961e550
play DBUG formats: opening format plugin 'lsx_sndfile_format_fn': library 0x5ff7ddc8f710, entry point 0x716ee9617540
play DBUG formats: opening format plugin 'lsx_vorbis_format_fn': library 0x5ff7ddc8ff00, entry point 0x716ee9610110
play DBUG formats: opening format plugin 'lsx_w64_format_fn': library 0x5ff7ddc90c70, entry point 0x716ee9609550
play DBUG formats: opening format plugin 'lsx_wavpack_format_fn': library 0x5ff7ddc91450, entry point 0x716ee9602c30
play DBUG formats: opening format plugin 'lsx_xi_format_fn': library 0x5ff7ddc92200, entry point 0x716ee95fd550
play DBUG sox: Looking for a default device: trying format 'alsa'
play WARN alsa: can't encode 0-bit Unknown or not applicable
play DBUG alsa: selecting format 1: U8 (Unsigned 8 bit)
play: SoX v14.4.2
time: Sep 5 2023 16:21:31
issue: Ubuntu
uname:
compiler: gcc 11.4.0
arch: 1288 48 88 L OMP
play INFO formats: detected file format type 'wav'
play DBUG wav: Searching for 66 6d 74 20
play DBUG wav: WAV Chunk fmt
play DBUG wav: Searching for 64 61 74 61
play DBUG wav: WAV Chunk data
play DBUG wav: Reading Wave file: Microsoft PCM format, 1 channel, 16000 samp/sec
play DBUG wav: 32000 byte/sec, 2 block align, 16 bits/samp, 176640 data bytes
play DBUG wav: 88320 Samps/chans
play DBUG wav: Searching for 4c 49 53 54
Input File : 'recording.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:05.52 = 88320 samples ~ 414 CDDA sectors
File Size : 177k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
Endian Type : little
Reverse Nibbles: no
Reverse Bits : no
play DBUG alsa: selecting format 2: S16_LE (Signed 16 bit Little Endian)
Output File : 'default' (alsa)
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Sample Encoding: 16-bit Signed Integer PCM
Endian Type : little
Reverse Nibbles: no
Reverse Bits : no
play INFO sox: effects chain: input 16000Hz 1 channels (multi) 16 bits 00:00:05.52
play INFO sox: effects chain: silence 16000Hz 1 channels (multi) 16 bits unknown length
play INFO sox: effects chain: output 16000Hz 1 channels (multi) 16 bits unknown length
play DBUG sox: automatically entering interactive mode
play DBUG sox: start-up time = 0.284572
In:100% 00:00:05.52 [00:00:00.00] Out:86.1k [ | ] Hd:4.6 Clip:0
Done.
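For anyone who wants to try this without a speech recording, a synthetic file in the same format as the one in the log (16000 Hz, mono, 16-bit PCM) could be generated roughly as follows. This is only a sketch: the file name synthetic.wav, the function names, and the burst and gap lengths are made up for illustration, and it has not been confirmed that a file like this triggers the repeated audio.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # matches the 16000 samp/sec reported in the log


def make_test_samples(bursts=4, burst_s=0.5, gap_s=1.2, freq_hz=220.0):
    """Return 16-bit samples: tone bursts separated by digital silence.

    The burst and gap lengths are arbitrary illustration values; the gap
    is simply chosen to be longer than the 1.0 duration in the command.
    """
    burst_len = round(burst_s * SAMPLE_RATE)
    gap_len = round(gap_s * SAMPLE_RATE)
    samples = []
    for _ in range(bursts):
        for n in range(burst_len):
            phase = 2 * math.pi * freq_hz * n / SAMPLE_RATE
            samples.append(int(0.5 * 32767 * math.sin(phase)))  # half scale
        samples.extend([0] * gap_len)
    return samples


def write_wav(path, samples):
    """Write the samples as a mono 16-bit little-endian PCM WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))


if __name__ == "__main__":
    write_wav("synthetic.wav", make_test_samples())
```

The resulting file can then be used in place of recording.wav in the command above.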
https://codeberg.org/sox_ng/sox_ng/issues/258
Note that the 'duration' of the silence effect is interpreted by default as a number of samples, not seconds.
From what you describe, it does sound like there is a problem, but I wanted to make sure you're aware that you are checking for silences of one sample. For seconds you need to write 1s and -1s.