info.md
Hou-Ning Hu edited this page Mar 26, 2017
Because librosa uses an FFmpeg backend while torch audio uses a SoX backend, the two read files that carry a start offset slightly differently. Converting all mp3 files with the following command does the trick:
sox input.mp3 output.mp3 trim 0
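A minimal sketch of running that command over a whole folder from Python. It assumes SoX is on the PATH and was built with mp3 support. The helper names (`sox_trim_command`, `convert_all`) and the input/output folder layout are hypothetical. By default it only returns the commands, so they can be checked before anything runs.

```python
import subprocess
from pathlib import Path


def sox_trim_command(src, dst):
    """Build the `sox <in> <out> trim 0` call from the note as an argv list."""
    return ["sox", str(src), str(dst), "trim", "0"]


def convert_all(src_dir, dst_dir, dry_run=True):
    """Re-encode every *.mp3 in src_dir into dst_dir with SoX.

    With dry_run=True (the default) the commands are only returned, not run,
    so this can be tried without SoX installed.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    commands = [
        sox_trim_command(mp3, dst_dir / mp3.name)
        for mp3 in sorted(src_dir.glob("*.mp3"))
    ]
    if not dry_run:
        dst_dir.mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            # check=True raises CalledProcessError if SoX fails on a file
            subprocess.run(cmd, check=True)
    return commands
```

Writing to a separate folder keeps the original mp3 files around for the before/after comparisons below.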
Since FFmpeg seems to count the start offset in, librosa cannot tell which start point we want, and a small shift causes a large difference in amplitude. Converting the files is only a temporary cure: looking into the fine detail, there is still a gap between the values from librosa and from the Lua audio version.
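Here is a toy illustration of why a small shift matters so much. It uses a pure 1 kHz tone, not the fireworks audio. Delaying a signal by a single sample at 22050 Hz already gives pointwise differences of about 0.28 on a ±1 scale. That is the same order of magnitude as the Max Diff values in the logs below. The tone and the helper names are assumptions made only for this example.

```python
import math


def sine(freq_hz, sample_rate, n_samples, delay=0):
    """Return n_samples of a unit-amplitude sine, delayed by `delay` whole samples."""
    return [
        math.sin(2 * math.pi * freq_hz * (n - delay) / sample_rate)
        for n in range(n_samples)
    ]


def max_abs_diff(a, b):
    """Largest pointwise absolute difference over the common length."""
    return max(abs(x - y) for x, y in zip(a, b))


sr = 22050
reference = sine(1000, sr, sr)         # one second of a 1 kHz tone
shifted = sine(1000, sr, sr, delay=1)  # the same tone, one sample late
print(round(max_abs_diff(reference, shifted), 3))  # about 0.284
```

For a sine, the one-sample error peaks at 2·sin(π·f/sr). Higher-frequency content is therefore hit even harder by a misalignment.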
Input #0, mp3, from 'fireworks_shift_1274_6.mp3':
Metadata:
encoder : Lavf56.4.101
Duration: 00:00:04.08, start: 0.050113, bitrate: 64 kb/s
Stream #0:0: Audio: mp3, 22050 Hz, stereo, s16p, 64 kb/s
S**tty result.
rosa.max=0.999969482422, rosa.min=-1.0, lua.max=1.0, lua.min=-1.0
Path data/fireworks_shift_1274_6.mp3: rosa.shape=(89856,), lua.shape=(89854,)
Librosa:
[ 0. 0. 0. ..., 0.19934082 0.22732544
0.21987915]
Torch:
[ 0. 0. 0. ..., 0.04422661 0.04547431
0.16926192]
Round to 4 decimals
Total Diff: 1865.41
Avg Diff: 0.0207604
Max Diff: 0.3136
Min Diff: -0.3041
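The note does not say exactly how these numbers were computed. Here is one plausible reading:

- truncate both signals to the shorter length;
- round to 4 decimals;
- take rosa minus lua;
- report the sum and mean of the absolute differences, plus the signed extremes.

The truncation part matches the log: Avg Diff equals Total Diff divided by the shorter length (1865.41 / 89854 ≈ 0.02076). Using absolute values for Total Diff and the sign convention are assumptions, and `diff_stats` is a hypothetical name.

```python
def diff_stats(rosa, lua, decimals=4):
    """Hypothetical reconstruction of the Total/Avg/Max/Min Diff lines."""
    # zip stops at the shorter signal, i.e. both are truncated to a common length
    diffs = [round(a, decimals) - round(b, decimals) for a, b in zip(rosa, lua)]
    if not diffs:
        raise ValueError("nothing to compare")
    total = sum(abs(d) for d in diffs)  # assumption: absolute differences
    return {
        "total": total,
        "avg": total / len(diffs),
        "max": max(diffs),
        "min": min(diffs),
    }


stats = diff_stats([0.1, 0.2, 0.3, 0.9], [0.1, 0.25, 0.1])
```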
Input #0, mp3, from 'fire.mp3':
Metadata:
encoder : Lavf57.40.100
Duration: 00:00:04.10, start: 0.050113, bitrate: 161 kb/s
Stream #0:0: Audio: mp3, 22050 Hz, stereo, s16p, 160 kb/s
Seems to have helped.
rosa.max=0.999969482422, rosa.min=-1.0, lua.max=0.956068873405, lua.min=-1.0
Path data/fire.mp3: rosa.shape=(90432,), lua.shape=(90432,)
Librosa:
[ 0. 0. 0. ..., -0.00106812 -0.0007019
0.00125122]
Torch:
[ 0. 0. 0. ..., -0.00245564 -0.00034991
0.00209353]
Round to 4 decimals
Total Diff: 1780.92
Avg Diff: 0.0196935
Max Diff: 0.2994
Min Diff: -0.2904
Input #0, mp3, from 'fire2.mp3':
Metadata:
encoder : Lavf57.40.100
Duration: 00:00:04.10, start: 0.050113, bitrate: 64 kb/s
Stream #0:0: Audio: mp3, 22050 Hz, mono, s16p, 64 kb/s
Oh, mono mode is closer.
rosa.max=0.960571289062, rosa.min=-1.0, lua.max=0.943222105503, lua.min=-0.981544613838
Path data/fire2.mp3: rosa.shape=(90432,), lua.shape=(90432,)
Librosa:
[ 0. 0. 0. ..., 0.00140381 0.00143433
0.0027771 ]
Torch:
[ 0. 0. 0. ..., 0.00152187 0.00204326
0.00288566]
Round to 4 decimals
Total Diff: 1671.49
Avg Diff: 0.0184834
Max Diff: 0.2857
Min Diff: -0.2775
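"Mono mode is closer" is only an observation. One guess is that the two readers downmix or scale stereo differently. For reference, librosa.load averages the channels by default (mono=True). Also, rosa.max=0.999969482422 matches 32767/32768 to the printed precision, i.e. int16 samples divided by 32768, whereas lua.max reaches 1.0.

Below is a minimal sketch of that average-and-scale path on raw interleaved int16 samples, assuming 2 channels. How the Lua side handles stereo here is not known, and the function name is hypothetical.

```python
def interleaved_int16_to_mono(samples, channels=2):
    """Average interleaved int16 frames to mono floats in [-1.0, 1.0).

    Dividing by 32768 maps the int16 maximum 32767 to 0.999969482421875,
    the rosa.max value printed in these logs.
    """
    if len(samples) % channels:
        raise ValueError("sample count is not a multiple of the channel count")
    mono = []
    for start in range(0, len(samples), channels):
        frame = samples[start:start + channels]
        mono.append(sum(frame) / channels / 32768.0)  # average, then scale
    return mono
```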
Input #0, mp3, from 'fire3.mp3':
Metadata:
encoder : Lavf57.40.100
Duration: 00:00:04.08, start: 0.025057, bitrate: 64 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, mono, s16p, 64 kb/s
Sample rate is not helping.
rosa.max=0.962280273438, rosa.min=-0.998626708984, lua.max=0.975194513798, lua.min=-0.960788428783
Path data/fire3.mp3: rosa.shape=(179712,), lua.shape=(179712,)
Librosa:
[ 0.00000000e+00 0.00000000e+00 0.00000000e+00 ..., 0.00000000e+00
0.00000000e+00 3.05175781e-05]
Torch:
[ 0.00000000e+00 0.00000000e+00 0.00000000e+00 ..., -9.49576497e-06
7.80075788e-06 1.77286565e-05]
Round to 4 decimals
Total Diff: 2531.38
Avg Diff: 0.0140858
Max Diff: 0.2518
Min Diff: -0.2508
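Part of why the sample rate does not help may be visible in the headers themselves. The start: field works out to the same whole number of samples at both rates:

- 0.050113 s × 22050 Hz ≈ 1105
- 0.025057 s × 44100 Hz ≈ 1105

So doubling the rate halves the start time but keeps the shift at about 1105 samples. (fire4.mp3's 0.023991 s works out to about 529 samples, and fire5.mp3 reports 0.)

Below is a small sketch for pulling that field out of an ffmpeg/ffprobe Duration line and converting it. The regex and the function names are assumptions, not part of the original scripts.

```python
import re

# Matches e.g. "Duration: 00:00:04.08, start: 0.025057, bitrate: 64 kb/s"
START_RE = re.compile(r"start:\s*(-?[0-9]+(?:\.[0-9]+)?)")


def start_offset_seconds(duration_line):
    """Return the `start:` value (seconds) from an ffmpeg/ffprobe Duration line."""
    match = START_RE.search(duration_line)
    if match is None:
        raise ValueError("no 'start:' field found")
    return float(match.group(1))


def start_offset_samples(seconds, sample_rate):
    """Convert a start offset in seconds to the nearest whole number of samples."""
    return round(seconds * sample_rate)


for line, sr in [
    ("Duration: 00:00:04.08, start: 0.050113, bitrate: 64 kb/s", 22050),
    ("Duration: 00:00:04.08, start: 0.025057, bitrate: 64 kb/s", 44100),
    ("Duration: 00:00:04.02, start: 0.023991, bitrate: 64 kb/s", 22050),
]:
    print(start_offset_samples(start_offset_seconds(line), sr))  # prints 1105, 1105, then 529
```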
Input #0, mp3, from 'fire4.mp3':
Metadata:
encoder : Lavf57.40.100
Duration: 00:00:04.02, start: 0.023991, bitrate: 64 kb/s
Stream #0:0: Audio: mp3, 22050 Hz, stereo, s16p, 64 kb/s
What is wrong with ffmpeg?
rosa.max=0.999969482422, rosa.min=-1.0, lua.max=1.0, lua.min=-1.0
Path data/fire4.mp3: rosa.shape=(88704,), lua.shape=(88704,)
Librosa:
[ 0. 0. 0. ..., 0.19934082 0.22732544
0.21987915]
Torch:
[ 0. 0. 0. ..., 0.16926192 0.21725897
0.22740155]
Round to 4 decimals
Total Diff: 1865.31
Avg Diff: 0.0210285
Max Diff: 0.3136
Min Diff: -0.3041
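One way to test the shift hypothesis directly is to search for the lag that best lines up the two decoded arrays. Below is a brute-force sketch. It costs O(max_lag × N) in pure Python, so use an excerpt of a few thousand samples, or FFT-based cross-correlation for whole files.

The function name, the mean-absolute-difference score and the min_overlap guard are all choices made for this example. The leading zeros in both outputs can match at many lags, so the excerpt should include some non-silent audio.

```python
def best_lag(reference, other, max_lag=1200, min_overlap=100):
    """Lag (in samples) that best aligns `other` with `reference`.

    A positive lag means `other` is late: other[n + lag] ~ reference[n].
    Score: mean absolute difference over the overlapping samples (lower is better).
    """
    best, best_score = 0, float("inf")
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            pairs = zip(reference, other[lag:])
        else:
            pairs = zip(reference[-lag:], other)
        diffs = [abs(a - b) for a, b in pairs]
        if len(diffs) < min_overlap:
            continue  # too little overlap for a meaningful score
        score = sum(diffs) / len(diffs)
        if score < best_score:
            best, best_score = lag, score
    return best
```

If the decoders differ only by an offset, the best lag should land near the offset implied by the start: field, and the Diff numbers at that lag should drop sharply.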
Input #0, mp3, from 'fire5.mp3':
Duration: 00:00:04.13, start: 0.000000, bitrate: 64 kb/s
Stream #0:0: Audio: mp3, 22050 Hz, stereo, s16p, 64 kb/s
OH MY! I guess this is it!
rosa.max=0.999969482422, rosa.min=-1.0, lua.max=1.0, lua.min=-1.0
Path data/fire5.mp3: rosa.shape=(91008,), lua.shape=(90432,)
Librosa:
[ 0. 0. 0. ..., 0.00692749 -0.00219727
-0.00576782]
Torch:
[ 0. 0. 0. ..., -0.25617316 -0.2670404
-0.26867551]
Round to 4 decimals
Total Diff: 0.901599
Avg Diff: 9.96991e-06
Max Diff: 0.000100017
Min Diff: -0.000100017
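Here is a small pass/fail check in the spirit of this last result, assuming the same rounding to 4 decimals. A Max/Min Diff of ±0.000100017 is one unit in the fourth decimal (plus float error), so the two outputs agree up to rounding. The name and the 0.1 % slack are hypothetical.

zip compares only the common length. The 576-sample shape mismatch in this run (91008 vs 90432) is therefore not caught by this check.

```python
def agree_to_decimals(a, b, decimals=4, slack=1.001):
    """True if a and b differ by at most one unit in the last kept decimal
    after rounding, compared over their common length only."""
    unit = 10.0 ** -decimals
    worst = max(
        (abs(round(x, decimals) - round(y, decimals)) for x, y in zip(a, b)),
        default=0.0,
    )
    # slack absorbs float error, e.g. the 0.000100017 seen in the log
    return worst <= unit * slack
```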