audio artefacts after segment and transcode

audio artefacts after segment and transcode

Philipp Hasenfratz
Hi everyone

I am transcoding large videos in parallel on a set of computers. I segment the input file at key frames (ffmpeg -i ... -f segment), transcode the parts using GNU parallel, and then recombine the parts into one output file (ffmpeg -f concat -i ...). This works well, but I had issues with the audio being out of sync with the video or having audio "artefacts". I worked around that by transcoding the audio separately, but I would prefer the more direct solution of transcoding both audio and video in one step.

When the segmented video parts are concatenated, I sometimes get audio "artefacts" (the audio drops out and then starts again), and I can't find a way to avoid them. I would very much appreciate some insight into why this happens or how these artefacts can be avoided. I could not find a solution using the aresample filter, the async and vsync options, or forcing a constant frame rate.

I prepared a simple working example (all files are stored in /tmp; I increased the number of I-frames in the input file so that the audio artefacts are more pronounced). I also included the ffprobe output for the input and output files after the example; maybe you can already spot a problem there without running it (I might just not be seeing the forest for the trees anymore...).

############

# the generated output file is at http://www.pentachoron.net/output.mov, in case you only want to hear/see the result

# step 1: download example file, input.avi

cd /tmp && wget http://www.pentachoron.net/input.avi

# step 2: create segments

ffmpeg -y -hide_banner -i /tmp/input.avi -f segment -segment_time 0.5 -reset_timestamps 1 -segment_list /tmp/input_part.list -segment_list_type ffconcat -r 25 -c:v copy -c:a copy -strict experimental -c:s copy -map v? -map a? -map s? /tmp/input_part_%06d.mp4

# step 3: process each "segment" (parts created in step 2)

for f in /tmp/input_part_*.mp4; do ffmpeg -y -hide_banner -i "$f" -c:v libx265 -map v? -map a? -map s? "/tmp/output_part_${f#/tmp/input_part_}"; done

# step 4: create a ffconcat file for the output file

for f in /tmp/output_part_*.mp4; do echo "file '$f'"; done >/tmp/output_part.list

# step 5: create output file

ffmpeg -y -hide_banner -safe 0 -f concat -i /tmp/output_part.list -c:v copy -c:a copy -c:s copy -map v? -map a? -map s? /tmp/output.mov

############

ffprobe of the input file:

user@sonne:/tmp$ ffprobe input.avi
ffprobe version 4.1.3 Copyright (c) 2007-2019 the FFmpeg developers
  built with gcc 8 (Debian 8.3.0-4)
  configuration: --disable-decoder=amrnb --disable-decoder=libopenjpeg --disable-libopencv --disable-outdev=sdl2 --disable-podpages --disable-sndio --disable-stripping --enable-libaom --enable-avfilter --enable-avresample --enable-gcrypt --enable-gnutls --enable-gpl --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libfdk-aac --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libilbc --enable-libkvazaar --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libx265 --enable-libxvid --enable-libzvbi --enable-nonfree --enable-opencl --enable-opengl --enable-postproc --enable-pthreads --enable-shared --enable-version3 --enable-libwebp --incdir=/usr/include/x86_64-linux-gnu --libdir=/usr/lib/x86_64-linux-gnu --prefix=/usr --toolchain=hardened --enable-frei0r --enable-chromaprint --enable-libx264 --enable-libiec61883 --enable-libdc1394 --enable-vaapi --enable-libmfx --disable-altivec --shlibdir=/usr/lib/x86_64-linux-gnu
  libavutil      56. 22.100 / 56. 22.100
  libavcodec     58. 35.100 / 58. 35.100
  libavformat    58. 20.100 / 58. 20.100
  libavdevice    58.  5.100 / 58.  5.100
  libavfilter     7. 40.101 /  7. 40.101
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  3.100 /  5.  3.100
  libswresample   3.  3.100 /  3.  3.100
  libpostproc    55.  3.100 / 55.  3.100
Input #0, avi, from 'input.avi':
  Metadata:
    IAS1            : Deutsch
    IAS2            : English
    encoder         : Lavf58.20.100
  Duration: 00:00:30.04, start: 0.000000, bitrate: 861 kb/s
    Stream #0:0: Video: mpeg4 (Simple Profile) (xvid / 0x64697678), yuv420p, 576x432 [SAR 1:1 DAR 4:3], 719 kb/s, 25 fps, 25 tbr, 25 tbn, 25 tbc
    Stream #0:1: Audio: mp3 (U[0][0][0] / 0x0055), 48000 Hz, stereo, fltp, 128 kb/s

ffprobe of the output file:

user@sonne:/tmp$ ffprobe output.mov
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'output.mov':
  Metadata:
    major_brand     : qt
    minor_version   : 512
    compatible_brands: qt
    encoder         : Lavf58.20.100
  Duration: 00:01:01.25, start: 0.000000, bitrate: 342 kb/s
    Stream #0:0(eng): Video: hevc (Main) (hev1 / 0x31766568), yuv420p(tv, progressive), 576x432 [SAR 1:1 DAR 4:3], 172 kb/s, 23.94 fps, 50 tbr, 12800 tbn, 25 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 162 kb/s (default)
    Metadata:
      handler_name    : SoundHandler

If you execute steps 1-5 above and watch /tmp/output.mov, you'll notice that the sound "clacks" (stops and then starts again). Do you have an explanation, or do you know how these audio artefacts can be avoided? Could it just be an issue with codec timebases, or is it because libx265 is using a variable frame rate (ffprobe reports an effective 23.94 fps for output.mov, while input.avi has a constant frame rate of 25 fps)? I would very much appreciate some help.

Thank you,

Philipp
_______________________________________________
ffmpeg-user mailing list
[hidden email]
https://ffmpeg.org/mailman/listinfo/ffmpeg-user

To unsubscribe, visit link above, or email
[hidden email] with subject "unsubscribe".

Re: audio artefacts after segment and transcode

kumowoon1025
> I am transcoding large videos in parallel on a set of computers. I segment the input file at key frames (ffmpeg -i ... -f segment), transcode the parts using GNU parallel, and then recombine the parts into one output file (ffmpeg -f concat -i ...). This works well, but I had issues with the audio being out of sync with the video or having audio "artefacts". I worked around that by transcoding the audio separately, but I would prefer the more direct solution of transcoding both audio and video in one step.

Transcoding video and audio in one step, after they have been segmented with stream copy, is probably more or less what causes this…
If you can live with doing only the encoding in the parallel step, you might get better results.
Of course, you'll then need to decode the whole file from start to finish first, but that's not as CPU intensive as encoding, and segmenting with stream copy is not reliable, as you've seen.

> Input #0, avi, from 'input.avi':
>  Metadata:
>    IAS1            : Deutsch
>    IAS2            : English
>    encoder         : Lavf58.20.100
>  Duration: 00:00:30.04, start: 0.000000, bitrate: 861 kb/s
>    Stream #0:0: Video: mpeg4 (Simple Profile) (xvid / 0x64697678), yuv420p, 576x432 [SAR 1:1 DAR 4:3], 719 kb/s, 25 fps, 25 tbr, 25 tbn, 25 tbc
>    Stream #0:1: Audio: mp3 (U[0][0][0] / 0x0055), 48000 Hz, stereo, fltp, 128 kb/s

yuv4 and pcm_f32le/be would be a good fit for this, I think. So instead of this,

> # step 2: create segments
>
> ffmpeg -y -hide_banner -i /tmp/input.avi -f segment -segment_time 0.5 -reset_timestamps 1 -segment_list /tmp/input_part.list -segment_list_type ffconcat -r 25 -c:v copy -c:a copy -strict experimental -c:s copy -map v? -map a? -map s? /tmp/input_part_%06d.mp4

try changing it to

ffmpeg -y -hide_banner -i /tmp/input.avi -f segment -segment_time 0.5 -segment_list /tmp/input_part.list -segment_list_type ffconcat -map 0? -c copy -c:v yuv4 -c:a pcm_f32le /tmp/input_part_%06d.mov

Segments should be longer, though; at 0.5 seconds the overhead is not insignificant. I’m guessing it was just for the demo?

> # step 3: process each "segment" (parts created in step 2)
>
> for f in /tmp/input_part_*.mp4; do ffmpeg -y -hide_banner -i "$f" -c:v libx265 -map v? -map a? -map s? "/tmp/output_part_${f#/tmp/input_part_}"; done

And encode the segments in your distributed/parallel setup.

ffmpeg -y -hide_banner -i /tmp/input_part_$INDEX.mov -c:v libx265 -c:a eac3 /tmp/output_part_$INDEX.mov
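
One way to hand these per-segment jobs to GNU parallel is to generate one ffmpeg command line per segment and pipe the lines to it. The following is only a sketch: gen_encode_cmds is a hypothetical helper name, the file naming follows the commands above, and paths are assumed to contain no single quotes. It only prints commands; nothing is encoded until the output is piped to parallel (or sh).

```shell
#!/bin/sh
# Sketch: print one encode command per segment found in a directory.
# gen_encode_cmds is a hypothetical helper name, not part of ffmpeg.
gen_encode_cmds() {
    dir=$1
    for in_file in "$dir"/input_part_*.mov; do
        # Skip the unexpanded pattern when no segment exists.
        [ -e "$in_file" ] || continue
        base=${in_file##*/}                              # input_part_000000.mov
        out_file="$dir/output_part_${base#input_part_}"  # output_part_000000.mov
        printf "ffmpeg -y -hide_banner -i '%s' -c:v libx265 -c:a eac3 '%s'\n" \
            "$in_file" "$out_file"
    done
}

# Usage (assumes GNU parallel is installed; it runs each input line as a job):
#   gen_encode_cmds /tmp | parallel
```

Printing the commands first also makes it easy to inspect them before anything is run.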

> # step 4: create a ffconcat file for the output file
>
> for f in /tmp/output_part_*.mp4; do echo "file '$f'"; done >/tmp/output_part.list

Having "ffconcat version 1.0" as the first line of the ffconcat file seems to help; you should probably just use the generated ffconcat segment list as a template:

sed 's/input/output/g' /tmp/input_part.list > /tmp/output_part.list
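
If the generated segment list is not at hand, the output list can also be written from scratch, including the version header. A minimal sketch, assuming zero-padded part names (so that glob order equals playback order) and paths without single quotes; write_concat_list is a hypothetical helper name. It overwrites the list rather than appending to it, so repeated runs do not duplicate entries.

```shell
#!/bin/sh
# Sketch: write an ffconcat list with the version header.
# Usage: write_concat_list LIST FILE...
# write_concat_list is a hypothetical helper name.
write_concat_list() {
    list=$1
    shift
    {
        echo "ffconcat version 1.0"
        for part in "$@"; do
            echo "file '$part'"
        done
    } > "$list"   # '>' truncates, so a re-run replaces the old list
}

# Example call for the naming used in this thread:
#   write_concat_list /tmp/output_part.list /tmp/output_part_*.mov
```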

> # step 5: create output file
>
> ffmpeg -y -hide_banner -safe 0 -f concat -i /tmp/output_part.list -c:v copy -c:a copy -c:s copy -map v? -map a? -map s? /tmp/output.mov

And putting it all together should be the same.

> Do you have an explanation, or do you know how these audio artefacts can be avoided? Could it just be an issue with codec timebases, or is it because libx265 is using a variable frame rate (ffprobe reports an effective 23.94 fps for output.mov, while input.avi has a constant frame rate of 25 fps)? I would very much appreciate some help.

The timebase thing could make sense: something like rounding issues when segmenting, or timestamps being unaligned? But I don’t think x265 does variable frame rates (not sure); regardless, in an mp4 it’s most definitely constant. Set the frame rate during the encoding step if that’s important; for the “normal” rates you can use abbreviations (ntsc, pal, film, ntsc-film, etc.) to pass the exact rate instead of rounding the decimals.

25fps is pal
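
To illustrate why the abbreviations avoid rounding: each one stands for an exact fraction. The small lookup below is only a sketch covering the four names mentioned above (rate_for is a hypothetical helper, not an ffmpeg command); in practice the name is passed directly, as in -r pal.

```shell
#!/bin/sh
# Sketch: exact fractions behind some ffmpeg frame-rate abbreviations.
# rate_for is a hypothetical helper name.
rate_for() {
    case $1 in
        pal)       echo "25/1" ;;
        film)      echo "24/1" ;;
        ntsc)      echo "30000/1001" ;;
        ntsc-film) echo "24000/1001" ;;
        *)         echo "unknown abbreviation: $1" >&2; return 1 ;;
    esac
}

# Print each fraction next to its (rounded) decimal value.
for name in ntsc pal film ntsc-film; do
    rate=$(rate_for "$name")
    awk -v n="$name" -v r="$rate" \
        'BEGIN { split(r, p, "/"); printf "%-9s = %-10s ~ %.5f fps\n", n, r, p[1] / p[2] }'
done
```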

Re: audio artefacts after segment and transcode

Philipp Hasenfratz
Hi Ted

>> I am transcoding large videos in parallel on a set of computers. I segment the input file at key frames (ffmpeg -i ... -f segment), transcode the parts using GNU parallel, and then recombine the parts into one output file (ffmpeg -f concat -i ...). This works well, but I had issues with the audio being out of sync with the video or having audio "artefacts". I worked around that by transcoding the audio separately, but I would prefer the more direct solution of transcoding both audio and video in one step.
>
> Transcoding video and audio in one step, after they have been segmented with stream copy, is probably more or less what causes this…

After some tests yesterday in which I applied your suggestions, and after the tests I had conducted before my initial post, here are my thoughts on this:

I think that the segmentation (ffmpeg -i ... -f segment ...) itself does not change anything; it just splits up the input file (keeping timestamps, copying data). The problem arises afterwards, when the parts/segments are processed. Somehow the timestamps get out of sync, and I have a feeling that this is a consequence of the segmentation (to be precise: the muxer and encoder do not have the whole input file, only a part of it).

Each part is demuxed, decoded, encoded and muxed again, and somewhere in this process the timestamps are modified (either by the (de)muxer or by the decoder or encoder). This might be because the output container or stream needs a different tbn (timebase), because the stream changes from a constant to a variable frame rate, or because the output container has different timebase requirements than the input container specified.
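
One way to narrow down where the timestamps drift is to compare the audio and video stream durations of every transcoded part. The sketch below assumes ffprobe is installed and that the container reports a per-stream duration (mp4/mov normally do); stream_duration and drift_ms are hypothetical helper names. A part whose audio is noticeably longer or shorter than its video is a candidate for a click at the join.

```shell
#!/bin/sh
# Sketch: compare audio and video duration of one transcoded part.
# stream_duration and drift_ms are hypothetical helper names.

# stream_duration FILE SPEC -> duration in seconds of the selected stream
# (needs ffprobe; SPEC is a stream specifier such as a:0 or v:0).
stream_duration() {
    ffprobe -v error -select_streams "$2" -show_entries stream=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# drift_ms AUDIO_SECONDS VIDEO_SECONDS -> audio minus video, rounded to ms
drift_ms() {
    awk -v a="$1" -v v="$2" 'BEGIN { printf "%.0f\n", (a - v) * 1000 }'
}

# Usage over the parts from step 3:
#   for f in /tmp/output_part_*.mp4; do
#       a=$(stream_duration "$f" a:0)
#       v=$(stream_duration "$f" v:0)
#       echo "$f: audio - video = $(drift_ms "$a" "$v") ms"
#   done
```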

If we use the whole file as input, as in ffmpeg -i input.avi -c:v libx265 -c:a aac output.mov, ffmpeg/libav takes good care of this and the muxing/encoding works like a charm. But when the input is first segmented and the parts are encoded/muxed separately, ffmpeg/libav does not have the full picture and tries to fill in gaps. One sign that this is happening is warnings like "[mov @ 0x561bc46a3d80] Non-monotonous DTS in output stream 0:1; previous: 121611520, current: 121611024; changing to 121611521. This may result in incorrect timestamps in the output file.". Stream 0:1 is audio, and judging by the timestamps, the audio stream seems to be "behind". ffmpeg then does the only thing it can (lacking the whole picture because of the segmentation): it corrects the timestamp to the best known value. Unfortunately, this must result in an audio gap, creating the audio artefacts.
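
One mechanism that would be consistent with this warning (an assumption, not something confirmed in this thread) is that the audio encoder works in fixed-size frames, so each independently encoded part can only hold a whole number of frames. Assuming AAC-LC frames of 1024 samples, the 48000 Hz sample rate from the ffprobe output, and a nominal 0.5 s segment as in the demo, the arithmetic can be sketched as:

```shell
#!/bin/sh
# Sketch: how far whole audio frames overshoot a nominal segment length.
# Assumptions: AAC-LC with 1024 samples per frame, 48000 Hz audio,
# and a segment of exactly 0.5 s (real segments are cut at key frames).
sample_rate=48000
frame_samples=1024
segment_samples=$(( sample_rate / 2 ))   # samples in 0.5 s

# Whole frames needed to cover the segment (integer division, rounded up).
frames=$(( (segment_samples + frame_samples - 1) / frame_samples ))

# Samples by which those frames extend past the segment end.
excess_samples=$(( frames * frame_samples - segment_samples ))
excess_ms=$(( excess_samples * 1000 / sample_rate ))

echo "frames per segment: $frames"
echo "overshoot: $excess_samples samples (~$excess_ms ms)"
```

Under these assumptions every part carries a few milliseconds more audio than video, so the audio of the next part would start before the previous one has ended. Encoder priming samples could add to this, which the sketch does not model.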

The question is: why is this happening? If libx265 produces a constant frame rate of 25 and the original video has a constant frame rate of 25, why can the audio lag behind (why don't we have enough audio samples)? I can currently only explain this by the libx265 encoder, or maybe the mov muxer, somehow changing the frame rate to 24.542 (as mediainfo/ffprobe tell me).

Or, another question that might solve the issue: how can I tell ffmpeg/libav to keep the timestamps for as long as possible ("timestamp passthrough"), so that the final ffmpeg -f concat -i XYZ call still has the original timestamps and can see the whole picture of the original video again?

> If you can live with doing only the encoding in the parallel step, you might get better results.
> Of course, you'll then need to decode the whole file from start to finish first, but that's not as CPU intensive as encoding, and segmenting with stream copy is not reliable, as you've seen.

Thank you for the suggestion! Yes, it makes sense that a "pre-encoding" (into yuv4, rawvideo or similar) in the segmentation phase might improve the situation, and I ran through your suggestion (using yuv4 and pcm in the pre-segmentation). The result is better, but I can still hear some artefacts (less pronounced, but still there). The reason I would prefer to avoid this pre-segmentation into a "raw format" is I/O: a lot of videos, such as timelapses or raw video captures from a camera, are 2K or larger, so pre-transcoding them into yuv4 or rawvideo produces enormous amounts of data. The process would easily become I/O bound, which would wipe out the performance gain of a multi-computer setup for transcoding a video quickly. Still, I agree with you that it's only a matter of decoding (and converting to a raw stream), which is far less CPU intensive than the encoding. So this would be a workable approach for "small resolution" videos.
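
To put a rough number on the I/O concern: uncompressed 4:2:0 video at 8 bits per sample needs 1.5 bytes per pixel. The sketch below compares the 576x432 demo clip with a 2048x1080 frame at 25 fps; the 2K resolution is an assumed example (not a figure from my setup), and raw_bytes_per_second is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: data rate of uncompressed 8-bit 4:2:0 video (1.5 bytes per pixel).
# raw_bytes_per_second is a hypothetical helper name.
# Usage: raw_bytes_per_second WIDTH HEIGHT FPS
raw_bytes_per_second() {
    echo $(( $1 * $2 * 3 / 2 * $3 ))
}

demo=$(raw_bytes_per_second 576 432 25)      # resolution of input.avi
two_k=$(raw_bytes_per_second 2048 1080 25)   # assumed 2K example

echo "576x432   @ 25 fps: $demo bytes/s"
echo "2048x1080 @ 25 fps: $two_k bytes/s"
```

For the assumed 2K case this comes to roughly 83 MB per second of video, which has to be written once while segmenting and read again by the encoding machines.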

>> # step 2: create segments
>>
>> ffmpeg -y -hide_banner -i /tmp/input.avi -f segment -segment_time 0.5 -reset_timestamps 1 -segment_list /tmp/input_part.list -segment_list_type ffconcat -r 25 -c:v copy -c:a copy -strict experimental -c:s copy -map v? -map a? -map s? /tmp/input_part_%06d.mp4
>
> try changing it to
>
> ffmpeg -y -hide_banner -i /tmp/input.avi -f segment -segment_time 0.5 -segment_list /tmp/input_part.list -segment_list_type ffconcat -map 0? -c copy -c:v yuv4 -c:a pcm_f32le /tmp/input_part_%06d.mov
>
> Segments should be longer, though; at 0.5 seconds the overhead is not insignificant. I’m guessing it was just for the demo?

Segment sizes: yes, exactly, I only used a short segment_time of 0.5 for the demo, so that the audio artefacts are more pronounced. I normally choose a value between 10 and 30 seconds (depending on the GOP size / key-frame interval).

You used -c:a pcm_f32le. In the test setup I presented, I forgot to specify an audio codec, sorry for that. I normally specify one as well.

>> # step 4: create a ffconcat file for the output file
>>
>> for f in /tmp/output_part_*.mp4; do echo "file '$f'"; done >/tmp/output_part.list
>
> Having "ffconcat version 1.0" as the first line of the ffconcat file seems to help; you should probably just use the generated ffconcat segment list as a template:
>
> sed 's/input/output/g' /tmp/input_part.list > /tmp/output_part.list

Right, that's the better solution.

>> Do you have an explanation, or do you know how these audio artefacts can be avoided? Could it just be an issue with codec timebases, or is it because libx265 is using a variable frame rate (ffprobe reports an effective 23.94 fps for output.mov, while input.avi has a constant frame rate of 25 fps)? I would very much appreciate some help.
>
> The timebase thing could make sense: something like rounding issues when segmenting, or timestamps being unaligned? But I don’t think x265 does variable frame rates (not sure); regardless, in an mp4 it’s most definitely constant. Set the frame rate during the encoding step if that’s important; for the “normal” rates you can use abbreviations (ntsc, pal, film, ntsc-film, etc.) to pass the exact rate instead of rounding the decimals.

Right, the -r must be in the encoding step. It doesn't really make sense in combination with -c:v copy, of course...

Regarding the variable frame rate for x265:

mediainfo ./output.mov # and ./output.mp4
...
Frame rate mode                          : Variable
Frame rate                               : 24.542 FPS
Minimum frame rate                       : 8.333 FPS
Maximum frame rate                       : 25.000 FPS
Original frame rate                      : 25.000 FPS
...

Both .mp4 and .mov show a frame rate of 24.542 (and a minimum and maximum that differ), which is why I was referring to a variable frame rate.
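
Converting the reported rates into frame durations makes the numbers easier to read. This is only a sketch of the arithmetic (frame_ms is a hypothetical helper name), and reading the long frame as a gap at a segment join is an assumption, not something I have verified.

```shell
#!/bin/sh
# Sketch: convert a frame rate (fps) into a frame duration in milliseconds.
# frame_ms is a hypothetical helper name.
frame_ms() {
    awk -v fps="$1" 'BEGIN { printf "%.1f\n", 1000 / fps }'
}

max_rate_ms=$(frame_ms 25.000)   # duration of a regular frame
min_rate_ms=$(frame_ms 8.333)    # duration of the longest frame

echo "25.000 fps -> $max_rate_ms ms per frame"
echo " 8.333 fps -> $min_rate_ms ms per frame"
```

The minimum rate corresponds to a frame that is shown for about three regular frame periods, which would also pull the average below 25 fps even if every other frame is spaced at exactly 40 ms.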

I appreciate your reply.

Philipp