Multiple FFMpeg-Cuda-HLS-Transcoding Instances -> Deadlock Behavior


halise
Hello,

I'm having a hard time running multiple HLS transcoding instances on
NVIDIA hardware. The details are in the forum post linked below; here is
a short summary. Every time, without exception, when I run multiple
instances of ffmpeg with CUDA support, all instances except one stop
working. It looks like a deadlock. Because this does not happen when I
use libx264, I asked NVIDIA how to fix it. I made a YouTube video that
shows what is happening:
https://www.youtube.com/watch?v=QOaf7v_Gwwk

I tried every recent NVIDIA driver and the behavior was the same. Then I
rebuilt ffmpeg with gcc 8; still the same behavior. I will try
rebuilding ffmpeg with gcc 7.

https://forums.developer.nvidia.com/t/multiple-ffmpeg-cuda-hls-transcoding-instances-deadlock-behavior/121461
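
One way to confirm which instances have stalled, rather than watching the console, is to check whether each instance's media playlist is still being rewritten. The following is only a sketch and not part of the original setup: it assumes GNU coreutils (`stat -c %Y`), and the function name, the playlist paths, and the age threshold are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report HLS playlists that have not been modified recently.
# Assumptions: GNU stat is available; paths and threshold are placeholders.

# stale_playlists MAX_AGE_SECONDS FILE...
# Prints every FILE whose modification time is older than MAX_AGE_SECONDS.
# A file that cannot be read at all is also reported.
stale_playlists() {
    local max_age=$1 now mtime f
    shift
    now=$(date +%s)
    for f in "$@"; do
        mtime=$(stat -c %Y "$f" 2>/dev/null) || { echo "$f"; continue; }
        if (( now - mtime > max_age )); then
            echo "$f"
        fi
    done
}
```

With -hls_time 4, a threshold of a few segment durations seems reasonable, for example `stale_playlists 12 /path/to/instance*/playlist.m3u8` (the path is a placeholder). Any playlist printed belongs to an instance that has stopped producing segments.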


How I built ffmpeg:


./configure --enable-libx264 --enable-cuvid --enable-gpl --enable-libnpp \
    --enable-cuda --disable-cuda-sdk --enable-nonfree \
    --extra-cflags=-I/usr/local/cuda-10.2/include \
    --extra-ldflags=-L/usr/local/cuda-10.2/lib64 && make -j 8


How I'm executing ffmpeg:


exec $FFMPEG_PATH
-vsync 0
-loglevel debug
-threads:v 1
-threads:a 1
-filter_threads 1
-thread_queue_size 1024
-hwaccel cuda
-hwaccel_device 0
-hwaccel_output_format cuda
-deint adaptive
-i "udp://$MULTICAST_ADDRESS:$PORT"
-filter_complex
"[v:0]split=4[temp1][temp2][source][temp3];[temp1]scale_npp=858:480[480p];[temp2]scale_npp=640:360[wide360p];[temp3]scale_npp=426:240[240p]"
-g 50 -sc_threshold 0
-map [wide360p]
-preset medium
-c:v:0 h264_nvenc
-preset fast
-profile:v baseline
-b:v:0 600k
-bufsize 24k
-minrate 400k -maxrate 600k
-map [480p]
-c:v:1 h264_nvenc
-preset medium
-profile:v baseline
-b:v:1 1000k
-bufsize 56k
-minrate 800k -maxrate 1600k
-preset fast
-map [source]
-c:v:2 h264_nvenc
-preset medium
-profile:v baseline
-preset fast
-b:v:2 3600k
-minrate 2000k -maxrate 4000k
-bufsize 144k
-map [240p]
-c:v:3 h264_nvenc
-preset medium
-profile:v baseline
-zerolatency 1
-preset fast
-b:v:3 400k
-bufsize 16k
-map a:0
-c:a aac
-b:a 128k
-ac 2
-map a:1
-c:a aac
-b:a 96k
-ac 2
-f hls
-hls_time 4
-hls_list_size 0
-hls_flags append_list
-hls_allow_cache 0
-hls_playlist_type event
-master_pl_name $MASTER_PLAYLIST_NAME
-var_stream_map "a:0,agroup:audio,default:yes,language:DEU
a:1,agroup:audio,language:FR v:0,agroup:audio v:1,agroup:audio,
v:2,agroup:audio, v:3,agroup:audio"
$SEGMENT_FILE_NAME
$MEDIA_PLAYLIST_PREFIX


Please help me to fix this behavior.


Best regards

Marco Kittel


_______________________________________________
ffmpeg-user mailing list
[hidden email]
https://ffmpeg.org/mailman/listinfo/ffmpeg-user

To unsubscribe, visit link above, or email
[hidden email] with subject "unsubscribe".

Re: Multiple FFMpeg-Cuda-HLS-Transcoding Instances -> Deadlock Behavior

Brainiarc7
On Mon, 4 May 2020 at 11:22, Marco Kittel <[hidden email]> wrote:

> [...]

Do me a favor:

Run:

dmesg | grep NVRM

Then post the output back, captured while the failing command(s) are
running.

Run your command as shown below; note the edits:

exec $FFMPEG_PATH
-vsync 0
-loglevel debug
-threads 1
-hwaccel cuda
-hwaccel_device 0
-extra_hw_frames 3
-hwaccel_output_format cuda
-i "udp://$MULTICAST_ADDRESS:$PORT"
-filter_complex
"[v:0]yadif_cuda=0:-1:0,split=4[temp1][temp2][source][temp3];[temp1]scale_npp=858:480[480p];[temp2]scale_npp=640:360[wide360p];[temp3]scale_npp=426:240[240p]"
-g 50
-map [wide360p]
-preset llhp
-c:v:0 h264_nvenc
-profile:v baseline
-b:v:0 600k
-bufsize 24k
-minrate 400k -maxrate 600k
-map [480p]
-c:v:1 h264_nvenc
-preset llhp
-profile:v baseline
-b:v:1 1000k
-bufsize 56k
-minrate 800k -maxrate 1600k
-map [source]
-c:v:2 h264_nvenc
-preset llhp
-profile:v baseline
-b:v:2 3600k
-minrate 2000k -maxrate 4000k
-bufsize 144k
-map [240p]
-c:v:3 h264_nvenc
-preset llhp
-profile:v baseline
-zerolatency 1
-b:v:3 400k
-bufsize 16k
-map a:0
-c:a aac
-b:a 128k
-ac 2
-map a:1
-c:a aac
-b:a 96k
-ac 2
-f hls
-hls_time 4
-hls_list_size 0
-hls_flags append_list
-hls_allow_cache 0
-hls_playlist_type event
-master_pl_name $MASTER_PLAYLIST_NAME
-var_stream_map "a:0,agroup:audio,default:yes,language:DEU
a:1,agroup:audio,language:FR v:0,agroup:audio v:1,agroup:audio,
v:2,agroup:audio, v:3,agroup:audio"
$SEGMENT_FILE_NAME
$MEDIA_PLAYLIST_PREFIX

Remember to include full output from the console.

I removed some superfluous options (see the presets above) and added
-extra_hw_frames 3, which should help with stability, especially with
hardware-accelerated deinterlacing. The filter in use is now yadif_cuda.
For documentation on yadif_cuda, see
https://ffmpeg.org/ffmpeg-filters.html#yadif_005fcuda
You may also want to tune the rate control in NVENC (via the private codec
option -rc:v).
For usage, see ffmpeg -h encoder=h264_nvenc
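
As an illustration of the -rc:v suggestion, here is a small sketch that prints per-stream options requesting constant-bitrate rate control. The function name is hypothetical, and the choice of cbr with -maxrate set equal to the target bitrate is an assumption for the example, not a recommendation from this thread; check the modes your build lists under ffmpeg -h encoder=h264_nvenc.

```shell
#!/usr/bin/env bash
# Sketch: build per-stream h264_nvenc options with explicit rate control.
# Assumption: the build's h264_nvenc encoder accepts "-rc cbr".

# nvenc_cbr_args STREAM_INDEX BITRATE
# Prints the options for one video output stream on a single line.
nvenc_cbr_args() {
    local idx=$1 rate=$2
    local args=(-c:v:"$idx" h264_nvenc -rc:v:"$idx" cbr
                -b:v:"$idx" "$rate" -maxrate:v:"$idx" "$rate")
    printf '%s\n' "${args[*]}"
}
```

For the 360p rendition in the command above this would be `nvenc_cbr_args 0 600k`, whose output can be pasted in place of the corresponding -c:v:0 and -b:v:0 lines.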

The output I requested above from dmesg will confirm *if* your GPU (and GPU
driver) is tripping up on XID errors.
For more on the same, see
https://docs.nvidia.com/deploy/xid-errors/index.html
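
To make the dmesg check easier to report, the Xid numbers can be pulled out of the kernel log with a small filter. This is a sketch under the assumption that the driver logs Xid events in its usual "NVRM: Xid (PCI:...): <number>, ..." form; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: extract NVIDIA Xid reports from kernel log text on stdin.
# Assumption: Xid lines look like "NVRM: Xid (PCI:...): <number>, ...".

# xid_lines: print only the lines that report an Xid event.
xid_lines() {
    grep -E 'NVRM: Xid' || true   # no match is not an error here
}

# xid_codes: print the distinct Xid numbers, one per line, sorted.
xid_codes() {
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\).*/\1/p' | sort -nu
}
```

Usage would be `dmesg | xid_lines` for the full messages or `dmesg | xid_codes` for just the numbers, which can then be looked up in the Xid documentation linked above. Reading the kernel log may require root on some systems.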

Warm regards,

Dennis.
Re: Multiple FFMpeg-Cuda-HLS-Transcoding Instances -> Deadlock Behavior

Gabriel Balaich
In reply to this post by halise
On Mon, 4 May 2020 at 02:22, Marco Kittel <[hidden email]> wrote:

> [...]


I think the issue you're running into may have to do with NVIDIA drivers
only allowing 2-3 simultaneous encodes from one video card at any given
time, unless you're using an NVIDIA Quadro, in which case there is no
encode limit.

Fortunately, even on consumer cards this limit is artificially imposed and
can be bypassed with a patch:
https://github.com/keylase/nvidia-patch
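
To put numbers on this: the command in the original post maps four video outputs to h264_nvenc, so each ffmpeg instance opens four encode sessions. The sketch below does that arithmetic and, on a machine with nvidia-smi, asks the driver how many sessions are currently open. The function names are hypothetical, and the limit passed in is an assumption, since the actual cap depends on the card and driver version.

```shell
#!/usr/bin/env bash
# Sketch: compare requested NVENC sessions against an assumed session limit.

# nvenc_sessions_needed INSTANCES ENCODES_PER_INSTANCE
nvenc_sessions_needed() {
    echo $(( $1 * $2 ))
}

# exceeds_session_limit INSTANCES ENCODES_PER_INSTANCE LIMIT
# Succeeds (exit status 0) when the requested sessions exceed LIMIT.
exceeds_session_limit() {
    (( $(nvenc_sessions_needed "$1" "$2") > $3 ))
}

# active_nvenc_sessions: sessions currently open, as reported by the driver.
# Requires nvidia-smi; not called automatically by this sketch.
active_nvenc_sessions() {
    nvidia-smi --query-gpu=encoder.stats.sessionCount --format=csv,noheader
}
```

For example, `nvenc_sessions_needed 3 4` gives the session count for three instances of the four-rendition command, and `exceeds_session_limit 3 4 3` succeeds if that is more than an assumed limit of 3. Running `active_nvenc_sessions` while the instances are up shows what the driver actually granted.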