High CPU usage during scale_npp to low resolutions with multiple instances

Valentin Schweitzer

Hi,

When using scale_npp to scale a test video down from 1920x1080 to
1024x576 or lower with multiple processes in parallel, CPU usage is
unusually high. For context, when scaling the same video down to
1280x720, CPU usage stays at about 0.5% per FFmpeg instance. When
scaling down to 1024x576 or lower, CPU usage per FFmpeg process rises
to about 3.0%. These values were measured with 29 instances of FFmpeg
running in parallel. The effect is less pronounced but still visible
with 10 instances in parallel. The hardware used is an AMD EPYC 7401P
(24 cores) with an NVIDIA Quadro RTX 4000.

To generate 100 seconds of random noise in 1080p (which will be the test video):

ffmpeg -y -hide_banner -f lavfi -i nullsrc=s=1920x1080 -filter_complex
"geq=random(1)*255:128:128;aevalsrc=-2+random(0)" -vcodec rawvideo
-acodec pcm_s16le -t 100 noise.mkv

Now rescale the test video to 720p:

ffmpeg -hide_banner -y -i noise.mkv -vf
hwupload_cuda,scale_npp=w=1280:h=720:format=nv12 -vcodec h264_nvenc -an
-f null NUL

This should not cause very high CPU usage. Now rescale the same video to
576p:

ffmpeg -hide_banner -y -i noise.mkv -vf
hwupload_cuda,scale_npp=w=1024:h=576:format=nv12 -vcodec h264_nvenc -an
-f null NUL

This should cause about 5 or 6 times as much CPU usage.
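
The parallel part of the test is not shown in the commands above, so here is a minimal Python sketch of one way to start several instances at once. The helper names (scale_command, run_parallel) are made up for illustration and are not part of FFmpeg; the command line itself is the one from this post, including the Windows null device NUL. CPU usage still has to be observed externally, for example in Task Manager.

```python
import subprocess


def scale_command(width, height, source="noise.mkv", null_sink="NUL"):
    """Build the scale_npp test command from the post for one target size."""
    return [
        "ffmpeg", "-hide_banner", "-y", "-i", source,
        "-vf", f"hwupload_cuda,scale_npp=w={width}:h={height}:format=nv12",
        "-vcodec", "h264_nvenc", "-an", "-f", "null", null_sink,
    ]


def run_parallel(width, height, instances=29):
    """Start all instances first, then wait for them; returns the exit codes."""
    cmd = scale_command(width, height)
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for _ in range(instances)
    ]
    return [p.wait() for p in procs]
```

Calling run_parallel(1280, 720) and then run_parallel(1024, 576) while watching per-process CPU usage should reproduce the comparison described above, assuming ffmpeg is on the PATH and noise.mkv is in the working directory.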

This might be caused by some NVIDIA optimizations, but it does not
seem to be documented, and I have yet to find a good place to ask more
in-depth questions about NVIDIA encoding hardware.
So, if anyone has encountered a similar issue or knows why it
might occur, I would be grateful for any advice.

Greetings,
Valentin

_______________________________________________
ffmpeg-user mailing list
[hidden email]
https://ffmpeg.org/mailman/listinfo/ffmpeg-user

To unsubscribe, visit link above, or email
[hidden email] with subject "unsubscribe".
Re: High CPU usage during scale_npp to low resolutions with multiple instances

Brainiarc7
On Mon, 30 Mar 2020, 15:22 Valentin Schweitzer, <[hidden email]> wrote:

> [...]


Set this environment variable: CUDA_DEVICE_MAX_CONNECTIONS=2

Then retest and report back.
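
For anyone launching the test from a script, here is a minimal Python sketch of setting the variable for the FFmpeg child process only, without changing the system-wide environment. The helper names are hypothetical; only the variable name and the value 2 come from the suggestion above.

```python
import os
import subprocess


def env_with_max_connections(value=2, base=None):
    """Return a copy of the environment with CUDA_DEVICE_MAX_CONNECTIONS set."""
    env = dict(os.environ if base is None else base)
    env["CUDA_DEVICE_MAX_CONNECTIONS"] = str(value)
    return env


def run_with_max_connections(cmd, value=2):
    """Run one command with the variable set only for that child process."""
    return subprocess.run(cmd, env=env_with_max_connections(value)).returncode
```

The variable has to be present in the environment of each FFmpeg process before it starts, so it must be passed at launch time rather than set afterwards.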

Re: High CPU usage during scale_npp to low resolutions with multiple instances

Brainiarc7
On Mon, 30 Mar 2020, 15:31 Dennis Mungai, <[hidden email]> wrote:

> [...]


One more thing: Could you show us the output of:

numactl --hardware

Re: High CPU usage during scale_npp to low resolutions with multiple instances

Valentin Schweitzer


> Set this environment variable: CUDA_DEVICE_MAX_CONNECTIONS=2
>
> Then retest and report back.
>
> One more thing: Could you show us the output of:
>
> numactl --hardware

Thanks for your reply. We should have clarified that we are on Windows.
Unfortunately, setting the environment variable CUDA_DEVICE_MAX_CONNECTIONS
to 2 does not make a difference. The closest we got to a numactl equivalent
on Windows is the NUMA view in the Task Manager, which shows four NUMA nodes
on our 24-core processor. Given this information, is it possible that either
the Windows scheduler or the NVIDIA driver has trouble with different
ffmpeg instances being distributed across different NUMA nodes, so that a lot
of data has to be transferred between NUMA nodes, which loads the CPU? Are
there any mitigations for this, or is there anything else that we can analyze
to clarify why different resolutions behave so differently on our machine?
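
One way to probe the NUMA hypothesis, which nobody in this thread has suggested or tested, would be to give each instance a preferred NUMA node through the /NODE switch of the Windows start command. The Python sketch below only builds such a command line; the function name is hypothetical, the node count of four is taken from the Task Manager observation above, and the quoting of the FFmpeg arguments through cmd may need adjusting.

```python
def numa_pinned_command(cmd, instance, nodes=4):
    """Wrap a command so Windows 'start' prefers one NUMA node for it.

    Instances are spread round-robin over the nodes. The empty string is
    the window title argument that 'start' expects before its switches.
    """
    node = instance % nodes
    return ["cmd", "/c", "start", "", "/B", "/WAIT", "/NODE", str(node)] + list(cmd)
```

If CPU usage at 1024x576 changes noticeably once instances are tied to nodes this way, that would point towards cross-node traffic; if it does not, the cause more likely lies elsewhere.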

Greetings,
Valentin




Re: High CPU usage during scale_npp to low resolutions with multiple instances

Brainiarc7
On Wed, 8 Apr 2020, 15:23 Valentin Schweitzer, <[hidden email]> wrote:

> [...]


Hey there,

In your BIOS, disable the following options and retest:

1. SMT support (toggle to disabled).
2. X2APIC (toggle to disabled).

This will be the equivalent of setting "gaming mode" on the Ryzen consumer
processors. Apply these changes, retest and report back.
