When using h264_cuvid, memory allocation differs between different nvidia GPUs

When using h264_cuvid, memory allocation differs between different nvidia GPUs

Panagiotis Malakoudis
I am using h264_cuvid on several systems. GTX 10xx cards have enough power to
decode many Full HD streams concurrently.
I have found that the GPU memory allocated differs between the GTX 1050 Ti
and the GTX 1070 Ti. With the same command and the same settings, ffmpeg
allocates only 87MB on the GTX 1050 Ti but 153MB on the GTX 1070 Ti. That
66MB difference is critical in my application. Is there a reason why this
happens, and can it be avoided?

Example command:
ffmpeg -c:v h264_cuvid -surfaces 8 -f mpegts \
  -i https://samples.ffmpeg.org/V-codecs/h264/HD-h264.ts \
  -vcodec libx264 -preset veryfast -crf 23 -c:a copy -f mpegts transcoded.ts
In GTX 1050 Ti, nvidia-smi output:
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     28243      C   ffmpeg                                        87MiB |
+-----------------------------------------------------------------------------+
In GTX 1070 Ti, nvidia-smi output:
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     10291      C   ffmpeg                                       153MiB |
+-----------------------------------------------------------------------------+
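
For repeatable comparisons between cards, the per-process figure can also be read in machine-readable form instead of from the table. The following is a minimal sketch, assuming the installed nvidia-smi supports `--query-compute-apps` with CSV output; the helper name `sum_ffmpeg_mib` is made up for illustration and is not part of any tool mentioned in this thread.

```shell
#!/bin/sh
# Sum the GPU memory (MiB) used by all ffmpeg compute processes.
# Expects CSV rows of the form "pid, process_name, used_memory" on stdin.
sum_ffmpeg_mib() {
    awk -F', *' '$2 ~ /ffmpeg/ { total += $3 } END { print total + 0 }'
}

# On a machine with an NVIDIA GPU, feed it live data:
#   nvidia-smi --query-compute-apps=pid,process_name,used_memory \
#              --format=csv,noheader,nounits | sum_ffmpeg_mib

# Offline check using the GTX 1050 Ti reading shown above:
printf '28243, ffmpeg, 87\n' | sum_ffmpeg_mib   # prints 87
```

Running the same pipeline on both cards while the transcode is active gives numbers that can be compared directly, without copying values out of the table by hand.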
_______________________________________________
ffmpeg-user mailing list
[hidden email]
http://ffmpeg.org/mailman/listinfo/ffmpeg-user

To unsubscribe, visit link above, or email
[hidden email] with subject "unsubscribe".

Re: When using h264_cuvid, memory allocation differs between different nvidia GPUs

Brainiarc7
Hello there,

On the system using more VRAM, can you lower the -surfaces value from 8 to
~6?

Also, try the nvdec hwaccel in place of the h264_cuvid decoder.

Example:

ffmpeg -hwaccel nvdec -f mpegts \
  -i https://samples.ffmpeg.org/V-codecs/h264/HD-h264.ts \
  -vcodec libx264 -preset veryfast -crf 23 -c:a copy -f mpegts transcoded.ts

On Thu, 13 Sep 2018 at 11:31, Panagiotis Malakoudis <[hidden email]>
wrote:

> I am using h264_cuvid in some systems. GTX10xx cards have the power to
> decode many full hd streams concurrently.
> [...]

Re: When using h264_cuvid, memory allocation differs between different nvidia GPUs

Panagiotis Malakoudis
Lowering -surfaces on the GTX 1070 Ti to 2 only reduces memory to 129MB, so it
is still not the same as on the GTX 1050 Ti. Also, 2 surfaces are not enough
for correct decoding: decoding produces errors.
Using nvdec with the command:
ffmpeg -hwaccel nvdec -f mpegts \
  -i https://samples.ffmpeg.org/V-codecs/h264/HD-h264.ts \
  -vcodec libx264 -preset veryfast -crf 23 -c:a copy -f mpegts -y output.ts
uses even more memory: 321MB on the GTX 1070 Ti, compared to 153MB with
h264_cuvid.
The -surfaces option is not applicable to -hwaccel nvdec.
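
The two h264_cuvid readings on the GTX 1070 Ti (153 with 8 surfaces, 129 with 2) allow a rough split into a per-surface cost and a fixed cost. This is only a sketch: it assumes memory grows linearly with -surfaces, which the thread does not verify, and the function name `estimate_surface_cost` is hypothetical.

```shell
#!/bin/sh
# Estimate per-surface and fixed VRAM cost from two (surfaces, MiB) readings.
# Assumption: memory use is linear in the -surfaces value.
estimate_surface_cost() {
    s1=$1; m1=$2; s2=$3; m2=$4
    per_surface=$(( (m1 - m2) / (s1 - s2) ))   # integer division
    fixed=$(( m1 - s1 * per_surface ))
    echo "per_surface=${per_surface}MiB fixed=${fixed}MiB"
}

# GTX 1070 Ti readings from this thread: 8 surfaces -> 153, 2 surfaces -> 129.
estimate_surface_cost 8 153 2 129   # prints per_surface=4MiB fixed=121MiB
```

Under that assumption the fixed part alone (about 121) already exceeds the 87 total seen on the GTX 1050 Ti, which would be consistent with the observation that lowering -surfaces cannot close the gap.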




On Thu, 13 Sep 2018 at 1:28 PM, Dennis Mungai <[hidden email]>
wrote:

> Hello there,
>
> On the system using more VRAM, can you lower the -surfaces value from 8 to
> ~6?
> [...]

Re: When using h264_cuvid, memory allocation differs between different nvidia GPUs

Panagiotis Malakoudis
I did more tests and there is a consistent difference in VRAM usage when
running the same commands on the GTX 1050 Ti and the GTX 1070 Ti.
NVENC encoding also exhibits the same behaviour, for example:
ffmpeg -f mpegts -i https://samples.ffmpeg.org/V-codecs/h264/HD-h264.ts \
  -vcodec h264_nvenc -c:a copy -f mpegts transcoded.ts
With all defaults, this allocates 94MB on the 1050 Ti but 163MB on the 1070 Ti,
a difference of 69MB.
Combining decoding and encoding in one command, for example:
ffmpeg -hwaccel cuvid -c:v h264_cuvid -f mpegts \
  -i https://samples.ffmpeg.org/V-codecs/h264/HD-h264.ts \
  -vcodec h264_nvenc -c:a copy -f mpegts transcoded.ts
gives 269MB on the 1070 Ti and 200MB on the 1050 Ti.
So we see a difference of 66MB when only decoding, 69MB when only encoding,
and 69MB when both decoding and encoding.
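
The three gaps quoted above can be recomputed from the raw readings. A small sketch, using only the numbers reported in this thread; the function name `vram_gap` is made up for illustration.

```shell
#!/bin/sh
# Print the VRAM gap between the two cards for one test case.
# Arguments: label, reading on the GTX 1070 Ti, reading on the GTX 1050 Ti.
vram_gap() {
    label=$1; mib_1070ti=$2; mib_1050ti=$3
    echo "$label: $(( mib_1070ti - mib_1050ti )) MiB"
}

vram_gap "decode only"   153  87   # prints decode only: 66 MiB
vram_gap "encode only"   163  94   # prints encode only: 69 MiB
vram_gap "decode+encode" 269 200   # prints decode+encode: 69 MiB
```

The combined case shows roughly the same gap as either case alone rather than their sum, which hints at a one-time per-process cost instead of a per-engine one. That reading is an inference from these three data points, not something confirmed in the thread.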
I also tested two different driver versions, 384.130 and 390.77; both give
the same results.
Interestingly, testing on Windows gives different, lower VRAM values for the
GTX 1070 Ti, but still higher than the GTX 1050 Ti on Linux (I couldn't test
the GTX 1050 Ti on Windows yet). For example, decoding uses 132MB on Windows
when the same command on Linux uses 153MB.

I don't think it is a bug in ffmpeg; it is probably something going on in the
NVIDIA drivers. Where could I report this to NVIDIA?


On Thu, 13 Sep 2018 at 3:49 PM, Panagiotis Malakoudis <[hidden email]>
wrote:

> Lowering surfaces in GTX 1070 Ti to 2 only reduces memory to 129MB, so it
> is still not the same as GTX 1050 Ti. Also, 2 surfaces are not enough for
> correct decoding, decoding has errors.
> [...]

Re: When using h264_cuvid, memory allocation differs between different nvidia GPUs

Rodney Baker
On Saturday, 15 September 2018 2:16:19 ACST Panagiotis Malakoudis wrote:

> I did more tests and there is a consistent difference in VRAM usage when
> running same commands in GTX 1050 Ti and GTX 1070 Ti.
[...]

Is this possibly related to the differing architecture (number of CUDA cores
and/or the framebuffer size) of the two cards? 768 cores with a 4 GB
framebuffer on the GTX 1050 Ti, 2432 cores with an 8 GB framebuffer on the
GTX 1070 Ti.

I would expect the driver to need to allocate more resources to the latter...

--
==============================================================
Rodney Baker VK5ZTV
[hidden email]
CCNA #CSCO12880208
==============================================================

