FFmpeg single threaded bottleneck

Gabriel Balaich
TLDR; FFmpeg seems to limit certain processes or aspects of a command to a
single processing thread and it's limiting my ability to write / run
complex commands. I'm trying to understand why that is the case and if
there is a way to get around it.

Hey there, about two years ago I inquired via this email list as to why I
may be getting errors like this "real-time buffer [Capture Card] [video
input] too full or near too full (62% of size: 2147480000 [rtbufsize
parameter])! frame dropped!" constantly spammed in my console when trying
to run large / complex FFmpeg commands with a multitude of inputs and
outputs. I thought the issue may have been my inability to increase my
buffer size past max INT, but was told that that was likely not the
issue. In the end, I ultimately accepted that commands this complex just
weren't possible as I could never really pinpoint the issue. But recently
I've very much wanted to record even more complex sources to multiple
outputs via a single command, and I think I've come closer to finding what
the issue is.

For example, here is a stripped down version of the command I currently use:
ffmpeg -y `
-thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -i audio="Analog (1+2) (RME Fireface UC)" `
-thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -i audio="ADAT (5+6) (RME Fireface UC)" `
-thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -video_size 1920x1080 -framerate 60 `
-pixel_format yuv420p -i video="Game Capture HD60 Pro (Video) (#01)":audio="Game Capture HD60 Pro (Audio) (#01)" `
-thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -video_size 3440x1440 -framerate 100 `
-pixel_format nv12 -i video="Video (00 Pro Capture HDMI 4K+)":audio="ADAT (3+4) (RME Fireface UC)" `
-thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -video_size 3840x2160 -framerate 60 `
-pixel_format nv12 -i video="AVerMedia HD Capture GC573 1":audio="SPDIF/ADAT (1+2) (RME Fireface UC)" `
-map 2:0,2:1 -map 2:1 -c:v h264_nvenc -preset: hp -r 60 -rc-lookahead 120 -pix_fmt yuv420p -b:v 100M -minrate 100M `
-maxrate 100M -bufsize 100M -c:a aac -ar 44100 -b:a 320k -af "aresample=async=250" `
-ss 00:00:02.482 -vsync 1 -max_muxing_queue_size 9999 C:\Users\gabri\Videos\FFmpeg\EL\EL.ts `
-map 3:0,3:1 -map 3:1 -c:v h264_nvenc -preset: hp -r 100 -rc-lookahead 200 -pix_fmt nv12 -b:v 288M -minrate 288M `
-maxrate 288M -bufsize 288M -c:a aac -ar 44100 -b:a 320k -af "atrim=0.034, asetpts=PTS-STARTPTS, aresample=async=250" `
-ss 00:00:01.904 -vsync 1 -max_muxing_queue_size 9999 C:\Users\gabri\Videos\FFmpeg\MW\MW.ts `
-c:v h264_nvenc -preset: hp -r 60 -rc-lookahead 120 -pix_fmt nv12 `
-b:v 288M -minrate 288M -maxrate 288M -bufsize 288M -c:a aac -ar 44100 -b:a 320k `
-filter_complex (('"[4:v]setpts=PTS-STARTPTS[v1]',
'[4:a]atrim=1.615,asetpts=PTS-STARTPTS,aresample=async=250[a1]',
'[0:a]atrim=4.879,asetpts=PTS-STARTPTS,aresample=async=250[a2]',
'[1:a]atrim=4.199,asetpts=PTS-STARTPTS,aresample=async=250[a3]"') -join ';') `
-map "[v1]" -map "[a1]" -map "[a2]" -map "[a3]" `
-vsync 1 -max_muxing_queue_size 9999 C:\Users\gabri\Videos\FFmpeg\AM\AM%02d.ts

Just to quickly explain I'm recording 3 video sources and 5 audio sources.
For the video, I'm recording a camera (4k60), my main computer monitor
(3440x1440@100FPS), and then a video game console (1920x1080p60, though I'd
like to record 4k on this one too). And for the audio, I'm recording
Discord (voice chat), my microphone EQd and compressed, my raw microphone
at a lower gain (in case the treated one clips so I have something to fall
back on), the audio from my desktop computer, and the audio from whatever
game console I'm capturing at any given moment. The end result, I'm
recording basically everything I'm doing at my setup simultaneously and
synchronized for easy editing with something like Adobe Premiere in post,
with the benefit of having every source recorded to its own output stream
for individual modification.

When I run this command I get the "real-time buffer [Capture Card] [video
input] too full or near too full (62% of size: 2147480000 [rtbufsize
parameter])! frame dropped!" spammed in my console when I start the command
and when I end the command. But only at the beginning and end like I just
described, for the most part, the command runs "normally" and in real-time,
albeit just barely. Seeing as the error is pointing out that my buffer is
too full I was under the impression the issue was the buffer (as previously
stated), but upon further inspection, I can see that FFmpeg is completely
saturating one of my processing threads when running the above command:
[image: image.png]

As a result, when I open something on the computer while running this
command I drop frames, even something as simple as the calculator app,
because anything that requests even a little bit of power from that thread
takes it away from FFmpeg which is capping it out. Furthermore if I try to
add further complexity to my command, like say another 4k capture card with
a 4K input and output to the command (to replace the 1080p card) it just
can't run in real-time, my RAM usage just keeps increasing until my RAM is
completely saturated and my computer bluescreens. Inversely, if I run each
output from the first command as a separate process in three separate
instances of powershell as opposed to all in one like this:
ffmpeg -y `
-thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -i audio="Analog (1+2) (RME Fireface UC)" `
-thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -i audio="ADAT (5+6) (RME Fireface UC)" `
-thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -video_size 3840x2160 -framerate 60 `
-pixel_format nv12 -i video="AVerMedia HD Capture GC573 1":audio="SPDIF/ADAT (1+2) (RME Fireface UC)" `
-c:v h264_nvenc -preset: hp -r 60 -rc-lookahead 120 -pix_fmt nv12 `
-b:v 288M -minrate 288M -maxrate 288M -bufsize 288M -c:a aac -ar 44100 -b:a 320k `
-filter_complex (('"[2:v]setpts=PTS-STARTPTS[v1]',
'[2:a]atrim=1.615,asetpts=PTS-STARTPTS,aresample=async=250[a1]',
'[0:a]atrim=4.879,asetpts=PTS-STARTPTS,aresample=async=250[a2]',
'[1:a]atrim=4.199,asetpts=PTS-STARTPTS,aresample=async=250[a3]"') -join ';') `
-map "[v1]" -map "[a1]" -map "[a2]" -map "[a3]" `
-vsync 1 -max_muxing_queue_size 9999 C:\Users\gabri\Videos\AM.ts

ffmpeg -y `
-thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -video_size 1920x1080 -framerate 60 `
-pixel_format yuv420p -i video="Game Capture HD60 Pro (Video) (#01)":audio="Game Capture HD60 Pro (Audio) (#01)" `
-map 0:0,0:1 -map 0:1 -c:v h264_nvenc -preset: hp -r 60 -rc-lookahead 120 -pix_fmt yuv420p -b:v 100M -minrate 100M `
-maxrate 100M -bufsize 100M -c:a aac -ar 44100 -b:a 320k -af "aresample=async=250" `
-ss 00:00:02.482 -vsync 1 -max_muxing_queue_size 9999 C:\Users\gabri\Videos\EL.ts

ffmpeg -y `
-thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -video_size 3440x1440 -framerate 100 `
-pixel_format nv12 -i video="Video (00 Pro Capture HDMI 4K+)":audio="ADAT (3+4) (RME Fireface UC)" `
-map 0:0,0:1 -map 0:1 -c:v h264_nvenc -preset: hp -r 100 -rc-lookahead 200 -pix_fmt nv12 -b:v 288M -minrate 288M `
-maxrate 288M -bufsize 288M -c:a aac -ar 44100 -b:a 320k -af "atrim=0.034, asetpts=PTS-STARTPTS, aresample=async=250" `
-ss 00:00:01.904 -vsync 1 -max_muxing_queue_size 9999 C:\Users\gabri\Videos\MW.ts


But still, at the same time (all three commands started and ended within
less than half a second), the load seems to be more evenly distributed with
no core consistently over 60% usage:
[image: image.png]

Furthermore, there are no real-time buffer errors or dropped frames at all,
all three commands start and end gracefully, and if I open another basic
application while running these commands separately they carry on without a
hiccup. So it would seem pretty obvious to me that the problem was never
the buffer size, but that a process within a single instance of FFmpeg is
limited to a single processing thread, regardless of how many inputs and
outputs you have. Basically, from what I can tell, FFmpeg is incapable of
scaling vertically.

My plan is to upgrade my capture PC with a 16 core Threadripper (1950X) and
another 4K capture card so I can capture from three 4k cards simultaneously
with FFmpeg while simultaneously running an instance of OBS that is
streaming to Twitch and Discord at the same time. The reason for this is so
I can stream at 1080p with the scene functionality of OBS, but then record
each source at its native resolution / FPS to separate streams using FFmpeg
for easy / high-quality post-production editing, as opposed to just
recording with OBS in which each video source is baked into one stream at a
much lower resolution and framerate. This way if I wanted to remove my
camera from a clip, grab something from a scene that wasn't
currently displaying in the active OBS scene, or full screen my camera /
gameplay everything stays crisp because with FFmpeg each video source is
being outputted to its own video stream at full quality.

However, I don't think it's possible for me to do that with a single FFmpeg
command with how FFmpeg is currently functioning in accordance with my
testing. My only option is to run each command separately which on the
surface seems like a fairly simple thing to do, the only problem is...
getting them to start at the same time is no simple task. Even when using a
separate script to start each command as their own processes
programmatically I have a possible drift between each source of about half
a second, as opposed to less than a tenth of a second when running everything
as one command. Basically, it's hard to keep things synchronized. On top of
that, it's difficult to manage the output of each command through a single
terminal, and trying to end each process gracefully with "q" or [Ctrl+C] is
nigh impossible without dipping into an actual programming language that
allows you to start a process or multiple processes while retaining the
ability to send input to the STDIN.
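
One way to approach the start/stop problem described above is a small launcher script. The following is a minimal Python sketch, not a tested capture setup: `start_all` and `stop_all` are hypothetical helper names, and it assumes FFmpeg stops gracefully when it reads `q` from a piped stdin, as it does for the interactive `q` key. Starting the processes back to back only narrows the launch gap; it does not guarantee that the capture devices begin delivering frames at the same instant.

```python
import subprocess


def start_all(commands):
    """Start every command back to back, keeping each stdin open as a pipe."""
    procs = []
    for argv in commands:
        procs.append(subprocess.Popen(argv, stdin=subprocess.PIPE))
    return procs


def stop_all(procs, timeout=10.0):
    """Send 'q' to every process first, then wait for each one to exit."""
    for proc in procs:
        try:
            proc.stdin.write(b"q\n")
            proc.stdin.close()  # close() flushes the buffered write first
        except OSError:
            pass  # the process has already exited
    exit_codes = []
    for proc in procs:
        try:
            exit_codes.append(proc.wait(timeout=timeout))
        except subprocess.TimeoutExpired:
            proc.kill()  # fall back to a hard stop if 'q' was ignored
            exit_codes.append(proc.wait())
    return exit_codes


# Usage sketch (hypothetical, shortened argument lists):
# procs = start_all([
#     ["ffmpeg", "-y", "-f", "dshow", "-i", "video=<device 1>", "out1.ts"],
#     ["ffmpeg", "-y", "-f", "dshow", "-i", "video=<device 2>", "out2.ts"],
# ])
# input("Recording, press Enter to stop all processes")
# print(stop_all(procs))
```

Sending `q` to every process before waiting on any of them keeps the stop times close together, which is the same reasoning as starting them in one tight loop.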

So really my main questions are:
1) What part of an FFmpeg process is limited to a single processing thread?
2) Why is that process limited to a single processing thread?
3) Is it possible to force the said process to use multiple processing
threads?

I've attached the full console output for the two above scenarios I
provided earlier (one command vs separate commands), keep in mind that
while the second scenario is using multiple FFmpeg processes they are still
being run simultaneously, so each input and output is being run at the same
time, just like the first command.

_______________________________________________
ffmpeg-user mailing list
[hidden email]
https://ffmpeg.org/mailman/listinfo/ffmpeg-user

To unsubscribe, visit link above, or email
[hidden email] with subject "unsubscribe".

image.png (172K) Download Attachment
image.png (131K) Download Attachment
Combined Command Log.txt (24K) Download Attachment
Separate Command Log.txt (15K) Download Attachment

Re: FFmpeg single threaded bottleneck

kumowoon1025
Hi,

Some values don't look right, try getting rid of them.
-thread_queue_size 9999 seems arbitrary, it is queue length, not bytes
-indexmem 9999 seems arbitrary, pretty sure default value is bigger
-rtbufsize 2147.48M is kind of abusive, especially for the audio inputs

I don't think you should be trying to buffer more, if the buffer keeps growing then it won't last.
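
To illustrate the "queue length, not bytes" point, here is a small analogy using Python's standard `queue.Queue`. It is not FFmpeg's implementation; `try_enqueue` is a hypothetical helper. The capacity is counted in items (packets), so a tiny packet and a huge one each take exactly one slot.

```python
import queue


def try_enqueue(q, packet):
    """Return True if the packet was queued, False if the queue was full."""
    try:
        q.put_nowait(packet)
        return True
    except queue.Full:
        return False


packets = queue.Queue(maxsize=3)  # capacity is 3 items, whatever their size
results = [try_enqueue(packets, bytes(n)) for n in (1, 1_000_000, 10, 1)]
print(results)  # the fourth packet is refused even though it is 1 byte
```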

I can't really tell what the dshow input mapping looks like, but I think this is about the limit of your system.
With a 6800K, assuming the GPU is full sized,  are there enough lanes left for 3 additional capture cards?
Using the hardware encoder for so many streams at once might also have to do with it, you could try saving
the raw input to fast enough scratch disk to check for that quickly.

Regards,
Ted Park

Re: FFmpeg single threaded bottleneck

Gabriel Balaich
Thanks for the feedback.

On Thu, 14 May 2020 at 01:52, Edward Park <[hidden email]> wrote:

> Hi,
>
> Some values don't look right, try getting rid of them.
> -thread_queue_size 9999 seems arbitrary, it is queue length, not bytes

I was getting the error "Thread message queue blocking; consider raising
the thread_queue_size option" when I left -thread_queue_size at default,
the reason I set it at 9999 is that that is the max it will let me set it
before the command just errors out. When I remove "-thread_queue_size 9999"
the errors come back and I drop a massive amount of frames even when doing
a single 4k60 input / output.

> -indexmem 9999 seems arbitrary, pretty sure default value is bigger
>
-indexmem is one of the magical options I never really understood, I added
it at some point (over 2 years ago at least) hoping it would solve this
issue. I can't seem to find any information on what the default is, and
when I remove it from the command it doesn't change the results. That being
said any single option's relevancy in regards to my commands, at least as
far as I can tell, is pretty low considering that everything works just
fine with every option I have when I'm running each input(s) / output in
its own instance of FFmpeg yet simultaneously.


> -rtbufsize 2147.48M is kind of abusive, especially for the audio inputs
>
> I don't think you should be trying to buffer more, if the buffer keeps
> growing then it won't last.
>
I couldn't try more buffer if I wanted to, 2147.48M (max INT) is the
maximum buffer size allowed. But even then it only overfills if the
hardware can't keep up, which is only shown to be the case when transcoding
over 9K60 worth of video in a single FFmpeg instance.
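
For a sense of scale, the arithmetic behind the buffer size can be sketched as follows. This assumes the capture devices deliver uncompressed frames in the requested pixel formats (nv12 and yuv420p both use 12 bits per pixel); the helper names are hypothetical. The 2147480000 figure is the size printed in the "too full" message, just under the signed 32-bit maximum of 2147483647.

```python
def raw_frame_bytes(width, height, bits_per_pixel=12):
    """Bytes in one uncompressed frame (12 bpp for nv12 / yuv420p)."""
    return width * height * bits_per_pixel // 8


def seconds_buffered(buffer_bytes, width, height, fps):
    """How many seconds of raw video fit into a buffer of the given size."""
    return buffer_bytes / (raw_frame_bytes(width, height) * fps)


RTBUFSIZE = 2_147_480_000  # bytes, as reported in the error message

for name, (w, h, fps) in {
    "3840x2160@60": (3840, 2160, 60),
    "3440x1440@100": (3440, 1440, 100),
    "1920x1080@60": (1920, 1080, 60),
}.items():
    print(f"{name}: {seconds_buffered(RTBUFSIZE, w, h, fps):.2f} s")
```

Under these assumptions even the maximum buffer holds a little under 3 seconds of either high-resolution source, so it can absorb a short stall but not a sustained shortfall in processing speed.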


> I can't really tell what the dshow input mapping looks like, but I think
> this is about the limit of your system.
> With a 6800K, assuming the GPU is full sized,  are there enough lanes left
> for 3 additional capture cards?
>
As seen in the screenshots my 6800k is only being overly taxed if I'm
running all the inputs / outputs in one command / one instance of FFmpeg, *and
only on one thread with plenty of headroom left on all other threads* (see
task manager screenshots). When I separate them into multiple commands
running in different processes, but still at the same time with all the
same options, the 6800k isn't even at 35% total usage with plenty of
headroom per thread. So it seems pretty clear to me that the 6800k is not
the bottleneck, even so, I'm replacing it with a Threadripper (1950x, 2.5
times as powerful as my 6800k) as described in the original message so I
can have headroom to run FFmpeg and OBS at the same time.


> Using the hardware encoder for so many streams at once might also have to
> do with it, you could try saving
> the raw input to fast enough scratch disk to check for that quickly.

I'm using a GTX 1080 which has dual NVENC processing chips (see NVIDIA
encode matrix:
https://developer.nvidia.com/video-encode-decode-gpu-support-matrix), as
can be seen in my screenshots the encoder is only at 40% usage, and while
Nvidia typically only allows you to do 2-3 encodes at once it's a
pseudo-limitation enforced by software which can be bypassed with a patch:
https://github.com/keylase/nvidia-patch

Just to further show that the hardware is not yet an issue in itself, I can
run 4 separate 4k60 transcodes simultaneously in real-time using just the
6800k and the GTX 1080 with 30% headroom left on the CPU, 20% headroom on
the GPU's encoding chips, multiple gigabytes of VRAM still available on the
GPU, over 12 GB of available system memory, and below 30% SSD usage. The one
caveat being that each input / output has to be running in *separate
instances of FFmpeg*, as soon as I try to transcode more than 9K60 in a *single
FFmpeg command / instance* a single thread on my 6800K will reach 100%,
despite the rest of the chip having 70% headroom, and then the command gets
behind filling the buffer until there is no memory left.

Just to make it clear, from my extensive testing the issue only presents
itself when running massive commands with 3x or more 4K60 transcodes in *one
instance* of *FFmpeg*, *when I run them separately but still at the same
time I have zero issues*... Other than the fact that I have to run them in
separate instances which is what I'm trying to avoid due to synchronization
issues, among others. What I'm really trying to determine here is what part
of a single FFmpeg instance is being limited to 1 thread when transcoding
3+ 4k60 streams.

Re: FFmpeg single threaded bottleneck

Brainiarc7
Disable Game Mode in Windows 10 and retest.

Re: FFmpeg single threaded bottleneck

Gabriel Balaich
>
> Disable Game Mode in Windows 10 and retest.
>
Just tried disabling game mode and unfortunately, I still have one thread
being completely maxed out when running all three outputs in one instance
of FFmpeg like so:
ffmpeg -y `
-thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -i audio="Analog (1+2) (RME Fireface UC)" `
-thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -i audio="ADAT (5+6) (RME Fireface UC)" `
-thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -video_size 1920x1080 -framerate 60 `
-pixel_format yuv420p -i video="Game Capture HD60 Pro (Video) (#01)":audio="Game Capture HD60 Pro (Audio) (#01)" `
-thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -video_size 3440x1440 -framerate 100 `
-pixel_format nv12 -i video="Video (00 Pro Capture HDMI 4K+)":audio="ADAT (3+4) (RME Fireface UC)" `
-thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -video_size 3840x2160 -framerate 60 `
-pixel_format nv12 -i video="AVerMedia HD Capture GC573 1":audio="SPDIF/ADAT (1+2) (RME Fireface UC)" `
-map 2:0,2:1 -map 2:1 -c:v h264_nvenc -preset: hp -r 60 -rc-lookahead 120 -pix_fmt yuv420p -b:v 100M -minrate 100M `
-maxrate 100M -bufsize 100M -c:a aac -ar 44100 -b:a 320k -af "aresample=async=250" `
-ss 00:00:02.482 -vsync 1 -max_muxing_queue_size 9999 C:\Users\gabri\Videos\EL.ts `
-map 3:0,3:1 -map 3:1 -c:v h264_nvenc -preset: hp -r 100 -rc-lookahead 200 -pix_fmt nv12 -b:v 288M -minrate 288M `
-maxrate 288M -bufsize 288M -c:a aac -ar 44100 -b:a 320k -af "atrim=0.034, asetpts=PTS-STARTPTS, aresample=async=250" `
-ss 00:00:01.904 -vsync 1 -max_muxing_queue_size 9999 C:\Users\gabri\Videos\MW.ts `
-c:v h264_nvenc -preset: hp -r 60 -rc-lookahead 120 -pix_fmt nv12 `
-b:v 288M -minrate 288M -maxrate 288M -bufsize 288M -c:a aac -ar 44100 -b:a 320k `
-filter_complex (('"[4:v]setpts=PTS-STARTPTS[v1]',
'[4:a]atrim=1.615,asetpts=PTS-STARTPTS,aresample=async=250[a1]',
'[0:a]atrim=4.879,asetpts=PTS-STARTPTS,aresample=async=250[a2]',
'[1:a]atrim=4.199,asetpts=PTS-STARTPTS,aresample=async=250[a3]"') -join ';') `
-map "[v1]" -map "[a1]" -map "[a2]" -map "[a3]" `
-vsync 1 -max_muxing_queue_size 9999 C:\Users\gabri\Videos\AM.ts
[image: image.png]

As before, when I run each output in its own instance of FFmpeg that thread
doesn't even reach 60% usage. Maybe worth pointing out: when I'm
running all the inputs / outputs in one instance it's always an issue with
that thread specifically, the one at index 4 (when starting at 0) as can be
seen in the screenshot above.

Also, just as a sanity check, my photos are showing right? I guess I didn't
stop to think that this email list may not support in-line images.

image.png (157K) Download Attachment

Re: FFmpeg single threaded bottleneck

Carl Zwanzig

ISTR a thread here not that long ago about threading and hardware
acceleration; may want to search that out.

z!

Re: FFmpeg single threaded bottleneck

Gabriel Balaich
So I just read a question on Stack Exchange in which someone was experiencing
something similar to me, it looked like the issue may have been the audio
codec being used was limited to one thread for processing, here is the
question:
https://video.stackexchange.com/questions/15996/ffmpeg-encoding-and-core-usage

So I decided to remove every audio device / stream from a command for
testing:
ffmpeg -y `
-thread_queue_size 9999 -indexmem 9999 -f dshow -rtbufsize 2147.48M -video_size 3440x1440 -framerate 100 `
-pixel_format nv12 -i video="Video (00 Pro Capture HDMI 4K+)" `
-map 0 -c:v h264_nvenc -preset: hp -r 100 -rc-lookahead 200 -pix_fmt nv12 -b:v 288M -minrate 288M `
-maxrate 288M -bufsize 288M -an `
-vsync 1 -max_muxing_queue_size 9999 `
C:\Users\gabri\Videos\MW1.ts `
-map 0 -c:v h264_nvenc -preset: hp -r 100 -rc-lookahead 200 -pix_fmt nv12 -b:v 288M -minrate 288M `
-maxrate 288M -bufsize 288M -an `
-vsync 1 -max_muxing_queue_size 9999 `
C:\Users\gabri\Videos\MW2.ts `
-map 0 -c:v h264_nvenc -preset: hp -r 100 -rc-lookahead 200 -pix_fmt nv12 -b:v 288M -minrate 288M `
-maxrate 288M -bufsize 288M -an `
-vsync 1 -max_muxing_queue_size 9999 `
C:\Users\gabri\Videos\MW3.ts `
-map 0 -c:v h264_nvenc -preset: hp -r 100 -rc-lookahead 200 -pix_fmt nv12 -b:v 288M -minrate 288M `
-maxrate 288M -bufsize 288M -an `
-vsync 1 -max_muxing_queue_size 9999 `
C:\Users\gabri\Videos\MW4.ts

3440x1440@100FPS is essentially 4K60 in terms of bandwidth, so with the
above command, I am essentially encoding 16K60 in real-time (see attached
log for full output). For a second I thought I may have found the issue,
that being that the audio processing is being limited to one thread...
Alas, that same thread on my processor that seems to be getting most of the
processing load on the commands with audio streams included is still
getting most of the processing load without the audio streams included,
albeit to a lesser extent:
https://i.postimg.cc/2j9LLSRc/image.png
[image: image.png]
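
The "essentially 4K60" comparison can be checked with a quick pixel-rate calculation. This is only a sketch of the arithmetic; `pixel_rate` is a hypothetical helper, and pixels per second is used as a rough stand-in for bandwidth.

```python
def pixel_rate(width, height, fps):
    """Pixels per second for a given resolution and frame rate."""
    return width * height * fps


UHD_60 = pixel_rate(3840, 2160, 60)          # 497,664,000 px/s
ULTRAWIDE_100 = pixel_rate(3440, 1440, 100)  # 495,360,000 px/s

# One 3440x1440@100 stream relative to one 4K60 stream
print(f"{ULTRAWIDE_100 / UHD_60:.4f}")
# Four such outputs, expressed in 4K60 equivalents
print(f"{4 * ULTRAWIDE_100 / UHD_60:.2f}")
```

By this measure one 3440x1440@100 stream is within half a percent of 4K60, so four outputs amount to just under four 4K60 streams, which is what "16K60" refers to here.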

Though it's interesting nonetheless that when encoding that much more video
in a single command the thread in question is under a less stressful load
than when encoding much less video but several audio streams. This seems to
just further confirm to me that some aspect of every transcode in a single
FFmpeg instance is being run through a single processing thread, which
is where my bottleneck resides.

> ISTR a thread here not that long ago about threading and hardware
> acceleration; may want to search that out.
>
Thanks for the heads up, but I can't seem to find it in the last 100
threads, I'll keep looking.

image.png (130K) Download Attachment
4x 3440x1440@100FPS.txt (10K) Download Attachment

Re: FFmpeg single threaded bottleneck

Carl Eugen Hoyos-2
On Thu, 14 May 2020 at 21:50, Gabriel Balaich
<[hidden email]> wrote:
>
> So just read a question on stack exchange in-which someone was experiencing
> something similar to me, it looked like the issue may have been the audio
> codec being used was limited to one thread for processing, here is the
> question:
> https://video.stackexchange.com/questions/15996/ffmpeg-encoding-and-core-usage

There is no multi-threaded audio encoding.

> 3440x1440@100FPS is essentially 4K60 in terms of bandwidth, so with the
> above command, I am essentially encoding 16K60 in real-time (see attached
> log for full output). For a second I thought I may have found the issue,
> that being that the audio processing is being limited to one thread...
> Alas, that same thread on my processor that seems to be getting most of the
> processing load on the commands with audio streams included is still
> getting most of the processing load without the audio streams included,
> albeit to a lesser extent:

Is the issue you see only reproducible with dshow input or also with testsrc2
input?

I suspect there will always be a bottleneck in a load situation like above
but it may be interesting to narrow it down.

Carl Eugen
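
A test along the lines suggested above could replace the dshow device with the `testsrc2` lavfi source while keeping the encoder and the number of outputs. The sketch below only builds the argument list; `synthetic_test_argv` and the output file names are hypothetical, the 288M bitrate is taken from the commands earlier in the thread, and `-re` is added on the assumption that the generated input should be paced at its native frame rate like a live device.

```python
def synthetic_test_argv(width, height, fps, outputs, seconds=30):
    """Build an ffmpeg argument list that encodes one testsrc2 input
    to several h264_nvenc outputs, with no audio and no capture device."""
    argv = [
        "ffmpeg", "-y",
        "-re",                      # pace the generated input in real time
        "-f", "lavfi",
        "-i", f"testsrc2=size={width}x{height}:rate={fps}",
    ]
    for path in outputs:
        argv += [
            "-map", "0:v",
            "-c:v", "h264_nvenc",
            "-b:v", "288M",
            "-an",
            "-t", str(seconds),     # stop each output after a fixed duration
            path,
        ]
    return argv


argv = synthetic_test_argv(
    3440, 1440, 100, ["test1.ts", "test2.ts", "test3.ts", "test4.ts"]
)
print(" ".join(argv))
# To run it: subprocess.run(argv, check=True)
```

If one thread still saturates with this input, the capture path is less likely to be the cause; if it does not, the dshow input side becomes the more interesting place to look.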