Programmatically combining audio and video streams
I have a code base that currently expects audio and video to be in the same AVFormatContext, and this works great for sources (typically network connections) that contain both audio and video multiplexed together, such as an MP4 or AVI container.
I’m trying to get the same code to work with separate containers for audio and video, such as YouTube WebM URLs that serve audio and video separately.
I don’t want to have to rewrite a lot of the existing code.
I was wondering if it would be reasonably straightforward (say 100 lines of code or less) to write a function that combines two separate AVFormatContexts into a new composite AVFormatContext containing 2 streams (the component streams of the 2 donor AVFormatContexts).
As I said, this uses network connections, so data will arrive asynchronously: the two connections may be out of phase, and they’ll need to be synchronized (with respect to buffering) so that matching PTSs can be read out of both streams. The data will also arrive in arbitrary chunks that don’t correspond to frame or (high-level) packet boundaries.
The implication is that just because a socket has data ready to read doesn’t mean there is enough to form a complete packet, and hence a call to av_read_frame() might block (or, more accurately, keep calling the read_packet() handler until it has acquired enough data). This is not friendly to single-threading, so the only way I can think of to do this is to have one thread per connection, have each call av_read_frame() on its own connection, and then synchronize on their respective timecodes.
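To make the "synchronize on timecodes" step concrete, here is a minimal sketch in plain C with no libavformat dependency. TimeBase, PacketQueue, compare_ts() and pick_next() are hypothetical names I made up for illustration; compare_ts() is meant to mirror what I understand av_compare_ts() to do, and each queue stands in for the packets one reader thread has already demuxed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a rational time base: one tick = num/den seconds. */
typedef struct { int num, den; } TimeBase;

/* Hypothetical per-connection queue of already-demuxed packet timestamps. */
typedef struct {
    const int64_t *pts; /* timestamps of buffered packets, in arrival order */
    int count;          /* packets buffered so far */
    int pos;            /* index of the next unread packet */
    int eof;            /* nonzero once the connection has ended */
    TimeBase tb;        /* time base the timestamps are expressed in */
} PacketQueue;

/* Compare two timestamps given in different time bases by cross-multiplying.
 * Returns -1, 0 or 1. Assumes the products fit in int64_t. */
static int compare_ts(int64_t ts_a, TimeBase tb_a, int64_t ts_b, TimeBase tb_b)
{
    assert(tb_a.num > 0 && tb_a.den > 0 && tb_b.num > 0 && tb_b.den > 0);
    int64_t a = ts_a * tb_a.num * tb_b.den;
    int64_t b = ts_b * tb_b.num * tb_a.den;
    return (a > b) - (a < b);
}

/* Decide which queue the consumer should read from next.
 * Returns 0 or 1 for the queue holding the earliest packet,
 * -1 if it must wait for more data, -2 if both queues are finished. */
static int pick_next(const PacketQueue *q0, const PacketQueue *q1)
{
    int have0 = q0->pos < q0->count;
    int have1 = q1->pos < q1->count;

    if (have0 && have1)
        return compare_ts(q0->pts[q0->pos], q0->tb,
                          q1->pts[q1->pos], q1->tb) <= 0 ? 0 : 1;
    /* One side is empty: it is only safe to proceed if that side is done,
     * otherwise an earlier packet could still arrive on it. */
    if (have0)
        return q1->eof ? 0 : -1;
    if (have1)
        return q0->eof ? 1 : -1;
    return (q0->eof && q1->eof) ? -2 : -1;
}
```

The consumer would call pick_next() under whatever lock protects the two queues, advance pos on the queue it returns, and sleep on a condition variable when it returns -1. Ties go to queue 0, which is an arbitrary choice.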
Kind of ugly.
I’m reading from buffers where I know how much data is in each connection’s buffer, but not necessarily how much data is required to read a full packet, for instance. (If I were reading from a file where all the data were already present, this would be trivial because av_read_frame() would never block.)
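For reference, the kind of buffer-backed read handler I mean looks roughly like the sketch below. It uses the same callback shape as the read_packet argument of avio_alloc_context(), but it is plain C and does not link against libavformat. ConnBuffer, conn_read(), WOULD_BLOCK and END_OF_STREAM are hypothetical names; in real code I assume the two return codes would be AVERROR(EAGAIN) and AVERROR_EOF, and I have not verified that every demuxer copes with the former.

```c
#include <stdint.h>
#include <string.h>

#define WOULD_BLOCK   (-1) /* hypothetical; stand-in for AVERROR(EAGAIN) */
#define END_OF_STREAM (-2) /* hypothetical; stand-in for AVERROR_EOF */

/* Hypothetical view of one connection's receive buffer. */
typedef struct {
    const uint8_t *data; /* bytes received from the socket so far */
    int size;            /* number of valid bytes in data */
    int pos;             /* number of bytes already handed to the demuxer */
    int closed;          /* nonzero once the peer has closed the connection */
} ConnBuffer;

/* Read handler: copies out whatever is buffered, never waits for more.
 * Returns the number of bytes copied, WOULD_BLOCK if the buffer is
 * currently empty, or END_OF_STREAM if it is empty and the connection
 * has been closed. */
static int conn_read(void *opaque, uint8_t *buf, int buf_size)
{
    ConnBuffer *c = opaque;
    int avail = c->size - c->pos;

    if (avail <= 0)
        return c->closed ? END_OF_STREAM : WOULD_BLOCK;
    if (buf_size > avail)
        buf_size = avail;
    memcpy(buf, c->data + c->pos, (size_t)buf_size);
    c->pos += buf_size;
    return buf_size;
}
```

The handler itself never blocks, but that only moves the problem: the caller still cannot tell how many more bytes are needed before a full packet can be returned.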
I figure this must be a fairly common operation to do, so someone must have figured out a reasonable approach.
Ideally I’d like the two combined connections on the same AVFormatContext so my existing code could be convinced it’s reading from a single connection.
But if that’s not feasible, whatever requires the least amount of restructuring of the existing code (or maybe just the least overall complexity) would be great.
Pointers to snippets of code that I could leverage would be great! I was looking at doc/examples/remuxing.c but couldn’t figure out how to add more streams to an existing output AVFormatContext, etc.
Re: Programmatically combining audio and video streams
On Thu, May 10, 2018, at 1:04 PM, Philip Prindeville wrote:
> I have a code base that currently expects audio and video to be in the
> same AVFormatContext, and this works great for sources (typically
> network connections) that contain both audio and video multiplexed
> together, such as in MP4 or AVI container.
This mailing list (ffmpeg-user) is only for questions involving the FFmpeg command-line tools (ffmpeg, ffplay, ffprobe). Usage questions involving the FFmpeg libraries (libavcodec, libavformat, etc.) should be asked at libav-user.