I am streaming audio between two Linux machines using ffmpeg and ffplay. The
sender captures audio from a USB microphone and transports it over RTSP
(using TCP). The receiver (listener) receives and plays the audio using
ffplay.
Using the `ashowinfo` filter, I am able to see a line containing various
information for each input audio frame. The `-nostats` option is used to
avoid other stats being interleaved within the log output. "n" denotes the
(sequential) number of the input frame, starting from 0. Both the receiver
and the sender use the `-af` option to apply the filter and see the frame
counts. However, when the stream ends, the audio frame numbers are
completely off, as shown below. I have searched but have not found any
specific or technical description of the algorithm ffmpeg uses to define
frame counts. Also, I am not sure why my sender with the microphone input
has a varying `nb_samples` size with each frame increment. I have noticed
that the audio streamed over RTSP is always received with every frame having
an `nb_samples` of 1024. I have tried adding the `frame_size` option, but it
made no difference to the consistency of the number of samples per input
frame on the sender. The aim is to have a consistent method to correlate
where the sender and receiver are once the audio stream has been stopped.
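To show the kind of comparison I mean, here is a minimal sketch. The log lines below are mock stand-ins for `ashowinfo` output (the real filter prints many more fields, such as pts, fmt, rate, and checksums), and the file names are made up:

```shell
# Mock frame logs standing in for saved ashowinfo output (illustrative only;
# real ashowinfo lines carry additional fields).
printf 'n:%s nb_samples:1024\n' 0 1 2 > sender.log
printf 'n:%s nb_samples:1024\n' 0 1   > receiver.log

# Count frame lines on each side; a mismatch is the symptom described above.
sender_frames=$(grep -c '^n:' sender.log)
receiver_frames=$(grep -c '^n:' receiver.log)
echo "sender=$sender_frames receiver=$receiver_frames"   # sender=3 receiver=2
```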
Below are the log files output for each command.
After discussing with support in the #ffmpeg IRC channel, this approach
doesn't seem reliable or consistent enough to keep track of frame counts on
the sender and receiver. A "frame" in ffmpeg on the decoding side is just
"however many samples the decoder decoded from one packet". The trouble with
audio is that audio formats have packets that contain frames of differing
size, so input and output frame counts can differ.
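Given that the two sides may frame the same audio differently, one option is to compare the total number of samples rather than the number of frames. A hedged sketch, again using mock `ashowinfo`-style log lines with invented values (a real log has more fields per line):

```shell
# Mock logs: the sender frames the audio differently from the receiver,
# so frame counts disagree even though the sample totals match.
printf 'n:0 nb_samples:576\nn:1 nb_samples:1152\nn:2 nb_samples:1152\n' > sender.log
printf 'n:0 nb_samples:1024\nn:1 nb_samples:1024\nn:2 nb_samples:832\n' > receiver.log

# Sum nb_samples per log; this total is independent of how packets
# were split into frames.
total() { awk -F'nb_samples:' '{s += $2} END {print s}' "$1"; }
echo "sender samples:   $(total sender.log)"     # 2880
echo "receiver samples: $(total receiver.log)"   # 2880
```

A matching sample total would indicate the receiver heard everything, even when the per-frame numbering disagrees.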
It was recommended that, if I am 100% sure the receiver will get all
packets, I should check the count of packets fed to the muxer against the
count of packets read from the demuxer; in theory those should match.
However, on the receiver end a couple of packets were always missing and not
repeated. This could depend on how the sender was stopped; it might be
possible that some audio passed through the filter chain but didn't get
encoded/sent.
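If the streams are also recorded to files, one way to count the packets read from a demuxer is ffprobe's `-count_packets` option, which reports `nb_read_packets` per stream. A sketch (the file name `capture.mka` is hypothetical; the same command would be run on the sender-side and receiver-side recordings and the two numbers compared):

```shell
# Read the whole first audio stream and print how many packets it contains.
ffprobe -v error -count_packets -select_streams a:0 \
        -show_entries stream=nb_read_packets -of csv=p=0 capture.mka
```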
True. Having multiple `-f` options in the output section of the ffmpeg
sender command line is unnecessary, and this has been adjusted since the
question was originally posted. Only the last one takes effect, but it
doesn't trigger an error.
The input is a stream you're reading in real time; the concept of frames, as when you decode a format that's framed for network streaming, doesn't apply in the same way.
> The aim is to have a consistent method to correlate where the sender and
> receiver are once the audio stream has been stopped.
Isn’t that a pretty consistent method in itself? They’re both at the end. If synchronization throughout is important, maybe you could have separate processes for recording and streaming, and the server and receiver processes could refer to an external clock.