Export raw (binary) subtitle data


Export raw (binary) subtitle data

Johannes Bauer
Hi there,

I recently got a dashcam and noticed that mplayer spits out odd warnings
about the contained subtitles:

Subtitle word 'pie~ejpgnlekkjgijeiliojpeeeicko' too long!

$ ffmpeg -i 06031028_0041.MOV
[...]
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '06031028_0041.MOV':
  Metadata:
    major_brand     : qt
    minor_version   : 0
    compatible_brands: qt
    creation_time   : 2017-06-03 10:28:50
    comment         : AMBARELLA A7L
  Duration: 00:03:00.00, start: 0.020000, bitrate: 13059 kb/s
    Stream #0:0(eng): Video: h264 (Main) (avc1 / 0x31637661),
yuvj420p(pc), 1280x720 [SAR 1:1 DAR 16:9], 11998 kb/s, 50 fps, 50 tbr,
1200k tbn, 100 tbc (default)
    Metadata:
      creation_time   : 2017-06-03 10:28:50
      handler_name    : Ambarella AVC
      encoder         : Ambarella AVC encoder
    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz,
stereo, fltp, 127 kb/s (default)
    Metadata:
      creation_time   : 2017-06-03 10:28:50
      handler_name    : Ambarella AAC
    Stream #0:2(eng): Subtitle: mov_text (text / 0x74786574), 0 kb/s
(default)
    Metadata:
      creation_time   : 2017-06-03 10:28:50
      handler_name    : Ambarella EXT


I do suspect that this "subtitle" information is raw position/speed data
from the camera. Certainly looks like it. I'd like to export these
subtitles using ffmpeg, then "render" them (i.e., reverse engineer the
format, then make something human-readable of them) and then
re-integrate into the videostream. That way, I could later define what I
want to see (or even render multiple different subtitle streams which
would be pretty cool).

So far, I've had no luck exporting the subtitles, however:

$ ffmpeg -i 06031028_0041.MOV y.srt
[...]
Output #0, srt, to 'y.srt':
  Metadata:
    major_brand     : qt
    minor_version   : 0
    compatible_brands: qt
    comment         : AMBARELLA A7L
    encoder         : Lavf56.40.101
    Stream #0:0(eng): Subtitle: subrip (srt) (default)
    Metadata:
      creation_time   : 2017-06-03 10:28:50
      handler_name    : Ambarella EXT
      encoder         : Lavc56.60.100 srt
Stream mapping:
  Stream #0:2 -> #0:0 (mov_text (native) -> subrip (srt))
Press [q] to stop, [?] for help
size=      16kB time=00:02:59.02 bitrate=   0.8kbits/s

Gives something like this:
1
00:00:00,020 --> 00:00:01,020
<font face="(null)" size="0" color="#000000">{\an7}</font>

2
00:00:01,020 --> 00:00:02,020
<font face="(null)" size="0" color="#000000">{\an7}</font>

3
00:00:02,020 --> 00:00:03,020
<font face="(null)" size="0" color="#000000">{\an7}</font>

4
00:00:03,020 --> 00:00:04,020
<font face="(null)" size="0" color="#000000">{\an7}</font>
[...]

I.e., there is an entry every second, but ffmpeg is obviously unable to
decode the payload.
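
The "one entry per second" observation can be checked mechanically. The following is only a minimal sketch that reads the timing lines of an SRT file such as y.srt; the helper names `to_seconds` and `cue_times` are made up for illustration.

```python
import re

# Matches an SRT timing line such as "00:00:01,020 --> 00:00:02,020".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3}) --> (\d+):(\d{2}):(\d{2}),(\d{3})"
)


def to_seconds(h, m, s, ms):
    """Convert SRT clock fields (strings) to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def cue_times(srt_text):
    """Return a list of (start, end) pairs in seconds, one per cue."""
    times = []
    for line in srt_text.splitlines():
        match = TIMING.match(line.strip())
        if match:
            fields = match.groups()
            times.append((to_seconds(*fields[:4]), to_seconds(*fields[4:])))
    return times
```

Calling `cue_times(open("y.srt").read())` and comparing consecutive start times would show whether the cadence really is one second throughout the file.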

Is there a way to export the raw binary subtitle data just the way it's
interleaved within the video to a file using ffmpeg? If so, I'd greatly
appreciate any pointers.

Thanks very much,
Cheers,
Joe
_______________________________________________
ffmpeg-user mailing list
[hidden email]
http://ffmpeg.org/mailman/listinfo/ffmpeg-user

To unsubscribe, visit link above, or email
[hidden email] with subject "unsubscribe".

Re: Export raw (binary) subtitle data

Moritz Barsnick
On Sat, Jun 03, 2017 at 15:33:22 +0200, Johannes Bauer wrote:
> Subtitle word 'pie~ejpgnlekkjgijeiliojpeeeicko' too long!

Hmm, it looks like their display duration isn't defined correctly,
perhaps. (That's an mplayer message, not libav*).

> Is there a way to export the raw binary subtitle data just the way it's
> interleaved within the video to a file using ffmpeg? If so, I'd greatly
> appreciate any pointers.

I don't use it myself, but I just tried it, and this gives you
*something*: it copies the subtitle stream into a file using the raw
data muxer:

$ ffmpeg -i 06031028_0041.MOV -map 0:s -c copy -f data 06031028_0041.mov_text.dat

You would need to understand what raw mov_text looks like, in order to
build your own parser. ;-)

Alas, the timestamps also get lost - they're not part of the payload.
You may need to parse them out differently (e.g. from ffprobe or from
the SRT conversion).
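
A minimal sketch of the ffprobe route, not something that was run in this thread: ffprobe can list the packets of the subtitle stream as JSON, and each packet entry carries its timestamp and size. The function names are made up for illustration, and the JSON keys (`pts_time`, `duration_time`, `size`) should be checked against the output of your own ffprobe build.

```python
import json
import subprocess


def probe_subtitle_packets(path):
    """Run ffprobe on `path` and return its JSON packet listing as text."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "s:0",  # first subtitle stream
        "-show_packets",
        "-of", "json",
        path,
    ]
    done = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return done.stdout


def packet_times(ffprobe_json):
    """Extract (pts_seconds, duration_seconds, size_bytes) per packet.

    Missing time fields are reported as None rather than guessed.
    """
    result = []
    for pkt in json.loads(ffprobe_json).get("packets", []):
        pts = float(pkt["pts_time"]) if "pts_time" in pkt else None
        dur = float(pkt["duration_time"]) if "duration_time" in pkt else None
        result.append((pts, dur, int(pkt["size"])))
    return result
```

With the file from the first post this would be used as `packet_times(probe_subtitle_packets("06031028_0041.MOV"))`. The sizes also give the boundaries of each sample inside the concatenated `-f data` dump, provided the packets are written in the same order.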

HTH,
Moritz

Re: Export raw (binary) subtitle data

Johannes Bauer
Hi Moritz,

On 05.06.2017 15:59, Moritz Barsnick wrote:
> On Sat, Jun 03, 2017 at 15:33:22 +0200, Johannes Bauer wrote:
>> Subtitle word 'pie~ejpgnlekkjgijeiliojpeeeicko' too long!
>
> Hmm, it looks like their display duration isn't defined correctly,
> perhaps. (That's an mplayer message, not libav*).

Oh, I thought it maybe was the actual length of the word that it
complained about.

>> Is there a way to export the raw binary subtitle data just the way it's
>> interleaved within the video to a file using ffmpeg? If so, I'd greatly
>> appreciate any pointers.
>
> I don't use it myself, but I just tried it, and this gives you
> *something*: it copies the subtitle stream into a file using the raw
> data muxer:
>
> $ ffmpeg -i 06031028_0041.MOV -map 0:s -c copy -f data 06031028_0041.mov_text.dat

Yes! This seems to be what I'm looking for!

> You would need to understand what raw mov_text looks like, in order to
> build your own parser. ;-)

Yup, I'll try to get reversing right away.

> Alas, the timestamps also get lost - they're not part of the payload.
> You may need to parse them out differently (e.g. from ffprobe or from
> the SRT conversion).

Hmmm, yes, unfortunate. I was hoping to get that information as well,
but I think that every line corresponds to one occurrence -- so I can put
everything back together.

Thanks so much for your help. I'm really curious to see if I can figure
this out.

One thing made me wonder, however: What if the binary data contained
'\n'? Then it probably would cause me some grief, wouldn't it? Some
binary dump format (e.g. timestamp duration base64-encoded) would be
ideal to avoid this. But I'm happy with what I have now :-)

Thanks again!
Cheers,
Joe

Re: Export raw (binary) subtitle data

Moritz Barsnick
On Mon, Jun 05, 2017 at 19:35:34 +0200, Johannes Bauer wrote:
> Oh, I thought it maybe was the actual length of the word that it
> complained about.

I'm not sure, I didn't look into the source code generating that
message.

> Hmmm, yes, unfortunate. I was hoping to get that information as well,
> but I think that every line corresponds to one occurrence -- so I can put
> everything back together.
>
> Thanks so much for your help. I'm really curious to see if I can figure
> this out.
>
> One thing made me wonder, however: What if the binary data contained
> '\n'? Then it probably would cause me some grief, wouldn't it? Some
> binary dump format (e.g. timestamp duration base64-encoded) would be
> ideal to avoid this. But I'm happy with what I have now :-)

Yes, if you rely on "one line is one geo-position", your logic gets
borked by additional pseudo-EOL.
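
To illustrate the difference, here is a rough sketch that frames records by an explicit length instead of by newlines. It assumes each record is a 16-bit big-endian length followed by that many payload bytes and nothing else. That matches how plain mov_text samples start, but real samples may carry trailing style boxes, and the dashcam's actual layout is not described in this thread, so treat the framing as an assumption.

```python
import struct


def split_length_prefixed(blob):
    """Split a concatenated dump into payloads.

    Assumed framing: 16-bit big-endian length, then `length` payload bytes.
    """
    records = []
    offset = 0
    while offset + 2 <= len(blob):
        (length,) = struct.unpack_from(">H", blob, offset)
        offset += 2
        payload = blob[offset:offset + length]
        if len(payload) < length:
            raise ValueError("truncated record at offset %d" % (offset - 2))
        records.append(payload)
        offset += length
    return records
```

For a dump holding the three payloads `b"a\nb"`, `b""` and `b"cd"`, splitting on `b"\n"` would report two pieces, while the length-based split recovers all three records, including the newline inside the first one.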

Another thing came to mind, since you asked: There was a patch on
ffmpeg-devel for a "textdata" muxer (and demuxer), which would be able
to represent arbitrary data in base64, and also support timestamps:
https://ffmpeg.org/pipermail/ffmpeg-devel/2016-May/194445.html

That patch apparently never even got reviewed, or it was obsoleted by
one of Stefano's other patches. I still build my ffmpeg with it. It
does the trick quite niftily:

$ ffmpeg -i test.mov -map 0:s -c copy -f fftextdata -
[...]
0:00:00.000000
AAA=
;
0:00:03.160000
AENJIDE5NDQgc3RhcnRldCBodXNtw7hkcmVuZSAiIkhqZW1tZXRzCmZvcnNrbmluZ3NpbnN0aXR1dHQiIiwgSEZJLCAt
;
0:00:09.156000
AAA=
;
0:00:09.360000
ADctIGh2b3IgaHVzaG9sZG5pbmdzZWtzcGVydGVyCmpvYmJlciBtZWQgZWtzcGVyaW1lbnRlciAt
;

That's your raw mov_text in base64, with timestamps, and actually quite
parsable! (It looks like there are some extra bytes in the output,
perhaps those markers from the raw mov_text.)
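
A short sketch of how such output could be parsed. The record layout (timestamp line, base64 line, a line containing only ";") is inferred purely from the sample above, not from the patch itself. The two extra bytes at the start of each decoded record are consistent with the 16-bit big-endian text length that begins a mov_text (3GPP timed text) sample. The function names are made up for illustration.

```python
import base64
import struct


def parse_textdata(dump):
    """Return a list of (seconds, payload_bytes), one per ';'-terminated record."""
    records = []
    current = []
    for line in dump.splitlines():
        line = line.strip()
        if line == ";":
            if current:
                stamp, encoded = current[0], "".join(current[1:])
                h, m, s = stamp.split(":")
                seconds = int(h) * 3600 + int(m) * 60 + float(s)
                records.append((seconds, base64.b64decode(encoded)))
            current = []
        elif line:
            current.append(line)
    return records


def mov_text_string(payload):
    """Strip the 16-bit big-endian length prefix and decode the text as UTF-8."""
    (length,) = struct.unpack_from(">H", payload, 0)
    return payload[2:2 + length].decode("utf-8")
```

Applied to the sample, `AAA=` decodes to two zero bytes (an empty cue), and the record at 0:00:09.360000 decodes to a two-line text. Note that the text itself contains a newline, which is exactly the case that would trip up a line-based parser of the raw dump.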

You could have probably also requested your SRT output not to use ASS
tags. I'm not sure SRT is willing to carry arbitrary binary data
though, that's why base64 is better.

But please keep in mind that when going from subtitle to something
else, you are most likely losing some properties - most notably the
duration. In your case - using the assumption that it's actually a
geo-position correlated with the timestamps - that seems tolerable.
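
If the duration is lost, one plausible way to rebuild it when rendering the decoded data back into subtitles is to let each cue last until the next one starts. The sketch below does that for a list of `(start_seconds, text)` pairs; the one-second default for the final cue and the function names are assumptions made for illustration.

```python
def srt_clock(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3600000)
    minutes, rest = divmod(rest, 60000)
    secs, ms = divmod(rest, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, secs, ms)


def to_srt(events, last_duration=1.0):
    """Build SRT text from (start_seconds, text) pairs sorted by start time.

    Each cue ends where the next one starts; the final cue lasts
    `last_duration` seconds because there is no successor to measure against.
    """
    cues = []
    for index, (start, text) in enumerate(events):
        if index + 1 < len(events):
            end = events[index + 1][0]
        else:
            end = start + last_duration
        cues.append(
            "%d\n%s --> %s\n%s\n"
            % (index + 1, srt_clock(start), srt_clock(end), text)
        )
    return "\n".join(cues)
```

The `text` values would be whatever human-readable rendering is chosen for the decoded position and speed; the resulting file can then be muxed back in as an additional subtitle stream.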

Regards,
Moritz

P.S.: I owe you at least two beers if you figure out from which movie
that is. ;-) (I mapped that movie's SRT arbitrarily into a MOV file
with mov_text, for testing.)

Re: Export raw (binary) subtitle data

Johannes Bauer
On 05.06.2017 21:31, Moritz Barsnick wrote:

> On Mon, Jun 05, 2017 at 19:35:34 +0200, Johannes Bauer wrote:
>> Oh, I thought it maybe was the actual length of the word that it
>> complained about.
>
> I'm not sure, I didn't look into the source code generating that
> message.
>
>> Hmmm, yes, unfortunate. I was hoping to get that information as well,
>> but I think that every line corresponds to one occurrence -- so I can put
>> everything back together.
>>
>> Thanks so much for your help. I'm really curious to see if I can figure
>> this out.
>>
>> One thing made me wonder, however: What if the binary data contained
>> '\n'? Then it probably would cause me some grief, wouldn't it? Some
>> binary dump format (e.g. timestamp duration base64-encoded) would be
>> ideal to avoid this. But I'm happy with what I have now :-)
>
> Yes, if you rely on "one line is one geo-position", your logic gets
> borked by additional pseudo-EOL.

Yup, but I figured out so far that the data contains length info, so it
could be parsed properly... but this here:

> Another thing came to mind, since you asked: There was a patch on
> ffmpeg-devel for a "textdata" muxer (and demuxer), which would be able
> to represent arbitrary data in base64, and also support timestamps:
> https://ffmpeg.org/pipermail/ffmpeg-devel/2016-May/194445.html

Would be sooooo much nicer! It's pretty much *perfectly* what I wanted!

I'll try to build ffmpeg myself (currently using the Ubuntu one) and
patch this in. Also going to subscribe to ffmpeg-devel in order to ask
why it isn't mainlined. It is *super* useful.

Thank you so much again!

Cheers,
Joe

> P.S.: I owe you at least two beers if you figure out from which movie
> that is. ;-) (I mapped that movie's SRT arbitrarily into a MOV file
> with mov_text, for testing.)

Oh man, no chance. Here's my wild guess:
http://www.imdb.com/title/tt3213684/ -- I don't speak any Norwegian, so
I had Google help me :-)

Re: Export raw (binary) subtitle data

Johannes Bauer
In reply to this post by Moritz Barsnick
Hi Moritz,

On 05.06.2017 21:31, Moritz Barsnick wrote:
> But please keep in mind that when going from subtitle to something
> else, you are most likely losing some properties - most notably the
> duration. In your case - using the assumption that it's actually a
> geo-position correlated with the timestamps - that seems tolerable.

Just a quick followup: https://github.com/johndoe31415/dcstdecode

You're mentioned in the Acknowledgements. Again thank you very much for
your help, I wouldn't have been able to figure this out myself.

Cheers,
Joe

Re: Export raw (binary) subtitle data

Moritz Barsnick
On Sat, Jun 10, 2017 at 15:45:00 +0200, Johannes Bauer wrote:
> Just a quick followup: https://github.com/johndoe31415/dcstdecode
>
> You're mentioned in the Acknowledgements. Again thank you very much for
> your help, I wouldn't have been able to figure this out myself.

Reverse-engineering FTW! Cool that you figured this all out and put the
info up for everyone to find. And thanks for the credits, I appreciate
it.

Moritz