Elementary stream parsing

See the image below for an updated block schematic of gmerlin-avdecoder, as an addition to my last blog entry:

As you can see, there is something new here: a parser, which converts "bad" packets into "good" packets. What exactly does that mean? A good (or well-formed) packet has the following properties:

It always has a correct timestamp (the presentation time at least)
It has a flag to determine if the packet is a keyframe
It has a valid duration. This is necessary for frame accurate repositioning of the stream after seeking, at least if you want to support variable framerates.
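To make the three properties concrete, here is a minimal sketch of such a packet in Python. The names (`Packet`, `is_well_formed`) and the field layout are made up for illustration; this is not the actual gmerlin-avdecoder data structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    """Hypothetical packet; None means "the demuxer could not tell us"."""
    data: bytes
    pts: Optional[int] = None        # presentation time in timescale ticks
    duration: Optional[int] = None   # frame duration in timescale ticks
    keyframe: Optional[bool] = None  # True/False once known


def is_well_formed(p: Packet) -> bool:
    """A packet is "good" if timestamp, keyframe flag and duration are all known."""
    return (
        p.pts is not None
        and p.keyframe is not None
        and p.duration is not None
        and p.duration > 0
    )
```

With a variable framerate, the end of a frame is only known as `pts + duration`, which is why the duration belongs to the packet and cannot be derived from a global framerate.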

Good packets (e.g. from QuickTime files) can be passed directly to the decoder, and it's possible to build an index from them. Unfortunately, some formats (most notably MPEG program and transport streams) don't have such nice packets. You know neither where a frame starts nor whether it is a keyframe. Large frames can be split across many packets, and a single packet can contain several small frames. Also, timestamps are not available for each frame. To make things worse, these formats are very widely used. You find them in .mpg files, on DVDs, (S)VCDs and Blu-ray discs, and in DVB broadcasts. The newer formats for consumer cameras (HDV and AVCHD) also use MPEG-2 transport streams. Since these are important for video editing applications, sample accurate access is a main goal here.
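As an illustration of the timestamp problem, the following sketch fills in missing timestamps by counting forward from the last known one. It assumes that the frames arrive in presentation order and that each duration is already known (which is typical for audio frames); video with B-frames needs more care. The function name is hypothetical, and the example values assume the 90 kHz clock used for MPEG timestamps.

```python
def fill_timestamps(frames):
    """frames: list of (pts_or_None, duration) tuples in presentation order.

    Returns a list with one timestamp per frame. A missing timestamp is
    replaced by the previous timestamp plus the previous duration.
    """
    result = []
    next_pts = None
    for pts, duration in frames:
        if pts is None:
            if next_pts is None:
                raise ValueError("no timestamp seen before the first frame")
            pts = next_pts
        result.append(pts)
        next_pts = pts + duration
    return result


# 25 fps video on a 90 kHz clock: 3600 ticks per frame
timestamps = fill_timestamps([(0, 3600), (None, 3600), (None, 3600), (10800, 3600)])
```

A timestamp found in the stream always wins over the interpolated one, so rounding errors cannot accumulate beyond the next real timestamp.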

The first solutions for this were realized inside the decoders. libmpeg2 is very tolerant with regard to frame alignment, and ffmpeg has an AVParser, which splits a continuous stream into frames. Additional routines were written for building an index.

It was predictable that this would not be the ultimate solution. The decoders got very complicated, and building the index was not possible without firing up an ffmpeg decoder (because the AVParser doesn't tell you about keyframes), so index building was very slow.

So I spent some time writing parsers for elementary A/V streams, which parse the streams to get all the information necessary for creating well-formed packets.
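To give an idea of what such a parser does, here is a heavily simplified sketch for MPEG-1/2 video. It is not the gmerlin-avdecoder code, and the class name `PictureSplitter` is made up. It relies on two facts from the MPEG video syntax: each picture header starts with the start code 00 00 01 00, and the 3-bit picture_coding_type (1 = I, 2 = P, 3 = B) follows the 10-bit temporal_reference. As a simplification, it cuts frames exactly at the picture start code, so sequence and GOP headers end up at the tail of the previous frame; a real parser would attach them to the following picture.

```python
PICTURE_START = b"\x00\x00\x01\x00"  # picture_start_code of MPEG-1/2 video


class PictureSplitter:
    """Turns arbitrarily cut chunks into (frame_bytes, keyframe) tuples."""

    def __init__(self):
        self.buf = bytearray()

    @staticmethod
    def _make_frame(data):
        keyframe = None  # unknown if the header is truncated
        if len(data) >= 6:
            # byte 4 and the top 2 bits of byte 5: temporal_reference,
            # the next 3 bits of byte 5: picture_coding_type
            coding_type = (data[5] >> 3) & 0x07
            keyframe = coding_type == 1  # 1 = I-picture
        return data, keyframe

    def feed(self, chunk):
        """Append a chunk and return all frames that are now complete."""
        self.buf += chunk
        frames = []
        while True:
            first = self.buf.find(PICTURE_START)
            if first < 0:
                return frames
            # A frame is complete once the next picture start code is seen
            nxt = self.buf.find(PICTURE_START, first + 4)
            if nxt < 0:
                return frames
            frames.append(self._make_frame(bytes(self.buf[first:nxt])))
            del self.buf[:nxt]

    def flush(self):
        """Return the last (possibly incomplete) frame at end of stream."""
        frames = []
        first = self.buf.find(PICTURE_START)
        if first >= 0:
            frames.append(self._make_frame(bytes(self.buf[first:])))
        self.buf.clear()
        return frames
```

Because the buffer is searched as a whole, it doesn't matter whether a frame (or even a start code) is split across several chunks or whether one chunk contains several frames. The keyframe flag comes out of the header alone, without starting a decoder.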

Once that worked, I could be sure that every codec always gets well-formed packets. What followed was by far the biggest cleanup in the history of gmerlin-avdecoder. Many things could be simplified, lots of cruft got kicked out, and duplicate code got moved to central locations.

New features are:

Decoding of (and sample accurate seeking within) elementary H.264 streams
Sample accurate access for elementary MPEG-1/2 streams
Sample accurate access for MPEG program streams with DTS- and LPCM audio
Faster seeking because non-reference frames are skipped while approaching the new stream position
Much cleaner and simpler architecture.

The cleaner architecture doesn't necessarily mean fewer bugs, but they are easier to fix now :)
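The faster seeking mentioned in the list can be sketched as follows. This is an illustration under assumptions, not the actual implementation: the function name is hypothetical, frames are given in decode order as (pts, type) tuples, and B-frames are never used as references (true for MPEG-1/2 video, but not in general for H.264).

```python
def frames_to_decode(frames, target_pts):
    """frames: list of (pts, coding_type) in decode order, type in "IPB".

    Returns the indices of the frames that must be passed to the decoder
    in order to display the frame with timestamp target_pts.
    """
    # Start at the last keyframe at or before the target
    start = None
    for i, (pts, ctype) in enumerate(frames):
        if ctype == "I" and pts <= target_pts:
            start = i
    if start is None:
        raise ValueError("no keyframe at or before the target")

    plan = []
    for i in range(start, len(frames)):
        pts, ctype = frames[i]
        if ctype == "B" and pts < target_pts:
            continue  # non-reference frame before the target: skip it
        plan.append(i)
        if pts == target_pts:
            break
    return plan


# Presentation order I0 B1 B2 P3 B4 B5 P6, given here in decode order
gop = [(0, "I"), (3, "P"), (1, "B"), (2, "B"), (6, "P"), (4, "B"), (5, "B")]
plan = frames_to_decode(gop, 5)
```

In this example only four of the seven frames are decoded to reach the frame at timestamp 5. This only works because the packets already carry timestamps and frame types before any decoding happens.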


Friday, February 27, 2009