Sunday, March 15, 2009

Dirac video

Libquicktime and gmerlin-avdecoder now support Dirac in quicktime. En- and decoding is done with the libschrödinger library. Having already implemented support for lots of other video codecs, I noticed a few things about this one, both positive and negative.

Positive
  • Very precise specification of the uncompressed video format. Interlacing (including field order) is stored in the stream, as are signal ranges (video range or full range). This brings direct support for lots of colormodels.

  • Support for > 8 bit encoding. This is really a rare feature. While ffmpeg always sticks with 8 bit even for codecs with 10 bit or 12 bit modes, the libschrödinger API has higher precision options. I'm not sure whether these modes are really supported internally yet.

  • It seems to aim for scalability from low-end internet downloads to intra-only modes for video editing applications. A lossless mode is also there. Whether it performs equally well for all usage scenarios is yet to be found out.

  • It claims to be patent free. But since the patent jungle is so dense, it is almost impossible to prove this.

  • It has the BBC behind it, which hopefully means serious development, funding and a chance for a wide deployment.

Negative

Sequence end code in its own sample
The quicktime mapping specification requires that the sequence end code (a 13 byte string telling that the stream ends here) be in its own sample. This is a mess, since for all Quicktime codecs I know (even the most disgusting ones) 1 sample always corresponds to 1 frame. Having a sample which is not a frame screws up the timing, because there is a "duration" associated with the sequence-end-sample, which makes the total stream duration seem larger than it actually is. Also, a frame accurate demuxer will expect one frame more than the file actually has. For both libquicktime and gmerlin-avdecoder I wrote workarounds for this (they simply discard the last packet).
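
The workaround can be sketched roughly as follows. This is not the actual libquicktime or gmerlin-avdecoder code: the function names are made up for illustration, and the sketch assumes (from my reading of the Dirac bitstream layout) that the 13 byte end code is a parse info header starting with "BBCD" followed by the parse code 0x10. Unlike the real workaround, which simply drops the last packet, the sketch checks what it is dropping first.

```python
# Hypothetical sketch: drop a trailing Dirac end-of-sequence sample so that
# one sample corresponds to one frame again.

PARSE_PREFIX = b"BBCD"      # assumed Dirac parse info prefix
END_OF_SEQUENCE = 0x10      # assumed parse code for "end of sequence"
PARSE_INFO_SIZE = 13        # the "13 byte string" mentioned above


def is_end_of_sequence(sample: bytes) -> bool:
    """Return True if the sample consists only of a sequence end code."""
    if len(sample) != PARSE_INFO_SIZE:
        return False
    return sample[:4] == PARSE_PREFIX and sample[4] == END_OF_SEQUENCE


def strip_trailing_eos(samples):
    """samples is a list of (data, duration) tuples in file order."""
    if samples and is_end_of_sequence(samples[-1][0]):
        return samples[:-1]
    return samples


def total_duration(samples):
    """Sum of the per-sample durations, as a demuxer would report it."""
    return sum(duration for _, duration in samples)
```

With two real frames of duration 100 plus the end code sample, the untrimmed list reports a duration of 300 and three "frames"; after `strip_trailing_eos` it reports 200 and two frames.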

If I had written the mapping spec, I would have required the sequence end code to be appended to the last frame (in the same sample). It could also be made optional, since it's not really needed in quicktime.

If libquicktime encodes dirac files, everything is done according to the spec. Conformance to the spec is more important than my personal opinion about it :)

No ctts atom required
Quicktime timestamps (as given by the stts atom) are decoding timestamps. For all low-delay streams (i.e. streams without B-frames), these are equal to the presentation timestamps. For H.264 and MPEG4 ASP streams with B-frames, the ctts atom specifies the difference between PTS and DTS for each frame and lets the demuxer calculate correct presentation timestamps without touching the video data. If the ctts atom is missing, such quicktime files become as disgusting as AVIs with B-frames. Unfortunately the ctts atom isn't required by the mapping spec, which means we'll see such files in the wild.
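
To illustrate what the ctts atom buys the demuxer, here is a small sketch of the timestamp arithmetic. It assumes both tables are already parsed into lists of (sample count, value) runs, which is how stts and ctts store their entries; the function names are mine and not part of any library.

```python
# Sketch: derive presentation timestamps from stts and ctts run tables.

def expand_runs(runs):
    """Expand [(count, value), ...] into one value per sample."""
    values = []
    for count, value in runs:
        values.extend([value] * count)
    return values


def decoding_timestamps(stts):
    """stts holds sample durations; the DTS is the running sum."""
    timestamps = []
    t = 0
    for duration in expand_runs(stts):
        timestamps.append(t)
        t += duration
    return timestamps


def presentation_timestamps(stts, ctts):
    """PTS = DTS + composition offset, per sample, in decoding order."""
    dts = decoding_timestamps(stts)
    offsets = expand_runs(ctts)
    if len(offsets) != len(dts):
        raise ValueError("stts and ctts describe different sample counts")
    return [d + o for d, o in zip(dts, offsets)]


# Four samples in decoding order I P B B, each 100 time units long.
print(presentation_timestamps([(4, 100)], [(1, 100), (1, 300), (2, 0)]))
# prints [100, 400, 200, 300]
```

Sorting the samples by the resulting values gives the display order I B B P, and no video data had to be touched to get there.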

The good news is that the ctts atom isn't mentioned at all in the spec. From this I conclude that it is not forbidden either. Therefore, libquicktime always writes a ctts atom if the stream has B-frames.

On the decoding side (in gmerlin-avdecoder), the quicktime demuxer checks if a dirac stream has a ctts atom. If yes, it is used and everything is fine. If not (i.e. if the file wasn't written by libquicktime), a parser is fired up and an index must be built if one wants sample accuracy. The good news is that the parser is pretty simple and the same thing is needed anyway for decoding dirac in MPEG transport streams.
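
The fallback path could look something like the sketch below. It is not the gmerlin-avdecoder parser. It rests on several assumptions of mine: each parse unit starts with a 13 byte header ("BBCD", a parse code byte, then two 32 bit big endian offsets), picture units have bit 0x08 set in the parse code, a picture unit carries a 32 bit picture number right after the header, and the frame rate is constant. All function names are hypothetical.

```python
# Hypothetical sketch: use ctts timestamps if present, otherwise parse the
# Dirac picture numbers out of the samples to build a timestamp index.
import struct

PARSE_PREFIX = b"BBCD"   # assumed parse info prefix
PARSE_INFO_SIZE = 13     # prefix (4) + parse code (1) + two offsets (8)
PICTURE_BIT = 0x08       # assumed: set in the parse code of picture units


def picture_number(sample: bytes):
    """Walk the parse units in one sample and return the picture number."""
    pos = 0
    while pos + PARSE_INFO_SIZE <= len(sample):
        if sample[pos:pos + 4] != PARSE_PREFIX:
            return None
        parse_code = sample[pos + 4]
        next_offset = struct.unpack(">I", sample[pos + 5:pos + 9])[0]
        if parse_code & PICTURE_BIT:
            end = pos + PARSE_INFO_SIZE + 4
            if end > len(sample):
                return None
            return struct.unpack(">I", sample[pos + PARSE_INFO_SIZE:end])[0]
        if next_offset == 0:
            return None  # no further unit in this sample
        pos += next_offset
    return None


def build_pts_index(samples, frame_duration):
    """Derive one PTS per sample from the picture numbers."""
    numbers = [picture_number(s) for s in samples]
    if any(n is None for n in numbers):
        raise ValueError("sample without a parsable picture")
    first = min(numbers)
    return [(n - first) * frame_duration for n in numbers]


def presentation_times(samples, frame_duration, ctts_pts=None):
    """Prefer timestamps derived from ctts; parse the stream otherwise."""
    if ctts_pts is not None:
        return ctts_pts
    return build_pts_index(samples, frame_duration)
```

The index has to be built once over the whole stream, which is why files with a ctts atom are so much more pleasant to handle.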

Other news

gmerlin-avdecoder also supports Dirac in Ogg and MPEG-2 transport streams.