Wednesday, April 3, 2013

gavf: A multimedia container format for gmerlin


Having programmed a lot of demultiplexers in gmerlin-avdecoder, I found that there is no ideal container format. For my taste, an ideal container format
  • Is as codec-agnostic as possible, i.e. doesn't require codec-specific hacks in (de-)multiplexers. AVI is surprisingly good in this respect. Ogg and mov/mp4 fail miserably here.
  • Supports sample accurate timing. This means that all streams have timestamps in their native timescale. This is solved well in Ogg and mp4, while matroska and many other formats fail.
  • Is fully streamable. This means that a stream can be encoded from a live source and sent over a (non-seekable) channel like a socket. Ogg streams have this property but mov/mp4 doesn't.
  • Is as simple as possible.
Designing a multimedia format for gmerlin was mostly a matter of serializing the C structs which were already present in gavl, like A/V formats, compression descriptions and metadata. Furthermore, I used some tricks:
  • Use variable-length integers like in matroska, but extended to 64 bits
  • Introduce so-called synchronization headers. They typically come before video keyframes and contain the timestamps of the next packets of all elementary streams. If you seek to a sync header, you have the full timing information when you restart decoding from that point.
  • Write timestamps relative to the last sync header. This means smaller numbers (fewer bytes) but full accuracy and 64-bit resolution. A similar approach is found in matroska files.
  • Eliminate redundant fields. E.g. a video stream with constant framerate and no B-frames doesn't need per-frame timestamps at all.
  • Split global per-stream information into a header (at the beginning of a file) and a footer (at the end of the file). For decoding the file (e.g. when streaming) the header is sufficient. The footer contains e.g. the indices for seeking. In the case of a live stream, there is no footer at all. But it can be generated trivially when the stream is saved to a file.
  • Make bitstream-level parsing of the elementary streams unnecessary. This means that some fields, which might come in handy on the demuxer level, are available in the container format. Examples are the frame type (I-, P- or B-frame) and timecodes.
  • Support arbitrary global and per-stream metadata
  • Allow updating the global metadata on the fly. This makes it possible to wrap webradio streams into gavf streams without losing the song titles.
  • Support chapters and subtitles (text based and graphical).
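The variable-length integers from the first trick can be sketched in plain C. The scheme below is EBML-inspired (the position of the first set bit in the leading byte encodes the total length), with a hypothetical 9-byte escape (leading 0x00) for values that need the full 64 bits; gavf's actual on-disk encoding may differ in detail.

```c
#include <stdint.h>

/* Encode val into buf (at least 9 bytes), return the number of bytes
 * written. The leading byte's first set bit encodes the length, as in
 * matroska; a leading 0x00 byte serves as an escape for values that
 * need the full 64 bits. */
int varint_encode(uint64_t val, uint8_t * buf)
  {
  int i, len = 1;

  if(val >= (1ULL << 56)) /* 9-byte escape: 0x00 + 8 raw bytes */
    {
    buf[0] = 0x00;
    for(i = 0; i < 8; i++)
      buf[1 + i] = (uint8_t)(val >> (8 * (7 - i)));
    return 9;
    }

  while(len < 8 && val >= (1ULL << (7 * len)))
    len++;

  /* Marker bit, then the 7*len value bits spread over len bytes */
  buf[0] = (uint8_t)((0x80 >> (len - 1)) | (val >> (8 * (len - 1))));
  for(i = 1; i < len; i++)
    buf[i] = (uint8_t)(val >> (8 * (len - 1 - i)));
  return len;
  }

/* Decode a varint from buf, store the value in *val, return the
 * number of bytes consumed */
int varint_decode(const uint8_t * buf, uint64_t * val)
  {
  int i, len = 1;

  if(buf[0] == 0x00) /* 9-byte escape */
    {
    *val = 0;
    for(i = 0; i < 8; i++)
      *val = (*val << 8) | buf[1 + i];
    return 9;
    }

  while(!(buf[0] & (0x80 >> (len - 1))))
    len++;

  *val = buf[0] & (0xff >> len);
  for(i = 1; i < len; i++)
    *val = (*val << 8) | buf[i];
  return len;
  }
```

With this scheme, small numbers (like timestamps relative to the last sync header) need only one or two bytes: 300, for example, encodes to the two bytes 0x41 0x2C.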

Now the question is: why yet another multimedia format? Well, it's true that there are way too many formats out there, as every multimedia programmer knows all too well. So let me make clear why I developed gavf. I wanted:
  • to store uncompressed A/V data in all formats supported by gavl. This is especially important for testing and debugging
  • to save a compressed stream (e.g. from an rtsp source) without depending on 3rd party libraries
  • to transfer A/V streams from one gmerlin program to another via a pipe or a socket.
  • to prove that I can design a format that is better than all the others :)
None of these goals could be met with any existing container format, but they are all met by gavf, so it was worth the effort.

Supported codecs

As mentioned already, gavf supports compressed and uncompressed data. In the uncompressed case, the format is completely defined by the audio or video format, and the codec ID in the compression info is set to GAVL_CODEC_ID_NONE. For audio streams, the compression can be one of the following:
  • alaw
  • ulaw
  • mp2
  • mp3
  • AC3
  • AAC
  • Vorbis
  • Flac
  • Opus
  • Speex
For video, we support:
  • JPEG
  • PNG
  • TIFF
  • TGA
  • MPEG-1
  • MPEG-2
  • MPEG-4 (a.k.a. DivX)
  • H.264 (including AVCHD)
  • Theora
  • Dirac
  • DV (several variants)
  • VP8
These allow a huge number of formats to be wrapped into gavf streams. Adding new codecs is mostly a matter of defining them in gavl/compression.h and adding support for them at least in gmerlin-avdecoder.

Application support

I won't promote gavf as a container format for interchanging multimedia content. In fact, the current design even makes that impossible: the gavf format can change without warning from one gavl version to another, and there are no version fields inside gavf files to ensure backward compatibility. For now, I use it exclusively for passing files between gmerlin applications of the same version.

If, however, someone likes gavf so much that he or she wants it to become more widespread, some additional work is needed. First of all, we need a formal specification document. Secondly, we need to add version fields to the internal data structures so one can write backwards-compatible (de-)muxers. Neither of these will be done by me, though.

The current svn version of gmerlin has some support for gavf:
  • A reference (de-)multiplexer in gavl (gavl/gavf.h)
  • gavf demultiplexing support in gmerlin-avdecoder
  • An encoder plugin for creating gavf files with gmerlin_transcoder. It supports compression with the standalone codec plugins
  • A connector for reading and writing gavf streams via regular files, pipes and sockets. It's the basis of the gavftools, which will be described in another post.

Tuesday, April 2, 2013

Standalone codec plugins for gmerlin

After having implemented the A/V connectors for gmerlin, it was easy to implement standalone codec plugins, which (de-)compress an A/V stream. This means that in addition to simplified A/V processing with on-the-fly format conversion, we can also do on-the-fly (de-)compression. There is just one plugin type for compression and decompression of audio and video: bg_codec_plugin_t, defined in gmerlin/plugin.h. In addition to the common stuff (creation, destruction, setting parameters), there are a number of functions specific to codec functionality. For decompression these are:

gavl_audio_source_t * (*connect_decode_audio)(void * priv,
                                              gavl_packet_source_t * src,
                                              const gavl_compression_info_t * ci,
                                              const gavl_audio_format_t * fmt,
                                              gavl_metadata_t * m);

gavl_video_source_t * (*connect_decode_video)(void * priv,
                                              gavl_packet_source_t * src,
                                              const gavl_compression_info_t * ci,
                                              const gavl_video_format_t * fmt,
                                              gavl_metadata_t * m);

The decompressor gets the compressed packets from the packet source. Additional arguments are the compression info, the format (which might be incomplete) and the metadata of the A/V stream. These functions return an audio or video source, from which you can read the uncompressed frames.
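The resulting pull-mode behavior can be illustrated with a self-contained sketch. The types and names below (packet_source_t, audio_source_t, ...) are simplified stand-ins for the real gavl ones, not the actual API, and the toy decoder pretends it needs two packets per output frame to mimic codec delay:

```c
/* Simplified stand-ins for the gavl types; the real API lives in
 * gavl/connectors.h. Everything here is illustrative only. */
typedef enum { SOURCE_OK, SOURCE_EOF } source_status_t;

/* Packet source the decompressor pulls compressed packets from */
typedef struct { int packets_left; } packet_source_t;

source_status_t packet_source_read(packet_source_t * s)
  {
  if(s->packets_left <= 0)
    return SOURCE_EOF;
  s->packets_left--;
  return SOURCE_OK;
  }

/* Toy decoder, as it would be returned by a connect_decode_audio()-like
 * call */
typedef struct { packet_source_t * src; } audio_source_t;

/* Analog of gavl_audio_source_read_frame(): pulls as many packets as
 * needed (here: two) before one uncompressed frame can be output */
source_status_t audio_source_read_frame(audio_source_t * dec,
                                        int * frame_out)
  {
  int i;
  for(i = 0; i < 2; i++)
    if(packet_source_read(dec->src) != SOURCE_OK)
      return SOURCE_EOF;
  *frame_out = 1; /* stands in for a decoded gavl_audio_frame_t */
  return SOURCE_OK;
  }
```

The application only sees the frame-level read call; how many packets each frame costs stays hidden inside the decoder.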

For opening a compressor, we need to call one of:

gavl_audio_sink_t * (*open_encode_audio)(void * priv,
                                         gavl_compression_info_t * ci,
                                         gavl_audio_format_t * fmt,
                                         gavl_metadata_t * m);

gavl_video_sink_t * (*open_encode_video)(void * priv,
                                         gavl_compression_info_t * ci,
                                         gavl_video_format_t * fmt,
                                         gavl_metadata_t * m);

gavl_video_sink_t * (*open_encode_overlay)(void * priv,
                                           gavl_compression_info_t * ci,
                                           gavl_video_format_t * fmt,
                                           gavl_metadata_t * m);

Each of these returns the sink into which we can push the A/V frames. The other arguments are the same as when opening a decoder, but in this case they are filled in by the call. After opening the compressor and before passing the first frame, we need to set a packet sink to which the compressed packets will be written:

void (*set_packet_sink)(void * priv, gavl_packet_sink_t * s);

The decompressors work in pull mode, the compressors work in push mode. These are the most suitable modes in typical usage scenarios.

The potential delay between compressed packets and uncompressed frames is handled internally. The decompressor simply reads as many packets as it needs to output one uncompressed frame. The compressor outputs compressed packets as they become available. When the compressor is destroyed, it might flush its internal state, resulting in one or more compressed packets being written. This means that at the moment you destroy a compressor, the packet sink must still be able to accept packets.
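The flush-on-destroy behavior can again be shown with a self-contained sketch. The names (compressor_t, packet_sink_t, ...) are simplified stand-ins, not the real gmerlin API, and the toy compressor buffers two input frames per output packet to mimic codec delay:

```c
#include <stdlib.h>

/* Simplified stand-ins for the real types (gavl_packet_sink_t etc.);
 * illustrative only */
typedef struct { int num_samples; } frame_t;
typedef struct { int num_packets; } packet_sink_t;

/* Toy compressor: buffers two input frames per output packet */
typedef struct
  {
  packet_sink_t * sink;
  int pending; /* frames buffered but not yet written as a packet */
  } compressor_t;

compressor_t * compressor_open(void)
  {
  return calloc(1, sizeof(compressor_t));
  }

/* Analog of set_packet_sink(): must be called before the first frame */
void compressor_set_packet_sink(compressor_t * c, packet_sink_t * s)
  {
  c->sink = s;
  }

/* Push one uncompressed frame; a packet comes out only once enough
 * input has accumulated */
void compressor_put_frame(compressor_t * c, const frame_t * f)
  {
  (void)f;
  if(++c->pending == 2)
    {
    c->sink->num_packets++;
    c->pending = 0;
    }
  }

/* Destroying the compressor flushes its internal state, so the packet
 * sink must still be able to accept packets at this point */
void compressor_destroy(compressor_t * c)
  {
  if(c->pending)
    c->sink->num_packets++;
  free(c);
  }
```

Pushing five frames produces two packets; the third only appears when the compressor is destroyed, which is exactly why the sink must outlive the compressor.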

There are decompressor plugins as part of gmerlin-avdecoder, which handle most formats. The gmerlin-encoders package contains compressor plugins for most formats as well.

Software A/V connectors for gmerlin

As mentioned earlier, I programmed generic connectors for A/V frames and compressed packets. They are much more sophisticated than the old API (based on callback functions), because they also do implicit format conversion and buffer management. The result is a simplified plugin API (consisting of fewer functions) and simplified applications. The stuff is implemented in gavl (include gavl/connectors.h), so it can be used in gmerlin as well as in gmerlin_avdecoder without introducing new library dependencies. There are 3 types of modules:
  • Sources work in pull mode and do format conversion. They are used by input and recording plugins
  • Sinks work in push mode and are used by output and encoding plugins
  • Connectors connect multiple sinks to a source
Example for the API usage
Assume you want to read audio samples from a media file and send them to a sink. When you have an audio source (e.g. from gmerlin_avdecoder via bgav_get_audio_source()), your application can look like this:

gavl_audio_source_t * src;
gavl_audio_sink_t * sink;
gavl_audio_frame_t * f;
gavl_source_status_t st;

/* Get source */
src = bgav_get_audio_source(dec, 0);

/* Tell the source to deliver the format needed by the sink */
gavl_audio_source_set_dst(src, 0, gavl_audio_sink_get_format(sink));

/* Processing loop */
while(1)
  {
  /*  Get a frame of internally allocated memory from the sink
   *  (e.g. shared or mmap()ed memory). The return value can be NULL. */
  f = gavl_audio_sink_get_frame(sink);

  /* Read a frame from the source. If f == NULL, we'll get a frame
   * allocated and owned by the source itself. */
  st = gavl_audio_source_read_frame(src, &f);

  if(st != GAVL_SOURCE_OK)
    break;

  if(gavl_audio_sink_put_frame(sink, f) != GAVL_SINK_OK)
    break;
  }

If you want to use the gavl_audio_connector_t, things get even simpler:

gavl_audio_source_t * src;
gavl_audio_sink_t * sink;
gavl_audio_connector_t * conn;

/* Get source */
src = bgav_get_audio_source(dec, 0);

/* Create connector */
conn = gavl_audio_connector_create(src);

/* Connect sink (you can connect multiple sinks) */
gavl_audio_connector_connect(conn, sink);

/* Initialize */
gavl_audio_connector_start(conn);

/* Processing loop: process one frame per iteration until the
 * source is done */
while(gavl_audio_connector_process(conn))
  ;

The gmerlin plugin API was changed to use only the sources and sinks for passing around frames. Text subtitles are transported in gavl packets, overlay subtitles are transported in video frames.

In addition to the lower-level gavl converters, the sources support some more format conversions. For audio frames, we do buffering, such that the number of samples per frame you read from the source can differ from what the source natively delivers. For video, we support a simple framerate conversion, which works by repeating or dropping frames.

The video processing API is completely analogous to the audio API described above. For compressed packets, things are slightly different because we don't do format conversion on compressed packets.

A number of gmerlin modules (e.g. the player and transcoder) are already converted to the new API. In many cases, lots of redundant code could be kicked out, so the resulting code is much simpler and easier to understand.