Thursday, November 13, 2008

Gmerlin pipelines explained

Building multimedia software on top of gavl saves a lot of the time that others spend writing optimized conversion routines (gavl already has more than 2000 of them) and bullet-proof housekeeping functions.

On the other hand, gavl is a low-level library, which leaves lots of architectural decisions to the application level. This means that gavl will not provide you with fully featured A/V pipelines. Instead, you have to write them yourself (or use libgmerlin and take a look at include/gmerlin/filters.h and include/gmerlin/converters.h).

I'm not claiming to have found the perfect solution for the gmerlin player and transcoder, but here is how it works:

Building blocks
The pipelines are composed of
  • A source plugin, which gets A/V frames from a media file, URL or a hardware device
  • Zero or more filters, which somehow change the A/V frames
  • A destination plugin. In the player it displays video or sends audio to the soundcard. For the transcoder, it encodes into media files.
  • Format converters: These are inserted on demand between any two of the above elements (see the structural sketch after this list)
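To make this concrete, here is a minimal sketch of what a constructed pipeline boils down to: an ordered chain of elements, each pulling from its predecessor. The names (element_type_t, pipeline_element_t) are purely illustrative and not part of the gmerlin API.

typedef enum
  {
  ELEMENT_SOURCE,      /* Source plugin (file, URL or hardware device) */
  ELEMENT_CONVERTER,   /* Format converter, inserted on demand         */
  ELEMENT_FILTER,      /* Filter changing the A/V frames               */
  ELEMENT_DESTINATION  /* Output plugin or encoder                     */
  } element_type_t;

typedef struct pipeline_element_s
  {
  element_type_t type;
  void * priv;                        /* Plugin/filter/converter instance */
  struct pipeline_element_s * source; /* The element we pull frames from  */
  } pipeline_element_t;
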
Asynchronous pull approach
The whole pipeline is pull-based. Pull-based means that each component requests data from the preceding component. Asynchronous means that (in contrast to plain gavl) we make no assumption about how many frames/samples a component needs at its input to produce one output frame/sample. This makes it possible to do things like framerate conversion or framerate-doubling deinterlacing. As a consequence, filters and converters which remember previous frames need a reset function to forget them (the player, for example, calls them after seeking).
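To illustrate why this matters, below is a hypothetical framerate doubler written against the unified read callback introduced in the next section. It emits two output frames per input frame, so it must keep state between calls and therefore also needs a reset function. This is only a sketch: the names are made up, and a real filter would interpolate instead of copying and would also adjust timestamps.

#include <gavl/gavl.h>
#include <gmerlin/plugin.h>

typedef struct
  {
  bg_read_video_func_t read_func;   /* Callback of the preceding element */
  void * read_data;
  int read_stream;

  gavl_video_format_t format;       /* Video format of the stream        */
  gavl_video_frame_t * saved_frame; /* Input frame kept between calls    */
  int have_saved_frame;             /* 1 if saved_frame is still valid   */
  } doubler_priv_t;

static int read_video_doubler(void * priv, gavl_video_frame_t * frame,
                              int stream)
  {
  doubler_priv_t * p = priv;

  if(p->have_saved_frame)
    {
    /* Second output frame for the last input frame
       (a real deinterlacer would interpolate here) */
    gavl_video_frame_copy(&p->format, frame, p->saved_frame);
    p->have_saved_frame = 0;
    return 1;
    }

  /* Pull a fresh input frame from the preceding element */
  if(!p->read_func(p->read_data, p->saved_frame, p->read_stream))
    return 0; /* EOF upstream */

  gavl_video_frame_copy(&p->format, frame, p->saved_frame);
  p->have_saved_frame = 1;
  return 1;
  }

/* Forget remembered frames, called e.g. after seeking */
static void reset_doubler(void * priv)
  {
  doubler_priv_t * p = priv;
  p->have_saved_frame = 0;
  }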

Unified callbacks
In modular applications it's always important that modules know as little as possible about each other. For A/V pipelines this means that each component gets data from the preceding component through a unified callback, no matter whether that component is a filter, converter or source. There are prototypes in gmerlin/plugin.h:
typedef int (*bg_read_audio_func_t)(void * priv, gavl_audio_frame_t * frame,
                                    int stream, int num_samples);

typedef int (*bg_read_video_func_t)(void * priv, gavl_video_frame_t * frame,
                                    int stream);
These are provided by input plugins, converters and filters. The stream argument is only meaningful for media files which have more than one audio or video stream. How exactly the pipeline is constructed (e.g. whether intermediate converters are needed) matters only during initialization, not in the time-critical processing loop.
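As an illustration, the time-critical loop of a destination could look like the following sketch. process_video and its arguments are hypothetical; only the gavl calls and the callback prototype are real. Whether the callback belongs to the source, a converter or a filter is invisible here:

#include <gavl/gavl.h>
#include <gmerlin/plugin.h>

static void process_video(bg_read_video_func_t read_func, void * read_priv,
                          int stream, gavl_video_format_t * format)
  {
  gavl_video_frame_t * frame = gavl_video_frame_create(format);

  /* Time-critical loop: one unified callback, regardless of how the
     pipeline was assembled during initialization */
  while(read_func(read_priv, frame, stream))
    {
    /* Display or encode the frame */
    /* ... */
    }

  gavl_video_frame_destroy(frame);
  }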

Asynchronous vs synchronous
As noted above, some filter types are only realizable if the architecture is asynchronous. Another advantage is that for a filter, the input and output frame can be the same (in-place conversion). For example, the timecode tweak filter of gmerlin looks like this:
typedef struct
  {
  bg_read_video_func_t read_func;
  void * read_data;
  int read_stream;

  /* Other stuff */
  /* ... */
  } tc_priv_t;

static int read_video_tctweak(void * priv, gavl_video_frame_t * frame,
                              int stream)
  {
  tc_priv_t * vp;
  vp = (tc_priv_t *)priv;

  /* Let the preceding element fill the frame, return 0 on EOF */
  if(!vp->read_func(vp->read_data, frame, vp->read_stream))
    return 0;

  /* Change frame->timecode */
  /* ... */

  /* Return success */
  return 1;
  }
A one-in-one-out API would need to memcpy the video data just to change the timecode.

Of course, in some situations outside the scope of gmerlin, asynchronous pipelines can cause problems. This is especially the case in editing applications, where frames might be processed out of order (e.g. when playing backwards). How to handle backwards playback for filters which use previous frames is left to the NLE developers. But it would make sense to mark gmerlin filters which behave synchronously (most of them actually do) as such, so we know we can always use them.
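Purely as an illustration of how such a marking could look (nothing like this exists in the gmerlin plugin API today), a single capability flag would suffice:

/* Hypothetical, not part of gmerlin: filters setting this flag promise
   exactly one output frame per input frame and no remembered state */
#define FILTER_FLAG_SYNCHRONOUS (1<<0)

typedef struct
  {
  int flags; /* An NLE could require FILTER_FLAG_SYNCHRONOUS before
                processing frames out of order */
  /* ... */
  } filter_info_t;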
