Monday, December 14, 2009

Flash-free live web video solution

Not so long ago, video on the web was practically synonymous with the proprietary Flash technology. I also used to say that there might be political or psychological reasons for using Ogg/Theora, but never technical ones.

Well, the conditions have changed recently, so it's time for an update on this.

HTML 5 video tag
The HTML 5 draft supports a <video> tag for embedding a video into a webpage as a native html element (i.e. without a plugin). Earlier versions of the draft even recommended that browsers should support Ogg/Theora as a format for the video. The Ogg/Theora recommendation was later removed, which started a lot of discussion. This wikipedia article summarizes the issue. Nevertheless, there are a number of browsers supporting Ogg/Theora video out of the box, among them Firefox 3.5.

Cortado plugin
In a different development, the Cortado java applet for Theora playback was written. It is GPL licensed, and you can simply download it and put it into your webspace.

Now the cool thing about the <video> tag is that browsers which don't know about it will display the contents between <video> and </video>, so you can include the applet code there. While researching the best way to do this, I read that the (often recommended) <applet> mechanism is not valid html. A better solution is here.

Webmaster's side
Now if you have the live-stream at http://192.168.2.2:8000/stream.ogg your html page will look like:
<html>
  <head>
    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
  </head>
  <body>
    <h1>Test video stream</h1>
    <video tabindex="0"
           src="http://192.168.2.2:8000/stream.ogg"
           controls="controls"
           alt="Test video stream"
           title="Test video stream"
           width="320"
           height="240"
           autoplay="true"
           loop="nolooping">
      <object type="application/x-java-applet"
              width="320" height="240">
        <param name="archive" value="cortado.jar">
        <param name="code" value="com.fluendo.player.Cortado.class">
        <param name="url" value="http://192.168.2.2:8000/stream.ogg">
        <param name="autoplay" value="true">
        <a href="http://192.168.2.2:8000/stream.ogg">Test video stream</a>
      </object>
    </video>
  </body>
</html>

Note that for live streams the "autoplay" option should be given. If not, Firefox will still load the first image automatically to show it in the video widget, but will then stop downloading the live stream until you click play. Pretty obvious that this will mess up live streaming.


Server side
For live streaming I simply installed the icecast server which came with my Ubuntu, changed the passwords in /etc/icecast/icecast.xml, enabled the server in /etc/default/icecast2 and started it with /etc/init.d/icecast2 start.

Upstream side
There are lots of programs for streaming to icecast servers, most of them using libshout. I decided to create a new family of plugins for gmerlin: Broadcasting plugins. They have the same API as encoder plugins (used by the transcoder or the recorder); the only difference is that they don't produce regular files and must be realtime capable.

Using libshout is extremely simple:
  • Create a shout instance with shout_new()
  • Set parameters with the shout_set_*() functions
  • Call shout_open() to actually open the connection to the icecast server
  • Write a valid Ogg/Theora stream with shout_send(). Actually I took my already existing Ogg/Theora encoder plugin and replaced all fwrite() calls by shout_send().
See below for the libshout configuration of the gmerlin Ogg broadcaster.
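For reference, the whole sequence in code might look roughly like this. This is only a sketch: get_encoded_data() stands in for the Ogg/Theora encoder, and host, port, password and mount point are placeholders.

#include <shout/shout.h>
#include <stdio.h>

/* Hypothetical: pulls the next chunk of Ogg/Theora data from the encoder */
size_t get_encoded_data(unsigned char * buf, size_t max_len);

int main(void)
{
  shout_t * s;
  unsigned char buf[4096];
  size_t len;

  shout_init();
  s = shout_new();

  /* Connection parameters (placeholders) */
  shout_set_host(s, "192.168.2.2");
  shout_set_port(s, 8000);
  shout_set_password(s, "hackme");
  shout_set_mount(s, "/stream.ogg");
  shout_set_format(s, SHOUT_FORMAT_OGG);

  if(shout_open(s) != SHOUTERR_SUCCESS)
  {
    fprintf(stderr, "Connection failed: %s\n", shout_get_error(s));
    return 1;
  }

  /* Instead of fwrite()ing the Ogg pages to a file, send them to the server */
  while((len = get_encoded_data(buf, sizeof(buf))) > 0)
  {
    shout_send(s, buf, len);
    shout_sync(s); /* Sleep until the server wants more data */
  }

  shout_close(s);
  shout_free(s);
  shout_shutdown();
  return 0;
}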


Of course, some minor details were left out in my overview, read the libshout documentation for them. As upstream client, I use my new recorder application.

The result
See below a screenshot from firefox while it plays back a live stream:


Open Issues
The live video experiment went extremely smoothly. I did, however, discover some minor issues which could be optimized away:
  • Firefox doesn't recognize live streams (i.e. streams with infinite duration) properly. It displays a seek slider which always sticks at the very end. Detecting an http stream as live can easily be done by checking the Content-Length field of the http response header (a live stream has none).
  • The theora encoder (1.1.1 in my case) might be faster than the 1.0.0 series, but it's still way too slow. Live encoding of a 320x240 stream is possible on my machine but 640x480 isn't.
  • The cortado plugin has a loudspeaker icon, but no volume control (or it just doesn't work with my Java installation)
Other news
  • With the video switched off I can send audio streams to my wlan radio. This makes my older solution (based on ices2) obsolete.
  • The whole thing of course works with prerecorded files as well. In this case, you can just put the files into your webspace and your normal webserver will deliver them. No icecast needed.

Sunday, December 6, 2009

Introducing Gmerlin recorder

Gmerlin-recorder is a small application which records audio and video from hardware devices. It was written as a more generic application which should eventually replace camelot. See below for a screenshot:

As sources, we support:
  • Audio devices via OSS, Alsa, Esound, Pulseaudio and Jack
  • Webcams via V4L and V4L2
  • X11 grabbing
For monitoring, you see the video image in the recorder window; for the audio you have a VU meter. With the recorded streams you can do lots of stuff:
  • Filter the streams using gmerlin A/V filters
  • Write the streams to files using gmerlin encoder plugins
  • Send the encoded stream to an icecast server (will be described in detail in another blog post)
  • Save snapshots to images either manually or automatically
With all these features combined in the right way, you can use gmerlin-recorder for a large number of applications:
  • Make vlogs with your webcam and a microphone
  • Digitize analog music
  • Make a video tutorial for your application
  • Start your broadcasting station
  • Send music from your stereo to your WLAN radio
  • Make stop-motion movies
  • ...
Before the recorder can fully replace camelot, we still need the following features, which are not there yet:
  • Fullscreen video display
  • Forwarding of the video stream via vloopback

Sunday, November 15, 2009

X11 grabbing howto

One gmerlin plugin I always wanted to program was an X11 grabber. It continuously grabs either the entire root window, or a user defined rectangular area of it. It is realized as a gmerlin video recorder plugin, which means that it behaves pretty much like a webcam.

Some random notes follow.

Transparent grab window
The rectangular area is defined by a grab window, which consists only of a frame (drawn by the window manager). The window itself must be completely transparent such that mouse clicks are sent to the window below. This is realized using the X11 Shape extension:
XShapeCombineRectangles(dpy, win,
                        ShapeBounding,
                        0, 0,
                        (XRectangle *)0,
                        0,
                        ShapeSet,
                        YXBanded);
The grab window can be moved and resized to define the grabbing area. This is more convenient than entering coordinates manually. After a resize, the plugin must be reopened because the image size of a gmerlin video stream cannot change within the processing loop.

Make the window sticky and always on top
Depending on the configuration, the grab window can always be on top of the other windows. There is also a sticky option making the window appear on all desktops. This is done with the following code, which must be called each time before the window is mapped:
if(flags & (WIN_ONTOP|WIN_STICKY))
{
  Atom wm_states[2];
  int num_props = 0;
  Atom wm_state = XInternAtom(dpy, "_NET_WM_STATE", False);

  if(flags & WIN_ONTOP)
  {
    wm_states[num_props++] =
      XInternAtom(dpy, "_NET_WM_STATE_ABOVE", False);
  }
  if(flags & WIN_STICKY)
  {
    wm_states[num_props++] =
      XInternAtom(dpy, "_NET_WM_STATE_STICKY", False);
  }

  XChangeProperty(dpy, win, wm_state, XA_ATOM, 32,
                  PropModeReplace,
                  (unsigned char *)wm_states, num_props);
}

Grabbing methods
The grabbing itself is done on the root window only. This grabs all other windows inside the grab area as well. The easiest method is XGetImage(), but it allocates a new image with each call, and malloc()/free() cycles within the processing loop should be avoided whenever possible. XGetSubImage() allows passing a preallocated image. Much better, of course, is XShmGetImage(): it was roughly 3 times faster than XGetSubImage() in my tests.
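In case it helps anyone, here is a rough sketch of the XShm setup (one-time initialization, then a single XShmGetImage() call per frame; error checking omitted, and this is not the actual plugin code):

#include <X11/Xlib.h>
#include <X11/Xutil.h>
#include <X11/extensions/XShm.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create a shared-memory XImage once, then reuse it for every frame */
XImage * create_shm_image(Display * dpy, XShmSegmentInfo * shminfo,
                          int w, int h)
{
  XImage * im = XShmCreateImage(dpy, DefaultVisual(dpy, DefaultScreen(dpy)),
                                DefaultDepth(dpy, DefaultScreen(dpy)),
                                ZPixmap, NULL, shminfo, w, h);
  shminfo->shmid = shmget(IPC_PRIVATE, im->bytes_per_line * im->height,
                          IPC_CREAT | 0600);
  shminfo->shmaddr = im->data = shmat(shminfo->shmid, NULL, 0);
  shminfo->readOnly = False;
  XShmAttach(dpy, shminfo);
  return im;
}

/* Per frame: grab the rectangle at (x, y) from the root window */
void grab_frame(Display * dpy, Window root, XImage * im, int x, int y)
{
  XShmGetImage(dpy, root, im, x, y, AllPlanes);
}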

Coordinate correction
If parts of the grabbing rectangle are outside the root window, you'll get a BadMatch error (usually exiting the program), no matter which function you use for grabbing. You must handle this case and correct the coordinates to stay within the root window, as sketched below.
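A minimal clipping sketch (root_width and root_height would come e.g. from DisplayWidth()/DisplayHeight(); names are illustrative):

/* Clamp the grab rectangle to the root window to avoid BadMatch */
void clip_rectangle(int * x, int * y, int * w, int * h,
                    int root_width, int root_height)
{
  if(*x < 0) { *w += *x; *x = 0; }
  if(*y < 0) { *h += *y; *y = 0; }
  if(*x + *w > root_width)  *w = root_width  - *x;
  if(*y + *h > root_height) *h = root_height - *y;
  if(*w < 0) *w = 0;
  if(*h < 0) *h = 0;
}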

Mouse cursor
Grabbing works for everything displayed on the screen (including XVideo overlays) except the mouse cursor. It must be obtained and drawn "manually" onto the grabbed image. Coordinates are read with XQueryPointer(). The cursor image can be obtained if the XFixes extension is available. First we request cursor change events for the whole screen with
XFixesSelectCursorInput(dpy, root,
XFixesDisplayCursorNotifyMask);
If the cursor changed and is within the grabbing rectangle we get the image with
im = XFixesGetCursorImage(dpy);
The resulting cursor image is then converted to a gavl_overlay_t and blended onto the grabbed image with a gavl_overlay_blend_context_t.
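The XFixesCursorImage stores its pixels as unsigned long ARGB values (even on 64 bit systems), so they have to be repacked before blending. A sketch of that repacking (the gavl overlay setup itself is left out):

#include <stdint.h>
#include <X11/extensions/Xfixes.h>

/* Repack the cursor pixels into 8 bit RGBA */
void cursor_to_rgba(const XFixesCursorImage * im, uint8_t * dst)
{
  int i;
  for(i = 0; i < im->width * im->height; i++)
  {
    unsigned long p = im->pixels[i];
    dst[4*i+0] = (p >> 16) & 0xff; /* R */
    dst[4*i+1] = (p >>  8) & 0xff; /* G */
    dst[4*i+2] =  p        & 0xff; /* B */
    dst[4*i+3] = (p >> 24) & 0xff; /* A */
  }
}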

Not done (yet)
Other grabbing programs deliver images only when something has changed (resulting in a variable framerate stream). This can be achieved with the XDamage extension. Since the XDamage extension is (like many other X11 extensions) poorly documented, I didn't bother to implement this yet.

One alternative is to use gmerlins decimate video filter, which compares the images in memory. The result will be the same, but CPU usage will be slightly increased.

Saturday, October 31, 2009

gavl color channels

When programming something completely unrelated, I stumbled across a missing feature in gavl: Extract single color channels from a video frame into a grayscale frame. The inverse operation is to insert a color channel from a grayscale frame into a video frame (overwriting the contents of that channel). This allows you to assemble an image from separate color planes.

Both were implemented with a minimalistic API (1 enum and 3 functions), which works for all 35 pixelformats. First of all we have an enum for the channel definitions:
typedef enum
{
  GAVL_CCH_RED,   // Red
  GAVL_CCH_GREEN, // Green
  GAVL_CCH_BLUE,  // Blue
  GAVL_CCH_Y,     // Luminance (also grayscale)
  GAVL_CCH_CB,    // Chrominance blue (aka U)
  GAVL_CCH_CR,    // Chrominance red (aka V)
  GAVL_CCH_ALPHA, // Transparency (or, to be more precise, opacity)
} gavl_color_channel_t;

To get the exact grayscale format for one color channel, you first call:
int gavl_get_color_channel_format(const gavl_video_format_t * frame_format,
                                  gavl_video_format_t * channel_format,
                                  gavl_color_channel_t ch);
It returns 1 on success or 0 if the format doesn't have the requested channel. After you have the channel format, extracting and inserting is done with:
int gavl_video_frame_extract_channel(const gavl_video_format_t * format,
                                     gavl_color_channel_t ch,
                                     const gavl_video_frame_t * src,
                                     gavl_video_frame_t * dst);

int gavl_video_frame_insert_channel(const gavl_video_format_t * format,
                                    gavl_color_channel_t ch,
                                    const gavl_video_frame_t * src,
                                    gavl_video_frame_t * dst);
In the gmerlin tree, there are test programs extractchannel and insertchannel which test the functions for all possible combinations of pixelformats and channels. They are in the gmerlin tree and not in gavl because we load and save the test images with gmerlin plugins.
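A small usage sketch of the API above (it assumes format and src were set up elsewhere and is not one of the test programs):

#include <gavl/gavl.h>

/* Extract the red channel of src into a newly created grayscale frame */
int extract_red(const gavl_video_format_t * format,
                const gavl_video_frame_t * src)
{
  gavl_video_format_t channel_format;
  gavl_video_frame_t * channel_frame;

  if(!gavl_get_color_channel_format(format, &channel_format, GAVL_CCH_RED))
    return 0; /* Pixelformat has no red channel */

  channel_frame = gavl_video_frame_create(&channel_format);
  gavl_video_frame_extract_channel(format, GAVL_CCH_RED, src, channel_frame);

  /* ... process the grayscale frame, e.g. insert it into another frame ... */

  gavl_video_frame_destroy(channel_frame);
  return 1;
}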

Saturday, October 24, 2009

Major gmerlin player upgrade

While working on several optimization tasks for the player engine, I found out that the player architecture sucked. So I made a major upgrade (well, actually a downgrade, since lots of code was kicked out). Let me elaborate on what exactly was changed.

Below you see a block schematics of the player engine as it was before (subtitle handling is omitted for simplicity):
The audio- and video frames were read by the input thread from the file, pulled through the filter chains (see here) and pushed into the fifos.

The output threads for audio and video pulled the frames from the fifos and sent them to the soundcard or the display window.

The idea behind the separate input thread was that if CPU load is high and decoding a frame takes longer than it should, the output threads can still continue with the frames buffered in the fifos. It turned out that this was the only advantage of this approach, and it only worked if the average decoding time was still less than realtime.

The major disadvantage is that if you have fifos with frames pushed at the input and pulled at the output, the system becomes very prone to deadlocks. In fact, the code for the fifos became bloated and messy over time.

While programming a nice new feature (updating the video display while the seek slider is moved), playback got messed up after seeking, and I quickly blamed the fifos for this. The result was a big cleanup, which is shown below:
You can see that the input thread and the fifos have been completely removed. Instead, the input plugin is protected by a simple mutex and the output threads do the decoding and processing themselves. The advantages are obvious:
  • Much less memory usage (one video frame instead of up to 8 frames in the fifo)
  • Deadlock conditions are much less likely (if not impossible)
  • Much simpler design, bugs are easier to fix
The only disadvantage is that if a file cannot be decoded in realtime, audio and video run out of sync. In the old design the input thread took care that the decoding of the streams didn't run too much out of sync. For these cases, I need to implement frame skipping. This can be done in several steps:
  • If the stream has B-frames, skip them
  • If the stream has P-frames, skip them (display only I-frames)
  • Display only every nth I-frame with increasing n
Frame skipping is the next major thing to do. But with the new architecture it will be much simpler to implement than with the old one.

Friday, September 25, 2009

Frame tables

When thinking a bit about video editing, I thought it would be nice to have the timestamps of all video frames in advance, i.e. without decoding the whole file. With that you can e.g. seek to the Nth frame, or find the closest video frame for a given absolute time, even for variable framerate files.

The seeking functions always take a (scaled) time as argument and I saw no reason to change that.

So what I needed was a table which allows the translation of framecounts to/from PTS values even for variable framerate files. For many fileformats similar information is already stored internally (as a file index), so the task was only to convert that info into something usable and export it through the public API.

Since this feature is very generic and might get used in both gmerlin and gmerlin-avdecoder, I decided to put the stuff into gavl.

Implicit table compression
One very smart feature of the Quicktime format is the stts atom: it stores a table of frame_count/frame_duration pairs. The nice thing is that for constant framerate files (the vast majority) the table consists of just one entry, and translating framecounts to/from timestamps becomes trivial. Only for variable framerate streams does the table have more entries, so the translation functions take longer.
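To illustrate the principle (this is not the actual gavl implementation), a frame-index-to-timestamp lookup over such a run-length encoded table could look like this:

#include <stdint.h>

/* One run of frames sharing the same duration (stts-style) */
typedef struct
{
  int64_t num_frames;
  int64_t duration;
} table_entry_t;

/* Translate a frame index into a (scaled) timestamp */
int64_t frame_to_time(const table_entry_t * tab, int num_entries,
                      int64_t frame)
{
  int i;
  int64_t time = 0;
  for(i = 0; i < num_entries; i++)
  {
    if(frame < tab[i].num_frames)
      return time + frame * tab[i].duration;
    time  += tab[i].num_frames * tab[i].duration;
    frame -= tab[i].num_frames;
  }
  return time; /* Index beyond the end of the table */
}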

The implementation
The frame table is called gavl_frame_table_t and is defined in gavl.h. Note: The structure is public at present but might become private before the next public release.
The translation functions gavl_frame_table_frame_to_time(), gavl_frame_table_time_to_frame() and gavl_frame_table_num_frames() are declared there as well.

If you use gmerlin-avdecoder for decoding files, you can use bgav_get_frame_table() to obtain a frame table of a video stream. It can be called after the stream has been fully initialized. Naturally you will want to use sample accurate decoding mode before obtaining the frame table. If you want to reuse the frame table after the file was closed, use gavl_frame_table_save() and gavl_frame_table_load().

Future extension
Another interesting piece of information is the timecode. First of all, what's the difference between timecodes and timestamps in gmerlin terminology? Timestamps are used to synchronize the multiple streams (audio, video, subtitles) of a file with each other for playback. They usually start at zero or another small value and play an important role e.g. after seeking in the file. Timecodes are usually given for each video frame and are used to identify scenes in a bunch of footage. They can represent e.g. the recording time/date.

Timecodes will also be supported by the frame table, but this isn't implemented yet.

Tuesday, September 15, 2009

Back from Malta

After a number of adventure- and/or conference trips, I decided to make a more ordinary vacation. Since the goal was to extend the summer a bit, we decided to travel in the first 2 September weeks (when German weather can already start to suck) and move to the southernmost country, which can be reached without many difficulties. Easy traveling for Europeans means to stay within the EU (and Euro zone), so the destination was chosen to be Malta.

I knew a bit about Malta from TV documentaries, and while reading the Wikipedia article a bit, I became even more curious.

Here are some facts I figured out. For pictures click on the Wikipedia links, they have higher quality than the ones from my crappy mobile.
  • It was damn hot and humid. Not all people from the Northern countries can withstand that. I was happy to find out that after surviving 4 weeks in the monsoon season in Southern India, I'm more heat resistant than my Greek companion.
  • It is indeed very stress-free because everyone speaks English (unlike in other Southern European countries) and the islands are small enough to reach practically every destination by bus. Just make sure you stay in the capital Valletta, almost all bus routes start there. On the smaller island Gozo, all busses start in Victoria.
  • It has an extremely rich 7000 year old history. You can visit prehistoric temples (probably the oldest of their kind worldwide), churches, fortresses and remains from the Romans and Phoenicians. I had already seen many churches from the inside, I was never really a fan of Latin and Roman history in school, and my companion didn't want to bother looking for sightseeing destinations at all. So I decided to concentrate on the prehistoric stuff.
  • We saw the temples of Ħaġar Qim, Mnajdra (both covered with giant protective tents now), Tarxien and Ġgantija. The latter ones were the most impressive, especially after we reached them after a long walk in the afternoon heat 20 minutes before the last admission :)
  • The temples of Skorba are the oldest ones, but not very spectacular and not accessible for the public.
  • No chance to get into the Hypogeum of Ħal-Saflieni. It was booked out 4 weeks in advance. That sucked.
  • If you like beaches, Malta is definitely not the #1 destination for you. There are just a few, a nice one we visited is Ramla Bay.
  • If you like diving at cliffs, Malta is a paradise for you.
  • Funny is the bit of British culture (Malta was British until 1964). People are driving on the left, you get chips with most meals, and the phone booths and mailboxes are red, like in the UK.
  • Some Maltese people know more German soccer teams than me.
Shortly after coming back home, German weather started to suck.

Friday, August 14, 2009

Quick & dirty fix for the latest linux NULL pointer vulnerability

This one is pretty scary. It is the result of several flaws in SELinux, pulseaudio and some obscure network protocols. Proper fixing of this would require work at many places in the code.

Up to now, Ubuntu doesn't have a patched kernel. In the meantime, place the following into the modprobe configuration:
install appletalk /bin/true
install ipx /bin/true
install irda /bin/true
install x25 /bin/true
install pppox /bin/true
install bluetooth /bin/true
install sctp /bin/true
install ax25 /bin/true
Then either unload these modules by hand (if they are loaded) or reboot the machine. On some systems I had to uninstall bluetooth support, which wasn't needed anyway. Naturally these protocols will stop working, but fortunately the exploit will stop working too :)

Friday, August 7, 2009

Spambot ladies @ blogspot

She left this heart warming comment on my blog. Out of curiosity, I wanted to see what Martha aka Susan blogs herself. Then I wondered, how a person, who isn't even able to blog one meaningful sentence, can be interested in my articles. But look, this lady seems to have an extremely wide range of interests (although she's not very imaginative in commenting):


She has multiple blogs under different names, and some people even talk to her as if she was a human.

Let's see how this develops in the future. Jack Crossfire, the original author of cinelerra and quicktime4linux (from which we forked libquicktime), also has something to say on this matter.

Saturday, August 1, 2009

AVCHD timecodes revealed

When playing with my new Canon HF200 camera, I got curious where the recording time (and date) is hidden in the AVCHD format.

The first idea was the SEI pic timing message of the H.264 stream. I already parse it to find out whether pictures are frame- or field-coded. So I extended my code to parse the timecode in HH:MM:SS:FF format, only to find out that this info isn't present at all in my files :(

Googling for more information about that, I found that nobody knows how to get the recording time, and even professional programs fail to display it. But a very few programs do, so we know that it must be coded in the transport stream itself (and not in the other files written by the camera).

Finally I found this perl script, which extracts the date and time from canon mts files. It's a pretty simple implementation: It scans the multiplexed transport stream for a particular bit-pattern and then extracts the data. The script works for Canon-files but fails e.g. for Panasonic files.

Then I found where exactly the information is located: A H.264 stream has SEI (supplemental enhancement information) messages, which can contain additional (e.g. timing) information. For each SEI message the parser can obtain the message type (an integer) and the size of the message in bytes. AVCHD files have SEI messages of type 5, which means "user data unregistered" (== proprietary extension). The H.264 standard says, that these messages start with a 16 byte GUID followed by the payload.

Now take a look at the hexdump of such an SEI message:
17 ee 8c 60 f8 4d 11 d9 8c d6 08 00 20 0c 9a 66 ...`.M...... ..f
4d 44 50 4d 09 18 02 20 09 08 19 01 01 25 45 70 MDPM... .....%Ep
c7 f2 ff ff 71 ff ff ff ff 7f 00 00 65 84 e0 10 ....q.......e...
11 30 02 e1 07 ff ff ff ee 19 19 02 00 ef 01 c0 .0..............
00 00 ..
From this I found the following structure:
  • The GUID is the first 16 bytes. It's always the same for the info we want, but I found other SEI messages of type 5 with different GUIDs in AVCHD files.
  • 4 characters "MDPM". They occur in all files I looked at.
  • An unknown byte (0x09, other vendors have other values)
  • The byte 0x18 (probably indicating that year and month follow)
  • An unknown byte (0x02, other vendors have other values)
  • The year and month in 3 BCD coded bytes: 0x20 0x09 0x08
  • The byte 0x19 (probably indicating that day and time follow)
  • An unknown byte (0x01, other vendors have other values)
  • The day, hour, minute and second as 4 BCD encoded bytes (0x01 0x01 0x25 0x45)
In this case, I extract the recording time "2009-08-01 01:25:45" (which is correct).
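Decoding that is straightforward. A small sketch using the offsets from my Canon files (p points to the byte right after the "MDPM" marker; other vendors might shift things around):

#include <stdint.h>
#include <stdio.h>

/* Convert one BCD coded byte (e.g. 0x25 -> 25) */
static int bcd(uint8_t b)
{
  return ((b >> 4) * 10) + (b & 0x0f);
}

/* p points to the byte after "MDPM" */
void print_recording_time(const uint8_t * p)
{
  /* p[0]: unknown, p[1] == 0x18, p[2]: unknown, p[3..5]: century/year/month */
  int century = bcd(p[3]);
  int year    = bcd(p[4]);
  int month   = bcd(p[5]);
  /* p[6] == 0x19, p[7]: unknown, p[8..11]: day/hour/minute/second */
  int day     = bcd(p[8]);
  int hour    = bcd(p[9]);
  int minute  = bcd(p[10]);
  int second  = bcd(p[11]);

  printf("%02d%02d-%02d-%02d %02d:%02d:%02d\n",
         century, year, month, day, hour, minute, second);
}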

The remainder of the SEI is completely unknown, but I'm sure that if someone figured out the complete data structure (including the unknown bytes), other interesting information could be extracted.

These messages are present for almost all frames, but I plan to read them only from the first frame because the following ones are redundant.

Next project will be to clean up the parsing code in gmerlin-avdecoder and make the timecode actually appear along with the first decoded frame.

Monday, July 20, 2009

GPU accelerated H.264 decoding

A while ago I bought a new camera and I had to learn that my dual core machine isn't able to play footage encoded at the highest quality level in realtime (the second highest quality works). Fortunately, ffplay can't do that either, so it's not gmerlin's fault.

H.264 decoding on the graphics card can be done with vdpau or vaapi. The former is Nvidia specific and libavcodec can use it for H.264. The latter is vendor independent (it can use vdpau as backend on nvidia cards) but H.264 decoding with vaapi is not supported by ffmpeg yet.

In principle I prefer vendor independent solutions, but since I need H.264 support and ATI cards suck anyway on Linux, I tried VDPAU first.

The implementation in my libavcodec video frontend was straightforward after studying the MPlayer source. The VDPAU codecs are completely separated from the other codecs. They can simply be selected e.g. with avcodec_find_decoder_by_name("h264_vdpau"). Then, one must supply callback functions for get_buffer, release_buffer and draw_horiz_band. That's because the rendering targets are no longer frames in memory but rather handles of data-structures on the GPU. See here and here to see the details.

After decoding, the image data is copied to memory by calling VdpVideoSurfaceGetBitsYCbCr. Of course this causes a severe slowdown. A much better way would be to keep the frames in graphics memory as long as possible. But this needs to be done in a much more generic way: images can be VDPAU or VAAPI video surfaces, OpenGL textures or whatever. Implementing generic support for video frames which are not in regular RAM will be another project.

Wednesday, July 1, 2009

Linux serial port driver madness

Serial ports are almost extinct on desktop PCs because everyone uses USB or ethernet. But for many automation tasks, RS-232 connections are still state of the art. That's mostly because they are well standardized since the 60s and UART chips are cheap.

I wanted to write a program, which uses a serial port for controlling 2 stepper motors. I wanted to write it such that users have the least possible trouble. Especially during the startup phase, a dialog box should show all available serial ports and let the user select the right one.

And while trying to get all available serial ports, I stumbled across 3 nasty linux bugs:

Bug 1: Always 4 serial port devices
One of the intentions of udev was that only the devices, which are physically present on the system, appear as nodes in the /dev directory. A big step forward compared to the situation before. Unfortunately, the serial driver always creates /dev/ttyS[0-3]. The reason (forgot the link) is that the ports of some exotic multi I/O board aren't detected properly. So the fix was to get 1000s of users into trouble instead of just making one special module parameter for that board.

Bug 2: open() succeeds for nonexistent ports
The manual page of open(2) says that if one tries to open a device node with no physical device behind it, errno is set to ENODEV or ENXIO. In the case of a nonexistent serial port, however, a valid file descriptor is returned.

Bug 3: tcgetattr returns EIO
According to the glibc manual, EIO is usually used for physical read/write errors. For regular files this error means that you should back up your data because the disk is about to die. A physical read/write error can never happen on a device which is physically nonexistent.

The second best thing would be to return ENXIO (no such device or address). The best thing would be to return EBADF (bad file descriptor) because the open() call before would have returned -1 already.

At least tcgetattr() doesn't succeed like open() so it can be used for the detection routine below.

The solution
The good thing is that once there is a workaround, the problem is quickly forgotten. Here's mine (it returns 1 if the port is a physically existing serial device, 0 otherwise):
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

int is_serial_port(const char * device)
{
  int fd, ret = 1;
  struct termios t;

  fd = open(device, O_RDWR);
  if(fd < 0)
    return 0;

  if(tcgetattr(fd, &t))
    ret = 0;

  close(fd);
  return ret;
}

Of course this won't find ports which are currently opened by another application.
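For the port-selection dialog, the detection routine can be combined with a simple probe over the device names. A sketch (the glob pattern only covers the standard /dev/ttyS* names):

#include <glob.h>
#include <stdio.h>

/* Print all serial ports which physically exist */
void list_serial_ports(void)
{
  glob_t g;
  size_t i;

  if(glob("/dev/ttyS*", 0, NULL, &g))
    return;

  for(i = 0; i < g.gl_pathc; i++)
  {
    if(is_serial_port(g.gl_pathv[i]))
      printf("Found serial port: %s\n", g.gl_pathv[i]);
  }
  globfree(&g);
}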

Thursday, June 25, 2009

What to expect for the remaining century

Web censorship by police agencies, terrorism of the music industry against ordinary people and other scary stories indicate that society is changing dramatically these days. These changes are, in my opinion, hopelessly underestimated by most. But what exactly is happening? I'll try to explain it with some examples from history.

Technological developments are triggers for social changes
It has always been like this: if a new technology appears, it offers both chances and challenges. Chances to solve some problems of mankind always come along with the challenge to use the technology responsibly. If you look back in history, you'll find many examples where inventions resulted in revolutionary changes of society.

6000 BC
One of the earliest known examples for this phenomenon is the invention of irrigation around 6000 BC in Mesopotamia. It triggered the following chain reaction:
  • Farmland became more efficient allowing a higher population density
  • Higher population density results either in self-extinction, mass exodus or a better organization. The Mesopotamians were smart enough to choose the latter.
  • Suddenly there was some essential infrastructure (the irrigation channels), which had to be maintained. People who maintain irrigation channels cannot work on the fields at the same time, so the division of labor had to be invented
  • If you have division of labor, you need trade
  • If you have trade, you need money
  • If you have money, you need mathematics
  • Money and trade hardly work if you don't invent some alphabet
What looks like an amazingly fast transition from an archaic rural society to an urban civilization is, in my opinion, simply a result of some smart farmers no longer wanting to wait for rain.

20th century
The beginning of the 20th century was characterized by lots of revolutionary advances, especially in electronic communication (radio and TV), transportation (cars and airplanes) and manufacturing (mass production). Some of the results follow:
  • Ordinary people could more easily stay informed about political affairs. Without a well informed population modern democracy is impossible.
  • What we call a global society today, became possible through electronic communication.
  • Many cultural genres, e.g. music styles, are no longer local (or national) phenomena but global ones.
  • In years where the weather sucks (from the farmers' point of view), people no longer have to be afraid of famines (today's famines always have political reasons). This feeling of safety has a large impact on the human mind and society.
As you might see, the technological revolution at the beginning of the 20th century was an important trigger for the development of the so called Western Civilization we are so proud of today.

Irresponsible uses of 20th century technology were European dictatorships (which utilized the then new electronic mass media for propaganda) resulting in 2 World Wars (made possible by the newly invented cars and planes).

The biggest challenge of the 20th century was not to start a nuclear war. Before, all weapons ever developed were eventually used. The atomic bomb was never used in a war by any nation except one.

Beginning of the 21st century
The beginning of the 21st century was characterized by masses of ordinary people beginning to use the internet. Many argue that the internet is just another means of communication like the ones we had before. This is completely wrong imo. All mass media we had before were unidirectional (few content producers serve many consumers). The internet is multidirectional (every content consumer can also be a producer). This is a completely different topology of information flow, with possibilities far beyond everything we had before. Just look at some examples of what has been achieved by now:
  • People write an encyclopedia, which is larger, more up-to-date but not more incorrect than any paper-encyclopedia you could buy before.
  • 1000s of computer nerds spending their nights in front of their PCs wrote one of the world's best OSes.
  • In times of war or unrest, gagging orders no longer work
  • No matter what absurd theories you believe, you'll always find like-minded people.
  • Some internet movies become more popular than some commercial Hollywood productions.
Future developments
Some things I see coming:
  • Internet communities can make big achievements even though they are mostly self organized. Politicians are afraid of self organizing structures because they fear to become superfluous.
  • People, who work voluntarily on self-organized projects, will no longer feel the need to be ruled by politicians.
  • Big record companies will die out, because they terrorized their customers and ripped off musicians. Even Paul McCartney let Starbucks merchandise an album already. Also they still follow the obsolete model of few producers serving many consumers. Only small manufacturers of vinyl disks will survive (at least as long as there are DJs who know how to use them).
  • Traditional newspapers will become superfluous because they cost money and the 12 hour delay due to the printing time will be unacceptable for most people.
  • Traditional radio and TV-stations will die out because people don't want to read programming schedules anymore. They want to see and hear what they want when they want.
  • With no unidirectional mass media left, controlling the information flow will become impossible. As a result, Dictators and conspirators will have a hard time.
We see, that today's elites have enough reasons to try to control the internet. Actually there are 2 possible outcomes:

1. They learn that their undertaking is doomed to failure and try to arrange with the new conditions before it's too late.
2. They will continue trying and the WWW will move from today's client-server model to strongly encrypted p2p technologies like freenet.

I personally would prefer option 1, but in either case they'll lose.

Challenges
As said before, new technologies also bring new challenges:
  • Computer expertise becomes more and more important for ordinary people. In the last century, learning to read and write was already a good start for a career. Today, knowing how to use a computer mouse is at least as important.
  • The fact that on the internet everyone can be a content provider naturally results in a large percentage of bullshit. The preselection of information, which was done by the editorial offices of the traditional mass media, must now be done by the consumer. The advantage of unfiltered information comes along with the task of deciding for yourself.
Even if you don't agree with my theories explained above, you know at least where the title of my blog comes from :)

If you agree with what I said, you'll like this movie.

Friday, June 19, 2009

Being ruled by morons

17 June 2009: The German parliament commemorates the uprising in Eastern Germany of 1953. The uprising was called the start of a fight for freedom which was successful in the end (resulting in the German reunification in 1990). Before 1990, the 17th of June was the national holiday in Western Germany.

18 June 2009: The German parliament passes a law which allows the federal German police agency to put together a blacklist of domains which must be blocked by ISPs. This law violates many fundamental principles of the German constitution, like the separation of powers, federalism, freedom of information and the prohibition of censorship.

Below you'll find some preliminary success stories regarding the latest developments:
  • As explained before, the actual target group of this law (producers and consumers of child pornography) can be more relaxed than before, because the law comes in pretty handy for them. You'll see none of them demonstrating against it.
  • Conspiracy theorists said from the beginning, that the fight against child pornography is just a bogus argument for installing a censorship infrastructure. Fortunately, some politicians were even stupid enough to admit this in public.
  • Before this madness started, there was no noticeable digital civil rights movement in Germany. This changed dramatically during the last months. Now they are many and perfectly organized. On Saturday, there will be coordinated demonstrations in at least 11 German cities.
  • An official online petition got signed by 134014 people, an all-time record in the history of online-petitions.
  • The German Social Democratic Party is one big step further in making itself superfluous. I predict that some of the last bright minds will leave the party before the next parliamentary elections in September.
  • Rulers of China and Iran no longer have to fear bothering remarks regarding their censorship policies from German politicians. Improved bilateral relations are good for German economy, which is highly dependent on foreign sales markets and energy resources.
  • Almost unknown a year ago, the German Pirate Party got 0.9% in the European parliamentary elections, with constantly rising popularity.
This is the start of a fight for freedom, which will be successful in the end.

Wednesday, June 10, 2009

gmerlin video thumbnailer

The project
The goal of this 2-day project was to write a gmerlin based video thumbnailer, which fails for fewer files than the gstreamer based thumbnailer shipped with Ubuntu. The result is a small (less than 350 lines) program, which takes a video file as argument and produces a small png preview. All required functionality was already available in gmerlin and gavl.

How to try
If you want, you can check out and install the latest gmerlin CVS. Make sure that gmerlin is compiled with libpng support. The program is called gmerlin-video-thumbnailer. To enable gmerlin thumbnails in nautilus, open gconf-editor and go to /desktop/gnome/thumbnailers. For each mimetype, you have a directory. In each directory, you have a key called command. For each mimetype you want to be thumbnailed by gmerlin, simply change the command to

/usr/local/bin/gmerlin-video-thumbnailer -s %s %i %o

Of course you must change /usr/local to the prefix where you installed gmerlin.

Bugs fixed
The thumbnailer quickly exposed some bugs in gmerlin-avdecoder and gavl. Looking at a nautilus window full of thumbnails is faster than playing hundreds of movie files. 3 major bugs were fixed; one minor bug will get fixed before the next release.

References
The thumbnail standard, which seems to be used by gnome, is here. How thumbnailers are installed in gnome, is described here.

Saturday, June 6, 2009

Monday, May 25, 2009

What evildoers do, if the web gets censored

Lots of discussion is happening in German cyberspace around the planned blocking of child porn. The plan is to redirect DNS queries for blacklisted domains to a "Stop" sign, which will tell users that the site they wanted to visit is blocked. Contrary to earlier promises, the IP addresses of the users will be made available to the law enforcement agencies for what they call real-time surveillance.

The proponents (one of them confused "servers" and "surfers" in a parliament debate) try their best to make every opponent of this plan look like a pedophile.

I think it's the opposite: Child molesters and other evildoers will have a lot of fun once the blocking mechanism is operational.

Disclaimer
I neither plan nor endorse any of the activities listed below. I only describe theoretical possibilities to demonstrate how easy it is to abuse such nonsense.

Things to come
  • Pedophile newbies, who search for child porn, will find a leaked block-list from another country. The secret German list will leak as well, sooner or later. It was never as easy as today.
  • Less smart pedophiles will google for ways to circumvent the blockade. The first hit will be some commercial "surf anonymous" proxy provider. In the end, even more people will earn money with abused kids.
  • Smarter pedophiles will google better and configure another DNS server. If using another DNS server becomes illegal, they'll find a free SSL proxy. If SSL proxies become illegal, they'll use foreign VPN provider. Connections to foreign VPN providers cannot be illegal because they are used by lots of big companies.
  • Less smart child porn webmasters will simply use numeric IPs for linking to their sites
  • Smarter webmasters will regularly send a DNS query for their domain to a censored DNS server. If the returned IP number becomes incorrect, it's time to shut down the site and reopen it somewhere else under a different name. Can easily be scripted into a fully automated warning system. Also works if the block-list is really secret.
  • Never give your IP address to a bad guy. Today you can get hacked, if you have a poor firewall. Tomorrow the cops might pay you a visit. There is no firewall against cops wanting to visit you. That's because one can easily flood the DNS server of your German provider with forged requests for some blocked domain and your IP address as source. Should be no more than 50 lines in C, including comments.
  • The ultimate end of civilization is near if the first stop-sign worm appears. Not only because it'll DDOS the stop-sign servers. Most important effect will be to flood the law enforcement agencies with IPs of innocent people. Less time to concentrate on the bad guys.
Those who know the facts and still want to install DNS blockades against child porn are either pedophiles, criminals or conspirators who'd like a censorship mechanism for all kinds of unpleasant content. The latter might be the reason why each country maintains its own block list.

Alternatives
Everyone except the pedophiles supports a serious fight against child abuse and child porn, not only in the internet. There are lots of organizations, who are committed to this. Interestingly, some of them are against blockades. Here are 3 German ones:

Carechild
Abuse victims against internet blockades
Self-help group of abused women

Sunday, May 10, 2009

Psychedelic visualization programming howto

A 3 day project was to enhance gmerlin's default visualization (the one which comes with the gmerlin tarball) a bit. Here is a description of how this visual works. There are lots of other visuals out there which are based on the same algorithm.

General algorithm
It's actually pretty simple: Each frame is the last frame filtered a bit and with something new drawn onto it. This repeats over and over.

Drawing something
There are 2 absolutely trivial things to draw, which are directly generated by the audio signal:
  • Stereo waveform: Looks pretty much like a 2 beam oscilloscope
  • Vectorscope: one channel is the x-coordinate, the other one is the y-coordinate. The whole thing is then rotated by 45 degrees counterclockwise (see the sketch after this list). It only sucks for mono signals, because it will then display a pretty boring vertical line.
The vectorscope is drawn as lines and as dots. This makes a total of 3 foreground patterns in 7 colors.
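For illustration, the vectorscope mapping boils down to the following (my own sketch, not the actual plugin code; float samples in [-1, 1] assumed):

#include <math.h>

/* Map one stereo sample pair to a pixel position on the vectorscope.
   Rotating (l, r) by 45 degrees gives x ~ (l - r), y ~ (l + r),
   so a mono signal (l == r) collapses to a vertical line. */
void vectorscope_point(float l, float r, int width, int height,
                       int * x, int * y)
{
  float xr = (l - r) * (float)M_SQRT1_2;
  float yr = (l + r) * (float)M_SQRT1_2;

  /* Scale [-1, 1] to pixel coordinates, y axis pointing down */
  *x = (int)((xr * 0.5f + 0.5f) * (width - 1));
  *y = (int)((0.5f - yr * 0.5f) * (height - 1));
}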

Filtering
There are 3 types of filters, which are always applied to the last frame: An image transformation, a colormatrix and a 3-tap blur filter. The blur filter is needed to wipe out artifacts when doing the image transform repeatedly on the same image.

The image transform uses the (usually bad) linear interpolation, which means that the recursive process also blurs the image. There are 8 transforms: Rotate left, rotate right, zoom in, zoom out, ripple, lens, sine and cosine.

Then there are 6 colormatrices. Each of these fades 1 or 2 of the RGB components to zero. Together with the foreground colors (which always have all 3 components nonzero) we get a large variation of the colors when they are recursively filtered.

Background colors
The image transformation sometimes makes areas in the current frame, which are outside of the last frame. These areas are filled with one of 6 background colors. Background colors are dark (all components < 0.5) to increase the contrast against the foreground.

Beat sensitivity
Then I wrote the simplest possible beat detection: for each audio frame, compute the sound energy (i.e. calculate the sum of the squared samples for each channel). If the energy is twice as large as the time averaged energy, we have a beat. The time averaged energy is calculated by doing a simple RC low-pass filtering of the energies for each frame:

avg_energy = avg_energy * 0.95 + new_energy * 0.05;
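As a minimal sketch (one channel, float samples; names are illustrative and this is not the actual plugin code):

/* Returns 1 if this audio frame contains a beat */
int detect_beat(const float * samples, int num_samples, float * avg_energy)
{
  int i, beat;
  float energy = 0.0f;

  for(i = 0; i < num_samples; i++)
    energy += samples[i] * samples[i];

  /* Beat if the energy is twice the time averaged energy */
  beat = (energy > 2.0f * (*avg_energy));

  /* Simple RC low-pass of the per-frame energies */
  *avg_energy = (*avg_energy) * 0.95f + energy * 0.05f;

  return beat;
}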

Variations
The following things can be changed on each beat (by a random decision with a defined probability):
  • Foreground pattern (3)
  • Background color (5)
  • Transform (8)
  • Colormatrix (6)
This makes 5040 principal variations.

Result
I hacked a small tool (gmerlin_visualize), which takes an audio file and puts it into a video file together with the visualization. The result is here:

Simple music visualization from gmerlin on Vimeo.



As you can see, the video compression on vimeo isn't really suited for that kind of footage. But that's not vimeo's fault. It's a general problem that video compression algorithms are optimized for natural images rather than computer generated images.

Possible improvements
The variety can easily be extended by adding new image transformations and new foreground patterns. Also, some contrast between foreground and background is sometimes a bit weak. A more careful selection of the colors and colormatrices would bring less variety but a nicer overall appearance.

Feel free to make a patch :)

Saturday, April 25, 2009

Lessons from LAC

I was at the Linux Audio Conference in Parma/Italy. The weather sucked most of the time, but everything else was great. These are the most important things I learned:

Jack rules
Of course I already heard about the Jack sound system before. But since I never had a linux distribution, which installs jack by default, I didn't pay much attention to it.

If you go to the LAC, you start to think that there is hardly any audio architecture besides jack. After looking a bit into Jack (and writing gmerlin plugins for Jack audio I/O) I learned that it's actually pretty cool. I especially like the way you can connect arbitrary jack applications together while they are running. This is a nice feature for the gmerlin-visualizer, since it relies on grabbing the audio stream from another application.

Ambisonics rocks
I also heard about Ambisonics before but mostly ignored it. If you read the Wikipedia article, you get the impression, that ambisonics is a quite exotic recording/playback technique, which is used only by a small group of enthusiasts.

If you go to LAC, you get the impression that Ambisonics is the standard and things like stereo or 5.1 are quite exotic recording/playback techniques, which are used only by a small group of enthusiasts.

Audio vs. video development
I told the audio developers that they don't know how lucky they are. They agreed on an audio capture/playback architecture (jack), they all use the same internal format (floating point, not interleaved) for processing and they manage to make a conference each year.

In the video area, things are much worse. Capture and playback APIs exist, but they are far too many. Some people do internal processing in RGB, others prefer Y'CbCr. Converting to a common internal format is not feasible because video has much higher data rates than audio and conversion overhead must always be minimized. Also the video developer community is very fragmented, centered around different frameworks (each one being the standard). Cross project communication is sparse. The only common code is ffmpeg for video en-/decoding, but mostly because nobody wants to reimplement such a beast.

Gmerlin development
I talked a lot with Richard Spindler from the openmovieeditor, which makes heavy use of gmerlin libraries. We agreed that our collaboration works quite well because the development activities of gmerlin and openmovieeditor are nicely decoupled. Let's proceed like we already did.

Sunday, March 15, 2009

Dirac video

Libquicktime and gmerlin-avdecoder now support Dirac in quicktime. En- and decoding is done with the libschrödinger library. Having already implemented support for lots of other video codecs I noticed some things, both positive and negative.

Positive
  • Very precise specification of the uncompressed video format. Interlacing (including field order) is stored in the stream, as well as signal ranges (video range or full range). This brings direct support for lots of colormodels.

  • Support for > 8 bit encoding. This is really a rare feature. While ffmpeg always sticks with 8 bit even for codecs with 10 bit or 12 bit modes, the libschrödinger API has higher precision options. Not sure if these modes are really supported internally by now.

  • It seems to aim for scalability from low-end internet downloads to intra-only modes for video editing applications. A lossless mode is also there. Whether it performs equally well for all usage scenarios is yet to be found out.

  • It pretends to be patent free. But since the patent jungle is so dense, it is almost impossible to prove this.

  • It has the BBC behind it, which hopefully means serious development, funding and a chance for a wide deployment.
Negative

Sequence end code in an own sample
In the quicktime mapping specification it is required that the sequence end code (a 13 byte string telling that the stream ends here) must be in a sample of its own. This is a mess, since for all Quicktime codecs I know (even the most disgusting ones) 1 sample always corresponds to 1 frame. Having a sample which is not a frame screws up the timing, because there is a "duration" associated with the sequence-end sample, which makes the total stream duration seem larger than it actually is. Also, a frame accurate demuxer will expect one frame more than the file actually has. For both libquicktime and gmerlin-avdecoder I wrote workarounds for this (they simply discard the last packet).

If I had written the mapping spec, I would require the sequence end code to be appended to the last frame (in the same sample). In addition it can be made optional since it's not really needed in quicktime.

If libquicktime encodes dirac files, everything is done according to the spec. Conformance to the spec is more important than my personal opinion about it :)

No ctts atom required
Quicktime timestamps (as given by the stts atom) are decoding timestamps. For all low-delay streams (i.e. streams without B-frames), these are equal to the presentation timestamps. For H.264 and MPEG4 ASP streams with B-frames, the ctts atom specifies the difference between PTS and DTS for each frame and lets the demuxer calculate correct presentation timestamps without touching the video data. If the ctts atom is missing, such quicktime files become as disgusting as AVIs with B-frames. Unfortunately the ctts atom isn't required by the mapping spec, which means we'll see such files in the wild.
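For illustration (not the actual demuxer code; real stts/ctts tables are run-length encoded), the per-sample calculation boils down to:

#include <stdint.h>

/* Derive the presentation timestamp of sample n from quicktime atoms */
int64_t sample_pts(const int64_t * stts_dts,     /* DTS per sample (from stts) */
                   const int32_t * ctts_offsets, /* PTS - DTS per sample (from ctts), NULL if absent */
                   int64_t n)
{
  if(!ctts_offsets) /* Low-delay stream: PTS == DTS */
    return stts_dts[n];
  return stts_dts[n] + ctts_offsets[n];
}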

The good news is, that the ctts atom isn't mentioned at all in the spec. From this I conclude that it is not forbidden either. Therefore, libquicktime always writes a ctts atom if the stream has B-frames.

On the decoding side (in gmerlin-avdecoder), the quicktime demuxer checks if a dirac stream has a ctts atom. If yes, it is used and everything is fine. If not (i.e. if the file wasn't written by libquicktime), a parser is fired up and an index must be built if one wants sample accuracy. The good news is that the parser is pretty simple and the same thing is needed anyway for decoding dirac in MPEG transport streams.

Other news

gmerlin-avdecoder also supports Dirac in Ogg and MPEG-2 transport streams.

Saturday, February 28, 2009

Parallelizing video routines

Basics
One project shortly after the last gmerlin release was to push the CPU usage beyond 50% on my dual core machine. There is lots of talk about parallelization of commonly used video routines, but very little actual code.

In contrast to other areas (like scientific simulations) video routines are extremely simple to parallelize because most of them fulfill the following 2 conditions:
  • The outermost loop runs over scanlines of the destination image
  • The destination image is (except for a few local variables) the only memory area, where data are written
So the first step is to change the conversion functions of the type:
void conversion_func(void * data, int width, int height)
{
  int i, j;
  for(i = 0; i < height; i++)
  {
    for(j = 0; j < width; j++)
    {
      /* Do something here */
    }
  }
}
to something like:
void conversion_func(void * data, int width, int start, int end)
{
  int i, j;
  for(i = start; i < end; i++)
  {
    for(j = 0; j < width; j++)
    {
      /* Do something here */
    }
  }
}

The data argument points to a struct containing everything needed for the conversion, like the destination frame, source frame(s), lookup tables etc. The height argument was replaced by start and end arguments, which means that the function now processes a slice (i.e. a number of consecutive scanlines) instead of the whole image. The remaining task is to call the conversion function from multiple threads. It is important that the outermost loop is the one that gets split, to keep the overhead as small as possible.

The API perspective
Everything was implemented
  • With a minimal public API
  • Backwards compatible (i.e. if you ignore the API extensions things still work)
  • Completely independent of the actual thread implementation (i.e. no -lpthread is needed for gavl)
The whole magic can be boiled down to:
/* Slice process function, which is passed by gavl to the application */
typedef void (*gavl_video_process_func)(void * data, int start, int end);

/* Routine, which passes the actual work to the worker thread
   (which is managed by the application) */
typedef void (*gavl_video_run_func)(gavl_video_process_func func,
                                    void * gavl_data,
                                    int start, int end,
                                    void * client_data, int thread);

/* Synchronization routine, which waits for the worker thread to finish */
typedef void (*gavl_video_stop_func)(void * client_data, int thread);

/* Set the maximum number of available worker threads
   (gavl might not need all of them if you have more CPUs than scanlines) */
void gavl_video_options_set_num_threads(gavl_video_options_t * opt, int n);

/* Pass the run function to gavl */
void gavl_video_options_set_run_func(gavl_video_options_t * opt,
                                     gavl_video_run_func func,
                                     void * client_data);

/* Pass the stop function to gavl */
void gavl_video_options_set_stop_func(gavl_video_options_t * opt,
                                      gavl_video_stop_func func,
                                      void * client_data);
Like always, the gavl_video_options_set_* routines have corresponding gavl_video_options_get_* routines. These can be used to apply the same multithreading mechanism outside gavl (e.g. in a video filter).

The application perspective
As noted above there is no pthread specific code inside gavl. Everything is passed via callbacks. libgmerlin has a pthread based thread pool, which does exactly what gavl needs.

The thread pool is just an array of context structures for each thread:
typedef struct
{
  /* Thread handle */
  pthread_t t;

  /* gavl -> thread: Do something */
  sem_t run_sem;

  /* thread -> gavl: I'm done */
  sem_t done_sem;

  /* Mechanism to make the function finish */
  pthread_mutex_t stop_mutex;
  int do_stop;

  /* Passed from gavl */
  void (*func)(void*, int, int);
  void * data;
  int start;
  int end;
} thread_t;
The main loop (passed to pthread_create) for each worker thread looks like:
static void * thread_func(void * data)
{
  thread_t * t = data;
  int do_stop;

  while(1)
  {
    sem_wait(&t->run_sem);

    pthread_mutex_lock(&t->stop_mutex);
    do_stop = t->do_stop;
    pthread_mutex_unlock(&t->stop_mutex);

    if(do_stop)
      break;

    t->func(t->data, t->start, t->end);
    sem_post(&t->done_sem);
  }
  return NULL;
}
The worker threads are launched at program start and run all the time. As long as there is nothing to do, they wait on the run_sem semaphore (using zero CPU). Launching new threads for every little piece of work would have a much higher overhead.

Passing work to a worker thread happens with the following gavl_video_run_func:
void bg_thread_pool_run(void (*func)(void*, int start, int end),
                        void * gavl_data,
                        int start, int end,
                        void * client_data, int thread)
{
  bg_thread_pool_t * p = client_data;

  p->threads[thread].func  = func;
  p->threads[thread].data  = gavl_data;
  p->threads[thread].start = start;
  p->threads[thread].end   = end;

  sem_post(&p->threads[thread].run_sem);
}
Synchronizing (waiting for the work to finish) happens with the following gavl_video_stop_func:
void bg_thread_pool_stop(void * client_data, int thread)
{
  bg_thread_pool_t * p = client_data;
  sem_wait(&p->threads[thread].done_sem);
}
The whole thread pool implementation can be found in the gmerlin source tree in lib/threadpool.c (128 lines including copyright header).
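Putting the pieces together, hooking such a thread pool into a gavl video converter might look roughly like this. This is only a sketch: bg_thread_pool_create() and its exact prototype are assumptions on my part, while the gavl_video_options_set_* calls are the ones introduced above.

#include <gavl/gavl.h>

void setup_threads(gavl_video_converter_t * cnv, int num_threads)
{
  gavl_video_options_t * opt = gavl_video_converter_get_options(cnv);

  /* Hypothetical constructor for the pool described above */
  bg_thread_pool_t * pool = bg_thread_pool_create(num_threads);

  gavl_video_options_set_num_threads(opt, num_threads);
  gavl_video_options_set_run_func(opt, bg_thread_pool_run, pool);
  gavl_video_options_set_stop_func(opt, bg_thread_pool_stop, pool);
}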

Implementations
Parallelization was implemented for
Benchmarks
For benchmarking, one needs a scenario where the parallelized processing routine(s) take the lion's share of the total CPU usage. I decided to apply a (completely nonsensical) gaussian blur with a radius of 50 to 1000 frames of a 720x576 PAL sequence. All code was compiled with default (i.e. optimized) options. Enabling 2 threads decreased the processing time from 81.31 sec to 43.35 sec.

Credits
Finally found this page for making source snippets in blogger suck less.

Friday, February 27, 2009

Elementary stream parsing

See the image below for an updated block schematic of gmerlin-avdecoder, as an addition to my last blog entry:



You see what's new here: a parser, which converts "bad" packets into "good" packets. Now what exactly does that mean? A good (or well-formed) packet has the following properties (a minimal sketch of such a packet follows the list):
  • It always has a correct timestamp (the presentation time at least)
  • It has a flag to determine if the packet is a keyframe
  • It has a valid duration. This is necessary for frame accurate repositioning of the stream after seeking. At least if you want to support variable framerates.
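A minimal sketch of such a packet (illustrative only, not the actual gmerlin-avdecoder structure):

#include <stdint.h>

typedef struct
{
  uint8_t * data;     /* Exactly one complete frame */
  int       data_len;
  int64_t   pts;      /* Presentation timestamp (stream timescale) */
  int64_t   duration; /* Needed for frame accurate seeking */
  int       keyframe; /* 1 if decoding can start here */
} packet_t;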
Good packets (e.g. from quicktime files) can directly be passed to the decoder, and it's possible to build an index from them. Unfortunately some formats (most notably MPEG program- and transport streams) don't have such nice packets. You neither know where a frame starts nor whether it is a keyframe. Large frames can be split across many packets, and some packets can contain several small frames. Also, timestamps are not available for each frame. To make things worse, these formats are very widely used. You find them in .mpg files, on DVDs, (S)VCDs, BluRay disks and in DVB broadcasts. Also, all newer formats for consumer cameras (HDV and AVCHD) use MPEG-2 transport streams. Since these formats are important for video editing applications, sample accurate access is a main goal here.

The first solutions for this were realized inside the decoders. libmpeg2 is very tolerant with regard to the frame alignment and ffmpeg has an AVParser, which splits a continuous stream into frames. Additional routines were written for building an index.

It was predictable that this would not be the ultimate solution. The decoders got very complicated and building the index was not possible without firing up an ffmpeg decoder (because the AVParser doesn't tell about keyframes) so index building was very slow.

So I spent some time writing parsers for elementary A/V streams, which extract all the information necessary for creating well-formed packets.

After that worked, I could be sure, that every codec always gets well-formed packets. What followed then, was by far the biggest cleanup in the history of gmerlin-avdecoder. Many things could be simplified, lots of cruft got kicked out, duplicate code got moved to central locations.

New features are:
  • Decoding of (and sample accurate seeking within) elementary H.264 streams
  • Sample accurate access for elementary MPEG-1/2 streams
  • Sample accurate access for MPEG program streams with DTS- and LPCM audio
  • Faster seeking because non-reference frames are skipped while approaching the new stream position
  • Much cleaner and simpler architecture.
The cleaner architecture doesn't necessarily mean fewer bugs, but they are easier to fix now :)