Sunday, May 10, 2009

Psychedelic visualization programming howto

A 3-day project was to enhance gmerlin's default visualization (the one which comes with the gmerlin tarball) a bit. Here is a description of how this visual works. There are lots of other visuals out there which are based on the same algorithm.

General algorithm
It's actually pretty simple: each frame is the last frame, filtered a bit, with something new drawn onto it. This repeats over and over.
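As a minimal sketch of this feedback loop (with stand-in filter and drawing steps on a tiny grayscale frame, not gmerlin's actual code):

```c
#define W 4
#define H 4

/* Stand-in filter: fade the previous frame toward black.  In the
   real visual this step is an image transform, a colormatrix and
   a blur. */
static void filter_frame(float frame[H][W])
{
  for(int y = 0; y < H; y++)
    for(int x = 0; x < W; x++)
      frame[y][x] *= 0.9f;
}

/* Stand-in drawing: plot one pixel per column from an audio sample
   in the range [-1, 1], like a tiny waveform display. */
static void draw_pattern(float frame[H][W], const float * samples)
{
  for(int x = 0; x < W; x++)
  {
    int y = (int)((samples[x] * 0.5f + 0.5f) * (H - 1));
    frame[y][x] = 1.0f;
  }
}

/* One iteration of the loop: the next frame is the filtered last
   frame with something new drawn onto it. */
static void next_frame(float frame[H][W], const float * samples)
{
  filter_frame(frame);
  draw_pattern(frame, samples);
}
```

Because the old content is only faded, never cleared, every drawn pattern leaves a decaying trail behind.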

Drawing something
There are 2 absolutely trivial things to draw, which are directly generated by the audio signal:
  • Stereo waveform: Looks pretty much like a 2-beam oscilloscope
  • Vectorscope: One channel is the x-coordinate, the other one is the y-coordinate. The whole thing is then rotated by 45 degrees counterclockwise. It only sucks for mono signals, because it will display a pretty boring vertical line then.
The vectorscope is drawn as lines and as dots. This makes a total of 3 foreground patterns in 7 colors.
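The vectorscope mapping can be sketched as follows (the exact rotation convention is an assumption here, not taken from gmerlin's code):

```c
/* Map one stereo sample pair (each in [-1, 1]) to a vectorscope
   point, rotated by 45 degrees counterclockwise.  For a mono signal
   (left == right) the x coordinate is always 0, which is why mono
   input collapses to a vertical line. */
static void vectorscope_point(float left, float right,
                              float * x, float * y)
{
  const float c = 0.70710678f; /* cos(45 deg) = sin(45 deg) = 1/sqrt(2) */
  *x = (left - right) * c;
  *y = (left + right) * c;
}
```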

Filters
There are 3 types of filters which are always applied to the last frame: an image transformation, a colormatrix and a 3-tap blur filter. The blur filter is needed to wipe out artifacts when doing the image transform repeatedly on the same image.
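A 3-tap blur over one line of pixels could look like this; the 1/4-1/2-1/4 weights are a typical choice and an assumption here, not gmerlin's exact kernel:

```c
/* 3-tap blur over one line of n pixels.  Edge pixels are clamped. */
static void blur3(const float * in, float * out, int n)
{
  for(int i = 0; i < n; i++)
  {
    float l = in[i > 0     ? i - 1 : 0];
    float r = in[i < n - 1 ? i + 1 : n - 1];
    out[i] = 0.25f * l + 0.5f * in[i] + 0.25f * r;
  }
}
```

Run once horizontally and once vertically, this spreads single-pixel artifacts over their neighbors, so the recursion smears them away instead of amplifying them.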

The image transform uses the (usually bad) linear interpolation, which means that the recursive process also blurs the image. There are 8 transforms: Rotate left, rotate right, zoom in, zoom out, ripple, lens, sine and cosine.

Then there are 6 colormatrices. Each of these fades 1 or 2 of the RGB components to zero. Together with the foreground colors (which always have all 3 components nonzero), we get a wide variety of colors as the frames are recursively filtered.
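Applied once per frame, such a matrix can be sketched like this; the exact coefficients are an assumption (a diagonal factor below 1 fades that component toward zero over the recursion):

```c
/* Apply a 3x3 colormatrix to one RGB pixel in place. */
static void apply_colormatrix(const float mat[3][3], float * rgb)
{
  float out[3];
  for(int i = 0; i < 3; i++)
    out[i] = mat[i][0] * rgb[0] + mat[i][1] * rgb[1] + mat[i][2] * rgb[2];
  for(int i = 0; i < 3; i++)
    rgb[i] = out[i];
}

/* Example matrix: fade red toward zero, keep green and blue. */
static const float fade_red[3][3] =
  {
    { 0.9f, 0.0f, 0.0f },
    { 0.0f, 1.0f, 0.0f },
    { 0.0f, 0.0f, 1.0f },
  };
```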

Background colors
The image transformation sometimes produces areas in the current frame which lie outside of the last frame. These areas are filled with one of 6 background colors. Background colors are dark (all components < 0.5) to increase the contrast against the foreground.

Beat sensitivity
Then I wrote the simplest possible beat detection: for each audio frame, compute the sound energy (i.e. the sum of the squared samples for each channel). If the energy is twice as large as the time averaged energy, we have a beat. The time averaged energy is calculated by a simple RC low-pass filtering of the per-frame energies:

avg_energy = avg_energy * 0.95 + new_energy * 0.05;
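Putting both steps together, a minimal version of this detector (struct and function names are made up here) could look like:

```c
typedef struct
{
  float avg_energy; /* RC low-pass filtered energy */
} beat_detector_t;

/* Return 1 if this frame of samples contains a beat. */
static int detect_beat(beat_detector_t * d, const float * samples, int n)
{
  int i, beat;
  float energy = 0.0f;

  for(i = 0; i < n; i++)
    energy += samples[i] * samples[i];

  /* A beat: energy is twice as large as the time averaged energy */
  beat = (energy > 2.0f * d->avg_energy);

  /* RC low-pass filtering of the energies */
  d->avg_energy = d->avg_energy * 0.95f + energy * 0.05f;
  return beat;
}
```

The 0.95/0.05 weights set the decay time of the average: after a loud beat the threshold rises, so a steady loud passage doesn't fire on every frame.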

The following things can be changed on each beat (each based on a random decision with a defined probability):
  • Foreground pattern (3)
  • Background color (5)
  • Transform (8)
  • Colormatrix (6)
Together with the 7 foreground colors, this makes 3 * 7 * 5 * 8 * 6 = 5040 principal variations.

I hacked a small tool (gmerlin_visualize), which takes an audio file and renders it, together with the visualization, into a video file. The result is here:

Simple music visualization from gmerlin on Vimeo.

As you can see, the video compression on Vimeo isn't really suited to that kind of footage. But that's not Vimeo's fault. It's a general problem that video compression algorithms are optimized for natural images rather than computer generated ones.

Possible improvements
The variety can easily be extended by adding new image transformations and new foreground patterns. Also, the contrast between foreground and background is sometimes a bit weak. A more careful selection of the colors and colormatrices would bring less variety, but a nicer overall appearance.

Feel free to make a patch :)
