Wednesday, January 27, 2010

Video quality characterization techniques

When developing video processing algorithms and tuning them for quality, one needs proper measurement facilities, otherwise one will end up doing voodoo. This post introduces two prominent methods for calculating the differences of two images (the "original" and the "reproduced" one) and get a value, which allows to estimate, how well the images coincide.

The most prominent method is the PSNR (peaked signal-to-noise ratio). It is based on the idea, that the reproduced image consists of the original plus a "noise signal". The noise level can be characterized by the signal-to-noise ratio and is usually given in dB. Values below 0 dB mean that the noise power is larger than the signal. For identical images (zero noise), the PSNR is infinite.

Advantage is, that it's a well established method and the calculation is extremely simple (see here for the formula). Disadvantage is, that it is a purely mathematical calculation of the noise power, while the human psychovisual system is completely ignored.

Thus, one can easily to make 2 images, which have different types of compression artifacts (i.e. from different codecs) and have the similar PSNR compared to the original. But one looks much better than the other. Therefore, current opinion among specialists is, that PSNR can be used or optimizing one codec, while it fails for comparing different codecs. Unfortunately, many codec comparisons in the internet still use PSNR.

SSIM (structural similarity) was fist suggested by Zhou Wang in the paper "Image Quality Assessment: From Error Visibility to Structural Similarity" (IEEE Transactions on image processing, Vol. 13, No. 4, April 2004, PDF).

The paper is very well written and I recommend anyone, who is interested, to read it. In short: The structural similarity is composed of 3 values:
  • Luminance comparison
  • Contrast comparison
  • Structure comparison
All these components are normalized such that they are 1.0 for identical images. The SSIM index is the product of the 3 components (optionally raised by an exponent). Obviously the product will be normalized as well.

A difference to PSNR is, that the SSIM index for a pixel is calculated by taking the surrounding pixels into account. It calculates some characteristic numbers known from statistics: The mean value, the standard deviation and correlation coefficient.

One problem with the SSIM is, that the algorithm has some free parameters, which are slightly different in each implementation. Therefore you should be careful when comparing your results with numbers coming from a different routine. I took the parameters from the original paper, i.e. K1 = 0.01, K2 = 0.03 and an 11x11 Gaussian window with a standard deviation of 1.5 pixels.

Both methods are available in gavl (SSIM only in CVS for now), but their APIs are slightly different. To calculate the PSNR, use:
void gavl_video_frame_psnr(double * psnr,
const gavl_video_frame_t * src1,
const gavl_video_frame_t * src2,
const gavl_video_format_t * format);

The src1, src2 and format arguments are obvious. The result (already in dB) is returned in psnr for each component. The order is RGB(A), Y'CbCr(A) or Gray(A) depending on the pixelformat. PSNR can be calculated for all pixelformats, but usually one will use a Y'CbCr format and take only the value for the Y' component. In all my tests the PSNR values for chrominance were much higher, so the luminance PSNR is the most pessimistic (i.e. most honest) value.

For SSIM you can use:
int gavl_video_frame_ssim(const gavl_video_frame_t * src1,
const gavl_video_frame_t * src2,
gavl_video_frame_t * dst,
const gavl_video_format_t * format);
The arguments src1, src2 and format are the same as for PSNR. The pixelformat however must be GAVL_GRAY_FLOAT, implying that only the luminance is taken into account. This decision was made after the experiences with PSNR. The SSIM indices for each pixel is then returned in dst, which must be created with the same format. The MSSIM (mean SSIM) for the whole image can then be obtained by averaging the SSIM values over all pixels. The function returns 1 if the SSIM could be calculated or 0 if the pixelformat was not GAVL_GRAY_FLOAT or the image is smaller than the 11x11 window.

It never matters which image is passed in src1 and which in src2 because both algorithms are symmetric.

Below you see 11 Lena images compressed with libjpeg at quality levels from 0 to 100 along with their PSNR, SSIM and file size:

Quality: 0, PSNR: 23.54 dB, SSIM: 0.6464, Size: 2819 bytes

Quality: 10, PSNR: 29.84 dB, SSIM: 0.8473, Size: 4305 bytes

Quality: 20, PSNR: 32.74 dB, SSIM: 0.9084, Size: 5890 bytes

Quality: 30, PSNR: 34.38 dB, SSIM: 0.9331, Size: 7376 bytes

Quality: 40, PSNR: 35.44 dB, SSIM: 0.9460, Size: 8590 bytes

Quality: 50, PSNR: 36.31 dB, SSIM: 0.9549, Size: 9777 bytes

Quality: 60, PSNR: 37.16 dB, SSIM: 0.9612, Size: 11101 bytes

Quality: 70, PSNR: 38.34 dB, SSIM: 0.9688, Size: 13034 bytes

Quality: 80, PSNR: 40.00 dB, SSIM: 0.9768, Size: 16410 bytes

Quality: 90, PSNR: 43.06 dB, SSIM: 0.9863, Size: 24308 bytes

Quality: 100, PSNR: 58.44 dB, SSIM: 0.9993, Size: 94169 bytes

With these numbers I made a plot, which shows the PSNR and SSIM as a function of the JPEG quality:
The JPEGs have the most visible differences for qualities between 0 and 40. In this range the SSIM curve has the largest gradient. Above 40 (where the visual quality doesn't change much), the SSIM becomes more or less linear and reaches almost 1 for the best quality.

The PSNR curve is a bit misleading. It has the steepest gradient for the highest quality. This is understandable because PSNR would become infinite for the lossless (perfect quality) case. It has however not much to do with the subjective impression. PSNR however is better for fine-tuning codecs at very high quality levels. That's because the PSNR values will change more strongly, while SSIM will always be almost one.

Now I have the proper tools to make a comparison of different video codecs.

Sunday, January 24, 2010

Disabling the X11 screensaver from a client

One of the most annoying experiences when watching videos with friends on cold winter evenings is, when the screensaver starts. Media players therefore need a way to switch that off by one or several means.

The bad news is, that there is no official method for such a trivial task, which works on all installations. In addition, there is the energy saving mode, which has nothing to do with the screensaver, and must thus be disabled separately.

The Xlib method
You get the screensaver status with XGetScreenSaver(), disable it with XSetScreenSaver() and restore it after video playback. Advantage is, that this method is core X11. Disadvatage is, that it never works.

Old gnome method
Older gnome versions had a way to ping the screensaver by executing the command:

gnome-screensaver-command --poke > /dev/null 2> /dev/null

Actually pinging the screensaver (which resets the idle timer) is a better method, because it restores the screensaver even if the player got killed (or crashed). The bad news is, that starting with some never gnome version (don't know exactly which), this stopped working. To make things worse, the command is still available and even gives zero return code, it's just a noop.

I never owned a Linux installation with KDE. But with a little help from a friends, I found a method. My implementation however is so ugly, that I won't show it here :)

The holy grail: Fake key events
After the old gnome variant stopped working for me, I finally found the XTest extension. It was developed to test XServers. I abuse it to send fake key events, which are handled indentically to real keystrokes. They will reset the idle counters of all screensaver variants, and will also disable the energy saving mode.

Also it's a ping approach (with the advantage described above). But it works with an X11 protocol request instead of forking a subprocess, so the overhead will be much smaller. The documentation for the XTest extension is from 1992, so I expect it to be present on all installations, which are sufficiently new for video playback.

Here is how I implemented it:

1. Include <X11/extensions/XTest.h>, link with -lXtst.

2. Test for presence of the XTest extension with XTestQueryExtension()

3. Get the keycode of the left shift key with XKeysymToKeycode()

4. Each 40 seconds, I press the key with

XTestFakeKeyEvent(dpy, keycode, True, CurrentTime);

5. One video frame later, I release the key with

XTestFakeKeyEvent(dpy, keycode, False, CurrentTime);

The one frame delay was done to make sure, that the press and release events will arrive with different timestamps. I don't want to know, what happens if press- and release-events for a key have identical timestamps.

This method will hopefully work forever, no matter what crazy ideas the desktop developers get in the future. Also, this is one more reason not to use any GUI toolkit for video playback. If you use Xlib and it's extensions, you have full access to all available features of X. When using a toolkit, you have just the features the toolkit developers think you deserve.

Tuesday, January 19, 2010

Gmerlin release on the horizon

New gmerlin prereleases are here:

Changes since the last public release are:
  • Great player simplification. Most changes are internal, but the user should notice much faster seeking. Also the GUI player now updates the video window while the seek-slider is moved.
  • Simplification of the plugin configuration: In many places, where we had an extra dialog for configuring plugins, it was merged with the rest of the configuration.
  • A new recorder application which records audio (with OSS, Pulseaudio, Alsa, Jack and ESound) and video (with V4L1, V4L2 or the new X11 grabber). Output can be written into files or broadcasted.
  • A new encoding frontend (used by the recorder and transcoder), which allows more consistant and unified configuration of encoding setups.
  • The video window of the GUI player has now a gmerlin icon and is grouped together with the other windows.
  • Like always: Tons of fixes, optimizations and smaller cleanups in all packages. Got another 18% speedup when building AVCHD indexes for example.
This will be the last gmerlin release of the 0.3.X series. There are just 2 major features left, which keep me from releasing 1.0.0:
  • Support for compressed A/V frames in the architecture. This should allow lossless transmultiplexing with the transcoder.
  • Support for configuration presets: This will make using gmerlin applications much easier. The presets can be shared among application, i.e. once you found a good encoding preset, you can use it both in the transcoder and the recorder.

Please test it as much as you can and sent problem reports to the gmerlin-general list.