libtheora vs. libschrödinger vs. x264

Doing a comparison of lossy codecs was always on my TODO list. That's mostly because of the codec-related noise I read (or skip) on mailing lists and propaganda for the royality free codecs with semi-technical arguments but without actual numbers. A while ago I made some quick and dirty PSNR tests, where x264 turned out to be the clear winner. But recently we saw new releases of libtheora and libschrödinger with improved encoding quality, so maybe now is the time to investigate things a bit deeper.

Specification vs implementation
With non-trivial compression techniques (and all techniques I tried are non-trivial) you must make a difference between a specification and an implementation. The specification defines how the compressed bitstream looks like, and suggests how the data can be compressed. I.e. it specifies if motion vectors can be stored with subpixel precision or if B-frames are possible. The implementation does the actual work of compressing the data. It has a large degree of freedom e.g. it lets you choose between several motion estimation methods or techniques for quantization or rate control. If you fully read and understood all specifications, you can make a rough estimation, which specification allows more powerful compression. But if you want numbers, you can only compare implementations.

This implies, that statements like "Dirac is better than H.264" (or vice versa) are inherently idiotic.

Candidates

libschroedinger-1.0.9
libtheora-1.1.0
x264 git version from 2010-03-19

Rules of the game
If compression algorithms are completely different, it's not easy to find comparable codec parameters. Some codecs are very good for VBR encoding but suck when forced to CBR. Some codecs are optimized for low-bitrate, others are work better at higher bitrates. Therefore I decided for very simple test rules:

All codec settings are set to their defaults found in the sourcecode of the libraries. This leaves the decision of good parameters to the developers of the libraries. I upgraded the codec parameters in libquicktime and the gmerlin encoding plugins for the newest library versions.
The only parameter, which is changed, corresponds to the global quality of the compression (all libraries have such a parameter). Multiple files are encoded with different quality settings.
From the encoded files the average bitrate is calculated and the quality (PSNR and MSSIM) is plotted as a function of the average bitrate.

Footage
Some lossless sequences in y4m format can be downloaded from the xiph site. I wanted a file, which has fast global motion as well as slower changing parts. Also the uncompressed size shouldn't be too large to keep the transcoding- and analysis time at a reasonable level. Therefore I decided to use the foreman. Of course for a better estimation you would need much more and longer sequences. Feel free to repeat the experiment with other files and tell about the results.

Analysis tool
I wrote a small tool gmerlin_vanalyze, which is called with the original and encoded files as only arguments. It will then output something like:

0 33385 47.137417 0.992292
1 17713 45.936294 0.989990
2 17693 45.998659 0.990233
3 17361 46.008802 0.990297
4 19253 46.144632 0.990582
5 19005 46.179699 0.990648

....

295 24454 45.174282 0.993100
296 23841 44.653152 0.992318
297 20966 43.848303 0.991013
298 13941 41.996157 0.987494
299 11682 41.852630 0.987321
# Average values
# birate      PSNR      SSIM
# 4941941.26  46.075177 0.991434

Each line consists of:

Frame number
Compressed size of this frame in bytes
Luminance PSNR in dB of this frame
Mean SSIM of this frame

The summary at the end consists of the total video bitrate (bits/second) as well as PSNR and SSIM averaged over all frames.

You can get this tool if you upgrade gavl, gmerlin and gmerlin-avdecoder from CVS. It makes use of a brand new feature (extracting compressed frames), which is needed for the video bitrate calculation (i.e. without the overhead from the container).

Results
See below for the PSNR and MSSIM results for the 3 libraries.

Conclusion
The quality vs. bitrate differences are surprisingly small. While x264 still wins, the royality free versions lag behind by just 2-3 dB of PSNR. Also surprising is that libtheora and libschroedinger are so close together given the fact, that Dirac has e.g. B-frames, while theora has just I- and P-frames. Depending on your point of view, this is good news for libtheora or bad news for libschroedinger

Another question is of course, if this comparison is completely fair. A further project could now be to take the codecs and tweak single parameters to check, how the quality can be improved. Also one might add other criteria like encoding-/decoding speed as well. Making tests with different types of footage would also give more insight.

To summarize that, I don't state that these numbers are the final wisdom. But at least they are numbers, and neither propaganda nor marketing.

1 comment:

Rob said...: PSNR isn't a good way of comparing different codecs when the differences in PSNR dB are small. SSIM correlates with perceived visual quality much better, so I'm glad you've included that in your test.

However, a difference of 1dB in overall PSNR will be a very noticeable difference in quality. 2-3dB is a massive difference.; March 21, 2010 at 11:55 AM

Truths or lies - decide yourself

Saturday, March 20, 2010