Specification vs implementation
With non-trivial compression techniques (and all techniques I tried are non-trivial) you must distinguish between a specification and an implementation. The specification defines what the compressed bitstream looks like, and suggests how the data can be compressed. For example, it specifies whether motion vectors can be stored with subpixel precision or whether B-frames are possible. The implementation does the actual work of compressing the data. It has a large degree of freedom, e.g. it lets you choose between several methods for motion estimation, quantization or rate control. If you have read and understood all the specifications, you can make a rough estimate of which specification allows more powerful compression. But if you want numbers, you can only compare implementations.
This implies that statements like "Dirac is better than H.264" (or vice versa) are inherently idiotic.
Candidates
- libschroedinger-1.0.9
- libtheora-1.1.0
- x264 git version from 2010-03-19
If compression algorithms are completely different, it's not easy to find comparable codec parameters. Some codecs are very good for VBR encoding but suck when forced to CBR. Some codecs are optimized for low bitrates, others work better at higher bitrates. Therefore I decided on very simple test rules:
- All codec settings are set to the defaults found in the source code of the libraries. This leaves the choice of good parameters to the developers of the libraries. I upgraded the codec parameters in libquicktime and the gmerlin encoding plugins for the newest library versions.
- The only parameter that is changed is the one corresponding to the global quality of the compression (all libraries have such a parameter). Multiple files are encoded with different quality settings.
- From the encoded files the average bitrate is calculated and the quality (PSNR and MSSIM) is plotted as a function of the average bitrate.
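For readers unfamiliar with the quality metric, here is a minimal sketch of how PSNR is commonly computed for 8-bit samples (such as the luminance plane of a frame). It is not taken from the analysis tool; the function name `psnr_8bit` and the flat-list input format are assumptions made for illustration only.

```python
import math

def psnr_8bit(original, encoded):
    """PSNR in dB between two equally sized sequences of 8-bit samples.

    Hypothetical helper: the samples are assumed to be plain integers
    in the range 0..255, e.g. the luminance values of one frame.
    """
    if len(original) != len(encoded) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples
    mse = sum((a - b) ** 2 for a, b in zip(original, encoded)) / len(original)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    # 255 is the peak value of an 8-bit sample
    return 10.0 * math.log10(255.0 ** 2 / mse)
```

Higher values mean the encoded frame is closer to the original. SSIM is more involved (it compares local means, variances and covariances) and is not sketched here.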
Some lossless sequences in y4m format can be downloaded from the xiph site. I wanted a file that has fast global motion as well as more slowly changing parts. Also, the uncompressed size shouldn't be too large, to keep the transcoding and analysis time at a reasonable level. Therefore I decided to use the foreman sequence. Of course, for a better estimate you would need many more (and longer) sequences. Feel free to repeat the experiment with other files and report the results.
Analysis tool
I wrote a small tool, gmerlin_vanalyze, which is called with the original and encoded files as its only arguments. It will then output something like:
0 33385 47.137417 0.992292
1 17713 45.936294 0.989990
2 17693 45.998659 0.990233
3 17361 46.008802 0.990297
4 19253 46.144632 0.990582
5 19005 46.179699 0.990648
....
295 24454 45.174282 0.993100
296 23841 44.653152 0.992318
297 20966 43.848303 0.991013
298 13941 41.996157 0.987494
299 11682 41.852630 0.987321
# Average values
# birate PSNR SSIM
# 4941941.26 46.075177 0.991434
Each line consists of:
- Frame number
- Compressed size of this frame in bytes
- Luminance PSNR in dB of this frame
- Mean SSIM of this frame
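If you want to post-process such output yourself, the following is a minimal sketch of a parser for the four-column format described above. The function names and the `fps` parameter are assumptions for illustration; the frame rate is not part of the output, so it has to be supplied by the caller. The averages here are plain arithmetic means over the frames, which may or may not be how the tool computes its own summary lines.

```python
def parse_vanalyze(text):
    """Parse per-frame lines: frame number, size in bytes, PSNR, SSIM.

    Hypothetical helper. Comment lines starting with '#' and anything
    that does not consist of four numeric fields is skipped.
    """
    frames = []
    for line in text.splitlines():
        fields = line.split()
        if line.lstrip().startswith("#") or len(fields) != 4:
            continue
        try:
            frames.append((int(fields[0]), int(fields[1]),
                           float(fields[2]), float(fields[3])))
        except ValueError:
            continue  # e.g. an elision line or other non-numeric text
    return frames

def summarize(frames, fps):
    """Average bitrate (bits per second), mean PSNR and mean SSIM.

    fps is an assumption supplied by the caller.
    """
    n = len(frames)
    if n == 0:
        raise ValueError("no frames to summarize")
    total_bits = 8 * sum(size for _, size, _, _ in frames)
    return {
        "bitrate": total_bits * fps / n,  # total bits / duration in seconds
        "psnr": sum(p for _, _, p, _ in frames) / n,
        "ssim": sum(s for _, _, _, s in frames) / n,
    }
```

Running `summarize(parse_vanalyze(output), fps)` once per encoded file gives one (bitrate, quality) point per quality setting, which is what a quality vs. bitrate plot needs.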
You can get this tool if you upgrade gavl, gmerlin and gmerlin-avdecoder from CVS. It makes use of a brand new feature (extracting compressed frames), which is needed for the video bitrate calculation (i.e. without the overhead from the container).
Results
See below for the PSNR and MSSIM results for the 3 libraries.
Conclusion
The quality vs. bitrate differences are surprisingly small. While x264 still wins, the royalty-free codecs lag behind by just 2-3 dB of PSNR. Also surprising is that libtheora and libschroedinger are so close together, given that Dirac has e.g. B-frames, while theora has just I- and P-frames. Depending on your point of view, this is good news for libtheora or bad news for libschroedinger.
Another question is, of course, whether this comparison is completely fair. A further project could be to take the codecs and tweak single parameters to check how much the quality can be improved. One might also add other criteria, like encoding and decoding speed. Tests with different types of footage would give more insight as well.
To summarize, I don't claim that these numbers are the final wisdom. But at least they are numbers, and neither propaganda nor marketing.