Monday, May 24, 2010

Comparison of WebM/VP8 and x264 High Profile

So far, all of the comparisons I have found between WebM (Google's newly minted format for video and audio that will hopefully be patent-free) and h.264 have suffered from several problems:

1. Objective measurements of VP8 have focused on PSNR, which is a deeply flawed way to measure video quality and a metric VP8 was specifically tuned to score well on (more on this later in the post).

2. VP8 has only been compared with inferior commercial h.264 encoders and not with x264, which is free, open-source and the best h.264 encoder in the world.

3. VP8 has only been compared with baseline h.264 and not with high profile, which offers substantially better compression and quality. There is a reason for this, which I disagree with and will cover later in the post.

For these reasons, I have conducted my own comparisons of WebM (encoded using ffmpeg with Google's own patches, set to the 360p preset) vs. x264 baseline and high profile, with settings tweaked to achieve the highest quality. I used freely available videos from Archive.org (downloaded in MPEG-2 format) that exemplify both clean and grainy footage of both animation and live action.

For x264, resolution was reduced to the closest anamorphic value to 360p possible (368p), and I used single-pass 'target size' encoding to closely match the file size of the VP8-encoded videos. Since the WebM specification pairs VP8 video with Vorbis audio, the x264 encodes also included audio encoded at the same bitrate (64 kbps, per the 360p preset).

Both codecs achieved adequate and comparable encoding speeds, faster than realtime (except on extremely grainy content, which slowed the high profile encoding down considerably), so I won't include those numbers here. If you disagree with these methods, feel free to conduct your own tests. Links to download the source videos are in the headings.
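The two bits of arithmetic in the setup above can be sketched in a few lines. This is an illustrative Python sketch, not necessarily how these encodes were configured: it assumes the 368 figure comes from rounding 360 to a multiple of 16 (the macroblock size in both h.264 and VP8), and that a 'target size' encode is budgeted by subtracting the audio bitrate from the total bitrate the file size allows. The function names and the 10 MB / 120 second clip are hypothetical, and container overhead is ignored.

```python
def nearest_mod16(height):
    """Round a frame height to the nearest multiple of 16 (ties round up).

    Assumption: 16 is the macroblock size both h.264 and VP8 work in.
    """
    return ((height + 8) // 16) * 16


def video_bitrate_kbps(target_bytes, duration_s, audio_kbps):
    """Video bitrate (kbit/s) that fills target_bytes after reserving room for audio.

    Container overhead is ignored, so a real file would land slightly over target.
    """
    total_kbps = target_bytes * 8 / 1000 / duration_s
    return total_kbps - audio_kbps


print(nearest_mod16(360))  # 368
# Hypothetical clip: a 10 MB VP8 file, 120 seconds long, with 64 kbps audio.
print(round(video_bitrate_kbps(10_000_000, 120, 64), 1))  # 602.7
```

Note that 352 and 368 are equally close to 360; the tie-breaking rule here simply rounds up to match the 368p used above.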




This was actually the worst test for VP8, with visible banding on the color gradients and a disappointing loss of texture detail compared with both baseline and high profile. Interestingly, baseline did a better job than even high profile of avoiding jaggies on the bunny's ear, and high profile seemed to introduce some artifacting in places, such as around the butterfly. It doesn't really surprise me that x264 did so well against VP8 in this test, since it has recently gotten some amazing improvements for animation.



High profile really earned its keep with this grainy source material by maintaining detail and avoiding introducing artifacts. Baseline kept more grain intact than VP8, but VP8 ended up with a sharper gremlin. Between VP8 and baseline, I call this one a tie.



Again, high profile really stands out in this sample with regard to the bright light. In contrast, both baseline and VP8 show some banding and blockiness, but are otherwise pretty similar. However, baseline seems to have introduced some wonkiness to the door by the upper-left corner of the '58' that's a bit troubling. Overall, both baseline and VP8 performed admirably in this test and were only slightly worse than high profile.


All three results were surprisingly similar in this test, with high profile managing to keep a bit more detail on the handwritten notes (you can barely tell the notes are there at all in the VP8 and baseline encodes). I was impressed with how well VP8 handled the large text on the sign, with good, clean lines and crisp edges, though the picture itself is a little blurry overall.



Once again, high profile demonstrates its superior handling of fine details in this sample, both the film grain and the multitude of small, moving waves. Both baseline and VP8 show a similar loss of detail relative to high profile, but VP8 seems to have introduced some odd color distortions (a couple of faint, greenish blotches) that push it below baseline in terms of quality. However, the Penn Museum watermark looks absolutely exquisite in the VP8 encode, even surpassing high profile: the whites are whiter and the edges sharper (though there are some visible over-sharpening artifacts around the outside).


High profile came out on top in this high-motion scene, but not by as large a margin as I had anticipated. Both VP8 and baseline produced only a small amount of banding on the gradients, though VP8 seems to have maintained slightly more detail in the motion-blurred parts than baseline. As before, the Penn Museum watermark looks way better in the VP8 encode than in either x264 encode.

Conclusions

Overall, WebM isn't bad, but it isn't great either when compared with x264. The popular sentiment that its visual quality is better than baseline h.264 doesn't seem to hold up against x264, though the difference usually isn't terribly large (and VP8 manages to win a few, too). High profile h.264 obviously performs significantly better than either baseline profile or WebM on certain kinds of content (film grain and large patches of water come to mind), while on other content the differences are scarcely discernible. Additionally, one area where VP8 beat the pants off of even high profile x264 was onscreen text, so good work on that one, fellas.

Other Thoughts (tl;dr)

Leading up to this comparison, I did quite a bit of reading from other sources and I saw several arguments repeated over and over, so I want to take some time to address them:

Something that keeps getting thrown around in defense of VP8 is that it's a new codec. Unfortunately, it's not. VP8 has been in development and use by On2 (the company that developed it and was later purchased by Google) for years and, while there is apparently still much room for optimization (i.e., to make it faster and less resource-intensive), it is far from "new."

Additionally, all of the previous comparisons I found purposely confined h.264 to baseline profile. This is because "all hardware" only supports baseline profile, so going any higher would require new hardware and (the line of reasoning goes) if you're making new hardware decoders anyway, it would be just as easy to support VP8 as to support high profile. This is a specious argument for two reasons: 1.) plenty of hardware already supports high profile h.264 (e.g., Western Digital's WD TV Live, as well as many Blu-ray Disc players) and 2.) if you're going to revamp hardware, why go through all that trouble for the "same" comparatively shitty quality (i.e., baseline vs. VP8)? Why not go for the best quality available (i.e., high profile h.264)? The reason, of course, is freedom from licensing fees, but that's a different argument.

I've heard many people say that VP8 is *objectively better* than h.264, based on PSNR measurements. It's true that PSNR is an objective measure, but it only loosely matches human perception. This flaw was made painfully clear once psychovisual optimizations were added to x264: although the video output unquestionably improved, PSNR values plummeted. Why does this matter? VP8 was designed to achieve high PSNR values, which has the side effect of favoring blurrier video. An analogy is the kids who score high on the SAT because they've been taught to its format, yet can't write a coherent essay.
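Here is a toy illustration of why PSNR rewards blur. PSNR is just a log-scaled mean squared error between the source and the encode, computed pixel by pixel, so it cannot tell "grain that looks the same" from "grain in exactly the same place." The minimal Python sketch below compares a flat patch of simulated film grain against two "encodes": one that smooths the grain away entirely and one that replaces it with fresh grain of identical strength. The `grainy_patch` helper and its parameters (gray level 128, grain standard deviation 10) are made up for this example; this is not how x264 or VP8 measure anything.

```python
import math
import random


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(reference) != len(distorted):
        raise ValueError("inputs must be the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10 * math.log10(max_value ** 2 / mse)


def grainy_patch(rng, n, level=128, sigma=10):
    """Hypothetical helper: a flat gray patch with Gaussian 'grain', clamped to 8 bits."""
    return [min(255, max(0, round(level + rng.gauss(0, sigma)))) for _ in range(n)]


rng = random.Random(2010)
n = 10_000
source = grainy_patch(rng, n)
smoothed = [128] * n               # grain wiped out entirely: a "blurry" encode
regrained = grainy_patch(rng, n)   # different grain of the same strength

print(f"smoothed:  {psnr(source, smoothed):.1f} dB")
print(f"regrained: {psnr(source, regrained):.1f} dB")
```

The smoothed version should score roughly 3 dB higher, even though the regrained one would look much more like the source to a viewer. With zero-mean grain of standard deviation sigma, wiping it out gives an MSE of about sigma squared, while independent regraining roughly doubles that.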

I've also seen quite a few comments stating that "Google wouldn't have spent $124 million on crap." I have issues with this statement because 1.) VP8 obviously isn't crap, but that doesn't mean it's the cat's pajamas either and 2.) just because Google is chock-full of smart people doesn't mean they are infallible.

Now, the patent issue is certainly something to be concerned about, and Dark Shikari addressed his suspicions during his assessment of VP8. VP8 is very similar to h.264 in a lot of ways, according to his investigation, which could be worrisome. However, his assessment is also encouraging in some ways in that the differences between VP8 and h.264 imply that On2 was cognizant of the various patent pitfalls and was actively avoiding them. For instance, Dark Shikari noted that VP8 is missing B-frames, which is an awful shortcoming from a compression standpoint, but it's smart from an intellectual property standpoint since B-frames are apparently patented pretty tightly. Unfortunately, the patent question likely won't be settled any time soon, but I'm cautiously optimistic that VP8 will turn out okay.
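It may help to see what a B-frame actually is. A B-frame is predicted from an anchor frame before it and another one after it, so the later anchor has to be decoded first and the stream is sent out of display order. The minimal Python sketch below converts display order into decode order, assuming a simple GOP pattern in which B-frames are never used as references (real h.264 also allows arrangements such as B-pyramids). The `decode_order` name is made up for this illustration.

```python
def decode_order(display_order):
    """Return (display_index, frame_type) pairs in the order a decoder needs them."""
    out = []
    waiting_b = []  # B-frames waiting for their future anchor
    for index, frame_type in enumerate(display_order):
        if frame_type == "B":
            waiting_b.append((index, frame_type))
        elif frame_type in ("I", "P"):
            out.append((index, frame_type))  # the anchor is decoded first...
            out.extend(waiting_b)            # ...then the B-frames that reference it
            waiting_b = []
        else:
            raise ValueError(f"unknown frame type: {frame_type!r}")
    if waiting_b:
        raise ValueError("trailing B-frames have no future anchor to predict from")
    return out


print(decode_order("IBBPBBP"))
# [(0, 'I'), (3, 'P'), (1, 'B'), (2, 'B'), (6, 'P'), (4, 'B'), (5, 'B')]
print(decode_order("IPPP"))
# [(0, 'I'), (1, 'P'), (2, 'P'), (3, 'P')]
```

A stream made only of I- and P-frames, as in VP8, needs no reordering at all, but it also loses the option of predicting from a future frame.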

Finally, as Dark Shikari mentioned in his assessment of VP8, perhaps the most serious issue facing WebM (at least in my opinion) is that the finalized VP8 spec is based on buggy C code. For reference, specifications usually describe algorithms and behaviors, then programmers write code to achieve the desired behavior. Sometimes, people who are creating a spec that is not intended to be widely read/followed (e.g., a private company creating a closed-source codec/spec [nudge, nudge]) will, rather than describing the desired behavior, copy code from their reference encoder/decoder that performs the desired function. The reason this is a problem is that, if this code contains bugs (and some of it does in the case of VP8), all decoders that want to properly follow the spec *must* include these bugs or else not conform to the spec. Dark Shikari rightly compared it to the way Internet Explorer 6's flawed rendering of HTML required Web developers to write bad code that conformed to those flaws.
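Why bugs in the reference code matter so much comes down to prediction: each frame is reconstructed from previously decoded frames, so a decoder that computes even one value differently from the reference code drifts further off with every frame. The toy Python sketch below is entirely hypothetical and is not actual VP8 code: the "reference" half-pel average truncates where the "intended" behavior rounds, the toy encoder computes its residuals against the reference behavior, and the scene is static, so every residual is zero. All function names are made up for this illustration.

```python
def reference_avg(a, b):
    """Hypothetical reference-code behavior: a half-pel average that truncates."""
    return (a + b) >> 1


def intended_avg(a, b):
    """Hypothetical 'intended' behavior: a half-pel average that rounds half up."""
    return (a + b + 1) >> 1


def predict(ref, avg):
    """Predict each pixel as the average of it and its right neighbor (edge repeats)."""
    last = len(ref) - 1
    return [avg(ref[i], ref[min(i + 1, last)]) for i in range(len(ref))]


def encode(frames):
    """Toy lossless encoder: residuals are computed against the reference behavior."""
    ref = list(frames[0])
    residual_frames = []
    for frame in frames[1:]:
        pred = predict(ref, reference_avg)
        residuals = [s - p for s, p in zip(frame, pred)]
        residual_frames.append(residuals)
        ref = [p + r for p, r in zip(pred, residuals)]
    return list(frames[0]), residual_frames


def decode(first, residual_frames, avg):
    """Rebuild the last frame by chaining predictions, using the given averaging rule."""
    ref = list(first)
    for residuals in residual_frames:
        pred = predict(ref, avg)
        ref = [p + r for p, r in zip(pred, residuals)]
    return ref


static_scene = [[10, 11, 12, 13]] * 4  # one intra frame plus three predicted frames
first, residuals = encode(static_scene)
print(decode(first, residuals, reference_avg))  # [10, 11, 12, 13]
print(decode(first, residuals, intended_avg))   # [13, 13, 13, 13]
```

The bug-compatible decoder reproduces the scene exactly, while the "fixed" decoder is off by up to 3 after just three predicted frames, and the error would keep compounding until the next intra frame.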

Now, maybe Google will create an updated spec (WebM+ or 2.0 or Electric Boogaloo or whatever) that will fix all of the bugs and stand up to x264 high profile in quality, and maybe VP8 is similar enough to VP3 that some of the striking improvements from the Theora project can be applied to VP8. Until then, WebM will be good enough for the Web, I think, and hopefully will provide a modern, free-as-in-speech codec that everyone can get behind. I'll still be using x264 for my own encoding, but it's nice to see improvements in truly free video codecs.

Have an opinion on the subject? Leave me a comment and let me hear about it.