Friday, September 19, 2008

HandBrake With New Adv. x264 psy-rd

Update: I just finished conducting some tests using live, film-grained content and the details are posted below the cartoon data.

x264 just committed some fancy new features, known as psychovisual rate distortion and psychovisual trellis, to their git repository. These features are supposed to produce a subjectively superior picture by maintaining detail and minimizing blurriness during the encoding process (I guess; read more about it here, at the blog of Dark Shikari, a developer for the x264 codec).

However, if you read the posts at the Doom9 forums linked in his blog post, they mention that Peak Signal To Noise Ratio (PSNR)--basically the only objective measure of picture distortion available to me--will always be negatively affected by the new features, and freeze-frames will also look worse. This makes the new features sound a little like voodoo to me, since I tend to favor objective measures over subjective ones. Regardless, I decided to try it out.
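For reference, PSNR is just a log-scaled version of the mean squared error between the source and the encoded picture. Here is a minimal sketch of the standard formula, assuming 8-bit samples (maximum value 255). The function name and the flat-list frame representation are my own illustration, not anything taken from x264 or HandBrake.

```python
import math

def psnr(reference, encoded, max_value=255):
    """PSNR in dB between two equal-length sequences of pixel values.

    Assumes 8-bit samples by default (max_value=255).
    """
    if len(reference) != len(encoded):
        raise ValueError("frames must have the same number of pixels")
    # Mean squared error between the source and the encoded frame.
    mse = sum((r - e) ** 2 for r, e in zip(reference, encoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: no distortion at all
    return 10 * math.log10(max_value ** 2 / mse)

# Toy 4-pixel "frames" where every pixel is off by one (MSE = 1).
print(round(psnr([10, 20, 30, 40], [11, 19, 31, 39]), 2))
```

Higher is better, and since the measure only looks at pixel-by-pixel error, an encoder that deliberately keeps grain-like detail instead of matching the source exactly can score lower while looking sharper.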

I used a bitrate of 250 kbps in an MP4 container with these advanced x264 settings:
ref=5:mixed-refs:bframes=6:bime:weightb:b-rdo:me=umh:subme=7:
filter=0,-1:trellis=2:threads=2
For the psy encoding, I followed those settings with these psy-specific options (first number sets the psychovisual rate distortion [value from 0.1-1] while the second controls the psychovisual trellis [again, 0.1-1]):
psy-rd=1,1
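If you build these option strings by hand a lot, a small helper can save some typos. This is only a sketch: `parse_options` and `with_psy` are hypothetical names of my own, not part of HandBrake or x264, and the subme check simply encodes my understanding that psy-rd needs subme of 6 or higher. Requiring subme to be set explicitly is also my assumption.

```python
# The advanced options used for both encodes, joined into one string.
BASE_OPTIONS = ("ref=5:mixed-refs:bframes=6:bime:weightb:b-rdo:me=umh:subme=7:"
                "filter=0,-1:trellis=2:threads=2")

def parse_options(option_string):
    """Split a colon-separated option string into a dict.

    Bare flags such as 'weightb' map to True; values stay as strings.
    """
    options = {}
    for item in option_string.split(":"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        options[key] = value if sep else True
    return options

def with_psy(option_string, psy_rd=1.0, psy_trellis=1.0):
    """Append psy-rd=<rd>,<trellis> to an option string.

    Assumption: psy-rd only takes effect with subme >= 6, so refuse
    to build the string if subme is lower or not set explicitly.
    """
    subme = parse_options(option_string).get("subme", 0)
    if int(subme) < 6:
        raise ValueError("psy-rd appears to need an explicit subme >= 6")
    return f"{option_string}:psy-rd={psy_rd:g},{psy_trellis:g}"

print(with_psy(BASE_OPTIONS))
```

The result is the same string as above with `:psy-rd=1,1` tacked onto the end, ready to paste into the advanced options field.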
If I understood things correctly, psy-rd only works with a subme setting ≥6, btw, so keep that in mind if you want to use the psy settings. Here are a couple of screenshots from an episode of Futurama encoded using otherwise identical settings (v0.9.2 on the left [no psy], svn 1734m [psy-enabled] on the right):

As you can see, it's a pretty noticeable difference favoring the psy shot, especially in a few key areas. Of note, you can see significantly less blocking on the dark spots on the cloud in the psy shot, and the top platform against the red sky is much sharper.

Now for the numbers:

As expected, Avg. PSNR fell from 42.602 to 41.557 in the psy encode, and Global PSNR fell from 40.655 to 39.641.
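In case the two numbers are confusing: as I understand it, the average figure is the mean of the per-frame PSNR values, while the global figure is computed once from the error pooled over all frames. The sketch below shows the difference using made-up per-frame MSE values. The function names and the sample data are illustrative assumptions, not output from x264.

```python
import math

def psnr_from_mse(mse, max_value=255):
    """Convert a mean squared error into PSNR (dB), assuming 8-bit samples."""
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)

def average_psnr(frame_mses):
    """Mean of the per-frame PSNR values."""
    return sum(psnr_from_mse(m) for m in frame_mses) / len(frame_mses)

def global_psnr(frame_mses):
    """PSNR of the error pooled across all frames (equal-sized frames assumed)."""
    return psnr_from_mse(sum(frame_mses) / len(frame_mses))

# Hypothetical per-frame MSEs: two clean frames and one badly distorted one.
frame_mses = [2.0, 4.0, 30.0]
print(f"average: {average_psnr(frame_mses):.3f} dB")
print(f"global:  {global_psnr(frame_mses):.3f} dB")
```

A few bad frames drag the global figure down more than the average, which fits with the global numbers here sitting below the average ones.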

Dark Shikari and the Doom9 guys said that encoding speeds would be relatively unchanged, and mine fell only slightly, from approximately 20 fps to approximately 18 fps. However, despite using the exact same bitrate option, my file size increased from 56.6 MB to 58.1 MB, a difference which may partially explain the picture improvements.
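To put those changes in perspective, here is the quick arithmetic as a sketch. `percent_change` is just a throwaway helper of my own.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100

size_change = percent_change(56.6, 58.1)  # file size in MB
fps_change = percent_change(20, 18)       # approximate encoding speed in fps

print(f"file size: {size_change:+.1f}%")  # prints "file size: +2.7%"
print(f"speed:     {fps_change:+.1f}%")   # prints "speed:     -10.0%"
```

So the psy encode spent roughly 2.7% more bits and ran about 10% slower.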

Apparently, the psy options really shine in retaining small details, which are necessarily absent from cartoon sources, so I intend to do another comparison soon using some live-action sources, including at least one with visible film grain.

Stay tuned.

Another potential concern to keep in mind is that, as usual, QuickTime can have trouble with these new options, especially when using the Perian codec pack. Playback can show strange graphical glitches and generally look like shit. Stick to VLC if possible.

Update: I conducted similar tests to those performed on the cartoon source using the Kubrick classic, Dr. Strangelove. This source has some serious film grainage thanks to its age, so we can really see what psy-rd is all about.

I've taken my clips and dropped them into an animated gif, so click on the picture to see the comparison:

The first thing that really jumped out at me is that the psy version is much darker overall, with a much higher contrast ratio (i.e., the blacks are blacker, whites whiter, and the picture looks less gray). It's arguable whether the detail is better in these still-shots with psy-rd added, but it looks much sharper in motion.
Now the difference between the 2 encodings really comes out. Again, the psy-enabled shot is darker overall, with a higher contrast ratio, and the film grain is highly evident. In contrast, the non-psy shot looks as though a smoothing filter has been run over it, steamrolling the details.
Here, just as in the previous comparison, the differences are stark and easily noticeable. Hell, you can almost read the letter through the backside of the page!

Now for the numbers:
As in the earlier experiment, avg. PSNR fell from 43.616 in the non-psy encoding to 42.602 in the psy encoding, while global PSNR fell from 43.371 to 42.347. Again, frames per second remained relatively stable between the 2 encodes.

Conclusions:
After the somewhat underwhelming cartoon-based comparison, I was undecided as to whether I would upgrade to the psy-enabled version, but this comparison has definitely sold me on it. In certain cases that have a lot of small, fine details (such as with Dr. Strangelove), the psychovisual options can create a 250 kbps video that rivals an 800-1000 kbps Xvid encoding. This means smaller file sizes and faster streaming, which is always a good thing.

Let me know if you have any questions or comments.

For directions on compiling your own binary with the psy options enabled, visit my post here.

1 comment:

matt said...

non-psy versions of Strangelove look better to me.
