Friday, July 4, 2008

Comparison of x264/h.264 Advanced Options (With Pics)

x264 is a popular, relatively new codec that's extremely efficient and capable of producing a quality picture even at very low bitrates. Since it has largely replaced DivX/Xvid as the codec of choice for online video, users need to know how to maximize their video quality when using it. The very best quality can only be achieved by tweaking the codec's advanced options, which is a daunting task. Therefore, I have conducted a side-by-side comparison of many of the most useful options so you can decide which ones work best for you (click the pictures to see full-size).
Note: the differences are often quite subtle. If you have trouble spotting them, try looking at the reflection on the fingernails, the quality of the teeth, the edge of the index finger against the black background, and the color gradients on the skin. These are the places where differences will be most apparent.
Some caveats: every video is different, and different options work better for different sources. I tried to pick a test video that would be fairly representative of what you would encounter in normal use, but your mileage may vary and I make no guarantees about anything. However, I tried to be as transparent and systematic as possible so you can feel free to reproduce my testing if you see fit***. In the comparisons, I will frequently refer to "peak signal-to-noise ratio" (PSNR; higher is better), so if you're interested, you can read up on it on Wikipedia.
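
For readers who want a feel for what the PSNR numbers mean, here is a minimal Python sketch of the standard definition, PSNR = 10 * log10(MAX^2 / MSE), where MAX is 255 for 8-bit video. The function name and the tiny four-pixel "frames" are made up for illustration; x264 computes its own figures internally over whole frames, so treat this as a sketch of the idea rather than a reproduction of the encoder's log output.

```python
import math

def psnr(reference, encoded, max_value=255):
    """Return the PSNR in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(encoded):
        raise ValueError("frames must have the same number of pixels")
    # Mean squared error between the original and the compressed pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, encoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)

# Hypothetical 4-pixel "frames", purely for illustration.
original = [52, 55, 61, 59]
compressed = [54, 55, 60, 57]
print(f"{psnr(original, compressed):.2f} dB")
```

Because the scale is logarithmic, differences of a few hundredths of a dB (like many of those reported below) correspond to very small changes in the underlying error.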

I used HandBrake for my testing, but the results should be equally applicable to MEncoder and maybe other programs that support advanced x264 options. Extensive text-based discussions of these advanced options are available at the official HandBrake and MEncoder sites, so I will try not to overlap with them too much. You can think of my comparisons as a visual adjunct to these discussions.

2-Pass Encoding (-2; not really an x264 option, but very important)

The single most beneficial thing you can do for a variable bitrate encode is to switch from single-pass encoding (the default) to 2-pass encoding. This means the encoder spends the first pass analyzing the file and figuring out where the most bits will be needed (e.g., high-motion scenes), and then it attempts to make the quality similar throughout the file. The drawback to this strategy, of course, is that it takes ~2x as long to finish, but the benefit is large (0.988 dB) and easily noticeable with the naked eye:


Subpixel Motion Estimation (subme / subq)

Subpixel motion estimation (subq) is the next-most-important option. HandBrake defaults to a setting of 4, but I recommend using a setting of 7, which is slow but provides an increase in PSNR of 0.292 dB beyond the default. Using a setting of 1 looks terrible and costs 0.325 dB of PSNR relative to the default, though it has the benefit of being much faster:



Trellis (trellis)

The trellis option provides the next largest benefit (0.158 dB) at the cost of a somewhat slower encode. Enabling this option also makes a striking difference in fine, hard-edged details, such as on-screen text, as demonstrated by the MPAA rating screen:



I've seen examples online of people using a setting of 2, which they claim improves the quality, but that just isn't supported by these benchmarks, so don't bother using a setting other than 1. Update: turns out using a value of 2 does do something (at the cost of slower encodes), but not unless your subme value is 6+.

The maximum subme value increased from 7 to 9 since I originally wrote this article. I used the new maximum setting for these encodings and they all turned out pretty similar with this highly complex setting, but you can still see some pretty noticeable improvements with each trellis bump. With trellis=0, we can see some blocking and a weird artifact on her tooth; the average encoding speed with this setting was 24.427734 fps. Trellis=1 is a little better but still has some discoloration, while the average encoding speed was inexplicably faster at 24.567919 fps. Trellis=2 is a little smoother than either of the other settings, but it had approximately 20% slower encoding speed, at 19.798990 fps, which may not be worthwhile for all applications.
PSNR was a real mystery to me in this comparison. The average PSNR actually fell as trellis use increased, despite some pretty clear visual improvements. Global PSNR also appears wonky, as it dips slightly with trellis=1 and then increases slightly more with trellis=2. These results really showcase the danger of relying on PSNR for assessment, as it is clearly an imperfect measurement.
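
To put the speed figures above in perspective, here is a short Python sketch that turns the three average frame rates into percentage changes relative to trellis=0. The numbers are the ones quoted in this section; the helper name is just something I made up for the example.

```python
# Average encoding speeds (fps) quoted above for each trellis setting.
speeds_fps = {0: 24.427734, 1: 24.567919, 2: 19.798990}

def percent_change(baseline, value):
    """Relative change from baseline to value, in percent (negative means slower)."""
    return (value - baseline) / baseline * 100

for setting in (1, 2):
    change = percent_change(speeds_fps[0], speeds_fps[setting])
    print(f"trellis={setting}: {change:+.1f}% vs. trellis=0")
```

This works out to a difference of well under 1% for trellis=1, which is small enough to be run-to-run noise, and a slowdown of roughly 19% for trellis=2.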


Reference Frames (refs)

Next up, we have the number of reference frames. The default is 1, but increasing it to 3 provides an increase in PSNR of .079 dB and increasing it to a value of 5 provides a further increase in PSNR of .043 dB. Unfortunately, diminishing returns set in quickly, and ramping the value up to 12 provides only a further benefit of .034 dB at significant detriment to encoding speed:



Also, be wary of adding too many reference frames, as Quicktime may struggle to play back videos that use a large number of them. Furthermore, the number of reference frames interacts with the subpixel motion estimation setting to determine how much longer your encoding will take, so experiment to see what works best for you.
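
The diminishing returns are easier to see if you divide each PSNR gain by the number of reference frames added to get it. Here is a small Python sketch using the figures from this section; the data layout and function name are my own invention for the example.

```python
# (reference frames, additional PSNR gain in dB over the previous step),
# using the figures reported in this section.
steps = [(1, 0.0), (3, 0.079), (5, 0.043), (12, 0.034)]

def marginal_gains(steps):
    """Return (from_refs, to_refs, dB gained per added reference frame) per step."""
    results = []
    for (prev_refs, _), (refs, gain) in zip(steps, steps[1:]):
        results.append((prev_refs, refs, gain / (refs - prev_refs)))
    return results

for prev_refs, refs, per_frame in marginal_gains(steps):
    print(f"refs {prev_refs} -> {refs}: {per_frame:.4f} dB per added frame")
```

Going from 1 to 3 buys about .04 dB per extra frame, 3 to 5 about .02 dB, and 5 to 12 less than .005 dB, which is why I stop at 5.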

Mixed Reference Frames (mixed-refs)

Using >1 reference frames unlocks the ability to use mixed reference frames, which provides a modest improvement in PSNR of .035 dB:



B-Frames (bframes)

In my experience, the b-frames value is roughly as important as the reference frames value with respect to PSNR. Going from the default 0 b-frames to 1 b-frame provides a noticeable 0.12 dB benefit in PSNR. However, diminishing returns are apparent immediately: upping the value to 4 provides an additional benefit of only .069 dB, and ramping it up to 16 b-frames actually lowers PSNR by .033 dB relative to 4:



According to the HandBrake wiki, you can use more b-frames in animated content (~10-15).
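
Since the b-frame figures above are given step by step, it can help to add them up and see the total gain over the 0 b-frame default at each setting. This is a quick Python sketch using only the numbers reported in this section for my test clip; the variable names are mine, and other sources (animation in particular) may peak elsewhere.

```python
from itertools import accumulate

# B-frame settings tested, and the PSNR change (dB) at each step
# relative to the previous setting, as reported in this section.
bframes = [1, 4, 16]
step_gain_db = [0.12, 0.069, -0.033]

# Running total: gain over the default of 0 b-frames at each setting.
cumulative = list(accumulate(step_gain_db))
for count, total in zip(bframes, cumulative):
    print(f"bframes={count}: {total:+.3f} dB vs. 0 b-frames")

best_total, best_count = max(zip(cumulative, bframes))
print(f"best tested setting: bframes={best_count}")
```

On this clip the total peaks at 4 b-frames and 16 ends up below that, though still above the default.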

Weighted B-Frames and Pyramidal B-Frames

Once you enable ≥1 b-frame, you can also benefit from weighted b-frames and/or pyramidal b-frames. These options provide modest benefits to PSNR of .016 and .01 dB, respectively:




Be aware: enabling pyramidal b-frames is a "high-profile" feature that will totally bork Quicktime playback. Don't use it if you plan on watching your video on a Mac.

Motion Estimation Method (me)

This option determines the method used to estimate motion in your video. Options include hex (hexagon; the default), umh (uneven multi-hexagon; the highest quality), dia (diamond; not shown), and esa (exhaustive; not shown and not to be used). Umh is the highest quality but slightly slower than regular ol' hex; it provides a noticeable .159 dB improvement in PSNR:




Also included in the previous comparison is the motion estimation range (me-range) option, which provides a modest .048 dB improvement in PSNR at the cost of slower encodes.

Analysis (analyse)

The analysis method makes very little difference and I recommend sticking with the default. Switching the value to 'all' provides a minuscule .004 dB increase in PSNR (too small to see).

After enabling the 'all' analysis, you can also enable the 8x8 DCT analysis option, which, strangely, hurts PSNR by .07 dB:



No Fast P-Skip (no-fast-pskip)

The HandBrake wiki says this option helps "with blocking on solid colors like blue skies," but I didn't really see any benefit of it in that regard. On the other hand, it reduced the PSNR by a fairly substantial (considering the lack of benefit) .048 dB, so it's up to you whether or not to use it:



Disabling P-skip might also have more of an impact on animated content, which generally features larger patches of solid colors than live content.

Deblocking Filter (filter)

This option tweaks x264's built-in deblocking filter, which smooths out edges (and details) in the picture. It's kind of like using a Photoshop filter on each frame, with positive values acting like a smoothing filter and negative values acting like a sharpening filter. The default, 0,0, is almost always the best, according to the HandBrake wiki, but I prefer a bit of smoothing and loss of detail (around 2,2). A lot of people (apparently) prefer using negative values, but I think that looks like crap and adds a lot of noise to edges:



Either way, you take a PSNR hit of approximately -.067 dB.

CABAC vs CAVLC

CABAC, the default option for HandBrake, is good for every use except when you need playback on the iPod 5.5G or AppleTV. If you turn it off (cabac=0), HandBrake will use CAVLC instead, which is visibly detrimental and imparts a tremendous PSNR hit of -1.542 dB(!!):



Direct Prediction (direct)

The HandBrake wiki makes this option sound really important, but changing the values had absolutely no effect on PSNR or appearance in my experience, so I suggest just leaving it alone:



Turbo First Pass (-T; again, not an advanced x264 option, but very important)

The turbo option significantly speeds up the first pass of a 2-pass encoding by passing faster options for the first pass and slower, higher-quality options for the second pass. The HandBrake wiki mentions that the turbo option may slightly reduce quality, but my experience was exactly the opposite. Due to the nature of the option, I compared an encoding with a variety of slow, high-quality options with and without the turbo option included. Interestingly, this resulted in an increase in PSNR of .054 dB in the turbocharged encode:



Bidirectional Refinement (bime)

The next option, bidirectional refinement, depends on several other options that I couldn't readily identify, so I did a similar comparison to what I used with the turbo comparison (i.e., a variety of slow, high-quality options with and without bime). This yielded a PSNR benefit of .014 with bime turned on, which is probably unnoticeable, but worthwhile if you're looking for the best quality possible:



Threads (threads)

The threads option just allows you to specify the number of threads HandBrake will use while encoding. This makes no difference on single-core processors, but makes a huge difference on multicore processors. By default, HandBrake automatically decides the "optimum" number, but I find I get slightly higher processor utilization if I assign a value of 4 on my dual-core AMD X2 4000+.

Deinterlacing (-d; not an x264 option, but it's pretty important so I figured I'd throw it in)

Interlaced video (as opposed to 'progressive') has a bunch of crazy-looking lines--known as scanlines or combing--in some frames and it makes videos look awful. If you want to get rid of them, HandBrake has a really great built-in deinterlacing filter that fixes things right up. I tried comparing "-d slowest" with plain ol' "-d" and they were exactly the same, so I've only included a comparison with and without the vanilla "-d" option:


Of note, deinterlacing does not affect PSNR, though it does significantly impair picture quality and introduces 'jaggies.'

Update (5/4/09): HandBrake has recently introduced a new option known as Decombing, which analyzes each frame, detects combing, and only runs the deinterlacing filter on those specific frames. This is a huge improvement over regular ol' deinterlacing because it only affects the necessary frames and leaves the others intact. Furthermore, you can leave this option on all the time (i.e., include it in custom presets) and it will only kick in when necessary.

Adaptive B-frames: (b-adapt)

If I understand correctly, this setting analyzes your file and determines the optimum number of b-frames, up to the maximum you specify in the b-frames field. This means you can crank your b-frames setting up to, say, 16 and it will only use as many as it feels are necessary (almost never more than 10, so 16 is overkill, but you get the idea). Within this setting, you have 3 choices: 'none,' 'fast,' and 'optimum.' 'Fast' is much better than 'none' and 'optimum' is very slightly better than 'fast.' Importantly, 'fast' only imparts a slightly longer encoding time than 'none,' while going from 'fast' to 'optimum' will drop your encoding speed by 50%(!!!) for a largely unnoticeable difference. Therefore, I recommend using 'fast' unless you're a real stickler for whom time is no object. I hope to get a comparison of these options up here in the near future, so stay tuned.
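
If you type your options in by hand rather than using the drop-down, here is a small Python sketch of how the three choices could be written into the option string. It assumes the labels 'none,' 'fast,' and 'optimum' correspond to x264's numeric b-adapt values 0, 1, and 2; the helper function itself is hypothetical and only builds text.

```python
# Assumed mapping from the labels used above to x264's numeric b-adapt values.
B_ADAPT = {"none": 0, "fast": 1, "optimum": 2}

def bframe_options(max_bframes, mode="fast"):
    """Build the b-frame part of a colon-separated x264 option string."""
    if mode not in B_ADAPT:
        raise ValueError(f"unknown b-adapt mode: {mode!r}")
    return f"bframes={max_bframes}:b-adapt={B_ADAPT[mode]}"

# Allow up to 16 b-frames and let the encoder pick how many to use.
print(bframe_options(16, "fast"))
```

The resulting fragment can be joined onto the rest of your options with a colon.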

My personal settings

These options may not provide much benefit individually, but the improvements certainly add up. My settings result in a high-quality rip (PSNR of 41.232 vs 35.933 with the default options, a gain of about 5.3 dB, or roughly 15%) that remains (mostly) compatible with Quicktime but with a 60-80% worsening in encoding speed (from ~60 frames per second with default options to ~15 frames per second with my settings). Here is a comparison of the vanilla x264 options and my personal settings so you can decide for yourself if it's worthwhile for you:



I also use a higher bitrate (800-900 kb/s) when doing my actual encodings (not in this comparison), which obviously improves quality a great deal. Here are my x264 options:

ref=5:mixed-refs:bframes=4:bime:weightb:b-rdo:me=umh:subme=9:filter=0,-1:trellis=2:threads=4
I also use the non-x264 options of -2 and -T. Update (5/4/09): I used to be all about the 2-pass encoding, but I now prefer constant quality with CRF enabled. It takes the same time as 2-pass overall but gives consistent quality that is slightly better throughout the file, in my experience. I hope to post a comparison sometime in the near future.
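
For anyone scripting their encodes, here is a rough Python sketch that splits my option string into its individual settings and assembles a HandBrakeCLI argument list around it. The file names and the 850 kb/s bitrate are placeholders, and the flag spellings (-i, -o, -e, -b, -x, plus the -2 and -T mentioned above) are what I'd expect from the command-line version, so check them against your own build's --help output before relying on this.

```python
OPTIONS = ("ref=5:mixed-refs:bframes=4:bime:weightb:b-rdo"
           ":me=umh:subme=9:filter=0,-1:trellis=2:threads=4")

def parse_x264_options(text):
    """Split a colon-separated option string into a dict; bare flags map to True."""
    parsed = {}
    for item in text.split(":"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed

def handbrake_args(source, output, bitrate_kbps, options):
    """Assemble an argument list for a 2-pass, turbo-first-pass x264 encode."""
    return ["HandBrakeCLI", "-i", source, "-o", output,
            "-e", "x264", "-b", str(bitrate_kbps),
            "-2", "-T", "-x", options]

settings = parse_x264_options(OPTIONS)
print(settings["subme"], settings["trellis"], settings["bime"])

# Placeholder file names and bitrate, for illustration only.
print(" ".join(handbrake_args("input.vob", "output.mkv", 850, OPTIONS)))
```

Keeping the options in a dict makes it easy to tweak a single value (say, dropping subme for a faster test encode) and then rejoin everything with colons.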
***My source video is a preview for Jackie Brown that appears on a Pulp Fiction collector's edition something-or-other. All comparisons used frame 460 (or 1120 for deinterlacing comparisons). For testing, I used a bitrate of 300 kb/s, which was intentionally low so the benefits of different options would hopefully be more visible. Nevertheless, x264 is so good at preserving quality that I had to zoom in 400% to get a good look at the details. So as not to introduce artifacts from upscaling and/or jpg compression, I used png (lossless) screencaptures at the native resolution for all initial adjustments (zooming, compositing, etc) and only switched to jpg (lossy) at the very end for posting online. I placed all streams into a matroska mkv container and used AC3 passthrough for the audio.

8 comments:

Anonymous said...

Thanks for the data! Very much welcome for sharing.

Anonymous said...

PSNR is only meaningful if you are working from pristine source where you want to preserve as much as possible. As soon as you are working with less than pristine source, PSNR becomes worse than meaningless. Also, I feel obliged to note that when h264 fails in high motion, it goes blocky, and looks MUCH worse than xvid/divx with the same material. Just sayin'.

Anonymous said...

Awesome x264 writeup probably the best I've seen yet! It's your use of screen shots every other guy/girl is too lazy.

dvx said...

Hi,

I have several videos recorded and encoded with h.264 from two different ways. I want now to compare which has the best video quality.

I am new in this area. Can you explain how to do it?

BR,
Divyesh Virchande

Hunter K. said...

Hi Divyesh,
You can take screenshots while the video is running and compare them, but make sure you're taking the shots at your videos' native resolution, so you don't get any artifacting from the enlarging to fullscreen. You'll also want to make sure you're capturing exactly the same frame so it's an apples-to-apples comparison.

When you make an encoding, if you check your activity log, it should also include some metrics of quality, namely PSNR and/or SSIM. These numbers are the easiest way to tell a video's quality, though PSNR can sometimes diverge from what your eyes may perceive as "better." SSIM is often more accurate.

@Anon,
Thanks :)

Anonymous said...

Nice!

Anonymous said...

Can you post an export of your handbrake settings for download? (Both types 2-pass and CRF)
Thx!!

Eduardo said...

Good morning, Hunter K. I found this blog almost a month ago and have been using the setting "ref=5:mixed-refs:bframes=4:bime:weightb:b-rdo:me=umh:subme=9:filter=0,-1:trellis=2:threads=4" with excellent results, but I wonder if this setting is still the best or if there have been some updates?
