Friday, July 4, 2008

Comparison of x264/h.264 Advanced Options (With Pics)

x264 is a popular (relatively) new codec that's extremely efficient and capable of producing a quality picture even at very low bitrates. Since it has largely replaced DivX/Xvid as the codec of choice for online video, users need to know how to maximize their video quality when using this new codec. The very best quality can only be achieved by tweaking the codec's advanced options, which is a daunting task. Therefore, I have conducted a side-by-side comparison of many of the most useful options so you can decide which ones work best for you (click the pictures to see full-size).
Note: the differences are often quite subtle. If you have trouble spotting them, try looking at the reflection on the fingernails, the quality of the teeth, the edge of the index finger against the black background, and the color gradients on the skin. These are the places where differences will be most apparent.
Some caveats: every video is different, and different options work better for different sources. I tried to pick a test video that would be fairly representative of what you would encounter in normal use, but your mileage may vary and I make no guarantees about anything. However, I tried to be as transparent and systematic as possible, so feel free to reproduce my testing if you see fit***. In the comparisons, I will frequently refer to "peak signal-to-noise ratio" (PSNR; higher is better), so if you're interested, you can read up on it on Wikipedia.
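For a feel of what the PSNR numbers mean: for 8-bit video, PSNR is conventionally computed from the mean squared error (MSE) between the source frame and the encoded frame as 10 * log10(255^2 / MSE). The snippet below is only an illustration of that formula using awk. The helper name psnr_from_mse and the sample MSE values are made up for the example, and it is not a claim about how HandBrake or x264 report the figure internally.

```shell
# PSNR in dB for 8-bit samples, given a mean squared error (MSE).
# psnr_from_mse is a hypothetical helper name used only for this illustration.
psnr_from_mse() {
    LC_ALL=C awk -v mse="$1" 'BEGIN {
        peak = 255   # maximum value of an 8-bit sample
        # awk only has a natural log, so divide by log(10) to get log10
        printf "%.2f\n", 10 * log(peak * peak / mse) / log(10)
    }'
}

psnr_from_mse 65.025    # prints 30.00
psnr_from_mse 32.5125   # halving the MSE adds roughly 3 dB
```

Because the scale is logarithmic, the fractions of a dB quoted throughout this post correspond to fairly small changes in the underlying error.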

I used HandBrake for my testing, but the results should be equally applicable to mencoder and maybe other programs that support advanced x264 options. Extensive text-based discussions of these advanced options are available at the official HandBrake and Mencoder sites, so I will try not to overlap with them too much. You can think of my comparisons as a visual adjunct to these discussions.

2-Pass Encoding (-2; not really an x264 option, but very important)

The single most beneficial thing you can do for a variable bitrate encode is switch from single-pass encoding (the default) to 2-pass encoding. This means the encoder spends the first pass analyzing the file and figuring out where the most bits will be needed (e.g., high-motion scenes), and then attempts to make the quality similar throughout the file. The drawback to this strategy, of course, is that it takes ~2x as long to finish, but the benefit is large (0.988 dB) and easily noticeable with the naked eye:


Subpixel Motion Estimation (subme / subq)

Subpixel motion estimation (subq) is the next most important option. HandBrake defaults to a setting of 4, but I recommend using a setting of 7, which is slow but provides an increase in PSNR of 0.292 dB beyond the default. Using a setting of 1 looks terrible and costs 0.325 dB of PSNR relative to the default, though it has the benefit of being much faster:



Trellis (trellis)

The trellis option provides the next largest benefit (0.158 dB) at the cost of a somewhat slower encode. Enabling this option also makes a striking difference in fine, hard-edged details, such as on-screen text, as demonstrated by the MPAA rating screen:



I've seen examples online of people using a setting of 2, which they claim improves the quality, but that just isn't supported by these benchmarks, so don't bother using a setting other than 1. Update: turns out using a value of 2 does do something (at the cost of slower encodes), but not unless your subme value is 6+.

The maximum subme value increased from 7 to 9 since I originally wrote this article. I used the new maximum setting for these encodings and they all turned out pretty similar with this highly complex setting, but you can still see some pretty noticeable improvements with each trellis bump. With trellis=0, we can see some blocking and a weird artifact on her tooth; the average encoding speed with this setting was 24.427734 fps. Trellis=1 is a little better but still has some discoloration, while the average encoding speed was inexplicably faster at 24.567919 fps. Trellis=2 is a little smoother than either of the other settings, but it had approximately 20% slower encoding speed, at 19.798990 fps, which may not be worthwhile for all applications.
PSNR was a real mystery to me in this comparison. The average PSNR actually fell as trellis use increased, despite some pretty clear visual improvements. Global PSNR also appears wonky, as it dips slightly with trellis=1 and then increases slightly more with trellis=2. These results really showcase the danger of relying on PSNR for assessment, as it is clearly an imperfect measurement.
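To put the encoding speeds quoted above side by side, here is a small sketch that expresses each one as a percentage change from the trellis=0 baseline. The helper name speed_change is hypothetical; the fps figures are the ones reported above.

```shell
# Percentage change in encoding speed relative to a baseline, both in fps.
# speed_change is a hypothetical helper name.
speed_change() {
    LC_ALL=C awk -v base="$1" -v new="$2" 'BEGIN {
        printf "%+.1f%%\n", (new - base) / base * 100
    }'
}

speed_change 24.427734 24.567919   # trellis=0 -> trellis=1, prints +0.6%
speed_change 24.427734 19.798990   # trellis=0 -> trellis=2, prints -18.9%
```

The +0.6% for trellis=1 is small enough that it could plausibly be run-to-run noise, while the drop for trellis=2 is in line with the "approximately 20%" figure.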


Reference Frames (refs)

Next up, we have the number of reference frames. The default is 1, but increasing it to 3 provides an increase in PSNR of .079 dB and increasing it to a value of 5 provides a further increase in PSNR of .043 dB. Unfortunately, diminishing returns set in quickly, and ramping the value up to 12 provides only a further benefit of .034 dB at significant detriment to encoding speed:



Also, be wary of adding too many reference frames, as Quicktime may struggle to play back videos that use a large number of them. Furthermore, the number of reference frames interacts with the subpixel motion estimation setting to determine how much longer your encoding will take, so experiment to see what works best for you.

Mixed Reference Frames (mixed-refs)

Using >1 reference frames unlocks the ability to use mixed reference frames, which provides a modest improvement in PSNR of .035 dB:



B-Frames (bframes)

In my experience, the b-frames value is roughly as important as the reference frames value with respect to PSNR. Going from the default 0 b-frames to 1 b-frame provides a noticeable 0.12 dB benefit in PSNR. However, diminishing returns are apparent immediately: upping the value to 4 provides an additional benefit of only .069 dB, and ramping it up to 16 b-frames actually worsens PSNR by .033 dB:



According to the HandBrake wiki, you can use more b-frames in animated content (~10-15).

Weighted B-Frames and Pyramidal B-Frames

Once you enable ≥1 b-frame, you can also benefit from weighted b-frames and/or pyramidal b-frames. These options provide modest benefits to PSNR of .016 and .01 dB, respectively:




Be aware: enabling pyramidal b-frames is a "high-profile" feature that will totally bork Quicktime playback. Don't use it if you plan on watching your video on a Mac.

Motion Estimation Method (me)

This option determines the method used to estimate motion in your video. Options include hex (hexagon; the default), umh (uneven multi-hexagon; the highest quality), dia (diamond; not shown), and esa (exhaustive; not shown and not to be used). Umh is the highest quality but slightly slower than regular ol' hex; it provides a noticeable .159 dB improvement in PSNR:




Also included in the previous comparison is the motion estimation range (me-range) option, which provides a modest .048 dB improvement in PSNR at the cost of slower encodes.

Analysis (analyse)

The analysis method makes very little difference and I recommend sticking with the default. Switching the value to 'all' provides a minuscule .004 dB increase in PSNR (too small to see).

After enabling the 'all' analysis, you can also enable the 8x8 DCT analysis option, which, strangely, hurts PSNR by .07 dB:



No Fast P-Skip (no-fast-pskip)

The HandBrake wiki says this option helps "with blocking on solid colors like blue skies," but I didn't really see any benefit in that regard. On the other hand, it reduced the PSNR by a fairly substantial (considering the lack of benefit) .048 dB, so it's up to you whether or not to use it:



Disabling P-skip might also have more of an impact on animated content, which generally features larger patches of solid colors than live content.

Deblocking Filter (filter)

This option tweaks x264's built-in deblocking filter, which smooths out edges (and details) in the picture. It's kind of like using a Photoshop filter on each frame, with positive values acting like a smoothing filter and negative values acting like a sharpening filter. The default, 0,0, is almost always the best, according to the HandBrake wiki, but I prefer a bit of smoothing and loss of detail (around 2,2). A lot of people (apparently) prefer using negative values, but I think that looks like crap and adds a lot of noise to edges:



Either way, you take a PSNR hit of approximately .067 dB.

CABAC vs CAVLC

CABAC, the default option for HandBrake, is good for every use except when playback on the iPod 5.5G or AppleTV is necessary. If you turn it off (cabac=0), HandBrake will use CAVLC instead, which is visibly detrimental and imparts a tremendous PSNR hit of 1.542 dB(!!):



Direct Prediction (direct)

The HandBrake wiki makes this option sound really important, but changing the values had absolutely no effect on PSNR or appearance in my experience, so I suggest just leaving it alone:



Turbo First Pass (-T; again, not an advanced x264 option, but very important)

The turbo option significantly speeds up the first pass of a 2-pass encoding by passing faster options for the first pass and slower, higher-quality options for the second pass. The HandBrake wiki mentions that the turbo option may slightly reduce quality, but my experience was exactly the opposite. Due to the nature of the option, I compared an encoding with a variety of slow, high-quality options with and without the turbo option included. Interestingly, this resulted in an increase in PSNR of .054 dB in the turbocharged encode:



Bidirectional Refinement (bime)

The next option, bidirectional refinement, depends on several other options that I couldn't readily identify, so I did a comparison similar to the one I used for turbo (i.e., a variety of slow, high-quality options with and without bime). This yielded a PSNR benefit of .014 dB with bime turned on, which is probably unnoticeable, but worthwhile if you're looking for the best quality possible:



Threads (threads)

The threads option just allows you to specify the number of threads HandBrake will use while encoding. This makes no difference on single-core processors, but makes a huge difference on multicore processors. By default, HandBrake decides the "optimum" number automatically, but I find I get slightly higher processor utilization if I assign a value of 4 on my dual-core AMD X2 4000+.

Deinterlacing (-d; not an x264 option, but it's pretty important so I figured I'd throw it in)

Interlaced video (as opposed to 'progressive') has a bunch of crazy-looking lines--known as scanlines or combing--in some frames and it makes videos look awful. If you want to get rid of them, HandBrake has a really great built-in deinterlacing filter that fixes things right up. I tried comparing "-d slowest" with plain ol' "-d" and they were exactly the same, so I've only included a comparison with and without the vanilla "-d" option:


Of note, deinterlacing does not affect PSNR, though it does significantly impair picture quality and introduces 'jaggies.'

Update (5/4/09): HandBrake has recently introduced a new option known as Decombing, which analyzes each frame for combing and runs the deinterlacing filter only on the frames where it is detected. This is a huge improvement over regular ol' deinterlacing because it only affects the necessary frames and leaves the others intact. Furthermore, you can leave this option on all the time (i.e., include it in custom presets) and it will only kick in when necessary.

Adaptive B-frames: (b-adapt)

If I understand correctly, this setting analyzes your file and determines the optimum number of b-frames, up to the maximum you specify in the b-frames field. This means you can crank your b-frames setting up to, say, 16 and it will only use as many as it feels are necessary (almost never more than 10, so 16 is overkill, but you get the idea). Within this setting, you have 3 choices: 'none,' 'fast,' and 'optimum.' 'Fast' is much better than 'none' and 'optimum' is very slightly better than 'fast.' Importantly, 'fast' only makes the encode slightly slower than 'none,' while going from 'fast' to 'optimum' will drop your encoding speed by 50%(!!!) for a largely unnoticeable difference. Therefore, I recommend using 'fast' unless you're a real stickler for whom time is no object. I hope to get a comparison of these options up here in the near future, so stay tuned.

My personal settings

None of these options provides much benefit on its own, but the improvements certainly add up. My settings result in a high-quality rip (PSNR of 41.232 vs 35.933 with the default options; a gain of about 5.3 dB, or approximately 15%) that remains (mostly) compatible with Quicktime but with a 60-80% drop in encoding speed (from ~60 frames per second with default options to ~15 frames per second with my settings). Here is a comparison of the vanilla x264 options and my personal settings so you can decide for yourself if it's worthwhile for you:



I also use a higher bitrate (800-900 kb/s) when doing my actual encodings (not in this comparison), which obviously improves quality a great deal. Here are my x264 options:

ref=5:mixed-refs:bframes=4:bime:weightb:b-rdo:me=umh:subme=9:filter=0,-1:trellis=2:threads=4
I also use the non-x264 options of -2 and -T. Update (5/4/09): I used to be all about the 2-pass encoding, but I now prefer constant quality with CRF enabled. It takes the same time as 2-pass overall but gives consistent quality that is slightly better throughout the file, in my experience. I hope to post a comparison sometime in the near future.
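For anyone who wants to see how these pieces fit together on the command line, here is a rough sketch. It assumes the HandBrakeCLI flags of that era (-i and -o for input and output, -e x264 for the encoder, -b for the bitrate in kb/s, -x for the x264 option string); check HandBrakeCLI --help for your version, since the flags may have changed. The helper name join_x264_opts, the file names, and the 850 kb/s bitrate are placeholders for illustration. The script only prints the command so you can look it over before running it.

```shell
# Join individual x264 options with ':' to form a single option string.
# join_x264_opts is a hypothetical helper name.
join_x264_opts() {
    opts=""
    for opt in "$@"; do
        # Add a ':' separator only when something is already in the string
        opts="${opts:+$opts:}$opt"
    done
    printf '%s\n' "$opts"
}

X264_OPTS=$(join_x264_opts ref=5 mixed-refs bframes=4 bime weightb b-rdo \
    me=umh subme=9 filter=0,-1 trellis=2 threads=4)

# Print the command rather than running it; file names and bitrate are placeholders.
printf 'HandBrakeCLI -i %s -o %s -e x264 -b %s -2 -T -x %s\n' \
    source.vob output.mkv 850 "$X264_OPTS"
```

Keeping the options as separate words makes it easy to drop one (say, bime) and re-run a comparison without retyping the whole string.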
***My source video is a preview for Jackie Brown that appears on a Pulp Fiction collector's edition something-or-other. All comparisons used frame 460 (or 1120 for deinterlacing comparisons). For testing, I used a bitrate of 300 kb/s, which was intentionally low so the benefits of different options would hopefully be more visible. Nevertheless, x264 is so good at preserving quality that I had to zoom in 400% to get a good look at the details. So as not to introduce artifacts from upscaling and/or jpg compression, I used png (lossless) screencaptures at the native resolution for all initial adjustments (zooming, compositing, etc) and only switched to jpg (lossy) at the very end for posting online. I placed all streams into a matroska mkv container and used AC3 passthrough for the audio.

Sunday, June 22, 2008

Mac Version of Spore CC Uses Code From WINE

Instead of making an actual port of the Spore Creature Creator for Mac, it appears EA has simply bundled WINE code (see the 'transgaming' folder inside the app bundle) in with a Windows-compatible .exe file (it's something called Cider that they talked about at some developer conference a while back). This is pretty depressing for the whole "gaming on Macs" crowd, but it brings up some interesting thoughts about getting it to work on Linux.

This may not reflect much on the standard WINE install, since Transgaming has a spotty history of contributing back to the WINE community, but I'm hoping I can use this info to get it going on my Gutsy box. I'll make a new post if I can figure anything new out.

(Edit: I ran into dll hell for a while, but I got that part worked out. Now it starts up mostly error-free except I get an error that says "A required security module cannot be activated. This program cannot be executed (8008)." If anyone has a clue how to get past this, let me know.)

Update: no luck with the 8008 error, but I looked at the MD5 checksums for the Windows .exe and the .exe file that's bundled into the Mac install and, although they are both 16.6 MB, their checksums are different:

3bcbb5ce269c4a38e73344fcb4c87949 for the Mac version

and

eb2834d31b2cb572a4f22947e000127e for the Windows version

[Let me know if you guys come up with different checksums]

This suggests that Cider does something (possibly at compile time) that's making them different. I would assume it's something similar to the Darwine SDK that lets you compile Windows executables from source for use on Macs, but who knows. With this in mind, you may have better or worse luck when trying to run the different versions with WINE, since they're not exactly the same.
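If you want to compare your own copies, something along these lines should do it on Linux. It's a minimal sketch: same_md5 is a hypothetical helper name, the file names in the example are placeholders, and it relies on md5sum from GNU coreutils (on a Mac the equivalent command is md5).

```shell
# Succeeds (exit status 0) when two files have the same MD5 checksum.
# same_md5 is a hypothetical helper name.
same_md5() {
    # Reading from stdin keeps the file name out of md5sum's output;
    # cut then keeps only the checksum itself.
    sum_a=$(md5sum < "$1" | cut -d ' ' -f 1)
    sum_b=$(md5sum < "$2" | cut -d ' ' -f 1)
    [ "$sum_a" = "$sum_b" ]
}

# Example with placeholder file names:
# same_md5 mac_bundle.exe windows_install.exe && echo "identical" || echo "different"
```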

For anyone interested, here's a list of the .dll files that Transgaming included in the Mac version, which may give a clue to what is needed to run it in Linux:

cg.dll
cgD3D9.dll
d3d8.dll
d3d9.dll
d3drm.dll
ddraw.dll
dinput.dll
dinput8.dll
dmusic.dll
mfc42.dll
mfc71.dll
msvcirt.dll
msvcp70.dll
msvcp71.dll
msvcp80.dll
msvcr70.dll
msvcr71.dll
msvcr80.dll
msvcrt.dll
opengl32.dll
psapi.dll
squish_nosse.dll
squish.dll
stdole2.tlb
stdole32.tlb

Likewise, here's a list of the dll overrides they're using in their transgaming config file (located at Resources/Preferences/config in the Mac bundle. You can also find the system.reg file, which holds a lot of interesting info, in that folder):

[dlloverrides]
"commdlg" = "builtin, native"
"comdlg32" = "builtin, native"
"oleaut32" = "builtin, native"
"ver" = "builtin, native"
"version" = "builtin, native"
"shell" = "builtin, native"
"shell32" = "builtin, native"
"shfolder" = "builtin, native"
"shlwapi" = "builtin, native"
"lzexpand" = "builtin, native"
"lz32" = "builtin, native"
"comctl32" = "builtin, native"
"commctrl" = "builtin, native"
"advapi32" = "builtin, native"
"crtdll" = "builtin, native"
"mpr" = "builtin, native"
"winspool.drv" = "builtin, native"
"d3d8" = "builtin, native"
"d3d9" = "builtin, native"
"d3drm" = "builtin, native"
"ddraw" = "builtin, native"
"dinput" = "builtin, native"
"dinput8" = "builtin, native"
"dmusic" = "builtin, native"
"dsound" = "builtin, native"
"opengl32" = "builtin, native"
"msvcrt" = "native, builtin"
"rpcrt4" = "native, builtin"
"msvideo" = "builtin, native"
"msvfw32" = "builtin, native"
"quartz" = "builtin, native"
"mcicda.drv" = "builtin, native"
"mciseq.drv" = "builtin, native"
"mciwave.drv" = "builtin, native"
"mciavi.drv" = "native, builtin"
"mcianim.drv" = "native, builtin"
"msacm.drv" = "builtin, native"
"msacm" = "builtin, native"
"msacm32" = "builtin, native"
"midimap.drv" = "builtin, native"
"psapi" = "builtin, native"
"wininet" = "builtin, native"
"dbghelp" = "native, builtin"
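Most of these entries prefer the builtin DLL, so the interesting ones are the handful that put native first. Here is a quick sketch for pulling those out of a config file in the format shown above. The helper name native_first is hypothetical, and it assumes every entry is written as "name" = "order" on its own line.

```shell
# List the DLLs whose override order starts with "native".
# native_first is a hypothetical helper; pass it the path to the config file.
native_first() {
    # Splitting on double quotes makes field 2 the DLL name
    # and field 4 the load order.
    awk -F '"' '$4 ~ /^native/ { print $2 }' "$1"
}

# Example, using the path given in the post:
# native_first Resources/Preferences/config
```

For the list above, that leaves msvcrt, rpcrt4, mciavi.drv, mcianim.drv, and dbghelp.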

Also located in the Mac bundle is a folder called MacOS that includes a pair of Unix executables, one called 'cider' and one called 'cidernoui.' I tried running these in my Linux terminal and it made my text go wonky in that session. Very strange.

Another file, Resources/transgaming/c_drive/Program Files/EA Games/Spore/Data/Config/ConfigManager.txt may be of interest.

Please comment if you have any info about that, or anything I've mentioned so far. Please correct me if I've gotten anything wrong.

- Hunter K.

Wednesday, June 18, 2008

HandBrake Dependencies List

Update (8/24/10): I wrote this post a long time ago and the dependencies have changed quite a bit since then, so here's an updated list. I'll leave the old post untouched for posterity. These are intended for use on a Debian-based system and can be pasted into an 'apt-get install' command:

build-essential autotools-dev libtool libgudev-1.0-dev intltool autoconf yasm libbz2-dev zlib1g-dev libwebkit-dev libnotify-dev libgstreamer0.10-dev libgstreamer-plugins-base0.10-dev wget subversion python python-gtk2-dev
You'll need to add libdvdcss2 to this list to work with copy-protected DVDs.
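As a convenience, the list can be kept in a variable and turned into the full command. This is just a sketch: the variable name HANDBRAKE_DEPS is made up, and the script prints the command instead of running it so you can review it first.

```shell
# Package list copied from the post; HANDBRAKE_DEPS is a hypothetical variable name.
HANDBRAKE_DEPS="build-essential autotools-dev libtool libgudev-1.0-dev intltool \
autoconf yasm libbz2-dev zlib1g-dev libwebkit-dev libnotify-dev \
libgstreamer0.10-dev libgstreamer-plugins-base0.10-dev wget subversion \
python python-gtk2-dev"

# Print the install command rather than running it.
printf 'sudo apt-get install %s\n' "$HANDBRAKE_DEPS"
# Append libdvdcss2 to the list if you need to read copy-protected DVDs.
```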

Original post: Here is a complete list of dependencies required to compile HandBrake from source. This is mostly for my own reference for whenever I reinstall my OS, but hopefully others will be able to benefit from it as well since it's so difficult to find a complete list elsewhere:

autoconf
automake
g++
g++-multilib
gcc
gcc-multilib
jam
libgcc
libtool
make
makedev
yasm (or nasm if you don't care about cpu extensions)
zlib1g-dev
build-essential (if you're on Ubuntu or a derivative)
dpkg-dev
libdvdcss2

Once you have these installed, just navigate to your source directory, type ./configure, then jam. The build process takes a long time and downloads a lot of things for codecs etc. When it's finished, you'll have a binary called HandBrakeCLI in your source folder, which can be run from its current location or moved to wherever is most convenient for you.

Friday, May 30, 2008

Fix for Netatalk nbp_rgstr: Connection timedout error

So, I had been playing around with VMWare Server in my Gutsy installation and when I restarted, my Netatalk daemon was failing with an error: nbp_rgstr: Connection timedout.

Luckily, I found this old forum post, which had the solution: Just go into your atalkd.conf file and uncomment the line that says eth0.

I'll walk you through it:

Open a terminal, then type:
sudo nano /etc/netatalk/atalkd.conf
Then delete the # sign in front of the word 'eth0'. Pretty simple!

Apparently, netatalk will just bind to whatever network interface is lying around, so it can sometimes get confused and bind to the crippled one used by VMWare Server. This fix just tells it to look for eth0 from now on instead of picking one willy-nilly.
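If you'd rather not edit the file by hand, the same change can be scripted. This is a minimal sketch that assumes the commented-out line starts with "#eth0" (optionally with spaces after the #); the helper name uncomment_eth0 is hypothetical, and it's worth backing up the file before trying it.

```shell
# Remove the leading '#' from the eth0 line of an atalkd.conf-style file.
# uncomment_eth0 is a hypothetical helper name.
uncomment_eth0() {
    conf=$1
    tmp=$(mktemp) || return 1
    # Write to a temporary file first, because in-place editing (sed -i)
    # behaves differently between GNU and BSD sed. Copying the result back
    # with cat keeps the original file's ownership and permissions.
    sed 's/^#[[:space:]]*eth0/eth0/' "$conf" > "$tmp" && cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Example (needs root for the real file):
# uncomment_eth0 /etc/netatalk/atalkd.conf
```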

Tuesday, May 27, 2008

CPU-optimized Backend for gHandBrake

Edit: The original author of gHandBrake's site appears to be down and I haven't heard from him since I submitted code to add most of the advanced x264 options to the interface, so I decided to offer my updated version right here (64-bit binary plus source code) until further notice. If you need to compile it for a 32-bit system, I have a list of dependencies for HandBrake here, and precompiled binaries of Yasm for Debian/Ubuntu systems here. If you have any problems with any of it, just let me know in the comments and I'll try to help.

You can also download the unmodified gHandBrake (fewer features, but more stable) here. This package includes both the source tarball and a precompiled 64-bit deb binary for Ubuntu/Debian users.

Original post: I've been trying out gHandBrake, the excellent new GTK frontend for HandBrake. It's very early in development, so it doesn't have many of the advanced options--such as advanced x264 options--implemented yet but it seems quite solid and will eventually be a great replacement for HandBrakeGTK/RippedWire, which is basically just an incomplete port of the Windows GUI (of note, HandBrakeGTK does not actually pass advanced x264 options to the commandline).

Unfortunately, the precompiled binaries offered by the author do not appear to utilize HandBrake's processor-specific optimizations, so I decided to offer an optimized 64-bit build of the backend that would:
Download link

To use it, just download and extract, then replace the old ghandbrake-backend with the newer version. In a console, you would type (assuming it was extracted to your desktop):
sudo mv -f ~/Desktop/ghandbrake-backend /usr/bin
Since switching to the updated backend, my average encoding speed went from ~40 fps to ~115 fps, but YMMV.

Of course, if you compiled gHandBrake yourself from source, you would already have the optimizations included as long as you had the yasm assembler installed when you compiled. If you want to go this route, I have a precompiled yasm binary that recognizes SSE3 instructions in my previous post.

Please leave me a comment if you have any questions or concerns.

Tuesday, May 13, 2008

HandBrake 0.9.2 and yasm 0.7.1 precompiled deb binary for 64-bit linux

Update (5/15/09): The binaries on this page are woefully out of date, but I have working binaries built from the latest code available in my PPA repository. Directions for adding it to your package manager are available here.

Download: HandBrakeCLI-AMD64 0.9.2 (Thanks Alexander!)
Download: yasm-AMD64 0.7.1
yasm-i386_0.7.1 (it's in a tarball, but the deb is inside)

I just got finished doing some building for HandBrakeCLI. They used to provide a precompiled 64-bit binary on the official site, but they've stopped for some unknown reason, so I decided to provide a copy of my own.

This binary doesn't need to be installed. Just navigate to the directory that contains it and type
./HandBrakeCLI
followed by any desired options (for more information on using the command line with HandBrake, visit the HandBrake wiki).

One of the biggest pains in compiling HandBrake (other than the lack of a comprehensive list of dependencies) is the fact that, to get the most out of HandBrake's processor-specific optimizations, you have to have the yasm assembler installed. Unfortunately, the version in Ubuntu's repos only supports instructions up to SSE2.

To correct this, I had to download and compile a newer version of yasm (namely 0.7.0) before I built HandBrake. Once I got that going, HandBrake recognized my cpu's extensions and accelerated the encoding speed to approximately 3.5-4x faster than the stock, non-yasm compile.

This binary should recognize any extensions present on my cpu: an AMD X-2 4000+ cpu, which has all the usual goodies up to SSE3 (so you fancy-pants core2duo users are out of luck on the SSSE3). Edit: A fella named Alexander was kind enough to supply a build with additional support for SSSE3 and Cache_64. Thanks Alexander!
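Not sure which extensions your own CPU has? On Linux they are listed on the "flags" line of /proc/cpuinfo, where SSE3 goes by the name "pni" and SSSE3 by "ssse3". Here's a small sketch for checking; the helper name has_cpu_flag is hypothetical, and it only looks at the first "flags" line it finds.

```shell
# Report whether a CPU flag appears in a cpuinfo-style file.
# has_cpu_flag is a hypothetical helper; the file defaults to /proc/cpuinfo.
has_cpu_flag() {
    # -m 1 stops at the first "flags" line; -w matches the flag as a whole word
    grep -m 1 '^flags' "${2:-/proc/cpuinfo}" 2>/dev/null | grep -qw -- "$1"
}

# Linux lists SSE3 under the name "pni"; SSSE3 appears as "ssse3".
for flag in sse2 pni ssse3; do
    if has_cpu_flag "$flag"; then
        echo "$flag: yes"
    else
        echo "$flag: no"
    fi
done
```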

My binaries include the original source code, which I have not modified. If you have any problems with them, please leave a comment or drop me an email and I'll see if I can correct the problem.

Thursday, May 8, 2008

Hardy Never Worked, So I'm Back On Gutsy

As awesome as Hardy is, I could never get those damned random freezes I mentioned in my last post to stop, so I had to default back to Gutsy.

Fortunately, there were a few things I learned about integrating with Mac OS X 10.5 Leopard while fiddling with my Hardy install that are perfectly applicable in Gutsy, including screen sharing and networking with Netatalk.

Screen sharing was surprisingly easy to set up. Simply install xtightvncviewer from the repos:
sudo aptitude install xtightvncviewer
This should make your Linux box accessible from your Leopard box, and you can also use it to control your Mac from Linux by typing into a Linux terminal:
xtightvncviewer your.Mac's.IP.Here
You will likely have to enter some sort of password to gain access. There are also some command line options you can use to optimize your experience, which you can see by typing:
xtightvncviewer -help
If you plan on using screen sharing often from your Mac, you should consider adding a link to the program to your Dock. It's just a regular program located at /System/Library/CoreServices/Screen Sharing.app.

On to networking:

I had previously used Samba to network my Macs with my Gutsy/Hardy box because I had always heard it was the easiest method. This simply isn't true. All it takes to get things going using Mac-native networking is to bring up a terminal and type:
sudo aptitude install netatalk
If you're on Leopard, you will run into the issue of it not liking cleartext passwords, which the Netatalk version from the Ubuntu repos happens to use. To fix this, you can either do it the hard way, by compiling a new version of Netatalk from source with a special option enabled, or the easy way, by opening a Terminal on your Leopard machine and typing (all on one line, courtesy of a commenter at macosxhints.com):
defaults write com.apple.AppleShareClient afp_cleartext_allow -bool true
Now Leopard will happily communicate with the standard Netatalk from the repos. As a word of caution: this method is less secure than the 'hard way,' but I just needed it for my home network, so it's not a big deal to me.

If you want shares to be accessible to your Mac from your Ubuntu box, you'll have to add them to the end of the file /etc/netatalk/AppleVolumes.default (ex. /media/sda1 at the very bottom of the file) and then restart netatalk by typing into a terminal:
sudo /etc/init.d/netatalk restart
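Adding the share can also be scripted so that running it twice doesn't create a duplicate entry. This is a minimal sketch: the helper name add_share is hypothetical, and it assumes one share path per line, as in the /media/sda1 example.

```shell
# Append a share path to an AppleVolumes-style file unless it is already listed.
# add_share is a hypothetical helper name.
add_share() {
    share=$1; volumes=$2
    # -x matches whole lines only, -F treats the path as a literal string
    grep -qxF -- "$share" "$volumes" 2>/dev/null || printf '%s\n' "$share" >> "$volumes"
}

# Example (needs root for the real file, then restart netatalk as shown above):
# add_share /media/sda1 /etc/netatalk/AppleVolumes.default
```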
