Friday, June 20, 2014

True Hq2x Shader Comparison With xBR

Background
I was shocked to learn recently that the shader I and others have long called Hq2x is/was actually a misnamed port of another shader entirely! guest.r originally put out a 2.0 scale shader with a suffix "HqFilter," which stands for a part of the color blending code. Over time, this shader was ported to a million different emulators (including the official Metal Slug PC releases) and, at some point, the name got confused with another popular emulation upscaling algorithm, Hq2x. As CPU filters have been supplanted over time with GPU shaders, no one seemed to notice that no true shader port of the classic Hqnx algorithm--in any scale--existed in any language. EDIT: looks like there was an attempt to port it back in 2005 but it never caught on because it was incomplete and had some bugs, but it's something.

Fast-forward to a few weeks ago, when a shader programmer named Armada brought this up in RetroArch's IRC channel. He shared some pics of the "Hq4x" shader in the common-shaders repo (which itself was based on the old XML shader in bsnes' gitorious repo, which in turn was based on guest.r's ePSXe shader) and an identical pic taken using a CPU-filter version of Hq4x. The differences were obvious and indisputable:
I started digging through old forum posts and it turns out that several people had noticed this over the years, but no one ever posted any comparison pics or were able to backup their suspicion in any way.

Armada and I did find that there was an old DOSBox renderer called OpenGL-Hq that was at least a step in the right direction, being a hardware-accelerated implementation of the Hqnx algorithm, and it was helpful insofar as it demonstrated how the algorithm can use an external lookup texture for the detection. However, it was not particularly applicable to modern shader language, so Armada set out to port directly from the CPU filter's C implementation.

After some intense work, which included creating a program to generate the LUTs that Hqnx bases its calculations on, Armada completed his shader port (also copied into the common-shaders repo) and it works beautifully! Incidentally, the requirement for LUTs means that a true Hqnx port wasn't even possible until fairly recently, as SSNES/RetroArch is/was the first emulator (that I know of, at least) to support LUTs in shaders.

Comparison Shots
If you've read over my previous comparison of Hyllian's xBR vs Hqnx, xBR won by a landslide in pretty much every comparison, which is no surprise because it wasn't really an apples-to-apples comparison. That in mind, here are updated pics that show a true comparison between the two algorithms (2xBR shader first, Hq2x shader right after and Hq2x CPU filter third):


The first thing you'll notice in those Super Mario World shots is that Hq2x does a great job of killing the jaggies. Much better, in fact, than ScaleHQ from the other comparison, and almost as good as xBR. There are a couple of rough edges (Yoshi's nose is a good example), but Hq2x is also *very* fast, so reasonable tradeoffs here. Hq2x does completely ignore the light texture blobs in the ground, leaving them as hard-edged rectangles, while xBR turns them into ovals. You can also see some slight lingering differences between the Hq2x CPU filter and the Hq2x shader, where the shader actually does a significantly better job of handling various detections.








On these digitized shots from Earthworm Jim 1 and 2, though, the comparison sort of falls apart. xBR is able to spot the jaggies and smooth them out while Hq2x doesn't spot any patterns at all, due to them being outside of its LUT's detection matrix. In fact, Hqnx's inability to work on antialiased images is one of the reasons Hyllian developed xBR in the first place.

It's also worth looking at how the algorithms differ in handling text. For this comparison, I included the two extremes of xBR's corner detection, with the 'a' variant as the most rounded and the 'c' variant as the most square:
 

In this comparison, Hq2x is essentially indistinguishable from xBR's 'c' variant, insofar as the text is concerned. The xBR 'a' variant is of course substantially more bubbly, which may be desirable for some games.

Conclusion
My previous comparison wasn't really a fair fight, and I apologize to Mr. Steppin for misrepresenting his algorithm. This is a much better comparison of the algorithms, and in ideal conditions, Hq2x is almost identical to xBR in smoothing while running much faster. However, in other cases--particularly digitized artwork--limitations in Hq2x's pattern detection can leave some images completely unsmoothed.

The speed of Hq2x makes it attractive for certain use-cases, such as mobile, where performance is still of the utmost importance and xBR either doesn't hit full speed at all or else drains your battery. xBR, on the other hand, can handle a greater variety of images and is more likely to produce a pleasing image with the digitized artwork that became more common in the PSX era.