Thursday, June 25, 2015

Latency Testing

My student worker, Alex, borrowed a digital oscilloscope and photoresistor from a coworker of mine and we sat down at my workstation to collect some data in an area that's often discussed (vociferously!) but rarely actually tested: latency. Most latency testing is unscientific voodoo ("I can *feel* it") that also suffers from confused terminology (see: the fighting game community's complaints about "lag" and how it makes them drop their combos). In this case, we're specifically examining input latency; that is, the difference in time between pressing a button on the controller and the action taking effect on the screen.

Here's a picture of our test bench, which consisted of a button from my trusty Happ-modded Mad Catz SE wired into the aforementioned oscilloscope:
The input from the button is compared against the voltage running through the photoresistor attached to a battery (and a momentary switch to keep the resistor from just draining the battery):
The photoresistor gets placed against my computer monitor while the button is used to make things happen in the emulators. As the brightness changes underneath the photoresistor, its resistance changes as well, and the oscilloscope displays the difference in time between the voltage drop from the button and the change in voltage from the resistor/battery circuit, which looks something like this:
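If it helps to see the measurement idea in code, here's a hypothetical sketch (not our actual tooling, which was just the scope itself): given two sampled traces--button voltage and photoresistor voltage--find the first threshold crossing in each and report the time between them. The function names, threshold, and sample rate are all illustrative assumptions.

```python
# Hypothetical sketch of the scope measurement: the button press is a
# falling edge on one trace, the screen lighting up is a rising edge on
# the other; latency is the time between the two crossings.

def first_crossing(samples, threshold, rising=True):
    """Index of the first sample that crosses `threshold`."""
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if rising and prev < threshold <= cur:
            return i
        if not rising and prev > threshold >= cur:
            return i
    return None

def latency_ms(button_trace, light_trace, sample_rate_hz, v_thresh):
    """Time from button press (voltage drop) to screen change (voltage rise)."""
    press = first_crossing(button_trace, v_thresh, rising=False)
    flash = first_crossing(light_trace, v_thresh, rising=True)
    if press is None or flash is None:
        return None
    return (flash - press) * 1000.0 / sample_rate_hz
```

With made-up 1 kHz traces where the button drops at sample 3 and the light rises at sample 90, this reports 87 ms--the same kind of number the scope's cursors gave us.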
We only had the equipment for a day, so I couldn't test as much as I would have liked, but I tried to be as consistent as possible. To that end, we sampled 5 data points for each variable and did all of the testing on the same machine. All SNES comparisons used the Super Famicom Controller Test ROM, while the arcade comparisons used Espgaluda from Cave (in hindsight, probably not the best choice, but it's what I had on-hand). I also didn't have a good way of getting a baseline latency. I'm using a modern, crappy Dell LCD setup rather than a CRT, and Windows 7 64-bit, chosen out of both convenience and the assumption that it would be similar to a typical user's setup, so I was forced to report *total system latency* rather than isolating the latency caused by the individual variables. In an attempt at some sort of baseline, we held the tester up to the built-in gamepad testing applet in Windows, which gave results hovering around 75 ms; that's obviously not accurate, since some of our emulators performed better than that... With that in mind, these results should only be considered relative and not absolute.
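Since variance matters here as much as the averages, summarizing each condition's 5 samples looks something like this (the numbers below are made up for illustration, not values from the spreadsheet):

```python
# Hedged example: summarize one condition's five latency samples with a
# mean plus spread figures, since unpredictability is part of the story.
from statistics import mean, stdev

def summarize(samples_ms):
    return {
        "mean": mean(samples_ms),
        "stdev": stdev(samples_ms),                    # sample std deviation
        "range": max(samples_ms) - min(samples_ms),    # worst-to-best spread
    }

aero_off = [68, 72, 70, 74, 66]  # hypothetical samples, in ms
print(summarize(aero_off))
```

A wide range or standard deviation in a condition is exactly the "kick in the pants" discussed below for Aero compositing.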

Note: my full system info is an Intel i7 (Sandy Bridge) CPU with an AMD R7 200 series GPU, with all of the GPU control panel crap turned off except for Eyefinity.

Anyway, here are the graphs that illustrate some of the more interesting comparisons:
First off, Aero compositing is bad news for both latency and variance. The increased variance is a real kick in the pants because it makes your performance less predictable. If you want consistent behavior and generally improved latency, stick with a "classic" non-Aero theme. Interestingly, disabling Aero did not seem to help with Higan.
Overall, this graph shows us that exclusive fullscreen is significantly better than windowed for latency, which is expected based on our Aero compositing findings. You'll notice there's no benefit to fullscreen in Higan (it's worse, in fact) because it's not *exclusive* fullscreen. Instead, it's what's known as "windowed" or "borderless" fullscreen. You can also see that ZSNES in exclusive fullscreen is extremely fast; faster than my supposed baseline of 75 ms :O
Higan had the highest latency figures here, even after correcting for the shaders--which I'll talk about more in a sec--with RetroArch about a frame lower (this includes data from both the snes9x and bsnes-compatible cores, which were not significantly different [87 ms vs 92 ms, which is within the variance of USB polling rates]). This also combines both windowed and fullscreen, which hurt ZSNES and ZMZ, the clear winners in exclusive fullscreen mode from our previous graph. Note: when ZSNES and ZMZ went into exclusive fullscreen, they broke Eyefinity, which other testing suggested adds up to ~8ms (or a half-frame) of latency, so keep that in mind when looking at their results.
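For anyone converting between the milliseconds above and frames, the arithmetic is just this (at 60 Hz; the 8 ms figure is the USB polling window mentioned above):

```python
# Helper for reading the numbers above: convert latency in ms to 60 Hz
# frames. E.g., the 87 ms vs 92 ms core comparison differs by well under
# a frame, and by less than the ~8 ms USB polling window.
FRAME_MS = 1000.0 / 60.0  # ~16.67 ms per frame at 60 Hz

def ms_to_frames(ms):
    return ms / FRAME_MS

print(round(ms_to_frames(92 - 87), 2))  # 0.3 -- within USB polling jitter
```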
This one was a dagger in my heart, but I'm posting it here anyway because of SCIENCE. I had always assumed that shaders would never increase latency because, in a worst-case scenario, they would just reduce the framerate (i.e., if the shader takes >16 ms to render). This is obviously not the case, as cgwg's crt-geom increases latency considerably in both Higan and RetroArch, as does crt-lottes. Crt-hyllian, on the other hand, has almost no effect on latency. To explore whether it's just heavy-duty math that causes the latency and whether it's exacerbated by multiple shader passes, I also tested Hyllian's xBR-lvl4-multipass in RetroArch. Shockingly, this one produced lower latency than no shader at all, which I find highly dubious.
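My (wrong) mental model was that a shader either fits the frame budget or drops frames, i.e. the only check that should matter is this one (illustrative only; real pass times depend on the GPU):

```python
# The naive "shaders can only cost framerate" model: sum the per-pass
# render times and see whether they exceed one 60 Hz frame. The data
# above shows latency can rise even when this check passes.
FRAME_BUDGET_MS = 1000.0 / 60.0

def drops_frames(pass_times_ms):
    """True if the summed per-pass render times exceed one 60 Hz frame."""
    return sum(pass_times_ms) > FRAME_BUDGET_MS

print(drops_frames([4.0, 5.0, 6.0]))  # False: 15 ms still fits the budget
```

The measured latency increases from crt-geom and crt-lottes without any framerate drop are what make this model fall apart.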
I kept this one in here because there's been a contentious debate as to which of these platforms provides the best experience for emulating arcade games. However, there are some serious caveats to keep in mind before drawing too strong of a conclusion: 1.) this used a different test ROM from the SNES emus, 2.) the test ROM I used was selected out of convenience and actually had a lot of potentially confounding noise in the form of enemy bullets passing through my test area, 3.) GroovyMAME and RetroArch are really at their best running in KMS via Linux rather than Win64, so they would likely have more pronounced benefits vs mainline MAME if I could have measured that, and 4.) in initial testing the day before I ran these measurements, mainline MAME performed incredibly badly, with GroovyMAME close behind, which suggests that there may be some other variance involved.

That all said, these data indicate that RetroArch is approximately 1.5 frames slower than GroovyMAME, while the difference between mainline MAME and GroovyMAME is within the variance of USB polling rates. However, in light of the confounders, I think the strongest conclusion we can reliably draw from the arcade comparison is that RetroArch isn't any *better* in Win64 (i.e., a null finding), so users should go with whichever platform has the features that best suit their needs rather than worrying about slim-to-nonexistent latency differences.


While the testing was not 100% reliable due to multiple confounders in several areas, some trends emerge that can inform our discussions about latency in emulation. Windowed is definitely worse than fullscreen, and enabling Aero compositing adds latency while also increasing variance and unpredictability. Shaders can actually cause excess latency, sometimes severely so. ZSNES, which has become a bit of a punching bag among SNES emulation scenesters, has outrageously low latency in fullscreen, so if you can stomach the terrible accuracy, there's actually some justification for using it now other than OMGSNOW!1! Alcaro's ZMZ also performed very well and can use more accurate emulation cores, so it can be a means of leveraging some of ZSNES' latency benefits without being stuck with its poor accuracy.

In the future, I would like to repeat these tests with a CRT monitor, which would have a predictable baseline of near 0 ms. I would also like to test latency in other environments, namely Linux+KMS. Finally, it would be very useful to have some comparative figures for original SNES hardware (both via CRT and upscaled via XRGB-Mini) and for RetroArch running on a console.

Here is a link to download the raw data in Excel format, in case anyone would like to look at the numbers in more detail and/or perform other comparisons that I didn't think of.

EDIT: I think some people are drawing more conclusions from these data than is really appropriate; specifically, some folks are trying to draw direct comparisons between the emulators/frontends tested. These data are simply not extensive enough for that. Furthermore, it's important to keep in mind that I didn't test the quality of sync, which could heavily affect the results. Namely, ZSNES and ZMZ both suffer from frequent audio crackling and frame stutters, which indicate issues with vsync, while RetroArch suffers from neither. I didn't test RA with vsync disabled (i.e., blocking on audio with video tearing), which could have an effect, and in general gameplay, users need to decide whether improvements in sync are worth minor (potential) increases in relative latency.


pantra said...

Great work and extremely interesting results! Though I couldn't help wondering what graphics hardware and drivers were used here. This might not be an issue anymore, but from some of my own testing years ago I suspect those could still be a factor, especially for the unexpected irregularities with the two CRT-shaders. For example back then when I tested with an NVIDIA Quadro card there were several global presets in the driver control panel like "3D App - Modeling AFR" or "3D App - Game Development" that seemed to affect timing and frame buffering behavior in OpenGL. I'd be very interested to see further results, so if you can, by all means keep at it! :)

Hunter K. said...

Yeah, I should probably make a note of that stuff in the post. I'm using an AMD GPU with everything turned off except for Eyefinity, which in my testing added up to ~8 ms (or a half frame) of latency. When ZSNES and ZMZ ran in exclusive fullscreen, they broke the Eyefinity desktop and likely gained that half frame, which could explain some of their great performance there.

Anonymous said...

What about CRT Royale?

Anonymous said...

Oh boy, some ammunition for RA haters.

Hunter K. said...

My GPU wasn't strong enough to get full speed with crt-royale, so I couldn't really test it.

I'm sure it will be brought up, but to hold these tests *against* RA would be distorting the facts. As I mentioned in the article, there are *many* confounders that could be going on in the background, from drivers to background processes (it is my work machine, after all, so it was checking my email client and so on), such that meaningful direct comparisons between the emulators aren't really possible. That said, people probably shouldn't bring up latency--at least in Win64--as a primary reason for using RA over other options.

Anonymous said...

Hi Hunter K.,

First of all, congratulations for your tests.

I just want to point out that at BYOAC we've been doing latency tests for quite a long time already. It was surprising for us to find out that Win64 is by far a better platform than Linux when it comes to input latency. I know it sounds difficult to believe, but reality is stubborn.

For the Win64/GroovyMAME/CRT testing scenario, our high-speed recordings show we're consistently getting next-frame response for those games that allow it natively (sf2, etc.). My estimation is that the bare input latency for the average modern PC running Win64 must be below 10 ms, otherwise we couldn't be getting these results. We couldn't achieve the same thing with Linux; neither RA+KMS nor GroovyMAME ever reacted sooner than the third frame. We even tried low-latency kernels.

The latency figures you're getting are nearly an order of magnitude above our results. As far as I can see there may be two main sources of latency affecting your tests:

- The LCD processing crap, that's the most obvious.
- The render ahead frame queue, arranged by the gpu.
(I leave filters out of this discussion).

I'd say all other stuff running in the background is negligible as long as you have a decent machine.

Disabling desktop compositing doesn't affect the dreaded frame queue. Bypassing the frame queue is tricky. Even if you set the number of render-ahead frames to a minimum of 1, you still have the problem of GPU parallelization. The bad news is, if you want the lowest possible latency, you have to limit the GPU parallelization, and this can be a challenge by itself.

The bottom line is, given the proper display and operating system, you can replicate the real hardware response through emulation -and we're doing it routinely- provided you target the frame queue issue specifically and have a "frame delay" implementation as GroovyMAME and RA do. Frame delay is required to remove the last remaining frame of latency, the one associated with frame buffering. The common statement that emulation by definition adds a frame of latency is a myth.
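A rough model of the frame-delay idea described here (not GroovyMAME's or RetroArch's actual code; the function and numbers are illustrative): instead of emulating right after vsync and letting the finished frame sit until the next flip, you wait some milliseconds into the frame first, so input is sampled that much closer to display.

```python
# Sketch of frame delay's timing arithmetic. Without delay, an input
# polled just after vsync waits the whole frame period before its frame
# is flipped; delaying emulation shaves `delay_ms` off that wait.
FRAME_MS = 1000.0 / 60.0

def input_to_photon_ms(delay_ms, emulate_ms=3.0):
    """Worst-case age of an input when its frame reaches the display.

    Assumes emulation itself still finishes before the flip deadline.
    """
    assert delay_ms + emulate_ms <= FRAME_MS, "frame would be late"
    return FRAME_MS - delay_ms

print(round(input_to_photon_ms(0.0), 1))   # ~16.7 ms with no frame delay
print(round(input_to_photon_ms(10.0), 1))  # ~6.7 ms with 10 ms of delay
```

The catch, as noted, is that the delay budget is bounded by how fast the machine can emulate a frame, which is why frame delay is a per-machine tuning knob.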


omegaxii said...

@Hunter K
> That said, people probably shouldn't bring up latency--at least in win64--as a primary reason for using RA over other options.

Does that still apply if we bring OpenGL Hard Sync and Frame Delay into the discussion? A lot of emulators don't have those options, though GroovyMAME itself has Frame Delay (and is where someone got the idea to apply it to RA).

Hunter K. said...

Hey man, thanks for the info. Do you guys have any articles or writeups on that stuff or is it floating around in forum threads? Either way, I'd be interested in reproducing your results.

Hard Sync and/or frame delay are definitely useful for reducing latency, as demonstrated by the data in the spreadsheet. OTOH, there are other factors in the chain that seem to be making a bigger difference. It's also important to keep in mind the quality of sync. RetroArch has zero tearing and zero audio crackling, while ZSNES and ZMZ suffer from both to varying degrees, so you have to decide whether you're willing to put up with that in exchange for a small (potential) improvement in latency. I didn't test RetroArch with vsync disabled (i.e., blocking on audio with tearing), which could potentially close the gap /shrug.

Anonymous said...

Thank you for doing that.

100 ms seems like quite a lot of latency; no wonder some people say they can "feel" it. But your tests also show that the lag introduced by the emulators themselves is negligible compared to the lag coming from the OS or hardware.

I would definitely like to see this tested with emulators (RetroArch + standalones) running on a console (preferably a Wii with 240p output).

Pokefan531 said...

Have you tested other emulators on different systems? I was testing Genesis GX with the 240p test suite, and on the manual lag test I usually get a score of around half a frame of lag, though some of that could be human error. I never went over 16 ms of input delay with a keyboard or Xbox 360 wireless controller, and using any type of shader barely affects it. I didn't use hard sync or disable Aero for Genesis GX. With the SNES emulators, I do have problems with manual frame lag of around 3-4 frames. I even used bsnes and snes9x for the 240p suite test. Hard sync and disabling Aero get me down to around 3 frames of lag without any shaders, so it didn't change much for me. So far, I only know Genesis GX is capable of better response times.

Hunter K. said...

No, I didn't try Genplusgx vs the SNES cores. That's a good idea, though. I'll throw that in the mix on my next battery of tests.

Vicosku said...

Hi, Hunter K. First of all, thanks for exploring this topic in such depth. Regarding next-frame response with GroovyMAME, there's a lot of data in the input lag thread, with reports from multiple people achieving these results. The more recent Raw Input vs DirectInput thread has a lot of test results by me, but the most recent interesting information resides in the GM ASIO patch thread, where incredibly low audio latencies have been achieved.

For input lag, you may be most interested in a 240FPS video in my post here:,145174.msg1512836.html#msg1512836

There is a spreadsheet with some response data as well. Basically, any result less than 8 frames at 240FPS is next-frame response at 60FPS.

The best I've been able to do with RA in KMS mode in Linux was 4th frame response, around 67ms or more.,133194.msg1446726.html#msg1446726

I think that a better result must be possible with RetroArch now though. I didn't know RA had a frame delay feature! I'm a big fan of both projects, and found this post while playing around with the new release. I'm enjoying it a lot.

Anonymous said...

> Do you guys have any articles or writeups on that stuff or is it floating around in forum threads? Either way, I'd be interested in reproducing your results.

Hi Hunter K.,

I'm afraid it's floating around spread in different threads.

Here you can find a recent compilation of links:,133194.msg1522250.html#msg1522250

(This whole thread contains interesting information on the topic.)



Anonymous said...

> Hunter K. said...
> Hey man, thanks for the info. Do you guys have any articles or writeups on that stuff or is it floating around in forum threads? Either way, I'd be interested in reproducing your results.


the first thread:,133194.0.html
second thread:,142143.0.html

The threads are long, but they're worth reading ^^ . I liked your tests; I hope you will make some more, as I am very interested in this stuff, especially how shaders affect input lag.

Thx, u-man

Hunter K. said...

@Calamity, Vicosku and u-man
Ah, awesome! Thanks for the links, guys :D I'll dig in and see about reproducing those stellar results.

Alex, the guy who helped me with the testing, is working on building a self-contained latency tester based on an arduino, which should allow me to do a *lot* more testing in a lot more emulators and settings, so expect more data soon :)

Anonymous said...

I found this very interesting:

"You'll notice there's no benefit to fullscreen in Higan (it's worse, in fact) because it's not *exclusive* fullscreen. Instead, it's what's known as "windowed" or "borderless" fullscreen."

I am really curious: how many emulators have an *exclusive* fullscreen? Does i.e. MAME have one? I am pretty sure that Demul is also non-exclusive or "borderless" fullscreen. Didn't know that it creates input lag.

greets u-man

Hunter K. said...

Yeah, MAME has exclusive fullscreen. Most emus do. The exceptions are higan, the no$ emus and ones like Exodus, which don't have fullscreen at all. You can tell whether it's borderless or exclusive because exclusive has to fully redraw the screen, which causes stuttering/flickering as it changes modes back and forth. Borderless fullscreen is essentially instant because it just redraws the window over the existing desktop.

Patrick said...

Hey Hunter, awesome post, looking forward to a follow-up. I'm curious: which version of CRT-Hyllian did you use for the test? The one that is located in the hyllian-glow subfolder, or the one located in the main CRT directory?

Hunter K. said...

Thanks :) It's the one from the main CRT directory.

Unknown said...

These results are completely dubious.

If you've got a G-Sync monitor, there's basically no lag/next frame response.

It's your setup. You can't just take some shitty work computer with tons of garbage running on it and use it for benchmarks like this.

I can't wait until he tells us that Windows was running in a VM.


Hunter K. said...

@HowAbout NoSon
It's a typical setup, not an optimum one, which I made clear, and the post is about identifying trends with actual measurements, not fabricating ideal setups based on voodoo, which I also made clear. So, how about you try collecting some actual data to back up your otherwise unhelpful comment and work on your reading comprehension while you're at it.

kleev said...

Excellent experiment!!
You know, I've grown to dislike emulating on PC a lot.
The more I play, even if it's "fine" and there's little to complain about, the more I crave a perfect environment equal to the original system.

And you might say "Well just play the original systems on a CRT then!"
Here's the thing: emulators only have input lag on a PC or similar systems.
The NES emulator for PS1 (imbNES) has 0 lag.
I can play, for example, Super Mario Bros. or the Mega Man games and leave 100% satisfied.
Sadly, I don't fully understand why this is. Some say it's because the PS1 is much more like an NES than a computer is, so there are advantages.

I just wish someday someone would create a system MEANT for emulation so that we can have near 0 lag if not the absolute same as the original systems, that can take USB controllers and can connect to CRT TVs with 240p compatibility. I would fund a project like this with all my wallet's girth! If I bought a RetroN5, why wouldn't I pay a lot more for such a system?

Hunter K. said...

Yeah, the issue is how far removed from the bare metal a modern operating system is. DOS could get much lower latency, but its drivers needed exclusive access to the hardware, so things like USB just can't work.

However, you should check out kevtris's (of HiDefNES fame) upcoming FPGA multi-system emulator, Zimba 3000:
By implementing the emulators in hardware rather than software, it will have little-to-no latency and very high accuracy.

Anonymous said...

Someone else is doing emulator input lag tests as well:
