FEX 2312 Tagged!
We’re back with another month of changes. After last month being a bit slower, we’re back in the swing of implementing more optimizations and bug fixes. No dilly dallying, let’s get right in to it!
More optimizations this month!
Once again this month has a whole bunch of optimizations that is very exciting! We will lightly go over the changes to talk about what changed.
Keep guest SF/ZF/CF/OF flags resident in host NZCV
This is one of the bigger optimizations this month. A bit of backstory is needed for what this optimization is for. x86 has a flags register called EFLAGS which contains quite a few random bits of information. The subflags we care about here are the SF, ZF, CF, and OF flags inside of it. These are various flags that are set typically from ALU operations for information depending on the result. So something like an integer Add will set ZF if the result was zero, SF if the result has the sign bit set, CF if a carry occured, and OF if the operation overflowed. These are usually quite cheap for the CPU to calculate by itself, but manually calculating the flags usually takes a few additional instructions each.
The original implementation inside of FEX for calculating these flags would spend the additional instructions and calculate each one manually. This would usually end up with a dozen or so of additional instructions for calculating flags. While FEX would typically optimize out the calculations if they weren’t used, it would still add CPU time when we couldn’t.
Luckily ARM also has a flags register called NZCV which maps almost perfectly to x86’s EFLAGS. This lets us optimize these instruction implementations to instead use the ARM flags directly. This has a couple of effects, not only does it remove the instructions from our code generation, it has knock-on effects that the flags are now stored inside of the NZCV which reduces memory accesses. A multi-hit combo for improving performance.
While not all x86 instructions map their flags registers 1:1, this has a fairly significant performance uplift in most situations!
Dedicate registers for PF/AF
Related to the previous change, x86 has two flags registers stored inside of EFLAGS that doesn’t have a direct equivalent on ARM CPUs. These two flags are fairly uncommon but instructions will still generate them. These flags have the additional problem that they are fairly costly to calculate, with one of them requiring a GPR population count instruction which ARM doesn’t even support until new instruction extensions called CSSC. While in most cases the result of these flags isn’t used, the overhead of calculating them can add up a bit. This is why we are now dedicating two registers to these flags to reduce their overhead as much as possible!
- Optimize BT/BTC/BTS/BTR
- Optimize shifts/rotates
- Optimize selects & branches & more nzcv goodies
- Optimize three sha instructions
- Make “not” not garbage
- Optimize memcpy and memset when direction is compile time constant
With all these optimizations in place this month we have a fairly significant performance uplift!
While Geekbench is showing a fairly modest 17.6% performance uplift, bytemark is showing up to a 60% performance uplift! Over the course of the last three months we have had benchmarks that have improved by over 100%! These improvements can be seen in games as well, with some CPU heavy games have had their FPS improve by over 2x. In a lot of games tested they have changed from being CPU limited to GPU limited on our Lenovo X13s laptops even! We are looking forward to when these companies release new laptops based on Snapdragon X Elite in the middle of next year!
Various bug fixes
In addition to performance improvements, we have some bug fixes this month.
- Fixed corruption in the JIT
- Caused corruption with x87 heavy games
- Fixes integer multiply corrupting results
- Corrupted some register state, which was breaking the game Dungeon Defenders
Support extracting erofs images
One of the features that FEXRootFSFetcher was missing was the ability to extract erofs images once downloaded. This was because we didn’t know that erofs-utils provided an application for extracting these images without FUSE. Turns out the developers put an extractor inside of their fsck application that we had completely missed! Now if a user wants to extract an x86 rootfs image for lower overhead, they can do this directly from our FEXRootFSFetcher tool.
Preparation for improving gdbserver
GDBServer is a socket interface that GDB supports for remotely debugging applications. One of the harder things about working on FEX-Emu is that the ability to debug an application is usually quite hard. GDBServer is a way to improve this situation so that GDB can remotely connect to a FEX process.
There’s a bunch of work this month towards cleaning up this interface and getting it to work correctly. While it is still not quite usable for debugging, we are working towards this so applications can actually be debugged!
Improvements to WOW64 compatibility for newer WINE
Newer versions of WINE has changed some behaviour around WOW64 support. So this month we have added support for some of this newer behaviour. Thanks again to Bylaws for implementing this!
FEX rootfs image updates
This month we are updating our rootfs images to incorporate the latest Mesa 23.3.0 release that occured a few days ago. We have updated our Ubuntu 22.04, 23.04, 23.10, ArchLinux, and Fedora 38 images with this latest version of mesa. As usual if there are any issues, let us know so we can sort them out.