FEX 2412 Tagged

Welcome back to another FEX release! With last month’s release cancelled, there are a few more changes than average.

Fix Steam’s webhelper process again

On November 5th, Valve’s Steam client updated its embedded version of Chromium. This uncovered new bugs in FEX-Emu and caused Steam to constantly restart its UI. Luckily Asahi Lina diagnosed the issues and resolved the problems that were occurring! Now Steam is more stable than it has ever been!

Fix WINE path execution problems

Asahi Lina encountered a bug when running WINE under Fedora that hadn’t been seen before. It turns out FEX-Emu was incorrectly concatenating some file paths when child processes were executed inside WINE. This usually worked, but the combination of a newer version of WINE and how Fedora sets up its paths caused it to break under FEX. With this issue resolved, WINE is working again. There are likely some more bugs due to the nature of how FEX sets up its path handling, but those will likely get solved in time.

Emulate PAUSE instruction using WFE instead of YIELD

This is mostly just a passing curiosity since it doesn’t really matter. FEX has long emulated the x86 instruction PAUSE by using a theoretically functionally equivalent ARM64 instruction called YIELD. It came to our attention over the last couple of months that Apple Silicon and ARM Cortex implement the YIELD instruction as a nop, so it effectively does nothing. This kind of makes sense since it is typically useful for SMT systems, and no relevant ARM cores support SMT.

On x86 CPUs this matters more since they do support SMT, but depending on the CPU, PAUSE will also have the core sleep for an implementation-dependent number of nanoseconds. To be more similar to how PAUSE operates, we have switched over to using the WFE instruction, which provides similar guarantees on most ARM64 hardware. The trade-off here is that Qualcomm’s new Oryon CPU treats WFE as a nop instead. A reasonable trade-off and just an interesting quirk of their hardware.
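To make the difference a bit more concrete, here is a minimal native sketch of a spin-wait loop that issues the same kind of hint. It is not FEX's JIT output; `CpuRelax` and `SpinUntilSet` are hypothetical names made up for this illustration, and the sketch assumes a GCC- or Clang-compatible compiler for the inline assembly.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: a single "relax" hint for the body of a spin-wait loop.
inline void CpuRelax() {
#if defined(__aarch64__)
  // WFE lets the core idle until an event or interrupt arrives.
  // YIELD would also be valid here, but some cores treat it as a nop.
  asm volatile("wfe" ::: "memory");
#elif defined(__x86_64__) || defined(__i386__)
  // The x86 instruction that is being emulated.
  asm volatile("pause" ::: "memory");
#else
  // Unknown architecture: only keep the compiler from reordering around the loop.
  std::atomic_signal_fence(std::memory_order_seq_cst);
#endif
}

// Spins until `flag` becomes true or `max_spins` hints have been issued.
// Returns the number of hints that were issued.
inline unsigned SpinUntilSet(const std::atomic<bool>& flag, unsigned max_spins) {
  unsigned spins = 0;
  while (!flag.load(std::memory_order_acquire) && spins < max_spins) {
    CpuRelax();
    ++spins;
  }
  return spins;
}
```

On hardware where the hint is a nop the loop still behaves correctly; it just burns cycles a little faster while it waits.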

Adds support for compat input prctl

A major problem that constantly comes back to bite FEX is emulating 32-bit syscalls with complete accuracy. Because the Linux kernel hides information about syscalls that return opaque data structures, FEX jumps through a lot of hoops to make sure everything works while running as a 64-bit ARM process.

The ioctl syscall is one of these annoying syscalls that completely changes how it behaves depending on the device being communicated with, requiring FEX to track file descriptors and wrap all ioctl syscalls with checks to see which ioctl it is and which device it targets. This is fraught with problems since we can’t always know with 100% certainty that we are communicating with the device we think we are.
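As a rough illustration of what tracking file descriptors can involve, the sketch below keeps a table from descriptor number to a guessed device kind. `FdTracker`, `DeviceKind`, and the hook names are hypothetical and are not FEX's actual data structures.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical device categories an emulator might care about.
enum class DeviceKind { Unknown, Gpu, Input };

// Hypothetical tracker that records what each file descriptor refers to.
class FdTracker {
 public:
  // Called after a successful open() once the device kind has been identified.
  void OnOpen(int fd, DeviceKind kind) { kinds_[fd] = kind; }

  // Called after close(): the number may be reused for something else later.
  void OnClose(int fd) { kinds_.erase(fd); }

  // Called after dup()/dup2(): the new descriptor shares the old one's kind.
  void OnDup(int old_fd, int new_fd) {
    auto it = kinds_.find(old_fd);
    if (it != kinds_.end()) {
      kinds_[new_fd] = it->second;
    } else {
      kinds_.erase(new_fd);
    }
  }

  // Descriptors that were never observed are reported as Unknown.
  DeviceKind Lookup(int fd) const {
    auto it = kinds_.find(fd);
    return it == kinds_.end() ? DeviceKind::Unknown : it->second;
  }

 private:
  std::unordered_map<int, DeviceKind> kinds_;
};
```

The `Unknown` case is where the uncertainty comes from: a descriptor received from another process over a Unix socket, for example, never passes through the open hook, so a table like this has nothing recorded for it.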

This leads us to the new bug that we encountered. When a 32-bit process is communicating with a gamepad, it uses a combination of the joydev and udev subsystems inside of the Linux kernel. 32-bit games were having issues where gamepad controls were “sticky”. While FEX is catching the ioctl communications and translating the data as necessary, there was a more nefarious problem happening. It turns out the Linux kernel in some instances will check whether a syscall is coming from a “compat” process, or is inside of a “compat” syscall. A compatibility process is just a process that the Linux kernel knows is 32-bit while the kernel itself is 64-bit. In x86 land this is easy since you just check if it is x86 or x86-64. When running under FEX, the process is /never/ a compatibility process, because it is always 64-bit.

It turns out that the read and write syscalls, depending on what type of device they are communicating with, will return different data depending on whether the caller is compat or not! This isn’t specific to gamepads; there are various subsystems in Linux that check if they are in a compat process and modify the data. This is a huge problem since we would introduce a ton of overhead by tracking every read, write, seek, etc. syscall and checking whether it involves something that needs modified data. slp decided to tackle this for the Asahi Linux distro by adding a new prctl that, when enabled, changes the data to the “compat” versions even in a 64-bit process. This is a bandage over a significant problem, as it only covers gamepad devices today. We will continue tracking these issues in more subsystems as we find them over on our 32Bit Syscall Woes wiki page.
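To see why the same read can legitimately return different bytes, consider a record that starts with a timestamp. The sketch below is modelled on the general shape of Linux's `struct input_event` (a time value followed by type, code, and value fields); the post does not name the exact structures involved, so treat the choice of struct as an assumption. `InputEvent64`, `InputEvent32`, and `ToCompat` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Record layout a 64-bit process sees: two 64-bit time fields.
struct InputEvent64 {
  int64_t tv_sec;
  int64_t tv_usec;
  uint16_t type;
  uint16_t code;
  int32_t value;
};

// Record layout a 32-bit x86 process expects: two 32-bit time fields.
struct InputEvent32 {
  int32_t tv_sec;
  int32_t tv_usec;
  uint16_t type;
  uint16_t code;
  int32_t value;
};

static_assert(sizeof(InputEvent64) == 24, "64-bit record is 24 bytes");
static_assert(sizeof(InputEvent32) == 16, "32-bit record is 16 bytes");

// Hypothetical conversion an emulator would have to apply to every record
// if the kernel handed back the 64-bit layout to a 32-bit guest.
inline InputEvent32 ToCompat(const InputEvent64& in) {
  InputEvent32 out{};
  out.tv_sec = static_cast<int32_t>(in.tv_sec);    // truncates large values
  out.tv_usec = static_cast<int32_t>(in.tv_usec);
  out.type = in.type;
  out.code = in.code;
  out.value = in.value;
  return out;
}
```

A 32-bit guest that walks a buffer of 24-byte records in 16-byte steps reads fields from the wrong offsets. Having the kernel produce the compat layout directly avoids converting records in the emulator and avoids having to guess which descriptors need the conversion.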

Support CPU index through TPIDRRO

This is a fun little feature to make FEX’s life easier. When FEX is emulating the CPUID and RDTSCP instructions, we need to know the physical CPU index because the instructions will give that information to the program. The RDTSCP instruction gives the index directly, while the CPUID instruction uses it for per-cpu data lookup and also some indexing data.

The fun thing is that Windows is the only operating system that supports this today. So FEX only gets to enjoy this in a little toy sandbox when it is running under Windows. The Linux implementation, on the other hand, does a syscall to getcpu for each one of these instructions getting emulated, which adds a bunch of overhead. This is because Linux doesn’t support using TPIDRRO in this fashion; instead, the register is set to zero.

This has an additional knock-on effect that if WINE wants to run Arm64 native Windows applications, this is going to break any algorithm that relies on that Windows feature. Ideally the Linux kernel implements this feature for both WINE compatibility, and FEX performance improvements some day!
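The two ways of getting at the CPU index can be sketched roughly as follows, assuming Linux and a GCC- or Clang-compatible compiler. `CpuIndexViaGetcpu` and `ReadTpidrroRaw` are hypothetical helper names. How Windows encodes the index inside the register is not covered here, so the second helper only returns the raw register value.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/syscall.h>
#include <unistd.h>

// Slow path: one kernel round trip per query. Returns -1 on failure.
inline int CpuIndexViaGetcpu() {
  unsigned cpu = 0;
  unsigned node = 0;
  long rc = syscall(SYS_getcpu, &cpu, &node, nullptr);
  return rc == 0 ? static_cast<int>(cpu) : -1;
}

#if defined(__aarch64__)
// Fast path candidate: TPIDRRO_EL0 can be read directly from user space
// with a single instruction and no syscall.
inline uint64_t ReadTpidrroRaw() {
  uint64_t value;
  asm volatile("mrs %0, tpidrro_el0" : "=r"(value));
  return value;
}
#endif
```

The register read is a single instruction, while the syscall has to enter and leave the kernel every time a CPUID or RDTSCP is emulated.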

FEXCore cleanups and restructuring

While not user facing at all, there has been some restructuring of how FEXCore, the CPU emulation backend of FEX-Emu, exposes its CPU emulation. Historically FEX-Emu has had a problem where we leak OS information across the boundary, making the interface to the CPU backend more convoluted than it needed to be. Over the last couple of months we have been removing large amounts of the Linux-specific code from FEXCore and moving it into the frontend.

While this doesn’t affect the user, it makes our programmers’ lives easier since they don’t need to dance around OS abstractions quite as much. It also makes things easier for the upcoming debugger improvements that FEX-Emu sorely needs.

Fixed some JIT bugs

We uncovered some implementation bugs in our AVX emulation this last month. There were four instructions that had some incorrect behaviour. While these are now fixed, we don’t know of any actual bugs in games that happened because of these edge case failures. It’s a good testament to having unit-testing infrastructure that can catch behaviours like this, and a reminder that even with thousands of hand-written assembly tests, some edge cases still aren’t tested.

There were also a couple of other misc optimizations and bug fixes but we’ve gone on for quite a while now.

See the 2412 Release Notes or the detailed change log on GitHub.

Written on December 3, 2024