FEX 2405 Tagged!
One month older and we have a new release for FEX. This month was a little slower for user-facing features, but behind the scenes we did a large amount of refactoring to facilitate future improvements.
Support OpenGL and Vulkan thunking without forwarding X11
A thorn in FEX’s side has been forwarding X11 calls when thunking OpenGL and Vulkan. This has caused us pain because X11’s API is a fairly leaky abstraction that exposes data symbols, which FEX’s thunking can’t accurately thunk. We now instead redirect the X11 Display object directly in OpenGL and Vulkan. This not only reduces the amount of code we need to thunk, but is also required for us to eventually thunk 32-bit libraries.
This may change behaviour in some games when thunks are enabled, so testing and reports of any regressions will be useful.
Enable Enhanced REP MOVS when memcpy TSO is disabled
Following last month’s memcpy instruction optimizations, we now enable the Enhanced REP MOVS CPUID bit when memcpy TSO emulation is disabled. This means that glibc will take advantage of the optimization when it is doing memcpy and memset operations. The result is a minor performance improvement, depending on the application.
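For readers curious what "this CPUID bit" looks like from the application side: on x86, Enhanced REP MOVSB/STOSB is advertised in CPUID leaf 7, subleaf 0, EBX bit 9. The sketch below only decodes an EBX value that has already been fetched. The helper name is hypothetical, and this is not FEX's or glibc's actual code.

```cpp
#include <cstdint>

// CPUID.(EAX=7, ECX=0):EBX bit 9 advertises Enhanced REP MOVSB/STOSB (ERMS).
constexpr std::uint32_t kErmsBit = 1u << 9;

// Hypothetical helper: takes the EBX value already returned by CPUID(7, 0)
// and reports whether the ERMS feature bit is set.
constexpr bool HasEnhancedRepMovs(std::uint32_t cpuid7_ebx) {
  return (cpuid7_ebx & kErmsBit) != 0;
}
```

A library that sees this bit set may choose REP MOVSB/STOSB based paths for copies and fills, which is why exposing it changes guest behaviour.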
Implement support for SMSW instruction
This instruction isn’t too notable since all it does on recent x86 CPUs is return the same data no matter what, but on legacy CPUs it was useful for checking whether x87 was supported. As this is considered a system-level instruction, FEX didn’t implement it originally, but we finally found a game that uses it. The original Far Cry game from 2004 uses this instruction for some reason. Now that we have implemented the instruction, the game at least gets to the menus but still seems to stall out when going in-game. Kind of neat!
Fix disabling TSO emulation on some stack accesses
When emulating the x86 memory model, we can get away with not emulating it when a thread accesses its stack. This works since we know that stack accesses are usually not shared between threads. While we usually disabled the TSO emulation in these cases, we had accidentally missed some of them. With those fixed, there are some performance improvements for “free.”
Fix ADC and SBC instructions
Back in FEX-2403 we landed some optimizations for these instructions. It turns out that we had inadvertently introduced broken behaviour in some edge cases. Most games didn’t hit these edge cases, so they were fine, but it completely broke rendering in the Steam release of Final Fantasy 7. With that fixed, we now get correct rendering again!
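As a reminder of why add-with-carry is easy to get subtly wrong, here is a minimal reference sketch of a 32-bit ADC that computes the result and carry-out by widening to 64 bits. It is illustrative only: the type and function names are hypothetical, and the edge case it handles (an operand of 0xFFFFFFFF with carry-in set) is not necessarily the one FEX hit.

```cpp
#include <cstdint>

// Result of a 32-bit add-with-carry: the truncated sum and the carry flag.
struct AdcResult {
  std::uint32_t value;
  bool carry_out;
};

// Hypothetical reference model: result = a + b + carry_in, with carry_out
// taken from bit 32 of the widened sum. Widening avoids the trap where
// (b + carry_in) wraps to zero before the carry is computed.
constexpr AdcResult Adc32(std::uint32_t a, std::uint32_t b, bool carry_in) {
  const std::uint64_t wide =
      std::uint64_t{a} + std::uint64_t{b} + (carry_in ? 1u : 0u);
  return {static_cast<std::uint32_t>(wide), (wide >> 32) != 0};
}
```

A model like this is useful as an oracle when testing an optimized implementation against all combinations of boundary operands and carry-in.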
Option to disable half-barrier TSO emulation on unaligned atomics
On x86 most atomic instructions don’t care about alignment, with a feature that Intel calls “Split-locks” to even support unaligned atomics that cross a cacheline. On ARM we don’t have this luxury: all atomic instructions need to be naturally aligned to their size. Newer ARM CPUs added a feature that allows unaligned atomics inside of a 16-byte region, but if the data crosses the edge of that region then FEX needs to handle the instruction in a different way. We backpatch the instruction stream with a “half-barrier” access, which still emulates the x86 memory model but is exceedingly heavy.
Now we have an option to convert these “half-barrier” implementations into non-atomic accesses. While this doesn’t match true x86 behaviour, it can accelerate some games that heavily abuse unaligned atomic memory accesses. So if a game is running slow, this is an option to try out!
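To make the "crosses the edge" condition concrete, the check can be sketched as a comparison of which block the first and last byte of the access fall into. This is a minimal illustration with a hypothetical function name, not FEX's actual implementation; the granule would be 16 for the ARM region described above, or 64 for a typical cacheline.

```cpp
#include <cstdint>

// Hypothetical helper: true when the access [addr, addr + size) spans more
// than one `granule`-byte block. Assumes size and granule are non-zero.
constexpr bool CrossesGranule(std::uint64_t addr, std::uint64_t size,
                              std::uint64_t granule) {
  return (addr / granule) != ((addr + size - 1) / granule);
}
```

For example, an 8-byte access at 0x1008 stays inside one 16-byte region, while the same access at 0x100C straddles the boundary at 0x1010 and would need the slow path.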
Refactor code using clang-format
As alluded to at the start, the FEX codebase has now been completely reformatted using clang-format to ensure a more consistent coding experience. We provide a helper script and CI enforcement to ensure this stays in place. It took quite a bit of work to get the formatting configuration up to everyone’s standards. Major thanks to everyone that worked on this!
Eliminate cross-block liveness for instructions
This is preparation work for future JIT improvements and speedups. Cross-block liveness makes our register allocator more complex and by removing this from our JIT we can start working on improving the register allocator. Look forward to register allocator improvements in the coming months.
Support arm64ec function calls
This changes how some of FEX’s JIT infrastructure works in order to support the ARM64ec Windows ABI. This will get used in conjunction with Wine’s ARM64ec support. While still a work in progress, this is starting to become interesting as Wine continues to implement more features to handle this case.
Video game showcase
See the 2405 Release Notes or the detailed change log on GitHub.