FEX 2510 Tagged

We’re just gonna kick out this little release and be on our way. There might be some interesting things this month, so read on to find out!

JIT Improvements

This month we have various JIT improvements and bug fixes. Most of them are bug fixes, but there are some performance optimizations as well.

  • Fix a potential crash with x87 loadstore operations
  • Fix a potential crash with AVX on 32-bit applications
  • Fix legacy fxsave/fxrstor x87 register saving and restoring
  • Fix telemetry on legacy segment register usage
  • Avoid flushing MMX registers until MMX->x87 transition
  • Fix NaN propagation behaviour in x87
  • Implement support for SSE4a
  • Fix SIGILL reporting
    • Fixes Mafia II Classic
  • ARM64EC: Check for suspend interrupts on backedges
    • Fixes UPlay behaviour

Better cache x87 intermediate results in “slow path”

To understand what this is doing, one must first understand the optimizations our JIT performs for most x87 operations. As everyone knows, x87 is a wacky stack-based architecture where you push data onto a stack before operating on it. When you’re done, you pop the results off into memory.

Contrary to popular belief, this isn’t how most architectures today do floating point operations. Both ARM’s ASIMD/NEON and x86’s SSE instead work directly on this popular concept called “registers.” So when translating x87 behaviour to ARM, the two models don’t really match!
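
To make the mismatch concrete, here is a toy C++ model of the x87 register stack. The names (X87Stack, add_x87, add_registers) are hypothetical and this is not FEX code. It captures the architectural rule that ST(i) names physical register (TOP + i) mod 8, with FLD decrementing TOP before writing and FSTP incrementing it after reading; it deliberately ignores tag words, exceptions, and 80-bit precision.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy x87 register stack: eight physical registers plus the 3-bit TOP field.
struct X87Stack {
  std::array<double, 8> regs{};
  uint8_t top = 0;    // TOP field of the x87 status word
  uint8_t depth = 0;  // toy bookkeeping only; real hardware uses tag bits

  // ST(i) refers to physical register (TOP + i) mod 8.
  double& st(unsigned i) { return regs[(top + i) & 7]; }

  // FLD: decrement TOP (mod 8), then write the new ST(0).
  void push(double value) {
    assert(depth < 8 && "toy model: stack overflow");
    top = static_cast<uint8_t>((top - 1) & 7);
    regs[top] = value;
    ++depth;
  }

  // FSTP: read ST(0), then increment TOP (mod 8).
  double pop() {
    assert(depth > 0 && "toy model: stack underflow");
    double value = regs[top];
    top = static_cast<uint8_t>((top + 1) & 7);
    --depth;
    return value;
  }

  // FADDP ST(1), ST(0): ST(1) = ST(1) + ST(0), then pop.
  void faddp() {
    st(1) += st(0);
    pop();
  }
};

// c = a + b the x87 way: fld a; fld b; faddp; fstp c.
double add_x87(double a, double b) {
  X87Stack fpu;
  fpu.push(a);
  fpu.push(b);
  fpu.faddp();
  return fpu.pop();
}

// With registers (SSE addsd, NEON fadd) the same add is one instruction
// on two registers, with no stack pointer to maintain.
double add_registers(double a, double b) { return a + b; }
```

Every push and pop moves TOP, so a naive translation has to keep that pointer and the (TOP + i) & 7 index math up to date at runtime; that bookkeeping is what the optimization pass below tries to delete.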

To combat this, FEX has a very extensive “x87StackOptimizationPass” which rewrites our IR to remove as much of the stack behaviour as possible. This brings significant performance improvements in x87-heavy games, because the x87 stack management typically gets deleted entirely! The downside is that if FEX doesn’t see all of the x87 stack usage, we must fall back to a safe “slow path” which does correct stack management, so some code doesn’t get the benefits of the optimization pass.

This month our developer bylaws decided to tackle this problem and improve the situation. Previously, whenever we hit this slow path, we weren’t caching any of the intermediate stack calculations, which caused more overhead than necessary. With this improvement, we have seen the number of ARM instructions needed to run the slow path drop dramatically!
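
The post doesn’t spell out exactly which intermediate values the new slow path caches, so the following is only a hypothetical toy, not FEX’s IR, that uses the x87 TOP pointer as the example value. It counts how often emitted code would touch the in-memory TOP field when every instruction reloads it, versus loading it once per block, tracking push/pop adjustments at translation time, and storing it back once.

```cpp
#include <vector>

// Hypothetical slow-path model: each x87 instruction in a block either
// pushes, pops, or only reads some stack slot ST(i).
enum class Op { Push, Pop, Read };

// How many times the emitted code loads and stores the in-memory TOP field.
struct Cost {
  int top_loads = 0;
  int top_stores = 0;
};

// Uncached: every instruction reloads TOP to compute its physical register
// index, and every push/pop stores the updated TOP back to memory.
Cost emit_uncached(const std::vector<Op>& ops) {
  Cost cost;
  for (Op op : ops) {
    ++cost.top_loads;
    if (op != Op::Read) ++cost.top_stores;
  }
  return cost;
}

// Cached: load TOP once, track the net push/pop offset at translation time
// (each ST(i) index becomes (TOP + offset + i) & 7 on the cached value),
// and store TOP once at the end only if it actually changed.
Cost emit_cached(const std::vector<Op>& ops) {
  Cost cost;
  if (ops.empty()) return cost;
  cost.top_loads = 1;
  int offset = 0;
  for (Op op : ops) {
    if (op == Op::Push) --offset;
    if (op == Op::Pop) ++offset;
  }
  if (((offset % 8) + 8) % 8 != 0) cost.top_stores = 1;
  return cost;
}
```

The point of the toy is only that recomputation grows with the number of instructions while the cached version stays constant per block; the real savings in FEX come from its own IR and are the ones measured below.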

Some examples:

  • 25 x86 instructions: Used to take 169 ARM instructions, now only 72.
    • This was in Half-Life
  • 214 x86 instructions: 3165 -> 1743!
    • This was in Oblivion
  • 351 x86 instructions: 5270 -> 2809!
    • This was in Psychonauts

As one can see, there are some significant instruction reductions, which directly correlate to performance improvements! This will improve performance in countless games that use x87!
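
The three samples above can be turned into relative savings with a little arithmetic: the reduction is (before - after) / before, and dividing by the x86 instruction count gives how many ARM instructions each guest instruction costs. The helper names below are ours, and the numbers are exactly the ones listed above.

```cpp
#include <cassert>
#include <cstdio>

struct Sample {
  const char* game;
  int x86;         // guest x86 instructions in the block
  int arm_before;  // ARM instructions emitted before the change
  int arm_after;   // ARM instructions emitted after the change
};

// Fraction of ARM instructions removed: (before - after) / before.
double reduction(const Sample& s) {
  assert(s.arm_before > 0);
  return static_cast<double>(s.arm_before - s.arm_after) / s.arm_before;
}

// Host ARM instructions emitted per guest x86 instruction.
double expansion(int arm, int x86) {
  assert(x86 > 0);
  return static_cast<double>(arm) / x86;
}

void print_table() {
  const Sample samples[] = {
      {"Half-Life", 25, 169, 72},
      {"Oblivion", 214, 3165, 1743},
      {"Psychonauts", 351, 5270, 2809},
  };
  for (const Sample& s : samples) {
    std::printf("%-12s %5.1f%% fewer; %.2f -> %.2f ARM per x86 instruction\n",
                s.game, 100.0 * reduction(s),
                expansion(s.arm_before, s.x86), expansion(s.arm_after, s.x86));
  }
}
```

That works out to 57.4%, 44.9% and 46.7% fewer ARM instructions respectively; in the two larger blocks, each x86 instruction went from roughly 15 ARM instructions to roughly 8.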

Memory usage hunting

This month we started hunting down FEX’s memory usage. It has been known for a while that FEX’s memory usage could be better: it trips up 16GB devices, and things are even worse on 12GB or 8GB devices. This has been a known thorn in our side, but there were other higher-priority problems to solve before we could work on it.

This month we added the ability to name our memory allocations, so we can start tracking usage by what is actually allocating it. We also went through some data structures and made them smaller. While we only made minor improvements this month, expect more significant memory savings soon, as we have a bunch of changes in flight as we roll out this release!

(P.S. We also fixed a bug where we allocated 16MB of RAM from 32-bit guests.)
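
The post doesn’t say how FEX names its allocations, so this is not its implementation. One established Linux mechanism for labelling anonymous memory is prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, addr, len, name), available since kernel 5.17 when built with CONFIG_ANON_VMA_NAME; named regions then show up as [anon:name] in /proc/self/maps. The helpers alloc_named and maps_mentions are hypothetical.

```cpp
#include <sys/mman.h>
#include <sys/prctl.h>

#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Values from <linux/prctl.h> (Linux 5.17+), in case libc headers are older.
#ifndef PR_SET_VMA
#define PR_SET_VMA 0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME 0
#endif

// Maps anonymous memory and tries to label it. Returns nullptr if mmap fails.
// *named reports whether the kernel accepted the label; kernels without
// CONFIG_ANON_VMA_NAME reject it with EINVAL, which is not fatal here.
void* alloc_named(std::size_t size, const char* name, bool* named) {
  assert(name != nullptr && named != nullptr);
  void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return nullptr;
  // glibc's prctl reads its variadic arguments as unsigned long.
  int rc = prctl(PR_SET_VMA,
                 static_cast<unsigned long>(PR_SET_VMA_ANON_NAME),
                 reinterpret_cast<unsigned long>(p),
                 static_cast<unsigned long>(size),
                 reinterpret_cast<unsigned long>(name));
  *named = (rc == 0);
  return p;
}

// Checks /proc/self/maps for a region labelled "[anon:<name>]".
bool maps_mentions(const std::string& name) {
  std::ifstream maps("/proc/self/maps");
  const std::string tag = "[anon:" + name + "]";
  std::string line;
  while (std::getline(maps, line)) {
    if (line.find(tag) != std::string::npos) return true;
  }
  return false;
}
```

Once regions carry names, a per-name total can be built by summing the address ranges of matching lines in /proc/self/maps, which is the kind of "what is actually allocating it" view described above.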

Implement support for “Extended Volatile Metadata” on Linux

Taking a page from what Microsoft has done in their compiler, we have implemented a feature called “Extended Volatile Metadata,” lovingly abbreviated EVMD. To understand how this feature works, we first need to understand how Microsoft’s “Volatile Metadata” itself works.

In Windows land, Microsoft knew that emulating the x86 memory model is one of the most costly things an emulator needs to do. To help alleviate this strain, they employed compiler assistance to determine when code doesn’t need to strictly follow the x86 memory model. Starting with Visual Studio 2019, Microsoft’s compiler inspects code as it is compiled and determines whether it requires thread-safe memory semantics. When it determines that a piece of code is safe to run with a weaker memory model, it adds that code to the volatile metadata. Once the entire program is compiled, this metadata is stored inside the executable.

When the executable runs on x86 hardware, this data is ignored and nothing changes. On the other hand, when running on a Windows Arm64 device, this data structure is fed into their emulator, which uses it to determine where it can safely turn off x86-TSO emulation! This gives a dramatic speedup for any application that has this metadata. In fact, it is so useful that FEX’s ARM64EC build uses the data as well, so we get similar performance improvements!
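
Microsoft’s real on-disk format is more involved than this, but the decision an emulator makes with such metadata can be sketched simply: when translating code at a given address, check whether the address falls inside a range marked safe, and otherwise keep full TSO ordering. This sketch assumes the metadata has been flattened into sorted, non-overlapping, half-open ranges; the types and names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Half-open [start, end) range of code assumed safe to run with relaxed
// (non-TSO) memory ordering.
struct Range {
  uint64_t start;
  uint64_t end;
};

enum class Ordering { TSO, Relaxed };

// Precondition: ranges are sorted by start and do not overlap.
// Anything not covered by a range keeps the safe default, TSO.
Ordering ordering_for(const std::vector<Range>& ranges, uint64_t pc) {
  // First range whose start is greater than pc; the only candidate that
  // can contain pc is the range just before it.
  auto it = std::upper_bound(ranges.begin(), ranges.end(), pc,
                             [](uint64_t value, const Range& r) {
                               return value < r.start;
                             });
  if (it == ranges.begin()) return Ordering::TSO;
  --it;
  return pc < it->end ? Ordering::Relaxed : Ordering::TSO;
}
```

Defaulting to TSO outside the table is the conservative choice: missing metadata can only cost performance, never correctness.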

This new FEX feature lets users manually declare regions of code that are known to be safe to run without TSO emulation. It works just like the regular metadata, but also covers Linux applications and Windows applications that were compiled without it or with compilers older than 2019!

Sadly, this isn’t automatic: it requires significant work from either FEX developers or users to find hot code blocks and determine that they are safe. For example, a game that spends more than 80% of its CPU time in a memcpy function can be safely optimized with a simple FEX option to enable EVMD:

  • FEX_EXTENDEDVOLATILEMETADATA=iw4sp.exe\;0xe9da0-0xe9ec7
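
When that setting is typed in a shell, the backslash keeps `;` from being treated as a command separator, so the value FEX receives is `iw4sp.exe;0xe9da0-0xe9ec7`. Here is a hypothetical parser for that single-entry form (module name, `;`, then a hex start-end pair). We only know the format from this one example, so support for multiple entries, whether the end address is inclusive, and whether the numbers are module-relative offsets are assumptions this sketch leaves open.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// One parsed entry. Whether `end` is inclusive, and what the addresses are
// relative to, is not specified here (assumption left open).
struct EvmdEntry {
  std::string module;
  uint64_t begin;
  uint64_t end;
};

// Parses a "0x"-prefixed hex number that must consume the whole string.
static std::optional<uint64_t> parse_hex(std::string_view s) {
  if (s.size() < 3 || s[0] != '0' || (s[1] != 'x' && s[1] != 'X')) {
    return std::nullopt;
  }
  s.remove_prefix(2);
  uint64_t value = 0;
  auto [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), value, 16);
  if (ec != std::errc{} || ptr != s.data() + s.size()) return std::nullopt;
  return value;
}

// Parses exactly one "module;0xSTART-0xEND" entry.
std::optional<EvmdEntry> parse_evmd(std::string_view text) {
  const auto semi = text.find(';');
  if (semi == std::string_view::npos || semi == 0) return std::nullopt;
  const std::string_view range = text.substr(semi + 1);
  const auto dash = range.find('-');
  if (dash == std::string_view::npos) return std::nullopt;
  const auto begin = parse_hex(range.substr(0, dash));
  const auto end = parse_hex(range.substr(dash + 1));
  if (!begin || !end || *end < *begin) return std::nullopt;
  return EvmdEntry{std::string(text.substr(0, semi)), *begin, *end};
}
```

Rejecting anything malformed, rather than guessing, matches the spirit of the feature: a wrongly parsed range would turn off TSO emulation for code nobody checked.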

Hopefully our users will apply this feature to games they’ve determined run too slowly with TSO emulation enabled but crash with TSO emulation disabled! It gives us a little step in the middle to work around the FEAT_LRCPC extension.

Goodbye FEXLoader and FEXInterpreter!

FEX has historically had two methods of invocation: FEXInterpreter, which directly starts emulating the given program, and FEXLoader, which additionally supports overriding FEX configuration using command-line arguments. Since we also support such config overrides through environment variables like FEX_MULTIBLOCK, we decided to remove the redundant FEXLoader binary to ease maintenance. With environment variables, per-app configuration files, and the graphical FEXConfig tool for permanent changes, we’re covering most use cases.

With FEXLoader out of the picture, we took the opportunity to rename FEXInterpreter to just FEX. The old name is still available as an alias for now, but we recommend updating any scripts to avoid friction in the near future.

A small change for building FEX tests

Our build scripts used to have two variables that needed to be set to enable testing: BUILD_TESTS and BUILD_TESTING. Setting the wrong one when toggling tests on or off would have surprising side effects (like CTest running old build artifacts!), so we invested some time in refactoring our CMake code to unify the two. In the new release, testing is controlled using the BUILD_TESTING variable, as in any other CMake-based project.


See the 2510 Release Notes or the detailed change log on GitHub.

Written on October 8, 2025