FEX 2302 Tagged!

This month certainly passed in the blink of an eye. A lot of good bug fixes this month as usual! Continue reading to find out more.

Fix incorrect operation for cache line clears

When emulating the CLFLUSH instruction, FEX was using the wrong Arm cache maintenance operation: DC CVAU (clean to the point of unification) instead of DC CIVAC (clean and invalidate to the point of coherency). Although this was incorrect, it was hard to find anything actually affected by it. With Snapdragon’s open source Vulkan driver implementing what is required for VKD3D, Vulkan tests made it evident that our implementation was wrong. Switching the operation was easy and will let VKD3D run without hacks once the required feature is finished.

Bug fixes to 64-bit x87 emulation

A big thanks to CallumDev for finding and fixing these latest bugs in FEX’s less accurate x87 emulation. As a reminder, x87 on original hardware operates on 80-bit float values. ARM doesn’t natively support these, so FEX needs to emulate them using a software floating point library. We have a hack in our configuration that bypasses this software implementation and uses 64-bit double operations instead. This can significantly improve performance in some 32-bit games but can also introduce rendering artifacts.
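To make the precision difference concrete, here is a minimal sketch that rounds the same value to a 64-bit significand (what the 80-bit x87 format carries) and to a 53-bit significand (what a 64-bit double carries). The helper `round_to_significand` is a hypothetical name made up for this illustration and is not FEX code; it ignores exponent range, infinities and NaNs.

```python
from fractions import Fraction


def round_to_significand(value: Fraction, bits: int) -> Fraction:
    """Round value to the nearest number with a `bits`-bit significand.

    Illustrative helper only: exponent range, infinities and NaNs are ignored.
    """
    if value == 0:
        return Fraction(0)
    sign = -1 if value < 0 else 1
    magnitude = abs(value)
    # Find a power-of-two scale so that magnitude / scale lies in
    # [2**(bits - 1), 2**bits), i.e. the significand uses exactly `bits` bits.
    scale = Fraction(1)
    while magnitude / scale >= 2 ** bits:
        scale *= 2
    while magnitude / scale < 2 ** (bits - 1):
        scale /= 2
    # round() on a Fraction rounds half to even, like the default FPU rounding mode.
    return sign * round(magnitude / scale) * scale


x = 1 + Fraction(1, 2 ** 60)
as_x87 = round_to_significand(x, 64)     # 80-bit extended: 64-bit significand
as_double = round_to_significand(x, 53)  # 64-bit double: 53-bit significand
print(as_x87 == x)     # True: the small term survives
print(as_double == 1)  # True: the small term is rounded away
```

Small differences like this are usually invisible, but they can accumulate across long chains of operations, which is one way the reduced precision mode could end up producing visible artifacts.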

This month there were many bug fixes:

  • ALU operations that consume integers converted to floats are fixed
  • Float comparisons that consume 16-bit integers are fixed
  • The FPREM instruction no longer loops infinitely

With these fixes in place, a large number of games now actually render correctly with this hack enabled. It will be interesting to see how much this improves performance or battery life in 32-bit games!

More AVX instructions emulated

With one of FEX’s developers taking some time away, this was a little less involved than the last couple of months. There were still a handful of instructions implemented:

  • VPBLENDD, VBLENDPS, and VPSRAVD

Additionally, while these aren’t AVX instructions, we also implemented the CLWB and CLFLUSHOPT instructions. These match their ARM equivalents, so it was mostly an easy implementation that applications can use if they want.

Fix copy and paste error in Arm64 JIT

While this is a fairly minor issue, we had a copy and paste error in FEX’s register spilling code. This caused Steam to crash in certain situations, so the fix in this release helps users who want to run it.

A bunch of minor optimizations

This month had a bunch of small optimizations around the entire project. Alone these are all quite minor, but added together they should remove a couple of percent of CPU time from FEX’s JIT.

  • Arm64 Dispatcher is slightly faster
  • CPUID emulation initialization is faster
  • File loading is optimized, improving config loading time
  • Frontend instruction decoder is faster
  • IR operations are 1 byte smaller, improving memory usage
  • Inline IR constants optimization reduces IR memory size

Fixing thunk symbol override fetching

FEX’s thunks had an issue where, once a library was loaded, we would only ever fetch relevant symbols from that library directly. While this worked for our use case, it broke MangoHud in OpenGL applications. Resolving this issue fixes most tools that override symbols with LD_PRELOAD.
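The difference between the two lookup strategies can be sketched roughly as follows. This is a toy model and not FEX’s thunk code: the `LOADED` table, both function names, and the library and symbol names are assumptions chosen for illustration, with the LD_PRELOAD entry placed first in load order.

```python
from typing import Optional

# Hypothetical load order: anything from LD_PRELOAD comes first.
LOADED = [
    ("libMangoHud.so", {"glXSwapBuffers"}),
    ("libGL.so.1", {"glXSwapBuffers", "glXGetProcAddress"}),
]


def lookup_in_library(symbol: str, library: str) -> Optional[str]:
    """Old behaviour in this model: only ask the one library directly."""
    for name, symbols in LOADED:
        if name == library and symbol in symbols:
            return name
    return None


def lookup_in_load_order(symbol: str) -> Optional[str]:
    """New behaviour in this model: the first library that defines it wins."""
    for name, symbols in LOADED:
        if symbol in symbols:
            return name
    return None


# Asking the library directly skips the preloaded override...
print(lookup_in_library("glXSwapBuffers", "libGL.so.1"))
# ...while walking the load order lets the preloaded library win.
print(lookup_in_load_order("glXSwapBuffers"))
```

In this model, a symbol that only the real library defines still resolves to it, so overlays only take over the entry points they actually provide.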

Update jemalloc from 5.2.1 to 5.3.0

While this is a fairly minor change, this jemalloc release fixes some bugs and improves performance. It is small, but every performance improvement is welcome.

Support for execveat with AT_EMPTY_PATH

This feature lets an application be executed directly through a file descriptor instead of a file path on disk. The idea is fairly simple, but it has some edge cases that some people might find interesting. For the more technical details of the implementation, check out the pull request.
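For a feel of what executing through a file descriptor looks like from an application’s point of view, here is a small Linux-only sketch. It relies on Python’s `os.execve` accepting an open file descriptor in place of a path. As far as we know, CPython does this through fexecve(3), which recent glibc versions implement with execveat and AT_EMPTY_PATH, but treat that mapping as an assumption. The helper name `run_via_fd` is made up for this example.

```python
import os
import sys


def run_via_fd(path, argv):
    """Run a binary through an open file descriptor and return its exit code."""
    fd = os.open(path, os.O_RDONLY)
    try:
        pid = os.fork()
        if pid == 0:
            # Child: execute the already opened file instead of a path.
            try:
                os.execve(fd, argv, os.environ)
            finally:
                # Only reached if the exec failed.
                os._exit(127)
        # Parent: wait for the child and report how it exited.
        _, status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    finally:
        os.close(fd)


if __name__ == "__main__":
    code = run_via_fd(sys.executable, [sys.executable, "-c", "raise SystemExit(7)"])
    print(code)
```

Once the descriptor is open, the path no longer matters to the exec, which is where some of the edge cases come from.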

See the 2302 Release Notes or the detailed change log on GitHub.

Written on February 3, 2023