FEX 2402 Tagged!

Welcome back everyone! After last month’s cancelled release and this month being a bit late we have a lot of changes that happened.

More JIT performance improvements

A lot of the work these paste two months have been optimizing our JIT more. We have run Geekbench and Bytemark for these which showed a marginal performance improvement in these benchmarks. Bytemark showing the biggest improvement of 16% in one sub-benchmark. A lot of the performance improvements are targeting real-world applications rather than benchmarks which shows as those games getting more of an improvement.

As typical, explaining each individual optimization would take too long so we’re going to spam out a bunch in a list.

  • Removes a vtable indirection for syscalls
  • Fix RCL/RCR wraparound behaviour
  • Remove process-wide lock in JIT
  • Fixes syscall rcx/r11 state
  • Optimize SIB address calculation from three instructions to one
  • Optimize TST instruction with -1
  • Optimize TST more
  • Improve XCHG instructions
  • Optimize rotates
  • Optimize CDQ
  • Optimize shifts
  • Optimize PTEST, VTESTP, PDEP
  • Optimize SHA256 instructions to remove spilling
  • Optimize CMPXCHG
  • Stop zero extending a bunch of instructions where it doesn’t matter
  • Optimize ANDN
  • Optimize a bunch of instructions using NZCV flags

Fix glibc clone usage of CLONE_CLEAR_SIGHAND

Newer glibc versions starting with 2.38 have started using this new clone flag for executing a program. We also fixed this in 2312.1 but can now make a note of it.

Fix VDSO symbol fetching on ARM64

This is a fairly minor change but can have a big performance hit. When FEX was querying for VDSO interface functions we were using the wrong names on ARM64. Since the wrong names were used, this meant we always fell back to the slower glibc implementation of functions. This in particular fixes a performance hit when games call clock_gettime excessively.

Fix Proton again

Sometime in December there were some changes to Valve’s Proton layer which caused us to break it. This has now been fixed.

Expose Linux 6.6

With some relatively minor changes we now support reporting kernel version 6.6 to the guest application. This gives us a range from v5.0 to v6.6 now.

Workaround hang when process is forking

A long-standing bug in FEX is that sometimes a process can hang when it is forking, usually to execute another program. We have now worked around this issue to an extent that lets the application continue. It’s not a full fix because we can still have a crash but that is easier to see instead of a program hanging forever in the background.

Commonize some WOW64 code to share with ARM64ec

In preparation for sharing some code with FEX built for ARM64ec, this has shared move some Windows code to a common location to be used.

An absolute ton of work went in to thunking

Over the past couple months this has been one of the more active projects within FEX. Today FEX has support for thunking 64-bit x86 libraries across to ARM64. A significant portion of this work is doing analysis of API interfaces in order to allow thunking 32-bit x86 libraries over to ARM64 libraries with data repacking. This isn’t yet complete but since a ton of work has gone in to this, we wanted to call it out.

NOTE: Memory leak on long-running processes like Steam

We have found a memory leak when a process shuts down a thread that has been around for quite a while. We only identified this memory leak this last month which hasn’t been fixed. We are hoping to fix this bug for the next release but be aware that long running processes like Steam has a relatively aggressive memory leak. This is exacerbated by how Steam spins up threads for doing work which makes this application particularly heavy.

Video game showcase

See the 2402 Release Notes or the detailed change log in Github.

Written on February 12, 2024