FEX 2506 Tagged
Welcome to the second half of the year! With this release we have some big changes to talk about, so let’s jump right in!
Reduce JIT time by 25% by sharing code buffers between threads
This is an absolute banger of a change from our venerable developer neobrain. They have been working towards this as a goal for a while; The tricky nature of the feature making it difficult to land. Before we discuss how this improves performance, it is first necessary to discuss how FEX’s JIT worked before this change.
Before this change, our JIT would execute independently for every thread that the guest application makes, without sharing those code buffers with other threads. This meant that if multiple threads execute the same code, they would all be JITing it, consuming memory and taking precious CPU cycles. Additionally, if a thread exits then all of that code buffer gets deleted as well and not reused at all. Not only does this consume more memory, it’s actually worse off for CPUs because even if we are executing the same x86 code between threads, the ARM code is at different locations in memory, meaning we put even more pressure on our CPU’s poor L2/L3 caches. This is becoming an even larger issue for newer games where they have multi-threaded job-queue systems where any of the threads in the pool could execute jobs and it becomes random chance if the same thread ends up executing the same code. Usually just means every thread in the pool will end up JITing all the code multiple times.
neobrain’s change here is a fundamental shift to how FEX does its code JITing. In particular, all JIT code gets written to a shared code buffer region and if one thread has JITed the code, then all threads can reuse it. This means in an ideal case only one thread ever JITs code and all other threads benefit from it. In addition since code is now shared between threads, if a thread exits then none of that JITed code is lost anymore; a new thread can reuse it.
This change has some serious knock-on effects; Memory-usage is lower, total time spent in the JIT is lower, and it preps FEX to start caching code to the filesystem for sharing between multiple invocations of an application! So not only will memory usage of applications be lower, allowing more games to run on platforms with less RAM, but they should be faster as well since JIT time is lower and L2/L3 cache is hit less aggressively.
A pedantic edge case game called RUINER improved from around 30FPS to 60FPS due to how it constantly JITs code due to threads being created and destroyed quickly! In other games tested by neobrain, we also see significantly less time spent in the JIT.
Go and test some games people!
More JIT optimizations
Not to be outdone, there were more JIT optimizations this month. This includes making the JIT itself faster, and also faster generated code so performance is improved in-game. Definitely go and look at the pull requests for these to know more, because walking through each individual change would take all day.
- Inline register-allocation into SSA IR
- Stop using a hashmap in Dead-Code-Elimination pass
- Constant-fold on the fly
- Optimize pairs of stack pushes and pops
- Optimize Xor with all -1
- Add more cases for zeroing registers
- Optimize X87 FTWTag generation using fancy bit-twiddling techniques
- Optimize CDQ
- Fix for thunk callbacks potentially corrupting registers
Fix a nasty race condition that causes invalid memory tracking
This was a big nasty bug that landed on our plate this last month. We noticed recently that after Steam shipping an update, it would crash very frequently under FEX, this only seemed to occur when games were downloading. It could technically be worked around by restarting Steam each time it crashed, but if you’re downloading a big 100GB game like Spider-Man 2 then you’re going to need to restart Steam a lot.
After some investigation we found out that Steam has seemingly updated its memory allocator, or made it more aggressively allocate and deallocate memory. This happens particularly frequently during a game download where each thread is now allocating and deallocating across the whole system.
When any memory syscall gets used under FEX, we need to track this in order to ensure that self-modifying code works correctly. We track the virtual memory regions and keep a map around to ensure if anything gets overwritten that we can invalidate code caches. Turns out we had mutex locking in the wrong location, which was causing us to have a different view of memory versus what the kernel had. This shows up when multiple threads perfectly interleaved a munmap and an mmap, and having FEX’s mutex that tracked these end up in the wrong order. So FEX would end up thinking the mmap came first and then a munmap (at the same address!) came second, but it was actually the other way around.
This completely broke FEX’s tracking and resulted in some bad crashes. The core change was to make FEX’s tracking mutex also wrap the syscall doing the memory operation. Everything is sorted now and Steam is even more stable than before!
FEXServer fixes
We had a couple of minor bugs that came up this month that were fixed in our FEXServer. We had some cases where starting an application would cause the Server to early exit while FEX was still running. Which would cause strange behaviour, so we needed to fix it. These are now fixed so it should stay running while applications execute
Print a warning if an unknown FEX config option is set
Every so often FEX changes config options or old ones get phased out. We didn’t have any way to alert the user that they have old config options sitting around, or even if they typo’d an option. Now if you’re manually setting config options in the config JSON, it will print a warning to alert you to fix the option. Nice little quality of life change.
Fortification safe long-jump
Last month we fixed a nasty memory leak which required introducing a single long-jump usage inside of FEX. Turns out this broke FEX on some distros that enable fortification build options when compiling FEX. This is now fixed by using a long-jump that is safe against fortifications.
See the 2506 Release Notes or the detailed change log in Github.