FEX 2307 Tagged!
This was a slightly slower month because some larger pieces are still being worked on, but we still have some good changes that are worth talking about.
Implement per-instruction RIP reconstruction
This was a fairly curious bug that FEX encountered. When trying to run the game Ultimate Chicken Horse, the game would crash very early in its startup. While investigating, we determined that this was one of the first games we tested that uses Unity Engine’s AOT scripting reflection(?) mechanism. This codepath seemingly relies heavily on either tagged pointers or some other mechanism that causes a SIGSEGV on the first access. The Unity AOT code then catches the SIGSEGV and changes behaviour depending on the RIP of the faulting instruction. One of the problems with FEX is that on synchronous faults like SIGSEGV, we don’t yet support full state reconstruction. Since this seems to rely only on RIP being correct, we can fairly easily wire that up and get Ultimate Chicken Horse running!
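To make the idea concrete, here is a minimal Python sketch of how a JIT could map a faulting host PC back to a guest RIP: while emitting a block it records where each guest instruction's host code begins, and the signal handler later looks up the entry covering the faulting address. This is illustrative only. `JitBlock`, `record`, and `guest_rip_for` are hypothetical names, and FEX's real data structures are not described in this post.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class JitBlock:
    """Hypothetical record of one translated block (not FEX's real layout)."""
    host_start: int   # address where the emitted host code begins
    host_size: int    # size of the emitted host code in bytes
    host_offsets: list = field(default_factory=list)  # sorted host offsets
    guest_rips: list = field(default_factory=list)    # guest RIP per offset

    def record(self, host_offset: int, guest_rip: int) -> None:
        # Called once per guest instruction in emission order,
        # so host_offsets stays sorted.
        self.host_offsets.append(host_offset)
        self.guest_rips.append(guest_rip)

    def guest_rip_for(self, host_pc: int) -> int:
        offset = host_pc - self.host_start
        if not 0 <= offset < self.host_size:
            raise ValueError("host PC is outside this block")
        # Find the last guest instruction whose host code starts
        # at or before the faulting offset.
        index = bisect_right(self.host_offsets, offset) - 1
        if index < 0:
            raise ValueError("fault precedes the first recorded instruction")
        return self.guest_rips[index]


# Example: three guest instructions translated into one host block.
block = JitBlock(host_start=0x10000, host_size=0x40)
block.record(0x00, 0x401000)
block.record(0x0C, 0x401003)
block.record(0x20, 0x40100A)
```

With this table, a fault at host address `0x10014` falls inside the second entry, so the handler would report guest RIP `0x401003` to the application's SIGSEGV handler.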
AVX work completed!
This month FEX finished the last remaining work to implement AVX. The remaining SSE4.2 instructions were finished, and the prerequisite XSAVE and XRSTOR instructions were implemented. Although the feature is effectively complete, we aren’t enabling the CPUID bit yet. We first want to investigate a potential crash that has cropped up in Java games due to the extension, and we also want to finish up the AVX2 work and enable both in one step! Next month is looking to be the first version with AVX and AVX2 support in the source.
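"Implemented but not advertised" can be pictured as masking feature bits out of the CPUID results the guest sees. The bit positions below come from the x86 CPUID definition (leaf 1 ECX bit 28 for AVX, leaf 7 subleaf 0 EBX bit 5 for AVX2). The `reported_features` function and its `advertise_avx` switch are hypothetical and only sketch the idea; this is not FEX's actual CPUID code.

```python
# Feature bit positions from the x86 CPUID definition.
CPUID_1_ECX_AVX = 1 << 28    # leaf 1, ECX bit 28
CPUID_7_EBX_AVX2 = 1 << 5    # leaf 7 (subleaf 0), EBX bit 5


def reported_features(ecx_leaf1: int, ebx_leaf7: int, advertise_avx: bool):
    """Return (ECX, EBX) as a guest would see them (hypothetical helper)."""
    if not advertise_avx:
        # The instructions may be implemented, but well-behaved guests
        # will not use them unless these bits are set.
        ecx_leaf1 &= ~CPUID_1_ECX_AVX
        ebx_leaf7 &= ~CPUID_7_EBX_AVX2
    return ecx_leaf1, ebx_leaf7
```

Flipping a single switch like this would enable both extensions in one step once the remaining work is done.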
Fix 32-bit robust futex fetching
This issue has been a thorn in our sides for quite a while now. Usually it only ever manifested if the user was running Steam using FEX’s official PPA binaries. When the user tried running Steam, it would crash with a really obscure message: “Fatal error: futex robust_list not initialized by pthreads.” This would then never reproduce if the code was rebuilt locally.
With a bit of poking around and a local pbuilder build of FEX, we were finally able to reproduce the error. It turns out FEX was writing a 64-bit pointer into the result when a 32-bit application queried the robust list pointer, overwriting part of the stack and corrupting its data. This falls under one of the circumstances of “How did this ever work!?” but with it resolved, Steam should theoretically finally work for our users that are using the PPA build of FEX. Enjoy~!
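The effect of that bug is easy to reproduce in miniature. The Python sketch below models a small piece of guest stack as a byte buffer: a 4-byte slot for the returned pointer followed by a neighbouring 4-byte local. Writing the pointer with a 64-bit store clobbers the neighbour, while a 32-bit store leaves it alone. The layout, the values, and the helper names `make_stack` and `store_pointer` are made up for illustration.

```python
import struct

NEIGHBOUR = 0xDEADBEEF  # arbitrary value standing in for another stack local


def make_stack() -> bytearray:
    """4-byte result slot at offset 0, 4-byte neighbouring local at offset 4."""
    stack = bytearray(8)
    struct.pack_into("<I", stack, 4, NEIGHBOUR)
    return stack


def store_pointer(stack: bytearray, offset: int, pointer: int, size: int) -> None:
    """Store a little-endian pointer of `size` bytes (4 or 8) at `offset`."""
    fmt = "<I" if size == 4 else "<Q"
    struct.pack_into(fmt, stack, offset, pointer)


head = 0xF7FD1000  # pretend 32-bit robust list head address

buggy = make_stack()
store_pointer(buggy, 0, head, 8)   # 64-bit store into a 32-bit slot
buggy_neighbour = struct.unpack_from("<I", buggy, 4)[0]

fixed = make_stack()
store_pointer(fixed, 0, head, 4)   # store sized for the 32-bit guest
fixed_neighbour = struct.unpack_from("<I", fixed, 4)[0]
```

In the buggy case the upper half of the 64-bit value lands on top of the neighbouring local. Whether that corruption is visible depends on what the compiler happened to place next to the slot, which is consistent with the crash appearing only in some builds.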
Fix application hangs due to mutexes being locked on forks
This has been a very spicy bug that has been haunting FEX for years at this point. Whenever a modern application wants to launch a process, it uses a combination of fork and execve. The fork might end up being a vfork, or a clone syscall that does the same thing. Regardless, fork in a threaded environment has a very strict requirement: the child can basically only do an execve afterwards. vfork adds the further restriction that the child can’t corrupt the stack at all, because it’s sharing its memory space with the parent.
The problem with this approach is that even if the application is only ever going to call execve after the fork, FEX still needs to do a bunch of bookkeeping or additional JIT emission and execution in the child. This means FEX’s own mutexes might be in an unknown state going into a fork, for example held by a thread that doesn’t exist in the child, which causes the new child process to hang indefinitely on the mutex.
To work around this issue, FEX will now globally lock all mutexes that matter, do the fork, and then immediately unlock the mutexes on the parent side. In the child process FEX needs to be a bit mean to these mutexes, resetting them to zero to ensure no thread is holding them. While this is fairly heavy-handed, it dramatically reduces how frequently FEX hangs when fork is used.
Specifically, Steam tended to launch a bunch of background processes which would hang indefinitely, causing Proton games or downloads to never continue. This should now be pretty much entirely fixed!
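The lock, fork, unlock-or-reset sequence described above can be sketched with a single lock in Python. This is only a model of the pattern, not FEX's code: `jit_lock` is a hypothetical stand-in for one emulator-internal mutex, and "resetting to zero" is approximated by replacing the lock object in the child.

```python
import os
import threading

# Hypothetical stand-in for one emulator-internal mutex.
jit_lock = threading.Lock()


def fork_with_lock_reset() -> int:
    """Fork while holding the lock and return the child's exit code."""
    global jit_lock
    jit_lock.acquire()      # 1. take the lock so its state is known at fork time
    pid = os.fork()         # 2. fork; the child inherits a copy of the locked lock
    if pid == 0:
        # 3b. child: forcibly reset the lock instead of waiting for an
        #     owner that may not exist in this process.
        jit_lock = threading.Lock()
        acquired = jit_lock.acquire(timeout=1.0)
        os._exit(0 if acquired else 1)
    jit_lock.release()      # 3a. parent: unlock immediately after the fork
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

The child exits with code 0 if it could take the freshly reset lock, which is the property that keeps forked processes from hanging before they reach execve.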
Stop using faccessat2 to emulate faccessat
This was an oops on our part. faccessat2 was added in Linux kernel 5.8, so if your device was running an older kernel this syscall would /never/ work. We didn’t notice this since most of our devices are running a new enough kernel that faccessat2 just worked.
Thanks to the user that found this problem!
Handle xattr syscalls with overlayfs rootfs
Turns out that FEX had missed the various syscalls that access files to get xattr information. This was causing weird failures where some applications would say that a file doesn’t exist purely because it was in the rootfs overlay only. Sadly Linux doesn’t support *at variants of these syscalls so they aren’t quite as fast as native execution, but that’s fine.
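As a rough illustration of why these syscalls need special handling, the Python sketch below rewrites a guest path to point into the rootfs when the file exists there and falls back to the host path otherwise, then performs the xattr lookup on the result. The names `resolve_guest_path` and `guest_getxattr` are hypothetical, guest paths are assumed to be absolute, and the lookup order is an assumption made for this example rather than a description of FEX's implementation.

```python
import os


def resolve_guest_path(rootfs: str, guest_path: str) -> str:
    """Prefer the copy inside the rootfs, else fall back to the host path."""
    candidate = os.path.join(rootfs, guest_path.lstrip("/"))
    # lexists() so that a dangling symlink in the rootfs still counts.
    if os.path.lexists(candidate):
        return candidate
    return guest_path


def guest_getxattr(rootfs: str, guest_path: str, name: str) -> bytes:
    # The path-based xattr syscalls take no directory fd, so the path
    # itself has to be rewritten before the call is made.
    return os.getxattr(resolve_guest_path(rootfs, guest_path), name)
```

Without the rewrite step, a file that exists only inside the rootfs would be looked up on the host and reported as missing, which matches the failures described above.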
Fix conflicting ARM64 register allocation
A couple of months ago we added one more register to our register allocator for slightly more optimal register allocation. This broke a game called Osmos under FEX. This was purely a bug, but in resolving it we likely fixed crashes in various applications that we hadn’t noticed before. Oops!
This month we have a couple of new rootfs images on our server that have been hotly requested! We now have an Arch Linux rootfs image and a Fedora 38 rootfs image. These haven’t been as thoroughly tested as our Ubuntu images, so if you find any problems with them, make sure to let us know on our Discord.