FEX 2212 Tagged!

A lot of good work this month with the highlight being that we have started working on our AVX implementation and started optimizing our IR to be more efficient.

Disable PCLMUL if not supported on host

This carry-less multiplication instruction is only implemented on ARM SoCs that ship the cryptographic extension. This extension is unsupported on the Raspberry pi which was causing applications that use openssl to crash. Specifically this fixes Steam running on the Raspberry Pi again.

Adds 256-bit support to the remaining IR vector ops

A lot of work this month for implementing support for 256-bit operations. With this work in place our JITs now support 256-bit for all of the IR operations.

Work started on AVX emulation

With the previous work completed for having our JITs support 256-bit operations, work could now be started on implementing AVX. This AVX work is implemented as native SVE 256-bit operations, so the only hardware that can currently execute this partial implementation is Neoverse-V1 CPUs. The expectation that as ARM CPUs become more powerful, they will eventually support SVE with 256-bit sized registers. It may take a few generations to get hardware that supports this, if ARM CPUs want to run AVX games then they will need to support the equivalent hardware feature-set.

Current instructions implemented:

  • VZEROUPPER, VZEROALL
  • VMOVAPS, VMOVQ
  • VMOVNTDQ, VMOVNTDQA, VMOVNTPD, VMOVNTPS
  • VMOVDQA, VMOVDQU
  • VMOVAPD, VMOVUPD, VMOVUPS
  • VMOVLPD, VMOVLPS
  • VMOVSHDUP, VMOVSLDUP
  • VMOVHPD, VMOVHPS
  • VMOVDDUP
  • VORPD, VORPS, VPOR
  • VPXOR, VXORPD, VXORPS
  • VANDPD, VANDPS, VPAND, VANDNPD, VANDNPS, VPANDN
  • VADDPD, VADDPS, VPADDB, VPADDW, VPADDD, VPADDQ

This is just the beginning of us implementing support for this, stay tuned as we implement the remaining operations over the next few months.

Generate register access IR operations directly

As an original implementation design detail, FEX implemented GPR and XMM register accesses as a generic emulated CPU state access. Once we added static register allocation we also added an optimization pass to convert these generic accesses in to register accesses which directly map to our static register allocator.

This is a redundant pass since we know upfront which registers were being accessed. With this change we are generating register access IR operations directly and removed the optimization pass. This removes around 12% JIT compilation time, which improves responsiveness and lets FEX spend less time compiling code.

Systemd fixes

While this is a niche supported operation, some people may be interested in running FEXServer as a systemd client. A FEXServer is meant to be a user-wide server that the FEX clients talk to for rootfs and eventually other management. Using a systemd user service, a FEXServer can be started early, letting it mount the rootfs image, and run in the background. This can be fairly useful as FEX error logs can then be printed to journalctl for inspection as for why a process has crashed.

Add support for steamid based configuration files

As an ongoing effort of documenting which applications can run with FEX’s OpenGL and Vulkan thunk libraries, it was determined that some applications use generic executable names. This means that a configuration file that uses the application name would have erroneously enabled thunks for other untested applications.

In order to work around this issue, our configuration system now supports an optional steamid based naming convention for games that are launched from Steam. With this in place, we now have a repository that contains application configurations that users can install at their leisure. This repository can be found on Github

As part of the documentation process, all of these configurations must be documented on our Wiki with testing results to ensure it works.

Implement SGDT

This is a quirky instruction that is emulated on a native x86 system these days. This instruction is a system instruction that is used by the OS for getting the configuration of the global descriptor table. Linux captures this instruction and returns a configuration that says the table is living in kernel memory space. While this is already true, an application usually doesn’t need to care about this data.

Curiously enough Denuvo uses this instruction in some of their implementations for some reason. With us implementing this instruction, Denuvo games now get slightly further before they horribly crash.

auxv fixes

When FEX executes an application, it needs to setup an emulated auxv state since this isn’t a cross-architecture state.

  • AT_RANDOM
    • This now correctly passes through the host’s AT_RANDOM value rather than fixed values
  • AT_PLATFORM
    • Some tooling uses this to determine if it is running as i686 or x86-64
  • AT_HWCAP/HWCAP2
    • This just returns some CPUID values, most applications use CPUID directly instead of this
  • AT_MINSIGSTKSZ
    • The minimum signal stack size is no longer being a hardcoded constant size
    • Applications are supposed to use this to calculate a signal stack size

Support radeon drm driver in ioctl emulation

Most Radeon GPUs these days use the amdgpu kernel driver, but a user found a hole in our ioctl emulation by using an old Radeon GPU on a Phytium ARM board.

With this in-place, older Radeon cards that use the radeon kernel driver can now have accelerated OpenGL.

Misc optimizations

This month we have had a random smattering of optimizations that improve startup, shutdown, and execve performance. While not individually providing a lot of benefit; small optimizations like these add up to make FEX better over time

  • Defer cpuinfo file initialization until first access
    • Improves startup time
  • Use tsl::robin_map for some internal maps
    • Improves JIT time, and some minor shutdown performance improvements
  • Disable multiblock by default
    • This causes excessive JIT overhead which makes the experience worse for the user
    • Significantly reduces stutters
  • Improve hot path of file existance checking in syscall wrapping
    • During our overlayfs handling, this can be hit quite hard during file accesses
    • Improves file IO in applications

Rootfs updates

We have updated our rootfs images to include Mesa 22.3.0 in this release!

  • Use the FEXRootFSFetcher tool to upgrade to the latest rootfs

See the 2212 Release Notes or the detailed change log in Github.

Written on December 5, 2022