<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://fex-emu.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://fex-emu.com/" rel="alternate" type="text/html" /><updated>2026-03-04T21:39:25+00:00</updated><id>https://fex-emu.com/feed.xml</id><title type="html">FEX-Emu</title><subtitle>A fast linux usermode x86 and x86-64 emulator</subtitle><entry><title type="html">FEX 2603 Tagged</title><link href="https://fex-emu.com/FEX-2603/" rel="alternate" type="text/html" title="FEX 2603 Tagged" /><published>2026-03-04T00:00:00+00:00</published><updated>2026-03-04T00:00:00+00:00</updated><id>https://fex-emu.com/FEX-2603</id><content type="html" xml:base="https://fex-emu.com/FEX-2603/"><![CDATA[<p>Welcome back, take off your shoes and relax, it’s been a while since our last release. With all of the regression hunting sorted out, we now have
around two months’ worth of changes to go over. Let’s jump into some of the changes that happened!</p>

<h3 id="steamwebhelper-crash---known-bug">Steamwebhelper crash - <strong>Known Bug</strong></h3>
<p>One of the bugs that caused us to cancel last month’s release was a spurious crash in Steam’s steamwebhelper process.
This behaves as if Steam is constantly crashing and coming back: one of the steamwebhelper processes crashes, and Steam then restarts it.
This isn’t actually a regression on the FEX side, but the result of a change in Steam late last year that we didn’t notice initially.
The CEF version that Steam ships changed some behaviour around FD handling that FEX interacts badly with.
We haven’t fully worked around the issue, so Steam’s GUI may still crash fairly frequently, but it is less likely to occur as long as FEX’s
logging is disabled.</p>

<h3 id="hide-biglittle-layout-by-default">Hide big.LITTLE layout by default</h3>
<p>We determined this month that some games with anti-tamper break if we expose the different CPU core names in the “product string” of CPUID.
To combat this we now hide the big.LITTLE nature of the CPUs by default and replicate CPU zero’s name across all cores. This fixes a fairly
significant number of anti-tamper games with just a trivial change.</p>

<h3 id="convert-vzerouppervzeroall-to-use-zva">Convert vzeroupper/vzeroall to use ZVA</h3>
<p>This month we found out that some CPU cores are significantly faster at zeroing a couple of cachelines of memory using the <strong>dc zva</strong> instruction.
This hadn’t become an issue on most consumer class CPUs because they can typically saturate their store pipelines using regular <strong>stp</strong> instructions
already. <strong>vzeroupper</strong> is quite common when executing AVX code, so we want to make sure it is as fast as possible. This change alone increased Death
Stranding’s FPS from 55FPS to 70FPS on AmpereOne CPUs in our test scene.</p>

<h3 id="fix-build-with-upstream-clangllvm-22-and-mingw">Fix build with upstream Clang/LLVM 22 and Mingw</h3>
<p>This month we fixed our builds for both the latest LLVM 22 release and the newest LLVM-Mingw toolchain. There were some minor changes to the clang API
and libc++ that required some work on our side to resolve. With these changes in place, this will now more easily allow package managers to build our
Wine DLL files and use a newer compiler.</p>

<h3 id="switch-one-allocator-to-rpmalloc">Switch one allocator to RPMalloc</h3>
<p>This is a fairly big change that we did this month. We ripped out one of our JEMalloc allocators and instead replaced it with RPMalloc. The
driving force behind this change is that RPMalloc uses significantly less RAM for its internal state tracking, which in turn means that FEX itself is
using less RAM for the emulation. We’ve seen some dramatic examples where this change would shave hundreds of megabytes of memory off of FEX’s memory
usage. This allocator is also quite a bit smaller so it is easier to read and see what it’s doing, which is good when JEMalloc is no longer
maintained.</p>

<h3 id="various-jit-changes">Various JIT changes</h3>
<p>Another month of JIT changes that would take too much time to dive into directly, so we’ll just list them off.</p>
<ul>
  <li>VEX compare operations fixed</li>
  <li>Optimize x87 conversion instructions</li>
  <li>Fix undocumented x87 instruction aliases</li>
  <li>Implement uncommon instruction ARPL</li>
  <li>Switch over to ankerl::unordered_dense instead of tsl for cache</li>
  <li>Fix initial PF flag state</li>
</ul>

<h3 id="various-linux-frontend-changes">Various Linux frontend changes</h3>
<p>This month we had too many frontend changes to dive into individually as well.</p>
<ul>
  <li>Override glibc <strong>program_invocation_name</strong> so Mesa can see application profiles</li>
  <li>Work around an <strong>execveat</strong> Linux kernel bug with MFD_CLOEXEC</li>
  <li>Ensure <strong>seccomp</strong> gets inherited correctly</li>
  <li>Ensure <strong>personality</strong> gets inherited correctly</li>
  <li>Move the DRM LRU FD cache to be per-thread</li>
  <li>Update sigaltstack minimum size requirements</li>
  <li>Ensure XSTATE_MAGIC2 is saved correctly</li>
</ul>

<hr />

<p>See the <a href="https://github.com/FEX-Emu/FEX/releases/tag/FEX-2603">2603 Release Notes</a> or the <a href="https://github.com/FEX-Emu/FEX/compare/FEX-2601...FEX-2603">detailed change log</a> in Github.</p>]]></content><author><name>FEX-Emu Maintainers</name></author><summary type="html"><![CDATA[Welcome back, take off your shoes and relax, it’s been a while since our last release. With all of the regression hunting sorted out, we now have around two months’ worth of changes to go over. Let’s jump into some of the changes that happened!]]></summary></entry><entry><title type="html">FEX 2601 Tagged</title><link href="https://fex-emu.com/FEX-2601/" rel="alternate" type="text/html" title="FEX 2601 Tagged" /><published>2026-01-07T00:00:00+00:00</published><updated>2026-01-07T00:00:00+00:00</updated><id>https://fex-emu.com/FEX-2601</id><content type="html" xml:base="https://fex-emu.com/FEX-2601/"><![CDATA[<p>As the developers awaken from their holiday-induced hibernation, another release is upon us in the new year! Let’s see what we managed to implement
before hibernation snuck up on us.</p>

<h3 id="update-thunks-for-vulkan-14337">Update thunks for Vulkan 1.4.337</h3>
<p>This update is fairly important as Proton and Mesa have started using some new extensions that we didn’t previously support. So if your system had a
new driver with these extensions then dxvk/vkd3d-proton would assert out. With these updated it is no longer a problem and thunking is working again
as normal!</p>

<h3 id="fix-a-couple-rare-hangs-on-wine-mutex-handling">Fix a couple rare hangs on Wine mutex handling</h3>
<p>A few months ago FEX gained a custom mutex implementation with “writer” priority, which we needed to help reduce stuttering in our code cache
implementation. It turns out the Wine side of that implementation had two bugs that went unnoticed until now. The implementation is fairly smart
and will spin on the mutex using ARM’s highly efficient “Wait-For-Event” instruction for 1/10th of a millisecond before deferring to the kernel
mutex implementation. Because we block on the mutex for such a small amount of time, it was <em>highly</em> unlikely to hit the kernel path. When
we did defer to the kernel (or Wine’s implementation anyway), we had a bug in our anti-stampeding behaviour which would cause reader threads to
never wake up. Additionally, we had a bug in our declaration of the “RtlWaitOnAddress” function, where we were only waiting on a 32-bit value,
causing the process to never wake up and usually crash.</p>

<p>A very long winded way to say we had two simple bugs that were rare and infuriating to debug because of a race.</p>

<h3 id="jit-fixes">JIT fixes</h3>
<p>This month there weren’t actually that many JIT fixes. We found and fixed a bug that was breaking Ubisoft’s UPlay program.
Additionally we resolved some handling of self-modifying code in our Wine implementation, which could fix some spurious hangs or incorrect
invalidations.</p>

<h3 id="minor-linux-syscall-fix">Minor Linux syscall fix</h3>
<p>This month we noticed that Steam was using some new fcntl syscall operations that we didn’t handle, which caused Steam to crash in some rare
edge cases when it actually issued them. We have now resolved this, and future-proofed against new commands by passing unknown ones directly to
the kernel.</p>

<h3 id="more-code-caching-implementation-work">More code caching implementation work</h3>
<p>This is definitely the task that had the most code land for it. Code caching support was wired up on both the Linux and Wine sides. While still
very much a work in progress, we have code caches being generated and loaded at runtime to reduce the amount of time spent running in the JIT.
In particular, this can be thought of as trading disk space for CPU time, reducing JIT stutter when the code already exists in the cache.
There’s still lots of work to go to make this viable for users, but we’ll be trucking along as usual!</p>

<h3 id="39c3-talk-from-neobrain">39C3 talk from <a href="https://mastodon.social/@neobrain">@neobrain</a></h3>
<p>This last month, neobrain gave a talk going through some of the architecture of FEX-Emu. It’s very informative and definitely worth the watch!</p>
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/3yDXyW1WERg" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe>

<hr />

<p>See the <a href="https://github.com/FEX-Emu/FEX/releases/tag/FEX-2601">2601 Release Notes</a> or the <a href="https://github.com/FEX-Emu/FEX/compare/FEX-2512...FEX-2601">detailed change log</a> in Github.</p>]]></content><author><name>FEX-Emu Maintainers</name></author><summary type="html"><![CDATA[As the developers awaken from their holiday-induced hibernation, another release is upon us in the new year! Let’s see what we managed to implement before hibernation snuck up on us.]]></summary></entry><entry><title type="html">FEX 2512 Tagged</title><link href="https://fex-emu.com/FEX-2512/" rel="alternate" type="text/html" title="FEX 2512 Tagged" /><published>2025-12-05T00:00:00+00:00</published><updated>2025-12-05T00:00:00+00:00</updated><id>https://fex-emu.com/FEX-2512</id><content type="html" xml:base="https://fex-emu.com/FEX-2512/"><![CDATA[<p>Another month and here we are with a new release! We also celebrated our <a href="/FEXiversary/">seven year anniversary</a> late last month; but enough about that boring
stuff, let’s talk about what we improved!</p>

<h3 id="remap-procfs-cmdline-using-pr_set_mm_map">Remap procfs <strong>cmdline</strong> using <strong>PR_SET_MM_MAP</strong></h3>
<p>This has been a thorn in our side for a while. When an application read <strong>cmdline</strong>, FEX needed to rewrite the file contents to remove the
FEXInterpreter argument. It turns out the kernel has had a feature to remap this file for quite a while; we just weren’t utilizing it.
Now, instead of mangling the data, we use the correct interface from the kernel. This means that things like Mesa application profiles and KDE
Plasma see the correct application name in all instances.</p>

<p><img src="/images/posts/2512-12-05/fex_task_manager_name.png" alt="KDE Plasma before" />
<img src="/images/posts/2512-12-05/fex_task_manager_name_after.png" alt="KDE Plasma after" /></p>

<p>Big shoutout to the external contributors that implemented this for us!</p>

<h3 id="implement-support-for-jit-codebuffer-guard-page-based-restart">Implement support for JIT codebuffer guard page based restart</h3>
<p>This one takes a bit to explain. When writing our AArch64 code emitter, we made the decision not to do range checks for how much memory remains
in our JIT code buffer. We instead used a heuristic to estimate how much space is required, which usually worked. The problem with heuristics, of
course, is that they can fail, and our “fallback” case was to crash. This was a known problem that we would need to resolve at some point, and this
month we finally got around to it. Because we now utilize larger “multiblock” JIT blocks, the chance of hitting this crash had grown, usually in
x87-heavy code where the JIT translation is large.</p>

<p>Now when the heuristic fails, our code emitter will write to our guard page; we catch the resulting <strong>SIGSEGV</strong> and restart the JIT with a larger
code buffer. This fixes these edge case crashes and makes our JIT more robust in the process.</p>

<h3 id="initial-code-caching-features-landing">Initial code caching features landing</h3>
<p>There’s an absolute ton of work going in to this and it’s not yet ready for users, but it would be remiss not to call out all the effort on this
front. This month we landed initial support for “code maps” and offline “code cache” generation. There is not yet any way for a user to actually
utilize these code maps and caches, but these are the required steps towards the transparent code caching that we are expecting to have. Watch this
space over the coming months as we finish wiring the feature up fully.</p>

<h3 id="fixes-apicid-count">Fixes APICID count</h3>
<p>This is a bit of a weird feature that we had accidentally missed. When reading CPUID, processes get what is called an APIC ID, which is essentially
just a core index. Some applications use this ID to determine how many unique CPU cores are available on the system. We were accidentally always
returning zero, which made some applications think the system had only one CPU. With this fixed, the FPGA software in which this was detected now
spawns the correct number of worker threads for the cores in the system. This improves its synthesis times dramatically, since they scale well with
the number of cores in the system.</p>

<h3 id="disable-io_uring-syscalls">Disable io_uring syscalls</h3>
<p>Our good friends over at <a href="https://felix86.com/">felix86</a> alerted us to an issue around io_uring causing infinite loops in node.js and libuv. Upon further
investigation we determined that there is an ABI break in io_uring between x86 and Arm64 that we previously didn’t know about. It comes down to how
the user submission queues in io_uring can embed epoll_event structures, which have different layouts between the architectures.</p>

<p>Because we can’t safely rewrite the queue data to handle this layout difference, we have determined the only course of action is to disable the
syscalls. Luckily most games don’t rely on this syscall interface or applications will have a legacy fallback for when it is unsupported. In that
vein, node.js now works again.</p>

<h3 id="feat_lrcpc2-performance-errata">FEAT_LRCPC2 performance errata</h3>
<p>This month we found out that a large number of Cortex and Neoverse CPU cores have an errata that affects only the instructions added in FEAT_LRCPC2.
We have disabled this extension on the affected CPU cores, which can give a reasonable performance improvement in games that were bound by TSO emulation.</p>

<h3 id="jit-and-emulation-bug-fixes">JIT and emulation bug fixes</h3>
<p>There were a bunch of bug fixes in both our JIT and Linux syscall emulation this month as usual, but this month’s report is already running long so if
you’re interested, take a peek at our pull requests to find out more.</p>

<hr />

<p>See the <a href="https://github.com/FEX-Emu/FEX/releases/tag/FEX-2512">2512 Release Notes</a> or the <a href="https://github.com/FEX-Emu/FEX/compare/FEX-2511...FEX-2512">detailed change log</a> in Github.</p>]]></content><author><name>FEX-Emu Maintainers</name></author><summary type="html"><![CDATA[Another month and here we are with a new release! We also celebrated our seven year anniversary late last month; but enough about that boring stuff, let’s talk about what we improved!]]></summary></entry><entry><title type="html">FEX seven year anniversary!</title><link href="https://fex-emu.com/FEXiversary/" rel="alternate" type="text/html" title="FEX seven year anniversary!" /><published>2025-11-28T00:00:00+00:00</published><updated>2025-11-28T00:00:00+00:00</updated><id>https://fex-emu.com/FEXiversary</id><content type="html" xml:base="https://fex-emu.com/FEXiversary/"><![CDATA[<p><a href="/images/posts/2025-11-28/fex-cake-final.avif"><img src="/images/posts/2025-11-28/fex-cake-final.avif" alt="FEX anniversary cake" /></a></p>

<p>Hey everyone! In light of the recent product announcements using FEX, I wanted to take a moment to celebrate our anniversary! On this day seven years ago (28th Nov 2018), I landed the first commit in the prototype project that would eventually become FEX-Emu! I want to thank the people from Valve for being here from the start and allowing me to kickstart this project. They trusted me with the responsibility of designing and frameworking the project in a way that it can work long-term; not only for their use cases but also keeping it an open project that anyone can adapt for their own use cases.</p>

<p>Additionally I need to thank the great collaborators that have worked on the project over the years, because this project wouldn’t be here without them as well! Myself alone wouldn’t be enough to push us over the finish line, as it was a large collaborative effort to get us where FEX is today! That said, I also need to give a huge shoutout to the community that has grown over the years. Everyone’s tireless effort at testing weird edge case applications and reporting bugs/performance/etc problems has buffed out a significant number of problems before even being “user-viable.”</p>

<p>This is only the beginning of our journey; FEX is still maturing and there is still a ton more we need to grow. FEX has been a wild ride and it has been so great sharing this project with the world!</p>

<p>Keep gaming, gamers!</p>

<p>— Ryan H</p>

<!-- The meme is the emdash is necessary to not create a list -->]]></content><author><name>Ryan Houdek</name></author><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">FEX 2511 Tagged</title><link href="https://fex-emu.com/FEX-2511/" rel="alternate" type="text/html" title="FEX 2511 Tagged" /><published>2025-11-05T00:00:00+00:00</published><updated>2025-11-05T00:00:00+00:00</updated><id>https://fex-emu.com/FEX-2511</id><content type="html" xml:base="https://fex-emu.com/FEX-2511/"><![CDATA[<p>You would think doing this month after month we would eventually run out of things to work on, but in true emulator fashion the work never ends. Let’s
jump into what has changed for the release this month!</p>

<h2 id="more-jit-improvements-is-this-surprising-yet">More JIT improvements (Is this surprising yet?)</h2>
<p>This month we have another smattering of changes that primarily affect the JIT, but also some related systems around it, let’s break it down.</p>

<h3 id="potential-memory-savings">Potential memory savings</h3>
<p>It’s been known that FEX’s memory usage isn’t amazing: we add a decent amount of memory overhead to every emulated process, which adds up
quickly. This becomes a huge burden on systems with only 8GB of RAM, but also hits 16GB users; this is why most of our developers run systems
with at least 32GB of RAM to sidestep the problem. The vast majority of the overhead comes from our JIT’s lookup caches, called the L1, L2, and L3
caches depending on the tier. The L1 and L2 caches are per-thread resources, while the L3 cache is shared between all threads in a process. All
these caches do is store where to find the JIT code for relevant x86 code; it’s not the JIT code itself until the L3 cache!</p>

<p>It turns out that these per-thread caches can consume quite a bit of memory, and because they are per-thread, usage scales heavily for games
that create many threads. In some heavier games like Death Stranding, the combined L1 and L2 cache sizes can end up consuming around a gigabyte of
memory, just for lookup tables to find JIT code! We’ve seen other games consume more or less depending on what they do, usually growing the longer
a game runs.</p>

<p>To help mitigate this problem, we are introducing two new FEX options. The first option is to disable the L2 cache entirely. While this is a fairly
heavy hammer, it’s the L2 that consumes the majority of the RAM so it’s a good first step. The second option enables a heuristic to grow or shrink the
L1 cache dynamically based on how frequently it is used. This also can dramatically reduce the size of the L1 cache, but because it’s <strong>usually</strong> only a
few dozen megabytes, it isn’t quite as interesting.</p>

<p>The reason these options aren’t enabled by default is that there is a chance for them to introduce stuttering, which is hard to
distinguish from regular “JIT” stutter because it tends to happen at the same time. This is also why we have introduced another optimization:
we have implemented our own writer-priority mutex with super low latency for our lookup caches. This cuts the lock contention time
compared to our previous C++ mutex to about a third, which helps reduce stuttering.</p>

<p>So go out, enable the options if you’re in a low-memory situation and let us know if it works well for you!</p>

<h3 id="fixes-crashes-due-to-out-of-bounds-branch-encoding">Fixes crashes due to out of bounds branch encoding</h3>
<p>One issue that FEX has fought over the past couple of years is that when our JIT encountered a problem with branch targets, it couldn’t restart and try
again. This would result in either broken code being generated or FEX throwing a message about it, both of which typically result in a game
crashing. This month we introduced the ability to safely restart the JIT in this situation and compile again knowing that branch targets
need to be “far jump” aware. This mostly fixes older games that rely heavily on x87, but the problem can technically occur in a bunch of random edge cases.</p>

<p>If you have had a game that spuriously crashed, this might fix it!</p>

<h3 id="enable-avx-for-32-bit-by-default">Enable AVX for 32-bit by default</h3>
<p>This is a fairly minor change: for Linux emulation we previously weren’t enabling AVX out of concern for potential stack overflow problems. We have
fixed a couple of edge case bugs and decided to turn it on by default, since some algorithms gain performance from AVX and we want to ensure we
capture that. If we find any games that hit stack overflows with AVX enabled, a game profile to disable it is trivial.</p>

<p>This isn’t wired up to the WoW64 emulation because of some missing features in the Wine/Windows side, but that lives outside of FEX’s control.</p>

<h3 id="performance">Performance!</h3>
<p>This month we also added a few performance improvements. Primarily, we fixed an oversight where string instructions still used the TSO memory model
by default, which caused significant performance issues in games like Dishonored. We also slightly optimized the x87 register exchange instruction.</p>

<h2 id="minor-bug-fixes-this-month">Minor bug fixes this month</h2>
<p>We fixed a handful of bugs around the project. We found that the game Ender Magnolia was crashing when thunks were enabled, due to a quirky
interaction between OpenGL and Vulkan. Using GL and Vulkan thunks at the same time should be a bit more stable this month.</p>

<p>Additionally we had a few bugs show up in our memory allocator which have been stamped out. These were also showing up in Ender Magnolia by chance,
resulting in some hard to diagnose memory corruption. With that fixed our memory allocator is now even more robust!</p>

<hr />

<p>See the <a href="https://github.com/FEX-Emu/FEX/releases/tag/FEX-2511">2511 Release Notes</a> or the <a href="https://github.com/FEX-Emu/FEX/compare/FEX-2510...FEX-2511">detailed change log</a> in Github.</p>]]></content><author><name>FEX-Emu Maintainers</name></author><summary type="html"><![CDATA[You would think doing this month after month we would eventually run out of things to work on, but in true emulator fashion the work never ends. Let’s jump in to what has changed for the release this month!]]></summary></entry><entry><title type="html">FEX 2510 Tagged</title><link href="https://fex-emu.com/FEX-2510/" rel="alternate" type="text/html" title="FEX 2510 Tagged" /><published>2025-10-08T00:00:00+00:00</published><updated>2025-10-08T00:00:00+00:00</updated><id>https://fex-emu.com/FEX-2510</id><content type="html" xml:base="https://fex-emu.com/FEX-2510/"><![CDATA[<p>We’re just gonna kick out this little release and be on our way. There might be some interesting things this month, read and find out!</p>

<h2 id="jit-improvements">JIT Improvements</h2>
<p>This month we have had various JIT improvements and bug fixes. Most of which are bug fixes but we do have some performance optimizations as well.</p>
<ul>
  <li>Fixed a potential crash with x87 loadstore operations</li>
  <li>Fixed a potential crash with AVX on 32-bit applications</li>
  <li>Fixes legacy fxsave/fxrstor x87 register saving and restoring</li>
  <li>Fix telemetry on legacy segment register usage</li>
  <li>Avoid flushing MMX registers until MMX-&gt;x87 transition</li>
  <li>Fix NaN propagation behaviour in x87</li>
  <li>Implement support for SSE4a</li>
  <li>Fix SIGILL reporting
    <ul>
      <li>Fixes Mafia II Classic</li>
    </ul>
  </li>
  <li>ARM64ec: Check for suspend interrupts on backedges
    <ul>
      <li>Fixes UPlay behaviour</li>
    </ul>
  </li>
</ul>

<h2 id="better-cache-x87-intermediate-results-in-slow-path">Better cache x87 intermediate results in “slow path”</h2>
<p>To understand what this is doing, one must first understand the optimizations that our JIT performs for most x87 operations. As everyone knows, x87 is
a wacky stack-based architecture where you push data onto a stack before doing operations. When done, you then pop the results off into memory.</p>

<p>Contrary to popular belief, this isn’t how most architectures today handle floating point operations. Both ARM’s ASIMD/NEON and x86’s SSE instead work
directly on this popular concept called “registers.” So when translating x87 behaviour to what ARM does, it doesn’t really match!</p>

<p>To combat this, FEX has a very extensive “x87StackOptimizationPass” which optimizes our IR to remove stack usage behaviour as much as possible. This
brings significant performance improvements in x87-heavy games because all the x87 stack management typically gets deleted! But this does have the
downside that if FEX doesn’t see all of the x87 stack usage, we must fall back to a safe “slow path” which does correct stack management. This means
some code doesn’t get the benefits of the optimization pass.</p>

<p>This month our developer <a href="https://github.com/bylaws">bylaws</a> decided to tackle this problem and improve the situation. When we hit
this slow path, we were not caching any of the intermediate stack calculations, which caused more overhead than necessary. With this
significant improvement, the number of ARM instructions needed to run the slow path has dropped dramatically!</p>

<p>Some examples:</p>
<ul>
  <li>25 x86 instructions: Used to take 169 ARM instructions, now only 72.
    <ul>
      <li>This was in Half-Life</li>
    </ul>
  </li>
  <li>214 x86 instructions: 3165 -&gt; 1743!
    <ul>
      <li>This was in Oblivion</li>
    </ul>
  </li>
  <li>351 x86 instructions: 5270 -&gt; 2809!
    <ul>
      <li>This was Psychonauts</li>
    </ul>
  </li>
</ul>

<p>As one can see, there are some significant instruction count reductions, which directly correlate with performance improvements! Countless
games using x87 will see better performance from this!</p>

<h2 id="memory-usage-hunting">Memory usage hunting</h2>
<p>This month we started hunting down FEX’s memory usage. It has been known for a while that FEX’s memory usage could be better, tripping up 16GB
devices, and even more so 12GB or 8GB devices. This has been a known thorn, but there were higher priority problems to address before we could
work towards solving this.</p>

<p>This month we added the ability to name our memory allocations so we can start tracking usage by what is actually allocating it. We also went
through some data structures and made them smaller. While this month only brought minor improvements, expect more significant memory savings soon;
we have a bunch in flight as we roll out this release!</p>

<p>(P.S. We also fixed a bug where we allocated 16MB of RAM from 32-bit guests.)</p>

<h2 id="implement-support-for-extended-volatile-metadata-on-linux">Implement support for “Extended Volatile Metadata” on Linux</h2>
<p>Taking a leaf from what Microsoft has done in their compiler, we have implemented a feature called “Extended Volatile Metadata”, lovingly shortened
to <strong>EVMD</strong>. To understand how this feature works, we first need to know how Microsoft’s “Volatile Metadata” itself works.</p>

<p>In Windows land, Microsoft knew that emulating the x86 memory model is one of the most costly things emulators need to do. To help alleviate this
strain, they employed compiler assistance to know when code doesn’t need to strictly follow the x86 memory model. The Microsoft Visual Studio
compiler, 2019 and newer, inspects code as it is compiled and determines whether it requires thread-safe memory semantics. When a piece of code is
determined safe to run under a weaker memory model, it is added to the volatile metadata, which is stored inside the executable once the entire
program is compiled.</p>

<p>When the executable runs on x86 hardware, this data is ignored and nothing changes. On the other hand, when running on a Windows Arm64 device,
this data structure is fed into their emulator and used to determine if x86-TSO emulation can safely be turned off! This gives a dramatic speedup
for any application that actually has this metadata. In fact, it is so useful that ARM64EC FEX uses the data as well to ensure we get similar
performance improvements!</p>

<p>The new feature that FEX has implemented allows manual user intervention to declare regions of code that are known to be safe to run without TSO
emulation. It works just like the regular metadata, but also covers Linux applications and Windows applications compiled without it or before 2019!</p>

<p>Sadly this isn’t automatic and requires significant work from either FEX developers or users to safely determine hot code blocks. For example, a game
that spends more than 80% of its CPU time in a memcpy function can be safely optimized with a simple FEX option to enable EVMD.</p>
<ul>
  <li><strong>FEX_EXTENDEDVOLATILEMETADATA=iw4sp.exe\;0xe9da0-0xe9ec7</strong></li>
</ul>

<p>Hopefully this feature will be used by our users for games they’ve determined run too slowly with TSO emulation enabled but crash with TSO emulation
disabled! It gives us a little step in the middle to work around the <strong>FEAT_LRCPC</strong> extension.</p>

<h2 id="good-bye-fexloader-and-fexinterpreter">Good bye FEXLoader and FEXInterpreter!</h2>

<p>FEX has historically had two methods of invocation: FEXInterpreter, which directly starts emulation of the given program, and FEXLoader, which additionally supports overriding FEX configuration using command line arguments. We also support such config overrides using environment variables like <code class="language-plaintext highlighter-rouge">FEX_MULTIBLOCK</code>, though, so we decided to remove the redundant FEXLoader binary to ease maintenance. With environment variables, per-app configuration files, and the graphical FEXConfig tool for permanent changes, we cover most use cases.</p>

<p>With FEXLoader out of the picture, we took the opportunity to rename <code class="language-plaintext highlighter-rouge">FEXInterpreter</code> to just <code class="language-plaintext highlighter-rouge">FEX</code>. The old name is still available as an alias for now, but we recommend updating any scripts to avoid friction in the near future.</p>
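<p>For scripts that previously used FEXLoader’s command-line flags, the equivalent pattern is an environment variable override on the renamed binary. A minimal sketch, where the guest application path is a placeholder:</p>

```shell
# Override a FEX config option through the environment instead of
# command-line flags on the removed FEXLoader binary.
export FEX_MULTIBLOCK=0
# FEX ./my_x86_app   # placeholder; requires an installed FEX and an x86 guest binary
echo "FEX_MULTIBLOCK=$FEX_MULTIBLOCK"
```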

<h2 id="a-small-change-for-building-fex-tests">A small change for building FEX tests</h2>

<p>Our build scripts used to have two variables that needed to be set to enable testing: BUILD_TESTS and BUILD_TESTING. Setting the wrong one when toggling tests on or off would have surprising side effects (like CTest running old build artifacts!), so we invested some time to refactor our CMake code and unify the two. In the new release, testing is now controlled using the <code class="language-plaintext highlighter-rouge">BUILD_TESTING</code> variable, as in any other CMake-based project.</p>
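<p>As a sketch, a configure-and-test run under the unified switch might look like this; the checkout and build directory names are placeholders:</p>

```shell
# Configure a FEX checkout with tests enabled; BUILD_TESTING now replaces
# the old BUILD_TESTS/BUILD_TESTING pair. Directory names are placeholders.
cmake -S FEX -B build -DBUILD_TESTING=ON
# Run the suite through CTest as usual:
ctest --test-dir build
```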

<hr />

<p>See the <a href="https://github.com/FEX-Emu/FEX/releases/tag/FEX-2510">2510 Release Notes</a> or the <a href="https://github.com/FEX-Emu/FEX/compare/FEX-2509...FEX-2510">detailed change log</a> on GitHub.</p>]]></content><author><name>FEX-Emu Maintainers</name></author><summary type="html"><![CDATA[We’re just gonna kick out this little release and be on our way. There might be some interesting things this month, read and find out!]]></summary></entry><entry><title type="html">FEX 2509 Tagged</title><link href="https://fex-emu.com/FEX-2509/" rel="alternate" type="text/html" title="FEX 2509 Tagged" /><published>2025-09-09T00:00:00+00:00</published><updated>2025-09-09T00:00:00+00:00</updated><id>https://fex-emu.com/FEX-2509</id><content type="html" xml:base="https://fex-emu.com/FEX-2509/"><![CDATA[<p>After last month’s enormous improvements, this release will look quite tame in comparison. We still did a bunch of work, though, so let’s dive in.</p>

<h2 id="more-jit-improvements">More JIT improvements</h2>
<p>Not quite as striking, but we still did work on the JIT this past month. As usual there are too many changes to go over individually, so here’s a list.</p>
<ul>
  <li>Move Arm64EC EC entrypoint check to L1 branch target lookup</li>
  <li>Fix 32-bit zero-extend semantics on CMPXCHG</li>
  <li>Fix 16-bit operand ENTER/LEAVE instructions</li>
  <li>Implement support for JMPF/CALLF/RETF with operating mode switch
    <ul>
      <li>Mode switch will still cause crashes until remaining implementation details are sorted</li>
    </ul>
  </li>
  <li>Optimize x87 FTW calculation</li>
  <li>Fix IEEE 754 unordered comparison in x87</li>
  <li>Various JIT execution time improvements to reduce stutters</li>
  <li>Recheck TF flag after POPF instruction to ensure correct behaviour</li>
  <li>Fix some OOB reads for specific float instructions</li>
  <li>Always explore fallthrough conditional branches</li>
</ul>

<h2 id="mono-specific-performance-hacks-for-arm64ecwow64">Mono specific performance hacks for ARM64EC/WOW64</h2>
<p>Mono’s JIT has consistently been a pain for us: its use of <a href="https://en.wikipedia.org/wiki/Self-modifying_code">self-modifying code</a> causes
significant code invalidation and stutters due to how we interact with each other. When JITs get stacked in emulation it is almost always a bad time.
We now detect Mono on Arm64EC/WOW64 and apply some very specific optimizations. I won’t go into the details since it’s fairly complex, but when
Mono is detected we end up with less stutter, more performance, and fewer crashes when TSO memory model emulation is disabled.</p>

<h2 id="disable-3dnow-by-default-on-wine-wow64-binary">Disable 3DNow! by default on Wine WOW64 binary</h2>
<p>We found out this month that Fallout: New Vegas will only render a black screen when <a href="https://en.wikipedia.org/wiki/3DNow!">3DNow!</a> is enabled. This is seemingly due to specific optimized
code routines that the D3D9X support library uses for various things. We currently assume this is due to how we implemented the 3DNow!
reciprocal instructions, where they behave like the <a href="https://en.wikipedia.org/wiki/Geode_(processor)">Geode</a> implementation rather than “typical” AMD
3DNow! versions. This could mean that these games also render black screens on Geodes, but getting an <a href="https://en.wikipedia.org/wiki/Geode_(processor)">OLPC XO-1</a> today to
try the game seems a bit excessive. Instead, we built a Phenom system to test our suspicions, and the game renders fine on that real hardware. Until we
have a soft-float implementation of these instructions we won’t have complete proof, but it’s something to work towards.</p>

<p>Until then, we are going to disable 3DNow! in this configuration, which causes the d3d9x support library to use
<a href="https://en.wikipedia.org/wiki/SSE2">SSE2</a> codepaths instead, resolving the black screen rendering in the process.</p>

<h2 id="fix-brk-size-calculation">Fix BRK size calculation</h2>
<p>When emulating a Linux application, we need to set up a small amount of memory higher in the address space for the <a href="https://www.man7.org/linux/man-pages/man2/brk.2.html">“program
break”</a>. This is a bit of legacy memory allocation that tends to get used early in a program’s
life-cycle, after which everything ends up using mmap. Our original size calculation for this was incorrect, causing part of our
ELF loading to overlap this range. This was usually harmless because the overlap typically fell within a BSS segment of the ELF that was aligned to a
larger page size.</p>

<p>The bigger problem is that we could have inadvertently allowed 32-bit applications to allocate memory into the low pages of the 64-bit memory space, which
could result in emulation bugs. Additionally, we were always reserving a full 8MB of virtual memory space; that isn’t a lot, but there are late
generation 32-bit games that run into virtual address space exhaustion, so giving back as much of that 8MB as possible means they are less likely to
OOM themselves.</p>

<p>It’s a little thing, but nice to see fixed.</p>

<h2 id="force-wow64-allocations-outside-of-the-32-bit-memory-space">Force WOW64 allocations outside of the 32-bit memory space</h2>
<p>Turns out FEX was accidentally allocating some memory in the 32-bit address space when running in WOW64 Wine emulation mode. Oops! We have now fixed
this, which, similar to the last point, means 32-bit games are less likely to OOM.</p>

<h2 id="updates-to-downstream-fex-packages">Updates to downstream FEX packages</h2>

<p>We have news for users of the FEX packages included in certain Linux distributions:</p>

<p>Apple Silicon users running <strong>Fedora Asahi Remix</strong> (and others that use <code class="language-plaintext highlighter-rouge">muvm</code>) may have seen
an issue where FEX failed to start after customizing its configuration. The issue was that
our graphical FEXConfig tool overrode muvm’s internal RootFS setup, so
<a href="https://github.com/FEX-Emu/FEX/pull/4802">we fixed the problem</a> by no longer touching
settings managed by the microVM. With the new FEX release, you can use FEXConfig like on any other
distribution: To configure logging, to change TSO configuration, to enable library forwarding,
and more.</p>

<p>Speaking of library forwarding: This feature provides significant speed-ups by omitting
costly emulation and instead directly calling into native ARM64 libraries. Unfortunately
only a few distributions have so far been able to provide this feature in their FEX packages:
Our Ubuntu PPA includes it out of the box, and Fedora users can install fex-emu-thunks from
the official repositories. Joining these ranks this month is <strong>NixOS</strong> thanks to our
<a href="https://github.com/NixOS/nixpkgs/pull/413255">nixpkgs pull request</a> being merged.
This support is somewhat preliminary since there are open questions around the unique approach
to dynamic linking taken in Nix/NixOS, but nixpkgs community members are already looking into
this problem. Hopefully it won’t be long until enjoyers of reproducible builds have the
same smooth experience as Fedora and Ubuntu users!</p>

<hr />

<p>See the <a href="https://github.com/FEX-Emu/FEX/releases/tag/FEX-2509">2509 Release Notes</a> or the <a href="https://github.com/FEX-Emu/FEX/compare/FEX-2508...FEX-2509">detailed change log</a> on GitHub.</p>]]></content><author><name>FEX-Emu Maintainers</name></author><summary type="html"><![CDATA[After last month’s enormous improvements, this release will look quite tame in comparison. We still did a bunch of work, though, so let’s dive in.]]></summary></entry><entry><title type="html">FEX 2508 Tagged</title><link href="https://fex-emu.com/FEX-2508/" rel="alternate" type="text/html" title="FEX 2508 Tagged" /><published>2025-08-01T00:00:00+00:00</published><updated>2025-08-01T00:00:00+00:00</updated><id>https://fex-emu.com/FEX-2508</id><content type="html" xml:base="https://fex-emu.com/FEX-2508/"><![CDATA[<p>You thought we were done with optimizations? Too bad, we had some massive improvements this month! Let’s jump in!</p>

<h2 id="big-juicy-jit-optimizations">Big juicy JIT optimizations!</h2>
<p>The improvements this month can’t be overstated for how much performance has been lifted. To start off, let’s show off some performance graphs for a handful of games!</p>

<div id="container_perc" style="min-width: 250px; height: 300px; margin: 0 auto">
</div>

<p>And a chart for the averaged FPS numbers recorded from each of these games.</p>
<div id="container_average" style="min-width: 250px; height: 300px; margin: 0 auto">
</div>

<p>As you can see from the tested games, the improvements can be wild depending on what the game is doing! A nearly 39% FPS uplift in Cyberpunk 2077 is
wacky! In our testing the uplift tends to be closer to Cyberpunk’s, though there are of course other games like God of War where the
uplift is minimal!</p>

<p>The majority of this performance uplift has come from call-return stack optimizations, where we are now able to take advantage of the ARM CPU’s own
call-return prediction hardware, but we have had a variety of optimizations this month that improve both JIT compilation time and execution
time! Additionally, we now compile significantly less code: multiblock used to cause a combinatorial explosion of JIT compiles, but each individual
block of JIT code is now freestanding and usually gets compiled only once.</p>

<p>Another improvement this month is that the WINE wow64/arm64ec libraries can now take advantage of Apple Silicon’s hardware TSO feature, which hadn’t
been wired up in that code path. This will significantly improve performance on that hardware for anyone who puts in the effort to run a game in that
environment.</p>

<p>There are a few other JIT improvements, but we could spend all day here if we talked about everything! Have fun gaming with the performance improvements!</p>

<h2 id="implement-nx-bit">Implement NX bit</h2>
<p>This is a fun little security feature which prevents applications from executing code that isn’t mapped executable. It has been around
for a long time in hardware, but FEX has finally implemented it! This fixed a single game that we know of, which tests to ensure the security
feature is enabled. This usually isn’t a problem for most games, but it is kind of funny that a game using NaCl didn’t work because of it.</p>

<h2 id="more-anti-debuggertamper-improvements">More anti-debugger/tamper improvements</h2>
<p>This month also brought a bunch of improvements around behaviour that only shows up in anti-tamper or debugger code. Specifically, we found
that <strong>Peggle Deluxe</strong> and <strong>Crysis 2: Maximum Edition</strong> usually worked under FEX, but relied on some subtle self-modifying code that only happened to
work. These tricks are for anti-tamper, anti-debugging, or maybe even a way to block data mining. We don’t know for sure, but since the games rely
on them, we just need to support those forms of self-modifying code.</p>

<p>This may also happen to get <strong>some</strong> versions of Denuvo anti-tamper working under FEX, but it isn’t guaranteed and will depend on the game and the
particular flavour of that anti-tamper software.</p>

<h2 id="upload-wine-dll-artifacts">Upload WINE DLL artifacts</h2>
<p>This is a minor thing, but we are trying out uploading the WINE wow64/arm64ec DLL files for every commit. These can be found on <a href="https://github.com/FEX-Emu/FEX/actions/workflows/wine_dll_artifacts.yml">our GitHub Actions
page</a>. This is for people that want to tinker with the main branch under
arm64 wine. As usual, we recommend the official releases on our <a href="https://launchpad.net/ex-emu/+archive/ubuntu/fex">Launchpad PPA</a>, but adventurous
users always want more.</p>

<p>See the <a href="https://github.com/FEX-Emu/FEX/releases/tag/FEX-2508">2508 Release Notes</a> or the <a href="https://github.com/FEX-Emu/FEX/compare/FEX-2507...FEX-2508">detailed change log</a> on GitHub.</p>

<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js">
</script>

<script src="https://code.highcharts.com/highcharts.js">
</script>

<script src="https://code.highcharts.com/modules/exporting.js">
</script>

<script type="text/javascript">
Highcharts.chart('container_perc', {
    chart: {
        type: 'column'
    },
    title: {
        text: 'FEX-2508 improvement over FEX-2507.1'
    },
    xAxis: {
        categories: ['Cyberpunk 2077', 'Stray', 'God of War 2018', 'Doom 2016', 'Grim Fandango Remastered', 'Teardown'],
        crosshair: true,
    },
    yAxis: {
        min: 0,
        title: {
            text: '% FPS improvement'
        }
    },
    tooltip: {
        valueSuffix: ' (%)'
    },
    plotOptions: {
        column: {
            pointPadding: 0.2,
            borderWidth: 0
        }
    },
    series: [
        {
            name: '% FPS improvement',
            data: [38.9, 25.2, 4.6, 30.9, 24.7, 12.6]
        }
    ]
});

Highcharts.chart('container_average', {
    chart: {
        type: 'column'
    },
    title: {
        text: 'FPS averages'
    },
    xAxis: {
        categories: ['Cyberpunk 2077', 'Stray', 'God of War 2018', 'Doom 2016', 'Grim Fandango Remastered', 'Teardown'],
        crosshair: true,
    },
    yAxis: {
        min: 0,
        title: {
            text: 'FPS'
        }
    },
    tooltip: {
        valueSuffix: ' (FPS)'
    },
    plotOptions: {
        column: {
            pointPadding: 0.2,
            borderWidth: 0
        }
    },
    series: [
        {
            name: 'FEX-2507.1',
            data: [50.3, 34.1, 64.4, 150.7, 360, 37.8]
        },
        {
            name: 'FEX-2508',
            data: [69.9, 42.7, 67.4, 197.4, 449, 42.6]
        },
    ]
});
</script>]]></content><author><name>FEX-Emu Maintainers</name></author><summary type="html"><![CDATA[You thought we were done with optimizations? Too bad, we had some massive improvements this month! Let’s jump in!]]></summary></entry><entry><title type="html">FEX 2507 Tagged</title><link href="https://fex-emu.com/FEX-2507/" rel="alternate" type="text/html" title="FEX 2507 Tagged" /><published>2025-07-08T00:00:00+00:00</published><updated>2025-07-08T00:00:00+00:00</updated><id>https://fex-emu.com/FEX-2507</id><content type="html" xml:base="https://fex-emu.com/FEX-2507/"><![CDATA[<p>This month we slowed down to take our time and we got some good fixes in store for you.</p>

<h2 id="horizon-zero-dawn-slow-motion-physics-fix">Horizon Zero Dawn slow-motion physics fix</h2>
<p>This month it was brought to our attention that the game <a href="https://store.steampowered.com/app/2561580/Horizon_Zero_Dawn_Remastered/">Horizon Zero Dawn</a> was running in slow motion. Even though the FPS was relatively stable, the physics were all running at about a third of the speed!
This turned out to be a pretty silly bug. WINE fills a registry key with the frequency of the cycle counter, but it first determines whether RDTSC is
“reliable”. FEX was failing this reliability check, which caused WINE to fall back to the maximum clock speed of the CPU. HZD would then use this value
for the speed of its animations! A modern CPU can run at more than 3GHz, while cycle counters on both ARM and x86 don’t go anywhere near as
high! We fixed WINE’s “reliability” check inside of FEX, which means the registry key is filled correctly and the game now runs its animations at the correct speed.</p>

<p>This does mean that WINE technically still has a bug where, if RDTSC is ever described as “unreliable”, you can end up with something up to 6GHz in
that registry key, which is incorrect and will reproduce this bug even without emulation playing a role.</p>
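<p>To put rough numbers on the slow-motion effect (the frequencies here are purely illustrative): if the cycle counter actually ticks at 1GHz but the game divides cycle deltas by a 3GHz value read from the registry, each real second only advances the simulation by about a third of a second.</p>

```shell
actual_hz=1000000000      # illustrative frequency the counter really ticks at
reported_hz=3000000000    # illustrative max CPU clock wrongly written to the registry
# Cycles accumulated over one real second, converted with the wrong frequency:
apparent_ms=$(( actual_hz * 1000 / reported_hz ))
echo "${apparent_ms} ms of game time per real second"
```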

<p>As a side note, ARM64EC WINE still isn’t fixed with this, so the game will still have weird issues there under emulation. It’s <a href="https://gitlab.winehq.org/wine/wine/-/merge_requests/8506">getting
fixed</a> but will take some time!</p>

<h2 id="optimize-x87-fsincos">Optimize x87 FSINCOS</h2>
<p>x87 is a monster of a feature that keeps coming back to bite us in performance because of its soft-float requirements. This month we found a
<a href="https://wiki.fex-emu.com/index.php/Bayonetta">game</a> that hammers the FSINCOS instruction with up to a staggering 70 million soft-float operations
per second!</p>

<p>To help mitigate this a bit, we found that we could optimize the FSINCOS soft-float implementation slightly. This gave us a 75% uplift in
operations-per-second, from 8,949 per second to 15,676 per second. While this isn’t enough to make this game run full speed, it is a significant
uplift and there may be further optimizations that can be found in the future.</p>

<p>It should be noted that with this game it is recommended to use our x87 “reduced precision” mode to dramatically improve performance. It’s a speed hack,
but it’s definitely worth it in this case. At least until ARM gives us 128-bit floats with transcendental instructions to match x87.</p>

<h2 id="optimize-long-division-cpuid-and-xgetbv">Optimize long division, cpuid, and xgetbv</h2>
<p>Since 2021 we have had an optimization that removes long division when it is unnecessary; it turns out we had accidentally broken it in the last year,
alongside a similar optimization for cpuid and xgetbv. The long-division removal previously gave a significant uplift in SuperTuxKart,
but we hadn’t tested any of the latest AAA games with it in a while. Long division gets used all over the place because it’s the only way x86 supports
64-bit division, so we end up needing to fall back to a software implementation in the worst case.</p>

<p>Go and test some games; this could result in some decent performance uplift now that we’ve fixed it! This time it comes with unit tests so we don’t
break it again.</p>

<h2 id="implement-enough-of-ptrace-for-ubisoft-connect-launcher">Implement enough of ptrace for Ubisoft Connect launcher</h2>
<p>This launcher has been a thorn in our side for years. It has some anti-debugger tech built into it for some reason, which always prevented all Ubisoft
games from launching. We have now implemented enough of ptrace in FEX that the launcher works as expected, so various Ubisoft games can be played! Have fun!</p>

<h2 id="a-new-cross-compilation-setup">A new cross-compilation setup</h2>

<p>Developers and testers living on the edge will know this long-standing issue: Parts of FEX require a full x86 cross-toolchain to build, but surprisingly few Linux distributions ship a setup that works out of the box. Setting things up manually is tricky enough that we couldn’t even provide the most generic instructions to do so, so we’ve been searching for a way to make life easier for people. We’ve finally found a good solution: The <a href="https://en.wikipedia.org/wiki/Nix_(package_manager)">package manager Nix</a>!</p>

<p>Once Nix is installed, it now only takes a call to one of FEX’s bundled <a href="https://github.com/FEX-Emu/FEX/tree/main/Data/nix">helper scripts</a> to build our x86-native FEXLinuxTests or to enable the library forwarding feature for improved emulation performance. For usage details, head over to the <a href="https://github.com/FEX-Emu/FEX/pull/4632/">pull request</a> that added the helpers.</p>

<p>A sweet little bonus: This same system allows you to automatically set up the toolchain required for building FEX as a WoW64/ARM64EC emulation module!</p>

<hr />

<p>See the <a href="https://github.com/FEX-Emu/FEX/releases/tag/FEX-2507">2507 Release Notes</a> or the <a href="https://github.com/FEX-Emu/FEX/compare/FEX-2506...FEX-2507">detailed change log</a> in Github.</p>]]></content><author><name>FEX-Emu Maintainers</name></author><summary type="html"><![CDATA[This month we slowed down to take our time and we got some good fixes in store for you.]]></summary></entry><entry><title type="html">FEX 2506 Tagged</title><link href="https://fex-emu.com/FEX-2506/" rel="alternate" type="text/html" title="FEX 2506 Tagged" /><published>2025-06-04T00:00:00+00:00</published><updated>2025-06-04T00:00:00+00:00</updated><id>https://fex-emu.com/FEX-2506</id><content type="html" xml:base="https://fex-emu.com/FEX-2506/"><![CDATA[<p>Welcome to the second half of the year! With this release we have some big changes to talk about, so let’s jump right in!</p>

<h2 id="reduce-jit-time-by-25-by-sharing-code-buffers-between-threads">Reduce JIT time by 25% by sharing code buffers between threads</h2>
<p>This is an absolute banger of a change from our venerable developer <a href="https://github.com/neobrain">neobrain</a>. They have been working towards
this goal for a while; the tricky nature of the feature made it difficult to land. Before we discuss how this improves performance, it is first
necessary to discuss how FEX’s JIT worked before this change.</p>

<p>Before this change, our JIT would execute independently for every thread that the guest application makes, without sharing code buffers with
other threads. This meant that if multiple threads executed the same code, they would all be JITing it, consuming memory and taking precious CPU
cycles. Additionally, if a thread exited, all of its code buffer was deleted and not reused at all. Not only does this consume more
memory, it’s actually worse for CPUs: even if we are executing the same x86 code between threads, the ARM code is at different locations in
memory, putting even more pressure on our CPU’s poor L2/L3 caches. This is becoming an even larger issue for newer games with
multi-threaded job-queue systems, where any of the threads in the pool could execute jobs and it becomes random chance whether the same thread ends up
executing the same code. This usually means every thread in the pool ends up JITing all the code multiple times.</p>

<p>neobrain’s change here is a fundamental shift to how FEX does its code JITing. In particular, all JIT code gets written to a shared code buffer region
and if one thread has JITed the code, then all threads can reuse it. This means in an ideal case only one thread ever JITs code and all other threads
benefit from it. In addition since code is now shared between threads, if a thread exits then none of that JITed code is lost anymore; a new thread can
reuse it.</p>

<p>This change has some serious knock-on effects: memory usage is lower, total time spent in the JIT is lower, and it preps FEX to start caching code to
the filesystem for sharing between multiple invocations of an application! So not only will memory usage of applications be lower, allowing more games
to run on platforms with less RAM, but they should be faster as well, since JIT time is lower and the L2/L3 caches are hit less aggressively.</p>

<p>A pedantic edge case game called <a href="https://steamdb.info/app/464060/">RUINER</a> improved from around 30FPS to 60FPS because it
constantly forces re-JITs as threads are created and destroyed quickly! In other <a href="https://github.com/FEX-Emu/FEX/pull/4479">games tested</a> by neobrain,
we also see significantly less time spent in the JIT.</p>

<p>Go and test some games, people!</p>

<h2 id="more-jit-optimizations">More JIT optimizations</h2>
<p>Not to be outdone, there were more JIT optimizations this month. This includes making the JIT itself faster, and also faster generated code so
performance is improved in-game. Definitely go and look at the pull requests for these to know more, because walking through each individual change
would take all day.</p>

<ul>
  <li>Inline register-allocation into SSA IR</li>
  <li>Stop using a hashmap in Dead-Code-Elimination pass</li>
  <li>Constant-fold on the fly</li>
  <li>Optimize pairs of stack pushes and pops</li>
  <li>Optimize Xor with all -1</li>
  <li>Add more cases for zeroing registers</li>
  <li>Optimize X87 FTWTag generation using fancy bit-twiddling techniques</li>
  <li>Optimize CDQ</li>
  <li>Fix for thunk callbacks potentially corrupting registers</li>
</ul>

<h2 id="fix-a-nasty-race-condition-that-causes-invalid-memory-tracking">Fix a nasty race condition that causes invalid memory tracking</h2>
<p>This was a big nasty bug that landed on our plate this last month. We noticed recently that after Steam shipped an update, it would crash very
frequently under FEX, and this only seemed to occur when games were downloading. It could technically be worked around by restarting Steam each time it
crashed, but if you’re downloading a big 100GB game like Spider-Man 2 then you’re going to need to restart Steam a <strong>lot.</strong></p>

<p>After some investigation we found out that Steam has seemingly updated its memory allocator, or made it more aggressively allocate and deallocate
memory. This happens particularly frequently during a game download where each thread is now allocating and deallocating across the whole system.</p>

<p>When any memory syscall gets used under FEX, we need to track it in order to ensure that self-modifying code works correctly. We track the virtual
memory regions and keep a map around so that if anything gets overwritten we can invalidate code caches. Turns out we had mutex locking in the
wrong location, which was causing us to have a different view of memory than the kernel had. This showed up when multiple threads perfectly
interleaved a munmap and an mmap, and FEX’s mutex that tracked these got taken in the wrong order. FEX would end up thinking the mmap came
first and the munmap (at the same address!) came second, but it was actually the other way around.</p>

<p>This completely broke FEX’s tracking and resulted in some bad crashes. The core change was to make FEX’s tracking mutex also wrap the syscall doing the
memory operation. Everything is sorted now and Steam is even more stable than before!</p>

<h2 id="fexserver-fixes">FEXServer fixes</h2>
<p>We had a couple of minor bugs come up this month that were fixed in our FEXServer. In some cases, starting an application would cause
the server to exit early while FEX was still running, which would cause strange behaviour. These cases are now fixed, so the server should
stay running while applications execute.</p>

<h2 id="print-a-warning-if-an-unknown-fex-config-option-is-set">Print a warning if an unknown FEX config option is set</h2>
<p>Every so often FEX changes config options or old ones get phased out. We didn’t have any way to alert the user that they have old config options
sitting around, or even if they typo’d an option. Now if you’re manually setting config options in the config JSON, it will print a warning to alert
you to fix the option. Nice little quality of life change.</p>

<h2 id="fortification-safe-long-jump">Fortification safe long-jump</h2>
<p>Last month we fixed a nasty memory leak which required introducing a single long-jump usage inside of FEX. Turns out this broke FEX on some distros
that enable fortification build options when compiling FEX. This is now fixed by using a long-jump that is safe against fortifications.</p>

<p>See the <a href="https://github.com/FEX-Emu/FEX/releases/tag/FEX-2506">2506 Release Notes</a> or the <a href="https://github.com/FEX-Emu/FEX/compare/FEX-2505...FEX-2506">detailed change log</a> on GitHub.</p>]]></content><author><name>FEX-Emu Maintainers</name></author><summary type="html"><![CDATA[Welcome to the second half of the year! With this release we have some big changes to talk about, so let’s jump right in!]]></summary></entry></feed>