Question for somebody who doesn't work in chips: what does a software engineer have to do differently when targeting software for RISC-V?
I would imagine that executable size increases, meaning it has to be aggressively optimized for cache locality?
I would imagine that some types of software are better suited for either CISC or RISC, like games, webservers?
RISC-V with the compressed instruction extension actually ends up smaller than x86-64 and ARM on average.
There's not much inherent that needs to change in software approach. Probably the biggest thing vs x86-64 is the availability of 32 registers (vs 16 on x86-64), allowing for more intermediate values before things start spilling to the stack, which also applies to ARM (which also has 32 registers). But generally it doesn't matter unless you're micro-optimizing.
More micro-optimization things might include:
- The vector extension (aka V or RVV) isn't in the base rv64gc ISA, so you might not get SIMD optimizations depending on the target; whereas x86-64 and aarch64 have SSE2 and NEON (128-bit SIMD) in their base.
- Similarly, no popcount & count leading/trailing zeroes in base rv64gc (requires Zbb); base x86-64 doesn't have popcount, but does have clz/ctz. aarch64 has all.
- Less efficient branchless select, i.e. "a ? b : c"; takes ~4-5 instrs on base rv64gc, 3 with Zicond, but 1 on x86-64 and aarch64. Some hardware can also fuse a jump over a mv instruction to be effectively branchless, but that's even more target-specific.
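To make the select point concrete, here's what it looks like from C (a rough sketch; the sequences in the comments mirror the counts above, and actual compiler output varies with compiler and flags):

    #include <stdint.h>

    /* "cond ? a : b" with no data-dependent branch.
       x86-64:  roughly one cmov once the flags are set
       aarch64: roughly one csel once the flags are set
       rv64gc:  mask-and-merge as below (~4-5 instructions),
                or czero.eqz + czero.nez + or with Zicond (3 instructions) */
    uint64_t select_u64(uint64_t cond, uint64_t a, uint64_t b) {
        return cond ? a : b;
    }

    /* One explicit masking idiom that works on base rv64gc: */
    uint64_t select_masked(uint64_t cond, uint64_t a, uint64_t b) {
        uint64_t mask = -(uint64_t)(cond != 0);  /* all ones if cond, else 0 */
        return (a & mask) | (b & ~mask);
    }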
RISC-V profiles kind of solve the first two issues (e.g. Android requires RVA23, which mandates RVV, Zbb, and Zicond among other things), but if Linux distros decide to target RVA20/rv64gc then they're ~forever stuck without those extensions in precompiled code that hasn't bothered with dynamic dispatch. Though this is a problem with x86-64 too (much less so with ARM, as it doesn't have that many extensions; SVE is probably the biggest one by far, and it's still not widely supported, e.g. Apple silicon doesn't have it).
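The dynamic dispatch escape hatch looks roughly like this on the x86-64 side (a minimal sketch using GCC/Clang's __builtin_cpu_supports and target attribute; the sum_* function names are made up for illustration):

    #include <stddef.h>
    #include <stdint.h>

    /* Baseline version: uses only the base ISA the distro targets. */
    static uint64_t sum_scalar(const uint32_t *v, size_t n) {
        uint64_t s = 0;
        for (size_t i = 0; i < n; i++) s += v[i];
        return s;
    }

    #if defined(__x86_64__)
    /* Same loop, but the compiler is allowed to auto-vectorize it with AVX2. */
    __attribute__((target("avx2")))
    static uint64_t sum_avx2(const uint32_t *v, size_t n) {
        uint64_t s = 0;
        for (size_t i = 0; i < n; i++) s += v[i];
        return s;
    }
    #endif

    /* Check the CPU at runtime, so a baseline binary still benefits
       from newer extensions where they exist. */
    uint64_t sum(const uint32_t *v, size_t n) {
    #if defined(__x86_64__)
        if (__builtin_cpu_supports("avx2"))
            return sum_avx2(v, n);
    #endif
        return sum_scalar(v, n);
    }

On RISC-V the equivalent check has to ask the kernel which extensions are present (e.g. via hwcap/hwprobe) rather than a cpuid-style instruction.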
That seems like something the compiler would generally handle, no? Obviously that doesn't apply everywhere, but in the general case it should.
> Question for somebody who doesn't work in chips: what does a software engineer have to do differently when targeting software for RISC-V?
Most of the time, nothing; code correctly written in higher-level languages like C should work the same. The biggest difference, the weaker memory model, is something you also have on most non-x86 architectures like ARM (and your code shouldn't be depending on having a strong memory model in the first place).
> I would imagine that executable size increases, meaning it has to be aggressively optimized for cache locality?
For historical reasons, executable code density on x86 is not that good, so the executable size won't increase as much as you'd expect; both RISC-V with its compressed instructions extension and 32-bit ARM with its Thumb extensions are fairly compact (there was an early RISC-V paper which did that code size comparison, if you want to find out more).
> I would imagine that some types of software are better suited for either CISC or RISC, like games, webservers?
What matters most is not CISC vs RISC, but the presence and quality of things like vector instructions and cryptography extensions. Some kinds of software like video encoding and decoding heavily depend on vector instructions to have good performance, and things like full disk encryption or hashing can be helped by specialized instructions to accelerate specific algorithms like AES and SHA256.
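For example, on x86 a single AES-NI intrinsic performs an entire AES round (a minimal sketch; assumes a CPU with AES-NI and building with -maes, and a real implementation would still need the key schedule and the full round loop):

    #include <wmmintrin.h>   /* x86 AES-NI intrinsics */

    /* One AES encryption round (ShiftRows, SubBytes, MixColumns, AddRoundKey)
       done by a single aesenc instruction instead of many table lookups. */
    __m128i aes_encrypt_round(__m128i state, __m128i round_key) {
        return _mm_aesenc_si128(state, round_key);
    }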
No, pretty much any ISA should be equally good for any type of workload. If you are doing assembly programming then it makes a difference, but if you are doing something in Python or Unity it really isn't going to matter.
This is more about being free of ARM's patents and getting a fresh start using the lessons learned.
Reminded me how one famous Russian guy ran Atomic Heart on Elbrus 8S.
Elbrus has a native translator, though, and a pretty good one, afaik. Atomic Heart was kinda playable, 15-25 fps.
This guy: https://www.youtube.com/watch?v=-0t-5NWk_1o
Elbrus is/was RISC?-V?
https://www.abortretry.fail/p/the-itanic-saga
Nah, it is a fully custom VLIW.
The article is a bit short on "the basics" - I assumed they used some kind of Wine port to run it. But it seems they implemented the x86_64 ISA on a RISC-V chip in some way - can anyone shed more light on how that is done?
The basics are here: https://box86.org/ It is an emulator but:
> Because box86 uses the native versions of some “system” libraries, like libc, libm, SDL, and OpenGL, it’s easy to integrate and use with most applications, and performance can be surprisingly high in some cases.
Wine can also be compiled/run as native.
> Wine can also be compiled/run as native.
I'm not sure you can run Wine natively to run x86 Windows programs on RISC-V because Wine is not an emulator. There is an ARM port of Wine, but that can only run Windows ARM programs, not x86.
Instead, box64 runs the x86_64 version of Wine: https://github.com/ptitSeb/box64/blob/main/docs/X64WINE.md
Incredible result! This is a tremendous amount of work and does seem like RV is at its limits in some of these cases. The bit gather and scatter instructions should become an extension!
Would be useful to see test results on a game that relies more heavily on the graphics core than the CPU. Perhaps Divinity 2?
> At least in the context of x86 emulation, among all 3 architectures we support, RISC-V is the least expressive one.
RISC was explained to me as a reduced instruction set computer in computer science history classes, but I see a lot of articles and proposed new RISC-V profiles about "we just need a few more instructions to get feature parity".
I understand that RISC-V is just a convenient alternative to other platforms for most people, but does this also mean the RISC dream is dead?
As I've heard it explained, RISC in practice is less about "an absolutely minimalist instruction set" and more about "don't add any assembly programmer conveniences or other such cleverness, rely on compilers instead of frontend silicon when possible".
Although as I recall from reading the RISC-V spec, RISC-V was rather particular about not adding "combo" instructions when common instruction sequences can be fused by the frontend.
My (far from expert) impression of RISC-V's shortcomings versus x86/ARM is more that the specs were written starting with the very basic embedded-chip stuff, and then over time more application-cpu extensions were added. (The base RV32I spec doesn't even include integer multiplication.) Unfortunately they took a long time to get around to finishing the bikeshedding on bit-twiddling and simd/vector extensions, which resulted in the current functionality gaps we're talking about.
So I don't think those gaps are due to RISC fundamentalism; there's no such thing.
Put another way, "try to avoid instructions that can't be executed in a single clock cycle, as those introduce silicon complexity".
> and more about "don't add any assembly programmer conveniences or other such cleverness, rely on compilers instead of frontend silicon when possible"
What are the advantages of that?
In order to have an instruction set that a student can implement in a single semester class you need to make simplifications, like having all instructions take two inputs and produce one output. That also makes the lives of researchers experimenting with processor design a lot simpler. But it does mean that some convenient instructions are off the table for getting to higher performance.
That's not the whole story: a simpler pipeline takes fewer engineering resources, so teams building a high-performance design can spend more time optimizing.
RISC is generally a philosophy of simplification but you can take it further or less far. MIPS is almost as simplified as RISC-V but ARM and POWER are more moderate in their simplifications and seem to have no trouble going toe to toe with x86 in high performance arenas.
But remember there are many niches for processors out there besides running applications. Embedded, accelerators, etc. In the specific niche of application cores I'm a bit pessimistic about RISC-V but from a broader view I think it has a lot of potential and will probably come to dominate at least a few commercial niches as well as being a wonderful teaching and research tool.
The RISC dream was to simplify CPU design because most software was written using compilers and not direct assembly.
Characteristics of classical RISC:
- Most data manipulation instructions work only with registers.
- Memory instructions are generally load/store to registers only.
- That means you need lots of registers.
- Do your own stack, because you have to manually manipulate it to pass parameters anyway. So no CALL/JSR instruction: a jump-and-link instruction saves the return address in a register, and you implement the call stack yourself with ordinary loads and stores.
- Instruction encoding is predictable and each instruction is the same size.
- More than one RISC arch has a register that always reads 0 and can't be written. Used for setting things to 0.
This worked, but then the following made it less important:
- Out-of-order execution - generally the raw instruction stream is a declaration of a path to desired results, but isn't necessarily what the CPU is really doing. Things like speculative execution, branch prediction and register renaming are behind this.
- SIMD - basically a separate wide register space with instructions that work on all values within those wide registers.
So really OOO and SIMD took over.
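To make the load/store point above concrete: a one-line read-modify-write of memory decomposes very differently in the two styles (a rough sketch; the sequences in the comments are typical, not exact, compiler output):

    #include <stdint.h>

    void bump(uint32_t *counts, uint64_t i) {
        /* CISC (x86-64): one memory-operand instruction, roughly
               add dword ptr [rdi + rsi*4], 1
           Classic RISC (load/store): address calc + load + add + store, roughly
               slli/add (or sh2add), lw, addiw, sw                      */
        counts[i] += 1;
    }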
Is there a RISC dream? I think there is an efficiency "dream", there is a performance "dream", there is a cost "dream" — there are even low-complexity relative to cost, performance and efficiency "dreams" — but a RISC dream? Who cares more about RISC than cost, performance, efficiency and simplicity?
There was such a dream. It was about making a mind-bogglingly simple CPU, putting caches into the now-empty space where all the control logic used to be, clocking it up the wazoo, and letting the software deal with load/branch delays, efficiently using all 64 registers, etc. That'll beat the hell out of those silly CISC architectures at performance, and at a fraction of the design and production costs!
This didn't work out, for two main reasons: first, just being able to turn clocks hella high is still not enough to get great performance: you really do want your CPU to be super-scalar, out-of-order, and with a great branch predictor, if you need amazing performance. But when you do all that, the simplicity of RISC decoding stops mattering all that much, as the Pentium II demonstrated when it equalled DEC Alpha on performance, while still having practically useful things like e.g. byte loads/stores. Yes, it's RISC-like instructions under the hood, but that's an implementation detail; there's no reason to expose it to the user in the ISA, just as you don't have to expose branch delay slots in your ISA, because that's a bad idea: e.g. MIPS II added 1 additional pipeline stage, and now they needed two branch/load delay slots. Whoops! So they added interlocks anyway (MIPS originally stood for "Microprocessor without Interlocked Pipeline Stages", ha-ha) and got rid of the load delays; they still left 1 branch delay slot exposed for backwards compatibility, and the circuitry required was arguably silly.
The second reason was that the software (or compilers, to be more precise) can't really deal very well with all that stuff from the first paragraph. That's what sank Itanium. That's why nobody makes CPUs with register windows any more. And static instruction scheduling in the compilers still can't beat dynamic instruction reordering.
But wouldn't we define the RISC dream as the dream that efficiency, performance and low cost could be achieved by cores with very small instruction sets?
In this particular context, they're trying to run code compiled for x86_64 on RISC-V. The need for "we just need a few more instructions to get feature parity" comes from trying to run code that is already compiled for an architecture with all those extra instructions.
In theory, if you compiled the original _source_ code for RISC, you'd get an entirely different binary and wouldn't need those specific instructions.
In practice, I doubt anyone is going to actually compile these games for RISC-V.
The explanation that I've seen is that it's "(reduced instruction) set computer" - simple instructions, not necessarily few.
Beyond the most trivial of microcontrollers and experimental designs there are no RISC chips under the original understanding of RISC. The justification for RISC evaporated when we became able to put 1 million, 100 million, and so on, transistors on a chip. Now all the chips called "RISC" include vector, media, encryption, network, FPU, etc. instructions. Someone might want to argue that some elements of RISC designs (orthogonal instruction encoding, numerous registers, etc.) make a particular chip a RISC chip. But they really aren't instances of the literal concept of RISC.
To me, the whole RISC-V interest is all just marketing. As an end user I don't make my own chips and I can't think of any particular reason I should care whether a machine has RISC-V, ARM, x86, SPARC, or POWER. In the end my cost will be based on market scale and performance. The licensing cost of the design will not be passed on to me as a customer.
That screenshot shows 31 GB of RAM, which is distinctly more than the mentioned dev board at max specs. Are they using something else here?
Pioneer, an older board.
Note that, today, one of the more recent options with more, faster cores implementing RVA22 and RVV 1.0 would be the better idea.
https://milkv.io/pioneer
The milk-v pioneer comes with 128GB of RAM.
Is this the 86Box? I found it fun reliving the time I got my Amstrad PC1512, I added two hard cards of 500MB and a 128k memory expansion to 640KB which made things a lot more fun. Back then I only had two 360KB floppies and added a 32MB hard card a few years later. I had Borland TurboPascal and Zortech C too. Fun times.
No, it's Box64, a completely different project.
(But I do remember the time I had an Amstrad PC1512 too :D )
It will be interesting to try out Box64 as soon as I get my hands on some suitable RISC-V hardware. I have played with RISC-V microcontrollers; they're quite nice to work with.
I wonder if systems will ship at some point that are a handful of big RISC-V CPUs, and then a “GPU” implemented as a bunch of little RISC-V CPUs (with the appropriate vector stuff—actually, side-question, can classic vectors, instead of packed SIMD, be useful in a GPU?)
Another technically impressive Witcher 3 feat was the Switch port; it ran really well. Goes to show how much can be done with optimization and how many resources are wasted on the PC purely by bad optimization.
And by using much lower quality textures and 3D models, therefore much less RAM for assets. It's not an apples-to-apples comparison, and you can't really make claims about bad optimization on PCs when the scope of what's shown on screen is vastly different.
You too can run Witcher 3 just as well on a minimal PC if you're willing to set the render resolution to 720p (540p undocked), set the settings to below minimum, and call ~30 FPS "running really well".
I hope they're able to get this ISA-level feedback to people at RVI (RISC-V International).
The scalar efficiency SIG has already been discussing bitfield insert and extract instructions.
We figured out yesterday [1] that the example in the article can already be done in four RISC-V instructions; it's just a bit trickier to come up with:
[1] https://www.reddit.com/r/RISCV/comments/1f1mnxf/box64_and_ri...
Nice trick, in fact with 4 instructions it's as efficient as extract/insert and it works for all ADD/SUB/OR/XOR/CMP instructions (not for AND), except if the source is a high-byte register. However it's not really a problem if code generation is not great in this case: compilers in practice will not generate accesses to these registers, and while old 16-bit assembly code has lots of such accesses it's designed to run on processors that ran at 4-20 MHz.
Flag computation and conditional jumps is where the big optimization opportunities lie. Box64 uses a multi-pass decoder that computes liveness information for flags and then computes flags one by one. QEMU instead tries to store the original operands and computes flags lazily. Both approaches have advantages and disadvantages...
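For anyone unfamiliar with the lazy approach: instead of materializing EFLAGS after every emulated instruction, you remember the last operation and its operands, and derive individual flags only when something reads them. A minimal sketch of the general idea (not QEMU's or Box64's actual code; all names made up):

    #include <stdbool.h>
    #include <stdint.h>

    typedef enum { OP_ADD64, OP_SUB64 } last_op_t;

    typedef struct {
        last_op_t op;
        uint64_t  src1, src2, result;   /* enough to reconstruct flags later */
    } flag_state_t;

    /* Emulate "add dst, src" without computing any flag bits. */
    static void emu_add64(flag_state_t *f, uint64_t *dst, uint64_t src) {
        f->op = OP_ADD64;
        f->src1 = *dst;
        f->src2 = src;
        *dst += src;
        f->result = *dst;
    }

    /* Flags computed on demand, e.g. when a conditional jump needs them. */
    static bool get_zf(const flag_state_t *f) {
        return f->result == 0;
    }

    static bool get_cf(const flag_state_t *f) {
        switch (f->op) {
        case OP_ADD64: return f->result < f->src1;  /* unsigned carry-out */
        case OP_SUB64: return f->src1 < f->src2;    /* borrow */
        }
        return false;
    }

The eager multi-pass approach instead drops the flag computation entirely wherever liveness analysis proves that no later instruction reads it.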
Author here: we have adopted this approach as a fast path in box64: https://github.com/ptitSeb/box64/pull/1763. Thank you very much!
None of this is new. None of it.
In fact, bitfield extract is such an obvious oversight that it is my favourite example of how idiotic the RISC-V ISA is (#2 is the lack of sane addressing modes).
Some of the better RISC-V designs, in fact, implement a custom instruction to do this, e.g. BEXTM in Hazard3: https://github.com/Wren6991/Hazard3/blob/stable/doc/hazard3....
Whoa, someone else who doesn't believe that the RISC-V ISA is 'perfect'! I'm curious: how have the discussions on bitfield extract been going? Because it does really seem like an obvious oversight and something to add as a 'standard extension'.
What's your take on
1) unaligned 32-bit instructions with the C extension?
2) lack of 'trap on overflow' for arithmetic instructions? MIPS had it..
Bitfield-extract is being discussed for a future extension. E.g. Qualcomm is pressing for it to be added.
In the meantime, it can be done as two shifts: left to the MSB, and then right filling with zero or sign bits. There is at least one core in development (SpaceMiT X100) that is supposed to be able to fuse those two into a single µop, maybe some that already do.
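In C, that two-shift idiom is just (a minimal sketch; extracts width bits starting at bit lsb, assuming 1 <= width and lsb + width <= 64):

    #include <stdint.h>

    /* Unsigned extract: shift the field's top bit up to bit 63,
       then shift right filling with zeros (slli + srli on RISC-V). */
    static inline uint64_t bextr_u(uint64_t x, unsigned lsb, unsigned width) {
        return (x << (64 - lsb - width)) >> (64 - width);
    }

    /* Signed extract: same, but the right shift replicates the sign bit
       (slli + srai). Arithmetic right shift of negative values is
       implementation-defined in C, but universal in practice. */
    static inline int64_t bextr_s(uint64_t x, unsigned lsb, unsigned width) {
        return (int64_t)(x << (64 - lsb - width)) >> (64 - width);
    }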
However, I've also seen that one core (XianShan Nanhu) is fusing pairs of RVI instructions into one in the B extension, to be able to run old binaries compiled for CPUs without B faster. Throwing hardware at the problem to avoid a recompile ... feels a bit backwards to me.
I'm not very familiar with the ecosystem, but I have used this on an RPi4 to run some games through wine.
I'm wondering, how's the landscape nowadays? Is this the leading project for x86 compatibility on ARM? With the rising popularity of the architecture for consumer platforms, I'd guess companies like Valve would be interested in investing in these sorts of translation layers.
Previously: https://news.ycombinator.com/item?id=19118642
And:
Milk-V Pioneer: a 64-core RISC-V motherboard and workstation for native development
https://www.crowdsupply.com/milk-v/milk-v-pioneer
lol, I am going the other way around.
Since the RISC-V ISA is royalty-free worldwide and more than nice, I am writing basic rv64 assembly which I interpret on x86_64 hardware with a Linux kernel.
I did not push the envelope as far as writing a "compiler"; this is really just for while I wait for hardcore performant desktop-class, aka large, rv64 hardware implementations.
I used to use GL4ES on the PocketCHIP. And I daily use it on a netbook to get more performance on some GL 2.1 games.
Box86 is so good that I run x86-64 Steam games (servers) on a free Oracle instance (ARM64) with it.
Great game choice!
I remember learning RISC-V in Berkeley CS61C. Anyone from Berkeley?
There's nobody from Berkeley on HN
oh really, didn't know that. Me neither. That course was open-sourced.
wow very impressive
box64 is getting too advanced lol
> The x86 instruction set is very very big. According to rough statistics, the ARM64 backend implements more than 1,600 x86 instructions in total, while the RV64 backend implements about 1,000 instructions
This is just insane and gets us full-circle to why we want RISC-V.
I think the 1600 number is a coarse metric for this sort of thing. Keep in mind that these instructions are limited in the number of formal parameters they can take: e.g. 16 nominally distinct instructions can be more readily understood/memorized as one instruction with an implicit 4-bit flag. Obviously there's a ton of legacy cruft in Intel ISAs, along with questionable decisions, and I'm not trying to take away from the appeals of RISC (e.g. there are lots of outstanding compiler bugs around these "pseudoparameterized" instructions). But it's easy to look at "1600" and think "ridiculous bloat," when in reality it's somewhat coherent and systematic - and more to the point, clearly necessary for highly performance-sensitive work.
> clearly necessary for highly performance-sensitive work
It's clearly necessary to have compatibility back to the 80s. It's clearly necessary to have 10 different generations of SIMD. It's clearly necessary to have multiple different floating point systems.
If an insane instruction set gives us higher performance and makes CPU and compiler design more complex, this might be an acceptable trade-off.
But it doesn't.
It's simply about the amount of investment. x86 had 50 years of gigantic amounts of sustained investment. Intel outsold all the RISC vendors combined by like 100 to 1 because they owned the PC business.
When Apple started seriously investing in ARM, they were able to match or beat x86 laptops.
The same will be true for RISC-V.
ARM64 has approximately 1300 instructions.
I want somebody to make a GPT fine-tune that specializes in converting instructions and writing tests. If you made it read all the x86 and RISC-V docs a bunch, a lot of this could be automated.
Not really. RISC-V's benefit is not the "Reduced Instruction Set" part, it's the open ISA part. A small instruction set actually has several disadvantages. It means your binary gets bigger, because what was a single operation on x86 is now several on RISC-V, meaning more memory bandwidth and cache is taken up by instructions instead of data.
Modern CPUs are actually really good at decoding operations into micro-ops. And the flexibility of being able to implement a complex operation in microcode or silicon is essential for CPU designers.
Is there a bunch of legacy crap in x86? Yeah. Does getting rid of it dramatically increase the performance ceiling? Probably not.
The real benefit of RISC-V is anybody can use it. It's democratizing the ISA. No one has to pay a license to use it, they can just build their CPU design and go.
> Modern CPUs are actually really good at decoding operations into micro-ops.
The largest out-of-order CPUs are actually quite reliant on having high-performance decode that can be performed in parallel using multiple hardware units. Starting from a simplified instruction set with less legacy baggage can be an advantage in this context. RISC-V is also pretty unique among 64-bit RISC ISAs in including compressed instruction support, which gives it code density comparable to x86 at vastly improved decode simplicity (for example, it only needs to read a few bits to determine which insns are 16-bit vs. 32-bit).
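That length check really is that cheap (a minimal sketch of the standard RISC-V rule for distinguishing 16-bit from 32-bit encodings; the longer reserved formats use further bits):

    #include <stdint.h>

    /* If the two lowest bits of the first 16-bit parcel are not 0b11,
       it's a compressed (16-bit) instruction; otherwise it's a standard
       32-bit instruction. */
    static unsigned rv_insn_length(uint16_t first_parcel) {
        return ((first_parcel & 0x3) == 0x3) ? 4 : 2;
    }

Compare that with x86, where the length is only known after parsing a variable number of prefixes, the opcode, ModRM/SIB, and any displacement/immediate bytes.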
> means your binary gets bigger ... meaning more memory bandwidth and cache
Except this isn't actually true.
> Does getting rid of it dramatically increase the performance ceiling? Probably not.
No but it dramatically DECREASES the amount of investment necessary to reach that ceiling.
Assume you have 2 teams, each getting the same amount of money. Then ask them to make the highest-performing spec-compatible chip. Which team is going to win 99% of the time?
> And the flexibility of being able to implement a complex operation in microcode, or silicon is essential for CPU designers.
You can add microcode to a RISC-V chip if you want, most people just don't want to.
> The real benefit of RISC-V is anybody can use it.
That is true, but it's also just a much better instruction set than x86 -_-
> It means your binary gets bigger
False premise: as the size tool shows, RVA20 (RV64GC) binaries were already the smallest among 64-bit architectures.
Code gets smaller still (rather than larger) with newer extensions such as B in RVA22.
As of recently, the same is true in 32-bit when comparing rv32 against the former best (Thumb-2). But it was quite close to begin with.
> 15 fps in-game
Wow...that's substantially more than I would have guessed. Good times ahead for hardware
"which allows games like Stardew Valley to run, but it is not enough for other more serious Linux games"
Hey! ;-)