using the core/benches cpu_bench #142
Replies: 3 comments
-
Hi! I'd be happy to accept PRs that speed up the CPU, as long as we don't sacrifice accuracy. I think a lot of what is currently slowing it down is an absolutely ridiculous amount of logging, with several logging calls per cycle. Even if they are conditional, those checks add up.

As you have seen, you need to specify the crate in which you want to run the benches; passing -p marty_core selects the core crate.

To add V20 and multiple-CPU support, a CPU is now a trait object of the Cpu trait, using enum_dispatch to avoid what would be very costly vtable lookups. The Cpu trait is defined here. The CPU benchmark was never updated after the implementation changed, so it's trying to make a Cpu the old way, and that just doesn't work anymore.

It's way too easy to let benchmarks rot because there's nothing reminding me to run them every build. I should probably put that in a CI step somewhere. Want to take a shot at updating the benches? We should probably bench the 8088 and the V20 independently.
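For readers unfamiliar with the pattern, here is a minimal hand-written sketch of enum-based dispatch over a trait, which is roughly the kind of code the enum_dispatch crate generates. It also shows why a call like `Cpu::shr_u8_with_carry(...)` stops compiling once `Cpu` is a trait: an associated function has to be called on a concrete type. Everything except the trait name `Cpu` is an assumption made for illustration: the type names `Intel8088`, `NecV20`, and `CpuKind`, the placeholder cycle counts, and the body of `shr_u8_with_carry` are not taken from MartyPC.

```rust
// Sketch only: every name except `Cpu` is hypothetical, and the cycle
// counts are arbitrary placeholders, not real timings.

/// Stand-in for the `Cpu` trait discussed above.
trait Cpu {
    /// Execute one instruction and return the cycles it took.
    fn step(&mut self) -> u32;
}

#[derive(Default)]
struct Intel8088 {
    total_cycles: u64,
}

#[derive(Default)]
struct NecV20 {
    total_cycles: u64,
}

impl Cpu for Intel8088 {
    fn step(&mut self) -> u32 {
        self.total_cycles += 4; // placeholder cost
        4
    }
}

impl Cpu for NecV20 {
    fn step(&mut self) -> u32 {
        self.total_cycles += 3; // placeholder cost
        3
    }
}

impl Intel8088 {
    /// Associated helper on the concrete type (assumed semantics):
    /// logical shift right, returning the result and the last bit shifted out.
    fn shr_u8_with_carry(value: u8, count: u32) -> (u8, bool) {
        if count == 0 {
            return (value, false);
        }
        // checked_shr returns None when the shift amount is >= 8 bits.
        let carry = value.checked_shr(count - 1).map_or(false, |v| v & 1 != 0);
        let result = value.checked_shr(count).unwrap_or(0);
        (result, carry)
    }
}

/// One enum variant per CPU model; dispatch is a `match`, not a vtable call.
enum CpuKind {
    Intel8088(Intel8088),
    NecV20(NecV20),
}

impl Cpu for CpuKind {
    fn step(&mut self) -> u32 {
        match self {
            CpuKind::Intel8088(cpu) => cpu.step(),
            CpuKind::NecV20(cpu) => cpu.step(),
        }
    }
}

fn main() {
    let mut cpus = vec![
        CpuKind::Intel8088(Intel8088::default()),
        CpuKind::NecV20(NecV20::default()),
    ];
    for cpu in cpus.iter_mut() {
        println!("step took {} cycles", cpu.step());
    }

    // Called on the concrete type, not on the trait name.
    let (result, carry) = Intel8088::shr_u8_with_carry(0x80, 7);
    println!("result={result:#04x} carry={carry}");
}
```

Under these assumptions, a stale bench or test that names the trait where it used to name a struct would need to name a concrete CPU type instead, and separate 8088 and V20 benches would each construct their own variant.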
-
There's kind of a crossroads of needs here, and I need to think about a good design for this stuff. There are two basic types of user I expect MartyPC to have:
Supporting the latter hurts the former; if you just want to run stuff, you probably just want a core with no logging whatsoever, for maximum performance. The most straightforward way to do this is to gate logging behind a feature so the various macros compile to nothing. We could then produce different releases for developers and end-users. I'm not sure I really like bifurcating releases like that, though.

The other option is an entirely different CPU core for speed vs. debugging. This is now possible due to the Cpu trait, but making two copies of each Cpu seems pretty wasteful.

There is an idea I've been kicking around which I was going to use to implement the 8086 / V30: composing a Cpu from generics. I was going to do this so that the 8088 and 8086 could share all their code other than a specific BIU implementation that emulates the different bus widths. (Their execution units are the same.) Maybe a similar technique could swap out log-heavy components for faster ones? Dunno.
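The generics idea can be sketched as below. This is only an illustration under stated assumptions, not MartyPC's design: the `Biu` and `Tracer` traits, the `Bus8`/`Bus16`, `NullTracer`/`VecTracer`, and `CpuCore` types, and all method names are hypothetical. The bus-width behaviour it models is the standard one: an 8-bit bus needs two transfers for any word, while a 16-bit bus needs one for an even address and two for an odd one. The tracer takes a closure so that the no-op tracer never builds the log message at all.

```rust
// Sketch only: all trait, type, and method names here are hypothetical.

/// Bus interface unit: the part that differs between 8-bit and 16-bit buses.
trait Biu {
    /// Number of bus transfers needed to read a 16-bit word at `addr`.
    fn word_transfers(&self, addr: u32) -> u32;
}

struct Bus8;
struct Bus16;

impl Biu for Bus8 {
    fn word_transfers(&self, _addr: u32) -> u32 {
        2 // an 8-bit bus always needs two byte transfers
    }
}

impl Biu for Bus16 {
    fn word_transfers(&self, addr: u32) -> u32 {
        if addr & 1 == 0 { 1 } else { 2 } // odd addresses split the word
    }
}

/// Logging hook. The message is built lazily so a no-op tracer pays nothing.
trait Tracer {
    fn trace<F: FnOnce() -> String>(&mut self, make_msg: F);
}

struct NullTracer;

impl Tracer for NullTracer {
    #[inline(always)]
    fn trace<F: FnOnce() -> String>(&mut self, _make_msg: F) {
        // Closure is never called, so no formatting work happens.
    }
}

#[derive(Default)]
struct VecTracer {
    lines: Vec<String>,
}

impl Tracer for VecTracer {
    fn trace<F: FnOnce() -> String>(&mut self, make_msg: F) {
        self.lines.push(make_msg());
    }
}

/// Shared execution logic, parameterised over bus width and logging.
struct CpuCore<B: Biu, T: Tracer> {
    biu: B,
    tracer: T,
    transfers: u32,
}

impl<B: Biu, T: Tracer> CpuCore<B, T> {
    fn new(biu: B, tracer: T) -> Self {
        Self { biu, tracer, transfers: 0 }
    }

    fn read_word(&mut self, addr: u32) {
        let n = self.biu.word_transfers(addr);
        self.transfers += n;
        self.tracer.trace(|| format!("read_word {addr:#07x}: {n} transfer(s)"));
    }
}

// The same core code, composed two different ways.
type Fast8088 = CpuCore<Bus8, NullTracer>;
type Debug8086 = CpuCore<Bus16, VecTracer>;

fn main() {
    let mut fast: Fast8088 = CpuCore::new(Bus8, NullTracer);
    fast.read_word(0x100);

    let mut debug: Debug8086 = CpuCore::new(Bus16, VecTracer::default());
    debug.read_word(0x100);
    debug.read_word(0x101);

    println!("fast transfers: {}", fast.transfers);
    println!("debug transfers: {}", debug.transfers);
    for line in &debug.tracer.lines {
        println!("{line}");
    }
}
```

Because the tracer is a type parameter rather than a runtime flag, the compiler monomorphises a separate copy of `CpuCore` for each combination, so the logging-free variant contains no per-call logging checks. The trade-off is more generated code and longer compile times.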
-
Thanks for the thoughts and pointers. My thought process has been to work toward my goals and keep an eye out for places where I can contribute back. Getting the core benches up to date seems like a good place to start; thanks again for the pointers there.
-
I am hoping to work on optimizations to the 8088 core and saw the benchmarking support in core/benches, but I haven't understood how to launch these.
cargo bench --benches from the top level directory gives me:
   Compiling martypc v0.3.0 (/home/robr/mpc32)
    Finished `bench` profile [optimized] target(s) in 1m 16s
warning: the following packages contain code that will be rejected by a future version of Rust: traitobject v0.1.0, typemap v0.3.3
note: to see what the problems were, use the option `--future-incompat-report`, or run `cargo report future-incompatibilities --id 1`
     Running unittests frontends/martypc_desktop_wgpu/src/main.rs (target/release/deps/martypc-84b882304314d5fe)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
If I run cargo bench --benches in the core directory, I get many errors similar to:
error[E0782]: expected a type, found a trait
   --> core/src/cpu_808x/bitwise.rs:462:31
    |
462 |         let (result, carry) = Cpu::shr_u8_with_carry(0x80, 7);
    |                               ^^^
    |
help: you can add the `dyn` keyword if you want a trait object
    |
462 |         let (result, carry) = <dyn Cpu>::shr_u8_with_carry(0x80, 7);
    |                               ++++    +
The build then fails and no benchmarks run.
Any pointers?
Thanks,
-rob