eliminate high-latency IDIV CPU instruction #9
base: master
I don't think this will work.
You're right. Can we hard-code tick_hz as 161132800? Is there much variation in it?
It does vary, e.g.:
If we just try to do it in one mul + shr for the entire cycles counter, we'd effectively multiply it by 6.25 instead of the true ratio of ~6.206061087, which would obviously cause a huge loss of precision. The mult selected would be 50 and the shift only 3. This is because we need to set maxsec to pretty much the Unix Epoch. We can convert the seconds separately (see to_nanos) and then the fraction via mult and shift (where the shift would be selected as 29), but that would still require a div instruction and pretty much take us back to square one. Actually, it would be super cool to have something like seconds since boot in the cycles register instead of the full Epoch. As far as I can tell, the best method could be to have each of the different TICK_HZ values as a separate function, so the compiler can work out better integer math for us, and dispatch between them with a switch. Otherwise we'd need some more sophisticated way to derive mult, shr, etc., i.e. pretty much do what GCC does once it knows the constant divisor. Makes sense?
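A minimal sketch of the two options above, assuming the 161132800 Hz rate from this thread as a compile-time constant (the thread notes that real hardware varies). `cycles_to_ns_single`, `frac_mult`, and `cycles_to_ns_split` are hypothetical names, not functions from this project, and `to_nanos` is not reproduced here. It shows why mult = 50, shift = 3 drifts, and why splitting off whole seconds still costs a division.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the tick rate quoted in the thread, hard-coded for illustration. */
#define TICK_HZ      161132800ULL
#define NSEC_PER_SEC 1000000000ULL

/* One mul + shr over the whole counter with mult = 50, shift = 3:
 * scales by 50 / 8 = 6.25 instead of 1e9 / TICK_HZ ~= 6.206061087,
 * so the result runs roughly 0.7% fast. */
static uint64_t cycles_to_ns_single(uint64_t cycles) {
    return (cycles * 50) >> 3;
}

/* mult for a sub-second remainder: round(NSEC_PER_SEC * 2^shift / TICK_HZ). */
static uint64_t frac_mult(unsigned shift) {
    assert(shift <= 33); /* keeps NSEC_PER_SEC << shift within 64 bits */
    return ((NSEC_PER_SEC << shift) + TICK_HZ / 2) / TICK_HZ;
}

/* Whole seconds are converted exactly and only the remainder uses mult/shift.
 * The seconds step still needs a divide (and modulo) by TICK_HZ. */
static uint64_t cycles_to_ns_split(uint64_t cycles, uint64_t mult, unsigned shift) {
    uint64_t sec = cycles / TICK_HZ; /* the division we wanted to avoid */
    uint64_t rem = cycles % TICK_HZ; /* rem < TICK_HZ < 2^28, so rem * mult fits */
    return sec * NSEC_PER_SEC + ((rem * mult) >> shift);
}
```

For exactly one second of ticks, `cycles_to_ns_single(TICK_HZ)` yields 1,007,080,000 ns, i.e. about 7 ms of error per second, while `cycles_to_ns_split(TICK_HZ, frac_mult(29), 29)` yields exactly 1,000,000,000 but only by paying for the `/` and `%`.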
If the divisors are fixed, or at least computed at the start of the process, we could precompute the constants for the Granlund-Montgomery division algorithm, which then costs just a couple of cycles per division.
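A minimal sketch of the unsigned Granlund-Montgomery method for a divisor known only at run time, assuming a GCC/Clang target with `unsigned __int128`. The names `udiv_magic`, `udiv_magic_init`, and `udiv_magic_div` are hypothetical, not part of this project. The one real division happens in the init step, which could run once when the tick rate is read; each later `cycles / tick_hz` becomes a widening multiply plus a few adds and shifts.

```c
#include <assert.h>
#include <stdint.h>

/* Precomputed constants for dividing 64-bit values by an invariant d. */
struct udiv_magic {
    uint64_t mult; /* m' = floor(2^64 * (2^l - d) / d) + 1 */
    unsigned sh1;  /* min(l, 1) */
    unsigned sh2;  /* max(l - 1, 0) */
};

/* Run once, e.g. at process start after tick_hz is known. */
static struct udiv_magic udiv_magic_init(uint64_t d) {
    assert(d != 0);
    unsigned l = 0; /* l = ceil(log2(d)) */
    while (l < 64 && ((uint64_t)1 << l) < d)
        l++;
    /* (2^l - d) < d, so the quotient below fits in 64 bits */
    unsigned __int128 num = (((unsigned __int128)1 << l) - d) << 64;
    struct udiv_magic m;
    m.mult = (uint64_t)(num / d) + 1; /* the only real division, done once */
    m.sh1 = l < 1 ? l : 1;
    m.sh2 = l > 0 ? l - 1 : 0;
    return m;
}

/* Hot path: high half of a 64x64 multiply, then subtract, add, two shifts. */
static inline uint64_t udiv_magic_div(uint64_t n, const struct udiv_magic *m) {
    uint64_t t1 = (uint64_t)(((unsigned __int128)m->mult * n) >> 64);
    /* t1 <= n, so n - t1 cannot underflow and the sum cannot overflow */
    return (t1 + ((n - t1) >> m->sh1)) >> m->sh2;
}
```

The remainder, if needed for the sub-second part, follows as `n - q * d` without another division. Libraries such as libdivide package this same idea for production use.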
This simplifies the disassembly as follows. The IDIV is 56 uops.