Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

alg: extremely slow for 3000+ bit exponents #60

Closed
mmcloughlin opened this issue Feb 24, 2021 · 10 comments
Closed

alg: extremely slow for 3000+ bit exponents #60

mmcloughlin opened this issue Feb 24, 2021 · 10 comments
Labels
perf Performance issues and optimization

Comments

@mmcloughlin
Copy link
Owner

mmcloughlin commented Feb 24, 2021

addchain search '2^3072 - 1103717' takes an extremely long time, and in fact appears to get stuck.

See https://gophers.slack.com/archives/C6WDZJ70S/p1614191115020200 for more context.

@mmcloughlin mmcloughlin changed the title alg: doesn alg: extremely slow for 3000+ bit exponents Feb 24, 2021
@mmcloughlin mmcloughlin added the perf Performance issues and optimization label Feb 24, 2021
mmcloughlin added a commit that referenced this issue Apr 26, 2021
@mmcloughlin
Copy link
Owner Author

Generate a CPU profile following #88.

$ go run ./cmd/addchain/ search -v -cpuprofile cpu.out '2^3072 - 1103717'
addchain: expr: "2^3072 - 1103717"
addchain: hex: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffef289b
addchain: dec: 5809605995369958062859502533304574370686975176362895236661486152287203730997110225737336044533118407251326157754980517443990529594540047121662885672187032401032111639706440498844049850989051627200244765807041812394729680540024104827976584369381522292361208779044769892743225751738076979568811309579125511333093243519553784816306381580161860200247492568448150242515304449577187604136428738580990172551573934146255830366405915000869643732053218566832545291107903722831634138599586406690325959725187447169059540805012310209639011750748760017095360734234945757416272994856013308616958529958304677637019181594088528345061285863898271763457294883546638879554311615446446330199254382340016292057090751175533888161918987295591531536698701292267685465517437915790823154844634780260102891718032495396075041899485513811126977307478969074857043710716150121315922024556759241239013152919710956468406379442914941614357107914462567328589979
addchain: start: opt(dictionary(sliding_window(4),continued_fractions(co_binary)))
addchain: start: opt(dictionary(sliding_window(8),heuristic(use_first(halving,approximation))))
addchain: start: opt(dictionary(sliding_window(8),heuristic(use_first(halving,delta_largest))))
addchain: start: opt(dictionary(sliding_window(4),heuristic(use_first(halving,approximation))))
addchain: start: opt(dictionary(sliding_window(4),continued_fractions(dichotomic)))
addchain: start: opt(dictionary(sliding_window(4),continued_fractions(binary)))
addchain: start: opt(dictionary(sliding_window(8),continued_fractions(binary)))
addchain: start: opt(dictionary(sliding_window(4),heuristic(use_first(halving,delta_largest))))
^Caddchain: caught interrupt: stopping cpu profile

It seems that all the time is in addchain.Chain.Ops():

Screen Shot 2021-04-25 at 10 31 00 PM

@mmcloughlin
Copy link
Owner Author

mmcloughlin commented Apr 26, 2021

Yeah, this is bad:

addchain/chain.go

Lines 47 to 61 in 9396c1b

// Ops returns all operations that produce the kth position. This could be empty
// for an invalid chain.
func (c Chain) Ops(k int) []Op {
ops := []Op{}
s := new(big.Int)
for i := 0; i < k; i++ {
for j := i; j < k; j++ {
s.Add(c[i], c[j])
if s.Cmp(c[k]) == 0 {
ops = append(ops, Op{i, j})
}
}
}
return ops
}

The addition chains for these exponents are 3000+ long, so a quadratic algorithm is slow, and cubic is extremely slow:

addchain/alg/opt/opt.go

Lines 38 to 42 in 9396c1b

// Build program for c with all possible options at each step.
ops := make([][]addchain.Op, len(c))
for k := 1; k < len(c); k++ {
ops[k] = c.Ops(k)
}

@mmcloughlin
Copy link
Owner Author

The Chain.Ops() method is performing the 2-SUM algorithm. Therefore in the canonical case where the chain is ascending (sorted and unique) this can be done in linear time, giving a substantial speedup. See #90.

@mmcloughlin
Copy link
Owner Author

The next largest consumer is now heuristic.useFirst.Suggest():

Screen Shot 2021-04-27 at 12 21 25 AM

@mmcloughlin
Copy link
Owner Author

A few things help with the optimization of heuristic.Approximation.Suggest():

  • 48b62ed use a set for contains check
  • 0db08b1 preallocating temporary integer
  • 75a34ce reducing allocations in integer set type

These optimizations have helped massively with general performance, but it still takes forever for the 3072-bit exponent. Although it does terminate now, it appears to get stuck.

Can heuristic.Approximation.Suggest() be switched to a linear algorithm (like 2-SUM) without sacrificing results?

@mmcloughlin
Copy link
Owner Author

Bryan Mills offered a crafty idea for optimizing big.Int map with a yoloInt:

https://gophers.slack.com/archives/C0VP8EF3R/p1619665998077200

@mmcloughlin
Copy link
Owner Author

More experiments:

  • 5745726 use binary search for contains (rather than set type)
  • dae1cef use value rather than pointer in bigvector

@mmcloughlin
Copy link
Owner Author

Other useful optimizations:

1497ee4 bigvector: reduce basis allocations by using interface
63c24ba heuristic: linear algorithm for approximate

mmcloughlin added a commit that referenced this issue May 12, 2021
mmcloughlin added a commit that referenced this issue May 12, 2021
Implement comprehensive profiling options in the search subcommand, using the https://github.com/mmcloughlin/profile module. Profiles are configured with the ADDCHAIN_PROFILE environment variable. Removes the `-cpuprofile` option.

Updates #60
Updates #25
mmcloughlin added a commit that referenced this issue May 12, 2021
Use linear 2SUM-like algorithm in the typical case where the chain is
ascending.

Updates #60
Updates #25
mmcloughlin added a commit that referenced this issue May 13, 2021
Use golden test to ensure that all algorithm results remain unchanged. Intended to provide confidence that algorithm outputs have not changed when making risky or delicate changes.

Updates #60
Updates #25
mmcloughlin added a commit that referenced this issue May 13, 2021
The Approximation.Suggest() function is now the largest consumer in CPU profiles. Like #99 it is currently "accidentally cubic" but can be optimized to use a linear outer loop, similar to the 2-SUM problem.

Updates #60
Updates #25
mmcloughlin added a commit that referenced this issue May 13, 2021
Optimize Approximation.Suggest() by replacing linear bigint.Contains() call with a
new bigints.ContainsSorted() function that uses binary search.

Updates #60
Updates #25
mmcloughlin added a commit that referenced this issue May 13, 2021
)

By far the largest number of allocations comes from the creation of bigvector
basis vectors. These just contain zeros and ones. This PR changes the Vector
type to an interface so that the basis type can be implemented without any
allocations at all.

Unfortunately, this does complicate the interface, since it requires the user
to not write to any integers returned from Idx(). Consider this okay since
it's an internal package.

Updates #60
Updates #25
mmcloughlin added a commit that referenced this issue May 13, 2021
The Approximation.Suggest() method is prominent in the allocation profile.
This PR reduces allocations by hoisting them out of the loop.

Updates #60
Updates #25
@mmcloughlin
Copy link
Owner Author

Various changes led to a massive speedup.

$ time ./addchain search '2^3072 - 1103717'
addchain: expr: "2^3072 - 1103717"
addchain: hex: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffef289b
addchain: dec: 5809605995369958062859502533304574370686975176362895236661486152287203730997110225737336044533118407251326157754980517443990529594540047121662885672187032401032111639706440498844049850989051627200244765807041812394729680540024104827976584369381522292361208779044769892743225751738076979568811309579125511333093243519553784816306381580161860200247492568448150242515304449577187604136428738580990172551573934146255830366405915000869643732053218566832545291107903722831634138599586406690325959725187447169059540805012310209639011750748760017095360734234945757416272994856013308616958529958304677637019181594088528345061285863898271763457294883546638879554311615446446330199254382340016292057090751175533888161918987295591531536698701292267685465517437915790823154844634780260102891718032495396075041899485513811126977307478969074857043710716150121315922024556759241239013152919710956468406379442914941614357107914462567328589979
addchain: best: opt(runs(heuristic(use_first(halving,approximation))))
_10    = 2*1
_11    = 1 + _10
_1100  = _11 << 2
_1111  = _11 + _1100
_11110 = 2*_1111
_11111 = 1 + _11110
x10    = _11111 << 5 + _11111
i14    = 2*x10
x20    = i14 << 9 + x10
x40    = x20 << 20 + x20
x80    = x40 << 40 + x40
x160   = x80 << 80 + x80
i178   = x160 << 11
x320   = i178 << 149 + x160
i499   = x320 << 171
x640   = i499 << 149 + x320
x1280  = x640 << 640 + x640
x2560  = x1280 << 1280 + x1280
x3051  = x2560 << 491 + i499 + i178 + i14 + 1
i3078  = ((x3051 << 5 + _1111) << 3 + 1) << 2
i3089  = ((1 + i3078) << 4 + 1) << 4 + _11
return   i3089 << 3 + _11

real	0m24.625s
user	3m6.041s
sys	0m1.017s

@mmcloughlin
Copy link
Owner Author

The changes that made the difference:

23389d5 optimize Chain.Ops() (#99)
653b8b2 alg/heuristic: 2-SUM-like loop in Approximation (#103)
5bb4870 alg/heuristic: optimize Approximation with sorted contains check (#104)
05190c7 internal/bigvector: use interface to allocations for basis vectors (#106)
035ef76 alg/heuristic: preallocate in Approximation (#107)

Supporting code:

10bb4ec alg/ensemble: benchmark (#97)
db3ac33 cmd/addchain: comprehensive profiling options (#98)
5e94a59 alg/ensemble: ensure results unchanged (#101)

Updates #25

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
perf Performance issues and optimization
Projects
None yet
Development

No branches or pull requests

1 participant