alg: extremely slow for 3000+ bit exponents #60

mmcloughlin · 2021-02-24T21:21:46Z

addchain search '2^3072 - 1103717' takes an extremely long time, and in fact appears to get stuck.

See https://gophers.slack.com/archives/C6WDZJ70S/p1614191115020200 for more context.

The text was updated successfully, but these errors were encountered:

Update #25 Update #60

mmcloughlin · 2021-04-26T05:31:58Z

Generate a CPU profile following #88.

$ go run ./cmd/addchain/ search -v -cpuprofile cpu.out '2^3072 - 1103717'
addchain: expr: "2^3072 - 1103717"
addchain: hex: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffef289b
addchain: dec: 5809605995369958062859502533304574370686975176362895236661486152287203730997110225737336044533118407251326157754980517443990529594540047121662885672187032401032111639706440498844049850989051627200244765807041812394729680540024104827976584369381522292361208779044769892743225751738076979568811309579125511333093243519553784816306381580161860200247492568448150242515304449577187604136428738580990172551573934146255830366405915000869643732053218566832545291107903722831634138599586406690325959725187447169059540805012310209639011750748760017095360734234945757416272994856013308616958529958304677637019181594088528345061285863898271763457294883546638879554311615446446330199254382340016292057090751175533888161918987295591531536698701292267685465517437915790823154844634780260102891718032495396075041899485513811126977307478969074857043710716150121315922024556759241239013152919710956468406379442914941614357107914462567328589979
addchain: start: opt(dictionary(sliding_window(4),continued_fractions(co_binary)))
addchain: start: opt(dictionary(sliding_window(8),heuristic(use_first(halving,approximation))))
addchain: start: opt(dictionary(sliding_window(8),heuristic(use_first(halving,delta_largest))))
addchain: start: opt(dictionary(sliding_window(4),heuristic(use_first(halving,approximation))))
addchain: start: opt(dictionary(sliding_window(4),continued_fractions(dichotomic)))
addchain: start: opt(dictionary(sliding_window(4),continued_fractions(binary)))
addchain: start: opt(dictionary(sliding_window(8),continued_fractions(binary)))
addchain: start: opt(dictionary(sliding_window(4),heuristic(use_first(halving,delta_largest))))
^Caddchain: caught interrupt: stopping cpu profile

It seems that all the time is in addchain.Chain.Ops():

mmcloughlin · 2021-04-26T05:52:49Z

Yeah, this is bad:

addchain/chain.go

Lines 47 to 61 in 9396c1b

    
           // Ops returns all operations that produce the kth position. This could be empty 
        
           // for an invalid chain. 
        
           func (c Chain) Ops(k int) []Op { 
        
           	ops := []Op{} 
        
           	s := new(big.Int) 
        
           	for i := 0; i < k; i++ { 
        
           		for j := i; j < k; j++ { 
        
           			s.Add(c[i], c[j]) 
        
           			if s.Cmp(c[k]) == 0 { 
        
           				ops = append(ops, Op{i, j}) 
        
           			} 
        
           		} 
        
           	} 
        
           	return ops 
        
           }

The addition chains for these exponents are 3000+ long, so a quadratic algorithm is slow, and cubic is extremely slow:

addchain/alg/opt/opt.go

Lines 38 to 42 in 9396c1b

    
           // Build program for c with all possible options at each step. 
        
           ops := make([][]addchain.Op, len(c)) 
        
           for k := 1; k < len(c); k++ { 
        
           	ops[k] = c.Ops(k) 
        
           }

mmcloughlin · 2021-04-27T07:20:35Z

The Chain.Ops() method is performing the 2-SUM algorithm. Therefore in the canonical case where the chain is ascending (sorted and unique) this can be done in linear time, giving a substantial speedup. See #90.

mmcloughlin · 2021-04-27T07:25:48Z

The next largest consumer is now heuristic.useFirst.Suggest():

mmcloughlin · 2021-04-28T06:37:12Z

A few things help with the optimization of heuristic.Approximation.Suggest():

48b62ed use a set for contains check
0db08b1 preallocating temporary integer
75a34ce reducing allocations in integer set type

These optimizations have helped massively with general performance, but it still takes forever for the 3072-bit exponent. Although it does terminate now, it appears to get stuck.

Can heuristic.Approximation.Suggest() be switched to a linear algorithm (like 2-SUM) without sacrificing results?

mmcloughlin · 2021-04-29T03:27:47Z

Bryan Mills offered a crafty idea for optimizing big.Int map with a yoloInt:

https://gophers.slack.com/archives/C0VP8EF3R/p1619665998077200

mmcloughlin · 2021-04-29T05:10:40Z

More experiments:

5745726 use binary search for contains (rather than set type)
dae1cef use value rather than pointer in bigvector

mmcloughlin · 2021-05-12T03:17:02Z

Other useful optimizations:

1497ee4 bigvector: reduce basis allocations by using interface
63c24ba heuristic: linear algorithm for approximate

Updates #60 Updates #25

Implement comprehensive profiling options in the search subcommand, using the https://github.com/mmcloughlin/profile module. Profiles are configured with the ADDCHAIN_PROFILE environment variable. Removes the `-cpuprofile` option. Updates #60 Updates #25

Use linear 2SUM-like algorithm in the typical case where the chain is ascending. Updates #60 Updates #25

Use golden test to ensure that all algorithm results remain unchanged. Intended to provide confidence that algorithm outputs have not changed when making risky or delicate changes. Updates #60 Updates #25

The Approximation.Suggest() function is now the largest consumer in CPU profiles. Like #99 it is currently "accidentally cubic" but can be optimized to use a linear outer loop, similar to the 2-SUM problem. Updates #60 Updates #25

Optimize Approximation.Suggest() by replacing linear bigint.Contains() call with a new bigints.ContainsSorted() function that uses binary search. Updates #60 Updates #25

) By far the largest number of allocations comes from the creation of bigvector basis vectors. These just contain zeros and ones. This PR changes the Vector type to an interface so that the basis type can be implemented without any allocations at all. Unfortunately, this does complicate the interface, since it requires the user to not write to any integers returned from Idx(). Consider this okay since it's an internal package. Updates #60 Updates #25

The Approximation.Suggest() method is prominent in the allocation profile. This PR reduces allocations by hoisting them out of the loop. Updates #60 Updates #25

mmcloughlin · 2021-05-13T08:04:43Z

Various changes led to a massive speedup.

$ time ./addchain search '2^3072 - 1103717'
addchain: expr: "2^3072 - 1103717"
addchain: hex: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffef289b
addchain: dec: 5809605995369958062859502533304574370686975176362895236661486152287203730997110225737336044533118407251326157754980517443990529594540047121662885672187032401032111639706440498844049850989051627200244765807041812394729680540024104827976584369381522292361208779044769892743225751738076979568811309579125511333093243519553784816306381580161860200247492568448150242515304449577187604136428738580990172551573934146255830366405915000869643732053218566832545291107903722831634138599586406690325959725187447169059540805012310209639011750748760017095360734234945757416272994856013308616958529958304677637019181594088528345061285863898271763457294883546638879554311615446446330199254382340016292057090751175533888161918987295591531536698701292267685465517437915790823154844634780260102891718032495396075041899485513811126977307478969074857043710716150121315922024556759241239013152919710956468406379442914941614357107914462567328589979
addchain: best: opt(runs(heuristic(use_first(halving,approximation))))
_10    = 2*1
_11    = 1 + _10
_1100  = _11 << 2
_1111  = _11 + _1100
_11110 = 2*_1111
_11111 = 1 + _11110
x10    = _11111 << 5 + _11111
i14    = 2*x10
x20    = i14 << 9 + x10
x40    = x20 << 20 + x20
x80    = x40 << 40 + x40
x160   = x80 << 80 + x80
i178   = x160 << 11
x320   = i178 << 149 + x160
i499   = x320 << 171
x640   = i499 << 149 + x320
x1280  = x640 << 640 + x640
x2560  = x1280 << 1280 + x1280
x3051  = x2560 << 491 + i499 + i178 + i14 + 1
i3078  = ((x3051 << 5 + _1111) << 3 + 1) << 2
i3089  = ((1 + i3078) << 4 + 1) << 4 + _11
return   i3089 << 3 + _11

real	0m24.625s
user	3m6.041s
sys	0m1.017s

mmcloughlin · 2021-05-13T08:07:20Z

The changes that made the difference:

23389d5 optimize Chain.Ops() (#99)
653b8b2 alg/heuristic: 2-SUM-like loop in Approximation (#103)
5bb4870 alg/heuristic: optimize Approximation with sorted contains check (#104)
05190c7 internal/bigvector: use interface to allocations for basis vectors (#106)
035ef76 alg/heuristic: preallocate in Approximation (#107)

Supporting code:

10bb4ec alg/ensemble: benchmark (#97)
db3ac33 cmd/addchain: comprehensive profiling options (#98)
5e94a59 alg/ensemble: ensure results unchanged (#101)

Updates #25

mmcloughlin changed the title ~~alg: doesn~~ alg: extremely slow for 3000+ bit exponents Feb 24, 2021

mmcloughlin added the perf Performance issues and optimization label Feb 24, 2021

mmcloughlin added a commit that referenced this issue Apr 26, 2021

cmd/addchain: cpu profile option (#88)

80391ed

Update #25 Update #60

mmcloughlin mentioned this issue Apr 27, 2021

alg: optimization experiments #90

Closed

mmcloughlin added a commit that referenced this issue May 12, 2021

alg/ensemble: benchmark (#97)

10bb4ec

Updates #60 Updates #25

mmcloughlin added a commit that referenced this issue May 12, 2021

optimize Chain.Ops() (#99)

23389d5

Use linear 2SUM-like algorithm in the typical case where the chain is ascending. Updates #60 Updates #25

mmcloughlin mentioned this issue May 13, 2021

alg/ensemble,internal/results: dedupe expensive tests #102

Open

mmcloughlin added a commit that referenced this issue May 13, 2021

alg/heuristic: preallocate in Approximation (#107)

035ef76

The Approximation.Suggest() method is prominent in the allocation profile. This PR reduces allocations by hoisting them out of the loop. Updates #60 Updates #25

mmcloughlin closed this as completed May 13, 2021

mmcloughlin mentioned this issue May 13, 2021

alg: profile and optimize algorithms #25

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

alg: extremely slow for 3000+ bit exponents #60

alg: extremely slow for 3000+ bit exponents #60

mmcloughlin commented Feb 24, 2021 •

edited

Loading

mmcloughlin commented Apr 26, 2021

mmcloughlin commented Apr 26, 2021 •

edited

Loading

mmcloughlin commented Apr 27, 2021

mmcloughlin commented Apr 27, 2021

mmcloughlin commented Apr 28, 2021

mmcloughlin commented Apr 29, 2021

mmcloughlin commented Apr 29, 2021

mmcloughlin commented May 12, 2021

mmcloughlin commented May 13, 2021

mmcloughlin commented May 13, 2021

alg: extremely slow for 3000+ bit exponents #60

alg: extremely slow for 3000+ bit exponents #60

Comments

mmcloughlin commented Feb 24, 2021 • edited Loading

mmcloughlin commented Apr 26, 2021

mmcloughlin commented Apr 26, 2021 • edited Loading

mmcloughlin commented Apr 27, 2021

mmcloughlin commented Apr 27, 2021

mmcloughlin commented Apr 28, 2021

mmcloughlin commented Apr 29, 2021

mmcloughlin commented Apr 29, 2021

mmcloughlin commented May 12, 2021

mmcloughlin commented May 13, 2021

mmcloughlin commented May 13, 2021

mmcloughlin commented Feb 24, 2021 •

edited

Loading

mmcloughlin commented Apr 26, 2021 •

edited

Loading