-
Notifications
You must be signed in to change notification settings - Fork 480
Description
The issue was originally filed as a convergence failure of SYEVD in Jax. It can be traced to a convergence failure of DLAED4. Here is a minimal standalone reproducer.
// main.cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>
#include <iostream>
#include <iomanip>
extern "C" {
void dlaed4_(const int* N, const int* I,
const double* D, const double* Z,
double* DELTA, const double* RHO,
double* DLAM, int* INFO);
}
int main(int argc, char** argv) {
const int n = 29;
const int i = 23;
const double rho = 6.2964808955495258e-05;
const double d[n] = {
-8.9907411945275305e-17, 3.1527281176230556e-06, 7.8104209108493992e-06, 1.4528378412105169e-05, 1.6571840586254615e-05, 1.6791959834500457e-05, 1.6951003434697501e-05, 1.6961635049587819e-05, 1.6969218460422806e-05, 1.7028037406879576e-05, 1.9135232081591496e-05, 2.1876238530929982e-05, 2.1920146416725336e-05, 3.9357203097119847e-05, 1.0557423781645928e-04, 1.3686127039388258e-04, 1.3686300570614894e-04, 1.3686313427158908e-04, 1.3686314099616321e-04, 1.3686314127052412e-04, 1.3686314210337733e-04, 1.3686314231692531e-04, 1.3686314257213152e-04, 5.3532577970471340e-02, 9.9996866934433004e-01, 1.0000000849390769e+00, 1.0000000901678945e+00, 1.0000000918404301e+00, 1.0000001095998177e+00
};
const double z[n] = {
1.6700411548440888e-07, 2.2704845744451106e-06, -6.8596999842130412e-06, -1.9813464524908256e-04, 3.4676472662621390e-03, 5.8131816598176476e-02, -2.4389840480820292e-03, 2.2571085689296944e-03, -2.0150012240231657e-03, -9.7432047705138750e-03, 7.3184880793391433e-04, -2.9319792346511997e-02, 8.3364827297377855e-07, -8.9722126354645391e-09, 7.0412738449665979e-01, -1.4042299148200350e-06, 3.8731670039916887e-06, -5.4330381400333468e-07, 4.4026694557354993e-07, 2.2528424563293299e-07, 3.6064772311683760e-06, 2.8472995783701232e-07, 8.3086036055888118e-07, 2.8464273601162125e-02, -7.0642255076744132e-01, -5.2009238883566685e-08, 8.6348330718758654e-08, -2.9664496252475970e-08, 1.4932776629316714e-09
};
std::vector<double> delta(n, 0.0);
double dlam = 0.0;
int info = 0;
dlaed4_(&n, &i, d, z, delta.data(), &rho, &dlam, &info);
std::cout << std::setprecision(17) << std::scientific;
std::cout << "info = " << info << "\n";
std::cout << "dlam = " << dlam << "\n";
return (info == 0) ? 0 : 1;
}The code returns info = 1, which means not converged. The maximum number of iterations in DLAED4 is 30; the algorithm requires 31 iterations.
The eigenvalue to be found is very close to the left pole. There are several iterations where the Newton-like step overshoots and a fallback is taken. This happens for the first time in iteration 5. The step taken by the fallback turns out to be a bad step. The objective function value (W in the table) grows a lot and the remaining iterations do not suffice to correct this. Recall that this is a root finder, so W should go to zero.
| iter | W | relative position |
|---|---|---|
| 3 | -15.183206754205958 | 9.3224424664962507E-012 |
| 4 | -7.2541345644250672 | 2.8321247865713130E-011 |
| 5 | 15845.035200581398 | 0.25000000001416062 |
| 6 | 7937.0973190230861 | 5.8695311144112583E-004 |
| 7 | 5292.0409126050627 | 2.9347656988118687E-004 |
| 8 | 2646.0775072154233 | 1.1736977032438319E-004 |
| 9 | 1443.4324592139662 | 5.8684899322815527E-005 |
| 10 | 721.72516385211111 | 2.7943564578650639E-005 |
| 11 | 369.27924724566680 | 1.3971796449949251E-005 |
| 12 | 184.64794882583919 | 6.9033154364403488E-006 |
| 13 | 92.881305918680368 | 3.4516718788441071E-006 |
| 14 | 46.448843381611326 | 1.7204602732982808E-006 |
| 15 | -2.9323820134484166 | 8.3977374357369141E-011 |
| 16 | -1.4556166910784307 | 1.8246643456796507E-010 |
| 17 | 23.277161269594469 | 8.6032136986642434E-007 |
| 18 | 11.646269558195501 | 4.2954626252879772E-007 |
| 19 | -0.91250965489136449 | 3.0284357148388565E-010 |
| 20 | -0.45016815654621373 | 6.3928690716673472E-010 |
| 21 | 5.8486784388543924 | 2.1509277471798222E-007 |
| 22 | 2.9300774463199879 | 1.0726528303886306E-007 |
| 23 | 1.4829047906270059 | 5.3952284973014894E-008 |
| 24 | 0.73997721204365996 | 2.6852690757367693E-008 |
| 25 | -0.35749940737873870 | 8.0525079151536220E-010 |
| 26 | -0.15628904239033362 | 1.6119748312661070E-009 |
| 27 | 7.0768362172978744E-002 | 4.8489293641088358E-009 |
| 28 | 1.1908566236531963E-002 | 3.6314105287554539E-009 |
| 29 | -3.2274931252138529E-004 | 3.4136194909394072E-009 |
| 30 | -2.3743868398512502E-007 | 3.4192031981775864E-009 |
The root cause of the slow convergence is the fallback to bisection.
Lines 849 to 856 in 06f5ba3
| TEMP = TAU + ETA | |
| IF( TEMP.GT.DLTUB .OR. TEMP.LT.DLTLB ) THEN | |
| IF( W.LT.ZERO ) THEN | |
| ETA = ( DLTUB-TAU ) / TWO | |
| ELSE | |
| ETA = ( DLTLB-TAU ) / TWO | |
| END IF | |
| END IF |
As we see by the relative position in iteration 5, bisection is really bad in this case and moves the candidate point far away from the root.
I would like to understand why bisection was chosen as the fallback.
- LAPACK 2.0 had a different update (I did not try to decipher what exactly it does) and goes into a cyclic infinite loop for this case.
- LAPACK 3.0 introduced bisection as fallback. This part has not been changed since.
Are there any release notes for LAPACK 3.0? Was there any other kind of dissemination, email lists etc, where I could find a hint?