[FEAT] Add sparse non-negative OLS and WLS via QP for MinTraceSparse
#319
Conversation
Thanks this is great work! Some minor comments so far:
Hi, Olivier! I hope you're well. The linting issue is strange, thanks for pointing that out! I thought I ran

Yup, I also thought that three is close to excessive. The good news is that they're both light on dependencies, as both have numpy and scipy dependencies which don't conflict with ours. OSQP has a third dependency called qdldl. Would you rather scale it down to one sparse QP solver? It's straightforward to express the secondary approach in a problem format that OSQP will accept, as the only change is reformulating the linear constraints, or vice versa for Clarabel. In my experience, the trade-off between the two in this context is that OSQP is slightly faster than Clarabel but slightly less reliable, particularly for the second approach unless you set higher accuracy, hence the incorporation of the two solvers. Unfortunately, we have to keep quadprog for now until

My design decision for the PR just comes down to my workflow in my job and what I think is most valuable in a large business context. When evaluating tens or hundreds of millions of forecasts, it's important to have a process that's robust and performant. At that scale, it's inconsequential whether, for example, the QP solvers struggled with a particular reconciliation problem, or which (if any) of the two approaches were successful, as long as the best-estimate coherent non-negative forecasts are returned for a given

I understand if you think this methodology isn't suitable for most users of the package. If you'd like, I can easily simplify it and scale it way back to something like

I find the QP solutions almost always improve upon the heuristic solution if successful, which you can measure by using the heuristic solution as a baseline and calculating metrics like the MSSE or RMSSE.
I can put something together that quantitatively shows the differences between the QP approaches and the heuristic approach, and measures the alignment of the two approaches with the dense approach in

This is purely anecdotal, but using

Let me know what you think and I can do the tests and make the changes. 😄
This PR improves the current sparse non-negative reconciliation in `MinTraceSparse` by expanding upon the heuristic from my previous PR, #284, to improve the quality of the non-negative reconciled forecasts for large numbers of time series. It converts the OLS and WLS estimators (`ols`, `wls_struct`, and `wls_var`) into quadratic programming (QP) problems with non-negativity constraints, similar to the approach in `MinTrace`, but most importantly avoiding the evaluation of the dense $S^{T}W^{-1}S$, thereby maintaining the sparsity of the problem.

The implementation in this PR is robust and performant, allowing users dealing with large numbers of time series—which is extremely common in industrial contexts—to reliably generate high quality non-negative OLS and WLS reconciled forecasts in a fraction of the time.
It adds two new QP solvers as dependencies to the package—OSQP and Clarabel.rs—which use augmented Lagrangian and interior point algorithms respectively. These two solvers are the most performant for these sorts of problems in my experience, and perform well in benchmarks. The new behaviour is triggered by default upon instantiation of `MinTraceSparse` when a `nonnegative` of `True` is passed, but can be toggled off by setting `qp` to `False`, which will revert to the method detailed in #284.

I will add the GLS estimators, `mint_cov` and `mint_shrink`, in a future PR, as we can exploit the symmetry of $W$ by making it an upper-triangular matrix, maintaining a reasonable degree of sparsity and decreasing the time taken to invert it.

Warm Start via modified #284
As the OSQP solver supports manually warm starting the primal and dual variables, we can modify the linear operator for $P$ used in #284 to directly project onto the non-negative orthant, using GCROT(m, k) to quickly solve the linear system and generate the reconciled non-negative forecasts, $x_{0}$, with which to warm start the primal variables. We could use a Jacobi preconditioner, as the diagonal of $A$ in the $Ax=b$ linear system is given by the row-wise sums of $S^{T}W^{-1}$, but the system is probably not well suited to diagonal preconditioning for most aggregation structures.

The generation of $x_{0}$ comes with a small performance penalty, but this is offset by a reduction in the number of iterations needed to solve the primary problem below, as well as an increased chance of solving it, in my experience. Also, $x_{0}$ serves as a baseline against which the post-processed QP primal solutions are compared, and therefore acts as an important safeguard: regardless of the solver outcome, we return non-negative reconciled forecasts with an objective value at least as good as that of $x_{0}$.
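To illustrate the idea, here is a minimal sketch of generating $x_{0}$, not the PR's implementation: the toy two-level hierarchy is hypothetical, and clipping once after the Krylov solve (rather than projecting inside the modified linear operator) is my simplification.

```python
# Sketch: generate a coherent, non-negative baseline x0 by solving the
# unconstrained normal equations (S^T W^{-1} S) b = S^T W^{-1} y_hat with a
# matrix-free operator (so the dense product is never formed), then clipping
# the bottom level to the non-negative orthant and re-aggregating with S.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, gcrotmk

# Toy structure: total = b1 + b2 (n_a = 1 aggregate, n_b = 2 bottom series).
A = sp.csr_matrix(np.array([[1.0, 1.0]]))            # aggregation matrix
S = sp.vstack([A, sp.eye(2)]).tocsr()                # summing matrix, shape (3, 2)
W_inv = sp.eye(3).tocsr()                            # OLS precision (identity)

y_hat = np.array([10.0, 7.0, -2.0])  # incoherent base forecasts, one negative

# Matrix-free operator for P = S^T W^{-1} S.
def matvec(v):
    return S.T @ (W_inv @ (S @ v))

P = LinearOperator((2, 2), matvec=matvec, dtype=np.float64)
rhs = S.T @ (W_inv @ y_hat)

b, info = gcrotmk(P, rhs)            # GCROT(m, k) Krylov solve; info == 0 on success

x0 = S @ np.clip(b, 0.0, None)       # clip bottom level, re-aggregate: coherent and >= 0
```

By construction $x_{0}$ is coherent (it is $S$ times a bottom-level vector) and non-negative, so it is always a valid fallback even when the QP solvers fail.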
Primary Approach via OSQP
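As a sketch (my reconstruction from the constraint description below, not necessarily the PR's exact notation; $x_{b}$ denotes the bottom-level block of $x$), the primary problem is:

$$
\begin{aligned}
\min_{x} \quad & \tfrac{1}{2}\, x^{T} W^{-1} x - \hat{y}_{h}^{T} W^{-1} x \\
\text{s.t.} \quad & \left(I_{n_{a}} \quad -A\right) x = 0, \\
& x_{b} \ge 0,
\end{aligned}
$$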
where $A$ is the "aggregation matrix", which describes the relationship between the bottom level (leaf node) time series and the aggregated (parent node) time series, $W^{-1}$ is the precision matrix for the problem, and $\hat{y}_{h}$ are the base forecasts for a particular horizon step, $h$.

This formulation is the most efficient, as $\left(I_{n_{a}} \quad -A\right)$ has the set of all reconciled forecasts in its null space, so the graph structure is encoded in just $n_{a}$ equality constraints, with the remaining $n_{b}$ inequality constraints restricting $x$ to the convex, non-negative feasible region of the solution space.
We try this approach first for each $h$, and if the solver solves the problem, even with reduced accuracy, the primal solutions are post-processed by clipping to $[0, \infty)$ and aggregating with $S$, then compared to $x_{0}$ to ensure they are objectively better, and the best non-negative reconciled forecasts are returned for each $h$. If the OSQP solver exceeds the maximum number of iterations for a given $h$ without finding a solution, or finds the problem to be infeasible, the secondary approach below is employed by default as a fallback; otherwise, if `fallback` is set to `False`, $x_{0}$ is returned.
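The structure of the primary problem can be demonstrated on a toy hierarchy. This is a sketch, not the PR's code: the hierarchy is hypothetical, and SciPy's SLSQP stands in for OSQP to keep the example dependency-free.

```python
# Sketch: primary-approach constraints on a toy hierarchy
# (total = m1 + m2, m1 = b1 + b2, m2 = b3 + b4), solved with SLSQP
# as a stand-in for OSQP.
import numpy as np
import scipy.sparse as sp
from scipy.optimize import minimize

A = sp.csr_matrix(np.array([[1., 1., 1., 1.],
                            [1., 1., 0., 0.],
                            [0., 0., 1., 1.]]))    # aggregation matrix
n_a, n_b = A.shape
S = sp.vstack([A, sp.eye(n_b)]).tocsr()            # summing matrix

# (I_{n_a}  -A) has every coherent point x = S b in its null space.
C = sp.hstack([sp.eye(n_a), -A]).tocsr()
assert np.allclose((C @ S).toarray(), 0.0)

# OLS case (W = I): min 0.5 * ||x - y_hat||^2  s.t.  C x = 0,  x_b >= 0.
y_hat = np.array([20., 12., 9., 5., 8., 4., -1.])  # incoherent, one negative
bounds = [(None, None)] * n_a + [(0, None)] * n_b  # inequalities only on the bottom block
res = minimize(lambda x: 0.5 * np.sum((x - y_hat) ** 2), y_hat.clip(0.0),
               jac=lambda x: x - y_hat,
               constraints=[{"type": "eq", "fun": lambda x: C @ x}],
               bounds=bounds, method="SLSQP")
x = res.x  # coherent, with a non-negative bottom level
```

Note that only the $n_{b}$ bottom-level entries need explicit non-negativity constraints; the aggregates are automatically non-negative as sums of non-negative leaves.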
Secondary Approach via Clarabel
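As a sketch (again my reconstruction, not necessarily the PR's exact notation), the conic form over the augmented variable $(x, z)$ that matches the description below is:

$$
\begin{aligned}
\min_{x,\, z} \quad & \tfrac{1}{2}\, x^{T} W^{-1} x - \hat{y}_{h}^{T} W^{-1} x \\
\text{s.t.} \quad & \begin{pmatrix} I_{n} & -S \\ 0 & -I_{n_{b}} \end{pmatrix} \begin{pmatrix} x \\ z \end{pmatrix} + s = 0, \qquad s \in \{0\}^{n} \times \mathbb{R}_{\ge 0}^{n_{b}},
\end{aligned}
$$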
where $S$ is the "summing matrix", $z$ is an auxiliary variable, and $s$ is a composition (product) of two convex cones.

This formulation is an alternative way of framing the problem in the Wickramasuriya, Turlach, and Hyndman paper that still maintains sparsity. The secondary approach increases the number of decision variables from $n$ to $n + n_{b}$ and the number of non-zeros ("nnz") of the linear constraints matrix by $2n_{b}$ relative to the primary approach, so it is a tad slower in most cases, but it tends to be more successful in my experience. Also, pairing it with a different solver, in this case Clarabel, maximises our chances of finding optimal non-negative reconciled forecasts.
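The growth in problem size can be checked directly by building both constraint matrices on the same toy hierarchy (a sketch with a hypothetical hierarchy, not the PR's code):

```python
# Sketch: decision-variable and nnz growth of the secondary (conic) form
# relative to the primary form, on a toy hierarchy.
import numpy as np
import scipy.sparse as sp

A = sp.csr_matrix(np.array([[1., 1., 1., 1.],
                            [1., 1., 0., 0.],
                            [0., 0., 1., 1.]]))    # aggregation matrix
n_a, n_b = A.shape
n = n_a + n_b
S = sp.vstack([A, sp.eye(n_b)]).tocsr()            # summing matrix

# Primary constraints: (I_{n_a}  -A) x = 0, plus an identity block for x_b >= 0.
C_primary = sp.vstack([sp.hstack([sp.eye(n_a), -A]),
                       sp.hstack([sp.csr_matrix((n_b, n_a)), sp.eye(n_b)])]).tocsr()

# Secondary constraints over (x, z): x - S z = 0 (zero cone), z >= 0 (non-negative cone).
C_secondary = sp.vstack([sp.hstack([sp.eye(n), -S]),
                         sp.hstack([sp.csr_matrix((n_b, n)), sp.eye(n_b)])]).tocsr()

extra_vars = C_secondary.shape[1] - C_primary.shape[1]  # n_b extra decision variables
extra_nnz = C_secondary.nnz - C_primary.nnz             # 2 * n_b extra non-zeros
```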
Post-process
As mentioned above, if either OSQP or Clarabel returns a solution, it is post-processed by clipping to $[0, \infty)$ and aggregating with $S$, then compared to $x_{0}$ to ensure it is objectively better.
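The safeguard can be sketched as follows (toy hierarchy and numbers are hypothetical; clipping the bottom-level block before re-aggregating is my reading of "clipping and aggregating with $S$"):

```python
# Sketch: the post-processing safeguard. Clip a candidate solution's bottom
# level, re-aggregate with S, and keep it only if its objective beats x0.
import numpy as np
import scipy.sparse as sp

A = sp.csr_matrix(np.array([[1., 1.]]))       # total = b1 + b2
S = sp.vstack([A, sp.eye(2)]).tocsr()
W_inv = sp.eye(3).tocsr()
y_hat = np.array([10., 7., -2.])

def objective(x):
    # 0.5 * (x - y_hat)^T W^{-1} (x - y_hat), dropping the constant term
    r = x - y_hat
    return 0.5 * r @ (W_inv @ r)

x0 = S @ np.array([8.0, 0.0])                 # heuristic baseline (coherent, >= 0)
b_qp = np.array([8.5, -0.1])                  # hypothetical QP bottom-level solution
x_qp = S @ np.clip(b_qp, 0.0, None)           # clip to [0, inf), aggregate with S

# Return the QP solution only if it is objectively at least as good as x0.
best = x_qp if objective(x_qp) < objective(x0) else x0
```

This guarantees the returned forecasts are always coherent, non-negative, and no worse than $x_{0}$ in objective value, whatever the solvers do.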