Discussion: ingest! and update! #13

ablaom opened this issue Jan 25, 2023 · 3 comments
ablaom commented Jan 25, 2023

Comment of @jeremiedb, copied from #10:

I think I remain a little confused as to the extent to which these terms can translate unambiguously to the variety of algos and their implementations.

For a GBT / EvoTree:

fit: preprocess X / Y, create a cache, then apply grow_evotree! for some iterations
update!: essentially grow_evotree!; that is, add a tree to the model, assuming no change to the data (uses the cache). Some hyper-params may have changed (learning rate, regularization, ...) but not all (nbins couldn't, as it would require an expensive re-creation of the cache)
ingest!: continue training using new data. This is not a functionality supported in the current implementation. Is there an actual use case for which it would be expected to be supported for GBT?
Is the intent of ingest! to act as support for online learning? As I understand it, the intent of ingest! is that the effect of fit(x1, y1, m); ingest!(x2, y2, m) should be equivalent to fit(x3, y3), where x3 is the concatenation of x1 and x2. If such is the case, then I guess some extra information needs to be captured during fit for, say, a linear model to exhibit such behaviour, i.e. for fit + ingest to equal fit on the concatenated data.
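As a minimal sketch of that equivalence (the names `fit` and `ingest!`, and the struct, are illustrative, not the actual interface): a least-squares linear model can cache the sufficient statistics XᵀX and Xᵀy during `fit`, so that accumulating them in `ingest!` is exactly equivalent to fitting on the vertically concatenated data.

```julia
# Hypothetical illustration: a least-squares model whose `fit` caches the
# sufficient statistics XᵀX and Xᵀy. `ingest!` then accumulates the
# statistics for new rows, so fit + ingest! == fit on concatenated data.
mutable struct LinearModel
    XtX::Matrix{Float64}
    Xty::Vector{Float64}
    coefs::Vector{Float64}
end

function fit(X::Matrix{Float64}, y::Vector{Float64})
    XtX, Xty = X'X, X'y
    LinearModel(XtX, Xty, XtX \ Xty)
end

function ingest!(model::LinearModel, X::Matrix{Float64}, y::Vector{Float64})
    model.XtX .+= X'X          # accumulate sufficient statistics
    model.Xty .+= X'y
    model.coefs = model.XtX \ model.Xty
    model
end
```

Here the "extra information captured during fit" is just the two cached matrices; no original observations need to be retained.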

Is it assumed that for ingest! the new data keeps the same features, or could it be a subset / superset? The latter could be relevant in situations where one uses the initial model as an offset model, over which training could be performed, potentially on additional features, though I don't think such a mechanism would be the appropriate way to achieve this rather than explicit model stacking.

For neural nets, where a model is fed data through a DataLoader, I'm not too clear which of update! and ingest! best applies. Is each batch within an epoch to be considered new data? Or would ingest! only be used when a new DataLoader is built on new data?

I think the reason I find the update / ingest distinction unclear is that the difference in implications between the two verbs may have more to do with algorithm implementations, and whether they involve preprocessing / caching, than with truly distinct, generally applicable verbs.

For example, a GBT with the exact method (one which does not require data preprocessing) could be implemented using a stream / online approach. Each iteration could be fed with either entirely new data (having the same features) or just another subsampling of the original data. The situation is similar for neural nets, where I don't see a fundamental distinction between a batch from a fixed dataset and a batch coming from an entirely new one. And in all cases, I think there are some parameters that can be changed through both update and ingest, like learning rates and regularization, and others that can't, like the number of features or the size of hidden layers.

Perhaps this has already been done, but I'm wondering whether a clarification of the scope of algos / use cases supported by the framework would help. By that I mean making explicit the implications (is there any overhead, and in what circumstances) for a variety of algo families, notably:

Linear models
Neural Nets
Gradient boosted trees
Algos requiring a cache / initialization vs those that don't

Given the broadly different crowds that may feel concerned by the framework, it also comes with very different perspectives on the "natural" way of doing things and what appears to be a reasonable compromise (for instance, performance overhead is a big deal in my prod-oriented usage, but isn't for many research / educational ones).

ablaom commented Jan 26, 2023

Related discussion: JuliaAI/MLJ.jl#60

ablaom commented Sep 20, 2024

Thanks @jeremiedb for the thoughtful commentary.

I suppose that in any kind of "update", we provide one or more of the following:

  1. Changes to hyper-parameters (e.g., an increase in the iteration parameter)
  2. New training observations
  3. New training features

Any model can support 1, by simply retraining, from scratch, on the original data (provided as a fallback).

I cannot conceive of a need for any model to support 2 and 3 in the same update step.

edit I guess support for the remaining possibilities (2 only, 3 only, 1 + 2, or 1 + 3) can probably be optional.

Yes, increasing an iteration parameter (case 1) may be equivalent to providing previous observations as "new" ones (case 2). However, this fact does not render the distinction between 1 and 2 useless more generally.

My main use case for 2 is online learning. My main use case for 3 is linear models (see this comment).

I think the important issue you raise is what promises of behaviour should we make for the different update cases, where implemented. How about:

  • 1 only: (edited again) the update should be equivalent to training from scratch (on the original data) with the new hyperparameters

  • 2 only or 3 only: update should be equivalent to retraining with the original data concatenated with new data

Beyond this, I can't really think of a well-defined generic requirement, and so would leave that up to the implementation to explain in documentation.
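To make the "1 only" promise concrete, here is an illustrative sketch (the names and struct are hypothetical, not the actual interface): a deterministic gradient-descent fit in which `update!` with an increased iteration count warm-starts from the cached state, yet produces exactly the result of `fit` from scratch with the new iteration count.

```julia
# Hypothetical illustration of the "1 only" contract: bumping the
# iteration hyper-parameter in `update!` performs only the additional
# iterations, but the result equals a from-scratch `fit` with the new
# iteration count, because the procedure is deterministic.
mutable struct GDModel
    w::Float64
    n_iters::Int
    data::Vector{Float64}   # cached training data
end

# one gradient step on the mean-squared-error objective, fixed step size
step(w, data) = w - 0.1 * 2 * (w - sum(data) / length(data))

function fit(data::Vector{Float64}; n_iters::Int=10)
    w = 0.0
    for _ in 1:n_iters
        w = step(w, data)
    end
    GDModel(w, n_iters, data)
end

function update!(model::GDModel; n_iters::Int)
    for _ in (model.n_iters + 1):n_iters   # only the additional iterations
        model.w = step(model.w, model.data)
    end
    model.n_iters = n_iters
    model
end
```

Note the equivalence holds here only because the step size stays fixed; changing it in `update!` would break the from-scratch equivalence, which is the wrinkle addressed in the next comment.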

@jeremiedb What say you?

ablaom commented Oct 6, 2024

Okay, 1 as above doesn't allow for adding iterations while also applying a new learning rate to the new iterations. Perhaps:

1 only: For a single (or no) hyperparameter replacement, the update should be equivalent to training from scratch (on the original data) with the updated hyperparameter.
