dynamic determination of invalidated cache #15

Eh2406 · 2016-11-07T17:24:37Z

can we keep track of when we generate a orca column and use that to determine when a orca table needs to be re-calculated.

fscottfoti · 2016-11-07T18:38:14Z

You mean like have a dependency graph? I definitely thought that was a promising idea but Matt thought it would be a lot of work at the time...

Eh2406 · 2016-11-07T21:44:29Z

To be honest I assumed we had a DG, or the data to build one on the fly. I assumed that is what is needed to do lazy loading. I will go back to the code. How do we do lazy loading?

When we add data to orca, store a timestamp (or monatomic number) in a _CACHE_TIME.
However we do lazy loading, rerun that code and see the largest value for the _CACHE_TIME of the dependencies. If the dependencies time > than the time for the cashe of this item, than we are in a invalidated cache state.

fscottfoti · 2016-11-08T01:00:25Z

No there's no DG - lazy evaluation happens when an item gets injected into a function. If it's in the cache it doesn't get re-evaluated. The user has to invalidate the cache manually, which is usually done every iteration, and is why my other issue about the cache is so important. It's not that we didn't think DG was a good idea, just that Matt never got to it.

Eh2406 · 2016-11-09T17:24:17Z

Does the algorithm I suggested make sense? Do you think it would improve things?

As always just starting a discussion, and willing to help implement any fix.

fscottfoti · 2016-11-09T17:43:50Z

I definitely agree that building a DF and invalidating the cache appropriately when something gets updated would be awesome. The problem is that your dependency graph is usually built based on what gets injected into the function, but everything that gets injected is not necessarily used. So like when you inject the parcels table which has 20 columns, you only update one column in the parcels table you want to invalidate only the things that depend on that column, but you're injecting the whole table. I see no obvious way to solve this other than requiring the user to say which columns a function depends on explicitly, but then we're almost as prone to error as we were in the first place. Thoughts?

Eh2406 · 2016-11-09T19:16:24Z

It can also be subverted by using orca.get_table in the function to make a dependency without telling orca, or by using a different global variable / file to store some state. Generally python is permissive with privacy and mutability for better and worse. (off topic, I am excited by timely dataflow for rust. It is leveraging rust's stinginess with privacy and mutability to insure a lot of the invariants we are discussing.)

Perhaps this algorithm can be a diagnostic tool? Still thinking...

fscottfoti · 2016-11-09T19:20:27Z

Right - that's the best I can come up with too... orca knows when it's inside that method, and it can capture any calls to get_table or __getattr__ and know which columns are a dependency. My fear is that you can have different code paths through the method - like an if/else - so you can't know you've got ALL the dependencies just by running the function once. And that's where I get stumped.

bridwell · 2016-11-18T21:31:20Z

I would say that stating specific column dependencies, while cumbersome, is still preferable to having to manually clear the cache within the steps. But maybe this could be automated in some way, like looking through the contents of a yaml file or configuration dictionary rather than being declared in the decorator?

waddell · 2016-11-18T21:40:08Z

@bridwell that is my inclination as well, to be explicit about dependencies. A bit more work up front, but with good payoff in terms of being able to address challenges like identifying dynamically when to recompute based on an updated dependency.

bridwell mentioned this issue Mar 28, 2017

Extend variable expressions for tables #22

Open

smmaurer mentioned this issue Sep 14, 2019

clearing orca caches? UDST/urbansim_templates#115

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dynamic determination of invalidated cache #15

dynamic determination of invalidated cache #15

Eh2406 commented Nov 7, 2016

fscottfoti commented Nov 7, 2016

Eh2406 commented Nov 7, 2016

fscottfoti commented Nov 8, 2016

Eh2406 commented Nov 9, 2016

fscottfoti commented Nov 9, 2016

Eh2406 commented Nov 9, 2016

fscottfoti commented Nov 9, 2016 •

edited

Loading

bridwell commented Nov 18, 2016

waddell commented Nov 18, 2016

dynamic determination of invalidated cache #15

dynamic determination of invalidated cache #15

Comments

Eh2406 commented Nov 7, 2016

fscottfoti commented Nov 7, 2016

Eh2406 commented Nov 7, 2016

fscottfoti commented Nov 8, 2016

Eh2406 commented Nov 9, 2016

fscottfoti commented Nov 9, 2016

Eh2406 commented Nov 9, 2016

fscottfoti commented Nov 9, 2016 • edited Loading

bridwell commented Nov 18, 2016

waddell commented Nov 18, 2016

fscottfoti commented Nov 9, 2016 •

edited

Loading