-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add_table does not completely remove data from previously registered table #44
Comments
Related to this, it would be useful to have a function |
Hi @hanase! I wonder if the Was just discussing this with @janowicz, and we can't think of any reason not to have explicit deletion functions, though. It looks like they don't exist for anything right now. There could be If there were a |
Thanks for looking into it @smmaurer and @janowicz! My use case is as follows: There is a registered column defined as a variable. But sometimes I want to provide external numbers for this column instead of letting orca to computed it (which I do now with Thanks a lot! |
I think the current behavior is what is expected/desired. For example, we re-add (essentially overwrite) the households table each time the household transition model is run. So we still need all the existing orca columns that we have defined for it. Adding a One approach might be to define a simple injectable that defines whether or not we want to use the calculation, and redefine the b orca column so that it uses this orca.add_injectable('use_orca', True)
@orca.column('my_table')
def b(my_table, use_orca):
if use_orca:
return pd.Series([1000, 2000, 3000]))
else:
return my_table['c'] orca.get_table('my_table')['b'] 0 1000 # now use the stored/provided value
orca.get_table('my_table').update_col('c', pd.Series([4, 5, 6]))
orca.add_injectable('use_orca', False)
orca.get_table('my_table')['b'] 0 4 Another approach might be to check for the presence of the c column in the local/raw df and act on that? @orca.column('my_table')
def b(my_table):
if 'c' not in my_table.local_columns:
return pd.Series([1000, 2000, 3000])
else:
return my_table['c']
# this call is assuming that c is in the table already
orca.get_table('my_table')['b'] 0 4 # now remove the local column and use the computed value
del orca.get_table('my_table').local['c']
orca.get_table('my_table')['b'] 0 1000 |
Thanks Scott @bridwell ! I'm not sure if this actually works for my case. I need to make it more specific: In our application, I need the households.city_id to be recomputed in each step in some years (cache_scope = "step"), but then not be computed at all in other years (cache_scope = "iteration"). In those other years, the transition model runs first where the control totals are set on city_id level. Thus, households.city_id becomes a local (actually as well as computed) column, which I need to stay unchanged until the end of the iteration. (I first re-save the table without it and then use Any idea how to do it? Thanks. |
This might be a short term fix, but how's something like this? Basically define a step that will change the scope of the column at the start of each year. Note that in the 1st year of the run the city is re-computed in the 1st step only, in the 2nd year it's re-computed in each step. # let's define a simple data frame
orca.add_table(
'persons',
pd.DataFrame({
'name': ['john', 'paul','george', 'ringo']
})
)
# define a simple orca column
# intially define w/ a scope of 'forever' (the default)
@orca.column('persons', cache=True)
def city(persons):
print('******************')
print('...computing city')
print('******************')
return pd.Series(
np.repeat('liverpool', len(persons)),
index=persons.index
)
def change_col_cache_scope(table, column, scope=None):
"""
Change the cache scope for a given column. If scope is
None, assume we're going to stop caching it.
"""
# allowable scopes, values indicate the update granularity
scopes = {
None: 0,
'step': 1,
'iteration': 2,
'forever': 3
}
# fetch the column and it's caching properties
c = orca.get_raw_column(table, column)
curr_cache = c.cache
curr_scope = c.cache_scope
# change the cache properties
if scope is None:
c.cache = False
# setting back to the default
# ?? is there something better to do with this?
c.cache_scope = 'forever'
else:
if scope not in scopes.keys():
raise ValueError('{} is not an allowed scope'.format(scope))
c.cache = True
c.cache_scope = scope
# clear out any existing caches if the provided scope is
# more granular than the existing
old_granularity = scopes[curr_scope]
new_granularity = scopes[scope]
if new_granularity < old_granularity:
c.clear_cached()
# let's define a step that changes the cache scope, this should be at the beginning of the run
# in odd years this will use iteration, in even years use step
@orca.step()
def set_scopes(iter_var):
if iter_var % 2 == 0:
scope = 'step'
else:
scope = 'iteration'
print('================================')
print('current scope: {}'.format(scope))
print('================================')
change_col_cache_scope('persons', 'city', scope)
# define some steps, each of thes collects the persons' city column
@orca.step()
def step1(persons):
city = persons['city']
@orca.step()
def step2(persons):
city = persons['city']
@orca.step()
def step3(persons):
city = persons['city']
# define our run
# in odd years, the message from the city column should only be printed in the 1st step
# in even years, the message should be printed after every step
orca.run(
[
'set_scopes',
'step1',
'step2',
'step3'
],
iter_vars=range(1, 11)
)
Running iteration 1 with iteration value 1
Running step 'set_scopes'
================================
current scope: iteration
================================
Time to execute step 'set_scopes': 0.00 s
Running step 'step1'
******************
...computing city
******************
Time to execute step 'step1': 0.00 s
Running step 'step2'
Time to execute step 'step2': 0.00 s
Running step 'step3'
Time to execute step 'step3': 0.00 s
Total time to execute iteration 1 with iteration value 1: 0.00 s
Running iteration 2 with iteration value 2
Running step 'set_scopes'
================================
current scope: step
================================
Time to execute step 'set_scopes': 0.00 s
Running step 'step1'
******************
...computing city
******************
Time to execute step 'step1': 0.00 s
Running step 'step2'
******************
...computing city
******************
Time to execute step 'step2': 0.00 s
Running step 'step3'
******************
...computing city
******************
Time to execute step 'step3': 0.00 s
Total time to execute iteration 2 with iteration value 2: 0.00 s
... In the longer term it seems like what you're after (myself also) is a more dynamic cache that only clears things out when a value has changed in a function dependencies so we don't have to deal with this. This is definitely a high priority for me. |
Wow - this looks like it could work. Obviously, it would be better if the BTW, the dependencies issue is a high priority for us as well. I have done some work on it which I'm going to talk about in Denver. I hope I'll see you there. |
Cool, I think we're in the same session in Denver. I'll be talking about some of my ideas for orca/caching issues as well. Should be an interesting discussion! |
When data is added to a table via add_column(), that column is not deleted when the table is overwritten with add_table(). Here is an example:
Function
add_table()
does clear the cache in this line, but I think it also needs to iterate over _COLUMNS and _LOCAL_COLUMNS (maybe also other places) to delete the old data.The text was updated successfully, but these errors were encountered: