How should the {si} package work? / Roadmap ideas #5

Robinlovelace · 2022-04-18T06:11:52Z

I'm looking for feedback from anyone with experience of spatial interaction models (SIMs) on the following:

- How can we avoid reinventing the wheel? The aim is for modelling functions in {spflow}, {gravity} and other packages to be easy to use with the function si_predict(). The plan is to add these packages to Suggests and put examples implementing them into articles/vignettes.
- What additional functionality would be most useful? Currently the main function, si_to_od(), is actually focussed on pre-processing: it creates an 'analysis ready' (and modelling ready) data frame with all the variables from origins and destinations you could need. Ideas include:
  - Functions like si_model_exponential_decay() and si_model_power() for quickly getting people started and not having to define their own functions
  - Implementation of the radiation model, previously implemented in {stplanr} and in scikit-mobility
  - More example datasets?
- Tidy or standard evaluation?
- Anything else?
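
As a rough illustration of what ready-made decay helpers could look like, here is a minimal base R sketch. The names decay_exponential() and decay_power() and the beta argument are hypothetical, chosen for this example only; they are not functions in {si}.

```r
# Hypothetical decay helpers (illustrative names, not part of {si}).
# Both return a weight that shrinks as distance d grows, for beta > 0.
decay_exponential <- function(d, beta) exp(-beta * d)
decay_power <- function(d, beta) d^(-beta)

d <- c(1, 2, 5, 10) # example distances, e.g. in km
w_exp <- decay_exponential(d, beta = 0.5)
w_pow <- decay_power(d, beta = 2)
```

Helpers like these would let a new user try a model without first writing their own distance function.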

Currently (2022-04-22) the function used to predict interaction is called si_predict() and works like this:

https://github.com/Robinlovelace/si/blob/d9ae80e683b316d619f3a8843f2a7d138c7d3b1f/README.qmd#L40-L53

That is likely to change to a tidy-eval framework in #10.

Previous questions (now mostly answered) related to this:

- Should it be called si_predict(), perhaps with another function, e.g. called si_train(), to train models (constrained/unconstrained)?
  - Yes, now implemented
- Should the first argument of the fun argument be an od object (I'm currently thinking not, as that arg is already in si_model(); heads up @Nowosad)?
  - I don't think so, implemented in Tidyeval #10
- How should custom SI prediction functions, e.g. si_gravity(), work? I'm thinking as simple as possible would be good, enabling commands such as si_predict(od, fun = si_gravity(m = origins_population, n = destinations_population, distance = distance_euclidean))
  - Partially implemented in Tidyeval #10
- Related to the previous question, should we use tidy evaluation (currently being used with var_p)?
  - Implemented, now constraint_p
- More broadly, which conventions should we follow for the symbols used in SIM equations? For example, Wilson's 1979 paper uses w_1/w_2, while some more recent papers (e.g. Simini's 2012 paper) use m/n throughout.
  - Going with notation in Dennett's 2018 paper
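
To make the 'as simple as possible' idea concrete, here is a minimal base R sketch of a custom function that only sees column vectors, not the od object. gravity_sketch() and its beta argument are hypothetical, and with() stands in for whatever evaluation mechanism the package ends up using; this is not the si_gravity() or si_predict() implementation.

```r
# Toy OD data frame; column names follow the example call in the question above.
od <- data.frame(
  O = c("a", "a", "b", "b"),
  D = c("b", "c", "a", "c"),
  origins_population = c(100, 100, 50, 50),
  destinations_population = c(50, 200, 100, 200),
  distance_euclidean = c(2, 4, 2, 5)
)

# Hypothetical unconstrained gravity function: it takes plain vectors,
# so it needs no knowledge of the od object.
gravity_sketch <- function(m, n, distance, beta = 1) {
  m * n / distance^beta
}

# with() evaluates the column names inside the data frame.
od$interaction <- with(od, gravity_sketch(
  m = origins_population,
  n = destinations_population,
  distance = distance_euclidean,
  beta = 2
))
```

Keeping the od object out of the function's arguments means the same function can be tested on plain vectors.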

Nowosad · 2022-04-18T09:24:38Z

Hi @Robinlovelace -- I think you forgot to add an example of how this function works currently.

Robinlovelace · 2022-04-18T11:01:58Z

True that, updated, thanks for the heads-up @Nowosad and looking forward to doing some geocomputing with you soon!

adamdennett · 2022-04-22T18:00:14Z

I will try to add some proper thoughts when I'm back at work next week (or, more likely, the week after). My immediate thought on useful functionality: as well as various options to calibrate cost or distance / origin / destination parameters with observed data and Poisson / negative binomial regression, functionality to input one's own parameter guesses (i.e. 1 for origin, -1.5 for distance) would be really useful for rough-and-ready flow estimation.
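
A minimal base R sketch of the 'own parameter guesses' idea, assuming an unconstrained power-law form. flow_guess() and the argument names mu, alpha, beta and total are hypothetical; the defaults of 1 and -1.5 echo the guesses mentioned above.

```r
# Hypothetical helper: estimate flows from user-supplied exponents,
# with no calibration against observed data.
flow_guess <- function(origin, destination, distance,
                       mu = 1, alpha = 1, beta = -1.5, total = NULL) {
  raw <- origin^mu * destination^alpha * distance^beta
  if (is.null(total)) {
    return(raw)
  }
  # Optionally rescale so that the estimates sum to a known total flow.
  raw * total / sum(raw)
}

est <- flow_guess(
  origin = c(100, 100, 50),
  destination = c(50, 200, 100),
  distance = c(2, 4, 2),
  total = 1000
)
```

Rescaling to a known total is one simple way to get estimates in plausible units when no observed flows are available.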

TaylorOshan · 2022-04-27T16:21:36Z

One thing that might be useful to consider if incorporating constrained models estimated via GLMs is the use of sparse matrices to accommodate design matrices dominated by binary indicator variables. Unnecessary if instead using the multiplicative form, but could be nice to have both. Another consideration is metrics for evaluating predictions, such as comparing matrices, out-of-sample methods, SSI, etc.
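
As a rough sketch of what prediction metrics could look like, here are two base R functions. rmse() and sorensen_index() are hypothetical names; the similarity index uses one common Sørensen-type form, 2 * sum(min(observed, predicted)) / (sum(observed) + sum(predicted)), which is an assumption about which variant of SSI is meant.

```r
# Root mean squared error between observed and predicted flows.
rmse <- function(observed, predicted) {
  sqrt(mean((observed - predicted)^2))
}

# Sørensen-type similarity: 1 for identical flows, 0 for no overlap.
sorensen_index <- function(observed, predicted) {
  2 * sum(pmin(observed, predicted)) / (sum(observed) + sum(predicted))
}

observed <- c(10, 20, 30)
predicted <- c(12, 18, 30)
fit_rmse <- rmse(observed, predicted)
fit_ssi <- sorensen_index(observed, predicted)
```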

Robinlovelace · 2022-04-27T17:07:49Z

Hi Taylor, thanks for the input. As per #14 and #15, I think the greatest 'added value' of this approach could be the geographic pre-processing, plus the flexibility for people to use whatever modelling frameworks they want as inputs into the si_calculate() (which takes hard-coded SIM functions) and si_predict() (which takes model objects as the first input) functions. I lack deep experience with SIMs and so defer to the judgement of others on that side of things. To be honest, I don't 100% understand what design matrices dominated by binary indicator variables are. Would that not be handled by the predictive model, e.g. glm() in base R or the nlsLM() function from the minpack.lm package, as outlined in the introductory si vignette?
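
To illustrate the 'model object as the first input' idea, here is a minimal base R sketch using glm(). predict_flows() is a hypothetical stand-in, not the si_predict() implementation, and the toy data and column names are made up for this example.

```r
# Toy OD data with observed flows (made-up values).
od <- data.frame(
  flow = c(40, 12, 35, 8, 20, 15),
  origin_size = c(100, 100, 80, 80, 60, 60),
  dest_size = c(80, 60, 100, 60, 100, 80),
  distance = c(2, 6, 2, 9, 6, 9)
)

# The user fits a model with whichever framework they prefer...
model <- glm(
  flow ~ log(origin_size) + log(dest_size) + log(distance),
  family = poisson(), data = od
)

# ...and a thin wrapper adds predictions to the OD data frame.
predict_flows <- function(model, od) {
  od$interaction <- as.numeric(predict(model, newdata = od, type = "response"))
  od
}

od_pred <- predict_flows(model, od)
```

Because the wrapper only calls predict(), any model class with a predict() method that accepts newdata could in principle be passed in.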

Agree re. metrics for evaluating predictions. I plan to discuss this with @lenkahas on Friday, although we still need to get the foundations right, e.g. #16 is the priority ATM.

TaylorOshan · 2022-04-27T22:06:44Z

That makes sense @Robinlovelace, very cool. In that case, you can leave the bespoke data structures up to the individual packages doing the calibration.

Apologies for the ambiguity. The design matrix here is the set of columns of input data used for the regression. In the case of the singly constrained model, a Poisson linear regression with fixed effects for the set of locations will generate the same coefficient estimates and predicted values as directly using nonlinear optimization (based on a multinomial distribution) for the multiplicative form from Wilson. The fixed effects in the Poisson linear regression are typically included in the design matrix as a binary indicator/dummy variable for each location, which causes the design matrix to become very sparse for even a moderate number of locations. This is not an issue if you are using a different calibration technique and, as you mentioned here, it is more of a downstream issue with the function being supplied by the user.
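
The point about dummy variables can be seen in a small base R sketch with simulated data. The variable names and the simulated flows are assumptions made for illustration. The sketch fits an origin-constrained Poisson model with glm(), measures how much of the design matrix is zero, and checks that fitted flows reproduce the observed total for each origin.

```r
set.seed(1)
zones <- letters[1:6]
size <- setNames(c(100, 200, 300, 400, 500, 600), zones)

# All origin-destination pairs, excluding intra-zonal flows.
od <- expand.grid(O = zones, D = zones, stringsAsFactors = TRUE)
od <- od[as.character(od$O) != as.character(od$D), ]
od$dest_size <- unname(size[as.character(od$D)])
od$distance <- runif(nrow(od), min = 1, max = 20)
od$flow <- rpois(nrow(od), lambda = 5 * od$dest_size^0.8 / od$distance)

# The factor O adds one dummy column per origin (minus a reference level).
fit <- glm(flow ~ O + log(dest_size) + log(distance),
           family = poisson(), data = od)

X <- model.matrix(fit)
share_zero <- mean(X == 0) # proportion of zero entries in the design matrix

# Origin constraint: fitted flows sum to the observed outflow of each origin.
fitted_totals <- tapply(fitted(fit), od$O, sum)
observed_totals <- tapply(od$flow, od$O, sum)
```

With only six zones more than half of the entries are already zero, and the share grows with the number of locations. Storing the matrix sparsely, for example with sparse.model.matrix() from the Matrix package, would be one option for larger problems.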

Would be interested to hear your and @lenkahas's thoughts on metrics once you've had a chance to discuss!

Robinlovelace added a commit that referenced this issue Apr 18, 2022

Create si_predict/si_calculate for #5

01a92c2

Robinlovelace added a commit that referenced this issue Apr 18, 2022

Merge pull request #8 from Robinlovelace/si_calculate

56d2eb7

Create si_predict/si_calculate for #5

Robinlovelace added a commit that referenced this issue Apr 21, 2022

Switch to tidyeval for #5

3c05538

Robinlovelace added a commit that referenced this issue Apr 21, 2022

Complete switch to tidyeval, progress on #5

babf5ea

Robinlovelace changed the title ~~How should si_model()/si_predict()/si_train() work?~~ How should the {si} package work? Apr 22, 2022

Robinlovelace changed the title ~~How should the {si} package work?~~ How should the {si} package work? / Roadmap Apr 25, 2022

Robinlovelace changed the title ~~How should the {si} package work? / Roadmap~~ How should the {si} package work? / Roadmap ideas Apr 25, 2022

Nowosad added the question Further information is requested label Apr 25, 2022

This was referenced Apr 25, 2022

tidy framework and data.table operations #13

Closed

Two general questions #14

Open

Robinlovelace added a commit that referenced this issue Jun 13, 2022

Create si_predict/si_calculate for #5

37f0cf3

Former-commit-id: 01a92c2

Robinlovelace added a commit that referenced this issue Jun 13, 2022

Merge pull request #8 from Robinlovelace/si_calculate

1c2bfaa

Create si_predict/si_calculate for #5 Former-commit-id: 56d2eb7

Robinlovelace added a commit that referenced this issue Jun 13, 2022

Switch to tidyeval for #5

7171497

Former-commit-id: 3c05538

Robinlovelace added a commit that referenced this issue Jun 13, 2022

Complete switch to tidyeval, progress on #5

0576342

Former-commit-id: babf5ea
