How should the {si} package work? / Roadmap ideas #5

Open
7 of 13 tasks
Robinlovelace opened this issue Apr 18, 2022 · 6 comments
Labels
question Further information is requested

Comments

@Robinlovelace
Owner

Robinlovelace commented Apr 18, 2022

I'm looking for feedback from anyone with experience of SIMs in terms of:

  • How to not reinvent the wheel? The aim is for modelling functions from {spflow}, {gravity} and other packages to be easy to use via the function si_predict(). I plan to add these packages to Suggests and put examples using them into articles/vignettes.
  • What additional functionality would be most useful? Currently the main function is actually focussed on pre-processing with si_to_od() creating an 'analysis ready' (and modelling ready) data frame with all the variables from origins and destinations you could need.
    • Functions like si_model_exponential_decay() and si_model_power() for quickly getting people started and not having to define their own functions
    • Implementation of the radiation model, previously implemented in {stplanr} and in scikit-mobility
    • More example datasets?
  • Tidy or standard evaluation?
  • Anything else?
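For readers unfamiliar with the radiation model mentioned in the list above, here is a minimal sketch in plain Python. It is not the {stplanr} or scikit-mobility implementation, and the function name `radiation_flows`, its arguments and the toy numbers are all made up for illustration. It assumes the standard form from Simini's 2012 paper, T_ij = T_i * m_i * n_j / ((m_i + s_ij) * (m_i + n_j + s_ij)), where s_ij is the population within distance d_ij of the origin, excluding origin and destination. Counting ties (zones at exactly distance d_ij) inside the radius is also an assumption.

```python
def radiation_flows(populations, distances, outflows):
    """Radiation-model flows T[i][j] for a small set of zones.

    populations: zone populations (m_i as origin, n_j as destination)
    distances:   square matrix of pairwise distances d[i][j]
    outflows:    total trips leaving each zone (T_i)
    """
    n_zones = len(populations)
    flows = [[0.0] * n_zones for _ in range(n_zones)]
    for i in range(n_zones):
        m_i = populations[i]
        for j in range(n_zones):
            if i == j:
                continue  # no intra-zonal flows in this sketch
            n_j = populations[j]
            # s_ij: population within radius d_ij of the origin,
            # excluding the origin and the destination themselves.
            s_ij = sum(
                populations[k]
                for k in range(n_zones)
                if k not in (i, j) and distances[i][k] <= distances[i][j]
            )
            share = (m_i * n_j) / ((m_i + s_ij) * (m_i + n_j + s_ij))
            flows[i][j] = outflows[i] * share
    return flows


# Three hypothetical zones on a line at positions 0, 1 and 3.
positions = [0.0, 1.0, 3.0]
populations = [100.0, 50.0, 200.0]
distances = [[abs(a - b) for b in positions] for a in positions]
outflows = [30.0, 10.0, 40.0]
flows = radiation_flows(populations, distances, outflows)
```

Note that in a finite system the shares from one origin sum to 1 - m_i / M (with M the total population) rather than to 1, so the outflows are not fully allocated unless the result is rescaled.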

Currently (2022-04-22) the function used to predict interaction is called si_predict() and works like this:

https://github.com/Robinlovelace/si/blob/d9ae80e683b316d619f3a8843f2a7d138c7d3b1f/README.qmd#L40-L53

That is likely to change to a tidy-eval framework in #10.

Previous questions (now mostly answered) related to this:

  • Should it be called si_predict(), perhaps with another function e.g. called si_train() to train models (constrained/unconstrained)?
    • Yes, now implemented
  • Should the first argument of the fun argument be an od object (I'm currently thinking not, as that arg is already in si_model(), heads up @Nowosad)?
  • How should custom SI prediction functions, e.g. si_gravity(), work? I'm thinking as simple as possible would be good, enabling commands such as si_predict(od, fun = si_gravity(m = origins_population, n = destinations_population, distance = distance_euclidean))
  • Related to the previous question, should we use tidy evaluation (currently is being used with var_p)?
    • Implemented, now constraint_p
  • More broadly, which conventions should we follow in terms of symbols used for SIM equations? E.g. Wilson's 1979 paper uses w_1/w_2, while some more recent papers (e.g. Simini's 2012 paper) use m/n throughout.
    • Going with notation in Dennett's 2018 paper
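To make the "function that returns a model function" idea from the questions above concrete, here is a rough, language-neutral sketch in plain Python. It is not the {si} API: `gravity`, `predict`, `power_decay` and `exponential_decay` are hypothetical stand-ins, the default `beta = 2.0` is an arbitrary choice, and only the column names (origins_population, destinations_population, distance_euclidean) come from the discussion. It assumes an unconstrained gravity model of the form m * n * f(d).

```python
import math


def power_decay(distance, beta):
    """Power-law distance decay: d ** -beta."""
    return distance ** (-beta)


def exponential_decay(distance, beta):
    """Exponential distance decay: exp(-beta * d)."""
    return math.exp(-beta * distance)


def gravity(m, n, distance, decay=power_decay, beta=2.0):
    """Return a function mapping one OD row (a dict) to an unconstrained flow.

    m, n and distance are column names, so the model is defined
    before it ever sees the data.
    """
    def model(row):
        return row[m] * row[n] * decay(row[distance], beta)
    return model


def predict(od_rows, fun):
    """Apply a model function to every row of an OD table."""
    return [fun(row) for row in od_rows]


# A tiny hypothetical OD table: one dict per origin-destination pair.
od = [
    {"origins_population": 100, "destinations_population": 50, "distance_euclidean": 2.0},
    {"origins_population": 100, "destinations_population": 200, "distance_euclidean": 4.0},
]
fun = gravity(
    m="origins_population",
    n="destinations_population",
    distance="distance_euclidean",
)
interaction = predict(od, fun)
```

Swapping in `decay=exponential_decay` changes the decay function without touching the prediction step, which is the kind of separation the question is getting at.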
@Nowosad
Collaborator

Nowosad commented Apr 18, 2022

Hi @Robinlovelace -- I think you forgot to add an example of how this function works currently.

@Robinlovelace
Owner Author

True that, updated, thanks for the heads-up @Nowosad and looking forward to doing some geocomputing with you soon!

Robinlovelace added a commit that referenced this issue Apr 18, 2022
Create si_predict/si_calculate for #5
Robinlovelace added a commit that referenced this issue Apr 21, 2022
@Robinlovelace Robinlovelace changed the title from "How should si_model()/si_predict()/si_train() work?" to "How should the {si} package work?" Apr 22, 2022
@adamdennett
Collaborator

I'll try to add some proper thoughts when I'm back at work next week (or more likely the week after). My immediate thought on useful functionality: as well as various options to calibrate cost or distance / origin / destination parameters with observed data and Poisson / negative binomial regression, functionality to input one's own parameter guesses (i.e. 1 for origin, -1.5 for distance) would be really useful for rough-and-ready flow estimation.
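A minimal sketch of what flow estimation from guessed parameters could look like, in plain Python rather than R. Everything here is hypothetical: the function `estimate_flows`, the column names and the toy numbers are invented for illustration. The origin exponent of 1 and distance exponent of -1.5 follow the example above; the destination exponent of 1 and the optional rescaling to a known total are additional assumptions.

```python
import math


def estimate_flows(od_rows, origin_exp=1.0, destination_exp=1.0,
                   distance_exp=-1.5, total=None):
    """Estimate flows from guessed exponents, with no calibration step.

    Works on the log scale, as a log-linear model would:
    log(T) = origin_exp * log(O) + destination_exp * log(D)
             + distance_exp * log(d)
    If total is given, flows are rescaled to sum to it.
    """
    raw = [
        math.exp(
            origin_exp * math.log(row["origin_size"])
            + destination_exp * math.log(row["destination_size"])
            + distance_exp * math.log(row["distance"])
        )
        for row in od_rows
    ]
    if total is None:
        return raw
    scale = total / sum(raw)
    return [scale * value for value in raw]


# Two hypothetical OD pairs.
od = [
    {"origin_size": 10.0, "destination_size": 20.0, "distance": 4.0},
    {"origin_size": 10.0, "destination_size": 60.0, "distance": 4.0},
]
unscaled = estimate_flows(od)
scaled = estimate_flows(od, total=1000.0)
```

Without a total the values are only relative weights; rescaling to a known number of trips is one simple way to turn them into rough flow estimates.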

@Robinlovelace Robinlovelace changed the title from "How should the {si} package work?" to "How should the {si} package work? / Roadmap" Apr 25, 2022
@Robinlovelace Robinlovelace changed the title from "How should the {si} package work? / Roadmap" to "How should the {si} package work? / Roadmap ideas" Apr 25, 2022
@Nowosad Nowosad added the question Further information is requested label Apr 25, 2022
@TaylorOshan

One thing that might be useful to consider if incorporating constrained models estimated via GLMs is the use of sparse matrices to accommodate design matrices dominated by binary indicator variables. Unnecessary if instead using the multiplicative form, but could be nice to have both. Another consideration is metrics for evaluating predictions, such as comparing matrices, out-of-sample methods, SSI, etc.
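As an illustration of the kind of metric mentioned above, here is a small plain-Python sketch of the Sørensen similarity index (SSI) between observed and predicted flows. Definitions vary between sources, so two variants are shown and both should be treated as assumptions: a per-pair mean of 2 * min(obs, pred) / (obs + pred), and an aggregate form sometimes called the common part of commuters. The function names are hypothetical, and skipping pairs where both values are zero is a choice made for this sketch.

```python
def ssi_per_pair(observed, predicted):
    """Mean over OD pairs of 2 * min(obs, pred) / (obs + pred).

    Pairs where both values are zero are skipped (an assumption made
    here to avoid dividing by zero).
    """
    pairs = [(o, p) for o, p in zip(observed, predicted) if o + p > 0]
    if not pairs:
        raise ValueError("no OD pairs with non-zero flow")
    return sum(2 * min(o, p) / (o + p) for o, p in pairs) / len(pairs)


def ssi_aggregate(observed, predicted):
    """2 * sum(min(obs, pred)) / (sum(obs) + sum(pred))."""
    denominator = sum(observed) + sum(predicted)
    if denominator == 0:
        raise ValueError("all flows are zero")
    shared = sum(min(o, p) for o, p in zip(observed, predicted))
    return 2 * shared / denominator


# Hypothetical flows for three OD pairs.
observed = [10.0, 20.0, 30.0]
predicted = [10.0, 10.0, 60.0]
per_pair = ssi_per_pair(observed, predicted)
aggregate = ssi_aggregate(observed, predicted)
```

Both variants range from 0 (no overlap) to 1 (identical flows), but the aggregate form is dominated by large flows while the per-pair form weights every OD pair equally.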

@Robinlovelace
Owner Author

Hi Taylor, thanks for the input. As per #14 and #15, I think the greatest 'added value' of this approach could be the geographic pre-processing, plus the flexibility for people to use whatever modelling framework they want as input to si_calculate() (which takes hard-coded SIM functions) and si_predict() (which takes model objects as the first input). I lack deep experience with SIMs and so defer to the judgement of others on that side of things. To be honest, I don't 100% understand what design matrices dominated by binary indicator variables are. Would that not be handled by the predictive model, e.g. glm() in base R or the nlsLM() function from the minpack.lm package, as outlined in the introductory si vignette?

Agree re metrics for evaluating predictions, plan to discuss this with @lenkahas on Friday, although still need to get the foundations right e.g. #16 is the priority ATM.

@TaylorOshan

That makes sense @Robinlovelace, very cool. In that case, you can leave the bespoke data structures up to the individual packages doing the calibration.

Apologies for the ambiguity, the design matrix here is the columns of input data used for the regression. In the case of the singly constrained model, a Poisson linear regression with fixed effects for the set of locations will generate the same coefficient estimates and predicted values as directly using nonlinear optimization (based on a multinomial distribution) for the multiplicative form from Wilson. The fixed effects from the Poisson linear regression are typically included in the design matrix using a binary indicator/dummy variable for each location, which causes the design matrix to become very sparse for even a moderate number of locations. Not an issue if you are using a different calibration technique, and as you mentioned here, it is more of a downstream issue with the function being supplied by the user.
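To show why the indicator columns make the design matrix sparse, here is a small plain-Python sketch. The function names are hypothetical, and it makes simplifying assumptions: one dummy column per origin, no reference level dropped, and intra-zonal pairs excluded. With n locations there are n * (n - 1) rows, each with a single 1 among n columns, so only 1 / n of the cells are non-zero.

```python
def origin_indicator_matrix(n_locations):
    """Dense block of origin dummy variables, one row per OD pair."""
    rows = []
    for origin in range(n_locations):
        for destination in range(n_locations):
            if origin == destination:
                continue  # skip intra-zonal pairs in this sketch
            rows.append([1 if col == origin else 0 for col in range(n_locations)])
    return rows


def density(matrix):
    """Share of cells that are non-zero."""
    cells = sum(len(row) for row in matrix)
    nonzero = sum(1 for row in matrix for value in row if value != 0)
    return nonzero / cells


dense = origin_indicator_matrix(50)
share_nonzero = density(dense)

# A sparse alternative stores only the column index of the 1 in each row.
sparse = [row.index(1) for row in dense]
```

For 50 locations the dense block holds 122,500 cells, of which 2,450 are non-zero; the sparse form stores just those 2,450 column indices, and the gap widens as the number of locations grows.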

Would be interested to hear your and @lenkahas's thoughts on metrics once you've had a chance to discuss!

Robinlovelace added a commit that referenced this issue Jun 13, 2022
Robinlovelace added a commit that referenced this issue Jun 13, 2022
Create si_predict/si_calculate for #5

Former-commit-id: 56d2eb7
Robinlovelace added a commit that referenced this issue Jun 13, 2022
Former-commit-id: 3c05538
Robinlovelace added a commit that referenced this issue Jun 13, 2022

4 participants