Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tidy framework and data.table operations #13

Closed
rafapereirabr opened this issue Apr 25, 2022 · 3 comments
Closed

tidy framework and data.table operations #13

rafapereirabr opened this issue Apr 25, 2022 · 3 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@rafapereirabr
Copy link

Hi Robin. The package is shaping really nicely and it's aligned with an idea I've been thinking about related to spatial interaction models and accessibility (perhaps a topic for a chat in the future). I just wanted to ask whether package will be exclusively built on top of a tidy framework, or whether it could also use some data.table operations under the hood to be more efficient.

@Robinlovelace
Copy link
Owner

Robinlovelace commented Apr 25, 2022

Good question. In my experience OD data processing and modelling, including SIMs, are seldom computationally intensive relative to other parts of transport modelling and analysis workflows including routing and visualisation, so no plans at present. The priority currently is to get it somewhere close to a minimum viable product for a user friendly SIM package, as per #5 but happy to revisit this at some point later down the line. Also, because data.table objects extend the data frame, the approach should be compatible. Just tested in the example below:

remotes::install_github("robinlovelace/si")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'si' from a github remote, the SHA1 (b1e2c250) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(si)
# ?si_predict
od = si_to_od(si_zones, si_zones, max_dist = 4000)
#> 1695 OD pairs remaining after removing those with a distance greater than 4000 meters:
#> 15% of all possible OD pairs
m = lm(od$origin_all ~ od$origin_bicycle)
od_updated = si_predict(od, m)
class(od_updated)
#> [1] "sf"         "data.frame"
od_dt = data.table::data.table(od)
class(od_dt)
#> [1] "data.table" "data.frame"
od_updated_dt = si_predict(od_dt, m)
class(od_updated_dt)
#> [1] "data.table" "data.frame"
identical(od_updated$interaction, od_updated_dt$interaction)
#> [1] TRUE

Created on 2022-04-25 by the reprex package (v2.0.1)

Do you have a use case, or ideally a reproducible example, where better support for data.table would be beneficial? PRs and other ideas welcome, especially at this nascent stage.

@Nowosad Nowosad added enhancement New feature or request question Further information is requested labels Apr 26, 2022
@rafapereirabr
Copy link
Author

Thanks for the clarification, Robin. I believe data.tablewould not make much difference in si_predict() but it could possibly improve the efficiency of other functions such as si_to_od(), specially when dealing with large data sets. I totally understand the priority of getting to a minimum viable product first. Please let me know if you need a hand with this issue in the future and I'd be glad to help.

@Robinlovelace
Copy link
Owner

Agreed, low priority area. If it does come in useful would be more likely in the od package which is the driver of the data frame to sf conversion, which in turn uses sfheaders. Overall it's small beer in the grand scheme of things, good to think about and will happily revisit this if there are concrete use cases where it could be useful. Closing for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants