Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade dbplyr generics #211

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

florianm
Copy link
Contributor

@florianm florianm commented May 26, 2024

This PR upgrades dbplyr generics from v1 to v2 to prevent breakage once the v1 generics are removed from dbplyr.

Description

  • Follow the upgrade guide at https://dbplyr.tidyverse.org/articles/backend-2.html#nd-edition
  • DESCRIPTION: Bump version (suggest 0.8.0 once merged), bump and pin min versions for R, DBI, crul, jsonlite, dplyr, dbplyr to latest available versions, update author email for FM, bump RoxygenNote
  • ckanr-package.R: Upgrade syntax to fix Roxygen warning, add new authors from DESCRIPTION
  • Add dbplyr_edition as per upgrade guide (new man page)
  • Upgrade dplyr.R generics and imports as per upgrade guide
  • Document package to rebuild NAMESPACE
  • NEWS: add v0.8.0 with migration guide for ckanr users
  • Tests: update ckanr_settings with defaults for CKAN_TEST_URL and CKAN_TEST_BEHAVIOUR and remove from GH actions workflows so that tests pass on GH actions from PRs with no access to repo secrets
  • Tests: patch to pass with CKAN 2.10
  • Tests: skip CKAN on windows and mac
  • Generics: Implement generics until all examples pass, update examples

Questions to the package maintainer @fjuniorr:

  • How do you feel about pinning and upgrading all the dependencies?
  • The package docs mention that ds_create_dataset will be removed in favour of resource_create. This PR suggests a new package version (0.8.0). Would you like me to drop ds_create_dataset in this PR or leave it up to you?

Related Issue

Fix #187
Fix #212

Example

The upgraded dbplyr generics are not used explicitly outside of dplyr.R in ckanr, so no existing ckanr function will be affected.

However, users of the deprecated dplyr generics must now upgrade their code to use the newer dbplyr equivalents. The v0.8.0 release note in NEWS adds a migration guide for the affected functions.

This PR upgrades internal syntax (dbplyr generics). (See above: affected are only exposed dplyr/dbplyr generics).

The test run (example) looks like CKAN is up and running (e.g. API key is generated), but ckanr has trouble accessing it ("CKAN is offline").

@fjuniorr while I'm getting a local CKAN up and running, have you got any pointer why the GH action workflow can't find the CKAN test instance?

* Follow upgrade guide at https://dbplyr.tidyverse.org/articles/backend-2.html#nd-edition
* DESCRIPTION: Bump version (suggest 0.7.1 once merged), bump min versions for jsonlite, dplyr, dbplyr to latest available, update author email for FM, bump RoxygenNote
* Fix Roxygen warning: Upgrade syntax in ckanr-package.R
* Add dbplyr_edition as per upgrade guide (new man page)
* Upgrade dplyr.R generics and imports as per upgrade guide
* Document package to rebuild NAMESPACE
* TODO NEWS
* TODO see if tests pass (no working CKAN available here)
* This PR should be reviewed thoroughly before merging
* Improve DESCRIPTION: add R dependency (prepare to use native pipe over magrittr pipe), pin crul version
* Improve ckanr-package author email
* Fix dbplyr_edition generics
* Suggest new package version 0.8.0
@fjuniorr
Copy link
Contributor

How do you feel about pinning and upgrading all the dependencies?

My overall preference would be to not pin the dependencies to allow the installation on older environments.

The package docs mention that ds_create_dataset will be removed in favour of resource_create. This PR suggests a new package version (0.8.0). Would you like me to drop ds_create_dataset in this PR or leave it up to you?

Would you mind opening a separate PR for this? I'm still not familiar with the codebase and it will make it easier for me to digest piecemeal changes.

@fjuniorr while I'm getting a local CKAN up and running, have you got any pointer why the GH action workflow can't find the CKAN test instance?

The workflow needed access to the CKANR_TEST_URL and CKANR_TEST_BEHAVIOUR env vars but they were defined as secrets, so they were not passed to the runner when a workflow is triggered from a forked repository.

I'm fixing this on #212 and managed to run the tests on ubuntu but some other issue came up that still needs investigating related to the CKANR_TEST_BEHAVIOUR on windows and macos runners.

@florianm
Copy link
Contributor Author

florianm commented May 29, 2024

@fjuniorr thanks for the info!

re dependencies:
This PR only needs to pin dbplyr and dplyr to support the new dbplyr backend API.
I've moved the idea of requiring R >= 4.1 to #213.

re deprecation: moved to #214.

Great discussion at #187 too!

I'm pushing a work in progress - I've set up a local CKAN as follows (TODO add to contrib guide?)

  • Windows 10, WSL2 Ubuntu
  • docker compose installed in WSL2
  • copied ckanr's docker-compose.yml to a local dir inside my WSL2 home folder
  • created an .env file next to docker-compose with CKAN_VSERION=2.10
  • ran docker compose up -d
  • ran the command from the GH actions workflow to create TEST_API_KEY
  • added TEST_API_KEY=... to my .Renviron

Tests are now running, but 2.10 deprecated some API endpoints such as package_activity_list (breaks ckanr::changes and ckanr::package_activity_list)

Current WIP:

> my_ckan <- src_ckan("https://www.data.qld.gov.au/")
> tbl_list <- DBI::dbListTables(my_ckan, limit=5)
> tbl1 <- dplyr::tbl(my_ckan, tbl_list[1])
Error in `db_query_fields.DBIConnection()`:
! Can't query fields.
Using SQL: _id
Using SQL: _full_text
Using SQL: Motor Accident Insurance Commission
Caused by error in `curl::curl_fetch_memory()`:
! URL rejected: Malformed input to a URL function
Run `rlang::last_trace()` to see where the error occurred.

> rlang::last_trace(drop = FALSE)
<error/rlang_error>
Error in `db_query_fields.DBIConnection()`:
! Can't query fields.
Using SQL: _id
Using SQL: _full_text
Using SQL: Motor Accident Insurance Commission
Caused by error in `curl::curl_fetch_memory()`:
! URL rejected: Malformed input to a URL function
---
Backtrace:
     ▆
  1. ├─dplyr::tbl(my_ckan, tbl_list[1])
  2. ├─ckanr:::tbl.CKANConnection(my_ckan, tbl_list[1])
  3. │ └─ckanr:::tbl.src_ckan(src, name, ...) at ckanr/R/dplyr.R:88:3
  4. │   └─dbplyr::tbl_sql(...) at ckanr/R/dplyr.R:65:3
  5. │     ├─base::withCallingHandlers(...)
  6. │     ├─vars %||% dbplyr_query_fields(src$con, source)
  7. │     └─dbplyr:::dbplyr_query_fields(src$con, source)
  8. │       └─dbplyr:::dbplyr_fallback(con, "db_query_fields", ...)
  9. │         ├─rlang::eval_bare(expr((!!fun)(con, ...)))
 10. │         └─dbplyr:::db_query_fields.DBIConnection(con, ...)
 11. │           └─dbplyr:::db_get_query(con, sql, "Can't query fields.")
 12. │             ├─dbplyr:::dbi_wrap(...)
 13. │             │ └─base::withCallingHandlers(...)
 14. │             ├─DBI::dbGetQuery(con, sql)
 15. │             └─ckanr (local) dbGetQuery(con, sql)
 16. │               ├─DBI::dbSendQuery(conn, statement, ...) at ckanr/R/dbi.R:120:13
 17. │               └─ckanr (local) dbSendQuery(conn, statement, ...)
 18. │                 └─ckanr::ds_search_sql(...) at ckanr/R/dbi.R:110:13
 19. │                   └─con$get(query = list(sql = sql)) at ckanr/R/ds_search_sql.R:23:3
 20. │                     └─private$make_request(rr)
 21. │                       └─crul:::crul_fetch(opts)
 22. │                         └─curl::curl_fetch_memory(x$url$url, handle = x$url$handle)
 23. └─base::.handleSimpleError(...)
 24.   └─dbplyr (local) h(simpleError(msg, call))
 25.     └─cli::cli_abort(msg, parent = cnd, call = call, .envir = env)
 26.       └─rlang::abort(...)

I may need to implement db_query_fields.CKANConnection.

Edit: Implementing a clean DBI backend will require a bit more work: https://solutions.posit.co/connections/db/advanced/backend/

* See ropensci#215: support multiple CKAN versions
* Disable tests for deprecated API endpoints as of CKAN 2.10
* Introduce a default for get_test_url and get_test_behaviour
* Format ckanr_settings
* Update functions to match new DBI connection
* Some function generics still need to be implemented
* Tests are passing, but not covering much of the dplyr/DBI interface
* Improve docstrings and examples
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CKANConnection uses an old dbplyr interface
2 participants