Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VCR tests fail with longurl::expand_urls() #220

Open
bretsw opened this issue Jan 28, 2021 · 11 comments
Open

VCR tests fail with longurl::expand_urls() #220

bretsw opened this issue Jan 28, 2021 · 11 comments
Milestone

Comments

@bretsw
Copy link

bretsw commented Jan 28, 2021

In a package I'm developing, tests for a function have been failing, and I traced the issue back to the source: longurl::expand_urls()

As an example of functionality, running longurl::expand_urls("http://bit.ly/2SfWO3K", seconds=10)$expanded_url) returns "https://www.aect.org/about_us.php".

However, putting this inside a vcr test fails, and it's unclear to me why.

Example
test_that("longurl::expand_urls() works for vcr", {

  vcr::use_cassette("longurl_test", {
    url1 <- longurl::expand_urls("http://bit.ly/2SfWO3K", seconds=10)$expanded_url
  })

  expect_equal(url1, "https://www.aect.org/about_us.php")
})
Session Info
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] vcr_0.6.0      testthat_3.0.0 longurl_0.3.3 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5        rstudioapi_0.13   knitr_1.30        whisker_0.4       magrittr_2.0.1   
 [6] R6_2.5.0          rlang_0.4.9       fansi_0.4.1       httr_1.4.2        tools_4.0.2      
[11] rtweet_0.7.0      xfun_0.19         sessioninfo_1.1.1 cli_2.2.0         withr_2.3.0      
[16] htmltools_0.5.1.1 assertthat_0.2.1  yaml_2.2.1        digest_0.6.27     httpcode_0.3.0   
[21] crayon_1.3.4      webmockr_0.7.4    base64enc_0.1-3   triebeard_0.3.0   curl_4.3         
[26] crul_1.0.0        glue_1.4.2        evaluate_0.14     rmarkdown_2.6     compiler_4.0.2   
[31] urltools_1.7.3    fauxpas_0.5.0     jsonlite_1.7.2  
@sckott
Copy link
Collaborator

sckott commented Jan 28, 2021

thanks! i'll have a look

@bretsw
Copy link
Author

bretsw commented Jan 28, 2021

Thank you! I appreciate it

@sckott
Copy link
Collaborator

sckott commented Jan 28, 2021

i think i found the issue, need to do some more testing to make sure a fix doesn't break other thing

@sckott sckott added this to the v0.6.2 milestone Jan 29, 2021
sckott added a commit that referenced this issue Jan 29, 2021
…hange in webmockr

- RequestMatcherRegistry register_$built_ins method needed fix in the rul for path: strip trailing slash first
@sckott
Copy link
Collaborator

sckott commented Jan 29, 2021

i thought i had it fixed, but not working yet. problem is the redirects. we don't record the redirect for some reason

@sckott
Copy link
Collaborator

sckott commented Jan 29, 2021

curl only returns the final http request in a redirect chain. - headers for all requests are returned, so that's part of what we'd need to construct complete request/response objects.

library(curl)
h = new_handle()
handle_setopt(h, .list = list(customrequest = "HEAD", nobody = TRUE, followlocation = TRUE, verbose = TRUE))
res = curl_fetch_memory("http://bit.ly/2SfWO3K", h)
res

not clear to me how Ruby vcr records all requests in a redirect chain if it behaves the same way as above in R curl in only returning the final request

@sckott
Copy link
Collaborator

sckott commented Jan 29, 2021

We can record a redirect with httr or crul, but both error on the 2nd use of the cassette because the cassette only has 1 request/response - the final http request in the redirect chain - so subsequent use_cassette uses mismatch b/c first url is the bitly one (http://bit.ly/2SfWO3K) - whereas what's on disk is the final url from the redirect (https://www.aect.org/about_us.php)

library(crul)
con <- HttpClient$new("http://bit.ly/2SfWO3K")
use_cassette("crul_redirect_head", {
  x <- con$head()
})
library(httr)
use_cassette("httr_redirect_head", {
  x <- HEAD("http://bit.ly/2SfWO3K")
})

@sckott
Copy link
Collaborator

sckott commented Jan 29, 2021

to dos

  • along the way, make sure to add tests specifically for redirects, with both httr and crul
  • make sure it works for httr as well
  • response is not working so far. got redirect recording working, but the http response should be the last response, but is the first response, which makes sense given how vcr works, but will need another hack for this to work
    • I think the issue is that in the first call_block usage https://github.com/ropensci/vcr/blob/redirects/R/use_cassette.R#L182 (in which we've set followlocation=0L) the response is then returned to webmockr along with the initial request. and that's then returned to crul or httr as the response. So even though the next few lines are doing more requests for all redirects, the first request is the one that's returned. Not sure how to fix this. This could be handled in the http clients themselves, but that would take longer than solving the problem here once.
    • locally we can now collect all requests/responses in the redirect chain, BUT the main problem is that the first request handled by the request_handler is spit back to crul/httr - so we need to be able to reach out to the http clients (crul/httr) and replace the request/response object with the final request/response - to match what happens when redirects are followed in a normal scenario

sckott added a commit that referenced this issue Jan 29, 2021
@sckott
Copy link
Collaborator

sckott commented Feb 25, 2021

@bretsw Making progress now, work on branch redirects. still need to make adjustments for httr, which longurl uses

sckott added a commit that referenced this issue Mar 2, 2021
sckott added a commit that referenced this issue Mar 2, 2021
- add set_handler() and add_redirect() methods to Cassette R6 class for dealing with redirects
- fix record_separate_redirects usage in insert_cassette
- change use_cassette() to use add_redirect
- change RequestHandler to use set_handler
@sckott sckott modified the milestones: v0.7, v0.8 Mar 12, 2021
@sckott
Copy link
Collaborator

sckott commented Mar 12, 2021

I moved this to the next milestone since a) this is complicated, and b) its not done yet, and c) there's lots of issues in the current milestone that should be pushed.

b) major issue is the http response object is from the first chain in the redirect - NOT the last http response in the chain as it should be. I'm not sure how to fix this yet.

@bretsw
Copy link
Author

bretsw commented Mar 12, 2021

Thanks @sckott! It's interesting how such a simple-seeming bug is turning out to be quite complicated to address. Thanks for sticking with this!

@sckott
Copy link
Collaborator

sckott commented Mar 12, 2021

It's an interesting one, thanks for opening the issue

@sckott sckott modified the milestones: v1.1, v2.0 Nov 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants