fix: Don't set referer header when downloading sources #1964
I have a standalone reproducer though I don't understand the behavior:

Reproducer script:

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let url = "https://sourceforge.net/projects/arma/files/armadillo-15.2.1.tar.xz";
    println!("Testing SourceForge redirect handling with different reqwest configurations\n");

    // Test 1: Default reqwest client (automatic redirects)
    println!("=== Test 1: Default reqwest (automatic redirects) ===");
    test_client_auto_redirect(url).await?;

    println!("\n=== Test 2: Manual redirect following (like Python) ===");
    test_client_manual_redirect(url).await?;

    println!("\n=== Test 3: Try fetching the /download URL directly ===");
    test_client_auto_redirect("https://sourceforge.net/projects/arma/files/armadillo-15.2.1.tar.xz/download").await?;

    Ok(())
}

async fn test_client_auto_redirect(
    url: &str,
) -> Result<(), Box<dyn std::error::Error>> {
    // Use RequestBuilder::send() here instead of build() + execute(), to rule
    // out any difference in how headers are applied across redirect requests.
    let client = reqwest::Client::new();
    let resp = client.get(url)
        .header("User-Agent", "python-requests/2.31.0")
        .header("Accept-Encoding", "gzip, deflate, br")
        .header("Accept", "*/*")
        .header("Connection", "keep-alive")
        .send() // Use .send() instead of .build() + .execute()
        .await?;

    let status = resp.status();
    let final_url = resp.url().clone();
    let content_type = resp.headers()
        .get(reqwest::header::CONTENT_TYPE)
        .and_then(|v| v.to_str().ok())
        .unwrap_or("unknown");
    let content_length = resp.headers()
        .get(reqwest::header::CONTENT_LENGTH)
        .and_then(|v| v.to_str().ok())
        .unwrap_or("unknown");

    println!("Status: {}", status);
    println!("Final URL: {}", final_url);
    println!("Content-Type: {}", content_type);
    println!("Content-Length: {}", content_length);

    // Download the body and inspect the first 1 KiB to check what we got
    let bytes = resp.bytes().await?;
    let first_kb = &bytes[..std::cmp::min(1024, bytes.len())];

    // Check if it's HTML or binary
    let is_html = std::str::from_utf8(first_kb)
        .map(|s| s.contains("<html") || s.contains("<!DOCTYPE"))
        .unwrap_or(false);
    println!("Downloaded {} bytes, is_html: {}", bytes.len(), is_html);

    // Compute SHA256
    use sha2::{Sha256, Digest};
    let mut hasher = Sha256::new();
    hasher.update(&bytes);
    let hash = hasher.finalize();
    println!("SHA256: {:x}", hash);

    let expected = "a5b8109da3c169802f51a14d3bd1246395c24bbca55601760b0c96a3c0b2f8fa";
    let actual = format!("{:x}", hash);
    if actual == expected {
        println!("✓ CORRECT FILE DOWNLOADED");
    } else {
        println!("✗ WRONG FILE (expected: {})", expected);
    }

    Ok(())
}

async fn test_client_manual_redirect(
    initial_url: &str,
) -> Result<(), Box<dyn std::error::Error>> {
    use sha2::{Sha256, Digest};

    // Create a client that does NOT follow redirects automatically
    let client = reqwest::Client::builder()
        .redirect(reqwest::redirect::Policy::none())
        .build()?;

    let mut url = initial_url.to_string();
    loop {
        let request = client.get(&url)
            .header("User-Agent", "python-requests/2.31.0")
            .header("Accept-Encoding", "gzip, deflate, br")
            .header("Accept", "*/*")
            .header("Connection", "keep-alive")
            .build()?;
        let resp = client.execute(request).await?;
        let status = resp.status();

        // Mimic the output of the equivalent Python script for easy comparison
        println!("<Response [{}]>", status.as_u16());
        println!("{{'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive'}}");

        // If it's not a redirect, we got the final response
        if !status.is_redirection() {
            let content_type = resp.headers()
                .get(reqwest::header::CONTENT_TYPE)
                .and_then(|v| v.to_str().ok())
                .unwrap_or("unknown");
            println!();
            println!("Status: {}", status);
            println!("Final URL: {}", url);
            println!("Content-Type: {}", content_type);

            let bytes = resp.bytes().await?;
            println!("Downloaded {} bytes", bytes.len());

            let mut hasher = Sha256::new();
            hasher.update(&bytes);
            let hash = hasher.finalize();
            println!("SHA256: {:x}", hash);

            let expected = "a5b8109da3c169802f51a14d3bd1246395c24bbca55601760b0c96a3c0b2f8fa";
            let actual = format!("{:x}", hash);
            if actual == expected {
                println!("✓ CORRECT FILE DOWNLOADED");
            } else {
                println!("✗ WRONG FILE (expected: {})", expected);
            }
            break;
        }

        // Get the Location header for the redirect. It may be relative,
        // so resolve it against the URL that was just requested.
        if let Some(location) = resp.headers().get(reqwest::header::LOCATION) {
            url = resp.url().join(location.to_str()?)?.to_string();
            println!("Redirecting to {}", url);
        } else {
            return Err("Redirect response missing Location header".into());
        }
    }

    Ok(())
}
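
The reproducer decides whether it received the archive by searching the first kilobyte for HTML markers. A complementary check, sketched below, looks at the leading bytes instead: an .xz stream begins with the six-byte magic number FD 37 7A 58 5A 00. The function names are hypothetical and are not part of the reproducer or of rattler-build.

```rust
/// The six magic bytes that open every .xz stream.
const XZ_MAGIC: [u8; 6] = [0xFD, b'7', b'z', b'X', b'Z', 0x00];

/// Returns true if `bytes` begins with the .xz magic number.
pub fn looks_like_xz(bytes: &[u8]) -> bool {
    bytes.starts_with(&XZ_MAGIC)
}

/// Same heuristic as the reproducer: look for HTML markers in the first 1 KiB.
pub fn looks_like_html(bytes: &[u8]) -> bool {
    let head = &bytes[..bytes.len().min(1024)];
    std::str::from_utf8(head)
        .map(|s| s.contains("<html") || s.contains("<!DOCTYPE"))
        .unwrap_or(false)
}
```

Calling `looks_like_xz(&bytes)` right after `resp.bytes().await?` would flag a wrong payload without needing the expected SHA256.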
Fixes test_sourceforge_redirects introduced in 2b33e9e

Looks like SourceForge uses the Referer header to decide what to serve. With a referer set, curl ends up with an HTML page instead of the archive:

$ curl --referer https://example.invalid -LO https://sourceforge.net/projects/arma/files/armadillo-15.2.1.tar.xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   348  100   348    0     0    894      0 --:--:-- --:--:-- --:--:--   896
100   364  100   364    0     0    606      0 --:--:-- --:--:-- --:--:--   606
100  173k  100  173k    0     0   103k      0  0:00:01  0:00:01 --:--:-- 3478k
$ file armadillo-15.2.1.tar.xz
armadillo-15.2.1.tar.xz: HTML document text, ASCII text, with very long lines (15739)

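
To make the mechanism concrete, here is a small std-only model of how an HTTP client with automatic Referer handling would label each hop of a redirect chain. It is a simplification of my understanding of reqwest's default behavior (the previous URL becomes the Referer, except on an https to http downgrade), not reqwest's actual code. The function name and the mirror URL in the usage note are hypothetical.

```rust
/// Referer value attached to each hop of a redirect chain.
/// `chain[0]` is the URL originally requested; later entries are redirect targets.
/// Simplified model: real clients also strip credentials and fragments.
pub fn referers_for_chain(chain: &[&str], auto_referer: bool) -> Vec<Option<String>> {
    chain
        .iter()
        .enumerate()
        .map(|(i, url)| {
            // The first request has no previous URL, and a disabled
            // feature never adds the header.
            if !auto_referer || i == 0 {
                return None;
            }
            let previous = chain[i - 1];
            // Assumption: an https URL is not leaked to a plain-http target.
            if previous.starts_with("https://") && url.starts_with("http://") {
                None
            } else {
                Some(previous.to_string())
            }
        })
        .collect()
}
```

For a chain such as the original URL, its `/download` variant, and a hypothetical mirror at `https://mirror.example.invalid/armadillo-15.2.1.tar.xz`, the model yields a Referer on the second and third hops when `auto_referer` is true and none when it is false. In reqwest, the corresponding switch is `ClientBuilder::referer(false)`.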
Great! Amazing investigation.

I wonder if we can add the test somewhere else and run it as part of a "slow-tests" test suite. Not sure if you have capacity for that.

I'm not convinced I understand the testing strategy of rattler-build and the split between the various test tasks well enough to do that. My first instinct would be to remove the test, given that it is relatively costly compared to the value it provides, and leave a comment in the sources instead. Alternatively, I could mark it as ignored?
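
For the "mark it as ignored" option, Rust's built-in test harness already provides the switch: a test carrying `#[ignore]` is skipped by a plain `cargo test` and runs only with `cargo test -- --ignored` (or `--include-ignored`). A minimal sketch follows. The test body and the `digest_matches` helper are placeholders, not rattler-build's actual test.

```rust
/// Placeholder for the real check: compares a computed hex digest to the expected one.
pub fn digest_matches(actual_hex: &str, expected_hex: &str) -> bool {
    actual_hex.eq_ignore_ascii_case(expected_hex)
}

#[cfg(test)]
mod slow_tests {
    use super::*;

    #[test]
    #[ignore = "slow: downloads from SourceForge over the network"]
    fn test_sourceforge_redirects() {
        // The real test would download the archive and hash it here.
        let expected = "a5b8109da3c169802f51a14d3bd1246395c24bbca55601760b0c96a3c0b2f8fa";
        let actual = expected; // stand-in for the hash of the downloaded bytes
        assert!(digest_matches(actual, expected));
    }
}
```

This keeps the regression test in the tree while leaving the default test run fast and independent of the network.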

Should reproduce the bug; not sure I have a fix yet, or even whether my theory about cross-host redirects being the cause is correct.