
@chrisburr
Contributor

This should reproduce the bug. I'm not sure I have a fix yet, or even whether my theory that cross-host redirects are the cause is correct.
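One way to test the cross-host theory is to list which hops of a redirect chain actually change host. The following is a std-only, purely illustrative sketch. host_of and cross_host_hops are hypothetical helpers, and the URL parsing is deliberately simplified (no IPv6 literals, no percent-decoding), so a real check should use a proper URL parser such as the url crate.

```rust
/// Extract the host of an absolute URL such as "https://example.com:8443/path".
/// Simplified sketch: no IPv6 literals, no percent-decoding, no validation.
fn host_of(url: &str) -> Option<&str> {
    // Drop the scheme, e.g. "https://".
    let rest = url.split_once("://")?.1;
    // The authority ends at the first '/', '?' or '#'.
    let end = rest
        .find(|c: char| c == '/' || c == '?' || c == '#')
        .unwrap_or(rest.len());
    let authority = &rest[..end];
    // Strip an optional "user:password@" prefix, then an optional ":port" suffix.
    let host_port = authority.rsplit_once('@').map_or(authority, |(_, h)| h);
    let host = host_port.split_once(':').map_or(host_port, |(h, _)| h);
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// Return the positions in `chain` of URLs that were reached through a
/// redirect to a different host than the previous URL.
fn cross_host_hops(chain: &[&str]) -> Vec<usize> {
    chain
        .windows(2)
        .enumerate()
        .filter(|(_, pair)| match (host_of(pair[0]), host_of(pair[1])) {
            // Hostnames compare case-insensitively.
            (Some(a), Some(b)) => !a.eq_ignore_ascii_case(b),
            // If either URL cannot be parsed, conservatively count it as cross-host.
            _ => true,
        })
        .map(|(i, _)| i + 1)
        .collect()
}
```

Given the chain printed by the manual-redirect run (Test 2, with the long query string dropped), this returns [3, 4]: only the hops to downloads.sourceforge.net and to the mirror change host. Test 1, however, ended on the same-host /download URL, which suggests the HTML page was served before any cross-host hop took place.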

@chrisburr
Contributor Author

I have a standalone reproducer, though I don't understand the behavior:

Reproducer script:
// Dependencies (Cargo.toml): tokio with the "macros" and "rt-multi-thread" features, reqwest, sha2
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let url = "https://sourceforge.net/projects/arma/files/armadillo-15.2.1.tar.xz";

    println!("Testing SourceForge redirect handling with different reqwest configurations\n");

    // Test 1: Default reqwest client (automatic redirects)
    println!("=== Test 1: Default reqwest (automatic redirects) ===");
    test_client_auto_redirect(url).await?;

    println!("\n=== Test 2: Manual redirect following (like Python) ===");
    test_client_manual_redirect(url).await?;

    println!("\n=== Test 3: Try fetching the /download URL directly ===");
    test_client_auto_redirect("https://sourceforge.net/projects/arma/files/armadillo-15.2.1.tar.xz/download").await?;

    Ok(())
}

async fn test_client_auto_redirect(
    url: &str,
) -> Result<(), Box<dyn std::error::Error>> {
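    // Default client: follows redirects automatically and, by default, sets the
    // Referer header on each redirected request.
    // Note: RequestBuilder::send() is equivalent to .build() + client.execute(),
    // so using it does not change which headers reach the redirect targets.
    let client = reqwest::Client::new();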
    
    let resp = client.get(url)
        .header("User-Agent", "python-requests/2.31.0")
        .header("Accept-Encoding", "gzip, deflate, br")
        .header("Accept", "*/*")
        .header("Connection", "keep-alive")
        .send()  // Use .send() instead of .build() + .execute()
        .await?;

    let status = resp.status();
    let final_url = resp.url().clone();
    let content_type = resp.headers()
        .get(reqwest::header::CONTENT_TYPE)
        .and_then(|v| v.to_str().ok())
        .unwrap_or("unknown");
    let content_length = resp.headers()
        .get(reqwest::header::CONTENT_LENGTH)
        .and_then(|v| v.to_str().ok())
        .unwrap_or("unknown");

    println!("Status: {}", status);
    println!("Final URL: {}", final_url);
    println!("Content-Type: {}", content_type);
    println!("Content-Length: {}", content_length);

    // Read the whole body, then inspect only its first 1 KiB
    let bytes = resp.bytes().await?;
    let first_kb = &bytes[..std::cmp::min(1024, bytes.len())];

    // Check if it's HTML or binary
    let is_html = std::str::from_utf8(first_kb)
        .map(|s| s.contains("<html") || s.contains("<!DOCTYPE"))
        .unwrap_or(false);

    println!("Downloaded {} bytes, is_html: {}", bytes.len(), is_html);

    // Compute SHA256
    use sha2::{Sha256, Digest};
    let mut hasher = Sha256::new();
    hasher.update(&bytes);
    let hash = hasher.finalize();
    println!("SHA256: {:x}", hash);

    let expected = "a5b8109da3c169802f51a14d3bd1246395c24bbca55601760b0c96a3c0b2f8fa";
    let actual = format!("{:x}", hash);
    if actual == expected {
        println!("✓ CORRECT FILE DOWNLOADED");
    } else {
        println!("✗ WRONG FILE (expected: {})", expected);
    }

    Ok(())
}

async fn test_client_manual_redirect(
    initial_url: &str,
) -> Result<(), Box<dyn std::error::Error>> {
    use sha2::{Sha256, Digest};
    
    // Create a client that does NOT follow redirects automatically
    let client = reqwest::Client::builder()
        .redirect(reqwest::redirect::Policy::none())
        .build()?;
    
    let mut url = initial_url.to_string();
    
    loop {
        let request = client.get(&url)
            .header("User-Agent", "python-requests/2.31.0")
            .header("Accept-Encoding", "gzip, deflate, br")
            .header("Accept", "*/*")
            .header("Connection", "keep-alive")
            .build()?;
        
        let resp = client.execute(request).await?;
        let status = resp.status();
        
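        // Mimic the output of the equivalent Python requests script; the header
        // dict below is hard-coded, not read back from the request
        println!("<Response [{}]>", status.as_u16());
        println!("{{'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive'}}");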
        
        // If it's not a redirect, we got the final response
        if !status.is_redirection() {
            let content_type = resp.headers()
                .get(reqwest::header::CONTENT_TYPE)
                .and_then(|v| v.to_str().ok())
                .unwrap_or("unknown");
            
            println!();
            println!("Status: {}", status);
            println!("Final URL: {}", url);
            println!("Content-Type: {}", content_type);
            
            let bytes = resp.bytes().await?;
            println!("Downloaded {} bytes", bytes.len());
            
            let mut hasher = Sha256::new();
            hasher.update(&bytes);
            let hash = hasher.finalize();
            println!("SHA256: {:x}", hash);
            
            let expected = "a5b8109da3c169802f51a14d3bd1246395c24bbca55601760b0c96a3c0b2f8fa";
            let actual = format!("{:x}", hash);
            if actual == expected {
                println!("✓ CORRECT FILE DOWNLOADED");
            } else {
                println!("✗ WRONG FILE (expected: {})", expected);
            }
            
            break;
        }
        
        // Get the Location header for the redirect; it may be relative, so
        // resolve it against the current URL
        if let Some(location) = resp.headers().get(reqwest::header::LOCATION) {
            url = reqwest::Url::parse(&url)?.join(location.to_str()?)?.to_string();
            println!("Redirecting to {}", url);
        } else {
            return Err("Redirect response missing Location header".into());
        }
    }
    
    Ok(())
}

Output:

Testing SourceForge redirect handling with different reqwest configurations

=== Test 1: Default reqwest (automatic redirects) ===
Status: 200 OK
Final URL: https://sourceforge.net/projects/arma/files/armadillo-15.2.1.tar.xz/download
Content-Type: text/html; charset=utf-8
Content-Length: unknown
Downloaded 37358 bytes, is_html: false
SHA256: a1e0ecd9a540249e98e95c7f29f319378cfecf4f11a9cfa5f5a567914e86e496
✗ WRONG FILE (expected: a5b8109da3c169802f51a14d3bd1246395c24bbca55601760b0c96a3c0b2f8fa)

=== Test 2: Manual redirect following (like Python) ===
<Response [301]>
{'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive'}
Redirecting to https://sourceforge.net/projects/arma/files/armadillo-15.2.1.tar.xz/
<Response [301]>
{'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive'}
Redirecting to https://sourceforge.net/projects/arma/files/armadillo-15.2.1.tar.xz/download
<Response [302]>
{'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive'}
Redirecting to https://downloads.sourceforge.net/project/arma/armadillo-15.2.1.tar.xz?ts=gAAAAABpAdVtbp13qtRkHUAwha6ab6FvZg9LvcXtG5Gp9CMNQGY-00eVue2vLnC6-4E1owt29wGNkeGWwqIPwI4it85gTBWISw%3D%3D&use_mirror=deac-fra&r=
<Response [302]>
{'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive'}
Redirecting to https://deac-fra.dl.sourceforge.net/project/arma/armadillo-15.2.1.tar.xz?viasf=1
<Response [200]>
{'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive'}

Status: 200 OK
Final URL: https://deac-fra.dl.sourceforge.net/project/arma/armadillo-15.2.1.tar.xz?viasf=1
Content-Type: application/octet-stream
Downloaded 7086824 bytes
SHA256: a5b8109da3c169802f51a14d3bd1246395c24bbca55601760b0c96a3c0b2f8fa
✓ CORRECT FILE DOWNLOADED

=== Test 3: Try fetching the /download URL directly ===
Status: 200 OK
Final URL: https://deac-riga.dl.sourceforge.net/project/arma/armadillo-15.2.1.tar.xz?viasf=1
Content-Type: application/octet-stream
Content-Length: 7086824
Downloaded 7086824 bytes, is_html: false
SHA256: a5b8109da3c169802f51a14d3bd1246395c24bbca55601760b0c96a3c0b2f8fa
✓ CORRECT FILE DOWNLOADED
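Test 1 reported is_html: false even though the server sent Content-Type: text/html, so a case-sensitive substring search over the first 1 KiB is not a reliable signal on its own (it misses a lowercase "<!doctype html>", and a body that arrived still content-encoded would not contain readable tags at all). Below is a sketch of a stricter check, using a hypothetical classify_body helper that is not part of rattler-build. It first looks for the xz magic bytes expected of a .tar.xz, then gzip, and only then sniffs for HTML case-insensitively.

```rust
/// What a downloaded body appears to be, judged from its first bytes.
#[derive(Debug, PartialEq)]
enum BodyKind {
    Xz,
    Gzip,
    Html,
    Unknown,
}

/// Classify a response body by magic bytes, falling back to an HTML sniff.
fn classify_body(bytes: &[u8]) -> BodyKind {
    // The xz container format starts with FD 37 7A 58 5A 00 ("\xFD7zXZ\0").
    const XZ_MAGIC: [u8; 6] = [0xFD, b'7', b'z', b'X', b'Z', 0x00];
    // gzip streams start with 1F 8B.
    const GZIP_MAGIC: [u8; 2] = [0x1F, 0x8B];

    if bytes.starts_with(&XZ_MAGIC) {
        return BodyKind::Xz;
    }
    if bytes.starts_with(&GZIP_MAGIC) {
        return BodyKind::Gzip;
    }
    // HTML sniff over the first 1 KiB: lossy decoding tolerates a UTF-8
    // sequence cut at the boundary, and lowercasing matches both
    // "<!DOCTYPE html>" and "<!doctype html>".
    let head = &bytes[..bytes.len().min(1024)];
    let text = String::from_utf8_lossy(head).to_ascii_lowercase();
    if text.contains("<!doctype html") || text.contains("<html") {
        BodyKind::Html
    } else {
        BodyKind::Unknown
    }
}
```

Since the expected artifact is a .tar.xz, checking for BodyKind::Xz tells the tarball apart from an HTML page without knowing the SHA256 in advance; the hash comparison remains the stronger check when the exact file is known.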

Fixes test_sourceforge_redirects, introduced in 2b33e9e.
@chrisburr chrisburr marked this pull request as ready for review October 29, 2025 09:23
@chrisburr
Contributor Author

It looks like SourceForge uses the Referer header to decide whether the client is a browser. With it set, the problem can also be reproduced with curl:

$ curl --referer https://example.invalid -LO https://sourceforge.net/projects/arma/files/armadillo-15.2.1.tar.xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   348  100   348    0     0    894      0 --:--:-- --:--:-- --:--:--   896
100   364  100   364    0     0    606      0 --:--:-- --:--:-- --:--:--   606
100  173k  100  173k    0     0   103k      0  0:00:01  0:00:01 --:--:-- 3478k
$ file armadillo-15.2.1.tar.xz
armadillo-15.2.1.tar.xz: HTML document text, ASCII text, with very long lines (15739)
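In reqwest, automatic Referer handling is controlled by ClientBuilder::referer, which defaults to true, so a client built with reqwest::Client::builder().referer(false) should stop sending the header (presumably what this PR does, given its title). To see why Test 1 and Test 3 behave differently, here is a std-only model of that behaviour. It is an approximation, not reqwest's code: auto_referer and referers_along_chain are hypothetical, and the model only drops the URL fragment and skips the header on https-to-http downgrades, whereas a real client typically also strips any user:password@ credentials.

```rust
/// Simplified model of a client that auto-sets Referer when following a redirect.
/// Returns the Referer sent with the request to `next` after being redirected
/// from `previous`, or None if no Referer would be sent.
fn auto_referer(previous: &str, next: &str, referer_enabled: bool) -> Option<String> {
    if !referer_enabled {
        return None;
    }
    // Do not leak an https URL to a plain-http target.
    if previous.starts_with("https://") && next.starts_with("http://") {
        return None;
    }
    // Drop any fragment (credential stripping is omitted in this sketch).
    let base = previous.split_once('#').map_or(previous, |(before, _)| before);
    Some(base.to_string())
}

/// Walk a redirect chain and report the Referer attached to each request.
fn referers_along_chain(chain: &[&str], referer_enabled: bool) -> Vec<Option<String>> {
    if chain.is_empty() {
        return Vec::new();
    }
    // The first request is made directly, so no automatic Referer is attached.
    let mut out = vec![None];
    for pair in chain.windows(2) {
        out.push(auto_referer(pair[0], pair[1], referer_enabled));
    }
    out
}
```

With the SourceForge chain from Test 2 and referer_enabled set to true, the request for /download carries Referer: https://sourceforge.net/projects/arma/files/armadillo-15.2.1.tar.xz/, whereas in Test 3 /download is the first request and carries none. That is consistent with Test 1 receiving the HTML page and Test 3 receiving the tarball. The manual loop in Test 2 builds every request from scratch with redirects disabled, so it never sends a Referer either.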

@chrisburr chrisburr changed the title fix: downloading from SourceForge fix: Don't set referer header when downloading sources Oct 29, 2025
@wolfv
Member

wolfv commented Oct 29, 2025

Great! Amazing investigation.

@wolfv
Member

wolfv commented Oct 29, 2025

I wonder if we can add the test somewhere else and run it as part of a "slow-tests" test suite. I'm not sure if you have the capacity for that.

@chrisburr
Contributor Author

I'm not convinced I understand the testing strategy of rattler-build and the split between the various test tasks well enough to do that.

My first instinct would be to propose removing the test, given that it is relatively costly compared to the value it provides, and leaving a comment in the sources instead. Alternatively, I could mark it as ignored?

@wolfv wolfv merged commit 8672322 into prefix-dev:main Oct 29, 2025
12 checks passed
@chrisburr chrisburr deleted the fix/sourceforge branch October 29, 2025 10:11