Conversation


GreenHatHG commented Nov 22, 2025

Code-related

iterative/dvc-webdav#95

Related Issues

#10689

Description

This PR introduces dynamic Bearer Token management for WebDAV remotes via the new bearer_token_command configuration option. Unlike static token configurations, this feature enables DVC to:

  • Execute external commands to fetch fresh tokens (e.g., OAuth device flow, cloud auth helpers)
  • Automatically refresh expired tokens during operations
  • Securely persist tokens exclusively to local DVC config (.dvc/config.local)
  • Maintain full backward compatibility with existing static token configurations

Key Enhancements

  • Dynamic Token Workflow

    [remote "my-webdav"]
        url = webdavs://example.com/data
        bearer_token_command = "gcloud auth print-access-token"  # Example command

    DVC executes this command on-demand to acquire tokens, eliminating manual token rotation.

  • Secure Token Lifecycle

    • Tokens are never stored globally – persisted only to local config
    • Thread-safe refresh mechanism prevents concurrent token fetch storms (see the sketch after this list)
    • Automatic cleanup of expired tokens from config
  • Fallback Compatibility
    Existing static tokens (via token = xyz config) continue working unchanged. This feature only activates when bearer_token_command is set.
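
For illustration, the 401-driven refresh pattern described above can be sketched roughly as follows. This is a simplified, hypothetical sketch (the class and method names are invented and are not the PR's actual BearerAuthClient); it only shows the execute-command / retry-on-401 / lock-guarded-refresh idea:

import subprocess
import threading

import requests


class RefreshingBearerSession:
    """Hypothetical sketch of a 401-driven Bearer token refresh wrapper."""

    def __init__(self, token_command):
        self._token_command = token_command
        self._token = None
        self._lock = threading.Lock()  # serializes refreshes across worker threads

    def _refresh_token(self, stale_token):
        with self._lock:
            # If another thread already refreshed while we waited for the lock,
            # reuse its token instead of spawning a second fetch (avoids the
            # "token fetch storm" mentioned above).
            if self._token and self._token != stale_token:
                return self._token
            result = subprocess.run(
                self._token_command, shell=True,
                capture_output=True, text=True, check=True,
            )
            self._token = result.stdout.strip()
            return self._token

    def request(self, method, url, **kwargs):
        token = self._token or self._refresh_token(None)
        headers = {**kwargs.pop("headers", {}), "Authorization": f"Bearer {token}"}
        resp = requests.request(method, url, headers=headers, **kwargs)
        if resp.status_code == 401:
            # Expired/invalid token: run bearer_token_command again and retry once.
            headers["Authorization"] = f"Bearer {self._refresh_token(token)}"
            resp = requests.request(method, url, headers=headers, **kwargs)
        return resp

In the actual PR, the refreshed token is additionally persisted to the local config (.dvc/config.local), as the debug output further below shows.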

Issue Resolution

This PR directly addresses the use case described in issue #10689, where users of short-lived OIDC tokens (like those from oidc-agent that expire every 5 minutes) currently need to manually update tokens in .dvc/config.local. With this implementation, users can now configure:

[remote "my-remote"]
    url = webdavs://example.com/data
    bearer_token_command = "oidc-token my-oidc-config"

DVC will automatically execute this command whenever a token is needed, eliminating the manual token rotation workflow. This matches the pattern used by tools like rclone that was referenced in the issue discussion.
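
Before wiring such a command into DVC, it can be worth verifying on its own that it prints a single token to stdout, which is what DVC consumes. A minimal, hypothetical check (assuming oidc-agent is installed and an agent profile named my-oidc-config exists):

import subprocess

# Hypothetical sanity check: bearer_token_command should print the raw token
# (and nothing else) to stdout and exit with status 0.
result = subprocess.run(
    ["oidc-token", "my-oidc-config"],
    capture_output=True, text=True, check=True,
)
token = result.stdout.strip()
assert token, "the command produced no output"
print(f"got a token of length {len(token)}")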

Testing Instructions

Testing Environment Setup

WebDAV Server (Docker):

docker run --rm -p 9000:8080 -v $(pwd)/data:/data rclone/rclone:latest serve webdav /data --addr :8080

Token Fetch Script (get_token.sh):

#!/bin/sh
# Fetch a fresh token from the proxy's /token endpoint and print it to stdout.
DVC_TOKEN=$(curl -s http://localhost:8080/token)
echo "$DVC_TOKEN"

Proxy Server (proxy.py):

  • Short-Lived Tokens: Generates time-bound tokens (default 5s expiry) via /token endpoint, mimicking OIDC's short-lived access tokens that require frequent renewal.
  • 401-Driven Refresh Cycle: Returns 401 Unauthorized on expired/invalid tokens, triggering DVC's bearer_token_command to automatically fetch new tokens – exactly replicating OIDC's refresh flow.

import logging
import os
import secrets
import time
import requests
from flask import Flask, request, Response, stream_with_context

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

app = Flask(__name__)

# Environment configuration
TOKEN_EXPIRY = int(os.getenv('TOKEN_EXPIRY_SECONDS', '5'))
UPSTREAM_URL = os.getenv('UPSTREAM_URL', 'http://127.0.0.1:9000')

valid_tokens = {}

def generate_token():
    """Generate a new time-limited token"""
    timestamp = int(time.time())
    token = f"tk_{timestamp}_{secrets.token_hex(4)}"
    valid_tokens[token] = timestamp
    logger.info(f"Generated token: {token}")
    return token

def validate_token(token):
    """Validate token existence and expiration"""
    # 1. Fixed token (for debugging/backdoor access)
    if token == "valid_token_123":
        return True

    # 2. Dynamic token validation
    if token in valid_tokens:
        if time.time() - valid_tokens[token] < TOKEN_EXPIRY:
            return True
        else:
            # Auto-clean expired tokens
            logger.info(f"Token expired: {token}")
            del valid_tokens[token]
            return False
    
    logger.warning(f"Token not found: {token}")
    return False

# --------------------------
#   1) Token issuance endpoint
# --------------------------
@app.route('/token', methods=['GET', 'POST'])
def issue_token():
    """Endpoint to obtain a new authentication token"""
    token = generate_token()
    # Return as plain text to match shell script consumption
    return token, 200

# --------------------------
#   2) Authentication middleware
# --------------------------
@app.before_request
def auth():
    """Global authentication handler"""
    # Skip authentication for token endpoint
    if request.path == '/token':
        return None

    auth_header = request.headers.get("Authorization")
    
    # Validate header format
    if not auth_header or not auth_header.startswith("Bearer "):
        logger.warning(f"Missing or malformed Authorization header from {request.remote_addr}")
        return Response("Missing valid Authorization header", 401)

    token = auth_header[7:]  # Remove "Bearer " prefix
    
    # CRITICAL: Return 401 on failure to trigger client token refresh
    if not validate_token(token):
        logger.warning(f"Invalid or expired token from {request.remote_addr}")
        return Response("Token expired or invalid", 401)

# --------------------------
#   3) WebDAV reverse proxy
# --------------------------
@app.route('/', defaults={'path': ''}, methods=["GET", "PUT", "PROPFIND",
                                            "MKCOL", "DELETE", "COPY",
                                            "MOVE", "OPTIONS", "HEAD"])
@app.route('/<path:path>', methods=["GET", "PUT", "PROPFIND",
                                   "MKCOL", "DELETE", "COPY",
                                   "MOVE", "OPTIONS", "HEAD"])
def proxy(path):
    """Reverse proxy for WebDAV operations with streaming support"""
    target_url = f"{UPSTREAM_URL}/{path}"
    
    # Filter hop-by-hop headers to prevent protocol conflicts
    excluded_headers = [
        'content-encoding', 'content-length', 
        'transfer-encoding', 'connection', 'host'
    ]
    
    # Prepare headers for upstream request
    upstream_headers = {
        k: v for k, v in request.headers.items()
        if k.lower() not in excluded_headers
    }

    try:
        # CRITICAL FIX: Use streaming for file uploads
        upstream_data = request.stream if request.method in ['PUT', 'POST'] else None

        # Forward request to upstream WebDAV server
        resp = requests.request(
            method=request.method,
            url=target_url,
            headers=upstream_headers,
            data=upstream_data,
            stream=True,  # Enable streaming mode
            allow_redirects=False
        )

        # Prepare response headers for client
        downstream_headers = [
            (name, value) for name, value in resp.raw.headers.items()
            if name.lower() not in excluded_headers
        ]

        # CRITICAL FIX: Stream large file downloads
        return Response(
            stream_with_context(resp.iter_content(chunk_size=8192)),
            status=resp.status_code,
            headers=downstream_headers
        )

    except requests.exceptions.RequestException as e:
        logger.error(f"Upstream connection error: {e}")
        return Response(f"Upstream Proxy Error: {e}", 502)

if __name__ == "__main__":
    # Disable ASCII conversion for non-English filenames
    app.config['JSON_AS_ASCII'] = False
    # Enable threading for concurrent WebDAV operations
    app.run(host="0.0.0.0", port=8080, threaded=True)

Validation Tests Performed

Before running the DVC commands below, start the WebDAV server (Docker) and the proxy server (python proxy.py) as described in Testing Environment Setup.
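
Optionally, the proxy's token lifecycle can be smoke-tested directly before involving DVC. The snippet below assumes the rclone WebDAV server and proxy.py above are running with the default 5-second expiry; the status codes in the comments are what the proxy is expected to return, not guaranteed values:

import time

import requests

PROXY = "http://localhost:8080"

# 1. Obtain a fresh short-lived token from the proxy.
token = requests.get(f"{PROXY}/token").text.strip()

# 2. A PROPFIND with the fresh token should be forwarded to the WebDAV server.
headers = {"Authorization": f"Bearer {token}", "Depth": "1"}
print("fresh token:", requests.request("PROPFIND", f"{PROXY}/", headers=headers).status_code)

# 3. After TOKEN_EXPIRY_SECONDS (5s by default) the same token should be rejected
#    with 401, which is the signal that makes DVC re-run bearer_token_command.
time.sleep(6)
print("expired token:", requests.request("PROPFIND", f"{PROXY}/", headers=headers).status_code)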

Short Token Expiry (5s default):

mkdir test1 && cd test1
dvc init --no-scm
dvc remote add -d test-webdav webdav://localhost:8080/
dvc remote modify test-webdav bearer_token_command "sh /home/jooooody/webdav-test/py/get_token.sh" -v

touch test.txt
echo "$(date) | $(head /dev/urandom -c 32 | base64)" > test.txt
dvc add test.txt
dvc push -v  # Initial push
echo "$(date) | $(head /dev/urandom -c 32 | base64)" >> test.txt
dvc push -v  # Token refresh during push
rm test.txt
dvc pull -v  # Token refresh during pull

Output example:

╰─$ dvc push -v                                                                                                   130 ↵
2025-11-22 15:15:55,318 DEBUG: v3.63.1.dev25+gfa5c33ed2.d20251116, CPython 3.12.12 on Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.42
2025-11-22 15:15:55,318 DEBUG: command: /home/jooooody/projects/pycharm_dvc/venv/bin/dvc push -v
2025-11-22 15:15:55,790 DEBUG: Bearer token command provided, using BearerAuthClient, command: sh /home/jooooody/webdav-test/py/get_token.sh
Collecting                                                                                    |0.00 [00:00,    ?entry/s]
2025-11-22 15:15:55,926 DEBUG: Preparing to transfer data from '/home/jooooody/webdav-test/dvc/.dvc/cache/files/md5' to 'http://localhost:8080/dvc/files/md5'
2025-11-22 15:15:55,926 DEBUG: Preparing to collect status from 'files/md5'
2025-11-22 15:15:55,927 DEBUG: Collecting status from 'files/md5'
2025-11-22 15:15:55,928 DEBUG: Querying 1 oids via object_exists
2025-11-22 15:15:55,949 DEBUG: [ThreadPoolExecutor-0_0] [BearerAuthClient] Received 401. Attempting recovery.
2025-11-22 15:15:55,950 DEBUG: [ThreadPoolExecutor-0_0] [BearerAuthClient] Refreshing token via command...
2025-11-22 15:15:55,999 DEBUG: Saved token for remote 'test-webdav'
2025-11-22 15:15:56,001 DEBUG: Writing '/home/jooooody/webdav-test/dvc/.dvc/config.local'.
2025-11-22 15:15:56,012 DEBUG: [ThreadPoolExecutor-0_0] [BearerAuthClient] Token refreshed successfully.
Pushing
Everything is up to date.
2025-11-22 15:15:56,026 DEBUG: Analytics is enabled.
2025-11-22 15:15:56,089 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmps_s184o4', '-v']
2025-11-22 15:15:56,099 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmps_s184o4', '-v'] with pid 103183

Long Token Expiry:

  • Same workflow with TOKEN_EXPIRY_SECONDS=300 (5 minutes)
  • Verify that the token is not refreshed again within the expiry window (no 401/refresh cycle appears in the debug output)

Output example:

╰─$ dvc push -v
2025-11-22 15:16:13,624 DEBUG: v3.63.1.dev25+gfa5c33ed2.d20251116, CPython 3.12.12 on Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.42
2025-11-22 15:16:13,625 DEBUG: command: /home/jooooody/projects/pycharm_dvc/venv/bin/dvc push -v
2025-11-22 15:16:14,074 DEBUG: Bearer token command provided, using BearerAuthClient, command: sh /home/jooooody/webdav-test/py/get_token.sh
Collecting                                                                                    |0.00 [00:00,    ?entry/s]
2025-11-22 15:16:14,214 DEBUG: Preparing to transfer data from '/home/jooooody/webdav-test/dvc/.dvc/cache/files/md5' to 'http://localhost:8080/dvc/files/md5'
2025-11-22 15:16:14,215 DEBUG: Preparing to collect status from 'files/md5'
2025-11-22 15:16:14,215 DEBUG: Collecting status from 'files/md5'
2025-11-22 15:16:14,217 DEBUG: Querying 1 oids via object_exists
Pushing
Everything is up to date.
2025-11-22 15:16:14,247 DEBUG: Analytics is enabled.
2025-11-22 15:16:14,309 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpe7wyiogz', '-v']
2025-11-22 15:16:14,318 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpe7wyiogz', '-v'] with pid 103404


codecov bot commented Nov 22, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.00%. Comparing base (2431ec6) to head (9ff8562).
⚠️ Report is 154 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10917      +/-   ##
==========================================
+ Coverage   90.68%   91.00%   +0.31%     
==========================================
  Files         504      504              
  Lines       39795    40936    +1141     
  Branches     3141     3241     +100     
==========================================
+ Hits        36087    37252    +1165     
- Misses       3042     3047       +5     
+ Partials      666      637      -29     

☔ View full report in Codecov by Sentry.
