168 changes: 168 additions & 0 deletions csrf_safe-using-origin/README.md
@@ -0,0 +1,168 @@
# Secure Databricks App (CSRF via Origin Header)

This application demonstrates best practices for building secure Databricks apps using **Origin header validation for CSRF protection** instead of CSRF tokens. This approach provides an alternative CSRF protection method while maintaining comprehensive security features.

## Security Features

### 1. CSRF Protection via Origin Header Validation
**Origin Header Validation** provides CSRF protection by validating the `Origin` HTTP header on every request. This method offers an alternative to token-based CSRF protection.

**How it works:**
- The `@app.before_request` hook validates every request, which covers all state-changing methods (POST, PUT, DELETE, PATCH)
- The app checks if the `Origin` header matches the expected app URL (`DATABRICKS_APP_URL`)
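- Requests without an `Origin` header or with `Origin: null` are allowed because the app treats them as same-origin. Note that browsers also send `Origin: null` from opaque origins such as sandboxed iframes, so this rule is deliberately permissive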
- **Important:** Browsers typically don't send `Origin` headers with GET requests, so these are allowed by default
- **Do NOT use GET requests for state-changing operations** (create, update, delete) - always use POST, PUT, DELETE, or PATCH for state changes
- When `Origin` is present, it must:
  - Use HTTPS protocol
  - Match the configured app URL (case-insensitive comparison)
  - Contain only valid characters

**Validation Logic:**
```python
# If Origin header is absent or null - ALLOW (same-origin)
# If Origin header is present:
# - Must start with https://
# - Must match APP_URL
# - Must contain only valid characters
```

**Reference:** [OWASP CSRF Prevention Cheat Sheet - Verifying Origin](https://cheatsheetseries.owasp.org/cheatsheets/Cross-Site_Request_Forgery_Prevention_Cheat_Sheet.html#verifying-origin-with-standard-headers)
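
The rules above can be sketched as a small standalone function. This is a minimal illustration rather than the app's actual code: the name `is_allowed_origin` and the example URLs are hypothetical, and it parses the header with the standard library's `urlsplit` instead of the regex-based check used in `app.py`.

```python
from urllib.parse import urlsplit


def is_allowed_origin(origin, app_url):
    """Return True if the Origin header value should be accepted.

    Follows the rules listed above: a missing, empty, or "null" Origin is
    allowed; otherwise the value must be an https origin equal to app_url.
    """
    if origin is None or origin in ("", "null"):
        return True

    try:
        parsed = urlsplit(origin)
        expected = urlsplit(app_url)
    except ValueError:
        # Malformed values (for example an unbalanced "[") are rejected.
        return False

    # A serialized origin is only scheme://host[:port], with no path or query.
    if parsed.scheme != "https" or not parsed.netloc:
        return False
    if parsed.path or parsed.query or parsed.fragment:
        return False

    # Case-insensitive comparison of scheme and host[:port].
    return (parsed.scheme, parsed.netloc.lower()) == (
        expected.scheme,
        expected.netloc.lower(),
    )
```

For example, `is_allowed_origin("https://evil.example.com", "https://app.example.com")` returns `False`, while a request with no `Origin` header is accepted.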

### 2. User Authorization (OBO - On-Behalf-Of-User)
**On-Behalf-Of-User (OBO) Authorization** enables the application to execute SQL queries using the authenticated user's access token. The user's identity and permissions are maintained throughout the query execution, ensuring proper access control based on Unity Catalog policies.

**How it works:**
- User's access token is retrieved from the `x-forwarded-access-token` header (provided by Databricks)
- SQL queries execute with the user's identity and permissions
- Unity Catalog enforces row-level filters, column masks, and other access controls
- All operations are audited under the user's identity

**Reference:** [Databricks Apps Authorization Documentation](https://docs.databricks.com/aws/en/dev-tools/databricks-apps/auth)
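
As a rough sketch of the first step, the forwarded token can be read from the request headers with a case-insensitive lookup. The helper names `get_user_token` and `require_user_token` are hypothetical; the sketch accepts any mapping of header names to values, so it works with a plain `dict` as well as with Flask's `request.headers`.

```python
def get_user_token(headers):
    """Return the forwarded user access token, or None if it is absent or blank."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lowercase.
        if name.lower() == "x-forwarded-access-token":
            token = value.strip()
            return token or None
    return None


def require_user_token(headers):
    """Return the token, or raise if the request carries no user identity."""
    token = get_user_token(headers)
    if token is None:
        raise PermissionError("No user access token was forwarded with this request")
    return token
```

Failing early when the token is missing keeps the query path from silently falling back to any other identity.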

### 3. CSP (Content Security Policy)
The application enforces the following Content Security Policy to prevent XSS and injection attacks:

```
Content-Security-Policy: default-src https:; script-src https:; style-src 'self' 'unsafe-inline'; img-src https: data:; font-src https: data:; object-src 'none'; base-uri 'self'; frame-ancestors 'none';
```

**Policy Details:**
- `default-src https:` - Only allow HTTPS resources by default
- `script-src https:` - Only allow scripts from HTTPS sources
- `style-src 'self' 'unsafe-inline'` - Allow styles from same origin and inline styles
- `img-src https: data:` - Allow images from HTTPS and data URIs
- `font-src https: data:` - Allow fonts from HTTPS and data URIs
- `object-src 'none'` - Block all plugins (Flash, Java, etc.)
- `base-uri 'self'` - Restrict base tag to same origin
- `frame-ancestors 'none'` - Prevent the page from being embedded in iframes (clickjacking protection)
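
One way to keep the header value and the directive list in sync is to build the string from a mapping. The sketch below is illustrative only; `CSP_DIRECTIVES` and `build_csp` are hypothetical names, and `app.py` itself writes the policy as a literal string.

```python
# Directive name -> list of allowed sources, in the order shown above.
CSP_DIRECTIVES = {
    "default-src": ["https:"],
    "script-src": ["https:"],
    "style-src": ["'self'", "'unsafe-inline'"],
    "img-src": ["https:", "data:"],
    "font-src": ["https:", "data:"],
    "object-src": ["'none'"],
    "base-uri": ["'self'"],
    "frame-ancestors": ["'none'"],
}


def build_csp(directives):
    """Join directives into a single Content-Security-Policy header value."""
    parts = [f"{name} {' '.join(sources)}" for name, sources in directives.items()]
    return "; ".join(parts) + ";"
```

Calling `build_csp(CSP_DIRECTIVES)` yields the same policy string listed at the top of this section.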

### 4. CORS (Cross-Origin Resource Sharing)
CORS headers are **disabled by default** for security. To enable CORS support, add the following environment variable in your `app.yaml`:

```yaml
env:
  - name: "CORS_ENABLE"
    value: "true"

**Example `app.yaml` with CORS enabled:**
```yaml
display_name: "csrf-using-origin"
env:
  - name: "SERVER_PORT"
    value: "8000"
  - name: "DATABRICKS_WAREHOUSE_ID"
    valueFrom: "sql-warehouse"
  - name: "CORS_ENABLE"
    value: "true"

When enabled, the following CORS headers will be set:
- `Access-Control-Allow-Origin`: Configured app URL
- `Access-Control-Allow-Credentials`: false
- `Access-Control-Allow-Methods`: GET, POST, PUT, DELETE, PATCH
- `Access-Control-Allow-Headers`: Content-Type, X-Requested-With
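
The toggle can be sketched as a pure function that returns the headers to attach to a response. The name `cors_headers` is hypothetical, and the sketch takes the environment as a mapping so it can be exercised without a running app.

```python
def cors_headers(env, app_url):
    """Return the CORS headers to set, or an empty dict when CORS is disabled."""
    # Disabled unless CORS_ENABLE is explicitly "true" (case-insensitive).
    enabled = env.get("CORS_ENABLE", "false").lower() == "true"
    if not enabled:
        return {}
    return {
        "Access-Control-Allow-Origin": app_url,
        "Access-Control-Allow-Credentials": "false",
        "Access-Control-Allow-Methods": "GET, POST, PUT, DELETE, PATCH",
        "Access-Control-Allow-Headers": "Content-Type, X-Requested-With",
    }
```

With an empty environment the function returns `{}`, which matches the disabled-by-default behavior described above.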

## Additional Security Features

- **SQL Injection Protection**: Uses parameterized queries with placeholder binding (`?`) to prevent SQL injection attacks. User input is passed as query parameters, never directly concatenated into SQL statements. The application queries a fixed table (`samples.nyctaxi.trips`) and only accepts user input for the WHERE clause value (pickup_zip), which is safely parameterized.
```python
# Safe parameterized query
query = "SELECT * FROM samples.nyctaxi.trips WHERE pickup_zip = ? LIMIT 10"
cursor.execute(query, [str(pickup_zip_value)])
```

- **Input Sanitization**: All user inputs and query results are escaped using MarkupSafe to prevent XSS attacks

- **X-Content-Type-Options**: Set to `nosniff` to prevent MIME-sniffing attacks

- **Secure Token Handling**: User access tokens are handled securely via headers

- **Unity Catalog Integration**: Query execution respects all Unity Catalog permissions, row filters, and column masks
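
The combination of parameter binding and output escaping can be tried locally without a SQL warehouse. The sketch below is a stand-in under stated assumptions: it uses the standard library's `sqlite3` in place of `databricks-sql-connector` and `html.escape` in place of MarkupSafe, and the `trips` table, its columns, and the `find_trips` helper are invented for illustration.

```python
import sqlite3
from html import escape


def find_trips(conn, pickup_zip):
    """Run a fixed query with the user value bound as a parameter, then escape cells."""
    cursor = conn.execute(
        "SELECT pickup_zip, fare FROM trips WHERE pickup_zip = ? LIMIT 10",
        [str(pickup_zip)],  # bound as data, never concatenated into the SQL text
    )
    return [[escape(str(cell)) for cell in row] for row in cursor.fetchall()]


# Tiny in-memory table standing in for samples.nyctaxi.trips.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (pickup_zip TEXT, fare TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("10001", "12.50"), ("10002", "<b>8.00</b>")],
)
```

Because the value is bound rather than concatenated, an input such as `10001' OR '1'='1` is compared as a literal string and matches no rows, and any markup stored in a result cell is escaped before it reaches the template.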

## Environment Variables

### Automatically Injected by Databricks Apps

These variables are automatically set by Databricks when you deploy your app:

| Variable | Description | Auto-Injected |
|----------|-------------|---------------|
| `DATABRICKS_HOST` | Your Databricks workspace hostname | ✅ Yes |
| `DATABRICKS_CLIENT_ID` | App service principal OAuth client ID | ✅ Yes |
| `DATABRICKS_CLIENT_SECRET` | App service principal OAuth client secret | ✅ Yes |
| `DATABRICKS_APP_NAME` | Name of your Databricks app | ✅ Yes |
| `DATABRICKS_APP_URL` | URL where your app is accessible | ✅ Yes |

### Required Configuration

These variables are set in your `app.yaml`; the last column shows which of them are mandatory:

| Variable | Description | Required |
|----------|-------------|----------|
| `DATABRICKS_WAREHOUSE_ID` | SQL warehouse ID | Yes |
| `SERVER_PORT` | Port to run the application (default: 8000) | No |
| `CORS_ENABLE` | Enable CORS headers (default: false) | No |
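
A minimal sketch of how these variables could be read and validated at startup follows. The `load_settings` name and the returned dictionary keys are hypothetical; in the app, the environment mapping would be `os.environ`.

```python
def load_settings(env):
    """Read the variables from the tables above and fail fast on missing ones."""
    warehouse_id = env.get("DATABRICKS_WAREHOUSE_ID")
    if not warehouse_id:
        raise ValueError("DATABRICKS_WAREHOUSE_ID environment variable is required")

    app_url = env.get("DATABRICKS_APP_URL")
    if not app_url:
        raise ValueError("DATABRICKS_APP_URL environment variable is required")

    return {
        "warehouse_id": warehouse_id,
        "app_url": app_url,
        # Optional variables fall back to the documented defaults.
        "server_port": int(env.get("SERVER_PORT", "8000")),
        "cors_enable": env.get("CORS_ENABLE", "false").lower() == "true",
    }
```

Raising at import time, rather than on the first request, surfaces a misconfigured deployment immediately.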

## Configuration

1. To add more libraries, list them in `requirements.txt`

2. Configure your `app.yaml` with required environment variables

3. **Enable User Authorization**: Ensure your Databricks app has user authorization scopes configured to access SQL warehouses on behalf of users. This is configured in the Databricks UI when creating or editing the app.

## File Structure

```
csrf-origin-app/
├── app.py
├── app.yaml
├── requirements.txt
├── README.md
├── static/
│   ├── css/
│   │   └── style.css
│   └── js/
│       └── app.js
└── templates/
    └── index.html
```

### File Descriptions

- **app.py** - Main Flask application file containing route handlers, Origin header validation for CSRF protection, security header configurations, and SQL query execution logic with parameterized queries (SQL injection protection) using user token (OBO) authentication.

- **app.yaml** - Databricks app configuration file that defines environment variables and deployment settings for running the Flask application on Databricks Apps platform.

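- **requirements.txt** - Python dependencies file listing additional packages (pandas, MarkupSafe) with pinned versions for consistent deployments. The other imports used by `app.py` (Flask, databricks-sql-connector, databricks-sdk) are not listed in this file and are assumed to be provided by the Databricks Apps runtime.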

- **templates/index.html** - Main HTML template providing the user interface with query input forms and dynamic results display with XSS protection.

- **static/css/style.css** - Stylesheet containing all visual styling for the application including responsive layout, forms, tables, error messages, and navigation components.

- **static/js/app.js** - Client-side JavaScript handling form submissions, dynamic query preview updates, and error handling.



201 changes: 201 additions & 0 deletions csrf_safe-using-origin/app.py
@@ -0,0 +1,201 @@
from flask import Flask, render_template, request, jsonify
from databricks import sql
from databricks.sdk.core import Config
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sql import StatementParameterListItem
from markupsafe import escape
import os
import re

app = Flask(__name__)

# Initialize Databricks Config - automatically detects DATABRICKS_CLIENT_ID and DATABRICKS_CLIENT_SECRET
# These environment variables are automatically injected by Databricks Apps
cfg = Config()
w = WorkspaceClient()

# Configuration from environment variables
SERVER_PORT = int(os.environ.get('SERVER_PORT', 8000))
APP_URL = os.environ.get('DATABRICKS_APP_URL')
CORS_ENABLE = os.environ.get('CORS_ENABLE', 'false').lower() == 'true'

# Databricks warehouse configuration
DATABRICKS_WAREHOUSE_ID = os.environ.get('DATABRICKS_WAREHOUSE_ID')

# Validate required environment variables
if not DATABRICKS_WAREHOUSE_ID:
    raise ValueError("DATABRICKS_WAREHOUSE_ID environment variable is required")
if not APP_URL:
    raise ValueError("DATABRICKS_APP_URL environment variable is required")


# Regex pattern for origin validation: <scheme>://<host>[:<port>], with no path or query
ORIGIN_REGEX = re.compile(r'^[a-zA-Z][a-zA-Z0-9+.\-]*://[a-zA-Z0-9\-.]+(:[0-9]{1,5})?$')

def validate_origin():
    """
    Validate Origin header for CSRF protection.
    Returns: (is_valid, error_message)
    """
    origin = request.headers.get('Origin')

    # If Origin header is not present, empty, or null - ALLOW
    if origin is None or origin == '' or origin == 'null':
        return True, None

    # Origin is present - validate it
    # First check if origin contains only valid characters
    if not ORIGIN_REGEX.match(origin):
        return False, "Invalid Origin header format - contains invalid characters"

    # Check if it starts with https://
    if not origin.startswith('https://'):
        return False, "Origin must use HTTPS protocol"

    # Normalize both values to lowercase for comparison
    origin_lower = origin.lower()
    expected_origin_lower = APP_URL.lower()

    # Compare the origins, ignoring case and trailing slashes
    # (ports are not stripped, so they must match as well)
    origin_normalized = origin_lower.rstrip('/')
    expected_normalized = expected_origin_lower.rstrip('/')

    if origin_normalized == expected_normalized:
        return True, None
    else:
        return False, f"Origin header mismatch. Expected: {APP_URL}, Got: {origin}"

app.config['DEBUG'] = False


@app.before_request
def csrf_protect():
    """Check Origin header for all requests"""
    is_valid, error_message = validate_origin()
    if not is_valid:
        return jsonify({'error': escape(error_message)}), 403

@app.after_request
def set_security_headers(response):
    # CORS headers (optional, controlled by CORS_ENABLE environment variable)
    if CORS_ENABLE:
        response.headers['Access-Control-Allow-Origin'] = APP_URL
        response.headers['Access-Control-Allow-Credentials'] = 'false'
        response.headers['Access-Control-Allow-Methods'] = 'GET, POST, PUT, DELETE, PATCH'
        response.headers['Access-Control-Allow-Headers'] = 'Content-Type, X-Requested-With'

    # Content Security Policy
    response.headers['Content-Security-Policy'] = (
        "default-src https:; "
        "script-src https:; "
        "style-src 'self' 'unsafe-inline'; "
        "img-src https: data:; "
        "font-src https: data:; "
        "object-src 'none'; "
        "base-uri 'self'; "
        "frame-ancestors 'none';"
    )

    # Other security headers
    response.headers['X-Content-Type-Options'] = 'nosniff'

    return response

def execute_sql_query_with_params(pickup_zip_value, user_token):
    """
    Execute SQL query using OBO (On-Behalf-Of-User) authorization with parameterized query.

    Uses `?` placeholder binding for SQL injection protection.
    The query uses a fixed table (samples.nyctaxi.trips) and parameterizes user input.

    The user's access token is passed to act on behalf of the user.
    """
    if not user_token:
        raise ValueError("User token is required for SQL execution")

    # Connect with sql.connect and the user's token so the query runs on behalf of the user (OBO).
    # The context managers close the cursor and the connection when the block exits.
    with sql.connect(
        server_hostname=cfg.host,
        http_path=f"/sql/1.0/warehouses/{DATABRICKS_WAREHOUSE_ID}",
        access_token=user_token
    ) as conn:
        with conn.cursor() as cursor:
            # Note: databricks-sql-connector does not support StatementParameterListItem directly,
            # so user input is bound through the standard ? placeholder
            parameterized_query = "SELECT * FROM samples.nyctaxi.trips WHERE pickup_zip = ? LIMIT 10"
            cursor.execute(parameterized_query, [str(pickup_zip_value)])
            df = cursor.fetchall_arrow().to_pandas()

    if len(df) > 0:
        return {
            'columns': [escape(str(col)) for col in df.columns.tolist()],
            'rows': [[escape(str(cell)) for cell in row] for row in df.values.tolist()],
            'row_count': len(df),
            'has_data': True,
        }
    else:
        return {
            'columns': [escape(str(col)) for col in df.columns.tolist()] if len(df.columns) > 0 else [],
            'rows': [],
            'row_count': 0,
            'has_data': False,
        }

@app.route('/', methods=['GET', 'POST', 'PUT', 'DELETE', 'PATCH'])
def index():
    headers = request.headers
    user = escape(headers.get('X-Forwarded-Preferred-Username', 'Unknown User'))
    user_token = headers.get('x-forwarded-access-token')

    result_data = None
    parsed_data = None
    error_message = None
    query_info = None

    # Handle different HTTP methods
    if request.method == 'POST':
        pickup_zip = request.form.get('pickup_zip', '').strip()

        if not pickup_zip:
            error_message = "Pickup ZIP code is required."
        elif not user_token:
            error_message = "User token is required for query execution."
        else:
            try:
                # Execute parameterized query (SQL injection safe)
                query_display = f"SELECT * FROM samples.nyctaxi.trips WHERE pickup_zip = '{pickup_zip}' LIMIT 10"
                parsed_data = execute_sql_query_with_params(pickup_zip, user_token)

                query_info = {
                    'query': escape(query_display),
                    'status': 'executed',
                    'result_count': parsed_data['row_count'] if parsed_data else 0,
                    'has_data': parsed_data['has_data'] if parsed_data else False
                }
            except Exception as e:
                error_message = "Query execution failed. Please check your inputs and permissions."

    elif request.method in ['PUT', 'DELETE', 'PATCH']:
        # Handle other state-changing methods
        error_message = f"{request.method} method not implemented for this endpoint."

    # For all methods, return the template
    return render_template('index.html',
                           user=user,
                           user_token=user_token,
                           result_data=result_data,
                           parsed_data=parsed_data,
                           error_message=error_message,
                           query_info=query_info)

if __name__ == '__main__':
    app.run(host="0.0.0.0", port=SERVER_PORT, debug=False)
6 changes: 6 additions & 0 deletions csrf_safe-using-origin/app.yaml
@@ -0,0 +1,6 @@
display_name: "csrf-using-origin"
env:
  - name: "SERVER_PORT"
    value: "8000"
  - name: "DATABRICKS_WAREHOUSE_ID"
    valueFrom: "sql-warehouse"
2 changes: 2 additions & 0 deletions csrf_safe-using-origin/requirements.txt
@@ -0,0 +1,2 @@
pandas==2.3.3
MarkupSafe==3.0.3