Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 15 additions & 0 deletions openspec/changes/fix-false-positive-rules/proposal.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
## Why
- Security/PII rules (`secret-hardcoded-assignment`, `crypto-weak-encryption`, `pii-*`) still rely on substring heuristics from the pre-normalized indexer.
- After the Phase 3 normalization, the DB differentiates literal values, call metadata, and API definitions. The outdated heuristics now raise high volumes of false positives (e.g., header lookups, `.includes(` calls, `message` fields).
- These false positives erode trust in the SAST output and block downstream consumers (LLMs, CI) from acting on findings.

## What Changes
- Update affected rule modules to query structured columns (assignments/literals/call metadata/API tables) instead of raw substrings.
- Harden helper logic (e.g., `_contains_alias`) to operate on normalized call names/tokens.
- Add regression coverage using the lovaseo samples so the noisy patterns stay quiet.
- Refresh documentation/comments for the rules to note the schema dependency.

## Impact
- Restores high-signal findings for secret detection, crypto usage, and PII exposure checks.
- Reduces unnecessary remediation cycles for users running `aud full`.
- Provides a repeatable template for future rule migrations to the normalized schema.
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
# Static Analysis Capability - Normalized Rule Matching

## ADDED Requirements

### Requirement: Rules leverage normalized metadata for sensitive pattern checks
SAST rules that identify secrets, cryptographic usage, or PII exposure SHALL use the normalized assignment and call metadata produced by the indexer (value type, literal value, callee identifier, argument tokens) instead of raw substring matches.

The system SHALL treat non-literal sources (e.g., header readers, framework helper methods) as non-secrets when their value type is not a literal string.

The system SHALL evaluate cryptographic algorithms using the resolved callee/function name to avoid collisions with similarly named utility methods.

The system SHALL classify API routes and payload fields using normalized API endpoint metadata, ensuring that generic words embedded in property names (e.g., `message`, `package.json`) do not trigger PII findings.

#### Scenario: Header-derived API key is not flagged as literal secret
- **GIVEN** the indexer records `const apiKey = request.headers.get('X-API-Key')` with value type `call`
- **AND** the secret detection rule queries the assignment metadata
- **WHEN** the rule evaluates the assignment
- **THEN** the rule observes the value type is not `string_literal`
- **AND** no `secret-hardcoded-assignment` finding is emitted

#### Scenario: Framework helper `includes()` does not trigger DES weak crypto finding
- **GIVEN** the call graph records `changes.some(c => c.path.includes('robots.txt'))`
- **AND** the resolved callee is `String.prototype.includes`
- **WHEN** the weak crypto rule evaluates the call metadata
- **THEN** the rule identifies the callee is not a DES algorithm
- **AND** no `crypto-weak-encryption` finding is emitted

#### Scenario: Response payload `message` does not trigger PII exposure
- **GIVEN** the API endpoint table records `GET /api/dashboard` returning `{ message: 'Site updated successfully' }`
- **AND** the normalized payload metadata treats `message` as a generic response key
- **WHEN** the PII exposure rule evaluates the endpoint
- **THEN** the rule observes no PII tokens in the normalized metadata
- **AND** no `pii-api-exposure` or `pii-error-response` finding is emitted
14 changes: 14 additions & 0 deletions openspec/changes/fix-false-positive-rules/tasks.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
## 1. Planning & Verification
- [x] 1.1 Capture reproduction evidence (logs/snippets) for each affected rule
- [x] 1.2 Review existing rule metadata to confirm scope/extensions/exclusions

## 2. Implementation
- [x] 2.1 Update secret detection rule to check assignment literal metadata
- [x] 2.2 Update crypto weak algorithm rule to use structured call identifiers
- [x] 2.3 Update PII exposure/error/storage rules to use normalized API/assignment data
- [x] 2.4 Add regression tests covering lovaseo scenarios (unit + integration snapshots)

## 3. Validation & Docs
- [x] 3.1 Run `pytest` suites and targeted CLI smoke tests
- [x] 3.2 Re-run `aud detect-patterns` on lovaseo to confirm reductions
- [x] 3.3 Update rule documentation/comments where behavior changed
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -115,4 +115,4 @@ addopts = "-v"
[tool.mypy]
python_version = "3.12"
strict = true
warn_unused_configs = true
warn_unused_configs = true
233 changes: 233 additions & 0 deletions tests/test_rules/test_false_positive_regressions.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,233 @@
"""Regression tests for previously noisy rule detections."""

import sqlite3

from theauditor.rules.secrets.hardcoded_secret_analyze import (
_find_secret_assignments,
)
from theauditor.rules.security.crypto_analyze import (
_find_weak_encryption_algorithms,
)
from theauditor.rules.security.pii_analyze import (
_detect_pii_in_apis,
_detect_pii_in_errors,
_detect_unencrypted_pii,
_organize_pii_patterns,
)


def _create_assignments_table(conn: sqlite3.Connection) -> None:
cursor = conn.cursor()
cursor.execute(
"""
CREATE TABLE IF NOT EXISTS assignments (
file TEXT,
line INTEGER,
target_var TEXT,
source_expr TEXT,
in_function TEXT,
property_path TEXT
)
"""
)
conn.commit()


def _create_function_call_args_table(conn: sqlite3.Connection) -> None:
cursor = conn.cursor()
cursor.execute(
"""
CREATE TABLE IF NOT EXISTS function_call_args (
file TEXT,
line INTEGER,
caller_function TEXT,
callee_function TEXT,
argument_index INTEGER,
argument_expr TEXT,
param_name TEXT,
callee_file_path TEXT
)
"""
)
conn.commit()


def test_secret_assignment_skips_dynamic_values(temp_db):
"""Header-derived values should not be treated as hardcoded secrets."""
conn = temp_db
_create_assignments_table(conn)

conn.execute(
"""
INSERT INTO assignments (file, line, target_var, source_expr, in_function, property_path)
VALUES (?, ?, ?, ?, ?, ?)
""",
(
"packages/edge/src/api/cache.ts",
35,
"apiKey",
"request.headers.get('X-API-Key')",
"invalidateCache",
None,
),
)
conn.commit()

findings = _find_secret_assignments(conn.cursor())
assert all(
finding.rule_name != "secret-hardcoded-assignment" for finding in findings
), f"Unexpected secret finding: {findings}"


def test_crypto_alias_detection_ignores_includes(temp_db):
"""String helper methods such as includes() should not trigger DES findings."""
conn = temp_db
_create_function_call_args_table(conn)

conn.execute(
"""
INSERT INTO function_call_args (
file, line, caller_function, callee_function,
argument_index, argument_expr, param_name, callee_file_path
) VALUES (?, ?, ?, ?, ?, ?, ?, ?)
""",
(
"packages/edge/src/api/github.ts",
240,
"changes.some",
"c.path.includes",
None,
"robots.txt",
None,
None,
),
)
conn.commit()

findings = _find_weak_encryption_algorithms(conn.cursor())
assert all(
finding.rule_name != "crypto-weak-encryption" for finding in findings
), f"Unexpected crypto finding: {findings}"


def test_pii_detectors_skip_generic_identifiers(temp_db):
"""PII detectors should ignore generic fields like message/className."""
conn = temp_db
cursor = conn.cursor()
_create_function_call_args_table(conn)

cursor.execute(
"""
CREATE TABLE IF NOT EXISTS api_endpoints (
file TEXT,
line INTEGER,
method TEXT,
pattern TEXT,
path TEXT,
has_auth BOOLEAN,
handler_function TEXT
)
"""
)
cursor.execute(
"""
CREATE TABLE IF NOT EXISTS symbols (
path TEXT,
name TEXT,
type TEXT,
line INTEGER,
col INTEGER,
end_line INTEGER,
type_annotation TEXT,
parameters TEXT,
is_typed BOOLEAN
)
"""
)
conn.commit()

# Simulate router definition returning { message: "..." }
cursor.execute(
"""
INSERT INTO api_endpoints (file, line, method, pattern, path, has_auth, handler_function)
VALUES (?, ?, ?, ?, ?, ?, ?)
""",
(
"packages/edge/src/api/dashboard.ts",
77,
"GET",
"/api/dashboard",
"packages/edge/src/api/dashboard.ts",
0,
"getDashboard",
),
)

# Error response logging with a generic message field
cursor.execute(
"""
INSERT INTO function_call_args (
file, line, caller_function, callee_function,
argument_index, argument_expr, param_name, callee_file_path
) VALUES (?, ?, ?, ?, ?, ?, ?, ?)
""",
(
"packages/edge/src/api/sites.ts",
990,
"logger.error",
"response.json",
0,
"{ message: 'Site updated successfully' }",
None,
None,
),
)
cursor.execute(
"""
INSERT INTO symbols (path, name, type, line, col, end_line, type_annotation, parameters, is_typed)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
""",
(
"packages/edge/src/api/sites.ts",
"errorHandler",
"catch",
982,
0,
1000,
None,
None,
0,
),
)

# Simulated write operation that references className values
cursor.execute(
"""
INSERT INTO function_call_args (
file, line, caller_function, callee_function,
argument_index, argument_expr, param_name, callee_file_path
) VALUES (?, ?, ?, ?, ?, ?, ?, ?)
""",
(
"packages/frontend/src/app/pages/sites/RemovalWizard.tsx",
453,
"fs.writeFile",
"fs.writeFile",
0,
"className('flex items-start gap-3 p-3 bg-app-surface-2')",
None,
None,
),
)
conn.commit()

pii_categories = _organize_pii_patterns()

api_findings = _detect_pii_in_apis(conn.cursor(), pii_categories)
assert not api_findings, f"Unexpected API findings: {api_findings}"

error_findings = _detect_pii_in_errors(conn.cursor(), pii_categories)
assert not error_findings, f"Unexpected error findings: {error_findings}"

storage_findings = _detect_unencrypted_pii(conn.cursor(), pii_categories)
assert not storage_findings, f"Unexpected storage findings: {storage_findings}"
47 changes: 44 additions & 3 deletions theauditor/rules/secrets/hardcoded_secret_analyze.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
2. Base64 decoding and verification requires runtime processing
3. Pattern matching for secret formats needs regex evaluation
4. Sequential/keyboard pattern detection is algorithmic
5. Normalized assignment metadata distinguishes literal secrets from dynamic sources

Follows gold standard patterns (v1.1+ schema contract compliance):
- Frozensets for O(1) pattern matching
Expand Down Expand Up @@ -110,6 +111,12 @@
'ssh://', 'git://', 'file://', 'data://'
])

# Regex for detecting string literals (supports Python & JS styles)
STRING_LITERAL_RE = re.compile(
r'^(?P<prefix>[rubfRUBF]*)(?P<quote>"""|\'\'\'|"|\'|`)(?P<body>.*)(?P=quote)$',
re.DOTALL
)

# Database protocols for connection strings
DB_PROTOCOLS = frozenset([
'mongodb://', 'postgres://', 'postgresql://',
Expand Down Expand Up @@ -261,8 +268,11 @@ def _find_secret_assignments(cursor) -> List[StandardFinding]:
""", params)

for file, line, var, value in cursor.fetchall():
# Clean the value
clean_value = value.strip().strip('\'"')
literal_value = _extract_string_literal(value)
if literal_value is None:
continue

clean_value = literal_value.strip()

# Check for weak passwords first
if var.lower() in ['password', 'passwd', 'pwd'] and clean_value.lower() in WEAK_PASSWORDS:
Expand Down Expand Up @@ -515,6 +525,37 @@ def _get_suspicious_files(cursor) -> List[str]:
return list(set(suspicious_files))


def _extract_string_literal(expr: str) -> Optional[str]:
"""
Extract the inner value of a string literal expression.

Supports Python prefixes (r/u/b) and JavaScript/TypeScript string forms.
Returns ``None`` when the expression is not a static literal (e.g. function
calls, template strings, or f-strings).
"""
if not expr:
return None

expr = expr.strip()
match = STRING_LITERAL_RE.match(expr)
if not match:
return None

prefix = match.group('prefix') or ''
quote = match.group('quote')
body = match.group('body')

# f-strings interpolate runtime data; they are not static literals
if any(ch.lower() == 'f' for ch in prefix):
return None

# Skip template literals with interpolation
if quote == '`' and '${' in body:
return None

return body


def _is_likely_secret(value: str) -> bool:
"""Check if a string value is likely a secret.

Expand Down Expand Up @@ -743,4 +784,4 @@ def register_taint_patterns(taint_registry):
])

for sink in UNSAFE_SINKS:
taint_registry.register_sink(sink, 'logging', 'all')
taint_registry.register_sink(sink, 'logging', 'all')
Loading