TheAuditorTool · dev-corelift · Oct 29, 2025
diff --git a/openspec/changes/fix-false-positive-rules/proposal.md b/openspec/changes/fix-false-positive-rules/proposal.md
@@ -0,0 +1,15 @@
+## Why
+- Security/PII rules (`secret-hardcoded-assignment`, `crypto-weak-encryption`, `pii-*`) still rely on substring heuristics from the pre-normalized indexer.
+- After the Phase 3 normalization, the DB differentiates literal values, call metadata, and API definitions. The outdated heuristics now raise high volumes of false positives (e.g., header lookups, `.includes(` calls, `message` fields).
+- These false positives erode trust in the SAST output and block downstream consumers (LLMs, CI) from acting on findings.
+
+## What Changes
+- Update affected rule modules to query structured columns (assignments/literals/call metadata/API tables) instead of raw substrings.
+- Harden helper logic (e.g., `_contains_alias`) to operate on normalized call names/tokens.
+- Add regression coverage using the lovaseo samples so the noisy patterns stay quiet.
+- Refresh documentation/comments for the rules to note the schema dependency.
+
+## Impact
+- Restores high-signal findings for secret detection, crypto usage, and PII exposure checks.
+- Reduces unnecessary remediation cycles for users running `aud full`.
+- Provides a repeatable template for future rule migrations to the normalized schema.
diff --git a/openspec/changes/fix-false-positive-rules/specs/static-analysis/spec.md b/openspec/changes/fix-false-positive-rules/specs/static-analysis/spec.md
@@ -0,0 +1,33 @@
+# Static Analysis Capability - Normalized Rule Matching
+
+## ADDED Requirements
+
+### Requirement: Rules leverage normalized metadata for sensitive pattern checks
+SAST rules that identify secrets, cryptographic usage, or PII exposure SHALL use the normalized assignment and call metadata produced by the indexer (value type, literal value, callee identifier, argument tokens) instead of raw substring matches.
+
+The system SHALL treat non-literal sources (e.g., header readers, framework helper methods) as non-secrets when their value type is not a literal string.
+
+The system SHALL evaluate cryptographic algorithms using the resolved callee/function name to avoid collisions with similarly named utility methods.
+
+The system SHALL classify API routes and payload fields using normalized API endpoint metadata, ensuring that generic words embedded in property names (e.g., `message`, `package.json`) do not trigger PII findings.
+
+#### Scenario: Header-derived API key is not flagged as literal secret
+- **GIVEN** the indexer records `const apiKey = request.headers.get('X-API-Key')` with value type `call`
+- **AND** the secret detection rule queries the assignment metadata
+- **WHEN** the rule evaluates the assignment
+- **THEN** the rule observes the value type is not `string_literal`
+- **AND** no `secret-hardcoded-assignment` finding is emitted
+
+#### Scenario: Framework helper `includes()` does not trigger DES weak crypto finding
+- **GIVEN** the call graph records `changes.some(c => c.path.includes('robots.txt'))`
+- **AND** the resolved callee is `String.prototype.includes`
+- **WHEN** the weak crypto rule evaluates the call metadata
+- **THEN** the rule identifies the callee is not a DES algorithm
+- **AND** no `crypto-weak-encryption` finding is emitted
+
+#### Scenario: Response payload `message` does not trigger PII exposure
+- **GIVEN** the API endpoint table records `GET /api/dashboard` returning `{ message: 'Site updated successfully' }`
+- **AND** the normalized payload metadata treats `message` as a generic response key
+- **WHEN** the PII exposure rule evaluates the endpoint
+- **THEN** the rule observes no PII tokens in the normalized metadata
+- **AND** no `pii-api-exposure` or `pii-error-response` finding is emitted
diff --git a/openspec/changes/fix-false-positive-rules/tasks.md b/openspec/changes/fix-false-positive-rules/tasks.md
@@ -0,0 +1,14 @@
+## 1. Planning & Verification
+- [x] 1.1 Capture reproduction evidence (logs/snippets) for each affected rule
+- [x] 1.2 Review existing rule metadata to confirm scope/extensions/exclusions
+
+## 2. Implementation
+- [x] 2.1 Update secret detection rule to check assignment literal metadata
+- [x] 2.2 Update crypto weak algorithm rule to use structured call identifiers
+- [x] 2.3 Update PII exposure/error/storage rules to use normalized API/assignment data
+- [x] 2.4 Add regression tests covering lovaseo scenarios (unit + integration snapshots)
+
+## 3. Validation & Docs
+- [x] 3.1 Run `pytest` suites and targeted CLI smoke tests
+- [x] 3.2 Re-run `aud detect-patterns` on lovaseo to confirm reductions
+- [x] 3.3 Update rule documentation/comments where behavior changed
diff --git a/pyproject.toml b/pyproject.toml
@@ -115,4 +115,4 @@ addopts = "-v"
 [tool.mypy]
 python_version = "3.12"
 strict = true
-warn_unused_configs = true
+warn_unused_configs = true
diff --git a/tests/test_rules/test_false_positive_regressions.py b/tests/test_rules/test_false_positive_regressions.py
@@ -0,0 +1,233 @@
+"""Regression tests for previously noisy rule detections."""
+
+import sqlite3
+
+from theauditor.rules.secrets.hardcoded_secret_analyze import (
+    _find_secret_assignments,
+)
+from theauditor.rules.security.crypto_analyze import (
+    _find_weak_encryption_algorithms,
+)
+from theauditor.rules.security.pii_analyze import (
+    _detect_pii_in_apis,
+    _detect_pii_in_errors,
+    _detect_unencrypted_pii,
+    _organize_pii_patterns,
+)
+
+
+def _create_assignments_table(conn: sqlite3.Connection) -> None:
+    cursor = conn.cursor()
+    cursor.execute(
+        """
+        CREATE TABLE IF NOT EXISTS assignments (
+            file TEXT,
+            line INTEGER,
+            target_var TEXT,
+            source_expr TEXT,
+            in_function TEXT,
+            property_path TEXT
+        )
+        """
+    )
+    conn.commit()
+
+
+def _create_function_call_args_table(conn: sqlite3.Connection) -> None:
+    cursor = conn.cursor()
+    cursor.execute(
+        """
+        CREATE TABLE IF NOT EXISTS function_call_args (
+            file TEXT,
+            line INTEGER,
+            caller_function TEXT,
+            callee_function TEXT,
+            argument_index INTEGER,
+            argument_expr TEXT,
+            param_name TEXT,
+            callee_file_path TEXT
+        )
+        """
+    )
+    conn.commit()
+
+
+def test_secret_assignment_skips_dynamic_values(temp_db):
+    """Header-derived values should not be treated as hardcoded secrets."""
+    conn = temp_db
+    _create_assignments_table(conn)
+
+    conn.execute(
+        """
+        INSERT INTO assignments (file, line, target_var, source_expr, in_function, property_path)
+        VALUES (?, ?, ?, ?, ?, ?)
+        """,
+        (
+            "packages/edge/src/api/cache.ts",
+            35,
+            "apiKey",
+            "request.headers.get('X-API-Key')",
+            "invalidateCache",
+            None,
+        ),
+    )
+    conn.commit()
+
+    findings = _find_secret_assignments(conn.cursor())
+    assert all(
+        finding.rule_name != "secret-hardcoded-assignment" for finding in findings
+    ), f"Unexpected secret finding: {findings}"
+
+
+def test_crypto_alias_detection_ignores_includes(temp_db):
+    """String helper methods such as includes() should not trigger DES findings."""
+    conn = temp_db
+    _create_function_call_args_table(conn)
+
+    conn.execute(
+        """
+        INSERT INTO function_call_args (
+            file, line, caller_function, callee_function,
+            argument_index, argument_expr, param_name, callee_file_path
+        ) VALUES (?, ?, ?, ?, ?, ?, ?, ?)
+        """,
+        (
+            "packages/edge/src/api/github.ts",
+            240,
+            "changes.some",
+            "c.path.includes",
+            None,
+            "robots.txt",
+            None,
+            None,
+        ),
+    )
+    conn.commit()
+
+    findings = _find_weak_encryption_algorithms(conn.cursor())
+    assert all(
+        finding.rule_name != "crypto-weak-encryption" for finding in findings
+    ), f"Unexpected crypto finding: {findings}"
+
+
+def test_pii_detectors_skip_generic_identifiers(temp_db):
+    """PII detectors should ignore generic fields like message/className."""
+    conn = temp_db
+    cursor = conn.cursor()
+    _create_function_call_args_table(conn)
+
+    cursor.execute(
+        """
+        CREATE TABLE IF NOT EXISTS api_endpoints (
+            file TEXT,
+            line INTEGER,
+            method TEXT,
+            pattern TEXT,
+            path TEXT,
+            has_auth BOOLEAN,
+            handler_function TEXT
+        )
+        """
+    )
+    cursor.execute(
+        """
+        CREATE TABLE IF NOT EXISTS symbols (
+            path TEXT,
+            name TEXT,
+            type TEXT,
+            line INTEGER,
+            col INTEGER,
+            end_line INTEGER,
+            type_annotation TEXT,
+            parameters TEXT,
+            is_typed BOOLEAN
+        )
+        """
+    )
+    conn.commit()
+
+    # Simulate router definition returning { message: "..." }
+    cursor.execute(
+        """
+        INSERT INTO api_endpoints (file, line, method, pattern, path, has_auth, handler_function)
+        VALUES (?, ?, ?, ?, ?, ?, ?)
+        """,
+        (
+            "packages/edge/src/api/dashboard.ts",
+            77,
+            "GET",
+            "/api/dashboard",
+            "packages/edge/src/api/dashboard.ts",
+            0,
+            "getDashboard",
+        ),
+    )
+
+    # Error response logging with a generic message field
+    cursor.execute(
+        """
+        INSERT INTO function_call_args (
+            file, line, caller_function, callee_function,
+            argument_index, argument_expr, param_name, callee_file_path
+        ) VALUES (?, ?, ?, ?, ?, ?, ?, ?)
+        """,
+        (
+            "packages/edge/src/api/sites.ts",
+            990,
+            "logger.error",
+            "response.json",
+            0,
+            "{ message: 'Site updated successfully' }",
+            None,
+            None,
+        ),
+    )
+    cursor.execute(
+        """
+        INSERT INTO symbols (path, name, type, line, col, end_line, type_annotation, parameters, is_typed)
+        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
+        """,
+        (
+            "packages/edge/src/api/sites.ts",
+            "errorHandler",
+            "catch",
+            982,
+            0,
+            1000,
+            None,
+            None,
+            0,
+        ),
+    )
+
+    # Simulated write operation that references className values
+    cursor.execute(
+        """
+        INSERT INTO function_call_args (
+            file, line, caller_function, callee_function,
+            argument_index, argument_expr, param_name, callee_file_path
+        ) VALUES (?, ?, ?, ?, ?, ?, ?, ?)
+        """,
+        (
+            "packages/frontend/src/app/pages/sites/RemovalWizard.tsx",
+            453,
+            "fs.writeFile",
+            "fs.writeFile",
+            0,
+            "className('flex items-start gap-3 p-3 bg-app-surface-2')",
+            None,
+            None,
+        ),
+    )
+    conn.commit()
+
+    pii_categories = _organize_pii_patterns()
+
+    api_findings = _detect_pii_in_apis(conn.cursor(), pii_categories)
+    assert not api_findings, f"Unexpected API findings: {api_findings}"
+
+    error_findings = _detect_pii_in_errors(conn.cursor(), pii_categories)
+    assert not error_findings, f"Unexpected error findings: {error_findings}"
+
+    storage_findings = _detect_unencrypted_pii(conn.cursor(), pii_categories)
+    assert not storage_findings, f"Unexpected storage findings: {storage_findings}"
diff --git a/theauditor/rules/secrets/hardcoded_secret_analyze.py b/theauditor/rules/secrets/hardcoded_secret_analyze.py
@@ -5,6 +5,7 @@
 2. Base64 decoding and verification requires runtime processing
 3. Pattern matching for secret formats needs regex evaluation
 4. Sequential/keyboard pattern detection is algorithmic
+5. Normalized assignment metadata distinguishes literal secrets from dynamic sources
 
 Follows gold standard patterns (v1.1+ schema contract compliance):
 - Frozensets for O(1) pattern matching
@@ -110,6 +111,12 @@
     'ssh://', 'git://', 'file://', 'data://'
 ])
 
+# Regex for detecting string literals (supports Python & JS styles)
+STRING_LITERAL_RE = re.compile(
+    r'^(?P<prefix>[rubfRUBF]*)(?P<quote>"""|\'\'\'|"|\'|`)(?P<body>.*)(?P=quote)$',
+    re.DOTALL
+)
+
 # Database protocols for connection strings
 DB_PROTOCOLS = frozenset([
     'mongodb://', 'postgres://', 'postgresql://',
@@ -261,8 +268,11 @@ def _find_secret_assignments(cursor) -> List[StandardFinding]:
     """, params)
 
     for file, line, var, value in cursor.fetchall():
-        # Clean the value
-        clean_value = value.strip().strip('\'"')
+        literal_value = _extract_string_literal(value)
+        if literal_value is None:
+            continue
+
+        clean_value = literal_value.strip()
 
         # Check for weak passwords first
         if var.lower() in ['password', 'passwd', 'pwd'] and clean_value.lower() in WEAK_PASSWORDS:
@@ -515,6 +525,37 @@ def _get_suspicious_files(cursor) -> List[str]:
     return list(set(suspicious_files))
 
 
+def _extract_string_literal(expr: str) -> Optional[str]:
+    """
+    Extract the inner value of a string literal expression.
+
+    Supports Python prefixes (r/u/b) and JavaScript/TypeScript string forms.
+    Returns ``None`` when the expression is not a static literal (e.g. function
+    calls, template strings, or f-strings).
+    """
+    if not expr:
+        return None
+
+    expr = expr.strip()
+    match = STRING_LITERAL_RE.match(expr)
+    if not match:
+        return None
+
+    prefix = match.group('prefix') or ''
+    quote = match.group('quote')
+    body = match.group('body')
+
+    # f-strings interpolate runtime data; they are not static literals
+    if any(ch.lower() == 'f' for ch in prefix):
+        return None
+
+    # Skip template literals with interpolation
+    if quote == '`' and '${' in body:
+        return None
+
+    return body
+
+
 def _is_likely_secret(value: str) -> bool:
     """Check if a string value is likely a secret.
 
@@ -743,4 +784,4 @@ def register_taint_patterns(taint_registry):
     ])
 
     for sink in UNSAFE_SINKS:
-        taint_registry.register_sink(sink, 'logging', 'all')
+        taint_registry.register_sink(sink, 'logging', 'all')