Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable masking token, perf improvements, refactoring. #4586

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

michaelcfanning
Copy link
Member

@michaelcfanning michaelcfanning commented Dec 27, 2023

This change:

  • Does some additional refactoring of the secret masking code (in support of a future move to consume the updated masker from a separate library). Mostly this changed entailed having LoggedSecretMasker derive from SecretMasker rather than wrapping an instance of this class.
  • Added a configurable preferred redaction token to the ReplacementPosition class (now simple renamed to Replacement). For secret values, we retain the existing default redaction string ***. For regexes that have no explicit rule information (an id and friendly name, together which comprise a moniker), we use the string +++. This change will allow users to easily search for logs looking for evidence of automatically redacted secrets. These secrets either should have been protected as secured variables OR there is an upstream leak of the secret in a script/dependency, etc.
  • For patterns that explicitly declare a rule moniker, the redaction token consists of the opaque id, friendly name and hash of the redacted secret. This data will be persisted to telemetry to allow secure correlation of redacted secrets with evidence of exposure in other locations.
  • I have added some preliminary telemetry in a naive way. Looking for feedback/refinements of this.
  • Renamed some items to communicate that they originate with the 1ES scan project, not CredScan.
  • Expanded unit testing. The updated secret masker namespace now has >90% code coverage.

@michaelcfanning michaelcfanning changed the title Configurable masking token Configurable masking token, perf improvements, refactoring. Dec 27, 2023

namespace Agent.Sdk.SecretMasking
{
/// <summary>
/// Extended secret masker service, that allows to log origins of secrets
/// </summary>
public class LoggedSecretMasker : ILoggedSecretMasker
public class LoggedSecretMasker : SecretMasker, ILoggedSecretMasker
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SecretMasker

The logged secret masker (as opposed to the legacy logged secret masker) can extend SecretMasker rather than wrapping it (because no other ILoggedSecretMasker will be wrapped by this class). This change will help us switch over to using a SecretMasker instance that 1ES provides that lives in a different repo (https://github.com/microsoft/security-utilities).

void AddValue(String value, string origin);
void AddValueEncoder(ValueEncoder encoder, string origin);
void SetTrace(ITraceWriter trace);
IDictionary<string, string> GetTelemetry();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetTelemetry

Added this on ILoggedSecretMasker because I thought it made most sense given that we will produce this data for both the legacy and current secret maskers. This interface currently holds all extensions to the original ADO masker interface.

@@ -8,5 +8,5 @@ internal interface ISecret
/// <summary>
/// Returns one item (start, length) for each match found in the input string.
/// </summary>
IEnumerable<ReplacementPosition> GetPositions(String input);
IEnumerable<Replacement> GetReplacements(String input);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacement

ReplacementPositions now has additional metadata, for example, the preferred token to use for redaction, so I performed a rename.

@michaelcfanning
Copy link
Member Author

public ISecretMasker Clone()

I can't find the use of secret masker clones in the code, does this utilization exist? One difference now between the legacy masker and the current is that the legacy masker doesn't retain elapsed masking time on clone. Is that an issue?


Refers to: src/Agent.Sdk/SecretMasking/LegacyLoggedSecretMasker.cs:134 in a95cc17. [](commit_id = a95cc17, deletion_comment = False)

}

Sdk.SecretMasking.ISecretMasker Sdk.SecretMasking.ISecretMasker.Clone()
{
return new LegacyLoggedSecretMasker(this._secretMasker.Clone());
}

public IDictionary<string, string> GetTelemetry()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetTelemetry

Note that we only produce the ElapsedMaskingTime data point for the legacy masker, as I think we are only really concerned about relative runtime perf differences between the maskers (i.e., and other telemetry isn't interesting/won't be consumed for the legacy masker).

@@ -50,7 +50,7 @@ public void IsUserInfoMaskedCorrectly()
var testSecretMasker = initSecretMasker();

Assert.Equal(
"https://user:***@example.com",
"https://user:+++@example.com",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:+++@

here and elsewhere the use of +++ indicates that a value matched a regex and was therefore automatically redacted. we retain *** when redacting an explicitly specified value (e.g., a secured variable).

return this._secretMasker.MaskSecrets(input);
var stopwatch = Stopwatch.StartNew();
string result = this._secretMasker.MaskSecrets(input);
Interlocked.Exchange(ref elapsedMaskingTime, stopwatch.ElapsedTicks);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exchange

I'm not sure what threading is in play in the agent. The masker is thread-safe for the masking operation, so I used Interlocked.Exchange to update the elapsed time property.

}

// Replace
var stringBuilder = new StringBuilder();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StringBuilder

StringBuilder instance construction is pretty expensive, we should consider a thread local field for a dedicated string builder instance (and set its length to 0 in this code path). A future change we can look at, maybe, to test our flighting/performance telemetry-based testing).

{
startIndex = match.Index + 1;
yield return new ReplacementPosition(match.Index, match.Length);
if (input.IndexOf(sniffLiteral, StringComparison.Ordinal) != -1)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IndexOf

A simple IndexOf check is 15x+ faster than something like RE2 in a typical perf test. And so using it as a first pass filter to decide whether to run a follow-on regex is likely to yield performance gains. These are typically 4o% or more in other scan scenarios 1ES owns that use the technique.

@michaelcfanning michaelcfanning marked this pull request as ready for review December 28, 2023 23:47
@michaelcfanning michaelcfanning requested review from a team as code owners December 28, 2023 23:47
@michaelcfanning
Copy link
Member Author

@merlynomsft

@@ -347,6 +348,9 @@ public async Task<TaskResult> RunAsync(Pipelines.AgentJobRequestMessage message,
finally
{
Trace.Info("Finalize job.");

PublishSecretsMaskingTelemetry(jobContext);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PublishSecretsMaskingTelemetry

Is this a good place to put the secret masking telemetry? Is this when the effective lifetime of the masker is over? What are the lifetime semantics of the host context relative to this code?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, considering we upload logs just after (UploadDiagnosticLogsAsync)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants