Commit

adding documentation
Aaron Loo committed Nov 16, 2020
1 parent e66eb0b commit c42f973
Showing 9 changed files with 895 additions and 160 deletions.
210 changes: 54 additions & 156 deletions CONTRIBUTING.md
@@ -5,107 +5,83 @@ for developers everywhere! This document serves as a guide to help you quickly
gain familiarity with the repository, and start your development environment so
that you can quickly hit the ground running.

## Layout
## 1. Learn the Overall Layout of the Code

```
/detect_secrets # This is where the main code lives
/core # Powers the detect-secrets engine
/plugins # All plugins live here, modularized.
/common # Common logic shared between plugins
main.py # Entrypoint for console use
pre_commit_hook.py # Entrypoint for pre-commit hook
/test_data # Sample files used for testing purposes
/testing # Common logic used in test cases
/tests # Mirrors detect_secrets layout for all tests
```
Be sure to read through the [overview of `detect-secrets`' design](/docs/design.md) before
starting to work on it! This will give you a better idea of the different components to the
system, and how they interact together to find secrets.

## Building Your Development Environment
## 2. Building Your Development Environment

There are several ways to spin up your virtual environment:

**Casual Python Developers**:

```bash
virtualenv --python=python3 venv
python3 -m venv venv
source venv/bin/activate
pip install -r requirements-dev.txt
```

or
**Regular Python Developers:**

```bash
python3 -m venv venv
virtualenv --python=python3 venv
source venv/bin/activate
pip install -r requirements-dev.txt
```

> **Developer Note**: The main difference between this method and the former one (using Python's
in-built virtual environment) is that Python's `venv` module pins the `pip` version. However,
it doesn't matter too much if you're working on this repository alone, since `detect-secrets`
doesn't ship with many dependency requirements.

or

```bash
tox -e venv
source venv/bin/activate
```

Whichever way you choose, you can check to see whether you're successful by
executing:

```bash
PYTHONPATH=`pwd` python detect_secrets/main.py --version
```

## Writing a Plugin

There are many examples of existing plugins to reference, under
`detect_secrets/plugins`. However, this is the overall workflow:

1. Write your tests

Before you write your plugin, you should **know what it intends to do**:
what it should catch, and arguably more importantly, what it should
avoid. Formalize these examples in tests!

For a basic example, see `tests/plugins/basic_auth_test.py`.

2. Write your plugin
> **Developer Note**: The benefit of this is that `tox` sets up a common development environment
for you. The downside is that you'll need to install `tox` first -- which if you already have,
you wouldn't be reading this section :)

All plugins MUST inherit from `detect_secrets.plugins.base.BasePlugin`.
See that class' docstrings for more detailed information.

Depending on the complexity of your plugin, you may be able to inherit
from `detect_secrets.plugins.base.RegexBasedDetector` instead. This is
useful if you want to merely customize a new regex rule. Check out
`detect_secrets/plugins/basic_auth.py` for a good example of this.
Whichever way you choose, you can check to see whether you're successful by executing:

Be sure to write comments about **why** your particular regex was crafted
as it is!

3. Update documentation

Be sure to add your changes to the `README.md` and `CHANGELOG.md` so that
it will be easier for maintainers to bump the version and for other
downstream consumers to get the latest information about plugins available.
```bash
python -m detect_secrets --version
```

### Tips
## 3. Run tests

- There should be a total of three modified files in a minimal new plugin: the
plugin file, its corresponding test, and an updated README.
- If your plugin uses customizable options (e.g. entropy limit in `HighEntropyStrings`)
be sure to add default options to the plugin's `default_options`.
Tests should succeed on master. Any code additions you contribute will also need testing,
so it's good to run tests first to make sure you have a working copy. Don't worry -- the tests
don't take long!

## Running Tests
```bash
$ time python -m pytest tests
...
real 0m10.113s
user 0m6.848s
sys 0m2.486s
```

### Running the Entire Test Suite

You can run the test suite in the interpreter of your choice (in this example,
`py35`) by doing:
You can run the test suite in the interpreter of your choice (in this example, `py36`) by doing:

```bash
tox -e py35
tox -e py36
```

This will also run the code through our series of coverage tests, `mypy` rules and other linting
checks to enforce a consistent coding style.

For a list of supported interpreters, check out `envlist` in `tox.ini`.

If you wanted to run **all** interpreters (might take a while), you can also
just run:
If you wanted to run **all** interpreters (might take a while), you can also just run:

```bash
make test
@@ -125,113 +101,35 @@ levels. Here are a couple of examples:
- Running a single test class

```bash
pytest tests/core/baseline_test.py::TestInitializeBaseline
pytest tests/core/baseline_test.py::TestCreate
```

- Running a single test function, inside test class

```bash
pytest tests/core/baseline_test.py::TestInitializeBaseline::test_basic_usage
pytest tests/core/baseline_test.py::TestCreate::test_basic_usage
```

- Running a single root level test function

```bash
pytest tests/plugins/base_test.py::test_fails_if_no_secret_type_defined
pytest tests/plugins/baseline_test.py::test_upgrade_succeeds
```

## Technical Details

### PotentialSecret

This lives at the very heart of the engine, and represents a line being flagged
for its potential to be a secret.

Since the detect-secrets engine is heuristics-based, it requires a human to read
its output at some point to determine false/true positives. Therefore, its
representation is tailored to support **high readability**. Its attributes
represent values that you would want to know (and keep track of) for
each potential secret, including:

1. What is it?
2. How was it found?
3. Where is it found?
4. Is it a true/false positive?

We can see that the JSON dump clearly shows this.

```
{
"type": "Base64 High Entropy String",
"filename": "test_data/config.yaml",
"line_number": 5,
"hashed_secret": "bc9160bc0ff062e1b2d21d2e59f6ebaba104f051",
"is_secret": false
}
```

However, since it is designed for easy reading, we didn't want the baseline to
be the single file that contained all the secrets in a given repository.
Therefore, we mask the secret by hashing it with three core attributes:

1. The actual secret
2. The filepath where it was found
3. How the engine determined it was a secret

Any two potential secrets that share **all three values are considered equal**.

This means that the engine will flag the following cases as separate occurrences
to investigate:

* Same secret value, but present in different files
* Same secret value, caught by multiple plugins

Furthermore, this will **not** flag on every single usage of a given secret in a
given file, to minimize noise.

**Important Note:** The line number does not play a part in the identification
of a potential secret because code is expected to move around through continuous
iteration. However, through the `audit` tool, these line numbers are leveraged
to quickly identify the secret that was identified by a given plugin.
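
The identity rule above can be sketched as follows. This is a minimal illustration, not the real `PotentialSecret` class: `SketchSecret` and `make_secret` are hypothetical names, and the use of SHA-1 is an assumption chosen only because it matches the length of the sample hash shown earlier.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SketchSecret:
    """Hypothetical stand-in for PotentialSecret; not the real class."""
    type: str           # how the engine determined it was a secret
    filename: str       # the filepath where it was found
    hashed_secret: str  # the masked secret value
    # Excluded from equality and hashing, since code is expected to move.
    line_number: int = field(default=0, compare=False)


def make_secret(type_: str, filename: str, raw: str, line_number: int) -> SketchSecret:
    # Assumption: SHA-1 is used here purely for illustration.
    digest = hashlib.sha1(raw.encode('utf-8')).hexdigest()
    return SketchSecret(type_, filename, digest, line_number)


same_a = make_secret('Basic Auth Credentials', 'config.yaml', 'hunter2', 5)
same_b = make_secret('Basic Auth Credentials', 'config.yaml', 'hunter2', 42)
other_file = make_secret('Basic Auth Credentials', 'other.yaml', 'hunter2', 5)
```

Here `same_a` and `same_b` compare equal even though their line numbers differ, while `other_file` is a separate occurrence because the filepath differs.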

### SecretsCollection

A collection of `PotentialSecrets` is stored in a `SecretsCollection`. This
contains a list of all the secrets in a given repository, as well as any other
details needed to recreate it.

A formatted dump of a `SecretsCollection` is used as the baseline file.

In this way, the overall baseline logic is simple:

1. Scan the repository to create a collection of known secrets.
2. Check every new secret against this collection of known secrets.
3. If you previously didn't know about it, alert off it.
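
As a rough sketch of these three steps, assuming each secret is reduced to a hypothetical identity tuple (the real `SecretsCollection` API is not shown here, and `find_new_secrets` is an illustrative name only):

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical identity tuple: (hashed secret, filename, detector type).
SecretKey = Tuple[str, str, str]


def find_new_secrets(
    baseline: Set[SecretKey],
    scanned: Iterable[SecretKey],
) -> List[SecretKey]:
    """Return the scanned secrets that the baseline does not already know about."""
    return [key for key in scanned if key not in baseline]


# Step 1: the baseline holds the secrets we already know about.
known = {('abc123', 'config.yaml', 'Basic Auth Credentials')}

# Step 2: check every newly scanned secret against the baseline.
latest_scan = [
    ('abc123', 'config.yaml', 'Basic Auth Credentials'),
    ('def456', 'settings.py', 'Base64 High Entropy String'),
]

# Step 3: anything left over is new, and should raise an alert.
alerts = find_new_secrets(known, latest_scan)
```

Only the `settings.py` entry ends up in `alerts`, since the `config.yaml` entry is already part of the baseline.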

With this in mind, this class exposes three types of methods:

##### 1. Creating

We need to create a `SecretsCollection` object from a formatted baseline output,
so that we can compare new secrets against it. This means that the baseline
**must** include all information needed to initialize a `SecretsCollection`,
such as:
Generally speaking, we use test classes to group a series of related test cases together (e.g.
`TestCreate` tests the `detect_secrets.core.baseline.create` functionality), but root test
functions otherwise. If you're writing tests for your plugins, you should probably just use
root test functions.
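
To make the distinction concrete, here is a small sketch of both styles. The `mask` helper is hypothetical and exists only to give the tests something to exercise; it is not part of `detect-secrets`.

```python
def mask(value: str, visible: int = 2) -> str:
    """Hypothetical helper under test; not part of detect-secrets."""
    if len(value) <= visible:
        return '*' * len(value)
    return value[:visible] + '*' * (len(value) - visible)


class TestMask:
    # Related cases for one piece of functionality, grouped in a class.
    def test_masks_tail(self):
        assert mask('hunter2') == 'hu*****'

    def test_short_value_fully_masked(self):
        assert mask('ab') == '**'


def test_custom_visible_length():
    # A standalone case, written as a root-level test function.
    assert mask('hunter2', visible=4) == 'hunt***'
```

If this lived in a (hypothetical) `tests/example_test.py`, the class-based case would be selected with `tests/example_test.py::TestMask::test_masks_tail`, and the root-level one with `tests/example_test.py::test_custom_visible_length`.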

* Secrets found,
* Files to exclude,
* Plugin configurations,
* Version of detect-secrets used
## 4. Make Your Change

##### 2. Adding
Want to contribute a new plugin? Check out more details here:
[Writing Your Own Plugin](/docs/plugins.md#Writing%20Your%20Own%20Plugin)

Once we have a collection of secrets, we can add secrets to it via various
methods of scanning strings. The various methods of scanning strings (e.g.
`scan_file`, `scan_diff`) should handle iterating through all plugins, and
adding results found to the collection.
What about contributing better false positive filters? Check out more details here:
[Writing Your Own Filter](/docs/filters.md#Writing%20Your%20Own%20Filter)

##### 3. Outputting
## 5. Deploying Changes

We need to be able to create a baseline from a SecretsCollection, so that it
can be used for future comparisons. In the same spirit as the `PotentialSecret`
object, it is designed for **high readability**, and may contain other metadata
that aids human analysis of the generated output (e.g. `generated_at` time).
Check out [more detailed upgrade instructions here](/docs/upgrades.md), and how to write
backwards-compatible changes using the built-in upgrade infrastructure.
7 changes: 7 additions & 0 deletions detect_secrets/__main__.py
@@ -0,0 +1,7 @@
import sys

from .main import main


if __name__ == '__main__':
sys.exit(main())
4 changes: 0 additions & 4 deletions detect_secrets/main.py
@@ -111,7 +111,3 @@ def handle_audit_action(args: argparse.Namespace) -> None:
audit.audit_baseline(args.filename[0])
except InvalidBaselineError:
pass


if __name__ == '__main__':
sys.exit(main())