tap-github is a Singer tap for GitHub.
Built with the Singer SDK.
# use uv (https://docs.astral.sh/uv/)
uv tool install meltanolabs-tap-github
# or pipx (https://pipx.pypa.io/stable/)
pipx install meltanolabs-tap-github
# or Meltano
meltano add extractor tap-github
A list of release versions is available at https://github.com/MeltanoLabs/tap-github/releases
This tap accepts the following configuration options:
- Required: One and only one of the following modes:
  1. `repositories`: An array of strings specifying the GitHub repositories to be included. Each element of the array should be of the form `<org>/<repository>`, e.g. `MeltanoLabs/tap-github`.
  2. `organizations`: An array of strings containing the GitHub organizations to be included.
  3. `searches`: An array of search descriptor objects with the following properties:
     - `name`: A human-readable name for the search query.
     - `query`: A GitHub search string (generally the same as would come after `?q=` in the URL).
  4. `user_usernames`: A list of GitHub usernames.
  5. `user_ids`: A list of GitHub user IDs [int].
- Highly recommended:
  - Personal access tokens (PATs) for authentication can be provided in 3 ways:
    - `auth_token` - Takes a single token.
    - `additional_auth_tokens` - Takes a list of tokens. Can be used together with `auth_token` or as the sole source of PATs.
    - Any environment variables beginning with `GITHUB_TOKEN` will be assumed to be PATs. These tokens will be used in addition to `auth_token` (if provided), but will not be used if `additional_auth_tokens` is provided.
  - GitHub App keys are another option for authentication, and can be used in combination with PATs if desired. App IDs and keys should be assembled into the format `:app_id:;;-----BEGIN RSA PRIVATE KEY-----\n_YOUR_P_KEY_\n-----END RSA PRIVATE KEY-----` (replace `:app_id:` with your actual GitHub App ID and `_YOUR_P_KEY_` with your private key content), where the key can be generated from the `Private keys` section on https://github.com/organizations/:organization_name/settings/apps/:app_name. Read more about GitHub App quotas here. Formatted app keys can be provided in 2 ways:
    - `auth_app_keys` - List of GitHub App keys in the prescribed format.
    - If `auth_app_keys` is not provided but there is an environment variable with the name `GITHUB_APP_PRIVATE_KEY`, it will be assumed to be an App key in the prescribed format.
- Optional:
  - `user_agent`
  - `start_date`
  - `metrics_log_level`
  - `stream_maps`
  - `stream_maps_config`
  - `stream_options`: Options which can change the behaviour of a specific stream are nested within.
    - `milestones`: Valid options for the `milestones` stream are nested within.
      - `state`: Determines which milestones will be extracted. One of `open` (default), `closed`, `all`.
  - `rate_limit_buffer`: A buffer to avoid consuming all query points for the auth_token at hand. Defaults to 1000.
  - `expiry_time_buffer`: A buffer used when determining when to refresh GitHub app tokens. Only relevant when authenticating as a GitHub app. Defaults to 10 minutes. Tokens generated by GitHub apps expire 1 hour after creation, and will be refreshed once fewer than `expiry_time_buffer` minutes remain until the anticipated expiry time.
Note that modes 1-3 are repository modes and 4-5 are user modes and will not run the same set of streams.
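As a rough illustration of the "one and only one mode" rule, the sketch below builds a config dictionary and checks it before serializing it to JSON. The helper names `selected_modes` and `check_single_mode` and the token placeholder are assumptions made for this example; the tap performs its own validation, which may differ.

```python
import json

# The five mutually exclusive modes listed in the configuration options.
MODES = ("repositories", "organizations", "searches", "user_usernames", "user_ids")


def selected_modes(config: dict) -> list[str]:
    """Return the mode keys that are present and non-empty in the config."""
    return [mode for mode in MODES if config.get(mode)]


def check_single_mode(config: dict) -> str:
    """Hypothetical helper: raise unless exactly one mode is configured."""
    modes = selected_modes(config)
    if len(modes) != 1:
        raise ValueError(f"expected exactly one mode, found {modes or 'none'}")
    return modes[0]


config = {
    "repositories": ["MeltanoLabs/tap-github"],
    "auth_token": "<your-personal-access-token>",  # placeholder, not a real token
    "stream_options": {"milestones": {"state": "all"}},
}

mode = check_single_mode(config)
# The resulting JSON could be saved to the file passed via --config.
config_json = json.dumps(config, indent=2)
```

Here `repositories` is the only mode present, so the check passes and the tap would run the repository set of streams.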
A full list of supported settings and capabilities for this tap is available by running:
tap-github --about
A small number of records may be pulled without an auth token. However, a GitHub auth token should generally be considered "required" since it gives more realistic rate limits. (See GitHub API docs for more info.)
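The credential rules from the configuration options can be illustrated with a short sketch. It joins an App ID and a private key with the `;;` separator and gathers PATs from a config and from environment variables whose names begin with `GITHUB_TOKEN`. The function names and the App ID `12345` are hypothetical, and the logic is a simplified reading of the rules above, not the tap's actual code.

```python
import os


def format_app_key(app_id: str, private_key_pem: str) -> str:
    """Join an App ID and a PEM private key with the ';;' separator."""
    return f"{app_id};;{private_key_pem}"


def collect_pats(config: dict, environ: dict) -> list[str]:
    """Gather PATs following the precedence described above (simplified)."""
    tokens = []
    if config.get("auth_token"):
        tokens.append(config["auth_token"])
    if config.get("additional_auth_tokens"):
        # When this list is provided, environment tokens are not used.
        tokens.extend(config["additional_auth_tokens"])
    else:
        tokens.extend(
            value
            for name, value in sorted(environ.items())
            if name.startswith("GITHUB_TOKEN") and value
        )
    return tokens


# Placeholder values only; substitute a real App ID and key content.
app_key = format_app_key(
    "12345",
    "-----BEGIN RSA PRIVATE KEY-----\n_YOUR_P_KEY_\n-----END RSA PRIVATE KEY-----",
)
pats = collect_pats(
    {"auth_token": "tok-a"},
    {"GITHUB_TOKEN_2": "tok-b", "PATH": "/usr/bin"},
)
# Against the real environment one would pass dict(os.environ) instead.
env_pats = collect_pats({}, dict(os.environ))
```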
The GitHub API is limited for some resources such as /events. For some resources, users might encounter the following error:
In order to keep the API fast for everyone, pagination is limited for this resource. Check the rel=last link relation in the Link response header to see how far back you can traverse.
To avoid this, the GitHub streams will exit early, i.e. when there is no next page available. If you are fetching /events at the repository level, beware of leaving the tap disabled for longer than a few days, or you will have gaps in your data.
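The early exit can be pictured as a loop that stops as soon as a response carries no `rel="next"` link. The following is a minimal sketch of that check on a raw `Link` header string; the function name is hypothetical, the example URLs are made up, and the tap's real pagination logic may be implemented differently.

```python
import re
from typing import Optional

# Matches one entry of a Link header, e.g. <https://...>; rel="next"
_LINK_PATTERN = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the URL tagged rel="next", or None when pagination has ended."""
    if not link_header:
        return None
    for url, rel in _LINK_PATTERN.findall(link_header):
        if rel == "next":
            return url
    return None


# Illustrative header values; a real response supplies its own URLs.
header = (
    '<https://api.github.com/repositories/1/events?page=2>; rel="next", '
    '<https://api.github.com/repositories/1/events?page=3>; rel="last"'
)
last_page_header = '<https://api.github.com/repositories/1/events?page=1>; rel="first"'
```

A caller would keep requesting pages while `next_page_url` returns a URL and stop once it returns `None`, instead of trying to page past the limit.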
You can easily run tap-github by itself or in a pipeline using Meltano.
- For the `traffic_*` streams, you will need write access to the repository. You can enable extraction for these streams by selecting them in the catalog.
tap-github --version
tap-github --help
tap-github --config CONFIG --discover > ./catalog.json
This project uses parent-child streams. Learn more about them here.
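Selecting the `traffic_*` streams in a discovered catalog can be done by hand or with a small script. The sketch below assumes the usual Singer catalog layout, in which a stream is selected through a metadata entry with an empty breadcrumb. The function name is hypothetical, and the inline catalog is a made-up stand-in for `catalog.json` with illustrative stream names.

```python
import json


def select_streams(catalog: dict, prefix: str) -> list[str]:
    """Mark every stream whose id starts with `prefix` as selected."""
    selected = []
    for stream in catalog.get("streams", []):
        if not stream.get("tap_stream_id", "").startswith(prefix):
            continue
        metadata = stream.setdefault("metadata", [])
        # The stream-level entry is the one with an empty breadcrumb.
        root = next((m for m in metadata if m.get("breadcrumb") == []), None)
        if root is None:
            root = {"breadcrumb": [], "metadata": {}}
            metadata.append(root)
        root["metadata"]["selected"] = True
        selected.append(stream["tap_stream_id"])
    return selected


# Stand-in for the contents of catalog.json (assumed structure).
catalog = {
    "streams": [
        {
            "tap_stream_id": "traffic_clones",
            "metadata": [{"breadcrumb": [], "metadata": {"selected": False}}],
        },
        {"tap_stream_id": "issues", "metadata": []},
    ]
}
chosen = select_streams(catalog, "traffic_")
updated_catalog_json = json.dumps(catalog, indent=2)
```

Streams that do not match the prefix are left untouched, so any existing selections in the catalog are preserved.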
pipx install poetry
poetry install
Create tests within the tap_github/tests subfolder and then run:
poetry run pytest
You can also test the tap-github CLI interface directly using poetry run:
poetry run tap-github --help
Testing with Meltano
Note: This tap will work in any Singer environment and does not require Meltano. Examples here are for convenience and to streamline end-to-end orchestration scenarios.
Your project comes with a custom meltano.yml project file already created. Open the meltano.yml and follow any "TODO" items listed in
the file.
Next, install Meltano (if you haven't already) and any needed plugins:
# Install meltano
pipx install meltano
# Initialize meltano within this directory
cd tap-github
meltano install
Now you can test and orchestrate using Meltano:
# Test invocation:
meltano invoke tap-github --version
# OR run a test `elt` pipeline:
meltano elt tap-github target-jsonl
One-liner to recreate output directory, run elt, and write out state file:
# Update this when you want a fresh state file:
TESTJOB=testjob1
# Run everything in one line
mkdir -p .output && meltano elt tap-github target-jsonl --job_id $TESTJOB && meltano elt tap-github target-jsonl --job_id $TESTJOB --dump=state > .output/state.json
See the dev guide for more instructions on how to use the Singer SDK to develop your own taps and targets.