Singer/Meltano: Add example github-to-cratedb

It uses the `meltano-target-cratedb` Singer component. https://github.com/crate-workbench/meltano-target-cratedb

12 changed files with 503 additions and 1 deletion.

@@ -0,0 +1,72 @@
name: Singer/Meltano

on:
  pull_request:
    branches: ~
    paths:
    - '.github/workflows/test-singer-meltano.yml'
    - 'framework/singer-meltano/**'
    - 'requirements.txt'
  push:
    branches: [ main ]
    paths:
    - '.github/workflows/test-singer-meltano.yml'
    - 'framework/singer-meltano/**'
    - 'requirements.txt'

  # Allow job to be triggered manually.
  workflow_dispatch:

  # Run job each night after CrateDB nightly has been published.
  schedule:
    - cron: '0 3 * * *'

# Cancel in-progress jobs when pushing to the same branch.
concurrency:
  cancel-in-progress: true
  group: ${{ github.workflow }}-${{ github.ref }}

jobs:
  test:
    name: "
     Python: ${{ matrix.python-version }}
     CrateDB: ${{ matrix.cratedb-version }}
     on ${{ matrix.os }}"
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ 'ubuntu-latest' ]
        python-version: [ '3.10', '3.11' ]
        cratedb-version: [ 'nightly' ]

    services:
      cratedb:
        image: crate/crate:nightly
        ports:
          - 4200:4200
          - 5432:5432

    steps:

      - name: Acquire sources
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          architecture: x64
          cache: 'pip'
          cache-dependency-path: |
            requirements.txt
            framework/singer-meltano/requirements.txt
            framework/singer-meltano/requirements-dev.txt
      - name: Install utilities
        run: |
          pip install -r requirements.txt
      - name: Validate framework/singer-meltano
        run: |
          ngr test --accept-no-venv framework/singer-meltano

@@ -1,8 +1,9 @@
.DS_Store
.idea
.env
.venv*
__pycache__
.coverage
coverage.xml
mlruns/
archive/
logs.log

@@ -0,0 +1,2 @@
.meltano
output

@@ -0,0 +1,45 @@
# Meltano Examples

Concise examples about working with [CrateDB] and [Meltano], for conceiving and
running flexible ELT tasks. All the recipes use [meltano-target-cratedb] for
loading data into CrateDB.

## What's inside

- `singerfile-to-cratedb`: Acquire data from a Singer File, and load it into
  a CrateDB database table.

- `github-to-cratedb`: Acquire repository metadata from the GitHub API, and load
  each entity into its own table, 32 CrateDB database tables in total.
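Loading "per entity" follows from the Singer specification: a tap writes JSON messages line by line, and every `SCHEMA` and `RECORD` message names the `stream` it belongs to, so a target can give each stream its own table. The Python sketch below illustrates only that routing idea, using made-up sample messages; `route_records` is a hypothetical helper, not part of meltano-target-cratedb. A real target also has to map types, batch its writes, and handle `STATE` messages.

```python
import json
from collections import defaultdict


def route_records(lines):
    """Group Singer RECORD messages by stream name (hypothetical helper).

    Returns the announced schemas and, per stream, the list of records that
    a target would write into that stream's table. STATE and any other
    message types are skipped in this simplified sketch.
    """
    schemas = {}
    tables = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        message = json.loads(line)
        kind = message.get("type")
        if kind == "SCHEMA":
            # A SCHEMA message announces the JSON schema of one stream.
            schemas[message["stream"]] = message["schema"]
        elif kind == "RECORD":
            stream = message["stream"]
            if stream not in schemas:
                raise ValueError(f"RECORD for {stream!r} arrived before its SCHEMA")
            tables[stream].append(message["record"])
    return schemas, dict(tables)


if __name__ == "__main__":
    # Made-up sample messages, shaped like Singer tap output.
    sample = [
        '{"type": "SCHEMA", "stream": "commits", "schema": {"type": "object"}, "key_properties": ["sha"]}',
        '{"type": "SCHEMA", "stream": "releases", "schema": {"type": "object"}, "key_properties": ["id"]}',
        '{"type": "RECORD", "stream": "commits", "record": {"sha": "abc123"}}',
        '{"type": "RECORD", "stream": "releases", "record": {"id": 1, "tag_name": "v0.0.1"}}',
        '{"type": "STATE", "value": {}}',
    ]
    _, tables = route_records(sample)
    for stream, records in tables.items():
        print(stream, len(records))
```

With 32 streams enabled in the tap, the same grouping would yield 32 tables.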

## Prerequisites

Before running the examples within the subdirectories, make sure to install
Meltano and its dependencies.

```shell
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

## Usage

Then, explore the individual Meltano projects, either by invoking them from within
their directories, or by using the `--cwd` option from this folder.

```shell
meltano --cwd github-to-cratedb install
meltano --cwd github-to-cratedb run tap-github target-cratedb
```

## Software Tests

```shell
pip install -r requirements-dev.txt
poe check
```

[CrateDB]: https://cratedb.com/product
[Meltano]: https://meltano.com/
[meltano-target-cratedb]: https://github.com/crate-workbench/meltano-target-cratedb

@@ -0,0 +1,82 @@
# Meltano GitHub -> CrateDB example

## About

Acquire repository metadata from the GitHub API using [tap-github], and insert
it into CrateDB database tables using [meltano-target-cratedb].

It follows the canonical example demonstrated in the [Meltano Getting Started Tutorial].

## Configuration

### tap-github

To access the GitHub API, you will need an authentication token. It
can be acquired at [GitHub Developer Settings » Tokens].

To configure the recipe, store the token in the `TAP_GITHUB_AUTH_TOKEN`
environment variable, either by exporting it in your shell session, or by
creating a dotenv configuration file `.env` in the project directory.

```shell
TAP_GITHUB_AUTH_TOKEN='<your-github-token>'
```
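Meltano reads the `.env` file in the project directory by itself, so the recipe needs no extra code. To make the file format concrete, here is a minimal Python sketch of how such `KEY=value` lines map to variables, assuming only comments, blank lines, and optionally quoted values; `parse_dotenv` is a hypothetical helper, and full dotenv implementations accept more syntax than this.

```python
def parse_dotenv(text):
    """Parse simple KEY=value lines of a `.env` file into a dict.

    Hypothetical helper for illustration: it understands comments, blank
    lines, and one pair of surrounding single or double quotes, nothing more.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=value line: {raw!r}")
        value = value.strip()
        # Remove one pair of matching quotes around the value.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


if __name__ == "__main__":
    sample = "# GitHub access\nTAP_GITHUB_AUTH_TOKEN='<your-github-token>'\n"
    print(parse_dotenv(sample))
```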

Then, in `meltano.yml`, locate the `tap-github` entry in `plugins.extractors`,
and adjust the value of `config.repositories` to list the repositories you
intend to scrape.

### target-cratedb

In `meltano.yml`, locate the `target-cratedb` entry in `plugins.loaders`, and
adjust `config.sqlalchemy_url` to match your database connectivity settings.
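The `sqlalchemy_url` value is an SQLAlchemy database URL using the `crate` dialect, in the general form `crate://user:password@host:port/`. As a small illustration, this Python sketch splits such a URL into its parts with the standard library; `describe_crate_url` is hypothetical, and the fallback to port 4200 (CrateDB's default HTTP port) when no port is given is an assumption worth verifying against the CrateDB SQLAlchemy dialect documentation.

```python
from urllib.parse import urlsplit

# Assumption: CrateDB's default HTTP port, used when the URL names no port.
DEFAULT_CRATEDB_HTTP_PORT = 4200


def describe_crate_url(url):
    """Split a `crate://` SQLAlchemy URL into connection parts (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme != "crate":
        raise ValueError(f"expected a crate:// URL, got scheme {parts.scheme!r}")
    return {
        "user": parts.username,
        "password": parts.password,
        # Assumption: an empty host means the local machine.
        "host": parts.hostname or "localhost",
        "port": parts.port or DEFAULT_CRATEDB_HTTP_PORT,
    }


if __name__ == "__main__":
    print(describe_crate_url("crate://crate@localhost/"))
    print(describe_crate_url("crate://admin:secret@db.example.org:4200/"))
```

As with any URL, special characters in the password need to be percent-encoded.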

## Usage

Install dependencies.
```shell
meltano install
```

Invoke the data transfer to JSONL files, using [target-jsonl].
```shell
meltano run tap-github target-jsonl
cat output/commits.jsonl
```
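For a quick overview beyond `cat`, a short Python sketch can count the records of a JSONL file and the top-level keys they carry, assuming one JSON object per line as written by target-jsonl. `summarize_jsonl` is a hypothetical helper, and no particular field names of the GitHub streams are assumed.

```python
import json
import sys
from collections import Counter


def summarize_jsonl(lines):
    """Count JSON objects and how often each top-level key occurs."""
    records = 0
    keys = Counter()
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        records += 1
        keys.update(record.keys())
    return records, keys


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8") as jsonl_file:
        count, keys = summarize_jsonl(jsonl_file)
    print(f"{count} records")
    for key, seen in keys.most_common():
        print(f"{key}: {seen}")
```

Saved as, for example, `summarize_jsonl.py` (a hypothetical file name), it takes the same file path as the `cat` command.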

Invoke the data transfer to the CrateDB database.
```shell
meltano run tap-github target-cratedb
```

## Screenshot

Enjoy the release notes.
```sql
SELECT repo, tag_name, body FROM melty.releases ORDER BY tag_name DESC;
```

![image](https://github.com/crate-workbench/cratedb-toolkit/assets/453543/ac37c9cc-8e42-4c7c-84aa-64498bf48f4d)
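Instead of an interactive SQL console, the same query can be sent to CrateDB's HTTP endpoint `/_sql`, which accepts a JSON body with a `stmt` field (plus optional `args` for `?` placeholders) and answers with `cols` and `rows`. The Python sketch below uses only the standard library and assumes a local CrateDB on port 4200 that is reachable without a password, matching a `crate://crate@localhost/` setup; `build_sql_request`, `rows_as_dicts`, and `query` are hypothetical helpers.

```python
import json
import urllib.request

# Assumption: CrateDB runs locally and serves HTTP on its default port 4200.
CRATEDB_SQL_ENDPOINT = "http://localhost:4200/_sql"


def build_sql_request(stmt, args=None, endpoint=CRATEDB_SQL_ENDPOINT):
    """Prepare a POST request carrying one SQL statement for the `_sql` endpoint."""
    payload = {"stmt": stmt}
    if args is not None:
        payload["args"] = args  # values for `?` placeholders in `stmt`
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_as_dicts(response):
    """Pair the `cols` of a `_sql` response with each of its `rows`."""
    return [dict(zip(response["cols"], row)) for row in response["rows"]]


def query(stmt, args=None):
    """Run a statement and return its result rows as dicts (needs a live CrateDB)."""
    with urllib.request.urlopen(build_sql_request(stmt, args), timeout=30) as reply:
        return rows_as_dicts(json.load(reply))
```

For example, `query("SELECT repo, tag_name FROM melty.releases ORDER BY tag_name DESC")` returns one dict per release row.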

## Troubleshooting

If you see errors like the following one in the console output, please verify the
GitHub authentication token stored in the `TAP_GITHUB_AUTH_TOKEN` environment variable.
```text
singer_sdk.exceptions.RetriableAPIError: 401 Client Error: b'{"message":"This endpoint requires you to be authenticated.","documentation_url":"https://docs.github.com/graphql/guides/forming-calls-with-graphql#authenticating-with-graphql"}' (Reason: Unauthorized) for path: /graphql cmd_type=elb consumer=False name=tap-github producer=True stdio=stderr string_id=tap-github
```
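To narrow down a 401 error, a small Python sketch can first check the token for obvious mistakes offline, and then ask the GitHub REST API endpoint `GET /user`, which answers 401 for rejected credentials. The prefix check assumes personal access tokens (`ghp_` for classic, `github_pat_` for fine-grained ones) and is only a heuristic; `token_problem` and `github_accepts` are hypothetical helpers.

```python
import os
import urllib.error
import urllib.request

# Heuristic: prefixes of classic (ghp_) and fine-grained (github_pat_)
# personal access tokens. Other valid token kinds may not match.
KNOWN_TOKEN_PREFIXES = ("ghp_", "github_pat_")


def token_problem(token):
    """Return a description of an obvious problem with the token, or None."""
    if not token:
        return "TAP_GITHUB_AUTH_TOKEN is not set or empty"
    if token != token.strip() or token[0] in "'\"":
        return "token has surrounding whitespace or quotes"
    if not token.startswith(KNOWN_TOKEN_PREFIXES):
        return "token does not look like a GitHub personal access token"
    return None


def github_accepts(token):
    """Call GET /user on the GitHub REST API; False means HTTP 401 Unauthorized."""
    request = urllib.request.Request(
        "https://api.github.com/user",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as reply:
            return reply.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 401:
            return False
        raise  # other HTTP errors are re-raised rather than blamed on the token


if __name__ == "__main__":
    # Offline check only; call github_accepts() to test the token online.
    token = os.environ.get("TAP_GITHUB_AUTH_TOKEN", "")
    print(token_problem(token) or "token looks plausible")
```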

## Development

In order to link this project to a development installation of [meltano-target-cratedb],
configure the `pip_url` of the `target-cratedb` loader like this:
```yaml
pip_url: --editable=/path/to/sources/meltano-target-cratedb
```

[GitHub Developer Settings » Tokens]: https://github.com/settings/tokens
[Meltano Getting Started Tutorial]: https://docs.meltano.com/getting-started/part1
[meltano-target-cratedb]: https://github.com/crate-workbench/meltano-target-cratedb
[tap-github]: https://hub.meltano.com/extractors/tap-github/
[target-jsonl]: https://hub.meltano.com/loaders/target-jsonl/

@@ -0,0 +1,51 @@
# A Meltano project is just a directory on your filesystem containing text-based files.
# At a minimum, a Meltano project must contain a project file named `meltano.yml`,
# which contains your project configuration, and tells Meltano that a particular
# directory is a Meltano project.
---
version: 1
default_environment: dev
send_anonymous_usage_stats: false
project_id: f14797b9-9d1c-414c-851c-c91e08ddbc2e

environments:
- name: dev
- name: staging
- name: prod

plugins:

  # Configure data source.
  # In Meltano jargon, it is an "extractor"; in Singer jargon, a "tap".
  extractors:

  - name: tap-github
    variant: cratedb
    namespace: cratedb
    pip_url: git+https://github.com/crate-workbench/tap-github.git@cratedb
    # Note: Configure your GitHub repository here.
    config:
      start_date: '2023-12-01'
      repositories:
      - crate-workbench/cratedb-toolkit

  # Configure data sinks.
  # In Meltano jargon, it is a "loader"; in Singer jargon, a "target".
  loaders:

  - name: target-jsonl
    variant: andyh1203
    pip_url: target-jsonl

  - name: target-cratedb
    namespace: cratedb
    variant: cratedb
    # Acquire from PyPI.
    pip_url: meltano-target-cratedb
    # Acquire from GitHub.
    # pip_url: git+https://github.com/crate-workbench/meltano-target-cratedb.git

    # Note: Configure your database server and credentials here.
    config:
      sqlalchemy_url: crate://crate@localhost/
      add_record_metadata: true