# Code Samples

Below is a collection of code samples you can use for inspiration.

## Project Samples

Below are complete project samples contributed by community members. Use them for inspiration
or to see what an SDK-based tap looks like in practice.

- [tap-bamboohr by Auto IDM](https://gitlab.com/autoidm/tap-bamboohr)
- [tap-confluence by @edgarrmondragon](https://github.com/edgarrmondragon/tap-confluence)
- [tap-investing by @DouweM](https://gitlab.com/DouweM/tap-investing)
- [tap-parquet by AJ](https://github.com/dataops-tk/tap-parquet)
- [tap-powerbi-metadata by Slalom](https://github.com/dataops-tk/tap-powerbi-metadata)

To add your project to this list, please
[submit an issue](https://gitlab.com/meltano/meltano/-/issues/new?issue%5Bassignee_id%5D=&issue%5Bmilestone_id%5D=).

## Reusable Code Snippets

These code samples are taken from other projects. Use them as a reference if you get stuck.

### A simple Tap class definition with two streams

```python
from typing import List

from singer_sdk import Stream, Tap
from singer_sdk.typing import PropertiesList

# CountriesStream and ContinentsStream are defined elsewhere in the tap
# (see the GraphQL stream sample below).


class TapCountries(Tap):
    """Sample tap for the Countries GraphQL API.

    This tap has no config options and does not require authentication.
    """

    name = "tap-countries"
    # No settings are needed, so the config schema is an empty properties list.
    config_jsonschema = PropertiesList().to_dict()

    def discover_streams(self) -> List[Stream]:
        """Return a list containing one instance of each stream type."""
        return [
            CountriesStream(tap=self),
            ContinentsStream(tap=self),
        ]
```

### Define a simple GraphQL-based stream with its schema defined in a file

```python
from pathlib import Path

from singer_sdk.streams import GraphQLStream

# Directory holding the JSON Schema files (assumed layout: a `schemas/`
# folder next to this module).
SCHEMAS_DIR = Path(__file__).parent / "schemas"


class ContinentsStream(GraphQLStream):
    """Continents stream from the Countries API."""

    name = "continents"
    primary_keys = ["code"]
    replication_key = None  # Incremental bookmarks not needed

    # Read the JSON Schema definition from a file:
    schema_filepath = SCHEMAS_DIR / "continents.json"

    # GraphQL API endpoint and query text:
    url_base = "https://countries.trevorblades.com/"
    query = """
        continents {
            code
            name
        }
        """
```

### Dynamically discovering `schema` for a stream

Here is an example that parses a schema from the header row of CSV text:

```python
from typing import List

from singer_sdk import Stream
from singer_sdk.typing import PropertiesList, Property, StringType

FAKECSV = """
Header1,Header2,Header3
val1,val2,val3
val1,val2,val3
val1,val2,val3
"""


class CSVStream(Stream):
    @property
    def schema(self) -> dict:
        """Dynamically detect the JSON Schema for the stream.

        This is evaluated before any records are retrieved.
        """
        properties: List[Property] = []
        # Strip the leading newline so the first line is the header row
        header_row = FAKECSV.strip().split("\n")[0]
        for header in header_row.split(","):
            # Assume string type for all fields
            properties.append(Property(header, StringType))
        return PropertiesList(*properties).to_dict()
```
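
To check what the header-parsing step yields without installing the SDK, here is a plain-Python sketch that builds the schema dictionary directly. The helper name `csv_header_to_jsonschema` is hypothetical. It uses the standard `csv` module so that quoted headers containing commas are handled, and typing every column as `["string", "null"]` is an assumption that approximates, rather than reproduces, what `PropertiesList(...).to_dict()` emits.

```python
import csv
import io
from typing import Any, Dict


def csv_header_to_jsonschema(csv_text: str) -> Dict[str, Any]:
    """Build a JSON Schema object from the header row of CSV text.

    Hypothetical helper: every column is typed as a nullable string,
    mirroring the "assume string type" choice in the sample above.
    """
    # strip() drops the leading blank line; csv.reader handles quoting.
    reader = csv.reader(io.StringIO(csv_text.strip()))
    headers = next(reader)
    return {
        "type": "object",
        "properties": {name: {"type": ["string", "null"]} for name in headers},
    }


FAKECSV = """
Header1,Header2,Header3
val1,val2,val3
"""

schema = csv_header_to_jsonschema(FAKECSV)
print(list(schema["properties"]))  # ['Header1', 'Header2', 'Header3']
```

Forgetting to strip the leading newline is an easy mistake here: `FAKECSV.split("\n")[0]` is an empty string, so no columns would be discovered.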

Here is another example, from the Parquet tap. This sample uses a custom
`get_jsonschema_type()` function to translate each column's Parquet type into a JSON Schema type.

```python
from typing import List

import pyarrow.parquet as pq
from singer_sdk import Stream
from singer_sdk.typing import PropertiesList, Property


class ParquetStream(Stream):
    """Stream class for Parquet streams."""

    # ...

    @property
    def schema(self) -> dict:
        """Dynamically detect the JSON Schema for the stream.

        This is evaluated before any records are retrieved.
        """
        properties: List[Property] = []
        # Get a schema object using the parquet and pyarrow libraries
        parquet_schema = pq.ParquetFile(self.filepath).schema_arrow

        # Loop through each column name and type in the schema object
        for name, arrow_type in zip(parquet_schema.names, parquet_schema.types):
            # Translate from the Parquet type to a JSON Schema type.
            # get_jsonschema_type() is a custom helper defined in the tap.
            dtype = get_jsonschema_type(str(arrow_type))

            # Add the new property to our list
            properties.append(Property(name, dtype))

        # Return the list as a JSON Schema dictionary object
        return PropertiesList(*properties).to_dict()
```
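
The sample does not show `get_jsonschema_type()` itself. One possible shape is sketched below. It is hypothetical, not the Parquet tap's actual code: it covers only a few common Arrow type names (as rendered by `str()` on a pyarrow type) and returns plain JSON Schema dictionaries instead of the SDK type helpers that `Property` expects, so it would need adapting before use in the stream class.

```python
from typing import Any, Dict

# Hypothetical mapping from Arrow type names, as produced by str(arrow_type),
# to JSON Schema base types. Only a few common types are covered.
_ARROW_BASE_TYPES: Dict[str, str] = {
    "string": "string",
    "large_string": "string",
    "bool": "boolean",
    "float": "number",   # str(pyarrow.float32())
    "double": "number",  # str(pyarrow.float64())
}
# Signed and unsigned integers: int8 ... uint64
for _bits in (8, 16, 32, 64):
    _ARROW_BASE_TYPES[f"int{_bits}"] = "integer"
    _ARROW_BASE_TYPES[f"uint{_bits}"] = "integer"


def get_jsonschema_type(arrow_type: str) -> Dict[str, Any]:
    """Translate an Arrow type name into a nullable JSON Schema fragment."""
    if arrow_type in _ARROW_BASE_TYPES:
        return {"type": [_ARROW_BASE_TYPES[arrow_type], "null"]}
    # Timestamps render like "timestamp[ms]" or "timestamp[us, tz=UTC]".
    if arrow_type.startswith("timestamp"):
        return {"type": ["string", "null"], "format": "date-time"}
    # Dates render like "date32[day]" or "date64[ms]".
    if arrow_type.startswith("date"):
        return {"type": ["string", "null"], "format": "date"}
    # Fall back to a nullable string for anything unrecognized.
    return {"type": ["string", "null"]}


print(get_jsonschema_type("int64"))  # {'type': ['integer', 'null']}
```

Falling back to a nullable string keeps discovery from failing on unfamiliar column types, at the cost of losing type precision for those columns.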

### Initialize a collection of tap streams with differing types

```python
class TapCountries(Tap):
    # ...
    def discover_streams(self) -> List[Stream]:
        """Return a list containing one each of the two stream types."""
        return [
            CountriesStream(tap=self),
            ContinentsStream(tap=self),
        ]
```

Or equivalently:

```python
# Declare list of types here at the top of the file
STREAM_TYPES = [
    CountriesStream,
    ContinentsStream,
]


class TapCountries(Tap):
    # ...
    def discover_streams(self) -> List[Stream]:
        """Return a list with one each of all defined stream types."""
        return [
            stream_type(tap=self)
            for stream_type in STREAM_TYPES
        ]
```

### Run the standard built-in tap tests

```python
# Import the tests
from singer_sdk.testing import get_standard_tap_tests

# Import our tap class
from tap_parquet.tap import TapParquet

SAMPLE_CONFIG = {
    # ...
}


def test_sdk_standard_tap_tests():
    """Run the built-in tap tests from the SDK."""
    tests = get_standard_tap_tests(TapParquet, config=SAMPLE_CONFIG)
    for test in tests:
        test()
```

# Singer SDK Implementation Details

This section documents certain behaviors and expectations of the Singer SDK framework.

1. [CLI](./cli.md)
2. [Discovery](./discovery.md)
3. [Metadata](./discovery.md)
4. [Metrics](./discovery.md)
5. [State](./state.md)

## How to use the implementation reference material

_**Note:** You do not need to master every detail here to build your tap; the behaviors
described in this section should be automatic or intuitive. For general guidance on tap
development, refer to our [Dev Guide](../dev_guide.md) instead._

The specifications in this section are documented primarily to support advanced use cases,
behavior overrides, backwards compatibility with legacy taps, debugging of unexpected behaviors,
and contributions back to the SDK itself.

# [Singer SDK Implementation Details](./README.md) - Catalog Discovery

All taps developed with the SDK automatically support `discovery` as a base capability:
the process of generating and emitting a catalog that describes the available streams
and stream types.

The generated catalog is populated automatically from a small number of developer inputs,
most importantly:

- `Tap.discover_streams()` - Returns the list of available ("discovered") streams.
- `Stream.schema` or `Stream.schema_filepath` - The JSON Schema definition of each stream,
  provided either directly as a Python `dict` or indirectly as a path to a `.json` file.
- `Stream.primary_keys` - A list of strings naming the primary key(s) of the stream.
- `Stream.replication_key` - A single string naming the stream's replication
  key (if applicable).
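
To make the mapping from these inputs to the catalog concrete, here is a simplified, plain-Python sketch that assembles one Singer catalog entry. The helper `build_catalog_entry` is hypothetical and not part of the SDK. It fills in only the core entry fields and the standard `table-key-properties`, `valid-replication-keys`, and `inclusion` metadata keys from the Singer spec; the catalog the SDK actually emits may contain additional fields. The `continents` schema below is an assumed example modeled on the Code Samples page.

```python
import json
from typing import Any, Dict, List, Optional


def build_catalog_entry(
    stream_name: str,
    schema: Dict[str, Any],
    primary_keys: List[str],
    replication_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a simplified Singer catalog entry (hypothetical helper)."""
    # Stream-level metadata lives at the empty breadcrumb.
    stream_metadata: Dict[str, Any] = {"table-key-properties": primary_keys}
    if replication_key:
        stream_metadata["valid-replication-keys"] = [replication_key]
    metadata = [{"breadcrumb": [], "metadata": stream_metadata}]

    # Key fields are always emitted; other fields are available for selection.
    automatic = set(primary_keys) | ({replication_key} if replication_key else set())
    for prop in schema.get("properties", {}):
        inclusion = "automatic" if prop in automatic else "available"
        metadata.append(
            {"breadcrumb": ["properties", prop], "metadata": {"inclusion": inclusion}}
        )

    entry: Dict[str, Any] = {
        "tap_stream_id": stream_name,
        "stream": stream_name,
        "schema": schema,
        "key_properties": primary_keys,
        "metadata": metadata,
    }
    if replication_key:
        entry["replication_key"] = replication_key
    return entry


# Assumed schema for a `continents` stream with primary key `code`.
continents_schema = {
    "type": "object",
    "properties": {
        "code": {"type": ["string", "null"]},
        "name": {"type": ["string", "null"]},
    },
}
catalog = {"streams": [build_catalog_entry("continents", continents_schema, ["code"])]}
print(json.dumps(catalog, indent=2))
```

Marking key and replication fields as `automatic` reflects the Singer convention that those fields are always emitted, while ordinary fields remain `available` for the user to select or deselect.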

## See Also

- See the [Dev Guide](../dev_guide.md) and [Code Samples](../code_samples.md) for more
  information on working with dynamic stream schemas.
- [Singer Spec: Discovery (meltano.com)](https://meltano.com/docs/singer-spec.html#discovery-mode)
- [Singer Spec: Discovery (singer-io)](https://github.com/singer-io/getting-started/blob/master/docs/DISCOVERY_MODE.md)