44 changes: 27 additions & 17 deletions doc/changes/unreleased.md
@@ -1,26 +1,32 @@
# Unreleased

This release comes with breaking changes in package `exasol.analytics.schema`:
* Classes `ColumnType` and `ColumnBuilder` are removed
* Class `Column` is changed significantly
* Subclasses of `Column` have been added for specific column types:
* `BooleanColumn`
* `CharColumn`
* `DateColumn`
* `DecimalColumn`
* `DoublePrecisionColumn`
* `GeometryColumn`
* `HashTypeColumn`
* `TimeStampColumn`
* `VarCharColumn`
* Class `ColumnBuilder` has been removed
* Class `ColumnType` is changed significantly
* Subclasses of `ColumnType` have been added for specific column types:
* `BooleanType`
* `CharType`
* `DateType`
* `DecimalType`
* `DoublePrecisionType`
* `GeometryType`
* `HashTypeType`
* `TimeStampType`
* `VarCharType`
* Additional classes have been added for specific attributes of some of the column types:
* `CharSet`
* `HashSizeUnit`
* Convenience functions for creating instances of `Column` have been replaced by class method `simple()` of the resp. subclasses of `Column`:
* `decimal_column()`
* `varchar_column()`
* `hashtype_column()`

The available convenience functions for creating instances of `Column` specifying the name as a simple string have been completed:
* `boolean_column`
* `char_column`
* `date_column`
* `decimal_column`
* `double_column`
* `geometry_column`
* `hashtype_column`
* `timestamp_column`
* `varchar_column`

Please see the [User Guide](http://github.com/exasol/advanced-analytics-framework/blob/main/doc/user_guide/database_objects.md) about creating and using instances of `Column` starting with this release.

@@ -29,10 +35,14 @@ Please see the [User Guide](http://github.com/exasol/advanced-analytics-framewor
* #283: Updated description and README
* #290: Added user guide for database objects in module `exasol.analytics.schema`

## Refactoring
## Refactorings

* #286: Updated exasol-toolbox to 1.0.1
* #240: Enhanced `schema.column_type.ColumnType`
* #292: Added check for range of attribute `size` to `HashTypeColumn`
* #294: Refactored class `Column` once again, see user guide
* Re-introduced attribute `type`
* Re-added class `ColumnType`

## Internal

81 changes: 53 additions & 28 deletions doc/user_guide/database_objects.md
@@ -19,58 +19,83 @@ Interfaces and implementation classes for names of specific database objects:
| `UDFName` | `UDFNameImpl` |
| `ViewName` | `ViewNameImpl` |

## Class Column
## Class `Column`

### Subclasses for Each of the SQL Column Types
Class `Column` has attributes
* `name` of type `ColumnName`
* `type` of type `ColumnType`

There is a subclass with specific attributes for each of the SQL types:
Additionally there are convenience methods for
* Rendering the column for a `CREATE TABLE` statement
* Creating a new instance by parsing its SQL specification

### Subclasses of `ColumnType` for Each of the SQL Column Types

| Class | Attributes, value range, default value |
|-------------------------|------------------------------------------------------------------------|
| `BooleanColumn` | - |
| `CharColumn` | `size` (0-2000, default: 1), `charset`: CharSet, default: CharSet.UTF8 |
| `DateColumn` | - |
| `DecimalColumn` | `precision` (0-37, default: 18), `scale` (0-37, default: 0) |
| `DoublePrecisionColumn` | - |
| `GeometryColumn` | `srid`, (default: 0) |
| `HashTypeColumn` | `size` (default: 16), `unit` (`HashSizeUnit.BYTE` or `HashSizeUnit.BIT`, default: `HashSizeUnit.BYTE`) |
| `TimeStampColumn` | `precision` (1-9, default: 3), `local_time_zone` (`True` or `False`, default: False) |
| `VarCharColumn` | `size` (0-2000000), `charset`: CharSet (`CharSet.UTF8` or `CharSet.ASCII`, default: `CharSet.UTF8`) |
There is a subclass with specific attributes for each of the SQL types:

### Instantiate a Column Object
| Class | Attributes, value range, default value |
|-----------------------|------------------------------------------------------------------------|
| `BooleanType` | - |
| `CharType` | `size` (0-2000, default: 1), `charset`: CharSet, default: CharSet.UTF8 |
| `DateType` | - |
| `DecimalType` | `precision` (0-37, default: 18), `scale` (0-37, default: 0) |
| `DoublePrecisionType` | - |
| `GeometryType` | `srid`, (default: 0) |
| `HashTypeType` | `unit` (`HashSizeUnit.BYTE` or `HashSizeUnit.BIT`, default: `HashSizeUnit.BYTE`), `size` (value range: see below, default: 16) |
| `TimeStampType` | `precision` (1-9, default: 3), `local_time_zone` (`True` or `False`, default: False) |
| `VarCharType` | `size` (0-2000000), `charset`: CharSet (`CharSet.UTF8` or `CharSet.ASCII`, default: `CharSet.UTF8`) |

For `HashTypeType` the value range of attribute `size` depends on the
value of `unit`:
* If `unit == HashSizeUnit.BYTE`, then `size` must be in range 1-1024
* If `unit == HashSizeUnit.BIT`, then `size` must be in range 8-8192 and must be a multiple of 8
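
The range rule above can be sketched as a small standalone check. This is only an illustration of the documented rule, not the library's implementation: the `HashSizeUnit` enum is redefined locally as a stand-in, and the helper names `check_hash_size` and `is_valid_hash_size` are hypothetical.

```python
from enum import Enum


class HashSizeUnit(Enum):
    """Local stand-in for the library's HashSizeUnit (assumption)."""

    BYTE = "BYTE"
    BIT = "BIT"


def check_hash_size(size: int, unit: HashSizeUnit = HashSizeUnit.BYTE) -> None:
    """Raise ValueError if `size` violates the range documented for `unit`.

    Hypothetical helper: the name and the error messages are not taken
    from the library.
    """
    if unit is HashSizeUnit.BYTE:
        if not 1 <= size <= 1024:
            raise ValueError(f"size must be in range 1-1024 for BYTE, got {size}")
    else:
        if not 8 <= size <= 8192:
            raise ValueError(f"size must be in range 8-8192 for BIT, got {size}")
        if size % 8 != 0:
            raise ValueError(f"size must be a multiple of 8 for BIT, got {size}")


def is_valid_hash_size(size: int, unit: HashSizeUnit = HashSizeUnit.BYTE) -> bool:
    """Return True if `size` is allowed for `unit`, False otherwise."""
    try:
        check_hash_size(size, unit)
    except ValueError:
        return False
    return True
```

With these definitions, `is_valid_hash_size(16)` accepts the default of 16 bytes, while `is_valid_hash_size(12, HashSizeUnit.BIT)` is rejected because 12 is not a multiple of 8.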

### Instantiating a `Column` Object

You can instantiate a column:

```python
col = DecimalColumn(ColumnName("D"), precision=10, scale=1)
col = Column(ColumnName("D"), DecimalType(precision=10, scale=1))
```

For convenience, there is also a class method simple(), which accepts a simple string for the column name:
Additionally there are convenience methods accepting a simple string for the column name:

```python
col = DecimalColumn.simple("D", precision=10, scale=1)
col_1 = boolean_column("B")
col_2 = char_column("C", size=10, charset=CharSet.ASCII)
col_3 = date_column("A")
col_4 = decimal_column("D", precision=2, scale=1)
col_5 = double_column("P")
col_6 = geometry_column("G", srid=1)
col_7 = hashtype_column("H", size=8, unit=HashSizeUnit.BIT)
col_8 = timestamp_column("T", precision=6)
col_9 = varchar_column("V", size=2000000)
```

### Render for a `CREATE TABLE` Statement
### Rendering a Column for a `CREATE TABLE` SQL Statement

Each column can be rendered for creating a `CREATE TABLE` statement:
```python
DecimalColumn.simple("D", precision=10, scale=1).for_create
>>> DECIMAL "D"(18,1)
decimal_column("D", precision=10, scale=1).for_create
>>> "D" DECIMAL(10,1)
```
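
To picture how a column composed of a name and a type might render itself, here is a self-contained sketch. It does not use the library: `DecimalTypeSketch`, `ColumnSketch`, `decimal_column_sketch`, and the `rendered` property are hypothetical names, and the output format (quoted name followed by the SQL type with its parameters) is assumed from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecimalTypeSketch:
    """Hypothetical stand-in for a decimal column type."""

    precision: int = 18
    scale: int = 0

    @property
    def rendered(self) -> str:
        # SQL type with its parameters, e.g. DECIMAL(10,1)
        return f"DECIMAL({self.precision},{self.scale})"


@dataclass(frozen=True)
class ColumnSketch:
    """Hypothetical stand-in for a column: a name plus a type."""

    name: str
    type: DecimalTypeSketch

    @property
    def for_create(self) -> str:
        # Quoted column name followed by the rendered type
        return f'"{self.name}" {self.type.rendered}'


def decimal_column_sketch(name: str, precision: int = 18, scale: int = 0) -> ColumnSketch:
    """Convenience function taking the column name as a plain string."""
    return ColumnSketch(name, DecimalTypeSketch(precision, scale))


print(decimal_column_sketch("D", precision=10, scale=1).for_create)
```

Keeping the type in its own object means the column only has to prepend its quoted name, which is the separation the `name` and `type` attributes suggest.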

### Parse from SQL Specification
### Parsing a Column from its SQL Specification

A column including its name and type can be parsed from its SQL specification:

Each column can be parsed from its SQL specification:
```python
char_column = Column.from_sql_spec("C", "CHAR(1) UTF8")
```

This conversion supports aliases:
* `INTEGER`, `DECIMAL` for `DecimalColumn`
* `DOUBLE PRECISION`, `DOUBLE`, `FLOAT` for `DoublePrecisionColumn`
This conversion supports aliases for some column types:
* `INTEGER`, `DECIMAL` for `DecimalType`
* `DOUBLE PRECISION`, `DOUBLE`, `FLOAT` for `DoublePrecisionType`

### Parsing a Column from Pyexasol Column Metadata

### Parse from Pyexasol Column Metadata
A column can also be parsed from [Pyexasol](https://github.com/exasol/pyexasol) column metadata:

Each column can be parsed from [Pyexasol](https://github.com/exasol/pyexasol) column metadata:
```python
timestamp_column = Column.from_pyexasol("A", {"type": "TIMESTAMP", "withLocalTimeZone": True})
```
8 changes: 4 additions & 4 deletions doc/user_guide/example-udf-script/create.sql
@@ -12,8 +12,8 @@ from exasol.analytics.query_handler.query.select import SelectQuery, SelectQuery
from exasol.analytics.query_handler.context.proxy.bucketfs_location_proxy import BucketFSLocationProxy
from exasol.analytics.schema import (
Column,
DecimalColumn,
VarCharColumn,
decimal_column,
varchar_column,
)
from datetime import datetime
from exasol.bucketfs import as_string
@@ -56,8 +56,8 @@ class ExampleQueryHandler(UDFQueryHandler):
query_handler_return_query = SelectQueryWithColumnDefinition(
query_string=table_query_string('SELECT "c1", "c2" from {table_name}'),
output_columns=[
VarCharColumn.simple("c1", size=100),
DecimalColumn.simple("c2"),
varchar_column("c1", size=100),
decimal_column("c2"),
])
return Continue(
query_list=query_list,
4 changes: 2 additions & 2 deletions exasol/analytics/audit/audit_query_handler.py
@@ -32,7 +32,7 @@
)
from exasol.analytics.schema import (
Column,
DecimalColumn,
decimal_column,
)

LOG = logging.getLogger(__name__)
@@ -142,7 +142,7 @@ def _final_continue(self, audit_query: AuditQuery) -> Continue:
"""
input_query = SelectQueryWithColumnDefinition(
query_string="SELECT (CAST 1 as DECIMAL(1,0)) as DUMMY_COLUMN",
output_columns=[DecimalColumn.simple("DUMMY_COLUMN", precision=1, scale=0)],
output_columns=[decimal_column("DUMMY_COLUMN", precision=1, scale=0)],
)
return Continue(
query_list=self._augmented([audit_query]),
34 changes: 17 additions & 17 deletions exasol/analytics/audit/columns.py
@@ -1,42 +1,42 @@
from exasol.analytics.schema import (
DecimalColumn,
HashSizeUnit,
HashTypeColumn,
TimeStampColumn,
VarCharColumn,
decimal_column,
hashtype_column,
timestamp_column,
varchar_column,
)


class BaseAuditColumns:
LOG_TIMESTAMP = TimeStampColumn.simple("LOG_TIMESTAMP", precision=3)
SESSION_ID = DecimalColumn.simple("SESSION_ID", precision=20)
LOG_TIMESTAMP = timestamp_column("LOG_TIMESTAMP", precision=3)
SESSION_ID = decimal_column("SESSION_ID", precision=20)
# RUN_ID must be obtained initially and remain unchanged during lifetime
# of AuditLogger. AuditLogger sets it from uuid.uuid4().
RUN_ID = HashTypeColumn.simple("RUN_ID", size=16, unit=HashSizeUnit.BYTE)
ROW_COUNT = DecimalColumn.simple("ROW_COUNT", precision=36)
RUN_ID = hashtype_column("RUN_ID", size=16, unit=HashSizeUnit.BYTE)
ROW_COUNT = decimal_column("ROW_COUNT", precision=36)
# LOG_SPAN_NAME and LOG_SPAN_ID need to be generated and provided by the
# creator of the AuditQuery, i.e. lower level query_handlers.

# For ModifyQuery LOG_SPAN_NAME will be set to the Operation Type, e.g.
# CREATE_TABLE, CREATE_TABLE, INSERT. For other queries it can be a custom
# string indicating a specific execution phase.
LOG_SPAN_NAME = VarCharColumn.simple("LOG_SPAN_NAME", size=2000000)
LOG_SPAN_NAME = varchar_column("LOG_SPAN_NAME", size=2000000)
# SPAN IDs are UUIDs with 128 bit = 32 hex digits > 38 decimal digits
LOG_SPAN_ID = HashTypeColumn.simple("LOG_SPAN_ID", size=16, unit=HashSizeUnit.BYTE)
PARENT_LOG_SPAN_ID = HashTypeColumn.simple(
LOG_SPAN_ID = hashtype_column("LOG_SPAN_ID", size=16, unit=HashSizeUnit.BYTE)
PARENT_LOG_SPAN_ID = hashtype_column(
"PARENT_LOG_SPAN_ID", size=16, unit=HashSizeUnit.BYTE
)
# For ModifyQuery EVENT_NAME will be either "Begin" or "End". For other
# queries this can be a custom string, e.g. "ERROR", "COMMIT", ...
EVENT_NAME = VarCharColumn.simple("EVENT_NAME", size=128)
EVENT_NAME = varchar_column("EVENT_NAME", size=128)
# This will contain the string representation of a json document.
EVENT_ATTRIBUTES = VarCharColumn.simple("EVENT_ATTRIBUTES", size=2000000)
DB_OBJECT_TYPE = VarCharColumn.simple("DB_OBJECT_TYPE", size=128)
EVENT_ATTRIBUTES = varchar_column("EVENT_ATTRIBUTES", size=2000000)
DB_OBJECT_TYPE = varchar_column("DB_OBJECT_TYPE", size=128)
# Optional, can be NULL:
DB_OBJECT_SCHEMA = VarCharColumn.simple("DB_OBJECT_SCHEMA", size=128)
DB_OBJECT_SCHEMA = varchar_column("DB_OBJECT_SCHEMA", size=128)
# Contains the schema name for operations CREATE/DROP SCHEMA:
DB_OBJECT_NAME = VarCharColumn.simple("DB_OBJECT_NAME", size=128)
ERROR_MESSAGE = VarCharColumn.simple("ERROR_MESSAGE", size=200)
DB_OBJECT_NAME = varchar_column("DB_OBJECT_NAME", size=128)
ERROR_MESSAGE = varchar_column("ERROR_MESSAGE", size=200)

all = [
LOG_TIMESTAMP,
Expand Down
36 changes: 24 additions & 12 deletions exasol/analytics/schema/__init__.py
Original file line number Diff line number Diff line change
@@ -1,20 +1,32 @@
from exasol.analytics.schema.column import (
BooleanColumn,
CharColumn,
Column,
DateColumn,
DecimalColumn,
DoublePrecisionColumn,
GeometryColumn,
HashSizeUnit,
HashTypeColumn,
TimeStampColumn,
UnsupportedSqlType,
VarCharColumn,
boolean_column,
char_column,
date_column,
decimal_column,
double_column,
geometry_column,
hashtype_column,
timestamp_column,
varchar_column,
)
from exasol.analytics.schema.column_name import ColumnName
from exasol.analytics.schema.column_name_builder import ColumnNameBuilder
from exasol.analytics.schema.column_types import CharSet
from exasol.analytics.schema.column_type import (
BooleanType,
CharType,
ColumnType,
DateType,
DecimalType,
DoublePrecisionType,
GeometryType,
HashSizeUnit,
HashTypeType,
TimeStampType,
UnsupportedSqlType,
VarCharType,
)
from exasol.analytics.schema.column_type_utils import CharSet
from exasol.analytics.schema.connection_object_name import ConnectionObjectName
from exasol.analytics.schema.connection_object_name_builder import (
ConnectionObjectNameBuilder,