Sqlalchemy #240

Merged 118 commits on Feb 21, 2022
fdbfc84
Implement base SQLAlchemy raster driver
nickeopti Dec 2, 2021
b6602fa
Add String size on table creation
nickeopti Dec 2, 2021
4465d0e
Use readonly properties for db_version and key_names
nickeopti Dec 2, 2021
abf5e85
Adhere flake8 and mypy, and introduce DB_SCHEME
nickeopti Dec 2, 2021
50ed8e4
Use RelationalDriver
nickeopti Dec 2, 2021
7037793
Use default ParseResult structure instead of specific MySQLCredential…
nickeopti Dec 2, 2021
b657bd3
Remove leftover print statements
nickeopti Dec 2, 2021
35a5ce8
Let functions optionally use unverified databases when connecting
nickeopti Dec 2, 2021
fc5504c
Fix column creation error and create tables on unverified database co…
nickeopti Dec 2, 2021
cba5f5b
Use indices on the keys in the DB to recreate correct order
nickeopti Dec 3, 2021
50608a5
index on sqlalchemy: cba5f5b Use indices on the keys in the DB to rec…
nickeopti Dec 7, 2021
172cabf
WIP on sqlalchemy: cba5f5b Use indices on the keys in the DB to recre…
nickeopti Dec 7, 2021
b0116cb
Align handling of errors with previous version
nickeopti Dec 7, 2021
627bc30
Check for missing scheme instead of missing hostname
nickeopti Dec 14, 2021
ac1cbbd
Tidy mysql driver code up a bit
nickeopti Dec 14, 2021
658d7cf
Remove leftover debugging stuff
nickeopti Dec 14, 2021
0d71226
Add sqla InvalidRequestError to list of exceptions to convert (and ti…
nickeopti Dec 14, 2021
63b2c70
Refactor sqlite driver to use the common RelationalDriver
nickeopti Dec 14, 2021
a5d9a1b
Ugly hack to make sqlite remote driver work with new and old structures
nickeopti Dec 17, 2021
e21fdee
Require SQLAlchemy
nickeopti Dec 17, 2021
e239d09
Cleanup driver code and adhere flake8, mypy
nickeopti Dec 21, 2021
d26ebc7
Rename common to relational_base
nickeopti Dec 21, 2021
f51175b
Add and use ._local_path field on RemoteSQLiteDriver instead of inher…
nickeopti Dec 21, 2021
6d435bc
Allow Path input and clean code
nickeopti Dec 21, 2021
2b14c83
Let drivers handle path resolving internally
nickeopti Dec 21, 2021
48f7d25
Add test for parsing of invalid schemes
nickeopti Dec 21, 2021
98ebb7d
Only test invalid scheme for mysql
nickeopti Dec 21, 2021
f5bc5f9
Handle paths such that they hopefully work on Windows as well
nickeopti Dec 21, 2021
0add2dc
Merge branch 'main' into sqlalchemy
nickeopti Dec 21, 2021
ddda1bc
Cache drivers within each process only
nickeopti Jan 4, 2022
faf8f64
Set connection transaction isolation level to READ UNCOMMITTED to ena…
nickeopti Jan 4, 2022
4a4657c
Satisfy mypy
nickeopti Jan 4, 2022
2a2c57f
Cleanup code
nickeopti Jan 4, 2022
0aa63c4
Fix failing url_parse test
nickeopti Jan 7, 2022
3a8144d
Use docstring instead of comments
nickeopti Jan 7, 2022
d71d2d6
Describe the verify argument in connect method
nickeopti Jan 7, 2022
8b95b20
Rename SQL_DATABASE_SCHEME to SQL_URL_SCHEME
nickeopti Jan 7, 2022
38d7880
Don't echo anymore
nickeopti Jan 7, 2022
5078bc3
Describe assertion check better
nickeopti Jan 7, 2022
7f53124
Merge branch 'sqlalchemy' of https://github.com/DHI-GRAS/terracotta i…
nickeopti Jan 7, 2022
c4bb17c
Restore mysql primary key size calculation
nickeopti Jan 7, 2022
f391f86
Describe the max primary key length
nickeopti Jan 7, 2022
bcd9a52
Improve exception convertions
nickeopti Jan 7, 2022
0d104c4
Use cleaner exception handling in connect()
nickeopti Jan 7, 2022
813ebc7
Undo renaming of _connection_callback
nickeopti Jan 7, 2022
1b6a7f5
fix mypy error
dionhaefner Jan 7, 2022
25c594b
Reimplement lazy loading max shape
nickeopti Jan 11, 2022
d5cf935
Merge branch 'sqlalchemy' of https://github.com/DHI-GRAS/terracotta i…
nickeopti Jan 11, 2022
e850659
Don't echo anywhere
nickeopti Jan 11, 2022
2f7eff2
Explicitly define mysql tables with specific charset
nickeopti Jan 11, 2022
a634b39
Just use Pathlib resolve method
nickeopti Jan 11, 2022
229bcc4
Move _METADATA_COLUMNS out as a class variable again
nickeopti Jan 11, 2022
36e9e8e
Improve parsing of paths
nickeopti Jan 11, 2022
421acb3
Set mysql driver encoding to uft8mb4 as well
nickeopti Jan 11, 2022
f455edd
Remove now unused code
nickeopti Jan 13, 2022
da2bc5b
update to most recent COG validate script
dionhaefner Jan 17, 2022
c1b7bb0
Split Driver
nickeopti Jan 31, 2022
d587759
Create TerracottaDriver
nickeopti Jan 31, 2022
959ecbf
Move functionality up into Driver
nickeopti Jan 31, 2022
e4fff5d
Split Driver
nickeopti Jan 31, 2022
c936905
Rename accordingly to Driver refactor
nickeopti Jan 31, 2022
230d081
Update tests according to Driver refactor
nickeopti Jan 31, 2022
8447a12
Merge branch 'sqlalchemy' into sqlalchemy-compositional
nickeopti Jan 31, 2022
19b84dc
Remove leftover debugging prints
nickeopti Jan 31, 2022
5161aec
move most logic from raster driver to raster.py module
dionhaefner Jan 31, 2022
9477bf7
go straight to :walrus: jail
dionhaefner Jan 31, 2022
56256ac
... and to py3.6 jail
dionhaefner Jan 31, 2022
afd865a
Add test for key standardization
nickeopti Jan 31, 2022
ac19a83
Merge branch 'sqlalchemy-compositional' of https://github.com/DHI-GRA…
nickeopti Jan 31, 2022
6e5b95a
Test raster retrieval with all resampling methods
nickeopti Jan 31, 2022
37cb0d0
Add test for raster.get_raster_tile
nickeopti Jan 31, 2022
76185e3
Test unknown resampling method
nickeopti Jan 31, 2022
0d1096f
Test raster.get_metadata with large_raster_threshold exceeded
nickeopti Jan 31, 2022
551ae7c
bump coverage
dionhaefner Feb 1, 2022
a2ab041
resolve merge conflicts
dionhaefner Feb 1, 2022
8d7ad06
replace type ignore with assertion
dionhaefner Feb 1, 2022
33373e1
:lipstick:
dionhaefner Feb 1, 2022
36eaf1d
Rename driver files and make key standardization a method
nickeopti Feb 1, 2022
046720d
Remember the new/renamed files!
nickeopti Feb 1, 2022
bd45e00
Use underscores in meta_store and raster_store
nickeopti Feb 1, 2022
d5d1b09
Also standardize the where/keys for get_datasets()
nickeopti Feb 1, 2022
3fdd90a
Rename to squeeze
nickeopti Feb 1, 2022
84a5219
Improve repr
nickeopti Feb 1, 2022
1086d52
Rename to GeoTiffRasterStore
nickeopti Feb 1, 2022
65cd29a
Rename to RelationalMetaStore
nickeopti Feb 1, 2022
fed5a66
Don't use too implicit hacks
nickeopti Feb 1, 2022
ab6449f
Update test to new repr
nickeopti Feb 1, 2022
9ad93b4
Merge branch 'sqlalchemy-compositional' of https://github.com/DHI-GRA…
nickeopti Feb 1, 2022
4ab4bdd
Rename filepath to handle
nickeopti Feb 1, 2022
f89052e
Don't print anything
nickeopti Feb 1, 2022
ca15c4a
Rename *_stores
nickeopti Feb 1, 2022
401728a
Re-rename keys to where
nickeopti Feb 1, 2022
62be08d
Check for missing dataset in get_metadata, not in squeeze
nickeopti Feb 4, 2022
236f677
Define keystype explicitly
nickeopti Feb 4, 2022
c8d93ee
Make keys standardization type check
nickeopti Feb 4, 2022
f523ebe
Improve descriptiveness of metadata reload comment
nickeopti Feb 4, 2022
0cce1b7
Re-rename handle to path
nickeopti Feb 4, 2022
8aad626
update docstrings
mrpgraae Feb 20, 2022
0c1c94c
pin pytest<7.0
mrpgraae Feb 20, 2022
06a6d1a
do not assemble rio env in driver
dionhaefner Feb 21, 2022
da9f20f
Update filename in module docstring
mrpgraae Feb 21, 2022
891185a
docstring polish :memo:
mrpgraae Feb 21, 2022
dc835a6
Improve reprs and satisfy flake8
nickeopti Feb 21, 2022
7e75ff4
Improve normalised path from sqlite metastores and update relevant docs
nickeopti Feb 21, 2022
69b7876
Update filenames in first line of files to reflect their actual filen…
nickeopti Feb 21, 2022
06d6d49
Always stringify url_or_path
nickeopti Feb 21, 2022
722e7df
Rename *Driver classes to *MetaStore
nickeopti Feb 21, 2022
41a26e7
Remove references to rasters in meta stores's documentation
nickeopti Feb 21, 2022
3b65c8f
Simplify docstrings in internal base_classes.py
nickeopti Feb 21, 2022
5a9969e
Fix bug (on Windows paths) in sqlite metastore _normalize_path
nickeopti Feb 21, 2022
d8b1ea2
Specify arguments to MetaStore.insert
nickeopti Feb 21, 2022
cd0fe1d
Specify path in meta stores to be of type str
nickeopti Feb 21, 2022
1556dce
Use SQLAlchemy dialect+driver terminology
nickeopti Feb 21, 2022
b2ebcba
fix API docs
dionhaefner Feb 21, 2022
a8ee100
Merge pull request #248 from DHI-GRAS/sqlalchemy-compositional
dionhaefner Feb 21, 2022
bfa90a8
merge
dionhaefner Feb 21, 2022
007fa95
Merge branch 'main' into sqlalchemy
dionhaefner Feb 21, 2022
2f7f9bb
unpin pytest
dionhaefner Feb 21, 2022
3 changes: 1 addition & 2 deletions .github/workflows/test.yml
@@ -72,8 +72,7 @@ jobs:

       - name: Initialize mypy
         run: |
-          mypy . > /dev/null || true
-          mypy --install-types --non-interactive
+          mypy --install-types --non-interactive . || true

       - name: Run tests
         run: |
39 changes: 18 additions & 21 deletions docs/api.rst
@@ -15,30 +15,27 @@ Get a driver instance

.. autofunction:: terracotta.get_driver

SQLite driver
-------------
TerracottaDriver
----------------

.. autoclass:: terracotta.drivers.sqlite.SQLiteDriver
.. autoclass:: terracotta.drivers.TerracottaDriver
:members:
:undoc-members:
:special-members: __init__
:inherited-members:

Remote SQLite driver
--------------------

.. autoclass:: terracotta.drivers.sqlite_remote.RemoteSQLiteDriver
:members:
:undoc-members:
:special-members: __init__
:inherited-members:
:exclude-members: delete, insert, create
Supported metadata stores
-------------------------

MySQL driver
------------
SQLite metadata store
+++++++++++++++++++++

.. autoclass:: terracotta.drivers.mysql.MySQLDriver
:members:
:undoc-members:
:special-members: __init__
:inherited-members:
.. autoclass:: terracotta.drivers.sqlite_meta_store.SQLiteMetaStore

Remote SQLite metadata store
++++++++++++++++++++++++++++

.. autoclass:: terracotta.drivers.sqlite_remote_meta_store.RemoteSQLiteMetaStore

MySQL metadata store
++++++++++++++++++++

.. autoclass:: terracotta.drivers.mysql_meta_store.MySQLMetaStore
2 changes: 2 additions & 0 deletions setup.py
@@ -35,6 +35,7 @@
         'Programming Language :: Python :: 3.7',
         'Programming Language :: Python :: 3.8',
         'Programming Language :: Python :: 3.9',
+        'Programming Language :: Python :: 3.10',
         'Framework :: Flask',
         'Operating System :: Microsoft :: Windows :: Windows 10',
         'Operating System :: MacOS :: MacOS X',
@@ -72,6 +73,7 @@
         'shapely',
         'rasterio>=1.0',
         'shapely',
+        'sqlalchemy',
         'toml',
         'tqdm'
     ],
37 changes: 23 additions & 14 deletions terracotta/cog.py
@@ -25,7 +25,7 @@ def validate(src_path: str, strict: bool = True) -> bool:
 def check_raster_file(src_path: str) -> ValidationInfo:  # pragma: no cover
     """
     Implementation from
-    https://github.com/cogeotiff/rio-cogeo/blob/0f00a6ee1eff602014fbc88178a069bd9f4a10da/rio_cogeo/cogeo.py
+    https://github.com/cogeotiff/rio-cogeo/blob/a07d914e2d898878417638bbc089179f01eb5b28/rio_cogeo/cogeo.py#L385

     This function is the rasterio equivalent of
     https://svn.osgeo.org/gdal/trunk/gdal/swig/python/samples/validate_cloud_optimized_geotiff.py
@@ -44,15 +44,13 @@ def check_raster_file(src_path: str) -> ValidationInfo:  # pragma: no cover
                 errors.append('The file is not a GeoTIFF')
                 return errors, warnings, details

-            filelist = [os.path.basename(f) for f in src.files]
-            src_bname = os.path.basename(src_path)
-            if len(filelist) > 1 and src_bname + '.ovr' in filelist:
+            if any(os.path.splitext(x)[-1] == '.ovr' for x in src.files):
                 errors.append(
                     'Overviews found in external .ovr file. They should be internal'
                 )

             overviews = src.overviews(1)
-            if src.width >= 512 or src.height >= 512:
+            if src.width > 512 and src.height > 512:
                 if not src.is_tiled:
                     errors.append(
                         'The file is greater than 512xH or 512xW, but is not tiled'
@@ -65,16 +63,28 @@ def check_raster_file(src_path: str) -> ValidationInfo:  # pragma: no cover
                     )

             ifd_offset = int(src.get_tag_item('IFD_OFFSET', 'TIFF', bidx=1))
-            ifd_offsets = [ifd_offset]
+            # Starting from GDAL 3.1, GeoTIFF and COG have ghost headers
+            # e.g:
+            # """
+            # GDAL_STRUCTURAL_METADATA_SIZE=000140 bytes
+            # LAYOUT=IFDS_BEFORE_DATA
+            # BLOCK_ORDER=ROW_MAJOR
+            # BLOCK_LEADER=SIZE_AS_UINT4
+            # BLOCK_TRAILER=LAST_4_BYTES_REPEATED
+            # KNOWN_INCOMPATIBLE_EDITION=NO
+            # """
+            #
+            # This header should be < 200bytes
             if ifd_offset > 300:
                 errors.append(
                     f'The offset of the main IFD should be < 300. It is {ifd_offset} instead'
                 )

+            ifd_offsets = [ifd_offset]
             details['ifd_offsets'] = {}
             details['ifd_offsets']['main'] = ifd_offset

-            if not overviews == sorted(overviews):
+            if overviews and overviews != sorted(overviews):
                 errors.append('Overviews should be sorted')

             for ix, dec in enumerate(overviews):
@@ -111,23 +121,22 @@ def check_raster_file(src_path: str) -> ValidationInfo:  # pragma: no cover
                             )
                         )

-            block_offset = int(src.get_tag_item('BLOCK_OFFSET_0_0', 'TIFF', bidx=1))
-            if not block_offset:
-                errors.append('Missing BLOCK_OFFSET_0_0')
+            block_offset = src.get_tag_item('BLOCK_OFFSET_0_0', 'TIFF', bidx=1)

             data_offset = int(block_offset) if block_offset else 0
             data_offsets = [data_offset]
             details['data_offsets'] = {}
             details['data_offsets']['main'] = data_offset

             for ix, dec in enumerate(overviews):
-                data_offset = int(
-                    src.get_tag_item('BLOCK_OFFSET_0_0', 'TIFF', bidx=1, ovr=ix)
+                block_offset = src.get_tag_item(
+                    'BLOCK_OFFSET_0_0', 'TIFF', bidx=1, ovr=ix
                 )
+                data_offset = int(block_offset) if block_offset else 0
                 data_offsets.append(data_offset)
                 details['data_offsets']['overview_{}'.format(ix)] = data_offset

-            if data_offsets[-1] < ifd_offsets[-1]:
+            if data_offsets[-1] != 0 and data_offsets[-1] < ifd_offsets[-1]:
                 if len(overviews) > 0:
                     errors.append(
                         'The offset of the first block of the smallest overview '
@@ -156,7 +165,7 @@ def check_raster_file(src_path: str) -> ValidationInfo:  # pragma: no cover

             for ix, dec in enumerate(overviews):
                 with rasterio.open(src_path, OVERVIEW_LEVEL=ix) as ovr_dst:
-                    if ovr_dst.width >= 512 or ovr_dst.height >= 512:
+                    if ovr_dst.width > 512 and ovr_dst.height > 512:
                         if not ovr_dst.is_tiled:
                             errors.append('Overview of index {} is not tiled'.format(ix))

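Two of the cog.py changes can be tried in isolation: external overviews are now detected purely by file extension, and a missing `BLOCK_OFFSET_0_0` tag falls back to 0 instead of being passed straight to `int()`. The following is a minimal sketch of both checks, not the validator itself. The helper names `has_external_overviews` and `parse_block_offset` are hypothetical, plain Python values stand in for the rasterio dataset, and it assumes the tag lookup yields `None` or an empty value when the tag is absent.

```python
import os
from typing import Iterable, Optional


def has_external_overviews(files: Iterable[str]) -> bool:
    """Return True if any listed file has a .ovr extension (hypothetical helper)."""
    return any(os.path.splitext(f)[-1] == '.ovr' for f in files)


def parse_block_offset(tag_value: Optional[str]) -> int:
    """Convert a tag value to int, using 0 when the tag is absent (hypothetical helper)."""
    # Assumption: an absent tag is reported as None or an empty string.
    return int(tag_value) if tag_value else 0
```

Because an absent tag now maps to 0, the later comparison in the diff has to skip that value explicitly, which is what the added `data_offsets[-1] != 0` condition does.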
55 changes: 35 additions & 20 deletions terracotta/drivers/__init__.py
@@ -3,33 +3,36 @@
 Define an interface to retrieve Terracotta drivers.
 """

+import os
 from typing import Union, Tuple, Dict, Type
 import urllib.parse as urlparse
 from pathlib import Path

-from terracotta.drivers.base import Driver
+from terracotta.drivers.base_classes import MetaStore
+from terracotta.drivers.terracotta_driver import TerracottaDriver
+from terracotta.drivers.geotiff_raster_store import GeoTiffRasterStore

 URLOrPathType = Union[str, Path]


-def load_driver(provider: str) -> Type[Driver]:
+def load_driver(provider: str) -> Type[MetaStore]:
     if provider == 'sqlite-remote':
-        from terracotta.drivers.sqlite_remote import RemoteSQLiteDriver
-        return RemoteSQLiteDriver
+        from terracotta.drivers.sqlite_remote_meta_store import RemoteSQLiteMetaStore
+        return RemoteSQLiteMetaStore

     if provider == 'mysql':
-        from terracotta.drivers.mysql import MySQLDriver
-        return MySQLDriver
+        from terracotta.drivers.mysql_meta_store import MySQLMetaStore
+        return MySQLMetaStore

     if provider == 'sqlite':
-        from terracotta.drivers.sqlite import SQLiteDriver
-        return SQLiteDriver
+        from terracotta.drivers.sqlite_meta_store import SQLiteMetaStore
+        return SQLiteMetaStore

     raise ValueError(f'Unknown database provider {provider}')


-def auto_detect_provider(url_or_path: Union[str, Path]) -> str:
-    parsed_path = urlparse.urlparse(str(url_or_path))
+def auto_detect_provider(url_or_path: str) -> str:
+    parsed_path = urlparse.urlparse(url_or_path)

     scheme = parsed_path.scheme
     if scheme == 's3':
@@ -41,10 +44,10 @@ def auto_detect_provider(url_or_path: Union[str, Path]) -> str:
     return 'sqlite'


-_DRIVER_CACHE: Dict[Tuple[URLOrPathType, str], Driver] = {}
+_DRIVER_CACHE: Dict[Tuple[URLOrPathType, str, int], TerracottaDriver] = {}


-def get_driver(url_or_path: URLOrPathType, provider: str = None) -> Driver:
+def get_driver(url_or_path: URLOrPathType, provider: str = None) -> TerracottaDriver:
     """Retrieve Terracotta driver instance for the given path.

     This function always returns the same instance for identical inputs.
@@ -65,25 +68,37 @@ def get_driver(url_or_path: URLOrPathType, provider: str = None) -> Driver:

         >>> import terracotta as tc
         >>> tc.get_driver('tc.sqlite')
-        SQLiteDriver('/home/terracotta/tc.sqlite')
+        TerracottaDriver(
+            meta_store=SQLiteDriver('/home/terracotta/tc.sqlite'),
+            raster_store=GeoTiffRasterStore()
+        )
         >>> tc.get_driver('mysql://root@localhost/tc')
-        MySQLDriver('mysql://root@localhost:3306/tc')
+        TerracottaDriver(
+            meta_store=MySQLDriver('mysql+pymysql://localhost:3306/tc'),
+            raster_store=GeoTiffRasterStore()
+        )
         >>> # pass provider if path is given in a non-standard way
         >>> tc.get_driver('root@localhost/tc', provider='mysql')
-        MySQLDriver('mysql://root@localhost:3306/tc')
+        TerracottaDriver(
+            meta_store=MySQLDriver('mysql+pymysql://localhost:3306/tc'),
+            raster_store=GeoTiffRasterStore()
+        )

     """
+    url_or_path = str(url_or_path)
+
     if provider is None:  # try and auto-detect
         provider = auto_detect_provider(url_or_path)

-    if isinstance(url_or_path, Path) or provider == 'sqlite':
-        url_or_path = str(Path(url_or_path).resolve())
-
     DriverClass = load_driver(provider)
     normalized_path = DriverClass._normalize_path(url_or_path)
-    cache_key = (normalized_path, provider)
+    cache_key = (normalized_path, provider, os.getpid())

     if cache_key not in _DRIVER_CACHE:
-        _DRIVER_CACHE[cache_key] = DriverClass(url_or_path)
+        driver = TerracottaDriver(
+            meta_store=DriverClass(url_or_path),
+            raster_store=GeoTiffRasterStore()
+        )
+        _DRIVER_CACHE[cache_key] = driver

     return _DRIVER_CACHE[cache_key]
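`auto_detect_provider` picks a provider from the URL scheme and falls back to local SQLite. The sketch below illustrates that idea with a hypothetical `detect_provider` helper. The diff only shows the `'s3'` test and the final `return 'sqlite'`; the mapping of `s3` to `'sqlite-remote'` and of `mysql` to `'mysql'` is an assumption based on the provider names accepted by `load_driver`.

```python
import urllib.parse as urlparse


def detect_provider(url_or_path: str) -> str:
    """Guess a provider name from the URL scheme (hypothetical helper)."""
    scheme = urlparse.urlparse(url_or_path).scheme
    # Assumed mapping: the branches between the 's3' test and the final
    # fallback are elided in the diff, so these return values are a guess.
    if scheme == 's3':
        return 'sqlite-remote'
    if scheme == 'mysql':
        return 'mysql'
    # Anything else, including plain file paths, is treated as local SQLite.
    return 'sqlite'
```

A string such as `'root@localhost/tc'` has no scheme, so a detector of this kind would classify it as SQLite. That is presumably why the docstring example passes `provider='mysql'` explicitly for that input.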