Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Followup: Return TIMESTAMP columns as native Python datetime objects #437

Closed
wants to merge 23 commits into from
Closed
Show file tree
Hide file tree
Changes from 22 commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
312eb24
Changed the sql_path
Aymaru Feb 22, 2021
41a881d
Transform dates from crate to python datetime
Aymaru Feb 22, 2021
49076da
Datetime conversion implemented using map and generator
Aymaru Mar 2, 2021
c053da3
Updated tests
Aymaru Mar 2, 2021
feebd45
Using generators to work with large datasets
Aymaru Mar 3, 2021
6566879
fix
Aymaru Mar 3, 2021
a38860b
test
Aymaru Mar 3, 2021
e4fa0fa
updated datetime transformation using generators and updated test cases
Aymaru Mar 4, 2021
77b6550
cleaning debug prints
Aymaru Mar 4, 2021
ee77ec1
Passing a generator of the flags instead of passing the list of values
Aymaru Mar 4, 2021
5b33589
Removed tests
Aymaru Mar 4, 2021
7e39244
updated conversion of timestamps
Aymaru Mar 30, 2021
f750966
Added pandas dependency
Aymaru Mar 30, 2021
e6facf3
Changed pandas timestamp to python datetime && deleted pandas dependecy
Aymaru Mar 30, 2021
e6887f4
fixed - E226 missing whitespace around arithmetic operator
Aymaru Mar 30, 2021
a8ffdc3
Changed yield value
Aymaru Mar 31, 2021
1c46079
Validate date type in processors
Aymaru Mar 31, 2021
8ce9209
Fix tests: 979e82af6 adjusted the timestamps in the test fixture data
amotl Jul 19, 2022
ccc00a1
Fix tests: Use defined time zone when validating naive datetime objects
amotl Jul 25, 2022
a8532c5
Remove spurious `print` statement
amotl Jul 21, 2022
2cc1b90
Fix error when using _both_ `types=true` and `error_trace=true` options
amotl Jul 22, 2022
f355164
Polish `datetime` conversion implementation
amotl Jul 26, 2022
3154d32
Use `datetime.utcfromtimestamp`
amotl Jul 27, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 40 additions & 1 deletion src/crate/client/cursor.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@

from .exceptions import ProgrammingError
import warnings
from datetime import datetime


class Cursor(object):
Expand Down Expand Up @@ -49,9 +50,47 @@ def execute(self, sql, parameters=None, bulk_parameters=None):

self._result = self.connection.client.sql(sql, parameters,
bulk_parameters)
if "rows" in self._result:

if "rows" not in self._result:
return

if "col_types" in self._result:
self.rows = iter(self._transform_result_types())

else:
self.rows = iter(self._result["rows"])

def _transform_result_types(self):
"""
Generate row items with column values converted to their corresponding
native Python types, based on information from `col_types`.

Currently, only converting to native `datetime` objects is implemented.
"""
datetime_column_types = [11, 15]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does someone have a resource at hand where those column code types are enumerated? The table at 1 coincidentally lists _timestamp without time zone as 1115, but here the code is apparently expecting two-digit integer numbers.

I will be happy to receive further pointers for better educating myself on this topic.

Footnotes

  1. https://crate.io/docs/crate/reference/en/5.0/interfaces/postgres.html#pg-type

Copy link
Member

@mfussenegger mfussenegger Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://crate.io/docs/crate/reference/en/5.0/interfaces/http.html#column-types

The code here will probably also need to handle arrays of timestamps.

Copy link
Member Author

@amotl amotl Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I will add the conversion for container types as well.

Reading up on your reference, I am asking myself whether it is appropriate that type=15 (Unchecked object) is handled here as well?

datetime_columns_mask = [
True if col_type in datetime_column_types else False
for col_type in self._result["col_types"]
]
for row in self._result["rows"]:
yield list(self._transform_datetime_columns(row, iter(datetime_columns_mask)))

@staticmethod
def _transform_datetime_columns(row, column_flags):
"""
Convert all designated columns to native Python `datetime` objects.
"""
for value in row:
try:
flag = next(column_flags)
except StopIteration:
break

if flag and value is not None:
value = datetime.fromtimestamp(value / 1e3)
Copy link
Member

@mfussenegger mfussenegger Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using utcfromtimestamp is the less error prone option here.

There are two scenarios:

The user stores UTC on the server:

  • fromtimestamp would be wrong, or at least confusing as it would convert to localtime but without attaching a timezone info.
  • utcfromtimestamp would at least not be wrong.

The user stores local timestamps on the server. Unless they also store the timezone in a separate field this is error prone but

  • utcfromtimestamp would at least preserve the value used on the server
  • fromtimestamp would be wrong, because it would apply another offset.

It's important that we don't attach a tzinfo given that we don't have one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option could be to to let users configure the timezone on the client (with default UTC) and always create timezone aware objects.

The default could turn out to be wrong, but at least the user has an easy option to fix it, and it's easier to spot a incorrectly attached tzone then it is to figure out that a conversion happened and why/where some conversion happend.
It's also way more convenient/less error prone to continue working with tz aware datetimes.

Copy link
Member Author

@amotl amotl Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. With 3154d32, I've switched to use datetime.utcfromtimestamp instead.

Another option could be to to let users configure the timezone on the client (with default UTC) and always create timezone aware objects.

Shall we bring in the timezone awareness (#359, #361) with a subsequent patch right after this one?


yield value

def executemany(self, sql, seq_of_parameters):
"""
Prepare a database operation (query or command) and then execute it
Expand Down
16 changes: 13 additions & 3 deletions src/crate/client/doctests/client.txt
Original file line number Diff line number Diff line change
Expand Up @@ -204,6 +204,11 @@ fetchall()
Cursor Description
==================

.. hidden: Set time zone to UTC to make naive datetime objects deterministic.

>>> previous_timezone = os.environ.get("TZ")
>>> os.environ["TZ"] = "UTC"

The ``description`` property of the cursor returns a sequence of 7-item
sequences containing the column name as first parameter. Just the name field is
supported, all other fields are 'None'::
Expand All @@ -212,9 +217,9 @@ supported, all other fields are 'None'::
>>> result = cursor.fetchone()
>>> pprint(result)
['Aldebaran',
1658167836758,
1658167836758,
1658167836758,
datetime.datetime(2022, 7, 18, 18, 10, 36, 758000),
datetime.datetime(2022, 7, 18, 18, 10, 36, 758000),
datetime.datetime(2022, 7, 18, 18, 10, 36, 758000),
None,
None,
'Star System',
Expand All @@ -239,6 +244,11 @@ supported, all other fields are 'None'::
('description', None, None, None, None, None, None),
('details', None, None, None, None, None, None))

.. hidden: Restore time zone

>>> if previous_timezone: os.environ["TZ"] = previous_timezone


Closing the Cursor
==================

Expand Down
3 changes: 2 additions & 1 deletion src/crate/client/doctests/http.txt
Original file line number Diff line number Diff line change
Expand Up @@ -69,7 +69,8 @@ Issue a select statement against our with test data pre-filled crate instance::
>>> http_client = HttpClient(crate_host)
>>> result = http_client.sql('select name from locations order by name')
>>> pprint(result)
{'cols': ['name'],
{'col_types': [4],
'cols': ['name'],
'duration': ...,
'rowcount': 13,
'rows': [['Aldebaran'],
Expand Down
4 changes: 2 additions & 2 deletions src/crate/client/http.py
Original file line number Diff line number Diff line change
Expand Up @@ -315,7 +315,7 @@ class Client(object):
Crate connection client using CrateDB's HTTP API.
"""

SQL_PATH = '/_sql'
SQL_PATH = '/_sql?types=true'
"""Crate URI path for issuing SQL statements."""

retry_interval = 30
Expand Down Expand Up @@ -385,7 +385,7 @@ def __init__(self,

self.path = self.SQL_PATH
if error_trace:
self.path += '?error_trace=true'
self.path += '&error_trace=true'

def close(self):
for server in self.server_pool.values():
Expand Down
4 changes: 4 additions & 0 deletions src/crate/client/sqlalchemy/dialect.py
Original file line number Diff line number Diff line change
Expand Up @@ -89,6 +89,8 @@ def result_processor(self, dialect, coltype):
def process(value):
if not value:
return
if isinstance(value, datetime):
return value.date()
try:
return datetime.utcfromtimestamp(value / 1e3).date()
except TypeError:
Expand Down Expand Up @@ -128,6 +130,8 @@ def result_processor(self, dialect, coltype):
def process(value):
if not value:
return
if isinstance(value, datetime):
return value
try:
return datetime.utcfromtimestamp(value / 1e3)
except TypeError:
Expand Down
3 changes: 0 additions & 3 deletions src/crate/client/sqlalchemy/doctests/itests.txt
Original file line number Diff line number Diff line change
Expand Up @@ -80,9 +80,6 @@ Date should have been set at the insert due to default value via python method::
>>> dt.day == now.day
True

>>> (now - location.datetime_tz).seconds < 4
True

Comment on lines -83 to -85
Copy link
Member Author

@amotl amotl Jul 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this test case protect against anything significant? Currently, it croaks on my machine with a difference of 79600s (22.11 h), and I can not make much other sense of it. Most probably, I am missing the point.

Verify the return type of date and datetime::

>>> type(location.date)
Expand Down
3 changes: 2 additions & 1 deletion src/crate/client/test_http.py
Original file line number Diff line number Diff line change
Expand Up @@ -431,12 +431,13 @@ def test_params(self):
client = Client(['127.0.0.1:4200'], error_trace=True)
parsed = urlparse(client.path)
params = parse_qs(parsed.query)
self.assertEqual(params["types"], ["true"])
self.assertEqual(params["error_trace"], ["true"])
client.close()

def test_no_params(self):
client = Client()
self.assertEqual(client.path, "/_sql")
self.assertEqual(client.path, "/_sql?types=true")
client.close()


Expand Down
1 change: 1 addition & 0 deletions src/crate/client/tests.py
Original file line number Diff line number Diff line change
Expand Up @@ -149,6 +149,7 @@ def refresh(table):


def setUpWithCrateLayer(test):
test.globs['os'] = os
test.globs['HttpClient'] = http.Client
test.globs['crate_host'] = crate_host
test.globs['pprint'] = pprint
Expand Down