Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for PostgreSQL/Redshift geometric operators #1723

Merged
merged 1 commit into from
Feb 20, 2025

Conversation

benrsatori
Copy link
Contributor

Add support for PostgreSQL and Redshift geometric operators.

Comment on lines 95 to 101
/// `+` Addition, also Translation (PostgreSQL/Redshift geometric operator), e.g. `box '((0,0),(1,1))' + point '(2.0,0)'`
Plus,
/// Minus, e.g. `a - b`
/// `-` Subtraction, Translation (PostgreSQL/Redshift geometric operator), e.g. `box '((0,0),(1,1))' - point '(2.0,0)'`
Minus,
/// Multiply, e.g. `a * b`
/// `*` Multiplication, also Scaling (PostgreSQL/Redshift geometric operator), e.g. `box '((0,0),(1,1))' * point '(2.0,0)'`
Multiply,
/// Divide, e.g. `a / b`
/// `/` Division, also Scaling (PostgreSQL/Redshift geometric operator), e.g. `box '((0,0),(2,2))' / point '(2.0,0)'`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we skip this diff? I'm thinking since these operators are fundamental and apply to various scenarios, no need to mention specific dialect semantics given neither the parser nor AST handles them specially as a result

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

/// `#` Number of points in path or polygon (PostgreSQL/Redshift geometric operator), e.g. `# path '((1,0),(0,1),(-1,0))'`
NumOfPoints,
/// `@-@` Length or circumference (PostgreSQL/Redshift geometric operator), e.g. `@-@ path '((0,0),(1,0))'`
LenOfCircumference,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the doc says length or circumference, is the Of a typo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a typo, fixed

@@ -53,6 +53,16 @@ pub enum UnaryOperator {
PGAbs,
/// Unary logical not operator: e.g. `! false` (Hive-specific)
BangNot,
/// `#` Number of points in path or polygon (PostgreSQL/Redshift geometric operator), e.g. `# path '((1,0),(0,1),(-1,0))'`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we include a link to the docs where the operators are described? that would be helpful for folks that come across it and need to find more context

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 139 to 143
// `#` PostgreSQL Bitwise XOR, also Intersection (PostgreSQL/Redshift geometric operator), e.g. `box '((1,-1),(-1,1))' # box '((1,1),(-2,-2))'`
PGBitwiseXor,
/// Bitwise shift left, e.g. `a << b` (PostgreSQL-specific)
/// `<<` PostgreSQL Bitwise Shift Left, also Left of? (PostgreSQL/Redshift geometric operator), e.g. circle '((0,0),1)' << circle '((5,0),1)'
PGBitwiseShiftLeft,
/// Bitwise shift right, e.g. `a >> b` (PostgreSQL-specific)
/// `>>` PostgreSQL Bitwise Shift Right, Right of? (PostgreSQL/Redshift geometric operator), e.g. circle '((5,0),1)' >> circle '((0,0),1)'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually same with these, I think just mentioning the operator suffices, without going into semantics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -253,6 +268,40 @@ pub enum BinaryOperator {
/// Specifies a test for an overlap between two datetime periods:
/// <https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#overlaps-predicate>
Overlaps,
/// `##` Point of closest proximity (PostgreSQL/Redshift geometric operator), e.g. `point '(0,0)' ## lseg '((2,0),(0,2))'`
PointOfClosestProximity,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm actually thinking about it, I feel like the naming of the operators might become problematic, e.g. if another dialect supports ## operator then calling it PointOfClosestProximity would be confusing vs if the operator was called DoubleHash dialect agnostic. Would it make sense to name the operators in such a manner? i.e. by their characters or if theres a well known description that isnt describing the pg behavior

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 1355 to 1358
if dialect_of!(self is PostgreSqlDialect | RedshiftSqlDialect)
&& GEOMETRIC_TYPES.contains(&w.keyword)
{
self.parse_geometric_type(w.keyword)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this be its own branch within parse_expr_prefix_by_reserved_word instead?

It looks like we should be able to do something like

kw if self.dialect.supports_pg_geometric_types() && PG_GEOMETRIC_TYPES.contains(kw) => {
    self.parse_geometric_expr(kw)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 1536 to 1541
let token_with_span = self.next_token(); // Get the full TokenWithSpan
let value = match token_with_span.token {
// Extract the Token field
Token::SingleQuotedString(s) => Value::SingleQuotedString(s),
_ => return self.expected("SingleQuotedString", token_with_span), // Pass full TokenWithSpan
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let token_with_span = self.next_token(); // Get the full TokenWithSpan
let value = match token_with_span.token {
// Extract the Token field
Token::SingleQuotedString(s) => Value::SingleQuotedString(s),
_ => return self.expected("SingleQuotedString", token_with_span), // Pass full TokenWithSpan
};
let token_with_span = self.next_token();
let value = match token_with_span.token {
Token::SingleQuotedString(s) => Value::SingleQuotedString(s),
_ => return self.expected("SingleQuotedString", token_with_span),
};

thinking we can drop the comments in this case since the code is as descriptive

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

| tok @ Token::AtAt
| tok @ Token::QuestionMarkDash
| tok @ Token::QuestionPipe
if dialect_of!(self is PostgreSqlDialect | RedshiftSqlDialect) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we replace these with a dialect method? the dialect_of has been deprecated, we can introduce a method like supports_pg_geometric_types() or similar

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 287 to 297
pub fn only_psql_redshift() -> TestedDialects {
TestedDialects {
dialects: vec![
Box::new(PostgreSqlDialect {}),
Box::new(RedshiftSqlDialect {}),
],
options: None,
recursion_limit: None,
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by switching to the dialect method, it would be possible to reuse all_dialects_where in place of this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the code to use all_dialects_where as to your suggestion

Comment on lines 2061 to 2066
//"&<", // (is strictly to the left of) // Same as OverlapsToLeft, causes duplicate failure
//"&>", // (is strictly to the right of) // Same as OverlapsToRight, causes duplicate failure
"|=|", // distance between A and B trajectories at their closest point of approach
"<<#>>", // n-D distance between A and B bounding boxes
"|>>", // A's bounding box is strictly above B's.
"~=", // bounding box is the same
//"|>>", // A's bounding box is strictly above B's. // Same as IsStrictlyAbove, causes duplicate failure
//"~=", // bounding box is the same // Same as SameAs, causes duplicate failure
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I followed why these are commented out, what do we mean by causes duplicate failure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commented-out operators are already defined under different names in the code,
so they can no longer be considered custom operators.
If we include these again under their original symbols, it causes a duplicate match failure.
so I have now completely removed them.

@benrsatori benrsatori force-pushed the geo-op branch 3 times, most recently from b297ccb to 01f6646 Compare February 13, 2025 11:21
Copy link
Contributor

@iffyio iffyio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @benrsatori! Left some comments, the changes look good to me overall

@@ -912,6 +927,10 @@ pub trait Dialect: Debug + Any {
fn supports_array_typedef_size(&self) -> bool {
false
}
/// Returns true if the dialect supports geometric types.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a link to the postgres docs here (and maybe an example sql)? it would help folks with context on what we mean by geometric types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -3079,6 +3128,55 @@ impl<'a> Parser<'a> {
Token::QuestionAnd => Some(BinaryOperator::QuestionAnd),
Token::QuestionPipe => Some(BinaryOperator::QuestionPipe),
Token::CustomBinaryOperator(s) => Some(BinaryOperator::Custom(s.clone())),
Token::DoubleSharp if dialect_of!(self is PostgreSqlDialect | RedshiftSqlDialect) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for these clauses as well could we use the introduced self.dialect.supports_geometric_types()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

src/tokenizer.rs Outdated
@@ -1308,6 +1356,28 @@ impl<'a> Tokenizer<'a> {
_ => self.start_binop(chars, "||", Token::StringConcat),
}
}
Some('&') if dialect_of!(self is PostgreSqlDialect | RedshiftSqlDialect) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly for these clauses it would be nice to use the self.dialect.supports_geometric_types() method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -879,3 +882,29 @@ pub enum ArrayElemTypeDef {
/// `Array(Int64)`
Parenthesis(Box<DataType>),
}

#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a short description to this struct with a link to the docs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -53,6 +53,21 @@ pub enum UnaryOperator {
PGAbs,
/// Unary logical not operator: e.g. `! false` (Hive-specific)
BangNot,
/// `#` Number of points in path or polygon (PostgreSQL/Redshift geometric operator)
/// see <https://www.postgresql.org/docs/9.5/functions-geometry.html>
NumOfPoints,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do similar to the binary operator where we name them according to their symbols to be dialect agnostic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -386,6 +386,8 @@ pub enum DataType {
///
/// [bigquery]: https://cloud.google.com/bigquery/docs/user-defined-functions#templated-sql-udf-parameters
AnyType,
/// geometric type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we include a link to the postgres docs here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@benrsatori benrsatori force-pushed the geo-op branch 2 times, most recently from 9c62ffc to fdc90cf Compare February 18, 2025 10:31
Copy link
Contributor

@iffyio iffyio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @benrsatori!
cc @alamb

@iffyio
Copy link
Contributor

iffyio commented Feb 19, 2025

@benrsatori could you take a look at the CI failures?

@benrsatori
Copy link
Contributor Author

Hi @iffyio
I fixed the CI

@iffyio iffyio merged commit 339239d into apache:main Feb 20, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants