Relational Databases (MySQL, PostgreSQL) Integration #195
…_greater_or_equal_than`, `is_greater_than`
…ay`, `is_on_saturday`, `is_on_thursday`, `is_on_tuesday`, `is_on_wednesday`, `is_on_weekday` and `is_on_weekend`
…`, `is_in_billions`, `is_in_millions`, `is_less_or_equal_than`, `is_less_than`
…day`, `is_on_sunday`, `is_on_thursday`, `is_on_tuesday`, `is_on_wednesday`, `is_on_weekday`
⤴️ Updated Docstrings
⤴️ Updated `README.md`
@dsaad68 thanks for this great addition. I will conduct a review and proceed with the integration.
If there are any issues with this suggestion, please let me know.
Hi @dsaad68 there are no issues with the submission. We are in an ongoing review for the Journal of Open Source Software, and they have indicated the lack of some documentation. Before adding more functionality, we would like to pass the JOSS review with consolidated docs, at least for the core classes and modules. Once we pass this hurdle, we will add the new functionality. I hope that explains the delays. Thank you in advance.
@canimus I totally understand, good luck. Let me know if I can help.
Hi @dsaad68 thanks for your patience with this PR. Now that the paper is out of the way and finally published, I would like to make sure your contribution is all in. This branch has been stalled for some time, and a few commits and merges have passed. Would you kindly review whether we can bring it to the current state and resolve the conflicts highlighted above? Thanks in advance.
@canimus I will look at it this week.
cuallee
Relational Databases (MySQL, PostgreSQL) Integration
By using `Polars` with the `ConnectorX` engine to read from a database, tables in relational databases can now be checked and validated. ConnectorX, written in Rust, has native support for Apache Arrow, enabling it to transfer data directly into a Polars `DataFrame` without copying the data (zero-copy). More information about the ConnectorX engine can be found here.

This feature allows for the future integration of other DBMS, such as `Redshift` (via the PostgreSQL protocol) and `ClickHouse` (via the MySQL protocol), into `cuallee`.

✨ Feature Enhancements:
- … `PostgreSQL` and `MySQL`, enabling more robust data integrity checks.
- … `pyproject.toml`.

🧪 Testing:
- … `skip`.)
- … `PyTest` fixtures for automated unit tests for `PostgreSQL` and `MySQL`.
- … `init-db-psql.sql` and `init-db-mysql.sql`.
- … `test_validate.py` to validate the type of the output DataFrame, ensuring the resulting DataFrame is of type `polars_dataframe`.
- … `test_validate.py` to validate the error when a column is not found in the table.

📖 Documentation:
- … `README.md`
- … `How-To` to set up the test environment.

⛑️ Known Issues:
- It's important to note that most of `PostgreSQL`'s `Compute` methods are inherited from `DuckDB`'s `Compute` methods, so any changes in `DuckDB`'s `Compute` methods will affect `PostgreSQL`'s `Compute` methods.
- The `has_std` check could fail due to floating point precision. The approach suggested below could resolve it, but it needs to introduce a precision error parameter:

🦺 Limitation:
PostgreSQL

Not all checks are currently available for `PostgreSQL`. Unavailable checks:
- `is_daily`
- `has_entropy`
- `has_workflow`

MySQL

Not all checks are currently available for `MySQL`. Unavailable checks:
- `is_daily`
- `has_entropy`
- `has_workflow`
- `has_percentile`
- `has_correlation`
- `is_inside_interquartile_range`
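
The two lists above can be read as a per-backend lookup. The following is only a minimal sketch of that idea: the check names are taken from the lists, while `UNAVAILABLE_CHECKS` and `unsupported` are hypothetical names used for illustration and are not part of `cuallee`.

```python
from __future__ import annotations

# Hypothetical lookup table: the check names come from the lists above,
# but this dictionary and its keys are illustrative, not cuallee's API.
UNAVAILABLE_CHECKS = {
    "postgresql": frozenset({"is_daily", "has_entropy", "has_workflow"}),
    "mysql": frozenset({
        "is_daily",
        "has_entropy",
        "has_workflow",
        "has_percentile",
        "has_correlation",
        "is_inside_interquartile_range",
    }),
}


def unsupported(backend: str, requested: list[str]) -> list[str]:
    """Return the requested check names the backend cannot run, in request order."""
    blocked = UNAVAILABLE_CHECKS[backend.lower()]
    return [name for name in requested if name in blocked]
```

A caller could use such a helper to report unsupported checks before any query is sent to the database, rather than failing partway through validation.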
Improve the `Compute` methods to support complex queries:

Because the `Compute` methods are used to create a unified query, it is impossible to create complex queries with multiple `Compute` methods. This section needs to be improved to support complex queries.

Inheritance Enhancement:
A significant portion of `PostgreSQL`'s Compute functionalities are derived from `DuckDB`'s Compute methods.

Therefore, any modifications to the Compute methods in `DuckDB` will directly impact the functionalities in `PostgreSQL`.

Consolidating these shared methods into a separate class for better manageability is advisable.
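
One possible shape for that consolidation is sketched below. It is only an illustration: the class names `SharedSQLCompute`, `DuckDBCompute` and `PostgreSQLCompute`, and the SQL fragments they return, are assumptions and do not reflect the actual `Compute` classes in `cuallee`.

```python
from __future__ import annotations


class SharedSQLCompute:
    """Hypothetical base class holding SQL fragments both backends can reuse."""

    def is_complete(self, column: str) -> str:
        # COUNT(column) counts only the non-NULL values of the column.
        return f"COUNT({column})"

    def has_min(self, column: str) -> str:
        return f"MIN({column})"


class DuckDBCompute(SharedSQLCompute):
    """DuckDB-specific methods live here and no longer leak into PostgreSQL."""

    def has_entropy(self, column: str) -> str:
        # Illustrative engine-specific fragment, not cuallee's implementation.
        return f"ENTROPY({column})"


class PostgreSQLCompute(SharedSQLCompute):
    """Inherits only the shared methods, not DuckDB-specific ones."""
```

With this layout, a change to a DuckDB-only method cannot affect PostgreSQL, while a change to a shared method is made in one clearly named place.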