Function for ETL job #11

Merged
merged 5 commits into from
Oct 27, 2023
@Xinyihe123 Xinyihe123 commented Oct 7, 2023

Fixed: #8

The file transfer.py:
Given an organization ID, it finds the target organization and its related planters, trees, and species in the source database, then inserts them into the target database.
The table and column names it references still need to be checked for correctness.

import argparse
import datetime

DATABASE_URL="postgresql://<user>:<password>@<host>:<port>/treetracker?ssl=true"  # credentials redacted
Contributor

Please don't paste the database URL here; it will compromise our DB.

Author

Oh sorry for that. Didn't realize that's a public one.

Comment on lines 153 to 160
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Transfer data from source to target database.")
    parser.add_argument("-t", "--target_db", default=None, help="URI for the target PostgreSQL database.")
    parser.add_argument("-s", "--source_db", default=DATABASE_URL, help="URI for the source PostgreSQL database.")
    parser.add_argument("-o", "--org_id", required=True, type=int, help="ID of the target organization.")
    parser.add_argument("-a", "--action", action="store_true", help="Whether to update the database.")  # type=bool would treat any non-empty string, even "False", as True
    args = parser.parse_args()
    transfer_data(target=args.target_db, source=args.source_db, organization_id=args.org_id, action=args.action)
Contributor

Can you change this to follow the pattern used here?
https://github.com/Greenstand/treetracker-airflow-dags/blob/360580afffd51021661c3e88a78f8bae1462ed72/lib/capture_export.py
https://github.com/Greenstand/treetracker-airflow-dags/blob/360580afffd51021661c3e88a78f8bae1462ed72/lib/capture_export_test.py

Here is a simple guide for this test pattern:
https://github.com/Greenstand/treetracker-airflow-dags#option-1-develop-without-installing-airflow

By doing this, we can:

  • Make this function platform-agnostic, so it can easily be reused from a CLI, FaaS, or Airflow.
  • Make it easier to test. The unit test library used in the sample code is a good tool for testing and developing: you can quickly check your code, and even hot-reload it whenever you change something in your source code.
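
To illustrate the idea, here is a minimal sketch of that pattern, not the code from the linked files. It assumes the function receives already-open connections rather than URLs; the `organization` table, its columns, and the helper `make_db` are hypothetical, and in-memory SQLite stands in for PostgreSQL so the example runs anywhere.

```python
import sqlite3
import unittest


def transfer(dest_conn, src_conn, organization_id, action=False):
    """Copy one organization row from src_conn to dest_conn.

    Takes open DB-API connections instead of URLs, so a CLI, a FaaS
    handler, or an Airflow task can each supply their own.
    Returns the number of rows found; writes only when action is True.
    """
    # Hypothetical table and column names, for illustration only.
    rows = src_conn.execute(
        "SELECT id, name FROM organization WHERE id = ?", (organization_id,)
    ).fetchall()
    if action:
        dest_conn.executemany(
            "INSERT INTO organization (id, name) VALUES (?, ?)", rows
        )
        dest_conn.commit()
    return len(rows)


def make_db():
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE organization (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


class TestTransfer(unittest.TestCase):
    def setUp(self):
        self.src, self.dest = make_db(), make_db()
        self.src.execute("INSERT INTO organization VALUES (11, 'demo org')")

    def test_dry_run_writes_nothing(self):
        self.assertEqual(transfer(self.dest, self.src, 11, action=False), 1)
        count = self.dest.execute("SELECT COUNT(*) FROM organization").fetchone()[0]
        self.assertEqual(count, 0)

    def test_action_copies_row(self):
        transfer(self.dest, self.src, 11, action=True)
        row = self.dest.execute("SELECT id, name FROM organization").fetchone()
        self.assertEqual(row, (11, "demo org"))
```

Saved in a file ending in `_test.py`, this can be run with `python3 -m unittest discover -s . -p "*_test.py" -v`. Because the function never parses arguments or opens connections itself, the same body can be called from any entry point.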

Author

Sure. I'll add the test file later.

@dadiorchen
Contributor

@Xinyihe123 thank you for the contribution! Above are some suggestions from my side.

@Xinyihe123 Xinyihe123 force-pushed the main branch 3 times, most recently from 796962e to ab32668 Compare October 7, 2023 07:37
@dadiorchen
Contributor

@Xinyihe123 your code looks good. I will run your script to try a migration and will get back to you with my result.

@dadiorchen dadiorchen self-assigned this Oct 13, 2023
@dadiorchen
Contributor

@Xinyihe123 all looks good. I will try to run your script to do a migration and will get back to you with my feedback.

@dadiorchen
Contributor

@Xinyihe123 I cannot run the test. Did you run it on your side? Here is my command and the result:

python3 -m unittest discover -s . -p "*_test.py" -v
======================================================================
ERROR: test (transfer_test.Test_Transfer)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/treetracker-functions/python/ETL/transfer_test.py", line 20, in test
    transfer(dest_conn,src_conn, 11, action = False)
TypeError: 'module' object is not callable

----------------------------------------------------------------------
Ran 1 test in 0.505s

FAILED (errors=1)

@Xinyihe123
Author

python3 -m unittest discover -s . -p "*_test.py" -v

Yes, I can run it on my side. The problem was probably the import format. I've fixed it, so it should run on your side now.
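
For context, the `TypeError` in the traceback is what Python raises when the name `transfer` is bound to a module rather than to a function inside it. The actual change made in this PR is not shown in the thread, so the following is only a sketch of that failure mode. It assumes a file `transfer.py` that defines a function also named `transfer`; the module is built in memory here so the example is self-contained.

```python
import types

# Stand-in for a file transfer.py defining a function named transfer
# (assumed layout; built in memory only to keep the sketch runnable).
transfer_module = types.ModuleType("transfer")
exec(
    "def transfer(dest_conn, src_conn, organization_id, action=False):\n"
    "    return organization_id\n",
    transfer_module.__dict__,
)

# Equivalent of `import transfer`: the name refers to the module itself,
# so calling it fails.
transfer = transfer_module
try:
    transfer(None, None, 11, action=False)
except TypeError as err:
    message = str(err)  # mentions that a 'module' object is not callable

# Equivalent of `from transfer import transfer`: the name now refers to
# the function, and the same call succeeds.
transfer = transfer_module.transfer
result = transfer(None, None, 11, action=False)
```

With a plain `import transfer`, the call in the test would instead need to be written as `transfer.transfer(...)`.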

@dadiorchen dadiorchen removed their assignment Oct 18, 2023
@dadiorchen dadiorchen merged commit 11a3ea9 into Greenstand:main Oct 27, 2023
Successfully merging this pull request may close these issues.

ETL job to migrate an organization from prod database to test and dev database