turbodbc_intern.InterfaceError: Fetching Arrow result set failed. #459

Open
abhishekdhankar95 opened this issue Jan 24, 2025 · 1 comment

Comments

@abhishekdhankar95

abhishekdhankar95 commented Jan 24, 2025

I'm working with
turbodbc version 1.5.2
python version 3.11.8
Oracle SQL

turbodbc was installed using conda, not pip.

I tried to read a large table with ~3 million rows and 27 columns on a Windows system with 32 GB of memory, using the following code:

import turbodbc
connection_string = 'Driver={<client_name>};Dbq=//<host_name>:<port>/<service_name>;Uid=<username>;Pwd=<password>;'
connection = turbodbc.connect(connection_string=connection_string)
cursor = connection.cursor()
cursor.execute("Select <18_to_27_column_names> from <table_name>")
batches = cursor.fetchallarrow()

I got the following error for the last line:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<path_to_site_packages>\site-packages\turbodbc\cursor.py", line 346, in fetchallarrow
    ).fetch_all()
      ^^^^^^^^^^^
turbodbc_intern.InterfaceError: Fetching Arrow result set failed.

I understand that 32 GB is not a lot of memory for this operation, but that is not really my problem. I can successfully read the entire table piecemeal, i.e., by fetching half the columns with one fetchallarrow call and the other half with a second fetchallarrow call.

So my questions are:

  1. Why is this error raised only when I try to fetch all the columns at once? Memory could be an issue, but then how am I able to read the entire table piecemeal? (I am new to Arrow and turbodbc, so I might be missing something very basic here.)
  2. Is there a way to get a more detailed or specific error description? I suspect that memory may be the issue, but "Fetching Arrow result set failed" is a very generic error message.

Remedies I have tried:

  1. I set the environment variable TURBODBC_LOGGING=DEBUG to see if I would get a more detailed stack trace, but the trace did not change.
  2. I tried reading one column at a time to see if one of the columns was causing the problem, but I was able to fetch each individual column without errors. As explained above, I was able to fetch the entire table piecemeal as well.
  3. I successfully fetched a limited number of rows (first 80 to 100 rows) with all the columns.

Let me know if more information is needed!
Thank you!

@pacman82
Collaborator

I'm not sure how you would get a better error out of it, but there is a method, fetcharrowbatches, which fetches the data piece by piece. See: https://turbodbc.readthedocs.io/en/latest/pages/advanced_usage.html#obtaining-apache-arrow-result-sets-in-batches
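
For illustration, a minimal sketch of the batch-wise approach with fetcharrowbatches might look like the following. The helper name process_in_batches and the handle_batch callback are hypothetical and not part of turbodbc; the sketch assumes the query has already been executed on the cursor and that each batch is a pyarrow.Table, as the linked documentation describes. Whether this avoids the error in this particular case is untested.

```python
def process_in_batches(cursor, handle_batch):
    """Stream the result of an already-executed query batch by batch.

    `cursor` is assumed to be a turbodbc cursor on which execute() has
    been called. `handle_batch` is a hypothetical callback that receives
    each batch (a pyarrow.Table per the turbodbc docs).
    Returns the total number of rows seen.
    """
    total_rows = 0
    # fetcharrowbatches() yields one Arrow table per batch, so only one
    # batch is handed to the callback at a time.
    for batch in cursor.fetcharrowbatches():
        handle_batch(batch)
        total_rows += batch.num_rows
    return total_rows


# Possible usage, reusing the placeholders from the original post:
#   cursor.execute("Select <18_to_27_column_names> from <table_name>")
#   tables = []
#   n = process_in_batches(cursor, tables.append)
#   # `tables` can later be merged with pyarrow.concat_tables(tables),
#   # or each batch can be written to disk in the callback instead.
```

If the full table does not fit in memory alongside the driver's buffers, writing each batch out in the callback rather than collecting the batches in a list keeps the peak footprint closer to one batch.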
