Fix import channel in Postgresql #12709
base: release-v0.17.x
Conversation
…+ add required extension sphinxcontrib.jquery
Force-pushed from 24e4490 to f1b712a
I am not sure that we are properly setting up psycopg2 with unicode handling as described here: https://www.psycopg.org/docs/usage.html#unicode-handling
This might point to a way to handle this that doesn't cause a huge performance regression.
Also, I think we should add a regression test for the specific case we are fixing here: importing a unicode string for a node title that is at the max length.
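A minimal sketch of such a regression test. The constant `MAX_TITLE_LENGTH` and the helper `truncate_to_column` are hypothetical names for illustration, assuming the fix truncates by characters rather than bytes:

```python
# Hypothetical regression test: a unicode title exactly at the column's
# max length must survive untruncated, even though its UTF-8 encoding
# is longer than max_length bytes.
MAX_TITLE_LENGTH = 200  # assumed column limit, for illustration only


def truncate_to_column(value, max_length):
    # Truncate by characters, not bytes, mirroring the behavior this PR aims for.
    return value[:max_length] if value is not None else None


def test_unicode_title_at_max_length():
    title = "ñ" * MAX_TITLE_LENGTH  # 200 characters, 400 UTF-8 bytes
    result = truncate_to_column(title, MAX_TITLE_LENGTH)
    assert result == title  # nothing was cut
    assert len(result) == MAX_TITLE_LENGTH
    # The byte length exceeds the limit, which is exactly the case
    # that broke the bytes-based path:
    assert len(result.encode("utf-8")) > MAX_TITLE_LENGTH
```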
…n a lot of data needs to be inserted
Force-pushed from dd5dd8a to 130405f
Done
Just a couple of questions to make sure the tests are doing what they ought. If the answer to my second question is yes, then we might need two test cases: one with a UTF-8 string that is short enough in characters but would overflow if it were converted to bytes, and another where the tag is just too long regardless.
I will manually test this to compare speed before and after.
},
{
    "id": "0c20e2eb254b4070a713da63380ff0a3",
    "tag_name": "velocidad de reacciones químicas"
This looks like it is still ASCII rather than containing any multi-byte UTF-8 characters?
Yep, this test is to check that 32 chars are cut to 30, just that. I used a real case from channel 07cd1633691b4473b6fda08caf826253, just for the pleasure of using real data.
):
    max_length = column_obj.type.length
    if max_length is not None:
        value = value[:max_length] if value is not None else default
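As a quick illustration of why truncating by characters, as the snippet above does, differs from truncating the UTF-8 bytes (using the 32-character tag from the test fixture):

```python
tag = "velocidad de reacciones químicas"
assert len(tag) == 32                   # 32 characters...
assert len(tag.encode("utf-8")) == 33   # ...but 33 bytes: "í" is 2 bytes in UTF-8

# Character-based truncation keeps 30 whole characters:
assert tag[:30] == "velocidad de reacciones químic"

# Byte-based truncation counts the accent twice, so it keeps fewer characters:
assert tag.encode("utf-8")[:30].decode("utf-8") == "velocidad de reacciones quími"
```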
Is this to handle cases where, even with proper handling of UTF-8, the tag name is too long because of a lack of checks in the generation of the SQLite db?
Yes. While testing this, I've seen tag names 40 chars long in SQLite, while max_length is 30 in the model.
Summary
When importing channel data, the psycopg2 execute_values function was used. This function converts all strings to bytes in order to get better performance.
However, a UTF-8 character can encode to more than one byte, so some strings no longer fit within a column's maximum character limit.
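A short illustration of the byte expansion described above (the tag string is just an example):

```python
# A character-limited column (e.g. VARCHAR(19)) counts characters,
# but the byte conversion counts UTF-8 bytes:
tag = "reacciones químicas"
n_chars = len(tag)                   # 19 characters: fits VARCHAR(19)
n_bytes = len(tag.encode("utf-8"))   # 20 bytes: "í" encodes to 2 bytes
assert n_bytes > n_chars
```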
This PR replaces execute_values with executemany.
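For reference, a sketch of the two psycopg2 call shapes being swapped. The table and column names here are illustrative (not the actual Kolibri schema), and the live-cursor calls are shown only in comments since they need a real connection:

```python
# Illustrative rows; ids/tags taken from the test fixture in this PR.
rows = [
    ("0c20e2eb254b4070a713da63380ff0a3", "velocidad de reacciones químicas"),
]

# execute_values builds ONE statement, mogrifying every row into a
# single bytes payload before sending it:
sql_values = "INSERT INTO content_tag (id, tag_name) VALUES %s"
# with a live cursor: psycopg2.extras.execute_values(cur, sql_values, rows)

# executemany runs the statement once per row, adapting parameters
# individually (slower, but avoids the bytes-level length surprise):
sql_many = "INSERT INTO content_tag (id, tag_name) VALUES (%s, %s)"
# with a live cursor: cur.executemany(sql_many, rows)
```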
Note: I had to cherry-pick the commit from #12466 to fix docs builds in GH
References
Closes: #11780
Reviewer guidance
Do tests pass?
To test the fix: this channel was failing and can be used in a Kolibri installation using PostgreSQL:
kolibri manage importchannel network --baseurl=https://studio.learningequality.org 07cd1633691b4473b6fda08caf826253
Testing checklist
PR process
Reviewer checklist
yarn and pip)