global: MySQL 5.6 by default has collation "utf8_general_ci" (case-insensitive) #114

Open
slint opened this issue Aug 13, 2018 · 0 comments

slint commented Aug 13, 2018

The default collation in MySQL 5.6 (this may apply to newer versions as well) is utf8_general_ci, which is case-insensitive. This means that the following can happen:

class Identifier(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    value = db.Column(db.String(255), unique=True)
...

id1 = Identifier(value='ABC')
db.session.add(id1)
db.session.commit()
assert id1.id == 1

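id2 = Identifier(value='abc')
db.session.add(id2)
db.session.commit()
# ...DB error for unique constraint violation: 'abc' collides with 'ABC'...
db.session.rollback()  # discard the failed insert before reusing the session

# Lookups are case-insensitive too: 'aBc' matches the row stored as 'ABC'
fetched_id = Identifier.query.filter_by(value='aBc').one()
assert fetched_id == id1
assert fetched_id.id == 1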
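
For anyone who wants to see the effect without a MySQL server at hand, here is a rough, self-contained sketch using Python's built-in `sqlite3` module. SQLite's `NOCASE` collation is only a stand-in for `utf8_general_ci` and its `BINARY` collation a stand-in for `utf8_bin`. They are not the same implementations, and `NOCASE` only folds ASCII letters. The `identifier` table and the `demo` helper are made up for illustration.

```python
import sqlite3


def demo(collation):
    """Insert 'ABC' then 'abc' into a UNIQUE column using `collation`.

    Returns a tuple: (was the second insert accepted?, values matching 'aBc').
    """
    conn = sqlite3.connect(":memory:")
    # The column collation governs both the UNIQUE index and '=' comparisons.
    conn.execute(
        "CREATE TABLE identifier ("
        " id INTEGER PRIMARY KEY,"
        f" value VARCHAR(255) COLLATE {collation} UNIQUE)"
    )
    conn.execute("INSERT INTO identifier (value) VALUES ('ABC')")
    try:
        conn.execute("INSERT INTO identifier (value) VALUES ('abc')")
        second_insert_ok = True
    except sqlite3.IntegrityError:
        # Under a case-insensitive collation 'abc' duplicates 'ABC'.
        second_insert_ok = False
    rows = conn.execute(
        "SELECT value FROM identifier WHERE value = ? ORDER BY id", ("aBc",)
    ).fetchall()
    conn.close()
    return second_insert_ok, [row[0] for row in rows]


print(demo("NOCASE"))  # case-insensitive stand-in
print(demo("BINARY"))  # case-sensitive stand-in
```

With the case-insensitive collation the second insert is rejected and the `'aBc'` lookup returns the `'ABC'` row. With the binary collation both inserts succeed and the lookup matches nothing.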

This probably breaks many assumptions that we as developers make throughout the Invenio codebase, especially for tables such as those in invenio-pidstore.

A more correct charset and collation can be set by creating the database with:

CREATE DATABASE invenio CHARACTER SET utf8 COLLATE utf8_bin;
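
If changing the database default is not an option, one possible alternative (not proposed in the original report, so treat it as an assumption) is to pin the collation on the individual column. SQLAlchemy's `String` type accepts a `collation` argument. The sketch below uses plain SQLAlchemy Core rather than the `db.Model` class above so that it runs without a Flask app or a database connection, and the table name `identifier` is hypothetical.

```python
import sqlalchemy as sa
from sqlalchemy.dialects import mysql
from sqlalchemy.schema import CreateTable

metadata = sa.MetaData()

# Same shape as the Identifier model above, but with an explicit
# case-sensitive (binary) collation on the unique column.
identifier = sa.Table(
    "identifier",
    metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("value", sa.String(255, collation="utf8_bin"), unique=True),
)

# Render the CREATE TABLE statement for MySQL without connecting to a server.
ddl = str(CreateTable(identifier).compile(dialect=mysql.dialect()))
print(ddl)
```

The rendered DDL should carry a `COLLATE utf8_bin` clause on the `value` column, so the unique constraint and equality lookups on that column become case-sensitive regardless of the database default.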