-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
global: new domain list feature #468
Conversation
@@ -103,6 +103,9 @@ class User(db.Model, Timestamp, UserMixin): | |||
_email = db.Column("email", db.String(255), unique=True) | |||
"""User email.""" | |||
|
|||
domain = db.Column(db.String(255), nullable=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New domain property is added on the user to allow for making a join against the domain table (needed for statistics).
@@ -463,3 +468,153 @@ def delete_by_user(cls, method, user): | |||
"""Unlink a user from an external id.""" | |||
with db.session.begin_nested(): | |||
cls.query.filter_by(id_user=user.id, method=method).delete() | |||
|
|||
|
|||
class DomainOrg(db.Model): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed for being able to do statistics on different domains. E.g. a typical case is email domains like physics.uni.org
and bio.uni.org
all linking to the same organisation.
name = db.Column(db.String(255), nullable=False) | ||
"""Name of organisation.""" | ||
|
||
json = db.Column( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will be used on zenodo to store e.g. country_code
, in_eu
, in_cern_ms
, in_cern_ams
which can be used for statistics purposes.
|
||
id = db.Column(db.Integer(), primary_key=True, autoincrement=True) | ||
|
||
pid = db.Column(db.String(255), unique=True, nullable=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on zenodo a ROR
) | ||
"""Link to parent organisation.""" | ||
|
||
parent = db.relationship("DomainOrg", remote_side=[id]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allows for creating hierarchy of organisations - useful again for statistics (on Zenodo the ROR hierarchy)
return obj | ||
|
||
|
||
class DomainCategory(db.Model): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Label a domain with a category, on Zenodo will be used to classify domains in e.g. spammer, mail-provider, org, company.
Use to control possibility and capability of users registering with this domain. | ||
""" | ||
|
||
flagged = db.Column(db.Boolean(), default=False, nullable=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A boolean to flag a domain automatically - e.g. if we download domains from https://github.com/disposable/disposable then domains can be flagged for further inspection by an admin. This is different from status which is used directly to control user registration from the domain.
category = db.Column(db.Integer(), db.ForeignKey(DomainCategory.id), nullable=True) | ||
"""Category of domain.""" | ||
|
||
num_users = db.Column(db.Integer(), default=0, nullable=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be computed on say daily/weekly basis (not part of PR). Needed for easy analysis of domains and detecting domains with high number of blocked users.
@@ -216,3 +239,33 @@ def validate_username(username): | |||
# text explaining the validation rules. | |||
message = current_app.config["ACCOUNTS_USERNAME_RULES_TEXT"] | |||
raise ValueError(message) | |||
|
|||
|
|||
def validate_domain_form(form, field): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be needed in user-profiles to check the domain when a user want to change their email address.
@@ -44,6 +45,16 @@ def __init__(self, *args, **kwargs): | |||
if not self.next.data: | |||
self.next.data = request.args.get("next", "") | |||
|
|||
def validate(self, extra_validators=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During registration we check the domain. We only check when the registration forms require email confirmation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Added a few comments as food for thoughts, no need to answer, feel free to merge anytime.
|
||
__tablename__ = "accounts_domain_org" | ||
|
||
id = db.Column(db.Integer(), primary_key=True, autoincrement=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you ok with autoincrement
ids, or better going for UUIDs by default? :)
|
||
__tablename__ = "accounts_domains" | ||
|
||
id = db.Column(db.Integer(), primary_key=True, autoincrement=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here about the autoincrement
flagged = db.Column(db.Boolean(), default=False, nullable=False) | ||
"""Flag domain - used by automatic processes to flag domain.""" | ||
|
||
flagged_source = db.Column(db.String(255), default="", nullable=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that you have String(255), this field is more a keyword rather than a free text right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
prefix, domain = email.split("@", 1) | ||
prefix = prefix.lower().strip() | ||
domain = domain.lower().strip() | ||
if domain[-1] == ".": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess because [email protected].
is a valid email address, but we want to "normalize" the domain.
# Domain is inserted on domain list when a user confirms their email. | ||
domain = datastore.find_domain(user.domain) | ||
if domain is None: | ||
domain = datastore.create_domain(user.domain) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we are doing a session.add() here, is a db.session.commit
always happening after this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, checked. user_confirmed
signal is sent from the confirm_user()
function that's called here, which will do the commit:
return cls.query.filter_by(label=label).one_or_none() | ||
|
||
|
||
class Domain(db.Model, Timestamp): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to add __versioned__ = {"versioning": False}
to this one given the various num_*
columns updated daily by the cron?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a plain model, so it doesnt' have any versioning on by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 2 minor comments
invenio_accounts/datastore.py
Outdated
def create_domain(self, domain, **kwargs): | ||
"""Create a new domain.""" | ||
domain = Domain.create(domain, **kwargs) | ||
self.put(domain) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: this eventually calls self.mark_changed(..., uid=domain.id)
, probably just callingself.db.session.add(domain)
is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, Domain.create()
already does db.session.add()
so line can be removed.
prefix, domain = email.split("@", 1) | ||
prefix = prefix.lower().strip() | ||
domain = domain.lower().strip() | ||
if domain[-1] == ".": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess because [email protected].
is a valid email address, but we want to "normalize" the domain.
invenio_accounts/utils.py
Outdated
|
||
def split_emailaddr(email): | ||
"""Split email address in prefix and domain.""" | ||
prefix, domain = email.split("@", 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefix, domain = email.split("@", 1) | |
prefix, domain = email.rsplit("@", 1) |
To avoid bypassing with something like "user@foo"@gmail.com
, which would return foo"@gmail.com
as the domain (see this SO answer)
* New domain list feature that can be used to block email domains from registering, and automatically verifying users from other domains.
New domain list feature that is used to:
Checklist
Ticks in all boxes and 🟢 on all GitHub actions status checks are required to merge: