Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Egork/flake8 #171

Open
wants to merge 34 commits into
base: main
Choose a base branch
from
Open

Egork/flake8 #171

wants to merge 34 commits into from

Conversation

egork520
Copy link
Contributor

Refactored the code to follow PEP8 style guide

Adding flake8 command to CICD pipeline

egork520 and others added 30 commits October 19, 2022 16:29
Copy link
Member

@soldni soldni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I am a little bit scared of such a giant PR touching so many files. Many of the changes make the code a lot less readable than before, and it would be good to manually refactor where necessary.

Some questions:

1. Is this a flavor of PEP8 we like?

Some non-standard things I noticed:

  1. 120 chars lines
  2. ok with mix of single and double quotes
  3. non-multiple indentation allowed
  4. new lines on binary operators

As an example, I went with the following in smashed:

[tool.black]
line-length = 79
include = '\.pyi?$'
exclude = '''
(
      __pycache__
    | \.git
    | \.mypy_cache
    | \.pytest_cache
    | \.vscode
    | \.venv
    | \bdist\b
    | \bdoc\b
)
'''

[tool.isort]
profile = "black"
line_length = 79
multi_line_output = 3

[tool.autopep8]
max_line_length = 79
in-place = true
recursive = true
aggressive = 3

[tool.mypy]
python_version = 3.8
ignore_missing_imports = true
no_site_packages = true
allow_redefinition = false

[tool.mypy-tests]
strict_optional = false

[tool.flake8]
exclude = [
    ".venv/",
    "tmp/"
]
per-file-ignores = [
    '*.py:E203',
    '__init__.py:F401',
    '*.pyi:E302,E305'
]

(Note that the pyproject.toml above requires flake8-pyi and Flake8-pyproject to run properly)

We could also align with other AI2 projects are using, e.g. Tango's.

2. we should probably adopt some auto-formatting tools we run before release

Again, in smashed I have a combo of flake8, isort, and autopep8 I ask contributors to run black . && flake8 . &&o isort .

3. Should we adopt mypy too?

Probably not immediately? But again, other projects use it.

4. Some of the automatic refactor kills semantics of comments

I left a note in a couple of places where this happens.

@@ -288,7 +288,7 @@ def _simple_line_detection(
Adapted from https://github.com/allenai/VILA/blob/e6d16afbd1832f44a430074855fbb4c3d3604f4a/src/vila/pdftools/pdfplumber_extractor.py#L24

Modified Oct 2022 (kylel): Changed return value to be List[int]
"""
""" # noqa
Copy link
Member

@soldni soldni Oct 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#noqa is a blanket ignore, we should use specific errors ignore instead.

@@ -9,7 +8,7 @@
class BasePredictor:

###################################################################
##################### Necessary Model Variables ###################
# Necessary Model Variables #
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shoudn't refactor this automatically

Comment on lines +129 to +132
if next_row_first_token_text[-len(plural_suffix):] == plural_suffix:
next_row_first_token_text = next_row_first_token_text[
: -len(plural_suffix)
]
: -len(plural_suffix)
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not very legible

Comment on lines +99 to +105
# input string list: [' Anon ', '1934', ' ', 'University and Educational Intelligence', ' ', 'Nature',
# ' ', '133', ' ', '805–805']
# tokenization removes empty string: ['[CLS]', 'an', '##on', '1934', 'university', 'and',
# 'educational',
# 'intelligence', 'nature', '133', '80', '##5', '–', '80', '##5', '[SEP]']
# skipping empty string results in skipping word id: [None, 0, 0, 1, 3, 3, 3, 3, 5, 7, 9, 9, 9, 9,
# 9, None]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is now very hard to read.

Comment on lines +6 to +7
from mmda.predictors.hf_predictors.bibentry_predictor.types import (BibEntryPredictionWithSpan,
BibEntryStructureSpanGroups)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are a ton of non-multiple-of-4 indentation added by this PR–are we ok with them?

from tqdm import tqdm
import layoutparser as lp

from mmda.types import Document, Box, BoxGroup, Metadata
from mmda.types.names import *
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc, the star import was intentional here? or maybe it was intentional somewhere else.


from PIL.Image import Image
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explicitly importing PIL instead of type annotations via PIL.Image might cause import errors if layoutparser dependencies are not installed. Please check on a minimal installation.

mmda/types/old/boundingbox.old.py
per-file-ignores =

max-line-length = 119
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Why 119 vs 79 vs something else?
  2. Why is ai2_internal, tests, and examples not checked?

@egork520
Copy link
Contributor Author

In general, I am a little bit scared of such a giant PR touching so many files. Many of the changes make the code a lot less readable than before, and it would be good to manually refactor where necessary.

I agree and should probably asked before opening. Mix of styles in different parts of the code does not please my eyes.

Some questions:

1. Is this a flavor of PEP8 we like?

Some non-standard things I noticed:

  1. 120 chars lines
    It is up for discussion, the resolution of the screens I personally prefer longer lines.
  1. ok with mix of single and double quotes
    Personally I am used to single quotes unless double is needed. It is up for discussion.
  1. non-multiple indentation allowed
  2. new lines on binary operators

As an example, I went with the following in smashed:

[tool.black]
line-length = 79
include = '\.pyi?$'
exclude = '''
(
      __pycache__
    | \.git
    | \.mypy_cache
    | \.pytest_cache
    | \.vscode
    | \.venv
    | \bdist\b
    | \bdoc\b
)
'''

[tool.isort]
profile = "black"
line_length = 79
multi_line_output = 3

[tool.autopep8]
max_line_length = 79
in-place = true
recursive = true
aggressive = 3

[tool.mypy]
python_version = 3.8
ignore_missing_imports = true
no_site_packages = true
allow_redefinition = false

[tool.mypy-tests]
strict_optional = false

[tool.flake8]
exclude = [
    ".venv/",
    "tmp/"
]
per-file-ignores = [
    '*.py:E203',
    '__init__.py:F401',
    '*.pyi:E302,E305'
]

(Note that the pyproject.toml above requires flake8-pyi and Flake8-pyproject to run properly)

We could also align with other AI2 projects are using, e.g. Tango's.

Agree, might be worth formalizing style requirements for s2? Can even be part of 3 year vision plan (sharpen the saw)

2. we should probably adopt some auto-formatting tools we run before release

Again, in smashed I have a combo of flake8, isort, and autopep8 I ask contributors to run black . && flake8 . &&o isort .

3. Should we adopt mypy too?

Probably not immediately? But again, other projects use it.

4. Some of the automatic refactor kills semantics of comments

I left a note in a couple of places where this happens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants