This repository was archived by the owner on Jan 7, 2024. It is now read-only.

Conversation


@sourcery-ai sourcery-ai bot commented Nov 8, 2022

Branch master refactored by Sourcery.

If you're happy with these changes, merge this Pull Request using the Squash and merge strategy.

See our documentation here.

Run Sourcery locally

Reduce the feedback loop during development by using the Sourcery editor plugin.

Review changes via command line

To manually merge these changes, make sure you're on the master branch, then run:

```sh
git fetch origin sourcery/master
git merge --ff-only FETCH_HEAD
git reset HEAD^
```
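For reference: `git fetch` retrieves the Sourcery branch, `git merge --ff-only` fast-forwards your local master onto it, and `git reset HEAD^` then un-commits that merge while keeping all of the refactored files in your working tree for review (assuming, as here, the Sourcery branch adds a single commit).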

Help us improve this pull request!

@sourcery-ai sourcery-ai bot requested a review from jgoodson November 8, 2022 17:37
```diff
             return line.split(delim)[1]
-    else:
-        raise RuntimeError("Unable to find version string.")
+    raise RuntimeError("Unable to find version string.")
```

Function get_version refactored with the following changes:
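For context, a plausible reconstruction of the function after this change (the full body is not shown in the hunk, so the file-scanning loop below is an assumption modeled on the common setup.py pattern):

```python
def get_version(rel_path: str, delim: str = '"') -> str:
    # Hypothetical reconstruction: scan a source file for its __version__ line.
    with open(rel_path) as f:
        for line in f:
            if line.startswith('__version__'):
                return line.split(delim)[1]
    # After the refactor the raise sits at function level rather than in an
    # else branch: falling out of the loop means no version line was found.
    raise RuntimeError("Unable to find version string.")
```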


```diff
 for _, model, _ in pkgutil.iter_modules([str(Path(__file__).parent / 'models')]):
-    imported_module = importlib.import_module('.models.' + model, package=__name__)
+    imported_module = importlib.import_module(f'.models.{model}', package=__name__)
```

Lines 13-13 refactored with the following changes:

```diff
         data_path = Path(data_path)
         data_file = f'refseq/maps{max_seq_len}/refseq_{split}.lmdb'
-        refseq_file = f'refseq/refseq.lmdb'
+        refseq_file = 'refseq/refseq.lmdb'
```

Function GeCMaskedReconstructionDataset.__init__ refactored with the following changes:

Comment on lines -230 to -233
```diff
-            else:
-                # 10% chance to keep current representation
-                pass
```

Function GeCMaskedReconstructionDataset._apply_pseudobert_mask refactored with the following changes:

This removes the following comments (why?):

`# 10% chance to keep current representation`

Comment on lines -402 to -405
```diff
-            else:
-                # 10% chance to keep current token
-                pass
```

Function ProteinMaskedLanguageModelingDataset._apply_bert_mask refactored with the following changes:

This removes the following comments (why?):

`# 10% chance to keep current token`
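Both hunks drop the explanatory comment on the do-nothing branch of the standard BERT 80/10/10 masking scheme. A minimal sketch of that scheme (simplified; the real dataset methods also build prediction labels and skip special tokens, which this omits):

```python
import random

def apply_bert_mask(tokens, mask_token='<mask>', vocab=('A', 'C', 'G', 'T')):
    """Simplified 80/10/10 BERT-style masking (a sketch, not the PR's code)."""
    masked = list(tokens)
    for i in range(len(masked)):
        if random.random() >= 0.15:  # only ~15% of positions are corrupted
            continue
        roll = random.random()
        if roll < 0.8:
            masked[i] = mask_token            # 80%: replace with the mask token
        elif roll < 0.9:
            masked[i] = random.choice(vocab)  # 10%: replace with a random token
        # else: 10% chance to keep the current token -- the branch whose
        # explanatory comment the refactoring deleted
    return masked
```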

Comment on lines -167 to +182
logger.error("Couldn't reach server at '{}' to download pretrained model "
"configuration file.".format(config_file))
logger.error(
f"Couldn't reach server at '{config_file}' to download pretrained model configuration file."
)

else:
logger.error(
"Model name '{}' was not found in model name list ({}). "
"We assumed '{}' was a path or url but couldn't find any file "
"associated to this path or url.".format(
pretrained_model_name_or_path,
', '.join(cls.pretrained_config_archive_map.keys()),
config_file))
f"Model name '{pretrained_model_name_or_path}' was not found in model name list ({', '.join(cls.pretrained_config_archive_map.keys())}). We assumed '{config_file}' was a path or url but couldn't find any file associated to this path or url."
)

raise
if resolved_config_file == config_file:
logger.info("loading configuration file {}".format(config_file))
logger.info(f"loading configuration file {config_file}")
else:
logger.info("loading configuration file {} from cache at {}".format(
config_file, resolved_config_file))
logger.info(
f"loading configuration file {config_file} from cache at {resolved_config_file}"
)


Function BioConfig.from_pretrained refactored with the following changes:

"""Serializes this instance to a Python dictionary."""
output = copy.deepcopy(self.__dict__)
return output
return copy.deepcopy(self.__dict__)

Function BioConfig.to_dict refactored with the following changes:

Comment on lines -49 to +51
"Parameter config in `{}(config)` should be an instance of class "
"`BioConfig`. To create a model from a pretrained model use "
"`model = {}.from_pretrained(PRETRAINED_MODEL_NAME)`".format(
self.__class__.__name__, self.__class__.__name__
))
f"Parameter config in `{self.__class__.__name__}(config)` should be an instance of class `BioConfig`. To create a model from a pretrained model use `model = {self.__class__.__name__}.from_pretrained(PRETRAINED_MODEL_NAME)`"
)


Function BioModel.__init__ refactored with the following changes:

Comment on lines -67 to +80
```diff
-                p for n, p in param_optimizer if not any(nd in n for nd in no_decay)
+                p
+                for n, p in param_optimizer
+                if all(nd not in n for nd in no_decay)
             ],
             "weight_decay": 0.01,
         },
         {
             "params": [
-                p for n, p in param_optimizer if any(nd in n for nd in no_decay)
+                p
+                for n, p in param_optimizer
+                if any(nd in n for nd in no_decay)
             ],
             "weight_decay": 0.0,
         },
     ]
```


Function BioModel.configure_optimizers refactored with the following changes:
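Note that `all(nd not in n ...)` is just De Morgan's rewrite of the original `not any(nd in n ...)`, so behavior is unchanged. The comprehensions implement the usual transformers-style split between parameters that receive weight decay and those (biases, norm weights) that do not. A self-contained sketch of the pattern (the `no_decay` contents here are an assumption based on common practice, not shown in this diff):

```python
import torch

model = torch.nn.Linear(4, 2)  # stand-in for the real BioModel
no_decay = ('bias', 'LayerNorm.weight')
param_optimizer = list(model.named_parameters())
optimizer_grouped_parameters = [
    {   # parameters that should be regularized
        "params": [p for n, p in param_optimizer
                   if all(nd not in n for nd in no_decay)],
        "weight_decay": 0.01,
    },
    {   # biases / norm weights: no weight decay
        "params": [p for n, p in param_optimizer
                   if any(nd in n for nd in no_decay)],
        "weight_decay": 0.0,
    },
]
optimizer = torch.optim.AdamW(optimizer_grouped_parameters, lr=1e-4)
```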

```diff
@@ -1,5 +1,6 @@
 """PyTorch BERT model. """
```


Lines 12-15 refactored with the following changes:

```diff
-            self.block_sizes = block_sizes
-        else:
-            self.block_sizes = [num_hidden_layers // 3] * 3
+        self.block_sizes = block_sizes or [num_hidden_layers // 3] * 3
```

Function BioFunnelConfig.__init__ refactored with the following changes:
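The `or` form is equivalent as long as the original condition tested truthiness. One general caveat (ours, not Sourcery's): `or` falls back on any falsy value, so an explicitly passed empty list would also be replaced by the default; use an `is None` check if that distinction matters.

```python
def pick_block_sizes(block_sizes, num_hidden_layers=12):
    # truthiness fallback, as in the refactored code
    return block_sizes or [num_hidden_layers // 3] * 3

print(pick_block_sizes(None))       # [4, 4, 4]
print(pick_block_sizes([]))         # [4, 4, 4] -- empty list also replaced
print(pick_block_sizes([2, 4, 6]))  # [2, 4, 6]
```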

Comment on lines -65 to +69
```diff
-            archive_file = cls.pretrained_model_archive_map[pretrained_model_name_or_path]
+            return cls.pretrained_model_archive_map[pretrained_model_name_or_path]
         elif os.path.isdir(pretrained_model_name_or_path):
-            archive_file = os.path.join(pretrained_model_name_or_path, WEIGHTS_NAME)
+            return os.path.join(pretrained_model_name_or_path, WEIGHTS_NAME)
         else:
-            archive_file = pretrained_model_name_or_path
-        return archive_file
+            return pretrained_model_name_or_path
```

Function TAPEModelMixin._get_model refactored with the following changes:

Comment on lines -74 to +73
```diff
-        new_keys = {}
-        for key in state_dict.keys():
-            new_keys[key] = cls._rewrite_module_name(key)
+        new_keys = {key: cls._rewrite_module_name(key) for key in state_dict}
```

Function TAPEModelMixin._rewrite_state_dict refactored with the following changes:

Comment on lines -246 to +256
```diff
         if not hasattr(model, cls.base_model_prefix) and \
-                any(s.startswith(cls.base_model_prefix) for s in state_dict.keys()):
-            start_prefix = cls.base_model_prefix + '.'
+                any(s.startswith(cls.base_model_prefix) for s in state_dict.keys()):
+            start_prefix = f'{cls.base_model_prefix}.'
         if hasattr(model, cls.base_model_prefix) and \
-                not any(s.startswith(cls.base_model_prefix) for s in state_dict.keys()):
+                not any(s.startswith(cls.base_model_prefix) for s in state_dict.keys()):
             model_to_load = getattr(model, cls.base_model_prefix)

         load(model_to_load, prefix=start_prefix)
-        if len(missing_keys) > 0:
+        if missing_keys:
             logger.info("Weights of {} not initialized from pretrained model: {}".format(
                 model.__class__.__name__, missing_keys))
-        if len(unexpected_keys) > 0:
+        if unexpected_keys:
             logger.info("Weights from pretrained model not used in {}: {}".format(
                 model.__class__.__name__, unexpected_keys))
-        if len(error_msgs) > 0:
+        if error_msgs:
```

Function TAPEModelMixin.from_pretrained refactored with the following changes:
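The `len(x) > 0` to `if x:` rewrites rely on the implicit truthiness of Python sequences (an empty list is falsy), which is also the style PEP 8 recommends; behavior is unchanged.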

Comment on lines -46 to +51
```diff
-            [T5Block(config, has_relative_attention_bias=bool(i == 0)) for i in range(config.num_layers)]
+            [
+                T5Block(config, has_relative_attention_bias=i == 0)
+                for i in range(config.num_layers)
+            ]
         )
```


Function T5Stack.__init__ refactored with the following changes:
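Dropping the `bool(...)` wrapper is safe: `i == 0` already evaluates to a bool, so the wrapper was a no-op.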

```diff
-        outputs = sequence_logits
-
-        return outputs
+        return self.classify(sequence_output)
```

Function SequenceToSequenceClassificationHead.forward refactored with the following changes:

```diff
-        logits = self.classify(pooled_output)
-
-        return logits
+        return self.classify(pooled_output)
```

Function SequenceClassificationHead.forward refactored with the following changes:

Comment on lines -134 to -140
```diff
-        loader = DataLoader(
+        return DataLoader(
             dataset,
             num_workers=self.num_workers,
             collate_fn=dataset.collate_fn,
             batch_sampler=batch_sampler,
         )
-        return loader
```

Function BioDataModule._prep_loader refactored with the following changes:

```diff
         strands = torch.ones(shape[:-1], dtype=torch.long)
         if lengths:
-            lengths = torch.ones(shape[:-1], dtype=torch.long) * lengths
+            lengths *= torch.ones(shape[:-1], dtype=torch.long)
```

Function TestGeCBertRaw.simpleForwardZeros refactored with the following changes:

  • Replace assignment with augmented assignment (aug-assign)
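One subtlety worth flagging (our note, and assuming `lengths` arrives as a plain scalar, as the original right-multiplication suggests): a Python int has no in-place multiply, so `*=` falls back to `lengths = lengths * tensor` and both spellings rebind `lengths` to the same broadcasted tensor.

```python
import torch

shape = (2, 8, 16)
lengths = 4  # assumption: a scalar argument, as the original code implies
lengths *= torch.ones(shape[:-1], dtype=torch.long)
# `int` has no __imul__, so `*=` degrades to plain multiplication,
# yielding a (2, 8) LongTensor filled with 4s in both spellings
assert lengths.shape == (2, 8) and bool((lengths == 4).all())
```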

Comment on lines -89 to +91
```diff
-        for batch in SubsetRandomSampler(
-                list(BatchSampler(sorted_sampler, self.batch_size, self.drop_last))):
-            yield batch
+        yield from SubsetRandomSampler(
+            list(BatchSampler(sorted_sampler, self.batch_size, self.drop_last))
+        )
```

Function BucketBatchSampler.__iter__ refactored with the following changes:

  • Replace yield inside for loop with yield from (yield-from)
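`yield from` delegates iteration to the sampler directly instead of re-yielding each batch in a loop; the two forms are interchangeable here. A minimal standalone illustration (generic iterables stand in for the torch samplers):

```python
def batches_loop(sampler):
    for batch in sampler:   # original form
        yield batch

def batches_delegate(sampler):
    yield from sampler      # refactored form

data = [[0, 1], [2, 3], [4]]
assert list(batches_loop(data)) == list(batches_delegate(data))
```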


sourcery-ai bot commented Nov 8, 2022

Sourcery Code Quality Report

✅  Merging this PR will increase code quality in the affected files by 0.19%.

| Quality metrics | Before | After | Change |
|-----------------|--------|-------|--------|
| Complexity | 2.93 ⭐ | 2.77 ⭐ | -0.16 👍 |
| Method Length | 56.03 ⭐ | 55.70 ⭐ | -0.33 👍 |
| Working memory | 6.78 🙂 | 6.79 🙂 | 0.01 👎 |
| Quality | 75.06% | 75.25% | 0.19% 👍 |

| Other metrics | Before | After | Change |
|---------------|--------|-------|--------|
| Lines | 2927 | 2907 | -20 |
| Changed files | Quality Before | Quality After | Quality Change |
|---------------|----------------|---------------|----------------|
| setup.py | 79.44% ⭐ | 79.98% ⭐ | 0.54% 👍 |
| tragec/__init__.py | 62.67% 🙂 | 62.53% 🙂 | -0.14% 👎 |
| tragec/datasets.py | 77.35% ⭐ | 77.57% ⭐ | 0.22% 👍 |
| tragec/registry.py | 72.47% 🙂 | 77.30% ⭐ | 4.83% 👍 |
| tragec/tokenizers.py | 90.98% ⭐ | 91.00% ⭐ | 0.02% 👍 |
| tragec/training.py | 58.65% 🙂 | 58.77% 🙂 | 0.12% 👍 |
| tragec/models/configuration.py | 67.03% 🙂 | 64.87% 🙂 | -2.16% 👎 |
| tragec/models/modeling.py | 64.33% 🙂 | 64.26% 🙂 | -0.07% 👎 |
| tragec/models/models_bert.py | 93.01% ⭐ | 92.93% ⭐ | -0.08% 👎 |
| tragec/models/models_funnel.py | 89.20% ⭐ | 90.61% ⭐ | 1.41% 👍 |
| tragec/models/tape_model.py | 44.17% 😞 | 43.16% 😞 | -1.01% 👎 |
| tragec/models/utils_t5.py | 71.16% 🙂 | 71.74% 🙂 | 0.58% 👍 |
| tragec/tasks/task_mlm.py | 82.26% ⭐ | 82.28% ⭐ | 0.02% 👍 |
| tragec/tasks/task_mrm.py | 72.48% 🙂 | 72.42% 🙂 | -0.06% 👎 |
| tragec/tasks/task_multiclass.py | 75.37% ⭐ | 75.42% ⭐ | 0.05% 👍 |
| tragec/tasks/task_pairwisecontact.py | 76.71% ⭐ | 76.72% ⭐ | 0.01% 👍 |
| tragec/tasks/task_seq2seqclass.py | 84.08% ⭐ | 83.79% ⭐ | -0.29% 👎 |
| tragec/tasks/task_singleclass.py | 85.02% ⭐ | 84.91% ⭐ | -0.11% 👎 |
| tragec/tasks/tasks.py | 81.54% ⭐ | 81.77% ⭐ | 0.23% 👍 |
| tragec/test/test_model.py | 88.10% ⭐ | 88.12% ⭐ | 0.02% 👍 |
| tragec/utils/_sampler.py | 85.54% ⭐ | 85.80% ⭐ | 0.26% 👍 |

Here are some functions in these files that still need a tune-up:

| File | Function | Complexity | Length | Working Memory | Quality | Recommendation |
|------|----------|------------|--------|----------------|---------|----------------|
| tragec/models/tape_model.py | TAPEModelMixin.from_pretrained | 29 😞 | 404 ⛔ | | 16.84% ⛔ | Refactor to reduce nesting. Try splitting into smaller methods |
| tragec/models/modeling.py | BioModel.configure_optimizers | 15 🙂 | 243 ⛔ | 13 😞 | 36.38% 😞 | Try splitting into smaller methods. Extract out complex expressions |
| tragec/models/configuration.py | BioConfig.__init__ | 1 ⭐ | 212 ⛔ | 36 ⛔ | 40.48% 😞 | Try splitting into smaller methods. Extract out complex expressions |
| tragec/training.py | run_train | 5 ⭐ | 195 😞 | 16 ⛔ | 44.83% 😞 | Try splitting into smaller methods. Extract out complex expressions |
| tragec/training.py | process_trainer_kwargs | 12 🙂 | 225 ⛔ | 8 🙂 | 49.45% 😞 | Try splitting into smaller methods |

Legend and Explanation

The emojis denote the absolute quality of the code:

  • ⭐ excellent
  • 🙂 good
  • 😞 poor
  • ⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.


Please see our documentation here for details on how these metrics are calculated.

We are actively working on this report - lots more documentation and extra metrics to come!

Help us improve this quality report!
