Releases: Lightning-AI/pytorch-lightning

Pruning & Quantization & SWA

18 Feb 23:04
3645eb1

[1.2.0] - 2021-02-18

Added

  • Added DataType, AverageMethod and MDMCAverageMethod enum in metrics (#5657)
  • Added support for summarized model total params size in megabytes (#5590)
  • Added support for multiple train loaders (#1959)
  • Accuracy metric now generalizes to Top-k accuracy for (multi-dimensional) multi-class inputs using the top_k parameter (#4838)
  • Accuracy metric now enables the computation of subset accuracy for multi-label or multi-dimensional multi-class inputs with the subset_accuracy parameter (#4838)
  • Added HammingDistance metric to compute the hamming distance (loss) (#4838)
  • Added max_fpr parameter to auroc metric for computing partial auroc metric (#3790)
  • Added StatScores metric to compute the number of true positives, false positives, true negatives and false negatives (#4839)
  • Added R2Score metric (#5241)
  • Added LambdaCallback (#5347)
  • Added BackboneLambdaFinetuningCallback (#5377)
  • Accelerator all_gather supports collection (#5221)
  • Added image_gradients functional metric to compute the image gradients of a given input image. (#5056)
  • Added MetricCollection (#4318)
  • Added .clone() method to metrics (#4318)
  • Added IoU class interface (#4704)
  • Support to tie weights after moving model to TPU via on_post_move_to_device hook
  • Added missing val/test hooks in LightningModule (#5467)
  • The Recall and Precision metrics (and their functional counterparts recall and precision) can now be generalized to Recall@K and Precision@K with the use of top_k parameter (#4842)
  • Added ModelPruning Callback (#5618, #5825, #6045)
  • Added PyTorchProfiler (#5560)
  • Added compositional metrics (#5464)
  • Added Trainer method predict(...) for high performance predictions (#5579)
  • Added on_before_batch_transfer and on_after_batch_transfer data hooks (#3671)
  • Added AUC/AUROC class interface (#5479)
  • Added PredictLoop object (#5752)
  • Added QuantizationAwareTraining callback (#5706, #6040)
  • Added LightningModule.configure_callbacks to enable the definition of model-specific callbacks (#5621)
  • Added dim to PSNR metric for mean-squared-error reduction (#5957)
  • Added proximal policy optimization template to pl_examples (#5394)
  • Added log_graph to CometLogger (#5295)
  • Added possibility for nested loaders (#5404)
  • Added sync_step to Wandb logger (#5351)
  • Added StochasticWeightAveraging callback (#5640)
  • Added LightningDataModule.from_datasets(...) (#5133)
  • Added PL_TORCH_DISTRIBUTED_BACKEND env variable to select backend (#5981)
  • Added Trainer flag to activate Stochastic Weight Averaging (SWA) Trainer(stochastic_weight_avg=True) (#6038)
  • Added DeepSpeed integration (#5954, #6042)
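
The Stochastic Weight Averaging (SWA) entries above refer to averaging model weights collected at several points during training. As a rough illustration of the idea only, here is a minimal pure-Python sketch of an equal-weight running average. The function name `update_swa` and the snapshot values are hypothetical, and this is not a description of how the StochasticWeightAveraging callback is implemented.

```python
from typing import List


def update_swa(avg_weights: List[float], new_weights: List[float], n_averaged: int) -> List[float]:
    """Fold one more weight snapshot into an equal-weight running average.

    `n_averaged` is the number of snapshots already included in `avg_weights`.
    """
    return [
        avg + (new - avg) / (n_averaged + 1)
        for avg, new in zip(avg_weights, new_weights)
    ]


# Three hypothetical weight snapshots, e.g. taken at the end of successive epochs.
snapshots = [[1.0, 4.0], [2.0, 6.0], [3.0, 8.0]]

# Start from the first snapshot, then fold in the remaining ones.
swa_weights = list(snapshots[0])
for n, weights in enumerate(snapshots[1:], start=1):
    swa_weights = update_swa(swa_weights, weights, n_averaged=n)

print(swa_weights)
```

The incremental form avoids storing every snapshot: only the current average and a counter are kept.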

Changed

  • stat_scores metric now calculates stat scores over all classes and gains new parameters, in line with the new StatScores metric (#4839)
  • Changed computer_vision_fine_tunning example to use BackboneLambdaFinetuningCallback (#5377)
  • Changed automatic casting for LoggerConnector metrics (#5218)
  • Changed iou [func] to allow float input (#4704)
  • Metric compute() method will no longer automatically call reset() (#5409)
  • Set PyTorch 1.4 as min requirements, also for testing and examples torchvision>=0.5 and torchtext>=0.5 (#5418)
  • Changed callbacks argument in Trainer to allow Callback input (#5446)
  • Changed the default of find_unused_parameters to False in DDP (#5185)
  • Changed ModelCheckpoint version suffixes to start at 1 (#5008)
  • Progress bar metrics tensors are now converted to float (#5692)
  • Changed the default value for the progress_bar_refresh_rate Trainer argument in Google COLAB notebooks to 20 (#5516)
  • Extended support for purely iteration-based training (#5726)
  • Made LightningModule.global_rank, LightningModule.local_rank and LightningModule.logger read-only properties (#5730)
  • Forced ModelCheckpoint callbacks to run after all others to guarantee all states are saved to the checkpoint (#5731)
  • Refactored Accelerators and Plugins (#5743)
    • Added base classes for plugins (#5715)
    • Added parallel plugins for DP, DDP, DDPSpawn, DDP2 and Horovod (#5714)
    • Precision Plugins (#5718)
    • Added new Accelerators for CPU, GPU and TPU (#5719)
    • Added Plugins for TPU training (#5719)
    • Added RPC and Sharded plugins (#5732)
    • Added missing LightningModule-wrapper logic to new plugins and accelerator (#5734)
    • Moved device-specific teardown logic from training loop to accelerator (#5973)
    • Moved accelerator_connector.py to the connectors subfolder (#6033)
    • Trainer only references accelerator (#6039)
    • Made parallel devices optional across all plugins (#6051)
    • Cleaning (#5948, #5949, #5950)
  • Enabled self.log in callbacks (#5094)
  • Renamed xxx_AVAILABLE as protected (#5082)
  • Unified module names in Utils (#5199)
  • Separated utils: imports & enums (#5256, #5874)
  • Refactor: clean trainer device & distributed getters (#5300)
  • Simplified training phase as LightningEnum (#5419)
  • Updated metrics to use LightningEnum (#5689)
  • Changed the order of the on_train_batch_end, on_batch_end & on_train_epoch_end, on_epoch_end hooks (#5688)
  • Refactored setup_training and remove test_mode (#5388)
  • Disabled training with zero num_training_batches when insufficient limit_train_batches (#5703)
  • Refactored EpochResultStore (#5522)
  • Update lr_finder to check for attribute if not running fast_dev_run (#5990)
  • LightningOptimizer manual optimizer is more flexible and exposes toggle_model (#5771)
  • MlflowLogger now limits parameter value length to 250 characters (#5893)
  • Re-introduced fix for Hydra directory sync with multiple processes (#5993)
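
One behavioural change in the list above is that a metric's compute() no longer calls reset() automatically. The sketch below shows what that means for calling code, using a hypothetical `RunningMean` class written for illustration (it is not part of the library): state accumulates across compute() calls until reset() is called explicitly.

```python
class RunningMean:
    """Hypothetical metric following an update / compute / reset lifecycle."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # Clear all accumulated state.
        self.total = 0.0
        self.count = 0

    def update(self, value: float) -> None:
        # Accumulate one observation.
        self.total += value
        self.count += 1

    def compute(self) -> float:
        # Deliberately does NOT reset: the accumulated state is kept.
        return self.total / self.count if self.count else 0.0


metric = RunningMean()
metric.update(2.0)
metric.update(4.0)
first = metric.compute()

# Because compute() kept the state, this update extends the same running mean.
metric.update(6.0)
second = metric.compute()

# Starting a fresh accumulation is now an explicit step.
metric.reset()
after_reset = metric.compute()

print(first, second, after_reset)
```

Code that relied on the old implicit reset would need an explicit reset() call wherever a new accumulation window should begin.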

Deprecated

  • Function stat_scores_multiple_classes is deprecated in favor of stat_scores (#4839)
  • Moved accelerators and plugins to their legacy package (#5645)
  • Deprecated LightningDistributedDataParallel in favor of new wrapper module LightningDistributedModule (#5185)
  • Deprecated LightningDataParallel in favor of new wrapper module LightningParallelModule (#5670)
  • Renamed utils modules (#5199)
    • argparse_utils >> argparse
    • model_utils >> model_helpers
    • warning_utils >> warnings
    • xla_device_utils >> xla_device
  • Deprecated using 'val_loss' to set the ModelCheckpoint monitor (#6012)
  • Deprecated .get_model() in favor of the explicit .lightning_module property (#6035)
  • Deprecated Trainer attribute accelerator_backend in favor of accelerator (#6034)
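
Several of the deprecations above replace an old accessor with a new one, such as .get_model() and the .lightning_module property. A common way to keep the old name working during a deprecation period is a thin shim that warns and forwards. The following is a minimal sketch of that pattern; the `Owner` class and the warning text are hypothetical and are not taken from the library's source.

```python
import warnings


class Owner:
    """Hypothetical stand-in for an object that holds a model reference."""

    def __init__(self, module: object) -> None:
        self._module = module

    @property
    def lightning_module(self) -> object:
        # New, preferred accessor.
        return self._module

    def get_model(self) -> object:
        # Old accessor kept as a shim: warn, then forward to the new property.
        warnings.warn(
            "get_model() is deprecated, use the lightning_module property instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.lightning_module


owner = Owner(module="my-model")

# Capture warnings so the deprecation can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model = owner.get_model()

print(model, [str(w.message) for w in caught])
```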

Removed

  • Removed deprecated checkpoint argument filepath (#5321)
  • Removed deprecated Fbeta, f1_score and fbeta_score metrics (#5322)
  • Removed deprecated TrainResult (#5323)
  • Removed deprecated EvalResult (#5633)
  • Removed LoggerStages (#5673)

Fixed

  • Fixed distributed setting and ddp_cpu only with num_processes>1 (#5297)
  • Fixed the saved filename in ModelCheckpoint when it already exists (#4861)
  • Fixed DDPHPCAccelerator hangs in DDP construction by calling init_device (#5157)
  • Fixed num_workers for Windows example (#5375)
  • Fixed loading yaml (#5619)
  • Fixed support custom DataLoader with DDP if they can be re-instantiated (#5745)
  • Fixed repeated .fit() calls ignoring the max_steps iteration bound (#5936)
  • Fixed throwing MisconfigurationError on unknown mode (#5255)
  • Resolve bug with Finetuning (#5744)
  • Fixed ModelCheckpoint race condition in file existence check (#5155)
  • Fixed some compatibility with PyTorch 1.8 (#5864)
  • Fixed forward cache (#5895)
  • Fixed recursive detach of tensors to CPU (#6007)
  • Fixed passing wrong strings for scheduler interval not throwing an error (#5923)
  • Fixed wrong requires_grad state after return None with multiple optimizers (#5738)
  • Fixed missing on_epoch_end hook call at the end of validation and test epochs (#5986)
  • Fixed missing process_dataloader call for TPUSpawn when in distributed mode (#6015)
  • Fixed progress bar flickering by appending 0 to floats/strings (#6009)
  • Fixed synchronization issues with TPU training (#6027)
  • Fixed hparams.yaml saved twice when using TensorBoardLogger (#5953)
  • Fixed basic examples (#5912, #5985)
  • Fixed fairscale compatibility with PyTorch 1.8 (#5996)
  • Ensured process_dataloader is called when tpu_cores > 1 to use Parallel DataLoader (#6015)
  • Attempted SLURM auto resume call when non-shell call fails (#6002)
  • Fixed wrapping optimizers upon assignment (#6006)
  • Fixed allowing hashing of metrics with lists in their state (#5939)
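
One of the fixes above concerns invalid strings for the scheduler interval being silently accepted. The sketch below illustrates the general kind of check involved. It assumes that the accepted values are "epoch" and "step" and that "epoch" is the default. The helper name `validate_scheduler_config` is hypothetical, and a plain ValueError is used here in place of whatever exception the library raises.

```python
# Assumption: these are the only accepted interval values.
VALID_INTERVALS = ("epoch", "step")


def validate_scheduler_config(config: dict) -> dict:
    """Return a copy of the config with a validated 'interval' entry."""
    interval = config.get("interval", "epoch")  # assumed default
    if interval not in VALID_INTERVALS:
        raise ValueError(
            f"Invalid scheduler interval {interval!r}; expected one of {VALID_INTERVALS}"
        )
    return {**config, "interval": interval}


# A valid configuration passes through unchanged.
print(validate_scheduler_config({"interval": "step"}))

# A typo such as "steps" is rejected instead of being silently ignored.
try:
    validate_scheduler_config({"interval": "steps"})
except ValueError as err:
    print(f"rejected: {err}")
```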

Contributors

@alanhdu, @ananthsub, @awaelchli, @Borda, @borisdayma, @carmocca, @ddrevicky, @deng-cy, @ducthienbui97, @justusschock, @kartik4949, @kaushikb11, @manipopopo, @marload, @neighthan, @peblair, @prampey, @pranjaldatta, @rohitgr7, @SeanNaren, @sid-sundrani, @SkafteNicki, @tadejsv, @tchaton, @teddykoker, @titu1994, @yuntai

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Standard weekly patch release

08 Feb 08:49
e429f97

[1.1.8] - 2021-02-08

Fixed

  • Separate epoch validation from step validation (#5208)
  • Fixed toggle_optimizers not handling all optimizer parameters (#5775)

Contributors

@ananthsub, @rohitgr7

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Standard weekly patch release

03 Feb 18:10
90a813f

[1.1.7] - 2021-02-03

Fixed

  • Fixed TensorBoardLogger not closing SummaryWriter on finalize (#5696)
  • Fixed filtering of pytorch "unsqueeze" warning when using DP (#5622)
  • Fixed num_classes argument in F1 metric (#5663)
  • Fixed log_dir property (#5537)
  • Fixed a race condition in ModelCheckpoint when checking if a checkpoint file exists (#5144)
  • Remove unnecessary intermediate layers in Dockerfiles (#5697)
  • Fixed auto learning rate ordering (#5638)

Contributors

@awaelchli @guillochon @noamzilo @rohitgr7 @SkafteNicki @sumanthratna

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Standard weekly patch release

26 Jan 20:49
c462b27

[1.1.6] - 2021-01-26

Changed

  • Increased TPU check timeout from 20s to 100s (#5598)
  • Ignored step param in Neptune logger's log_metric method (#5510)
  • Pass batch outputs to on_train_batch_end instead of epoch_end outputs (#4369)

Fixed

  • Fixed toggle_optimizer to reset requires_grad state (#5574)
  • Fixed FileNotFoundError for best checkpoint when using DDP with Hydra (#5629)
  • Fixed an error when logging a progress bar metric with a reserved name (#5620)
  • Fixed Metric's state_dict not being included when metrics are child modules (#5614)
  • Fixed Neptune logger creating multiple experiments when GPUs > 1 (#3256)
  • Fixed duplicate logs appearing in console when using the python logging module (#5509)
  • Fixed tensor printing in trainer.test() (#5138)
  • Fixed not using dataloader when hparams present (#4559)

Contributors

@awaelchli @bryant1410 @lezwon @manipopopo @PiotrJander @psinger @rnett @SeanNaren @swethmandava @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Standard weekly patch release

21 Jan 16:21
e1c152b

[1.1.5] - 2021-01-19

Fixed

  • Fixed a visual bug in the progress bar display initialization (#4579)
  • Fixed logging on_train_batch_end in a callback with multiple optimizers (#5521)
  • Fixed reinit_scheduler_properties with correct optimizer (#5519)
  • Fixed val_check_interval with fast_dev_run (#5540)

Contributors

@awaelchli, @carmocca, @rohitgr7

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Standard weekly patch release

12 Jan 20:34
652df18

[1.1.4] - 2021-01-12

Added

  • Add automatic optimization property setter to lightning module (#5169)

Changed

  • Changed deprecated enable_pl_optimizer=True (#5244)

Fixed

  • Fixed transfer_batch_to_device for DDP with len(devices_ids) == 1 (#5195)
  • Logging only on not should_accumulate() during training (#5417)
  • Resolve interpolation bug with Hydra (#5406)
  • Check environ before selecting a seed to prevent warning message (#4743)

Contributors

@ananthsub, @SeanNaren, @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Standard weekly patch release

06 Jan 10:17
4d9db86

[1.1.3] - 2021-01-05

Added

  • Added a check for optimizer attached to lr_scheduler (#5338)
  • Added support for passing non-existing filepaths to resume_from_checkpoint (#4402)

Changed

  • Skip restore from resume_from_checkpoint while testing (#5161)
  • Allowed log_momentum for adaptive optimizers in LearningRateMonitor (#5333)
  • Disabled checkpointing, earlystopping and logging with fast_dev_run (#5277)
  • Distributed group defaults to WORLD if None (#5125)

Fixed

  • Fixed trainer.test returning non-test metrics (#5214)
  • Fixed metric state reset (#5273)
  • Fixed --num-nodes on DDPSequentialPlugin (#5327)
  • Fixed invalid value for weights_summary (#5296)
  • Fixed Trainer.test not using the latest best_model_path (#5161)
  • Fixed existence check for hparams not using underlying filesystem (#5250)
  • Fixed LightningOptimizer AMP bug (#5191)
  • Fixed casted key to string in _flatten_dict (#5354)

Contributors

@8greg8, @haven-jeon, @kandluis, @marload, @rohitgr7, @tadejsv, @tarepan, @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Standard weekly patch release

23 Dec 09:38
5820887

Overview

Detail changes

Added

  • Support number for logging with sync_dist=True (#5080)
  • Added offset logging step when resuming for Wandb logger (#5050)

Removed

  • enable_pl_optimizer=False by default to temporarily fix AMP issues (#5163)

Fixed

  • Metric reduction with Logging (#5150)
  • Remove nan loss in manual optimization (#5121)
  • Un-balanced logging properly supported (#5119)
  • Fix hanging in DDP HPC accelerators (#5157)
  • Fix saved filename in ModelCheckpoint if it already exists (#4861)
  • Fix reset TensorRunningAccum (#5106)
  • Updated DALIClassificationLoader to not use deprecated arguments (#4925)
  • Corrected call to torch.no_grad (#5124)

Contributors

@8greg8, @ananthsub, @borisdayma, @gan3sh500, @rohitgr7, @SeanNaren, @tchaton, @VinhLoiIT

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Standard weekly patch release

15 Dec 23:32
748a74e

Overview

Detail changes

Added

  • Add a notebook example to reach a quick baseline of ~94% accuracy on CIFAR10 using Resnet in Lightning (#4818)

Changed

  • Simplify accelerator steps (#5015)
  • Refactor load in checkpoint connector (#4593)

Removed

  • Drop duplicate metrics (#5014)
  • Remove beta arg from F1 class and functional (#5076)

Fixed

  • Fixed trainer by default None in DDPAccelerator (#4915)
  • Fixed LightningOptimizer to expose optimizer attributes (#5095)
  • Do not warn when the name key is used in the lr_scheduler dict (#5057)
  • Check if optimizer supports closure (#4981)
  • Extend LightningOptimizer to expose underlying Optimizer attributes + update doc (#5095)
  • Add deprecated metric utility functions back to functional (#5067, #5068)
  • Allow any input in to_onnx and to_torchscript (#4378)

Contributors

@Borda, @carmocca, @hemildesai, @rohitgr7, @s-rog, @tarepan, @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Model Parallelism Training and More Logging Options

10 Dec 01:05
cdbddbe

Overview

Lightning 1.1 is out! You can now train models with twice the parameters and zero code changes with the new sharded model training! We also have a new plugin for sequential model parallelism, more logging options, and a lot of improvements!
Release highlights: https://bit.ly/3gyLZpP

Learn more about sharded training: https://bit.ly/2W3hgI0

Detail changes

Added

  • Added "monitor" key to saved ModelCheckpoints (#4383)
  • Added ConfusionMatrix class interface (#4348)
  • Added multiclass AUROC metric (#4236)
  • Added global step indexing to the checkpoint name for a better sub-epoch checkpointing experience (#3807)
  • Added optimizer hooks in callbacks (#4379)
  • Added option to log momentum (#4384)
  • Added current_score to ModelCheckpoint.on_save_checkpoint (#4721)
  • Added logging using self.log in train and evaluation for epoch end hooks (#4913)
  • Added ability for DDP plugin to modify optimizer state saving (#4675)
  • Added casting to python types for NumPy scalars when logging hparams (#4647)
  • Added prefix argument in loggers (#4557)
  • Added printing of total num of params, trainable and non-trainable params in ModelSummary (#4521)
  • Added PrecisionRecallCurve, ROC, AveragePrecision class metric (#4549)
  • Added custom Apex and NativeAMP as Precision plugins (#4355)
  • Added DALI MNIST example (#3721)
  • Added sharded plugin for DDP for multi-GPU training memory optimizations (#4773)
  • Added experiment_id to the NeptuneLogger (#3462)
  • Added Pytorch Geometric integration example with Lightning (#4568)
  • Added all_gather method to LightningModule which allows gradient-based tensor synchronizations for use-cases such as negative sampling. (#5012)
  • Enabled self.log in most functions (#4969)
  • Added changeable extension variable for ModelCheckpoint (#4977)
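
The ModelSummary entry above mentions printing total, trainable and non-trainable parameter counts. The arithmetic behind such a summary can be sketched in plain Python as follows. The `ParamInfo` record, the `summarize` helper and the layer shapes are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, NamedTuple


class ParamInfo(NamedTuple):
    """Hypothetical record describing one parameter tensor."""
    name: str
    numel: int           # number of scalar elements in the tensor
    requires_grad: bool  # True if the optimizer would update it


def summarize(params: List[ParamInfo]) -> Dict[str, int]:
    """Count total, trainable and non-trainable parameters."""
    total = sum(p.numel for p in params)
    trainable = sum(p.numel for p in params if p.requires_grad)
    return {
        "total": total,
        "trainable": trainable,
        "non_trainable": total - trainable,
    }


# A frozen encoder followed by a trainable classification head.
params = [
    ParamInfo("encoder.weight", 784 * 128, requires_grad=False),
    ParamInfo("encoder.bias", 128, requires_grad=False),
    ParamInfo("head.weight", 128 * 10, requires_grad=True),
    ParamInfo("head.bias", 10, requires_grad=True),
]

summary = summarize(params)
print(summary)
```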

Changed

  • Removed multiclass_roc and multiclass_precision_recall_curve, use roc and precision_recall_curve instead (#4549)
  • Tuner algorithms will be skipped if fast_dev_run=True (#3903)
  • WandbLogger does not force wandb reinit arg to True anymore and creates a run only when needed (#4648)
  • Changed automatic_optimization to be a model attribute (#4602)
  • Changed Simple Profiler report to order by percentage time spent + num calls (#4880)
  • Simplify optimization Logic (#4984)
  • Classification metrics overhaul (#4837)
  • Updated fast_dev_run to accept integer representing num_batches (#4629)
  • Refactored optimizer (#4658)
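
The fast_dev_run change above lets the flag carry a batch count as well as a boolean. A minimal sketch of how such a flag could be normalized is shown below. It assumes that True means a single batch and that an integer n means n batches. The helper name `resolve_fast_dev_run` is hypothetical and does not reflect the library's internals.

```python
from typing import Union


def resolve_fast_dev_run(fast_dev_run: Union[bool, int]) -> int:
    """Translate the flag into a number of batches to run (0 means disabled)."""
    # bool must be checked first because bool is a subclass of int in Python.
    if isinstance(fast_dev_run, bool):
        return 1 if fast_dev_run else 0  # assumption: True runs one batch
    if isinstance(fast_dev_run, int):
        if fast_dev_run < 0:
            raise ValueError("fast_dev_run must be non-negative")
        return fast_dev_run
    raise TypeError("fast_dev_run must be a bool or an int")


for flag in (False, True, 7):
    print(flag, "->", resolve_fast_dev_run(flag))
```

Checking for bool before int matters here: without it, True would be treated as the integer 1 only by coincidence, and False as 0, which hides the intent.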

Deprecated

  • Deprecated prefix argument in ModelCheckpoint (#4765)
  • Deprecated the old way of assigning hyper-parameters through self.hparams = ... (#4813)
  • Deprecated mode='auto' from ModelCheckpoint and EarlyStopping (#4695)

Removed

  • Removed reorder parameter of the auc metric (#5004)

Fixed

  • Added feature to move tensors to CPU before saving (#4309)
  • Fixed LoggerConnector to have logged metrics on root device in DP (#4138)
  • Auto convert tensors to contiguous format when gather_all (#4907)
  • Fixed PYTHONPATH for DDP test model (#4528)
  • Fixed allowing logger to support indexing (#4595)
  • Fixed DDP and manual_optimization (#4976)

Contributors

@ananyahjha93, @awaelchli, @blatr, @Borda, @borisdayma, @carmocca, @ddrevicky, @george-gca, @gianscarpe, @irustandi, @janhenriklambrechts, @jeremyjordan, @justusschock, @lezwon, @rohitgr7, @s-rog, @SeanNaren, @SkafteNicki, @tadejsv, @tchaton, @williamFalcon, @zippeurfou

If we forgot someone due to not matching commit email with GitHub account, let us know :]