Releases: Lightning-AI/pytorch-lightning
Pruning & Quantization & SWA
[1.2.0] - 2021-02-18
Added
- Added `DataType`, `AverageMethod` and `MDMCAverageMethod` enum in metrics (#5657)
- Added support for summarized model total params size in megabytes (#5590)
- Added support for multiple train loaders (#1959)
- `Accuracy` metric now generalizes to Top-k accuracy for (multi-dimensional) multi-class inputs using the `top_k` parameter (#4838)
- `Accuracy` metric now enables the computation of subset accuracy for multi-label or multi-dimensional multi-class inputs with the `subset_accuracy` parameter (#4838)
- Added `HammingDistance` metric to compute the hamming distance (loss) (#4838)
- Added `max_fpr` parameter to `auroc` metric for computing partial auroc metric (#3790)
- Added `StatScores` metric to compute the number of true positives, false positives, true negatives and false negatives (#4839)
- Added `R2Score` metric (#5241)
- Added `LambdaCallback` (#5347)
- Added `BackboneLambdaFinetuningCallback` (#5377)
- Accelerator `all_gather` supports collection (#5221)
- Added `image_gradients` functional metric to compute the image gradients of a given input image (#5056)
- Added `MetricCollection` (#4318)
- Added `.clone()` method to metrics (#4318)
- Added `IoU` class interface (#4704)
- Support to tie weights after moving model to TPU via `on_post_move_to_device` hook
- Added missing val/test hooks in `LightningModule` (#5467)
- The `Recall` and `Precision` metrics (and their functional counterparts `recall` and `precision`) can now be generalized to Recall@K and Precision@K with the use of `top_k` parameter (#4842)
- Added `ModelPruning` Callback (#5618, #5825, #6045)
- Added `PyTorchProfiler` (#5560)
- Added compositional metrics (#5464)
- Added Trainer method `predict(...)` for high performance predictions (#5579)
- Added `on_before_batch_transfer` and `on_after_batch_transfer` data hooks (#3671)
- Added AUC/AUROC class interface (#5479)
- Added `PredictLoop` object (#5752)
- Added `QuantizationAwareTraining` callback (#5706, #6040)
- Added `LightningModule.configure_callbacks` to enable the definition of model-specific callbacks (#5621)
- Added `dim` to `PSNR` metric for mean-squared-error reduction (#5957)
- Added proximal policy optimization template to pl_examples (#5394)
- Added `log_graph` to `CometLogger` (#5295)
- Added possibility for nested loaders (#5404)
- Added `sync_step` to Wandb logger (#5351)
- Added `StochasticWeightAveraging` callback (#5640)
- Added `LightningDataModule.from_datasets(...)` (#5133)
- Added `PL_TORCH_DISTRIBUTED_BACKEND` env variable to select backend (#5981)
- Added `Trainer` flag to activate Stochastic Weight Averaging (SWA) `Trainer(stochastic_weight_avg=True)` (#6038)
- Added DeepSpeed integration (#5954, #6042)
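The `StochasticWeightAveraging` callback and the `stochastic_weight_avg` flag listed above refer to stochastic weight averaging, which keeps a running mean of weights visited during training. As a rough pure-Python illustration of that averaging rule only (the function name `swa_update` and the list-of-floats weight representation are assumptions made for this sketch, not Lightning's implementation):

```python
from typing import List


def swa_update(averaged: List[float], current: List[float], n_averaged: int) -> List[float]:
    """Fold `current` into a running mean that already covers `n_averaged` snapshots."""
    return [a + (c - a) / (n_averaged + 1) for a, c in zip(averaged, current)]


# Three hypothetical weight snapshots taken at the end of successive epochs.
snapshots = [[1.0, 4.0], [3.0, 0.0], [5.0, 2.0]]

averaged = list(snapshots[0])
for n, weights in enumerate(snapshots[1:], start=1):
    averaged = swa_update(averaged, weights, n)

print(averaged)
```

The incremental form avoids storing every snapshot: only the current average and a counter are needed.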
Changed
- `stat_scores` metric now calculates stat scores over all classes and gains new parameters, in line with the new `StatScores` metric (#4839)
- Changed `computer_vision_fine_tunning` example to use `BackboneLambdaFinetuningCallback` (#5377)
- Changed `automatic casting` for LoggerConnector `metrics` (#5218)
- Changed `iou` [func] to allow float input (#4704)
- Metric `compute()` method will no longer automatically call `reset()` (#5409)
- Set PyTorch 1.4 as min requirements, also for testing and examples `torchvision>=0.5` and `torchtext>=0.5` (#5418)
- Changed `callbacks` argument in `Trainer` to allow `Callback` input (#5446)
- Changed the default of `find_unused_parameters` to `False` in DDP (#5185)
- Changed `ModelCheckpoint` version suffixes to start at 1 (#5008)
- Progress bar metrics tensors are now converted to float (#5692)
- Changed the default value for the `progress_bar_refresh_rate` Trainer argument in Google COLAB notebooks to 20 (#5516)
- Extended support for purely iteration-based training (#5726)
- Made `LightningModule.global_rank`, `LightningModule.local_rank` and `LightningModule.logger` read-only properties (#5730)
- Forced `ModelCheckpoint` callbacks to run after all others to guarantee all states are saved to the checkpoint (#5731)
- Refactored Accelerators and Plugins (#5743)
- Added base classes for plugins (#5715)
- Added parallel plugins for DP, DDP, DDPSpawn, DDP2 and Horovod (#5714)
- Precision Plugins (#5718)
- Added new Accelerators for CPU, GPU and TPU (#5719)
- Added Plugins for TPU training (#5719)
- Added RPC and Sharded plugins (#5732)
- Added missing `LightningModule`-wrapper logic to new plugins and accelerator (#5734)
- Moved device-specific teardown logic from training loop to accelerator (#5973)
- Moved accelerator_connector.py to the connectors subfolder (#6033)
- Trainer only references accelerator (#6039)
- Made parallel devices optional across all plugins (#6051)
- Cleaning (#5948, #5949, #5950)
- Enabled `self.log` in callbacks (#5094)
- Renamed xxx_AVAILABLE as protected (#5082)
- Unified module names in Utils (#5199)
- Separated utils: imports & enums (#5256, #5874)
- Refactor: clean trainer device & distributed getters (#5300)
- Simplified training phase as LightningEnum (#5419)
- Updated metrics to use LightningEnum (#5689)
- Changed the sequence of `on_train_batch_end`, `on_batch_end` & `on_train_epoch_end`, `on_epoch_end` hooks (#5688)
- Refactored `setup_training` and removed `test_mode` (#5388)
- Disabled training with zero `num_training_batches` when insufficient `limit_train_batches` (#5703)
- Refactored `EpochResultStore` (#5522)
- Update `lr_finder` to check for attribute if not running `fast_dev_run` (#5990)
- `LightningOptimizer` manual optimizer is more flexible and exposes `toggle_model` (#5771)
- `MlflowLogger` limit parameter value length to 250 char (#5893)
- Re-introduced fix for Hydra directory sync with multiple processes (#5993)
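One behavioural change in this list is that a metric's `compute()` no longer calls `reset()`. The toy class below (`RunningAccuracy`, a hypothetical stand-in and not a Lightning class) sketches what that means for calling code: accumulated state survives `compute()` until `reset()` is called explicitly.

```python
class RunningAccuracy:
    """Toy accumulating metric; a hypothetical stand-in, not Lightning's Metric."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def update(self, preds, targets) -> None:
        self.correct += sum(int(p == t) for p, t in zip(preds, targets))
        self.total += len(targets)

    def compute(self) -> float:
        # State is deliberately left untouched here.
        return self.correct / self.total

    def reset(self) -> None:
        self.correct = 0
        self.total = 0


metric = RunningAccuracy()
metric.update([1, 0, 1, 1], [1, 1, 1, 0])  # 2 of 4 correct
first = metric.compute()
metric.update([0, 0], [0, 0])              # 2 more correct, earlier state carried over
second = metric.compute()
metric.reset()                             # has to be requested explicitly
```

Code that relied on `compute()` clearing state between epochs would, under this behaviour, need an explicit `reset()` call at the epoch boundary.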
Deprecated
- Function `stat_scores_multiple_classes` is deprecated in favor of `stat_scores` (#4839)
- Moved accelerators and plugins to its `legacy` pkg (#5645)
- Deprecated `LightningDistributedDataParallel` in favor of new wrapper module `LightningDistributedModule` (#5185)
- Deprecated `LightningDataParallel` in favor of new wrapper module `LightningParallelModule` (#5670)
- Renamed utils modules (#5199)
  - `argparse_utils` >> `argparse`
  - `model_utils` >> `model_helpers`
  - `warning_utils` >> `warnings`
  - `xla_device_utils` >> `xla_device`
- Deprecated using `'val_loss'` to set the `ModelCheckpoint` monitor (#6012)
- Deprecated `.get_model()` in favor of the explicit `.lightning_module` property (#6035)
- Deprecated Trainer attribute `accelerator_backend` in favor of `accelerator` (#6034)
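Several entries above deprecate an old accessor in favor of a new one, for example `.get_model()` versus the `.lightning_module` property. A common way to stage such a change is a thin forwarding shim that emits a `DeprecationWarning`. The `Wrapper` class below is a hypothetical illustration of that pattern, not code taken from Lightning:

```python
import warnings


class Wrapper:
    """Hypothetical container used only to illustrate a deprecation shim."""

    def __init__(self, module) -> None:
        self._module = module

    @property
    def lightning_module(self):
        # Preferred accessor.
        return self._module

    def get_model(self):
        # Old accessor: still works, but warns and forwards to the property.
        warnings.warn(
            "get_model() is deprecated, use the lightning_module property instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.lightning_module


wrapper = Wrapper(module="my-model")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_result = wrapper.get_model()

new_result = wrapper.lightning_module
```

Both accessors return the same object, so callers can migrate at their own pace while the warning points them at the replacement.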
Removed
- Removed deprecated checkpoint argument `filepath` (#5321)
- Removed deprecated `Fbeta`, `f1_score` and `fbeta_score` metrics (#5322)
- Removed deprecated `TrainResult` (#5323)
- Removed deprecated `EvalResult` (#5633)
- Removed `LoggerStages` (#5673)
Fixed
- Fixed distributed setting and `ddp_cpu` only with `num_processes>1` (#5297)
- Fixed the saved filename in `ModelCheckpoint` when it already exists (#4861)
- Fixed `DDPHPCAccelerator` hangs in DDP construction by calling `init_device` (#5157)
- Fixed `num_workers` for Windows example (#5375)
- Fixed loading yaml (#5619)
- Fixed support for custom DataLoaders with DDP if they can be re-instantiated (#5745)
- Fixed repeated `.fit()` calls ignoring the `max_steps` iteration bound (#5936)
- Fixed throwing `MisconfigurationError` on unknown mode (#5255)
- Resolve bug with Finetuning (#5744)
- Fixed `ModelCheckpoint` race condition in file existence check (#5155)
- Fixed some compatibility with PyTorch 1.8 (#5864)
- Fixed forward cache (#5895)
- Fixed recursive detach of tensors to CPU (#6007)
- Fixed passing wrong strings for scheduler interval not throwing an error (#5923)
- Fixed wrong `requires_grad` state after `return None` with multiple optimizers (#5738)
- Fixed add `on_epoch_end` hook at the end of `validation`, `test` epoch (#5986)
- Fixed missing `process_dataloader` call for `TPUSpawn` when in distributed mode (#6015)
- Fixed progress bar flickering by appending 0 to floats/strings (#6009)
- Fixed synchronization issues with TPU training (#6027)
- Fixed `hparams.yaml` saved twice when using `TensorBoardLogger` (#5953)
- Fixed basic examples (#5912, #5985)
- Fixed `fairscale` compatibility with PT 1.8 (#5996)
- Ensured `process_dataloader` is called when `tpu_cores > 1` to use Parallel DataLoader (#6015)
- Attempted SLURM auto resume call when non-shell call fails (#6002)
- Fixed wrapping optimizers upon assignment (#6006)
- Fixed allowing hashing of metrics with lists in their state (#5939)
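One of the fixes above makes an invalid scheduler interval string raise an error instead of passing silently. The general idea can be sketched as below, assuming the accepted values are `"step"` and `"epoch"` with `"epoch"` as the default. The helper `validate_scheduler_config` is hypothetical, and `ValueError` stands in for whatever exception type Lightning actually raises:

```python
SUPPORTED_INTERVALS = ("step", "epoch")


def validate_scheduler_config(config: dict) -> dict:
    """Return a copy of `config` with a checked 'interval' key (assumed default: 'epoch')."""
    checked = dict(config)
    interval = checked.setdefault("interval", "epoch")
    if interval not in SUPPORTED_INTERVALS:
        raise ValueError(
            f"Unsupported scheduler interval {interval!r}; expected one of {SUPPORTED_INTERVALS}"
        )
    return checked


ok = validate_scheduler_config({"scheduler": "placeholder", "interval": "step"})

try:
    # A plausible typo: plural "steps" instead of "step".
    validate_scheduler_config({"scheduler": "placeholder", "interval": "steps"})
    rejected = False
except ValueError:
    rejected = True
```

Failing early on a typo such as `"steps"` is preferable to a scheduler that silently never steps at the intended granularity.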
Contributors
@alanhdu, @ananthsub, @awaelchli, @Borda, @borisdayma, @carmocca, @ddrevicky, @deng-cy, @ducthienbui97, @justusschock, @kartik4949, @kaushikb11, @manipopopo, @marload, @neighthan, @peblair, @prampey, @pranjaldatta, @rohitgr7, @SeanNaren, @sid-sundrani, @SkafteNicki, @tadejsv, @tchaton, @teddykoker, @titu1994, @yuntai
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.1.7] - 2021-02-03
Fixed
- Fixed `TensorBoardLogger` not closing `SummaryWriter` on `finalize` (#5696)
- Fixed filtering of pytorch "unsqueeze" warning when using DP (#5622)
- Fixed `num_classes` argument in F1 metric (#5663)
- Fixed `log_dir` property (#5537)
- Fixed a race condition in `ModelCheckpoint` when checking if a checkpoint file exists (#5144)
- Remove unnecessary intermediate layers in Dockerfiles (#5697)
- Fixed auto learning rate ordering (#5638)
Contributors
@awaelchli @guillochon @noamzilo @rohitgr7 @SkafteNicki @sumanthratna
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.1.6] - 2021-01-26
Changed
- Increased TPU check timeout from 20s to 100s (#5598)
- Ignored `step` param in Neptune logger's log_metric method (#5510)
- Pass batch outputs to `on_train_batch_end` instead of `epoch_end` outputs (#4369)
Fixed
- Fixed `toggle_optimizer` to reset `requires_grad` state (#5574)
- Fixed FileNotFoundError for best checkpoint when using DDP with Hydra (#5629)
- Fixed an error when logging a progress bar metric with a reserved name (#5620)
- Fixed `Metric`'s `state_dict` not included when child modules (#5614)
- Fixed Neptune logger creating multiple experiments when GPUs > 1 (#3256)
- Fixed duplicate logs appearing in console when using the python logging module (#5509)
- Fixed tensor printing in `trainer.test()` (#5138)
- Fixed not using dataloader when `hparams` present (#4559)
Contributors
@awaelchli @bryant1410 @lezwon @manipopopo @PiotrJander @psinger @rnett @SeanNaren @swethmandava @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.1.5] - 2021-01-19
Fixed
- Fixed a visual bug in the progress bar display initialization (#4579)
- Fixed logging `on_train_batch_end` in a callback with multiple optimizers (#5521)
- Fixed `reinit_scheduler_properties` with correct optimizer (#5519)
- Fixed `val_check_interval` with `fast_dev_run` (#5540)
Contributors
@awaelchli, @carmocca, @rohitgr7
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.1.4] - 2021-01-12
Added
- Add automatic optimization property setter to lightning module (#5169)
Changed
- Changed deprecated `enable_pl_optimizer=True` (#5244)
Fixed
- Fixed `transfer_batch_to_device` for DDP with `len(devices_ids) == 1` (#5195)
- Logging only on `not should_accumulate()` during training (#5417)
- Resolve interpolation bug with Hydra (#5406)
- Check environ before selecting a seed to prevent warning message (#4743)
Contributors
@ananthsub, @SeanNaren, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.1.3] - 2021-01-05
Added
- Added a check for optimizer attached to `lr_scheduler` (#5338)
- Added support for passing non-existing `filepaths` to `resume_from_checkpoint` (#4402)
Changed
- Skip restore from `resume_from_checkpoint` while `testing` (#5161)
- Allowed `log_momentum` for adaptive optimizers in `LearningRateMonitor` (#5333)
- Disabled checkpointing, earlystopping and logging with `fast_dev_run` (#5277)
- Distributed group defaults to `WORLD` if `None` (#5125)
Fixed
- Fixed `trainer.test` returning non-test metrics (#5214)
- Fixed metric state reset (#5273)
- Fixed `--num-nodes` on `DDPSequentialPlugin` (#5327)
- Fixed invalid value for `weights_summary` (#5296)
- Fixed `Trainer.test` not using the latest `best_model_path` (#5161)
- Fixed existence check for `hparams` not using underlying filesystem (#5250)
- Fixed `LightningOptimizer` AMP bug (#5191)
- Fixed casted key to string in `_flatten_dict` (#5354)
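The `_flatten_dict` entry appears to concern flattening nested hyperparameter dictionaries for logging, where non-string keys need to be cast before they are joined into a single name. The helper below is a self-contained sketch of that idea. The name `flatten_dict` and the `/` delimiter are assumptions for illustration and are not copied from Lightning's source:

```python
from typing import Any, Dict


def flatten_dict(params: Dict[Any, Any], delimiter: str = "/") -> Dict[str, Any]:
    """Flatten nested dicts, casting every key to `str` before joining."""
    flat: Dict[str, Any] = {}

    def _walk(node: Dict[Any, Any], prefix: str) -> None:
        for key, value in node.items():
            # The f-string and str() both cast non-string keys such as ints.
            name = f"{prefix}{delimiter}{key}" if prefix else str(key)
            if isinstance(value, dict):
                _walk(value, name)
            else:
                flat[name] = value

    _walk(params, "")
    return flat


hparams = {"optimizer": {"lr": 0.01, "betas": {0: 0.9, 1: 0.999}}, 5: "five"}
flat = flatten_dict(hparams)
```

Without the cast, joining an integer key onto a string prefix with `+` or `str.join` would raise a `TypeError`.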
Contributors
@8greg8, @haven-jeon, @kandluis, @marload, @rohitgr7, @tadejsv, @tarepan, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
standard weekly patch release
Overview
Detail changes
Added
- Support number for logging with `sync_dist=True` (#5080)
- Added offset logging step when resuming for Wandb logger (#5050)
Removed
- `enable_pl_optimizer=False` by default to temporarily fix AMP issues (#5163)
Fixed
- Metric reduction with Logging (#5150)
- Remove nan loss in manual optimization (#5121)
- Un-balanced logging properly supported (#5119)
- Fix hanging in DDP HPC accelerators (#5157)
- Fix saved filename in `ModelCheckpoint` if it already exists (#4861)
- Fix reset `TensorRunningAccum` (#5106)
- Updated `DALIClassificationLoader` to not use deprecated arguments (#4925)
- Corrected call to `torch.no_grad` (#5124)
Contributors
@8greg8, @ananthsub, @borisdayma, @gan3sh500, @rohitgr7, @SeanNaren, @tchaton, @VinhLoiIT
If we forgot someone due to not matching commit email with GitHub account, let us know :]
standard weekly patch release
Overview
Detail changes
Added
- Add a notebook example to reach a quick baseline of ~94% accuracy on CIFAR10 using Resnet in Lightning (#4818)
Changed
Removed
Fixed
- Fixed trainer by default `None` in `DDPAccelerator` (#4915)
- Fixed `LightningOptimizer` to expose optimizer attributes (#5095)
- Do not warn when the `name` key is used in the `lr_scheduler` dict (#5057)
- Check if optimizer supports closure (#4981)
- Extend LightningOptimizer to expose underlying Optimizer attributes + update doc (#5095)
- Add deprecated metric utility functions back to functional (#5067, #5068)
- Allow any input in `to_onnx` and `to_torchscript` (#4378)
- Do not warn when the name key is used in the `lr_scheduler` dict (#5057)
Contributors
@Borda, @carmocca, @hemildesai, @rohitgr7, @s-rog, @tarepan, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Model Parallelism Training and More Logging Options
Overview
Lightning 1.1 is out! You can now train models with twice the parameters and zero code changes with the new sharded model training! We also have a new plugin for sequential model parallelism, more logging options, and a lot of improvements!
Release highlights: https://bit.ly/3gyLZpP
Learn more about sharded training: https://bit.ly/2W3hgI0
Detail changes
Added
- Added "monitor" key to saved `ModelCheckpoints` (#4383)
- Added `ConfusionMatrix` class interface (#4348)
- Added multiclass AUROC metric (#4236)
- Added global step indexing to the checkpoint name for a better sub-epoch checkpointing experience (#3807)
- Added optimizer hooks in callbacks (#4379)
- Added option to log momentum (#4384)
- Added `current_score` to `ModelCheckpoint.on_save_checkpoint` (#4721)
- Added logging using `self.log` in train and evaluation for epoch end hooks (#4913)
- Added ability for DDP plugin to modify optimizer state saving (#4675)
- Added casting to python types for NumPy scalars when logging `hparams` (#4647)
- Added `prefix` argument in loggers (#4557)
- Added printing of total num of params, trainable and non-trainable params in ModelSummary (#4521)
- Added `PrecisionRecallCurve, ROC, AveragePrecision` class metric (#4549)
- Added custom `Apex` and `NativeAMP` as `Precision plugins` (#4355)
- Added `DALI MNIST` example (#3721)
- Added `sharded plugin` for DDP for multi-GPU training memory optimizations (#4773)
- Added `experiment_id` to the NeptuneLogger (#3462)
- Added `Pytorch Geometric` integration example with Lightning (#4568)
- Added `all_gather` method to `LightningModule` which allows gradient-based tensor synchronizations for use-cases such as negative sampling (#5012)
- Enabled `self.log` in most functions (#4969)
- Added changeable extension variable for `ModelCheckpoint` (#4977)
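Among the metrics added here is a `ConfusionMatrix` class interface. For readers unfamiliar with the quantity itself, the following is a plain-Python sketch of a multi-class confusion matrix with rows as true classes and columns as predicted classes. The function name and the row/column convention are assumptions for illustration, and the sketch does not use or reproduce Lightning's API:

```python
from typing import List, Sequence


def confusion_matrix(
    preds: Sequence[int], targets: Sequence[int], num_classes: int
) -> List[List[int]]:
    """Count (true class, predicted class) pairs into a num_classes x num_classes grid."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for pred, target in zip(preds, targets):
        matrix[target][pred] += 1
    return matrix


targets = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(preds, targets, num_classes=3)

# Correct predictions sit on the diagonal.
num_correct = sum(cm[i][i] for i in range(3))
```

Several of the classification metrics in these notes (accuracy, precision, recall) can be read off such a matrix.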
Changed
- Removed `multiclass_roc` and `multiclass_precision_recall_curve`, use `roc` and `precision_recall_curve` instead (#4549)
- Tuner algorithms will be skipped if `fast_dev_run=True` (#3903)
- WandbLogger does not force wandb `reinit` arg to True anymore and creates a run only when needed (#4648)
- Changed `automatic_optimization` to be a model attribute (#4602)
- Changed `Simple Profiler` report to order by percentage time spent + num calls (#4880)
- Simplify optimization Logic (#4984)
- Classification metrics overhaul (#4837)
- Updated `fast_dev_run` to accept integer representing num_batches (#4629)
- Refactored optimizer (#4658)
Deprecated
- Deprecated `prefix` argument in `ModelCheckpoint` (#4765)
- Deprecated the old way of assigning hyper-parameters through `self.hparams = ...` (#4813)
- Deprecated `mode='auto'` from `ModelCheckpoint` and `EarlyStopping` (#4695)
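With `mode='auto'` deprecated, the direction of improvement for a monitored quantity is stated explicitly as `'min'` or `'max'`. The comparison this implies can be sketched as follows. `is_improvement` is a hypothetical helper written for this note, not part of `ModelCheckpoint` or `EarlyStopping`:

```python
def is_improvement(current: float, best: float, mode: str) -> bool:
    """Return True when `current` beats `best` under an explicit 'min' or 'max' mode."""
    if mode == "min":
        return current < best
    if mode == "max":
        return current > best
    raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")


# A loss should go down, an accuracy should go up.
loss_improved = is_improvement(current=0.31, best=0.35, mode="min")
acc_improved = is_improvement(current=0.90, best=0.92, mode="max")
```

Spelling out the mode avoids guessing the direction from the metric's name, which is fragile for custom metric names.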
Removed
- Removed `reorder` parameter of the `auc` metric (#5004)
Fixed
- Added feature to move tensors to CPU before saving (#4309)
- Fixed `LoggerConnector` to have logged metrics on root device in DP (#4138)
- Auto convert tensors to contiguous format when `gather_all` (#4907)
- Fixed `PYTHONPATH` for DDP test model (#4528)
- Fixed allowing logger to support indexing (#4595)
- Fixed DDP and manual_optimization (#4976)
Contributors
@ananyahjha93, @awaelchli, @blatr, @Borda, @borisdayma, @carmocca, @ddrevicky, @george-gca, @gianscarpe, @irustandi, @janhenriklambrechts, @jeremyjordan, @justusschock, @lezwon, @rohitgr7, @s-rog, @SeanNaren, @SkafteNicki, @tadejsv, @tchaton, @williamFalcon, @zippeurfou
If we forgot someone due to not matching commit email with GitHub account, let us know :]