lightning_template.utils.callbacks.model_checkpoint
===================================================

.. py:module:: lightning_template.utils.callbacks.model_checkpoint


Classes
-------

.. autoapisummary::

   lightning_template.utils.callbacks.model_checkpoint.ModelCheckpointWithLinkBest


Module Contents
---------------

.. py:class:: ModelCheckpointWithLinkBest(save_best: Optional[bool] = None, *args, **kwargs)

   Bases: :py:obj:`lightning.pytorch.callbacks.ModelCheckpoint`


   Save the model periodically by monitoring a quantity. Every metric logged with
   :meth:`~lightning.pytorch.core.LightningModule.log` or :meth:`~lightning.pytorch.core.LightningModule.log_dict` is
   a candidate for the monitor key. For more information, see :ref:`checkpointing`.

   After training finishes, use :attr:`best_model_path` to retrieve the path to the
   best checkpoint file and :attr:`best_model_score` to retrieve its score.

   :param dirpath: directory to save the model file.

                   Example::

                       # custom path
                       # saves a file like: my/path/epoch=0-step=10.ckpt
                       >>> checkpoint_callback = ModelCheckpoint(dirpath='my/path/')

                   By default, dirpath is ``None`` and will be set at runtime to the location
                   specified by :class:`~lightning.pytorch.trainer.trainer.Trainer`'s
                   :paramref:`~lightning.pytorch.trainer.trainer.Trainer.default_root_dir` argument,
                   and if the Trainer uses a logger, the path will also contain logger name and version.
   :param filename: checkpoint filename. Can contain named formatting options to be auto-filled.

                    Example::

                        # save any arbitrary metrics like `val_loss`, etc. in name
                        # saves a file like: my/path/epoch=2-val_loss=0.02-other_metric=0.03.ckpt
                        >>> checkpoint_callback = ModelCheckpoint(
                        ...     dirpath='my/path',
                        ...     filename='{epoch}-{val_loss:.2f}-{other_metric:.2f}'
                        ... )

                    By default, filename is ``None`` and will be set to ``'{epoch}-{step}'``, where "epoch" and "step" match
                    the number of finished epoch and optimizer steps respectively.
   :param monitor: quantity to monitor. By default it is ``None`` which saves a checkpoint only for the last epoch.
   :param verbose: verbosity mode. Default: ``False``.
   :param save_last: When ``True``, saves a `last.ckpt` copy whenever a checkpoint file gets saved. Can be set to
                     ``'link'`` on a local filesystem to create a symbolic link. This allows accessing the latest checkpoint
                     in a deterministic manner. Default: ``None``.
   :param save_top_k: if ``save_top_k == k``,
                      the best k models according to the quantity monitored will be saved.
                      If ``save_top_k == 0``, no models are saved.
                      If ``save_top_k == -1``, all models are saved.
                      Please note that the monitors are checked every ``every_n_epochs`` epochs.
                      If ``save_top_k >= 2`` and the callback is called multiple times inside an epoch, and the filename remains
                      unchanged, the name of the saved file will be appended with a version count starting with ``v1`` to avoid
                      collisions unless ``enable_version_counter`` is set to False. The version counter is unrelated to the top-k
                      ranking of the checkpoint, and we recommend formatting the filename to include the monitored metric to avoid
                      collisions.
   :param mode: one of {min, max}.
                If ``save_top_k != 0``, the decision to overwrite the current save file is made
                based on either the maximization or the minimization of the monitored quantity.
                For ``'val_acc'``, this should be ``'max'``, for ``'val_loss'`` this should be ``'min'``, etc.
   :param auto_insert_metric_name: When ``True``, the checkpoints filenames will contain the metric name.
                                   For example, ``filename='checkpoint_{epoch:02d}-{acc:02.0f}`` with epoch ``1`` and acc ``1.12`` will resolve
                                   to ``checkpoint_epoch=01-acc=01.ckpt``. Is useful to set it to ``False`` when metric names contain ``/``
                                   as this will result in extra folders.
                                   For example, ``filename='epoch={epoch}-step={step}-val_acc={val/acc:.2f}', auto_insert_metric_name=False``
   :param save_weights_only: if ``True``, then only the model's weights will be
                             saved. Otherwise, the optimizer states, lr-scheduler states, etc are added in the checkpoint too.
   :param every_n_train_steps: Number of training steps between checkpoints.
                               If ``every_n_train_steps == None or every_n_train_steps == 0``, we skip saving during training.
                               To disable, set ``every_n_train_steps = 0``. This value must be ``None`` or non-negative.
                               This must be mutually exclusive with ``train_time_interval`` and ``every_n_epochs``.
   :param train_time_interval: Checkpoints are monitored at the specified time interval.
                               For all practical purposes, this cannot be smaller than the amount
                               of time it takes to process a single training batch. This is not
                               guaranteed to execute at the exact time specified, but should be close.
                               This must be mutually exclusive with ``every_n_train_steps`` and ``every_n_epochs``.
   :param every_n_epochs: Number of epochs between checkpoints.
                          This value must be ``None`` or non-negative.
                          To disable saving top-k checkpoints, set ``every_n_epochs = 0``.
                          This argument does not impact the saving of ``save_last=True`` checkpoints.
                          If all of ``every_n_epochs``, ``every_n_train_steps`` and
                          ``train_time_interval`` are ``None``, we save a checkpoint at the end of every epoch
                          (equivalent to ``every_n_epochs = 1``).
                          If ``every_n_epochs == None`` and either ``every_n_train_steps != None`` or ``train_time_interval != None``,
                          saving at the end of each epoch is disabled
                          (equivalent to ``every_n_epochs = 0``).
                          This must be mutually exclusive with ``every_n_train_steps`` and ``train_time_interval``.
                          Setting both ``ModelCheckpoint(..., every_n_epochs=V, save_on_train_epoch_end=False)`` and
                          ``Trainer(max_epochs=N, check_val_every_n_epoch=M)``
                          will only save checkpoints at epochs 0 < E <= N
                          where both values for ``every_n_epochs`` and ``check_val_every_n_epoch`` evenly divide E.
   :param save_on_train_epoch_end: Whether to run checkpointing at the end of the training epoch.
                                   If this is ``False``, then the check runs at the end of the validation.
   :param enable_version_counter: Whether to append a version to the existing file name.
                                  If this is ``False``, then the checkpoint files will be overwritten.

   .. note::

      For extra customization, ModelCheckpoint includes the following attributes:
      
      - ``CHECKPOINT_JOIN_CHAR = "-"``
      - ``CHECKPOINT_EQUALS_CHAR = "="``
      - ``CHECKPOINT_NAME_LAST = "last"``
      - ``FILE_EXTENSION = ".ckpt"``
      - ``STARTING_VERSION = 1``
      
      For example, you can change the default last checkpoint name by doing
      ``checkpoint_callback.CHECKPOINT_NAME_LAST = "{epoch}-last"``
      
      If you want to checkpoint every N hours, every M train batches, and/or every K val epochs,
      then you should create multiple ``ModelCheckpoint`` callbacks.
      
      If the checkpoint's ``dirpath`` changed from what it was before while resuming the training,
      only ``best_model_path`` will be reloaded and a warning will be issued.
      
      If you provide a ``filename`` on a mounted device where changing permissions is not allowed (causing ``chmod``
      to raise a ``PermissionError``), install `fsspec>=2025.5.0`. Then the error is caught, the file's permissions
      remain unchanged, and the checkpoint is still saved. Otherwise, no checkpoint will be saved and training stops.

   :raises MisconfigurationException: If ``save_top_k`` is smaller than ``-1``,
       if ``monitor`` is ``None`` and ``save_top_k`` is none of ``None``, ``-1``, and ``0``, or
       if ``mode`` is none of ``"min"`` or ``"max"``.
   :raises ValueError: If ``trainer.save_checkpoint`` is ``None``.

   Example::

       >>> from lightning.pytorch import Trainer
       >>> from lightning.pytorch.callbacks import ModelCheckpoint

       # saves checkpoints to 'my/path/' at every epoch
       >>> checkpoint_callback = ModelCheckpoint(dirpath='my/path/')
       >>> trainer = Trainer(callbacks=[checkpoint_callback])

       # save epoch and val_loss in name
       # saves a file like: my/path/sample-mnist-epoch=02-val_loss=0.32.ckpt
       >>> checkpoint_callback = ModelCheckpoint(
       ...     monitor='val_loss',
       ...     dirpath='my/path/',
       ...     filename='sample-mnist-{epoch:02d}-{val_loss:.2f}'
       ... )

       # save epoch and val_loss in name, but specify the formatting yourself (e.g. to avoid problems with Tensorboard
       # or Neptune, due to the presence of characters like '=' or '/')
       # saves a file like: my/path/sample-mnist-epoch02-val_loss0.32.ckpt
       >>> checkpoint_callback = ModelCheckpoint(
       ...     monitor='val/loss',
       ...     dirpath='my/path/',
       ...     filename='sample-mnist-epoch{epoch:02d}-val_loss{val/loss:.2f}',
       ...     auto_insert_metric_name=False
       ... )

       # retrieve the best checkpoint after training
       checkpoint_callback = ModelCheckpoint(dirpath='my/path/')
       trainer = Trainer(callbacks=[checkpoint_callback])
       model = ...
       trainer.fit(model)
       checkpoint_callback.best_model_path

   .. tip:: Saving and restoring multiple checkpoint callbacks at the same time is supported under variation in the
       following arguments:

       *monitor, mode, every_n_train_steps, every_n_epochs, train_time_interval*

       Read more: :ref:`Persisting Callback State <extensions/callbacks_state:save callback state>`


   .. py:attribute:: CHECKPOINT_NAME_BEST
      :value: 'best'


   .. py:attribute:: save_best
      :value: None


   .. py:method:: _update_best_and_save(current: torch.Tensor, trainer: lightning.pytorch.Trainer, monitor_candidates: Dict[str, lightning.pytorch.utilities.types._METRIC]) -> None


   .. py:method:: _save_checkpoint(trainer: lightning.pytorch.Trainer, filepath: str) -> None


   .. py:method:: _save_last_checkpoint(trainer: lightning.pytorch.Trainer, monitor_candidates: Dict[str, torch.Tensor]) -> None


   .. py:method:: _save_best_checkpoint(trainer: lightning.pytorch.Trainer, monitor_candidates: Dict[str, lightning.pytorch.utilities.types._METRIC]) -> None