Optimizer Config#

Lightning CLI supports only one optimizer and at most one lr scheduler, configured through the --optimizer and --lr_scheduler flags, which is not always enough.

Therefore, we added a new flag named --optimizer_config to support more complex optimizer configurations. The value of this flag is a fairly complex object, so we describe it step by step.

Overview#

The complete config object is shown here first, so that you can see the whole picture and return to it at any point while reading the sections below.

optimizer_config:
    -   optimizer:
            class_path: torch.optim.AdamW
            init_args:
                params:
                    -   params: backbone
                        lr: 1e-4
                    -   params: [ backbone.layer1, backbone.layer2 ]
                        weight_decay: 1e-4
                    -   null
                lr: 1e-3
                weight_decay: 1e-2
        frequency: null
        lr_scheduler:
            scheduler:
                class_path: torch.optim.lr_scheduler.MultiStepLR
                init_args:
                    milestones: [8, 11]
            interval: epoch
            frequency: 1
            monitor: val_loss
            strict: True
            name: null
            warmup_config:
                scheduler:
                    class_path: lightning_template.utils.optim.WarmupScheduler
                    init_args:
                        warmup_iters: 500
                frequency: 1
    -   optimizer:
            class_path: torch.optim.AdamW
            init_args:
                params:
                    -   params: backbone
                        lr: 1e-4
                    -   params: [ backbone.layer1, backbone.layer2 ]
                        weight_decay: 1e-4
                    -   null
                lr: 1e-3
                weight_decay: 1e-2
        frequency: null
        lr_scheduler:
            scheduler:
                class_path: torch.optim.lr_scheduler.MultiStepLR
                init_args:
                    milestones: [8, 11]
            interval: epoch
            frequency: 1
            monitor: val_loss
            strict: True
            name: null
            warmup_config:
                scheduler:
                    class_path: lightning_template.utils.optim.WarmupScheduler
                    init_args:
                        warmup_iters: 500
                frequency: 1

Single optimize config#

As described in Overview, the value of the --optimizer_config flag is a complex object, so we describe it level by level. At the top level, the value is either a single optimize_config dict or a list of optimize_config dicts. A single optimize_config dict is equivalent to a list containing only that dict.

optimizer_config:
    <a single optimize_config object>

or

optimizer_config:
    -   <a single optimize_config object>
    -   <a single optimize_config object>
    -   <a single optimize_config object>

An optimize_config dict can contain three keys: optimizer, frequency and lr_scheduler. Their values are <a optimizer config object>, <null or int> and <a lightning lr scheduler config object>, respectively.

# optimize_config object
optimizer:
    <a optimizer config object>
frequency: <null or int>
lr_scheduler:
    <a lightning lr scheduler config object>

The frequency and lr_scheduler keys are optional, so <a optimizer config object> can also be given directly in place of an optimize_config dict. It will be parsed as {'optimizer': <a optimizer config object>}, which means that the following optimize_config dict

# optimize_config object
<a optimizer config object>

will be treated as

# optimize_config object
optimizer:
    <a optimizer config object>
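
The two shorthand rules above (a single dict stands for a one-item list, and a bare optimizer config stands for a dict with only the optimizer key) can be sketched in Python. This is a minimal illustration, not the project's actual parsing code; the function name normalize_optimizer_config is hypothetical.

```python
from typing import Any, Dict, List, Union

ConfigDict = Dict[str, Any]


def normalize_optimizer_config(
    value: Union[ConfigDict, List[ConfigDict]]
) -> List[ConfigDict]:
    """Hypothetical helper: expand the shorthand forms into a list of
    optimize_config dicts that all carry an explicit 'optimizer' key."""
    # A single dict is treated like a list with exactly one item.
    items = value if isinstance(value, list) else [value]
    normalized = []
    for item in items:
        if "optimizer" in item:
            # Already a full optimize_config dict; copy it unchanged.
            normalized.append(dict(item))
        else:
            # A bare optimizer config is wrapped as {'optimizer': ...}.
            normalized.append({"optimizer": item})
    return normalized


bare = {"class_path": "torch.optim.AdamW", "init_args": {"lr": 1e-3}}
single = normalize_optimizer_config(bare)
```

Here single is a one-item list whose only element is {'optimizer': bare}, the same result as passing [{'optimizer': bare}] explicitly.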

`frequency` key#

The frequency key can only be used when there are multiple optimizers. It must be set either to None for all optimizers or to an int for all optimizers; an error is raised if some optimizers have frequency set to None while others have it set to an int.
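
The all-or-nothing rule can be checked with a few lines of Python. This is only a sketch of the constraint described above; the function name check_frequencies and the use of ValueError are assumptions, not the actual implementation.

```python
from typing import List, Optional


def check_frequencies(frequencies: List[Optional[int]]) -> None:
    """Hypothetical check: frequency must be None for every optimizer
    or an int for every optimizer, never a mixture of the two."""
    none_count = sum(freq is None for freq in frequencies)
    # A mixture means some, but not all, of the values are None.
    if 0 < none_count < len(frequencies):
        raise ValueError(
            "frequency must be None for all optimizers or int for all "
            f"optimizers, got {frequencies!r}"
        )


check_frequencies([None, None])  # accepted: every optimizer runs each step
check_frequencies([2, 3])        # accepted: optimizers take turns
```

A mixed list such as [None, 2] would raise ValueError in this sketch.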

`frequency` key is None#

When every frequency is set to None, every optimizer is used to update the model on every iteration.

`frequency` key is int#

When every frequency is set to an int, only one optimizer is selected to update the model on each iteration, according to the batch index.

For example, suppose there are two optimizers with frequency equal to 2 and 3 respectively. In every cycle of 5 batches, the first optimizer is selected for the first 2 batches and the second optimizer is selected for the last 3 batches. For each batch, only the selected optimizer is used to update the model. Therefore, setting every frequency to None is not the same as setting every frequency to 1.
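
The selection rule in this example can be sketched as a small function over the batch index. This is an illustration of the described behaviour, not Lightning's internal code; select_optimizer is a hypothetical name.

```python
from itertools import accumulate
from typing import List


def select_optimizer(batch_idx: int, frequencies: List[int]) -> int:
    """Hypothetical sketch: return the index of the optimizer that is
    used for the given batch when all frequencies are ints."""
    # Position of this batch inside one full cycle of sum(frequencies).
    position = batch_idx % sum(frequencies)
    # accumulate([2, 3]) yields 2, 5: the exclusive upper bound of the
    # slice of the cycle that belongs to each optimizer.
    for opt_idx, upper in enumerate(accumulate(frequencies)):
        if position < upper:
            return opt_idx
    raise AssertionError("unreachable when all frequencies are positive")


schedule = [select_optimizer(i, [2, 3]) for i in range(7)]
# schedule == [0, 0, 1, 1, 1, 0, 0]
alternating = [select_optimizer(i, [1, 1]) for i in range(4)]
# alternating == [0, 1, 0, 1]
```

The second call shows why a frequency of 1 everywhere differs from None everywhere: the optimizers alternate instead of all stepping on every batch.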

Optimizer config#

<a optimizer config object> represents an optimizer in the Lightning CLI instantiate_class arguments format, which means it contains two keys named class_path and init_args. class_path is the import path of the class, given as a str. init_args is optional; if present, its value is used to instantiate the class. For more details, see the arguments with class type doc.
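
To make the class_path and init_args format concrete, the sketch below resolves such a config with importlib. It is a simplified stand-in for what Lightning CLI does, and the function name instantiate is hypothetical. A standard-library class is used so that the example runs without torch installed.

```python
import importlib
from typing import Any, Dict


def instantiate(config: Dict[str, Any], *args: Any) -> Any:
    """Hypothetical sketch: import the class named by class_path and
    call it with the positional args plus init_args as keywords."""
    # Split "package.module.ClassName" into module path and class name.
    module_name, _, class_name = config["class_path"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    # init_args is optional, so fall back to no keyword arguments.
    init_args = config.get("init_args") or {}
    return cls(*args, **init_args)


# Stand-in for an optimizer config, using a standard-library class.
delta = instantiate(
    {"class_path": "datetime.timedelta", "init_args": {"minutes": 5}}
)
```

The positional args play the role of values that are supplied by the framework rather than by the config, such as the parameters passed to an optimizer.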

However, the Lightning CLI instantiate_class arguments format provides no way to specify which parameters each optimizer receives, which matters especially when there are many optimizers. Therefore, we add a method to support this. A list of the model’s parameters is represented by a str, None, or a list whose items are str or None. A str selects the model’s parameters whose names start with that str. Each parameter appears only once, so if multiple strs match the same parameter, it is assigned to the longest matching str. Parameters that are not matched by any str are assigned to None.

For example, suppose a model has an fc layer and a backbone that contains layers 0-3. The following optimizer config

# optimizer config object
class_path: torch.optim.AdamW
init_args:
    params:
        -   params: backbone
            lr: 1e-4
        -   params: [ backbone.layer1, backbone.layer2 ]
            weight_decay: 1e-4
        -   null
    lr: 1e-3
    weight_decay: 1e-2

will construct an optimizer with three param groups as follows:

-   [ backbone.layer0, backbone.layer3 ]
-   [ backbone.layer1, backbone.layer2 ]
-   [ fc ]
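
The longest-prefix rule behind this grouping can be sketched in plain Python over parameter names. This is an illustration under the rules stated above, not the project's implementation; group_parameter_names and the parameter names in the example are hypothetical.

```python
from typing import List, Optional, Sequence, Union

Spec = Union[str, None, Sequence[Optional[str]]]


def group_parameter_names(
    names: List[str], specs: List[Spec]
) -> List[List[str]]:
    """Hypothetical sketch: assign each parameter name to the group
    whose prefix matches it with the greatest length; None catches
    every name that no prefix matches."""
    # Flatten the specs into (prefix, group index) pairs.
    owners = []
    for group_idx, spec in enumerate(specs):
        prefixes = spec if isinstance(spec, (list, tuple)) else [spec]
        owners.extend((prefix, group_idx) for prefix in prefixes)

    groups: List[List[str]] = [[] for _ in specs]
    for name in names:
        best_len, best_group = -2, None
        for prefix, group_idx in owners:
            if prefix is None:
                length = -1  # catch-all: weaker than any real prefix
            elif name.startswith(prefix):
                length = len(prefix)
            else:
                continue
            if length > best_len:
                best_len, best_group = length, group_idx
        if best_group is not None:
            groups[best_group].append(name)
    return groups


names = [f"backbone.layer{i}.weight" for i in range(4)] + ["fc.weight"]
groups = group_parameter_names(
    names, ["backbone", ["backbone.layer1", "backbone.layer2"], None]
)
```

With these inputs, groups holds the layer0 and layer3 weights in the first group, the layer1 and layer2 weights in the second, and fc.weight in the third, matching the three param groups listed above.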

Lightning lr scheduler config#

<a lightning lr scheduler config object> represents a Lightning lr scheduler config, which contains several keys such as scheduler, interval and frequency. All keys other than scheduler are optional. For more details, see the configure optimizers doc. The default values are as follows:

# lightning lr scheduler config object
lr_scheduler:
    scheduler:
        <a lr scheduler config object>
    interval: epoch
    frequency: 1
    monitor: val_loss
    strict: True
    name: null
    warmup_config:
        <a warmup lr scheduler config object>

In fact, <a lightning lr scheduler config object> also contains the opt_idx and reduce_on_plateau keys, but Lightning sets them automatically, so there is no need to set them manually.

lr scheduler config#

As with the optimizer config, we use the Lightning CLI instantiate_class arguments format to represent an lr scheduler. The optimizer argument will be set to the optimizer in the same optimize_config object, so there is no need to set it manually.

For example, a typical lr scheduler config will look like this:

# lr scheduler config object
class_path: torch.optim.lr_scheduler.MultiStepLR
init_args:
    milestones: [8, 11]

warmup lr scheduler config#

A warmup lr scheduler (implemented by this method) config is a partially supported Lightning lr scheduler config: only the scheduler and frequency keys are supported, and interval is always forced to step.

A complete warmup lr scheduler config will look like this:

# warmup lr scheduler config object
warmup_config:
    scheduler:
        class_path: lightning_template.utils.optim.WarmupScheduler
        init_args:
            warmup_iters: 500
    frequency: 1

Since frequency is optional, you can omit it and use an lr scheduler config directly as the warmup lr scheduler config. It will then look like this:

# warmup lr scheduler config object
warmup_config:
    class_path: lightning_template.utils.optim.WarmupScheduler
    init_args:
        warmup_iters: 500

Furthermore, if you use lightning_template.utils.optim.WarmupScheduler as the warmup scheduler, you can omit class_path as well and give only the init_args. The warmup scheduler config will then look like this:

# warmup lr scheduler config object
warmup_config:
    warmup_iters: 500
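
The three equivalent forms above can be expanded to the complete one with a small normalization step. The sketch below is an assumption about how such shorthand could be resolved, not the project's actual code; normalize_warmup_config is a hypothetical name, and the default frequency of 1 is taken from the complete example.

```python
from typing import Any, Dict

DEFAULT_WARMUP_CLASS = "lightning_template.utils.optim.WarmupScheduler"


def normalize_warmup_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch: expand any of the three warmup_config forms
    into the complete form with scheduler, frequency and interval."""
    if "scheduler" in config:
        # Complete form: scheduler (and optionally frequency) are given.
        full = dict(config)
    elif "class_path" in config:
        # An lr scheduler config used directly as the warmup config.
        full = {"scheduler": dict(config)}
    else:
        # Only init_args are given, so the default warmup class is used.
        full = {
            "scheduler": {
                "class_path": DEFAULT_WARMUP_CLASS,
                "init_args": dict(config),
            }
        }
    full.setdefault("frequency", 1)  # assumed default, as in the example
    full["interval"] = "step"        # warmup always runs per step
    return full


shortest = normalize_warmup_config({"warmup_iters": 500})
```

In this sketch, all three forms shown above normalize to the same dict.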

For more details on lightning_template.utils.optim.WarmupScheduler, see the source code.