Utilities

param_groups_weight_decay

param_groups_weight_decay(
    model, weight_decay=0.01, additional_layers=None
)

Creates parameter groups, excluding bias and normalization layers from weight decay.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | Module | Model to optimize | required |
| weight_decay | float | Weight decay coefficient (default: 1e-2) | 0.01 |
| additional_layers | Iterable[str] \| None | Additional layer names to exclude from weight decay (default: None) | None |

Returns:

| Type | Description |
| --- | --- |
| list[dict[str, Any]] | List of parameter groups with and without weight decay. |

param_groups_weight_decay is adapted from timm's optimizer factory methods.

Example

param_groups_weight_decay takes a model and returns two optimizer parameter group dictionaries: one containing the bias and normalization parameters with weight decay disabled, and one containing the remaining model parameters with weight decay applied. The weight_decay passed to param_groups_weight_decay overrides the optimizer's default weight decay.

params = param_groups_weight_decay(model, weight_decay=1e-5)
optimizer = StableAdamW(params, decouple_lr=True)
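
To make the grouping concrete, here is a minimal sketch of the idea in plain PyTorch. It is not optimi's source: `sketch_param_groups` is a hypothetical name, the rule "one-dimensional parameters and names ending in `.bias` skip weight decay" follows the timm-style heuristic, and matching `additional_layers` entries as substrings of parameter names is an assumption made for illustration. `torch.optim.AdamW` stands in for the optimizer to show that per-group values take precedence over the optimizer-level default.

```python
import torch
from torch import nn


def sketch_param_groups(model, weight_decay=1e-2, additional_layers=None):
    """Hypothetical re-implementation of the grouping idea, not optimi's code."""
    # Assumption: additional_layers entries are matched as substrings of parameter names.
    extra = tuple(additional_layers or ())
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue  # frozen parameters are not handed to the optimizer
        # Biases and normalization weights are 1-D, so ndim <= 1 catches both.
        excluded = (
            param.ndim <= 1
            or name.endswith(".bias")
            or any(key in name for key in extra)
        )
        (no_decay if excluded else decay).append(param)
    return [
        {"params": no_decay, "weight_decay": 0.0},
        {"params": decay, "weight_decay": weight_decay},
    ]


model = nn.Sequential(nn.Linear(8, 16), nn.LayerNorm(16), nn.Linear(16, 4))
groups = sketch_param_groups(model, weight_decay=1e-5)

# Each group keeps its own weight_decay despite the optimizer default of 0.1.
optimizer = torch.optim.AdamW(groups, lr=1e-3, weight_decay=0.1)
decay_values = [group["weight_decay"] for group in optimizer.param_groups]
group_sizes = [len(group["params"]) for group in optimizer.param_groups]
```

In this toy model the first group holds the two Linear biases plus the LayerNorm weight and bias, and the second group holds the two Linear weight matrices.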

prepare_for_gradient_release

prepare_for_gradient_release(
    model, optimizer, ignore_existing_hooks=False
)

Registers post_accumulate_grad_hooks on the model's parameters for the gradient release optimization step.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | Module | Model to register post_accumulate_grad_hooks. Only registers on parameters with requires_grad=True. | required |
| optimizer | OptimiOptimizer | Optimizer providing the fused optimizer step during the backward pass. Requires optimizer to be initialized with gradient_release=True. | required |
| ignore_existing_hooks | bool | If True, ignores existing post_accumulate_grad_hooks on parameters and registers gradient release hooks (default: False) | False |

For details on using prepare_for_gradient_release, please see the gradient release docs.

remove_gradient_release

remove_gradient_release(model)

Removes post_accumulate_grad_hooks created by prepare_for_gradient_release.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | Module | Model to remove gradient release post_accumulate_grad_hooks from. | required |

For details on using remove_gradient_release, please see the gradient release docs.