Utilities¶
param_groups_weight_decay ¶
param_groups_weight_decay(model, weight_decay=0.01, additional_layers=None)
Creates parameter groups, excluding bias and normalization layers from weight decay.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Module | Model to optimize | required |
weight_decay | float | Weight decay coefficient (default: 1e-2) | 0.01 |
additional_layers | Iterable[str] \| None | Additional layer names to exclude from weight decay (default: None) | None |
Returns:
Type | Description |
---|---|
list[dict[str, Any]] | List of parameter groups with and without weight decay. |
param_groups_weight_decay is adapted from timm's optimizer factory methods.
Example¶
param_groups_weight_decay takes a model and returns two optimizer parameter group dictionaries: one containing bias and normalization parameters without weight decay, and another containing the rest of the model's parameters with weight decay. The weight_decay passed to param_groups_weight_decay overrides the optimizer's default weight decay.
params = param_groups_weight_decay(model, weight_decay=1e-5)
optimizer = StableAdamW(params, decouple_lr=True)
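The sketch below expands the example with a toy model so the returned groups can be inspected. It assumes param_groups_weight_decay and StableAdamW are importable from the top-level optimi package, and that each returned group carries its own weight_decay entry, as in timm's factory methods.

```python
import torch.nn as nn
from optimi import StableAdamW, param_groups_weight_decay

# Toy model with bias and normalization parameters that should not receive weight decay.
model = nn.Sequential(nn.Linear(16, 32), nn.LayerNorm(32), nn.Linear(32, 4))

params = param_groups_weight_decay(model, weight_decay=1e-5)

# Expect two groups: biases and normalization parameters with weight_decay=0,
# and the remaining parameters with weight_decay=1e-5 (assumed group layout).
print([group["weight_decay"] for group in params])

# The group-level weight decay overrides StableAdamW's default weight decay.
optimizer = StableAdamW(params, lr=1e-3, decouple_lr=True)
```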
prepare_for_gradient_release ¶
prepare_for_gradient_release(model, optimizer, ignore_existing_hooks=False)
Registers post_accumulate_grad_hooks on parameters for the gradient release optimization step.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Module | Model to register post_accumulate_grad_hooks on. Only registers on parameters with requires_grad=True. | required |
optimizer | OptimiOptimizer | Optimizer providing the fused optimizer step during the backward pass. Requires optimizer to be initialized with gradient_release=True. | required |
ignore_existing_hooks | bool | If True, ignores existing post_accumulate_grad_hooks on parameters and registers gradient release hooks (default: False) | False |
For details on using prepare_for_gradient_release, please see the gradient release docs.
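Below is a minimal sketch of how the hooks are used. It assumes StableAdamW and prepare_for_gradient_release are importable from the top-level optimi package and that the optimizer is constructed with gradient_release=True, as required above; see the gradient release docs for the authoritative training-loop setup.

```python
import torch
import torch.nn as nn
from optimi import StableAdamW, prepare_for_gradient_release

model = nn.Linear(16, 4)

# The optimizer must be initialized for gradient release (see the parameter table above).
optimizer = StableAdamW(model.parameters(), lr=1e-3, gradient_release=True)

# Registers post_accumulate_grad_hooks so the optimizer step runs during backward.
prepare_for_gradient_release(model, optimizer)

for _ in range(10):
    x = torch.randn(8, 16)
    loss = model(x).sum()
    # The backward pass both accumulates gradients and applies the fused optimizer step.
    loss.backward()
    # See the gradient release docs for whether optimizer.step()/zero_grad() calls
    # are still needed (for example, when keeping an existing training framework).
```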
remove_gradient_release ¶
remove_gradient_release(model)
Removes post_accumulate_grad_hooks created by prepare_for_gradient_release.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model |
Module
|
Model to remove gradient release post_accumulate_grad_hooks from. |
required |
For details on using remove_gradient_release, please see the gradient release docs.
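A short sketch of removing the hooks, continuing the example above (remove_gradient_release assumed importable from the top-level optimi package):

```python
from optimi import remove_gradient_release

# Removes the post_accumulate_grad_hooks registered by prepare_for_gradient_release,
# so backward no longer performs the optimizer step.
remove_gradient_release(model)
```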