GLU Channel Mixer

discretax.channel_mixers.glu.GLU

Gated Linear Unit (GLU) layer.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `w1` | | First linear layer. |
| `w2` | | Second linear layer. |

Source

https://arxiv.org/pdf/2002.05202
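For reference, the cited paper defines the GLU as the element-wise product of two linear projections of the input, one of which passes through a sigmoid. In the paper's notation, $W$ and $V$ are weight matrices, $b$ and $c$ are biases, $\sigma$ is the sigmoid, and $\otimes$ is the element-wise product:

```latex
\mathrm{GLU}(x, W, V, b, c) = \sigma(xW + b) \otimes (xV + c)
```

This page does not state how `w1` and `w2` map onto $W$ and $V$, or which of the two feeds the sigmoid; pairing them with the two projections is an assumption.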

__init__(in_features: int, key: PRNGKeyArray, *args, out_features: int | None = None, use_bias: bool = True, **kwargs)

Initialize the GLU layer.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `in_features` | `int` | Dimensionality of the input features. | required |
| `key` | `PRNGKeyArray` | JAX random key for initialization. | required |
| `out_features` | `int \| None` | Optional dimensionality of the output features (defaults to `in_features`). | `None` |
| `use_bias` | `bool` | Whether to include a bias term in the linear layers. | `True` |
| `*args` | | Additional positional arguments (ignored). | `()` |
| `**kwargs` | | Additional keyword arguments (ignored). | `{}` |

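The parameters above can be illustrated with a minimal, self-contained sketch in plain JAX. It is not the discretax implementation: `init_glu_params` is a hypothetical helper, the dictionary layout is invented for illustration, and the uniform initialization scheme is an assumption. It only shows how one key can be split between two linear layers and how `out_features` falls back to `in_features`.

```python
import jax
import jax.numpy as jnp


def init_glu_params(in_features, key, out_features=None, use_bias=True):
    """Hypothetical helper: build parameters for two linear layers, w1 and w2."""
    # out_features falls back to in_features, as the parameter table describes.
    out_features = in_features if out_features is None else out_features
    # One independent subkey per linear layer.
    k1, k2 = jax.random.split(key)
    # Assumed scheme: uniform in [-1/sqrt(in_features), 1/sqrt(in_features)].
    limit = 1.0 / (in_features ** 0.5)

    def linear(k):
        weight = jax.random.uniform(
            k, (out_features, in_features), minval=-limit, maxval=limit
        )
        bias = jnp.zeros((out_features,)) if use_bias else None
        return {"weight": weight, "bias": bias}

    return {"w1": linear(k1), "w2": linear(k2)}


params = init_glu_params(8, jax.random.PRNGKey(0), out_features=4)
print(params["w1"]["weight"].shape)  # (out_features, in_features)
```

Splitting the key keeps the two weight matrices statistically independent; reusing the same key for both layers would initialize them identically.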
__call__(x: Array) -> Array

Forward pass of the GLU layer.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `x` | `Array` | Input tensor. | required |

Returns:

| Type | Description |
| --- | --- |
| `Array` | Output tensor after applying the gated linear transformation. |
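A gated linear transformation of this kind can be sketched in plain JAX as follows. This is an illustration under stated assumptions, not the library's code: `glu_forward` is a hypothetical function, the choice of `w2` as the sigmoid-gated branch is a guess, and the page does not say whether `__call__` expects a single feature vector or a batched input. The sketch operates on one vector and uses `jax.vmap` for a sequence.

```python
import jax
import jax.numpy as jnp


def glu_forward(x, w1, b1, w2, b2):
    """Sketch of a GLU forward pass for a single feature vector x."""
    value = w1 @ x + b1                 # linear branch
    gate = jax.nn.sigmoid(w2 @ x + b2)  # gate values in (0, 1); assumed to come from w2
    return value * gate                 # element-wise gating


key1, key2, key3 = jax.random.split(jax.random.PRNGKey(0), 3)
in_features, out_features = 8, 4
w1 = jax.random.normal(key1, (out_features, in_features))
w2 = jax.random.normal(key2, (out_features, in_features))
b1 = jnp.zeros(out_features)
b2 = jnp.zeros(out_features)

x = jax.random.normal(key3, (in_features,))
y = glu_forward(x, w1, b1, w2, b2)  # shape (out_features,)

# For an input of shape (seq_len, in_features), map over the leading axis
# while sharing the same weights across positions.
xs = jnp.ones((16, in_features))
ys = jax.vmap(glu_forward, in_axes=(0, None, None, None, None))(xs, w1, b1, w2, b2)
```

Because the gate lies strictly between 0 and 1, each output component is a scaled-down copy of the corresponding linear-branch component, so the gate controls how much of each feature passes through.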