GLU Channel Mixer
discretax.channel_mixers.glu.GLU
Gated Linear Unit (GLU) layer.
Attributes:
| Name | Type | Description |
|---|---|---|
| `w1` | | First linear layer. |
| `w2` | | Second linear layer. |
Source
https://arxiv.org/pdf/2002.05202
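
To make the gating concrete, here is a minimal pure-JAX sketch of a sigmoid-gated linear unit as described in the linked paper. It does not use discretax. The function name `glu`, the explicit weight and bias arguments, and the choice of which branch passes through the sigmoid are assumptions for illustration; the library's `w1` and `w2` may be wired differently.

```python
import jax
import jax.numpy as jnp


def glu(x, w1, b1, w2, b2):
    """GLU on one feature vector: a linear branch scaled by a sigmoid gate."""
    value = x @ w1 + b1                 # linear branch, shape (out_features,)
    gate = jax.nn.sigmoid(x @ w2 + b2)  # gate values in (0, 1), same shape
    return value * gate                 # element-wise gating


in_features, out_features = 4, 3
k1, k2, kx = jax.random.split(jax.random.PRNGKey(0), 3)
w1 = jax.random.normal(k1, (in_features, out_features))
w2 = jax.random.normal(k2, (in_features, out_features))
b1 = jnp.zeros(out_features)
b2 = jnp.zeros(out_features)
x = jax.random.normal(kx, (in_features,))

y = glu(x, w1, b1, w2, b2)
print(y.shape)  # (3,)
```

Because the gate lies strictly between 0 and 1, each output element is a scaled-down copy of the corresponding element of the linear branch.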
__init__(in_features: int, key: PRNGKeyArray, *args, out_features: int | None = None, use_bias: bool = True, **kwargs)
Initialize the GLU layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `in_features` | `int` | Dimensionality of the input features. | required |
| `key` | `PRNGKeyArray` | JAX random key for initialization. | required |
| `out_features` | `int \| None` | Optional dimensionality of the output features (defaults to `in_features`). | `None` |
| `use_bias` | `bool` | Whether to include a bias term in the linear layers. | `True` |
| `*args` | | Additional positional arguments (ignored). | required |
| `**kwargs` | | Additional keyword arguments (ignored). | required |
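
The sketch below illustrates how the documented constructor arguments could translate into parameters for the two linear layers. It is a pure-JAX illustration, not the discretax implementation: `init_glu_params`, the dictionary layout, the `(out_features, in_features)` weight shape, and the `1 / sqrt(in_features)` scale are all hypothetical. Only the argument names and the defaults `out_features=None` and `use_bias=True` come from the signature above.

```python
import jax
import jax.numpy as jnp


def init_glu_params(in_features, key, out_features=None, use_bias=True):
    """Hypothetical initializer mirroring the documented arguments."""
    if out_features is None:
        out_features = in_features  # documented default
    k1, k2 = jax.random.split(key)  # one independent subkey per linear layer
    scale = 1.0 / jnp.sqrt(in_features)  # assumed scale, not taken from discretax

    def linear(k):
        params = {"weight": scale * jax.random.normal(k, (out_features, in_features))}
        if use_bias:
            params["bias"] = jnp.zeros(out_features)
        return params

    return {"w1": linear(k1), "w2": linear(k2)}


params = init_glu_params(8, jax.random.PRNGKey(0))
print(params["w1"]["weight"].shape)  # (8, 8)

no_bias = init_glu_params(8, jax.random.PRNGKey(0), out_features=16, use_bias=False)
print(no_bias["w2"]["weight"].shape, "bias" in no_bias["w2"])  # (16, 8) False
```

Splitting the key gives each layer its own random stream, so `w1` and `w2` do not start out with identical weights.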
__call__(x: Array) -> Array
Forward pass of the GLU layer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Array` | Input tensor. | required |
Returns:
| Type | Description |
|---|---|
| `Array` | Output tensor after applying gated linear transformation. |
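
The documentation does not state the expected input shape. If the forward pass is defined on a single feature vector of shape `(in_features,)`, which is an assumption here, a whole sequence can be processed by mapping over the leading axis with `jax.vmap`. The sketch below uses a stand-in function `glu_forward` with bias-free weights rather than the discretax class.

```python
import jax
import jax.numpy as jnp


def glu_forward(params, x):
    """Assumed forward pass on one feature vector of shape (in_features,)."""
    value = params["w1"] @ x
    gate = jax.nn.sigmoid(params["w2"] @ x)
    return value * gate


k1, k2, kx = jax.random.split(jax.random.PRNGKey(0), 3)
params = {
    "w1": jax.random.normal(k1, (6, 4)),  # (out_features, in_features)
    "w2": jax.random.normal(k2, (6, 4)),
}
xs = jax.random.normal(kx, (10, 4))  # sequence of 10 feature vectors

# Map over the leading (sequence) axis of xs; params are shared across steps.
ys = jax.vmap(glu_forward, in_axes=(None, 0))(params, xs)
print(ys.shape)  # (10, 6)
```

Each row of `ys` depends only on the matching row of `xs`, so the layer mixes channels without mixing positions in the sequence.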