| Parameter name | Description | Range | Final value (rat) | Final value (human) |
|---|---|---|---|---|
| Number of layers | Number of recurrent layers stacked on top of each other. | [1; 5] | 2 | 1 |
| Hidden size | Size of the hidden state vector. | [10; 500] | 290 | 88 |
| Loss function | Because the Pearson correlation coefficient (CC) was the final evaluation metric of the networks' performance, it could be used as the cost function instead of the mean squared error (MSE) loss. | [MSE; CC; MSE and CC] | CC | CC |
| Learning rate | The rate at which network weights were updated during training. | [10⁻⁵; 1] | 0.001 | 0.00121 |
| L2 regularization | Strength of the L2 weight regularization. | [0; 10] | 0.0003 | 0.0221 |
| Gradient clipping | Gradient clipping (Pascanu et al. 2013) limits the magnitude of the gradient to a specified value. | [yes; no] | no | no |
| Dropout | When a multi-layer RNN was used, dropout (Srivastava et al. 2014) could be applied between layers. | [0; 0.2] | 0.128 | — |
| Residual connection | Whether a residual connection was employed, i.e., the input was fed directly to the linear readout alongside the RNN's hidden state. | [yes; no] | yes | no |
| Batch size | The number of single-vessel time courses processed by the network before each weight update during training. | [3; 32] | 22 | 10 |
| Number of epochs | How many times the network processed the whole training dataset during training. | [1; 100] | 87 | 69 |
| Washout time | The number of input-signal time points used to drive the network into a state specific to a given input. These time points were not used for readout training or prediction. | Fixed | 250 | 250 |
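To make the table concrete, the sketch below shows one way the listed hyperparameters might be wired together in PyTorch, using the human-column values. This is a minimal illustration, not the authors' implementation: the recurrent-unit type (GRU), the single-channel input and output, and the use of Adam with `weight_decay` as the L2 term are assumptions not stated in the table. The negative-CC loss, the optional residual readout, and the washout handling follow the corresponding rows above.

```python
import torch
import torch.nn as nn

def pearson_cc_loss(pred, target):
    """Negative Pearson correlation coefficient, so that minimizing the loss
    maximizes the CC evaluation metric (see the 'Loss function' row)."""
    pred = pred - pred.mean(dim=-1, keepdim=True)
    target = target - target.mean(dim=-1, keepdim=True)
    cc = (pred * target).sum(dim=-1) / (
        pred.norm(dim=-1) * target.norm(dim=-1) + 1e-8
    )
    return -cc.mean()

class VesselRNN(nn.Module):
    """RNN with a linear readout. An optional residual connection feeds the
    input to the readout alongside the hidden state ('Residual connection' row).
    The GRU unit type is an assumption; the table only says 'recurrent units'."""
    def __init__(self, hidden_size=88, num_layers=1, dropout=0.0, residual=False):
        super().__init__()
        self.rnn = nn.GRU(1, hidden_size, num_layers=num_layers,
                          dropout=dropout if num_layers > 1 else 0.0,
                          batch_first=True)
        self.residual = residual
        self.readout = nn.Linear(hidden_size + (1 if residual else 0), 1)

    def forward(self, x, washout=250):
        h, _ = self.rnn(x)                 # x: (batch, time, 1)
        if self.residual:
            h = torch.cat([h, x], dim=-1)  # input bypasses the RNN
        y = self.readout(h).squeeze(-1)
        return y[:, washout:]              # washout points are not predicted

# Human-column settings: 1 layer, hidden size 88, no residual connection;
# learning rate and L2 strength from the table (Adam is an assumption).
model = VesselRNN(hidden_size=88, num_layers=1, residual=False)
optimizer = torch.optim.Adam(model.parameters(), lr=0.00121, weight_decay=0.0221)
```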
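A corresponding training loop, again under the human-column settings, might look as follows. The data shapes are hypothetical placeholders for single-vessel time courses; only the batch size (10), epoch count (69), washout (250), and the absence of gradient clipping are taken from the table.

```python
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data: 100 single-vessel time courses of 1000 time points each.
inputs = torch.randn(100, 1000, 1)
targets = torch.randn(100, 1000)
loader = DataLoader(TensorDataset(inputs, targets), batch_size=10, shuffle=True)

for epoch in range(69):                        # 'Number of epochs' (human)
    for x, y in loader:                        # batches of 10 time courses
        optimizer.zero_grad()
        pred = model(x, washout=250)           # predictions start after washout
        loss = pearson_cc_loss(pred, y[:, 250:])
        loss.backward()                        # no gradient clipping (per table)
        optimizer.step()
```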