• What are the advantages and disadvantages of attention-based models compared to RNNs?
Choose the one correct answer from the four candidates for each question:
• In practice, what is the most accurate description of the activation functions (such as
Sigmoid, Sum, Tanh, ReLU) used in neural networks?
1. They must be differentiable.
2. They can be non-differentiable, but only for a small number of points.
3. They can be any continuous functions.
4. They must be non-linear to be learnable.
• Given a neural network with N input nodes, no hidden layers, and one output node, with
cross-entropy loss and a sigmoid activation function, which of the following algorithms (with
the proper hyper-parameters and initialization) can be used to find the global optimum?
1. Stochastic Gradient Descent
2. Batch Gradient Descent
3. Mini-Batch Gradient Descent
4. All of the above
• You want to train a neural network to predict the next 30 daily prices using the previous
30 daily prices as inputs. Which model selection and explanation make the most sense?
1. A fully connected deep feed-forward network because it considers all input prices
in the hidden layers to make the best decision.
2. A single one-directional RNN because it considers the order of the prices, and the
output length is the same as the input length.
3. A bidirectional RNN because the prediction benefits from future labels.
4. A one-directional encoder-decoder architecture because it can generate a sequence of future
prices based on all historical input prices.
• Draw the computational graph of a one-hidden-layer feed-forward neural network and
write the derivative of the loss with respect to each variable in the backpropagation.
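
For checking a hand-derived answer to the last question, the following is a minimal sketch rather than a model solution. It assumes a tanh hidden layer, a linear output node, and a squared-error loss, none of which the question specifies. All names (`forward`, `backward`, `numeric_grad_W1`) and the example values are hypothetical. Each line of `backward` corresponds to one edge of the computational graph, traversed from the loss back to the inputs.

```python
import math

def forward(x, W1, b1, w2, b2, t):
    """Forward pass: z = W1 x + b1, h = tanh(z), y = w2 . h + b2, L = 0.5 (y - t)^2."""
    z = [sum(W1[j][i] * x[i] for i in range(len(x))) + b1[j] for j in range(len(b1))]
    h = [math.tanh(zj) for zj in z]
    y = sum(w2[j] * h[j] for j in range(len(h))) + b2
    loss = 0.5 * (y - t) ** 2
    return z, h, y, loss

def backward(x, W1, b1, w2, b2, t):
    """Backward pass: apply the chain rule node by node, from the loss to the inputs."""
    z, h, y, loss = forward(x, W1, b1, w2, b2, t)
    dy = y - t                                    # dL/dy
    dw2 = [dy * hj for hj in h]                   # dL/dw2_j = dL/dy * h_j
    db2 = dy                                      # dL/db2   = dL/dy
    dh = [dy * w2j for w2j in w2]                 # dL/dh_j  = dL/dy * w2_j
    dz = [dh[j] * (1.0 - h[j] ** 2) for j in range(len(h))]  # tanh'(z) = 1 - h^2
    dW1 = [[dz[j] * x[i] for i in range(len(x))] for j in range(len(h))]  # dL/dW1_ji
    db1 = list(dz)                                # dL/db1_j = dL/dz_j
    return {"W1": dW1, "b1": db1, "w2": dw2, "b2": db2}, loss

def numeric_grad_W1(x, W1, b1, w2, b2, t, j, i, eps=1e-6):
    """Central finite difference for dL/dW1[j][i], used only as a sanity check."""
    Wp = [row[:] for row in W1]
    Wm = [row[:] for row in W1]
    Wp[j][i] += eps
    Wm[j][i] -= eps
    loss_plus = forward(x, Wp, b1, w2, b2, t)[3]
    loss_minus = forward(x, Wm, b1, w2, b2, t)[3]
    return (loss_plus - loss_minus) / (2.0 * eps)

# Hypothetical example values: 2 inputs, 2 hidden units, 1 output.
x = [0.5, -1.0]
W1 = [[0.1, -0.2], [0.3, 0.4]]
b1 = [0.0, 0.1]
w2 = [0.7, -0.5]
b2 = 0.2
t = 1.0

grads, loss = backward(x, W1, b1, w2, b2, t)
```

Comparing `grads["W1"][j][i]` with `numeric_grad_W1(x, W1, b1, w2, b2, t, j, i)` is a quick way to catch a sign or index error in the derivation. With a different activation or loss, only the `dy` and `dz` lines change.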