
Machine Learning Assignment Help | Deep Learning CS583 Quiz 3


1 Question

• What are the advantages and disadvantages of attention-based models compared to RNNs?

Choose one correct answer from the four candidates:

• In practice, what is the most accurate description of the activation functions (such as
Sigmoid, Sum, Tanh, ReLU) used in neural networks?

1. They must be differentiable.
2. They can be non-differentiable, but only for a small number of points.
3. They can be any continuous functions.
4. They must be non-linear to be learnable.
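For reference while weighing these options, here is a minimal sketch of two of the listed activations and their derivatives, compared against a finite-difference estimate. The helper names (`sigmoid_grad`, `relu_grad`, `numeric_grad`) are illustrative, and returning 0.0 for the ReLU derivative at x = 0 is an assumed convention, not something stated in the quiz.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    # Closed-form derivative: s(x) * (1 - s(x)), defined for every x
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x: float) -> float:
    return max(0.0, x)

def relu_grad(x: float) -> float:
    # ReLU has no derivative at x == 0; returning 0.0 there is an assumed convention
    return 1.0 if x > 0 else 0.0

def numeric_grad(f, x: float, eps: float = 1e-6) -> float:
    # Central finite difference, used only as a sanity check
    return (f(x + eps) - f(x - eps)) / (2 * eps)

for x in (-1.0, 0.0, 2.0):
    print(x, sigmoid_grad(x), numeric_grad(sigmoid, x), relu_grad(x), numeric_grad(relu, x))
```

At x = 0 the finite-difference estimate for ReLU disagrees with the chosen convention, which is the single kink in the function; everywhere else the two agree.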

• Given a neural network with N input nodes, no hidden layers, one output node, with
entropy loss and sigmoid activation functions, which of the following algorithms (with
the proper hyper-parameters and initialization) can be used to find the global optimum?

1. Stochastic Gradient Descent
2. Batch Gradient Descent
3. Mini-Batch Gradient Descent
4. All of the above
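The setup in this question can be sketched in a few lines. The code below assumes that "entropy loss" means mean binary cross-entropy, and it uses a made-up one-feature toy dataset; the function names, learning rate, and epoch count are illustrative choices. The three listed algorithms differ only in the `batch_size` argument.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, X, y):
    # Mean binary cross-entropy (assumed meaning of "entropy loss")
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)

def train(X, y, batch_size, lr=0.1, epochs=200, seed=0):
    rng = random.Random(seed)
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            gw, gb = [0.0] * n_features, 0.0
            for i in batch:
                p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
                err = p - y[i]  # gradient of the loss w.r.t. the pre-activation
                for j in range(n_features):
                    gw[j] += err * X[i][j]
                gb += err
            w = [wj - lr * g / len(batch) for wj, g in zip(w, gw)]
            b -= lr * gb / len(batch)
    return w, b

# Hypothetical toy data, deliberately not linearly separable
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 1, 0, 1, 1]
initial = loss([0.0], 0.0, X, y)
results = {}
for name, bs in [("stochastic", 1), ("mini-batch", 2), ("batch", len(X))]:
    w, b = train(X, y, bs)
    results[name] = loss(w, b, X, y)
    print(name, round(results[name], 3), "initial:", round(initial, 3))
```

With `batch_size=1` this is stochastic gradient descent, with `batch_size=len(X)` it is batch gradient descent, and anything in between is mini-batch.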

• You want to train a neural network to predict the next 30 daily prices using the previous
30 daily prices as inputs. Which model selection and explanation make the most sense?

1. A fully connected deep feed-forward network because it considers all input prices
in the hidden layers to make the best decision.

2. A single one-directional RNN because it considers the order of the prices, and the
output length is the same as the input length.

3. A bidirectional RNN because the prediction benefits from future labels.

4. A one-directional encoder-decoder architecture because it can generate a sequence of
future prices based on all historical input prices.

• Draw the computational graph of a one-hidden-layer feed-forward neural network and
write the derivative of each variable in the backpropagation.
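One way to check a hand-derived answer is to code the graph and compare against finite differences. The sketch below makes several assumptions that the question does not fix: a sigmoid hidden layer, a linear scalar output, a squared-error loss, and hypothetical parameter values. The graph is x -> z1 = W1 x + b1 -> h = sigmoid(z1) -> y_hat = w2 . h + b2 -> L = 0.5 (y_hat - y)^2.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, w2, b2):
    z1 = [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W1, b1)]
    h = [sigmoid(z) for z in z1]
    y_hat = sum(wi * hi for wi, hi in zip(w2, h)) + b2
    return z1, h, y_hat

def loss(y_hat, y):
    return 0.5 * (y_hat - y) ** 2

def backward(x, y, W1, b1, w2, b2):
    # Walk the graph in reverse, applying the chain rule at each node
    _, h, y_hat = forward(x, W1, b1, w2, b2)
    d_y_hat = y_hat - y                                      # dL/dy_hat
    d_w2 = [d_y_hat * hi for hi in h]                        # dL/dw2
    d_b2 = d_y_hat                                           # dL/db2
    d_h = [d_y_hat * wi for wi in w2]                        # dL/dh
    d_z1 = [dh * hi * (1 - hi) for dh, hi in zip(d_h, h)]    # dL/dz1
    d_W1 = [[dz * xj for xj in x] for dz in d_z1]            # dL/dW1
    d_b1 = d_z1[:]                                           # dL/db1
    return d_W1, d_b1, d_w2, d_b2

# Hypothetical input, target, and parameters
x, y = [0.5, -1.0], 0.3
W1 = [[0.1, -0.2], [0.4, 0.3]]
b1 = [0.0, 0.1]
w2 = [0.7, -0.5]
b2 = 0.2

d_W1, d_b1, d_w2, d_b2 = backward(x, y, W1, b1, w2, b2)

# Finite-difference check on a single entry, W1[0][1]
eps = 1e-6
W1_plus = [row[:] for row in W1]
W1_plus[0][1] += eps
W1_minus = [row[:] for row in W1]
W1_minus[0][1] -= eps
numeric = (loss(forward(x, W1_plus, b1, w2, b2)[2], y)
           - loss(forward(x, W1_minus, b1, w2, b2)[2], y)) / (2 * eps)
print("analytic:", d_W1[0][1], "numeric:", numeric)
```

A different activation or loss changes only the corresponding local derivative (`hi * (1 - hi)` or `y_hat - y`); the structure of the backward pass stays the same.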
