Note: answers are bolded
Which of the following is able to approximate any continuous function to an arbitrary accuracy?
- A two-layer neural network (input layer, output layer) using a linear activation function.
- A two-layer neural network (input layer, output layer) using a non-linear activation function.
- A three-layer neural network (input layer, hidden layer, output layer) using a linear activation function.
- **A three-layer neural network (input layer, hidden layer, output layer) using a non-linear activation function.**
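A minimal numpy sketch of why the linear-activation options fail: stacking layers with a linear (identity) activation collapses into a single linear map, so depth adds no expressive power. (The matrix shapes below are arbitrary, chosen just for illustration.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with a linear (identity) activation: y = W2 @ (W1 @ x)
W1 = rng.normal(size=(4, 3))   # input -> hidden
W2 = rng.normal(size=(2, 4))   # hidden -> output
x = rng.normal(size=3)

deep = W2 @ (W1 @ x)       # two stacked linear layers
shallow = (W2 @ W1) @ x    # one equivalent linear layer

# However many linear layers we stack, the network remains a single
# linear map, so it cannot approximate arbitrary continuous functions.
print(np.allclose(deep, shallow))
```

A non-linear activation between the layers breaks this collapse, which is what the universal approximation result relies on.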
The use of sigmoid functions makes back-propagation possible because they are continuous and differentiable. Besides enabling back-propagation, the sigmoid function also makes the neural network a:
- linear classifier
- **non-linear classifier**
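A small sketch of that non-linearity in action: with a sigmoid hidden layer and hand-picked (not learned) weights, a network can compute XOR, a function no linear classifier can represent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hand-picked weights (illustrative, not learned): hidden unit 1 acts
# like OR, hidden unit 2 like AND, and the output computes OR-and-not-AND,
# i.e. XOR of the two inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
h1 = sigmoid(20 * X.sum(axis=1) - 10)   # ~OR(x1, x2)
h2 = sigmoid(20 * X.sum(axis=1) - 30)   # ~AND(x1, x2)
out = sigmoid(20 * (h1 - h2) - 10)      # ~XOR(x1, x2)

print(np.round(out).astype(int))  # [0 1 1 0]
```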
When training neural networks with at least one hidden layer (and a non-linear activation function), does the initialization of the weight vectors have an impact on the performance of the neural network?
- No, because back-propagation using gradient descent would always find the best weights.
- **Yes, because neural networks in this configuration optimize a non-convex objective function.**
- No, because neural networks in this configuration always optimize a convex objective function and will eventually reach the minimum.
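One way to see the non-convexity is hidden-unit permutation symmetry: swapping two hidden units gives a different weight vector that computes the exact same function, so the loss surface has multiple equivalent minima (and points between them generally do not compute that function). A small numpy sketch, with arbitrary illustrative weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W1 = rng.normal(size=(3, 2))   # input -> hidden (3 hidden units)
W2 = rng.normal(size=(1, 3))   # hidden -> output
x = rng.normal(size=2)

# Swap hidden units 0 and 1 (rows of W1 and columns of W2):
# a distinct point in weight space that computes the same function.
perm = [1, 0, 2]
y_original = W2 @ sigmoid(W1 @ x)
y_permuted = W2[:, perm] @ sigmoid(W1[perm] @ x)

print(np.allclose(y_original, y_permuted))
```

Because the objective is non-convex, different initializations can land gradient descent in different local minima, which is why initialization matters.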
Which of the following are reasons why one may prefer using one-vs-all (OvA) over all-vs-all (AvA) in the multiclass classification setting (Multiple choices may be correct)?
- **OvA involves learning fewer classifiers than AvA**
- OvA is able to learn problems that AvA cannot
- **Each individual classifier for OvA receives a larger set of examples for training than for AvA (assuming a uniform label distribution)**
- OvA makes weaker assumptions regarding the separability of the data than AvA does
In k-class classification, one-vs-all requires at least k classifiers for k different labels.
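The classifier counts and per-classifier training-set sizes behind those answers can be checked with a little arithmetic (k = 10 classes and n = 1000 examples are illustrative values):

```python
from math import comb

k = 10     # number of classes (illustrative)
n = 1000   # training examples, uniform over labels (illustrative)

ova_classifiers = k            # one "class vs rest" classifier per label
ava_classifiers = comb(k, 2)   # one classifier per pair of labels: k(k-1)/2

ova_examples_each = n            # every OvA classifier sees the full set
ava_examples_each = 2 * n // k   # an AvA classifier sees only two classes

print(ova_classifiers, ava_classifiers)      # 10 45
print(ova_examples_each, ava_examples_each)  # 1000 200
```

So OvA trains fewer classifiers (k versus k(k-1)/2), and each one is trained on more data.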