Spring 2017
Quiz 4
Note: answers are bolded

Stochastic gradient descent, when used with the hinge loss, leads to which update rule?
 Winnow
 Widrow's Adaline
 Perceptron
 AdaGrad
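For reference, a minimal sketch of the connection behind this question: the hinge loss max(0, 1 - y w·x) has subgradient -y·x whenever the margin is violated, so one SGD step takes the additive form w ← w + η·y·x, the same form as the Perceptron update. The function name and learning rate below are illustrative assumptions, not from the quiz.

```python
def sgd_hinge_step(w, x, y, lr=1.0):
    # Hinge loss: max(0, 1 - y * <w, x>); y is +1 or -1.
    # Subgradient is -y*x when the margin is violated, else 0.
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    if margin < 1:
        # Perceptron-style additive update: w <- w + lr * y * x
        w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w
```

Note that the Perceptron itself updates only when y w·x ≤ 0; the hinge-loss condition (margin < 1) is slightly more aggressive, but the update rule is the same additive form.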

In a mistake-driven algorithm, if we make a mistake on example x_{i} with label y_{i}, we update the weights w so that we now predict y_{i} correctly.
 True
 False
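A small numeric illustration of the idea this question probes: a single Perceptron update moves the weights toward correcting the mistake, but it does not guarantee the example is predicted correctly afterwards. The weights and learning rate below are hypothetical values chosen for the demonstration.

```python
def perceptron_update(w, x, y, lr=0.1):
    # Additive Perceptron update after a mistake on (x, y), y in {-1, +1}.
    return [wi + lr * y * xi for wi, xi in zip(w, x)]

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

# With a small learning rate, one update need not flip the prediction:
w = [-1.0, 0.0]
x, y = [1.0, 0.0], 1
assert predict(w, x) == -1        # mistake on (x, y)
w = perceptron_update(w, x, y)    # w becomes [-0.9, 0.0]
assert predict(w, x) == -1        # still wrong after the update
```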

Which of the following properties is true about the (original) Perceptron algorithm?
 The Perceptron always converges to the best linear separator for a given dataset.
 The convergence criteria for Perceptron depends on the initial value of the weight vector.
 If the dataset is not linearly separable, the Perceptron algorithm does not converge and keeps cycling between some sets of weights.
 If the dataset is not linearly separable, the Perceptron algorithm learns the linear separator with the fewest misclassifications.
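The cycling behavior on non-separable data can be seen concretely on XOR (a standard non-separable example; the encoding with a bias feature below is an illustrative choice). With integer features and a unit learning rate, the weights return to an earlier value rather than converging:

```python
def perceptron_epoch(w, data, lr=1.0):
    # One pass of the Perceptron over data; update on each mistake.
    for x, y in data:
        if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

# XOR with features [bias, x1, x2] is not linearly separable.
xor = [([1.0, 0.0, 0.0], -1), ([1.0, 1.0, 0.0], 1),
       ([1.0, 0.0, 1.0], 1), ([1.0, 1.0, 1.0], -1)]

# After one full epoch the weights are back to the all-zeros start,
# so the algorithm cycles forever instead of converging.
w = perceptron_epoch([0.0, 0.0, 0.0], xor)
```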

Let's assume that we are using the standard Averaged Perceptron algorithm for training and testing (prediction). Let's further assume that it makes k mistakes on the training data. Now, how many weight vectors do we require to predict the label for a test instance?
 O(1)
 O(k)
 O(k^{2})
 Not enough information.
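The point behind this question: although the Averaged Perceptron conceptually averages one weight vector per training step, the average can be maintained as a single running sum, so prediction needs only one vector regardless of the number of mistakes k. A minimal sketch (function names and the running-sum bookkeeping are illustrative assumptions):

```python
def averaged_perceptron_train(data, epochs=1):
    # data: list of (x, y) pairs with y in {-1, +1}, x a list of floats.
    d = len(data[0][0])
    w = [0.0] * d
    w_sum = [0.0] * d           # running sum of the weight vectors
    count = 0
    for _ in range(epochs):
        for x, y in data:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
            w_sum = [s + wi for s, wi in zip(w_sum, w)]
            count += 1
    # A single averaged vector suffices at test time: O(1) vectors.
    return [s / count for s in w_sum]

def predict(w_avg, x):
    return 1 if sum(wi * xi for wi, xi in zip(w_avg, x)) > 0 else -1
```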

Winnow has a better mistake bound than Perceptron when only k of n features are relevant to the prediction and k << n.
 True
 False
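For reference, Winnow's advantage here comes from its multiplicative updates, which give a mistake bound of O(k log n) for learning k-relevant-feature targets, versus a bound for Perceptron that can grow with n. A minimal sketch of one Winnow step (the promotion factor alpha and the threshold are parameters of the algorithm; the specific values below are illustrative):

```python
def winnow_step(w, x, y, theta, alpha=2.0):
    # x: Boolean feature vector (0/1 entries), y in {0, 1}.
    # Weights start positive (typically all 1) and stay positive.
    yhat = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
    if yhat != y:
        for i, xi in enumerate(x):
            if xi:  # multiplicatively promote or demote active features
                w[i] = w[i] * alpha if y == 1 else w[i] / alpha
    return w
```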
Dan Roth