CS446: Machine Learning

Spring 2017

Quiz 4

Note: answers are bolded

Stochastic gradient descent, when used with the hinge loss, leads to which update rule?
1. Winnow
2. Widrow's Adaline
3. AdaGrad
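
For reference, a minimal sketch of one stochastic (sub)gradient step on the hinge loss. It assumes the loss max(0, 1 - y (w . x)) with labels y in {-1, +1} and a fixed learning rate; the function name hinge_sgd_step and the parameter lr are hypothetical, chosen only for illustration.

```python
def hinge_sgd_step(w, x, y, lr=1.0):
    """One SGD step on the hinge loss max(0, 1 - y * (w . x)).

    w, x: lists of floats of equal length; y: label in {-1, +1}.
    Returns a new weight list and leaves w unmodified.
    """
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    if margin < 1:
        # Subgradient of the loss is -y * x, so the step adds lr * y * x.
        return [wi + lr * y * xi for wi, xi in zip(w, x)]
    # Loss is zero here, so a zero subgradient is valid: no update.
    return list(w)
```

Under these assumptions the update is additive and fires only on examples whose margin is below 1.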

In a mistake-driven algorithm, if we make a mistake on example x_i with label y_i, we update the weights w so that we now predict y_i correctly.
1. True
2. False

Which of the following properties is true about the (original) Perceptron algorithm?
1. The Perceptron always converges to the best linear separator for a given dataset.
2. The convergence criterion for the Perceptron depends on the initial value of the weight vector.
3. If the dataset is not linearly separable, the Perceptron algorithm learns the linear separator with the fewest misclassifications.

Let's assume that we are using the standard Averaged Perceptron algorithm for training and testing (prediction). Let's further assume that it makes k mistakes on the training data. Now, how many weight vectors do we require to predict the label for a test instance?
1. O(1)
2. O(k²)
3. Not enough information.

Winnow has a better mistake bound than Perceptron when only k of n features are relevant to the prediction and k << n.
1. True
2. False
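
For reference, a minimal sketch of a single Winnow step, whose multiplicative update contrasts with the Perceptron's additive one. It assumes Boolean features in {0, 1}, labels in {0, 1}, weights initialized to 1, a threshold equal to the number of features n, and a promotion/demotion factor of 2; the name winnow_step and its parameters are hypothetical. In this setting, Winnow's mistake bound for learning a disjunction of k out of n variables grows as O(k log n).

```python
def winnow_step(w, x, y, theta=None, alpha=2.0):
    """One mistake-driven Winnow step.

    w: current weights; x: Boolean features (0 or 1); y: label (0 or 1).
    Returns a new weight list and leaves w unmodified.
    """
    if theta is None:
        theta = len(w)  # assumed threshold: number of features
    score = sum(wi * xi for wi, xi in zip(w, x))
    prediction = 1 if score >= theta else 0
    if prediction == y:
        return list(w)  # no mistake, no update
    # Promote active weights on a missed positive, demote on a false positive.
    factor = alpha if y == 1 else 1.0 / alpha
    return [wi * factor if xi == 1 else wi for wi, xi in zip(w, x)]
```

Only the weights of active features (x_i = 1) change, and only when the prediction is wrong.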

Dan Roth