CS446: Machine Learning

Spring 2017


Quiz 5

Note: correct answers are bolded
  1. What is the size of the largest set of points that can be shattered by a linear threshold function mapping from ℝ to {0, 1}?
    1. One point
    2. **Two points**
    3. Three points
    4. Four points
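
The bolded answer can be checked by brute force: a threshold on the real line realizes every labeling of two points, but no threshold realizes the alternating labeling (1, 0, 1) of three points. Below is a minimal sketch, assuming the class is h(x) = 1 if wx + b ≥ 0 else 0, so both orientations of the threshold are allowed; the helper names are illustrative, not from the course.

```python
# Brute-force shattering check for one-dimensional threshold functions,
# h(x) = 1 if w*x + b >= 0 else 0. Helper names here are illustrative.
from itertools import product

def can_realize(points, labels):
    """Return True if some threshold (in either orientation) produces `labels`."""
    xs = sorted(points)
    # Candidate thresholds: one below all points, the midpoints, one above all.
    thresholds = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    for t in thresholds:
        for sign in (1, -1):  # sign plays the role of the weight w
            pred = tuple(1 if sign * (x - t) >= 0 else 0 for x in points)
            if pred == tuple(labels):
                return True
    return False

def is_shattered(points):
    """Return True iff every 0/1 labeling of `points` is realizable."""
    return all(can_realize(points, labels)
               for labels in product((0, 1), repeat=len(points)))

print(is_shattered([0.0, 1.0]))       # True  -> two points can be shattered
print(is_shattered([0.0, 1.0, 2.0]))  # False -> the labeling (1, 0, 1) fails
```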

  2. What is the difference between the standard PAC learning setting and the agnostic PAC learning setting?
    1. In standard PAC learning, the sample complexity is polynomial in 1/ε, 1/δ, and n, while in agnostic PAC learning this is not true.
    2. In standard PAC learning, the generalization bounds are based on the size of the hypothesis class, while in agnostic PAC learning, generalization bounds are based on the VC dimension of the hypothesis class.
    3. In standard PAC learning, efficient learnability requires that the time required to learn is polynomial in 1/ε, 1/δ, and n, while in agnostic PAC learning this is not true.
    4. **In standard PAC learning, the hypothesis is required to be consistent with the training data, while in agnostic PAC learning this is not necessary.**
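
For reference, the two guarantees can be written side by side. This is the standard formulation; the notation err_D(h) for the true error of h under distribution D is ours, not the quiz's.

```latex
% Standard (realizable) PAC: some c in C has zero error, and with probability
% at least 1 - \delta the learner outputs a hypothesis h with
\[
\mathrm{err}_D(h) \le \epsilon .
\]
% Agnostic PAC: no zero-error target is assumed, so consistency with the
% training data may be impossible; h need only compete with the best in H:
\[
\mathrm{err}_D(h) \le \min_{h' \in H} \mathrm{err}_D(h') + \epsilon .
\]
```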

  3. What can be said about the learnability of concept class C where the VC dimension of C is not finite?
    1. C may or may not be learnable.
    2. **C is not PAC learnable.**
    3. C is PAC learnable because any number of examples can be shattered.
    4. C is PAC learnable because there is a lower bound on the number of examples for which, with probability at least (1-δ), c ∈ C has error less than ε.
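
Why finiteness matters: PAC sample complexity is controlled by the VC dimension. One common form of the bound (constants vary across textbooks) is sketched below.

```latex
% Upper bound on the number of examples sufficient for PAC learning C:
\[
m \;=\; O\!\left(\frac{1}{\epsilon}\left(\mathrm{VC}(C)\log\frac{1}{\epsilon}
      + \log\frac{1}{\delta}\right)\right),
\]
% together with a matching lower bound m = \Omega(\mathrm{VC}(C)/\epsilon).
% Both are finite only when VC(C) is finite, so infinite VC dimension
% rules out PAC learnability.
```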

  4. While trying to determine the VC dimension of a concept class H, we find that we can shatter all samples of size 3. With some effort, we find that no sample of size 5 can be shattered by H. Given this information, which of the following statements is most accurate about the VC dimension of H (written as VC(H))?
    1. VC(H) = 3
    2. VC(H) = 5
    3. **VC(H) can be 3 or 4**
    4. VC(H) > 5
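
The reasoning behind the bolded answer, written out:

```latex
% Shattering a set of size 3 shows VC(H) >= 3; the failure of every set of
% size 5 to be shattered shows VC(H) <= 4. Nothing observed rules out a
% shatterable set of size 4, so:
\[
3 \;\le\; \mathrm{VC}(H) \;\le\; 4 .
\]
```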

  5. Given a data set S, you have the choice of two hypothesis spaces to use, H1 ⊆ H2. Disregarding computational complexity issues, which one would you choose?
    1. Always H2, since it is more likely that I’ll find a hypothesis that is consistent with S.
    2. Always H1, due to Occam’s Razor.
    3. **H1, unless I know that it is not expressive enough.**
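
The Occam's Razor intuition can be made quantitative. For a finite class and a hypothesis consistent with the training data, a standard bound (one common form) is sketched below; it shows why the smaller class wins whenever it is expressive enough.

```latex
% For h in H consistent with m training examples, with probability >= 1 - \delta:
\[
\mathrm{err}_D(h) \;\le\; \frac{1}{m}\left(\ln|H| + \ln\frac{1}{\delta}\right),
\]
% so the smaller class H1 yields the tighter guarantee, provided H1 still
% contains a hypothesis consistent with S.
```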

Dan Roth