We show that if the closure of a function class F under the metric induced by some probability distribution is not convex, then the sample complexity for agnostically learning F with squared loss (using only hypotheses in F) is Ω(ln(1/δ)/ε²), where 1 − δ is the probability of success and ε is the required accuracy. In comparison, if the class F is convex and has finite pseudodimension, then the sample complexity is O((1/ε)(ln(1/ε) + ln(1/δ))). If a nonconvex class F has finite pseudodimension, then the sample complexity for agnostically learning the closure of the convex hull of F is O((1/ε)((1/ε) ln(1/ε) + ln(1/δ))). Hence, for agnostic learning, learning the convex hull provides better approximation capabilities with little penalty in sample complexity.