We formalize a model for supervised learning of action strategies in dynami
c stochastic domains and show that PAC-learning results on Occam algorithms
hold in this model as well. We then identify a class of rule-based action
strategies for which polynomial time learning is possible. The representati
on of strategies is a generalization of decision lists; strategies include
rules with existentially quantified conditions, simple recursive predicates
, and small internal state, but are syntactically restricted. We also study
the learnability of hierarchically composed strategies where a subroutine
already acquired can be used as a basic action in a higher level strategy.
We prove some positive results in this setting, but also show that in some
cases the hierarchical learning problem is computationally hard.