Many of today's high level parallel languages support dynamic, fine-gr
ained parallelism. These languages allow the user to expose all the pa
rallelism in the program, which is typically of a much higher degree t
han the number of processors. Hence an efficient scheduling algorithm
is required to assign computations to processors at runtime. Besides h
aving low overheads and good load balancing, it is important for the s
cheduling algorithm to minimize the space usage of the parallel progra
m. This paper presents a scheduling algorithm that is provably space-e
fficient and time-efficient for nested parallel languages. In addition
to proving the space and time bounds of the parallel schedule generat
ed by the algorithm, we demonstrate that it is efficient in practice.
We have implemented a runtime system that uses our algorithm to schedu
le parallel threads. The results of executing parallel programs on thi
s system show that our scheduling algorithm significantly reduces memo
ry usage compared to previous techniques, without compromising perform
ance.