Characterization and parallelization of decision-tree induction

Citation
Jp. Bradford et Jab. Fortes, Characterization and parallelization of decision-tree induction, J PAR DISTR, 61(3), 2001, pp. 322-349
Citations number
32
Categorie Soggetti
Computer Science & Engineering
Journal title
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
ISSN journal
07437315 → ACNP
Volume
61
Issue
3
Year of publication
2001
Pages
322 - 349
Database
ISI
SICI code
0743-7315(200103)61:3<322:CAPODI>2.0.ZU;2-5
Abstract
This paper examines the performance and memory-access behavior of the C4.5 decision-tree induction program, a representative example of data mining ap plications, for both uniprocessor and parallel implementations. The goals o f this paper are to characterize C4.5, in particular its memory hierarchy u sage, and to decrease the run-time of C4.5 via algorithmic improvement and parallelization. Performance is studied via RSIM, an execution driven simul ator, for three uniprocessor models that exploit instruction level parallel ism to varying degrees. This paper makes the following four contributions. The first contribution is presenting a complete characterization of the C4. 5 decision-tree induction program. The results show that with the exception of the input data set, the working set fits into an 8-Kbyte data cache; th e instruction working set also fits into an 8-Kbyte instruction cache. For data sets larger than the L2 cache, performance is limited by accesses to m ain memory. The results further establish that four-way issue can provide u p to a factor of two performance improvement over single-issue for larger L 2 caches; for smaller L2 caches, out-of-order dispatch provides a large per formance improvement over in-order dispatch. The second contribution made b y this paper is examining the effect on the memory hierarchy of changing th e layout of the input dataset in memory, showing again that the performance is limited by memory accesses. One proposed data layout decreases the dyna mic instruction count by up to 24%, but usually results in lower performanc e due to worse cache behavior. Another proposed data layout does not improv e the dynamic instruction count over the original layout, but has better ca che behavior and decreases the run-time by up to a factor of two. Third, th is paper presents the first decision-tree induction program parallelized fo r a ccNUMA architecture. A method for splitting the decision tree hash tabl e is discussed that allows the hash table to be updated and accessed simult aneously without the use of locks. The performance of the parallel version is compared to the original version of C4.5 and a uniprocessor version of C 4.5 using the best data layout found. Speedup curves from a six-processor S un E4000 SMP system show a speedup on the induction step of 3.99, and simul ation results show that the performance is mostly unaffected by increasing the remote memory access time until it is over a factor of ten greater than the local memory access time. Last, this paper characterizes the paralleli zed decision-tree induction program. Compared to the uniprocessor version, the parallel version exerts significantly less pressure on the memory hiera rchy, with the exception of having a much larger first level data working s et. (C) 2001 Academic Press.