There has been much recent algorithmic work on the problem of
reconstructing the evolutionary history of biological species. Computer
virus specialists are interested in finding the evolutionary history of
computer viruses: a virus is often written using code fragments from one
or more other viruses, which are its immediate ancestors. A phylogeny
for a collection of computer viruses is a directed acyclic graph whose
nodes are the viruses and whose edges map ancestors to descendants and
satisfy the property that each code fragment is "invented" only once.
To provide a simple explanation for the data, we consider the problem
of constructing such a phylogeny with a minimum number of edges. This
optimization problem is NP-hard, and we present positive and negative
results for associated approximation problems. When tree solutions
exist, they can be constructed and randomly sampled in polynomial time.
(C) 1998 Academic Press.
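
The phylogeny condition can be illustrated with a small checker. This is only a sketch under an assumed formalization that the abstract does not spell out: a fragment counts as "invented" at a virus when the virus contains it and none of its immediate ancestors do, and a valid phylogeny is an acyclic graph in which every fragment has exactly one such inventor. The names `is_acyclic`, `is_phylogeny`, `fragments`, and `edges` are hypothetical and chosen for illustration. The sketch only verifies a candidate graph; it does not address the NP-hard problem of minimizing the number of edges.

```python
from collections import defaultdict


def is_acyclic(nodes, edges):
    """Kahn's algorithm: return True if the directed graph has no cycle."""
    indegree = {v: 0 for v in nodes}
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    ready = [v for v in indegree if indegree[v] == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    # Every node is removed exactly when no cycle exists.
    return removed == len(indegree)


def is_phylogeny(fragments, edges):
    """Check the assumed 'invented only once' property.

    fragments: dict mapping each virus to the set of code fragments it contains.
    edges: iterable of (ancestor, descendant) pairs.
    """
    edges = list(edges)
    # Edges may only connect known viruses.
    if any(a not in fragments or d not in fragments for a, d in edges):
        return False
    if not is_acyclic(fragments, edges):
        return False
    parents = defaultdict(set)
    for ancestor, descendant in edges:
        parents[descendant].add(ancestor)
    # A virus "invents" a fragment if no immediate ancestor already has it.
    inventors = defaultdict(int)
    for virus, frags in fragments.items():
        for f in frags:
            if not any(f in fragments[p] for p in parents[virus]):
                inventors[f] += 1
    return all(count == 1 for count in inventors.values())


# Hypothetical data: three viruses sharing fragments "a", "b", "c".
fragments = {"v1": {"a", "b"}, "v2": {"a", "c"}, "v3": {"b", "c"}}
good_edges = [("v1", "v2"), ("v1", "v3"), ("v2", "v3")]
bad_edges = [("v1", "v2"), ("v1", "v3")]  # "c" arises independently in v2 and v3

print(is_phylogeny(fragments, good_edges))
print(is_phylogeny(fragments, bad_edges))
```

In the second call, dropping the edge from `v2` to `v3` leaves fragment "c" with two independent inventors, so the graph no longer qualifies under the assumed definition.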