A major research goal for compilers and environments is the automatic
derivation of tools from formal specifications. However, the formal mo
del of the language is often inadequate; in particular, LR(lc) grammar
s are unable to describe the natural syntax of many languages, such as
C++ and Fortran, which are inherently non-deterministic. Designers of
batch compilers work around such limitations by combining generated c
omponents with ad hoc techniques (for instance, performing partial typ
e and scope analysis in tandem with parsing). Unfortunately, the compl
exity of incremental systems precludes the use of batch solutions. The
inability to generate incremental tools for important languages inhib
its the widespread use of language-rich interactive environments. We a
ddress this problem by extending the language model itself, introducin
g a program representation based on parse dags that is suitable for bo
th batch and incremental analysis. Ambiguities unresolved by one stage
are retained in this representation until further stages can complete
the analysis, even if the resolution depends on further actions by th
e user. Representing ambiguity explicitly increases the number and var
iety of languages that can be analyzed incrementally using existing me
thods. To create this representation, we have developed an efficient i
ncremental parser for general context-free grammars. Our algorithm com
bines Tomita's generalized LR parser with reuse of entire subtrees via
state-matching. Disambiguation can occur statically, during or after
parsing, or during semantic analysis (using existing incremental techn
iques); program errors that preclude disambiguation retain multiple in
terpretations indefinitely. Our representation and analyses gain effic
iency by exploiting the local nature of ambiguities: for the SPEC95 C
programs, the explicit representation of ambiguity requires only 0.5%
additional space and less than 1% additional time during reconstructio
n.