In search of more compression, researchers have recently sought to
describe digital video of real scenes not as sequences of frames but
rather as collections of objects that are rendered and combined
according to scripting information. Depending upon the application and
the scene analysis tools available, representations may range from
two-dimensional layers to full three-dimensional
computer-graphics-style databases. The significance of these more
meaningful representations goes beyond compression, however: they
enable new forms of interactivity and personalization, as well as new
degrees of freedom in post-production. This paper proposes a
computational framework for a television receiver that can handle
digital video in forms ranging from "traditional" motion-compensated
transform coders to sets of three-dimensional objects, and discusses
the requirements for a scripting language to control such a receiver.
It also notes that the concept of scalability can be expanded to
include "intelligently resizable video," where the originator of a
video sequence can specify how the scene is to be composed and cut for
displays of differing sizes and aspect ratios.