During a project examining the use of machine learning techniques for oil spill detection, we encountered several essential questions that we believe deserve the attention of the research community. We use our particular case study to illustrate such issues as problem formulation, selection of evaluation measures, and data preparation. We relate these issues to properties of the oil spill application, such as its imbalanced class distribution, that are shown to be common to many applications. Our solutions to these issues are implemented in the Canadian Environmental Hazards Detection System (CEHDS), which is about to undergo field testing.
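To illustrate why an imbalanced class distribution bears on the choice of evaluation measure, the following minimal sketch compares plain accuracy with one commonly used alternative, the geometric mean of sensitivity and specificity. The data are hypothetical (10 "spill" examples among 1,000), and the choice of geometric mean is an illustrative assumption, not a statement of the measure used in CEHDS.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = spill, 0 = no spill)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """Fraction of all examples classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (accuracy on positives) and specificity
    (accuracy on negatives); it is 0 whenever either class is missed entirely."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

# Hypothetical data: 10 positive ("spill") examples among 1000.
y_true = [1] * 10 + [0] * 990

# Trivial classifier: always predicts the majority class ("no spill").
always_negative = [0] * len(y_true)

# Hypothetical classifier: finds 8 of the 10 spills but raises 50 false alarms.
y_other = [1] * 8 + [0] * 2 + [1] * 50 + [0] * 940

print(accuracy(y_true, always_negative))  # 0.99
print(g_mean(y_true, always_negative))    # 0.0
print(accuracy(y_true, y_other), g_mean(y_true, y_other))
```

The trivial classifier scores 99% accuracy while detecting no spills at all; the geometric mean exposes this by dropping to zero, and it ranks the second classifier higher despite that classifier's lower accuracy.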