The similarity join is an important database primitive which has been succe
ssfully applied to speed up applications such as similarity search, data an
alysis and data mining. The similarity join combines two point sets of a mu
ltidimensional vector space such that the result contains all point pairs w
here the distance does not exceed a parameter E. In this paper, we propose
the Epsilon Grid Order, a new algorithm for determining the similarity join
of very large data sets. Our solution is based on a particular sort order
of the data points, which is obtained by laying an equi-distant grid with c
ell length E over the data space and comparing the grid cells lexicographic
ally. A typical problem of grid-based approaches such as MSJ or the epsilon
-kdB-tree is that large portions of the data sets must be held simultaneou
sly in main memory. Therefore, these approaches do not scale to large data
sets. Our technique avoids this problem by an external sorting algorithm an
d a particular scheduling strategy during the join phase. In the experiment
al evaluation, a substantial improvement over competitive techniques is sho
wn.