High performance parallel query processing on a 100 node ATM connected PC cluster

Authors

Tamura, T; Oguchi, M; Kitsuregawa, M

Citation

T. Tamura et al., High performance parallel query processing on a 100 node ATM connected PC cluster, IEICE T INF, E82D(1), 1999, pp. 54-63

Subject categories

Information Technology & Communication Systems

Journal title

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS

ISSN journal

0916-8532

Volume

E82D

Issue

1

Year of publication

1999

Pages

54 - 63

Database

ISI

SICI code

0916-8532(199901)E82D:1<54:HPPQPO>2.0.ZU;2-F

Abstract

We developed a PC cluster system which consists of 100 PCs as a test bed for massively parallel query processing. Each PC employs the 200 MHz Pentium Pro CPU and is connected with others through an ATM switch. Because the query processing applications are insensitive to the communication latency and mainly perform integer operations, the ATM connected PC cluster approach can be considered a reasonable solution for high performance database servers with low costs. However, there has been no challenge to construct large scale PC clusters for database applications, as far as the authors know. Though we employed commodity components as much as possible, we developed the DBMS itself, because that was a key component for obtaining high performance in parallel query processing, and there seemed no system which could meet our demand. On each PC node, a server program which acts as a database kernel is running to process the queries in cooperation with other nodes. The kernel was designed to execute pipelined operators and handle voluminous data efficiently, to achieve high performance on complex decision support type queries. We used the standard benchmark, TPC-D, on a 100 GB database to verify the feasibility of our approach, through comparison of our system with commercial parallel systems. As a whole, our system exhibited sufficiently high performance which was competitive with the current TPC-D top records, in spite of not using indices. For some heavy queries in the benchmark, which have high selectivity and joinability, our system performed much better. In addition, we applied transposed file organization to the database for further performance improvement. The transposed file organization vertically partitions the tuples, enabling attribute-by-attribute access to the relations. This resulted in significant performance improvement by reducing the amount of disk I/O and shifting the bottleneck to computation.
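The transposed file organization mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the relation, the attribute names (`orderkey`, `quantity`, `price`, `discount`), and the helper functions are all hypothetical, and in-memory byte strings stand in for disk files. The sketch only shows why a query touching one attribute reads fewer bytes when tuples are vertically partitioned.

```python
import struct

# Hypothetical relation with four 32-bit integer attributes (an assumption,
# not the schema used in the paper).
ATTRS = ("orderkey", "quantity", "price", "discount")
ROWS = [(1, 10, 500, 5), (2, 3, 120, 0), (3, 25, 900, 10)]


def to_row_file(rows):
    """Conventional layout: whole tuples stored one after another."""
    return b"".join(struct.pack("<4i", *r) for r in rows)


def to_transposed_files(rows):
    """Transposed layout: one byte string (a stand-in for a file) per attribute."""
    return {
        name: struct.pack(f"<{len(rows)}i", *(r[i] for r in rows))
        for i, name in enumerate(ATTRS)
    }


def sum_quantity_row(row_file):
    """Scan the row file; every attribute of every tuple must be read."""
    total = sum(rec[1] for rec in struct.iter_unpack("<4i", row_file))
    return total, len(row_file)  # (result, bytes scanned)


def sum_quantity_transposed(files):
    """Scan only the 'quantity' column; other attributes are never touched."""
    col = files["quantity"]
    total = sum(v for (v,) in struct.iter_unpack("<i", col))
    return total, len(col)  # (result, bytes scanned)


row_total, row_bytes = sum_quantity_row(to_row_file(ROWS))
col_total, col_bytes = sum_quantity_transposed(to_transposed_files(ROWS))
print(row_total, row_bytes)
print(col_total, col_bytes)
```

Both scans return the same sum, but the transposed scan reads one quarter of the bytes in this toy relation, since only one of the four equally sized attributes is accessed. This mirrors the reduction in disk I/O that the abstract describes, without claiming anything about the actual storage format of the system.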