COMPUTING CLUSTERS
Two large parallel processor clusters and a bank of web servers support the UCSC Genome Browser and its associated tools and databases. These facilities also support much of the computational genome research conducted within CBSE. The newer of the two clusters, the PitaKluster, consists of 198 compute nodes, each with two AMD Opteron processors and 4 gigabytes of memory, housed in 3 Rackable storage units. The PitaKluster's 396 processors run Linux and can together perform over a trillion instructions per second.
The older KiloKluster was originally a bank of 1,024 Pentium III processors running GNU/Linux and housed in 8 racks; this cluster is now shrinking by attrition. Both systems were designed to provide an exceptional amount of inexpensive computing power in minimal space.
In addition, CBSE has a test cluster of 50 compute nodes, each with two Pentium 4 Xeon processors and 2 gigabytes of memory.
These three computational clusters are supported by several file servers that together provide almost 40 terabytes of network storage.
WEB SERVERS
The web servers for the UCSC Genome Browser consist of 8 dual AMD Opteron machines that offer 1.6 terabytes of internal storage and 8 gigabytes of memory. These machines have access to a central file server that provides an additional 5 terabytes of shared disk space. Fourteen more servers provide web access to the BLAT (BLAST-Like Alignment Tool) software; because BLAT is a memory-intensive application, each of these machines has 16 gigabytes of memory. Finally, a download server lets users download our data, serving over 100 gigabytes every day. Our web servers are hosted by the UCSC ITS Data Center, which is designed to operate 24 hours a day, 365 days a year.
WHY PARALLEL PROCESSORS?
Computer clusters such as these are a cost-effective way to process large amounts of data. Because many bioinformatics problems are "embarrassingly parallel," they can be split into independent pieces that do not require high-speed inter-process communication. This eliminates the need for high-priced networking equipment. By running many separate computations in parallel on many processors, we have pioneered the development of "super-computing on-the-cheap" for the specific needs of genome presentation, annotation, and analysis.
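To make the "embarrassingly parallel" idea concrete, here is a minimal Python sketch, not the software actually run on these clusters. It assumes a toy task (counting G and C bases) as a stand-in for real per-chunk analysis; the function names `split_into_chunks`, `count_gc`, and `gc_total` are hypothetical. Each chunk is processed with no data from any other chunk, so the only communication is handing out the input and adding up the results at the end.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import List, Optional


def split_into_chunks(sequence: str, chunk_size: int) -> List[str]:
    """Cut a sequence into non-overlapping pieces that can be processed independently."""
    return [sequence[i:i + chunk_size] for i in range(0, len(sequence), chunk_size)]


def count_gc(chunk: str) -> int:
    """Hypothetical per-chunk job: it needs no data from any other chunk."""
    return sum(1 for base in chunk.upper() if base in "GC")


def gc_total(sequence: str, chunk_size: int = 1_000_000,
             workers: Optional[int] = None) -> int:
    """Farm chunks out to separate worker processes, then combine the partial counts."""
    chunks = split_into_chunks(sequence, chunk_size)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Workers never talk to each other; each returns one integer.
        return sum(pool.map(count_gc, chunks))


if __name__ == "__main__":
    demo = "ACGTGGCCAT" * 1000  # 6 G/C bases per 10-base repeat
    print(gc_total(demo, chunk_size=1000))  # prints 6000
```

On a cluster the worker processes would be spread across compute nodes rather than the cores of one machine, but the structure is the same: because the partial results are simply summed, adding processors shortens the run without needing a fast interconnect.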
The PitaKluster is the third-generation bioinformatics cluster at UCSC and is gradually taking over from the second-generation system, the KiloKluster. The first generation was a cluster of 100 Pentium III processors, built to assemble the first working draft of the human genome in June 2000 using GigAssembler, a 10,000-line program written by Jim Kent.
These computing systems are funded through the Howard Hughes Medical Institute, the National Human Genome Research Institute (NHGRI), the California Institute for Quantitative Biosciences (QB3), and the National Cancer Institute.