Center for Biomolecular Science & Engineering: Promoting discovery and invention in the post-genomic age
Baskin School of Engineering

Genome Computing Systems at CBSE
 
Photo: detail of the PitaKluster, by Branwyn Wagman

COMPUTING CLUSTERS
Two large parallel-processor clusters and a bank of web servers support the UCSC Genome Browser and its associated tools and databases. These facilities also support much of the computational genome research conducted within CBSE. The newer of the two clusters, the PitaKluster, consists of 198 dual AMD Opteron processor compute nodes, each with 4 gigabytes of memory, housed in 3 Rackable storage units. The PitaKluster's 396 processors, which run the Linux operating system, can together perform over a trillion instructions per second. The older KiloKluster was originally a bank of 1,024 Pentium III processors running the GNU/Linux operating system and housed in 8 racks. This cluster is now shrinking by attrition. Both systems were designed to provide an exceptional amount of inexpensive computing power in minimal space. In addition, CBSE has a test cluster composed of 50 dual Pentium 4 Xeon compute nodes, each with 2 gigabytes of memory. These three computational clusters are supported by several file servers that provide almost 40 terabytes of network storage.
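As a quick sanity check on the PitaKluster figures quoted above, the following short Python sketch recomputes the processor count and derives two numbers the text does not state: the aggregate memory and a rough per-processor instruction rate. The variable names are illustrative only, and "over a trillion" is treated as a lower bound of exactly one trillion, which is an assumption.

```python
# Figures quoted in the text for the PitaKluster.
NODES = 198
PROCESSORS_PER_NODE = 2          # "dual" AMD Opteron nodes
MEMORY_PER_NODE_GB = 4
# "Over a trillion instructions/second": assumed here to be exactly 1e12,
# so the per-processor rate below is a lower bound, not a measurement.
INSTRUCTIONS_PER_SECOND = 1e12

total_processors = NODES * PROCESSORS_PER_NODE
total_memory_gb = NODES * MEMORY_PER_NODE_GB
per_processor_rate = INSTRUCTIONS_PER_SECOND / total_processors

print("processors:", total_processors)
print("aggregate memory (GB):", total_memory_gb)
print("instructions/second per processor (at least):", round(per_processor_rate))
```

The processor count matches the 396 given in the text; the other two values are derived estimates rather than published specifications.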

Photo: System Administrators Erich Weiler, Jorge Garcia, Chester Manuel, and Victoria Lin with the PitaKluster
The CBSE systems administration team keeps these computing resources, including the PitaKluster shown here, up and running 24/7. From left, Erich Weiler, Jorge Garcia, Chester Manuel, and Victoria Lin. Photo by Branwyn Wagman

WEB SERVERS
The web servers for the UCSC Genome Browser consist of 8 dual AMD Opteron processor machines that offer 1.6 terabytes of internal storage and 8 gigabytes of memory. These machines have access to a central file server that provides an additional 5 terabytes of shared disk space. Fourteen additional servers provide web access to the BLAT (BLAST-like alignment tool) software. Each of these machines has 16 gigabytes of memory, since BLAT is a memory-intensive application. Finally, a download server allows users to download our data; it serves over 100 gigabytes of data every day. Our web servers are hosted by the UCSC ITS Data Center, which is designed to operate around the clock, every day of the year.

WHY PARALLEL PROCESSORS?
Computer clusters such as these are a cost-effective way to process large amounts of data. Because many bioinformatics problems are “embarrassingly parallel,” they can be divided into independent jobs that do not require high-speed inter-process communication to perform their calculations. This eliminates the need for high-priced networking equipment. By taking advantage of this fact and spreading parallel but separate computations across many processors, we have pioneered the development of “super-computing on-the-cheap” for the specific needs of genome presentation, annotation, and analysis.
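To make the idea of an embarrassingly parallel problem concrete, here is a minimal Python sketch. It is not the software that runs on the CBSE clusters; the function names, the toy sequence, and the GC-counting task are all hypothetical, chosen only to show jobs that never need to talk to each other. A thread pool on one machine stands in for the many separate compute nodes of a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_jobs(sequence, job_size):
    """Cut one long sequence into independent fixed-size chunks."""
    return [sequence[i:i + job_size] for i in range(0, len(sequence), job_size)]


def count_gc(chunk):
    """One job: count G and C bases. It reads only its own chunk."""
    return sum(1 for base in chunk if base in "GC")


def run_jobs(jobs, workers=4):
    """Run every job independently, then combine the per-job results.

    No job depends on another job's output, so the workers never
    exchange data while computing. On a cluster, each job could be
    sent to a different node instead of a local thread.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_gc, jobs))
    return sum(partial_counts)


if __name__ == "__main__":
    toy_sequence = "ACGTGGCCAATT" * 1000   # made-up data for illustration
    jobs = split_into_jobs(toy_sequence, job_size=500)
    print(len(jobs), "independent jobs; total GC count:", run_jobs(jobs))
```

The only communication happens at the very end, when the partial counts are summed. That is why inexpensive networking between nodes is sufficient for this class of problem.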

The PitaKluster is the third-generation bioinformatics cluster at UCSC and is gradually taking over from the second-generation system, the KiloKluster. The first-generation system was a cluster of 100 Pentium III processors built to assemble the first working draft of the human genome in June 2000, using GigAssembler, a 10,000-line program written by Jim Kent.

These computing systems are funded through the Howard Hughes Medical Institute, the National Human Genome Research Institute (NHGRI), the California Institute for Quantitative Biosciences (QB3), and the National Cancer Institute.

 

© January 2005,
CBSE

Updated 7/2008