Computer Cluster Farm to Help Duke

February 1, 2003

Duke’s computing experts hope to help faculty campus-wide raise a bumper crop of research using a new "cluster farm" approach to high-end computing. The farm, a growing collection of high-powered processors racked up in the North Building, will give Duke researchers the dedicated, 24/7 computing power they need -- without the headaches of storage and administration.

Developed by the Center for Computational Science, Engineering and Medicine (CSEM) and the Office of Information Technology (OIT), the cluster farm is expected to raise Duke’s overall research-computing efficiency while lowering barriers to critical projects.

John Harer, vice provost for academic affairs and CSEM director, said CSEM is cultivating the cluster farm to enable faculty to "undertake projects they might not undertake otherwise." Prompted by the university’s "Building on Excellence" strategic plan, he explained, "We saw a big challenge in analyzing and storing the kind of data our researchers collect. We have the opportunity to be really great, but we had a problem in the area of computing."

Faculty researchers have often bought cluster computers with grant funds, Harer said, and then run into understandable problems housing them, cooling them or setting up their software.

Given the kind of computing demands posed by current research in areas from genomics to biomedical engineering, environmental science and cognitive neuroscience, Harer said, "More and more people are hitting the wall in what they can do. Sometimes it’s not enough active memory to run the programs; sometimes it’s not enough processing power."

The cluster farm exploits high-bandwidth connections to enable the physical equipment to be housed in a central facility while still providing exclusive access and around-the-clock support.

Duke’s investment should pay off for cluster farmers such as Craig Henriquez, W.H. Gardner Jr. associate professor of Biomedical Engineering and Computer Science.

"With the North Carolina Super Computing Center shutting down," he explained, "we lost access to a significant part of the large-scale computing resources we need to perform computer simulations. The cluster will effectively replace part of these resources." Henriquez’ team simulates electrical signaling in the heart and brain to understand how heart arrhythmias form and to analyze how neuronal networks encode information. "We need to use multiple processors simultaneously to obtain the results in hours rather than weeks," he said.

The cluster farm will "optimize support and house the systems with proper power and cooling," Henriquez noted. "We can hire one or two administrators to manage the cluster in a few locations rather than have each group find space for clusters and support a person with research funds for their own personal system," he said.

Cluster computers have become a popular way to balance computing workload and boost processing resources. "Clusters" use redundant links to connect multiple machines (typically PCs or UNIX workstations plus storage devices) into a unit that functions as a single, virtually no-fail system. Clusters also support parallel processing, which lets researchers run demanding applications faster by distributing the computing requirements across several machines. As a result, researchers can attain supercomputing-like power without the expense of supercomputing, said Harer.
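As a rough illustration of what distributing the computing requirements across several machines can look like, the following Python sketch splits a job into chunks, hands each chunk to a separate worker and combines the partial results. It is not Duke's software: the function names are hypothetical, the workload (a sum of squares) is a toy stand-in for a real simulation, and threads on one machine stand in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_workers):
    """Divide items into n_workers nearly equal contiguous chunks."""
    size, extra = divmod(len(items), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Toy workload: the piece of the job one worker handles."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, n_workers=4):
    """Farm the chunks out to workers, then combine the partial results."""
    chunks = split_into_chunks(items, n_workers)
    # Assumption: threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


data = list(range(1, 101))
total = parallel_sum_of_squares(data, n_workers=4)
print(total)
```

On a real cluster the chunks would go to separate machines, typically through a message-passing library or a job scheduler, but the pattern of splitting, computing and combining is the same.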

Duke’s cluster farm takes the concept a step further by allowing researchers to buy as many clusters as they need. They literally own the processors and determine their use, Harer said. "In no way are we trying to take control away from anybody," he emphasized. Researchers can buy different-sized "plots" of computer clusters for different-sized "crops" of research problems. The farm currently has 200 nodes, including a core cluster of 64 nodes; each node has two processors. It’s a work in progress, said Harer. CSEM will add capacity as it learns from the process, both technically and administratively.

Buyers of computing capacity on the farm are asked to lock in a minimum of eight nodes (16 processors). CSEM will provide the core cluster and the front-end expertise to help people decide what to buy and how to set up their code. CSEM is also serving as a source for clusters so that faculty can try the facility before they commit to a purchase. OIT will provide system support and administration.

"Duke will take care of the clusters for you," said Harer. "In return, we ask for the flexibility, when you’re not using them, to offer the resources to other researchers. In the same way, you can use some overflow if you need extra help for a couple of days."

Faculty using the cluster will enjoy a fiscal advantage in grant proposals, said Harer. "When they apply for grants, researchers can apply more money to equipment because Duke is providing the systems administration for free," he said. Harer said he hopes that the cluster farm will facilitate the grants process by giving researchers better computing support and more up-to-date equipment.

Tom Kepler, a professor of biostatistics and bioinformatics, said that the Center for Bioinformatics and Computational Biology has purchased 32 nodes. The center, which, among other things, studies infectious disease, pathogenesis and immune response, runs simulations of immune-cell interactions. "To make them sufficiently realistic we need to increase memory, speed and so on," said Kepler. "Analyses of gene expression and other complex statistics also gobble up computer power," he said.

"Even if we had our own cluster we wouldn’t be able to hire a full-time person just to look after it," said Kepler. "Having a single cluster farm with expert, centralized systems administration makes a great deal of sense. It’s a model that I suspect is going to work throughout Duke and be followed elsewhere as well," he said. Harer said that he hopes that researcher demand for the cluster farm will spark its rapid growth, thus even further increasing its usefulness to researchers.

"By fall of 2004 I hope we’ll be really starved for space, with 200 to 400 nodes and a lot of faculty really happy with performance," he said.

In addition to providing support and design for the cluster farm, CSEM runs a Cluster and Grid Technology Group (CGTG). This group provides education and training in high-performance computing, offers consulting services to researchers who maintain their own private clusters, and directly supports research efforts in new cluster and computing technologies.