The case for promoting Linux clusters over traditional supercomputers has focused on hardware affordability. Proponents argue that open source architectures built on standards-based multicore CPUs will make high-performance computers (HPCs) affordable and accessible to a mass market of mainstream technical computing users. Yesterday’s large proprietary systems costing a quarter-million dollars are giving way to cost-effective Linux clusters costing $20,000 or so.
A big reason is the increasing computational sophistication of commodity, standards-based microprocessors. The AMD Opteron and Intel Xeon processors, for example, have gained traction in the HPC market thanks to their ability to support 64-bit computation and large, high-speed memory at an affordable price. In fact, the Opteron now powers approximately 10 percent of the world’s 500 most powerful supercomputers.
But while commoditization and open source technology have certainly lowered costs, the enormous complexity of parallel programming remains the biggest barrier to Linux clusters’ accessibility. The “software gap” – the gap between a Linux cluster’s hardware capabilities and the benefits we can practically extract through programming – is wide and growing. Few applications are available for parallel computers, and the typical process for custom development of parallel applications is fundamentally flawed.
Here’s why: technical computing spans two divorced realms – desktop computers and HPCs. Both environments have much to offer, but the disconnect between them must be overcome if the power of Linux clusters is to be harnessed.
Desktop computers have been the preferred platform for science and engineering, particularly during the early stages of modeling, simulating, and optimizing a new product or system. The interactivity these tools offer lends itself well to the iterative process of research and discovery.
However, desktop performance has hit a wall: single-core CPU clock speeds no longer keep pace with Moore’s Law. Users understand that their success no longer depends on increasing clock speeds, but on putting multiple processors to work simultaneously in clusters or other parallel architectures. Yet parallel architectures are inherently non-interactive, batch-mode beasts, which stymies the real-time feedback needed for scientific and engineering productivity.
The interactive dilemma
Millions of engineers and scientists have access to a rich set of interactive high-level software applications in two general categories: 1) very high level languages (VHLLs) for custom application development, such as MATLAB, Python, Mathematica, Maple, or IDL; and 2) vertical applications developed by commercial independent software vendors (ISVs), such as SolidWorks for computer-aided design or Ansys for finite-element analysis.
The desktop tools offer an easy way to manipulate high-level objects (e.g., matrices with MATLAB, or parameter-driven geometric features in SolidWorks), hiding many of the underlying low-level programming complexities from the user. They also provide an interactive development and execution environment, the usage mode needed for productivity in science and engineering.
But in the HPC world there are few commercial applications for parallel servers – less than 5 percent of desktop science and engineering applications run on Linux clusters, or on any other kind of parallel HPC server for that matter. This limited availability is compounded by the specialized nature of the models and algorithms. Consequently, most technical applications for clusters are handed off to a parallel programming specialist to transform into custom code. The starting point is typically a prototype program written in a desktop-based high-level application (MATLAB, etc.), plus prose that attempts to capture the particular model, system, or algorithm. The parallel programmer then rewrites the application in C or Fortran with MPI (Message Passing Interface) for inter-processor communication and synchronization – relatively complex, low-level programming. Only after the application is developed for the HPC server can it be executed for testing and scaling with real data.
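The distance between the prototype and its hand-parallelized rewrite can be illustrated with a deliberately simple sketch, here in Python using the standard multiprocessing module as a stand-in (real MPI code in C or Fortran is lower-level still, with explicit message passing between nodes). The serial prototype is one line; the parallel version must handle decomposition, dispatch, and gathering itself:

```python
from multiprocessing import Pool

def sum_of_squares_serial(data):
    # The "prototype": a one-liner in a high-level language.
    return sum(x * x for x in data)

def _partial_sum(chunk):
    # Worker: each process computes its share independently.
    return sum(x * x for x in chunk)

def sum_of_squares_parallel(data, nworkers=4):
    # The hand-parallelized version: the programmer must decompose
    # the data, dispatch the chunks to workers, and gather and
    # combine the partial results.
    chunks = [data[i::nworkers] for i in range(nworkers)]
    with Pool(nworkers) as pool:
        partials = pool.map(_partial_sum, chunks)
    return sum(partials)
```

Even in this toy, the parallel version triples the code; in a real cluster application, the bookkeeping for data distribution, communication, and synchronization dwarfs the science.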
This process is slow, expensive, inflexible, and remarkably error-prone. Because each of these steps can take several months, scientists and engineers are limited in how many iterations of their algorithms and models they can make. More than 75 percent of the “time to solution” is spent programming the models for use on HPCs, rather than developing and refining them up front, or using them in production to make decisions and discoveries.
Bridging the gap
The ideal solution to this problem is a “fusion technology” that combines the power of a Linux cluster with the desktop application. It must enable scientists and engineers to write applications in their favorite VHLLs on their desktops, and have those applications automatically parallelized and able to run interactively on Linux clusters. In other words: let end users continue to work in their preferred environments, hide the parallel programming challenges from them, and give them easier access to the Linux cluster.
With this approach, they could write just enough of the application in a VHLL to start testing with real data, as they incrementally refine the application. They could also take advantage of the many popular open source parallel libraries already in the public domain, turning these traditionally batch mode tools into interactive resources. With an interactive workflow, the time to “first calculation” can be within minutes, rather than the several months or years required to first program the parallel application.
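Conceptually, such a fusion layer gives the user a serial-looking interface while dispatching work behind the scenes. The sketch below is hypothetical Python – the `ParallelArray` class and its methods are invented for illustration and are not Star-P’s actual API – but it shows the design idea: the chunking and dispatch live inside the object, not in the user’s code.

```python
from multiprocessing.pool import ThreadPool

class ParallelArray:
    """Hypothetical container that hides data decomposition from the user.

    A real fusion platform would distribute the data across cluster
    nodes; a thread pool stands in for that machinery here.
    """

    def __init__(self, data, nworkers=4):
        self._data = list(data)
        self._nworkers = nworkers

    def map(self, fn):
        # The user writes fn as ordinary serial code; dispatch to
        # workers and reassembly happen behind this method.
        with ThreadPool(self._nworkers) as pool:
            return ParallelArray(pool.map(fn, self._data), self._nworkers)

    def reduce(self, fn, initial):
        # Combine the distributed results into a single answer.
        result = initial
        for x in self._data:
            result = fn(result, x)
        return result

    def tolist(self):
        return list(self._data)
```

A user could then write `ParallelArray(range(8)).map(lambda x: x * x).tolist()` much as they would in a VHLL, never touching message-passing code.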
Recently, several VHLL software vendors have introduced parallel solutions that bridge desktops to Linux clusters. These are typically hybrid platforms built from both proprietary and open source components. For example, Interactive Supercomputing’s (ISC’s) Star-P is an interactive parallel computing platform that incorporates a number of popular open source libraries; the company worked with the open source community to integrate and debug them and to improve their performance. These libraries include:
- ScaLAPACK, a software library for linear algebra computations on distributed-memory computers,
- ATLAS, the Automatically Tuned Linear Algebra Software designed to provide portably optimal linear algebra software,
- FFTW, a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, and
- SuperLU, a general purpose library for the direct solution of large, sparse, nonsymmetric systems of linear equations on high performance machines.
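To ground one of these in a concrete computation: the DFT that FFTW evaluates is X[k] = Σ_n x[n]·e^(−2πi·kn/N). A naive pure-Python reference version is shown below for illustration only – FFTW’s point is to compute the same transform in O(N log N) with heavily optimized C code, which is what makes it practical at scale.

```python
import cmath

def dft(x):
    # Naive O(N^2) discrete Fourier transform, for reference only.
    # FFTW computes the same result in O(N log N).
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n)
                for j in range(n))
            for k in range(n)]
```

For example, `dft([1, 0, 0, 0])` yields a flat spectrum of ones, since a unit impulse contains every frequency equally.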
Interactive parallel computing platforms such as Star-P also feature software development kits (SDKs) that let users “plug in” existing codes from the open source community. This plug-in capability will let the hundreds of thousands of scientists, engineers, and analysts using high-performance computing at government, academic, and commercial research facilities easily string together open source libraries. Consider, for example, Trilinos, an open source project developed at Sandia National Laboratories to facilitate the design, development, integration, and ongoing support of mathematical software libraries. A Trilinos package is an integral unit, usually developed by a small team of experts in a particular algorithm area such as algebraic preconditioners or nonlinear solvers. Ken Stanley of 500 Software, principal architect of the Amesos direct sparse solver package in the Trilinos framework, developed a Star-P interface to the framework to provide broad-ranging high-performance capabilities for solving the numerical systems at the heart of many complex multiphysics applications.
Once programming barriers are lowered, many more scientists and engineers can take advantage of the affordability and accessibility of Linux clusters and other high performance open source technology to experience supercomputing for the first time. The development of custom HPC codes that used to take months or years will become as interactive as our desktop PCs are today.
# # #
Ilya Mirman is vice president at Interactive Supercomputing (http://www.interactivesupercomputing.com) and can be reached at imirman@interactivesupercomputing.com.