The case for promoting Linux clusters over traditional supercomputers has focused on hardware affordability. Proponents argue that open source architectures built on standards-based multicore CPUs will make high-performance computers (HPCs) affordable and accessible to a mass market of mainstream technical computing users. Yesterday’s large proprietary systems costing a quarter-million dollars are giving way to cost-effective Linux clusters costing $20,000 or so.
A big reason is the increasing computational sophistication of commodity, standards-based microprocessors. The AMD Opteron and Intel Xeon processors, for example, have gained traction in the HPC market thanks to their ability to support 64-bit computation and large, high-speed memory at an affordable price. In fact, the Opteron now powers approximately 10 percent of the world’s 500 most powerful supercomputers.
But while commoditization and open source technology have certainly lowered costs, the enormous complexity of parallel programming remains the biggest barrier to Linux clusters’ accessibility. The “software gap” – the gap between a Linux cluster’s hardware capabilities and the benefits we can practically extract through programming – is wide and growing. Few applications are available for parallel computers, and the typical process for custom development of parallel applications is fundamentally flawed.
Here’s why: technical computing spans two divorced realms – desktop computers and HPCs. Both environments have much to offer, but the disconnect between them must be overcome if the power of Linux clusters is to be harnessed.
Desktop computers have been the preferred platform for science and engineering, particularly during the early stages of modeling, simulating, and optimizing a new product or system. The interactivity these tools offer lends itself well to the iterative process of research and discovery.
However, desktop performance has hit a wall: single-core CPU clock speeds no longer keep pace with Moore’s Law. Users understand that their success no longer depends on increasing clock speeds, but on putting multiple processors to work simultaneously in clusters or other parallel architectures. Yet parallel architectures are inherently non-interactive, batch-mode beasts, which stymies the real-time feedback needed for scientific and engineering productivity.
The interactive dilemma
Millions of engineers and scientists have access to a rich set of interactive high-level software applications in two general categories: 1) very high level languages (VHLLs) for custom application development, such as MATLAB, Python, Mathematica, Maple, or IDL; and 2) vertical applications developed by commercial independent software vendors (ISVs), such as SolidWorks for computer-aided design or Ansys for finite-element analysis.
The desktop tools offer an easy way to manipulate high-level objects (e.g., matrices with MATLAB, or parameter-driven geometric features in SolidWorks), hiding many of the underlying low-level programming complexities from the user. They also provide an interactive development and execution environment, the usage mode needed for productivity in science and engineering.
But in the HPC world there are few commercial applications for parallel servers – less than 5 percent of desktop science and engineering applications run on Linux clusters, or on any other kind of parallel HPC server for that matter. This limited availability is compounded by the specialized nature of the models and algorithms. Consequently, most technical applications for clusters are handed off to a parallel programming specialist to transform into custom code. The starting point is typically a prototype program written in a desktop-based high-level application (MATLAB, etc.), plus prose that attempts to capture the particular model, system, or algorithm. The parallel programmer then rewrites the application in C or Fortran with MPI (Message Passing Interface) for inter-processor communication and synchronization – relatively complex, low-level programming. Only after the application is developed for the HPC server can it be executed for testing and scaling with real data.
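The distance between the prototype and its hand-parallelized rewrite can be illustrated with a deliberately simple sketch, here in Python using the standard multiprocessing module as a stand-in (real MPI code in C or Fortran is lower-level still, with explicit message passing between nodes). The serial prototype is one line; the parallel version must handle decomposition, dispatch, and gathering itself:

```python
from multiprocessing import Pool

def sum_of_squares_serial(data):
    # The "prototype": a one-liner in a high-level language.
    return sum(x * x for x in data)

def _partial_sum(chunk):
    # Worker: each process computes its share independently.
    return sum(x * x for x in chunk)

def sum_of_squares_parallel(data, nworkers=4):
    # The hand-parallelized version: the programmer must decompose
    # the data, dispatch the chunks to workers, and gather and
    # combine the partial results.
    chunks = [data[i::nworkers] for i in range(nworkers)]
    with Pool(nworkers) as pool:
        partials = pool.map(_partial_sum, chunks)
    return sum(partials)
```

Even in this toy, the parallel version triples the code; in a real cluster application, the bookkeeping for data distribution, communication, and synchronization dwarfs the science.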
This process is slow, expensive, inflexible, and remarkably error-prone. Because each of these steps can take several months, scientists and engineers are limited in how many iterations of their algorithms and models they can make. More than 75 percent of the “time to solution” is spent programming the models for use on HPCs, rather than developing and refining them up front, or using them in production to make decisions and discoveries.
Bridging the gap
The ideal solution to this problem is a “fusion technology” that combines the power of a Linux cluster with the desktop application. It must enable scientists and engineers to write applications in their favorite VHLLs on their desktops, and have those applications automatically parallelized and able to run interactively on Linux clusters. In other words: let end users continue to work in their preferred environments, hide the parallel programming challenges from them, and give them easier access to the Linux cluster.
With this approach, they could write just enough of the application in a VHLL to start testing with real data, as they incrementally refine the application. They could also take advantage of the many popular open source parallel libraries already in the public domain, turning these traditionally batch mode tools into interactive resources. With an interactive workflow, the time to “first calculation” can be within minutes, rather than the several months or years required to first program the parallel application.
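Conceptually, such a fusion layer gives the user a serial-looking interface while dispatching work behind the scenes. The sketch below is hypothetical Python – the `ParallelArray` class and its methods are invented for illustration and are not Star-P’s actual API – but it shows the design idea: the chunking and dispatch live inside the object, not in the user’s code.

```python
from multiprocessing.pool import ThreadPool

class ParallelArray:
    """Hypothetical container that hides data decomposition from the user.

    A real fusion platform would distribute the data across cluster
    nodes; a thread pool stands in for that machinery here.
    """

    def __init__(self, data, nworkers=4):
        self._data = list(data)
        self._nworkers = nworkers

    def map(self, fn):
        # The user writes fn as ordinary serial code; dispatch to
        # workers and reassembly happen behind this method.
        with ThreadPool(self._nworkers) as pool:
            return ParallelArray(pool.map(fn, self._data), self._nworkers)

    def reduce(self, fn, initial):
        # Combine the distributed results into a single answer.
        result = initial
        for x in self._data:
            result = fn(result, x)
        return result

    def tolist(self):
        return list(self._data)
```

A user could then write `ParallelArray(range(8)).map(lambda x: x * x).tolist()` much as they would in a VHLL, never touching message-passing code.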
Recently, several VHLL software vendors have introduced parallel solutions that bridge desktops to Linux clusters. These are typically hybrid platforms built from both proprietary and open source components. For example, Interactive Supercomputing’s (ISC’s) Star-P is an interactive parallel computing platform that incorporates a number of popular open source libraries; the company worked with the open source community to integrate and debug them and to improve their performance. These libraries include:
- ScaLAPACK, a software library for linear algebra computations on distributed-memory computers,
- ATLAS, the Automatically Tuned Linear Algebra Software designed to provide portably optimal linear algebra software,
- FFTW, a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, and
- SuperLU, a general purpose library for the direct solution of large, sparse, nonsymmetric systems of linear equations on high performance machines.
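To ground one of these in a concrete computation: the DFT that FFTW evaluates is X[k] = Σ_n x[n]·e^(−2πi·kn/N). A naive pure-Python reference version is shown below for illustration only – FFTW’s point is to compute the same transform in O(N log N) with heavily optimized C code, which is what makes it practical at scale.

```python
import cmath

def dft(x):
    # Naive O(N^2) discrete Fourier transform, for reference only.
    # FFTW computes the same result in O(N log N).
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n)
                for j in range(n))
            for k in range(n)]
```

For example, `dft([1, 0, 0, 0])` yields a flat spectrum of ones, since a unit impulse contains every frequency equally.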
Interactive parallel computing platforms such as Star-P also feature software development kits (SDKs) that let users “plug in” existing codes from the open source community. This plug-in capability will let the hundreds of thousands of scientists, engineers, and analysts using high-performance computing at government, academic, and commercial research facilities easily string together open source libraries. Consider, for example, Trilinos, an open source project developed at Sandia National Laboratories to facilitate the design, development, integration, and ongoing support of mathematical software libraries. A Trilinos package is an integral unit, usually developed by a small team of experts in a particular algorithm area such as algebraic preconditioners or nonlinear solvers. Ken Stanley of 500 Software, principal architect of the Amesos direct sparse solver package in the Trilinos framework, developed a Star-P interface to the framework to provide broad-ranging high-performance capabilities for solving the numerical systems at the heart of many complex multiphysics applications.
Once programming barriers are lowered, many more scientists and engineers can take advantage of the affordability and accessibility of Linux clusters and other high performance open source technology to experience supercomputing for the first time. The development of custom HPC codes that used to take months or years will become as interactive as our desktop PCs are today.
# # #
Ilya Mirman is vice president at Interactive Supercomputing (http://www.interactivesupercomputing.com) and can be reached at imirman@interactivesupercomputing.com.