LinuxHPC.org - Linux High Performance Computing and Linux Clusters

Cost Recovery by Design

Posted by Philip Carinhas, Thursday April 20 2006 @ 12:43PM EDT

by Dominique Heger & Philip Carinhas, Fortuitous Technologies

Intro

Although often seen as an extra expense, performance and capacity planning typically saves a project money in the long run. Costs are usually recovered by the completion of the initial implementation phase, if not sooner. Moreover, properly planned projects are more likely to achieve their design goals and allow future scalability at a significantly lower total cost.

Performance Planning Issues

In today's parallel, heterogeneous, and interconnected IT wilderness, predicting and controlling the cost factors surrounding systems performance and capacity planning is overwhelming at best. For larger IT projects, it is not uncommon for performance tuning and capacity problems to be both the largest and the least controlled expenses. To illustrate, a sudden slowdown of an enterprise-wide application may trigger user complaints, delayed projects, an IT support backlog, and ultimately a financial loss to the organization. By the time the performance problem is located, analyzed, worked around, tested, and verified, an organization may have spent tens of thousands of dollars in time, IT resources, and hardware, only to fall back into the same vicious cycle the very next year.

The Crux

When performance is designed into the final solution, costs can be contained and reduced while ensuring the required performance with room to scale. This approach shifts the emphasis away from the installation and setup phase to the planning and design stages. It is paramount that IT not only understand the expected workload behavior, but also act responsibly by conducting feasibility and design studies before spending many thousands of dollars on a solution that, in the best case, may not be optimal and, in the worst case, fails completely.

Hidden Costs Associated with Bad Planning

Unneeded Hardware
Application performance issues have an immediate impact on customer satisfaction and an organization's bottom line. When a performance issue surfaces, it is not uncommon for organizations to start adding more (often expensive) hardware to the mix without fully understanding where the problem truly lies or how the extra hardware will affect overall system performance. Working on the symptoms rather than the underlying cause may provide some relief in the short run, but it intensifies the issues in the long run, as even more hardware has to be analyzed and troubleshot. In addition, redundant hardware carries these costs:
- Electricity
- Extra Cooling (several times the electricity costs)
- Extra IT Overhead (see below)
- Hardware Replacement Costs (drives, fans, PSUs, etc.)
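
The running costs listed above can be put into a rough annual estimate. The sketch below is only an illustration: the function name, every parameter, and all of the numbers are hypothetical, and the cooling multiplier is simply a placeholder for the "several times the electricity costs" figure. IT overhead is left out because it is covered separately.

```python
HOURS_PER_YEAR = 24 * 365


def annual_running_cost(nodes, watts_per_node, price_per_kwh,
                        cooling_factor, replacement_per_node):
    """Estimate yearly running costs for a group of servers.

    cooling_factor is cooling cost expressed as a multiple of the
    electricity cost; replacement_per_node is an assumed yearly budget
    for drives, fans, and power supplies on each node.
    """
    energy_kwh = nodes * watts_per_node * HOURS_PER_YEAR / 1000.0
    electricity = energy_kwh * price_per_kwh
    cooling = electricity * cooling_factor
    replacements = nodes * replacement_per_node
    return {
        "electricity": electricity,
        "cooling": cooling,
        "replacements": replacements,
        "total": electricity + cooling + replacements,
    }


# Hypothetical inputs: 5 extra servers drawing 400 W each.
costs = annual_running_cost(nodes=5, watts_per_node=400,
                            price_per_kwh=0.10, cooling_factor=2.0,
                            replacement_per_node=100.0)
for item, dollars in costs.items():
    print(f"{item:>12}: ${dollars:,.2f}")
```

Substituting measured power draw and local utility rates gives a figure that can be set against the cost of a performance study before any purchase is approved.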

IT Overhead
In addition to hardware costs, the IT personnel costs associated with unplanned performance tuning exercises can be excruciating. IT managers may be forced to commit hundreds of man-hours to solve even simple performance problems. Because the actual source of a problem is not always easy to identify, IT personnel may spend hours or days analyzing and tuning the wrong subsystem. To make matters worse, some performance tuning exercises may require crossing over into the domains of security, reliability, or availability. Proper design and planning can reduce these costs.

Security and HA
Without proper initial planning, fire-fighting scenarios such as these may also result in additional work for an organization's security or high-availability (HA) personnel. Proper design and planning can significantly reduce these costs too.

Lost Revenue
Without proper planning, projects run the risk of partial or total failure, which can drive away associated revenue. There is no excuse for a project to fail from a lack of adequate planning and design. Even if the system is not designed to generate revenue directly, its failure can cause losses for internal customers and related systems.

An Illustration

As an example of the shortcomings of overzealous hardware purchases, let's consider CompanyX, whose 10-node cluster would not perform well under stress. The managers authorized IT to buy 5 more servers to increase performance, which resulted in no noticeable performance gain. When the system was finally examined, a simple model immediately showed that the memory and I/O subsystems were the bottleneck, and that the optimal number of compute nodes was about 10.
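
The article does not describe the model that was used, so the following is only a minimal sketch of one kind of model that produces this result: a saturation bound in which all nodes draw on one shared subsystem. The function names and both rate figures are hypothetical, chosen so that the shared path saturates at 10 nodes.

```python
import math


def cluster_throughput(nodes, per_node_rate, shared_capacity):
    """Aggregate throughput when every node uses one shared subsystem.

    Throughput grows linearly with node count until the shared
    memory/I/O path is saturated, then stays flat.
    """
    return min(nodes * per_node_rate, shared_capacity)


def saturation_point(per_node_rate, shared_capacity):
    """Smallest node count that fully uses the shared subsystem."""
    return math.ceil(shared_capacity / per_node_rate)


# Hypothetical figures, not measurements from the article.
PER_NODE_RATE = 50.0      # MB/s of I/O one compute node can drive
SHARED_CAPACITY = 500.0   # MB/s the shared I/O path can sustain

for n in (5, 10, 15):
    rate = cluster_throughput(n, PER_NODE_RATE, SHARED_CAPACITY)
    print(f"{n:2d} nodes -> {rate:.0f} MB/s")

print("saturates at", saturation_point(PER_NODE_RATE, SHARED_CAPACITY), "nodes")
```

Under these assumptions the 15-node configuration delivers the same throughput as the 10-node one, which is the pattern CompanyX saw after buying 5 more servers. A real study would replace the two constants with measured values.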

Summary

In short, the proper approach to managing systems performance is to design performance into the solution. If the system is already in production, the recommendation is to conduct a performance study that covers the application, operating system, and hardware subsystems. It is paramount to understand not only the actual workload behavior, but also the interaction between the application, the OS, and the hardware. Treating performance-related issues early in an IT project avoids hidden-cost scenarios and is far cheaper than performing extraneous tuning after deployment.

About the Authors

Dominique Heger has over 18 years of IT experience, focusing on systems performance, capacity planning, cluster technology, performance modeling, algorithms and data structures, and I/O scalability. Philip Carinhas is the President and CEO of Fortuitous and has over 15 years of experience in Linux and enterprise computing. They can be found at http://fortuitous.com
