by Dominique Heger & Philip Carinhas,
Fortuitous Technologies
Intro
Often seen as an extra expense, performance and capacity planning
typically saves a project money in the long run. Costs are usually
recovered by the completion of the initial implementation phase if
not sooner. Moreover, projects that are properly planned will
achieve design goals and allow future scalability at a
significantly lower total cost.
Performance Planning Issues
In today's parallel, heterogeneous, and interconnected IT
wilderness, predicting and controlling cost factors surrounding
systems performance and capacity planning is overwhelming at best.
For larger IT projects, it is not uncommon to find that performance
tuning and capacity problems represent the largest and least
controlled expenses. To illustrate, a sudden slowdown of an
enterprise-wide application may
trigger user complaints, delayed projects, an IT support backlog,
and ultimately a financial loss to the organization. By the time
the performance problem is located, analyzed, worked around,
tested, and verified, an organization may have spent tens of
thousands of dollars in time, IT resources, and hardware, only to
fall back into the same vicious cycle the very next year.
The Crux
When performance is designed into the final solution, costs can be
contained and reduced while ensuring required performance with
scalability potential. This approach shifts the emphasis away from
the installation and setup phase to the planning and design stages.
It is paramount that IT not only understand the expected workload
behavior, but also act responsibly by conducting feasibility and
design studies before spending many thousands of dollars on a
solution that, in the best case, may not be optimal and, in the
worst case, fails completely.
Hidden Costs Associated with Bad Planning
- Unneeded Hardware
Application performance issues have an immediate impact on customer
satisfaction and an organization's bottom line. It is not uncommon
that when a performance issue surfaces, organizations start adding
more (often expensive) hardware to the operational mix without
fully understanding where the problem truly lies or how the extra
hardware will affect overall system performance. Working on the
symptoms rather than the underlying cause may provide some relief
in the short run, but it intensifies the issues in the long run, as
even more hardware has to be troubleshot and analyzed. In addition,
redundant hardware carries these costs:
  - Electricity
  - Extra Cooling (several times the electricity costs)
  - Extra IT Overhead (See Below)
  - Hardware Replacement Costs (drives, fans, PSUs, etc.)
- IT Overhead
In addition to hardware costs, the IT personnel costs associated
with unplanned performance tuning exercises can be excruciating. IT
managers may be forced to commit hundreds of man-hours to solve
even simple performance problems. Because the actual source of the
problem is not always easy to identify, IT personnel may spend
hours or days analyzing and tuning the wrong subsystem. To make
matters worse, some performance tuning exercises
may require crossing over into the domains of security,
reliability, or availability. Proper design and planning can reduce
these costs.
- Security and HA
Without proper initial planning, fire-fighting scenarios such as
these may also result in additional work for an organization's
security or high-availability (HA) personnel. Proper design and
planning can significantly reduce these costs as well.
- Lost Revenue
Without proper planning, projects run the risk of partial or total
failure, which can drive away associated revenue. There is no excuse
for a project to fail from a lack of adequate planning and design.
Even if the system is not designed to generate revenue directly, its
failure can cause losses for internal customers and related systems.
An Illustration
As an example of the shortcomings of overzealous use of hardware,
let's consider CompanyX, whose 10-node cluster would not perform
well under stress. The managers authorized IT to buy 5 more servers
to increase performance, which resulted in no noticeable
performance gain. When the system was finally examined, a simple
model immediately showed that the memory and I/O subsystems were
the bottleneck, and that the optimal number of compute nodes was
about 10.
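The article does not say which model was used for CompanyX, so the
following is only a minimal sketch of one possible approach: a
bottleneck bound in which compute capacity grows with the node count
while a shared memory and I/O path has a fixed ceiling. The function
names and both rate figures are hypothetical, chosen purely so that
the saturation point falls at 10 nodes; they are not measurements
from the system described above.

```python
import math


def cluster_throughput(nodes, per_node_rate, shared_io_capacity):
    """Upper bound on throughput for a cluster with one shared bottleneck.

    Compute capacity scales linearly with the node count, but the shared
    memory/I/O path does not, so the smaller of the two limits wins.
    """
    return min(nodes * per_node_rate, shared_io_capacity)


def saturation_point(per_node_rate, shared_io_capacity):
    """Smallest node count at which the shared subsystem becomes the limit."""
    return math.ceil(shared_io_capacity / per_node_rate)


# Hypothetical figures, for illustration only (not taken from the article).
PER_NODE_RATE = 100.0         # requests/s a single compute node can drive
SHARED_IO_CAPACITY = 1000.0   # requests/s the shared memory/I/O path sustains

if __name__ == "__main__":
    for n in (5, 10, 15):
        bound = cluster_throughput(n, PER_NODE_RATE, SHARED_IO_CAPACITY)
        print(f"{n:2d} nodes -> at most {bound:.0f} requests/s")
    print("saturates at", saturation_point(PER_NODE_RATE, SHARED_IO_CAPACITY), "nodes")
```

Under these assumed figures, the bound for 15 nodes is the same as
for 10, which mirrors the outcome in the illustration: hardware added
past the saturation point raises cost without raising throughput.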
Summary
In short, the proper approach to managing systems performance is to
design performance into the solution. If the system is already in
production, the recommendation is to conduct a performance study
that covers the application, operating system, and hardware
subsystems. It is paramount to understand not only the actual
workload behavior, but also the interaction between the
application, the OS, and the hardware. Addressing performance-related
issues early in an IT project avoids hidden-cost scenarios and is
far cheaper than tuning after deployment.
About the Authors
Dominique Heger has over 18 years of IT experience, focusing on systems performance, capacity planning, cluster technology, performance modeling, algorithms and data structures, and I/O scalability. Philip Carinhas is the President and CEO of Fortuitous, and has over 15 years of experience in Linux and enterprise computing. They can be found at http://fortuitous.com