One of the measures of success of a Beowulf cluster is the number of people waiting in line to run their code on the system. Build your own low-cost supercomputer, and your cycle-starved colleagues will quickly become your new best friends. But, when they all get accounts and start running jobs, they'll soon find themselves battling each other for the limited resources of the machine.
At that point, you'll need a package that automatically schedules jobs and allocates resources on the cluster -- something akin to the batch job queuing facilities used on the business systems and mainframes of yesteryear. Those batch facilities would line up jobs, execute them in turn as the appropriate resources became available, and deliver the output of each job back to the submitter.
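To make the batch idea concrete, here is a minimal sketch of a strict first-in, first-out queue in Python. It is only an illustration of the behavior described above, not how OpenPBS or Grid Engine is implemented. The names Job and run_batch are hypothetical, and the sketch assumes every job occupies its nodes for exactly one time step.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    owner: str   # user who submitted the job (hypothetical field)
    name: str    # label for the job
    nodes: int   # number of compute nodes the job needs


def run_batch(jobs, total_nodes):
    """Run jobs in submission order on a cluster with total_nodes nodes.

    Returns a list of (start_step, owner, name) tuples. Simplifying
    assumption: each job holds its nodes for exactly one time step.
    """
    queue = deque(jobs)
    schedule = []
    step = 0
    while queue:
        free = total_nodes
        started = False
        # Start jobs from the head of the queue for as long as they fit.
        # Later jobs never jump ahead of a waiting one (no backfill).
        while queue and queue[0].nodes <= free:
            job = queue.popleft()
            free -= job.nodes
            schedule.append((step, job.owner, job.name))
            started = True
        if not started:
            # The head job cannot run even on an otherwise idle cluster.
            raise ValueError(f"{queue[0].name} needs more nodes than the cluster has")
        step += 1
    return schedule


if __name__ == "__main__":
    jobs = [Job("alice", "climate", 4), Job("bob", "fluid", 6), Job("carol", "post", 2)]
    for start, owner, name in run_batch(jobs, total_nodes=8):
        print(f"step {start}: {name} (submitted by {owner})")
```

On an eight-node cluster, the second job has to wait for the first to finish because the two together need ten nodes. The third job waits behind the second even though it would fit right away, which is the kind of inefficiency real schedulers are designed to reduce.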
Clustered systems and emerging grid technologies have driven the need for new job scheduling packages in the computational science realm. Two scheduling packages that are increasingly being used are OpenPBS (Portable Batch System, http://www.openpbs.org) and Sun Microsystems' Grid Engine (http://gridengine.sunsource.net). OpenPBS is an open source package offered and supported by Veridian Systems. (Veridian also offers an enhanced commercial version called PBS Pro.) The source for Grid Engine is also available at no cost. Both packages run on a wide variety of Unix and Linux systems, and may be used for both serial and parallel job control. This month's column will focus on implementing OpenPBS on a typical Beowulf cluster.
Earlier Linux Mag Extreme Linux columns...
Scalable I/O on Clusters
High Performance Interconnects