A classic cluster is essentially a number of computers grouped together in a
manner that allows them to share infrastructure, such as disk space, and work
together by sharing program data while those programs are running. However,
this definition, though accurate, does not capture the full capability of a
modern cluster system, because it omits a very important concept: the
scheduling system, which has become the core of clustering in general. The
functional purpose of the
scheduling system is to eliminate the need to know what individual computers
are doing.
When presented with multiple computers, you do not know what they are doing
without individually checking them. Anything could be running on them, by
anybody who has access to them. If you want to run a program, you have to
check each computer to see which, if any, have enough available resources
(disk space, processors, memory) to run your program. Not only is it
inconvenient to manually check each computer, but if none of them has any
available resources, you will be forced to check again (manually) at a later time.
A scheduling system removes this need. By aggregating data and monitoring
its system, a scheduler keeps an accurate and up-to-date
picture of what resources are available and where. Even beyond tracking
resources, a scheduler will allow you to submit instructions for running your
program, and then run your program on your behalf once the necessary resources
are available.
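
The idea of tracking resources and running queued work on the user's behalf
can be illustrated with a minimal sketch. Everything in it is hypothetical and
exists only for illustration: the names Node, Job, and ToyScheduler, the choice
of CPUs and memory as the only tracked resources, and the strict
first-come-first-served policy. Real scheduling systems track more resources
and use more elaborate policies.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """One computer in the cluster and the resources it currently has free."""
    name: str
    free_cpus: int
    free_mem_gb: int


@dataclass
class Job:
    """A request to run a program, stated as the resources it needs."""
    name: str
    cpus: int
    mem_gb: int


@dataclass
class ToyScheduler:
    """Hypothetical scheduler: tracks free resources and starts queued jobs."""
    nodes: list
    queue: deque = field(default_factory=deque)
    running: dict = field(default_factory=dict)  # job name -> (Job, Node)

    def submit(self, job):
        """Accept a job; it starts now if it fits, otherwise it waits."""
        self.queue.append(job)
        self.dispatch()

    def release(self, job_name):
        """Mark a job as finished and return its resources to its node."""
        job, node = self.running.pop(job_name)
        node.free_cpus += job.cpus
        node.free_mem_gb += job.mem_gb
        self.dispatch()

    def dispatch(self):
        """Start queued jobs in order, stopping at the first that does not fit."""
        while self.queue:
            job = self.queue[0]
            node = next(
                (n for n in self.nodes
                 if n.free_cpus >= job.cpus and n.free_mem_gb >= job.mem_gb),
                None,
            )
            if node is None:
                return  # nothing fits yet; try again when resources are freed
            self.queue.popleft()
            node.free_cpus -= job.cpus
            node.free_mem_gb -= job.mem_gb
            self.running[job.name] = (job, node)


sched = ToyScheduler([Node("node1", 4, 16), Node("node2", 8, 32)])
sched.submit(Job("a", cpus=8, mem_gb=16))  # fits on node2, starts at once
sched.submit(Job("b", cpus=6, mem_gb=8))   # no node has 6 free CPUs, so it waits
sched.release("a")                         # node2 is freed, so "b" starts
```

In this sketch the user never inspects a node directly. Submitting job "b"
while the cluster is busy simply leaves it in the queue, and the scheduler
starts it as soon as job "a" finishes and frees enough resources.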