What is Clustering
Clustering comes up constantly in discussions of server configurations and server technologies, yet few people know what clustering really means. Rough explanations exist, but the core definition is simple: clustering means connecting two or more computer systems into a single unit in order to achieve goals that would be out of reach for any one machine on its own. The technology has been developed across many fields of IT: you will find it used to group multiple computers into one virtually powerful machine, and also, for example, to form clustered file systems. Today clusters play a huge role in technology, engineering and modern business. Clustering has spread to every branch of computing, from supercomputers whose prices start at a million dollars and go far beyond, to applications and services that are cheap or entirely free of charge. In general, though, a few types of clusters account for most use today.
Types of clusters
- High Performance Cluster – these clusters are intended for applications whose resource requirements are consistently very high and which therefore demand an amount of computing power normally obtainable only from supercomputers (which are themselves built on cluster technology). A substantial share of that power can be obtained by using clustering to unify multiple computer systems into one virtual system.
- High Availability Cluster – these clusters are built for constant uptime. Downtime, in a well-configured high availability cluster, practically does not exist: because the nodes are independent of each other and each can operate as a stand-alone server, the remaining machines carry on when one fails.
- Cluster for Load Balancing – clusters built this way are meant for workloads that would simply overload one or two machines on their own. This form of connectivity distributes an application's load across multiple machines, so the burden is taken off any single computer system (how far it is spread depends on the number of machines in the cluster).
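The load-balancing idea above can be sketched in a few lines. This is a minimal illustration of round-robin distribution, not a real balancer; the node names and request list are invented for the example.

```python
# Minimal sketch of round-robin load balancing: incoming requests are
# handed to each node in turn, so no single machine absorbs the whole load.
# Node names and requests are placeholder data, not a real cluster.
from itertools import cycle
from collections import Counter

nodes = ["node-a", "node-b", "node-c"]   # machines forming the cluster

def distribute(requests, nodes):
    """Assign each request to the next node in round-robin order."""
    ring = cycle(nodes)                  # endlessly repeat the node list
    return {req: next(ring) for req in requests}

requests = [f"req-{i}" for i in range(9)]
assignment = distribute(requests, nodes)
load = Counter(assignment.values())
print(load)   # with 9 requests and 3 nodes, each node receives 3
```

Real load balancers refine this with weighting, health checks and session affinity, but the core principle of spreading work across the member machines is the same.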
Clusters are used in environments where 'stronger' machines are required, but what 'stronger' means depends on the company using the cluster and on the purpose it serves, which ultimately decides how strong your cluster should be. Every piece of equipment counts and contributes to the overall quality of the cluster configuration. To draw the full 100% out of cluster technology, however, you will need a cluster management system that configures all of the server's parameters, so you can trust your configuration at any time, whatever its purpose.
History of Clusters
I will be brief in this section, since there is a lot that could be written; I will mention only the most important dates in cluster evolution. The date of the first cluster connection is not really known, but arguably the first cluster was born when the first connection between two computers was made. That was the beginning of cluster technology, because the whole idea of linking two computers is for them to share each other's resources, and resource sharing is the core idea of clustering. In 1969 ARPANET succeeded in creating what was arguably the world's first commodity-network based computer cluster by linking four different computer centers; ARPANET later expanded into what we all know today as the Internet. The first commercial clustering product was ARCnet, developed by Datapoint in 1977. In 1984 the VAXcluster was released together with a compatible VAX operating system. No history of commodity computer clusters would be complete without noting the pivotal role played by the development of Parallel Virtual Machine (PVM) software in 1989: this open source software, based on TCP/IP communications, enabled the instant creation of a virtual supercomputer (a high performance compute cluster) out of any TCP/IP-connected systems. These are just some of the dates in the creation and development of cluster technology; there is much more to the story than this.
Advantages of using different types of clusters
Because its advantages can be configured and kept active for a very long time, clustering has always been an attractive choice for companies, research centers and universities, but also for individuals who could accomplish something useful with a couple of PCs connected together for a boost in resources. Clusters deliver these benefits in the categories mentioned earlier: high performance, high availability, high efficiency and scalability. Clusters are therefore set up according to the different requirements people have of them.
- High Performance or HPCC (High Performance Computing Cluster) – clusters for workloads that need high computational capability, substantial memory, or both at once. Running such workloads may tie up the cluster's resources for extended periods of time.
- High Availability or HA – clusters characterized above all by reliability. This type of cluster is known for carrying on with its work even when some of the hardware has stopped working.
- High Efficiency – this type of cluster is similar to High Performance but opposite in nature. An HPCC is generally set up to provide enough resources for a single application that demands them, whereas a High Efficiency cluster needs a lot of resources because a great number of applications run simultaneously.
So far this has been a rough, general classification of cluster technology. There are, however, important building blocks that need to be explained in order to fully understand how clusters behave and how to optimize one for your own needs. Those parts are the following: nodes, storage, operating systems, network connections, middleware, communication protocols and services, applications, and parallel programming environments.
When forming a cluster, the foundation is the computer systems you have available or have specially reserved for that purpose. A node is a unit built from different elements that together form a whole; in most cases a node is simply a computer system. In clustering there are two kinds of nodes used to build a cluster: dedicated and non-dedicated. The difference between them is small but real. In a cluster with dedicated nodes, the nodes have no keyboard, mouse or monitor, and their use is exclusively devoted to cluster tasks. In a cluster with non-dedicated nodes, each node has a keyboard, mouse and monitor and is not used solely for the cluster; instead, the cluster makes use of the clock cycles the user's computer is not using for its own tasks. An important consideration when building a cluster is which nodes you use: they should not differ too much, and for optimal performance the same resources, architecture and operating system are desirable.
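The homogeneity requirement above can be checked mechanically before nodes are admitted to a cluster. The sketch below is purely illustrative: the spec dictionaries are placeholder data you would in practice gather from each machine.

```python
# Hedged sketch: verify that candidate nodes are alike enough to cluster.
# The spec dictionaries are invented placeholders, not real inventory data.

def homogeneous(nodes, keys=("arch", "os", "ram_gb")):
    """Return True if every node reports the same value for each key."""
    first = nodes[0]
    return all(node[k] == first[k] for node in nodes for k in keys)

cluster = [
    {"name": "n1", "arch": "x86_64", "os": "linux", "ram_gb": 16},
    {"name": "n2", "arch": "x86_64", "os": "linux", "ram_gb": 16},
]
print(homogeneous(cluster))  # True: both nodes match on all checked keys

# A node with a different architecture would fail the check:
odd_node = {"name": "n3", "arch": "arm64", "os": "linux", "ram_gb": 16}
print(homogeneous(cluster + [odd_node]))  # False
```

Which keys matter depends on the cluster type: a high-performance cluster cares about identical CPU architecture and memory, while a load-balancing cluster can often tolerate more variation.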
A clustered file system is a file system shared by being mounted simultaneously on multiple servers. A shared-disk file system uses a storage area network (SAN) or RAID to give multiple computers direct disk access at the block level. NAS (Network Attached Storage) is a storage device dedicated to serving files over the network (usually TCP/IP), running an operating system optimized to provide access via CIFS, NFS, FTP or TFTP.
As for the operating system, there is not much to say: it is a commonly understood subject, and the performance an OS can support is no longer restricted as it once was, especially since the introduction of the x64 architecture lifted the old boundaries of computing. What the OS must provide is multithreading and multi-user capability.
To set up a cluster you have to connect multiple machines and choose a type of network connectivity so the clustering can begin. Most of us know and use Ethernet, but it is not the only technology that can interconnect a cluster, and different technologies have different advantages and flaws. Ethernet's protocols perform extensive transfer error checking, which costs performance; Gigabit Ethernet greatly increases transfer rates, yet it is still not enough for some clusters. Myrinet is broadly similar to Ethernet, without dramatic advances in this area. That is why InfiniBand was created, with the performance to use clusters to their full capacity.
InfiniBand is a technology that is still evolving and shows a lot of potential for cluster use. Channel aggregation brings a massive speed increase: InfiniBand provides from 2 Gbps with one channel up to 96 Gbps with twelve channels. There are still drawbacks, however, such as the comparatively slow start-up when connectivity is first established.
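What separates these interconnects in practice is mainly round-trip latency and bandwidth. The sketch below illustrates the idea of measuring round-trip time over a plain loopback TCP socket; it does not benchmark any real cluster interconnect, only shows the kind of measurement involved.

```python
# Illustrative sketch: time one request/response round trip over a loopback
# TCP socket. Cluster interconnects (Gigabit Ethernet, Myrinet, InfiniBand)
# differ mainly in this latency and in raw bandwidth.
import socket
import threading
import time

def echo_server(sock):
    conn, _ = sock.accept()
    with conn:
        data = conn.recv(64)
        conn.sendall(data)          # echo the payload straight back

server = socket.socket()
server.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=echo_server, args=(server,), daemon=True).start()

client = socket.socket()
client.connect(("127.0.0.1", port))
start = time.perf_counter()
client.sendall(b"ping")
reply = client.recv(64)
rtt = time.perf_counter() - start
client.close()
print(f"round trip: {rtt * 1e6:.0f} microseconds")
```

Loopback traffic never touches a network card, so the number printed here is only a lower bound; over a real interconnect the same measurement would expose the differences described above.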
Middleware was mentioned earlier as a very important part of clustering technology, and it truly is: it forms the layer between the operating system and the virtual clustering machine. Middleware has several functions that must be fulfilled for a cluster server to run smoothly and easily. One of them is the very basis of cluster technology: it lets you experience the advanced resource usage that only clustering can offer. Another thing to be aware of with middleware is the set of responsibilities required to keep such a system working properly. Server maintenance plays a key role when cluster servers break down; with proper migration or regular backups, most data can be salvaged and almost no downtime experienced. Load balancing, mentioned earlier, is also part of middleware, and tightly bound to it is the prioritization of the resources in use.
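One core middleware duty described above, keeping downtime near zero, can be sketched as heartbeat monitoring with failover. This is a hedged toy model: the node names, timeout and timestamps are invented for illustration, and real middleware adds fencing, quorum and state replication on top of this logic.

```python
# Hedged sketch of middleware failover logic: track heartbeats from each
# node and promote a standby when the active node goes silent too long.
# Node names, timeout and timestamps are illustrative only.
import time

class FailoverMonitor:
    def __init__(self, nodes, timeout=3.0):
        self.timeout = timeout                      # seconds of silence allowed
        self.last_seen = {n: time.monotonic() for n in nodes}
        self.active = nodes[0]                      # first node starts as primary
        self.standbys = list(nodes[1:])

    def heartbeat(self, node, now=None):
        """Record that a node has reported in."""
        self.last_seen[node] = time.monotonic() if now is None else now

    def check(self, now=None):
        """Fail over to a standby if the active node has gone silent."""
        now = time.monotonic() if now is None else now
        if now - self.last_seen[self.active] > self.timeout and self.standbys:
            self.active = self.standbys.pop(0)      # promote the next standby
        return self.active

monitor = FailoverMonitor(["primary", "standby-1"])
monitor.heartbeat("standby-1", now=100.0)
monitor.last_seen["primary"] = 90.0                 # primary silent for 10 s
print(monitor.check(now=100.0))                     # standby-1 takes over
```

Because each surviving node can act as a stand-alone server, promoting a standby this way is what makes the "downtime practically does not exist" property of high availability clusters possible.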
When everything is put together, you get a technology with so many advantages and possibilities that the experience will certainly repay you, especially financially. Normally you would have to invest in high-end machines; with a cluster you accomplish the same thing in a much cheaper way, with improvement from the start.