System Management

If building a Beowulf only required assembling the nodes, installing software on each node, and connecting the nodes to each other with a network, there wouldn't be any need for systems administrators. Once you have assembled a Beowulf, you have to keep it running, maintain software, add and remove user accounts, organize the file system layout, and perform countless other tasks that fall under the heading of systems management. Some of these tasks are no different from normal LAN administration, on which entire books have been written. Unfortunately, the rules for Beowulf system administration have not been fully established. It is still a black art requiring LAN administration knowledge, some parallel programming skills, and a creative ability to adapt workstation and LAN software to the Beowulf environment. We'll discuss a few topics on this page.
Before assigning IP addresses to Beowulf nodes, designing the network topology, and booting the machine, you must decide how the system will be accessed. The most commonly used configuration is called the "guarded Beowulf". A guarded Beowulf assigns reserved IP addresses to all of its internal nodes, even when using multiple networks. To communicate with the outside world, a single front end, called a "worldly node", is given an extra network interface with an IP address from the organization's local area network. Sometimes more than one worldly node is provided. Regardless of the number of worldly nodes, a user must first log in to a worldly node to access the rest of the system. The benefit of this approach is that you don't consume precious IP addresses and you limit system access to a small number of entry points, which simplifies overall system management and the implementation of a security policy. The disadvantage is that internal nodes cannot access the external network.

To communicate with each other, the nodes each need a unique name. Beowulf clusters communicate internally over one or more private system area networks, and one of the nodes has an additional network connection to the outside world. Because the internal nodes are not directly connected to the outside, they can use addresses from the ranges reserved for private networks, such as 192.168.0.0 through 192.168.255.255. For example, the worldly node could be assigned the internal address 192.168.1.1, and the internal nodes would be numbered sequentially from 192.168.1.2 to 192.168.1.16. Remember that the worldly node has an additional network interface connecting it to the outside world; that interface has its own address, different from the internal one.

Naming should be kept simple. Whatever scheme you choose, the node number should match the last number of the node's IP address. This makes things easier when trouble comes along, especially if you label each node externally.
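Because the node number matches the last number of the address, the host table does not have to be typed by hand. The shell sketch below generates /etc/hosts entries for the example addresses above. It assumes a 192.168.1.x internal network with internal nodes .2 through .16; the name "worldly" for the worldly node's internal interface is a hypothetical choice, not something the naming scheme requires.

```shell
#!/bin/sh
# Sketch: generate /etc/hosts entries for a guarded Beowulf in which each
# internal node's number equals the last octet of its address
# (192.168.1.N -> nodeN).
# Assumptions: internal network 192.168.1.x, internal nodes .2 through .16,
# worldly node's internal interface at .1 under the hypothetical name "worldly".

NET=${NET:-192.168.1}   # first three octets of the private network
FIRST=${FIRST:-2}       # lowest internal node number
LAST=${LAST:-16}        # highest internal node number

# node_name IP: derive the host name from the last octet of the address.
node_name() {
    echo "node${1##*.}"
}

# gen_hosts: print one "address<TAB>name" line per machine.
gen_hosts() {
    printf '%s.1\tworldly\n' "$NET"
    n=$FIRST
    while [ "$n" -le "$LAST" ]; do
        ip="$NET.$n"
        printf '%s\t%s\n' "$ip" "$(node_name "$ip")"
        n=$((n + 1))
    done
}
```

Running gen_hosts once and appending its output to /etc/hosts on every node gives all machines the same name table; set LAST to match the size of your cluster.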
With this scheme, if node 192.168.1.5 keeps giving you trouble, you will know that it is node5, and you can find the guilty party quickly by looking for the PC labeled node5 on the front. This scheme also makes it easier to write custom system utilities, diagnostic tools, and other short programs implemented as shell scripts.

The next step in turning a mass of PCs into a Beowulf is to install an operating system and standard software on all of the nodes. This can be a daunting task even for a small cluster of eight PCs. Rather than install software on each machine individually, you can set up one internal node and clone the remaining nodes from it. The internal nodes of a Beowulf cluster are almost always identically configured. The hardware may differ slightly, incorporating different generations of processors, disks, and network interface cards, but the file system layout, kernel version, and installed software are the same. Only the worldly node differs, as it generally serves as the repository of user home directories and centralized software installations exported via NFS. In general, you will install the operating system and extra support software on the worldly node first. Then you will configure a single internal node and clone the rest from it. This way you only have to configure two systems.

Aside from saving time up front, cloning also facilitates major system upgrades. You may decide to completely change the software configuration of your internal nodes, requiring an update to all of them. With cloning, you only have to go through the reconfiguration process on one machine. In addition, cloning makes it easier to recover from unexpected events such as disk failures or accidental file system corruption: all you have to do is install a new disk (in the case of a disk failure) and reclone the node. At the moment, there are no standard software distributions for node cloning.
Most Beowulf sites either write their own software or borrow it from colleagues, but most of that software follows the basic procedure I am about to describe. Node cloning relies on the BOOTP protocol to provide a node with an IP address and a root file system for the duration of the cloning procedure. In brief, the following steps are involved.
The basic premise behind the cloning procedure is that the new node mounts a root file system over NFS that contains the cloning support programs, configuration files, and partition archives. When the Linux kernel finishes loading, it runs a program called /sbin/init, which executes the system initialization scripts and puts the system into multiuser mode. The cloning procedure replaces the standard /sbin/init with a program that partitions the hard drives, untars the partition archives, and executes any custom cloning configuration scripts before rebooting the newly cloned system.

To get ready for cloning, you have to configure an initial internal node. How you configure it will depend on how you intend to use your Beowulf, but you will most likely install the basic operating system and network clients such as the NFS automounter. Whether or not you install a full set of compilers and message-passing libraries is up to you, but in general, development tools are not duplicated across internal nodes and normally reside on the worldly node.

After configuring the internal node, you need to make an archive of each disk partition, omitting /proc, which is not a physical disk partition. Some cloning software provides a front end that asks you some questions and archives each partition for you automatically, but most of the time you will have to take care of it yourself. The normal procedure is to change your current working directory to the partition's mount point and run the following command (the final "." tells tar to archive the current directory):

    tar zlcf /worldly/nfsroot/partition-name.tgz .

The l option tells tar to stay on the partition being archived and not descend into directories that serve as mount points for other partitions. Recent versions of GNU tar use -l for --check-links instead, so with those, spell the option out as --one-file-system. A potential pitfall of this method is that you may not have enough room on the local disk to store the archives. Rather than create them locally, you should write the tar files to an NFS-mounted directory on the worldly node.
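The archiving step can be scripted so that every partition is handled the same way. This is a minimal sketch, not part of any particular cloning package: the destination /worldly/nfsroot is the directory used in the command above, while the partition list (/, /usr and /var) and the archive naming rule (root.tgz, usr.tgz, and so on) are hypothetical. It spells out --one-file-system, so it assumes a GNU tar (or another tar) that accepts that long option.

```shell
#!/bin/sh
# Sketch: archive each local partition of the configured internal node
# straight to the worldly node over NFS.
# Assumptions (hypothetical): the NFS directory is mounted at
# /worldly/nfsroot and the node's partitions are mounted at /, /usr and /var.
# /proc is never listed: it is not a disk partition.

DEST=${DEST:-/worldly/nfsroot}
PARTITIONS=${PARTITIONS:-"/ /usr /var"}

# archive_name MOUNTPOINT: / -> root.tgz, /usr -> usr.tgz, /usr/local -> usr_local.tgz
archive_name() {
    if [ "$1" = "/" ]; then
        echo "root.tgz"
    else
        echo "$(echo "${1#/}" | tr '/' '_').tgz"
    fi
}

archive_partitions() {
    for mp in $PARTITIONS; do
        out="$DEST/$(archive_name "$mp")"
        echo "archiving $mp -> $out"
        # Archive "." from inside the mount point; --one-file-system keeps tar
        # out of other mounted file systems (other partitions, /proc, and the
        # NFS mount that receives the archives).
        (cd "$mp" && tar --one-file-system -zcf "$out" .) || return 1
    done
}
```

Writing directly into the NFS directory avoids the local disk space problem, and because the NFS mount is a separate file system, the archive of / does not end up containing the archives themselves.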
Ultimately, you will have to transfer the archives to the worldly node anyway, so you might as well do it all in one step.

Setting Up a Clone Root Partition

Now you need to create a root directory for cloning on the worldly node. This directory should be exported via NFS to the internal node network. It should contain the following subdirectories: bin, dev, etc, lib, mnt, proc, sbin, and tmp. The proc and mnt directories must be empty, as they will be used as mount points during the cloning process. The dev directory must contain all the standard Linux device files. Device files are special and cannot be copied like ordinary files. The easiest way to create this directory is to let tar do the work for you by executing the following command as root from inside the clone root directory:

    tar -C / -c -f - dev | tar xf -

This creates a dev directory containing all the device files found on your system. The remaining directories can be copied normally, except for tmp and etc, which should start out empty. You should have no need for a usr directory tree. You can trim the files down to the bare minimum necessary for cloning, but it isn't required. You will need to add an fstab file to etc containing only the following line, so that the /proc file system can be mounted properly:

    none /proc proc defaults 0 0

You may also need to include a hosts file. Once your NFS root file system is in place, move the partition archives into it. Depending on the cloning software you are using, you may have to create a special directory to store these files; if you are writing your own cloning scripts, place the archives wherever your scripts expect them. Then replace the sbin/init executable in the NFS root with your cloning init script. This script will be invoked by the clone node's kernel to launch the cloning process. Tasks performed by the script include drive partitioning, archive extraction, and configuration file tweaking.
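The cloning init script is the part most sites end up writing for themselves. Here is a minimal sketch of those tasks, written for /bin/sh so it can run from the NFS root. Everything specific in it is an assumption for illustration: a single IDE disk /dev/hda with root on hda1 and /usr on hda2, a partition table saved on the configured node with sfdisk -d into a file named hda.sfdisk, archives root.tgz and usr.tgz in a hypothetical /clone directory of the NFS root, and the node's address supplied in a NODE_IP variable rather than discovered. The configuration files it writes are the Red Hat style, address-dependent ones, /etc/sysconfig/network and /etc/sysconfig/network-scripts/ifcfg-eth0. With DRY_RUN=1 it only prints the commands, which is a safer way to check it before pointing it at a real disk.

```shell
#!/bin/sh
# Sketch of a cloning init script, installed as sbin/init in the NFS root.
# Hypothetical layout: disk /dev/hda (hda1 = /, hda2 = /usr), partition table
# dumped earlier with "sfdisk -d /dev/hda > hda.sfdisk", archives root.tgz and
# usr.tgz, all in /clone on the NFS root. NODE_IP must hold the node's address.
# DRY_RUN=1 prints each command instead of executing it.

DISK=${DISK:-/dev/hda}
CLONE=${CLONE:-/clone}
NETMASK=${NETMASK:-255.255.255.0}

fail() {
    # As process 1 we must not exit; drop to a shell for debugging instead.
    echo "clone failed at: $*" >&2
    exec /bin/sh
}

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@" || fail "$*"; fi
}

# write_netconfig TARGET IP: write the files that depend on the node's IP
# address under TARGET (the freshly extracted root file system).
write_netconfig() {
    target=$1
    ip=$2
    mkdir -p "$target/etc/sysconfig/network-scripts" || return 1
    cat > "$target/etc/sysconfig/network" <<EOF
NETWORKING=yes
HOSTNAME=node${ip##*.}
EOF
    cat > "$target/etc/sysconfig/network-scripts/ifcfg-eth0" <<EOF
DEVICE=eth0
BOOTPROTO=static
IPADDR=$ip
NETMASK=$NETMASK
ONBOOT=yes
EOF
}

clone_node() {
    run mount -t proc none /proc
    # 1. Partition the drive from the saved layout.
    run sh -c "sfdisk $DISK < $CLONE/hda.sfdisk"
    # 2. Create file systems and extract the partition archives.
    run mke2fs "${DISK}1"
    run mke2fs "${DISK}2"
    run mount "${DISK}1" /mnt
    run sh -c "cd /mnt && tar zxpf $CLONE/root.tgz"
    run mkdir -p /mnt/usr
    run mount "${DISK}2" /mnt/usr
    run sh -c "cd /mnt/usr && tar zxpf $CLONE/usr.tgz"
    # 3. Tweak the address-dependent configuration files.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "write_netconfig /mnt $NODE_IP"
    else
        write_netconfig /mnt "$NODE_IP" || fail "write_netconfig"
    fi
    run umount /mnt/usr
    run umount /mnt
    # 4. Reboot into the newly cloned system.
    run reboot -f
}

# Only do the work when started by the kernel as init (process 1).
if [ "$$" -eq 1 ]; then
    clone_node
fi
```

Dropping to a shell on failure, rather than carrying on to the reboot, keeps a half-cloned node from booting into a broken system.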
Some configuration files have to be tweaked if your nodes aren't set up to configure themselves through DHCP or BOOTP. The files in question are those that depend on the node's IP address, such as /etc/sysconfig/network and /etc/sysconfig/network-scripts/ifcfg-eth0 on Red Hat based systems. At this point you can add the NFS root directory to the list of exported file systems, making sure to export it only to your internal node network.

Now it's time to decide what IP addresses the internal nodes will have. Once this is done, you can create a bootptab file and enable the bootpd daemon on the worldly node. The bootptab file must include a root path entry for the NFS-exported root directory, which you specify with the rp option. To activate the BOOTP daemon, you should only have to create an /etc/bootptab file, uncomment the line in /etc/inetd.conf that starts bootpd, and restart the inetd server on the worldly node. The bootptab file tells bootpd how to map hardware addresses to IP addresses and host names. The .default entry is a macro defining a set of options common to all of the entries; each entry includes these default options with tc=.default. The other entries are simply host names followed by IP addresses and hardware addresses.

At this point, everything should be in place on the server to start the cloning process. All that remains is to create a boot floppy that will launch the cloning process on the client. Creating a clone boot floppy requires familiarity with compiling the Linux kernel, because the boot disk does not actually perform the node cloning. Instead, it bootstraps the cloning procedure by talking to a BOOTP server to get an IP address and then mounting the root file system over the network. The kernel installed on your system most likely does not have these capabilities built in.
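A bootptab file for the address plan described earlier can be generated from a simple list of host names, IP addresses, and Ethernet hardware addresses. The sketch below uses the standard bootptab tags ht (hardware type), sm (subnet mask), rp (root path), tc (table continuation), ip, and ha. The root path /worldly/nfsroot, the 192.168.1.0 network, and the /etc/exports options are illustrative assumptions for a Linux NFS server; the input list is something you would collect yourself from each node's network card.

```shell
#!/bin/sh
# Sketch: generate /etc/bootptab and an /etc/exports line for the clone root.
# Assumptions (hypothetical values): clone root /worldly/nfsroot, internal
# network 192.168.1.0/255.255.255.0. Input on stdin: "hostname ip mac" lines.

ROOTPATH=${ROOTPATH:-/worldly/nfsroot}
NETWORK=${NETWORK:-192.168.1.0}
NETMASK=${NETMASK:-255.255.255.0}

gen_bootptab() {
    # .default holds the options shared by every node, including rp.
    printf '.default:\\\n'
    printf '\t:ht=ethernet:\\\n'
    printf '\t:sm=%s:\\\n' "$NETMASK"
    printf '\t:rp=%s:\n' "$ROOTPATH"
    while read -r name ip mac; do
        [ -n "$name" ] || continue   # skip blank lines
        # ha takes the hardware address as hex digits, so drop the colons.
        printf '%s:tc=.default:ip=%s:ha=%s:\n' \
            "$name" "$ip" "$(echo "$mac" | tr -d ':')"
    done
}

# Export the clone root read-only, and only to the internal network.
# no_root_squash lets the cloning init (running as root) read every file.
gen_exports() {
    printf '%s %s/%s(ro,no_root_squash)\n' "$ROOTPATH" "$NETWORK" "$NETMASK"
}
```

For example, with one "hostname ip mac" line per node in a file nodes.txt (a hypothetical name), gen_bootptab < nodes.txt > /etc/bootptab writes the table, and the output of gen_exports can be appended to /etc/exports before re-exporting (exportfs -a on Linux).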
It is necessary to build a kernel with these abilities, commonly called an NFSROOT kernel because it mounts its root file system over NFS. The CACR Beowulf cloning software contains a pre-built NFSROOT kernel, but it is not guaranteed to work on your system. If you are not familiar with compiling the Linux kernel, take a look at the kernel upgrade portion of this website. When compiling the kernel for a clone floppy, you must make sure that NFS root file system support is enabled, as well as BOOTP support so that the kernel can obtain its address. Once compiled, the kernel will be stored in a file called zImage or bzImage, depending on whether you built it with make zImage or make bzImage (bzImage is needed for kernels too large for the zImage format). If you use this unaltered kernel to create a boot floppy, it will try to mount a local partition as the root file system. There is some voodoo involved in making it boot from the NFS directory obtained via BOOTP. The root device used by a kernel is stored in the kernel image and can be altered with the rdev program, usually located in /usr/sbin. You want the root device to be the NFS file system, but no device file exists for this purpose, so you have to create one with the following command:

    mknod /dev/nfsroot b 0 255

This creates a dummy block device with major number 0 and minor number 255, a combination that has special meaning to the Linux kernel. When this device is set as the root device with

    rdev zImage /dev/nfsroot

the kernel interprets it as an NFS root file system. (Substitute bzImage if that is the image you built.) Now that the kernel's root device is set to be mounted via NFS, you can write the kernel to a floppy with the dd command:

    dd if=zImage of=/dev/fd0 bs=512

After creating your first clone disk, test it on a node and make sure everything works. After this test, you can duplicate the floppy and clone all of your nodes at once.
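The mknod, rdev, and dd steps can be wrapped in a small script that checks the kernel configuration first. This is a sketch under several assumptions: it is run as root from the top of an x86 kernel source tree after make zImage (so the image is arch/i386/boot/zImage), the system still has rdev (newer util-linux releases no longer include it), and the kernel's .config names the NFS root and BOOTP options CONFIG_ROOT_NFS and CONFIG_IP_PNP_BOOTP, as 2.4 and later kernels do. DRY_RUN=1 prints the commands without touching /dev or the floppy.

```shell
#!/bin/sh
# Sketch: turn a freshly built NFSROOT kernel into a clone boot floppy.
# Assumptions: run as root from the top of an x86 kernel source tree after
# "make zImage"; rdev is installed; the floppy drive is /dev/fd0.
# DRY_RUN=1 prints the commands instead of running them.

KERNEL=${KERNEL:-arch/i386/boot/zImage}
CONFIG=${CONFIG:-.config}
FLOPPY=${FLOPPY:-/dev/fd0}

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# check_config FILE: fail unless NFS root and BOOTP support are built in (=y).
check_config() {
    for opt in CONFIG_ROOT_NFS CONFIG_IP_PNP_BOOTP; do
        if ! grep -q "^$opt=y" "$1"; then
            echo "missing $opt=y in $1" >&2
            return 1
        fi
    done
}

make_floppy() {
    check_config "$CONFIG" || return 1
    # Dummy block device, major 0 / minor 255, meaning "root on NFS".
    [ -b /dev/nfsroot ] || run mknod /dev/nfsroot b 0 255
    # Store that root device in the kernel image, then write it raw to disk.
    run rdev "$KERNEL" /dev/nfsroot
    run dd if="$KERNEL" of="$FLOPPY" bs=512
}
```

Checking the configuration first catches the most common mistake, a kernel built without NFS root support, before a floppy is written and carried to a node.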