= Setup GFS with GNBD and DM-Multipath =

This guide will help you create a 5-node GFS cluster with 2 GNBD servers and 3 client nodes, using DM-Multipath to provide redundant paths. This setup uses CentOS 5.0.

Prerequisites:

1. Statically assigned IP addresses on each node
2. A hosts file (/etc/hosts) that includes each node's hostname and IP
3. A working NTP service (optional but helpful)

On every node:

1. Install the cluster packages:

 # yum groupinstall "Cluster Storage" "Clustering"

2. Reboot each node:

 # reboot

== Configure your cluster (any server node) ==

1. Create the cluster configuration:

 # vi /etc/cluster/cluster.conf

== GFS partition configuration (server nodes) ==

1. Create a new partition. In this setup, a second hard disk (/dev/hdb) is attached to each server node:

 server# fdisk /dev/hdb
   n      (new partition)
   p      (primary)
   1      (partition number)
   Enter  (accept default first cylinder)
   Enter  (accept default last cylinder)
   w      (write table and exit)
 # reboot

2. Initialize the partition for use by LVM:

 # pvcreate /dev/hdb1

Verify:

 # pvscan

3. Create the volume group:

 # vgcreate mycluster /dev/hdb1

Note: "mycluster" is the volume group name. The cluster name ("juncluster" in this guide) is a different thing: it is defined in cluster.conf and must be used with gfs_mkfs below.

Verify:

 # pvdisplay /dev/hdb1

4. Create the logical volumes in the new volume group, one on each server:

 server1# lvcreate -L500M -n gfs_cluster1 mycluster
 server2# lvcreate -L500M -n gfs_cluster2 mycluster
 # vgchange -aly

5. Create the GFS file systems. The -t argument is cluster_name:fs_name, where the cluster name must match the one defined in cluster.conf, and -j 3 creates one journal for each node that will mount the file system:

 server1# gfs_mkfs -p lock_dlm -t juncluster:gfs_cluster1 -j 3 /dev/mycluster/gfs_cluster1
 server2# gfs_mkfs -p lock_dlm -t juncluster:gfs_cluster2 -j 3 /dev/mycluster/gfs_cluster2
 This will destroy any data on /dev/mycluster/gfs_cluster2.
 Are you sure you want to proceed? [y/n] y
 Device:                /dev/mycluster/gfs_cluster2
 Blocksize:             4096
 Filesystem Size:       425924
 Journals:              3
 Resource Groups:       8
 Locking Protocol:      lock_dlm
 Lock Table:            juncluster:gfs_cluster2
 Syncing...
 All Done

== Install and configure DM-Multipath (non-server nodes) ==

1. Install DM-Multipath:

 # yum -y install device-mapper-multipath
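The cluster.conf created in the "Configure your cluster" step above is never shown in this guide. The sketch below is an assumption of what a minimal RHEL/CentOS 5 configuration for this five-node cluster might look like; the cluster name "juncluster" matches the lock table used with gfs_mkfs, while the node names and the manual fence device are placeholders you would replace with your own hostnames and a real fencing agent:

```xml
<?xml version="1.0"?>
<!-- Hypothetical minimal cluster.conf; node names and fencing are placeholders -->
<cluster name="juncluster" config_version="1">
  <clusternodes>
    <clusternode name="server1" nodeid="1" votes="1">
      <fence>
        <method name="single">
          <device name="manual" nodename="server1"/>
        </method>
      </fence>
    </clusternode>
    <!-- server2 and the three client nodes are declared the same way,
         with unique name= and nodeid= values -->
  </clusternodes>
  <fencedevices>
    <fencedevice name="manual" agent="fence_manual"/>
  </fencedevices>
  <rm/>
</cluster>
```

Remember that the name attribute of the cluster element is the same string you pass as the first half of gfs_mkfs -t.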
2. Configure device-mapper-multipath. Edit multipath.conf on the non-server nodes: comment out the default blacklist, change any of the existing defaults as needed, and save the file:

 # vi /etc/multipath.conf

Change this:

 blacklist {
         devnode "*"
 }

To this:

 #blacklist {
 #        devnode "*"
 #}

== Starting the cluster services ==

1. On all nodes in the cluster, start each service on every host before proceeding to the next service:

 service cman start
 service clvmd start
 service gfs start
 service rgmanager start

2. Start GNBD on the server nodes:

 # /sbin/gnbd_serv -v

Export the file systems so the client nodes can import them. Both exports use the same uid (gfs_uid), which is what lets DM-Multipath treat them as two paths to a single device:

 server1# gnbd_export -v -e gfs_cluster1 -d /dev/mycluster/gfs_cluster1 -u gfs_uid
 server2# gnbd_export -v -e gfs_cluster2 -d /dev/mycluster/gfs_cluster2 -u gfs_uid

Verify:

 # gnbd_export -v -l

3. On the non-server nodes, start multipath and import the GNBDs from the server nodes:

 modprobe dm-multipath
 service multipathd start
 modprobe gnbd
 gnbd_import -v -i server1
 gnbd_import -v -i server2

Verify the imports:

 [root@node1 ~]# gnbd_import -l
 Device name : gfs_cluster1
 ----------------------
     Minor # : 0
  sysfs name : /block/gnbd0
      Server : server1
        Port : 14567
       State : Open Connected Clear
    Readonly : No
     Sectors : 1024000

 Device name : gfs_cluster2
 ----------------------
     Minor # : 1
  sysfs name : /block/gnbd1
      Server : server2
        Port : 14567
       State : Open Connected Clear
    Readonly : No
     Sectors : 1024000

Verify multipath:

 [root@node1 ~]# multipath -ll
 mpath0 (gfs_uid) dm-3 GNBD,GNBD
 [size=500M][features=0][hwhandler=0]
 \_ round-robin 0 [prio=0][enabled]
  \_ #:#:#:# gnbd1 252:1 [active][ready]
  \_ #:#:#:# gnbd0 252:0 [active][ready]
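The import-and-verify steps above can be collected into a small helper. This is only a sketch, not part of the original setup: it assumes the gnbd module is loaded and multipathd is running as shown, and it reuses this guide's server names.

```shell
# Hypothetical helper (not from the original guide). Imports the GNBD
# exports from both servers and lists the resulting multipath state.
import_gnbds() {
    for srv in server1 server2; do
        gnbd_import -v -i "$srv" || echo "WARNING: import from $srv failed" >&2
    done
    gnbd_import -l   # show imported devices
    multipath -ll    # both gnbd paths should appear under mpath0
}
```

Run it once after the cluster services are up on a client node; if only one path appears under mpath0, check gnbd_serv and the export on the missing server.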
4. Mount the file system on the non-server nodes:

 # mkdir /gfs
 # mount -t gfs /dev/mapper/mpath0 /gfs/

Verify:

 [root@node1 ~]# mount
 /dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
 proc on /proc type proc (rw)
 sysfs on /sys type sysfs (rw)
 devpts on /dev/pts type devpts (rw,gid=5,mode=620)
 /dev/hda1 on /boot type ext3 (rw)
 tmpfs on /dev/shm type tmpfs (rw)
 none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
 sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
 none on /sys/kernel/config type configfs (rw)
 /dev/mapper/mpath0 on /gfs type gfs (rw,hostdata=jid=2:id=131073:first=0)

== Scripts for the server nodes ==

Startup:

 # vi /etc/rc.local
 service cman start
 service clvmd start
 service gfs start
 service rgmanager start

Shutdown:

 # vi /etc/init.d/cluster_shutdown
 service rgmanager stop
 service gfs stop
 service clvmd stop
 service cman stop

 # chmod 755 /etc/init.d/cluster_shutdown
 # ln -s /etc/init.d/cluster_shutdown /etc/rc0.d/K98cluster_shutdown
 # ln -s /etc/init.d/cluster_shutdown /etc/rc6.d/K98cluster_shutdown

== Scripts for the non-server nodes ==

Startup:

 # vi /etc/rc.local
 service cman start
 service clvmd start
 service gfs start
 service rgmanager start
 modprobe dm-multipath
 service multipathd start
 modprobe gnbd

Shutdown:

 # vi /etc/init.d/cluster_shutdown
 umount /gfs
 service multipathd stop
 service rgmanager stop
 service gfs stop
 service clvmd stop
 service cman stop

 # chmod 755 /etc/init.d/cluster_shutdown
 # ln -s /etc/init.d/cluster_shutdown /etc/rc0.d/K98cluster_shutdown
 # ln -s /etc/init.d/cluster_shutdown /etc/rc6.d/K98cluster_shutdown

== Troubleshooting ==

What happens when a GNBD server node fails, and the steps to recover:

1. GNBD server node server2 fails and goes offline.
2. Fencing kicks in and takes the host off the network if it isn't already.
3. Reads and writes continue on /gfs. During this time, "multipath -ll" may show the failed path:

 [root@node1 ~]# multipath -ll
 gnbd1: checker msg is "directio checker reports path is down"
 mpath0 (gfs_uid) dm-2 GNBD,GNBD
 [size=500M][features=0][hwhandler=0]
 \_ round-robin 0 [prio=1][active]
  \_ #:#:#:# gnbd1 252:1 [active][faulty]
  \_ #:#:#:# gnbd0 252:0 [active][ready]

4. server2 restarts, reconnects to the cluster, and is unfenced. DM-Multipath restores the path (from the gnbd_export run at startup), and gnbd_import should still have a record of server2's export. That should be all you need to do.

5. If you are still receiving I/O errors when writing to /gfs, do one of the following:

 umount /gfs
 gnbd_import -R            (or: gnbd_import -r server1 ; gnbd_import -r server2)
 gnbd_import -v -i server1
 gnbd_import -v -i server2
 multipath -ll             (verify both paths are recognized)
 mount -t gfs /dev/mapper/mpath0 /gfs

OR reboot the node that is having problems.

If there were no reads or writes on /gfs while server2 was offline, simply reboot server2 and wait for it to reconnect; reads and writes will resume on their own.

What happens when a client node in the GFS cluster (using GNBD and DM-Multipath) goes offline, and the steps to recover:

1. node1 fails and goes offline.
2. Fencing kicks in and takes the host off the network if it isn't already.
3. Reads and writes continue on the /gfs mount on the other nodes.
4. node1 restarts and reconnects to the cluster.
5. Reimport from the GNBD servers:

 # gnbd_import -v -i server1
 # gnbd_import -v -i server2

Verify with multipath:

 [root@node1 ~]# multipath -ll
 mpath0 (gfs_uid) dm-2 GNBD,GNBD
 [size=500M][features=0][hwhandler=0]
 \_ round-robin 0 [prio=2][enabled]
  \_ #:#:#:# gnbd1 252:1 [active][ready]
  \_ #:#:#:# gnbd0 252:0 [active][ready]
6. Remount the file system:

 # mount -t gfs /dev/mapper/mpath0 /gfs

== DRBD for GFS ==

1. Install DRBD and reboot:

 # yum -y install drbd82 kmod-drbd82
 # shutdown -r now

2. Create a partition for the DRBD metadata (a third disk, /dev/hdd, in this setup):

 # fdisk /dev/hdd
   n
   p
   1
   w

3. Configure DRBD:

 # cat /etc/drbd.conf
 #global { usage-count yes; }
 common {
     syncer { rate 100M; al-extents 257; }
 }
 resource r0 {
     protocol C;
     startup {
         become-primary-on both;
         degr-wfc-timeout 60;
         wfc-timeout 30;
     }
     net {
         allow-two-primaries;
         cram-hmac-alg sha1;
         shared-secret "FooFunFactory";
         #after-sb-0pri discard-younger-primary;
         after-sb-0pri discard-zero-changes;
         after-sb-1pri discard-secondary;
         after-sb-2pri disconnect;
     }
     on server1 {
         device /dev/drbd0;
         #disk /dev/hdb1;
         disk /dev/mapper/mycluster-gfs_cluster1;
         address 192.168.20.108:7788;
         meta-disk /dev/hdd1 [0];
         #meta-disk internal;
     }
     on server2 {
         device /dev/drbd0;
         #disk /dev/hdb1;
         disk /dev/mapper/mycluster-gfs_cluster2;
         address 192.168.20.109:7788;
         meta-disk /dev/hdd1 [0];
         #meta-disk internal;
     }
 }

4. Create the metadata. If the metadata partition previously held a file system, drbdadm will refuse to overwrite it:

 [root@test2 ~]# drbdadm create-md r0
 v08 Magic number not found
 md_offset 134217728
 al_offset 134221824
 bm_offset 134254592
 Found ext3 filesystem
 This would corrupt existing data.
 If you want me to do this, you need to zero out the first part of the device (destroy the content).
 You should be very sure that you mean it.
 Operation refused.
 Command 'drbdmeta /dev/drbd0 v08 /dev/hda3 1 create-md' terminated with exit code 40
 drbdadm aborting

In that case, zero out the start of the metadata device first, then retry:

 # dd if=/dev/zero of=/dev/hdd1 bs=1M count=1000
 # drbdadm create-md r0
 # service drbd start
 # drbdadm create-md r0

5. Synchronize the disks by forcing one server to become primary (this overwrites the peer's data):

 # drbdsetup /dev/drbd0 primary -o

6. Make the second server primary as well, so that both DRBD servers are primary, and check the sync status:

 # drbdsetup /dev/drbd0 primary
 # cat /proc/drbd
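The manual recovery sequence from the Troubleshooting section (unmount, drop the imports, reimport from both servers, verify, remount) can be sketched as a single helper. Treat this as an illustration using this guide's names (/gfs, mpath0, server1, server2), not a tested production script:

```shell
# Hypothetical recovery helper for a client node still seeing I/O errors.
# Mirrors the Troubleshooting steps above: unmount, remove every current
# GNBD import, reimport from both servers, verify multipath, and remount.
recover_gfs() {
    umount /gfs || return 1            # stop here if the unmount fails
    gnbd_import -R                     # remove all current imports
    gnbd_import -v -i server1
    gnbd_import -v -i server2
    multipath -ll                      # both paths should be [active][ready]
    mount -t gfs /dev/mapper/mpath0 /gfs
}
```

If the helper does not clear the I/O errors, fall back to the other option from the Troubleshooting section: reboot the affected node.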