Last edit: Andrei Ryjov aka aryzhov@spasu.net - Fri Jan 12 22:48:12 MET 2007
Sun Cluster-enabled Jumpstart for $SITE
Network services necessary for Jumpstart.
The Jumpstart server at $SITE requires the network daemons that bootstrap the machines being staged to be (re)started manually. These daemons are:
/usr/sbin/in.rarpd
resolves hardware (MAC, usually Ethernet) addresses into IPv4 addresses. At $SITE, flat local files are used for resolution: the /etc/hosts file contains the IP-to-name table, and the /etc/ethers file contains the MAC-to-name table. Together, these two files provide the MAC-to-IP mapping.
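The lookup that in.rarpd performs against these two files can be imitated from the shell, which is handy for checking the tables before a netboot. A minimal sketch, assuming plain /etc/ethers ("MAC hostname") and /etc/hosts ("IP name aliases") files and ignoring any other name services; mac2ip is a hypothetical helper, not one of the $SITE scripts:

```sh
# mac2ip MAC [ETHERS] [HOSTS]: print the IPv4 address that the ethers+hosts
# files map MAC to, or fail if there is no mapping.
mac2ip() {
    mac=$1
    ethers=${2:-/etc/ethers}
    hosts=${3:-/etc/hosts}
    # Step 1: MAC -> hostname. Octets are compared case-insensitively and
    # without leading zeros, so 0:3:ba:... matches 00:03:BA:...
    name=$(awk -v m="$mac" '
        function norm(s,    a, n, i, out) {
            n = split(tolower(s), a, ":")
            out = ""
            for (i = 1; i <= n; i++) {
                sub(/^0+/, "", a[i])
                if (a[i] == "") a[i] = "0"
                out = out (i > 1 ? ":" : "") a[i]
            }
            return out
        }
        { sub(/#.*/, "") }                       # drop comments
        NF >= 2 && norm($1) == norm(m) { print $2; exit }' "$ethers")
    [ -n "$name" ] || return 1
    # Step 2: hostname (canonical name or alias) -> first matching IP.
    ip=$(awk -v n="$name" '
        { sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }' "$hosts")
    [ -n "$ip" ] || return 1
    printf '%s\n' "$ip"
}
```

For example, mac2ip 0:3:ba:b1:9d:73 on the Jumpstart server shows which address a client with that local MAC should receive.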
For rarpd to work properly on a multihomed Sun machine, local-mac-address? in EEPROM must be set to "true". In this case, the local MAC address of a specific NIC may differ from the one shown by banner at the OpenBoot PROM (OBP). It can be found either from booted Solaris after the relevant interface is plumbed (ifconfig qfe5 plumb, for instance), or from OBP as shown here:
{0} ok
{0} ok banner
SUNW,Sun-Blade-1000 (2 X UltraSPARC-III) , No Keyboard
Copyright 1998-2002 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.5, 2560 MB memory installed, Serial #51376277.
Ethernet address 0:3:ba:f:f0:95, Host ID: 830ff095.
{0} ok
{0} ok show-devs
/ppm@8,410050
/upa@8,480000
....
/pci@8,600000/pci@1/pci@4
/pci@8,600000/pci@1/pci@0
/pci@8,600000/pci@1/pci@4/network@3
/pci@8,600000/pci@1/pci@4/network@2
/pci@8,600000/pci@1/pci@0/network@1
/pci@8,600000/pci@1/pci@0/network@0
/pci@8,600000/SUNW,qlc@4/fp@0,0
/pci@8,600000/SUNW,qlc@4/fp@0,0/disk
....
{0} ok
{0} ok cd /pci@8,600000/pci@1/pci@4/network@3
{0} ok .properties
max-frame-size 00 00 40 00
network-interface-type ethernet
device_type network
name network
local-mac-address 00 03 ba b1 9d 73
version Sun PCI Quad Gigaswift 10/100/1000Base-T FCode 2.12 03/11/11
phy-type ...
RARP troubleshooting usually involves looking at the actual kernel ARP table (/usr/sbin/arp -an) and snooping the packets on the relevant NIC (snoop -d qfe5, if qfe5 is the interface name). The most frequent problems with rarpd arise from duplicate IP addresses, or from confusing the server's IP/MAC with the client's IP/MAC in the /etc/hosts and/or /etc/ethers files.
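Duplicates of the kind described above can be spotted before snooping. The sketch below (dup_field is a hypothetical helper) prints every value of a given field that occurs on more than one non-comment line: field 1 of /etc/hosts catches duplicate IP addresses, and field 2 of /etc/ethers catches a host registered under two MACs, for instance both the banner MAC and the local-mac-address of the same machine. MACs written with different zero padding (0:3:ba... versus 00:03:ba...) are not normalized here.

```sh
# dup_field FILE [N]: print each value of field N (default 1) that appears
# on more than one line of FILE, ignoring comments.
dup_field() {
    awk -v f="${2:-1}" '
        { sub(/#.*/, "") }                 # strip comments (re-splits fields)
        NF >= f { count[$f]++ }
        END { for (k in count) if (count[k] > 1) print k }' "$1" | sort
}
```

Typical checks are dup_field /etc/hosts 1 and dup_field /etc/ethers 2; empty output means no duplicates of that kind.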
in.tftpd - provides the Stage2 boot image to the clients in response to the TFTP broadcast request sent by Stage1 (the boot PROM firmware). The boot images are usually kept in the /tftpboot directory on the boot server (usually the same as the Jumpstart server), and are in fact symlinks to the inetboot.* file for the relevant client architecture and OS version. To enable TFTP boot for a new client, a new symlink usually has to be created in the /tftpboot directory. The basename of this symlink can easily be found by looking at the TFTP broadcasts visible in the snoop session described in the rarpd section above.
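For SPARC clients, the file requested over TFTP is named after the client's IP address written as eight uppercase hexadecimal digits (for example, 10.130.11.12 becomes 0A820B0C); the same name with an architecture suffix such as .SUN4U may also be requested, so the snoop output remains the authoritative source. A minimal sketch of the conversion, with ip2hex as a hypothetical helper name:

```sh
# ip2hex A.B.C.D: print the IPv4 address as 8 uppercase hex digits,
# which is the usual /tftpboot symlink name for a SPARC client.
ip2hex() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the dotted quad into $1..$4
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}
```

A new client would then get a link such as ln -s <inetboot file> /tftpboot/$(ip2hex 10.130.11.12), with the inetboot file chosen for its architecture and OS release.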
The tftpd daemon is usually started by inetd. Uncomment the tftp line in /etc/inetd.conf and send SIGHUP to the inetd process (kill -HUP) to enable tftpd. Troubleshooting usually involves running tftp from the command line on another machine as follows:
# tftp 10.130.11.12
tftp>
tftp> get ABCDEFGH
where ABCDEFGH is the basename of the symlink to the inetboot image, as described above. If the file is transferred to the local host and saved in the current directory, tftp works correctly. In addition, in.tftpd has a debug option. RTFM for details. Note that Solaris 10 uses SMF for daemon management (even though the classic inetd-based and rc-based methods are still possible). Enabling tftpd in Solaris 10_0202 requires generating a new SMF service, which for tftp can easily be done with the inetconv utility. RTFM.
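The inetd-based method can be scripted. A minimal sketch, assuming the tftp entry is present but commented out with leading "#" characters, as in a typical Solaris /etc/inetd.conf (where the line usually reads "tftp dgram udp6 wait root /usr/sbin/in.tftpd in.tftpd -s /tftpboot"); enable_tftp_line is a hypothetical helper. Afterwards inetd still has to be sent SIGHUP (for example, pkill -HUP inetd), and on Solaris 10 the edited file has to be converted with inetconv instead.

```sh
# enable_tftp_line [FILE]: uncomment the tftp service entry of an
# inetd.conf-style file. Only "#...tftp<blanks>dgram" lines are touched,
# so prose comments that merely mention tftp stay commented.
enable_tftp_line() {
    conf=${1:-/etc/inetd.conf}
    tmp="$conf.tmp.$$"
    sed 's/^#[# ]*\(tftp[^a-zA-Z0-9_-][^a-zA-Z0-9_-]*dgram\)/\1/' "$conf" > "$tmp" || return 1
    # Copy back with cat so the original file keeps its owner and mode.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```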
/usr/sbin/rpc.bootparamd
- tells Stage2 where Stage3 (the Solaris kernel) can be found, and then tells Stage3 where the root filesystem image can be mounted from.
The necessary data is kept in the /etc/bootparams file. This file also contains information used by other programs in Jumpstart: for instance, sysidcfg or getbootargs, running at different phases of Jumpstart on the client, may also send BOOTPARAMS broadcast requests in order to fetch information from the /etc/bootparams file on the Jumpstart server.
For troubleshooting bootparamd, look at the snoop session. Also, start rpc.bootparamd -d in the foreground to get more verbose output from it. bootparamd options and the bootparams file format are described in the man pages. Even more details can be found in the Advanced Solaris Installation Guide on docs.sun.com.
NFS server - requires rpcbind, lockd, statd (client services), nfsd and mountd (server services). Read Solaris Network Administration on docs.sun.com for more details on NFS configuration and troubleshooting. Note that Solaris 10 NFS servers offer NFS protocol v4 by default, whereas Stage3 booting supports v3 at most. If the Jumpstart client panics while mounting the root filesystem, reduce the server protocol level in /etc/default/nfs. RTFM.
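The protocol limit can be set from a script as well. A minimal sketch, assuming the Solaris 10 /etc/default/nfs keyword NFS_SERVER_VERSMAX (which ships commented out); set_nfs_versmax is a hypothetical helper. The NFS server has to be restarted for the change to take effect (for example, svcadm restart network/nfs/server).

```sh
# set_nfs_versmax VERSION [FILE]: set NFS_SERVER_VERSMAX in an
# /etc/default/nfs-style file, uncommenting or appending the line as needed.
set_nfs_versmax() {
    ver=$1
    file=${2:-/etc/default/nfs}
    tmp="$file.tmp.$$"
    if grep '^[# ]*NFS_SERVER_VERSMAX=' "$file" >/dev/null 2>&1; then
        # Replace the existing (possibly commented-out) assignment.
        sed "s/^[# ]*NFS_SERVER_VERSMAX=.*/NFS_SERVER_VERSMAX=$ver/" "$file" > "$tmp"
    else
        # No such line yet: keep the file and append one.
        { cat "$file" 2>/dev/null; echo "NFS_SERVER_VERSMAX=$ver"; } > "$tmp"
    fi
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For Jumpstart clients, set_nfs_versmax 3 caps the server at the protocol version that Stage3 supports.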
Jumpstart phases.
After Stage3 (the Solaris kernel) boots and mounts the root filesystem (pointed to by "root=" in /etc/bootparams), sbin/init found on this filesystem processes the commands in etc/inittab. This inittab differs slightly from the inittab usually found on machines booting from the hard disk, and it refers to slightly different rc scripts. These scripts initialize Jumpstart variables and directories on the so-called "diskless client" (mostly on a virtual/RAM disk mounted on /tmp) and start Jumpstart phases such as Begin, Install and Finish.
The Begin phase consists of the built-in Begin and the custom Begin; the same holds for Finish. The location of the master scripts for the custom Begin and Finish phases is specified in the rules.ok file found in the Jumpstart config directory, which is pointed to by "install_config=" in the /etc/bootparams file on the Jumpstart server. On the Jumpstart client, the name of this directory is kept in the $SI_CONFIG_DIR variable, which is usually exported to all subscripts in all phases.
The Install phase runs right after the Begin phase and essentially consists of processing the Jumpstart Profile (describing disk slicing and package information) with the /usr/sbin/install.d/pfinstall program.
A simple Profile example:
install_type initial_install
system_type standalone
partitioning explicit
#
filesys c0t3d0s0 1000 /
filesys c0t3d0s1 1000 swap
filesys c0t3d0s6 free /export
#
# Only the core cluster and some needed packages are installed
#
cluster SUNWCreq
More advanced Profile example:
install_type initial_install
system_type standalone
partitioning explicit
#
filesys mirror:d100 c1t0d0s0 c1t1d0s0 4096 / logging
filesys mirror:d101 c1t0d0s1 c1t1d0s1 16384 swap
filesys mirror:d105 c1t0d0s5 c1t1d0s5 1024 /globaldevices logging
filesys mirror:d106 c1t0d0s6 c1t1d0s6 8192 /opt logging
filesys mirror:d107 c1t0d0s7 c1t1d0s7 free /var logging
#
#
metadb c1t0d0s4 size 8192 count 3
metadb c1t1d0s4 size 8192 count 3
#
cluster SUNWCreq
#
# Clusters and packages that we don't need
#
cluster SUNWCdtrace delete
cluster SUNWCaudd delete
cluster SUNWCbs delete
....
cluster SUNWCkrb5 delete
cluster SUNWClexpt delete
#
package SUNWatfsu delete
package SUNWatfsr delete
package SUNWdtcor delete
...
More information about the Profile format and options can be found in the Advanced Solaris Installation Guide on docs.sun.com.
The Finish phase runs after the Install phase. The location of the custom Finish master script is specified in the same rules.ok file found in the $SI_CONFIG_DIR directory.
The Begin and Finish algorithms implemented for $SITE are described in detail in the following sections.
$SITE extensions.
To simplify and unify Jumpstart client management, the following changes have been applied to the standard Jumpstart configuration and procedures described in the Advanced Solaris Installation Guide:
Single bootparams entry, Profile, Begin/Finish master scripts and rules.ok for all clients.
In the bootparams file, the wildcard "*" is used instead of the client's name or IP address; it matches all clients. This way, any machine broadcasting for BOOTPARAMS on this wire gets the same data. If you need to jumpstart a machine using different parameters (location of the root filesystem image, sysidcfg, config and packages directories, etc.), make sure to add the more specific entries BEFORE the less specific (more common) ones: the entries are processed top to bottom, so once the common entry is matched, bootparamd never looks at a more specific entry that follows it.
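The top-to-bottom, first-match rule can be modelled in a few lines of awk, which helps to predict which entry a given client will receive. A sketch under two assumptions: each entry sits on a single line (bootparams continuation lines ending in "\" are not handled), and the first entry whose name is the client or "*" wins, as described above; bp_lookup is a hypothetical helper.

```sh
# bp_lookup CLIENT KEY [BOOTPARAMS]: print the value of KEY (e.g. root,
# install_config) from the first entry matching CLIENT or "*".
# Fails if the first matching entry has no such KEY.
bp_lookup() {
    awk -v c="$1" -v k="$2" '
        $1 ~ /^#/ { next }                           # skip comments
        NF >= 2 && ($1 == c || $1 == "*") {
            for (i = 2; i <= NF; i++)
                if (index($i, k "=") == 1) {
                    print substr($i, length(k) + 2); found = 1; exit
                }
            exit                                     # first match wins
        }
        END { if (!found) exit 1 }' "${3:-/etc/bootparams}"
}
```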
The Profile, unlike in standard Jumpstart, is a symbolic link to /tmp/Profile. The real /tmp/Profile is generated by the subscript Scripts/Misc/MakeProfile, triggered by the Begin master script. The location of the master Begin and Finish scripts, as usual, can be found in the rules.ok file. However, unlike in standard Jumpstart, rules.ok contains only one line and does not need to be "recompiled" every time a new client is added. Instead, a new client is added to the Profiles directory as described in the "Config directory structure and contents" section below.
The default Begin and Finish master scripts, referred to by the rules.ok file, are only wrappers that pick up the correct site-specific and client-host-specific subtree within the Profiles directory structure described later. In addition, some of the subscripts triggered by these master scripts process special comments in the dynamically generated /tmp/Profile, and thus effectively run the Begin and Finish subscripts on a site-specific and host-specific basis. More details follow in the next sections.
Additions to Profile syntax
Special comments can be added to a $SITE Jumpstart client Profile to specify Begin, Finish and some other actions. Extended Profile example:
########## Begin script overwrites ###########
#
#!Begin
#!Begin
#!Begin cfgadm -c configure c2
#!Begin cfgadm -c configure c3
#!Begin devfsadm
#!Begin
#!Begin echo | format
#!Begin
#!Begin # StartShell Debugging from Begin
#!Begin
#
install_type initial_install
system_type standalone
partitioning explicit
#
filesys mirror:d100 c1t0d0s0 c1t1d0s0 4096 / logging
filesys mirror:d105 c1t0d0s5 c1t1d0s5 1024 /globaldevices logging #!Rename d91,d92,d93,d94,d95,d96
...
...
cluster SUNWCuser
...
########## Finish script replacement ###########
#
#!Finish
#!Finish #
#!Finish # The order is important here !
#!Finish #
#!Finish # Note that patch installation is part of
#!Finish # the Scripts/Finish/Std and therefore can not be
#!Finish # excluded without modifying Std,
#!Finish # or adding "nopatch" to the boot line:
#!Finish #
#!Finish # boot /pci@9,700000/network@2:dhcp - install 2.9 nopatch
#!Finish #
#!Finish add_home
#!Finish add_users
#!Finish add_sudo
#!Finish
#!Finish add_SAN
#!Finish
#!Finish # add_OracleClient # We call add_OracleClient from add_WMQ+DB2 - see comments there
#!Finish # Oracle packages also screw up the PIDs and GIDs if called before add_WMQ+DB2
#!Finish
#!Finish add_WMQ+DB2
#!Finish
#!Finish add_RSC # RemoteSystemControl card for Ex80/Ex90
#!Finish
#!Finish add_SunCluster # Can be added anywhere, as we now have a handshaking
#!Finish # mechanism to tell the secondary node that it is now
#!Finish # allowed to mount the shared (but not yet global)
#!Finish # filesystems.
#!Finish # See also comments in Scripts/Misc/add_WMQ+DB2
#!Finish
#!Finish
#!Finish
This profile resides within the Jumpstart Profiles directory tree in the form of separate pieces for the Begin, Types, DiskSlicing and Finish phases. Each piece may be picked up either from the standard (Std or AnySite) directory, from the site-specific ($SITE in this case) directory, or from a host-specific directory under the site-specific one. This selection and joining is performed by the FindSiteName() function in Scripts/Misc/!Includes/Subroutines and by the Scripts/Misc/MakeProfile script. The algorithm is well described in the comments there.
Lines beginning with #!Begin are processed by the Begin master script, and lines beginning with #!Finish by the Finish master script.
The #!Rename comment at the end of a "filesys" line states that the metadevice name automatically assigned by Jumpstart must be immediately changed to d91 for the first cluster node, d92 for the second one, etc. This helps to avoid name conflicts within the cluster global devices namespace.
Items like add_SAN or add_WMQ+DB2 are names of scripts, normally found in Scripts/Misc. However, shell lines of any complexity can be put there instead, as you can see in some of the #!Begin lines above.
This profile syntax and generation mechanism make it possible to structure the client specifics in 3 levels (default, site-specific and host-specific), and to keep all specifics within a well-defined and obvious directory tree. Because rules.ok is static and not recompiled for every client, the syntax of the Profile pieces is never checked in advance. This sometimes results in extra netboots needed for profile debugging. The system messages resulting from failures in the Profile, however, are usually very clear and easy to interpret.
Standard Jumpstart quite often runs some scripts in a so-called "chrooted" environment. For $SITE, "some" becomes "most": the chrooted environment is the rule rather than the exception for most of the Finish scripts. Almost every Finish subscript checks whether it is running in a chrooted environment and, if not, runs "chroot /a $0 $*", i.e. restarts itself chrooted. A robust chroot environment requires careful preparation, which is done once at the beginning of the Finish phase by Scripts/Misc/Chroot_Prepare, called from Scripts/Finish/Std.
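The self-restarting pattern can be sketched as a small shell function. How the real $SITE subscripts detect that they are already chrooted is not described here, so this sketch uses a hypothetical marker variable, JS_IN_CHROOT, exported before the re-exec; ensure_chrooted is likewise a hypothetical name. Like the original one-liner, it assumes that the script path in $0 is also valid inside /a.

```sh
# ensure_chrooted ROOT SCRIPT [ARGS...]
# First pass: set the (hypothetical) JS_IN_CHROOT marker and replace this
#   process with "chroot ROOT SCRIPT ARGS...".
# Second pass, inside ROOT: the marker is set, so just return 0.
ensure_chrooted() {
    root=$1
    shift
    if [ "${JS_IN_CHROOT:-}" = yes ]; then
        return 0
    fi
    JS_IN_CHROOT=yes
    export JS_IN_CHROOT
    exec chroot "$root" "$@"
}

# Typical first lines of a Finish subscript:
#   ensure_chrooted /a "$0" "$@"
#   ... from here on the script runs inside /a ...
```

Passing "$0" "$@" rather than $0 $* keeps arguments that contain blanks intact across the restart.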
One more extension has been added to work around a bug with writing to mirrored disks. Jumpstart is capable of creating the mirrors right from the beginning, and tends to write the OS to the mirror rather than to a single submirror. However, this is only half done, and under certain circumstances in Solaris 9 the mirrors become unsynchronized. Search groups.google.com for "aryzhov submirror dirty at jumpstart" to read a more detailed problem description and discussion. The solution to this problem is the Scripts/Misc/MDReMount script, called from Scripts/Finish/Std, which unmounts the partially synchronized submirrors, then mounts and resyncs them cleanly.
Two examples of administration automation have been included in the $SITE Finish script set: user creation and upload of site-specific configuration data. Predefined users, along with their initial home directory contents, can be added to Profiles/$SITE/SiteConfig/RootPatch/export/home/ (some sample users already exist there), and if their public SSH keys are added to the relevant subdirectories (as in the existing examples), users can log in immediately after Jumpstart using their private SSH keys. Besides export/home, $SITE/SiteConfig/RootPatch may contain any subdirectories and files, which automatically overwrite the corresponding files on the target client at the end of the Jumpstart Finish phase. Examples of such files are Profiles/$SITE/SiteConfig/RootPatch/usr/local/etc/sudoers and Profiles/$SITE/SiteConfig/RootPatch/etc/netmasks. Scripts/Misc/add_users adds the users whose home directories are found in Profiles/$SITE/SiteConfig/RootPatch/export/home/, and Scripts/Misc/add_RootPatch uploads the whole contents of Profiles/$SITE/SiteConfig/RootPatch/ to the target host. add_RootPatch is started from Scripts/Finish/Std, and add_users from Profiles/$SITE/Finish/$HOSTNAME. Sysadmins are welcome to move add_users from the host-specific Profile piece to Std, thus making add_users run on all Jumpstart clients by default.
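The overlay that add_RootPatch is described as performing can be sketched with a tar pipe. This is an illustration, not the actual add_RootPatch, and overlay_tree is a hypothetical name. The p flag restores the original file modes regardless of umask, which matters for files such as sudoers.

```sh
# overlay_tree SRC DST: copy every file and directory under SRC onto DST,
# overwriting files that already exist and leaving all other files alone.
overlay_tree() {
    src=$1
    dst=$2
    [ -d "$src" ] || return 1
    mkdir -p "$dst" || return 1
    # Archive relative paths from SRC and unpack them relative to DST.
    (cd "$src" && tar cf - .) | (cd "$dst" && tar xpf -)
}
```

In the Finish phase the target root is mounted on /a, so a call would look like overlay_tree $SI_CONFIG_DIR/Profiles/$SITE/SiteConfig/RootPatch /a.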
Config directory structure and contents
As mentioned before, the Jumpstart config directory is the one where the rules.ok file resides. The actual location of this directory is specified by the "install_config=" statement in the Jumpstart server's /etc/bootparams and, on the client, at any phase of Jumpstart, is stored in the $SI_CONFIG_DIR variable. In standard Sun Jumpstart, as well as at $SITE, this directory is usually mounted on /tmp/install_config on the Jumpstart client. The following files and subdirectories are important parts of $SI_CONFIG_DIR:
Makefile is used by the "make" command, which generates rules.ok from the rules file. Since the $SITE rules.ok needs no changes when clients are added, running "make" is only needed when rules.ok is lost or corrupted. See the Advanced Solaris Installation Guide for details on rules compilation and rules.ok generation.
The Profiles directory is certainly the most exciting part of $SITE Jumpstart. You'll see several README files inside it, so here we describe its structure only briefly.
Profiles/Std is a symlink to /tmp/Profile, which normally should not exist when you look at this directory from the server side (so Profiles/Std being a broken symlink is normal). As mentioned before, /tmp/Profile is generated dynamically on the Jumpstart client, and therefore is only visible from the client.
The Profiles/AnySite directory contains the profile pieces used by the dynamic profile generation script, Scripts/Misc/MakeProfile, when a more specific (site-specific or host-specific) piece cannot be found.
The method of finding specific pieces is described in the comments to the FindSiteName subroutine in Scripts/Misc/!Includes/Subroutines.
The Profiles/$SITE directory contains site-specific and $SITE host-specific profile pieces, as well as some other files needed to configure $SITE machines. Let's dig into the $SITE directory.
Profiles/$SITE/Begin contains the pieces of the custom Begin phase that MakeProfile adds to the dynamic Jumpstart /tmp/Profile. Some files there may in fact be symlinks to more generic Begin files in the same directory. For instance, for Sun Cluster nodes it is usually worth having a single master file and linking the files named after the nodes to it. This way, when MakeProfile picks the piece corresponding to the booted client (by its name), it is guaranteed to use an identical Begin script for all nodes within this Sun Cluster. The same method can be used whenever you want an identical Begin phase for several different machines for some other reason. If the Profiles/$SITE/Begin/$HOST file (or link) is not found, MakeProfile searches for Profiles/$SITE/Begin/Std, then for Profiles/AnySite/Begin/Std. If neither is found, an error message appears on the console and an interactive shell starts for debugging purposes.
Exactly the same algorithm is used for Profiles/$SITE/DiskSlicing/, Profiles/$SITE/Packages/ and Profiles/$SITE/Finish/.
Profiles/$SITE/Types/ is empty, so Profiles/AnySite/Types/Std is used for all $SITE Jumpstart clients.
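The fallback order described above, shared by the Begin, DiskSlicing, Packages, Finish and Types pieces, can be sketched as follows. This only illustrates the search order; the real implementation is FindSiteName() and MakeProfile, and find_piece is a hypothetical name.

```sh
# find_piece TOP SITE PHASE HOST: print the first existing profile piece among
#   TOP/SITE/PHASE/HOST, TOP/SITE/PHASE/Std and TOP/AnySite/PHASE/Std.
find_piece() {
    top=$1
    site=$2
    phase=$3
    host=$4
    for f in "$top/$site/$phase/$host" "$top/$site/$phase/Std" "$top/AnySite/$phase/Std"; do
        # test -f follows symlinks, so a node name linked to a shared
        # master file (as for cluster nodes) counts as a host-specific piece.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    echo "find_piece: no $phase piece for $host" >&2
    return 1
}
```

A caller could start a debugging shell when find_piece fails, matching the behaviour described for MakeProfile.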
Profiles/AnySite/Postinstall/ support was not implemented for $SITE, as it would require hacking the standard Sun rc and "suninstall" scripts on the install DVD image. A Postinstall phase could be introduced to run some scripts after the custom and then the built-in Finish phases are complete, logs are closed and filesystems are unmounted. This may be needed, for instance, to transfer the logs to centralized storage, add/resync additional disk mirrors, etc.
The sysidcfg file contains the information needed to avoid interactive questions from rc and suninstall after the net boot and before the Begin phase. Read the sysidcfg man page for details.
Scripts/Begin/Std and Scripts/Finish/Std are the master scripts for the Begin and Finish phases, respectively. The main tasks included in the Begin phase for $SITE are:
MakeProfile - generates the dynamic profile /tmp/Profile, which, in addition to standard Profile information (disk slicing and Solaris package choice), contains special comments that are processed in the custom Begin and custom Finish phases of Jumpstart. Processing of the #!Begin lines of the just-generated profile is started by the line
eval "`egrep "$MYLINES" $SI_PROFILE | sed s/$MYLINES//`"
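The egrep/sed/eval line above extracts the commands that follow a directive prefix and runs them in the current shell. An equivalent sketch with a single sed command, assuming MYLINES holds a pattern such as ^#!Begin (its actual value is set in Scripts/Begin/Std); directive_lines is a hypothetical name.

```sh
# directive_lines TAG [PROFILE]: print the commands carried by "#!TAG"
# lines (TAG is Begin or Finish), with the "#!TAG" prefix removed.
# Lines where the directive is not at the start (e.g. #!Rename after a
# filesys entry) are ignored.
directive_lines() {
    sed -n 's/^#!'"$1"'[ ]*//p' "${2:-$SI_PROFILE}"
}
```

Running eval "$(directive_lines Begin)" executes all Begin lines in one shell, so a variable set on one #!Begin line is visible on the following ones, and a line such as "#!Begin # StartShell ..." becomes a harmless comment.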
Other, less important items in Scripts/Begin/Std are MountMedia and AutoRevArp. MountMedia may mount Solaris package directories from non-standard places (possibly specified in pieces of the client's Profile instead of the server's bootparams), such as a local hard disk directory, a different NFS server, etc. AutoRevArp, in addition to the main boot interface already configured at bootstrap, also configures all other network interfaces if the relevant RARP information is available on the wires (i.e. if there are RARP servers, like the one on the Jumpstart server, on the other subnets connected to this Jumpstart client).
The Finish phase is usually more host-specific, as it installs and configures most of the host's middleware and applications. We go carefully through the two most important Finish phase subscripts in the two sections following the next, "Media", section.
Media directory
Larger and more static pieces of data used by Jumpstart, such as the Solaris DVD image, OS patches, and middleware, freeware and application packages, are collected in the media subtree and made available to the client via the /tmp/install_media directory. The Solaris DVD may sometimes reside in a different place and be mounted on the /cdrom directory. This depends on the "install=" parameter in the Jumpstart server's /etc/bootparams file, and on whether MountMedia remounts have been requested in Profile pieces. During installation, on the client, the location of the media directory can be found in the $JS_INSTALL_MEDIA variable.
Middleware installation
Scripts/Misc/add_WMQ+DB2, called from Profiles/$SITE/Finish/$HOSTNAME, is a wrapper for the middleware configuration and installation scripts provided by Lee Hollingdale at IBM. Those IBM scripts, along with the relevant IBM packages, reside in the Jumpstart "media" subdirectory $JS_INSTALL_MEDIA/Packages/IBM. For a detailed description of the middleware installation procedure, refer to the comments in the script. Only a brief summary is given here.
Since start/stop/check actions for the middleware are performed by the Sun Cluster software, some cluster-related configuration information is required within the add_WMQ+DB2 script. One of the important parameters is the location of cluster-related configuration data. The same data is needed by the cluster installation and configuration itself, so setting the relevant variables has been moved out of these scripts into SetSunClusterEnv(), defined in Scripts/Misc/!Includes/Subroutines and called at the very beginning of the add_WMQ+DB2 script.
Prerequisites for middleware installation include special users, groups, directories and symlinks on local and shared filesystems, with the correct ownership and permissions. All these objects are created at the beginning of the add_WMQ+DB2 script, before the IBM scripts are started.
As Global Filesystems are one of the most important prerequisites for middleware and application cluster services, creating and handling the Global Filesystems has been assigned to the add_WMQ+DB2 script. This is not an extremely elegant solution, but it is still logical, since the middleware is the main and only consumer of these filesystems. Functionally, these filesystems are more tightly coupled with the middleware than with the cluster software, which is only a means, not an objective.
The shared disk devices that hold the Global Filesystems are chosen in the add_WMQ+DB2 script based on the sizes of the yet unallocated disks. Thus, the largest free LUNs (not mounted and not part of metadevices) are allocated first. For this reason, we configure the DB2 filesystem first, and then the MW1 and MW2 filesystems. Such an algorithm is very inflexible and inherits hardcoded dependencies on hardware specifics on one side, and on application/middleware specifics on the other. However, since add_WMQ+DB2 has in any case been designed for one specific cluster only, and will have to be rewritten for any other cluster with different functionality, adding hardware dependencies to this script is probably not a fatal design flaw. Better, more generic design solutions may be introduced as the project moves beyond the pilot phase.
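The selection rule can be illustrated with a short pipeline. The input format (one "disk size_in_MB" pair per line) and the way the list of used disks is obtained are assumptions made for this sketch, since the real add_WMQ+DB2 derives them from the Solaris disk and metadevice tools; pick_free_luns is a hypothetical name.

```sh
# pick_free_luns "USED_DISKS": read "disk size_in_MB" lines on stdin and
# print the disks that are not listed in USED_DISKS (blank-separated),
# largest first.
pick_free_luns() {
    used=" $1 "
    while read -r disk size; do
        case "$used" in
            *" $disk "*) ;;                          # mounted or in a metadevice
            *) printf '%s %s\n' "$size" "$disk" ;;
        esac
    done | sort -n -r | awk '{ print $2 }'
}
```

With set -- $(... | pick_free_luns "$used_disks"), $1 would go to DB2, $2 to MW1 and $3 to MW2, mirroring the order described above.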
Middleware installation, as prepared by IBM, is aware of the clustered environment, and therefore the installation process wrapped by the add_WMQ+DB2 script differs slightly between the primary and secondary nodes. It presumes that directories with middleware data and configuration files are mounted on the primary node only.
However, at certain points the secondary node needs access to the shared disk filesystems as well. At that stage, those shared filesystems cannot yet be mounted as Sun Cluster Global Filesystems, since the cluster is not active yet. Therefore, for asynchronous installation of the cluster nodes, some sort of locking mechanism is required to protect the non-global shared filesystems from concurrent access. For such locking (sometimes referred to as "node handshaking" in the comments), cylinder 0 of the shared disks is not used, and filesystems start from cylinder 1. Cylinder 0, at an offset of 8 sectors (so as not to corrupt the disk label), is used by the add_WMQ+DB2 script to communicate the installation status between the nodes. See more details in the comments.
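The handshake block can be read and written with dd. A minimal sketch, assuming one 512-byte status block at sector 8 of a device whose sector 0 holds the label (for example, slice 2 of an SMI-labelled shared disk, which normally starts at cylinder 0); the status strings and the hs_* function names are hypothetical, and the real protocol is described in the comments of add_WMQ+DB2.

```sh
HS_SECTOR=8     # status block offset in 512-byte sectors, past the label
HS_BS=512

# hs_write DEVICE STATUS: overwrite the status block, padded with blanks.
# Separate ibs/obs make dd re-block pipe input into one full 512-byte write.
hs_write() {
    printf "%-${HS_BS}s" "$2" |
        dd of="$1" ibs=$HS_BS obs=$HS_BS seek=$HS_SECTOR conv=notrunc 2>/dev/null
}

# hs_read DEVICE: print the current status without the blank padding.
hs_read() {
    dd if="$1" bs=$HS_BS skip=$HS_SECTOR count=1 2>/dev/null | sed 's/ *$//'
}

# hs_wait DEVICE STATUS [INTERVAL]: poll until the other node posts STATUS.
hs_wait() {
    until [ "$(hs_read "$1")" = "$2" ]; do
        sleep "${3:-10}"
    done
}
```

conv=notrunc only matters when testing against a plain file instead of a raw device; it keeps the rest of the file intact.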
Installation of the IBM middleware is performed by 3 scripts supplied by IBM: wmq_install.ksh, wmb_install.ksh and db2_server_install.ksh. In addition, syncScripts.ksh copies the configuration scripts (as opposed to the installation scripts) to the relevant directories, usually the home directories of users/components such as mqm, wmb and db2.
SC_Append_Hosts(), defined in Subroutines, appends the logical (virtual) hostnames and IP addresses used by the services to /etc/hosts on each client node.
The names and addresses are taken from Profiles/$SITE/SiteConfig/SunCluster/asp0708/etc/hosts, in case the cluster's name is asp0708. In fact, the Jumpstart/Cluster scripts don't care about the cluster name. Instead, they look up the cluster based on the names of its nodes, which must be the same as the hostnames. This is described in more detail in the next, "cluster", section.
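An idempotent version of this append step might look like the following sketch. It compares entries by IP address only, and append_missing_hosts is a hypothetical name, not SC_Append_Hosts itself.

```sh
# append_missing_hosts SRC [DST]: append the entries of SRC whose IP address
# is not yet present in DST (default /etc/hosts). Safe to run repeatedly.
append_missing_hosts() {
    src=$1
    dst=${2:-/etc/hosts}
    # Collect new lines first, so DST is not appended to while being read.
    new=$(awk '
        FILENAME == ARGV[1] { sub(/#.*/, ""); if (NF) have[$1] = 1; next }
        { line = $0; sub(/#.*/, "") }
        NF && !($1 in have) { print line; have[$1] = 1 }' "$dst" "$src") || return 1
    [ -n "$new" ] && printf '%s\n' "$new" >> "$dst"
    return 0
}
```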
Besides the IBM middleware, Oracle client installation has been added to the add_WMQ+DB2 script. The reason for not moving this activity to a separate script is that middleware configuration is very tightly coupled with Oracle configuration: some things for Oracle have to be configured before, after and during middleware configuration, and vice versa, some middleware configuration activities must be taken into account at Oracle client installation time.
Cluster installation - Jumpstart phase
Software prerequisites for Sun Cluster are the SAN packages (which were also a prerequisite for middleware installation and have therefore been moved out to a separate script, Scripts/Misc/add_SAN) and a number of Java/Cacao packages, which are installed at the beginning of the Scripts/Misc/add_SunCluster script.
The most interesting part of this script is running the command in $SC_CONFIG_LINE, which triggers the vanilla installation script "scinstall" supplied by Sun on the Sun Cluster distribution media. Sun scinstall, like most other scripts in the Finish phase, runs in the chrooted environment. The command-line arguments in $SC_CONFIG_LINE are obtained by sourcing the file Profiles/$SITE/SiteConfig/SunCluster/asp0708/Cluster_Wide, which includes lines like the following:
SC_NODE_IDs="
gen0suasp07:1
gen0suasp08:2
"
#############################
SC_CONFIG_LINE_PRIMARY="
./scinstall -i
-F
-C $SC_INSTANCE
-A trtype=dlpi,name=ce3
-A trtype=dlpi,name=ce5
-B type=$SC_INTERCONNECT_TYPE
-P task=quorum,state=INIT
"
#############################
SC_CONFIG_LINE_SECONDARY="
./scinstall -i
-N $PRIM_NODE
-C $SC_INSTANCE
-A trtype=dlpi,name=ce3
-A trtype=dlpi,name=ce5
-B type=$SC_INTERCONNECT_TYPE
-m endpoint=:ce3,endpoint=$PRIM_NODE:ce3
-m endpoint=:ce5,endpoint=$PRIM_NODE:ce5
"
#############################
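Node IDs in SC_NODE_IDs are what distinguish the nodes. A sketch of how one of the two command lines could be chosen, under the assumption (not stated in the file) that node ID 1 is the primary, sponsoring node; node_id and pick_config_line are hypothetical names.

```sh
# node_id NAME LIST: print the ID paired with NAME in a "name:id" list
# such as SC_NODE_IDs; fail if NAME is not listed.
node_id() {
    for pair in $2; do
        case "$pair" in
            "$1":*) printf '%s\n' "${pair#*:}"; return 0 ;;
        esac
    done
    return 1
}

# pick_config_line NAME: print the scinstall line for this node.
# Assumption: node ID 1 is the primary (sponsoring) node.
pick_config_line() {
    id=$(node_id "$1" "$SC_NODE_IDs") || return 1
    if [ "$id" -eq 1 ]; then
        printf '%s\n' "$SC_CONFIG_LINE_PRIMARY"
    else
        printf '%s\n' "$SC_CONFIG_LINE_SECONDARY"
    fi
}
```

Because the strings in Cluster_Wide are double-quoted, $SC_INSTANCE, $PRIM_NODE and $SC_INTERCONNECT_TYPE are expanded at the moment the file is sourced, so they must be set before sourcing it.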
As with all other cluster- and middleware-related parameters, and for most other Finish scripts, the location of config files is determined from the hostname of the current node: the nodename/hostname must appear either in the path to the file, or in the file itself as part of a predetermined assignment that includes the hostname. For instance, SetSunClusterEnv(), defined in Scripts/Misc/!Includes/Subroutines, determines the top-level cluster config variables by combining both methods:
SC_SITE_CONFIG_DIR=$SI_CONFIG_DIR/Profiles/$SITE/SiteConfig/SunCluster
...
SC_NODENAME=$SI_HOSTNAME
# Directory of the Cluster_Wide file whose SC_NODE_IDs lists this node
SC_INST_CONFIG_DIR=`
for f in \`find $SC_SITE_CONFIG_DIR -name Cluster_Wide\`; do
    . \$f
    echo \"\$SC_NODE_IDs\" | egrep -s $SC_NODENAME:[0-9] && echo \`dirname \$f\`
done
`
That is, it searches the whole site-specific config directory Profiles/$SITE/SiteConfig/SunCluster for all files named "Cluster_Wide" and, in each such file, looks for an occurrence of hostname:nodeID (you can see such lines at the very top of the Cluster_Wide file above).
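The same search can be written without nested backquotes. A sketch (find_cluster_dir is a hypothetical name) that sources each Cluster_Wide in a subshell, so that one cluster's variables do not leak into the next, and matches the whole name:ID pair rather than a substring of it:

```sh
# find_cluster_dir SITE_CONFIG_DIR NODENAME: print the directory of each
# Cluster_Wide file whose SC_NODE_IDs contains "NODENAME:<digits>".
find_cluster_dir() {
    node=$2
    find "$1" -type f -name Cluster_Wide | while read -r f; do
        (
            SC_NODE_IDs=
            . "$f"                       # defines SC_NODE_IDs (and more)
            for pair in $SC_NODE_IDs; do
                case "$pair" in
                    "$node":[0-9]*) dirname "$f"; break ;;
                esac
            done
        )
    done
}
```

For example, SC_INST_CONFIG_DIR=$(find_cluster_dir "$SC_SITE_CONFIG_DIR" "$SC_NODENAME") would give the same result as the original assignment for well-formed Cluster_Wide files.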
The Sun scinstall script is designed so that, having finished the installation on the secondary node, it suspends the secondary node, and the whole Jumpstart does not progress any further until the primary not only finishes the cluster installation, but also completes its whole Jumpstart, reboots, and starts the cluster. The primary node is sometimes referred to as the "sponsoring" node, since scinstall performs most of the configuration on the primary node only and then just uploads it to the secondary as soon as the cluster daemons on the secondary node come up.
The GlobalDevices filesystem described in the Jumpstart profile (see the "Additions to Profile syntax" section above) is renamed and remounted by the same script, MDReMount, as the "dirty" submirrors, at the beginning of Finish/Std, as described at the end of that section.
Do not confuse the GlobalDevices filesystem (which resides on a local, i.e. non-shared, disk and only holds the global device nodes) with the Global Filesystems (which reside on the shared disks). The latter are configured by the middleware installation/configuration script, add_WMQ+DB2, as described in the relevant section.
Cluster configuration - post-Jumpstart phase
Unfortunately, because of time restrictions in the pilot phase of the project, some cluster configuration steps had to be moved outside the Jumpstart framework and run after both nodes reboot. These steps are performed by Scripts/Misc/add_SunCluster.Post-Jumpstart, which in fact is not a script but rather a template for interactive input, or for copy-pasting commands into the shell running on the nodes. Exception handling in this file is at a much lower level than is standard for most Finish scripts. The operator has to watch carefully for possible error messages and react or adjust the commands according to the Sun Cluster documentation.
Cluster services/resources, resource groups and the dependencies between them have been designed by IBM and described in great detail in the document "Creating an HA-ready NextGen middleware environment" by Lee.Hollingdale@IBM.COM. Here, however, we briefly describe the services again.
The pilot NextGen cluster, asp0708, supports 3 services (DB, MW1 and MW2), each implemented as a separate resource group with its own resources.
DB is a resource group (further, "RG") responsible for running the IBM DB2 database server process. It includes:
vIP/hostname resource, ngdbcell
Storage resource (type SUNW.HAStoragePlus:2), cfs4DB
The main service resource (standard data service type SUNW.gds:5), db2inst1
Each middleware RG consists of network, storage and four data service resources: two MQ managers and two message brokers.
RG MW1 consists of resources ngmwcell1, cfs4MW1, MGENPH1GM1, MPRDCFG1, MGENPH1BK1, MPRDBRK1
RG MW2 consists of resources ngmwcell2, cfs4MW2, MGENPH1GM2, MPRDCFG2, MGENPH1BK2, MPRDBRK2
Each resource within an RG implicitly depends on the relevant network resource (this is the Sun Cluster default), and therefore, in each RG, the network resource is started first at RG bootstrap and stopped last at RG shutdown. Next comes the cfs (Cluster Filesystem, or storage) resource. This dependency is not implicit, so we define explicitly that all resources in the group (except the network resource, to avoid a dependency loop) depend on the relevant storage (cfs4..) resource. Finally, there are middleware-specific dependencies between functional, i.e. middleware, components that, unfortunately, cross the RG borders. For instance, resources MPRDBRK1 and MPRDBRK2 from the MW1 and MW2 RGs, respectively, depend on the db2inst1 resource (database server) from the DB RG. Fortunately, the MW resources do not depend on the DB network or storage, which allows the RGs to be distributed independently between cluster nodes. See more details, along with dependency diagrams, in IBM's document mentioned above. Also, carefully read through the comments in the Scripts/Misc/add_SunCluster.Post-Jumpstart file.
The StartStopCheck script has been developed to start, stop and monitor the functional resources within a cluster resource group. This script resides on the cluster filesystem, and thus is permanently visible to both nodes. Within the Jumpstart framework, the file is located in Profiles/$SITE/SiteConfig/SunCluster/asp0708/Scripts/StartStopCheck. See the comments in the script for details on the Start/Stop/Check methods.