Fix Solaris 10 SMF Errors in 1 Minute

SMF (Service Management Facility) is one of the new features in
Solaris 10. It replaces the old rc scripts for bringing up system
services.

However, SMF errors can be very disruptive, and the error messages are
often misleading. For example, a typo in the /etc/vfstab file
will cause svc:/system/filesystem/local:default to fail and stop
the server in single-user mode. Likewise, when a NIS server isn't
reachable, a booting Solaris 10 client will stop in single-user mode,
reporting "network/service, network/rpc/keyserv" errors.

The quick way to fix most SMF-related errors is:

1. check the log files in the /var/svc/log/ directory;
2. check the /lib/svc/share/README file;
3. put svc.startd in debug mode and reboot the server to review the log.
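
Step 1 is easier when the most recently written logs are listed first,
since the failing service's log is usually among them. A minimal sketch,
assuming a POSIX-compatible shell (ksh or bash rather than the legacy
/bin/sh); the helper name "recent_smf_logs" is made up for illustration
and is not part of SMF:

```shell
# recent_smf_logs [DIR] [COUNT]
# Print the COUNT most recently modified *.log files in DIR, newest first.
# Hypothetical helper; DIR defaults to /var/svc/log and COUNT to 5.
recent_smf_logs() {
    dir=${1:-/var/svc/log}
    count=${2:-5}
    # ls -t sorts by modification time, newest first.
    (cd "$dir" && ls -t -- *.log 2>/dev/null | head -n "$count")
}
```

For example, "recent_smf_logs /var/svc/log 3" prints the three logs that
were written to last.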


Two examples follow.

Example 1: SMF auditd error and its fix.

When booting a Solaris 10 server, we get this error:

Booting to milestone "svc:/milestone/multi-user:default"
Requesting System Maintenance Mode
(See /lib/svc/share/README for more information)
Console login service(s) cannot run

svc:/system/auditd:Method "/lib/svc/method/svc-auditd" failed with exit
status 98
[system/auditd:default failed (see 'svcs -x' for details)]


Quick Fix:

Let's put svc.startd in debug mode:


# svccfg


svc:> select svc:/system/svc/restarter:default

...default> addpg options application

...default> setprop options/logging = astring: debug

svc:/system/svc/restarter:default> quit

# svcprop -p options/logging svc:/system/svc/restarter:default

debug


To make it easy for us to check the log, let's back up and truncate the old log file:
# cp /var/svc/log/svc.startd.log /var/svc/log/svc.startd.log.bak
# echo > /var/svc/log/svc.startd.log
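
The backup-and-truncate step can be wrapped in a small function so the
two paths cannot drift apart. This is only a sketch, assuming a
POSIX-compatible shell; "truncate_with_backup" is a hypothetical name:

```shell
# truncate_with_backup FILE
# Copy FILE to FILE.bak (preserving timestamps), then empty FILE in place.
truncate_with_backup() {
    log=$1
    [ -f "$log" ] || { echo "no such file: $log" >&2; return 1; }
    cp -p "$log" "$log.bak" || return 1
    # ': >' empties the file without writing a newline and keeps the same
    # inode, so a daemon that holds the file open keeps logging to it.
    : > "$log"
}
```

Usage: "truncate_with_backup /var/svc/log/svc.startd.log".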


Now, reboot the server:

# init 6 


Check log:

# cat /var/svc/log/svc.startd.log


After analyzing the log file, we know the problem is that the auditd service
can't start. Because the system/console-login service depends on auditd, the
console login service can't start either.

Disabling the auditd service by running "bsmunconv" will solve the problem.

root@sun1:/root>/etc/security/bsmunconv
bsmunconv: ERROR: this script should be run at run level S.
Are you sure you want to continue? [y/n] y
This script is used to disable the Basic Security Module (BSM).
Shall we continue the reversion to a non-BSM system now? [y/n] y
bsmunconv: INFO: removing c2audit:audit_load from /etc/system.
bsmunconv: INFO: stopping the cron daemon.

The Basic Security Module has been disabled.
Reboot this system now to come up without BSM.


Verify auditd is disabled:
root@sun1:/var/lib>svcs -l auditd
fmri svc:/system/auditd:default
name Solaris audit daemon
enabled false
state disabled
next_state none
state_time Thu Oct 01 22:10:36 2009
restarter svc:/system/svc/restarter:default
dependency require_all/none svc:/system/filesystem/local (online)
dependency require_all/none svc:/milestone/name-services (online)
dependency require_all/none svc:/system/system-log (online)
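
When the same check has to be repeated for several services, a single
field can be pulled out of the "svcs -l" output with a small filter. A
sketch, assuming an awk that supports -v and sub() (on Solaris 10 that
means nawk or /usr/xpg4/bin/awk, not /usr/bin/awk); "svcs_field" is a
hypothetical helper name:

```shell
# svcs_field NAME
# Read "svcs -l" output on stdin and print the value of the first line
# whose first column is exactly NAME (for example: state, enabled).
svcs_field() {
    awk -v name="$1" '
        $1 == name {
            sub(/^[ \t]*[^ \t]+[ \t]+/, "")   # drop the field name column
            print
            exit
        }'
}
```

Usage: "svcs -l auditd | svcs_field state" should print "disabled" for
the output shown above.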


Finally, we need to turn off svc.startd's debugging mode:

# svccfg

svc:> select restarter
svc:/system/svc/restarter> list
:properties
default

svc:/system/svc/restarter> listpg
options application
general framework
tm_common_name template
tm_man_svc_startd template

svc:/system/svc/restarter> delpg options

svc:/system/svc/restarter> listprop
general framework
general/entity_stability astring Unstable
general/single_instance boolean true
tm_common_name template
tm_common_name/C ustring "master restarter"
tm_man_svc_startd template
tm_man_svc_startd/manpath astring /usr/share/man
tm_man_svc_startd/section astring 1M
tm_man_svc_startd/title astring svc.startd
svc:/system/svc/restarter> end

Related directory : /lib/svc/method

Example 2: x4600 boot-archive error caused by /etc/vfstab and /kernel/drv files being out of sync, and its fix.

WARNING - The following files in / differ from the boot archive:
/etc/rtc_config
/etc/path_to_inst
cannot find: /etc/cluster/nodeid: No such file or directory
/etc/devices/devid_cache
cannot find: /etc/devices/mdi_ib_cache: No such file or directory
/kernel/drv/e1000g.conf
The recommended action is to reboot and select "Solaris failsafe"
option from the boot menu. Then follow prompts to update the
boot archive.
To continue booting at your own risk, clear the service:
# svcadm clear system/boot-archive

Sep 4 18:52:03 svc.startd[7]: svc:/system/boot-archive:default: Method "/lib/svc/method/boot-archive" failed with exit status 95.
Sep 4 18:52:03 svc.startd[7]: system/boot-archive:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)
Requesting System Maintenance Mode
(See /lib/svc/share/README for more information.)
Console login service(s) cannot run

nygserver1 console login: root
Password:
Last login: Thu Mar 5 11:22:48 on console
Access to this computer is prohibited unless authorised
Accessing programs or data unrelated to your job is prohibited
If you are not authorised, disconnect now.
You have mail.
ROOT@nygserver1:/root # svcs -xv
svc:/system/filesystem/local:default (local file system mounts)
State: maintenance since Thu 05 Mar 2009 01:48:54 PM EST
Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
See: http://sun.com/msg/SMF-8000-KS
See: /var/svc/log/system-filesystem-local:default.log
Impact: 32 dependent services are not running:
svc:/application/psncollector:default
svc:/system/sysidtool:net
svc:/network/rpc/bind:default
svc:/network/nfs/nlockmgr:default
svc:/network/nfs/status:default
svc:/network/nis/client:default
svc:/network/nfs/cbd:default
svc:/network/nfs/mapid:default
svc:/application/sthwreg:default
svc:/application/stosreg:default
svc:/system/sysidtool:system
svc:/platform/i86pc/kdmconfig:default
svc:/milestone/multi-user:default
svc:/milestone/multi-user-server:default
svc:/system/basicreg:default
svc:/system/zones:default
svc:/application/xyzoneglobal:default
svc:/application/xyperfmv2-client:default
svc:/application/ipmievd:default
svc:/system/vxvm/vxvm-recover:default
svc:/system/filesystem/autofs:default
svc:/system/system-log:default
svc:/network/ssh:default
svc:/system/dumpadm:default
svc:/system/fmd:default
svc:/network/inetd:default
svc:/system/filesystem/volfs:default
svc:/system/cron:default
svc:/system/vxfs/vxfsldlic:default
svc:/application/font/fc-cache:default
svc:/system/sac:default
svc:/application/opengl/ogl-select:default

svc:/network/rpc/smserver:default (removable media management)
State: uninitialized since Thu 05 Mar 2009 01:48:32 PM EST
Reason: Restarter svc:/network/inetd:default is not running.
See: http://sun.com/msg/SMF-8000-5H
See: man -M /usr/share/man -s 1M rpc.smserverd
Impact: 1 dependent service is not running:
svc:/system/filesystem/volfs:default
ROOT@nygserver1:/root # svcadm clear /system/filesystem/local
ROOT@nygserver1:/root # svc:/system/filesystem/local:default: WARNING: /sbin/mountall -l failed: exit status 1
Mar 5 12:52:33 svc.startd[7]: svc:/system/filesystem/local:default: Method "/lib/svc/method/fs-local" failed with exit status 95.
Mar 5 12:52:33 svc.startd[7]: system/filesystem/local:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)

ROOT@nygserver1:/root # df -k
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c3t0d0s0 8266719 2517305 5666747 31% /
/devices 0 0 0 0% /devices
ctfs 0 0 0 0% /system/contract
proc 0 0 0 0% /proc
mnttab 0 0 0 0% /etc/mnttab
swap 65174120 652 65173468 1% /etc/svc/volatile
objfs 0 0 0 0% /system/object
/usr/lib/libc/libc_hwcap2.so.1
8266719 2517305 5666747 31% /lib/libc.so.1
fd 0 0 0 0% /dev/fd
swap 65173468 0 65173468 0% /tmp
swap 65173476 8 65173468 1% /var/run
swap 65173468 0 65173468 0% /dev/vx/dmp
swap 65173468 0 65173468 0% /dev/vx/rdmp
/dev/dsk/c3t0d0s6 1019856 857004 101661 90% /zones
/dev/vx/dsk/server1dg/server1_export_home
2560000 17716 2383398 1% /export/home
/dev/dsk/c3t0d0s4 24792158 24601 24519636 1% /var/crash/nygserver1
/dev/vx/dsk/server1dg/server1_export_home_sitescope
537600 17218 487866 4% /export/home/sitescope
/dev/vx/dsk/server1dg/server1_data
135168 2480 124695 2% /zones/fs/server1/data
/dev/vx/dsk/server1dg/server1_log
134144 1762 124115 2% /zones/fs/server1/log
/dev/vx/dsk/server1dg/server1_data_cvs
2560000 612351 1836098 26% /zones/fs/server1/data/cvs
/dev/vx/dsk/server1dg/server1_log_appserv
7168000 1234473 5563108 19% /zones/fs/server1/log/appserv
/dev/vx/dsk/server1dg/server1_data_d4icache
46727168 7466357 36807507 17% /zones/fs/server1/data/d4icache
/dev/vx/dsk/server1dg/server1_log_dynamo
7168000 451354 6296902 7% /zones/fs/server1/log/dynamo
/dev/vx/dsk/server1dg/server1_data_skore
12288000 5325560 6547595 45% /zones/fs/server1/data/skore
/dev/vx/dsk/server1dg/server1_data_appserv
20480000 13916850 6163060 70% /zones/fs/server1/data/appserv
/dev/vx/dsk/server1dg/server1_data_dynamo
10240000 6478927 3535135 65% /zones/fs/server1/data/dynamo
/dev/vx/dsk/server1dg/server1_data_d4icache_wip
8192000 4060477 3884874 52% /zones/fs/server1/data/d4icache/wip
ROOT@nygserver1:/root # svcs -x
svc:/system/filesystem/local:default (local file system mounts)
State: maintenance since Thu 05 Mar 2009 12:52:33 PM EST
Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
See: http://sun.com/msg/SMF-8000-KS
See: /var/svc/log/system-filesystem-local:default.log
Impact: 32 dependent services are not running. (Use -v for list.)

svc:/network/rpc/smserver:default (removable media management)
State: uninitialized since Thu 05 Mar 2009 01:48:32 PM EST
Reason: Restarter svc:/network/inetd:default is not running.
See: http://sun.com/msg/SMF-8000-5H
See: rpc.smserverd(1M)
Impact: 1 dependent service is not running. (Use -v for list.)


******** fix *************
1. more /var/svc/log/system-filesystem-local:default.log
2. compare /etc/vfstab with the output of "vxprint -g dg -v" to see whether vfstab lists filesystems whose volumes no longer exist
(sometimes people remove a filesystem but don't update the /etc/vfstab file);
3. once the vfstab file is cleaned up, run the command below and then press Control-D to exit maintenance mode; the server will come up normally.
svcadm clear /system/filesystem/local
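
A quick first pass for step 2 is to check whether every device named in
vfstab actually exists. A minimal sketch, assuming a POSIX-compatible
shell (ksh or bash); "missing_vfstab_devices" is a hypothetical helper:

```shell
# missing_vfstab_devices [VFSTAB]
# Print each "device to mount" in VFSTAB that starts with /dev/ but does
# not exist. Commented-out and blank lines never match /dev/* and are
# therefore skipped.
missing_vfstab_devices() {
    vfstab=${1:-/etc/vfstab}
    while read -r dev rest; do
        case $dev in
            /dev/*) [ -e "$dev" ] || echo "$dev" ;;
        esac
    done < "$vfstab"
}
```

Any path it prints is a candidate for commenting out in /etc/vfstab
before running "svcadm clear".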




ROOT@nygserver1:/root # more /var/svc/log/system-filesystem-local:default.log
[ Mar 5 12:35:11 Method "start" exited with status 95 ]
[ Mar 5 13:48:48 Executing start method ("/lib/svc/method/fs-local") ]
UX:vxfs mount: ERROR: V-3-20002: Cannot access /dev/vx/dsk/server1dg/server1_d
ata_sitescope: No such file or directory
UX:vxfs mount: ERROR: V-3-24996: Unable to get disk layout version
UX:vxfs mount: ERROR: V-3-20002: Cannot access /dev/vx/dsk/server1dg/server1_l
og_sitescope: No such file or directory
UX:vxfs mount: ERROR: V-3-24996: Unable to get disk layout version
UX:vxfs mount: ERROR: V-3-20002: Cannot access /dev/vx/dsk/server1dg/server1_d
ata_chiit1: No such file or directory
UX:vxfs mount: ERROR: V-3-24996: Unable to get disk layout version
UX:vxfs mount: ERROR: V-3-20002: Cannot access /dev/vx/dsk/server1dg/server1_d
ata_amweb: No such file or directory
UX:vxfs mount: ERROR: V-3-24996: Unable to get disk layout version
UX:vxfs mount: ERROR: V-3-20002: Cannot access /dev/vx/dsk/server1dg/server1_l
og_sitescope: No such file or directory
UX:vxfs mount: ERROR: V-3-24996: Unable to get disk layout version
UX:vxfs mount: ERROR: V-3-21264: /dev/vx/dsk/server1dg/server1_log_dynamo is a
lready mounted, /zones/fs/server1/log/dynamo is busy,
allowable number of mount points exceeded
UX:vxfs mount: ERROR: V-3-21264: /dev/vx/dsk/server1dg/server1_log_appserv is
already mounted, /zones/fs/server1/log/appserv is busy,
allowable number of mount points exceeded
WARNING: /sbin/mountall -l failed: exit status 1
bootadm: no matching entry found: Solaris_reboot_transient
[ Mar 5 13:48:54 Method "start" exited with status 95 ]
[ Mar 5 12:52:33 Leaving maintenance because clear requested. ]
[ Mar 5 12:52:33 Enabled. ]
[ Mar 5 12:52:33 Executing start method ("/lib/svc/method/fs-local") ]
UX:vxfs mount: ERROR: V-3-20002: Cannot access /dev/vx/dsk/server1dg/server1_d
ata_chiit1: No such file or directory
UX:vxfs mount: ERROR: V-3-24996: Unable to get disk layout version
UX:vxfs mount: ERROR: V-3-20002: Cannot access /dev/vx/dsk/server1dg/server1_d
ata_amweb: No such file or directory
UX:vxfs mount: ERROR: V-3-24996: Unable to get disk layout version
UX:vxfs mount: ERROR: V-3-20002: Cannot access /dev/vx/dsk/server1dg/server1_d
ata_sitescope: No such file or directory
UX:vxfs mount: ERROR: V-3-24996: Unable to get disk layout version
UX:vxfs mount: ERROR: V-3-20002: Cannot access /dev/vx/dsk/server1dg/server1_l
og_sitescope: No such file or directory
UX:vxfs mount: ERROR: V-3-24996: Unable to get disk layout version
UX:vxfs mount: ERROR: V-3-20002: Cannot access /dev/vx/dsk/server1dg/server1_l
og_sitescope: No such file or directory
UX:vxfs mount: ERROR: V-3-24996: Unable to get disk layout version
WARNING: /sbin/mountall -l failed: exit status 1
bootadm: no matching entry found: Solaris_reboot_transient
[ Mar 5 12:52:33 Method "start" exited with status 95 ]


Cliff notes:

Whenever you change a filesystem, for example by removing it, make sure you also comment out its line in the /etc/vfstab file. Solaris 10 won't finish booting if it can't mount every mount-at-boot entry in /etc/vfstab.

The following filesystems were listed in the /etc/vfstab file, but their volumes don't exist in the Veritas disk group:

#/dev/vx/dsk/server1dg/server1_data_apps_pinetdv
#/dev/vx/dsk/server1dg/server1_data_sitescope
#/dev/vx/dsk/server1dg/server1_log_sitescope
#/dev/vx/dsk/server1dg/server1_data_amweb
#/dev/vx/dsk/server1dg/server1_data_chiit1
#/dev/vx/dsk/server1dg/server1_log_sitescope

ROOT@nygserver1:/root # vxprint -g server1dg -v
TY NAME ASSOC KSTATE LENGTH PLOFFS STATE TUTIL0 PUTIL0
v server1_data fsgen ENABLED 270336 - ACTIVE - -
v server1_data_appserv fsgen ENABLED 40960000 - ACTIVE - -
v server1_data_cvs fsgen ENABLED 5120000 - ACTIVE - -
v server1_data_dynamo fsgen ENABLED 20480000 - ACTIVE - -
v server1_data_d4icache fsgen ENABLED 93454336 - ACTIVE - -
v server1_data_d4icache_wip fsgen ENABLED 16384000 - ACTIVE - -
v server1_data_skore fsgen ENABLED 24576000 - ACTIVE - -
v server1_export_home fsgen ENABLED 5120000 - ACTIVE - -
v server1_export_home_sitescope fsgen ENABLED 1075200 - ACTIVE - -
v server1_export_home1 fsgen ENABLED 20971520 - ACTIVE - -
v server1_log fsgen ENABLED 268288 - ACTIVE - -
v server1_log_appserv fsgen ENABLED 14336000 - ACTIVE - -
v server1_log_dynamo fsgen ENABLED 14336000 - ACTIVE - -
v server1dg_usr_local fsgen ENABLED 2097152 - ACTIVE - -
v server1dg_zones_export_home fsgen ENABLED 4194304 - ACTIVE - -
v server1dg_zones_fs fsgen ENABLED 4194304 - ACTIVE - -
v server1dg_zones_hosts fsgen ENABLED 4194304 - ACTIVE - -
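
The comparison between vfstab and the vxprint listing can also be done
mechanically. The sketch below assumes the vxprint output has been saved
to a file first and that awk supports -v (nawk or /usr/xpg4/bin/awk on
Solaris 10); "volumes_missing_from_dg" is a hypothetical helper name:

```shell
# volumes_missing_from_dg DG VFSTAB VXPRINT_OUT
# Print volume names that VFSTAB mounts from /dev/vx/dsk/DG/ but that do
# not appear as "v" records in VXPRINT_OUT (a saved "vxprint -g DG -v").
volumes_missing_from_dg() {
    awk -v prefix="/dev/vx/dsk/$1/" '
        FILENAME == ARGV[1] {            # first file: saved vxprint output
            if ($1 == "v") have[$2] = 1
            next
        }
        index($1, prefix) == 1 {         # second file: active vfstab lines
            vol = substr($1, length(prefix) + 1)
            if (!(vol in have)) print vol
        }
    ' "$3" "$2"
}
```

Usage: save the listing with "vxprint -g server1dg -v > /tmp/vx.out",
then run "volumes_missing_from_dg server1dg /etc/vfstab /tmp/vx.out".
Commented-out vfstab lines are ignored because they start with "#".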


***************** Fix - duplicated entries in /etc/vfstab - remove extra ones ****************
[ Mar 19 18:24:31 Stopping because service disabled. ]
[ Mar 19 18:24:31 Executing stop method (null) ]
[ Mar 19 18:50:17 Executing start method ("/lib/svc/method/fs-local") ]
UX:vxfs mount: ERROR: V-3-21264: /dev/vx/dsk/server1dg/server1_log_appserv is
already mounted, /zones/fs/server1/log/appserv is busy,
allowable number of mount points exceeded
UX:vxfs mount: ERROR: V-3-21264: /dev/vx/dsk/server1dg/server1_log_dynamo is a
lready mounted, /zones/fs/server1/log/dynamo is busy,
allowable number of mount points exceeded
WARNING: /sbin/mountall -l failed: exit status 1
bootadm: no matching entry found: Solaris_reboot_transient
[ Mar 19 18:50:24 Method "start" exited with status 95 ]
[ Mar 19 19:02:30 Leaving maintenance because clear requested. ]
[ Mar 19 19:02:30 Enabled. ]
[ Mar 19 19:02:30 Executing start method ("/lib/svc/method/fs-local") ]
bootadm: no matching entry found: Solaris_reboot_transient
[ Mar 19 19:02:31 Method "start" exited with status 0 ]
[ Mar 19 21:40:09 Stopping because service disabled. ]
[ Mar 19 21:40:09 Executing stop method (null) ]
[ Mar 20 08:54:32 Executing start method ("/lib/svc/method/fs-local") ]
UX:vxfs mount: ERROR: V-3-21264: /dev/vx/dsk/server1dg/server1_log_appserv is
already mounted, /zones/fs/server1/log/appserv is busy,
allowable number of mount points exceeded
UX:vxfs mount: ERROR: V-3-21264: /dev/vx/dsk/server1dg/server1_log_dynamo is a
lready mounted, /zones/fs/server1/log/dynamo is busy,
allowable number of mount points exceeded
WARNING: /sbin/mountall -l failed: exit status 1
bootadm: no matching entry found: Solaris_reboot_transient
[ Mar 20 08:54:39 Method "start" exited with status 95 ]
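
The "already mounted" errors above come from the same volume appearing
twice in /etc/vfstab. Such duplicates can be spotted before a reboot
with a short awk filter. This is a sketch only, assuming nawk or
/usr/xpg4/bin/awk on Solaris 10; "duplicate_vfstab_entries" is a
hypothetical helper name:

```shell
# duplicate_vfstab_entries [VFSTAB]
# Print every /dev/... device and every mount point that is used by more
# than one active (uncommented) line in VFSTAB.
duplicate_vfstab_entries() {
    awk '
        /^[ \t]*#/ || NF < 3 { next }    # skip comments and short lines
        $1 ~ /^\/dev\// && ++dev[$1] == 2 { print "device " $1 }
        $3 != "-" && ++mnt[$3] == 2 { print "mount point " $3 }
    ' "${1:-/etc/vfstab}"
}
```

Each duplicate is reported once, on its second occurrence. An empty
result means no device or mount point is listed twice.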