Thursday, January 17, 2013

Powerdown parameter for Solaris boxes

http://gurkulindia.com/main/2012/09/powerdown-parameter-for-solaris-boxes/#more-7132

I came across a little-known parameter that I would like to share on Gurkulindia, especially for small business organisations that do not have separate power backup for their DC.
Have you noticed, while working on Solaris boxes, that if you power them down (init 5, or poweroff from LOM/ALOM/ILOM) they do not come up automatically after physical power is restored? Only the LOM boots; the OS does not. This is a serious concern if a power outage or a power unit failure occurs.
Ex: Suppose a power outage occurred in a DC and 100 Solaris boxes went down. After power is restored, you will not see a single Solaris box up and running. Manual intervention is required: an SA has to power on each box from the LOM prompt. In practice this is unacceptable in any organisation, yet it is the default behaviour.
After a long investigation through Google and PDFs I safely landed on earth :-), and the landmark is below. There is a parameter on LOM/ALOM (for SPARC) and ILOM (for Intel) that you will not find in the help output. Its default value is false, which does not allow the OS instance to boot; the server just waits at the LOM level for manual intervention.
I have tested this in a large-scale Solaris environment while performing hardware replacements and upgrades: when the value is false the server stays at the LOM and the OS does not come up; when the value is changed to true and power is restored, the box comes up.
Below is the parameter description with its possible values:
sc_powerstatememory
ALOM runs as soon as power is applied to the host server, even if the server is powered off. When you first apply power to the host server, ALOM starts to run, but the server does not start up until you power it on.
The sc_powerstatememory variable allows you to specify the state of the host server as false (keep the host server off) or true (return the server to the state it was in when the power was removed). This is useful in the event of a power failure, or if you physically move the server to a different location.
For example, if the host server is running when power is lost and the sc_powerstatememory variable is set to false, the host server remains off when power is restored. If the sc_powerstatememory variable is set to true, the host server restarts when the power is restored.
The values for this variable are as follows:
true: “Remembers” the state of the host server when power was removed and returns the server to that state when power is reapplied.
false: Keeps the server off when power is applied.
I recommend changing this value to true on Solaris servers so that, if an unintentional power outage occurs (and there is no power backup), server restoration time is kept to a minimum.
The procedure to check and set this parameter is:
sc> showsc sc_powerstatememory
False
sc> setsc sc_powerstatememory true
sc> showsc sc_powerstatememory
True
Note: 1.) In my experience AIX and Linux servers behave this way by default, i.e. both boot up after power is restored.
              2.) Do not confuse this with auto-boot?: the OBP auto-boot? variable controls whether the OS boots or the system stops at the ok prompt.

Deployment and understanding of LDOMS (Logical Domains)

http://gurkulindia.com/main/2012/07/deployment-and-understanding-of-ldoms-logical-domains/#more-6944

LDOM technology allows us to allocate a system’s resources, such as memory, CPUs and devices (I/O, network interfaces, etc.), into logical groupings and to create multiple discrete systems within a single computer system, each with its own operating system, resources and identity.
Ex: In a conventional server, the OS interacts directly with the hardware and uses all of the hardware resources. LDOM works differently: a new layer called the HYPERVISOR is introduced between the hardware and the OS. It acts as the intermediary between hardware and OS and presents to each OS the hardware components defined by the administrator. The hypervisor therefore does the resource allocation, and each OS gets a well-defined set of hardware resources. Also remember that in LDOM the allocated devices/resources are called VDEVs (virtual devices).
Note: LDOM technology is supported only on T-series servers (sun4v architecture); the LDOM 1.x software used in this post runs on Solaris 10.
Role of Hypervisor:
===============
The hypervisor, with its stable sun4v interface, is the centerpiece to creating logical domains. Important points to remember are:
• The hypervisor is the layer between the operating system and hardware.
• The hypervisor implements a stable sun4v interface. The operating system makes calls to the hypervisor, and therefore, does not need to know intimate details about the hardware, even if the platform changes.
• The hypervisor is very thin; it exists only to support the operating system for hardware-specific functions, which keeps it small and simple and helps stability.
• The hypervisor creates a virtual machine allowing the system to be partitioned by exposing some of the resources to a specific partition and hiding others.
• The hypervisor creates communication channels, logical domain channels (LDCs), between domains to provide a conduit for services such as networks and shared devices.
Domain Types:
============
Logical domains can take several different roles, defined mainly by context, i.e. by how they are used. A domain may have one or more of these roles, for example combining the functions of an I/O domain and a service domain:
• Control domain – Creates and manages other logical domains and services by communicating with the hypervisor.
• Service domain – Provides services, such as a virtual network switch or a virtual disk service, to other logical domains.
• I/O domain – Has direct ownership of and direct access to physical input/output devices, such as a PCI Express card or a network device. Can optionally share those devices to other domains by providing services.
• Guest domain – Presents a virtual machine that subscribes to services provided by service domains, and is managed by the control domain.
LDOM Daemons:
==============
Two LDOM daemons run on the system:
ldmd (Logical Domains Manager daemon)
vntsd (Virtual Network Terminal Server daemon)
Note: Some common terms used in LDOM are Virtual Machine Description (ldm list-spconfig), Virtual Devices (CPUs, memory, I/O devices), Networking (virtual network (vnet) device, virtual network switch (vsw)), Storage (virtual disk client (vdc) driver, virtual disk server (vds) driver), Console, Cryptographic Devices, etc. I suggest referring to man ldm for all these terms to get a clearer understanding.
In this section, I will explain and create a control domain (also called the host domain or primary domain).
Pre-requisites:
============
1.) Solaris OS release level:
=====================
The Solaris OS release level must be at least Solaris 10 11/06. If it is older, a release upgrade is required before the LDOM software can be installed on your system.
# cat /etc/release
  Solaris 10 11/06 s10s_u3wos_10 SPARC
 Copyright 2006 Sun Microsystems, Inc. All Rights Reserved.
  Use is subject to license terms.
       Assembled 14 November 2006
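To script this check across many boxes, the sketch below pulls the update number out of the first line of /etc/release. It assumes the usual Solaris 10 naming, in which 11/06 appears as update 3 (s10s_u3wos_10 in the output above); the function names are hypothetical, and it needs ksh or another POSIX shell rather than the old Bourne /bin/sh.

```shell
# Print the update number from an /etc/release line, e.g. "s10s_u3wos_10" -> 3.
# Assumes the s10s_uNwos (SPARC) / s10x_uNwos (x86) naming; prints nothing otherwise.
solaris_update() {
    printf '%s\n' "$1" | sed -n 's/.*s10[sx]_u\([0-9][0-9]*\)wos.*/\1/p'
}

# Succeed if the release line is Solaris 10 11/06 (update 3) or later.
ldoms_release_ok() {
    _upd=$(solaris_update "$1")
    [ -n "$_upd" ] && [ "$_upd" -ge 3 ]
}

# Example on the target box:
#   ldoms_release_ok "$(head -1 /etc/release)" && echo "release level OK"
```

A release line without an update tag (for example the original 3/05 release) is treated as too old.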
2.) Recommended Patches:
======================
It is always recommended to have the latest patch bundle installed on your system. Otherwise, make sure the patches below are present; if they are not, install them before proceeding further:
• 118833-36, kernel update patch
• 124921-02, which contains updates to the Logical Domains 1.0 drivers and utilities
• 125043-01, which contains updates to the qcn (console) drivers
Note: Read the README file for each of these patches, as they require reconfiguration reboots.
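One way to verify the three patches is to scan the output of showrev -p, which lists each installed patch on a line beginning "Patch: NNNNNN-RR". The helper below is hypothetical (not part of any Sun tool); it assumes that line format and treats a higher revision of the same patch as satisfying the requirement. On Solaris, run it with AWK=nawk rather than the old /usr/bin/awk.

```shell
# Succeed if patch $1 (base ID, e.g. 118833) is installed at revision $2 or
# later, reading "showrev -p" style lines ("Patch: 118833-36 Obsoletes: ...") on stdin.
patch_rev_installed() {
    ${AWK:-awk} -v id="$1" -v min="$2" '
        $1 == "Patch:" {
            split($2, p, "-")            # "118833-36" -> p[1]="118833", p[2]="36"
            if (p[1] == id && p[2] + 0 >= min + 0) found = 1
        }
        END { exit(found ? 0 : 1) }'
}

# Example: report each required patch from the list above.
#   showrev -p > /tmp/patches.$$
#   for pr in 118833-36 124921-02 125043-01; do
#       if patch_rev_installed "${pr%-*}" "${pr#*-}" < /tmp/patches.$$; then
#           echo "$pr OK"
#       else
#           echo "$pr MISSING"
#       fi
#   done
```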
3.) Firmware Versions:
==================
It is always recommended to have the latest firmware level installed on your system, as this ensures that the hardware and hypervisor can communicate correctly and that all of the features of your servers can operate.
We can check the current firmware level from ALOM or from the OS as follows:
sc> showhost
Sun-Fire-T2000 System Firmware 6.7.12  2011/07/06 20:03
Host flash versions:
   OBP 4.30.4.d 2011/07/06 14:29
   Hypervisor 1.7.3.c 2010/07/09 15:14
   POST 4.30.4.b 2010/07/09 14:24
# prtconf -V
OBP 4.30.4.d 2011/07/06 14:29
Note: I carried out my testing on a T2000 server patched to the latest firmware level.
4.) Installation of Logical Domains Manager:
===================================
The LDOM packages are not part of the default Solaris OS installation. We have to download them from the Oracle site and install them on our system as superuser:
SUNWldm.v – Logical Domains Manager package
SUNWjass – Solaris Security Toolkit packages
I installed LDOM version 1.2 on my machine, as shown below:
# pkginfo -l SUNWldm
   PKGINST:  SUNWldm
      NAME:  Logical Domains Manager
  CATEGORY:  application
      ARCH:  sparc.sun4v
   VERSION:  1.2,REV=2009.06.25.09.48
   BASEDIR:  /
    VENDOR:  Sun Microsystems, Inc.
      DESC:  LDoms Manager – UltraSPARC CMT virtualization
    PSTAMP:  svlpen-on10-020090625094807
  INSTDATE:  Jun 21 2012 11:44
   HOTLINE:  Please contact your local service provider
    STATUS:  completely installed
     FILES:       66 installed pathnames
                  11 shared pathnames
                  20 directories
                  18 executables
                6564 blocks used (approx)
# ldm -V
Logical Domain Manager (v 1.2)
        Hypervisor control protocol v 1.3
        Using Hypervisor MD v 1.1
System PROM:
        Hypervisor      v. 1.7.3.       @(#)Hypervisor 1.7.3.c 2010/07/09 15:1415
        OpenBoot        v. 4.30.4.      @(#)OBP 4.30.4.d 2011/07/06 14:29
Note: The software was installed with the installation script supplied in the LDOM download rather than by adding the packages manually.
Creation of Controller Domain:
=========================
1.) Test that the LDOM software is working and communicating with the HYPERVISOR:
# ldm list
——————————————————————————
Notice: the LDom Manager is running in configuration mode. Configuration and
resource information is displayed for the configuration under construction;
not the current active configuration. The configuration being constructed
will only take effect after it is downloaded to the system controller and
the host is reset.
——————————————————————————
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -t-c--  SP      32    32640M   0.1%  5d 4h 54m
Note: a.) The path for the ldm command is /opt/SUNWldm/bin/
            b.) A “t” in the second FLAGS position means the domain is in transition (not yet fully up); an “n” in that position means it is running normally.
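If a script needs to wait until a domain has settled, the FLAGS column can be checked programmatically instead of by eye. The helper below is hypothetical; it assumes the ldm list column layout shown above (NAME STATE FLAGS ...) and that the second FLAGS character is "n" when the domain is running normally and "t" while it is in transition. On Solaris, set AWK=nawk.

```shell
# Succeed if domain $1 shows "n" (normal) as the second FLAGS character in
# "ldm list" output read on stdin; fail if it shows anything else or is absent.
domain_is_normal() {
    ${AWK:-awk} -v dom="$1" '
        $1 == dom { flags = $3 }          # NAME STATE FLAGS CONS ...
        END { exit(substr(flags, 2, 1) == "n" ? 0 : 1) }'
}

# Example: poll until the primary domain leaves the transition state.
#   until /opt/SUNWldm/bin/ldm list | domain_is_normal primary; do sleep 10; done
```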
2.) Create the default services that should always exist in a control domain: disk services, console services and network services. The steps to create them are:
# /opt/SUNWldm/bin/ldm add-vds primary-vds0 primary
—————————————————————————-
Notice: the LDom Manager is running in configuration mode. Any configuration
changes made will only take effect after the machine configuration is
downloaded to the system controller and the host is reset.
—————————————————————————-
# /opt/SUNWldm/bin/ldm add-vcc port-range=5000-5100 primary-vcc0 primary
—————————————————————————-
Notice: the LDom Manager is running in configuration mode. Any configuration
changes made will only take effect after the machine configuration is
downloaded to the system controller and the host is reset.
—————————————————————————-
# /opt/SUNWldm/bin/ldm add-vsw net-dev=e1000g0 primary-vsw0 primary
—————————————————————————-
Notice: the LDom Manager is running in configuration mode. Any configuration
changes made will only take effect after the machine configuration is
downloaded to the system controller and the host is reset.
—————————————————————————-
3.) List the services which we have just created:
# /opt/SUNWldm/bin/ldm list-services primary
—————————————————————————-
Notice: the LDom Manager is running in configuration mode. Configuration and
resource information is displayed for the configuration under construction;
not the current active configuration. The configuration being constructed
will only take effect after it is downloaded to the system controller and
the host is reset.
—————————————————————————-

Vds: primary-vds0
Vcc: primary-vcc0
port-range=5000-5100
Vsw: primary-vsw0
mac-addr=0:14:4f:f9:68:d0
net-dev=e1000g0
mode=prog,promisc
4.) The next step is to assign resources to the control domain. The amounts of CPU and memory below should be a good starting point. Our primary domain resources will be:
• 1 x MAU (cryptographic) unit: these are bound on a per-core basis and need to be set up before assigning VCPUs
• 4 x virtual CPUs (1 core on an UltraSPARC T1 system)
• 1024 Mbytes memory (as we will not be using ZFS to deliver disk services, we do not need the minimum of 4 Gbytes memory)
• Configuration will be saved as “initial”
# /opt/SUNWldm/bin/ldm set-mau 1 primary
—————————————————————————-
Notice: the LDom Manager is running in configuration mode. Any configuration
changes made will only take effect after the machine configuration is
downloaded to the system controller and the host is reset.
—————————————————————————-
# /opt/SUNWldm/bin/ldm set-vcpu 4 primary
—————————————————————————-
Notice: the LDom Manager is running in configuration mode. Any configuration
changes made will only take effect after the machine configuration is
downloaded to the system controller and the host is reset.
—————————————————————————-
# /opt/SUNWldm/bin/ldm set-memory 1024m primary
—————————————————————————-
Notice: the LDom Manager is running in configuration mode. Any configuration
changes made will only take effect after the machine configuration is
downloaded to the system controller and the host is reset.
—————————————————————————-
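Steps 2 and 4 above can be collected into one script so they are repeatable across servers. This is a sketch, not an official tool: run_ldm and setup_primary are hypothetical names, NET_DEV defaults to the e1000g0 used above and must match your control domain's NIC, and DRY_RUN=1 only prints the commands so you can review them before running them for real.

```shell
LDM=${LDM:-/opt/SUNWldm/bin/ldm}

# Run an ldm subcommand, or only print it when DRY_RUN=1.
run_ldm() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$LDM $*"
    else
        "$LDM" "$@"
    fi
}

# Default services (step 2), then primary-domain resources (step 4).
# The MAU is set before the VCPUs, as noted above; stops at the first failure.
setup_primary() {
    net_dev=${NET_DEV:-e1000g0}
    run_ldm add-vds primary-vds0 primary &&
    run_ldm add-vcc port-range=5000-5100 primary-vcc0 primary &&
    run_ldm add-vsw net-dev="$net_dev" primary-vsw0 primary &&
    run_ldm set-mau 1 primary &&
    run_ldm set-vcpu 4 primary &&
    run_ldm set-memory 1024m primary
}

# Example: review first, then apply.
#   DRY_RUN=1 setup_primary
#   setup_primary
```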
5.) Creation and use of newly created configuration:
Now that the control domain is configured the way we want it, we need to store it. The hypervisor will then use this configuration after the next power cycle. First, we will list available configurations and then create a new one called initial, and finally confirm its creation:
# /opt/SUNWldm/bin/ldm list-spconfig
factory-default [current]
# /opt/SUNWldm/bin/ldm add-spconfig initial
# /opt/SUNWldm/bin/ldm list-spconfig
factory-default [current]
initial [next]
Note: We can see the new configuration initial has been created and listed as the configuration to use at the next reboot. If a configuration with the name initial already existed, we would receive an error, and would need to use the ldm remove-spconfig command to remove the existing configuration first.
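Before rebooting, a script can confirm which SP configuration will be used. The helper below is hypothetical and reads ldm list-spconfig output like the listing above. It assumes the marker follows the name; any marker starting with "[next" is accepted, in case a release prints something longer than [next], and the [current] entry is used when nothing is marked next. On Solaris, set AWK=nawk.

```shell
# Print the SP configuration that will be used at the next power cycle,
# reading "ldm list-spconfig" output on stdin.
next_spconfig() {
    ${AWK:-awk} '
        $2 ~ /^\[next/    { nxt = $1 }    # e.g. "initial [next]"
        $2 == "[current]" { cur = $1 }    # e.g. "factory-default [current]"
        END { print (nxt != "" ? nxt : cur) }'
}

# Example: only reboot if "initial" is queued.
#   [ "$(/opt/SUNWldm/bin/ldm list-spconfig | next_spconfig)" = initial ] &&
#       shutdown -i6 -g0 -y
```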
6.) Now reboot the box so that the changes made above take effect, releasing resources for other logical domains.
# shutdown -i6 -g0 -y
Note: Do not use the reboot command. Always use init or shutdown so that no rc phase is bypassed.
7.) Enable the daemons. This is the point at which we have to enable vntsd:
# svcadm enable vntsd
# svcs -a | grep -i ldom
disabled       Jun_27   svc:/ldoms/agents:default
online         Jun_27   svc:/ldoms/ldmd:default
online        Jun_27   svc:/ldoms/vntsd:default
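To make the check in step 7 scriptable, the hypothetical function below scans svcs -a output for the two LDOM service FMRIs shown above and succeeds only if both ldmd and vntsd are online. It assumes the default STATE STIME FMRI column order; on Solaris, set AWK=nawk.

```shell
# Succeed only if svc:/ldoms/ldmd and svc:/ldoms/vntsd are both "online"
# in "svcs -a" style output read on stdin.
ldoms_services_online() {
    ${AWK:-awk} '
        $3 == "svc:/ldoms/ldmd:default"  && $1 == "online" { ldmd = 1 }
        $3 == "svc:/ldoms/vntsd:default" && $1 == "online" { vntsd = 1 }
        END { exit((ldmd && vntsd) ? 0 : 1) }'
}

# Example:
#   svcs -a | ldoms_services_online || echo "LDOM daemons are not both online"
```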
8.) Check the services running on the controller domain:
# /opt/SUNWldm/bin/ldm list-bindings primary
———————————————————————
Name: primary
State: active
Flags: transition,control,vio service
OS:
Util: 12%
Uptime: 11m
Vcpu: 4
vid pid util strand
0 0 18% 100%
1 1 13% 100%
2 2 9.8% 100%
3 3 5.4% 100%
Mau: 1
Memory: 124m
real-addr phys-addr size
0x4000000 0x4000000 124m
Vars: reboot-command=boot
IO: pci@780 (bus_a)
pci@7c0 (bus_b)
………………….
# ldm list
——————————————————————————
Notice: the LDom Manager is running in configuration mode. Configuration and
resource information is displayed for the configuration under construction;
not the current active configuration. The configuration being constructed
will only take effect after it is downloaded to the system controller and
the host is reset.
——————————————————————————
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-c--  SP      4     1024M   0.1%  5d 1h 10m
Our first control domain is ready for use. In my next post I will explain and create guest domains and cover the rest of LDOM. I will also try to cover release upgrades and LDOM patching for guest and control domains, with backout plans.
Reference: Oracle Documentation & support.oracle.com

Enabling SVM in Failsafe and password recovery in Solaris

http://gurkulindia.com/main/2012/09/enabling-svm-in-failsafe-and-password-recovery-in-solaris/

In one of our previous posts, “Solaris Troubleshooting (Magic of Solaris 10) – Root Password Recovery for any Solaris 10 (without CD/DVD)”, we showed how to recover a system’s root password in failsafe mode, without any media, on a system with a single root disk (i.e. a native device, c#t#d#s#). If the root disk is mirrored, however, you cannot recover it that way without first loading the SVM module in failsafe mode (the same applies when booting from other media), because the box will not let you mount the metadevice. In this post I present the procedure for loading the SVM module in such cases.


http://gurkulindia.com/main/2011/08/solaris-troubleshooting-magic-of-solaris-10-root-password-recovery-for-any-solaris-10-without-cddvd/
When Solaris is booted in failsafe mode or from CD/DVD/network, the Solaris Volume Manager (SVM) module is not loaded, so you cannot work on the mirrored root of the installed OS without de-synchronizing the mirrors. If you then try to boot the box, it may crash/panic and could corrupt your data too.
Below is the process to load the SVM driver and configuration files in the alternate media boot environment:
1.) Bring the server to the ok prompt.
# init 0
2.) Boot your box in failsafe mode.
ok boot -F failsafe
3.) Once the server has booted in failsafe mode, mount the root disk at /a. Here my root disk is c1t1d0s0.
# mount /dev/dsk/c1t1d0s0 /a
4.) Copy md.conf from the installed OS so that the SVM module can be enabled in failsafe mode.
# cp /a/kernel/drv/md.conf /kernel/drv
5.) Unmount the root slice.
# umount /a
6.) Now we have to load the SVM module to enable it in failsafe mode:
# update_drv -f md
devfsadm: mkdir failed for /dev 0x1ed: Read-only file system <- You will see this message; it is expected.
7.) You can now mount the md metadevices and make any changes.
# mount /dev/md/dsk/d0 /a
8.) Take a copy of the /a/etc/passwd and /a/etc/shadow files.
# cp -p /a/etc/passwd /a/etc/passwd-orig
# cp -p /a/etc/shadow /a/etc/shadow-orig
9.) Now stick to the basics and remove the encrypted password for root from the /a/etc/shadow file.
Before modification:
# grep root /a/etc/shadow
root:WP7grKsEFAgt.:15182::::::
After modification:
# grep root /a/etc/shadow
root::15182::::::
10.) Update the boot archive as below before proceeding with the reboot.
# bootadm update-archive -R /a
Creating boot_archive for /a
updating /a/platform/sun4u/boot_archive
11.) Unmount the metadevice and reboot your system. This time you can log in to the server without a password; the first recommended step is to set a new root password.
# umount /a
# init 6
Note: After step 7 you can resynchronize the mirrors if necessary using metasync -r. I have recovered boxes many times without using it.
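Steps 3 to 7 above can be wrapped in a small function for use at the failsafe shell. This is a sketch under stated assumptions: ROOT_SLICE and ROOT_MD are hypothetical variables defaulting to the c1t1d0s0 and d0 of this example; the root slice is mounted read-only (a small deviation from step 3, since only md.conf is read from it); the exit status of update_drv is ignored in case the expected devfsadm warning makes it non-zero; and DRY_RUN=1 prints the commands instead of running them.

```shell
# Load the SVM (md) driver in the failsafe miniroot and mount the root
# metadevice on /a. ROOT_SLICE / ROOT_MD are assumptions for this example.
svm_failsafe_mount() {
    root_slice=${ROOT_SLICE:-/dev/dsk/c1t1d0s0}
    root_md=${ROOT_MD:-/dev/md/dsk/d0}
    run=""
    if [ "${DRY_RUN:-0}" = 1 ]; then
        run=echo                                  # print instead of running
    fi
    $run mount -o ro "$root_slice" /a &&          # step 3 (read-only here)
    $run cp /a/kernel/drv/md.conf /kernel/drv &&  # step 4
    $run umount /a &&                             # step 5
    { $run update_drv -f md; true; } &&           # step 6, status ignored
    $run mount "$root_md" /a                      # step 7
}

# Example at the failsafe shell:
#   DRY_RUN=1 svm_failsafe_mount      # review
#   svm_failsafe_mount
```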

Solaris Troubleshooting (Magic of Solaris 10) – Root Password Recovery for any Solaris 10 (without CD/DVD)


One of the most common problems seen in many environments is recovering the root password.

As promised in my previous post, here I present a new procedure for root password recovery on Solaris 10 without a CD/DVD.
The detailed procedure is below.
1.) Bring the server to the ok prompt.
yogesh-test# init 0
2.) Here comes the magic of Solaris 10: a new boot mode called failsafe mode boots your server from RAM without any need for a CD/DVD. The complete failsafe boot sequence from my server is shown below as an example.

{0} ok boot -F failsafe
Probing system devices
Probing memory
ChassisSerialNumber 0730TL21HC
Probing I/O buses
screen not found.
keyboard not found.
Keyboard not present. Using ttya for input and output.
Probing system devices
Probing memory
ChassisSerialNumber 0730TL21HC
Probing I/O buses
Sun Fire V245, No Keyboard
Copyright 2007 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.22.33, 8192 MB memory installed, Serial #74086296.
Ethernet address 0:14:4f:45:c9:7e, Host ID: 8479b97e.
Rebooting with command: boot -F failsafe
Boot device: /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/disk@1,0 File and args: -F failsafe
SunOS Release 5.10 Version Generic_142909-17 64-bit
Copyright (c) 1983, 2010, Oracle and/or its affiliates. All rights reserved.
WARNING: i2c_0 failed to add interrupt.
WARNING: i2c_0 operating in POLL MODE only
Hardware watchdog enabled
Configuring devices.
Searching for installed OS instances…
No installed OS instance found.
Starting shell.
# uname -a
SunOS 5.10 Generic_142909-17 sun4u sparc SUNW,Sun-Fire-V245
# df -k
Filesystem kbytes used avail capacity Mounted on
/ramdisk-root:a 201463 178943 2374 99% / ----> Server booted from RAM.
/devices 0 0 0 0% /devices
ctfs 0 0 0 0% /system/contract
proc 0 0 0 0% /proc
mnttab 0 0 0 0% /etc/mnttab
swap 7725248 320 7724928 1% /etc/svc/volatile
objfs 0 0 0 0% /system/object
sharefs 0 0 0 0% /etc/dfs/sharetab
swap 7725504 576 7724928 1% /tmp
/tmp/dev 7725504 576 7724928 1% /dev
fd 0 0 0 0% /dev/fd
#
Note: Sometimes (every time on x86 servers) failsafe mode asks whether to mount the root disk before giving you the shell prompt. Do not mount it at that point; answer no everywhere, get to the shell prompt, and then do the mount and update the boot archive manually.

3.) Once the server has booted in failsafe mode, mount the root disk at /a. Here my root disk is c1t1d0s0.
# mount /dev/dsk/c1t1d0s0 /a
# df -h /a
Filesystem size used avail capacity Mounted on
/dev/dsk/c1t1d0s0 7.9G 7.1G 714M 92% /a
4.) Take a copy of the /a/etc/passwd and /a/etc/shadow files.
# cp -p /a/etc/passwd /a/etc/passwd-orig
# cp -p /a/etc/shadow /a/etc/shadow-orig
5.) Now stick to the basics and remove the encrypted password for root from the /a/etc/shadow file.
Before modification:
# grep root /a/etc/shadow
root:WP7grKsEFAgt.:15182::::::
After modification:
# grep root /a/etc/shadow
root::15182::::::
6.) Update the boot archive as below before proceeding with the reboot.
# bootadm update-archive -R /a
Creating boot_archive for /a
updating /a/platform/sun4u/boot_archive
7.) Reboot your system. This time you can log in to the server without a password; the first recommended step is to set a new root password.
# init 6
This is one of the best features I have observed in Solaris 10. It makes troubleshooting much more flexible at remote service locations.
Note: The procedure for Solaris x86 is the same; on x86, select the failsafe entry, which is present in the GRUB menu by default. GRUB reads its entries from /boot/grub/menu.lst.
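Step 5 above can also be done non-interactively with awk instead of an editor. The helper below is hypothetical and assumes the backup copy from step 4 already exists: it reads the backup (or stdin) and prints every line unchanged except root's, whose second (password) field is emptied. On Solaris, set AWK=nawk.

```shell
# Print a shadow file (path in $1, or stdin) with root's password field emptied;
# every other line passes through unchanged.
clear_root_hash() {
    ${AWK:-awk} 'BEGIN { FS = OFS = ":" }
        $1 == "root" { $2 = "" }          # rebuilds the line with ":" separators
        { print }' "$@"
}

# Example in failsafe, reading the backup and writing over the live copy
# (redirecting into the existing file keeps its owner and mode):
#   clear_root_hash /a/etc/shadow-orig > /a/etc/shadow
```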

http://gurkulindia.com/main/2011/08/solaris-troubleshooting-magic-of-solaris-10-root-password-recovery-for-any-solaris-10-without-cddvd/

Wednesday, January 16, 2013

Linux List The Open Ports And The Process That Owns Them

http://www.cyberciti.biz/tips/linux-display-open-ports-owner.html

So how do you list the open network ports on your Linux server and the processes that own them? The answer is simple. Use any of the following commands (they must be run as root):

sudo lsof -i
sudo netstat -lptu
sudo netstat -tulpn
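lsof can filter by port directly (sudo lsof -i :22). With netstat, piping through grep 22 also matches ports such as 2222, so the sketch below matches the local port exactly. port_lines is a hypothetical name; it assumes the usual netstat -tulpn layout, with the local address in the fourth column.

```shell
# Print the "netstat -tulpn" lines (read on stdin) whose local port is exactly $1.
port_lines() {
    ${AWK:-awk} -v port="$1" '
        $1 ~ /^(tcp|udp)/ {
            n = split($4, a, ":")      # "0.0.0.0:22" or ":::22": last piece is the port
            if (a[n] == port) print
        }'
}

# Example:
#   sudo netstat -tulpn | port_lines 22
```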

Tuesday, January 15, 2013

[Solaris] Which process is bound to a given port ?

https://blogs.oracle.com/JoachimAndres/entry/solaris_which_process_is_bound1


Again I was faced with a busy port and needed to determine which process was bound to it. The little script below, which I picked up from the internet some time ago, came in handy. Unfortunately, I cannot remember whom to credit for it. Here it is:
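#!/bin/ksh
# Report which process(es) have a socket using a given port.
# Usage: script [port]   (prompts for the port if none is given)
line='---------------------------------------------'
pids=$(/usr/bin/ps -e -o pid=)
if [ $# -eq 0 ]; then
    read ans?"Enter port you would like to know pid for: "
else
    ans=$1
fi
for f in $pids
do
    # pfiles prints e.g. "sockname: AF_INET 0.0.0.0  port: 22"; anchor the
    # match so that port 22 does not also match 2222
    if /usr/proc/bin/pfiles $f 2>/dev/null |
        /usr/xpg4/bin/grep -qE "port: ${ans}([^0-9]|\$)"
    then
        echo $line
        echo "Port: $ans is being used by PID $f:"
        pargs -l $f
        #/usr/bin/ps -o pid,args -p $f
    fi
done
exit 0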

=================================


#!/bin/ksh
# Usage: script [port]   (lists every process with AF_INET sockets if no port)
        pfexec pfiles /proc/* 2>/dev/null | nawk -v port="$1" '
/^[0-9]/ { cmd=$0; type="unknown"; next }      # "PID: command" header line
$1 == "SOCK_STREAM" { type="tcp" }
$1 == "SOCK_DGRAM" { type="udp" }
$2 == "AF_INET" {                               # sockname: AF_INET addr  port: N
        if ((port!="")&&($5!=port)) next;
        if (cmd!="") {
                printf("%s\n    %s:%s/%s\n",cmd,$3,$5,type); cmd="";
        }
}'