Tuesday, August 30, 2011

Managed node fails to synchronize



Technote (troubleshooting)


Problem (Abstract)

The managed node fails to synchronize with the following errors.

Symptom

00002a04 NodeSync E ADMS0005E: The system is unable to generate synchronization request:
javax.management.JMRuntimeException: ADMN0022E: Access is denied for the getRepositoryEpoch operation on ConfigRepository MBean because of insufficient or empty credentials.
.
.
00002a04 NodeSyncTask A ADMS0036E: The configuration synchronization failed.

00002a05 ServiceLogger I com.ibm.ws.ffdc.IncidentStreamImpl initialize FFDC0009I: FFDC opened incident stream file
F:\IBM\WebSphere\AppServer\profiles\ctgAppSrv01\logs\ffdc\nodeagent_00002a05_09.09.20_22.01.17_0.txt

00002a05 ServiceLogger I com.ibm.ws.ffdc.IncidentStreamImpl resetIncidentStream FFDC0010I: FFDC closed incident stream file
F:\IBM\WebSphere\AppServer\profiles\ctgAppSrv01\logs\ffdc\nodeagent_00002a05_09.09.20_22.01.17_0.txt

00002a06 NodeSync E ADMS0005E: The system is unable to generate synchronization request: javax.management.JMRuntimeException: ADMN0022E: Access is denied for the getRepositoryEpoch operation on ConfigRepository MBean because of insufficient or empty credentials.
.
.
00002a34 RoleBasedAuth E SECJ0306E: No received or invocation credential exist on the thread. The Role based authorization check will not have an accessId of the caller to check. The parameters are: access check method isNodeSynchronized on resource NodeSync and module NodeSync. The stack trace is java.lang.Exception:
Invocation and received credentials are both null at
com.ibm.ws.security.role.RoleBasedAuthorizerImpl.checkAccess(RoleBasedAuthorizerImpl.java:287)

Cause

The errors listed above usually indicate that the LTPA keys have been automatically regenerated but were not pushed correctly to the nodes, which causes this problem.

Resolving the problem

To solve this problem, try disabling automatic generation of the Lightweight Third Party Authentication (LTPA) keys and then resynchronize the node.
In the administrative console:
  1. Go to SSL certificate and key management -> Key set groups, select the key set group name, and clear the "Automatically generate keys" option.
  2. From Key set groups, check the key set group name and click the Generated Keys tab.
  3. Click OK and Save to save the changes to the master configuration.
  4. Stop the dmgr.
  5. On the dmgr side, delete the contents of the wstemp, temp and config/temp folders under <profile_root>.
  6. Start the dmgr.
  7. Stop the node and its servers using the stopNode/stopServer commands from <profile_root>/bin of the AppServer profile.
  8. Manually synchronize the node by running syncNode.sh from <profile_root>/bin. Because security is enabled, pass the credentials on the command line:

    syncNode.sh <DMgr_hostName> <SOAP_PORT_of_DMGR> -username <username> -password <password>
  9. Start the node and server.
  10. Logon to Dmgr Administrative console and check the Node/Server availability.
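
The node-side part of the procedure (steps 7 to 9) can be sketched as a small shell function. This is only an illustration: the profile path, host name, port and credentials are placeholder assumptions, and the RUN variable is a helper invented here so the commands can be printed instead of executed.

```shell
# Assumed profile location; replace with your own <profile_root>.
PROFILE_ROOT="${PROFILE_ROOT:-/opt/IBM/WebSphere/AppServer/profiles/AppSrv01}"

# Set RUN=echo to print the commands instead of running them (dry run).
RUN="${RUN:-}"

# resync_node <dmgr_host> <dmgr_soap_port> <username> <password>
# Stops the node agent, synchronizes with the dmgr, then restarts the node agent.
resync_node() {
    dmgr_host=$1
    soap_port=$2
    user=$3
    password=$4
    $RUN "$PROFILE_ROOT/bin/stopNode.sh" -username "$user" -password "$password" &&
    $RUN "$PROFILE_ROOT/bin/syncNode.sh" "$dmgr_host" "$soap_port" -username "$user" -password "$password" &&
    $RUN "$PROFILE_ROOT/bin/startNode.sh"
}
```

For a dry run, set RUN=echo and call, for example, resync_node dmgr.example.com 8879 wasadmin secret (all four values are made up). Application servers on the node still have to be stopped beforehand and started afterwards with stopServer/startServer.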

Users of IBM WebSphere Application Server are advised to upgrade to the latest Fix Pack, because several known LTPA-related issues have been fixed in later Fix Packs.

For additional information, please open this link: http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.express.doc/info/exp/ae/twsu_jaxr_sec.html

Refer to the following URL to obtain the latest Fix Pack:
http://www.ibm.com/support/docview.wss?uid=swg27004980#ver61

Saturday, August 27, 2011

Understanding UNIX / Linux filesystem Inodes



by nixcraft on November 10, 2005

The inode (index node) is a fundamental concept in the Linux and UNIX filesystem. Each object in the filesystem is represented by an inode. But what are the objects? Let us try to understand it in simple words. Every file under Linux (and UNIX) has the following attributes:
=> File type (executable, block special etc)
=> Permissions (read, write etc)
=> Owner
=> Group
=> File Size
=> File access, change and modification time (remember that traditional UNIX and Linux file systems do not store the file creation time; this is a favorite question in UNIX/Linux sysadmin job interviews)
=> File deletion time
=> Number of hard links
=> Extended attribute such as append only or no one can delete file including root user (immutability)
=> Access Control List (ACLs)
All of the above information is stored in an inode. In short, the inode identifies the file and its attributes (as above). Each inode is identified by a unique inode number within the file system. The inode number is also known as the index number.

inode definition

An inode is a data structure on a traditional Unix-style file system such as UFS or ext3. An inode stores basic information about a regular file, directory, or other file system object.

How do I see file inode number?

You can use the ls -i command to see the inode number of a file:
$ ls -i /etc/passwd
Sample Output
32820 /etc/passwd
You can also use the stat command to find out the inode number and its attributes:
$ stat /etc/passwd
Output:
File: `/etc/passwd'
Size: 1988            Blocks: 8          IO Block: 4096   regular file
Device: 341h/833d       Inode: 32820       Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2005-11-10 01:26:01.000000000 +0530
Modify: 2005-10-27 13:26:56.000000000 +0530
Change: 2005-10-27 13:26:56.000000000 +0530
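
As a small illustration of what the inode number identifies, the sketch below creates a file and a hard link to it in a temporary directory and prints both inode numbers, which should be identical because both names point to the same inode. The file names and the inode_of helper are made up for this example.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
touch "$workdir/original"
ln "$workdir/original" "$workdir/hardlink"

# Print only the inode number reported by ls -i.
inode_of() {
    ls -i "$1" | awk '{print $1}'
}

ino_a=$(inode_of "$workdir/original")
ino_b=$(inode_of "$workdir/hardlink")
echo "original: $ino_a  hardlink: $ino_b"

rm -r "$workdir"
```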

Inode application

Many commands used by system administrators in UNIX / Linux operating systems use inode numbers to designate a file. Let us see a practical application of the inode number. Type the following commands:
$ cd /tmp
$ touch \"la*
$ ls -l

Now try to remove the file "la* with a plain rm "la* command.
It fails, because the shell interprets the quote and the asterisk. Files whose names contain control characters, characters that cannot be typed on a keyboard, or special characters such as ?, * or ^ are awkward to remove by name; one reliable way is to remove them by inode number. This is the fourth part of the "Understanding UNIX/Linux file system" series; continue reading the rest of the series (this is part IV):
  • Part I - Understanding Linux superblock
  • Part II - Understanding Linux superblock
  • Part III - An example of Surviving a Linux Filesystem Failures
  • Part IV - Understanding filesystem Inodes
  • Part V - Understanding filesystem directories
  • Part VI - Understanding UNIX/Linux symbolic (soft) and hard links
  • Part VII - Why isn't it possible to create hard links across file system boundaries? 
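
Removing the awkwardly named file by inode number could look like the following minimal sketch. It works in a temporary directory rather than /tmp, and it relies on the -inum test, which GNU and BSD find both provide.

```shell
# Create the oddly named file in a throwaway directory.
workdir=$(mktemp -d)
touch "$workdir/\"la*"

# Read the inode number of the only file in the directory.
inode=$(ls -i "$workdir" | awk '{print $1; exit}')

# Delete whatever name carries that inode number.
find "$workdir" -inum "$inode" -exec rm -- {} \;

remaining=$(ls -A "$workdir" | wc -l | tr -d ' ')
echo "files left: $remaining"
rmdir "$workdir"
```

When doing this by hand, replacing rm with rm -i asks for confirmation before each deletion.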

Why command df and du reports different output?

by LinuxTitli on January 31, 2006

You will rarely notice this on a FreeBSD or Linux desktop or on your personal UNIX or Linux workstation. However, on a production UNIX server you will sometimes notice that df (display free disk space) and du (display disk usage statistics) report different output. Usually df reports a bigger disk usage than du.
This happens when a file has been deleted but its inode has not yet been deallocated because a process still has the file open. If you are using a clustered file system (such as GFS) you may see this scenario more often.
Note that the following examples are FreeBSD and GNU/Linux specific.
The following is the normal output of df and du for the /tmp filesystem:
# df -h /tmp
Output:
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/ad0s1e    496M     22M    434M     5%    /tmp
Now type du command:
# du -d 0 -h /tmp/
Output:
22M    /tmp/

Why is there a mismatch between df and du outputs?

However, sometimes df reports a different (bigger) disk usage, for example:
# df -h /tmp/
Output:
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/ad0s1e    496M     39M    417M     9%    /tmp
Now type du command:
# du -d 0 -h /tmp/
Output:
22M    /tmp/
As you can see, df and du report different output. Many new UNIX admins get confused by this output (39M vs 22M).
An open file descriptor is the main cause of such a mismatch. For example, if a file called /tmp/application.log is held open by a third-party application or by a user and the same file is deleted, df and du report different output: du no longer finds the file, but df still counts its blocks until the file is closed. You can use the lsof command to verify this:
# lsof | grep tmp
Output:
bash   594  root  cwd   VDIR  0,86      512      2 /tmp
bash   634  root  cwd   VDIR  0,86      512      2 /tmp
pwebd  635  root  cwd   VDIR  0,86      512      2 /tmp
pwebd  635  root  3rW   VREG  0,86 17993324     68 /tmp (/dev/ad0s1e)
pwebd  635  root   5u   VREG  0,86        0     69 /tmp (/dev/ad0s1e)
lsof   693  root  cwd   VDIR  0,86      512      2 /tmp
grep   694  root  cwd   VDIR  0,86      512      2 /tmp
You can see that a 17993324-byte (about 17M) file is open on /tmp by pwebd (our in-house software) but was deleted accidentally by me, which accounts for the gap between 39M and 22M. You can recreate the above scenario on your Linux, FreeBSD or other UNIX-like system as follows:
First, note down the df and du output for the /home file system:
# df -h /home
# du -d 0 -h /home

If you are using Linux, use du as follows:
# du -s -h /home
Now create a big file:
# cd /home/user
# cat /bin/* >> demo.txt
# cat /sbin/* >> demo.txt

Log in on another console and open the file demo.txt using the vi text editor:
# vi /home/user/demo.txt
Do not exit from vi (keep it running).
Go back to the first console and remove the file demo.txt:
# rm demo.txt
Now run both du and df to see the difference.
# df -h /home
# du -d 0 -h /home

If you are using Linux, use du as follows:
# du -s -h /home
Now switch to the other terminal and close vi. The root cause of the problem should then be resolved, and the du and df outputs should match again.

On Linux, you can list deleted files that are still held open with lsof:
# lsof -n -P | grep deleted
rsync 29911 root 3r REG 8,17 15496725683 26230786 /an/old/file (deleted)
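
The underlying mechanism can be seen in a few lines of shell. In this minimal sketch the file name is removed while a file descriptor is still open, yet the data remains readable through that descriptor, which is why the blocks are not released until the file is closed. The descriptor number 8 and the file content are arbitrary choices for illustration.

```shell
tmpfile=$(mktemp)
echo "still here" > "$tmpfile"

# Open the file for reading on descriptor 8, then delete its name.
exec 8< "$tmpfile"
rm "$tmpfile"

if [ ! -e "$tmpfile" ]; then
    echo "name is gone"
fi

# The inode and its data blocks survive until the descriptor is closed.
content=$(cat <&8)
echo "read through open descriptor: $content"

# Closing the descriptor lets the file system free the space.
exec 8<&-
```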

Thursday, August 25, 2011

Jumpstart Network Boot Protocol - RARP

        Normally, the install server provides the boot program for booting
        clients. However, under one condition, the Solaris network booting
        architecture requires you to set up a separate "boot server". A boot
        server is a system with just enough information to boot up a client
        over a network. You have to set up a boot server when the install
        client is on a different subnet than the install server.
       
        SPARC install clients require a boot server when they exist on
        different subnets because the network booting architecture uses the
        reverse address resolution protocol (RARP). When a client boots, it
        issues a RARP request in order to obtain its IP address. RARP, however,
        does not acquire the netmask number, which is required to distribute
        information across a router on a network. If the install/boot server
        exists across a router the boot will fail because the network traffic
        cannot be routed correctly without a netmask number.
       
        The result is that you can install a client across a router, but you
        cannot boot a client across a router. So you will have to set up a
        separate boot server on the same subnet as the client.

Wednesday, August 24, 2011

Configure Multipath IO Linux

·         Retrieve WWN for Requesting Storage
systool -c fc_host -v | grep port

    port_id             = "0x021c00"
    port_name           = "0x50014380056631f0"
    port_state          = "Online"
    port_type           = "NPort (fabric via point-to-point)"
    supported_classes   = "Class 3"
    supported_speeds    = "1 Gbit, 2 Gbit, 4 Gbit, 8 Gbit"
    port_id             = "0x021c00"
    port_name           = "0x50014380056631f2"
    port_state          = "Online"
    port_type           = "NPort (fabric via point-to-point)"
    supported_classes   = "Class 3"
    supported_speeds    = "1 Gbit, 2 Gbit, 4 Gbit, 8 Gbit"

Record the port_name values; you need them to request the SAN storage.
·         Discover New LUNs
echo - - - >/sys/class/scsi_host/host0/scan
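
The command above rescans only host0. On a system with several HBA ports (the multipath output in this post shows hosts 4 and 5), a loop over every SCSI host may be more convenient. The function below is a minimal sketch; its name and the optional base-directory argument are inventions for this example, the latter so it can be tried against a test directory.

```shell
# rescan_scsi_hosts [base_dir]
# Writes the wildcard "- - -" (all channels, targets and LUNs)
# to the scan file of every SCSI host under base_dir.
rescan_scsi_hosts() {
    base=${1:-/sys/class/scsi_host}
    for host in "$base"/host*; do
        # Skip entries without a scan file (or an unmatched glob).
        [ -e "$host/scan" ] || continue
        echo "- - -" > "$host/scan"
    done
}
```

Run it as root without arguments to rescan all hosts under /sys/class/scsi_host.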
·         Retrieve the wwid of the LUNs
multipath -l

CluData (360a98000486e5435653457423949616e) dm-4 NETAPP,LUN
[size=50G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 4:0:1:0 sdb 8:16  [active][undef]
 \_ 5:0:1:0 sdd 8:48  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 4:0:0:0 sda 8:0   [active][undef]
 \_ 5:0:0:0 sdc 8:32  [active][undef]
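
To pull the alias and wwid out of that listing for use in the configuration file, a sed filter along these lines may help. It is a sketch that assumes the output format shown above (map name, wwid in parentheses, then the dm device); other multipath-tools versions print a different layout. The function name is made up for this example.

```shell
# parse_multipath_maps: read "multipath -l" output on stdin and
# print "<alias> <wwid>" for each map header line.
parse_multipath_maps() {
    sed -n 's/^\([^ ]*\) (\([0-9a-f]*\)) dm-[0-9]*.*/\1 \2/p'
}

# Example using the header line from the listing above.
sample='CluData (360a98000486e5435653457423949616e) dm-4 NETAPP,LUN'
parsed=$(printf '%s\n' "$sample" | parse_multipath_maps)
echo "$parsed"
```

On a live system the same filter would be fed with multipath -l | parse_multipath_maps.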

·         Configure Multipathd for SAN LUN
Save the following content as /etc/multipath.conf.
Change the wwid and alias to match the LUN you requested.
defaults {
        user_friendly_names yes
}
blacklist {
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^hd[a-z]"
        devnode "^ccis*"
}
multipaths {
        multipath {
                wwid                    360a98000486e5435653457423949616e
                alias                   CluData
        }
}

·         Enable multipathd service

chkconfig multipathd on

service multipathd start

Refer to the post below for the remaining steps.
http://feijiangnan.blogspot.com/2011/05/how-to-create-linux-lvm.html

Monday, August 15, 2011

While Loop Multiple Conditions

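# Trace each command as it runs (debugging aid).
set -x
dmgrStatus="CHECKING"

# Keep looping until the status reaches one of the three terminal values.
while [ "${dmgrStatus}" != "FAILED" ] && [ "${dmgrStatus}" != "SUCCESSFUL" ] && [ "${dmgrStatus}" != "TIMEDOUT" ]
do
    echo "$dmgrStatus"
    # Placeholder: a real script would query the dmgr status here.
    dmgrStatus="SUCCESSFUL"
    echo "$dmgrStatus"
done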
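
A slightly fuller sketch of the same loop shows how the TIMEDOUT state might be reached. The check_dmgr function is a hypothetical stand-in for a real status probe (here it simply reports success on its third call), and max_attempts is an arbitrary limit chosen for illustration.

```shell
attempts=0
max_attempts=5
dmgrStatus="CHECKING"

# Hypothetical probe: pretends the dmgr comes up on the third check.
check_dmgr() {
    attempts=$((attempts + 1))
    if [ "$attempts" -ge 3 ]; then
        dmgrStatus="SUCCESSFUL"
    fi
}

while [ "$dmgrStatus" != "FAILED" ] && [ "$dmgrStatus" != "SUCCESSFUL" ] && [ "$dmgrStatus" != "TIMEDOUT" ]
do
    if [ "$attempts" -ge "$max_attempts" ]; then
        # Give up once the attempt budget is used.
        dmgrStatus="TIMEDOUT"
    else
        check_dmgr
    fi
done

echo "final status: $dmgrStatus after $attempts attempts"
```

Chaining the [ ... ] tests with && keeps the loop running only while none of the terminal states has been reached.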