Monday, September 7, 2009

Book on WebSphere Application Server 7.0 Administration Guide from Packt Publishing


WebSphere Application Server 7.0 Administration Guide, written by Steven Robinson and published by Packt Publishing, is one of the few books available on WebSphere 7 administration. It is very useful for WebSphere 7 administrators with mid-level to advanced skills who manage WebSphere Application Server 7.0. The book covers everything from installation and configuration to monitoring and product maintenance, and has a lot of well-written examples with step-by-step instructions, including screenshots of the major aspects of server configuration. The chapters are easy to read, and you should be able to complete the entire book in a few weeks and become a complete WebSphere 7.0 administration expert. Chapter 6 also covers WebSphere messaging in detail, including how to configure it using the default or WebSphere MQ provider. I would definitely recommend this book. It is available both as an eBook and as a regular printed book. More details about the book can be found on the publisher's website.

Sunday, September 6, 2009

Slowing down your server process does have some benefits

Although a lot of people are finding ways to speed up their application servers, there are also benefits to slowing down your server process, especially when testing, to find problems that you won't see in normal scenarios. As you have probably seen, things work well 90% of the time, when the load on the server is normal, database response times are normal, network throughput is optimal, etc. But things suddenly start to break when something backs up. For example, when the database fails to respond in the expected time, users refresh their browsers, the number of concurrent requests climbs, and the application may break due to deadlocks or other concurrency-related problems. On today's powerful multiprocessor servers, requests execute in milliseconds, so it is difficult to catch concurrency problems unless you recreate a scenario where requests take longer to process. One might say this can be simulated by running a load test against the server. I agree, but if there is a problem, debugging it is an overwhelming task given the number of requests and objects created during the test, especially if there is a memory leak. The same effect can be achieved by slowing down the server process and using just a few concurrent requests, which makes things much easier to debug and trace.

For example, consider a JSP, lrumap.jsp, that creates a simple LRUMap object, puts it into the session and then serializes it. You will not find any problems with the JSP as long as it runs quickly (it usually takes about 10 ms on a decent desktop server), even with multiple concurrent requests. But if the code slows down due to I/O or some other reason, you will end up with a java.util.ConcurrentModificationException whenever the serialization call so.writeObject(lmap); and the insertion lmap.put(lobj[i],new Integer(i)); execute at the same time, even though the LRUMap is synchronized. The reason is that the serialization code walks the elements with an iterator, and the iterator is not synchronized. Unless you make the server process run slowly, you will not catch this until you see it in production under high load, where it becomes extremely difficult to debug.



How to slow down a process?


The trick is to limit the CPU usage of the process: it slows down because it doesn't get enough CPU cycles to execute. On Linux, the open source program cpulimit (CPU Limit) can be used for this purpose, or you can write a simple shell script that sends the STOP and CONT signals in a loop using the kill command. The latter is not as efficient, but it works. cpulimit does the same thing from a C program and is much more precise.

1) Download and install the CPU Limit program.
2) If your system administrator doesn't allow that, use this shell script instead:

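#!/bin/bash
# Usage: pass the PID of the process to slow down as the first argument.
while true
do
    kill -STOP "$1"    # pause the process
    usleep 100000      # keep it paused for 100 ms
    kill -CONT "$1"    # resume the process
    usleep 10000       # let it run briefly (10 ms here; adjust to taste)
done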

Note that usleep sleeps in microseconds, whereas sleep takes seconds, which is too coarse for a process executing transactions in milliseconds.

3) Find the process ID of the process you want to slow down.

4) Execute cpulimit -p <pid> -l <limit>

(e.g.) cpulimit -p 11502 -l 10 allocates a maximum of 10% CPU to process 11502.
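The stop/continue loop can also be wrapped in a function with explicit pause and run intervals. This is only a sketch: the function name throttle, its arguments and the default intervals are made up for illustration, and it assumes a sleep that accepts fractional seconds (GNU coreutils does), which avoids depending on usleep.

```shell
#!/bin/bash
# throttle PID [STOP_SECS] [RUN_SECS] [CYCLES]   (hypothetical helper)
# Alternately pauses and resumes PID. With CYCLES=0 (the default) it keeps
# going until the target process exits.
throttle() {
    local pid=$1 stop_secs=${2:-0.09} run_secs=${3:-0.01} cycles=${4:-0}
    local n=0
    while kill -0 "$pid" 2>/dev/null; do
        kill -STOP "$pid" 2>/dev/null || break   # pause the target
        sleep "$stop_secs"
        kill -CONT "$pid" 2>/dev/null || break   # resume the target
        sleep "$run_secs"
        n=$((n + 1))
        if [ "$cycles" -gt 0 ] && [ "$n" -ge "$cycles" ]; then
            break
        fi
    done
    # Make sure the target is not left frozen when the loop ends.
    kill -CONT "$pid" 2>/dev/null
    return 0
}

# Example: throttle 11502 0.09 0.01
# pauses process 11502 for 90 ms out of every 100 ms.
```

If you interrupt the loop with Ctrl-C while the target is paused, resume it by hand with kill -CONT followed by the process ID.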

Tuesday, September 1, 2009

Running your own copy of a production server instance on your desktop

Although the title seems a little strange and scary, I will explain in this article why it's needed, its advantages, and how you can set it up easily without many changes. In my career as an administrator and support engineer, I haven't seen a single IT department whose testing/staging environment is an exact replica of production. There is always some difference between the environments. The most common one is that the amount of production data is not the same as the staging data, which changes how the application behaves. There are also situations where production support engineers don't even have access to the staging/testing environments, because they are managed by a different vendor, because of SOX compliance, or for various security reasons. To an administrator, the application is more of a black box than it is to the developers: you may not have had much chance to play with it and understand how it really works. Most of what you know comes by word of mouth from the developers, which may not match how the application behaves in production, because they only know how it worked in their development environment, usually a standalone server rather than the clustered environment typical of production. During outages, you are more focused on bringing the server back up than on finding what caused the issue, and most of the time it's too late, or you don't have enough time, to collect the data. Hence it is necessary, in my opinion, to have your own copy of production, so that you can turn on traces, understand how the application works, learn the symptoms when some service breaks, and find ways to fix it. When the real problem comes, you will be prepared to tackle it quickly with little or no downtime. Many will disagree, so you may need to explain how the benefits outweigh the problems.

You are basically going to run the server under a system proxifier like proxychains, tunneling all connections through an SSH proxy from your desktop. This gets you past firewall restrictions with ease and without many modifications. Here are some suggested steps:

1) Tar up your production server installation directory and untar it on your desktop.

2) Make sure you create any directories or symbolic links that the server references but that are not part of the installation directory, like log and config directories.

3) Download and install the system proxifier proxychains.

4) Configure /etc/proxychains.conf to point to the SOCKS server. (e.g.) socks5 127.0.0.1 9050

5) If there are DNS name resolution issues, follow the instructions in "Performing UDP tunneling through an SSH connection" to set up a local DNS proxy server, or enable the proxy_dns property in /etc/proxychains.conf.

6) Run an SSH proxy from your desktop, (e.g.) ssh -D 9050 user@prod-servername

7) Start the server with proxychains, or modify the startup script to include proxychains. (e.g.) proxychains /root/Desktop/jdk1.6.0_16//bin/java -Djava.util.logging.config.file=/opt/apache-tomcat-6.0.20/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.endorsed.dirs=/opt/apache-tomcat-6.0.20/endorsed -classpath :/opt/apache-tomcat-6.0.20/bin/bootstrap.jar -Dcatalina.base=/opt/apache-tomcat-6.0.20 -Dcatalina.home=/opt/apache-tomcat-6.0.20 -Djava.io.tmpdir=/opt/apache-tomcat-6.0.20/temp org.apache.catalina.startup.Bootstrap start
You can get the process string by running ps -auxww after starting the server normally. The server should now start and connect through the SSH proxy, and the proxychains output will show how the connections are made,
(e.g)
ProxyChains-3.1 (http://proxychains.sf.net)
S-chain-<>-127.0.0.1:9050-<--timeout INFO: Mon Aug 31 22:37:41 PDT 2009: PlatformDetector detected platform: tomcat
S-chain-<>-127.0.0.1:9050-<>--10.23.23.11:1521--<>

8) Map the website's domain/virtual host to your local IP in /etc/hosts so that you can access the website. (e.g.) 127.0.0.1 www.example.com
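Steps 4 to 7 can be sketched as a small script. This is a rough illustration rather than a tested recipe: the function name write_proxychains_conf is made up, and the file it writes is a minimal configuration that assumes the strict_chain, proxy_dns and [ProxyList] entries of proxychains 3.1. The host name and port are the example values from the steps above.

```shell
#!/bin/bash
# write_proxychains_conf FILE [PORT]   (hypothetical helper)
# Writes a minimal proxychains configuration that sends every connection
# through a local SOCKS5 proxy such as the one opened by "ssh -D PORT".
write_proxychains_conf() {
    local file=$1 port=${2:-9050}
    cat > "$file" <<EOF
strict_chain
proxy_dns
[ProxyList]
socks5 127.0.0.1 $port
EOF
}

# Typical use (not run here):
#   write_proxychains_conf /etc/proxychains.conf 9050
#   ssh -N -D 9050 user@prod-servername &
#   proxychains <the java command line from step 7>
```

The -N option tells ssh not to run a remote command, so the session exists only to carry the SOCKS tunnel.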

Sunday, August 30, 2009

IBM WebSphere Portal in Real-World Cloud Computing

Since IBM and Amazon announced the availability of WebSphere Portal Server and Lotus Web Content Management Standard Edition on the Amazon EC2 web service, I have wondered whether any real-world customers are using it. Now it seems from the news that an investment firm, Quintana Capital Group, has moved its web site and IBM WebSphere-powered portal to Amazon's EC2. Here is the URL of their portal, powered by IBM WebSphere Portal on Amazon EC2: http://www.qeplp.com/wps/portal/


Also you can run a traceroute to confirm that it's hosted in EC2.

>tracert www.qeplp.com
100 ms 98 ms 97 ms ec2-174-129-234-118.compute-1.amazonaws.com [174.129.234.118]

Saturday, August 22, 2009

WebSphere Java process hangs and freezes

We recently had an issue where the WebSphere Java process hung on 4 servers at almost the same time. Three of the servers are nodes in a cluster and the fourth is standalone. Restarting the WebSphere AppServer fixed the issue, but it was still puzzling why all the servers hung at the same time, including the one that is not part of the cluster. We did some investigation and found that what these servers have in common is that the WebSphere installation directory is NFS-mounted from a NAS (network attached storage) device. We suspected that either the NFS mount or the NAS had had problems, as there was no better explanation for all the servers going down at the same time. We checked the OS /var/log/messages file and found these NFS messages, logged around the time the servers went down:

Aug 20 04:09:45 appserver01 kernel: nfs: server nasserver01 OK
Aug 20 04:10:51 appserver01 kernel: nfs: server nasserver01 not responding, still trying
Aug 20 04:10:51 appserver01 kernel: nfs: server nasserver01 not responding, still trying
Aug 20 04:10:53 appserver01 kernel: nfs: server nasserver01 OK

These messages appear to be related to NFS timeouts. As there was no problem with the NAS device itself, it seemed clear that the NFS timeouts were the likely cause. We changed the mounts from UDP to NFS version 3 over TCP, which is more reliable, and added some tuning parameters. Since remounting with the new parameters, the problem has not recurred. Here are the new settings for the NFS mount over TCP.

/etc/fstab:

nasserver01:/app/WebSphere /mnt/WebSphere nfs rw,noatime,hard,intr,tcp,nfsvers=3,retrans=5,rsize=8192,wsize=8192,timeo=14 0 0

(The mount output for this share also shows addr=10.10.1.20, which is filled in at mount time.)

If the problem still exists after the tuning, nfsstat or tcpdump traces can be used to analyze it.
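Before reaching for traces, it can help to count how often each NFS server was reported as not responding. The function below is a small sketch for that. Its name, nfs_timeouts, is made up, and it assumes the kernel message format shown in the log excerpt above.

```shell
#!/bin/bash
# nfs_timeouts FILE   (hypothetical helper)
# Prints "<count> <server>" for every NFS server that the kernel reported
# as "not responding" in the given syslog file.
nfs_timeouts() {
    grep 'nfs: server .* not responding' "$1" \
        | sed 's/.*nfs: server \([^ ]*\) not responding.*/\1/' \
        | sort | uniq -c | awk '{print $1, $2}'
}

# Example: nfs_timeouts /var/log/messages
```

Run against the four log lines shown earlier, it prints 2 nasserver01.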

Thursday, August 20, 2009

Is Java really "Write once, run anywhere" ?

As many of us know, Java is popular for its cross-platform portability, "Write once, run anywhere", but I wanted to put that to the test. I tried to run WebSphere 7 Application Server itself on the Sun JRE 1.6.0 instead of the IBM J9 VM that is bundled with the AppServer, to see if it works with another vendor's JVM on the same platform. I had to make a couple of changes. The startServer.sh script adds IBM JVM-specific arguments ( -Xshareclasses:name=webspherev70_%g,groupAccess,nonFatal -Xscmx50M ), which I had to remove. I changed WAS_HOME/java to point to Sun's JAVA_HOME and set the environment variables that setupCmdline.sh and startServer.sh set. Then I ran the process from the command line with the huge list of arguments taken from the process string of a server started with startServer.sh.
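Removing the IBM-only options from a long argument list by hand is error-prone, so a small filter can help. This is a sketch only: the function name strip_ibm_jvm_args is made up, and it drops just the two option families named above, -Xshareclasses and -Xscmx. Other J9-specific options may need the same treatment.

```shell
#!/bin/bash
# strip_ibm_jvm_args ARG...   (hypothetical helper)
# Prints the given JVM arguments one per line, leaving out the IBM J9-only
# options mentioned in the post.
strip_ibm_jvm_args() {
    local arg
    for arg in "$@"; do
        case "$arg" in
            -Xshareclasses*|-Xscmx*) ;;      # IBM J9 shared class cache options
            *) printf '%s\n' "$arg" ;;
        esac
    done
}

# Example:
#   strip_ibm_jvm_args -Xms256m -Xscmx50M -Dfoo=bar
# prints -Xms256m and -Dfoo=bar on separate lines.
```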

The AppServer failed to start with the following exception,

java.lang.NoClassDefFoundError: com/ibm/wsspi/buffermgmt/WsByteBufferPoolManager
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:621)

and

[20:48:03:436 PDT] 0000000a ServerCollabo E WSVR0121E: An exception occurred getting a socket for port 38930 on hostname waslinux with an IP address of 192.168.1.10.

I added the AppServer/plugins/* directory to the classpath and even to -Djava.ext.dirs, but this time the JVM just exited without writing anything to the logs. I tried different things to find the root cause of the failure, including -verbose:jni traces, strace and jdb, but couldn't find anything useful and ultimately gave up. Note that starting WebSphere the same way from the command line with the IBM J9 VM works. Both the Sun JDK and OpenJDK just terminate for no apparent reason.

I also ran the Apache Tomcat server on both IBM's and Sun's JVMs, and it ran fine without problems.

It seems the WebSphere AppServer Java code is not compatible with other vendors' JVMs. So the promise of Java hasn't come true, at least in this case, where it's just "write once, run anywhere as long as you stick to the same JVM vendor that you used to develop and test" :).


*If anyone has tried this and been successful, please comment on my post. I would really like to get it running so that I can use tools like jvisualvm, jmap, jps, jstack, etc., which are bundled with the Sun JDK and not with IBM's.

Monday, August 17, 2009

Problem running startxwin.bat in Cygwin/X on Windows

Nowadays more and more graphical tools like jconsole and jvisualvm are shipped with Java, and several others, like TDA (Thread Dump Analyzer) and IBM HeapAnalyzer, are available for download to analyze and debug problems. So you need some kind of graphical terminal, like an X server or VNC, to manage your environment. Since most production environments are UNIX/Linux based and are most commonly reached from a Windows desktop, you may need a Windows-based X server or VNC client. VNC servers are not very common in the enterprise, as they are deemed insecure compared to X and need additional installation and configuration, while X comes with the OS. One free, open source, Windows-based X server is Cygwin/X. I decided to give it a try and ran into some problems before I got it working, so I wanted to write about it to save someone else the same trouble. The installer is a little different in that it lists the various Cygwin packages along with the Cygwin/X packages, so you just need to select X11 if that's all you need.



Once it is installed, you just have to go to c:\cygwin\bin in a cmd prompt and run startxwin.bat, which should start the X server. But in my case the batch file gave some errors:

startxwin.bat - Starting on Windows NT/2000/XP/2003
'c:\cygwin\bin\run' is not recognized as an internal or external command,
operable program or batch file.
'c:\cygwin\bin\run' is not recognized as an internal or external command,
operable program or batch file.

To make it work, I had to change one line in the startxwin.bat file, as shown below:

SET RUN=%CYGWIN_ROOT%\bin\run -p /usr/bin

replace with,

SET RUN=run -p /usr/bin

Now you should be able to connect with your favorite SSH client, like PuTTY or SecureCRT, with X11 forwarding enabled.