Wednesday, June 25, 2008

Disk space problems in WebSphere Portal Installations

You might run WebSphere Portal on powerful servers such as Dell PowerEdge machines with quad-core processors, but servers usually ship with less disk space, on faster disks, than a laptop or desktop, where 1 TB (terabyte) drives now cost a few hundred dollars. Unless you periodically clean up unwanted files, you can easily use up the disk space and run into problems. Below are some locations within a WebSphere Portal installation that you can monitor and clean up.


Cleanup IBM JVM javacore and heapdump files

IBM javacore and heapdump files might be created automatically on OutOfMemory errors or when the JVM crashes under certain conditions, or they can be user-initiated to debug problems such as hanging threads, high CPU utilization and memory leaks. Javacore ( javacore.20080306.020447.11544.txt ) and heapdump ( heapdump.20080128.195700.20399.phd ) files can be found under the WAS_HOME/profiles/Portal01 directory. They are usually several megabytes each, depending on your applications, and when several of them get created you can easily run into disk space problems. Clean up or move these files as necessary, and work with IBM support to debug the underlying issue and stop the files from being generated.
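As a rough illustration of the cleanup described above, here is a minimal Python sketch that only lists old dump files and their sizes so you can decide what to move or delete. The file-name patterns come from the examples in this post; the profile path, the seven-day age threshold and the function name find_old_dumps are assumptions for illustration, not anything IBM ships.

```python
import fnmatch
import os
import time

# File-name patterns taken from the example names in this post.
DUMP_PATTERNS = ("javacore.*.txt", "heapdump.*.phd")


def find_old_dumps(profile_dir, min_age_days=7, now=None):
    """Return (path, size_in_bytes) for dump files at least min_age_days old."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    found = []
    for name in sorted(os.listdir(profile_dir)):
        path = os.path.join(profile_dir, name)
        if not os.path.isfile(path):
            continue
        if not any(fnmatch.fnmatch(name, p) for p in DUMP_PATTERNS):
            continue
        if os.path.getmtime(path) <= cutoff:
            found.append((path, os.path.getsize(path)))
    return found


if __name__ == "__main__":
    # Hypothetical profile location; substitute your own WAS_HOME.
    profile = "/opt/IBM/WebSphere/AppServer/profiles/Portal01"
    if os.path.isdir(profile):
        for path, size in find_old_dumps(profile):
            print(f"{size / 2**20:8.1f} MB  {path}")
```

The sketch deliberately does not delete anything, since you may still need the files for an open IBM support case.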

JCR Search Indexes might get huge in size

JCR Search indexes created under WPS_HOME/jcr/search can sometimes grow very large, particularly docsdata.dat, which can reach several GB, more than the actual content itself. Whenever the index is larger than your content, it is safe to delete everything under the WPS_HOME/jcr/search directory and restart WebSphere Portal to recreate the indexes.

Archived Portlet Applications .ear files

Archived portlet application .ear files are created under the WPS_HOME/deployed directory every time you install a portlet application. The number of files grows when you do many deployments, which is especially common during development cycles. As long as you are not required to archive these files locally on that specific system, they can be safely removed (i.e., deleted or moved to another storage location).

Rotate SystemErr.log, trace.log and SystemOut.log

By default WebSphere Portal keeps up to 5 historical files of 5 MB each and then starts rotating the logs. That can be too low: log files grow quickly when you enable additional tracing, and you don't want to lose trace information to rotation while recreating a problem. So choose the file size and the number of historical files wisely, without overrunning your disk space. Also, native_stderr.log and native_stdout.log don't get rotated, so you need to periodically clean them up or truncate them to zero size. I have seen the WebSphere process hang when native_stdout.log reached 4 GB, and it would not come back up until the file was cleaned and the server restarted, particularly on Linux.
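One way to zero the native logs before they grow too large is sketched below in Python. The log directory, the 2 GB threshold and the helper name truncate_if_larger are assumptions for illustration. Truncating in place keeps the same file, but whether a running server keeps writing to it cleanly depends on how the process opened the file, so try this on a test system first.

```python
import os

FOUR_GB = 4 * 2**30  # size at which this post reports hangs


def truncate_if_larger(path, limit_bytes):
    """Truncate path to zero bytes if it is larger than limit_bytes.

    Returns True when the file was truncated, False otherwise.
    """
    if not os.path.isfile(path):
        return False
    if os.path.getsize(path) <= limit_bytes:
        return False
    # os.truncate keeps the same file and only drops its contents.
    os.truncate(path, 0)
    return True


if __name__ == "__main__":
    # Hypothetical log directory and threshold; adjust both for your install.
    log_dir = "/opt/IBM/WebSphere/AppServer/profiles/Portal01/logs/WebSphere_Portal"
    for name in ("native_stdout.log", "native_stderr.log"):
        if truncate_if_larger(os.path.join(log_dir, name), FOUR_GB // 2):
            print("truncated", name)
```

Run from cron or a similar scheduler, a check like this keeps the files well below the 4 GB mark.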

Use disk usage (du) command to determine huge directory size

After checking all of the above you might still run into unknown large files such as application logs, core dumps, etc. To find them, run this command: ls -d * | xargs -i bash -c 'echo {};du -ch $PWD/{}/| grep total' | grep G -B 1 which lists the directories and files whose sizes are in the GB range. Based on the results, you can discuss them with the responsible team and delete whatever is not needed.
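If the shell pipeline is awkward on your platform, the same idea can be sketched in Python. This is an approximation, not a replacement for du: it sums apparent file sizes rather than allocated disk blocks, and the names dir_size and large_entries and the 1 GB default threshold are assumptions for illustration.

```python
import os


def dir_size(path):
    """Sum the sizes of all regular files below path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except OSError:
                # File vanished or is unreadable; ignore it.
                pass
    return total


def large_entries(parent, threshold_bytes=2**30):
    """Return (name, size) for entries in parent at or above threshold_bytes."""
    results = []
    for name in sorted(os.listdir(parent)):
        path = os.path.join(parent, name)
        if os.path.islink(path):
            continue
        size = dir_size(path) if os.path.isdir(path) else os.path.getsize(path)
        if size >= threshold_bytes:
            results.append((name, size))
    return results


if __name__ == "__main__":
    for name, size in large_entries(os.getcwd()):
        print(f"{size / 2**30:6.1f} GB  {name}")
```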


I will update this list when I find more.

Intermittent slowness in WebSphere Portal server Java process

In a 2-node WebSphere Portal cluster with WCM installed and enabled, you might see performance degrade on one of the nodes at random, with more resources such as CPU and IO used on one node than on the other. The affected node might keep changing. On further investigation we found that the root cause was the JCR Search Indexer: it runs on whichever node is started first, which is why the performance problem appears to hit a random node. We did not get a definitive answer from IBM, so we went ahead and turned off the JCR Search Indexer, and that fixed the issue. We were not using the search index on our production server anyway.

Steps to disable JCR Search Indexer.
  • Go to [WPS_ROOT]\jcr\lib\com\ibm\icm\icm.properties
  • Set ‘jcr.textsearch.enabled’ to false
  • You will need to restart the portal server for the changes to take effect.
  • Repeat the steps in each node.
Note that this setting won't affect Portal Search, which is separate from the Web Content Management AuthorTime Search and PDM Search, where the JCR index is used.
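The property change in the steps above can be scripted if you have several nodes. Below is a minimal Python sketch; the helper name set_property is hypothetical, and it only handles plain key=value lines, not the full Java properties syntax (continuation lines, ':' separators). Back up icm.properties before rewriting it.

```python
def set_property(text, key, value):
    """Return properties text with key set to value, appending it if absent."""
    out = []
    replaced = False
    for line in text.splitlines():
        stripped = line.strip()
        # Skip comment lines, which start with '#' or '!'.
        if not stripped.startswith(("#", "!")) and "=" in stripped:
            name = stripped.split("=", 1)[0].strip()
            if name == key:
                out.append(f"{key}={value}")
                replaced = True
                continue
        out.append(line)
    if not replaced:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "# JCR settings\njcr.textsearch.enabled=true\n"
    print(set_property(sample, "jcr.textsearch.enabled", "false"), end="")
```

To apply it, read icm.properties into a string, pass it through set_property, write the result back, and restart the portal server on that node.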

Beware of the login attribute length limitation in WebSphere Portal

Beware of the 32-character limit on the login attribute in WebSphere Portal 6.0 when using WCM and when connecting to the Java Content Repository (JCR), which is nothing but PDM (Portal Document Manager). When choosing a login attribute ( uid, cn, samAccountName, email ) for your portal application, you should select or enforce the attribute so that it is never more than 32 characters long. This really makes life difficult, because you can't even use a plain email address as the login attribute: some valid email addresses easily exceed 32 characters. For example, the world's richest man, Warren Buffet, could not even register on your website as warren.buffet@berkshirehathaway.com. The good news is that the newer WebSphere Portal 6.1 release supports up to 175 characters.
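A quick check like the following Python sketch can flag candidate login values before you commit to an attribute. The 32 and 175 limits are the ones quoted in this post; the function name logins_over_limit is hypothetical, and the sketch assumes the limit is counted in characters.

```python
PORTAL_60_LOGIN_LIMIT = 32   # limit described in this post for Portal 6.0
PORTAL_61_LOGIN_LIMIT = 175  # limit described in this post for Portal 6.1


def logins_over_limit(logins, limit=PORTAL_60_LOGIN_LIMIT):
    """Return the login values that are longer than limit characters."""
    return [login for login in logins if len(login) > limit]


if __name__ == "__main__":
    sample = ["jdoe", "warren.buffet@berkshirehathaway.com"]
    for login in logins_over_limit(sample):
        print(f"{login} ({len(login)} characters) exceeds the limit")
```

Running it against an export of the chosen attribute from your user registry shows how many existing users would be affected.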

One symptom you might notice is that users whose login ID is longer than 32 characters complain about missing images or documents that a web page references from PDM or WCM. You might also see the following exception in the logs during their login.


Error logging in: com.ibm.content.exception.LoginException:
javax.jcr.LoginException: Unable to establish session with DB2® Content
Manger Runtime Edition for User

Monday, June 23, 2008

Live Popularity Of Enterprise Portal Server Market by Google Keyword Search using Yahoo Pipes

I just came across an interesting mashup tool called Yahoo Pipes, an interactive feed aggregator and manipulator. I wanted to use Pipes to build something relevant to my WebSphere blog, and decided to build a pipe that plots an enterprise portal popularity graph based on the total number of real-time Google search results returned for each of the popular portal server names: WebSphere Portal, Weblogic Portal, Sun One, Sharepoint, Jboss and Oracle. You can see the source of the pipe here: http://pipes.yahoo.com/pipes/pipe.edit?_id=269ed254b3df0a7426e34187c98c0d17 The results also more or less confirmed the news that IBM WebSphere Portal is the leader in the enterprise portal market.

If you do not see the graph below, please click here. Google search seems to block Yahoo Pipes for making repeated calls, so you will instead see a snapshot of the graph taken on July 13th 2008.



Sunday, June 15, 2008

IBM HTTP Server child process core dumps

IBM HTTP Server child processes may core dump and fail to serve some or all requests. You might see the following Segmentation fault errors in the server's error.log.

[Sun Feb 10 19:26:52 2008] [notice] IBM_HTTP_Server/6.1.0.9 Apache/2.0.47 (Unix) configured -- resuming normal operations
[Sun Feb 10 19:26:52 2008] [notice] CoreDumpDirectory not set; core dumps may not be written for child process crashes
[Sun Feb 10 19:27:20 2008] [notice] child pid 7755 exit signal Segmentation fault (11)
[Sun Feb 10 19:27:21 2008] [notice] child pid 7756 exit signal Segmentation fault (11)
[Sun Feb 10 19:27:23 2008] [notice] child pid 7758 exit signal Segmentation fault (11)
[Sun Feb 10 19:27:25 2008] [notice] child pid 7759 exit signal Segmentation fault (11)
[Sun Feb 10 19:27:26 2008] [notice] child pid 7760 exit signal Segmentation fault (11)
[Sun Feb 10 19:27:27 2008] [notice] child pid 7761 exit signal Segmentation fault (11)
[Sun Feb 10 19:27:37 2008] [notice] child pid 7762 exit signal Segmentation fault (11)
[Sun Feb 10 19:29:36 2008] [notice] child pid 7763 exit signal Segmentation fault (11)
[Sun Feb 10 19:29:49 2008] [notice] child pid 7771 exit signal Segmentation fault (11)
[Sun Feb 10 19:30:21 2008] [notice] child pid 7772 exit signal Segmentation fault (11)
[Sun Feb 10 19:30:27 2008] [notice] child pid 7773 exit signal Segmentation fault (11)
[Sun Feb 10 20:12:12 2008] [notice] child pid 8056 exit signal Segmentation fault (11)

In our case the problem appears to have been that we had mistakenly set ResponseChunkSize far too high, to 400000. This value is a number of 1024-byte pages, so it is multiplied by 1024, and the result exceeded an acceptable value. Setting it to a lower value of 4000 seems to have resolved the issue. It took several days of working with IBM before we figured this out. If your case is not related to ResponseChunkSize, the following steps might help you debug the issue.
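To make the arithmetic concrete, here is a small Python sketch of the multiplication described above. The helper name chunk_bytes is hypothetical, and the post does not state what the acceptable upper bound actually is, so the sketch only shows the resulting sizes.

```python
def chunk_bytes(response_chunk_size):
    """Convert a ResponseChunkSize value (1024-byte pages) to bytes."""
    return response_chunk_size * 1024


if __name__ == "__main__":
    for value in (400000, 4000):
        size = chunk_bytes(value)
        print(f"ResponseChunkSize={value}: {size} bytes ({size / 2**20:.1f} MB)")
```

With 400000 the plug-in is being asked for about 409.6 million bytes per chunk, against roughly 4.1 million bytes for 4000.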

Please do the following:

1. To set up system to obtain cores:

http://publib.boulder.ibm.com/httpserv/ihsdiag/coredumps.html

2. Make sure you have latest IHSDIAG for debugging IHS issues.

http://www-1.ibm.com/support/docview.wss?uid=swg24008409

3. Run IHSDIAG against core that is obtained:

http://publib.boulder.ibm.com/httpserv/ihsdiag/gather_crash_doc.html

4. Run IHSDIAG against good system to get system information:

http://publib.boulder.ibm.com/httpserv/ihsdiag/describeconfig.html

Saturday, June 14, 2008

Finding websites running WebSphere Portal

I used to wonder which websites or companies are using IBM WebSphere Portal Server. The easiest way to find them is to do a Google search on the keyword /wps/portal, which is nothing but the default context root, or base URL, for WebSphere Portal. I was surprised to see so many websites, ranging from government sites of different countries to banks, retail, telecom, etc. The search should return all websites running WebSphere Portal with the default context root /wps/portal, unless they have changed it using the modify-servlet-path task, which I believe at least 80%-90% of installations have not done.

Sunday, June 8, 2008

OutOfMemory errors with WebSphere Portal

The WebSphere Portal server might throw OutOfMemory (OOM) errors and become hung and unresponsive when serving or downloading large documents through WCM (IBM Workplace Web Content Management) that are stored in PDM (Portal Document Manager). You might see the following error in SystemErr.log, and the JVM might have produced a heapdump:



[5/14/08 14:35:32:984 PDT] 00000086 ModuleManager E IWKCT1382X: Major exception caught: com.presence.connect.business.module.ErrorExecutingRequestException: IWKMU1062X: Message: IWKCT1366X: Exception caught servicing a Servlet request for PDMProxy, Cause: java.lang.OutOfMemoryError
[5/14/08 14:35:32:985 PDT] 00000086 ModuleManager E IWKCT1383X: Unexpected error from Module. Details of cause to follow:
[5/14/08 14:35:32:986 PDT] 00000086 ModuleManager E
java.lang.OutOfMemoryError
at com.presence.connect.connector.content.ContentAPIConnection.getBytes(ContentAPIConnection.java(Compiled Code))
at com.presence.connect.connector.content.ContentAPIConnection.getItemContents(ContentAPIConnection.java:516)
at com.presence.connect.connector.content.ContentAPIConnector.getItemContents(ContentAPIConnector.java:216)
at com.aptrix.pluto.resource.PDMResourceUtils.createPDMMime(PDMResourceUtils.java:632)
at com.aptrix.pluto.resource.PDMResourceUtils.getPDMData(PDMResourceUtils.java:419)
at com.aptrix.pluto.resource.PDMResourceServerModule.getResourceMime(PDMResourceServerModule.java:968)
at com.aptrix.pluto.resource.PDMResourceServerModule.buildResponse(PDMResourceServerModule.java:516)
at com.aptrix.pluto.resource.PDMResourceServerModule.retrieveResource(PDMResourceServerModule.java:355)
at com.aptrix.pluto.resource.PDMResourceServerModule.process(PDMResourceServerModule.java:276)
at com.presence.connect.business.module.ModuleManager.launchBusiness(ModuleManager.java:121)
at com.presence.connect.business.module.ModuleManager.launchBusiness(ModuleManager.java:384)
at com.presence.connect.RequestExecutable.execute(RequestExecutable.java:84)
at com.presence.connect.dispatcher.Task.run(Task.java:151)
at com.presence.connect.ConnectClient.processSynchronous(ConnectClient.java:167)
at com.presence.connect.ConnectServlet.process(ConnectServlet.java:298)
at com.presence.connect.ConnectServlet.doGet(ConnectServlet.java:120)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:743)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:856)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1572)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java(Compiled Code))


To resolve the issue, apply the IBM-recommended ifix PK66778 and tune the JVM heap size, xloratio, pCluster and kCluster based on your application's needs, expected load and target performance. The IBM Monitoring and Diagnostic Tools for Java - Garbage Collection and Memory Visualizer (GCMV), part of the IBM Support Assistant tools, can help you tune these parameters based on the garbage collection log history.