Problem with SUN Grid Engine

From: Daniel Wetzler (Daniel.Wetzler_at_uni-koeln.de)
Date: 01/28/04


Date: Wed, 28 Jan 2004 15:09:33 +0100

Hello,

Unfortunately, I also have a problem with Sun Grid Engine:

I am trying to work on an 11-node cluster using SGE.
All nodes have access to an NFS-shared directory containing all
files used by SGE.
My problem is that all nodes are displayed in error mode in
qmon, but I don't know how to get the error message.
I don't get any error messages after stopping and starting the
SGE daemons on the computers manually.

Does anyone know how to find out why a node is displayed
in error mode?

The other thing is that, even when idle, the SGE daemons
on the nodes seem to hang (qmon displays the queue belonging to
the node in red), and I have to restart the SGE daemons on the machine
manually. After that, the node is displayed as normal again.
Does anyone know what could cause the SGE daemon hangups?

Greetings,

Daniel Wetzler