MPI/poe problems on p690/AIX 5.2

From: Michael E. Thomadakis (miket_at_hellas.tamu.edu)
Date: 03/25/05


Date: Fri, 25 Mar 2005 14:51:53 -0600

Hello all,

After upgrading to the latest POE/PPE environment on a p690 running AIX 5.2,
we noticed that MPI jobs no longer start.

-------------------
We have:

% lslpp -l '*poe*'
  Fileset                      Level  State      Description
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  ppe.poe                    4.1.1.6  APPLIED    poe Parallel Operating
                                                 Environment
Path: /etc/objrepos
  ppe.poe                    4.1.1.6  APPLIED    poe Parallel Operating
                                                 Environment

-------------------
Note that we are not using LoadLeveler or any other resource manager.

-------------------
For instance:

% mpiexec -n 4 ./mpi
ERROR: 0031-024 agave.tamu.edu: no response; rc = -1

and a log file, '/tmp/mplog.913612', is created containing:

% cat /tmp/mplog.913612
AIX Parallel Environment pmd4 version @(#) 2003/06/11 13:19:38
The ID of this process is 913612
The version of this pmd for version checking is 4100
The hostname of this node is agave
The short hostname of this node is agave
Fri Mar 25 14:46:48 2005
 ERROR: 0031-203 malformed from address: <Error 0>
pmd_exit reached!, exit code is 1
No collective communication shared memory segments to clean up.
-------------------

Has anyone had a similar experience with POE not working on AIX 5.2 after
applying upgrades to the latest levels? Or is some special handling now
necessary for POE to work?

Thanks a lot!
Michael


