Re: Problems with SCO 5.0.6 and Informix 7.12 (long)

From: Bela Lubkin (belal_at_sco.com)
Date: 01/28/05

  • Next message: John Smith: "Can't compile with OSS646B on SCO 5.0.4"
    Date: 28 Jan 2005 12:59:12 -0500
    
    

    Roberto Zini wrote:

    > I'd like to share with the groups a problem a customer of ours is facing
    > on a 5.0.6 box with an old Informix 7.12 database (operating on a RAW
    > partition).

    > SCO OS 5.0.6 + SMP + RS506A + OSS648A + OSS651A + OSS644B + OSS650A.
    >
    > The box is an IBM Server xSeries 255 (8685-C1X) with an IBM CONTROLLER
    > ServeRAID-6M (drivers v 7.10); the disks are arranged in a RAID-5 array
    > (HW driven).
    >
    > This box is a dual Xeon 3.2 GHz box with 4GB of RAM.
    >
    > This server hosts approx 350 users with an account package written in 4GL.
    >
    > Approx once or twice a day, the database server (namely, a couple of
    > instances of "oninit") ends up taking approx 99% of the CPU, forcing
    > the administrator to manually "kill" (-9) these processes and issue an
    > "oninit" to get the engine back to work.
    >
    > One way to trigger the hang is the parallel execution of a HUGE query
    > (which handles several MILLION records); as soon as the operation
    > starts, "sar -U" reports a high %wio value but the system does not slow
    > down too much. Approx 30 to 40 minutes later, the HD lights (which were
    > pretty "wild" during the query) return to a more normal flashing and the
    > above oninit instances are frozen.

    > For those of you who like reading on, here are some excerpts from the
    > "mpsar -U" command (along with my comments); the massive query started
    > at approx 15:00 and the oninit processes hung at approx 15:55.
    >
    > 08:00:01  %usr  %sys  %wio  %idle  (-u)
    > 14:40:00     9     5     9     77
    > 14:45:00    13     6    50     31
    > 14:50:00    16     6    67     11
    > 14:55:00    26     8    61      5
    > 15:00:00    26     8    60      6
    > 15:05:00    26     8    62      5
    > 15:10:00    30     7    60      2
    > 15:15:00    20     7    70      4
    > 15:20:00    20     7    70      3
    > 15:25:00    18     7    71      3
    > 15:30:00    20     8    68      4
    > 15:35:00    18     7    71      4
    > 15:40:00    68     3    28      1
    > 15:45:00    99     1     0      0
    > 15:50:00    99     1     0      0
    > 15:55:00    98     2     0      0
    > 16:00:00    98     1     0      1

    > The oninit processes got complete control over the CPU utilization; the
    > high %usr value makes me think about a looping (bug) condition of the
    > engine.

    Looks like that.

    > 08:00:01  msg/s   sema/s  (-m)
    > 14:40:00   0.00  2563.73
    > 14:45:00   0.00  4192.57
    > 14:50:00   0.00  5048.27
    > 14:55:00   0.00  7597.28
    > 15:00:00   0.00  7247.46
    > 15:05:00   0.00  7148.11
    > 15:10:00   0.00  5934.86
    > 15:15:00   0.00  5720.28
    > 15:20:00   0.00  5669.58
    > 15:25:00   0.00  4660.03
    > 15:30:00   0.00  5560.06
    > 15:35:00   0.00  4718.81
    > 15:40:00   0.00  2091.79
    > 15:45:00   0.00    12.82
    > 15:50:00   0.00    12.79
    > 15:55:00   0.00    13.10
    > 16:00:00   0.00    12.67

    We can see here that the database engine normally does a lot of
    semaphore activity, but once it gets into this bad state, it stops.
    It's looping in core.

    > 08:00:01  scall/s  sread/s  swrit/s   fork/s   exec/s  rchar/s  wchar/s  (-c)
    > 14:40:00    47760    14680      333     6.73     7.87  8205354    87337
    > 14:45:00    46093    13535      322     6.06     7.05  8468079    99356
    > 14:50:00    32606     8562      321     4.07     5.07  5434781    91777
    > 14:55:00    37005     8743      519     3.63     4.57  5372965   174227
    > 15:00:00    39274     9617      524     4.26     5.30  5978026   167135
    > 15:05:00    37010     8952      485     3.54     4.65  5916037   152972
    > 15:10:00    40199    10591      444     3.64     4.69  6629425   147242
    > 15:15:00    34367     8692      413     3.19     4.12  5436009   121322
    > 15:20:00    42047    11255      429     5.18     6.30  6784568   157546
    > 15:25:00    42976    12003      390     4.14     4.75  7369400   148823
    > 15:30:00    55452    15765      390     4.37     5.06  9125374   133925
    > 15:35:00    45569    12885      335     3.23     3.73  7570264   100898
    > 15:40:00    14034     3503      248     5.50     5.66  2033284    80457
    > 15:45:00     3519      823      260     6.73     6.45   342912    82584
    > 15:50:00     3104      572      219     6.05     5.78   246789    59167
    > 15:55:00     4157      856      406     9.71     9.28   433506   187387
    > 16:00:00     2938      577      240     6.42     6.12   315012    97352

    Likewise, when it's distracted by this spin, it stops doing so many
    system calls.

    It definitely looks like a spin inside the database user process.
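
For anyone who wants to spot the same signature in their own sar logs, here is a minimal Python sketch. It is only an illustration: the rows are copied from the quoted excerpts above, and the thresholds (`USR_MIN`, `SEMA_MAX`) and the function name `spinning_intervals` are assumptions made to fit this one data set, not anything sar or Informix defines.

```python
# Rows copied from the quoted "mpsar" excerpts: (time, %usr, %wio, sema/s).
SAMPLES = [
    ("15:30:00", 20, 68, 5560.06),
    ("15:35:00", 18, 71, 4718.81),
    ("15:40:00", 68, 28, 2091.79),
    ("15:45:00", 99, 0, 12.82),
    ("15:50:00", 99, 0, 12.79),
    ("15:55:00", 98, 0, 13.10),
    ("16:00:00", 98, 0, 12.67),
]

# Hypothetical thresholds, chosen only to fit this data set.
USR_MIN = 95      # percent of CPU time spent in user mode
SEMA_MAX = 100.0  # semaphore operations per second


def spinning_intervals(samples, usr_min=USR_MIN, sema_max=SEMA_MAX):
    """Return the timestamps whose sample looks like a user-mode spin:
    CPU pinned in %usr, no I/O wait, and semaphore traffic collapsed."""
    return [
        t for (t, usr, wio, sema) in samples
        if usr >= usr_min and wio == 0 and sema <= sema_max
    ]


if __name__ == "__main__":
    hits = spinning_intervals(SAMPLES)
    print("first spinning sample:", hits[0] if hits else "none")
```

With these thresholds the 15:40 sample is not flagged, since it still shows I/O wait and a fair amount of semaphore activity; the flagged samples begin at 15:45.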

    Both `truss` and `trace` can attach to a running process, to show you
    the system calls it is doing. Try those. It's likely that they'll show
    no calls being made.

    `dbx` can also attach to a running process; if you tell it to step by
    instruction, you can watch the process spin for a while. If you see it
    looping through the same instructions repeatedly, that's a hint. You
    won't be able to make too much sense of it without the source, but a
    loop is a loop...
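
If you save the instruction addresses while single-stepping, a few lines of Python can tell you whether the tail of the trace is one block repeating. This is a rough sketch only: the addresses below are made up, getting them out of `dbx` is left to you, and `shortest_cycle` is a hypothetical helper name.

```python
def shortest_cycle(addresses, min_repeats=3):
    """Return the shortest block of addresses that, repeated min_repeats
    times back to back, forms the end of the trace; None if there is none."""
    n = len(addresses)
    for period in range(1, n // min_repeats + 1):
        tail = addresses[n - period * min_repeats:]
        block = tail[:period]
        # The tail is a cycle if every entry matches the block at its offset.
        if all(tail[i] == block[i % period] for i in range(len(tail))):
            return block
    return None


if __name__ == "__main__":
    # Made-up program-counter values: two lead-in instructions, then a
    # three-instruction loop executed three times.
    trace = [0x1000, 0x1004] + [0x2000, 0x2004, 0x2008] * 3
    cycle = shortest_cycle(trace)
    print([hex(a) for a in cycle] if cycle else "no repeating block found")
```

The block that comes back starts wherever the trace happened to line up, so it may be a rotation of the real loop body. A long loop also needs a correspondingly long trace before it shows up as a repeat.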

    >Bela<

