Re: More hardware problems (advice needed)

From: Bill Moran (wmoran_at_potentialtech.com)
Date: 07/06/03

    Date: Sun, 06 Jul 2003 14:10:21 -0400
    To: Adam <blueeskimo@gmx.net>
    
    

    Adam wrote:
    > My main FreeBSD (4.8) box has died on me again, and I'm 99% certain it's
    > due to hardware failure. However, I'm having a very hard time
    > determining what hardware is going bad, due to the nature of the crash.
    >
    > Let me describe the scenario.
    >
    > I was working on the machine, not doing anything out of the ordinary.
    > All of a sudden, my mouse stopped responding. I thought maybe moused had
    > crashed, so I did 'ps -aux |fgrep moused'. This caused ps to segfault,
    > which caused me to nearly soil myself. So, I decided to quickly kill all
    > my apps and exit X so I could reboot. When I closed X, I noticed a lot
    > of errors on my console about dc0 (my Linksys NIC interface, external)
    > having underruns, and that ad2 was timed out. I also noticed that my LAN
    > connection to my other box was dead. I tried to reboot, and all went
    > well until it got to the 'Rebooting...', at which point it hung. I
    > waited for 10+ minutes, thinking it might eventually reboot, but it was
    > stuck, so I turned it off.
    >
    > When I powered back up, I got tons of errors that the kernel couldn't be
    > loaded, and I couldn't even get into single-user mode. So, I made a
    > fixit floppy and fired up the fixit shell, and started poking around to
    > see what happened. I was able to mount ad3 and ad2 just fine, but
    > mounting ad0 caused fixit to panic and the machine to reboot.
    >
    > So, this is where I am now. For those of you that remember, I had
    > another crash & burn experience on that machine a couple months ago,
    > where the machine just suddenly froze completely and my ad0 was trashed
    > when I booted back up. That time, I didn't have backups. This time, I do.
    > But, before I work on that computer again, I think I need to replace
    > some hardware.
    >
    > I've heard pretty good arguments for both the ad0 drive (Western Digital
    > 120 GB, 2 MB cache), and for the motherboard/CPU (Asus A7V266-E, Athlon
    > 1600+). I used memtest86 to test the RAM, which came up clean.
    >
    > I doubt it's a power problem, since I've got a very nice case (Antec
    > 1080, 400+ watts). Also, I've got another machine in my apartment that
    > hasn't experienced any weird problems like this.
    >
    > The CPU might be overheating, but it's hard to tell. Roughly 5 minutes
    > after the crash, I checked the CPU temperature from the BIOS, which
    > registered 63C for the CPU. I have no idea how hot the CPU was at the
    > time of the crash, but it definitely had to have cooled off a bit in
    > those 5 minutes.

    Sounds like a HDD going ... I had a similar scenario a few months ago
    and it was the HDD.
    You could get a FreeSBIE CD, boot it and run cpuburn to test the CPU.

    > I don't have enough $$ to replace all the hardware, so I'd like some
    > expert advice as to what is the most likely culprit. I don't know if
    > I'll be able to convince any of Asus, AMD, or Western Digital to give me
    > an RMA number, but I can try (also would like some advice on this to
    > maximize my chances).

    -- 
    Bill Moran
    Potential Technologies
    http://www.potentialtech.com
    
