Re: challenge: end of life for 6.2 is premature with buggy 6.3




On Thu, 2008-06-05 at 11:14 -0500, Paul Schmehl wrote:
--On Thursday, June 05, 2008 10:23:55 -0400 John Baldwin <jhb@xxxxxxxxxxx>
wrote:

FWIW, at Y! 6.3 is more stable than 6.2 (I had a list of about 10 patches for
known deadlocks and kernel panics that were errata candidates for 6.2 that
never made it into RELENG_6_2 but all of them are in 6.3). We also have many
machines with bge(4) and from our perspective 6.3 has fewer issues with bge0
devices than 6.2.


I'm glad to hear that. I have a server that uses bce, and it was completely
non-functional until I hunted down some beta code that made it usable. I'd
like to upgrade, but this is a critical server with no redundancy (and it's a
hobby site with no money to pay for expensive support), and I'm not about to
upgrade unless I know for certain the problems won't reoccur, because I have to
upgrade remotely and pay money if the system goes down.

The problems with that driver were bad enough when the server was being
configured in my study. (The system would lock up, and only a hard reboot
would restore networking.) It would be hell trying to troubleshoot problems if
I had to drive the 45 miles to the hosting site and spend a night there trying
to get the server back up, then go to work the next day.

# uname -a
FreeBSD www.stovebolt.com 6.1-RELEASE-p10 FreeBSD 6.1-RELEASE-p10 #2: Mon Oct
16 15:38:02 CDT 2006 root@xxxxxxxxxxxxxxxxx:/usr/obj/usr/src/sys/GENERIC
i386

# grep bce /var/run/dmesg.boot
bce0: <Broadcom NetXtreme II BCM5708 1000Base-T (B1), v0.9.6> mem
0xf4000000-0xf5ffffff irq 16 at device 0.0 on pci9
bce0: ASIC ID 0x57081010; Revision (B1); PCI-X 64-bit 133MHz
miibus0: <MII bus> on bce0
bce0: Ethernet address: 00:13:72:fb:2a:ad
bce1: <Broadcom NetXtreme II BCM5708 1000Base-T (B1), v0.9.6> mem
0xf8000000-0xf9ffffff irq 16 at device 0.0 on pci5
bce1: ASIC ID 0x57081010; Revision (B1); PCI-X 64-bit 133MHz
miibus1: <MII bus> on bce1
bce1: Ethernet address: 00:13:72:fb:2a:ab

# grep bce0 /var/log/messages
May 2 09:10:31 www kernel: bce0: link state changed to DOWN
May 2 09:10:39 www kernel: bce0: link state changed to UP
May 25 07:49:49 www kernel: bce0: link state changed to DOWN
May 25 07:50:31 www kernel: bce0: link state changed to UP
May 26 21:28:36 www kernel: bce0: link state changed to DOWN
May 26 21:28:40 www kernel: bce0: link state changed to UP
May 27 13:13:21 www kernel: bce0: link state changed to DOWN
May 27 13:13:31 www kernel: bce0: link state changed to UP

It's been like that since the server was installed.
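
To put numbers on the flapping shown above, the DOWN/UP pairs in the log can be
turned into outage durations. What follows is only a rough sketch, not anything
from the original thread: the function name link_outages, the regular
expression, and the year passed to the parser are all assumptions made for
illustration (syslog lines carry no year, so one has to be supplied), and it
assumes an English/C locale for the month names.

```python
import re
from datetime import datetime

# Matches syslog lines such as:
#   May 2 09:10:31 www kernel: bce0: link state changed to DOWN
LINK_RE = re.compile(
    r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(\w+):\s+link state changed to (UP|DOWN)\s*$"
)


def link_outages(lines, iface="bce0", year=2008):
    """Return (down_time, seconds_down) for each DOWN that is followed by an UP.

    `year` is an assumption: syslog timestamps omit it.
    """
    outages = []
    down_at = None
    for line in lines:
        m = LINK_RE.match(line)
        if not m or m.group(2) != iface:
            continue
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        if m.group(3) == "DOWN":
            down_at = stamp
        elif down_at is not None:
            # An UP event closes the outage opened by the last DOWN.
            outages.append((down_at, int((stamp - down_at).total_seconds())))
            down_at = None
    return outages


# The log lines quoted above, used as sample input.
SAMPLE = [
    "May 2 09:10:31 www kernel: bce0: link state changed to DOWN",
    "May 2 09:10:39 www kernel: bce0: link state changed to UP",
    "May 25 07:49:49 www kernel: bce0: link state changed to DOWN",
    "May 25 07:50:31 www kernel: bce0: link state changed to UP",
    "May 26 21:28:36 www kernel: bce0: link state changed to DOWN",
    "May 26 21:28:40 www kernel: bce0: link state changed to UP",
    "May 27 13:13:21 www kernel: bce0: link state changed to DOWN",
    "May 27 13:13:31 www kernel: bce0: link state changed to UP",
]

if __name__ == "__main__":
    for down_time, seconds in link_outages(SAMPLE):
        print(f"{down_time:%b %d %H:%M:%S}  link down for {seconds}s")
```

On the eight lines quoted above this gives four outages of 8, 42, 4 and 10
seconds. Feeding it the whole of /var/log/messages before and after an upgrade
on a test box would give a crude like-for-like comparison of how often, and for
how long, the link drops.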

So, if I upgrade to 6.3 or 7.0, am I still going to experience these problems?
Is the server going to stop working entirely? How can I know that for sure
before starting an upgrade?

I ask because I have a 7.0 STABLE workstation (I'm sending this email from it) with
a serious problem with umass, and no fix seems to be forthcoming. On a
workstation, I can work around problems. On a critical server, not so much.

Look, I know this is open source, all volunteer (hell, I'm a port maintainer
myself) and guys' time is extremely valuable (whose isn't?), but it seems to me
there needs to be better communication between the folks who know the code and
those who only run boxes. You might be able to read diffs and say, "Aha,
they've fixed the problem", but I can't. I don't know, if I upgrade to 6.3, if
the server will stop passing packets or not. And I can't take the chance that
it will.

Saying put up or shut up isn't going to win many friends. I can't use the
server for testing. It's a website with 5 to 7 million hits per month.

Mind you, I haven't complained about this and I'm not complaining now. I'm
simply saying it would be more productive if folks *listened* to what people
say about a particular problem and gave it some thought before firing salvos at
the "complainers" and demanding that they contribute to solving the problem
somehow.

--
Paul Schmehl

I think that, especially with open source products, there is a large
emphasis on testing in your own environments, and choosing the 'correct'
version of a particular software package is important. For example, at
$JOB, we had a lot of servers running 6.1 as it was an extended lifetime
release, so there was no point jumping to 6.2; instead we waited for 6.3
to pass our integration testing.

We usually buy the same chassis for all our servers, and test
extensively before deploying to a new chassis/OS/anything. This is the
definition of change management, which is expensive, takes lots of time
and planning, and doesn't guarantee zaroo bugs - just a high likelihood
of not hitting them. It also isn't smooth: when we tested 6.1, we found
a multitude of bugs in bce(4), which we worked with net@ and David
Christensen of Broadcom to get fixed (they work lovely now :).

If you don't want to do this sort of work, then yes, things may fail
unexpectedly (sort of unexpectedly; I would consider things failing
after no testing at all to be a logical consequence). It is
usually up to $BOSS to decide if you want to invest the resources (time,
people and money) locally to do change management planning, or outsource
your support to a 3rd party, or ignore the problem (this is a polite way
of saying 'put up or shut up', in my mind.)

If you just have <5 servers, the expense of getting duplicates for
testing, change management and release management may be too much to
handle. If you've got >100 servers, you really have no excuses not to do
CM; in my mind, not doing it is reckless.

This isn't a cost specific to OSS; we also do this for Windows updates,
service packs, etc.

Tom
