OT: Linux Kernel: coupling and maintainability

From: Mikko Putkonen (miputkon_at_paju.oulu.fi)
Date: 03/31/05


Date: Thu, 31 Mar 2005 18:30:25 +0000 (UTC)

Hello, c.o.v.

Any comments on this one?

---------------------------------------------------------------------

[http://www.groklaw.net/article.php?story=2005033107583993]

*Coupling and the Maintainability of the Linux Kernel
~ by Dr Stupid*

A recently presented paper
<http://csdl.computer.org/comp/trans/ts/2004/10/e0694abs.htm> has the
following abstract, something that would certainly gain the attention
of anyone interested in Linux kernel development:

    *Categorization of Common Coupling and Its Application to the
    Maintainability of the Linux Kernel*

    Data coupling between modules, especially common coupling, has
    long been considered a source of concern in software design,
    but the issue is somewhat more complicated for products that
    are comprised of kernel modules together with optional
    nonkernel modules. This paper presents a refined categorization
    of common coupling based on definitions and uses between kernel
    and nonkernel modules and applies the categorization to a case
    study.

    Common coupling is usually avoided when possible because of
    the potential for introducing risky dependencies among software
    modules. The relative risk of these dependencies is strongly
    related to the specific definition-use relationships. In a
    previous paper, we presented results from a longitudinal
    analysis of multiple versions of the open-source operating
    system Linux. This paper applies the new common coupling
    categorization to version 2.4.20 of Linux, counting the number
    of instances of common coupling between each of the 26 kernel
    modules and all the other nonkernel modules. We also categorize
    each coupling in terms of the definition-use relationships.
    Results show that the Linux kernel contains a large number of
    common couplings of all types, raising a concern about the
    long-term maintainability of Linux.

To anyone with a knowledge of software engineering terminology,
whether gained through formal education or from the University of
Life, the first 90% of the abstract is uneventful; this, though,
serves to maximize the impact of the final sentence. A "concern about
the long-term maintainability of Linux," no less. Mr A. Linux Kernel
went to the effort of writing that reports of his destruction had
been exaggerated
<http://www.groklaw.net/article.php?story=20050225155855922>, but now,
in this paper, we find rumours of a life-threatening illness.

The full paper is only available to subscribers, but we were fortunate
to be able to discuss the paper with Andrew Morton, one of the lead
kernel developers, in two contexts: first, in a general discussion
about coupling and kernel maintainability, and then, after he had read
the complete paper, in specific terms related to the thoughts expressed
by the authors. As you will see, despite the worries expressed in the
paper, the Linux kernel is alive and well.

The researchers, in designing a theoretical model to evaluate the
coupling of Linux, have of necessity made certain assumptions to
reduce complexity and make the problem amenable to a mathematical,
quantitative approach. However, this can lead to inaccurate results:
you may recall the possibly apocryphal tale of the mathematical
demonstration that bumblebees can't fly
<http://www.sciencenews.org/articles/20040911/mathtrek.asp>. (As an
aside, there is also a parallel here with studies showing operating
system X to be "more secure" than operating system Y, when on closer
inspection the definition of "more secure" turns out to be a narrow and
potentially misleading, but easy-to-calculate, statistic.)

What is coupling?

"Coupling", a term which uses a visual metaphor of mechanical parts
coupled together by a driveshaft, is used widely in software
engineering to describe a link between two parts of a system that is
not part of an abstracted interface. We make this distinction because
the parts of a system have to be linked in some way -- otherwise there
would be no system. For the benefit of Groklaw's less technical readers,
I'll try to explain the concept in non-software terms (kernel
developers may skip the next few paragraphs).

Imagine that the steering wheel of a car were like the steering wheel
one can buy for playing computer driving games -- that is to say, it
merely generated an electrical signal that said "a little bit left,"
"hard to the right," etc. and that this signal was passed to a device
under the bonnet that turned the front wheels. You could replace the
steering wheel with a similarly wired joystick, or anything that
generated an appropriate electrical signal, and you could still drive
the car. We would call this an abstracted interface. The communication
between the two parts (the steering wheel and the mechanism that turns
the front wheels) has been reduced to its conceptual essence of "I
want to go left" and "I want to go right."
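For the programmers in the audience, here is a minimal C sketch of that kind of abstracted interface. All the names are hypothetical: struct steering_input, wheel_read, joystick_read and front_wheel_angle. The 35-degree maximum lock is also an arbitrary assumption. None of this comes from a real car or from the kernel. The front-wheel mechanism sees only a function that reports "how far left or right", so a wheel and a joystick are interchangeable.

```c
/* Hypothetical abstracted interface: the mechanism that turns the
 * front wheels only ever sees a requested direction in the range
 * -100 (hard left) .. +100 (hard right). */
struct steering_input {
    const char *name;
    int (*read_direction)(void);
};

/* Two interchangeable input devices producing the same kind of signal. */
static int wheel_read(void)    { return -20; }  /* "a little bit left" */
static int joystick_read(void) { return 100; }  /* "hard to the right" */

static const struct steering_input wheel    = { "wheel",    wheel_read };
static const struct steering_input joystick = { "joystick", joystick_read };

/* Depends only on the interface, not on how the signal is produced.
 * Assumes (arbitrarily) a maximum lock of 35 degrees. */
static int front_wheel_angle(const struct steering_input *in)
{
    int dir = in->read_direction();
    if (dir < -100) dir = -100;   /* clamp out-of-range input */
    if (dir > 100)  dir = 100;
    return dir * 35 / 100;
}
```

Swapping `&wheel` for `&joystick` requires no change to front_wheel_angle. That independence is the property the analogy describes.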

In a typical car, though (especially one without power steering) the
steering wheel is directly and mechanically linked to the front wheels.
You could not easily replace the steering wheel with a joystick,
because the whole mechanism depends on the wheel being turned left and
right. Not only is the interface less abstracted, it is also
highly coupled. You can feel bumps and vibrations coming back up from
the wheels on the road. In other words, the coupled interface means
that what happens to one part of the mechanism (going over a rock) has
a knock-on effect on the other (giving you a pain in the wrists) that
wasn't necessarily desired.

Going back to software terms, we would describe modules A and B as
coupled if, to operate properly, A relies on B's internal workings to
be a certain way, and vice versa. Just as a traditional steering wheel
is sensitive to holes in the road, A becomes sensitive to changes
inside B. That introduces a risk that when a bug is fixed in B, it may
cause an unexpected problem in A. It is this knock-on effect that
makes software engineers -- especially when talking theoretically --
wary of coupling. They have devised approaches such as "Model View
Controller" to discipline themselves against thoughtless coupling.
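A small, purely hypothetical C illustration of that risk follows. Suppose module B is a counter whose internal field was once the plain count. Later, B's maintainers change it to store half-units for finer resolution. Module A has two ways to read the count. The coupled reader pokes at B's field directly and silently goes wrong after B changes. The reader that uses B's published function keeps working. The names counter, counter_value, a_coupled_read and a_interface_read are invented for the example.

```c
/* --- module B: a hypothetical counter --- */
struct counter {
    int raw;   /* internal detail: now stored in half-units (count * 2) */
};

static void counter_init(struct counter *c)       { c->raw = 0; }
static void counter_add(struct counter *c, int n) { c->raw += 2 * n; }

/* B's published interface hides the internal representation. */
static int counter_value(const struct counter *c) { return c->raw / 2; }

/* --- module A --- */
/* Coupled: written back when 'raw' held the count directly,
 * so it now returns twice the real value. */
static int a_coupled_read(const struct counter *c)   { return c->raw; }

/* Decoupled: goes through B's interface and is unaffected
 * by the change inside B. */
static int a_interface_read(const struct counter *c) { return counter_value(c); }
```

After five increments, a_interface_read() reports 5 while a_coupled_read() reports 10. A change inside B has caused a problem in A.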

However, I hope that the above example also shows you the other side
of the coin. The high-tech electronic steering wheel was less coupled,
but more complex. There are more elements to go wrong, and a fault may
be harder to find. Also, some drivers would like to "feel the road"
via the steering wheel, and to give this feedback in the electronic
system would require more complex circuitry still. Sometimes, the
costs of eliminating coupling in a system outweigh the gains.

Back to the kernel

The paper focused on data coupling; roughly speaking, this is where
two or more software parts all make direct use of the same area of
computer memory. This can lead to situations where a particular part
can have data changed "behind its back," as it were. The developer has
to bear this in mind when writing the code, which isn't always easy.
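Here is a minimal sketch of data changed "behind its back". It uses one hypothetical global and two hypothetical modules, X and Y, that both write to it. Module X records a value and later relies on it. Module Y, which knows nothing of X's expectations, resets the same variable in the meantime.

```c
/* A hypothetical global shared directly by two "modules". */
static int shared_len;

/* Module X records how many bytes it has buffered... */
static void x_buffer(int n) { shared_len = n; }

/* ...and later relies on that value still being there. */
static int x_bytes_buffered(void) { return shared_len; }

/* Module Y uses the same global for its own bookkeeping. */
static void y_reset(void) { shared_len = 0; }
```

Nothing in x_bytes_buffered() shows that y_reset() can change its answer. The developer of X simply has to know, and that is the burden described above.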

We asked Frank Sorenson to read the paper and here is his comment:

    Too many dependencies between modules can obviously be viewed
    as a bad thing. However, no coupling/dependencies leads to
    multiple copies of the same thing, which is obviously more
    difficult to maintain. For example, the Linux kernel contains
    a library of common functions that may be used in the various
    modules. A month or so ago, someone realized that 6 different
    modules all implemented a 'sort' function, all with the same
    interface to the module. This brought about a push to
    standardize them, and a single 'sort' function was put into the
    common function library.

We've already mentioned that the costs of decoupling aren't always
justified -- this is a case in point. In this instance, increasing
the use of common code -- while increasing the coupling -- reduced the
maintenance requirements.
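To show the shape of that trade-off, here is a sketch of one shared, generic sort routine that several "modules" can call with their own element types. This is not the kernel's implementation. It is a plain insertion sort chosen for brevity, and the names common_sort, cmp_int and cmp_ulong are hypothetical. Every caller now depends on this one function, which is more coupling. In exchange, a bug fixed here is fixed for all of them at once.

```c
#include <stddef.h>

/* Swap two elements of 'size' bytes, byte by byte. */
static void swap_bytes(unsigned char *a, unsigned char *b, size_t size)
{
    while (size--) {
        unsigned char t = *a;
        *a++ = *b;
        *b++ = t;
    }
}

/* One shared, generic sort (insertion sort, for brevity).
 * Callers pass the element count, element size and a comparison
 * function returning <0, 0 or >0. */
static void common_sort(void *base, size_t num, size_t size,
                        int (*cmp)(const void *, const void *))
{
    unsigned char *b = base;
    for (size_t i = 1; i < num; i++)
        for (size_t j = i; j > 0 && cmp(b + (j - 1) * size, b + j * size) > 0; j--)
            swap_bytes(b + (j - 1) * size, b + j * size, size);
}

/* "Module" 1 sorts ints... */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* ...and "module" 2 sorts unsigned longs, using the same routine. */
static int cmp_ulong(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}
```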

Frank continues:

    The article was submitted in July 2003. That's quite a while
    ago in Linux-kernel-time. A lot has changed since then, and
    2.6.x is (in my opinion) more maintainable due to being well-
    engineered from the beginning. Do the authors have results for
    the 2.6.x kernel? How does the use of global variables change
    from 2.4.x to 2.6.x?

    The kernel maintainers have pushed to make sure that the
    interface to kernel functions remains the same. For example,
    it would not be acceptable to change the way a common function
    behaves: copy_value(source, destination) should not ever change
    to copy_value(destination, source) (unless all references are
    fixed)

    Linux modules are generally organized in an hierarchical
    fashion. This makes it much harder for a change in one area to
    affect other modules or portions of the kernel.

    Obviously, what the authors discuss is a very real danger
    (not specifically to Linux, but to any sufficiently large
    project -- such as Longhorn!). The authors don't offer many
    valid suggestions on how to combat the problem. The fact that
    Linux is open allows them to do the research, however; the
    closed nature of Windows prevents people from seeing how
    Microsoft has addressed this problem (if at all.)
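Frank's copy_value example can be made concrete with a short C sketch. Both copy_value and the caller snapshot are hypothetical and not real kernel functions. Every call site bakes in the (source, destination) order, which is why the order cannot simply be flipped without fixing every reference. Marking the source `const` lets the compiler flag some callers that get the order wrong, but not all of them.

```c
/* Hypothetical helper following Frank's example: the argument order
 * is (source, destination), and every caller relies on it. */
static void copy_value(const int *source, int *destination)
{
    *destination = *source;
}

/* A caller elsewhere in the tree. It depends on the (source,
 * destination) order; reversing the parameters without fixing every
 * call site like this one would copy in the wrong direction. */
static int snapshot(const int *live)
{
    int saved = 0;
    copy_value(live, &saved);
    return saved;
}
```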

If Linux is too tightly coupled, how about Windows? Having your entire
user interface dependent on a web browser -- now that's coupling!

My personal opinion is that the 2.6 kernel is much tidier and more organised
than 2.4, which in turn was tidier than 2.2, and so on. The direction of the
Linux kernel is towards a cleaner, less coupled architecture -- there
is an active, ongoing, continuous effort to preserve maintainability.
Indeed, patches are frequently rejected purely on the grounds that they
harm maintainability, and must then be re-engineered accordingly.

Andrew Morton's comments

However, you probably didn't read this far to hear Frank and me
discussing the kernel, when we have Andrew Morton available. Here's
his initial comment on the abstract:

    They examined a kernel (2.4.20) which is unchanged in this
    regard from 2.4.15. We've done three and a half years of
    development since then! That being said, I wouldn't be
    surprised if their analysis showed that linux-2.6.11 also
    has a lot of coupling, even though we have done a lot of
    improvement work in that and other areas.

    But that's OK -- we often do this on purpose, because,
    although we are careful about internal interfaces, the kernel
    is optimized for speed, and when it comes to trading off speed
    against maintenance cost, we will opt for speed. This is
    because the kernel has a truly massive amount of development
    and testing resources. We use it.

    More philosophically, I wouldn't find such a study to be
    directly useful, really. It represents an attempt to predict
    the maintenance cost of a piece of software. But that's not a
    predictor of the quality! If you find that the maintenance cost
    is high, and the quality is also high, then you've just
    discovered that the product has had a large amount of
    development resources poured into it. And that is so. And it
    is increasing.

    If someone wants to use this study to say that "Linux is likely
    to be buggy" then I'd say "OK, so show me the bugs". If they're
    using it to say "Linux kernel maintenance uses a lot of
    resources" then I'd say "Sure. Don't you wish you had such
    resources?".

    Note that I'm not necessarily agreeing with the study. If they
    looked at the kernel core then sure, there's a lot of coupling.
    But that's a relatively small amount of code. If they were
    looking mainly at filesystem drivers and device drivers (the
    bulk of the kernel) then I'd say that the study is incorrect --
    the interfaces into drivers is fairly lean, and is getting
    leaner.

Andrew then went on to read the paper in detail. His subsequent
comments were rather different:

    AAAARRRGGGGHHH! . . .The only thing they've done is look at
    the use of global variables and they've assumed that using a
    global variable is a "bad" coupling. And look at the naughty
    global variables which we've used:

        jiffies: This is a variable which counts clock ticks.
        Of course it's global. Unless they know of a universe
        in which time advances at more than one speed at a time.

        [Dr S: System time has to be global because time is
        universal throughout the system. We don't usually worry
        about Einstein in software development :) ]

    And they fail to note that if we _did_ want to "modularize"
    jiffies, we'd make a change in a single file:

        #define jiffies some_function_which_returns_jiffies()

    Other examples such as system_utsname, init_task,
    panic_timeout, stop_a_enabled, xtime and `current' are all by
    definition singleton objects.

    'current' is especially bogus -- this refers to the task
    structure for the currently-running task. It's not a global
    variable at all, really. If this is bad, then using the
    variable 'this' in C++ is also bad.

    Geeze. Who reviewed this?
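Morton's one-line "modularisation" of jiffies can be tried out in a self-contained C sketch. The name some_function_which_returns_jiffies is his. Everything else is hypothetical: tick_counter stands in for the real clock-tick variable, timer_tick for the timer interrupt, and timed_out for ordinary kernel code that reads jiffies. Once the `#define` is in place, every place that says `jiffies` becomes a function call without being edited.

```c
/* Hypothetical stand-in for the underlying clock-tick counter. */
static unsigned long tick_counter;

/* Morton's point: if 'jiffies' ever had to stop being a plain
 * variable, one #define in one header redirects every use. */
static unsigned long some_function_which_returns_jiffies(void)
{
    return tick_counter;
}
#define jiffies some_function_which_returns_jiffies()

/* Hypothetical timer interrupt: advance the clock by one tick. */
static void timer_tick(void) { tick_counter++; }

/* Existing caller code is untouched: it still just says 'jiffies'.
 * Unsigned subtraction keeps the test correct across counter wrap. */
static int timed_out(unsigned long start, unsigned long timeout)
{
    return jiffies - start >= timeout;
}
```

The only code that would need attention is code that writes to jiffies directly, because a function call is not an assignable variable.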

Theory vs Practice

Of course, one can engage in armchair debate endlessly; ultimately,
what is needed is some empirical data against which a model or theory
can be tested. Coupling, like cholesterol, comes in "good" and "bad"
forms. The good form enables a system to work at peak performance,
without introducing excessive maintenance costs. The bad form results
in a system that is increasingly fragile and hard to scale. Which of
these in practice has been uppermost in Linux kernel development?

This kernel mailing list thread from 2002
<http://www.kerneltraffic.org/kernel-traffic/kt20020701_173.html> --
discussing a kernel of similar vintage to that covered by the study --
is of interest. Several people expressed a worry that the kernel would
never effectively scale beyond 4 CPUs -- and coupling was one of the
issues:

    [2-CPU SMP] makes fine sense for any tightly coupled system,
    where the tight coupling is cost-efficient.

Three years later, have "long-term maintainability" issues in the Linux
kernel held it back? Here's what Novell said last July
<http://www.novell.com/products/linuxenterpriseserver/sles9_whatsnew.pdf>
on the topic:

    "More than 128 CPUs have been tested on available hardware,
    but theoretically, there is no limit on the number that will
    work."

This bumblebee continues to fly.

---------------------------------------------------------------------

-Mikko (miputkon@paju.oulu.fi)


