Re: Signal dispositions



James Kuyper <jameskuyper@xxxxxxxxxxx> writes:
Charlie Gordon wrote:

[...]

Alternately, let it print the damn check, there is a good chance the
check will be correct and arrive in time. There is some possibility
that the error is so small as to not be worth reporting. If the
error is large, then you can complain and have it fixed... Or you
will not complain and wait for the bank to figure where these
millions came from ;-)

Everything about that paragraph is wrong. The chances are not good
that the paycheck will be correct and arrive on time. There's a large
probability that the error will be a big one. There is no error so
small that it's not worth reporting; tax auditors tend to get very
concerned about even small errors, because they think they might be
signs of something more serious (and they are right to think that). If
the error is large, fixing it can be very expensive for the payer, and
a lot of hassle for the payee.

I think this way of discussing the issue subtly misses the point. A
segmentation fault occurs if the MMU has been asked to translate an
address for which it doesn't have a valid translation or if it has
detected a memory access running foul of the memory access permissions
for the address that should have been accessed. A CPU exception will
be raised because of this and the fault handler in the kernel takes
control. The information available to this handler will usually be the
reason for the fault, the type of intended access and the faulting
address. So, what to do now? Unless something in the MMU setup is
changed, the faulting instruction cannot be restarted, because this
would just cause it to fault again, turning the process that contained
it into a really expensive CPU hog. Since the kernel has no
information regarding what the purpose of the access was supposed to
be, let alone any information regarding what the process causing the
fault is trying to accomplish or how, it cannot possibly decide what a
sensible 'other restart point' could be. If the system allows SIGSEGV
to be handled, running the handler and restarting the instruction
afterwards can be tried, guarding against an infinite series of
faults produced this way. Otherwise, the only
available option is to terminate the process.
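
To make the 'run the handler, then restart the instruction' case
concrete, here is a minimal sketch, assuming Linux and glibc. The
names on_segv, write_to_protected_page, guarded_page and MAX_FAULTS
are made up for this illustration. The handler reads the fault reason
(si_code) and the faulting address (si_addr) from siginfo_t, changes
the page protection so that the restarted store can succeed, and falls
back to the default disposition after a few faults so the process is
terminated instead of spinning. Note that mprotect() is not on the
POSIX list of async-signal-safe functions; calling it from a handler
is an assumption that happens to hold on Linux.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1            /* for MAP_ANONYMOUS on glibc */
#endif
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

enum { MAX_FAULTS = 3 };             /* hypothetical guard against endless faulting */

static volatile sig_atomic_t fault_count;   /* faults seen by the handler */
static volatile sig_atomic_t last_si_code;  /* e.g. SEGV_MAPERR or SEGV_ACCERR */
static void *volatile last_fault_addr;      /* si_addr of the latest fault */
static void *guarded_page;                  /* the one page we know how to repair */
static long page_size;

static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    fault_count++;
    last_si_code = info->si_code;
    last_fault_addr = info->si_addr;

    /* Round the faulting address down to the start of its page. */
    uintptr_t base = (uintptr_t)info->si_addr & ~((uintptr_t)page_size - 1);

    /* Only the process knows which faults are 'intended'. Anything else,
     * or too many repeats, gets the default action: when the instruction
     * is restarted it faults again and the process is terminated. */
    if (fault_count > MAX_FAULTS ||
        base != (uintptr_t)guarded_page ||
        mprotect((void *)base, (size_t)page_size, PROT_READ | PROT_WRITE) != 0)
        signal(sig, SIG_DFL);
}

/* Writes one byte to a read-only page. Returns the number of faults it
 * took for the store to complete, or -1 on error. */
int write_to_protected_page(void)
{
    struct sigaction sa, old;

    page_size = sysconf(_SC_PAGESIZE);
    volatile unsigned char *page = mmap(NULL, (size_t)page_size, PROT_READ,
                                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;
    guarded_page = (void *)page;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;        /* ask for siginfo_t in the handler */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap((void *)page, (size_t)page_size);
        return -1;
    }

    fault_count = 0;
    page[0] = 42;                    /* faults, handler unprotects, store is restarted */
    int ok = (page[0] == 42);

    sigaction(SIGSEGV, &old, NULL);  /* restore the previous disposition */
    munmap((void *)page, (size_t)page_size);
    return ok ? (int)fault_count : -1;
}
```

Calling write_to_protected_page() from main() should take exactly one
fault on such a system. If the mprotect() call is removed from the
handler, the store keeps faulting until MAX_FAULTS is exceeded and the
process is then killed by the default action, which is the 'expensive
CPU hog' scenario cut short.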

The MMU is programmed this way because its purpose is to ensure that
different processes are isolated from each other, except insofar as
the processes themselves arrange otherwise. Because a process must
not be able to arbitrarily write to the memory of another process,
the MMU must cause a trap if a process attempts to access something
in a way it is not supposed to access it, and the kernel must then
terminate the 'offender' for the reasons given in the previous
paragraph. If the access is an intended one, the hardware must be
informed of this explicitly, because it cannot possibly 'just know
it'.
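
As an illustration of processes 'arranging otherwise', the following
sketch (assuming Linux or another system that supports
MAP_SHARED | MAP_ANONYMOUS) asks the kernel explicitly for a mapping
that stays shared across fork(). The function name
share_counter_with_child and the value 7 are made up for this example.
The child's write to the shared mapping is visible to the parent, while
its write to an ordinary variable only changes the child's private
copy.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1            /* for MAP_ANONYMOUS on glibc */
#endif
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the value the child stored in the shared mapping, or -1 on
 * error or if isolation of ordinary memory was somehow violated. */
int share_counter_with_child(void)
{
    int private_copy = 0;            /* ordinary memory: isolated after fork() */
    int status;

    /* Explicit request: this page is to be shared with child processes. */
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {
        *shared = 7;                 /* lands in the page both processes map */
        private_copy = 7;            /* changes only the child's copy */
        _exit(0);
    }

    if (waitpid(pid, &status, 0) < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }

    int result = (private_copy == 0) ? *shared : -1;
    munmap(shared, sizeof *shared);
    return result;
}
```

On a system matching these assumptions the function returns 7. Without
the explicit MAP_SHARED request, the parent would never see the
child's write.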

Taking this requirement into account, it is sensible to assume that an
access flagged as 'invalid' on the basis of the information available
to the hardware is actually unintended and caused by a programming
error. And even if the hardware had a way of knowing that the
consequences of the programming error will be harmless, which it
hasn't, it still cannot decide what to do with the faulting process
without its explicit cooperation.

[...]

In production code, those signals should never be generated. If they
are, they should crash, so that the user can complain, and someone can
fix it.

If they are, they should be logged and reported, yet best efforts
should be extended to minimize the impact on the user. Warning the
user of a potential malfunction and requesting urgent attention may be
more appropriate than a core dump with no warning and no restart.

The core dump IS your warning, and restart should NOT be attempted
until the problem has been resolved, otherwise you could easily add to
the damage created by the first run of the program.

For a rarely triggered bug, restarting the offending process is a
common choice. But this can again only be accomplished by 'other
userspace software'.
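
A minimal sketch of such 'other userspace software', assuming a POSIX
system: a supervisor that forks a worker, waits for it, and starts it
again if it was killed by a signal, up to a fixed number of restarts.
The names supervise and flaky_worker are made up for this example, and
flaky_worker simulates the rarely triggered bug by raising SIGSEGV on
its first two attempts.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs worker(attempt) in a child process. If the child is killed by a
 * signal, it is started again, at most max_restarts times. Returns the
 * number of restarts that were needed before the worker exited
 * normally, or -1 on error or when giving up. */
int supervise(int (*worker)(int attempt), int max_restarts)
{
    for (int attempt = 0; attempt <= max_restarts; attempt++) {
        int status;
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(worker(attempt));  /* child: run the worker, then leave */

        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status))
            return attempt;          /* normal exit: stop supervising */
        if (WIFSIGNALED(status))
            fprintf(stderr, "worker killed by signal %d\n", WTERMSIG(status));
    }
    return -1;                       /* crashed too often: leave it to a human */
}

/* Hypothetical worker that crashes on its first two attempts. */
int flaky_worker(int attempt)
{
    if (attempt < 2) {
        signal(SIGSEGV, SIG_DFL);    /* make sure the default action applies */
        raise(SIGSEGV);
    }
    return 0;
}
```

With these definitions, supervise(flaky_worker, 5) returns 2, while
supervise(flaky_worker, 1) gives up and returns -1. Whether restarting
is wise at all is exactly the point under dispute above; the sketch
only shows that the decision lives in userspace, not in the kernel.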