Handling Exceptions Gracefully

Revision as of 08:05, 2 October 2008 by Paulb (Talk | contribs)

Author: Paul Bates

Preface

Exceptions have been in the Eiffel language for a little while now, and even before the Exceptions as Objects mechanism was implemented, exceptions still appeared when running Eiffel systems. No matter how hard we try to engineer our code to be safe and free of expensive exception conditions, they will still happen. One need not do anything complex or contrived; an exception is possible even in the most basic of engineered systems. There are memory, file, networking and even mutex exceptions to consider when working with such rudimentary facilities. There are a myriad of exception cases, and many more places where code is written to protect against them, but that code is either duplicated or forgotten about.

An Example

But I digress; I'm getting ahead of myself. Let me illustrate with the simple example on which this page is based. It's a functional system that performs some operations based on a state attribute:

perform (args: OPTIONS)
        -- Perform an operation.
    require
        ...
    local
        l_state: like current_state
    do
            -- Ensure we are processing everything.
        l_state := current_state
        if l_state /= process_all then
            set_current_state (process_all)
        end
 
            -- Perform operations.
        ... 
            -- End operations.
 
        if l_state /= process_all then
                -- Reset the current state.
            set_current_state (l_state)
        end
    ensure
        current_state_unchanged: current_state = old current_state
    end

Seemingly, not much is wrong with the above code. It appears to be a good citizen of the system, restoring the state the system was in before the routine executed.

Debugging the completed system goes without a hitch and everything works just fine. Yet a released, finalized version doesn't behave as expected and no cause can be found. This is a classic example of the anomalies that can occur during execution on an end-user system, where any number of applications can be running and system setups vary drastically, each one a potentially malevolent entity.

For the sake of example, I'm going to pretend the operation being performed is some communication over a network, because networks are susceptible to any number of failures, at any point: a power outage, a disconnected Ethernet cable, a router failing because of age, router firmware/software issues, or an earthquake that rocked the router off the shelf and broke it when it hit the floor. I could be here all day...

Exception Protection

Let's protect the code and make sure that the current_state attribute is reset and the postcondition holds, so that in the event of a failure the system remains in a correct state according to its explicit contracts:

perform (args: OPTIONS)
        -- Perform an operation.
    require
        ...
    local
        l_state: like current_state
        retried: BOOLEAN
    do
        if not retried then
                -- Ensure we are processing everything.
            l_state := current_state
            if l_state /= process_all then
                set_current_state (process_all)
            end
 
                -- Perform operations.
            ... 
                -- End operations.
        end
 
        if l_state /= process_all then
                -- Reset the current state.
            set_current_state (l_state)
        end
    ensure
        current_state_unchanged: current_state = old current_state
    rescue
            -- There was a network failure.
        retried := True
        retry
    end

Great, we are safe! There is quite a bit of extra logic in there just to make sure we clean everything up correctly.

But wait! The exception shouldn't just be ignored! After all, it is an exceptional case and the user should be notified that something went wrong. The rescue clause as it stands prevents the exception from being propagated to the caller, creating a silent failure and a most perplexing scenario for the user: it just doesn't work and they have no idea why! Unwinding the stack should find a rescue clause that is able to deal with the network exception correctly and present the user with some form of UI or notification of the failure.
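As a rough sketch of what such a caller-side rescue clause might look like, consider the class below. The names REPORT_SENDER, send_report and notify_user are hypothetical and not part of the original example, and perform is reduced to a stand-in that always fails by raising a developer exception through the classic EXCEPTIONS class:

```eiffel
class
    REPORT_SENDER
        -- Hypothetical caller, for illustration only.

inherit
    EXCEPTIONS

feature -- Basic operations

    send_report
            -- Call `perform' and notify the user if it fails.
        local
            retried: BOOLEAN
        do
            if not retried then
                perform
            else
                    -- Reached only after `perform' raised an exception.
                notify_user ("The operation failed because of a network error.")
            end
        rescue
            if not retried then
                retried := True
                retry
            end
        end

feature {NONE} -- Implementation

    perform
            -- Stand-in for the routine under discussion; always fails.
        do
            raise ("network failure")
        end

    notify_user (a_message: STRING)
            -- Report `a_message' to the user (hypothetical; prints to standard output).
        do
            print (a_message + "%N")
        end

end
```

This only works if perform lets the exception reach its caller, which the version above, with its unconditional retry, does not.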

Exception Propagation (Re-Raise)

This seems to be a critical detail, not to be overlooked. It is especially true for libraries, where clients may need to be notified of failures in some way. Let's support this by extending the example once more:

perform (args: OPTIONS)
        -- Perform an operation.
    require
        ...
    local
        l_state: like current_state
        l_manager: !EXCEPTION_MANAGER
        retried: BOOLEAN
    do
        if not retried then
                -- Ensure we are processing everything.
            l_state := current_state
            if l_state /= process_all then
                set_current_state (process_all)
            end
 
                -- Perform operations.
            ... 
                -- End operations.
        end
 
        if l_state /= process_all then
                -- Reset the current state.
            set_current_state (l_state)
        end
 
        if retried then
                -- Re-raise the exception.
            create l_manager
            l_manager.raise (l_manager.last_exception)
        end
    ensure
        current_state_unchanged: current_state = old current_state
    rescue
        if not retried then
            retried := True
            retry
        end
    end

This is the safest way of doing things, and it duplicates no code. This time we guard the rescue clause with a conditional check to ensure the routine is not retried a second time. At the end of the routine body the last exception is re-raised (very crudely, I might add), so the exception can be propagated up to the caller or to the next unwound stack frame capable of handling it correctly.

So now two exceptions are being raised: the original network exception and a second one used for propagation. This is a performance hit because exceptions are computationally expensive. Raising a second exception is ideal in some scenarios, for instance to report a more specific type of exception, but in this example it is processing power wasted just to ensure our system is in a correct state.

Exception Propagation (Pass-Through)

There is another way that avoids re-raising the last exception. The exception is passed through the routine by simply not retrying at the time of rescue:

perform (args: OPTIONS)
        -- Perform an operation.
    require
        ...
    local
        l_state: like current_state
    do
            -- Ensure we are processing everything.
        l_state := current_state
        if l_state /= process_all then
            set_current_state (process_all)
        end
 
            -- Perform operations.
        ... 
            -- End operations.
 
        if l_state /= process_all then
                -- Reset the current state.
            set_current_state (l_state)
        end
    ensure
        current_state_unchanged: current_state = old current_state
    rescue
        if l_state /= process_all then
            set_current_state (l_state)
        end
    end

Wow, that seemingly saved a lot of code, at the expense of a little code duplication. Here the example is small and only one state attribute needs resetting; start adding more and you cause a headache for any other reader or engineer, and even for yourself. For one, code duplication is bad because any change needs to be reflected in every copy, which also means both restore-state code blocks need to be tested and debugged for each change. In the example above the restore-state blocks are almost adjacent, so keeping them synchronized doesn't seem to be much of a problem. In a real-world application, however, the rescue block is likely to sit some distance from the routine body's restore-state code, and may even be off-screen, potentially leading to an unsynchronized update and, if not properly checked, anomalies at runtime. That is before considering the maintenance frustration: in all likelihood the duplicate is the result of a copy-and-paste operation.
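One way to soften the duplication with the language as it stands is to move the restore logic into a helper routine, so that only a single call is repeated. The sketch below is an assumption for illustration and not part of the original example: the class name OPERATION_RUNNER and the helper restore_state are hypothetical, the state is modeled as a plain INTEGER, and the args parameter and precondition are left out for brevity.

```eiffel
class
    OPERATION_RUNNER
        -- Hypothetical class, for illustration only.

feature -- Access

    current_state: INTEGER
            -- Current processing state.

    process_all: INTEGER = 1
            -- State in which everything is processed.

feature -- Element change

    set_current_state (a_state: like current_state)
            -- Set `current_state' to `a_state'.
        do
            current_state := a_state
        ensure
            current_state_set: current_state = a_state
        end

feature -- Basic operations

    perform
            -- Perform an operation with `current_state' temporarily
            -- set to `process_all'.
        local
            l_state: like current_state
        do
            l_state := current_state
            if l_state /= process_all then
                set_current_state (process_all)
            end

                -- Perform operations (these may raise an exception).

            restore_state (l_state)
        ensure
            current_state_unchanged: current_state = old current_state
        rescue
                -- No retry: restore the state and let the exception propagate.
            restore_state (l_state)
        end

feature {NONE} -- Implementation

    restore_state (a_state: like current_state)
            -- Reset `current_state' to `a_state' if `perform' changed it.
        do
            if a_state /= process_all then
                set_current_state (a_state)
            end
        end

end
```

The call to restore_state still has to appear in both the body and the rescue clause, and forgetting either one reintroduces the bug, so this reduces the duplication rather than removing it.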

To put the proverbial icing on the cake, let us take redefinition and Precursor calls into account. Using the pass-through example with its duplicated code, we can implement something that works just dandy, with the added overhead that we are duplicating code and doubling our time to debug. With the aforementioned re-raising of exceptions, the code is more verbose still, and there is the overhead of raising yet another exception.

Introducing the Final Clause

The solution to the problem of code complexity, without forsaking safety, is to introduce an "always-executed" routine block. Regardless of whether the routine body raised an exception, an always-executed block runs on both normal and abnormal execution paths.

This is not a proposal, merely an idea. The always-executed block, to be known as a final clause, could look something like this:

perform (args: OPTIONS)
        -- Perform an operation.
    require
        ...
    local
        l_state: like current_state
    do
            -- Ensure we are processing everything.
        l_state := current_state
        if l_state /= process_all then
            set_current_state (process_all)
        end
 
            -- Perform operations.
        ... 
            -- End operations.
    final
        if l_state /= process_all then
                -- Reset the current state.
            set_current_state (l_state)
        end
    rescue
        ...
    ensure
        current_state_unchanged: current_state = old current_state
    end

On a normal execution path, the final clause is executed after the routine body and before any postconditions are checked. The clause belongs only to the routine in which it resides, so inheritance and redefinition play no part in its execution. On an abnormal execution path, by contrast, execution performs any operations defined in the rescue clause and then executes the final clause. In the event a rescue clause requests a retry, the final clause is executed after the retry, whether it succeeds or not.

One final semantic explanation is required, regarding abnormal execution inside a final clause. I would suggest that the final clause is exited, as a routine body would be in an exception case. Then, for simplicity, any rescue clause, whether already executed or not, is skipped and the exception is propagated to the caller. The semantics should be the same as if the exception had occurred in the rescue clause. The rule to adhere to is that final clauses should be as free as possible of scenarios that can raise exceptions.

Trying to provide smarter execution semantics, based on the execution of a rescue clause and whether or not a retry has been performed, would likely only cause confusion as to when, how and how many times a final clause is executed, as well as increasing the number of possible debug paths that need to be traced to ensure the routine correctly handles all anomalies.