Persistence predicates

Revision as of 00:15, 27 September 2007 by Colin-adams


Not Ready for Review: This Page is Under Development!

Research: This page describes research about Eiffel, not the actual language specification.

Introduction

Using STORABLEs in an application has at least two significant disadvantages.

One is that using them for long-term persistence is vulnerable to changes in the class.

Another is that too much data may be stored, resulting in large files (or database occupancy, or whatever) and slow reload times. Some data, such as internal caches, are probably best not serialized at all. For other data, it may be better just to store a shorthand identifier to enable the data to be rebuilt (maybe on demand) at retrieval time.

This article explores ways to address the second problem.

Transient data

Java tackles this problem with the keyword transient. An attribute so marked is not stored at serialization time. When the object is retrieved, the attribute has its default value, which for a reference is null (Void in Eiffel terms).

This seems rather limited to me. It can't be used if the class invariant constrains the attribute to be non-void. And I don't think there is a way for a descendant class to override this behaviour (it's a long time since I've written any Java, so I could be very wrong here).
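The Java behaviour described above can be seen in a minimal sketch. The class `Account` and its fields are hypothetical names chosen only for this illustration. The object is serialized to a byte array and read straight back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class, used only for illustration.
class Account implements Serializable {
    private static final long serialVersionUID = 1L;

    final String owner;            // stored by default serialization
    transient String cachedLabel;  // skipped by default serialization

    Account(String owner) {
        this.owner = owner;
        this.cachedLabel = "Account of " + owner;
    }
}

class TransientDemo {
    // Serialize to a byte array and read the object straight back.
    static Account roundTrip(Account original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Account) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account restored = roundTrip(new Account("alice"));
        System.out.println(restored.owner);        // alice
        System.out.println(restored.cachedLabel);  // null
    }
}
```

The constructor of `Account` is not run during retrieval, so nothing re-establishes `cachedLabel`. Any invariant requiring it to be non-null is broken until the application repairs it by hand.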

My thinking is that there should be a way to specify whether or not a particular attribute should be stored on any given occasion, and a way to specify what to do at retrieval time.

Persistence predicates

A persistence predicate is a routine of type PREDICATE [ANY, TUPLE [!<attribute_type>]] associated with an attribute of type <attribute_type>. At storage time, the predicate will be called with the instance of the attribute as its argument (if the attribute is non-Void). If it returns True then the attribute is stored. If it returns False then the attribute is not stored, but it may be re-created at retrieval time.
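As a rough emulation of this storage-time rule (in Java, since the Eiffel mechanism is only a proposal), the sketch below pairs each attribute value with a predicate and writes it to a map only when the predicate returns true. `PersistedAttribute`, `storeInto` and the map-based "store" are all hypothetical. Skipping a Void (null) value without calling the predicate is an assumption; the text above leaves that case open.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical helper: an attribute value together with its persistence predicate.
final class PersistedAttribute<T> {
    final String name;
    final T value;                // may be null (Void)
    final Predicate<T> isStored;  // the persistence predicate

    PersistedAttribute(String name, T value, Predicate<T> isStored) {
        this.name = name;
        this.value = value;
        this.isStored = isStored;
    }

    // The predicate is only consulted for a non-null value (assumption:
    // a null value is simply not written).
    void storeInto(Map<String, Object> sink) {
        if (value != null && isStored.test(value)) {
            sink.put(name, value);
        }
    }
}

class PredicateDemo {
    static Map<String, Object> store() {
        Map<String, Object> sink = new LinkedHashMap<>();
        // Always stored: the predicate is constantly true.
        new PersistedAttribute<String>("name", "widget", v -> true).storeInto(sink);
        // Never stored: a pure cache.
        new PersistedAttribute<int[]>("cache", new int[1024], v -> false).storeInto(sink);
        // Decided per instance: only short notes are stored.
        new PersistedAttribute<String>("note", "short", v -> v.length() < 100).storeInto(sink);
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(store().keySet());
    }
}
```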

Restoration routines

If a persistence predicate returns False, it may be that the programmer does not need this data at all (such as the case of an internal cache). But in some cases the programmer will want the data to be recreated at restoration time (this may be necessary to restore the class invariant), rather than retrieved from the persistence mechanism.

Take an example of a class which has an attribute named `gadget' of type GADGET_FROM_DATABASE, where GADGET_FROM_DATABASE itself has an attribute named `id' of type INTEGER. The class invariant requires that `gadget' is not Void, but the size required to store `gadget' is very large. The class also has a secret routine `gadget_from_identifier (a_id: INTEGER): GADGET_FROM_DATABASE' which can be used to create a GADGET_FROM_DATABASE given its `id' value. In such a case, rather than store `gadget' it is probably better to just store the `id' value, and then call `gadget_from_identifier' at retrieval time.

A restoration routine is a routine of type FUNCTION [ANY, TUPLE, {?|!}<attribute_type>] associated with an attribute of type <attribute_type> which is called at retrieval time to restore the attribute.
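The gadget example can be emulated by hand in Java to show what the restoration routine buys. In this sketch `Gadget` stands in for GADGET_FROM_DATABASE, and `GadgetHolder`, `store` and `retrieve` are hypothetical names. The database lookup is faked by simply constructing a new object.

```java
import java.util.Objects;

// Stand-in for GADGET_FROM_DATABASE: large, but rebuildable from its id.
final class Gadget {
    final int id;
    final byte[] payload = new byte[1 << 20]; // pretend this is expensive to store

    Gadget(int id) {
        this.id = id;
    }
}

final class GadgetHolder {
    private final Gadget gadget; // invariant: never null

    GadgetHolder(Gadget gadget) {
        this.gadget = Objects.requireNonNull(gadget);
    }

    Gadget gadget() {
        return gadget;
    }

    // Storage: the gadget itself is not written, only its id.
    int store() {
        return gadget.id;
    }

    // Retrieval: the restoration routine rebuilds the gadget from the stored id,
    // so the non-null invariant holds as soon as the holder exists.
    static GadgetHolder retrieve(int storedId) {
        return new GadgetHolder(gadgetFromIdentifier(storedId));
    }

    // Counterpart of the secret routine `gadget_from_identifier'.
    private static Gadget gadgetFromIdentifier(int id) {
        return new Gadget(id); // a real version would query the database
    }
}
```

Here the stored form is a single integer rather than the megabyte payload, and the retrieved holder never passes through a state in which `gadget` is null.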

Syntax

Very hazy thoughts here. I did think about using indexing terms, but I'm not sure if they are appropriate.

Maybe one new keyword persistence will be required. Perhaps after the keyword persistence we will see something looking like indexing terms:

Example

gadget: GADGET_FROM_DATABASE
      -- Gadget retrieved from DB via its `id' attribute
   attribute
      ...
   persistence
      persistance_predicate: agent is_gadget_persisted (attribute.id)
            -- `True' and `False' would also be acceptable values,
            -- with `True' being the default
      restoration_routine: agent gadget_from_identifier (attribute.id)
            -- Default is not to restore

Here the ECMA attribute keyword is reused to denote the instance of `gadget'.

Redefinition

The open/closed principle is honoured (I think). The author of the class can determine that an attribute must always be stored (this is the default, and is the current situation), or that it is never stored (for a pure cache attribute), or provide flexibility by specifying an agent.

The agent's routine could be deferred, frozen (yuk!) or just a normal routine, in which case descendant classes can redefine it.
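The "normal routine" case can be illustrated with a Java analogue, in which a bound method reference plays the role of the agent. `Catalogue`, `LeanCatalogue` and the `id < 1000` policy are hypothetical, invented for this sketch.

```java
import java.util.function.IntPredicate;

class Catalogue {
    // Ordinary routine: descendants may redefine it.
    protected boolean isGadgetPersisted(int id) {
        return true; // default policy: always store
    }

    // The "agent": bound to this object, so the call dispatches dynamically.
    final IntPredicate persistencePredicate() {
        return this::isGadgetPersisted;
    }
}

class LeanCatalogue extends Catalogue {
    // Redefinition: only gadgets with small ids are stored (arbitrary policy).
    @Override
    protected boolean isGadgetPersisted(int id) {
        return id < 1000;
    }
}
```

The declaration that ties the predicate to the attribute is written once, in the ancestor. A descendant changes the storage policy purely by redefining the routine.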

The actual instance data that is passed to the persistance_predicate and the restoration_routine could be specified via an agent, which allows for further flexibility. For restoration, it might even involve an EV_DIALOG (heaven forfend)!

Class correctness conditions

If the persistance_predicate can evaluate to False (or is statically specified as False) for an attribute that is required to be non-Void (or non-default, for an expanded attribute), then a restoration routine must be specified for that attribute. Where an invariant clause requires a non-Void attribute, the restoration routine must also return an attached type to ensure class correctness.
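One way to read this condition is as a check that a compiler could make on each attribute declaration. The following Java sketch encodes that reading. `AttributeSpec`, `PredicateKind` and `Restoration` are hypothetical names, and the rule is a paraphrase of the paragraph above rather than a worked-out validity rule.

```java
// How the persistence predicate is given in the declaration.
enum PredicateKind { ALWAYS_TRUE, ALWAYS_FALSE, AGENT }

// Whether a restoration routine is given, and the attachment of its result.
enum Restoration { NONE, DETACHABLE, ATTACHED }

final class AttributeSpec {
    final PredicateKind predicate;
    final boolean mustBeNonVoid; // required by the class invariant
    final Restoration restoration;

    AttributeSpec(PredicateKind predicate, boolean mustBeNonVoid, Restoration restoration) {
        this.predicate = predicate;
        this.mustBeNonVoid = mustBeNonVoid;
        this.restoration = restoration;
    }

    // A declaration is acceptable unless the attribute might be skipped at
    // storage time, must be non-Void, and cannot be restored to an attached value.
    boolean isCorrect() {
        boolean mayBeSkipped = predicate != PredicateKind.ALWAYS_TRUE;
        if (!mayBeSkipped || !mustBeNonVoid) {
            return true;
        }
        return restoration == Restoration.ATTACHED;
    }
}
```

An agent predicate is treated like a static False here, because the check cannot know in advance what the agent will return.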