Immutable Strings

Revision as of 12:16, 17 April 2007 by Paulb

Warning: This article is under development and is not ready for any form of review.

Author: Paul Bates

Introduction

On the heels of a point raised on eiffelroom regarding read-only variants of an Eiffel STRING, this page discusses the possible options for introducing such new types.

The term read-only is not a fitting name, so this page refers to such string variants as immutable, and to their already implemented cousins STRING_8 and STRING_32 as mutable.

Rationale

There are a number of reasons why Eiffel needs an immutable representation of a string, one that cannot be altered under any circumstances. The sections below explain why immutable strings are required in a language.

ECMA STRING_8 and STRING_32 Are Not Constants

Section 8.29 of the Eiffel ECMA specification details the declaration and use of constants in Eiffel. In that section the three Eiffel string forms are described as constants. To be pedantic about the matter, I looked up a dictionary definition of the word constant.

con·stant /ˈkɒnstənt/

  –adjective
    1. Not changing or varying; uniform; regular; invariable.

  –noun
    7. Something that does not or cannot change or vary.

ECMA describes the three STRING declaration variants as constants, but in reality this contradicts the definition and is semantically misleading. STRINGs are mutable; "constants" are not. As a simple example, take the following code snippet.

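full_path: STRING_8
    -- Path built from `template_path'; evaluated once.
  once
      -- No copy is made: `Result' is the same object as `template_path'.
    Result := template_path
    Result.replace_substring_all ("$1", root_path)
  ensure
    result_attached: Result /= Void
    not_result_is_empty: not Result.is_empty
  end

template_path: STRING_8 = "$1\data\default.cfg"
    -- Declared as a constant, yet modified in place by `full_path'.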

The code demonstrates an all too common scenario. Once full_path has been called, the contents of template_path are modified. Any other use of template_path will yield a "constant" value that differs from the one declared. The ECMA specification indicates that the declaration of template_path is the specification of a constant attribute (8.29.2 and 8.29.3).

full_path, with once function semantics, is never a constant; it is evaluated a single time, on first use. full_path actually demonstrates yet another rationale for introducing immutable strings into Eiffel.
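
For comparison, here is a minimal sketch of the defensive copy that mutable strings require today, assuming the same template_path and root_path features as the snippet above. It only protects template_path; the object returned by full_path is still mutable, so a client could alter the shared once value.

```eiffel
full_path: STRING_8
        -- Path built from a private copy of `template_path'.
    once
            -- `twin' (inherited from ANY) yields a new object equal to its
            -- target, so the replacement below cannot touch `template_path'.
        Result := template_path.twin
        Result.replace_substring_all ("$1", root_path)
    ensure
        not_result_is_empty: not Result.is_empty
    end
```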

Immutable Interfaces

A second rationale is the good design of a class's exported interface. A good design exposes immutable exported members so as not to seemingly violate the principles of object orientation. I say "seemingly" because, technically speaking, the principles are not violated. The principle in question states that a class and its descendants should be the only entities that modify the internal state of a runtime instance; no client should be permitted to perform such modifications. Technically STRING is a reference type, so a qualified call such as append made on a STRING object modifies the internal state of that STRING object, not of the object that exposes it. However, STRING has a special standing that groups it with the likes of INTEGER, NATURAL and CHARACTER. It is a built-in, rudimentary type that is seen as "a value". Almost all other reference types are just objects at runtime with no real discernible value.

The current EiffelBase abstractions enable the authoring of immutable exported client interfaces while still allowing resident routines to manipulate the internals of an object's state.

feature -- Access

  selected_indexes: BILINEAR [NATURAL]
      -- Selected item indexes
    do
      Result := internal_selected_indexes
    ensure
      result_attached: Result /= Void
    end

feature {NONE} -- Implementation

  internal_selected_indexes: ARRAYED_LIST [NATURAL]
      -- Mutable version of `selected_indexes'

selected_indexes permits clients to access a list of index positions but never allows items to be added to or removed from that structure. internal_selected_indexes is used internally to add or remove items based on some peripheral interaction. If the author wanted clients to modify the result of selected_indexes, then additional routines could be implemented on a fully or partially exported part of the class's interface. A routine such as set_selected_indexes could be implemented, or add_index and, conversely, remove_index could be implemented as a Delegate pattern implementation.

As it stands today, with only mutable strings, it is not possible to author such classes. A class attribute, or the result of a once function, is open to modification by an unruly client, whether accidentally through a missing clone of a STRING (using twin) or through naivety. Either way, it's dangerous!
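
The hazard can be sketched with two small classes. PERSON and UNRULY_CLIENT are hypothetical names invented for this illustration; only STRING_8, twin and append come from EiffelBase.

```eiffel
class
    PERSON

create
    make

feature {NONE} -- Initialization

    make (a_name: STRING_8)
            -- Initialize with a private copy of `a_name'.
        do
            name := a_name.twin
        end

feature -- Access

    name: STRING_8
            -- Exported for reading, but the object itself is mutable.

end

class
    UNRULY_CLIENT

feature -- Basic operations

    rename_in_place (a_person: PERSON)
            -- Alter `a_person.name' without going through any PERSON routine.
        do
                -- Legal today: `append' is a qualified call on the STRING_8
                -- object, not an assignment to the attribute `name'.
            a_person.name.append (" (edited)")
        end

end
```

PERSON copies its argument on creation, yet it has no way to stop rename_in_place short of returning a fresh twin on every access.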

Suggestions to Implementation

There are a number of factors to consider before deciding on an implementation choice for immutable strings. First, and probably most important, is compatibility. Compatibility raises concerns regarding the assignment of a mutable string to an immutable string, vice versa, and even back again.

Compatibility

ms1: STRING_8
ms2: STRING_8
is1: IMMUTABLE_STRING_8
ms1 := "Hello World"
is1 := ms1
ms2 := is1
Result := ms1 ~ ms2

In pseudo form this outlines the assignment of a constant string to a mutable string reference ms1. ms1 is then assigned to the immutable string is1. The immutable string is then assigned back to ms2. What's the outcome of Result? Are ms1 and ms2 the same reference? I would hope not.

In fact an immutable string should probably never implicitly convert to a mutable string. Instead an explicit call to as_string_8 or as_string_32 will have to be used.
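
A sketch of the explicit round trip follows. It rests on assumptions rather than a settled design: that IMMUTABLE_STRING_8 offers a creation procedure make_from_string that copies the characters of its argument, and that as_string_8 returns a fresh mutable STRING_8. COMPATIBILITY_SKETCH and round_trip_is_safe are hypothetical names.

```eiffel
class
    COMPATIBILITY_SKETCH

feature -- Basic operations

    round_trip_is_safe: BOOLEAN
            -- Does a mutable -> immutable -> mutable round trip leave the
            -- original string untouched?
        local
            ms1, ms2: STRING_8
            is1: IMMUTABLE_STRING_8
        do
            ms1 := "Hello World"
                -- Explicit creation; assumed to copy the characters of `ms1'.
            create is1.make_from_string (ms1)
                -- Explicit conversion; assumed to return a fresh STRING_8.
            ms2 := is1.as_string_8
            ms2.append ("!")
                -- Different references, and `ms1' still holds its original text.
            Result := ms1 /= ms2 and ms1.same_string ("Hello World")
        end

end
```

Under those assumptions ms1 and ms2 are never the same reference, which is the outcome hoped for above.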

Conformance or Conversion

It has been mentioned that a mutable string should perhaps conform to an immutable string, for optimization purposes. Even allowing for conformance as a choice, it does not seem coherent that a mutable string is actually a specialized immutable string?! In addition, conformance makes it possible to attempt a reverse assignment, raising the issues outlined at the beginning of this section.

The alternative to conformance would be to use implicit conversion routines to convert the mutable string to an immutable one, on assignment or when passing it as an immutable string argument. Conversion, personally, seems the most correct route to follow. With conformance it would be entirely possible to reverse assign an immutable string to a mutable string, rendering any immutable client interface ineffective at preventing external modification.

Using Conversion

As stated, conversion seems to be the winner. In discussing this with others at Eiffel Software, it seems the vote is unanimous that conversion is the best option, all things considered. The road to using conversion as a means of interoperating between strings is not a clear one. There are problems that need to be overcome.

Optimizations

The mutable versions of the STRING class should convert to, and cache, an immutable variant of a string upon an initial request. Once the mutable string's content is modified, the cached immutable string is to be invalidated. The next request to convert the mutable string to an immutable variant would then yield a new immutable string reference.
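
The caching scheme could look roughly like the sketch below. It uses a hypothetical wrapper class, CACHING_STRING_SKETCH, rather than changing STRING_8 itself, and it assumes an IMMUTABLE_STRING_8 class whose make_from_string creation procedure copies the characters it is given. The feature names as_immutable and cached_immutable are likewise illustrative.

```eiffel
class
    CACHING_STRING_SKETCH

create
    make

feature {NONE} -- Initialization

    make (a_text: READABLE_STRING_8)
            -- Initialize with a private copy of `a_text'.
        do
            create content.make_from_string (a_text)
        end

feature -- Conversion

    as_immutable: IMMUTABLE_STRING_8
            -- Immutable snapshot of the current content, cached until
            -- the next modification.
        do
            if attached cached_immutable as l_cached then
                Result := l_cached
            else
                create Result.make_from_string (content)
                cached_immutable := Result
            end
        end

feature -- Element change

    append (a_text: READABLE_STRING_8)
            -- Append `a_text' and invalidate the cached snapshot.
        do
            content.append (a_text)
            cached_immutable := Void
        end

feature {NONE} -- Implementation

    content: STRING_8
            -- Mutable character storage.

    cached_immutable: detachable IMMUTABLE_STRING_8
            -- Last immutable snapshot; Void after a modification.

end
```

Repeated calls to as_immutable without an intervening append return the same reference. In a real implementation every mutating feature, not just append, would have to reset the cache.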


is_equal

With a non-conforming implementation the issue of testing equality has to be addressed. With

is_equal (other: like Current): BOOLEAN

changing to

is_equal (other: ANY): BOOLEAN

any implicit conversion is going to be thwarted: every argument already conforms to ANY, so a conversion would never be applied.

A Common Ancestor

Just as STRING_GENERAL is available now, a more abstract class is required to hold the features available in all string variants, that is, the mutable and immutable, 8-bit and 32-bit versions of each. In the interest of backwards compatibility, the use of STRING_GENERAL is not a viable solution. A proposed ABSTRACT_STRING will instead be put forth as a truly general purpose string class.

ABSTRACT_STRING will contain a number of the routines moved up from STRING_GENERAL, and STRING_GENERAL will henceforth derive from ABSTRACT_STRING. The routines that will not be resident in ABSTRACT_STRING are those pertaining to mutating the internal representation.
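
As a rough illustration of the split, a query-only ancestor might look like the sketch below. The choice of features (code, count, is_empty, valid_index) is an assumption made for this example, not the proposed interface.

```eiffel
deferred class
    ABSTRACT_STRING

feature -- Access

    code (i: INTEGER): NATURAL_32
            -- Numeric code of the character at position `i'.
        require
            valid_index: valid_index (i)
        deferred
        end

feature -- Measurement

    count: INTEGER
            -- Number of characters.
        deferred
        end

feature -- Status report

    is_empty: BOOLEAN
            -- Does the string contain no characters?
        do
            Result := count = 0
        end

    valid_index (i: INTEGER): BOOLEAN
            -- Is `i' within the bounds of the string?
        do
            Result := i >= 1 and i <= count
        end

    -- Deliberately absent: `append', `put', `wipe_out' and any other
    -- feature that mutates the internal representation.

end
```

Mutable descendants would add the element-change features. Immutable descendants would only effect the deferred queries.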

Is There a Need for STRING_8 and STRING_32

Of course there is a need, but is the implementation correct? It has been discussed that there should be mutable and immutable variants of STRING. Deviating from this for a moment, consider the best possible scenario for implementing immutable strings: make mutable strings a thing of the past!

So many other languages have chosen an immutable string as their fundamental string type. To optimize creating new strings or piecing parts of strings together, a string builder is used. Under .NET there is the immutable System.String and the optimized builder System.Text.StringBuilder. In Java it is the exact same story: String is an immutable string and StringBuffer is used to build immutable strings efficiently.

Should Eiffel follow suit and deprecate the existing implementation of STRING_8 and STRING_32, in order to favor immutable strings and a string builder?