Immutable Strings

Revision as of 16:37, 17 April 2007 by Juliant (Talk | contribs) (Examples)

Warning.png Warning: This article under development and is not ready for any form of review.

Author: Paul Bates

Introduction

On the heals of a point raised on eiffelroom regarding read-only variants of an Eiffel STRING, this page has come about to discuss the possible options for introducing such new types.

The term read-only is not a fitting name so this page documents such string variants as being immutable with it's already implemented cousins STRING_8 and STRING_32 coined mutable.

Rational

There are a number of reason why Eiffel needs an immutable representation of a string, which no matter what should be able to be altered. Below explains the rationale for why immutable strings are required in a language, as for those reasons why they are there.

EMCA STRING_8 and STRING_32 Are Not Constants

Section 8.29 of the Eiffel ECMA specification details the declaration and use of constants in Eiffel. In section 8.29 the three Eiffel string forms are detailed as being constants. To be pedantic about the matter I extracted a dictional reference for the the word constant.

con·stant /ˈkɒnstənt/

  –adjective
    1. Not changing or varying; uniform; regular; invariable.

  –noun
    7. Something that does not or cannot change or vary.

ECMA details the use of the three STRING declaration variants as constants but in reality this is contracting to the definition, and misleading in true semantics. STRINGs are mutable, "constants" are not. As a simple case example, take the following code snippet.

full_path: STRING_8
  once
    Result := template_path
    Result.replace_sub_string_all ("$1", root_path)
  ensure
    result_attached: Result /= Void
    not_result_is_empty: not Result.is_empty
  end
 
template_path: STRING_8 = "$1\data\default.cfg"

The code demonstrates an all too common scenario. Once full_path has been called the contents of template_path are modified. Any other use of template_path will yield a "constant" value that differs from that declared. The EMCA specification indicates that declaration of template_path pertains to the specification of a constant attribute (8.29.2 and 8.29.3.)

full_path, with once function semantics, is never a constant but is evaluated on a single as-needed basis. full_path actually demonstrates yet another rationale for introducing immutable strings into Eiffel.

Immutable Interfaces

A second rationale is through good design of a class' exported interface. A good design will yield immutable exported members as not to seemingly violate such principles of object orientation. I note "seemly" violated because by technical reference such principles are not violated. The principle in reference is one that states - a class, and it's descendants, should be the only entities to modify a respective runtime instantiation internal state. No client should be permitted to perform such modifications - Technically STRING is a reference type so a qualified call, like append, made on a STRING object, is modifying the internal state of that STRING object. However STRING has special reverence that binds it with the likes of INTEGER, NATURAL and CHARACTER. It's an inbuilt rudimentary type that is seen to be "a value". Almost all other reference types are just objects and runtime with no real discernible value.

Current EiffelBase abstraction enabled authoring of immutable exported client interfaces, yet allow resident routines to manipulate the internals of an object's state.

feature -- Access
 
  selected_indexes: BILINEAR [NATURAL]
      -- Select item indexes
    do
      Result := internal_selected_indexes
    ensure
      result_attached: Result /= Void
    end
 
feature {NONE} -- Implementation
 
  internal_selected_indexes: ARRAYED_LIST [NATURAL]
      -- Mutable version of `internal_selected_indexes'

selected_indexes permits clients to access a list of index positions but never allows any extending or removal of items from that structure. internal_selected_indexes is used internally to add or extend items based on some peripheral interaction. If the author wanted client to modified the result of selected_indexes then additional routines can be implemented on a fully or partially exported part of the class' interface. Such routines as set_selected_indexes could be implemented or add_index and the conversely remove_index could be implemented as a Delegate pattern implementation.

As it stands today, with only mutable strings, it is not possible to author such classes. A class attribute, or a once function, is open for modification by an unruly client, where it be accidental through a missing clone of a STRING, using twin, or through naivety. Either way, it's dangerous!

Suggestions to Implementation

There are a number of factors to consider before deciding on a implementation choice for immutable string. First and probably most importantly is compatibility. Compatibility raises concerns regarding the assignment of a mutable string to an immutable string, vice-versa and even back again.

Compatibility

Before any indicate of how mutable and immutable strings interacte there's needs to be a brief rundown of the possible ways they can interact. There's two schools of thought; Conformance or Conversion.

Conformance or Conversion

It has been mentioned that maybe an immutable string should conform to a mutable string, for optimization purposes. Respecting the possible choice for conformance it does not seem coherent that a mutable string is actually a specialized immutable string?! In addition, with conformance it would be entirely possible to reverse assign an immutable string to a mutable string, rendering any immutable client interface ineffective at preventing external modification.

The alternative to conformance would be to use an implicit conversion routine to convert the mutable string to an immutable one, on assignment or a pass through to a routine argument typed to an immutable string. Conversion, personally, seems the most correct route to follow.

Examples

For the purpose of illustration the following declarations will be used. IMMUTABLE_STRING_8 is being introduced as a place holder for a immutable variant of the current mutable STRING_8 class.

ms1: STRING_8
ms2: STRING_8
is1: IMMUTABLE_STRING_8

The first topic is to determine how manifest string constants comply with mutable strings. The following lists all syntax valid operations based on the ECMA specification. The assignments are made using the types manifest constant specification in section 8.29.6.

1. ms1 := "Hello World"
2. ms1 := {STRING_8} "Hello World"
3. ms1 := {IMMUTABLE_STRING_8} "Hello World"
4. ms1 := once "Hello World"

In case 1 the compiler should recognize the assignment and create a STRING_8 using the manifest string. No immutable string should be created. Case 2 is effectively the same as case 1, just more explicit. Case 3 introduces the immutable string. In this case the compiler should reject the call. Immutable strings cannot become mutable. If the conforming option for implement were to be chosen then ms1 would be set to Void. An immutable string has been created and it does not conform to STRING_8. Finally case 4 leaves room for explanation. In a world of Eiffel immutable strings a once string should behave in the same manner as case 3. Currently it behaves like cases 1 and 2.

1. is1 := "Hello Immutable World"
2. is1 := {STRING_8} "Hello Immutable World"
3. is1 := {IMMUTABLE_STRING_8} "Hello Immutable World"
4. is1 := once "Hello Immutable World"

In case 1 the compiler should recognize the assignment and create a IMMUTABLE_STRING_8 using the manifest string. No mutable string should be created. Case 2 uses the mutable string. In this case the developer has explicitly requested that the manifest string be of a mutable type. The course of action is dictated by a choice between conformance or conversion. Case 3 is effectively the same as case 1, just more explicit. Finally case 4 leaves room for explanation. In a world of Eiffel immutable strings a once string should behave in the same manner as cases 1 and 3. Currently there is no defined behavior for this.

Using Conversion

As stated conversion seems to be the winner. In discussing this with others at Eiffel Software it seems the vote is unanimous that conversion is the best option after all things considered. The road to using conversion as a mean to interop between strings is not a clear one. There are problems that need to be overcome.

Fetching a Mutable String

First and most rudimentary is the ability to fetch a mutable version of an immutable string. All that is needed is function to make the explicit conversion; as_string_8 and as_string_32 functions will have to be added to the immutable string class or be defined in a more abstract version of the class.

Issues with ANY.is_equal

With a non-conforming implementation of mutable and immutable strings ANY.is_equal as defined in ECMA will be broken.

ECMA's is_equal loosens the binds when it comes to comparing objects for equality. is_equal's argument other is now of type ANY and not a CAT-call prone like Current anchored type. It's not desirable to compare immutable string with only immutable string, and the same is true for mutable strings and mutable strings. Functionality must be provided to compare either or for equality and True returned as a result when both strings contain the same chain of characters. To support this crucial scenario a common base for all string variants needs to be introduced. For the sake of clarity with names, this shall be referenced as ABSTRACT_STRING (see A Common Ancestor). ABSTRACT_STRING should covariantly redefine ANY.is_equal so it's other argument is of type ABSTRACT_STRING. This way strings can be compared without any need for conversion with more concrete versions of STRING_8 and IMMUTABLE_STRING_8. More importantly, no risk of CAT-calls.

The second and more dispelling problem is with the postconditions. same_type will fail! It's possible for immutable and mutable strings to be declared as being equal, after all most of the time it's the content a program wants to compare and not the object reference. It is an argumentative point if two related but different objects can be equal, with accordance to the EMCA informative text.

"..default_is_equal is frozen, always returning the value of field-by-field identity comparison (for not-void other); any class may, on the other hand, redefine is_equal, in accordance with the pre- and postconditions, to reflect a more specific notion of equality."

Should it really matter then that the objects being compared actually be of the same type, when the result of is_equal yields True? The post-conditions are too strong, forbidding two disparate from being compared in valid domain context.

Optimizations

The mutable versions of the string class should convert to, and cache, an immutable variant of a string upon an initial request. Once the mutable string's content is modified the cached immutable string is to invalidated. The next request to convert the mutable string to an immutable variant would yield a new immutable string reference.

A Common Ancestor

Just as STRING_GENERAL is available now, a more abstract implementation is required to implement the features available for all string variants. That is; mutable, immutable and 8 and 32 bit versions of each. In the interest of backwards compatibility the use of STRING_GENERAL is not a viable solution. A proposed ABSTRACT_STRING will instead be put forth for the real general purpose string class.

ABSTRACT_STRING should contain a number of the routines moved up STRING_GENERAL and STRING_GENERAL</code> will hence forth derived ABSTRACT_STRING. The routines that will not be resident in ABSTRACT_STRING will be anything pertaining to mutating the internal representation.

Is There a Need to STRING_8 and STRING_32

Of course in outlining such a proposal you have to ask the relevance of augments/patching (it depends on your views) Eiffel with immutable strings. Should Eiffel just have immutable strings? It's been discussed that there should be mutable and immutable variants of STRING. Deviating from this a moment and looking at the best possible scenario for implementing immutable strings - make mutable strings a thing of the past!

So many other languages have chosen to use a immutable string as their fundamental string type. For optimization in creating new strings or piecing parts of strings together a string builder is used. Under .NET there is the immutable System.String and optimized builder System.Text.StringBuilder. In Java the exact same story; String is an immutable string and StringBuffer is used to optimally create immutable strings.

Yes, choosing such a option would break existing code. But we have to respect that trying to patch rather than change, often causes more problems that breaking code. With code breaking changes there should always be assistants available to help user migrate.

As simple first step is to change all STRING_8 and STRING_32 into STRING_BUILDER_8 and STRING_BUILDER_32, or indeed a unified STRING_BUILDER for manipulating any sized string. The breaking changes can be minimized and all future use of STRING_8 and STRING_32 are known to be immutable.

Should Eiffel follow suit and depreciate the existing implementation of STRING_8 and STRING_32, in order to favor immutable strings and string builder(s)?