<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>https://dev.eiffel.com/index.php?action=history&amp;feed=atom&amp;title=Heuristics_for_detecting_class_text_encoding</id>
		<title>Heuristics for detecting class text encoding - Revision history</title>
		<link rel="self" type="application/atom+xml" href="https://dev.eiffel.com/index.php?action=history&amp;feed=atom&amp;title=Heuristics_for_detecting_class_text_encoding"/>
		<link rel="alternate" type="text/html" href="https://dev.eiffel.com/index.php?title=Heuristics_for_detecting_class_text_encoding&amp;action=history"/>
		<updated>2026-05-14T04:48:03Z</updated>
		<subtitle>Revision history for this page on the wiki</subtitle>
		<generator>MediaWiki 1.24.1</generator>

	<entry>
		<id>https://dev.eiffel.com/index.php?title=Heuristics_for_detecting_class_text_encoding&amp;diff=7807&amp;oldid=prev</id>
		<title>Ericb: A class may start with a comment</title>
		<link rel="alternate" type="text/html" href="https://dev.eiffel.com/index.php?title=Heuristics_for_detecting_class_text_encoding&amp;diff=7807&amp;oldid=prev"/>
				<updated>2007-03-30T17:21:36Z</updated>
		
		<summary type="html">&lt;p&gt;A class may start with a comment&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;tr style='vertical-align: top;'&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black; text-align: center;&quot;&gt;Revision as of 17:21, 30 March 2007&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 18:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 18:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;frozen&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;frozen&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;or a byte-order-mark (have I missed anything?).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;or a byte-order-mark (have I missed anything? &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;--[[User:Ericb|Ericb]] 19:21, 30 March 2007 (CEST): yes: comments starting with -- &lt;/ins&gt;).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;So these can all be tested for all seven possible encodings (you only have to read 8 bytes).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;So these can all be tested for all seven possible encodings (you only have to read 8 bytes).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Note that the Eiffel Studio editor should save class texts in an encoding scheme according to a user preference. I recommend allowing only UTF-8 (without a BOM), and UTF-16 and UTF-32 (with the additional requirement of a BOM).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Note that the Eiffel Studio editor should save class texts in an encoding scheme according to a user preference. I recommend allowing only UTF-8 (without a BOM), and UTF-16 and UTF-32 (with the additional requirement of a BOM).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Ericb</name></author>	</entry>

	<entry>
		<id>https://dev.eiffel.com/index.php?title=Heuristics_for_detecting_class_text_encoding&amp;diff=7805&amp;oldid=prev</id>
		<title>Manus at 17:12, 30 March 2007</title>
		<link rel="alternate" type="text/html" href="https://dev.eiffel.com/index.php?title=Heuristics_for_detecting_class_text_encoding&amp;diff=7805&amp;oldid=prev"/>
				<updated>2007-03-30T17:12:38Z</updated>
		
		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;tr style='vertical-align: top;'&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black; text-align: center;&quot;&gt;Revision as of 17:12, 30 March 2007&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[[Category:Unicode]]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;When Eiffel Software implement support for class texts written in Unicode, it is important to detect which Unicode encoding scheme is in use. Note that class authors will want to write class text in UTF-8 for certain (support for other encoding schemes is sporadic amongst programmers' text editors), UTF-16/UTF-16BE/UTF-16LE probably (especially on Windows systems, or in East Asia), and least likely in UTF-32/UTF-32BE/UTF-32LE. The ECMA standard does not (yet - I have raised the issue) address the matter of encoding schemes. I trust it will allow either all 7 freely, or (better), just UTF-8 (without a BOM), and UTF-16 and UTF-32 (with the additional requirement of a BOM).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;When Eiffel Software implement support for class texts written in Unicode, it is important to detect which Unicode encoding scheme is in use. Note that class authors will want to write class text in UTF-8 for certain (support for other encoding schemes is sporadic amongst programmers' text editors), UTF-16/UTF-16BE/UTF-16LE probably (especially on Windows systems, or in East Asia), and least likely in UTF-32/UTF-32BE/UTF-32LE. The ECMA standard does not (yet - I have raised the issue) address the matter of encoding schemes. 
I trust it will allow either all 7 freely, or (better), just UTF-8 (without a BOM), and UTF-16 and UTF-32 (with the additional requirement of a BOM).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Manus</name></author>	</entry>

	<entry>
		<id>https://dev.eiffel.com/index.php?title=Heuristics_for_detecting_class_text_encoding&amp;diff=7804&amp;oldid=prev</id>
		<title>Colin-adams at 17:06, 30 March 2007</title>
		<link rel="alternate" type="text/html" href="https://dev.eiffel.com/index.php?title=Heuristics_for_detecting_class_text_encoding&amp;diff=7804&amp;oldid=prev"/>
				<updated>2007-03-30T17:06:38Z</updated>
		
		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;When Eiffel Software implement support for class texts written in Unicode, it is important to detect which Unicode encoding scheme is in use. Note that class authors will want to write class text in UTF-8 for certain (support for other encoding schemes is sporadic amongst programmers' text editors), UTF-16/UTF-16BE/UTF-16LE probably (especially on Windows systems, or in East Asia), and least likely in UTF-32/UTF-32BE/UTF-32LE. The ECMA standard does not (yet - I have raised the issue) address the matter of encoding schemes. I trust it will allow either all 7 freely, or (better), just UTF-8 (without a BOM), and UTF-16 and UTF-32 (with the additional requirement of a BOM).&lt;br /&gt;
&lt;br /&gt;
If my suggestion in [http://www.eiffelroom.com/blog/colin_adams/mixing_unicode_and_latin_1_class_texts Mixing Unicode and Latin-1 class texts] is followed, and I hope it is, then the compiler will already know from the cluster definition whether a class is written in a Unicode encoding scheme. Therefore confusion with Latin-1 texts does not arise.&lt;br /&gt;
&lt;br /&gt;
If only UTF-8 (without a BOM), and UTF-16 and UTF-32 (with the additional requirement of a BOM) are allowed, then the heuristic is simple: examine the first four bytes. If they begin with a BOM, the BOM determines the encoding; if they do not, the text is UTF-8.&lt;br /&gt;
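&lt;br /&gt;
As a rough sketch only, the restricted case above could be tested as follows. Python is used purely for illustration; the names BOMS and detect_encoding are hypothetical and not part of any Eiffel tool. The sketch assumes the caller has already read at least the first four bytes of the class text.&lt;br /&gt;

```python
import codecs

# Longer BOMs are listed first: the UTF-32LE BOM (FF FE 00 00) begins
# with the UTF-16LE BOM (FF FE), so the order of the checks matters.
# A class text cannot start with U+0000, so FF FE 00 00 is taken to be UTF-32LE.
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]


def detect_encoding(first_bytes):
    """Return the byte order implied by a leading BOM, or UTF-8 if there is none."""
    for bom, name in BOMS:
        if first_bytes.startswith(bom):
            return name
    # Assumption from the text: without a BOM only UTF-8 is allowed.
    return "UTF-8"
```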
&lt;br /&gt;
Otherwise, simple heuristics can reliably determine the encoding scheme of the class by reading the first line of the class text as a sequence of octets (e.g. reading it as a STRING_8 and interpreting the `code' of each &amp;quot;character&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
The class text can only begin with one of the following:&lt;br /&gt;
&lt;br /&gt;
White space,&lt;br /&gt;
indexing,&lt;br /&gt;
note,&lt;br /&gt;
class,&lt;br /&gt;
deferred,&lt;br /&gt;
expanded,&lt;br /&gt;
frozen&lt;br /&gt;
&lt;br /&gt;
or a byte-order-mark (have I missed anything?).&lt;br /&gt;
&lt;br /&gt;
So these can all be tested for all seven possible encodings (you only have to read 8 bytes).&lt;br /&gt;
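&lt;br /&gt;
A minimal sketch of the BOM-less case, assuming the first two characters of the class text are non-NUL ASCII (which holds for every starting token listed above): the position of the zero octets in the first four bytes is then enough to tell the encoding forms apart. Python is used for illustration and the function name is hypothetical; a text that starts with a BOM would be recognised from the BOM itself before this test is applied.&lt;br /&gt;

```python
def guess_encoding_without_bom(first_bytes):
    """Guess the encoding of a BOM-less class text from its first four octets."""
    head = first_bytes[:4]
    if len(head) == 4:
        # True marks a zero octet; ASCII characters leave a fixed pattern of zeros.
        zeros = tuple(octet == 0 for octet in head)
        if zeros == (True, True, True, False):
            return "UTF-32BE"   # 00 00 00 xx
        if zeros == (False, True, True, True):
            return "UTF-32LE"   # xx 00 00 00
        if zeros == (True, False, True, False):
            return "UTF-16BE"   # 00 xx 00 xx
        if zeros == (False, True, False, True):
            return "UTF-16LE"   # xx 00 xx 00
    # No zero octets in a recognised pattern: treat the text as UTF-8.
    return "UTF-8"
```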
&lt;br /&gt;
Note that the Eiffel Studio editor should save class texts in an encoding scheme according to a user preference. I recommend allowing only UTF-8 (without a BOM), and UTF-16 and UTF-32 (with the additional requirement of a BOM).&lt;/div&gt;</summary>
		<author><name>Colin-adams</name></author>	</entry>

	</feed>