Most of our examples have been concerned with some sort of manipulation of characters in the form of texts. Occasionally we have used GetChar and PutChar to manipulate the individual characters in them. At other times we have used GetInt, PutFix etc. to manipulate groups of characters within texts. Any text processing or editing program is heavily dependent on the manipulation of individual characters as well as their combinations. This chapter shows those features of SIMULA designed to help in this. Most of these features are system procedures. We start with two simple ones, shown in example 12.1. This shows a program which separates the letters from the digits in a text containing a mixture of the two. It is assumed that no other characters will be present.
SIMULA provides two Boolean procedures, Letter and Digit, both taking a single character parameter. If this parameter is one of a-z or A-Z the procedure Letter will return the value True, otherwise it will return the value False. If its parameter is one of 0-9, Digit will return True, otherwise False.
Example 12.1 shows both.
Example 12.1 : Sorting letters and numbers.
begin
   text Input, LettersOut, NumbersOut;
   character Next;
   InImage;
   Input :- SysIn.Image;
   LettersOut :- Blanks(80);
   NumbersOut :- Blanks(80);
   while Input.More do
   begin
      Next := Input.GetChar;
      if Letter(Next) then LettersOut.PutChar(Next) else
      if Digit(Next) then NumbersOut.PutChar(Next)
   end;
   OutText(Input);
   OutImage;
   OutText(LettersOut);
   OutImage;
   OutText(NumbersOut);
   OutImage
end
Computers store characters in their memory as numbers. Each character is represented by a different integer value. Unfortunately there is no single system for this. The numbers representing each character can vary from machine to machine. In practice most machines use either the EBCDIC system or the International Standards Organisation (ISO) system. The ISO system is often called by its earlier name, ASCII. The two systems are shown in tables 12.1 and 12.2. These tables are known as the collating sequences for the two systems.
It is sometimes useful to be able to find the internal number representing a character or to be able to convert a number into the corresponding character. SIMULA provides procedures for both of these. Example 12.2 shows a program which converts characters in a text which uses ISO characters into an equivalent text containing EBCDIC characters. This is often necessary when reading files transferred from another computer.
System integer procedure Rank takes a single character value parameter and returns the number representing the character. In 12.2, the characters inside ISOText are in the ISO form and so the call
Rank(ISOText.GetChar)
will return the internal number of the next ISO character in the text, following table 12.1.
Example 12.2 : Character set conversion.
begin
   integer Count, ISONumber, EBCDICNumber;
   character ISO, EBCDIC;
   text ISOText, EBCDICText;
   integer array EBCDICChar (0:255);
   ! Element n of the list below is the EBCDIC code for ISO code n;
   for EBCDICNumber := 0,
1, 2, 3, 55, 45, 46, 47, 22, 5, 37,
11, 12, 13, 14, 15, 16, 17, 18, 19, 60,
61, 50, 38, 24, 25, 63, 39, 28, 29, 30,
31, 64, 79, 127, 123, 91, 108, 80, 125, 77,
93, 92, 78, 107, 96, 75, 97, 240, 241, 242,
243, 244, 245, 246, 247, 248, 249, 122, 94, 76,
126, 110, 111, 124, 193, 194, 195, 196, 197, 198,
199, 200, 201, 209, 210, 211, 212, 213, 214, 215,
216, 217, 226, 227, 228, 229, 230, 231, 232, 233,
74, 224, 90, 95, 109, 121, 129, 130, 131, 132,
133, 134, 135, 136, 137, 145, 146, 147, 148, 149,
150, 151, 152, 153, 162, 163, 164, 165, 166, 167,
168, 169, 192, 106, 208, 161, 7, 32, 33, 34,
35, 36, 21, 6, 23, 40, 41, 42, 43, 44,
9, 10, 27, 48, 49, 26, 51, 52, 53, 54,
8, 56, 57, 58, 59, 4, 20, 62, 225, 65,
66, 67, 68, 69, 70, 71, 72, 73, 81, 82,
83, 84, 85, 86, 87, 88, 89, 98, 99, 100,
101, 102, 103, 104, 105, 112, 113, 114, 115, 116,
117, 118, 119, 120, 128, 138, 139, 140, 141, 142,
143, 144, 154, 155, 156, 157, 158, 159, 160, 170,
171, 172, 173, 174, 175, 176, 177, 178, 179, 180,
181, 182, 183, 184, 185, 186, 187, 188, 189, 190,
191, 202, 203, 204, 205, 206, 207, 218, 219, 220,
221, 222, 223, 234, 235, 236, 237, 238, 239, 250,
251, 252, 253, 254, 255 do
   begin
      ! Count starts at zero, the initial value of any integer variable;
      EBCDICChar(Count) := EBCDICNumber;
      Count := Count + 1
   end;
   InImage;
   ISOText :- SysIn.Image;
   ! Make the output text as long as the input text;
   EBCDICText :- Blanks(ISOText.Length);
   while ISOText.More do
   begin
      ISO := ISOText.GetChar;
      ISONumber := Rank(ISO);
      EBCDICNumber := EBCDICChar(ISONumber);
      EBCDIC := Char(EBCDICNumber);
      EBCDICText.PutChar(EBCDIC)
   end
end
The integer array EBCDICChar has 256 elements, numbered 0 to 255, covering every character code in the two character sets used. Each element of the array is set so that its value is the internal EBCDIC representation of the character whose ISO representation is the number of the index to that element. As an example, the internal representation of a space character is 64 in EBCDIC and 32 in ISO. Thus element 32 of the array is set to 64.
To convert the ISO values returned by Rank above into EBCDIC, the value returned is used to index the array in a subscripted variable and the value of this will be the EBCDIC representation of the same character.
Using the same example, if Rank (ISOText.GetChar) returns 32, having found a space in the text, this is used to index EBCDICChar, i.e. EBCDICChar(32). The value of element 32 is 64, the EBCDIC representation of a space.
Thus the use of GetChar, Rank and indexing of the array EBCDICChar has found the value of the EBCDIC representation of an ISO character in our text. Now we need to convert this into a character.
The system character procedure Char takes a single integer value parameter, whose value must be legal as the internal representation of a character on that system. (This range should be specified in the documentation for the SIMULA system you are using.) Char returns a character whose internal representation is the number passed as a parameter. It is important to note that Rank and Char do not concern themselves with which character set is being used, only with moving a number, held in some form such as a binary number, from a location reserved for a character to one reserved for an integer or vice versa. Char reports an error only if the integer is too large to fit into the, usually smaller, space used for a character. Interpreting characters according to ISO, EBCDIC or whatever is only done by some reading and writing procedures.
Thus, in the example, the EBCDIC representation of the character obtained from ISOText, is passed to Char, which returns a character with this as its internal representation. This can then be written as the EBCDIC translation into EBCDICText, using PutChar.
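The round trip between a character and its internal number can be sketched in a few lines. This is only an illustration; the variable names are invented for it, and the number printed depends on the character set of the machine used.

```simula
begin
   character Original, Restored;
   integer Code;
   Original := 'A';
   ! Rank gives the local internal number: 65 under ISO, 193 under EBCDIC;
   Code := Rank(Original);
   ! Char turns the same number back into a character;
   Restored := Char(Code);
   OutChar(Original);
   OutInt(Code, 5);
   OutText("  ");
   OutChar(Restored);
   OutImage
end
```

Whatever number appears in the middle, the first and last characters printed should be the same, since Char simply reverses Rank.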
When writing character handling programs to run on any computer, it is very inconvenient to have to allow for the possible character sets on each particular machine. This makes it nearly impossible, in fact, to write truly portable programs using the procedures Char and Rank. Fortunately SIMULA has a way of avoiding this. On any but the oldest SIMULA systems, two more system procedures are supplied. They are called ISOChar and ISORank. They match Char and Rank exactly except that they work entirely in terms of the ISO character set.
For ISORank, this means that the value returned is converted from the internal representation of the character parameter in the machine's own character set, to the ISO internal representation.
For ISOChar, it means that when the integer parameter is converted into a character, it is first converted into the internal representation of the character that it would represent in the ISO set.
If we take our space character on a machine using EBCDIC as its internal character set, we would find the following results with ISOChar and ISORank.
If we call Rank(' '), we get 64 returned, which is the local, EBCDIC, internal representation of a space. A call of ISORank(' '), on the other hand, will return 32, the ISO equivalent. Conversely, to get Char to return a space, we need to call Char(64), but using ISOChar we must call ISOChar(32).
Using these procedures we can write totally portable character handling programs. Example 12.3 shows how to write a portable program to convert all the lower case characters in a text into upper case. This is sometimes called "folding" the text into upper case.
Example 12.3 : Lower to upper case conversion using ISORank and ISOChar
begin
   text Buffer, Update;
   integer Convert, Factor;
   character Next;
   Buffer :- Copy("This little piggy was Fred");
   Update :- Buffer;
   Factor := ISORank('A') - ISORank('a');
   while Buffer.More do
   begin
      Next := Buffer.GetChar;
      Convert := ISORank(Next);
      if Convert GE ISORank('a') and Convert LE ISORank('z') then
      begin
         Convert := Convert + Factor;
         Next := ISOChar(Convert)
      end;
      Update.PutChar(Next)
   end;
   OutText(Buffer);
   OutImage;
   OutText(Update);
   OutImage
end
The difference between ISORank('A') and ISORank('a') is the same as the difference between the internal representations of each upper case character and its corresponding lower case character, in the ISO character set. Thus the value assigned to Factor will enable us to convert any lower case character's ISORank to that of its upper case equivalent.
The program checks each character to find if it is a lower case letter. Lower case letters in the ISO sequence are represented by consecutive numbers, starting with 'a' and ending with 'z'. This makes it easy to check whether the ISORank of each character is within this range. (This would not work with EBCDIC, where the letters are not consecutive. See table 12.2.)
Note how useful and is in these checks. We can use a single if statement, instead of two nested ones. Checking that a number lies within a certain range is a very common use of and.
When the program finds a lower case letter it adds Factor to its ISORank, putting the result in Convert.
The program uses the technique, seen earlier, of creating two references to the same text frame. By reading through one reference and writing through the other, at an equal rate and left to right, we update the text frame without the need to copy into another and back again.
Note that this program would not work for the EBCDIC character set, if Char and Rank replaced ISOChar and ISORank. This demonstrates the usefulness of the ISO procedures in writing portable programs rather neatly.
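The same technique can be packaged so that it works on one character at a time. The following is a minimal sketch; FoldChar is a name invented for this illustration and is not a system procedure.

```simula
begin
   ! FoldChar is a hypothetical helper, not a system procedure;
   character procedure FoldChar(C); character C;
   begin
      integer R;
      R := ISORank(C);
      ! Only ranks in the range a to z are shifted to upper case;
      if R >= ISORank('a') and R <= ISORank('z') then
         R := R + ISORank('A') - ISORank('a');
      FoldChar := ISOChar(R)
   end--of--FoldChar;

   OutChar(FoldChar('q'));
   OutChar(FoldChar('Q'));
   OutChar(FoldChar('7'));
   OutImage
end
```

Since only ISORank and ISOChar are used, this should print QQ7 whatever the internal character set of the machine.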
12.1 Because of hardware problems, a file has been corrupted. It now contains a number of unprintable characters. Assuming that letters, digits, spaces, full stops, commas, colons and semi-colons are the only characters which should be present, write a program which will remove the others.
12.2 Write a program which converts all upper case letters to lower and all lower to upper in a file.
12.3 Go back to exercise 7.8. Using this, add the capability to your editor to change the case of the next letter from its current position.
When writing our text formatting program in chapter eleven, we were forced to use more and more deeply nested if statements to check which directive was being read. This becomes rather clumsy and difficult to read when more than a few choices are possible. It is also slow to run when lots of checks must be made before a course of action is selected. To help with this sort of situation, SIMULA contains a feature called a switch. To see how this works, consider example 12.4. This is example 11.1 rewritten with a switch.
Example 12.4 : The use of a switch
begin
   class Page;
   begin
      class Print_Block(Width,Length); integer Width, Length;
      begin
         ref(Print_Block) Next;
         text array Contents(1:Length);
         integer Count;
         for Count:=1 step 1 until Length do Contents(Count):-Blanks(Width)
      end--of--Print_Block;
      Print_Block class Title_Block(Title); text Title;
      begin
         Contents(Length//2):=Title
      end--of--Title_Block;
      Print_Block class Text_Block;
      begin
         for Count:=1 step 1 until Length do
         begin
            InImage;
            Contents(Count):=InText(Width)
         end
      end--of--Text_Block;
      Print_Block class Diagram(Title); text Title;
      begin
         Contents(1):=Title
      end--of--Diagram;
      ref(Print_Block) Head, Tail, New_Block;
      text Directive;
      integer Len;
      procedure Add(NewBlock); ref(Print_Block) NewBlock;
      begin
         if Head==None then Head :- NewBlock;
         if Tail=/=None then Tail.Next :- NewBlock;
         Tail :- NewBlock
      end--of--Add;
      switch Action := TitleB, TextB, DiagB;
      character ActionCode;
      Directive :- InText(2);
      while Directive ne "$E" do
      begin
         Directive.SetPos(2);
         ActionCode := Directive.GetChar;
         go to Action(ISORank(ActionCode) - ISORank('A'));
      TitleB : ! Directive = $B - New banner;
         Len := InInt;
         InImage;
         Add(new Title_Block(80,Len,InText(80)));
         go to Repeat;
      TextB : ! Directive = $C - New content;
         Add(new Text_Block(80,InInt));
         go to Repeat;
      DiagB : ! Directive = $D - New diagram;
         Len := InInt;
         InImage;
         Add(new Diagram(80,Len,InText(80)));
         go to Repeat;
      Repeat:
         Directive :- InText(2)
      end.of.while.loop
   end--of--Page;
   new Page
end.of.program
The letters in the directives are in alphabetic sequence starting with 'B'. Since the upper case letters have consecutive ranks in the ISO collating sequence, subtracting ISORank('A') from the ISORank of each directive's letter gives a value between one and three, inclusive. Each integer value so obtained represents one of the three directives.
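The arithmetic can be checked with a short sketch such as the following, which is only an illustration and uses an invented variable name. From table 12.1, 'A' has ISO rank 65 and 'B', 'C' and 'D' have ranks 66, 67 and 68.

```simula
begin
   character Code;
   ! Print each directive letter beside the index it produces;
   for Code := 'B', 'C', 'D' do
   begin
      OutChar(Code);
      OutInt(ISORank(Code) - ISORank('A'), 3);
      OutImage
   end
end
```

This should print the values 1, 2 and 3 beside B, C and D respectively.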
The switch declaration is unusual, in that it contains the value assignment operator, :=, as well as identifiers. The syntax of such a declaration is the keyword switch, followed by an identifier giving the name of the switch, followed by the value assignment operator, followed by a list of so called "designational expressions". Designational expressions can take a number of forms. In the example they all have the commonest form, a simple identifier.
The identifiers used in such a list specify places in the program to which a "jump" may be made. If you look at the example, three identifiers are listed. (The minimum is one and the maximum will be different on different SIMULA systems.) Further on in the program, each of these identifiers occurs again, followed by a colon. This second occurrence is called a label.
An identifier followed by a colon is a declaration of a label for the following statement. Such a declaration is different from those of any other type, since it can occur in the middle of a sequence of statements, rather than before any statements. The example contains four label declarations. The first three, TitleB, TextB and DiagB, each label the statement which follows them. The fourth, Repeat, labels the last statement in the while loop. A label may also be placed immediately before the keyword end. Since end is not a statement, to preserve the rule that label declarations always precede statements, there is said to be an imaginary statement between the colon and end.
The switch, Action, is used in a statement starting with the keywords go to. In fact these can be written as a single keyword goto, if you prefer. This is the only combination of keywords where this is allowed. A go to statement is the keyword(s) go to followed by a designational expression. In the example the switch identifier, Action, followed by an integer value in parentheses, is used. This is the other form of designational expression that we shall use. This integer value is an index to the list of designational expressions in the switch declaration above. Thus it can be used to identify a label declaration and through this the next statement to be executed.
In the example, we have seen that the value generated when we evaluate
ISORank(ActionCode) - ISORank('A')
should be an integer in the range 1-3. The designational expressions in a switch declaration are assumed to be numbered consecutively starting with 1. Thus, if the directive $C is found, the index to Action in the go to statement will be 2, and so the second label in the declaration of Action, TextB, will mark the next instruction to be executed. This leads us to the meaning of a go to statement. It causes the program to move to the statement whose label is identified by the designational expression in the go to statement. No other statements are executed before this jump. After the jump the program continues with the statement following the label. Such a jump may be forwards, as in all the cases shown, or backwards.
To see this, consider again the $C directive. The first go to is followed by
Action(ISORank(ActionCode) - ISORank('A')),
which, as we have seen, will equal 2. This means that the label to use is identified by the second item in the list of designational expressions in the declaration of Action. This is the identifier TextB, which is the name of a label. The label TextB is further on in the program. The program therefore misses the intervening statements and continues from the statement
Add(new Text_Block(80,InInt)).
If the directive $D is encountered, the program jumps to the third label in the switch declaration, DiagB. If $B is encountered it jumps to the first. Check the logic of this for yourself.
go to statements with labels
A go to statement makes the program jump to a label specified by a designational expression. A subscripted switch variable is one form of designational expression, but we have seen a simpler one. This is a simple label identifier. In the example three statements have the form of go to followed by an identifier. As it happens, the identifier is the same in each case. The effect is again to jump to the statement labelled with the identifier found from the designational expression. In the case of a simple identifier this is very straightforward.
These three statements all cause the program to jump to the statement labelled by Repeat. This takes the program to the call of InText at the end of the compound statement of the while loop. In other words, once the actions for that directive are complete, the program jumps on to read the next directive.
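Both forms of designational expression can be seen in a much smaller program. The following is a minimal sketch; the switch Branch and all the label names are invented for this illustration.

```simula
begin
   integer Choice;
   switch Branch := First, Second, Third;
   Choice := 2;
   go to Branch(Choice);      ! subscripted switch: jumps to Second;
First:
   OutText("first");
   go to Finish;              ! simple label: skips the other branches;
Second:
   OutText("second");
   go to Finish;
Third:
   OutText("third");
Finish:
   OutImage
end
```

With Choice set to 2 this should print only the word second. Changing the value assigned to Choice to 1 or 3 selects one of the other branches.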
12.4 By extending the list for the switch declaration, moving the position of Repeat and adding a new label declaration, rewrite example 12.4 without the while loop.
12.5 What would happen in example 12.4 if the go to Repeat statements were missing? Try removing them to check. Note the effect carefully.
Some people write very intricate programs, which use lots of go to statements. In certain, rather limited, programming languages this is necessary. In SIMULA it is almost never needed. The use of too many go to statements makes programs very hard to read and understand. They should only be used when absolutely necessary or in the sort of situation shown in example 12.4, where a switch can simplify a program and make it easier to extend.
It is important in a program using a switch to provide label declarations to match all the designational expressions in the switch declaration. It is not illegal to declare a switch which leads to non-existent labels. It is a runtime error to try to jump to one. In fact it is often best to check before the go to statement that the value of the expression used in the subscripted switch variable is not too large for the list in the declaration and to print an error message or warning if necessary.
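Such a check might be sketched as follows. The names are invented for this illustration and the index is deliberately set outside the range of the list.

```simula
begin
   integer Index;
   switch Action := One, Two;
   Index := 3;                ! deliberately outside the list of two labels;
   if Index < 1 or Index > 2 then
   begin
      OutText("Warning: no action for this index");
      OutImage;
      go to Finish
   end;
   go to Action(Index);
One:
   OutText("one");
   OutImage;
   go to Finish;
Two:
   OutText("two");
   OutImage;
Finish:
end
```

Here the warning is printed and the subscripted go to is never reached. Note that Finish labels the imaginary statement before the final end.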
A rather new feature in SIMULA, which may not exist in some older systems, is the system procedure Terminate_Program. This causes a jump to the very end of the program, regardless of the current position. This can be useful in providing warnings of disastrous errors and then stopping the program. Terminate_Program is the only way for a program prefixed by a separately compiled class containing an inner statement (see chapter 16) to stop itself without executing the instructions following that inner. It is the only way to stop a program dead from anywhere within it and guarantee that no further actions will be carried out.
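A typical use might look like the following sketch, which assumes a system new enough to provide Terminate_Program. The variable Divisor and the message are invented for this illustration.

```simula
begin
   integer Divisor;
   Divisor := 0;
   if Divisor = 0 then
   begin
      ! Report the problem, then stop before the division is attempted;
      OutText("Fatal error: divisor is zero");
      OutImage;
      Terminate_Program
   end;
   OutInt(100 // Divisor, 6);
   OutImage
end
```

The message is written out with OutImage before the call, so that the warning has been sent to the output before the program stops.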
It is possible to pass switches and labels as parameters to procedures. The default mode is reference and name is also legal. Using such a parameter, or a switch or label from a block enclosing the one the program is currently in, it is possible to jump out of a block. The normal rules for ending the appropriate type of block will apply.
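A label parameter might be used as in the following sketch, which relies on the default parameter mode described above. The procedure Check and the labels are invented for this illustration, and some systems may restrict label parameters, so consult the documentation for the one you are using.

```simula
begin
   ! Check is a hypothetical procedure with a label parameter;
   procedure Check(Number, OnError); integer Number; label OnError;
   begin
      if Number < 0 then go to OnError
   end--of--Check;

   Check(5, Bad);
   OutText("5 accepted");
   OutImage;
   Check(-1, Bad);
   OutText("this line is skipped");
   OutImage;
   go to Finish;
Bad:
   OutText("negative number rejected");
   OutImage;
Finish:
end
```

The second call of Check jumps out of the procedure body to the label Bad in the enclosing block, so the statements immediately following that call are not executed.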
We have looked at how characters are represented by numbers and seen the procedures which allow us to use this. We have seen the use of switches and labels, in go to statements and as parameters.
We have learned the benefits and, most importantly, the dangers of go to statements.
We have seen the system procedure Terminate_Program.
Table 12.1: ISO character set - printing characters.
0 nul 1 soh 2 stx 3 etx 4 eot 5 enq
6 ack 7 bel 8 bs 9 ht 10 lf 11 vt
12 ff 13 cr 14 so 15 si 16 dle 17 dc1
18 dc2 19 dc3 20 dc4 21 nak 22 syn 23 etb
24 can 25 em 26 sub 27 esc 28 fs 29 gs
30 rs 31 us 32 space 33 ! 34 " 35 #
36 $ 37 % 38 & 39 ' 40 ( 41 )
42 * 43 + 44 , 45 - 46 . 47 /
48 0 49 1 50 2 51 3 52 4 53 5
54 6 55 7 56 8 57 9 58 : 59 ;
60 < 61 = 62 > 63 ? 64 @ 65 A
66 B 67 C 68 D 69 E 70 F 71 G
72 H 73 I 74 J 75 K 76 L 77 M
78 N 79 O 80 P 81 Q 82 R 83 S
84 T 85 U 86 V 87 W 88 X 89 Y
90 Z 91 [ 92 \ 93 ] 94 ^ 95 _
96 ` 97 a 98 b 99 c 100 d 101 e
102 f 103 g 104 h 105 i 106 j 107 k
108 l 109 m 110 n 111 o 112 p 113 q
114 r 115 s 116 t 117 u 118 v 119 w
120 x 121 y 122 z 123 { 124 | 125 }
126 ~ 127 del 128 ctl 129 ctl 130 ctl 131 ctl
132 ctl 133 ctl 134 ctl 135 ctl 136 ctl 137 ctl
138 ctl 139 ctl 140 ctl 141 ctl 142 ctl 143 ctl
144 ctl 145 ctl 146 ctl 147 ctl 148 ctl 149 ctl
150 ctl 151 ctl 152 ctl 153 ctl 154 ctl 155 ctl
156 ctl 157 ctl 158 ctl 159 ctl 160 ctl 161 ctl
162 ctl 163 ctl 164 ctl 165 ctl 166 ctl 167 ctl
168 ctl 169 ctl 170 ctl 171 ctl 172 ctl 173 ctl
174 ctl 175 ctl 176 ctl 177 ctl 178 ctl 179 ctl
180 ctl 181 ctl 182 ctl 183 ctl 184 ctl 185 ctl
186 ctl 187 ctl 188 ctl 189 ctl 190 ctl 191 ctl
192 ctl 193 ctl 194 ctl 195 ctl 196 ctl 197 ctl
198 ctl 199 ctl 200 ctl 201 ctl 202 ctl 203 ctl
204 ctl 205 ctl 206 ctl 207 ctl 208 ctl 209 ctl
210 ctl 211 ctl 212 ctl 213 ctl 214 ctl 215 ctl
216 ctl 217 ctl 218 ctl 219 ctl 220 ctl 221 ctl
222 ctl 223 ctl 224 ctl 225 ctl 226 ctl 227 ctl
228 ctl 229 ctl 230 ctl 231 ctl 232 ctl 233 ctl
234 ctl 235 ctl 236 ctl 237 ctl 238 ctl 239 ctl
240 ctl 241 ctl 242 ctl 243 ctl 244 ctl 245 ctl
246 ctl 247 ctl 248 ctl 249 ctl 250 ctl 251 ctl
N.b. Ranks less than 32 and greater than 126 are not visible printing characters. Where appropriate the standard control meaning is supplied. Ranks greater than 127 are normally used for graphics or alternative character fonts.
Table 12.2: EBCDIC character set - printing characters.
0 nul 1 soh 2 stx 3 etx 4 5 ht
6 7 del 8 9 10 11 vt
12 ff 13 cr 14 so 15 si 16 dle 17 dc1
18 dc2 19 dc3 20 21 22 bsp 23
24 can 25 em 26 27 28 fs 29 gs
30 rs 31 us 32 33 34 35
36 37 lf 38 etb 39 esc 40 41
42 43 44 45 enq 46 ack 47 bel
48 49 50 syn 51 52 53
54 55 eot 56 57 58 59
60 dc4 61 nak 62 63 sub 64 space 65
66 67 68 69 70 71
72 73 74 75 . 76 < 77 (
78 + 79 | 80 & 81 82 83
84 85 86 87 88 89
90 ! 91 $ 92 * 93 ) 94 ; 95 ~
96 - 97 / 98 99 100 101
102 103 104 105 106 | 107 ,
108 % 109 _ 110 > 111 ? 112 113
114 115 116 117 118 119
120 121 ` 122 : 123 # 124 @ 125 '
126 = 127 " 128 129 a 130 b 131 c
132 d 133 e 134 f 135 g 136 h 137 i
138 139 140 141 142 143
144 145 j 146 k 147 l 148 m 149 n
150 o 151 p 152 q 153 r 154 ^ 155
156 157 158 159 160 161 ~
162 s 163 t 164 u 165 v 166 w 167 x
168 y 169 z 170 171 172 173 [
174 175 176 177 178 179
180 181 182 ^ 183 184 185
186 187 188 189 ] 190 191
192 { 193 A 194 B 195 C 196 D 197 E
198 F 199 G 200 H 201 I 202 203
204 205 206 207 208 } 209 J
210 K 211 L 212 M 213 N 214 O 215 P
216 Q 217 R 218 219 220 221
222 223 224 \ 225 226 S 227 T
228 U 229 V 230 W 231 X 232 Y 233 Z
234 235 236 237 238 239
240 0 241 1 242 2 243 3 244 4 245 5
246 6 247 7 248 8 249 9 250 251
252 253 254 255
N.b. EBCDIC character sets vary from machine to machine. This table shows only one variant. Example 12.2 uses an alternative mapping between EBCDIC and ISO to allow a complete conversion.