Sunday, December 6, 1970

Drawing Attention

My Father worked on the railroad, the Southern Pacific Railroad to be exact.  He started as a Fireman (sort of an assistant engineer).  Railroad work is controlled by seniority.  He worked as a fireman for years, probably decades, slowly climbing the ladder of seniority.  When we were really young, he had to take the jobs that the higher-seniority firemen did not want -- long hours, away from home.  Eventually he got to be the highest seniority fireman in his district.  He could pick the good jobs, during the day, in the yard, so he could come home in the evening and be with the family.

That lasted about a week.  Then he got promoted, from Fireman to Engineer.  From the highest seniority Fireman to the lowest seniority Engineer, and had to start all over again at the bottom of the seniority ladder, taking the long jobs, away from home.

But that's not important here.  This was back in the 1950s.  The Southern Pacific was the largest employer in the area.  Eugene was the second largest city in Oregon, but it was probably only 50,000 people.  The city and the railroad did not always agree on how things should be done.  For example, the city switched to Daylight Saving Time during the summer; the railroad stayed on Standard Time.

Dad got paid, in cash, every week.  He would go to the payroll window on Friday, and they would give him an envelope of cash.  He brought that home, and put it in a cigar box, and we (well, Dad and Mom) used that to pay bills and buy things.  At one point, the City and the Railroad were not getting along on things; the Railroad did not believe the City understood how important the Railroad was.  So one week, they paid all their employees in $2 bills, rather than the normal $10s and $20s.  The $2 bill was legal money, but it was unusual then (as it is now). The Railroad paid everyone in cash and that cash was $2 bills.  Suddenly the city was awash in $2 bills. The City clearly learned how important the Railroad was to the city.

Later, when I was an undergraduate in college, I worked part time as a programmer for the Computer Institute for Social Science Research (CISSR).  CISSR produced a set of analysis and statistical programs that were used by the graduate students and faculty of Michigan State University.  These programs were written in Fortran and ran on the University's CDC 3600 computer.

The 3600 was a wonderful computer.  It was big and expensive.  It sat in a glass walled room at the computer center and had a set of operators that would attend to it.  Jobs were submitted to the computer as a deck of computer cards.  The cards were submitted to a clerk at a counter (a couple of very pleasant college girls), who put them in a tray until it was full, then handed it to an operator who would put the cards into a card reader -- a big machine the size of a chest freezer -- which read the cards in and put them on a drum.  Eventually the operating system would run the job, and produce output which would be printed on large line printers.  The paper from the line printers would be separated according to the job and then filed in a set of hanging folders to be picked up by the user.  This whole process could take a full day, depending on how busy the computer was.  You learned to work on several different programming problems at the same time -- submitting one to be run, then switching to another problem since it would be hours, possibly days, until the first program ran.  I was working on as many as 10 programs at one time.

CISSR wanted to be sure that it got "credit" for all the times that its programs ran.  Somehow they found out that there was a special system call that could be made which would put a short user message up on the CRT display for the operators.  You could post an 80-character message on the display.  CISSR decided they wanted their name to be displayed every time that one of their programs ran.  That way the operators would see that a CISSR program was running and see how often that happened, raising the importance of CISSR to the computer center.  (It didn't hurt that almost no one used this special display capability and it seemed that once a message was put up, it stayed there until someone else put up a message.  So it was speculated that the first time one of our programs ran, it would put up "CISSR" on the display and it would stay there all day, even while other programs were running.)

I was given the task of writing the code to do this.  The message was put up by a special system call from the user program (my program) to the operating system (which manages the display).  The system call used two index registers -- one to give the address of the message in memory, and the other to give its length.  Each index register is 15 bits, which allows an address of up to 32,767, which was the address size for the CDC 3600.  Actually the address size for the processor is 16 bits, and our 3600 had 64K words of memory, split into two 32K banks of memory -- one for user programs and one for the operating system.

Memory for the 3600 was core memory.  Each bit of memory was represented by a small doughnut-shaped ferrite core which could be magnetized in one direction (0) or the other direction (1).  This was really useful, since it meant that memory stayed even when power was turned off.  Normally the operating system was loaded into the bank of system memory once (a cold start), and then if it had to be shut down or it crashed, it could just be restarted from the copy of the operating system still in system memory (a warm start), since core memory stays the same with or without power.  Modern computer systems use semiconductor memory that needs power to maintain its contents, so if the computer loses power, you have to reload memory (electronic) from disk (magnetic).

Now for the system call to work, the message that was to be displayed needed to be copied from user memory into system memory.  This required a special addressing mode, but the operating system could set that up.  The specifics of how this was all done were not very well documented (almost no one used this system call), and so I read the source code of the operating system to see how exactly it worked.  The operating system source was available as this huge computer printout (about 8 or 9 inches thick) in the computer library.

The operating system was written in assembly language, and I noticed that it used this really neat instruction to copy the message from user space to system space.  Once the addressing modes were set, it used an "Augmented XMIT" instruction to copy the message.  The Augmented XMIT instruction had two addresses, the Origin address and the Destination address, plus it used 5 of the 6 index registers.  Index register B1 held the word count -- how many words to copy.  Register B2 was used to modify the origin address, while B4 modified the destination address.  Register B3 was the "step" for register B2, while B5 was the "step" for register B4.  This one instruction would copy from Origin + B2 + i*B3 to Destination + B4 + i*B5 for i going from 0 to n-1, where n is the contents of B1.  This one instruction can copy an entire block of memory!
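In modern terms, the behavior described above can be sketched in Python roughly as follows.  This is only an illustration, not 3600 assembly: the function name augmented_xmit and the flat list standing in for memory are hypothetical, the two separate memory banks are ignored, and it assumes that exactly B1 words are copied.

```python
def augmented_xmit(memory, origin, destination, b1, b2, b3, b4, b5):
    """Hypothetical model of the copy: b1 words, with independent strides."""
    for i in range(b1):                      # assumed: exactly b1 words are copied
        src = origin + b2 + i * b3           # B2 offsets the origin, B3 is its step
        dst = destination + b4 + i * b5      # B4 offsets the destination, B5 is its step
        memory[dst] = memory[src]


# 20 "source" words followed by 20 blank words, all in one flat list.
memory = list(range(100, 120)) + [0] * 20

# Copy 5 consecutive words from address 0 to address 20.
augmented_xmit(memory, origin=0, destination=20, b1=5, b2=0, b3=1, b4=0, b5=1)
print(memory[20:25])
```

With both steps set to 1 this is a plain block copy; other step values would gather or scatter every second or third word.  Note that nothing in the instruction itself limits the count, which is why the check on B1 matters so much.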

But the most important part of this instruction was obviously B1.  B1 contained the number of words to copy.  If we wanted to copy no more than 80 characters, that would be no more than 10 words (48-bit words with eight 6-bit characters).  So before copying the message from user space to operating system space, the OS first checked that the number of words to be copied was less than or equal to 10.  This was a bit of a problem.  The number of words to copy was given by the user in an index register (B1), which was convenient for the Augmented XMIT instruction, but the 3600 had limited comparison or test instructions for the index registers.  In particular, a Register Jump instruction was 48 bits (a long instruction), while the A-Jump instruction, which only tests the A register, was only 24 bits (a short instruction).  Short instructions take less space and operate faster, so it was preferred to use short instructions.  In this case, the code copied the number of words from B1 to the A register, decreased the A register by 10, and then jumped to an error case if the A register was still positive.  All of these are short instructions.

But the B1 index register is 15 bits long and the A register is 48 bits long, since the A register is used for the full range of arithmetic operations, while the index registers are normally only used for addresses (or counts of addresses).  An address is only 15 bits.  If the number of words was slightly too large, say 15, then the code worked fine, detecting that 15 was greater than 10.  But if the number of words was so large that the high order bit was on (16K or more), then copying the B1 index register to the A register would sign-extend that "1" bit and result in a value in the A register which looked like a negative number.  And any negative number is less than 10.
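The flaw is easier to see in a small sketch.  The Python below is a hypothetical model, not the operating system's actual code: the names sign_extend and flawed_check are made up, and it assumes two's complement arithmetic for simplicity (the real hardware's number representation may have differed).

```python
INDEX_BITS = 15   # width of an index register such as B1
MAX_WORDS = 10    # 80 characters at 8 characters per word


def sign_extend(value, bits=INDEX_BITS):
    """Widen a `bits`-wide value, treating its top bit as a sign bit."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):        # high-order bit is on
        return value - (1 << bits)       # looks negative once widened
    return value


def flawed_check(b1):
    """Return True if the word count in B1 would be accepted."""
    a = sign_extend(b1)   # copy B1 into the wider A register
    a -= MAX_WORDS        # decrease A by 10
    return not a > 0      # error only if A is still positive


for count in (10, 15, 0x4000, 0x7FFF):
    print(count, flawed_check(count))
```

Under these assumptions a count of 10 is accepted and 15 is rejected, as intended, but 16,384 and 32,767 are also accepted because they look negative after the copy.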

So, I could set the number of words in the message to 32,767 (0x7FFF) and when it was moved to the A register it would appear to be -1, which is less than 10, and the operating system would copy almost the entire user bank of memory on top of the system memory bank, wiping out the operating system.  Or maybe it would do nothing.  I needed to test this.  But to run that code as a normal job could disrupt everyone's work.

Now it turned out that sometimes the computer was lightly loaded.  And in that case, the computer center would schedule an "open house", letting users run the card readers and line printers themselves, getting near instant turnaround of jobs.  These open houses would be scheduled for weekend nights, when the system would otherwise be idle.  Program developers had a great time during these, since they could run a job, see an error, fix it, and run it again almost immediately.  These open houses were by invitation only.  And one came along just as I was starting to look at this display problem.

So one Saturday night, I went and worked on all my other programs, interacted with the other programmers, until there seemed to be little left to do.   The computer center was bustling, filled with the noise of the card readers, the line printers, the tape drives.  At that point, I put a little test of this display system call, with 32,767 as the length of the message into the card reader.  Almost immediately the room fell silent -- the line printers stopped, the card readers stopped.  One of the operators walked over to the display console, looked at it, and yelled out "Peterson!".  At 2AM Saturday night -- I guess it was technically Sunday morning -- I got kicked out of the computer center.  I found out later that I had destroyed the contents of system memory, and the operators had to do a "cold restart" of the system, loading everything back into memory and setting it all up from magnetic tape.  It took hours.  They were exceptionally annoyed.

And by Monday morning, when I went to look at the operating system code, I found that the system call code had been patched by hand to do a long jump to some other place in the code, use a 48-bit instruction to compare the B1 register to 10, and then a jump back to the original code.  And shortly thereafter we were happily putting "This is a CISSR program running" on the console display, showing how often CISSR programs ran and how important they were to the computer center.
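The effect of that patch can be sketched the same way.  Again this is a hypothetical Python model rather than the real code: patched_check is a made-up name, and it assumes the 48-bit comparison treated B1 as a plain 15-bit count with no sign.

```python
MAX_WORDS = 10        # 80 characters at 8 characters per word
INDEX_MASK = 0x7FFF   # 15-bit index register


def patched_check(b1):
    """Return True if the word count in B1 would be accepted."""
    count = b1 & INDEX_MASK     # compare the register directly, no sign extension
    return count <= MAX_WORDS


for count in (10, 15, 0x4000, 0x7FFF):
    print(count, patched_check(count))
```

With the comparison done on the index register itself, only the count of 10 gets through; the very large counts are rejected along with 15.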