October 20, 2016 · #3 · Technical Deep Dive

Why PETSCII anyway?


One of my colleagues who knows almost nothing about the C64 once asked me if computers from this vintage still use binary. Heh, it's funny to think about how crazy it would have to be if it were so different that it didn't use binary. Binary is one of the most foundational building blocks across all of digital electronics.

But he didn't know. So I told him it does. What's almost as crazy as not using binary, though, is that the Commodore 8-bit line (the C64 and C128, VIC-20, PET and probably all the other lesser-known models) doesn't use ASCII. This does seem to be a bit crazy at first blush. It is the "American Standard Code for Information Interchange" after all. Apparently not standard enough that computers from the 80s all agreed on it.

Modern Unicode, which has room for over a million possible characters [1], can be represented with different encodings. One of the most popular is UTF-8. One of the great advantages of UTF-8 is that when it's representing the first 128 characters it can do it using just 8 bits per character, rather than a full 32. The AHA moment comes when you learn that the first 128 characters of Unicode are exactly the 128 characters of ASCII. What that means is that standard ASCII plaintext is in fact perfectly valid UTF-8 encoded Unicode. Magic.
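That overlap is easy to check for yourself. Here is a small Python sketch (Python is my choice for illustration here, it has nothing to do with the C64) that encodes the same plain string both ways and compares the bytes. The hostname is just an example value.

```python
# Encode a plain ASCII string two ways and compare the resulting bytes.
host = "c64os.com"                      # example value only

ascii_bytes = host.encode("ascii")
utf8_bytes = host.encode("utf-8")

# Every byte is below 0x80, so the two encodings are byte-for-byte identical.
print(ascii_bytes == utf8_bytes)        # True
print(all(b < 0x80 for b in utf8_bytes))  # True

# A character outside the first 128 needs more than one byte in UTF-8.
accented = "\u00e9".encode("utf-8")     # U+00E9, e with acute accent
print(len(accented))                    # 2
```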

It's safe to say, therefore, that having support for ASCII is a table-stakes feature of a modern computer. 99.9999999...% of all the websites out there, for example, have a URL that is just plain ASCII. However, there are approximately zero websites whose URL is in PETSCII. If our C64 is to interact with the internet, it needs to speak ASCII. But there are reasons why the C64 uses PETSCII, which I'll get into. So, C64 OS includes routines for easily converting between the two. The routines do this translation one character at a time. So, if you type a URL into a program and the program then has to deliver that payload to somewhere on the internet, calls to the routine can be injected into the process so each character is translated before being sent out.

This might strike you as inefficient. But it is designed to be as primitive as possible, partially to save space, and partially to be flexibly usable by other more application-specific routines. If you want to write a program where it makes sense to convert a chunk of PETSCII to ASCII up front, then go ahead and write a tight loop that translates one character at a time, in place even, if that makes sense. If the C64 OS routines tried to convert large chunks at a time instead, they might be more efficient in some situations, but they might be completely useless in many more common use cases.
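To give a feel for the "one character at a time" design, here is a rough Python sketch. It is not the C64 OS code, and the function names are made up for illustration. It assumes the lowercase/uppercase character set, where unshifted letters sit at $41-$5A, and it only remaps letters. Symbols that differ between the two sets (such as $5C) and control codes are passed through untouched.

```python
def petscii_to_ascii(code: int) -> int:
    """Translate one PETSCII byte to ASCII (letters only, hypothetical helper)."""
    if 0x41 <= code <= 0x5A:      # PETSCII unshifted letters -> ASCII lowercase
        return code + 0x20
    if 0xC1 <= code <= 0xDA:      # PETSCII shifted letters -> ASCII uppercase
        return code - 0x80
    if 0x61 <= code <= 0x7A:      # alternate PETSCII range for shifted letters
        return code - 0x20
    return code                   # digits, most punctuation: same value


def ascii_to_petscii(code: int) -> int:
    """Translate one ASCII byte to PETSCII (letters only, hypothetical helper)."""
    if 0x61 <= code <= 0x7A:      # ASCII lowercase -> PETSCII unshifted letters
        return code - 0x20
    if 0x41 <= code <= 0x5A:      # ASCII uppercase -> PETSCII shifted letters
        return code + 0x80
    return code


def convert_in_place(buf: bytearray, convert) -> None:
    """The 'tight loop': translate a whole buffer one byte at a time, in place."""
    for i, b in enumerate(buf):
        buf[i] = convert(b)


# "www.c64" as typed on a C64 in lowercase mode
payload = bytearray([0x57, 0x57, 0x57, 0x2E, 0x43, 0x36, 0x34])
convert_in_place(payload, petscii_to_ascii)
print(payload.decode("ascii"))    # www.c64
```

Because the primitive only knows about a single byte, the caller decides whether to translate up front, in place, or on the fly as each byte goes out.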

This brings us to the question of why Commodore even bothered to use PETSCII at all. ASCII was first developed in the 1960s. The Commodore PET wasn't released until 1977 [2], and the PET was the first machine to use PETSCII, hence the name. So why not just use ASCII? It actually made a lot of sense that Commodore created an ASCII variant because, you guessed it, it was more tightly integrated and aligned with the requirements and abilities of the hardware.

Bitmap graphics are slow. And they're also hugely memory hungry. We'll talk a lot more about my thoughts on bitmap graphics and how C64 OS will use them in the future. Suffice to say, the 1977 Commodore PET did not even have a bitmap graphics mode. So, the only way to get graphics on the screen was to have graphic elements built into the text-based character set. And that's exactly what PETSCII has. Lots of graphical characters that were carefully designed so they could be combined to produce very sophisticated on screen displays, while using the smallest amount of memory.

Collage of some of the best PETSCII art

ASCII art is nifty, but PETSCII art whoops its butt. The PET had a monochrome display, but things really got capable when the VIC-20 and C64 added color. This brings us to a whole new point about what PETSCII is and how it interacts with a Commodore.

On a C64, PETSCII got more advanced than it was on the PET. Besides color, it also gained a second character set, which is implemented by the more advanced Character ROM in the C64. The first set contains only uppercase letters, plus a bunch of standard symbols, a ton of graphic characters and a handful of screen control characters. The second character set has the uppercase characters swapped for lowercase characters, the same set of numbers and symbols, the same set of screen control characters and a smaller set of graphic elements to make room for the uppercase characters moved into a range that was previously used for graphics. The Commodore has the ability to switch between these two character sets on the fly with the SHIFT-C= key combo. But, true to form, there is a PETSCII screen control character that when "printed"—aka put through the KERNAL's CHROUT routine—can programmatically switch the active character set.

It is the screen control characters where things really get interesting. Every key pressed on the keyboard is scanned by the KERNAL's IRQ routine and mapped to a single byte value of PETSCII. The PETSCII character is placed into the 10-character-long keyboard buffer. A separate process, the main event loop of the KERNAL, hard loops waiting for a value to appear in the keyboard buffer. Here it is in the Disassembled ROM documentation:

;wait for a key from the keyboard

.,E5CD A5 C6    LDA $C6         get the keyboard buffer index
.,E5CF 85 CC    STA $CC         cursor enable, $00 = flash cursor, $xx = no flash
.,E5D1 8D 92 02 STA $0292       screen scrolling flag, $00 = scroll, $xx = no scroll
                                this disables both the cursor flash and the screen scroll
                                while there are characters in the keyboard buffer
.,E5D4 F0 F7    BEQ $E5CD       loop if the buffer is empty

Since the 6510 has no WAI instruction [3], it loops through these four instructions about 80,000 times per second as the computer sits there idle! But when it does finally have a character in the buffer, it CHROUTs it to the screen. This is where PETSCII and the KERNAL's screen editor intimately mesh together. PETSCII includes cursor control characters. When the screen editor encounters one of those characters, rather than writing something to screen memory, it adjusts the internal coordinates of where the cursor is. And that's how the cursor keys move the cursor.
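If you're curious where a number like that comes from, here is a back-of-the-envelope calculation in Python. The cycle counts are the standard 6502 timings for these addressing modes, and the clock rates are the usual NTSC and PAL C64 figures. Both are values I'm supplying, not something taken from the ROM listing.

```python
# Cycle cost of each instruction in the idle loop (standard 6502 timings).
CYCLES = {
    "LDA $C6": 3,     # zero page load
    "STA $CC": 3,     # zero page store
    "STA $0292": 4,   # absolute store
    "BEQ $E5CD": 3,   # branch taken, target in the same page
}
cycles_per_pass = sum(CYCLES.values())

# Approximate CPU clock rates in Hz.
CLOCK_HZ = {"NTSC": 1_022_727, "PAL": 985_248}

passes_per_second = {name: hz // cycles_per_pass for name, hz in CLOCK_HZ.items()}

print(cycles_per_pass)                 # 13
for name, passes in passes_per_second.items():
    print(f"{name}: about {passes:,} passes per second")
```

Treat the result as an upper bound. The VIC-II steals cycles from the CPU and the IRQ routine interrupts the loop 60 times a second, so the real figure is somewhat lower, but it lands in the same ballpark.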

PETSCII isn't just different for difference's sake. Its design is necessary to work with the very way that the keyboard is scanned, buffered and processed. The screen editor has no idea which physical key on the keyboard was pressed. All that information is lost, as only a stream of PETSCII values is buffered. Now, you might think the information could be reconstructed. After all, you'd expect a PETSCII Cursor Right code ($1D) to be produced only by pressing the horizontal cursor key without holding shift. However, it's not that simple. After scanning the keyboard the KERNAL maps the key to a PETSCII value, but some values can be reached in multiple ways. For example, let's look at these KERNAL keyboard matrix decode tables:

;standard keyboard table

.:EB81 14 0D 1D 88 85 86 87 11
.:EB89 33 57 41 34 5A 53 45 01
.:EB91 35 52 44 36 43 46 54 58
.:EB99 37 59 47 38 42 48 55 56
.:EBA1 39 49 4A 30 4D 4B 4F 4E
.:EBA9 2B 50 4C 2D 2E 3A 40 2C
.:EBB1 5C 2A 3B 13 01 3D 5E 2F
.:EBB9 31 5F 04 32 20 02 51 03
.:EBC1 FF

Each of those values is a PETSCII character, accessed by typing on the keyboard without any modifiers held down. No shift key, for example. If you look up the PETSCII table on Wikipedia, you'll see that "m" (lowercase, press the m key on the keyboard without the shift key) has the value $4D. If you search the table above you'll see that it's in the 5th row, 5th column, not counting the zeroth column that starts with :EB, which is the memory address where each row of the table is stored in the KERNAL ROM. The PETSCII value for "m" is in that position because the m key on the keyboard maps to that position. Now let's look at the control key mapping.

;control keyboard table

.:EC78 FF FF FF FF FF FF FF FF
.:EC80 1C 17 01 9F 1A 13 05 FF
.:EC88 9C 12 04 1E 03 06 14 18
.:EC90 1F 19 07 9E 02 08 15 16
.:EC98 12 09 0A 92 0D 0B 0F 0E
.:ECA0 FF 10 0C FF FF 1B 00 FF
.:ECA8 1C FF 1D FF FF 1F 1E FF
.:ECB0 90 06 FF 05 FF FF 11 FF
.:ECB8 FF

This is the mapping that's used if the control key was held down while typing on the keyboard. $FF means unassigned, the others are just a different mapping of PETSCII characters. If you look in the 5th row and 5th column you'll see a $0D. You also find $0D in the standard keyboard table, but at a different position. $0D in PETSCII (and ASCII) is a carriage return. In the standard keyboard table that $0D is selected when the Return key is pressed. But in the control key mapping it looks like you get a $0D for control-m. Well, actually, you do. Try it. Type: load "$",8 and then press control-m. It behaves as though you've just pressed return!
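To make the lookup concrete, here is a minimal Python sketch that copies the two tables from the listings above and indexes them by row and column the same way I just did by eye. The helper names are mine, and "row" and "column" refer to the layout of the listing, not necessarily to the physical wiring of the keyboard matrix.

```python
# The two decode tables, copied from the ROM listings (64 entries + $FF end marker).
STANDARD = bytes.fromhex("""
    14 0D 1D 88 85 86 87 11   33 57 41 34 5A 53 45 01
    35 52 44 36 43 46 54 58   37 59 47 38 42 48 55 56
    39 49 4A 30 4D 4B 4F 4E   2B 50 4C 2D 2E 3A 40 2C
    5C 2A 3B 13 01 3D 5E 2F   31 5F 04 32 20 02 51 03   FF
""")
CONTROL = bytes.fromhex("""
    FF FF FF FF FF FF FF FF   1C 17 01 9F 1A 13 05 FF
    9C 12 04 1E 03 06 14 18   1F 19 07 9E 02 08 15 16
    12 09 0A 92 0D 0B 0F 0E   FF 10 0C FF FF 1B 00 FF
    1C FF 1D FF FF 1F 1E FF   90 06 FF 05 FF FF 11 FF   FF
""")


def decode(table: bytes, row: int, col: int) -> int:
    """Look up a table entry by 1-based row and column of the listing."""
    return table[(row - 1) * 8 + (col - 1)]


def positions(table: bytes, code: int) -> list:
    """All 1-based (row, column) positions that produce a given PETSCII code."""
    return [(i // 8 + 1, i % 8 + 1) for i, b in enumerate(table[:64]) if b == code]


print(hex(decode(STANDARD, 5, 5)))   # 0x4d, the m key
print(hex(decode(CONTROL, 5, 5)))    # 0xd, control-m
print(positions(STANDARD, 0x0D))     # [(1, 2)], the Return key

# Both key presses put the same byte into the keyboard buffer.
keyboard_buffer = [decode(STANDARD, 1, 2), decode(CONTROL, 5, 5)]
print(keyboard_buffer)               # [13, 13]
```

Once both presses have been reduced to the same byte, nothing downstream of the buffer can tell them apart.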

This is both neat and horrible at the same time. The point is that PETSCII encodes, as a simple character, everything that can be done in the screen editor. This is amazingly powerful, because in a PETSCII text file you can embed cursor positioning control characters. And color changes, and character set shifts, and more. Very powerful stuff. In the late '70s and early '80s it was brilliant. This is what allows BASIC print statements to contain delete, and return, and clear screen controls. Really neat.
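Here is a toy model of that idea in Python: a screen editor that treats a handful of PETSCII codes as commands and everything else as something to draw. It's a heavily simplified sketch, not how the KERNAL is actually written. The class and method names are mine, it stores the PETSCII byte directly rather than converting to screen codes, and it leaves out scrolling, colors, quote mode and most control codes.

```python
COLS, ROWS = 40, 25

# A few PETSCII control codes
RETURN, CURSOR_DOWN, HOME, CURSOR_RIGHT = 0x0D, 0x11, 0x13, 0x1D
CURSOR_UP, CLEAR, CURSOR_LEFT = 0x91, 0x93, 0x9D


class ToyScreenEditor:
    def __init__(self):
        self.row = self.col = 0
        self.cells = [[0x20] * COLS for _ in range(ROWS)]

    def chrout(self, code: int) -> None:
        """Interpret one PETSCII byte: move the cursor or draw a character."""
        if code == CURSOR_RIGHT:
            self.col = min(self.col + 1, COLS - 1)
        elif code == CURSOR_LEFT:
            self.col = max(self.col - 1, 0)
        elif code == CURSOR_DOWN:
            self.row = min(self.row + 1, ROWS - 1)
        elif code == CURSOR_UP:
            self.row = max(self.row - 1, 0)
        elif code == RETURN:
            self.col, self.row = 0, min(self.row + 1, ROWS - 1)
        elif code == HOME:
            self.row = self.col = 0
        elif code == CLEAR:
            self.__init__()
        else:
            # Anything else is drawn at the cursor, which then advances.
            self.cells[self.row][self.col] = code
            self.col += 1
            if self.col == COLS:
                self.col, self.row = 0, min(self.row + 1, ROWS - 1)


# A "text file" with embedded cursor movement: HI, down, left, left, OK
editor = ToyScreenEditor()
for byte in bytes([0x48, 0x49, CURSOR_DOWN, CURSOR_LEFT, CURSOR_LEFT, 0x4F, 0x4B]):
    editor.chrout(byte)

print(editor.cells[0][:2], editor.cells[1][:2], (editor.row, editor.col))
```

The stream of bytes is both the content and the instructions for where to put it, which is exactly why the same trick works from a file, a BASIC print statement or the keyboard buffer.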

The problem is that the C64's built-in OS (KERNAL and BASIC together) has no way of identifying which keys on the keyboard were pressed. If you wanted to use the Control and C= keys for executing commands, like control-m for "minimize" or "mix" or "mount" or "magnify", you can't do it, because the KERNAL decodes control-m into $0D and puts that $0D into the buffer. When the main event loop gets around to pulling that $0D out of the buffer, there is fundamentally no way for it to know whether the user pressed return or control-m.

Now we know more or less why Commodores use a special character set instead of just ASCII. Without the cursor control characters of PETSCII, and given the way the KERNAL works, a C64 would have no way to move the cursor, or delete anything. But while this method of a single buffer containing characters from a special character set is great when married to the screen editor, which is essentially a PETSCII interpreter, it makes it really difficult to implement a more modern user interaction model. This is what C64 OS has to deal with. And I'll be talking a lot more about how it does deal with it in future posts.

UPDATE: December 19, 2016

I was advocating that PETSCII art is cooler than ASCII art, and showed some sample images. But, I wasn't expecting to find some full PETSCII art demos. Check this out:

  1. Unicode's code space runs from U+0000 to U+10FFFF, which is 1,114,112 code points and fits in 21 bits, although it is often stored in 32-bit units. Whereas ASCII is only 7 bit, so only the lower half of an 8-bit byte is standardized.

    A full 32 bit code space would allow for ~4.3 billion.
  2. That's before I was born, but who's counting, right?
  3. I recently discovered that the CMOS 65C02 has several very useful instructions that the NMOS version doesn't have. One is the WAI (wait for interrupt) instruction. It causes the CPU to power down and stop executing until an interrupt occurs.

    Without such an instruction, the 6510 must constantly be executing even when there is nothing to do except repeatedly poll a memory location that only ever gets changed by an interrupt.