VIC–II and FLI Timing (1/2)
A few updates first; I'll try to keep these brief.
Update on IP Thief (BASIC RPG/Maze Game)
I've finished the introduction screen and the game mechanics for moving between rooms, collecting items in inventory, scoring points, fighting monsters, and changing levels. I've got a "level menu" screen, accessed with the back arrow key, that lets you toggle music on/off, quit the game, restart the level, or restart the game from level 1. I've done the screen that appears at the end of a level to tell you how many points you got out of the possible total for the map. And I've got 4 maps of increasing complexity, with music for each map. I have sent a .D64 with the game, and a simple text–file "manual," to one person for beta testing. I still have to do more maps. I'm not sure how many I'll do; probably 10 at a minimum, but maybe a few more. I have to design these with the help of my son (the point of all this, after all). Then I need to do an end sequence: a bit of an end story, or an end screen, for when you survive and solve all the maps.
By the way, I'm sad to say that my wife doesn't know what a hypospray from the original Star Trek series looked like. In case you are too cool to know what one looks like, it looks like this:
[Image: a hypospray from the original Star Trek series]
And so now you have the general gist of what I was going for with my hypospray PETSCII art in the upper right screenshot above! 😄
Update on C64 OS
After giving my demo of the state of C64 OS for Commodore Users Europe (the video on YouTube has now had 11 thousand views! Thank you, everyone, that feels pretty good), I decided to shore up some of the tools that I use. I updated the relocator so that it shows a nice animation while scanning the file and gives you some statistics about the file. And instead of just spitting out unused bytes, it now figures out which ranges of unused bytes could fit the size of the binary and tells you explicitly where you can assemble it to. (More on how the relocation works in the earlier blog post, Drivers and Relocatable 6502 Code.) I updated the backup script to version 3, which now supports full and incremental backups from a storage device with an RTC (such as CMD HD, FD, RamLink, IDE64 and some SD2IEC devices). I also wrote a tool for creating PRG Aliases for the C64 OS "PRG Runner" utility. (More about that in the post, Load and Run from 6502 ASM.)
I took a short summer break to work on IP Thief and give my mind a rest, but I've now picked up where I left off: working on the text input class for C64 OS's object–oriented toolkit. I have been comparing the behaviors of modern macOS, Mac OS from the early 90s, and AmigaOS 3, and I've discovered lots of neat things. Obviously the macOS text boxes are the most robustly implemented, with rich support for keyboard controls, copy/paste, text dragging, and other subtle, sophisticated behaviors. Mac OS from the 90s has fairly sophisticated text fields. For example, you can drag selections and access cut, copy and paste from the menu bar or with keyboard shortcuts. Beyond that, however, they have very limited keyboard controls. You cannot use the keyboard to make or extend text selections. AmigaOS, sadly (though perhaps not surprisingly), has the most primitive text fields. They don't support selections of any kind, either with keyboard controls or with the mouse. And you cannot (it seems to me, though maybe I'm missing something) cut, copy or paste to or from a text field.
The only text fields more limited than AmigaOS's, that I have personal experience with (I didn't test any version of Windows, sorry), are on the C64. Most C64 programs don't let you edit a field value at all; you have to retype it. Some C64 programs do let you edit a field, but most of those don't support deleting or inserting characters in the middle of the string; they only let you overwrite existing characters. And in some, GEOS for example, you cannot even cursor back into the middle of the field: cursoring left deletes characters from the end of the string. Very, very primitive. I'm sure some C64 programs out there have more robust text fields, but the other problem on the C64 is that there isn't any kind of standard text field (really, not even in GEOS). And implementing robust ones is hard, so most programs have very primitive text fields.
My work on the standard text field object in C64 OS is, in my opinion, ambitious. My aim is for it to be more sophisticated than either AmigaOS 3 or Mac OS of the early 90s. That's ambitious because machines of that vintage are significantly more powerful than a Commodore 64 and typically have from 32 to 128 times(!) as much memory. The C64 OS text input field supports a delegate for focus and blur events, and for dynamically filtering input. It descends from TKCtrl, which supports an action that gets triggered when the user presses return. Text selections can be made with the mouse, or by holding the left shift key while moving the insertion point with the cursor keys and the right shift key. Selected ranges of text can be deleted, overwritten with typed input or a clipboard paste, or cut or copied to the clipboard. A subclass of the text field, the secure text field, can be used for password fields. It displays the field contents as asterisks and prevents cutting or copying.
Anyway, that's coming along, and it will open doors for implementing other apps and utilities that need text inputs in their UIs.
Part 1: Memory Access
Now on to the topic of this post: VIC–II and FLI Timing, Part 1: Memory Access
The C64's Hardware Memory Access Model
A nice spiral–bound copy of the C64 Programmer's Reference Guide that I bought also came with a foldout double–sided paper schematic of the C64 from 1982. (It's actually got quite a few small errors, which can cause confusion until you realize you're looking at a mistake.) When I first returned to the C64 in 2016, I admit that I did not understand how most of that schematic worked. I took a bit of electronics in high school but didn't learn nearly enough, and then I skipped university and got a job right away as a full–stack web developer, back when the words "full–stack" hadn't yet become commonplace.
As I was teaching myself 6502 assembly, I realized that it was super useful to understand the C64's software limitations, its features and other peculiarities by looking at the schematics and figuring out how they work. I read a really great tutorial online about digital electronics, which I've mentioned once before: All About Circuits – Textbook – Vol IV – Digital. Then I just tackled the schematic one small piece at a time. Every time I studied a small section of the schematic, I'd come away with some satisfying and explanatory new understanding.
There is one part of the schematic that I long found befuddling. It's all the logic around and between the VIC–II chip and the bank of 8 memory chips. The 8 memory chips are the block of 8 vertical rectangles in the upper left half of the schematic below. The VIC–II is the tall vertical rectangle on the right side.
As always, a bunch of detail has been clipped away for clarity of focus
It's still fairly complicated, with lots of interconnected chips. It's hard to break this down into any simpler units, because all of these chips depend so closely on one another. I want to start by highlighting a few important things that I discovered early on.
In the diagram below, I've highlighted two sections, in green and blue, to point out their 1–bit and 4–bit natures, respectively.
1–bit dynamic main memory and 4–bit static color memory.
Let's start with the 1–bit memory, the bottom of which is highlighted by the green block above. When I first saw this block of 8 memory chips, it looked overly simplified to me. I thought it was missing detail about how the chips are connected together, especially on the right edge: all those connections coming in from the VIC–II and other chips don't seem to have any labels. What are they connecting to?
Further, the 8–bit data bus seems to just come out of the whole block, rather than showing how the data bus is connected to each chip. It just didn't look like there was enough detail to understand what's really going on. Or so I thought. It wasn't until some time later that I read that an HM4864 (one of the RAM chip types used in the C64, labeled in the schematic as 4164–2) stores 65536 words of memory, but each word is just 1 bit. Then it suddenly dawned on me.
Each byte is composed of 8 bits, and there are 8 chips, each of which stores only one bit per address. Therefore, each chip's single input/output pin (1 bit) is connected to one line of the data bus, just as the schematic shows. What does this mean? It means that when you store a byte, say the letter "U", which is $D5 in PETSCII (in the lowercase/uppercase character set) or 1101 0101 in binary, each bit is stored in a different chip: a 1 in the first chip, a 1 in the second chip, a 0 in the third chip, a 1 in the fourth chip, a 0 in the fifth chip and so on. Each byte is split up and its bits stored in parallel across the 8 chips. When you read a byte back, the byte on the data bus that the CPU receives has been reconstituted from 8 different chips. That is so very cool!
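A minimal Python sketch of that idea follows, modeling each 1–bit chip as its own list of bits. The names (`chips`, `store_byte`, `load_byte`) are hypothetical, and the choice that chip 0 holds bit 7 is an assumption made only so the order matches reading 1101 0101 from left to right; on the real board, which chip carries which data line is just a matter of wiring.

```python
# Toy model of C64 main memory as eight 1-bit-wide chips sharing one address.
# Hypothetical names; chip 0 holds bit 7 (an arbitrary choice for readability).

WORDS_PER_CHIP = 65536  # each 4164/HM4864 stores 65536 one-bit words

# One list of bits per chip; every chip sees the same address lines.
chips = [[0] * WORDS_PER_CHIP for _ in range(8)]


def store_byte(address: int, value: int) -> None:
    """Split a byte across the eight chips, one bit per chip."""
    for chip in range(8):
        chips[chip][address] = (value >> (7 - chip)) & 1  # chip 0 gets bit 7


def load_byte(address: int) -> int:
    """Reassemble a byte from the same address in all eight chips."""
    value = 0
    for chip in range(8):
        value = (value << 1) | chips[chip][address]
    return value


if __name__ == "__main__":
    store_byte(0x0400, 0xD5)                      # "U", as in the example above
    print([chips[c][0x0400] for c in range(8)])   # [1, 1, 0, 1, 0, 1, 0, 1]
    print(hex(load_byte(0x0400)))                 # 0xd5
```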
With that realization, a lot of other things started to make sense about that block of 8 memory chips. The reason we don't see any lines between them is that they are all simply in parallel. On the left side of the leftmost chip, only 8 lines go into the chip. These are the 8 lines with resistors on them, labeled MA0 through MA7. Each line with a resistor comes off a line that passes beneath the memory chips (in the schematic, not on the PCB), all the way from the pair of 74257's at the left over to the 8 lines coming from the VIC–II and friends on the right. In fact, if you use a straightedge, you'll find that those 8 unlabeled lines coming into the memory from the VIC–II at the right line up perfectly with the lines coming from the 74257's on the left. The lines from the VIC–II, which look like they are going into the memory, aren't labeled because they aren't connecting to the RAM there at all.
We'll come back later to the pair of 74LS257's to the left of the main memory.
4–Bit Color Memory
Next let's look at that blue L–shaped highlighted area.
The chip at the bottom of the blue highlighted area is the color RAM. (By the way, look closely: the schematic labels it "COLOR ROM." This is one of those errors I mentioned; it is very much a RAM chip, not a ROM chip.) It is typically stated that the C64 contains 64 kilobytes of RAM, but that's not strictly true; it contains 64.5 kilobytes of RAM. The 8 main memory chips each hold 65536 words of 1 bit per word, which together equals exactly 64KB. But the color RAM chip is in addition to this, and provides 1024 words, each 4 bits wide. That's a total of 512 bytes, albeit in the unusual arrangement of 1024 individually addressed nybbles.
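The 64.5 KB figure is simple arithmetic. Here it is in a few lines of Python, using only the chip sizes given above (the constant names are just labels for this sketch).

```python
MAIN_CHIPS = 8
MAIN_WORDS_PER_CHIP = 65536    # 4164 / HM4864: 64K words x 1 bit
COLOR_WORDS = 1024             # color RAM: 1K words x 4 bits
COLOR_BITS_PER_WORD = 4

main_bytes = MAIN_CHIPS * MAIN_WORDS_PER_CHIP * 1 // 8   # 8 chips x 65536 bits
color_bytes = COLOR_WORDS * COLOR_BITS_PER_WORD // 8     # 4096 bits
total_kb = (main_bytes + color_bytes) / 1024

print(main_bytes, color_bytes, total_kb)   # 65536 512 64.5
```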
But what's going on with the blue highlighted lines? There are 4 lines in parallel that connect the 4 data bits from the color RAM directly to the VIC–II's own data bus. What's more, those VIC–II data lines are labeled D8, D9, D10 and D11. The C64, or more accurately the C64's 6510 CPU, has an 8–bit data bus. But the VIC–II has a 12–bit data bus! It has a private 4–bit bus connecting it directly to its own dedicated color RAM chip. This will come into play later when we discuss the timing for FLI.
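As a rough mental model (not a description of the VIC–II's internals), a single VIC–II fetch can be pictured as one 12–bit value: D0 to D7 from main memory and D8 to D11 from color RAM, read at the same time. The function name is hypothetical, timing is ignored, and the sketch assumes the color RAM is addressed by the low 10 bits of the address, since it holds 1024 words.

```python
from typing import Sequence


def vic_fetch_12bit(main_ram: Sequence[int], color_ram: Sequence[int], address: int) -> int:
    """Conceptual model of one VIC-II fetch over its 12-bit data bus."""
    low8 = main_ram[address] & 0xFF            # D0-D7 from main memory
    high4 = color_ram[address & 0x3FF] & 0x0F  # D8-D11 from the 1024 x 4 color RAM
    return (high4 << 8) | low8                 # combined 12-bit value


if __name__ == "__main__":
    main_ram = bytearray(65536)
    color_ram = [0] * 1024
    main_ram[0x0400] = 0x01     # a screen code in default screen memory
    color_ram[0x000] = 0x0E     # its color nybble
    print(hex(vic_fetch_12bit(main_ram, color_ram, 0x0400)))  # 0xe01
```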
Accessing Color Memory
It's clear that the VIC–II can access the color RAM independently of the CPU's data bus, and that plays an important role in the timing of the VIC–II which we'll see later. But, at some point the CPU needs to have access to the color RAM too, because software needs to be able to change colors on the screen.
CPU accessing color RAM.
To accomplish this, we need to explain the role of at least one of those chips, the 4066. The 4066 is a Quad Bilateral Switch. Okay, that sounds complicated, but it is totally not complicated. It has four switches (Y0 to Z0, Y1 to Z1, and so on) through which signals can flow in either direction, from Y to Z or from Z to Y. And it has four control lines (E0 to E3) that control whether each switch is open or closed.
The VIC–II controls the C64. You'd think the CPU controls the C64, but the VIC–II controls whether the CPU is allowed to run, how quickly it runs and whether or not it is allowed to access the main data and address buses. The VIC–II is designed to exert control over the CPU because it has a higher priority than the CPU. And why does it have a higher priority? Because drawing to the screen is realtime. The timing of drawing to the screen must be 100% perfect. If the CPU could pause the VIC–II, even for just one cycle, doing so would instantaneously corrupt the image on the screen. So it is the VIC–II that tells the CPU when to pause or get out of the way.
There are two ways the VIC–II controls the CPU. The first and most common is with the AEC line. Every clock cycle has two phases: when the clock signal is low (the first phase) and when it is high (the second phase). On every cycle, during the first/low phase of the clock, the VIC–II lowers its AEC pin. This puts the CPU's address and data bus pins into a high–impedance (HighZ) state, which effectively disconnects them from those buses.
You'll also note, in the highlighted schematic above, that when AEC goes low it turns off all the switches in the 4066. The 4066 connects the low 4 bits of the CPU's data bus to the four data lines of the color RAM. So when the VIC–II is controlling the bus, the color RAM is disconnected from the main data bus and left connected only to the VIC–II's own private data bus (D8 to D11). Additionally, when AEC is low, it pulls one of the two inputs of the AND gate (highlighted at the very bottom) low. That forces the gate's output low, and that output is connected to the low–active Chip Select of the color RAM. Putting it all together: every half–cycle when the VIC–II is in control of the bus, the color RAM chip is turned on, disconnected from the main bus, and connected to the VIC–II's high data lines.
What happens when AEC goes high, and the VIC–II grants the CPU access to the bus? That closes the switches in the 4066, which connects the color RAM to the main data bus. But we wouldn't want the color RAM to always be active on the data bus, so the AND gate at the bottom comes back into play. The AEC line is high, making one of the AND gate's inputs high. With one input high, the gate's output simply follows its other input, which comes from the I/O decoding logic, which is controlled by the PLA. That is out of scope for this post, but I have discussed how it works in the post, Versa64Cart and C64 ROMs.
Just as the PLA and the I/O decoding logic can control when a ROM chip is turned on, such as the BASIC or KERNAL ROMs, or when an I/O chip like the VIC–II, SID or CIAs can be accessed on the bus, that same system can control when the color RAM is turned on. The color RAM is on its own private bus to the VIC–II, but every time the CPU has access to the bus, the color RAM gets shunted onto the main bus and then activated just like any other ROM or I/O chip. Brilliant.
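The gating just described can be summarized as a small truth table. In this Python sketch, `color_select_n` is a hypothetical name for the low–active signal coming from the I/O decoding logic; the sketch models only the logic levels, not any timing.

```python
from dataclasses import dataclass


@dataclass
class ColorRamState:
    chip_enabled: bool   # is the color RAM's low-active chip select asserted?
    on_cpu_bus: bool     # are the 4066 switches closed (CPU D0-D3 connected)?


def color_ram_gating(aec: int, color_select_n: int) -> ColorRamState:
    """Logic-level model of the 4066 and AND gate around the color RAM.

    aec:            1 when the CPU has the bus, 0 when the VIC-II does.
    color_select_n: hypothetical low-active select from the I/O decoding
                    logic; 0 when the CPU is addressing the color RAM.
    """
    cs_n = aec & color_select_n   # AND gate output drives the low-active chip select
    return ColorRamState(chip_enabled=(cs_n == 0), on_cpu_bus=(aec == 1))


if __name__ == "__main__":
    for aec in (0, 1):
        for sel in (1, 0):
            print(f"AEC={aec} select_n={sel} -> {color_ram_gating(aec, sel)}")
```

With AEC low, the chip is always enabled and only the VIC–II's private lines reach it; with AEC high, the chip sits on the CPU's bus but is enabled only when the I/O decoding selects it.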
Addressing Main Memory
I used to think that digital electronics are, well, digital: very precise, either on or off, nothing in between. I also used to think that the smallest unit of precision in a computer is the CPU's clock cycle. From a programming perspective, that's not a bad approximation. Your program doesn't see or feel any of the complexities of the physics of electrons. One cycle contributes to one stage of an operation, and there is no way to get underneath that. There is (almost) no way of even knowing how much time has passed between two cycles, or whether the time between one pair of cycles is the same as the time between another pair. But as it turns out, those lengths of time can actually be different.
My understanding of digital electronics was very naive (it probably still is). It was a failure of insight, because merely programming a computer doesn't expose you to how one works. The history of programming has been one of exposing the programmer less and less to how the computer works. Assembly language, compared to any compiled or interpreted language, comes the closest to helping you understand the computer.
We've seen already that the CPU's clock cycle cannot be the smallest unit of precision, because the VIC–II is performing work on CPU half–cycles. The VIC–II is actually performing a lot of carefully timed work within the duration of a single CPU clock cycle. Leaving that aside for now, though, there is also something interesting to note about the main RAM. There is no clock line on the RAM chips. Without a clock line, how does the RAM know what's going on? How does the RAM know when to read the address bus and so on?
The main memory chips are controlled entirely by their /RAS and /CAS lines. There are two lines, so it isn't a simple on or off. Patterns on those two lines together tell the RAM what to do, and when to do it. The HM4864-2 spec sheet, although daunting and full of more information than I know how to make use of, has timing diagrams that are useful for seeing how it works.
To read these diagrams (below), you start at the left; as time passes you progress towards the right. You see the state changes of the lines relative to each other over time. The top two lines show /RAS and /CAS. (The bars over the labels indicate that they are low–active.) Below that we have the 8 address lines (half the 16–bit address bus) as a group. They are drawn both high and low, criss–crossing at certain points. This indicates that the address lines will not all be in the same state, and it shows the minimum times before and after the other lines' transitions by which they must have settled into their correct states. Next we have the Read/Write line: high for read, low for write. And lastly, the single data bit line for this RAM chip. Remember that all of the signal lines are connected in parallel to all 8 chips, and each chip's data bit line is connected to a different line of the CPU's data bus.
Read Cycle
RAS,CAS Main Memory Read Cycle.
At the left of the diagram, /RAS and /CAS are both high, both inactive. As we progress, while /CAS remains high, /RAS is pulled low. At the moment /RAS is pulled low, the low 8–bits of the address bus must already be available on the RAM chip's 8 address lines. At this point, the RAM chip reads its 8 address lines, decodes the binary and uses it to activate one row of its 256x256 row/column memory matrix.
After a short period of time, with /RAS still held low, /CAS is then pulled low. And it must be the case that when /CAS goes low the high 8–bits of the address bus are available on the RAM chip's 8 address lines. It decodes the binary and uses that to activate one of its 256 columns.
The signal to the RAM chips that this is a read operation is that the /WE (low–active write enable) line is either high the whole time or, if it was low, transitions to high before /CAS is pulled low. During a read cycle, shortly after /CAS has been pulled low and the column address has been decoded and locked in, the state of the bit at the intersection of the selected row and column becomes available on the data line.
Analysis of Read Cycle
Before continuing on to see how this is used by the C64, let's think about what it means. The RAM has no idea what a clock cycle is. There is only the manipulation of the /RAS and /CAS lines, in a prescribed sequence. The length of time between the changes is analog. How much time must pass, and how much time is allowed to pass, between the transitions? How long does it take for the RAM to lock in the row address, for example? The only way to know is to look at the manufacturer's spec sheet and read the stated numbers. How analog is it? The timings vary with temperature.1 The person designing the computer simply has to choose the chips, and the clock speed, such that whatever drives these lines never drives them faster than the minimum times required by the RAM's analog circuitry.
The maximum times are usually more flexible, but there are some maximum times listed as well. All of this means that different computers can drive the lines at different speeds, as long as the RAM chips are fast enough and the computer driving them stays within the minimum and maximum ranges.
My biggest takeaway when I learned how these work is that there are more analog aspects than I would have thought. The RAM's accesses are only indirectly timed to the clock, via whatever controls /RAS and /CAS.
Write Cycle
RAS,CAS Main Memory Write Cycle.
For completeness' sake, let's take a quick look at a write cycle. It's very similar to a read cycle.
Initially, the /RAS and /CAS lines are both high (inactive), just as when reading. Then, while /CAS is high, /RAS is pulled low. When /RAS is pulled low, the low 8–bits of the address bus must already be available on the RAM chip's 8 address lines. These get decoded and activate a row of the memory matrix. So far it's exactly the same as a read cycle.
Next, before /CAS is pulled low, the /WE line must either already be low or be pulled low. At pretty much the same time, still before /CAS is pulled low, the data bit to be written must be made available on the RAM chip's data–in line. Note: the RAM chips actually have two data lines, even though they only deal with one bit per address: one line to read a bit from memory, and one line to write a bit into memory. In a C64, the data–in and data–out lines are connected together to the same data bus line. Therefore, the data to write has to be on the data bus before /CAS is pulled low.
Then /CAS is pulled low. At that moment the data bus already holds the data to be written, as mentioned above, and the 8 address lines must already carry the upper 8–bits of the address bus. The address is decoded, the column is selected, and the bit value on the data–in line is stored in the memory matrix at the intersection of the selected row and column.
In both the read cycle and the write cycle, at the end, the /RAS line is raised first, and then the /CAS line is raised. The cycle is over, both /RAS and /CAS are inactive and ready to begin the next cycle.
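To tie the read and write sequences together, here is a toy Python model of one 64K x 1 DRAM chip. It is a sketch under heavy simplifying assumptions: it has no clock and reacts only to falling edges on /RAS and /CAS, it ignores every setup, hold and refresh requirement, and the class and function names are hypothetical. As in the text, the row comes from the low 8 bits of the address and the column from the high 8 bits.

```python
class DramChip:
    """Toy 64K x 1 DRAM: a 256 x 256 matrix of bits sequenced by /RAS and /CAS."""

    def __init__(self) -> None:
        self.cells = [[0] * 256 for _ in range(256)]
        self.ras_n = 1      # both control lines start high (inactive)
        self.cas_n = 1
        self.row = 0
        self.dout = None    # data-out line; None means the chip isn't driving it

    def drive(self, ras_n: int, cas_n: int, we_n: int, addr_lines: int, din: int = 0) -> None:
        """Apply new pin levels; the chip only acts on falling edges."""
        if self.ras_n == 1 and ras_n == 0:
            self.row = addr_lines & 0xFF                # /RAS falls: latch the row
        if self.cas_n == 1 and cas_n == 0:
            col = addr_lines & 0xFF                     # /CAS falls: latch the column
            if we_n == 0:
                self.cells[self.row][col] = din & 1     # /WE low: store the data-in bit
            else:
                self.dout = self.cells[self.row][col]   # /WE high: bit appears on data out
        if ras_n == 1 and cas_n == 1:
            self.dout = None                            # cycle over: output released
        self.ras_n, self.cas_n = ras_n, cas_n


def access(chip: DramChip, address: int, we_n: int, din: int = 0):
    """Run one read (we_n=1) or write (we_n=0) cycle in the order described above."""
    row, col = address & 0xFF, (address >> 8) & 0xFF
    chip.drive(1, 1, we_n, row, din)   # idle; row address and /WE already set up
    chip.drive(0, 1, we_n, row, din)   # /RAS low: row latched
    chip.drive(0, 0, we_n, col, din)   # /CAS low: column latched, read or write happens
    result = chip.dout
    chip.drive(1, 0, we_n, col, din)   # /RAS raised first...
    chip.drive(1, 1, we_n, col, din)   # ...then /CAS
    return result


if __name__ == "__main__":
    chip = DramChip()
    access(chip, 0xABCD, we_n=0, din=1)    # write a 1
    print(access(chip, 0xABCD, we_n=1))    # 1
    print(access(chip, 0xABCC, we_n=1))    # 0
```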
Driving /RAS and /CAS in a C64
Now we know how the RAM chips are sequenced internally to do reads and writes. How are those sequences implemented in a Commodore 64? What is responsible for driving those signal lines, and what is managing the timing requirements?
If you guessed the VIC–II, you are right. Once again, it is the VIC–II, not the CPU, that is the master of the bus. Let's look at another section of the schematic. I have to bring in another chip, the PLA.
VIC–II driving /RAS and /CAS.
Let's now go back to the pair of 74257's to the left of the main memory, highlighted in blue, above. According to the spec sheet, these are quad 2–input multiplexers. Big words, but not too hard to understand. Each chip has 4 pairs of inputs, A0:B0, A1:B1, A2:B2, A3:B3. And each pair has an output: Y0, Y1, Y2 and Y3. Each chip has one pin (/SELA) for selecting between the set of A inputs and the set of B inputs.
If /SELA is low, then A0–A3 pass through to Y0–Y3. If /SELA is high, then the 4 switches are thrown, and B0–B3 pass through to Y0–Y3. Each multiplexer chip (74257) has only 4 switches, but the RAM chips have 8 inputs, so two multiplexer chips are used side–by–side.
The low 8–bits of the address bus are connected to the B inputs, and the high 8–bits of the address bus are connected to the A inputs. All 8 Y–outputs are connected in parallel to the 8 address lines on all 8 RAM chips. Thus, toggling /SELA switches the 8 RAM address lines between the upper and lower halves of the 16–bit address bus. So far so good.
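In code, the pair of multiplexers behaves like a function that picks one byte of the 16–bit address. This Python sketch just restates the wiring described above (low byte on the B inputs, high byte on the A inputs); the function name is hypothetical and propagation delay is ignored.

```python
def mux_74257_pair(address: int, sela_n: int) -> int:
    """Outputs Y0-Y3 of two 74257s side by side, as one 8-bit value.

    /SELA high: the B inputs (low byte of the address bus) pass through.
    /SELA low:  the A inputs (high byte of the address bus) pass through.
    """
    low_byte = address & 0xFF            # wired to B0-B3 on both chips
    high_byte = (address >> 8) & 0xFF    # wired to A0-A3 on both chips
    return low_byte if sela_n == 1 else high_byte


if __name__ == "__main__":
    addr = 0xD021
    print(hex(mux_74257_pair(addr, sela_n=1)))  # 0x21, used as the row address
    print(hex(mux_74257_pair(addr, sela_n=0)))  # 0xd0, used as the column address
```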
Next, it gets slightly complicated, and we have to remember how the read cycle and write cycle sequences of /RAS and /CAS work. The VIC–II has /RAS and /CAS lines, which I've highlighted in green and red respectively. /RAS connects directly to the RAM's /RAS line. But /CAS connects to the /SELA line of the two multiplexers, plus it goes down and into the PLA. Then the PLA has something called /CASRAM that comes back up and connects to the RAM's /CAS line. What is going on here?
In the initial inactive state, /RAS and /CAS are both held high by the VIC–II. Meanwhile, the CPU has set the 16–bit address on the full 16–bit address bus. The CPU has also set the R/W line, which is connected to the RAM's /WE line. Because /CAS is high and it's connected to /SELA, the multiplexers are passing through B0–B3, which is the lower 8–bits of the address bus. Therefore, all the VIC–II needs to do is pull /RAS low, and the RAM proceeds to read its address lines, and bingo, it's got the lower 8–bits of the address bus for the memory matrix row.
To proceed to the RAM column (the high 8–bits of the address bus) the VIC–II needs to pull /CAS low. When /CAS is pulled low, it pulls /SELA low, which switches the multiplexers to provide the RAM with the high 8–bits of the address bus. The problem is that when the RAM's /CAS line is pulled low, its 8 address lines must already have the correct column address. You can see that in the timing diagrams shown earlier. The RAM's /CAS can only be pulled low after the RAM's address lines have been changed. However, there is going to be some unavoidable propagation delay in those 74257 multiplexers. When /SELA goes low, they need time (however brief) to fully and completely switch the Y–outputs from the B–inputs to the A–inputs. Only after that brief delay can the RAM's /CAS safely be pulled low.
This explains (at least I believe this explains) why the VIC–II's /CAS line passes through the PLA. On the other side, the PLA outputs as /CASRAM (the change of label is just so that each line that is logically independent gets a unique label) which goes to the RAM's /CAS line.
Passing /CAS through the PLA on its way from the VIC–II to the RAM doesn't do anything, except intentionally introduce a slight propagation delay. That delay is exactly what is needed to ensure that the multiplexers have stabilized the correct outputs before the RAM tries to read them.
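Here is a sketch of why that ordering matters. It models only the order of events after the VIC–II pulls /CAS low: the multiplexer outputs switch after `mux_delay_ns`, and the RAM's /CAS (/CASRAM) falls after `pla_delay_ns`. The delay values in the example are made–up illustrative numbers, not figures from any datasheet, and the RAM's column–address setup time is lumped into an optional `setup_ns` parameter.

```python
def column_latched(address: int, mux_delay_ns: float, pla_delay_ns: float,
                   setup_ns: float = 0.0) -> int:
    """Return the byte the RAM latches as its column address when /CASRAM falls.

    t = 0:            the VIC-II pulls /CAS (and therefore /SELA) low.
    t = mux_delay_ns: the 74257 outputs finish switching to the high byte.
    t = pla_delay_ns: /CASRAM, the RAM's own /CAS, falls.
    """
    low_byte = address & 0xFF
    high_byte = (address >> 8) & 0xFF
    if pla_delay_ns >= mux_delay_ns + setup_ns:
        return high_byte   # outputs settled in time: correct column address
    # /CASRAM fell too early. Real lines might be mid-transition;
    # this sketch simply returns the stale row byte.
    return low_byte


if __name__ == "__main__":
    # Illustrative numbers only.
    print(hex(column_latched(0xD021, mux_delay_ns=15, pla_delay_ns=0)))   # 0x21 (wrong)
    print(hex(column_latched(0xD021, mux_delay_ns=15, pla_delay_ns=30)))  # 0xd0 (right)
```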
The PLA is completely unclocked. The 74257 multiplexers are unclocked. The static color RAM is unclocked. The main dynamic RAM is not exactly clocked, but it is sequenced with careful timing by the VIC–II's control over /RAS and /CAS.
Therefore, from the perspective of designing digital electronics, there are important analog processes going on, with timings, and delays, and critically ordered operations that are happening independently of the clock. Evidently, they must also be happening much faster than the clock because the CPU (after working out exactly what it needs to do) must make its actual access to the RAM in just half a clock cycle. And within that half a clock cycle the VIC–II is subdividing the time further into sequenced /RAS and /CAS manipulations. And within those manipulations the 74257 and the PLA take time and are ordered. It's like, wow! There are actually several layers of sequencing going on below the precision of the CPU's clock cycle. Amazing.
The phantom data reads, explained.
We're going to get eventually to discussing the FLI graphics mode, but to have any of it make sense it's necessary to understand how the VIC–II and the CPU work together and how they access memory.
In figuring this out, I relied heavily on work that has already been done by people in the C64 community, much of it long ago. One of the best resources I've ever come across is a document by Christian Bauer, The MOS 6567/6569 video controller (VIC–II) and its application in the Commodore 64, kindly hosted and preserved on zimmers.net. It was published 25 years ago, in 1996. By my reckoning, this was half–way through the late–life period of the C64.2
6510 and VIC are both based on a relatively simple hard–wired design. Both chips make a memory access in EVERY clock cycle, even if that is not necessary at all. E.g if the processor is busy executing an internal operation like indexed addressing in one clock cycle, that really doesn't require an access to memory, it nevertheless performs a read and discards the read byte. The VIC only performs read accesses, while the 6510 performs both reads and writes.
Christian Bauer – 1996 – 2.4.3 Memory access of the 6510 and VIC
If, like me, you've been around the C64 for many years and you've heard smart people talking about its technical details, you may have heard what was just quoted above: the 6510 makes a memory access on every clock cycle, even if it's not necessary; it performs the read access and then discards the byte.
Why would the 6510 do this? It seems like unnecessary work.
Here's what I've discovered. It is true that RAM is accessed even when the 6510 doesn't need to access it and the byte is not used, but it is a confusing overstatement to say that "The processor performs a read and discards the byte." This makes it sound as though the 6510 is actively doing something; it's making a memory access. But then, what to do with the byte it just received? Well, discard it, which sounds like an active process. Now that we know how RAM is accessed, we can see that it is not like this at all.
The VIC–II is responsible for /RAS /CAS sequencing the RAM. But the VIC–II has absolutely no idea what is going on inside the 6510. It cannot (in principle) know whether the processor is in the middle of an operation or on the cycle when it needs to make a read or a write; it just has no idea.3 Instead, the VIC–II is hardcoded to perform the identical /RAS /CAS sequence on every single Phase 2 half–cycle of the clock. That's the half–cycle on which the CPU executes. And you'll note that the /RAS /CAS sequence for the CPU is identical regardless of whether the CPU will perform a read or a write, or isn't even paying attention to the buses.
In other words, it's the VIC–II that is blindly going through the same /RAS /CAS sequence every Phase 2, and the RAM is thus merrily being driven (by the VIC–II) to perform a read. It's true that the 6510 is providing the address on the address bus, but that's only because its address outputs are always driving whatever address its internal address registers currently hold, whether that's the program counter or an intermediate address it's in the middle of working out. Similarly, it isn't that the CPU is reading in a byte and then discarding it. The RAM is putting the data on the bus as a result of the VIC–II having /RAS /CAS sequenced it, but nothing is paying attention to the data bus. So the 6510 isn't doing extra work that needs to be explained. The phantom reads happen because the VIC–II and the 6510 together are doing less work, much less work than would be necessary to coordinate and prevent those unnecessary RAM accesses.
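The difference between "the CPU reads and discards" and "the RAM is read and nobody listens" can be sketched in a few lines of Python. This is a conceptual model with hypothetical names: it only covers Phase 2 cycles where R/W is high, and each cycle is described simply by the address on the bus and whether the CPU happens to need the data on that cycle.

```python
def run_phase2_reads(ram, cycles):
    """Simulate Phase 2 half-cycles with R/W high.

    ram:    indexable memory (e.g. a bytearray of 65536 bytes)
    cycles: list of (address, cpu_uses_data) pairs, one per cycle
    Returns (number of RAM reads performed, bytes the CPU actually used).
    """
    ram_reads = 0
    used = []
    for address, cpu_uses_data in cycles:
        # The VIC-II runs the same /RAS /CAS sequence every Phase 2,
        # so the RAM puts a byte on the data bus no matter what.
        data_bus = ram[address]
        ram_reads += 1
        if cpu_uses_data:
            used.append(data_bus)   # the CPU needs this byte on this cycle
        # Otherwise the byte just sits on the bus; nothing "discards" it.
    return ram_reads, used


if __name__ == "__main__":
    ram = bytearray(65536)
    ram[0x1000], ram[0x1001] = 0xAA, 0xBB
    reads, used = run_phase2_reads(ram, [(0x1000, True), (0x1001, False), (0x1001, False)])
    print(reads, [hex(b) for b in used])   # 3 ['0xaa']
```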
The VIC–II needs to take its turn
In the first half of the post Raster Interrupts and Split Screen, I explain why it is that although television (video inscribed by a cathode ray onto a phosphor screen) predated the first digital computer by several years, computers did not actually output to a screen until the 1970s. For the first 40 years, computers output first to punch cards, which could be transferred to a machine that would transcribe them into a human–readable printout, and later were married directly to machines that would print their output.4 Why did television and computers co–exist for 40 years before television technology was used to display what was going on inside the computer?
In a nutshell, computers were not fast enough to digitally generate the live video signal that televisions at that time were otherwise getting from a purely analog source. It's fascinating history. If you want to know more, I hope you'll find the aforelinked article a good overview.
The point then is that in order for a Commodore 64 to produce a stable image on a screen, the VIC–II needs to continuously generate the video signal in realtime. But the VIC–II can't just produce an image on a screen by magical means; it needs to know, at every moment, what pixel with what color needs to be output next. How many pixels are there? 320x200 is 64,000. Plus it needs to generate 60 frames per second (in NTSC–land), so that's 3,840,000 pixels per second. All that's left to worry about is color. So, obviously we want 24–bit color per pixel, right? I mean, we're not a bunch of barbarians. That'll be 1 byte for red, 1 byte for green, 1 byte for blue. That's 3,840,000 px * 3 bytes = 11,520,000 bytes, for a throughput of just over 11MB per second. Wait... a... second... That's not going to work.
The C64 is clocked at 1MHz. The RAM can be accessed twice as fast, but the CPU is (or may be, since it's impossible for the VIC–II to know exactly what the CPU is doing) accessing the RAM on half of every clock cycle. That leaves the VIC–II the other half of the clock cycle to access memory. How would the VIC–II actually do that? Well, it's in charge of the /RAS and /CAS lines, plus it has lines that can order the CPU to take itself off the buses (via the AEC line.) So the VIC–II just asserts the AEC line every Phase 1 half–cycle, twiddles the /RAS and /CAS lines more or less as it does on behalf of the CPU, and sneaks in a read from the data bus while the CPU is completely oblivious.
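To make the interleaving concrete, here is a toy Python model of the bus schedule described above. It is a sketch under stated assumptions, not a cycle–exact emulator: every clock cycle has exactly one VIC–II half (phase 1) and one CPU half (phase 2), a RAM access happens in both halves whether or not anyone uses the data, and the complications discussed later (such as sprite fetches) are ignored. The function names are hypothetical.

```python
# Toy model of the C64 bus schedule, NOT a cycle-exact emulator.
# Assumption: each clock cycle splits into phase 1 (VIC-II's turn) and
# phase 2 (CPU's turn), and a RAM access happens in both halves whether
# or not anyone actually uses the data. Later complications are ignored.

def half_cycle_owners(cycles):
    """Yield (cycle, phase, owner) for every half-cycle, in order."""
    for cycle in range(cycles):
        yield cycle, 1, "VIC-II"  # AEC keeps the 6510 off the buses
        yield cycle, 2, "6510"    # /RAS /CAS sequenced for the CPU regardless

def ram_accesses(cycles):
    """Count RAM accesses per owner over a number of clock cycles."""
    counts = {"VIC-II": 0, "6510": 0}
    for _cycle, _phase, owner in half_cycle_owners(cycles):
        counts[owner] += 1
    return counts

# At a round 1MHz, each side gets about 1,000,000 accesses per second.
print(ram_accesses(1_000_000))
```

The point of the model is that the count is fixed by the clock, not by what the CPU is doing: the 6510's half gets a RAM access every cycle even when it ignores the data bus.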
If this were exactly what the VIC–II were doing, it would allow the video chip to access ~1,000,000 bytes per second, one byte per 1MHz clock cycle. But there are 60 frames per second, so it can access, at most, 1,000,000 / 60 = 16,666 bytes per frame. And there are 64,000 pixels per frame, so that's 16,666 / 64,000 ~= 0.26 bytes per pixel.
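The arithmetic in the last few paragraphs can be checked in a few lines of Python. This sketch simply restates the text's own assumptions: NTSC's 60 frames per second, a 320x200 display, and one VIC–II byte per 1MHz clock cycle.

```python
# Back-of-envelope bandwidth numbers, assuming (as the text does) NTSC's
# 60 frames per second, a 320x200 display, and one VIC-II byte per cycle.
WIDTH, HEIGHT = 320, 200
FPS = 60
CLOCK_HZ = 1_000_000

pixels_per_frame = WIDTH * HEIGHT               # 64,000
pixels_per_second = pixels_per_frame * FPS      # 3,840,000
rgb24_bytes_per_second = pixels_per_second * 3  # 11,520,000 for 24-bit color

vic_bytes_per_frame = CLOCK_HZ // FPS                         # 16,666
vic_bytes_per_pixel = vic_bytes_per_frame / pixels_per_frame  # ~0.26

print(f"24-bit RGB would need {rgb24_bytes_per_second:,} bytes/s")
print(f"VIC-II budget: {vic_bytes_per_frame:,} bytes/frame, "
      f"{vic_bytes_per_pixel:.2f} bytes/pixel")
print("Enough for a nybble per pixel?", vic_bytes_per_pixel >= 0.5)
```

The last line prints False: at roughly a quarter of a byte per pixel, even 4 bits per pixel is out of reach.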
Okay, so here's where, even in the abstract theory, the rubber truly meets the road. As long as the C64 is running at 1MHz, and as long as it has a resolution of 320x200, the VIC–II cannot possibly read enough information (in the time available to it) to get 1 byte of data per pixel. In fact, it can't even get 1 nybble per pixel. And so before you even move out of the merely abstract, it is absolutely clear that a computer from the 1980s, with a 1MHz clock, has limits on the resolution, color depth, colors per pixel, and more, because the computer is right at the edge: it has the minimum possible speed it can have and still be capable of digitally generating the image that it generates for the television standard it's displaying on.
Moving beyond the abstract and into the physical
In the real world, the PAL standard has 625 lines, which, divided in half, gives a vertical resolution of not 200 but 312 lines (NTSC has 525 lines, or about 262). If the VIC–II had to fill all of those lines, the demand on memory would be even higher. The 200 lines are thus roughly centered vertically within the 312, giving a border at the top and bottom.
There are more complications brought by the physical world. The C64 was only able to be sold at a reasonable price and yet come packaged with 64 kilobytes of RAM because, unlike the VIC–20 before it, the C64 made the leap from expensive static RAM to relatively inexpensive dynamic RAM. There is a problem with dynamic RAM though. It needs to be refreshed constantly or its memory storage mechanism drains electric charge away and the information is corrupted and lost. Now, fortunately, the RAM chips themselves possess a special refresh mode. It's quite clever how it works.
Recall how RAM is accessed by manipulations of the /RAS and /CAS lines. While /CAS is high, /RAS is pulled low and the RAM chip latches in half of the address from the address bus, 8 bits. Then, while /RAS remains low, /CAS is pulled low and the chip latches in the other 8 bits. To tell the RAM to do a refresh, the manipulations on the lines are switched up a bit. /RAS is pulled low, 8 bits of the address bus are read in, and they activate one memory matrix row. But then, instead of /CAS going low, /RAS is raised again while /CAS stays high, and the RAM performs what is called a RAS–Only Refresh. It's an internal mode that refreshes all of the memory columns in the activated memory matrix row, 256 bits. And since all 8 memory chips are doing this in parallel, 256 bytes can be refreshed in half a clock cycle. Very slick.
Of course, we already know it's the VIC–II that controls the /RAS and /CAS lines. The VIC–II therefore needs to periodically squeeze in refresh cycles to the RAM. On any cycle when the VIC–II is signaling the RAM to refresh a row it isn't simultaneously able to retrieve data, so some of the VIC–II's memory bandwidth is taken just to perform required maintenance on the dynamic memory.
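To get a feel for the refresh overhead, the following sketch counts how many raster lines it takes for a refresh counter to visit all 256 row addresses. The figures it uses, 5 refresh accesses per raster line and an 8–bit row counter that counts down from $FF and wraps, come from common VIC–II documentation rather than from this post, so treat them as assumptions. The function name is hypothetical.

```python
# How quickly a VIC-II-style refresh counter sweeps every DRAM row.
# Assumptions (from common VIC-II documentation, not from the text above):
# 5 refresh accesses per raster line, and an 8-bit row counter that
# starts at $FF and counts down, wrapping around.

ROWS = 256             # 8-bit row address
COLUMNS = 256          # bits refreshed per row, per chip
CHIPS = 8              # eight 1-bit-wide DRAMs side by side
REFRESH_PER_LINE = 5   # assumed refresh slots per raster line

def lines_to_refresh_all_rows(start=0xFF):
    """Count raster lines until every row address has been refreshed once."""
    seen = set()
    counter = start
    lines = 0
    while len(seen) < ROWS:
        lines += 1
        for _ in range(REFRESH_PER_LINE):
            seen.add(counter)
            counter = (counter - 1) & 0xFF  # 8-bit wraparound
    return lines

bytes_per_refresh = COLUMNS * CHIPS // 8  # one row across all chips: 256 bytes

print(lines_to_refresh_all_rows(), bytes_per_refresh)
```

Under these assumptions the whole RAM is swept every 52 raster lines, and every one of those refresh slots is a slot in which the VIC–II fetches no display data.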
Now if we were a TED chip in a Commodore Plus/4, we could sacrifice features in order to lend bandwidth to pulling more color information. But we're not a Plus/4 (lovingly called a Commodore Minus/60), we're a Commodore 64, a world famous, classic home gaming computer. The VIC–II has 8 hardware sprites. Somehow the VIC has to also retrieve sprite data from memory, which further robs the VIC–II of its precious memory bandwidth.
There are other complications of the physical world that come from the fact that a CRT display uses an electromagnetically directed electron beam emitter to draw on the screen. That beam takes time to move from one side of the screen to the other, or from the bottom of the screen back up to the top. While the physical beam is being repositioned, the CPU's clock is still ticking. The VIC–II can't profitably use those clock ticks, and so its theoretical memory bandwidth is constrained once again.
Black and White? Or Color, but with limitations.
There are reasons the original Macintosh (and the Lisa before it) were black and white: everything is a trade–off. The Macintosh had a higher resolution than the industry standard 320x200 (it had a relatively high resolution of 512x342, and if memory serves it had a higher refresh rate too.) Memory bandwidth is limited, but by sacrificing color, Apple could increase resolution, frames per second, and redraw speed.
Bitmaps were called bitmaps for a reason. On a black and white screen there are only two states for a pixel, on or off. There is a one–to–one mapping of bits to pixels: within a span of memory, every bit of data maps directly to the state of one pixel on the screen.
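Here is a minimal sketch of that one–to–one mapping. It assumes a simple linear layout, with rows following one another, 8 pixels per byte, and the leftmost pixel in the most significant bit. The layout and function names are illustrative assumptions, not a description of any particular machine's video hardware, though the size calculation applies to any 1–bit screen, such as the Macintosh's 512x342.

```python
# A linear 1-bit-per-pixel bitmap: each bit is one pixel, on or off.
# Assumed layout (illustrative only): rows stored one after another,
# 8 pixels per byte, leftmost pixel in the most significant bit.

def bitmap_size(width, height):
    """Bytes needed for a 1-bit-per-pixel screen."""
    return width * height // 8

def get_pixel(bitmap, width, x, y):
    index = y * width + x  # pixel number in reading order
    return (bitmap[index // 8] >> (7 - index % 8)) & 1

def set_pixel(bitmap, width, x, y, on):
    index = y * width + x
    mask = 1 << (7 - index % 8)
    if on:
        bitmap[index // 8] |= mask
    else:
        bitmap[index // 8] &= ~mask & 0xFF  # keep the byte in 0-255

print(bitmap_size(512, 342))  # 21888 bytes for a 512x342 screen
```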
Even on a black and white display, however, the color (that is seen by your eyes) is not defined by the bit value. That may sound odd, but on old computers, where memory bandwidth is severely limited, the value held in memory does not define the color, but instead defines an index to a color. Let's flesh that out a bit.
On a color display, pixels are made up of red, green and blue subpixels. On a modern computer, with, say, 24–bit color, one byte defines the brightness or intensity of the red subpixel. Red can therefore range from 0 to 255, dark or fully off to full brightness. Another byte defines the brightness of the green subpixel and a 3rd byte the brightness of the blue subpixel. In this scheme, the 24–bit value held in memory is a literal numeric description of the color, within a 24–bit RGB color space.
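In code, a direct 24–bit color is just three bytes packed together, so the value in memory literally is the color. This sketch assumes the common 0xRRGGBB byte order, a convention chosen for illustration; the function names are hypothetical.

```python
# Direct 24-bit color: the stored value IS the color.
# Assumption: the common 0xRRGGBB ordering (illustrative convention).

def pack_rgb24(red, green, blue):
    """Pack three 0-255 intensities into one 24-bit value."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be 0-255")
    return (red << 16) | (green << 8) | blue

def unpack_rgb24(value):
    """Split a 24-bit value back into its red, green and blue bytes."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF

print(hex(pack_rgb24(255, 128, 0)))  # 0xff8000, an orange
```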
But a bit of one or zero is much more abstract. It's just on or off, or it means foreground and background. What color the video chip puts on the screen for foreground and background is independent of the data in memory, because the bit value is only an index. A given video chip might present a zero as dark blue (with a certain RGB value hardcoded into the circuitry), and a one as a light blue (via another hardcoded RGB value.) But some other video chip might output the same bitmap as black and white, black and amber, or black and green, etc.
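To show that a bit is only an index, the following sketch renders the same row of bits through two hypothetical video chips. The chip names and the RGB values in both palettes are made up for illustration.

```python
# The same 1-bit data rendered by two hypothetical "video chips".
# The palettes are invented RGB values; the bit is only an index.

BLUE_CHIP = {0: (0x20, 0x20, 0x80), 1: (0x80, 0x80, 0xFF)}   # dark/light blue
AMBER_CHIP = {0: (0x00, 0x00, 0x00), 1: (0xFF, 0xB0, 0x00)}  # black/amber

def render(bits, palette):
    """Map each bit (0 or 1) to whatever RGB color the chip assigns it."""
    return [palette[b] for b in bits]

row = [0, 1, 1, 0]
print(render(row, BLUE_CHIP))
print(render(row, AMBER_CHIP))
```

The bitmap never changes; only the hardwired lookup does.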
I have learned something about the Amiga 1000's display capabilities. I'll describe that, and then work backwards from there to the Commodore 64.
Bit planes and Amiga 1000 Color Graphics
The Amiga 1000's video chip has 32 color registers. Each color register is 12–bits wide and defines an RGB color. 4 bits for red (intensity of red subpixel can be specified from 0 to 15.) 4 bits for green, and 4 bits for blue. That's 2^12, or 4096 colors, evenly distributed through a 12–bit RGB color space. However, unlike modern 24–bit color graphics, the Amiga's RGB colors are not defined by the data that is in video memory. The colors are defined by the values in the color registers in the video chip. And there are only 32 color registers. So the Amiga 1000 is capable of a "palette" of 32 colors, where each color can be chosen from a space of 4096 colors.
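Here is a sketch of decoding one such 12–bit color register. It assumes the value is laid out as 0x0RGB, with red in the high nybble. The conversion to 8 bits per channel (multiplying each nybble by 17, so 0xF becomes 0xFF) is an illustrative choice for showing the color on a modern display, not something the Amiga does. The function names are hypothetical.

```python
# Decoding a 12-bit color register: 4 bits each of red, green, blue.
# Assumption: register layout 0x0RGB. The n * 17 expansion to 8 bits per
# channel is our own illustrative choice for a modern display.

NUM_COLOR_REGISTERS = 32
COLOR_SPACE = 2 ** 12  # 4096 possible colors per register

def split_color_register(value):
    """Return the (red, green, blue) nybbles, each 0-15."""
    if not 0 <= value <= 0xFFF:
        raise ValueError("color registers are 12 bits wide")
    return (value >> 8) & 0xF, (value >> 4) & 0xF, value & 0xF

def to_rgb24(value):
    """Scale each 0-15 nybble to 0-255 (0xF -> 0xFF)."""
    return tuple(n * 17 for n in split_color_register(value))

print(split_color_register(0xF80), to_rgb24(0xF80))  # (15, 8, 0) (255, 136, 0)
```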
The data found in memory (RAM) defines an index into the palette. How many bits do you need to specify 32 possible palette indexes? 2^5 is 32, so you need 5 bits. But, 5 bits is kind of weird. It's more than a nybble but less than a byte. Imagine if you used a full byte to hold just the 5 index bits. You'd need 64KB to define a graphics screen at 320x200 pixels, or 128KB to hold a full graphics screen at 640x200 pixels. But, the whole machine only came with 256KB of memory, into which it needed to fit its multi–tasking graphical operating system.
Instead, and in order to save memory, the Amiga's video chip uses bit planes. One bit plane, 320x200 (64,000 pixels, 1 bit per pixel) takes just 8KB of memory. However, a single bit plane provides only 1 bit per pixel, which is enough to select between the zeroth and the first color register, or, two colors for the whole screen. Two colors for the whole screen, just like the original monochrome Macintosh, except that those two colors can be defined as any two colors out of 4096.
It gets much better though. The Amiga's video chip can utilize more than one bit plane at a time. Adding a second bit plane takes 8 more kilobytes, or 16KB total for 320x200 resolution. Now the video chip combines the corresponding bits from both planes for a total of 2 bits per pixel (without any wasted memory), so each pixel can be one of 4 colors, or 4 colors for the whole screen. And once again, the 4 combinations of each pair of bits are an index into the first 4 color registers. So each of the four colors can be any color out of 4096. Well, any classic Amiga user should recognize this. It's what Workbench uses. The original Workbench uses just 2 bit planes, thus taking only 32KB of memory for its 640x200 (NTSC) display. And Preferences lets you customize those 4 colors by setting the 12–bit RGB values of the first 4 color registers.
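The compositing step can be sketched like this. For readability, each plane here is a list of rows of 0/1 values rather than packed bytes, and plane 0 is assumed to supply the least significant bit of the palette index. The function names are hypothetical.

```python
# Bit-plane compositing: bit n of a pixel's palette index comes from
# plane n. Assumptions: planes are lists of rows of 0/1 values (real
# hardware packs 8 pixels per byte), and plane 0 is the least
# significant bit of the index.

def palette_index(planes, x, y):
    """Combine the pixel's bit from every plane into one color index."""
    index = 0
    for n, plane in enumerate(planes):
        index |= plane[y][x] << n
    return index

def planes_memory_bytes(width, height, num_planes):
    """Each plane costs one bit per pixel."""
    return width * height // 8 * num_planes

plane0 = [[1, 0, 1, 0]]
plane1 = [[1, 1, 0, 0]]
print([palette_index([plane0, plane1], x, 0) for x in range(4)])  # [3, 2, 1, 0]
print(planes_memory_bytes(320, 200, 2))  # 16000
```

With two planes at 320x200, planes_memory_bytes returns 16,000 bytes, the 16KB figure above, and each index selects one of 2 ** 2 = 4 color registers.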
Workbench 1.3, classic 4 colors: White, Black, Blue and Orange.
And that's what gives you that classic Amiga Workbench look, with its default 4 colors: white, black, blue and orange. Which, ironically, to my eyes, always looked kind of crappy compared to typical Commodore 64 programs which frequently used all 16 colors on screen at the same time. But we'll come to that next.
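Through the addition of bit planes, up to a maximum of 5, the Amiga has something like the following table of color options (memory figures are for 320x200, where each bit plane costs 8KB):
Bit planes | Colors | Memory at 320x200
1 | 2 | 8KB
2 | 4 | 16KB
3 | 8 | 24KB
4 | 16 | 32KB
5 | 32 | 40KB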
This is amazing. For a mere 40KB of memory, the Amiga can display 320x200 pixels, each pixel can be independently set to one of 32 colors in a palette, and each color in the palette can be configured from a 12–bit RGB space of 4096 colors. (At 640x200, the original chipset tops out at 4 bit planes, or 16 colors, in 64KB.) It let artists produce amazing images, and that's before you start introducing additional video tricks. For 1985, no wonder the Amiga totally kicked ass!
The Commodore 64's Graphics Modes
To start with, the Commodore 64 has only 1/4th the RAM of an original Amiga 1000. Or only 1/8th the RAM of most of the early Amigas that came with 512KB. The Amiga also runs a lot faster than a C64. It has a 7MHz clock, but the C64 runs at only 1MHz. So, as discussed earlier, the bandwidth available for accessing memory is much higher on an Amiga.
The C64 has a fixed resolution of 320x200, but it is able to display 16 colors on screen at the same time, in every mode. That's why most C64 programs are more colorful than the original 4–color Amiga Workbench. But, according to our table above, in order to display 16 colors at 320x200, the Amiga needs 32KB of memory for graphics data. That's 50% of the C64's memory! There is no way the C64 can sacrifice that much memory. But what's more, the numbers don't add up. A pixel needs 4 bits to hold a 16–color value. That's half a byte, but we've already shown that the VIC–II can only access ~0.26 bytes per pixel.
The answer to this puzzle is that the C64 can display 16 colors, but not every pixel can be individually assigned one of 16 colors. The VIC–II has several modes, and all modes use a single "bit plane" (to use the terminology of bit planes.) Each individual pixel is only ever defined by a single bit. Typically this would limit the whole screen to only displaying one of two colors, however, in addition to the "bit" map there is a "color" map. The color map divides the bitmap into 1000 cells. Rather than a bit selecting from two colors for the whole screen, it selects from two colors for that cell (in standard hires bitmap mode.)
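Here is a sketch of how a pixel's color might be resolved in standard hires bitmap mode. It relies on details of the standard C64 memory layout that this post doesn't spell out, so treat them as assumptions: the 8000–byte bitmap is stored cell by cell (8 bytes per 8x8 cell, 40 cells per row), and the matching screen memory byte holds the color for 1 bits in its high nybble and the color for 0 bits in its low nybble. The function names are hypothetical.

```python
# Resolving a pixel's color in C64 standard hires bitmap mode.
# Assumptions (standard C64 layout, not spelled out in the text): the
# 8000-byte bitmap is stored cell by cell, 8 bytes per 8x8 cell, 40 cells
# per row; each screen-memory byte holds the "1" color in its high nybble
# and the "0" color in its low nybble.

BITMAP_BYTES = 320 * 200 // 8  # 8000
CELL_BYTES = 40 * 25           # 1000 color cells

def bitmap_offset(x, y):
    """Byte offset of pixel (x, y) within the bitmap."""
    return (y // 8) * 320 + (x // 8) * 8 + (y % 8)

def cell_index(x, y):
    """Which of the 1000 (40x25) color cells pixel (x, y) belongs to."""
    return (y // 8) * 40 + (x // 8)

def pixel_color(bitmap, screen, x, y):
    """Return the 0-15 color index shown at (x, y)."""
    bit = (bitmap[bitmap_offset(x, y)] >> (7 - x % 8)) & 1
    colors = screen[cell_index(x, y)]
    return colors >> 4 if bit else colors & 0x0F

bitmap = bytearray(BITMAP_BYTES)
screen = bytearray(CELL_BYTES)
screen[cell_index(10, 20)] = 0x2B        # "1" color 2, "0" color 11
bitmap[bitmap_offset(10, 20)] = 0b00100000  # turn on pixel x=10 of that row
print(pixel_color(bitmap, screen, 10, 20), pixel_color(bitmap, screen, 11, 20))  # 2 11
```

The bitmap plus the color cells total 9,000 bytes, compared with the 32KB an Amiga needs for 16 colors at 320x200, which comes in under the roughly 16,666 bytes per frame estimated earlier. The cost is that any 8x8 cell can show only its own two colors.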
All of the C64's standard graphics modes are variations on a theme. Each mode uses very nearly the same pattern of memory accesses, but with different interpretations of the available data. The standard modes are: Character, Multi–color Character, Hires Bitmap, Multi–color Bitmap, and Extended Background Color Mode. I'm only going to discuss the first 4 modes in this two–part weblog post. The details of these modes, and how they are extended in FLI by intervention of the CPU, will be covered in part 2.
Wrapping Up Part 1: Memory Access
We now have a pretty good idea about how the Commodore 64's 6510 CPU makes its memory accesses, and how the VIC–II is able to make memory accesses on every other half–cycle.
We've seen that the VIC–II is responsible for activating the RAM's RAS–only refresh mode. And that the VIC–II has special access to its own 4–bit color memory, which will allow it to access color memory concurrently with a main memory access.
We've looked at what it would take for graphics data in memory to represent a full RGB value directly, and how the Amiga uses bitplanes to composite an index into a customizable palette. Indexing into a palette dramatically decreases the amount of memory required to represent a detailed color image. But we've also seen that despite the savings of the Amiga's bitplanes and indexed palette colors, it is still not efficient enough for the scanty 64KB of memory in a Commodore 64 and other 8–bit machines, nor is access time on a 1MHz bus fast enough.
The VIC–II and the Commodore 64 must resort to another trick that boosts memory efficiency and decreases the required memory bandwidth by a whole other step. They use a single plane of bitmap, combined with an extra layer, called a color map, that divides the bitmap into 1000 cells. Each cell has colors associated with it, and the pixels within a cell take their colors from those defined for that cell.
- My guess is that probably all digital chips have a speed of operation that depends on temperature. But this is not something programmers think about. When programming, the precise digital operation of a computer is taken for granted.
- See the introductory discussion of the post, C64 OS Subsite and Guides, for my personal breakdown of the life–cycle eras of the Commodore 64. 1996 is around when the SuperCPU was first released, very late in the life of the machine, and shortly after the demise of Commodore Business Machines.
- By the way, the VIC–II's ignorance of which stage of instruction execution the CPU is in is not merely academic. This matters for understanding FLI timing too.
- This is why so many languages, like BASIC, C, Pascal, Fortran and others, have a "print" command: originally, computers output to the user by actually printing.
Do you like what you see?
You've just read one of my high–quality, long–form, weblog posts, for free! First, thank you for your interest, it makes producing this content feel worthwhile. I love to hear your input and feedback in the forums below. And I do my best to answer every question.
I'm creating C64 OS and documenting my progress along the way, to give something to you and contribute to the Commodore community. Please consider purchasing one of the items I am currently offering or making a small donation, to help me continue to bring you updates, in–depth technical discussions and programming reference. Your generous support is greatly appreciated.
Greg Naçu — C64OS.com