January 9, 2017 #13 Programming Theory

Organizing a big project


Happy New Year C64 Enthusiasts! 2017 holds great things to come for the box that rocks.

I was continuing my programming work on C64 OS when I encountered a rather unusual (for me) error. The status line read: Label Overflow!! That can't be good. After checking in with the good folks at #c-64 on IRCNet,[1] I learned that this is not an uncommon problem when coding native.

This seems like a good opportunity to talk about how I've been writing the code for C64 OS. My primary rig consists of a flat C128 which I upgraded with JiffyDOS and a uIEC/SD with 256meg SD Card. I've also got a CMD HD and 1581 hooked up. In the expansion port I've got a CMD 1750XL 2 megabyte REU. I'm using a 1084 monitor atop one of my new stands, and a 1351 mouse. I also have an old 11" MacBook Air which I use mainly for IRCing while I'm coding, and for reading documentation online. I also use it to retrieve software for the C64 which I transfer to the SD card via a USB SD Card reader.

Here's my C128 before and after JiffyDOS installation.

On the C128, I'm using Turbo Macro Pro+ REU. I hear from the guys in IRC that this is one of the best native assemblers around for the breadbox (aka, no SuperCPU). It very conveniently takes advantage of that REU as well. When coding you can assemble and then start your code. But before it puts your code into the C64's main memory, it transfers your source code, as well as some metadata such as what line you're on, into the REU. It places a small routine low in the stack[2] that, when jumped to, will swap TMP and your source code back into main memory and then jump back into TMP right where you left off. You do this with a SYS320 from BASIC.
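As a quick sanity check on that SYS320, here is a small sketch in Python (not anything taken from TMP itself). It relies only on two facts: the 6502 stack occupies page 1, $0100 to $01FF, and 320 decimal is $0140. The helper name in_stack_page is made up for illustration.

```python
STACK_BASE = 0x0100  # first byte of the 6502 stack page (page 1)
STACK_TOP = 0x01FF   # last byte of the stack page; the stack grows downward from up here

def in_stack_page(address):
    """Return True if address lies inside the 6502 stack page."""
    return STACK_BASE <= address <= STACK_TOP

sys_target = 320  # the decimal address given to SYS

print(f"SYS320 jumps to ${sys_target:04X}")
print("inside the stack page:", in_stack_page(sys_target))
print("bytes from there up to the top of the page:", STACK_TOP - sys_target)
```

So the routine starts a quarter of the way up the stack page, well below where pushes normally land.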

This makes the cycle of code-assemble-test-code pretty fast and easy. Even if your code seriously crashes and locks up the computer, you can hit the reset button (which the C128 happens to have) and SYS320 will still bring you right back to where you left off. Even if SYS320 doesn't work because your code corrupted it somehow, after loading TMP from disk again, it will find your source code in the REU and swap it back in and you won't lose your place. It's pretty great.

There are of course a few downsides. If you've gotten used to working and coding on a PC or Mac with a big fat screen and lots of open windows or tabs, it can be a bit of a daunting experience to go back to the relatively tight confines of a C64's resolution. You can only work on one file at a time, which is a big pain if you need to reference external files for the names of constants. And also, it is amazing how narrow 40 columns is when it comes time to write comments beside your lines of assembled code. Oh, and there is no syntax highlighting. Here's how it looks when I code:

Coding in TurboMacroPro+
This is a sample from the custom keyscanner, based on the 3-key scanner by Craig Bruce.

Now, I've been working on C64 OS for only about two months. It's been slow coding by my usual standards, but I've been learning about 6502 ASM, the assembler and other tools themselves, and also studying the KERNAL, memory layout, interrupts, I/O, disk formats, and more at the same time. My original goal of keeping C64 OS's core to 4KB was driven by a desire to have it all fit in himem, $C000 to $CFFF. I don't know if I'll be able to stick to that. Most likely I'll have to break out some of the components I wanted, the ones that don't fit in that 4K, into loadable modules that don't stay resident at all times. Despite the coding going slowly, I feel I'm making good progress. In absolute terms there isn't going to be a whole lot to code if my goal is to have it fit in just 4K. But it is also surprising how much can be fit into such a small amount of space.

So back to my label overflow error. I've now got approximately 2K of assembled code, maybe a hair more. That includes: Paged memory manager with availability map, Jump Table, ASC/PET/SCR code conversion, Mouse driver, Keyboard scanner, mouse and keyboard event generation and propagation, a custom IRQ service routine, a primary event loop, most of the text screen compositor and the beginning stub of the Menu UI. Not a bad start, and a sizeable amount of stuff for only 2K. In order to write all this, I had, up to this point, written everything into one ASM file called c64os.a. As I was working on some final stages of the key event generator I got the dreaded Label Overflow!! error.

After talking about this with the guys in #c-64 on IRCNet, their general consensus was that I should just move to cross assembly. I thought about it. I really thought about it. I mean, what am I supposed to do otherwise? The native assembler just hit its limit of how many labels it can manage in a single source file, and I've only got enough code to generate 2 kilobytes of object code. But, I like coding native.

If you're wondering why anyone would subject themselves to coding like this, it's because for me, programming on a C64/128 is a hobby. It's an activity which enables me to enjoy using my C128 again during my down time. I spend 9 to 10 hours a day, 5 days a week, programming professionally and staring at BBEdit on a Mac. Yes it's harder to program natively on a C64, but it's also different. And different is good, because it's a break and a getaway from the usual.

I am resolved to coding native. I just don't want to sit around coding on a Mac (or a PC) while my beloved C128 sits there idle on the desk waiting for some code to be copied over to it. The next thing you know people would be saying, why don't you just test your code in an emulator, it would be so much faster! And then my Commodore could just sit there and do nothing. That's not the path I feel like going down. Therefore, I need a technique for organizing and coding a larger project in smaller chunks. And I believe I have come up with a good solution.

My Solution

I know that in theory these are solved problems. After all, modern OSes already have linkers and loaders and dynamic code relocation, but with those abilities comes a ton of overhead. The C64 has only one main binary file type and a very simple loader. The file type is PRG, and the first two bytes are the low-byte high-byte address in memory of where the rest of the code should be loaded into. It is purely assumed by the KERNAL, that implements the loading, that the code will not overflow any memory areas that it is not supposed to. Consequently it loads the whole file from disk in as tight a loop as possible. Not so for sequential and other file types of course. A sequential file could be megabytes on disk. Loading one has to be done a byte at a time and care must be taken about where each byte should be stored.
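To make the PRG layout concrete, here is a minimal sketch in Python (a model of the idea, not the KERNAL's actual load routine). The function names parse_prg and load_prg are invented for this example, and the five sample bytes are a made-up file: a load address of $C000 followed by LDA #$00 and RTS.

```python
def parse_prg(data):
    """Split the contents of a PRG file into (load_address, payload)."""
    if len(data) < 2:
        raise ValueError("a PRG file starts with a 2-byte load address")
    load_address = data[0] | (data[1] << 8)  # low byte first, then high byte
    return load_address, data[2:]

def load_prg(memory, data):
    """Copy the payload to its load address; return (start, end_exclusive)."""
    start, payload = parse_prg(data)
    end = start + len(payload)
    if end > len(memory):
        raise ValueError("payload runs past the end of memory (not modelled here)")
    memory[start:end] = payload  # nothing checks what is being overwritten
    return start, end

memory = bytearray(0x10000)  # 64K of simulated RAM
prg = bytes([0x00, 0xC0, 0xA9, 0x00, 0x60])  # hypothetical file: $C000, LDA #$00, RTS
start, end = load_prg(memory, prg)
print(f"loaded ${start:04X}-${end - 1:04X}")
```

The only check in the sketch is the guard against running off the end of the simulated memory. Everything else mirrors the trust described above: whatever was at the load address gets overwritten.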

The built-in loader is so primitive that it also assumes that every memory address is fixed. If it loads in at $2000, and there is a JMP instruction at $204A that jumps to $30FB, well then by golly it's going to jump to $30FB, and your code better well be there or you can expect a crash. This is the result of the assumption that the C64 is a unitasking computer, and that whatever program is running can do whatever it wants with all the memory available to the whole machine. For the most part, I want C64 OS to maintain this flexibility. The golden rule to C64 OS is to be true to the machine. With such limited resources, it is an advantage that there is so little overhead to getting some code into memory and running. We don't want to lose that.
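A small Python sketch of why that address is fixed: a JMP with an absolute address assembles to the opcode $4C followed by the target's low byte and high byte, and nothing in those three bytes depends on where the code is loaded. The helper names are invented for illustration.

```python
JMP_ABS = 0x4C  # 6502 opcode for JMP with an absolute address

def assemble_jmp(target):
    """Encode JMP $target as opcode, low byte, high byte."""
    return bytes([JMP_ABS, target & 0xFF, target >> 8])

def jmp_target(code):
    """Read the absolute target back out of a 3-byte JMP instruction."""
    if len(code) != 3 or code[0] != JMP_ABS:
        raise ValueError("not a JMP absolute instruction")
    return code[1] | (code[2] << 8)

instruction = assemble_jmp(0x30FB)  # the JMP sitting at $204A in the example above
print(instruction.hex(" "))
print(f"${jmp_target(instruction):04X}")
```

Load those same bytes at $204A or anywhere else and they still send the CPU to $30FB.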

The concept of a Jump Table is very at home on a C64. The final stretch of the KERNAL ROM, which occupies $E000 to $FFFF, is dedicated to a Jump Table to all the routines that the KERNAL makes publicly available. A Jump Table consists of a series of JMP $XXXX calls, back-to-back. Each call requires 3 bytes, one for the opcode and two for the absolute address to jump to. Therefore the offsets into the Jump Table remain stable even if the address bytes are changing. This allows the KERNAL to be updated: some routines can become longer, others shorter, the routines can be moved about in memory, and the addresses in the Jump Table are updated to point to the start of the routines. Then the addresses of the Jump Table entries are documented and programmers should JSR to the Jump Table. This JMPs to the address of the routine, which does something and RTSs back to where the program left off.
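The stability argument can be sketched in a few lines of Python. This is a model only: the base address $CF00, the routine names and the routine addresses are all hypothetical, chosen just to show that entry points depend on position in the table while the operand bytes track the routines.

```python
JMP_ABS = 0x4C  # 6502 opcode for JMP absolute

def build_jump_table(base, routines):
    """Return (table_bytes, entry_points) for a list of (name, address) pairs."""
    table = bytearray()
    entry_points = {}
    for index, (name, address) in enumerate(routines):
        entry_points[name] = base + 3 * index  # each entry is 3 bytes wide
        table += bytes([JMP_ABS, address & 0xFF, address >> 8])
    return bytes(table), entry_points

BASE = 0xCF00  # hypothetical table location

# Two builds of the same hypothetical code; "draw" moved because "alloc" grew.
table_v1, entries_v1 = build_jump_table(BASE, [("alloc", 0xC100), ("draw", 0xC180)])
table_v2, entries_v2 = build_jump_table(BASE, [("alloc", 0xC100), ("draw", 0xC1A4)])

print(entries_v1 == entries_v2)  # callers JSR to the same addresses in both builds
print(table_v1 != table_v2)      # only the operand bytes inside the table differ
```

Callers only ever need the entry points, which is why those are the addresses worth documenting.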

C64 OS has also had a Jump Table from its inception. The Jump Table sits at the end of C64 OS's code, up against $CFFF, so that its address will remain stable. When all the code was in a single ASM file, this was easy to implement. The code for certain routines, say for allocating memory, or rendering a text layer, or reading a mouse event off the queue, is all physically above the Jump Table in the source. Each of these routines has a label, and so the Jump Table consists of a series of JMP label, JMP label, JMP label. As the routines change their lengths and get rearranged, the Jump Table automatically gets assembled to point to the correct addresses. Perfect, and easy.

But what happens when all the mouse and keyboard routines are pulled out into a separate source file, to be assembled to a separate object file? And the Menu UI routines are separated into their own assemblable object, and the Screen Compositor routines into their own file? How is the Jump Table going to know where all those routines are? That's the problem.

A module of code, such as say the mouse and keyboard routines, has lots of subroutines within but only a few that need to be called externally. So each module of code will have a mini Jump Table at the top of the file, which uses labels to locate the addresses within that module. These JMPs take up 3 bytes each and reside at the start of where the module will be loaded in. Each module has its start address defined at the beginning of its source code. At first, as these modules are being developed, I'm not sure how big each will end up being, so I am just spacing them out throughout $Cxxx. I then have an includable file of constants called linker.s. This file has a label for each module, defined as the start address for that module. The main Jump Table is then implemented as successive 3-byte offsets from the start of each module.

Here's how the top of a module of code looks:

The top of a module of code showing exports jump table

The initial includes are for constants only, no code. *=$c680 declares that this module will assemble to $c680, and will subsequently be loaded there. The very next thing in the file is a mini Jump Table of exports. The first JMP will be at $c680 exactly, the next one 3 bytes further in, and any additional ones follow at 3-byte intervals. The labels on these JMPs refer to code found further down in this file, so the assembler knows how to resolve them. Now in the main OS's Jump Table, here's how that looks:

The main jump table with offsets into modules

The linker file, linker.s, includes an entry that defines a label as $c680, as well as labels for each of the other modules and their start points. The Jump Table above shows that each exported routine has an entry represented as a 3-byte offset from the label that points to the module. The module, muskey, which implements mouse and keyboard tracking, scanning and event queuing functionality, exports several routines. The Jump Table references these as muskey+0, muskey+3, etc. The address in the comments is the address of the entry in the Jump Table.
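The arithmetic behind muskey+0, muskey+3 and so on can be modelled in Python as follows. This is a sketch of the idea rather than the real files: the dictionary stands in for linker.s, muskey = $c680 is the only value taken from the post, and the number and order of exports is an assumption.

```python
# Stand-in for linker.s: one label per module, set to its start address.
# muskey = $c680 comes from the post; nothing else here does.
LINKER = {"muskey": 0xC680}

# Hypothetical export list: three exports from muskey, identified by position.
MAIN_TABLE = [("muskey", 0), ("muskey", 1), ("muskey", 2)]

def main_table_targets(linker, table):
    """Resolve each (module, export_index) to module_start + 3 * export_index."""
    return [linker[module] + 3 * index for module, index in table]

targets = main_table_targets(LINKER, MAIN_TABLE)
for target in targets:
    print(f"JMP ${target:04x}")
```

The main Jump Table never needs to know where a routine really lives, only which module it belongs to and its position in that module's export list.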

If I rearrange the inner contents of, say, mouse&key.a, but leave its start point and exports table the same, then the main Jump Table remains the same. If I need to move the start point of the mouse&key module because it needs more room, I move it, assemble it, update linker.s with its new start address, and then reassemble the main Jump Table. The addresses to the routines in the main Jump Table of course stay the same. readmouse continues to be found at $cfc7, for example. A JSR to $cfc7 will JMP to the Jump Table at the top of the muskey module, which will JMP to the actual routine in the module. It requires a double jump... But, it removes the memory limitations of writing more modules.
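The double jump can be traced in a small Python simulation. The addresses $cfc7 and $c680 come from the post. Everything else is an assumption made for the sketch: that readmouse is the first export of muskey, and the two routine addresses $c6a0 and $c6d5.

```python
JMP_ABS = 0x4C  # 6502 opcode for JMP absolute
RTS = 0x60      # 6502 opcode for RTS, used here only to mark a routine body

def poke_jmp(memory, at, target):
    """Write JMP target (opcode, low byte, high byte) into memory at 'at'."""
    memory[at:at + 3] = bytes([JMP_ABS, target & 0xFF, target >> 8])

def follow_jumps(memory, address, max_hops=8):
    """Follow back-to-back JMPs from address; return every address visited."""
    chain = [address]
    for _ in range(max_hops):
        if memory[address] != JMP_ABS:
            break
        address = memory[address + 1] | (memory[address + 2] << 8)
        chain.append(address)
    return chain

memory = bytearray(0x10000)  # 64K of simulated RAM, all zeros

MUSKEY = 0xC680           # module start, from the post
READMOUSE_ENTRY = 0xCFC7  # main Jump Table entry, from the post
EXPORT_INDEX = 0          # assumption: readmouse is muskey's first export
routine = 0xC6A0          # hypothetical address of the routine body

poke_jmp(memory, MUSKEY + 3 * EXPORT_INDEX, routine)
memory[routine] = RTS
poke_jmp(memory, READMOUSE_ENTRY, MUSKEY + 3 * EXPORT_INDEX)
main_entry_before = bytes(memory[READMOUSE_ENTRY:READMOUSE_ENTRY + 3])
chain_before = follow_jumps(memory, READMOUSE_ENTRY)

# Rearrange the module's insides: only the module's own export table changes.
routine = 0xC6D5
poke_jmp(memory, MUSKEY + 3 * EXPORT_INDEX, routine)
memory[routine] = RTS
main_entry_after = bytes(memory[READMOUSE_ENTRY:READMOUSE_ENTRY + 3])
chain_after = follow_jumps(memory, READMOUSE_ENTRY)

print([f"${a:04x}" for a in chain_before])
print([f"${a:04x}" for a in chain_after])
print(main_entry_before == main_entry_after)
```

In both traces the first two hops are identical, and only the last one moves. That is the cost and the benefit in one picture: an extra JMP per call, in exchange for a main Jump Table that does not change when a module's internals do.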

At the moment, the exact list of exports from each module is still in flux. So, even the main Jump Table isn't stable. But when each module reaches completion, its size will be known. At that time, I will lay them out such that they pack nicely into memory one after the next. This is not too onerous as there are only 5 or 6 modules that make up the entire OS. Here is a look at what my project file system looks like, now that I've split the source into modules each with their own headers.
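Packing modules one after the next is simple running-total arithmetic, sketched here in Python. The module sizes, the starting address $c000, and the module names other than muskey are all invented for the example. Real values would only be known once each module is finished.

```python
def pack_modules(start, sizes):
    """Assign back-to-back start addresses; return ({name: address}, next_free)."""
    layout = {}
    address = start
    for name, size in sizes:
        layout[name] = address
        address += size
    return layout, address

# Hypothetical assembled sizes in bytes, in the order they should sit in memory.
SIZES = [("memory", 0x180), ("muskey", 0x2C0), ("screen", 0x340)]

layout, next_free = pack_modules(0xC000, SIZES)
for name, address in layout.items():
    print(f"{name} = ${address:04x}")  # lines in the spirit of linker.s
print(f"next free byte: ${next_free:04x}")
```

The resulting addresses are what would go into each module's start declaration and into the matching labels in linker.s.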

The project directory with multiple files for modules

And that's it. I get to keep writing code native, and I don't need to worry about the assembler having to assemble the whole thing with every change. In fact, it should be faster because as I'm working I only need to assemble one module's worth of code at a time. Of course, now I'm going to need a booter that will be responsible for loading in all the modules. And that will be discussed in another post.

UPDATE: March 6, 2017

I am after all still learning 6502 ASM. I have since discovered a much more practical way of having the system's jumptable reference into the code found in an assembled object, what I call a C64 OS module. See the updated information here.

  1. Easiest way to get on IRC?

    For the crew of European demo coders, sign in here: At the status page, type /join #c-64 into the text field at the bottom of the page.

    For the more North American-centric group of C64 enthusiasts, sign in here:
    then /join #c64friends.
  2. Remember that the stack works in reverse. When you push something onto the stack the stack pointer decrements. So, low in stack memory means it's unlikely to be clobbered unless your code is going to use an epic amount of stack. And, most programs use very little stack in practice.