Written and Maintained by Gregory Nacu

Featured Posts

C64OS.com has grown to be more than just a blog about one developer's progress, it is becoming a resource to surround and support a type of C64 user that wants to benefit from the Commodore community and get the most out of modern hardware expansions for their beloved platform.

After writing many posts on the C64 OS weblog, the unfortunate reality is that some of my best work gets lost in the stream of news and developments. Be sure not to miss these full–length editorial reviews:

May 16, 2017Editorial

Review: FREEZE64 Fanzine

December 5, 2016Editorial

World of Commodore '16

Programming Reference

August 4, 2017Programming Reference

6502 / 6510 Instruction Set

August 4, 2017Programming Reference

Commodore 64 PETSCII Codes

August 3, 2017Programming Reference

Commodore 64 Screen Codes

Search

Needs some ideas? Trying searching for:
PETSCII, Animation, Memory or Pointers

Recent Posts

October 18, 2017Programming Theory

A C64 OS App's Start Of Life

September 26, 2017Hardware

New C64c Cases Available

September 18, 2017Programming Theory

Organizing a Big Module

September 11, 2017Programming Theory

Toolkit Introduction

August 15, 2017Programming Theory

Organizing Module Layout

August 4, 2017Programming Reference

6502 / 6510 Instruction Set

August 4, 2017Programming Reference

Commodore 64 PETSCII Codes

August 3, 2017Programming Reference

Commodore 64 Screen Codes

August 1, 2017Programming Theory

Base Conversion in 6502 (2/2)

July 21, 2017Hardware

Commodore Logo Mark Patch

July 5, 2017Programming Theory

Object Orientation in 6502

June 30, 2017Programming Theory

Base Conversion in 6502 (1/2)

June 20, 2017Software

Huge Site Update

June 5, 2017Software

Recursive File Copier in BASIC

May 29, 2017Technical Deep Dive

Implementing Factorial in 6502

May 16, 2017Editorial

Review: FREEZE64 Fanzine

May 9, 2017Programming Theory

Pointers in Practice, Menus

May 1, 2017Programming Theory

Loading Sequential Files

April 27, 2017Programming Theory

HomeBase Applications

April 21, 2017Programming Theory

Application Loading

April 6, 2017Programming Theory

Memory Manager Development

March 27, 2017Software

Petscii Art Animation

March 27, 2017Programming Theory

Making Course Corrections, Timers

March 21, 2017Software

PETSCII Art Renderer

March 6, 2017Programming Theory

Code Module Exports Table

Older Posts

Full Post Archive

Subscribe to C64OS.com with your favorite RSS Reader

News, Editorials, Progress and Reference

October 18, 2017Programming Theory

A C64 OS App's Start Of Life

I'm pretty close to closing the circuit on being able to launch and run an app in C64 OS. It might feel like it's taken a long time to get to this point, it's been almost a year. But when I think back on my progress, I'm not disappointed. I've had to learn a lot, and I feel like the whole process has really been pulling myself up by my bootstraps. Not only have I learned, realistically, to write code in 6502 ASM, but I've learned a lot about all the various parts of a C64 along the way.

I have my C64 Programmer's Reference Guide, the large, double–sided, fold–out C64 schematics from 1982, and the Advanced Machine Language for the C64 by Abacus. I've also read extensively from the 1541-II user guide and the CMD HD manual, and the Complete Commodore Inner Space Anthology has been super useful. And of course, poring over the disassembled and fully commented BASIC and KERNAL roms and understanding how those routines work has been very important. Learning about software and hardware, and electronics theory at the same time has been unexpectedly rewarding for me intellectually. I feel as though I've learned more about technology in the past year of induldging in C64 retro computing hobby than I have in my fulltime professional software developer career over the past 5 or 6 years.

And let's not forget to add, whenever my fingers get tired of typing, or my butt starts to ache from sitting in my chair for too many hours in the day, I've thoroughly enjoyed the countless of hours of work and planning I've put into my C64 Luggable project. The hours I've sunk into coding and designing C64 OS is nearing a turning point where I can actually launch and run applications. This seems like a good time to take a look at what it's like for the system to start up and launch the default home application, in this case, App Launcher. I've already written at least one post on my ideas for launching applications, so if you're curious you might want to read that too.


The Booter

As I've gone over in previous posts, the C64 OS KERNAL (for lack of a better word) is divided into approximately 10 modules. These are implemented as one assembly source code file per module, plus a header file (*.h) that declares the exported routines, and an includes file (*.s) that declares constants and macros. The booter is not part of the C64 OS kernal, and it isn't launched the way a proper application for C64 OS launches.

The booter launches more or less the way a standard C64 assembly program might launch. It's assembled to $0801 and starts off with the basic preamble that does a SYS 2061 to kick it into action. The booter must be in the system folder, AND the drive whence it is loaded must have its current partition and current path defaulted the partition and path of the system folder. For example, you cannot load the booter with a command like this:

	LOAD "2//SYSTEM:BOOTER",8

If you have to specify the partition and path such as in the above example, because the current default partition and path are somewhere else, the booter will fail. I tested this out on the Wheels "starter", just to see if he did something more clever than me, and the Wheels starter has a similar limitation. It completely craps out if the default partition isn't where the starter is located. Although, I believe the starter must also be in the root directory of its partition. This is not the case with C64 OS.

Let's get this out of the way, C64 OS requires the system to be on a storage device that supports subdirectories. This could be an IDE64, or a CMD HD/RL/FD, or any of the wonderful SD2IEC variants out there. Please see the The Commodore 8-Bit Buyer's Guide for what purchase options you have. Other files, documents, images, whatever can be stored on other any other media, like 1541/71/81 or their emulation partitions on CMD devices, or in .DXX images on SD2IEC devices, etc.

When the booter runs, it configures the system file reference.1 It uses $BA (186) to grab the current device number, then it uses a C-P command to read the current partition number from the boot device, and writes these into the system file reference. Next, it opens the sequential file, "config.t" and starts reading it in. It finds config.t in the current partition and path, and if it doesn't find it there, it craps out. The very first line of config.t (subject to change, of course), is the path within the current partition of the system folder. This path entry is essentially just a reflection of the very path where this file is stored. So, isn't there a way to just get this path (present working directory) dynamically? Actually, reliably getting present working directory is a severe pain in the butt, given the way the various file systems on Commodore compatible drives work. If you don't believe me, feel free to read this lengthy thread about the topic. I actually contribute to that thread, in which I mention exactly what I'm talking about here. C64 OS only needs to know where its system folder is, which it gets partially dynamically and partially by the config.t's path entry. After that, all file references are handled in a standard way internal to the OS. And the concept of a PWD simply becomes a file reference held and manipulated by an application.

Before doing any config, the booter loads all of the C64 OS KERNAL modules into memory. Then it reads the config file and builds the system file reference, and a few more config variables such as default system colors. It then runs a drive detection routine and sets up a table of addresses to device types. The booter then JMP's to "runhome" via the system jump table. This JMP, of course, never returns, and nothing from the booter is ever executed again. Any memory it occupied is already marked as available by the KERNAL's memory manager.

Screenshot of code from the booter
Sometimes I give you the source code beautifully formatted in a GitHub Gist. But today you're getting an authentic view of what it looks like to see this code the way I see it, sitting in front of my flat c128.

System Services

One of the C64 OS KERNAL modules is service. It is a bit meta, but provides a few critically important, well, services. It handles environment variables. It holds onto the aforementioned system file reference. It tracks which home application it should load when an application quits. And it has the runhome routine that is actually responsible for finding and launching the home application. Service is also the module that hosts the main IRQ service routine.

When first called, runhome checks to see if the home app's file reference has been initialized, and if not it allocates memory for it, and constructs it relatively from data in the system file reference and environment data about the current home app. That is very nearly all it does. It then loads the pointer to the home app's file reference into the .X and .Y registers and JMPs to loadapp via the system jump table. After this point, the home application is treated exactly the same as any other app. The only thing special about it is that there is a system service routine that always knows how to get back to the home app.

Screenshot of code from runhome in services 1 Screenshot of code from runhome in services 2 Screenshot of code from runhome in services 3

File Services

Another C64 OS KERNAL module is file. Ideally, it handles all file system accesses. It is actually very lightweight, however, because the C64 OS KERNAL does not replace the C64's KERNAL rom, it just augments it. If you have JiffyDOS then all the actual serial routines are handled by the JiffyDOS rom, including all the KERNAL rom's standard vectors that allow the IDE64 and RamLink to work.

So you might think, well, what does the file module actually do then? Mainly it knows how to work with C64 OS file reference structs, which are constructed and manipulated by other parts of C64 OS and its applications and utilities. These abstract the much more primitive KERNAL system of logical file numbers, and the KERNAL routines SETLFS, SETNAM etc. A C64 OS file reference contains a device address which is used to look up the device type ID from the table built by the drive detection routine that was run by the booter. From that ID file knows if the device supports partitions and subdirectories. The file reference also contains a 16-character filename, a partition number and path. As well as a dynamically managed Logical File Number.

A quick aside; The path can be upto about 230 characters long, which if you have 16 character directory names, plus a one-character delimiter (/), supports nested folders a minimum of 13 levels deep. These can go deeper if the directory names use fewer than 16 characters. 1 character folder names, plus the one-character delimiter, means a maximum possible folder depth of 113. Or an average of about 25 nested folders.2

The file module manages Logical File Numbers (LFN), the ones used by the KERNAL rom, for you. You create a file reference, and open it for read or write. The file module automatically handles getting you a Logical File Number that's not in use, changing the device's default path and partition, and opening the file. Regardless of how you've navigated around the directory structure of the device. When you close a file, it doesn't destroy the file reference, it merely releases the LFN back to the pool, and sets the LFN on the file reference to indicate it's not open. Saving over a previously opened file in a text editor becomes effortless. The reference doesn't just remember the filename, but also where it came from, and will write it back to the correct place.

Screenshot of code from loadapp in file 1 Screenshot of code from loadapp in file 2 Screenshot of code from loadapp in file 3

The first thing loadapp does is initializes the Toolkit. Toolkit has two tables, one for allocating its built—in classes, one for initializing those classes. If an application wants to subclass and extend those classes, it can do so by copying the allocation and initialization tables, which it then extends, and changing the vectors that Toolkit uses to find those tables. Reinitializing Toolkit resets the vectors back to the built—in alloc and init tables. This is necessary to recover from any subclassing efforts of the previous application.

Next it calls prepdisk, which changes the default partition and path of the device so that subsequent disk accesses are relative to the application's bundle. Then it reads the "menu.m" file which every application must have in its bundle. In order to read menu.m, it uses the blksize routine to get the number of blocks the file takes up on disk. Then it allocates the same number of blocks in memory, and reads the file into that allocated space. In C64 OS, atomic reads, that is, file reads that will be opened, read, and closed in short sequence, can use the reserved "templfn" for a temporary logical file number, rather than allocating and releasing one from the pool.

Screenshot of code from loadapp in file 4 Screenshot of code from loadapp in file 5

With the menu data in memory, loadapp JMPs to mnuinit in the menu module. However, I'm not going to go down that rabbit hole in this article. Suffice to say it recursively builds a tree of menu nodes which the mnudraw routine knows how to draw.

The memory manager allocates from the top of memory down, remember, so allocations for the menu data and file references and so on are allocated out of memory starting up near where the C64 OS KERNAL is. This is important, because next loadapp loads "main.o", the application's main executable file, out of the app's bundle. The application's main.o must be assembled to $0800. The KERNAL rom's load routine returns the address up to which the load ended in .X and .Y, low byte, high byte respectively. The high byte is the page number and C64 OS uses a paged memory manager. So .Y is left holding the end page number, and .X is loaded with $08, the starting page number, before doing a JSR to pgmark in the memory module. This tells the memory manager the range of pages to flag as allocated.

Next loadapp initializes the mouse, by calling initmouse in the input module. The previous app has the ability to disable the mouse, this will re–enable it. This will also load it from disk, if it hasn't already been loaded. The mouse cursor's sprite data is held in its own file in the system folder and initmouse uses the system file reference to know where to find that.

In general C64 OS uses a mouse—based UI. It provides a rich mouse–based event system that is designed hand–in–hand with the Toolkit, and it also provides a hierarchical pull down menu system. Together these make building an application with a big feature set or a complex UI reasonably simple. However, there are some kinds of apps that might wish to opt out of some of these provisions. C64 OS reserves the top row of characters for itself. There is a CPU busy animation that spins in the top left corner if an app goes into a long loop and stops responding to user input. Then there is the menu strip itself, and in the top right corner is the clock.

There is already a call to disable the mouse pointer, which short circuits the mouse driver so it stops reading the mouse and stops producing mouse events. There is also a set of system draw flags, three bits of which represent the three top row services. The CPU busy, menubar and clock can be prevented from drawing by disabling these flags via the setflags routine in the service module. loadapp calls this to re–enable these features, in case they were disabled by the previous app.

Applications also have the ability to tap into the system IRQ for timed events. loadapp resets the two vectors for custom timers back to their defaults.

Lastly, loadapp makes two more calls, the second of which it never returns from. The first is a JSR to initapp, and when this returns it does a JMP to evtloop. Both of these calls demand an entire section to describe.

Inside the Application

The reason it's so essential that an application is assembled and loaded to $0800 is because C64 OS expects the application to have its primary jump table starting at $0800.

Screenshot of code from home app 1

The jump table consists of a series of export vectors, exactly the same way each C64 OS KERNAL module begins with a table of vectors. In an application, they are as follows:

  • .word init ;Called with the application first launches
  • .word mcmd ;Called when a menu command is triggered
  • .word quit ;Called when the application is about to quit
  • .word frze ;Called when the application will be frozen
  • .word thaw ;Called when the application becomes active after being frozen

The init routine is therefore the first entry point of the application's actual ability to run any code. But remember, in C64 OS, like in modern OSes, the application doesn't ever just go into a hard loop polling for something like a key to be pressed. Instead, it sets up some UI and returns and lets the OS handle the main event loop. Which we'll get to shortly.

In order for an application to interact with the system at all, to be informed of mouse and keyboard events and to be told when to draw, it must push a screen layer. Typically what it would do is first create the basic UI by instantiating and assembling Toolkit views, and wiring up the event callbacks from some of those views to routines. Then it puts the pointer to the screen layer struct into .X and .Y and calls layerpush.

Screenshot of code from home app 2

The screenlayer struct consists of 4 pointers to: a draw routine, mouse event, key command event and printable key event handlers. The layer is then pushed on to a stack that supports 4 layers. The topmost layer (4th is topmost) gets event priority. The menu system, cpu busy and clock are hardcoded to draw on layer 4. When the application is initialized and it pushes its first layer, it is pushing layer 1, the bottommost layer. At that time, layer 2 and 3 are unassigned.

When a mouse event is generated, layer 4 has the first shot at processing it. Layer 4 delegates this task to the menu system. If the menu system decides the mouse is not interacting with the menus then the event is allowed to be processed by the next layer. 3 and 2 are skipped because they're not assigned and the mouse routine from layer 1, is called. This is code the application implements. So technically the application can do anything it wants. With the caveat that it shouldn't go into a long loop, it should always try to return back to the main event loop as quickly as possible so as to remain responsive to the user.

While the app can do whatever, typically the app will just foward the call to the toolkit. That's what the screenshot of this app is doing. All three of the screen layer's event pointers are set to "forward" which simply does a JMP to tk_proc. Similarly the draw pointer is set to tkdraw.

You might wonder, what's the point of this layer system if it ultimately just calls routines in the toolkit? Why doesn't the main event loop just call those toolkit routines directly? The short answer is flexibility. It is important that any OS strikes a good balance between easy and flexible. The Toolkit handles a tremendous amount of redundant work. And it's super easy to snap a few of the views together to get a useful, responsive and consisten UI. But if you were forced to use the Toolkit there might be things you'd want to do that would be very difficult.

The way I've set up C64 OS, imagine you are fully onboard with using Toolkit, but you want to have some special keyboard command that isn't represented by a menu item in the menu system. No problem. Set up the first screen layer exactly as illustrated above, except instead of setting the kcmd pointer to "forward", set it instead to some other routine. That routine will be called every time a kcmd is generated (and not already handled by a higher screen layer). Your routine can check to see if it is your special command and do something if it is, and if it isn't it can then pass it on to Toolkit. Boom, the app can inject event handling outside the context of Toolkit that easily.

Here's a more dramatic example. Let's say you want to use Toolkit, but you want to have a dedicated area of the screen, say the bottom 5 rows, completely free form and unmanaged by toolkit. No problem. When the init routine configures Toolkit's root view, you set its draw bounds so that it's height is just 19 rows high (25, minus 1 for the menu bar, and minus 5 for the free form area at the bottom) and aligned to the top. Toolkit will now never draw into that bottom area. Then you set the mouse routine of the screen layer to call your custom routine. That routine checks to see if the click is above the bottom 5 rows, if it is, call the Toolkit's mouse handler. Otherwise, proceed to analyze the mouse event however you please to work with whatever you're doing in the free form area.

A third and final example. A weird example, but just to show the possibilities. You want a button in the Toolkit UI to lock the screen for 5 minutes after you've clicked it. So you create that UI with Toolkit and create the button and wire the button to call a routine on click. That routine configures a system timer for 5 minutes (18000 Jiffys). Then, in the screen layer rather than merely forwarding to toolkit, you check to see if that timer countdown variable is zero. If it's not zero then you return without forwarding any events to Toolkit. If it is zero then you forward to Toolkit as usual.

What you want is the ease of being able to rely on the Operating System for all the things that it's good for, and you want the ability to do something custom whenever you want without having to fight or hack the system.

The Event Loop

After the app has been initialized, it has presumably pushed a screen layer onto the screen layer stack, and configured whatever UI it wants to set up. The initializer returns to loadapp which promptly JMPs to eventloop. The Eventloop will therefore never return to the loadapp routine.

The IRQ service routine is still firing every 60th of a second, the mouse has been enabled, so mouse clicks are producing mouse events, and the key presses are producing keyboard events. The main event loop loops infinitely performing the same steps over and over.

Screenshot of code from evtloop 1 Screenshot of code from evtloop 2 Screenshot of code from evtloop 3

When an app quits, somehow the event loop needs to exit, and not in a way which involved code at the end of a JSR, otherwise we'd have a stack leak. Instead, there is a loop break vector (loopbrkvec). When loadapp enters the event loop after loading an app, it enters at a small prelude that clears the loop break vector, before proceeding to step 1. However the loop loops back to step 1 skipping the prelude that clears that vector. Near the end of the event loop, step 5, checks to see if the loop break vector is set. If it's not set, it proceeds to step 6 and back to step 1 and continues on. If the loop break vector is set however, it does an indirect JMP through that vector. Therefore, it truly leaves the event loop never to return. This is what will be used for doing a proper application quit, however I haven't implemented that yet.

The Event Loop then performs Step 1, Step 2 and Step 3. They have a parallel structure so it's easy to explain. They handle the three primitive event types: Mouse, Key Command and Printable Key. First a JSR to the appropriate routine in the input module checks for the presense of an event on the queue. If the carry is clear, there is an event on the queue. If the carry is set it simply moves on to the next step.

If an event is on the queue it JSR's to proclayers. Proclayers is what controls the fact that the topmost layer gets a shot at the event first. If it's a mouse event, it calls the mouse callback for the topmost layer. This routine does whatever it wants. If it returns with the carry set, then the event has been handled and proclayers is finished and returns. If the carry comes back clear, proclayers loops to the next layer down, checks to see if its mouse routine is not null and calls that. This allows higher layers to always get precedence. And higher layers have the ability to deny the lower layers from ever even having a shot at the event. This can be used for model dialogs, and is also what allows, say, the menu system to prevent a click event from passing clear through the menu and into the Toolkit UI below it.

After proclayers returns, the event is immediately dequeued. This process repeats in step 2 and 3 handling keyboard events the same way. Key commands for example are routed to Screen Layer 4 first, which allows the menu system to pick them up. If the menu doesn't handle them, then they make their way to the lower screen layers, and the app can handle the key command outside the context of the menu system. The menu system, if the key command matches a menu keyboard shortcut, calls the Application's mcmd routine, as we saw configured in the Inside the Application section above.

The handling of events however should never cause the application to try to redraw itself to screen. If an event causes the visual state of the application to become out of date, the application should call the markredraw routine of the screen module. This sets redraw flags for the current layer handling the event. More on redrawing in a moment.

Step 4 of the event loop handles checking for an expired timer. The timer system is two—fold. There is a timer update vector, and a timer check vector. The timer update routine is called by the IRQ, and so it is called very reliably 60 times a second. When the timer ellapses however, that's not when the resultant code for the elapsed timer gets called. The Event Loop's execution time is less reliable, because event handlers could be poorly behaved and fail to return to the event loop in a timely fashion. Eventually though, when the event loop is executed, step 4 calls the check timer routine. If the timer is elapsed the check timer routine is free to call whatever code should execute as a result. Including possibly resetting the timer.

Step 5 of the event loop calls redraw. This redraw routine doesn't just immediately start trying to draw something. It checks to see if the mark for redraw flags are set. If they are, then it calls the draw routines on the layers starting with the lowest layer first. So the application's first screen layer push has its draw routine called. In our example this was set to forward to the Toolkit's redraw.

If the toolkit does something it shouldn't do, such as draw into the top menubar, this will be overdrawn almost immediately. Because the main redraw routine will call the draw routine on the next highest screen layer. This also allows the toolkit to trigger redraws as the result of a timer, even when the user has a menu open. The toolkit may then refresh some part of the screen that is hidden under the open menu. After it does its draw, the next highest layer gets to draw, and eventually layer 4 gets to draw, and the menu will be redrawn above the updated toolkit view.

There are some tricks I'm working on to make this fairly efficient. When the menu system is due to start drawing, for example, it will draw the lower three layers from scratch, and then buffer them to a memory area stashed under I/O. Then, unless the lower layers have their redraw bits set, it will refresh the screen by copying from the buffer, before redrawing the menus above that. This will allow the menus to be drawn above a complex Toolkit UI very quickly. There are some other tricks I'm working on as well. The Toolkit will keep track of which View triggered the screen layer to be dirty, and when it is asked to redraw, it can redraw just that one view. This will allow, for example, a user to type into a text field, and on each key stroke only that single text field will continually redraw itself, rather than the whole screen.

After the redrawing phase is complete, step 5 also handles drawing the CPU busy animation. It starts by resetting the animation character to the default state. Then it sets the jiffy time into a system variable indicating when the event loop last went through step 5. The IRQ service routine, which updates the Jiffy Clock, compares the current Jiffy Time to when the Event Loop last ran, and if more than a couple of seconds has passed, the IRQ service routine starts to replace the CPU busy character with the next frame of an animation set. Because the IRQ runs independently of the Event Loop, if some application code that handles an event decides it is going to do something for an extended time before returning to the event loop, the event loop will be unable to update the event time variable, and the CPU busy animation will automatically start ticking. This is, of course, subject to the system draw flags mentioned at the very beginning. An application is free to disable the CPU busy animation. But, when the next application is loaded, it will be automatically re–enabled.

Before step 5 finishes, it checks the loop break vector, as mentioned earlier.

Step 6 has one job. JMP to step 1! And that is pretty much all that the Event Loop does.

Last Comments

That's the basic start and life cycle of an application in C64 OS. It doesn't go into much detail about how drawing really works, or how to make a UI with Toolkit, or how an application will be quit and its memory deallocated. But, it does show how, starting with the Booter, the first application gets itself running and where it has the opportunity to set up its UI, and how it will respond to events.

Thanks for reading. As always, leave comments, questions and thoughts below.

  1. I talk a bit about file references in C64 OS here. I promise that I'll go into more detail on file references in the next post, unfortunately as of this writing, I haven't done that yet. []
  2. 230 characters maximum length for a path string. With 16 character folder names, plus delimiter, we get 230 / 17 = ~13.5 folders deep. If the folder names were all just 1 character, plus delimiter, we get 230 / 2 = 115 folders deep. Or an average of 8 characters, plus delimiter, for 230 / 9 = ~25.5 folders deep. I think this limitation is very acceptable for a C64. []
September 26, 2017Hardware

New C64c Cases Available

I am very excited! I began this blog, almost a full year ago, and 42 posts ago. That's an average of about 4 a month, or one a week. I'm pretty pleased with my ability to consistently add new content to this site. But that's not why I'm so excited. My inaugural post was a mere 58 words. I posted about my eager anticipation of the new C64c cases in a variety of awesome retro color schemes. Here's my original, very first post in its entirety:

These new cases look amazing. I can't wait to get my hands on them.

This is a perfect example of what's great about the C64 in 2016. New hardware is still being manufactured, that looks stylish and new, but true to form at the same time.

I've heard from Jens Schönfeld, personally, that these cases are almost ready.

It took almost a year, but these cases have just recently become commercially available. I should probably add them to the Commodore 8-Bit Buyer's Guide. Feast your eyes on this gorgeous sight.

A beautiful box for your new C64c case
A beautiful box for your new C64c case

Close up, angled shot of new C64c case, SX-64 Style
Close up, angled shot of new C64c case, SX-64 Style

Full frontal of the new C64c case, SX-64 Style
Full frontal of the new C64c case, SX-64 Style

You can order them today from Pixelwizard Retro Shop: https://shop.pixelwizard.eu/en/commodore-c64/

These beautiful C64c cases come in 4 styles:

  • Breadbin Grey
  • Classic Beige
  • Retro Black
  • SX-64 Style

I've got not shame in confessing that to my eyes SX-64 Style is the sharpest. Either for reasons of manufacturing difficulty or simply to shape demand, the SX-64 Style case is also 10 € more than the other three.

This brings me to the price. They're not cheap throw–aways! But, I'm a nerd, you're a nerd, and we both know these precious jewels are worth the dough.

They're 59 €, plus 5 € for an additional (recommended) shipping box. Plus shipping fees, which vary depending on how many you order and where they're shipping them to. The SX-64 Style's base price is 69 €. So, all told, for the SX-64 Style I'll be looking at maybe 89 €, which is 132$ CAD.

It's the perfect home for a C64 Reloaded MK2, which is currently available only as pre-order, for 184,95 €. I look forward to doing a write–up about this too, when it becomes available.

What a fantastic time to be a C64 user!

September 18, 2017Programming Theory

Organizing a Big Module

Let's start with a description of the Toolkit drawing system.

As should be evident from my last post, Toolkit Introduction, I've been hard at work (during whatever spare time I can manage) designing and building out the Toolkit module of C64 OS. I spent the weekend (re)implementing two of the biggest components of the view drawing system which I haven't even had a chance to discuss yet here. To be honest, I've been avoiding discussing it, because it's been so in flux. But, given that this is a blog about my progress, maybe I should be talking more regularly about the thoughts I've been having and the dead ends I've hit and the things I've tried. Drawing, in general, is a complex topic. So it's really hard to cover in a single post. I will eventually dedicate an entire post just to talking about how the drawing system works.

In brief, there is a global structure, maintained by the screen module, called the draw context. It defines the screen and color memory origins (2 byte pointers each), the width and height, 1 byte each because they're in character cells, and these define a rectangular region on the screen. The draw rect is always an on–screen area measured in character cells, so width and height don't need to be 16-bit. 40 columns and 25 rows is well below needing 16-bits. Additionally, the draw context has two more values, offset top and offset left. These are both 16-bit and represent the scroll offsets of the draw context.

The scroll offsets are what make things particularly complex, but as far as I can tell they're necessary. They are what allow the hierarchy of nested views to be scrolled. And they have to be 16-bit otherwise nothing would be able to scroll more than just a few screens.

Every view has the ability to draw itself, and when it does it uses the details in the current draw context to know how much space is available, where it starts in memory, and how offset its origin is from the virtual origin of the view itself (those are the scroll offsets). Typically a view like a button, a label, a checkbox or whatever just have to draw themselves. Views do have the ability to have children, of course, but the leaf views don't have children. A button, for example, even though it inherits node properties (including a child pointer) will never actually have children. In fact if you instantiate a view and assign it as the child of a button, that child will never be drawn, because button's draw routine does not attempt to pass drawing control to any children, because it assumes it doesn't have any.

The View view, aka the root class from which all the other views descend, implements a draw routine which on its own doesn't need to do much except clear the rectangle defined by the draw context. But View is also the main container for laying out children. So, if you want to have a UI that puts two scroll views side by side (or one above the other) for a sort of two-up dual pane, those two scroll views are siblings of each other, and they share a parent. That parent is most likely View. View has a special feature of its draw routine that recursively calls draw on each of its child views. This logic is quite complex, so one wouldn't want to implement it twice, as you'll see very shortly.

There are two other container views, Scroll view and Split view. Scroll view is effectively just View but with scrollbars that can be interacted with to change the offset top and offset left properties of its own draw context. And split view maintains two children and an interactable control for changing the draw context's height of its two children (if horizontal split) or width (if vertical split). But these two container views do not reimplement View's draw logic. They just set the draw context in a custom way, set a couple of custom pointers, and then call View's own draw routines. And that is a big benefit of object oriented code.

When View is recursively walking through its children there are actually three steps it needs to go through. And two of these are what I was working on finishing up this weekend. They are:

  • Resize Node
  • Bounds Check
  • Recontext

A resize occurs when the width or height metrics of a view changes. A global resize flag is set and a view's size is recomputed. This has down stream effects, because views can be anchored in such a way that their size changes as a result of their parent changing size. A resize is usually caused by interaction with a split view but view metrics can also be changed programmatically. If they are one simply has to set the global resize flag manually. There are some efficiencies baked into how this works so that not every view has to have its size recomputed, and one of the flags of the view_rsmask is used to help determine if the view needs to be resized.

Next a bounds check is performed. The containing view, before recursing to its next child uses the metrics of the child and compares against the draw context to determine if any part of the child will be visible on screen. If the scroll offsets are set in such a way that the child is scrolled out of view, then the parent simply skips over this child and moves on to the next.

Recontext is a step that will take me a moment to unpack. Each view, when it draws itself, is drawing itself into a global 10-byte draw context, as described earlier. However, that global draw context changes as the system recursively moves through the view hierarchy. Later on, let's say you mouse down on a button, or you type a character when an input view is in focus, in such a case only that one singular view is updated and needs to redraw itself, (to highlight the button, or insert a character, etc.) But the global draw context is no longer relevant to the view that needs to redraw. We could redraw the entire screen but that would be much too slow. Instead, each view maintains a copy of the draw context, as it was when the view last drew itself.

Those contexts become out–of–date, however, if there is a scroll or resize. So another global flag indicates if a recontext needs to happen. The recontext routine takes the current draw context, for the parent or containing view, and modifies it (it always either stays the same or gets smaller, it cannot ever become bigger) according to the offset and size metrics of the child. The child then backs up that new context onto itself. And lastly it draws itself. When the recontext flag is not set, each time a view is to be drawn the global context is set from the copy of the context on the view, which is much less computationally expensive than the recontext logic. In the event that a single view has to be redrawn, the context is simply copied from that view to the global context and then its draw routine is called as normal. It has no idea it isn't being called as part of an entire screen refresh.

This is very cool stuff. I'm having lots of fun. This is easily my favorite type of code to work on.


The devil in the details, the problem of complex code

So the above was as quick an overview as I can give about how drawing works without getting into the nitty gritty technical details. But I feel it was necessary to give at least this level of detail on how it works to understand the level of complexity being worked with.

Last night, after midnight, as I was getting tired and ready to wrap up, I got the dreaded error message:

LABELNAME OVERFLOW!

The last time I went through this and I asked about it on IRC, everyone seemed to nod in agreement that the time is now right for me to switch to cross assembly. As opposed to the coding native on a C64 (actually a C128) that I've been doing up to this point. A C64's limited memory and computational capacity puts limitations not just on what can be run, but also what can be coded. There is a limit to how many labels can be used. Assembly programming labels stand in for constants, and for memory addresses that will only resolved at assemble–time.

When I first started learning 65021 (coming up on a year ago next month) I was so wet behind the ears that I just started coding everything into one big file. It didn't take me more than a couple of months before I encountered my first Labelname Overflow. I resisted then the temptation and near universal advice to switch to cross assembly, by coming up with an ability to break the project into more manageable modules. This solution and its subsequent refinements to make it more workable are well documented across several posts:

Now I've hit this problem again. Except instead of the entire C64 OS project being too big and unwieldy, just the Toolkit module itself has become too big and unwieldy! It's a similar situation but I think it needs a different solution. And I don't yet know what that's going to be.

Here are some stats. The main source file, toolkit.a, is 56 blocks. A block is 256 bytes, minus the two byte link pointer. So a rough measurement in kilobytes is to divide block count by 4. Toolkit is therefore about 14 kilobytes. But that's the source code, which is full of comments so I don't forget how all this stuff is supposed to work. Assembled, last I assembled it, was about 7 blocks. Or, less than 2K. Less than 2 kilobytes, on a machine with 64 kilobytes of ram. Toolkit's code is under 3% of total available memory and I'm already overflowing the labels! How does anyone write a game that fills or nearly fills the C64's memory?

I think the answer to that question is that they either cross assemble, or huge regions of memory are dedicated to sprites, graphics, music and level data, leaving a much smaller area of memory for the game's engine code in the first place. Or the parts of the game are divided up manually into areas of memory where the connections between the parts can be hardcoded.

As an operating system, C64 OS obviously should try to take up as little memory as it can, leaving as much free memory for the application and its data as possible. This should be on my side for not running into label overflows. But I've also got at least two things working against me.

Toolkit is without a doubt the largest and most complex module of the bunch. But furthermore, because it is object oriented, it is also the most label heavy. I mean, the definition of the view class is effectively just a long list of all the labels that represent offsets to its properties. This problem is exacerbated by another interesting limitation.

Macros are super handy for not having to type everything out long form. They're particularly useful when you're doing lots of 16-bit math and pointer manipulation. But when you call a macro, all of its arguments have to fit on one 40 column line. The macro name automatically gets indented 8 spaces. So, after a macro name that's 8 or so characters, plus spaces, commas and the # symbol, you end up with only 22 or so characters for the names of the arguments. If any one of the names of the arguments exceeds 7 characters, there is suddenly not enough room on the line for just 3 arguments.

I've yet to write a macro that needed 4 arguments, but 3 is a common pattern. Many of the label names for properties are like this: view_draw, view_kcmd, view_kprnt etc. Even short names like this are 9 characters! Now imagine a macro call like this (the preceding white space is part of the line length!):

        #setobj16 this,view_kprnt,view_kprnt_

That is a 45 character line. Aka, it's impossible to type out. And macros (in Turbo Macro Pro+reu) cannot have their arguments spill onto a second line. One way to get around this is to define some temporary labels: vkp = view_kprnt and vkp_ = view_kprnt_. If you do this on the line above the macro call it's close enough in the code that the call remains legible... but you've just blown two more labels on nothing but making a macro call possible.

Another problem that Toolkit faces is that it seems to be the one module that leans most heavily on the resources of almost every other module. It needs math, string, memory, screen, service and input. (Math for 16bit divides and multiplies plus 16-bit macros, String for character conversion and length measurements, Memory for allocating and freeing space for new objects, Screen for the draw context system, Service for environment variables for system colors, Input for reading event objects.) Toolkit is a monster that seems to need to depend on bits and pieces of at least 6 out of only 9 or 10 modules total. Reading in header files and constants files bumps up the label count some more.

It's a tough problem.

But I'm not giving up. And I'm not caving in and switching to cross assembly. The first time I hit this problem I worked out a great solution. And I'll find a solution to this problem too. I have a few ideas in mind that will ease the pressure:

  • Move some big and label–heavy routines (boundschk, resizenode, recontext) out of Toolkit and into a smaller but related module (like Screen).
  • Shorten object property labels: view_... to vw_... or even v_...
  • Reduce the number of property labels by joining some properties conceptually. width and height could be a 4-byte dimension property, accessed as v_dim+2 etc.
  • Split some long external includes into multiple include files so one can be included without including and incorporating the labels from the others.
  • Unmacro a few things that really don't need to be macro'd.
  • Use hardcoded offsets for short-distance branches, rather than highly localized labels like next, skip or loop.

If I do all of the above, I should be back in business for some time to come, and may even be able to get most of Toolkit completed within the remaining constraints. If the above is not enough and I encounter Labelname Overflow again, I could try to split the Toolkit classes into separate files. I would like to avoid that, however, because there is definitely overhead, both memory, execution and organizational.

Thanks for reading. Until next post.

UPDATE: September 19, 2017

Last night I got to work implementing my remediation plan above. But first I decided to do a manual count of how many unique labels are in use by Toolkit. My rough counting came very close to 256, so I'm going to guess that's the key number, for obvious reasons.

On the one hand it feels like quite a few labels, but considering I'm able to use 30 or so in just one single routine, it feels like a cripplingly small number. Fortunately, to my discovery, recontext, resizenode, and boundschk are all called, conditionally, but only one other routine, drawchildren. I am going to move all 4 routines, lock, stock and barrel, to the screen module. And I only have to expose one new jumptable entry. drawchildren. Draw Children was actually something I'd already factored out of View's main draw routine, so it can be called easily by other container views.

  1. I dabbled with 6502 ASM over the course of my long history as a Commodore user. But I never made anything of any substantial complexity. Whatever I knew I forgot and had to completely relearn. When I was programming apps for WiNGs, it was 99.9% in C. []
September 11, 2017Programming Theory

Toolkit Introduction

A few housekeeping issues to cover up front. I want to say, welcome back, because it has been an unusually long while since my last blog post. I absolutely do not intend for this to become a habit. A few things have conspired against me to make this post a while in the making.

It is the late summer, so I had a bit of a holiday, which I thoroughly enjoyed. It took me and my kids and in-laws to a cottage and away from the internet for a few days. But on the whole it didn't keep me away from my c64, as we'll get into in this post. This website suffered some down time during the past week. I believe it was down for approximately 4 days. I wish to extend my sincerest apologies for the outage. We were actually without internet service for the entire labor-day weekend. This is annoying for me, no doubt, but it also happens to be annoying for you too, if you're trying to get to one of several sites that I host locally.1 And as is inevitable when the kids go back to school they brought home novel illnesses that have kept me away from work for a couple of days.

On the bright side, I've been hard at work on C64 Luggable. The documentation of which is coming along quite well. And I've taken many many photos of more recent work on the project than I have yet had time to document. And I've been working away on new additions for the Commodore 8-Bit Buyer's Guide. I have well over 35 (!) new products, parts and components that I will be adding to the guide. Including the catalog items from Poly.Play and Retro Innovations, and a number of independents from AmiBay.org, Lemon64, and ebay.

And let's not forget all the work I've been doing on C64 OS too. It's been a really fun adventure so far. I am continuing to learn about 6502 coding along the way. The Toolkit, as we'll see in this post, involves a lot of 16-bit math, and much of it on values accessed through object pointers, as I began to discuss in that post. I'm sure I'll get into more of that as I continue to discuss the Toolkit.


What is a toolkit?

An operating system is made up of many parts that take care of all sorts of tasks for the application developer. Memory management, abstract file access, string manipulation, input drivers, networking, etc. The toolkit is the part of the OS that helps an application build its user interface. The concept of a toolkit is probably the component that is the most absent from the C64's (and PET, Vic-20 and C128's) built-in operating system. GEOS, on the other hand, offers services for producing menus, both horizontal and vertical, buttons, dialog boxes, actionable icons, single line text input, and well, that's about where its toolkit ends. But that's a big step up from nothing. The toolkit-like features of GEOS are what help to give every GEOS application a standard look and feel.

On Linux, which runs on PCs with much more memory and processing power, toolkits can be dynamically loaded and different applications based on different toolkits can all be running side–by–side. This is actually detrimental to the Linux desktop experience because not all apps feel as though they belong to the same environment. On Windows or macOS, which have a dominant vendor–supplied toolkit, virtually all apps on those platforms use the standard toolkit and consequently feel much more like they belong together.

Because the C64's OS (Kernal and Basic rom) do not provide services for producing a user interface, everyone produces there own. Amongst the chaos there are a few OSes, the applications for which usually (but not always) feel as though they belong to that OS. C64 OS will fit into this latter category.

What does the C64 OS toolkit do?

Unlike on Linux where multiple different toolkits can be and often are available to applications simultaneously (GTK, QT, See: Wikipedia Article on Cross-Platform Toolkits), the C64's small memory size means it is only practical for there to be one toolkit available at a time. And in most cases that toolkit is tightly integrated with the other features of the OS anyway. There is no way, for example, to swap out the UI drawing features of GEOS.

The Toolkit in C64 OS is a module, but its location in memory is fixed, and the objects it produces are designed to interact with other services of the OS. For example, it gets its events from the main event loop in the screen module. The structure of the events it expects are built by the updatemnk routine in the input module. It makes calls to allocate memory for itself to the memory module. All the while drawing itself with a drawing context system also provided by the screen module. The essential behavior of an application is to link its functionality to the actions of toolkit objects that interpret the flow of events being generated by the input devices. This is also what we mean by program flow being event–driven.

The Toolkit is object oriented. I began to discuss how one can go about writing code following an object oriented design pattern in 6502 in a previous post, Object Orientation in 6502. In that post I began to talk about the Toolkit in order to have examples. Object oriented code is, by definition, structured as a hierarchy of interrelated classes. Sub-classes descend from other classes and inherit and extend their functionality. In rich modern UI Toolkits, such as the Cocoa framework of macOS, or Cocoa Touch of iOS (its little brother), there are literally hundreds of classes. And each class has hundreds of methods (related functions which operate on the object's own properties). UIButton, in Cocoa Touch, for example, has 32 methods. But these do not include the 21 methods it inherits from UIControl, nor the ~164 (!!!) methods it inherits from UIView.2 And so on up the inheritence chain to UIResponder and UIObject. Even declaring this number of methods would overflow 64K of memory before we got around to implementing anything. And a toolkit is only one part of what it takes to make an operating system.

Needless to say, the C64 OS Toolkit is very trimmed down compared to what we might otherwise bemoan as the endless bloat of modern toolkits. But the principle is similar. Toolkit is a collection of classes, some of which descend from others. They work together allowing a programmer to create and assemble them to construct a flexible user interface that efficiently redraws itself, responds to user input, and calls back to application code when necessary to implement the specifics of the program. The Toolkit relieves the application developer from a huge burden of effort and results in more consistency across applications and enables rich functionality for free3 in the process.

As of the time of this writing, Toolkit consists of just 6 classes. These have already been briefly discsussed in the earlier post on Object Orientation in 6502. I have several other classes planned, checkbox will likely descend from Button, radio will likely descend from checkbox, and a multi-line text view is an absolute must–have, but I haven't yet figured out where in the hierarchy it will fit.

Object hierarchy of views

This lean (by modern standards) class hierarchy may look unimaginably small, but I think you'd be surprised how much UI complexity can be constructed out of such an essential core.

Sample UI screen shot, extruded with labeled parts

A sample of UI, extruded to show the nesting of views, and labeled.

All Toolkit classes descend from View. View provides several collections of properties and methods which make it a foundation of several types of functionality. We'll just dig right into those here.

Node Hierarchy

A view-based user interface is a hierarchical tree of nodes. Therefore, each class that participates in the UI needs to be a type of node that can connect to the others to allow application code and other toolkit built-in functionality to navigate the tree. The View class therefore provides node properties. And since all Toolkit classes descend from view, all Toolkit classes have these properties and are all therefore types of nodes. The node properties are as few as I believe it is possible to have and still have it work. Each view has a parent pointer, a first-child pointer, and a next-sibling pointer.

Don't confuse the node hierarchy with the inheritance hierarchy pictured above. In a real UI views will be nested within views, and buttons may be stacked one above the other or put inside scroll views. Labels will be put before inputs and so on. The inheritance hierarchy is hardcoded and never changes, but the node hierarchy of a UI could be totally different for every application.

Every node has exactly one containing parent node. The parent node pointer points to that containing node. There is always one root view, which usually fills the screen, but doesn't have to, and has no parent node. Its parent node pointer is null, ($0000), which allows code that navigates the tree to know when it has reached the root view.

Any node can theoretically contain multiple child nodes. However, only the View class has draw logic which is designed to deal with multiple children. Typically if you want a node to handle multiple children you would rely on the View's implementation because it's long and complex. Each node has a pointer only to a single child node. But if that child node is one of many children of the same parent, then the first child uses its next sibling pointer to link to the parent's second child. The second child can link to the third and so on. Each child points back to its parent even if it is not the first child. The last child of a parent is the last node in a chain of siblings, it has its next sibling pointer set null. This is how code can determine that it has reached the final sibling.

If a node's first child pointer is null, it has no children. If a node's first child's next sibling pointer is null, then the parent has only one child. And so on. These three pointers are enough to describe the entire tree and allows recursive code to navigate the entire tree. Navigating up the tree is very efficient because every node has a pointer directly to its parent all the way back to the root node with a null parent pointer. An advantage to one child pointer and sibling pointers is that we don't need to have an intermediate data structure such as an array to hold an ordered set of child pointers.

Sample UI node hierarchy

Above is visualized the node hierarchy of a very plausible C64 OS application UI. Note that the node hierarchy defines the structural relationship between the user interface objects, but alone it doesn't define where on the screen siblings will display relative to each other. That part comes in the next section about view metrics.

Here we see that there is a single root view, in the top row. Its parent pointer is null. It has three children, a Scroll view, and two more Views. But the root view only points directly to the Scroll view. The other two Views are linked horizontally as siblings to the Scroll view. The rightmost View in the second row is the last child of the root view, so its next sibling pointer is null. The Scroll view has one child, a multi-line text view. The middle View has four children, two Labels and two Inputs. Presumably these would be laid out on screen such that each Input has one Label. The final View has three children, three Buttons.

Layout Metrics

The node properties are merely structural. In addition to structure each view needs to know where it should position itself. The positioning of a node is always relative to its parent. In modern UI Toolkits, I'm most familiar with Cocoa and Cocoa Touch from macOS and iOS respectively, views support special layout constraint objects. This allows a node to align, position and size itself relative to not just its parent but to its siblings as well. While this does enable the production of incredibly flexible and responsive layouts, C64 OS's Toolkit won't implement anything like that. Firstly, it's far too complex for the C64's memory constraints, and secondly, the size and orientation of a C64's screen has been what it is since 1982. There would be little advantage to having a complex system of constraints meant to adapt a UI flexibly to a variety of different screen sizes and orientations.

Each view must be able to describe its size and position, relative to its parent. To handle this the View class provides several metrics properties, which all other Toolkit classes inherit from View. These properties are:

  • view_top
  • view_bot
  • view_abot
  • view_hght

  • view_left
  • view_rght
  • view_argt
  • view_wdth

  • view_rsmask

First let's look at view_rsmask. This is a bitfield for flags that affect resizing behavior. At the time of this writing, the low nybble is well defined, but exactly how the upper nybble is being used is still in flux while I write the code so I won't talk about those yet. Here are the values for the lower four bits.

  • %0000 0001 — rs_ankt
  • %0000 0010 — rs_ankb
  • %0000 0100 — rs_ankl
  • %0000 1000 — rs_ankr

These flags stand for: Anchor Top, Anchor Bottom, Anchor Left, and Anchor Right. These declare which sides of the view have a fixed offset from its parent. These define which of the other 8 metrics properties are pre–defined and which are computed dynamically. So let's look at some examples of how this might work.

A view must have at least one vertical anchor, and at least one horizontal anchor. If no vertical anchor is set it defaults to top and no horizontal anchor defaults to left. When the view is anchored, say, to the top, the view_top property defines the distance (in 8x8 cells, not pixels) that the view's top edge sits down from the top edge of its parent. These values are all unsigned 16-bit, so a view cannot be offset negatively from its parent. A view can either be flush with the top of its parent or any offset down, upto 65535 text rows, from its parent's top.

If the view is anchored top, but not bottom, then its view_hght property must be set and is used to figure out how tall the view is. In such a case, vertically resizing the view's parent has no affect on its own height. The situation is similar if the view is anchored bottom but not top. The view_bot property holds the number of rows that this view's bottom edge is positioned up from the bottom of its parent. The view_hght is still relevant to determine how tall the view is and it is still unaffected by vertical resizes to its parent.

If the description is hard to follow, here's a visualization that should help.

Visualization of Top and Bottom anchoring

The anchor flags can be OR'd together, of course. So rs_mask could be: rs_ankt | rs_ankb.

Things get more complicated when a view is anchored both top and bottom. You can see how this works in the third example above. view_top defines how far its top edge is from its parent's top, and view_bot defines how far its bottom edge is from its parent's bottom. But when the view is anchored on both sides then when the parent is resized vertically the height of the view changes, tracking the height changes of its parent.

Whatever way the view is anchored, some of its properties get computed automatically. Let's start with view_hght. If the view is rs_ankt | rs_ankb, the view_hght is computed and set automatically. When it comes to actually drawing the view, what the drawing code really needs to know is the absolute top and absolute bottom offsets, from the draw origin, for where the view will render after any anchoring and positioning logic has been applied. As it happens the draw origin is at the top,left of any given on–screen rectangle. The reason for this is because of the way the VIC-II's memory addresses map to positions on the screen. The top,left corner of the screen is the smallest memory address. And the bottom,right corner of the screen is the biggest memory address. This is true for any arbitrary rectangle you draw on the screen. The top,left corner of that rectangle will always have the smallest memory address of any address within that rectangle.

The consequence of this is that view_top, is both the relative offset from the top of the parent, but it's also the absolute top of the view from the draw origin. This is not true of view_bot. view_bot is relative to the bottom of the parent. So the smaller the view_bot value, the lower that edge goes on the screen. A view_bot value of 5 says nothing about where the bottom edge of the view is going be relative to the draw origin. That's what view_abot is about. view_abot stands for absolute bottom, and it is always a computed property. If the view is rs_ankt, then view_abot is view_top + view_hght. If the view is rs_ankb, then the height of the parent has to be taken into account. view_abot is parent->view_hght - view_bot. And view_top is then computed as view_abot - view_hght.

Visualization of Absolute Bottom computed property

At the end of the day, no matter how the view is anchored, the resizenode routine (called internally, never manually), makes sure that all 4 properties are set correctly: view_top, view_bot, view_abot, and view_hght. After this point, the drawing code can completely disregard any positioning complexities due to anchoring and offsets. It simply draws the view between view_top and view_abot, and view_hght is readily available as a reference. Drawing is way beyond the scope of this post, however, and I'll have to return to it at a later date.

I intentionally limited myself in the above description only to the vertical sizing and anchoring properties. The horizontal sizing and anchoring work in exactly the same way. view_left is analogous to view_top. It is relative to the left side of the parent, but also is the absolute offset from the left coordinate of the draw origin. Therefore a view_argt (absolute right) is computed depending on how the left and right anchoring are configured. The logic here is exactly the same, just along a different axis. And of course, rs_ankl and rs_ankr are represented by different bits in the view_rsmask than rs_ankt and rs_ankb, so all 4 can be set independently and simultaneously.

One more brief note before moving on to a different topic. The above description does not even attempt to broach the issue of scrolling, and what effect scrolling has on the calculation of where a view's children will render, or how they get clipped if they are only partially visible. This is of course all taken into consideration in the design of Toolkit, but is way beyond the scope of this introduction.

As we'll no doubt see in concrete examples in future posts, I believe this anchoring and offsets system (which is based loosely on springs and struts which predate autolayout constraints in cocoa/cocoa touch), can be used to make user interfaces that are very flexible. Probably more flexible than anything else available on the C64, and yet simple enough to be eminently suitable for use in C64 OS.

Event Propagation

One of the main talking points I use when describing C64 OS is that it is event driven. Part of what this means has already been discussed in an earlier post about The Event Model. The IRQ service routine, which in C64 OS is implemented in the service module, updates the mouse and keyboard which converts user input activity into 3 queues of input events. Mouse events, Key Command events and Printable Key events. The mouse events are particularly relevant to Toolkit because the mouse cursor is passing over Toolkit views rendered on–screen, and the user is clicking on those views feeling him or herself to be interacting with them.

However, mouse events contain only: screen coordinates, key modifier flags and an event type. Somehow a simple event struct like this has to convert into meaningful interaction with the underlying on–screen display.

The Toolkit needs to be able to determine the target view, which is the first view that has an opportunity to do something with the event. And then each view needs a way to figure out how to pass notification about the existence of the event if it can't or doesn't handle it itself. This behavior is called event propagation.

Every class in the Toolkit needs to know how to, at a minimum, propagate an event to another view. And so for this reason the View class has a set of methods for event propagation. And every other view subclass automatically inherits the basic ability to propagate events.

To handle the first requirement, Toolkit has a hit test routine. This routine effectively walks the node hierarchy, recursively, searching for the uppermost view that is rendered under the screen coordinates where the mouse event occurred. This is actually easier than it sounds. All of a view's child views are constrained (clipped even, when drawn) to within the bounds of their parent. Therefore, if the mouse event does not fall within the bounds of a given view none of its subviews, or their subviews, etc. need to be checked. Siblings must be checked, however, and it is possible for two sibling views to overlap each other.

This introduces a small complication, because first children are drawn first, and their siblings are drawn later. This means if a late drawn sibling has view metrics that cause it to overlap with with an earlier sibling, it will render above that earlier sibling. What this means is that when doing hit testing, the walking routine must start with the last sibling and if the event did not occur within its bounds the hit test should then move to the previous sibling. Otherwise, sibling 1 could claim the hit, even though it is covered by a section of sibling 2 that the user believes he or she is clicking on. See the visualization below to understand how this works.

Visualization of hit testing priority on overlapping siblings

In the example above, first and second child are two children of the same parent. Their metrics are such that their bounds overlap. The red point represents where the user clicked. That coordinate is technically within the bounds of both children. However, because the first child renders before the second child, the second child draws itself overtop of the first child, where they overlap. Thus, when performing the hit test, it is necessary to test the second child prior to the first child.

Actually propagating the event is the second requirement, once a final leaf node (a node which has no children of its own) is found to be the target that node needs to get the event. In C64 OS, the event data itself is not passed, i.e. it is not copied in memory, to the target node. The node is merely notified that an event is targeting it. Toolkit does this by calling the appropriate event routine. The View class has 3 event handling routines, one for each of the 3 event types, mouse, key command and printable key. In our example of the mouse event, then, the Toolkit calls the target node's view_mouse routine.

This merely informs the view that a mouse event is available. If the view handles mouse events in some way, then it can call readmouse in the input module. This will give it the current mouse event details. I'll discuss this further in the responder section below.

By default, the View class is mostly just a generic superclass and container for multiple children. The default implementation of view_mouse is not to read or analyze the event details at all, but just to propagate the event to the next view. It does this simply by calling view_mouse on its own parent in the node hierarchy. This continues up the node hierarchy until a node either handles the event and returns without propagating it further, or the root node is encountered. The root node has no parent node to propagate the event to, so it simply returns.

Key events, both Key Commands and Printable Key events, are handled similarly to each other. They inherently do not contain screen coordinates, so they do not target arbitrary views in the node hierarchy. Instead, they target special views that claim keyboard focus. Only one view may claim focus at a time. Toolkit maintains a pointer to that view, if no view wants to be in focus, or if a view in focus wants to lose its focus (also lovely said to blur) then the root view is assigned as the focus view.

A lower level than the Toolkit, the screen module, which implements the main event loop, also divides the screen renderer into 4 compositing layers. The top most (4th) layer is fixed as the layer onto which the menubar (with CPU busy indicator, top level menus, and clock) and any pull-down menus are rendered. This causes them to always render above everything else. The lower three screen layers can be pushed and pulled by the application from a 3-layer stack. Typically the application's initialization routine pushes layer 1 onto the screen stack. Each screen layer struct has pointers to 4 routines:

  • sldraw
  • slmouse
  • slkcmd
  • slkprnt

During initialization, the application, if it makes 100% use of the Toolkit, can wire these four routines directly to the four equivalent routines of Toolkit. So that when the main event loop interacts with the low level screen compositor the Toolkit routines are called directly without needing the application to intermediate. The separation of screen compositor and Toolkit is to allow a more advanced application to completely forgo use of the Toolkit and handle the low level events manually (most useful for a game or other special purpose), or to share responsibility with the Toolkit. The application can get the low level events first, do with them as it pleases and choose when and if to forward them to the Toolkit.

That was a bit of a tangent. Regardless of how the Key Command and Printable Key events are handed over to the Toolkit, it directs them to the view in focus. And does this by calling that view's view_kcmd or view_kprnt routine respectively.

These routines, in their default implementation by the View class, do exactly what view_mouse does. They merely propagate the events, passing notification, by calling the same routine on their parent node, until something handles the event or the root node is encountered.

Responding

The last topic I'll cover in this Toolkit Introduction, is responding. As we saw in event propagation above, although the events are propagated through the node hierarchy, the default behavior, implemented by the View class, doesn't do anything. Except pass the notification to the next view which does equally little. The question is, how does anything get accomplished by means of these events?

All Toolkit classes descend from View, and so they inherit this do–nothing–but–propagate behavior for free. Some subclasses however are meant to respond to events. In some cases this response behavior is entirely internal to the class, requiring no special involvement from the application code. An example of this is an input field. An input view is meant to allow the user to input and edit the text that the field manages. If a mouse event is propagated to an input the input reads the mouse event details from the screen module. If its type is a click event, it tells the Toolkit that it should be the new focus view for all incoming key events. And it configures the system's cursor settings so the cursor begins blinking in the right place, and it returns without propagating the click event any further.

Input events also have a disabled flag, however, which can be configured by the application's code at runtime. If the input receives notice that a mouse event targets it, its first checks its own disabled flag. If it is disabled, it calls its superclass's implementation of this routine. This is very likely to be View's implementation, which as we already know, propagates the event to the input's parent view. Neither of these responses to the event necessitate calling back to the application.

Other subclasses of view, such as button, have both internal and external responses to mouse events. When the left mouse button goes down, it generates a leftdown event. This event is propagated, but when a button gets this event, it marks itself to be redrawn, and when it redraws it redraws with an inverted color. This is an entirely internal behavior, it requires no interaction with the application to appear to be highlighting in response to the user mousing down on it. Similarly when the button goes up, a leftup event is generated. (How mouse up events are routed is a bit complicated, and out of scope for this introduction). The button responds to a leftup by redrawing itself without the inverted color, thus appearing to no longer be highlighted.

Buttons are different than Inputs in at least one important way though, when the input was clicked it prepared itself to handle key events, which is an internal change. However, a button has no inherent, internal change as a result of being clicked. The application needs to be informed that the button was clicked. For this reason, the button subclass adds an additional but_clicked property. This is a pointer to a routine. When the application is being initialized, or at some later time in response to some other activity, the application sets the button's but_clicked pointer to a routine the application has implemented.

The button's own reimplementation of view_mouse receives all mouse events as part of the node hierarchy event propagation system. But button's implementation of this routine calls readmouse to get the event details. If it's leftdown or leftup it changes its highlight state as described earlier, but if it's leftclick it checks to see if its but_clicked property is set. If it is, it calls it, passing a pointer to itself in .X and .Y. And then returns without propagating the event.

The application's own routine is now running. It can, in the simple case, do some fixed behavior, something that should always be done as a result of this button having been pressed. Under more complex conditions, more than one button may be set up to call the same clicked routine. The routine needs to distinguish which button was clicked that triggered the call. It can do this by comparing the pointer to the button in .X and .Y with some reference the app maintains to the that button, or it can write the pointer to a ZP address and lookup and access properties on the button itself. This could be used to change the button's text, or change its hidden state, or change its layout metrics, or read its TK_ID to figure out by ID which button this is to figure out what to do next.


Okay, so that concludes this rather long blog post that I've been putting off writing for so long as the Toolkit module has been under such active design and development. In fact, just in writing this post it's given me some ideas about how to improve the inheritance model.

Inheritance was the topic I left off at the end of Object Orientation in 6502 with just a little preview. The truth is, I hadn't quite figured out in my head the full logic of how it would be possible, in your own application, to create custom subclasses of Toolkit classes. But I think I'm almost there.

So, expect that post to be coming sometime in the not too distant future, a part two to Object Orientation in 6502. And after that it will make sense to return to look in more detail at how some of the Toolkit views make use of their object orientation, especially when it comes to drawing themselves.

Lastly, I suspect large swaths of this post will eventually make it into the C64 OS technical documentation. Especially the images I put together to illustrate some of the ideas.

  1. Although it's neither the fastest nor the most reliable way to host a website, I host many websites on my own hardware and internet infrastructure, because it's something of an enjoyable hobby for me. I offload heavy assets to an AWS S3 bucket to distribute load and cut down on bandwidth costs. This doesn't help when we lose internet altogether as was the case this past week. []
  2. It's a bit tricky to tell the difference between a method and a direct access to a property in modern languages like Swift. The notation to read a property is exactly the same notation as to call a method. And the underlying implementation can be changed from a direct read to a method call at any time without changing the code that accesses it. 164 comes from my rough count in the documentation, and includes property accessors. []
  3. Every feature built into the toolkit classes are features that don't have to be explicitly implemented by the application. Yet they automatically avail themselves simply because the UI is built from these standard classes. Think about momentarily inverting the color of a button to provide feedback when it is clicked. That comes free in every C64 OS application. []
August 15, 2017Programming Theory

Organizing Module Layout

At the beginning of the year I wrote a post, Organizaing a Big Project. At that time, I had just split apart what had been a monolithic code file into a series of more managable modules. At the time of this writing they're documented in the C64 OS technical documentation as:

  1. memory
  2. petscii
  3. input
  4. string
  5. screen
  6. menu
  7. service
  8. file
  9. toolkit, and
  10. network

However, this list is in such flux that even though that documentation was updated just 2 weeks ago, this list of modules has already changed. I've merged petscii, (which only had 3 exports: asc2pet, pet2asc and pet2scr) into the string module which is likely to gain far more functionality in the coming months. Plus, I added a math module, 16-bit division and multiplication routines that started their life in toolkit due to their usefulness there, have been moved to their own module so they can be reused by other modules and by C64 OS applications. And lastly, the system jump table has been extricated from memory and made into its own module, thus making the memory module more of a peer to the others.

Suffice to say, the modules are in a state of flux. As I add new routines, optimize old routines and move routines from one module to another, the assembled object sizes of the modules keeps changing. However, they have to be packed together in memory, such that the end of one module is followed by the start of another module. If they overlap the module lower in memory will overwrite the first part of the module higher in memory that's already loaded in. Thus corrupting it and leading to instant and hair-pulling crashes. If there is space left between the modules, well, then that's just wasted memory that can never be allocated, found or utilized.

In addition to the problem of packing the modules together in memory, the main system jump table also has to know where each module starts, so that it knows where to find the module's table of vectors through which to jump. If that weren't enough, there is also the issue that the jump table itself is in a state of flux. With the addition of the math module, I just added mul16 and div16 to the middle to the jump table. Thus, the modules that consume the exports from other modules have to know where in the jump table to find the routine.

Each of these issues has a solution that I worked out at the beginning of the year. But how these solutions work has proven to be quite laborious for me to upkeep.


In order to pack the modules, I have to know how big each is. So I start with the first, the one highest in memory, and assemble it to somewhere arbitrary. The assembler tells me the start and end address of the object code. (And hopefully I don't get any phase errors, which seem to be fairly easy to produce if you use a label prior to defining it.) I write down that size. Then I go to the next module, load it up, assemble it and record its size. I do this for all 10 (or so) modules.

Then I pull out my calculator, a hex/oct/bin/dec converting Casio my brother helpfully gave me for christmas many moons ago. And I start with the last address plus one, so $CFFF+1, or $D000. And subtract the size of the first module to find where it should start, and I write that down. From that address I subtract the size of the next module, and write down its start address. Repeat this for all the modules. Eventually I end up with a table like this:

Module size and start address table

You can see in the image of my notebook that as I work on the code, I have to recompute the sizes and start addresses when it comes time to test my work. Sometimes, such as in the top middle column (second set of size/start columns), I've only worked on a few modules, so I can leave the others alone and just recompute the offsets for the few I worked on. Still, this process is painstaking and boring, and prone to mistakes leading to nasty bugs.

After I've calculated and written down this table of start addresses, I have to go back to each module, open its main file and manually set the initial start address. (*= $ce34) for memory, for example. Then I have to reassemble this module to an object file.

The system Jump Table is another issue. The Jump Table needs to know where each of these modules starts, so I open that module and update a set of labels with these new start addresses. Each jmp is an indirect jump through a vector found at the start of the module + the offset to the routine. If the module exports 5 routines, then it starts with a 10-byte table of five 2-byte vectors. And the Jump Table correspondingly has 5 entries.

  JMP (input+0)
  JMP (input+2)
  JMP (input+4)
  JMP (input+6)
  JMP (input+8)

That sort of thing. So every time a module grows in size, not only does it have to be reassembled, but every module lower in memory shifts down and has to be reassembled too. And then the jump table has to be reassembled also.

This is a big pain. But, let's not forget about how a module that wants to call an exported routine from another module finds the entry in the jump table. Each module has a .h header file that defines labels for each routine, and sets them to the position within the jump table for the module's block of jumps, plus the 3-byte offset. Such as:

  readmouse = inputbase+0
  mouserc   = inputbase+3
  deqmouse  = inputbase+6
  readkcmd  = inputbase+9
  deqkcmd   = inputbase+12
  readkprnt = inputbase+15
  deqkprnt  = inputbase+18

  etc…

Thus we have another file that needs to be updated. If a routine is added to a module that needs an entry in the system Jump Table, it will offset the jump table base address for preceding modules. They then need to have their header files updated… and any module that includes that .h file and calls one of those routines needs to be reassembled so they have the right jump table offsets.

Needless to say, it's easy to forget any of these many places that need to be updated. And it's hard to remember exactly which modules depend on which other modules. I end up having load the source code for each module and look through their includes to see if they include an affected header file.


I need a better way to work this, because the burden is so heavy that its discouraging me from wanting to work on the project. And it gets worse the more code I write and the more modules I add.

Finally, it occurred to me that all those calculations of offsets that I'm doing could actually be done by the assembler, if only it had some basic information:

  • How big is each module, and
  • How many routines does the module export

Instead of writing these numbers down on paper, I could put them all together into a header file to be included by everything that needs one of these numbers. Here's what it ended up looking like:

The modules.h header file, top The modules.h header file, bottom

The top half of the file declares all the variable data I need to provide, in one place. One label for each module (prefixed with x for exports), for how many routines are exported by the module. One label for each module (prefixed with y, because it's close to x), for the size of the assembled module. Unfortunately I still have to manually determine the size of a module, however this only needs to be calculated for the modules I'm working on, which need to be reassembled anyway.

The bottom half of the file is where all the magic happens. First, the highest thing in memory is the Jump Table. And as of just a few nights ago, this is no longer part of the memory module but is its own stand alone file. One label is defined for each module (prefixed with j for jump table). The start of the jump table entries for memory are $d000 (hardcoded as the last address + 1), minus the number of exports for this module times 3. Three because each jmp() takes three bytes.

This gives us a label jmemory that indicates where the jump table entries for memory begin. But this also becomes the relative start address for where the next module's jump table entries begin. Thus jinput starts at jmemory minus xinput*3. And so on, all the way through all the modules. These offset labels are computing automatically simply by adjusting the number of exports each module offers at the top of the file.

The memory module is now just a standard module, structurally the same as any of the others, with its own table of vectors to its exported routines. It's start address is going to be offset from the final (lowest in memory) jump table block. There is one label per module (prefixed with s for start address). The Jump Table itself is first module, but its start address is the same as whatever was the address the final jump table block. In this case it's jtoolkit simply because toolkit is the last module. Thus there is a label, sjumptbl that is set equal to jtoolkit. smemory, the start address for the memory module therefore is sjumptbl minus the size ymemory, the size of the memory module. And the next module starts from smemory minus its own size, and so on through the modules. Thus, modules.h is a central place where all the variable data goes, the size of each module and the count of its exports, and it produces a set of labels that are j- prefixed and s- prefixed for the start addresses of each module's jump table block and code, respectively.


How can this be used, now?

Each module includes modules.h. Then sets is own assemble address to its s-prefixed label. Toolkit for example includes modules.h and positions itself by declaring *= stoolkit.

Each module's .h header file declares the labels for its jump table entries as offsets from the start of its j- prefixed label. These declarations are always right and so the header files themselves don't need to be touched.

  readmouse = jinput+0
  mouserc   = jinput+3
  deqmouse  = jinput+6
  readkcmd  = jinput+9
  deqkcmd   = jinput+12
  readkprnt = jinput+15
  deqkprnt  = jinput+18

  etc…

And lastly, you have the jump table itself. The actual jmp() instructions still need to be written, because Turbo Macro Pro (the native version) doesn't support the code generating pseudo labels that TMPx provides. But where those JMPs jump to is computed dynamically by the s- prefixed labels simply by including modules.h. Thus:

  JMP (sinput+0)
  JMP (sinput+2)
  JMP (sinput+4)
  JMP (sinput+6)
  JMP (sinput+8)
Module size and export count table

I still need to keep track of some things on paper. My table now looks like what is shown in the image above. But I don't have to make any of the intermediate calculations. And I don't have to enter any of those intermediate values manually across a myriad of files. I just update the modules.h file with the new modules' sizes and export counts. Then I update the jump table itself with the additional entries. And reassemble to object files everything that would be affected by the changes.

It's still quite a bit of work if I make changes to a module up high in memory that pushes the others below it to new start addresses. But, like my recursive backup script it saves a huge amount of effort and takes a big burden off my shoulders, so I can spend more time coding and less time worrying about how to fit things into memory.


There is one last interesting benefit that has popped up. If I decide I'm going to start working on the string module a lot, because I'm building out the set of routines for string editing and manipulation, ordinarily I'd be in a bit of a tough spot. string is 3rd highest in memory, with six or more modules below it. I'd hate to have to reassemble 7 modules(!) every time I make string a bit bigger.

Well, actually I don't have to. It's easy peasy in the modules.h file simply to move the string module to the bottom of the pile. Its jump table entries can stay where they are, although it would be handy to build out the entries in the jump table, even with placeholders for routines I know will be there soon. There is nothing inherently special about having string be the 3rd module instead of the 9th. It was just bothersome having to redo all the math.

Now that modules.h does all that math, it becomes trivial to rearrange the modules. If I know I'm going to work on a specific couple for a few days, I can just move them to the bottom of the pile at the start of my session, and all of a sudden the cycle time for testing them becomes very short again.


Just a quick update on toolkit. It has been the primary focus of the last month or so of work. I've put a lot of thought and a lot of code into it. However, it's still so in flux in my head that I haven't wanted to commit to writing anything about it yet. But soon. I've just recently worked out what I think will be a big score in making the drawing of views and clipping them to the insides of their bounds rect much more efficient and easier to program. As soon as I get a slightly better handle on how well that's going to work out, I plan to write my first blog post discussing the general architecture of the toolkit. Stay tuned.

August 4, 2017Programming Reference

6502 / 6510 Instruction Set

Every good Commodore 64 programmer needs to have the 6502/6510 instruction set at his or her fingertips. There are already many reference texts like this out there, however I find all of them to be lacking.

It is my goal for the presentation below to be the fastest, easiest–to–use and best organized 6502/6510 instruction set reference text on the internet. If you have ideas for how I can improve it, please let me know in the comments.

Full credit to 6502.org for the original source of this content, which I have formatted, rearranged, styled and fixed the HTML.

Alphabetically Ordered

A:

ADC AND ASL

B:

BCC BCS BEQ BIT BMI

  

BNE BPL BRK BVC BVS

C:

CLC CLD CLI CLV

  

CMP CPX CPY

D:

DEC DEX DEY

E:

EOR

I:

INC INX INY

J:

JMP JSR

L:

LDA LDX LDY LSR

N:

NOP

O:

ORA

P:

PHA PHP PLA PLP

R:

ROL ROR RTI RTS

S:

SBC SEC SED SEI

  

STA STX STY

T:

TAX TAY

  

TSX TXA TXS TYA

 

Execution Times

Op code execution times are measured in machine cycles; one machine cycle equals one clock cycle. Many instructions require one extra cycle for execution if a page boundary is crossed; these are indicated by a + following the time values shown.

Notes Links


Bitwise Instructions

AND (bitwise AND with accumulator)

Affects Flags: S Z

MODE          SYNTAX        HEX LEN TIM
Immediate     AND #$44      $29  2   2
Zero Page     AND $44       $25  2   3
Zero Page,X   AND $44,X     $35  2   4
Absolute      AND $4400     $2D  3   4
Absolute,X    AND $4400,X   $3D  3   4+
Absolute,Y    AND $4400,Y   $39  3   4+
Indirect,X    AND ($44,X)   $21  2   6
Indirect,Y    AND ($44),Y   $31  2   5+

+ add 1 cycle if page boundary crossed

EOR (bitwise Exclusive OR with accumulator)

Affects Flags: S Z

MODE          SYNTAX        HEX LEN TIM
Immediate     EOR #$44      $49  2   2
Zero Page     EOR $44       $45  2   3
Zero Page,X   EOR $44,X     $55  2   4
Absolute      EOR $4400     $4D  3   4
Absolute,X    EOR $4400,X   $5D  3   4+
Absolute,Y    EOR $4400,Y   $59  3   4+
Indirect,X    EOR ($44,X)   $41  2   6
Indirect,Y    EOR ($44),Y   $51  2   5+

+ add 1 cycle if page boundary crossed

ORA (bitwise OR with Accumulator)

Affects Flags: S Z

MODE          SYNTAX        HEX LEN TIM
Immediate     ORA #$44      $09  2   2
Zero Page     ORA $44       $05  2   3
Zero Page,X   ORA $44,X     $15  2   4
Absolute      ORA $4400     $0D  3   4
Absolute,X    ORA $4400,X   $1D  3   4+
Absolute,Y    ORA $4400,Y   $19  3   4+
Indirect,X    ORA ($44,X)   $01  2   6
Indirect,Y    ORA ($44),Y   $11  2   5+

+ add 1 cycle if page boundary crossed

ASL (Arithmetic Shift Left)

Affects Flags: S Z C

MODE          SYNTAX        HEX LEN TIM
Accumulator   ASL A         $0A  1   2
Zero Page     ASL $44       $06  2   5
Zero Page,X   ASL $44,X     $16  2   6
Absolute      ASL $4400     $0E  3   6
Absolute,X    ASL $4400,X   $1E  3   7

ASL shifts all bits left one position. 0 is shifted into bit 0 and the original bit 7 is shifted into the Carry.

LSR (Logical Shift Right)

Affects Flags: S Z C

MODE          SYNTAX        HEX LEN TIM
Accumulator   LSR A         $4A  1   2
Zero Page     LSR $44       $46  2   5
Zero Page,X   LSR $44,X     $56  2   6
Absolute      LSR $4400     $4E  3   6
Absolute,X    LSR $4400,X   $5E  3   7

LSR shifts all bits right one position. 0 is shifted into bit 7 and the original bit 0 is shifted into the Carry.

ROL (ROtate Left)

Affects Flags: S Z C

MODE          SYNTAX        HEX LEN TIM
Accumulator   ROL A         $2A  1   2
Zero Page     ROL $44       $26  2   5
Zero Page,X   ROL $44,X     $36  2   6
Absolute      ROL $4400     $2E  3   6
Absolute,X    ROL $4400,X   $3E  3   7

ROL shifts all bits left one position. The Carry is shifted into bit 0 and the original bit 7 is shifted into the Carry.

ROR (ROtate Right)

Affects Flags: S Z C

MODE          SYNTAX        HEX LEN TIM
Accumulator   ROR A         $6A  1   2
Zero Page     ROR $44       $66  2   5
Zero Page,X   ROR $44,X     $76  2   6
Absolute      ROR $4400     $6E  3   6
Absolute,X    ROR $4400,X   $7E  3   7

ROR shifts all bits right one position. The Carry is shifted into bit 7 and the original bit 0 is shifted into the Carry.


Program Counter

When the 6502 is ready for the next instruction it increments the program counter before fetching the instruction. Once it has the op code, it increments the program counter by the length of the operand, if any. This must be accounted for when calculating branches or when pushing bytes to create a false return address (i.e. jump table addresses are made up of addresses-1 when it is intended to use an RTS rather than a JMP).

The program counter is loaded least signifigant byte first. Therefore the most signifigant byte must be pushed first when creating a false return address.

When calculating branches a forward branch of 6 skips the following 6 bytes so, effectively the program counter points to the address that is 8 bytes beyond the address of the branch opcode; and a backward branch of $FA (256-6) goes to an address 4 bytes before the branch instruction.

Branch Instructions

Affect Flags: none

All branches are relative mode and have a length of two bytes. Syntax is "Bxx Displacement" or (better) "Bxx Label". See the notes on the Program Counter for more on displacements.

Branches are dependant on the status of the flag bits when the op code is encountered. A branch not taken requires two machine cycles. Add one if the branch is taken and add one more if the branch crosses a page boundary.

MNEMONIC                       HEX
BPL (Branch on PLus)           $10
BMI (Branch on MInus)          $30

BVC (Branch on oVerflow Clear) $50
BVS (Branch on oVerflow Set)   $70

BCC (Branch on Carry Clear)    $90
BCS (Branch on Carry Set)      $B0

BNE (Branch on Not Equal)      $D0
BEQ (Branch on EQual)          $F0

There is no BRA (BRanch Always) instruction but it can be easily emulated by branching on the basis of a known condition. One of the best flags to use for this purpose is the oVerflow which is unchanged by all but addition and subtraction operations.

A page boundary crossing occurs when the branch destination is on a different page than the instruction AFTER the branch instruction. For example:

SEC
BCS LABEL
NOP

A page boundary crossing occurs (i.e. the BCS takes 4 cycles) when (the address of) LABEL and the NOP are on different pages. This means that

CLV
BVC LABEL
LABEL NOP

the BVC instruction will take 3 cycles no matter what address it is located at.


Compare Instructions

CMP (CoMPare accumulator)

Affects Flags: S Z C

MODE          SYNTAX        HEX LEN TIM
Immediate     CMP #$44      $C9  2   2
Zero Page     CMP $44       $C5  2   3
Zero Page,X   CMP $44,X     $D5  2   4
Absolute      CMP $4400     $CD  3   4
Absolute,X    CMP $4400,X   $DD  3   4+
Absolute,Y    CMP $4400,Y   $D9  3   4+
Indirect,X    CMP ($44,X)   $C1  2   6
Indirect,Y    CMP ($44),Y   $D1  2   5+

+ add 1 cycle if page boundary crossed

Compare sets flags as if a subtraction had been carried out. If the value in the accumulator is equal or greater than the compared value, the Carry will be set. The equal (Z) and sign (S) flags will be set based on equality or lack thereof and the sign (i.e. A>=$80) of the accumulator.

CPX (ComPare X register)

Affects Flags: S Z C

MODE          SYNTAX        HEX LEN TIM
Immediate     CPX #$44      $E0  2   2
Zero Page     CPX $44       $E4  2   3
Absolute      CPX $4400     $EC  3   4

Operation and flag results are identical to equivalent mode accumulator CMP ops.

CPY (ComPare Y register)

Affects Flags: S Z C

MODE          SYNTAX        HEX LEN TIM
Immediate     CPY #$44      $C0  2   2
Zero Page     CPY $44       $C4  2   3
Absolute      CPY $4400     $CC  3   4

Operation and flag results are identical to equivalent mode accumulator CMP ops.

BIT (test BITs)

Affects Flags: N V Z

MODE          SYNTAX        HEX LEN TIM
Zero Page     BIT $44       $24  2   3
Absolute      BIT $4400     $2C  3   4

BIT sets the Z flag as though the value in the address tested were ANDed with the accumulator. The S and V flags are set to match bits 7 and 6 respectively in the value stored at the tested address.

BIT is often used to skip one or two following bytes as in:

CLOSE1 LDX #$10   If entered here, we
.BYTE $2C         effectively perform
CLOSE2 LDX #$20   a BIT test on $20A2,
.BYTE $2C         another one on $30A2,
CLOSE3 LDX #$30   and end up with the X
CLOSEX LDA #12    register still at $10
STA ICCOM,X       upon arrival here.

Beware: a BIT instruction used in this way as a NOP does have effects: the flags may be modified, and the read of the absolute address, if it happens to access an I/O device, may cause an unwanted action.


Processor Flags

The Interrupt flag is used to prevent (SEI) or enable (CLI) maskable interrupts (aka IRQ's). It does not signal the presence or absence of an interrupt condition. The 6502 will set this flag automatically in response to an interrupt and restore it to its prior status on completion of the interrupt service routine. If you want your interrupt service routine to permit other maskable interrupts, you must clear the I flag in your code.

The Decimal flag controls how the 6502 adds and subtracts. If set, arithmetic is carried out in packed binary coded decimal. This flag is unchanged by interrupts and is unknown on power-up. The implication is that a CLD should be included in boot or interrupt coding.

The Overflow flag is generally misunderstood and therefore under-utilised. After an ADC or SBC instruction, the overflow flag will be set if the twos complement result is less than -128 or greater than +127, and it will cleared otherwise. In twos complement, $80 through $FF represents -128 through -1, and $00 through $7F represents 0 through +127. Thus, after:

CLC
LDA #$7F ;   +127
ADC #$01 ; +   +1

the overflow flag is 1 (+127 + +1 = +128), and after:

CLC
LDA #$81 ;   -127
ADC #$FF ; +   -1

the overflow flag is 0 (-127 + -1 = -128). The overflow flag is not affected by increments, decrements, shifts and logical operations i.e. only ADC, BIT, CLV, PLP, RTI and SBC affect it. There is no op code to set the overflow but a BIT test on an RTS instruction will do the trick.

Flag Instructions (Processor Status)

Affect Flags: as noted

These instructions are implied mode, have a length of one byte and require two machine cycles.

MNEMONIC                    HEX LEN TIM
CLC (CLear Carry)           $18  1   2
CLI (CLear Interrupt)       $58  1   2
CLV (CLear oVerflow)        $B8  1   2
CLD (CLear Decimal)         $D8  1   2

SEC (SEt Carry)             $38  1   2
SEI (SEt Interrupt)         $78  1   2
SED (SEt Decimal)           $F8  1   2

Jump Instructions

JMP (JuMP)

Affects Flags: none

MODE          SYNTAX        HEX LEN TIM
Absolute      JMP $5597     $4C  3   3
Indirect      JMP ($5597)   $6C  3   5

JMP transfers program execution to the following address (absolute) or to the location contained in the following address (indirect). Note that there is no carry associated with the indirect jump so:

AN INDIRECT JUMP MUST NEVER USE A VECTOR BEGINNING ON THE LAST BYTE OF A PAGE

For example if address $3000 contains $40, $30FF contains $80, and $3100 contains $50, the result of JMP ($30FF) will be a transfer of control to $4080 rather than $5080 as you intended i.e. the 6502 took the low byte of the address from $30FF and the high byte from $3000.

JSR (Jump to SubRoutine)

Affects Flags: none

MODE          SYNTAX        HEX LEN TIM
Absolute      JSR $5597     $20  3   6

JSR pushes the address-1 of the next operation on to the stack before transferring program control to the following address. Subroutines are normally terminated by an RTS op code.

RTS (ReTurn from Subroutine)

Affects Flags: none

MODE          SYNTAX        HEX LEN TIM
Implied       RTS           $60  1   6

RTS pulls the top two bytes off the stack (low byte first) and transfers program control to that address+1. It is used, as expected, to exit a subroutine invoked via JSR which pushed the address-1.

RTS is frequently used to implement a jump table where addresses-1 are pushed onto the stack and accessed via RTS eg. to access the second of four routines:

LDX #1
JSR EXEC
JMP SOMEWHERE

LOBYTE
.BYTE <ROUTINE0-1,<ROUTINE1-1
.BYTE <ROUTINE2-1,<ROUTINE3-1

HIBYTE
.BYTE >ROUTINE0-1,>ROUTINE1-1
.BYTE >ROUTINE2-1,>ROUTINE3-1

EXEC
LDA HIBYTE,X
PHA
LDA LOBYTE,X
PHA
RTS

RTI (ReTurn from Interrupt)

Affects Flags: all

MODE          SYNTAX        HEX LEN TIM
Implied       RTI           $40  1   6

RTI retrieves the Processor Status Word (flags) and the Program Counter from the stack in that order (interrupts push the PC first and then the PSW).

Note that unlike RTS, the return address on the stack is the actual address rather than the address-1.


Math Instructions

ADC (ADd with Carry)

Affects Flags: S V Z C

MODE          SYNTAX        HEX LEN TIM
Immediate     ADC #$44      $69  2   2
Zero Page     ADC $44       $65  2   3
Zero Page,X   ADC $44,X     $75  2   4
Absolute      ADC $4400     $6D  3   4
Absolute,X    ADC $4400,X   $7D  3   4+
Absolute,Y    ADC $4400,Y   $79  3   4+
Indirect,X    ADC ($44,X)   $61  2   6
Indirect,Y    ADC ($44),Y   $71  2   5+

+ add 1 cycle if page boundary crossed

ADC results are dependant on the setting of the decimal flag. In decimal mode, addition is carried out on the assumption that the values involved are packed BCD (Binary Coded Decimal).

There is no way to add without carry.

SBC (SuBtract with Carry)

Affects Flags: S V Z C

MODE          SYNTAX        HEX  LEN TIM
Immediate     SBC #$44      $E9   2   2
Zero Page     SBC $44       $E5   2   3
Zero Page,X   SBC $44,X     $F5   2   4
Absolute      SBC $4400     $ED   3   4
Absolute,X    SBC $4400,X   $FD   3   4+
Absolute,Y    SBC $4400,Y   $F9   3   4+
Indirect,X    SBC ($44,X)   $E1   2   6
Indirect,Y    SBC ($44),Y   $F1   2   5+

+ add 1 cycle if page boundary crossed

SBC results are dependant on the setting of the decimal flag. In decimal mode, subtraction is carried out on the assumption that the values involved are packed BCD (Binary Coded Decimal).

There is no way to subtract without the carry which works as an inverse borrow. i.e, to subtract you set the carry before the operation. If the carry is cleared by the operation, it indicates a borrow occurred.


Wrap-Around

Use caution with indexed zero page operations as they are subject to wrap-around. For example, if the X register holds $FF and you execute LDA $80,X you will not access $017F as you might expect; instead you access $7F i.e. $80-1. This characteristic can be used to advantage but make sure your code is well commented.

It is possible, however, to access $017F when X = $FF by using the Absolute,X addressing mode of LDA $80,X. That is, instead of:

LDA $80,X    ; ZeroPage,X - the resulting object code is: B5 80

which accesses $007F when X=$FF, use:

LDA $0080,X  ; Absolute,X - the resulting object code is: BD 80 00

which accesses $017F when X = $FF (a at cost of one additional byte and one additional cycle). All of the ZeroPage,X and ZeroPage,Y instructions except STX ZeroPage,Y and STY ZeroPage,X have a corresponding Absolute,X and Absolute,Y instruction. Unfortunately, a lot of 6502 assemblers don't have an easy way to force Absolute addressing, i.e. most will assemble a LDA $0080,X as B5 80. One way to overcome this is to insert the bytes using the .BYTE pseudo-op (on some 6502 assemblers this pseudo-op is called DB or DFB, consult the assembler documentation) as follows:

.BYTE $BD,$80,$00  ; LDA $0080,X (absolute,X addressing mode)

The comment is optional, but highly recommended for clarity.

In cases where you are writing code that will be relocated you must consider wrap-around when assigning dummy values for addresses that will be adjusted. Both zero and the semi-standard $FFFF should be avoided for dummy labels. The use of zero or zero page values will result in assembled code with zero page opcodes when you wanted absolute codes. With $FFFF, the problem is in addresses+1 as you wrap around to page 0.

Memory Instructions

LDA (LoaD Accumulator)

Affects Flags: S Z

MODE          SYNTAX        HEX LEN TIM
Immediate     LDA #$44      $A9  2   2
Zero Page     LDA $44       $A5  2   3
Zero Page,X   LDA $44,X     $B5  2   4
Absolute      LDA $4400     $AD  3   4
Absolute,X    LDA $4400,X   $BD  3   4+
Absolute,Y    LDA $4400,Y   $B9  3   4+
Indirect,X    LDA ($44,X)   $A1  2   6
Indirect,Y    LDA ($44),Y   $B1  2   5+

+ add 1 cycle if page boundary crossed

LDX (LoaD X register)

Affects Flags: S Z

MODE          SYNTAX        HEX LEN TIM
Immediate     LDX #$44      $A2  2   2
Zero Page     LDX $44       $A6  2   3
Zero Page,Y   LDX $44,Y     $B6  2   4
Absolute      LDX $4400     $AE  3   4
Absolute,Y    LDX $4400,Y   $BE  3   4+

+ add 1 cycle if page boundary crossed

LDY (LoaD Y register)

Affects Flags: S Z

MODE          SYNTAX        HEX LEN TIM
Immediate     LDY #$44      $A0  2   2
Zero Page     LDY $44       $A4  2   3
Zero Page,X   LDY $44,X     $B4  2   4
Absolute      LDY $4400     $AC  3   4
Absolute,X    LDY $4400,X   $BC  3   4+

+ add 1 cycle if page boundary crossed

STA (STore Accumulator)

Affects Flags: none

MODE          SYNTAX        HEX LEN TIM
Zero Page     STA $44       $85  2   3
Zero Page,X   STA $44,X     $95  2   4
Absolute      STA $4400     $8D  3   4
Absolute,X    STA $4400,X   $9D  3   5
Absolute,Y    STA $4400,Y   $99  3   5
Indirect,X    STA ($44,X)   $81  2   6
Indirect,Y    STA ($44),Y   $91  2   6

STX (STore X register)

Affects Flags: none

MODE          SYNTAX        HEX LEN TIM
Zero Page     STX $44       $86  2   3
Zero Page,Y   STX $44,Y     $96  2   4
Absolute      STX $4400     $8E  3   4

STY (STore Y register)

Affects Flags: none

MODE          SYNTAX        HEX LEN TIM
Zero Page     STY $44       $84  2   3
Zero Page,X   STY $44,X     $94  2   4
Absolute      STY $4400     $8C  3   4

 

DEC (DECrement memory)

Affects Flags: S Z

MODE          SYNTAX        HEX LEN TIM
Zero Page     DEC $44       $C6  2   5
Zero Page,X   DEC $44,X     $D6  2   6
Absolute      DEC $4400     $CE  3   6
Absolute,X    DEC $4400,X   $DE  3   7

INC (INCrement memory)

Affects Flags: S Z

MODE          SYNTAX        HEX LEN TIM
Zero Page     INC $44       $E6  2   5
Zero Page,X   INC $44,X     $F6  2   6
Absolute      INC $4400     $EE  3   6
Absolute,X    INC $4400,X   $FE  3   7

Register Instructions

Affect Flags: S Z

These instructions are implied mode, have a length of one byte and require two machine cycles.

MNEMONIC                    HEX
TAX (Transfer A to X)       $AA
TAY (Transfer A to Y)       $A8
TXA (Transfer X to A)       $8A
TYA (Transfer Y to A)       $98

DEX (DEcrement X)           $CA
DEY (DEcrement Y)           $88
INX (INcrement X)           $E8
INY (INcrement Y)           $C8

Stack Instructions

These instructions are implied mode, have a length of one byte and require machine cycles as indicated. The "PuLl" operations are known as "POP" on most other microprocessors. With the 6502, the stack is always on page one ($100-$1FF) and works top down.

MNEMONIC                        HEX TIM
PHA (PusH Accumulator)          $48  3
PHP (PusH Processor status)     $08  3
PLA (PuLl Accumulator)          $68  4
PLP (PuLl Processor status)     $28  4

TSX (Transfer Stack ptr to X)   $BA  2
TXS (Transfer X to Stack ptr)   $9A  2

Other Instructions

BRK (BReaK)

Affects Flags: B

MODE          SYNTAX       HEX LEN TIM
Implied       BRK          $00  1   7

BRK causes a non-maskable interrupt and increments the program counter by one. Therefore an RTI will go to the address of the BRK +2 so that BRK may be used to replace a two-byte instruction for debugging and the subsequent RTI will be correct.

NOP (No OPeration)

Affects Flags: none

MODE          SYNTAX        HEX LEN TIM
Implied       NOP           $EA  1   2

NOP is used to reserve space for future modifications or effectively REM out existing code.

 

The original source for the above content is: http://6502.org/tutorials/6502opcodes.html

August 4, 2017Programming Reference

Commodore 64 PETSCII Codes

Here is the second most frequent table that I find myself using, after screen codes. The PETSCII table.

PETSCII code
(dec, hex)
Character
(up/gfx, lo/up)
PETSCII code
(dec, hex)
Character
(up/gfx, lo/up)
PETSCII code
(dec, hex)
Character
(up/gfx, lo/up)
PETSCII code
(dec, hex)
Character
(up/gfx, lo/up)
0 $00   64 $40 @ 128 $80   192 $C0
1 $01   65 $41 A a 129 $81 orange 193 $C1 A
2 $02   66 $42 B b 130 $82   194 $C2 B
3 $03 Stop 67 $43 C c 131 $83 Run 195 $C3 C
4 $04   68 $44 D d 132 $84   196 $C4 D
5 $05 white 69 $45 E e 133 $85 F1 197 $C5 E
6 $06   70 $46 F f 134 $86 F3 198 $C6 F
7 $07   71 $47 G g 135 $87 F5 199 $C7 G
8 $08 disable C=-Shift 72 $48 H h 136 $88 F7 200 $C8 H
9 $09 enable C=-Shift 73 $49 I i 137 $89 F2 201 $C9 I
10 $0A   74 $4A J j 138 $8A F4 202 $CA J
11 $0B   75 $4B K k 139 $8B F6 203 $CB K
12 $0C   76 $4C L l 140 $8C F8 204 $CC L
13 $0D Return 77 $4D M m 141 $8D Shift-Return 205 $CD M
14 $0E lo/up charset 78 $4E N n 142 $8E up/gfx charset 206 $CE N
15 $0F   79 $4F O o 143 $8F   207 $CF O
16 $10   80 $50 P p 144 $90 black 208 $D0 P
17 $11 cursor down 81 $51 Q q 145 $91 cursor up 209 $D1 Q
18 $12 reverse on 82 $52 R r 146 $92 reverse off 210 $D2 R
19 $13 Home 83 $53 S s 147 $93 Clear 211 $D3 S
20 $14 Delete 84 $54 T t 148 $94 Insert 212 $D4 T
21 $15   85 $55 U u 149 $95 brown 213 $D5 U
22 $16   86 $56 V v 150 $96 pink 214 $D6 V
23 $17   87 $57 W w 151 $97 dark grey 215 $D7 W
24 $18   88 $58 X x 152 $98 grey 216 $D8 X
25 $19   89 $59 Y y 153 $99 light green 217 $D9 Y
26 $1A   90 $5A Z z 154 $9A light blue 218 $DA Z
27 $1B   91 $5B [ 155 $9B light grey 219 $DB
28 $1C red 92 $5C pound 156 $9C purple 220 $DC
29 $1D cursor right 93 $5D ] 157 $9D cursor left 221 $DD
30 $1E green 94 $5E up arrow 158 $9E yellow 222 $DE
31 $1F blue 95 $5F left arrow 159 $9F cyan 223 $DF
32 $20 Space 96 $60 160 $A0 Shift-Space 224 $E0
33 $21 ! 97 $61 161 $A1 225 $E1
34 $22 " 98 $62 162 $A2 226 $E2
35 $23 # 99 $63 163 $A3 227 $E3
36 $24 $ 100 $64 164 $A4 228 $E4
37 $25 % 101 $65 165 $A5 229 $E5
38 $26 & 102 $66 166 $A6 230 $E6
39 $27 ' 103 $67 167 $A7 231 $E7
40 $28 ( 104 $68 168 $A8 232 $E8
41 $29 ) 105 $69 169 $A9 233 $E9
42 $2A * 106 $6A 170 $AA 234 $EA
43 $2B + 107 $6B 171 $AB 235 $EB
44 $2C , 108 $6C 172 $AC 236 $EC
45 $2D - 109 $6D 173 $AD 237 $ED
46 $2E . 110 $6E 174 $AE 238 $EE
47 $2F / 111 $6F 175 $AF 239 $EF
48 $30 0 112 $70 176 $B0 240 $F0
49 $31 1 113 $71 177 $B1 241 $F1
50 $32 2 114 $72 178 $B2 242 $F2
51 $33 3 115 $73 179 $B3 243 $F3
52 $34 4 116 $74 180 $B4 244 $F4
53 $35 5 117 $75 181 $B5 245 $F5
54 $36 6 118 $76 182 $B6 246 $F6
55 $37 7 119 $77 183 $B7 247 $F7
56 $38 8 120 $78 184 $B8 248 $F8
57 $39 9 121 $79 185 $B9 249 $F9
58 $3A : 122 $7A 186 $BA 250 $FA
59 $3B ; 123 $7B 187 $BB 251 $FB
60 $3C < 124 $7C 188 $BC 252 $FC
61 $3D = 125 $7D 189 $BD 253 $FD
62 $3E > 126 $7E 190 $BE 254 $FE
63 $3F ? 127 $7F 191 $BF 255 $FF

Notes:

  1. Codes $00-$1F and $80-$9F are control codes. Printing them will cause a change in screen layout or behavior, not an actual character displayed.
  2. Codes $60-$7F and $E0-$FE are not used. Although you can print them, these are, actually, copies of codes $C0-$DF and $A0-$BE.
  3. Code $FF is the BASIC token of the π (pi) symbol. It is converted internally to code $DE when printed and, vice versa, code $DE is converted to $FF when fetched from the screen. However, when reading the keyboard buffer, you will find code $DE for Shift-↑ (up arrow) as no conversion takes place there yet.
  4. Cleaned up from the original source: http://sta.c64.org/cbm64pet.html
August 3, 2017Programming Reference

Commodore 64 Screen Codes

The new format of C64OS.com has opened up an exciting new possibility for me. With the new dedicated weblog section of the site, and its left sidebar, I can now pin feature posts to the top to make sure people who randomly visit the site will discover them.

But the sidebar actually now gives me a place to open up a whole new categorization system. I am always looking around the web for the various programming references and documentation that I need. Now, I can make blog posts that contain key programming reference material, and simply tag those posts "Programming Reference". And the sidebar can display a list of posts so tagged. Boom, instant access to all my most frequently used reference materials available in one place.

I hope this post series will be useful to other people too.

Screen code
(dec, hex; reverse dec, hex)
Character
(up/gfx, lo/up)
Screen code
(dec, hex; reverse dec, hex)
Character
(up/gfx, lo/up)
0 $00 128 $80 @ 64 $40 192 $C0
1 $01 129 $81 Aa 65 $41 193 $C1 A
2 $02 130 $82 B b 66 $42 194 $C2 B
3 $03 131 $83 C c 67 $43 195 $C3 C
4 $04 132 $84 D d 68 $44 196 $C4 D
5 $05 133 $85 E e 69 $45 197 $C5 E
6 $06 134 $86 F f 70 $46 198 $C6 F
7 $07 135 $87 G g 71 $47 199 $C7 G
8 $08 136 $88 H h 72 $48 200 $C8 H
9 $09 137 $89 I i 73 $49 201 $C9 I
10 $0A 138 $8A J j 74 $4A 202 $CA J
11 $0B 139 $8B K k 75 $4B 203 $CB K
12 $0C 140 $8C L l 76 $4C 204 $CC L
13 $0D 141 $8D M m 77 $4D 205 $CD M
14 $0E 142 $8E N n 78 $4E 206 $CE N
15 $0F 143 $8F O o 79 $4F 207 $CF O
16 $10 144 $90 P p 80 $50 208 $D0 P
17 $11 145 $91 Q q 81 $51 209 $D1 Q
18 $12 146 $92 R r 82 $52 210 $D2 R
19 $13 147 $93 S s 83 $53 211 $D3 S
20 $14 148 $94 T t 84 $54 212 $D4 T
21 $15 149 $95 U u 85 $55 213 $D5 U
22 $16 150 $96 V v 86 $56 214 $D6 V
23 $17 151 $97 W w 87 $57 215 $D7 W
24 $18 152 $98 X x 88 $58 216 $D8 X
25 $19 153 $99 Y y 89 $59 217 $D9 Y
26 $1A 154 $9A Z z 90 $5A 218 $DA Z
27 $1B 155 $9B[ 91 $5B 219 $DB
28 $1C 156 $9Cpound 92 $5C 220 $DC
29 $1D 157 $9D] 94 $5D 221 $DD
30 $1E 158 $9Eup arrow 94 $5E 222 $DE
31 $1F 159 $9Fleft arrow 95 $5F 223 $DF
32 $20 160 $A0Space 96 $60 224 $E0Shift-Space
33 $21 161 $A1! 97 $61 225 $E1
34 $22 162 $A2" 98 $62 226 $E2
35 $23 163 $A3# 99 $63 227 $E3
36 $24 164 $A4$ 100 $64 228 $E4
37 $25 165 $A5% 101 $65 229 $E5
38 $26 166 $A6& 102 $66 230 $E6
39 $27 167 $A7' 103 $67 231 $E7
40 $28 168 $A8( 104 $68 232 $E8
41 $29 169 $A9) 105 $69 233 $E9
42 $2A 170 $AA* 106 $6A 234 $EA
43 $2B 171 $AB+ 107 $6B 235 $EB
44 $2C 172 $AC, 108 $6C 236 $EC
45 $2D 173 $AD- 109 $6D 237 $ED
46 $2E 174 $AE. 110 $6E 238 $EE
47 $2F 175 $AF/ 111 $6F 239 $EF
48 $30 176 $B00 112 $70 240 $F0
49 $31 177 $B11 113 $71 241 $F1
50 $32 178 $B22 114 $72 242 $F2
51 $33 179 $B33 115 $73 243 $F3
52 $34 180 $B44 116 $74 244 $F4
53 $35 181 $B55 117 $75 245 $F5
54 $36 182 $B66 118 $76 246 $F6
55 $37 183 $B77 119 $77 247 $F7
56 $38 184 $B88 120 $78 248 $F8
57 $39 185 $B99 121 $79 249 $F9
58 $3A 186 $BA: 122 $7A 250 $FA
59 $3B 187 $BB; 123 $7B 251 $FB
60 $3C 188 $BC< 124 $7C 252 $FC
61 $3D 189 $BD= 125 $7D 253 $FD
62 $3E 190 $BE> 126 $7E 254 $FE
63 $3F 191 $BF? 127 $7F 255 $FF
  1. Codes $80-$FF are the reverse equivalents of codes $00-$7F
  2. Cleaned up from the original source: http://sta.c64.org/cbm64scr.html
August 1, 2017Programming Theory

Base Conversion in 6502 (2/2)

Very close to a month ago I wrote an article called Base Conversion in 6502. That piece was a part 1 of 2 because I felt like it was getting a little long. In part 1, I went through the process of converting an ASCII string representation of a number to the actual integer equivalent.

This piece is Base Conversion in 6502, part 2. In this article I'm going to walk through the reverse procedure. How to convert an integer into a string representation of that number. In either direction the base of the representation is arbitrary. In the original I showed converting from a string representing a number in base10, but it would be just as easy to convert if the string held a number in base16, or base2, or base8 or whatever. Similarly, when we convert an integer into a string representation we can choose what base we want it to be converted to.

I feel like there was some confusion over my terminology in part 1, because I was refering to integers as numbers in hex, so I'd say, convert a string number to hex. When the CPU deals with integers, say to add them together, the integer is whatever its bits are counted in binary. So, if you have a byte in memory, and the byte's bits are 0000 0010, then we can say that as far as the CPU is concerned that's the integer 2. And it's super easy for the computer to add that to some other integer like 3 (0000 0011), to produce a one byte integer result 5 (0000 0101). If these numbers were in PETSCII, though, they would be 50 == "2" (0011 0010), 51 == "3" (0011 0011), and 53 == "5" (0011 0101), and so on.

The reason I used the words converting between a string and hex, is because in the assembler ints are most frequently typed in hex notation. And when assembled that number typed in hex is represented in the CPU as an integer.

Meanwhile, you can represent a number in binary, in a string! Where each bit is represented by a whole byte in the string, either "0" (48, 0011 0000), or "1" (49, 0011 0001). So, converting an integer like 5 (0000 0101) to a string representation in base2, it would be converted to the string "00000101" which is actually 8 bytes long, 48 48 48 48 48 49 48 49. Okay, here we go.

Convert an INT to a string representation

This is the opposite process. We have an int (an integer), but we want to print it out on the screen in characters that are readable by a human.

What would happen if you didn't bother to do a conversion and just dumped the byte directly to screen memory? Well, imagine your byte was the integer 83. If you dump that byte to the screen the VIC II will draw it as screen code 83, it'll draw a single character that looks like a heart. How would you then figure out what that byte really is? You'd have to refer to a screen codes chart, search through the chart for a heart and then see that the heart is 83, bingo.

What are we really doing when do this? Well, it's kinda like representing the number in base256. Base10 has 10 symbols (0,1,2,3,4,5,6,7,8 and 9), and if the integer is greater than 9 we need a second character to hold another of the 10 symbols. Base2 has 2 symbols (0 and 1), and if the int is greater than 1 we need another character for another either "0" or "1". Base256 would mean we need 256 unique symbols. And if the integer is greater than 255 then we'd need a new character with another one of our 256 symbols. But, that's what screen codes are; 256 unique symbols. A one byte integer requires exactly one screen code character.

But that would be terrible, it would take forever to figure out what the number is in a way that we understand it. And what's worse, many screen codes look very similar to each other. 116 and 117 are both left vertical bars, 117 is just a single pixel wider than 116. So, representing the number in a base256 string is not really workable. We need to convert to some smaller base.

The easiest way to get something we could describe as human readable is to convert to a string that is in base16, or hexadecimal. The reason this is easiest is because the 8 bits of the underlying int break straight down the middle: 4 upper bits (the upper nybble), and 4 lower bits (the lower nybble). Each nybble is a number from 0 to 15 and can be used as an index into a lookup table of 16 PETSCII characters, "0123456789abcdefg".

I personally use this all the time for debug output when coding. I've got some ints that I'm working with and I need to see what those ints are, so, short of using a monitor which does these conversions for us, I just want to spit out some numbers to the screen.

In fact, I already gave the source for such a conversion routine, inttohex, in the post Implementing Factorial In 6502.

It's very very easy. By shifting right (LSR) four times, the upper four bits are shifted down into the lower nybble and the upper 4 places replaced with zeroes. This is now a number in the range 0 to 15, which can be used as a lookup index into a string of digits, labeled above as hexits.

And to get the lower 4 bits alone is even easier. You simply mask the whole byte using AND #%00001111. Some juggling is required to preserve the numbers as you go, but by and large it is simple, fast and super useful.

So, we're done! No, we're not.

Converting to base16 is really convenient for programmers who are debugging, but most people don't understand hexadecimal because they don't have 8 fingers per hand, like programmers do. Most people have only 5 fingers per hand (the thumb is a finger for this exercise.) So most people want to see the output in base10.

Convert an INT to a string in an arbitrary base

We return now to the article about Large Look-up Tables for Hyperfast, Accurate, 16-Bit Scaled-Integer Math by Garth Wilson over at Wilson Mines Co. The original place where I saw descriptions of the principles of base conversion. Here's his prose description of converting from an int to a string in an arbitrary base:

For converting hex numbers to other bases for output (which will normally be a string), initialize a blank string. You will build it from right to left. Divide your number by what's in variable BASE, and use the remainder to add a text digit to the build, even if it's a 0. Keep doing that until there's 0 in the number.

As before, I'll convert that to pseudo code for you.

Convert from INT to a different base:

Define BASE.
Define symbol table with at least BASE-n symbols.
Initialize a blank string.

loop:
Divide number by BASE.
Use remainder as index into symbol table.
Prepend symbol to string.
Go loop until number is zero.

Next let's step through the pseudo code with an example number, seeing how it converts one step at a time.

BASE = $A (10 in decimal)
$A894
--------------------
loop 0:
$A894 / BASE == $10DB (remainder $6)
"6" + ""

loop 1:
$10DB / BASE == $1AF (remainder $5)
"5" + "6"

loop 2:
$1AF / BASE == $2B (remainder $1)
"1" + "56"

loop 3:
$2B / BASE == $4 (remainder $3)
"3" + "156"

loop 4:
$4 / BASE == $0 (remainder $4)
"4" + "3156"

--------------------
"43156"

So before we proceed with an explanation, let's use our calculator and see if the conversion works. 43156 is indeed equal to $A894. That's convenient.

To start we define the variable BASE as $A, that's base 10, because we'd like to see our output string in the nice friendly base 10 we're all used to. But this would work just as well if we were to use some other base instead.

Between the two horizontal rules I've labelled 5 loop iterations, 0 through 4. It just works out that it takes this many steps to get our division result all the way down to zero.

The initial value is our dividend, and BASE is our divisor. After performing the division we get a result and a remainder. That remainder, as it happens, even if it is zero, is the least significant digit of the number in the base we're converting to. In this case, because we're converting to base10, the range of remainders even represented in hexadecimal will only ever be between $0 and $9.

The remainder has to be converted to a PETSCII character. This isn't shown here, but it's very easy. We can see by looking at a PETSCII table that $30 is "0", $31 is "1", $32 is "2" and so on. So to get the PETSCII character version of our remainder integer all we have to do is add $30. The character has to be added to the start of the string. In our implementation we'll take a look at how I might get around to doing that.

As long the result is not equal to zero we need to loop and continue the process. On each new loop, the result from the previous division becomes the dividend for the next division. In the final loop the dividend ($4) is smaller than BASE ($A) so the result is 0 plus the remainder which will be the final (most significant) digit in the string.

When we converted the other way, our implementation need to have a multiplication routine because the 6502 doesn't have a multiply instruction. Converting this direction, our real implementation will need a division routine. We will also need to worry about precision. In this case, as in the example in part 1, I'm opting to settle for 16-bit precision. So our maximum input number is $FFFF and the maximum string representation will be "65535".

Codebase64 to the rescue on the 16-bit division routine.

INT to a String number Implementation

And here is my implementation of the above. Works just as you'd hope.

Now let's walk through this a line at a time and see how it works.

The division routine that I grabbed from Codebase64 requires three 16-bit numbers in zero page. These are the first constants defined under Workspace. You only need to specify the low byte of these addresses. We have one for divisor, one for dividend and one for remainder. The result in this routine actually is the same address as dividend, which means dividend will continually be overwritten by result. This turns out to be super handy for our use case, as we'll see shortly.

Base is defined as 10, petoffset is the number that needs to be added to an int between 0 and 9 to produce the PETSCII characters "0" through "9". I make this a constant, because I hate magic numbers being sprinkled through code. And chrout is defined here as $FFD2, which is the KERNAL address for outputing a PETSCII character to the screen.

The code starts at $0801, that's where BASIC programs start. The first code encountered is the BASIC preamble. In my post Implementing Factorial on 6502 I talk a bit about how the preamble works. But here'd like to say that rather than just spitting out a series of inscrutable bytes, the assembler actually offers a variety of features that make producing the preamble from memory quite a simple task.

BASIC Preamble in TMP on a real display

The preamble is structured as a BASIC program. First a 2-byte pointer to the next line, this can be done with .word and a label, "end" in this case. Next is a 2-byte line number. This is easy to do with .word 10, that makes a 16-bit line number 10. Next we have a 1-byte BASIC token, this is the hardest part to remember, .byte $9e, that's the token for SYS. Next the SYS wants a memory address, encoded in PETSCII (ironic consider the nature of this article series), plus the BASIC line ends with a null byte $00. This can be very easily produced by the assembler with the .null keyword. The following quoted string is in PETSCII, and it automatically gets null terminated. To end the BASIC program we want a 16-bit $0000, this is easily produced with .word $00, and this is also the line that is pointed to by the first line's next line pointer. So it takes the "end" label. And boom, that's easy to type up from scratch, and is even comprehensible to read.

The main program starts at line 24. We start by putting our BASE into the divisor. This will not change, so it only has to be done once. Even though the divisor fits within 8-bits, the assembler notation of #> and #< will break it into $00 and $0A for us.

Next we put the number we want to convert into the dividend the same way we populated divisor. I've hardcoded $A894, but in a real program you'd probably be pulling this from somewhere.

The loop starts at line 36. The first thing it does is checks to see if dividend is 0. If it is, we're done and we can leave the loop to output our converted string. Dividend is a 16-bit number, but checking to see if it's zero can most easily be accomplished by loading one of its bytes, and then ORing that with the other byte. If the result of both bytes ORed together is still zero, then the whole dividend is zero, and the conversion is complete.

At line 40 we JSR to divid16. This routine overwrites the dividend with the result of the division, as well as populating the remainder, even if the remainder turns out to be zero. Because we're dividing by such a small number, $0A, remainder cannot ever be bigger than a value from $00 to $09, which also means that remainder's high byte will always be $00 and can be ignored.

Lines 42,43 and 44 load the remainder's low byte and add the PETSCII offset to it. Lines 46, 47 and 48 put the PETSCII character into the string. The string, str, is a pre-defined buffer that starts as 5 spaces, null terminated, at line 64. Line 63 is an index variable that starts at 4. That's the offset to the end of the string buffer. When we add the character to the string we add it at the index, then decrement the index. This is how I've decided to do the string prepending. It means I don't have to shift anything, but it also means the final string will always by 5 characters long, and any unused places are padded with spaces.

Line 50 jumps back to the start of the loop. Which will check the dividend to see if it's zero yet. The convenient thing here is that we don't need to explicitly move the result of the previous division into the dividend, because that happens automatically in the divid16 routine. Convenient.

Lastly, we want to output the finished string to the screen. This is done starting at line 53. We simply use the .X register to walk through the bytes of the string and call chrout on each byte to put them on the screen. The loop ends when the string's null byte is reached. And we return to BASIC. That's it.

Output of the conversion program

So, we load it up in vice and low and behold, 43156 gets spat out onto the screen. Looks like it works! This concludes this two part series on INT to String and String to Int conversion. Hopefully someone in the world who's learning 6502 will one day stumble upon this and find it useful.

If you want to download the program and try it out, or inspect how it assembled, you can download it here.

July 21, 2017Hardware

Commodore Logo Mark Patch

I've got something a little special for you today. I've been hard at work building C64 Luggable. You can read all about the project, my progress and how to build one yourself on this site's C64 Luggable page.

It's really great. I'm enjoying the challenge of working with physical materials, and also the intellectual fun of figuring out how to construct something that's modular, self–contained and has everything you'd want in a full Commodore system.

A couple of weeks ago I had a sudden realization that once C64 Luggable is complete I'll want to take it with me to friends' houses, parties, the cottage, and most of all, Commodore Expos such as this year's Commodore World.1 Who builds a custom luggable Commodore all-in-one and then doesn't take it to tradeshows to show it off?

The problem is that it's going to be somewhat vulnerable to nicks and scratches, or much worse, damage to the open flat screen or front speaker cones. Especially if I started lugging it around to shows and taking it in and out of my car. What I really need is a padded cover. It just so happens that my mother is an experienced and very talented quilter. She's an esteemed member of the Napanee Heritage Quilter's Guild. So I recruited her to help me make a custom fitted cover. I'm picturing it like one of those artsy Kleenex box covers, if you've seen those. It slips on over the top. Covers all four sides but leaves the button open so the computer will remain standing on its own feet. And the top cover will have a slit opening sized for the handle to pass through. It will be very handy.

But, we don't just want a generic–looking cover. It's gotta have some style and advertise what it is. So I came up with the idea of getting a custom embroidered patch made up that can be stitched to the front of it.

I looked around the internet and compared prices. The best price I found was from a company in the U.S. called CUSTOMPATCHES. You send them artwork, they mock it up graphically, and follow up with you to make sure it's what you want. Then they create a single proof piece and show you a picture of it. After you approve the proof they make as many as you want in fixed batch sizes. The smallest order is 25. But you can also order 50, 100, 200, 300, 500 and more. The price per unit drops substantially the more you order.

I only want 2 or three for myself. One for the C64 Luggable cover, and a couple more, maybe for a bag or a hoodie. The rest, well, I figure I'll sell them off for a few dollars (+shipping) a piece, mostly to recoup the upfront cost, and to share the love. They've been ordered but I haven't received them yet. They'll probably be available by the end of August 2017.


Here's the artwork I put together in my experiments, and the patch mockup they sent me. The dimensions are, as shown on the mockup, 3.5" by 2.4". They are backed with an iron–on adhesive. This means they can be applied to a bag, or a backpack, (or a luggable computer cover), with just an iron press. They promise me the adhesion is extremely good, and sewing them on isn't absolutely necessary, but they do recommend a single outer stitch to increase the longevity.

Contact me if you're interested in buying a few of these. Supplies available until they're all sold off. I'll update with final price when I know for sure what they'll cost me.

Commodore Logo Mark Commodore Word Mark Commodore Logo Mark (Patch Art)

Update: I'm still working through the proof process with them. The final version may not have the black outer border.

Commodore Logo Mark (Patch Art, Proof 1)

Update 2: This is the final proof. With some feedback from twitter, and my personal opinion, I opted to get rid of the black border. Some border is necessary to prevent fraying, but they've switched it to a white border.

Commodore Logo Mark (Patch Art, Proof 2)
  1. I had a blast at last year's Commodore World 2016. If you haven't read my review of the event, you can find it here. We had the pleasure of hanging out with Bil Herd, legendary designer of the Commodore 128. []
Older PostsClick titles to see full content

The home page shows the full content of the 10 most recent posts.
Below are the titles of 5 posts older than the most recent 10.
Click their titles to view the complete post and read and leave comments.

July 5, 2017Programming Theory

Object Orientation in 6502

June 30, 2017Programming Theory

Base Conversion in 6502 (1/2)

June 20, 2017Software

Huge Site Update

June 5, 2017Software

Recursive File Copier in BASIC

May 29, 2017Technical Deep Dive

Implementing Factorial in 6502

Archive

Full Archive of Past Posts…