NEWS, EDITORIALS, REFERENCE
Memory Managed Loader
In my earlier discussion about big project organization, I touched on the issue of the KERNAL's built-in loader. A loader is the part of an operating system that is designed to take executable code from disk and load it into memory. You can read all about the general idea of loaders, on Wikipedia. This post will go into some more detail about how the C64 KERNAL's loader works and what it's good for. And then I'll talk a bit about what needs to happen for C64 OS.
Depending on the kind of computing system loaders can vary dramatically in sophistication. Loaders range from simple all the way to the complex and highly sophisticated dynamically linked loaders in use by today's OSes for PCs, Macs and smartphones. In the case of an embedded device, all of its code could be in ROM. In which case it has no external storage and no need to load anything at all from storage into memory. The ROM is already mapped into memory when the device is turned on. And at the other extreme, a modern computer has hundreds of processes running in the background at all times. Code is being loaded and unloaded from memory without the user even realizing it. And somehow, through the magic of modern computing principles and hardware architecture, that code is all able to share an arbitrary amount of memory, even when the people who wrote all the various bits of code did not directly collaborate.
The C64's built-in loading system is decidedly simple. But it does exist. It was appropriately lightweight for a machine of its size and capacity, and useful for getting programs stored on disk into memory to be run. The main limitation is that everything about a C64 was designed to be single tasking. One program at a time. Now, a complicated program could have many parts, the most common example is a game with many levels. There is no need to have every single level loaded into memory right at the very beginning. Maps, and sprites and other level data can be separated into different files from the game's main code. At the start the main game code is loaded, which then in turn loads in the first level data only. After the user beats the first level the game code can load in the data for level 2 by overwriting the areas of memory used for level 1. The KERNAL's loader works brilliantly for this.
The thing to notice though is that all the components of that sort of program are written by the same person or small team. Together when designing how the game will work memory is allocated by hand in large blocks. Game engine code will fit into this block, map data fits into that block, character sprites into this block, splash screens fit into that block, music fits into this block and so on up to the capacity of the machine. When the game's main code wants to load in a new level, it could replace just the region of memory for music with one file, just the region of memory for a splash screen with another file, just the sprites or map data with another file. Each type of content goes into and overwrites a predefined region of memory. The KERNAL's loader is great for doing just this. Although a lot of game developers would replace the KERNAL and write a customer loader, the principle is mostly the same.
Now let's consider this from another perspective. Using a Machine Language monitor you can look at the contents of memory, change the contents of individual memory locations, search through memory and more. One thing most monitors let you do is save a range of memory to disk. How that works is very instructive. You specify the memory range, a start and end address, and a file name and the bytes are written as a PRG type file to disk. Because the 6510 has a 16-bit address bus 2 bytes are required to specify any address in memory. The two-byte start address is written to disk first, followed by all the bytes in the specified range. When the KERNAL's loader is used to load a PRG file from disk, it first reads the two-byte "header," which it interprets as the start address. It then loads all the remaining bytes of the file sequentially into memory starting at that point and putting one byte after the next. This process is the fastest way the KERNAL has for reading bytes from disk and putting them in memory. There are no safety checks. The file on disk is assumed to be shorter than: 65,535 minus the start address. For example, if the start address is $C000, it is assumed that the file data is less than $4000 bytes long, because the available space is $FFFF-$C000, which is $3FFF bytes. If it's not, it will run off the end of memory and something will go horribly wrong. But if you imagine the process of saving regions of memory from the machine language monitor, it is impossible to produce a file that doesn't fit, because the file is ultimately just a disk-saved copy of what was a region of memory in the first place. That's pretty cool.
If your game or program is designed with regions of memory pre-defined for certain roles, it is very easy to load in the parts. Have the KERNAL "LOAD" a file of music and the data will be put directly back into memory whence it was saved. Very convenient. A game or program can be much larger than 64k, where memory is like a giant jigsaw puzzle, designed to fit together. And loading in a new resource overtop of an old resource is like pulling out a strip of puzzle pieces and putting in a strip of pieces perfectly fitted in their place but with different images printed on their surface. The whole system is super light weight and allows a game or program, with minimal programming effort, to exceed the total memory of the machine. Brilliant.
What's different about C64 OS?
C64 OS is single tasking. And it does run on a breadbox C64. Again, an important goal is to be true to the hardware. Writing a crazy complicated dynamically linked code-relocating loader is not in the cards. However, even without multiple processes running concurrently, there is still a need for managing memory. If you don't believe me, go read my earlier posts about my Thoughts about Memory Management and C64 OS's actual approach to implementing A Hybrid Memory Manager. In brief, imagine C64 OS provides a simple multi-line text editing control. This control may be used for writing a tweet, or an email, or an IRC message, or just taking notes for yourself. The control is backed by some code for efficiently handling arbitrarily long strings. Inserting or deleting text in the middle of the string, and cutting and pasting regions of the string can be made very efficient because pages of memory can easily be resequenced and no more than a page or so of memory (256 bytes) ever has to be shifted. The question is, as you type more content and you need another page of memory, where does that new page of memory come from? Right. You need a memory manager.
The memory manager in C64 OS has already been fairly well thought out, and useful functional parts of it have already been implemented. When some code is running and it needs more pages of memory, perhaps to store a new TCP/IP Packet, or to give you more space to keep typing, it makes the request from the memory manager for the number of pages it needs and marks those pages as no longer available. Well, the same problem arises when you want to load data from disk into memory.
You cannot merely load some data into memory and expect that to be okay. Why not? For two reasons. One, because wherever the data is going to be loaded could already be memory that the memory manager has made available to something else. Or, more likely, because the memory manager has to know that the region of memory being loaded into will no longer be available after the load. Neither of these two situations is any less true just because the operating system is not "multi-tasking" in the usual sense of the term.
Let's unpack a common scenario. The first involves loading an application. Programs designed to work with C64 OS can generally be considered applications, even though some of them may be small utilities. The general jigsaw-puzzle picture of memory in C64 OS is as follows:
The first 2K are used for screen memory, mouse pointer sprites, KERNAL work space, stack, and other zero-page work space required by C64 OS and its applications. The BASIC ROM is patched out, I/O is usually patched in, and the KERNAL ROM is usually patched in. C64 OS itself, like the KERNAL, sits at the end of memory. In this case, it butts up against I/O. C64 OS therefore occupies all of $CXXX and as little of $BXXX as I can get away with. RAM behind I/O is used in a very careful way by some C64 OS system data, such as the memory map itself. The RAM area behind the KERNAL is left unmapped. It is too problematic to put functional code behind the KERNAL while simultaneously relying on the KERNAL for ongoing work. The swapping becomes a nightmare. It is instead an open area that is ideal for rendering 8K of bitmap, and frees up 8K elsewhere that would be used by a bitmap for something else.
What this means is that the ideal location for an application's main code is to start at $0801, where BASIC RAM normally starts. There is approximately 43k of contiguous free space here until we hit the start of C64 OS's own code. That's perfect. We can load in an application that is a few kilobytes or more depending on its sophistication and whatever remaining space is left available for data. The tighter you can make your app's code, the more space you have for data. The problem from a loader perspective is that the memory manager has no idea how big the code being loaded actually is. Is it 2K? Or 10K? Or 35K? Traditionally, the KERNAL has no idea how big it is, and it doesn't really matter because the programmer knows what is and what isn't available. Somehow, C64 OS's memory manager has to know which memory pages to mark as unavailable. Subsequent requests to allocate pages will have to come from memory that is known to not be in use.
The Fixed Address Loader
C64 OS provides two loaders. A fixed address loader, and a relocatable loader. I'll say more about the relocatable loader shortly. The fixed address loader is essentially for code that cannot be relocated due to its use of absolute addresses for LDA/X/Y, STA/X/Y and JMP or JSR. The loader must determine two things, where it is going to be loaded, and how big it is. Where it will be loaded is easily determinable by checking the first two bytes. One caveat is that C64 OS requires a PRG to be loaded at (or very near to) the start of a page boundary. Because the memory manager allocates page-sized units of memory. In order to get the size of a binary before it is loaded it is necessary to read its file size from the disk. Most Commodore and CMD file systems for PRG files use a two-byte track and sector pointer to chain one block to the next block. The only way to know the number of bytes exactly is to read the block pointers following through every block until the last block. Each block up to the last contributes 254 bytes. The last block is indicated by the first byte of its two-byte next block pointer being zero. The second byte is an offset into that block indicating the last byte of the file. To get the size of the file in bytes you would add the offset-size of the last block to the running total of all the preceding blocks.
This is, to say the least, not very fast or efficient. And, it won't work with storage devices that use a different file system such as IDE64 (and possibly uIEC/SD, I haven't looked into how it deals with the Microsoft FAT16 file system on an SD Card.) Getting the exact byte length is only a little bit better than knowing how many blocks a file is, but it's slow, inefficient and incompatible! A textbook case of something not being worth it. Instead, all C64 compatible drives return the directory listing in a special binary format, which includes the "block" count of each file. Where a block is 254 bytes. When you attempt to load a file by name, with C64 OS, it first loads the directory with the filename as a pattern, by appending "$:" before the filename. So, you want to load an app named "twitter", it first loads "$:twitter", then parses out the block size. Next, it reads the start address. Then it loads the whole file, to its original start address, using the KERNAL's (or better yet, the JiffyDos KERNAL's) LOAD routine. C64 OS then simply informs the memory manager how many pages, starting at which page, to mark as unavailable.
There are two gotchas. A block count is not exactly the same as the page count. Each block is 2 bytes short of a page. However, because C64 OS allocates pages, it would take 128 blocks for those extra 2 bytes to add up to one full page. 128 blocks is around 32K, and so this will only even be relevant when loading a very large application. If the block count is greater than 127, the number of pages allocated is one less than the number of blocks. This means that up to 254 bytes may be wasted in the last allocated page. If the file is 127 blocks, the 127th page allocated will only hold 2 bytes. But those two bytes can't be overwritten by some other user data! So enough space has to be allocated to preserve room for them. The space is not entirely lost, however. A clever programmer could still make use of that space, knowing exactly where it is and how much is there, and also knowing that the loader has in fact allocated it.
The other gotcha is that because the file has a fixed loading address, it is possible that somehow C64 OS has already allocated the pages that this file requires. At the moment, I am not checking for that condition. The idea is that when a new application is loaded in, the memory that was in use by the old application will be unallocated. After the new application is loaded in and its pages are marked as unavailable, then new pages will be allocated from the remaining space for user data. If this will lead to a problem in practice is something I am uncertain about at this point.
The Relocatable Loader
This should not be confused with being a loader that is capable of relocating non-relocatable code. If code has been written in such a way as to require being loaded at a specific address, this loader will not change that. Instead such code should be loaded with the Fixed Address Loader described above. However, some code could actually be relocatable, such as if it used only relative branching, allocated stack space for its local variables, or made careful use of space in zero page.
Besides just very carefully structured code that can still work properly when loaded into an arbitrary range of memory, there is also need for data structures to be loaded from disk. These, because they don't execute, can often be relocated without any trouble. However, they still have to work with the memory manager to figure out what is available and what is allocated after the load is complete.
Because relocatable code or data can actually be relocated, there is no restriction on where they can go. But we still have to know how big the data loaded will be. To do this the same routine is used as above. The filename is prepended with "$:" to load a pattern matched directory from disk. Then the size in blocks is parsed out. Next, the loader asks the memory manager to allocate that number of blocks. The memory manager finds the first available start page where that number of consecutive pages are available. Marks them as allocated and returns the address to the start page.
The loader then goes ahead and uses the KERNAL's LOAD routine, but uses its relocate function. When calling SETLFS the secondary address is set to 0 to indicate a relocate is desired. Then when calling LOAD .X and .Y hold the address to load to. .X is the low byte which is always set to 0, because memory in C64 OS is page aligned. And .Y holds the high byte which was returned from the PGALLOC call to the memory manager.
The relocatable loader always uses the LOAD KERNAL call. The LOAD call assumes that the first two bytes of the file are its original start address. Therefore, the first two bytes of the file are never read into memory by LOAD. C64 OS uses a menu system for its primary application UI which I will go into in much more detail in a future post. However, one design goal is to have the menus build from a loadable data file. And the loadable data file should be fully user editable, presumably in a text editor. Text editors generally produce SEQ type files which lack the two byte header. Therefore, any SEQ type data file which is known will be loaded by the relocatable loader must set the first two bytes to something bogus. In the text editor when editing menu system data files, for example, I just put any single character on the first line, followed by a single carriage return ($0D) as the second byte. And the real data begins in the first column of the second line.
SEQ type files can also in theory be much larger than the total available memory. But C64 OS's relocatable loader loads even these sequential files using the KERNAL's LOAD call. It does this because it is so much faster than reading one byte at a time. In order to ensure that there is enough room, the routine that gets the block size of the file returns the size in .X and .Y as a 16-bit number. This allows for a file size up to 65,535 blocks. 65,535 * 254 bytes is approximately 16 megabytes.1 The relocatable loader checks to see if the high byte of the block size is greater than zero. If it is it automatically returns an out of memory error, because that would mean the file size is necessarily bigger than 64K. If the high byte is 0, then the low byte is passed to the memory manager to check if there are enough consecutive free pages into which the load can happen. If the memory manager returns an error, then the loader returns an error as well.
That is all there is to C64 OS's loader. Two routines: a fixed address loader, and a relocatable loader. Both of which take a pointer to a null terminated filename string in .X and .Y. The device number is read from $BA, and defaults to device #8 if $BA is #0. The current partition and path must already be set on that device before calling these routines. The size of the file to load is internally computed and the memory manager is either informed of what space the loaded file takes up, or asks the memory manager to allocate some space into which the load will take place. And that's it.
- This is the reason why CMD Native partitions have a maximum file size of 16 megabytes, and a maximum partition size of 16 megabytes. They use 256-byte blocks that are linked together with a 16-bit pointer. 65,535 * 256 = 16,776,960, or 16 megabytes. [↩]