
April 2, 2020 #99 Software

Understanding SD2IEC Filenaming

Foreword:

I was in the middle of working on this post, and very suddenly the coronavirus became the only thing anyone could talk or think about. I was just about to start March Break, which I took off work to be home with my kids, when we found out that Canada was closing the schools for the 2 weeks following March Break. Then during March Break the University where I work announced they were closing their doors and everyone would be working from home. There are no more on-campus classes until September. And the elementary school closures are most likely to be extended until the beginning of next school year.

I am now homeschooling my two kids, making breakfasts, morning snacks and lunches for 3. I'm going on daily walks, teaching reading, creative writing, social studies, geography, science, religion and mathematics during 3 hours of every day; working in my home office 3 hours a day; trying to stay sane by having some semblance of a life in the evening with my wife, who is a healthcare worker in a hospital under duress. And then working from 10pm to 1am on C64 OS programming, documentation and blogging. I must admit, I've fallen asleep at my desk more than once so far. So, please be patient. I really do not want my projects to get derailed by this unexpected global crisis. Although this post took me longer to finish than I'd hoped, here it is, and over 17,000 words strong. I hope it's useful.

Greg Nacu
Husband, Father, Teacher, University Employee, Writer, C64 OS Programmer

NOTE: This post explains the ins and outs of how some of SD2IEC's settings affect its behavior, but it does not teach you how to configure SD2IEC's settings. I recommend opening the SD2IEC User's Manual, and following along with this post and trying its examples on your own SD2IEC as you read. Refer to the User's Manual when necessary to learn how to change the settings that are described in this post.

Back in 1995, Apple put out a cheeky print ad that said merely, "C:\ONGRTLNS.W95" with a little 6-color Apple logo beneath it.

Why did they do that? Besides the fact that they were cheeky and so arrogant that they couldn't foresee that Windows95 was coming in like a steamroller. Microsoft was about to lay them so low that many consider it a miracle that they escaped the clutches of bankruptcy during the 5 darkest years of their history.

But they did it because, when you're facing stiff competition, it's important to talk up your own strengths and point out the competition's weaknesses. And one of Microsoft's weaknesses was that they were obsessed with maintaining full backwards compatibility with the PCs that stretched back into the mists of time. (You know, the era of the Commodore 64.)

At the time, Apple's HFS file system supported filenames up to 255 characters long, with very few illegal characters.¹ Eventually HFS got long in the tooth, but at the time it sported many clever features. Files could have both a data and a resource fork, such that metadata, including the file's datatype, creator code, custom icon and more could all be attached to even simple plaintext files.

Windows95, on the other hand, clearly evolved in small gradual steps from its MS-DOS days, and was released with an extended version of FAT16. FAT16 limited filenames to only 8 characters, plus a 3 character extension that was used as metadata to indicate the file's datatype. Windows also had drive letters. C: is, to this day, still the drive letter for the primary harddrive, because A: and B: are for floppy drives, even though most PCs haven't shipped with a floppy drive in 20 years.

VFAT Directory Entry Structure

True to form, Microsoft remediated FAT16's rather sad state of affairs by introducing a layer called Virtual FAT. VFAT enabled Windows95 to show and use long filenames, without changing the data structures of the underlying FAT16 file system. This added a raft of complication, but it had the advantage that Windows95 files could still be accessed and used by MS-DOS.

"But, sir, we use our Commodore computers. So, why are you sharing with us this fascinating history?"

Because, dear reader, like it or not, we have inherited this madness, by way of SD2IEC. And I am going to do my best to try to explain how it works and what impact it has on the C64. But first, we will review how typical or native Commodore devices and file systems work so that we have some frame of reference.

Commodore Devices

During my years as a diehard Mac fan (I still like the Mac, by the way,) it got really easy to poke fun at old vestiges like the fact that PCs still use C: as their primary drive letter all these years after the typical PC user has forgotten what A: and B: were even for. The Mac never had these drive letter codes. They just had, and still have, volume names. But now that I'm back in full force in the land of Commodore 8-bit, mocking PC drive letters doesn't seem so clever anymore.

The organization of a C64's devices is unique to the Commodore 8-bit line. All device types recognized by the KERNAL have a device number, ranging from 0 to 30.

Device	I/O	Device Number	Secondary Address
Keyboard	Input	0	—
Datasette	Both	1	0, 1, 2
RS232	Both	2	—
Screen	Both	3	—
Printers	Output	4, 5	0, 7
Plotters	Output	6, 7	—
Storage Devices	Both	8... 30	0, 1, 2... 14, 15

Devices 0 to 3 are all built-in devices, controlled by software manipulation of hardware included inside the computer. And they're also described fully by the low 2 bits of the device number byte.²

All the other devices are nominally on the IEC serial bus, although it is possible for devices not on the physical serial bus to wedge themselves into this schema. This is why the first drive is usually device 8, it's the first number assigned to storage devices. While originally for a floppy drive, today it could easily be occupied by a harddrive or an SD-Card-based drive, etc. It was easy to mock C: on the PC, but the truth is that device #8 is essentially the C64's C: drive. An awful lot of PC software has been written that just hardcodes paths on the C: drive, just like the overwhelming majority of C64 software hardcodes device #8. (Even though that's very annoying.)

As newer storage devices were developed for the C64, they clearly followed the basic design set out by Commodore from their earliest days. The CBM file system on the original, enormous, tank-like floppy drives set the conventions for the devices to come.

From the earliest days, a device on the IEC bus could control more than one physical mechanism. The dual floppy disk drives, for example, are a single device that controls two disk drives (or units). In fact, the original KERNAL had no way to copy files from one device to another.³ Instead, you send a command to the DOS on a single device, and instruct it to copy a file from one to another of the drives under its control. To copy files between different devices, you needed a file copying program.

The VIC-1541 was marketed as a "single" drive to distinguish it from the dual drives.

You might think that because dual floppy drives are quite rare the concept of one device controlling multiple mechanisms is just a historical curiosity. All the popular drives, the ones Commodore sold by the millions, the 1541, 1541-II, 1571 and 1581, are only single drive devices. Actually the concept of multiple drives controlled by a single device is alive and well. The drives became partitions. The CMD storage devices and later the IDE64 and later still the SD2IEC family all support the partitioned mechanism, and use the same DOS path and command syntax originally developed by Commodore for accessing a device's multiple drives.

IDE64 and SD2IEC devices are still commercially available products, today.

Commodore File Systems

Thus, within a device we have either drive mechanisms or partitions. And within a partition (or on a single floppy disk) we have a file system. All of Commodore's own disk drives use virtually the same file system with a few numbers tweaked, the common elements being found in different places depending on the disk's storage capacity. The storage medium is divided into some number of tracks, and each track contains some number of sectors. Each sector is 256 bytes.

On original disks there is a physical one-to-one mapping of logical tracks and their sectors to the physical layout of tracks and sectors on the disk's surface. Inner tracks are smaller and so they have fewer sectors than the larger outer tracks. On newer storage media the mapping of logical tracks and sectors to physical locations has become abstracted.

CBM's file systems support only a single directory. (Let's not go crazy with mocking FAT16, right?! The CBM File System doesn't even support subdirectories! Hard to imagine.) The directory is formed by a linked list of sectors. The links go only one way. The first directory sector links to the second, which links to the third and so on. But the third does not link back to the second, and the second does not link back to the first. This is also how sequential files chain their data blocks together, more on this in a bit.

Highlevel diagram of a 1541 disk's track/sector/block layout

Each type of disk, 1541, 1571, 1581 or the older and more exotic disks, keeps its directory header block at a known or fixed track and sector. For example, the 1541 and 1571's main directory header block is always at track 18 sector 0. The 1581's is always at track 40 sector 0. The header block has a pointer to the track and sector of the first directory block, which has a pointer to the second directory block, and so on.
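To make those fixed locations concrete, here is a minimal Python sketch (for illustration only, this is not C64 code) that converts a 1541 track and sector into a byte offset, as you would need when poking around inside a disk image file. The sectors-per-track zones (21, 19, 18, 17) are the standard 35-track 1541 layout, which this post doesn't spell out, and the function names are hypothetical.

```python
SECTOR_SIZE = 256

def sectors_in_track(track):
    """Sectors per track on a standard 35-track 1541 disk (assumed layout)."""
    if not 1 <= track <= 35:
        raise ValueError("1541 tracks run from 1 to 35")
    if track <= 17:
        return 21
    if track <= 24:
        return 19
    if track <= 30:
        return 18
    return 17

def ts_to_offset(track, sector):
    """Byte offset of a block, counting blocks from track 1 sector 0."""
    if not 0 <= sector < sectors_in_track(track):
        raise ValueError("sector out of range for this track")
    blocks_before = sum(sectors_in_track(t) for t in range(1, track))
    return (blocks_before + sector) * SECTOR_SIZE

# The 1541's directory header block lives at track 18 sector 0.
header_offset = ts_to_offset(18, 0)
```

With these assumptions the header block of a 1541 disk starts 357 blocks in, at byte offset $16500.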

In addition to the directory blocks, the CBM FS also maintains a BAM, or a block allocation map. The BAM itself takes up one or more sectors, and is found at fixed places per disk type. In the layout of the BAM individual bits correspond to the blocks within a track. Whenever the DOS needs to allocate a new block, for any reason, it references the BAM to figure out which blocks are available. Then it links the new block into the file or directory chain, and marks its corresponding bit in the BAM to indicate that it's no longer available.
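The allocate-and-mark step can be sketched as a toy model in Python. This is only an illustration of the idea, not how any real DOS is written: I'm assuming a set bit means "free", and the sketch simply takes the lowest free sector on the lowest track, whereas a real drive is choosier about which block it picks. The names `make_bam` and `allocate` are hypothetical.

```python
def make_bam(sectors_per_track):
    """Build a toy BAM: one integer bitmask per track, every block free."""
    return {track: (1 << count) - 1 for track, count in sectors_per_track.items()}

def allocate(bam):
    """Find a free block, mark it as used in the BAM, and return (track, sector)."""
    for track in sorted(bam):
        bits = bam[track]
        if bits:
            # Index of the lowest set bit, i.e. the first free sector.
            sector = (bits & -bits).bit_length() - 1
            # Clear the bit so the block is no longer available.
            bam[track] = bits & ~(1 << sector)
            return track, sector
    return None  # disk full

bam = make_bam({1: 2, 2: 1})   # a tiny pretend disk with 3 blocks
first = allocate(bam)
second = allocate(bam)
```

Validation, in this model, would amount to rebuilding the bitmasks from scratch by walking every chain and comparing the result to what's stored.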

Sometimes the actual use of blocks and the BAM could get out of sync. To deal with this there is a validation procedure. Validation walks the directory tree, walks the file chains, and verifies that every block that's actually used is marked unavailable in the BAM and that every unused block is marked available. Validation was originally a program that had to be loaded from disk, but it was later included as part of the DOS found in the device's ROM.⁴

Commodore File System Directories

An important part of the file system is what metadata it supports per file. For example, the file's name, type, size, timestamp, etc. These specifics come down to the directory structure.

Each 256-byte directory sector is divided into eight 32-byte chunks. That would mean that each directory entry has 32 bytes to work with, right? Well, not quite. Remember that directory blocks are chained together. The first directory block can hold 8 files but if you want a 9th file it needs to allocate a new block and add it to the chain. The first two bytes of a block point to the track and sector of the next block in the chain. The problem is that this means directory entries would have an unequal number of bytes depending on where they fell by chance in the directory. The first entry in a block would have only 30 bytes, and the following 7 would have 32 bytes each. How can those first 2 bytes be useful? If a directory entry gets moved to the first slot in a block those first 2 bytes are already spoken for.

To account for this, the first two bytes of each 32-byte region are reserved. That is, they aren't used. (7 * 2 = 14) So 14 bytes are wasted per directory block. That leaves 30 bytes per directory entry. The standard layout of the bytes in a directory entry is as follows. This is taken from the CMD HD User's Manual (Appendix C.) I'm not sure how far back in time the timestamp convention was a standard, or if CMD devised this themselves. But, it indeed became a de facto standard.
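Here is a minimal Python sketch of that carving-up, assuming you already have the 256 raw bytes of one directory sector in hand. The function name is hypothetical; the sizes come straight from the description above.

```python
def split_directory_sector(sector):
    """Return the link to the next directory block and the eight 30-byte entries."""
    if len(sector) != 256:
        raise ValueError("a directory sector is exactly 256 bytes")
    # The first two bytes of the block point to the next block in the chain.
    next_link = (sector[0], sector[1])
    # Each 32-byte region gives up its first 2 bytes, leaving 30 for the entry.
    entries = [sector[start + 2:start + 32] for start in range(0, 256, 32)]
    return next_link, entries

link, entries = split_directory_sector(bytes(range(256)))
```

Only the first region's reserved bytes actually carry the link; the other seven pairs are the 14 wasted bytes.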

Typical directory entry, this is from the CMD HD User's Manual

Let's briefly go over these, to see what sort of metadata a file can have in a typical Commodore file system. (Remember, all of these offsets start 2 bytes in from a 32-byte aligned region within a block.)

The 0th byte is for the CBM file type. This isn't the same as a traditional datatype, and it doesn't have a direct analog on other file systems. There are only 7 types, some of which are rarely if ever used. Because there are so few of these, they fit within the first 3 bits of this byte. So, what are these for? They each have implicit information about how the data is stored within. They are sort of like compact or efficient flags.

• DEL is for deleted. The directory entry exists but it has no associated data. These are usually used to create visual separators in a directory, or by demo coders to create directory art. There is no way to create directory entries of this type using DOS commands alone. They can only be created by using special software to manipulate the directory structure.

• SEQ is for sequential. The content of this file is just any sequential data. It could be a text file or it could be an MP3, or any other sequentially structured data. It is not limited to being human readable. However, it is not executable and ergo it is not loadable using the KERNAL's LOAD routine. SEQ files can be arbitrarily long. They could be megabytes long, for instance, if the partition/disk had the available capacity.

• PRG is for program. The content of these files is usually executable. It's usually program code, but not always. In some cases it is special data which must be loaded into a specific region of memory. These files can be loaded using the KERNAL's LOAD routine. In order to be loadable, the first 2 bytes of the data on disk are a 16-bit memory address whither the data following the load address must be loaded.

The KERNAL's LOAD routine can either use the 2-byte load address or not, but if it ignores it, it skips those 2 bytes and the executable data begins immediately following them. In practice these files are not arbitrarily long. They can't exceed the memory capacity of the machine. This means they cannot be more than 64K. But really, they can't be longer than 64K minus their load address. If the load address is $C000, they can't be longer than $10000 - $C000, or $4000 bytes, which puts the last byte at $FFFF. Otherwise they'd fall off the end of memory, or they'd wrap around to $0000 and bad things would probably happen.
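This arithmetic is easy to get off by one, so here is a minimal Python sketch of it. Counting the byte at $FFFF itself, the room above a load address is $10000 minus that address. The function names are hypothetical, and I'm assuming the load address is stored low byte first, as 6502 addresses normally are.

```python
MEMORY_SIZE = 0x10000  # 64K address space

def max_prg_data_length(load_address):
    """Most bytes that can follow the load address without running past $FFFF."""
    if not 0 <= load_address < MEMORY_SIZE:
        raise ValueError("load address must fit in 16 bits")
    return MEMORY_SIZE - load_address

def prg_fits(file_bytes):
    """Check whether a PRG's data fits in memory at its own load address."""
    # Assumption: the 2-byte load address is little-endian (low byte first).
    load_address = file_bytes[0] | (file_bytes[1] << 8)
    data_length = len(file_bytes) - 2
    return data_length <= max_prg_data_length(load_address)
```

For a load address of $C000 the last usable offset is $3FFF, so the limit works out to $4000 bytes of data.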

• USR is for user. Outside of GEOS, these files are rarely seen or used. This effectively means special. USR is only used when neither PRG nor SEQ quite fits the bill. This is the reason that GEOS files are USR files, because they make use of its custom VLIR format. The structure of a VLIR file is not sequential. It is more like a Macintosh file with its resource and data forks than it is like any typical C64 file. Therefore, the structure of a USR file is undefined or custom.

• REL is for relative. Relative files are structured in a non-linear, record-based format, officially supported by CBM and CMD DOS. Support for relative files is spotty on other devices. These files serve a purpose but are quite rare.

• CBM is for the 1581's sub-partitions. I have literally never seen a file of this type in my life. And something tells me very few Commodore 8-bit users ever have. These are only supported on 1581 drives, and CMD devices inside 1581 emulation partitions.

• DIR is for the native mode subdirectories found on CMD devices. CBM FS does not support this type. The bytes that typically point to a file's first data block, point instead to a directory header block of a whole other directory chain. A subdirectory chain is structured almost identically to the root directory, except that a subdirectory also contains a pointer to its parent directory's header block.

The CBM file type only uses the lower 3 bits of this byte. The upper two bits are used as two additional flags. The high bit indicates whether the file is properly closed or not. When a file is opened for write or append, this flag is lowered. This allows the DOS to know something has it open, and can (in theory) prevent it from being opened a second time. This flag will be raised again when the computer sends a command to close the file. All open files are closed automatically when channel 15 (the command channel or error channel) is closed.

If however the drive or computer is restarted, or the program crashes, or the disk is removed and the file was never closed, the low-state of this flag is indicated in the directory as a splat file. The only proper way to deal with a splat file is to validate the disk or partition. Writing to disk while a splat file exists could result in a loss of data.

NOTE: If you open a file for write and then list a directory, you will see the splat (asterisk) symbol next to the new filename. This is not a problem. The file is legitimately open. Simply closing the file removes the splat. The splat is just an indicator that the file is open for write. It's only a problem if the file is open, but the computer or drive has lost track of it and no longer has the ability to close it properly.

Not every splat is a "splat" file.

The second highest bit is used to lock the file. A locked file is indicated in the directory by a < character. While a file is locked, it cannot be scratched. It can, however, be replaced if a file with its name is written using the write-with-replace DOS syntax. (@ symbol at the start of the path.) This feels like a bug to me, frankly.

All of that is packed into the first byte of a directory entry.
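A minimal Python sketch of unpacking that byte follows. The bit positions are as described above (low 3 bits for the type, high bit for closed, second highest for locked). The numeric codes 0 to 6 are my assumption that the types are numbered in the order listed above, and the function name is hypothetical.

```python
# Assumed numbering: the same order as the list of types above.
FILE_TYPES = ["DEL", "SEQ", "PRG", "USR", "REL", "CBM", "DIR"]

def decode_type_byte(value):
    """Split a directory entry's first byte into its type and two flags."""
    code = value & 0x07                # low 3 bits: CBM file type
    name = FILE_TYPES[code] if code < len(FILE_TYPES) else "???"
    closed = bool(value & 0x80)        # high bit: file was properly closed
    locked = bool(value & 0x40)        # second highest bit: file is locked
    return {
        "type": name,
        "closed": closed,
        "locked": locked,
        "splat": not closed,           # shown as * in a directory listing
    }
```

So a byte of $82 would be an ordinary closed PRG, $C1 a locked SEQ, and $02 a PRG showing a splat.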

The next two bytes are the track and sector pointer to the first block of the file's data. What exactly this means depends on the file type. If it's a DIR for example, the block pointed to is the header block of the subdirectory. If it's a PRG or SEQ file, it's the first data block in the sequential chain of blocks that makes up the file. For a REL or USR file it points to something relevant to those file types.

Next we have the filename. 16 bytes, more than half (53%) of the directory entry, is dedicated to the file's name. 16 characters is better than 8 characters plus a 3 character extension, but it's not as much as, say, the 31 characters the old Mac OS Finder could use on HFS. It's also not anywhere near as much as what can be used by VFAT. So this is very relevant to how SD2IEC relates to our Commodore files.

It is important to note that there are very few restrictions on what characters can be used in a filename on CBM FS. More or less any byte that you can manage to pass through BASIC can be part of the filename. And you can concatenate many strange characters into a string with CHR$(). There are, however, a handful of characters that will be problematic. "$" is interpreted by the LOAD routine to load a directory. For non-PRG files, and for PRGs if they are opened via a data channel (2 to 14) even $ can be used in a filename or as a filename. Comma and equals symbols are interpreted with special meanings in the DOS syntax. And ? and * are used as pattern matching wildcards for one or multiple characters. A quotation mark will also probably lead to some complications.

I had trouble creating a file with $ as the name. But no trouble creating a file with $ in the middle of the name. Then I had no trouble renaming the file to just $. But, you should still probably avoid it.

Other than those ($,=?*"), the others are fair game. Space, slash, colon, period, multiple periods, etc. are not an issue and can be used in a filename without further complication. But here's what's even more interesting. PETSCII graphic symbols in a filename are not only acceptable, they are common practice on the C64. More generally, of course, almost all filenames generated by the C64 are in PETSCII. The reason is that, when you type a string in BASIC, that string, naturally, is in PETSCII. There is no magic transformation, the PETSCII string is just sent to the drive and it writes it to the disk. This is also highly relevant to how SD2IEC will interact with the C64.

To summarize, C64 filenames are 1 to 16 characters long, they are case sensitive, and they are in PETSCII, as they should be. The name of a subdirectory is subject to exactly the same features and limitations as the name of any other file.
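Those rules are simple enough to sketch as a checker. This is a minimal Python illustration, not anything a drive actually runs: it takes the filename as raw bytes, enforces the 1 to 16 length, and flags the six characters called out above as troublesome. The function name is hypothetical. (Those six characters have the same byte values in PETSCII and ASCII, so writing them as ASCII here is safe.)

```python
# The characters singled out above as problematic: $ , = ? * "
PROBLEM_BYTES = set(b'$,=?*"')

def check_filename(name):
    """Return a list of complaints about a filename given as raw bytes."""
    problems = []
    if not 1 <= len(name) <= 16:
        problems.append("length must be 1 to 16 bytes")
    bad = sorted({b for b in name if b in PROBLEM_BYTES})
    if bad:
        problems.append("problem characters: " + "".join(chr(b) for b in bad))
    return problems
```

An empty list means the name should be fine; spaces, slashes, colons and periods all pass.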

NOTE: Almost all C64 filenames are in PETSCII. However, GEOS and Wheels use ASCII for their filenames. This produces a very strange effect on the C64. Lowercase ASCII values fall within an undefined block of PETSCII. BUT, the CHROUT ($FFD2) KERNAL routine maps these values to the uppercase PETSCII block before outputting to the screen. What this means is that you see the filename in inverted case (stereotypical of a PETSCII/ASCII mix-up), but if you try to interact with the file from BASIC by typing the name as you see it, that doesn't actually work. See here:

Trying to open a Wheels directory from BASIC

You see "gEOwRITE dOCS", but the sequence of bytes in the filename on the disk is PETSCII "g" followed by 2 bytes of ASCII that can't readily be typed from BASIC, followed by PETSCII "w" followed by 4 more bytes of ASCII. And so on. The result is that even though it appears (by the idiosyncrasies of screencode conversion) that you're typing the name correctly, it doesn't match and you get a file not found. So, if you create a file or a directory with GEOS/Wheels, you can't scratch it, or rename it, or even change into that directory from the READY prompt. Thanks GEOS!!

Continuing on, the next 4 bytes are very rarely used. Three of them are specific to REL files, and are unused if the file is any other type. The 4th of these 4 bytes is totally unused. As mentioned earlier, REL files are quite rare, and REL file support is spotty on devices other than those from Commodore and CMD. For example, SD2IEC has only partial support for REL files.

"Partial relative file support is implemented. It should work fine for existing files, but creating new files and/or adding records to existing files may fail. Relative files in disk images are not supported yet, only as files on a FAT medium." (SD2IEC User's Manual)

The following 5 bytes are a timestamp. Year, Month, Day, Hour, Minute. There is only a single timestamp. It is the time the file was created. Timestamps are not supported by Commodore's own drives. On Commodore's drives two of these bytes are used as temporary pointers to the new data when a file is being overwritten. This functionality has been moved (or removed altogether) in later devices. And these bytes were taken over by CMD DOS to hold the timestamp.

CMD DOS has no concept of a file having a modification date that is independent from its creation date. If you open an existing file with the append flag, write some data, and close the file, the timestamp remains unchanged. If you overwrite a file the timestamp will be updated. Because, it's effectively a new file that just happens to have the same name as the old file.

The year is only two digits, and the precision of the timestamp is to the minute, not to the second. Additionally, the timestamp for a file is not transmitted from the computer to the device. The DOS on the device must support timestamping, and it must have RTC hardware built in, whence it retrieves the date and time when it needs to write it to a directory entry. CMD drives and the IDE64 have RTCs and support timestamping. sd2iec (the firmware) supports RTCs and timestamping, but only certain models of SD2IEC hardware include an RTC.

This has several implications that may not be immediately obvious until they get pointed out. If you copy a file from, say, an IDE64 to, say, a CMD HD, it is not possible to create the new file on the CMD HD with the same timestamp it had on the IDE64. The timestamp, even if you read it from the IDE64, cannot be passed to the CMD HD to assign it to the new file. The new copy of the file is automatically assigned the current date/time according to the CMD HD's RTC and that is the end of it. Here's another example. It is mostly useless to retain a file's timestamp in an archive file. Because when a file is extracted or unarchived and a new file produced, you cannot restore the timestamp on the new file to what it was back when it was archived. Obviously, when I say "can't," I mean without sector editing the directory entries manually. You can do anything if you're manually editing a sector. But the DOS provides no facility for setting an arbitrary timestamp on a file.
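For reading a timestamp out of a directory entry (which, unlike writing one, is perfectly possible), here is a minimal Python sketch. I'm assuming each of the 5 bytes holds a plain binary number in the order given above; that encoding is my assumption, so check it against the CMD HD User's Manual before relying on it. The function name is hypothetical. The year is left as two digits, because the directory entry doesn't say which century it belongs to.

```python
def decode_timestamp(raw):
    """Format the 5 timestamp bytes as YY-MM-DD HH:MM."""
    if len(raw) != 5:
        raise ValueError("the timestamp is exactly 5 bytes")
    # Assumed order and encoding: year, month, day, hour, minute as plain numbers.
    year, month, day, hour, minute = raw
    return "%02d-%02d-%02d %02d:%02d" % (year, month, day, hour, minute)
```

Note there is nowhere to put seconds, which is why the precision stops at the minute.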

The last two bytes of the directory entry list the number of blocks that the file's data occupy. That's a 16-bit size, measured in 256-byte blocks. The largest filesize that these bytes can represent is thus (just under) 16MB. This size information does not have precision down to the byte but to within 256 bytes, as all blocks must be filled except the final block, which may be (likely is) only partially full.

It's actually a little bit more complicated than this. Sequential files (SEQ, PRG and possibly USR) chain their data blocks together in the same way that directory blocks do. The first two bytes of a block are the track and sector pointers to the next block. That means that each block contains only 254 bytes of data. A file's actual byte size (to within a precision of 254 bytes, and erring on the high side) is its block count * 254.

The only way to get the exact byte size is to follow the chain, one block at a time, to the final block. The track and sector pointers take on special meaning in the final block. The track byte is zero, indicating that this is the final block in the chain. And the sector byte is an offset into this sector to the last byte of data. Thus, the number of bytes in the final block is the value in the sector byte minus 1. (Right? If there is only one byte of data in the block, its index is 2. So the sector byte will hold the value 2.)
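Walking the chain can be sketched like this in Python. It's a minimal illustration under a simplifying assumption: the disk is represented as a dictionary mapping (track, sector) pairs to 256-byte blocks, and the function name is hypothetical. The loop check is my own addition, to keep a corrupt chain from spinning forever.

```python
def file_size_bytes(blocks, first_ts):
    """Follow a file's block chain and return its exact size in bytes."""
    size = 0
    ts = first_ts
    seen = set()
    while True:
        if ts in seen:
            raise ValueError("loop in block chain")
        seen.add(ts)
        block = blocks[ts]
        next_track, next_sector = block[0], block[1]
        if next_track == 0:
            # Final block: the sector byte is the index of the last data byte,
            # and data starts at index 2, so the count is that value minus 1.
            return size + (next_sector - 1)
        size += 254  # a full block carries 254 data bytes after its 2-byte link
        ts = (next_track, next_sector)

# A two-block file: one full block, then a final block holding a single byte.
disk = {
    (17, 0): bytes([17, 1]) + bytes(254),
    (17, 1): bytes([0, 2]) + bytes(254),
}
exact = file_size_bytes(disk, (17, 0))
```

The directory entry for this pretend file would list 2 blocks, and 2 * 254 = 508 overstates its real size of 255 bytes considerably, which is the imprecision described above.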

In my experience, it's rarely necessary to know the precise size of a file, down to the byte.

SD2IEC, and the dance of compatibility

That's a good summary of what we need to know. Now we can think about FAT16, and we can think about VFAT, and PETSCII and all such good things, and see how things go off the rails, and what tradeoffs are to be made to balance our interests of interchangeability with a PC or Mac and compatibility with the C64.

Disk Images

Let's get the easy stuff out of the way first. Probably the most common use of the SD2IEC is to transfer .D64 disk images from the net via PC/Mac to your C64. When you do this, it really doesn't matter how the name of the .D64 file itself is handled, nor if or how it gets mangled. The .D64 file is an image of a CBM FS disk, and SD2IEC can mount that image. Once an image is mounted, working with files (and directory entries) is virtually identical to working with the original medium, because the data structures and the track and sector layout are identical.

There are other compatibility issues that pop up if the program tries to directly access features of the 1541's DOS or memory itself. But those issues are beyond the scope of this discussion. Critically, if a filename has a PETSCII filename inside the .D64 file, the intermediate PC that transfers the .D64 file doesn't care, because it's all just data inside the file. Things like the CBM file type, file lock, block count, etc. come through in the .D64 perfectly. The logic of this can be applied to .D71, .D81 and .DNP (CMD Native Partition) images just the same.

Although .D64 images are much more common, these other disk images are also supported and they perfectly maintain all the specific details of CBM FS and are therefore quite compatible. The BAM is used, the directory entries are chained, the blocks are 256-bytes and referenced by track and sector. And so on.

General Compatibility Issues

What happens when you're not using a mounted disk image? There are two main areas of concern. The filename and the CBM file type. Let's start with the file type, because, it is much simpler to think about.

CBM File Type

SD2IEC has numerous settings which we'll get deeper into a little later on.

Let's say you create a plain text file on your PC, and you call it myfile.txt. Then you put that on an SD Card and put it in your C64. There is no way for the C64 to know what CBM file type this file should have. It could try to divine it from the .txt extension, but it doesn't do this. Instead, by default, the filename will appear in the C64's directory as myfile.txt, but its CBM file type will be PRG. It presents as PRG only because that is the most common type on the C64, a program, and so it seems a reasonable default.

In the above example, we create a file, test.txt, and explicitly indicate it should be a SEQ file (with the ",s") but when we list the directory, it is still just a PRG file. In this most basic mode the CBM file type is simply not represented at all, and PRG is just the default.

Bear in mind, you can open a PRG type file as a sequential data file on a C64. You can issue a command like:

open2,8,2,"myfile.txt"

And then proceed to read one byte at a time from the file. If the file you open like this really is a loadable program file, the first 2 bytes you'll read are the 2-byte load address, because they are the first 2 bytes of the file on disk.

You can do this on a traditional CBM or CMD device too. You can read a PRG type file as though it were a SEQ type file. If, on a CBM or CMD device, the file is SEQ type, though, you cannot load that file. The drive will give an error:

64, file type mismatch

Therefore, it seems a reasonable decision to default a file to PRG that really ought to be SEQ, in the absence of any way to know that it's SEQ. But, it's not exactly perfect. What if a program, like a text editor, loads in a directory of files that it allows you to open, but loads the directory with a CBM file type filter on SEQ files, like this:

$:*=s

Now all of a sudden the program cannot find myfile.txt and doesn't offer it as an option to read in because it (justifiably) assumes it's an executable program. That is after all the point of the CBM file types.

The first configurable setting, and probably the easiest to understand, is called file extension hiding. If you turn this setting on, and load a directory, SD2IEC will check filenames for an extension that matches the 4 main CBM file types (PRG/SEQ/USR/REL, case-insensitively). So, if you create a file on the SD Card with a Mac or PC called myfile.txt.seq and move the SD Card to your C64, the directory will show myfile.txt with a CBM file type of SEQ. That's cool! But it's also not perfect. Firstly, because on your PC the file now has an extra (meaningless) extension. And secondly, it's somewhat unreliable for distribution of files to C64 users. What looks like myfile.txt of CBM file type SEQ on your SD2IEC, when you pull the SD Card out and stick it in your friend's C64, it may look like myfile.txt.seq of CBM file type PRG! Why? Because the interpretation depends on the local settings of that SD2IEC device, and they could be set differently. You may have file extension hiding turned on, but your friend may have that setting turned off.
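The your-SD2IEC versus your-friend's-SD2IEC problem can be modelled in a few lines of Python. This is only a sketch of the behavior as described above, not the sd2iec firmware's actual logic, and it ignores any translation of the name itself. The function name and its parameters are hypothetical.

```python
CBM_EXTENSIONS = {"prg", "seq", "usr", "rel"}

def present_file(fat_name, hide_extensions):
    """Return (name shown to the C64, CBM file type) for a file on the card."""
    if hide_extensions and "." in fat_name:
        base, ext = fat_name.rsplit(".", 1)
        # Only the 4 main CBM types count, matched case-insensitively.
        if base and ext.lower() in CBM_EXTENSIONS:
            return base, ext.upper()
    # Setting off, or no recognized extension: show the name as-is, default to PRG.
    return fat_name, "PRG"

mine = present_file("myfile.txt.seq", hide_extensions=True)
friends = present_file("myfile.txt.seq", hide_extensions=False)
```

Same card, same file, two different answers, purely because of a setting stored on the device rather than on the card.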

Even if you're not going through the medium of a PC, even if you're just moving a file from one C64 to another by means of an SD Card, this issue could pop up. Simply put, this is a complication that does not exist when you pull a 1541 diskette out of your disk drive, and shove it into the 1541 disk drive of the guy sitting next to you at the swap meet.

This is the essential problem of CBM file types. We'll return to these a bit later.

C64 Filenames

Before introducing additional settings available on the SD2IEC, let's talk about filenames and the issues they raise.

The fundamental difference between the C64 and a PC or Mac, when it comes to sharing anything textual, is PETSCII vs. ASCII. PETSCII and ASCII partially overlap, and partially don't. True ASCII is only 7-bit. (This is sometimes referred to explicitly as 7-bit ASCII.) There are only 128 defined values and they fit in the lower half of an 8-bit number. PETSCII on the other hand has always used all 8 bits. Numbers and symbols (32-63) are the same in both character sets. Lowercase PETSCII (65-90) corresponds with uppercase ASCII. Lowercase ASCII (97-122) is a range that in PETSCII is undefined. Meanwhile, uppercase PETSCII is in the upper half of the 8-bit range (193-218), and this area is not only not defined by ASCII but it's not even considered to be ASCII.⁶

So, we have a problem. If you create a file on your PC and name it myfile.txt in lowercase ASCII, those byte values on the C64 will not correspond to anything in PETSCII. But, surely these are the most common characters in filenames created on a PC or Mac. We have a similar problem when naming files on the C64, though it's not quite as severe. Surely most filenames will be mostly lowercase PETSCII. (That is you type the filename using letters A to Z without employing the SHIFT key. These will actually appear as uppercase glyphs when the C64 is in uppercase/graphics mode, but when you're in uppercase/lowercase mode they will appear in lowercase.) These will at least correspond with uppercase ASCII, and so at the very least they would appear meaningfully in ASCII. But if we name a file on the C64 and include an uppercase letter, it will fall outside the range of ASCII on the PC.

To handle this fundamental existential incompatibility, SD2IEC automatically and transparently translates filenames between PETSCII and ASCII. That is the rule, but as we will see, it gets a fair bit more complicated. Filenames sent from the C64 to SD2IEC are translated from PETSCII to ASCII. And filenames read from the SD Card by SD2IEC are translated from ASCII to PETSCII before being sent to the C64. As far as I can tell, this translation only applies to the code blocks that contain the alphabetic characters. That means, when translating PETSCII to ASCII:

Any character in block 3 (lowercase PETSCII) is translated to block 4 (lowercase ASCII)
Any character in block 7 (uppercase PETSCII) is translated to block 3 (uppercase ASCII)

When converting the other way, from ASCII to PETSCII, it's exactly the reverse.

Any character in block 4 (lowercase ASCII) is translated to block 3 (lowercase PETSCII)
Any character in block 3 (uppercase ASCII) is translated to block 7 (uppercase PETSCII)

Every other character is left untranslated. (The results of this, we'll discuss later.) For a clearer understanding of what's going on here, I recommend using the post, Commodore 64 PETSCII Codes, as reference, and comparing it to asciitable.com.
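To make the translation rule concrete, here is a minimal Python sketch of it. It is not taken from the sd2iec firmware. The function names are made up for illustration, and it assumes that only the letter ranges (65-90, 97-122 and 193-218) are moved, which is my reading of "the code blocks that contain the alphabetic characters". Whether the non-letter characters in those blocks also move is not something this sketch claims to know.

```python
def petscii_to_ascii(data: bytes) -> bytes:
    """Translate PETSCII filename bytes to ASCII (letters only, by assumption)."""
    out = bytearray()
    for b in data:
        if 65 <= b <= 90:          # lowercase PETSCII -> lowercase ASCII
            b += 32
        elif 193 <= b <= 218:      # uppercase PETSCII -> uppercase ASCII
            b -= 128
        out.append(b)              # everything else is left untranslated
    return bytes(out)


def ascii_to_petscii(data: bytes) -> bytes:
    """Translate ASCII filename bytes to PETSCII (exactly the reverse)."""
    out = bytearray()
    for b in data:
        if 97 <= b <= 122:         # lowercase ASCII -> lowercase PETSCII
            b -= 32
        elif 65 <= b <= 90:        # uppercase ASCII -> uppercase PETSCII
            b += 128
        out.append(b)
    return bytes(out)


# Round trip a mixed-case name as it would be typed on a PC.
pc_name = b"MyFile.txt"
c64_name = ascii_to_petscii(pc_name)
print(list(c64_name))
print(petscii_to_ascii(c64_name) == pc_name)
```

Digits and the dot pass straight through, because they sit at the same values in both character sets.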

8.3 filenames, 16 character filenames, and Long Filenames

As discussed earlier, FAT16 under MS-DOS only supports 8 character filenames with a 3 character extension. CBM FS, on the other hand, supports 16 character filenames with no explicit concept of a data type extension (don't confuse data type and file type). Meanwhile, FAT16 with the VFAT layer introduced by Windows95 supports filenames much longer than just 16 characters. The interaction of these three facts leads to complex and confusing behavior.

In Windows95 (and all subsequent versions of Windows), if a filename happens to conform to the limitations of 8.3, then the filename will be written using the 8.3 data structures rather than using VFAT long filenames. The reason for this is sensible. If you call your file myfile.txt in Windows95, it will look perfectly normal and be perfectly legible by MS-DOS. That's good. The SD2IEC on the C64 does the same thing. If you name a file myfile.txt on your C64, (the text will be translated from PETSCII to ASCII) and then it will be found to conform to the 8.3 limits and will be written to the 8.3 data structures only. So far so good.

There are two questions, what are the 8.3 limitations? And what happens if we name a file that violates those limitations? The obvious limitations are A) no more than 8 characters before the dot (.) and B) no more than 3 characters after the dot. But there are other limitations, for example, the 8 character filename may not itself contain a dot, nor may it contain a space. And there are a few other illegal characters too. All of these limits can easily be, and frequently are, broken by C64 filenaming standards. We frequently have filenames with more than one dot. Or filenames with spaces. Or filenames that are longer than 8 characters. In any of these cases, the VFAT solution kicks into gear, and our C64 filename is (translated from PETSCII to ASCII first, and then) written into the VFAT long filename.
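The 8.3 limits listed above can be sketched as a small Python check. This is only an illustration, not the test SD2IEC or Windows actually runs. The function name is hypothetical, and the set of illegal characters is an assumption based on the usual FAT short-name restrictions, since the text above only says there are "a few other" illegal characters. Case is deliberately ignored here.

```python
# Characters assumed to be illegal in an 8.3 short name. The post only says
# "a few other illegal characters", so this exact set is an assumption.
ILLEGAL_CHARS = set(' "*+,/:;<=>?[\\]|')


def conforms_to_8_3(name: str) -> bool:
    """Return True if name fits the 8.3 limits, ignoring letter case."""
    if name.count(".") > 1:
        return False                      # a second dot breaks 8.3
    base, _, ext = name.partition(".")
    if not 1 <= len(base) <= 8:
        return False                      # at most 8 characters before the dot
    if len(ext) > 3:
        return False                      # at most 3 characters after the dot
    return not any(ch in ILLEGAL_CHARS for ch in name)


for candidate in ("myfile.txt", "my file.txt", "test.txt.seq", "chess"):
    print(candidate, conforms_to_8_3(candidate))
```

Any name that fails this check is the kind that ends up in a VFAT long filename.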

On Windows or a Mac, whenever a long filename is created a corresponding (but mangled) 8.3 short filename is also created for backwards compatibility with MS-DOS. This happens too on our C64. SD2IEC will create a long filename to hold your C64 filename, and it will autogenerate an 8.3-conformant short filename. Okay, so THAT also seems to make pretty good sense. This is not so hard, right?

What happens now, if we create a file on the SD Card using a PC or Mac? These computers don't have the 16-character CBM FS limitations. If we make a file on a PC and we give it a 20 character filename, for instance, what happens when we view the file on our C64? Because the 20 character filename violates the CBM FS limitation of 16-characters, SD2IEC presents us a directory with the mangled, 8.3 short filename that the PC autogenerated! This does make sense, but it's complicated. The best way to name a file on the PC for it to look good and meaningful on the C64 is to constrain yourself to only using a maximum of 16 characters.

But here's something unexpected. You can name a file using your C64 that is longer than 16 characters. The full long filename will actually be preserved by SD2IEC into the VFAT long filename, the mangled 8.3 shortname will be autogenerated too, but when you list the directory on the C64 it will show you the 8.3 shortname. You can also rename a file to a name longer than 16 characters. But you can never see that filename because the directory listing will always show you the 8.3 shortname.

Renaming a file to have a filename longer than 20 characters.

How is it possible to write a filename longer than 16 characters? Well, it's pretty easy. BASIC doesn't have any clue about DOS syntax, it's all just a generic BASIC string that is sent to the device to be interpreted. So you can do this:

open2,8,2,"this is a very long filename,s,w"

And bingo bango, it will create a filename with 28 characters. When you stick the SD Card back into your PC, you'll see the full 28 character filename.

Case Sensitivity, Insensitivity and Preservation

There is another limitation to 8.3 filenames we haven't talked about yet. This is the issue of case sensitivity. In ordinary FAT16 all filenames are technically uppercase only. MS-DOS, for example, lists filenames in directories in all uppercase. Usually when creating a file, say, in MS-DOS, you can type the filename with mixed case, but it will be translated to all uppercase before being written to the file system. (By the way, if you want to just read about how VFAT interacts with FAT16 and how the 8.3 names are limited and how long filenames get mangled by Windows into shortnames, read the Wikipedia article 8.3 filename. It's not that long and it's written in very accessible language.)

This leads to another problem on both PC or Mac and on the C64. I said above that if a file is named such that it happens to conform to 8.3 that the 8.3 data structures will be used, and no long filename will be used. That's true, but we didn't talk about case. On a PC or a Mac if you name a file, MyFile.txt, although it only has 6.3 characters, it has mixed case that cannot be preserved by the 8.3 format. So the PC will create a (so called) long filename, even though it's not very long, just so that it can preserve the case. The 8.3 shortname will become MYFILE.TXT. MS-DOS will see it as MYFILE.TXT. On the C64 then, SD2IEC sees that the file has a long filename and it doesn't exceed 16-characters, so it gives you the long filename, which preserves the case (and it translates from ASCII to PETSCII), such that on your C64 you'll see the file named MyFile.txt. That's great!

But, there is a bewildering asymmetry here. If you create a file using the C64, and the filename conforms to 8.3 in every way except for case, no VFAT long filename will be produced, and the variation in case will not be preserved. In other words, if you name a file on the C64 MyFile.txt, it only has 6.3 characters, it doesn't include spaces or extra dots or other illegal characters, but it does have mixed case. The mixed case doesn't matter, only the 8.3 filename will be written to the FAT16 file system, and effectively the mixed case you used on the C64 will be lost. In this situation, the behavior of SD2IEC and the behavior of Windows or macOS when creating files is simply different.
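The asymmetry can be put into a few lines of Python. This is a sketch of the behavior as I've described it, not of any real implementation. The helper names and the "pc"/"sd2iec" labels are hypothetical, the 8.3 check is simplified to length, dots and spaces, and names are written in ASCII for readability.

```python
def fits_8_3(name: str) -> bool:
    """Simplified 8.3 check that ignores case (length, one dot, no space)."""
    base, _, ext = name.partition(".")
    return (name.count(".") <= 1 and 1 <= len(base) <= 8
            and len(ext) <= 3 and " " not in name)


def stored_name(name: str, creator: str) -> str:
    """Return the name that survives on disk; creator is "pc" or "sd2iec"."""
    if creator not in ("pc", "sd2iec"):
        raise ValueError("creator must be 'pc' or 'sd2iec'")
    if creator == "pc":
        # Windows/macOS: mixed case counts as a violation of 8.3,
        # so only an all-uppercase conforming name stays short-only.
        short_only = fits_8_3(name) and name == name.upper()
    else:
        # SD2IEC: case is ignored when deciding, so the case is lost.
        short_only = fits_8_3(name)
    return name.upper() if short_only else name


print(stored_name("MyFile.txt", "pc"))
print(stored_name("MyFile.txt", "sd2iec"))
print(stored_name("My File.txt", "sd2iec"))
```

The only difference between the two branches is whether case takes part in the decision, and that one difference is the whole asymmetry.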

App Launcher showing short filenames without mixed case.

This was the first time I noticed SD2IEC has filenaming issues that need to be taken into account. On the Desktop above, "gallery", "peek", "memory", "chess" and "broken" all have a capital first letter on my CMD HD. But because they're less than 8 characters and have no spaces, they get saved in the 8.3 format only, and lose their case on SD2IEC.

Here's the next exception I've discovered while trying to figure out how all this stuff works. When a file in the FAT16 file system has only the 8.3 shortname, that name is in all uppercase. And when viewed in the directory on Windows or macOS, it displays as all uppercase. You can reliably create a file that has no VFAT long filename in a few ways:

Create the file using MS-DOS on a PC.
Create a file and name it conforming 100% (including case) to the 8.3 limits on a Mac or a PC running Windows.
Create a file and name it conforming to 8.3 (but ignoring case) using a C64 and SD2IEC.

Now that you've got your file with shortname only, stick it in your PC or Mac and it shows the filename in all uppercase, just as the specification says. The problem is that if uppercase ASCII gets translated to uppercase PETSCII then in the default C64 mode (uppercase/graphics) it would show in all graphics characters. So, because the file is all in one case anyway, SD2IEC makes an exception for these files, and translates them from all uppercase ASCII to all lowercase PETSCII.
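Putting the listing rules together, here is a rough Python sketch of which name shows up in the C64 directory. The function name is hypothetical and the strings are ASCII stand-ins, so a lowercase result here stands for lowercase PETSCII. One loud assumption: when a long filename is too long and SD2IEC falls back to the shortname, I assume that shortname gets the same lowercase treatment as a file that only has a shortname. The shortname used in the example is invented for illustration.

```python
from typing import Optional


def listed_name(short_name: str, long_name: Optional[str]) -> str:
    """Pick the name shown in the C64 directory listing."""
    if long_name is not None and len(long_name) <= 16:
        return long_name               # long filename fits, case is preserved
    # Shortname only, or long filename over 16 characters:
    # all-uppercase ASCII is shown as all-lowercase PETSCII.
    return short_name.lower()


print(listed_name("MYFILE.TXT", None))
print(listed_name("MYFILE.TXT", "MyFile.txt"))
# "THISIS~1" is a made-up shortname, not necessarily what gets generated.
print(listed_name("THISIS~1", "this is a very long filename"))
```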

Holy mackerel, how anyone can figure out and keep straight all the rules is something of a minor mystery. The only good thing is that it does make a modicum of sense, from the perspective of the C64. What would make the most sense would be for SD2IEC to do exactly what Windows and macOS do. If the case is mixed, that's a violation of 8.3, so put the mixed-case filename into the VFAT long filename structures. But, it doesn't do that. It just creates an all uppercase 8.3 shortname, then translates it to all lowercase for our reading convenience.

That small difference aside, the reason it still makes sense to make an exception in how the filename is translated, is because you might also get files that were originally created on a PC that are 8.3 for some other reason. On the C64 lowercase PETSCII is way way way more common than uppercase PETSCII, because when you turn a C64 on it defaults to uppercase/graphics mode. So all lowercase PETSCII displays in uppercase glyphs and most people therefore just type away and never use the SHIFT key. All BASIC programs, for example, are in lowercase PETSCII only (except for the contents of strings, which is technically data not program.) So in MS-DOS it might have made sense that filenames were in uppercase, but it doesn't make sense to translate them to uppercase PETSCII. Otherwise, most people, most of the time, under default startup conditions, would see a bunch of graphic symbols instead of letters.

There's another twist though. FAT16 and VFAT are case-insensitive. Wait, what??! What does it even mean to say they are case-insensitive, when we have just learned that 8.3 in FAT16 is all uppercase and that VFAT preserves case? Here's what it means. Even though VFAT preserves the case that you typed in, you cannot have two files in the same directory whose names are the same except for a difference in case. In other words these three files:

MyFile.txt
MYfile.txt
myFile.TXT

are considered to be all the same name. You can only have one of these in a directory at a time. And when you specify myfile.txt and try to open it, it will match and open a file named MyFile.txt. And the same goes for FAT16 8.3 names. If you name (on your C64 with SD2IEC) a file MyFile.txt and it gets written to the file system as MYFILE.TXT and it gets listed in the directory as myfile.txt but you try to open MYfile.TXT, it's still going to find and open that same file.

This represents another fundamental incompatibility between FAT16 and CBM FS. CBM FS is fully case sensitive. On a 1541 or a CMD HD, for example, you can have two files in the same directory that have the same filename but which differ only by case. And when you open a file, you must specify the filename in precisely the correct case. Fortunately, it is rare (and perhaps bad practice) to have different files which are distinguishable from one another by case alone. It does have a few odd side effects though. For example, if you see a file named "mY filE.TXT" and your OCD hates that and all you want to do is normalize the case already, you try this:

@r:my file.txt=mY filE.TXT

This works on CBM FS, but on SD2IEC it does not work. It results in the error:

63, file exists

You have to rename the file to some temporary alternative name first, then rename it back with the case you want it to have.
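Here is a toy Python model of why the one-step rename fails and the two-step rename works. It is a sketch of a case-insensitive directory, not of SD2IEC itself. The class name and the abbreviated status strings are made up for illustration, and the model ignores the 8.3 rules entirely.

```python
class FatLikeDirectory:
    """Toy directory that matches names without regard to case."""

    def __init__(self, names):
        self.names = list(names)

    def _find(self, name):
        # Case-insensitive lookup, as on FAT16/VFAT.
        for existing in self.names:
            if existing.lower() == name.lower():
                return existing
        return None

    def rename(self, new, old):
        """Mimic @r:new=old and return an abbreviated status string."""
        current = self._find(old)
        if current is None:
            return "62, file not found"
        if self._find(new) is not None:
            # The new name matches the old file itself, ignoring case.
            return "63, file exists"
        self.names[self.names.index(current)] = new
        return "00, ok"


d = FatLikeDirectory(["mY filE.TXT"])
print(d.rename("my file.txt", "mY filE.TXT"))   # direct attempt
print(d.rename("temp", "mY filE.TXT"))          # step 1: temporary name
print(d.rename("my file.txt", "temp"))          # step 2: the case you want
print(d.names)
```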

If you don't know about everything that I'm telling you here, what you experience while using your SD2IEC may be totally inscrutable. For example, you create a file, MyFile.txt. It seems to work, you don't get any errors. Later, you try to open the file MyFile.txt, and that works too. No problem. But when you list the directory you see myfile.txt. Wat?!⁷

The experience can get even less scrutable though. You create a file, My File.txt. Then you list the directory and it shows My File.txt. That's absolutely perfect. Then you rename the file to File.txt, list the directory and suddenly it's file.txt. And it's like, hey, what the hell happened to my capital "F" !? Well, now you know. When you had a space, the space doesn't conform to 8.3, so the filename is stored in the VFAT long filename, which preserves the case. But when you renamed it, the new name does conform to 8.3, and so it stops using VFAT and that blows away the case! (See following screenshot, I'm not making this up.)

Renaming a file from non 8.3 to 8.3 conformant, case is lost.

Magic to make the sanest man go mad. Homer, The Iliad. — 762 B.C.E. (Sorry, not Star Trek: Discovery.)

Compatibility Settings

I've started by explaining what happens with filename translations, long and short names, PETSCII v. ASCII, case-sensitivity, and touched on CBM file type handling without saying much about the available compatibility settings. You can think of what I've described above as SD2IEC's raw behavior. It is useful to know how it works, so that you can predict the outcomes of your actions. But the rules seem quite complex. Must we always deal with this complexity? In a word, no. There are options available. I would say that this raw mode is most useful when your goal is not to have maximum C64 compatibility and familiarity, but when your goal is to have more generic, more constrained filenames that you can readily exchange with a PC or Mac with the minimum of pain.

In essence, when you go into this raw mode, the SD2IEC behaves much more like the file system of a real PC. The automatic PETSCII/ASCII translation eases that particular burden, but all the other behaviors, constraints and limitations are the sorts of things that PC users, straddling the transition from MS-DOS to Windows95, dealt with as a matter of course. In its raw mode, SD2IEC becomes much more a citizen of the PC world, warts and all.

Let's turn now to the compatibility options, how they get applied, when they get overridden, and how they change the raw behavior to be more like CBM FS.

File Extension Hiding

Despite the thorny rules for filenames in raw mode, the preciseness and predictability of a file's name is only relevant when you have a complex multi-file program, where auxiliary files are going to be referenced programmatically, by name. C64 OS is, of course, just such a complex multi-file system. And that's what led me down this rabbit hole in the first place. So filename precision is important, but it isn't always important. Sometimes the files are just a bunch of standalone files that you'll reference manually. And if the filenames get mangled a little bit you just deal with it using your eyes and your brain.

Games are another common example of multi-file programs where the filenames really matter, but most of the time games are distributed inside disk images. And since the disk image preserves everything about the CBM file system this is not an issue.

In the simple case, you've got a bunch of standalone files. Perhaps their names get mangled a little bit, like, maybe on the PC they used more than 16-characters, so you've gotta deal with the 8.3 autogenerated shortnames. But they have names, and you can work with them. The first configurable option we have is file extension hiding, which we briefly touched on above, but we'll now explore a bit deeper. Using file extensions is the least invasive way to indicate the CBM file type.

If you turn file extension hiding off (maximum rawness), all normal files end up with the PRG file type. JPEGs are PRGs, plain TXT files are PRGs, WAV files and MP3s are PRGs, HTML and XML files are PRGs, PDFs are PRGs, everything just defaults to a PRG, regardless of its filename. But if you turn file extension hiding on, all of a sudden SEQ, USR and REL file types all become possible through the very lightweight intervention of four little characters stuck on the end of the filename of an otherwise unmodified file.

Imagine you have a text editor on your C64. And imagine that it will only open files whose CBM file type is SEQ. The least invasive way to make it have a SEQ file type is to give the filename a .seq extension, and turn on file extension hiding. All other rules about filenames, constraints and limitations still apply. The file becomes slightly less a citizen of the PC world though. Because if you've configured your PC to open all .txt files in Notepad++ (or whatever), this file no longer has a .txt extension, it has a .txt.seq extension. Nonetheless, the extra extension is still the least invasive. If you use your PC's text editor to manually open the file, it's still just a text file.

This is not the case, for example, if you embed a file inside a .D64 image. Your C64 can access a text file inside the image, and the file can be given the SEQ file type, but all the PC sees is a .D64 file. You cannot open a text file with a PC text editor if the file is buried inside a .D64. That's what I mean when I say the extension is lightweight and non-invasive.

Given everything we know about the filenaming constraints though, hidden file type extensions are not entirely inconsequential. If you have file extension hiding turned off, and you create a file called MyFile.txt.seq, that name does not conform to 8.3. Therefore the filename is stored in a VFAT long filename, and the case is preserved. If you now turn file extension hiding on, and list a directory, you'll see a file named MyFile.txt of SEQ file type. It's preserved the case! But you created it on your C64 and it looks like it fits into 8.3. If you're not paying attention, it looks like this file contradicts what I said earlier about how 8.3 conforming filenames lose their mixed case. When file extension hiding is turned on, it masks certain features of the actual filename. So it may look in the C64 directory like it conforms to 8.3, but in fact, it has an extra extension which forces it into a VFAT longname.

Another consequence. Imagine you created a file on your PC, and you name it, My C64 Rocks.txt.seq. Back on SD2IEC, with file extension hiding turned on, you list a directory and you see, My C64 Rocks.txt of file type SEQ. The filename is exactly 16 characters. If you turn file extension hiding off, all of a sudden with the extra .seq the filename is now 20 characters long. That's too long for the C64, so it lists the mangled, autogenerated 8.3 shortname. Again, it makes sense why this happens, but it adds to the complexity, and before you've figured it out, it can really make you scratch your head.
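Both consequences fall out of a very small model. The Python below is a sketch under assumptions. The function name is hypothetical, names are ASCII stand-ins, the shortname in the example is invented, and the fallback to the shortname is simplified (it doesn't try to work out what happens to the shortname's own extension when hiding is on).

```python
CBM_TYPES = ("prg", "seq", "usr", "rel")


def directory_entry(long_name: str, short_name: str, hiding: bool):
    """Return (name, cbm_type) roughly as the C64 directory would list it."""
    name, cbm_type = long_name, "PRG"          # PRG is the default type
    base, dot, ext = long_name.rpartition(".")
    if hiding and dot and base and ext.lower() in CBM_TYPES:
        name, cbm_type = base, ext.upper()     # hide the extension, use it as type
    if len(name) > 16:
        name = short_name.lower()              # too long: fall back to the shortname
    return name, cbm_type


# "MYC64R~1.SEQ" is a made-up shortname for illustration.
print(directory_entry("My C64 Rocks.txt.seq", "MYC64R~1.SEQ", hiding=True))
print(directory_entry("My C64 Rocks.txt.seq", "MYC64R~1.SEQ", hiding=False))
```

The point is that the 16 character test is applied to the visible name, so flipping one setting can push a name over the limit.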

Writing Files vs. Reading Files

Before moving on to the next compatibility setting, we need to talk about the difference between reading and writing files.

File extension hiding is all about reading files. It's about what to do with a file that already exists; how to interpret it; how to present it to the user; how to allow the user to open or access it. And this feature is independent (mostly) of the behavior you want when creating a file.

If you have file extension hiding turned on, and a file has come from a Mac or PC with a .seq extension, like hello.txt.seq, then it will be listed in the C64 directory as hello.txt with file type SEQ. That much we already knew. But what happens if we create a new file on the C64, and try to make it with the type SEQ. Like this:

open2,10,2,"world.txt,s,w":print#2,"hello world":close2

The file extension hiding has nothing to do with the creation of this file. This file will be created as world.txt with PRG file type, because there is nothing to retain the SEQ file type. In other words, file extension hiding will hide a file extension (prg/seq/usr/rel) if there is one, but it isn't responsible for creating file type extensions. You can happily have two files side-by-side, hello.txt and world.txt where you cannot immediately discern which has an underlying hidden file extension and which does not.

This actually leads to more bewildering behaviors. Especially if you're not aware of how your device is configured and you're not familiar with the options. If you have file extension hiding turned on, the behavior of renaming a file will be different depending on whether the file already has one of these hidden file extensions. Get ready for this ride, it's about to get rough.

Total weirdness when renaming files on SD2IEC

It isn't just a Mac or PC that can create a file with a .seq extension and have that extension be hidden and interpreted as its CBM file type. You can rename a file on SD2IEC itself, and have that name get reinterpreted as an extension that should be hidden. In the first image, on the left, we have a file named test.txt of type PRG. Its type PRG could be because the file has a .PRG extension in the VFAT file system, or it could be defaulting to PRG because it has no hidden extension. Ignoring that, we rename the file from test.txt to test.txt.seq.

@r:test.txt.seq=test.txt

List the directory and notice the filename has not been changed. The underlying filename got changed, but it is now reinterpreted as having a hidden file extension. So the filename is still listed as test.txt but it's now listing as SEQ type. First take a moment to recognize how strange this is. If you performed these steps on a 1541 or a CMD HD, this would not happen. The filename would change to be test.txt.seq and the file type would remain PRG. So, this is another kind of incompatibility. But bear in mind, it is possible for you to see a file named test.txt.seq that has a type of PRG, even when file extension hiding is turned on. The underlying filename just has to have another extension, test.txt.seq.prg for example. SD2IEC could recognize that the file is currently PRG, and when you try to rename it to have .seq come at the end of the name, it could surreptitiously add an additional .prg extension in order to retain the PRG file type. It doesn't do that, because, as I said, file extension hiding has nothing to do with how files are renamed just how existing files are matched and presented.

Now we move on to more craziness. In the image on the right, above, we have a file named test.txt. We can't actually see that it has a hidden file extension, other than by guessing because its CBM file type is SEQ. Now attempting to do the same thing we did a moment ago, which is, rename the file to something that is a totally legitimate CBM FS filename, but that happens to collide with a different type of hideable file extension.

@r:test.txt.rel=test.txt

If you'll notice, the form of the rename command is precisely the same as the first rename command. Should we expect the filename to remain test.txt as in the first case, but change to a REL file type? Should we expect it to change its name to test.txt.rel and retain its SEQ file type, because it already has a hidden .seq file extension? No, hell no. It does something radically unexpected. It stays as SEQ type, but the name gets a S00 extension which is part of a totally different type of compatibility schema that we haven't even gotten to yet. But what's more, the S00 extensions are normally always hidden, and yet this one suddenly appears out of nowhere.

I have to believe that this is just a bug in the sd2iec firmware. I'm not sure where one is supposed to report sd2iec firmware bugs, but this has got to be just a plain jane bug.

Creating a file with a hideable file extension

Of course, this behavior isn't limited to renaming files. It also happens when creating a new file. Here, we create a file that happens to have .usr at the end of the filename, and we explicitly try to create it as PRG type (by means of the ",p" at the end of the DOS command) but the ,p is totally ignored. The filename gets .usr added to it, which is promptly reinterpreted as a hideable file extension, and the file comes back in the directory as test with the USR file type.

But you must bear in mind, these behavioral oddities are not the end of the story. This is what we get when none of the other compatibility options are enabled. We are still operating mostly in a very raw mode, that can be considered reserved for optimum ease of moving files back and forth between your C64 and PC or Mac. So we suffer some C64 incompatibilities to gain compatibility with the foreign world of a PC.

File Extension Modes 3 and 4

The file extension mode is a bit of a misnomer, in my opinion. SD2IEC has a compatibility option that you can set to configure how it will create files. It is important to note that this setting has no influence on how it reads files that already exist, only how files will be created.

There are 5 modes, numbered 0 through 4.

Mode 0 is the maximum rawness mode we've discussed up until now. Modes 1 and 2 we'll discuss in the next section. Modes 3 and 4 tell SD2IEC that it should automatically apply a file type extension to files that it creates. The difference between 3 and 4 is slight. A totally raw file, that is, a file with no indication of what file type it should be, is defaulted to type PRG. And this is because PRG is the most common file type the typical C64 user encounters. Therefore, if a file is supposed to be PRG then you don't really need a special indicator for that. What you need is a special indicator for exceptions to the default, for SEQ, USR or REL. And this is what extension mode 3 does.

In extension mode 3, if you create a file like this:

open2,10,2,"Hello World.txt,s,w":print#2,"hello world":close2

The underlying filename (the VFAT longname to be precise, because this name does not conform to 8.3) will be Hello World.txt.SEQ. That's how the file will be created. How will it be displayed to you? Well, that has absolutely nothing to do with the File Extension Mode you're in. How you'll see this file depends entirely on the File Extension Hiding mode that you're in, right? Just as we've been discussing all along. If File Extension Hiding is turned on, then this new file will be presented to you as Hello World.txt with file type SEQ. But if file extension hiding is turned off, even though you're in File Extension Mode 3 when you create a new file, and it gets the .SEQ extra extension, you'll read the file back as Hello World.txt.SEQ as type PRG. Because the creation of files with the extra file type extension and the interpretation of those extra file type extensions are independent settings.

Almost independent. They are in fact independent settings but with one exception. When you change to file extension mode 3 or 4, the file extension hiding setting is automatically enabled. That makes sense, because if you're going to create files with those extensions you'd think you'd want to interpret those extensions when reading the files back in. But the settings are actually independent. You can set file extension mode 3, and then immediately turn file extension hiding off, and you'll still be in file extension mode 3.

File extension mode 4 is the same as 3, but will automatically give the extra .PRG file type extension to PRG type files you create too.

How this interacts with 8.3 and VFAT long filenames gets so nasty so fast. The conformance with 8.3 always depends on the full underlying filename. So, when you're in file extension mode 3, and you create a SEQ file, SD2IEC will add a .SEQ extension. If the filename you give it plus the .SEQ extension still conform to 8.3, your file will lose its ability to maintain case. For example, you're in file extension mode 3. You create a SEQ file called MyFile. The underlying file will be MyFile.SEQ, but that conforms to 8.3 (in all ways but case) so the underlying file will be named MYFILE.SEQ, and you'll see it as myfile with type SEQ, but the case you used is lost.

If on the other hand you create a file of type SEQ and you name it MyFile.txt, the underlying file is automatically named MyFile.txt.SEQ which violates 8.3 and what you read back is MyFile.txt of type SEQ, and the case you selected is preserved.

Now you have to think about what happens in file extension mode 4. It's similar, but it creates the file extension for PRG files too. This will have implications for how case is maintained.
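If you'd rather not think it through by hand, here is a small Python sketch that combines the extension rule for modes 3 and 4 with the case rule. It only models modes 0, 3 and 4 as described in this section. The function names are hypothetical, the 8.3 check is simplified to length, dots and spaces, and names are ASCII stand-ins.

```python
def fits_8_3(name: str) -> bool:
    """Simplified 8.3 check that ignores case (length, one dot, no space)."""
    base, _, ext = name.partition(".")
    return (name.count(".") <= 1 and 1 <= len(base) <= 8
            and len(ext) <= 3 and " " not in name)


def underlying_name(name: str, cbm_type: str, mode: int) -> str:
    """Name written to the SD Card when a file is created on the C64."""
    if mode not in (0, 3, 4):
        raise ValueError("only modes 0, 3 and 4 are modelled here")
    cbm_type = cbm_type.upper()
    # Mode 3 tags everything except PRG, mode 4 tags PRG as well.
    if mode == 4 or (mode == 3 and cbm_type != "PRG"):
        name = f"{name}.{cbm_type}"
    # A name that conforms to 8.3 (ignoring case) is stored uppercase only.
    return name.upper() if fits_8_3(name) else name


for mode in (3, 4):
    for ftype in ("seq", "prg"):
        print(mode, ftype, underlying_name("MyFile.txt", ftype, mode))
```

In this model a PRG file named MyFile.txt loses its case in mode 3, but keeps it in mode 4, because the extra .PRG pushes the name out of 8.3.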

The Problem of File Type Extensions

I already mentioned one problem before: file type extensions are not 100% non-invasive. You're in file extension mode 3 and you open a text editor on your C64. You type some text and save the file. You type in Hello World.txt for the filename and that all works as expected. But when you move the SD Card back to a PC, you don't just have a file called Hello World.txt. You have a file with an extra extension that the PC can't interpret.

The problem actually goes a step beyond this, though, in a way that actually matters. You might be thinking:

"Yeah yeah, okay, so Hello World.txt looks like Hello World.txt.SEQ on my PC, big deal. The contents of Hello World.txt.SEQ are all in PETSCII anyway, so I wouldn't be able to edit it in Notepad++ even if the extension still was .txt."
(Some Fictional Commodore 64 User)

That's true. But what happens when you use the PC to pass the file on to another C64 user? You put the file on the web, or on an FTP server, or on a BBS, or in an email. Or you shoot it over to the C64 using PCLink and an IDE64. Or, you include that file in a zip file and you shoot the zip file to the C64 in any of countless ways, and then you unzip the file using one of several unzipping programs on the C64. What happens then?

You download that file from a BBS to a CMD RamLink... and the file has a .SEQ extension! Why? Because only SD2IEC actually supports hiding file type extensions. In other words, it isn't just that you shoot the file over to a PC and the filename has been mangled on that PC, but the filename will be mangled on other C64s if you just pass the file through.

There is no good solution for this. It's not SD2IEC's fault. If you remove the .SEQ ending manually on the PC, then put the file on an FTP server and download the file with a C64 FTP client to a 1541 disk drive, you might get the right filename, but you've got the problem all over again of what CBM file type to assign to it. Only this time, it's up to your FTP client to do the right thing. Either interpret the data type extension (.txt, that's sequential data right? Let's save it with SEQ type.) or ask you explicitly to choose a file type for each file like Errol Smith's Unzip64 does.

File Extension Modes 1 and 2

Finally, we arrive at file extension modes 1 and 2.

PC64 was one of the very first C64 emulators for the PC. The original PC64 ran on MS-DOS on a 486 PC. It is no longer under development, and it has many flaws and shortcomings. However, one thing that they figured out right away is that the CBM file system and FAT16 were very different. PC64 needed a way to work with proper C64 filenames and file types inside the limitations of FAT16. And those limitations were extreme, because this was even before the development of Virtual FAT which didn't arrive until Windows95.

PC64 came up with a solution that I think is very clever. PC64 files are prepended with a 26-byte header. The first 8 bytes are a null-terminated string "C64File", in ASCII, which helps identify it as a PC64 file. The following 17 bytes are the original C64 filename in PETSCII, null-terminated. That's up to 16 bytes for the filename characters and one byte for the null terminator. Unlike how $a0 pads the end of the filename on CBM or CMD file systems, the filename in this header is instead padded with $00. That's so that if the name is shorter than 16 characters, it is still null-terminated. Lastly, there is one more header byte which is only used if the file type is REL. It's the record length. If the file type is anything but REL this byte will just be $00.

And that's it. The rest of the file data is the C64 file's original unmodified data stream. (REL files are complicated, let's ignore them for the sake of brevity. If you want to know more about how they are handled, you can read the full specification of this file wrapper format here.)
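
To make the layout concrete, here is a minimal Python sketch that splits a PC64 file into its header fields and its payload, using only the byte counts given above. The names parse_pc64, PC64_MAGIC and PC64_HEADER_SIZE are my own, chosen for illustration, and the C64 filename is left as raw PETSCII bytes rather than being translated.

```python
import struct

PC64_MAGIC = b"C64File\x00"   # 8 bytes: ASCII signature plus its null terminator
PC64_HEADER_SIZE = 26          # 8 (signature) + 17 (name) + 1 (REL record length)


def parse_pc64(blob):
    """Split a PC64 file into (PETSCII name bytes, record length, payload)."""
    if len(blob) < PC64_HEADER_SIZE or blob[:8] != PC64_MAGIC:
        raise ValueError("not a PC64 file")
    _magic, raw_name, record_length = struct.unpack("<8s17sB", blob[:PC64_HEADER_SIZE])
    name = raw_name.split(b"\x00", 1)[0]   # drop the $00 padding
    return name, record_length, blob[PC64_HEADER_SIZE:]


# A hand-built sample: header followed by four bytes of file data.
sample = PC64_MAGIC + b"ASTEROIDS B".ljust(17, b"\x00") + b"\x00" + b"\x01\x08\x0b\x08"
name, record_length, payload = parse_pc64(sample)
print(name, record_length, len(payload))   # b'ASTEROIDS B' 0 4
```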

Wait, wait. What about the CBM file type? The CBM file type is retained as the first letter of the FAT16 8.3 file extension: P, S, U or R. The 8 character filename in FAT16 doesn't really matter, because only the C64 filename from inside the 26-byte header gets exposed to the C64 itself. Usually the 8 character filename will just be based upon the first 8 characters of the C64's filename, with minor modifications, such as illegal characters being changed to underscores. But there is a small problem with this too. What happens if you have many C64 filenames that all begin with the same 8 characters, and only become unique in the latter part of the name? When a new 8.3 filename would collide with an existing 8.3 filename, the 2-digit number in the 8.3 file extension is incremented until the name is unique.

Let's see how this works:

C64 Filename	C64 Filetype	8.3 Filename
My Great File 1	SEQ	MY_GREAT.S00
My Great File 5	SEQ	MY_GREAT.S01
My Great File!!	SEQ	MY_GREAT.S02
Asteroids A	PRG	ASTEROID.P00
Asteroids B	PRG	ASTEROID.P01
Asteroids Save	PRG	ASTEROID.P02

The 8.3 filename extension thus contains one letter for the CBM file type, followed by a sequential number from 00 to 99. The number is only incremented when necessary to differentiate two files where the first 8 characters are otherwise the same.
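
The naming scheme in the table can be sketched in Python as follows. This is a simplification under stated assumptions: the function name pc64_short_name is hypothetical, every non-alphanumeric character is treated as illegal and turned into an underscore, and real tools may shorten the name differently.

```python
def pc64_short_name(c64_name, cbm_type, existing):
    """Pick an 8.3 name like MY_GREAT.S00 that is not already in the set `existing`."""
    # First 8 characters, uppercased, with anything non-alphanumeric replaced.
    base = "".join(ch if ch.isalnum() else "_" for ch in c64_name[:8].upper())
    letter = cbm_type[0].upper()           # P, S, U or R
    for n in range(100):                   # 00 through 99
        candidate = f"{base}.{letter}{n:02d}"
        if candidate not in existing:
            existing.add(candidate)
            return candidate
    raise RuntimeError("no free number left for " + base)


used = set()
for name in ("My Great File 1", "My Great File 5", "My Great File!!"):
    print(pc64_short_name(name, "SEQ", used))
# MY_GREAT.S00
# MY_GREAT.S01
# MY_GREAT.S02
```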

The PC64 file wrapper is incredibly lightweight. Besides the "C64File" identifier string, there isn't a byte wasted. It perfectly retains the original C64 filename, regardless of what PETSCII characters it uses, because as far as the PC is concerned the PETSCII name is embedded in the file's binary data stream.

This format caught on. All the C64 emulators, such as VICE, VirtualC64 and others, support transparent wrapping and unwrapping of these files. What that means is that although the PC file system may have some filename like ASTEROID.P01, the file is presented to the emulated C64 environment as Asteroids B automatically. And, better yet, any change to the filename is transparently written back into the PC64 file header. It's brilliant.

Now, for the good news.

SD2IEC also supports the PC64 file format. SD2IEC also does transparent unwrapping of PC64 files. And not only that, this behavior is always on, and cannot be turned off. Bear in mind, reading and writing files are different. The always-on part applies only to reading PC64 files. In other words, if there is a PC64 file on your SD Card, SD2IEC will always work with it. It will always show you the C64 filename from inside the header.

What file extension modes 1 and 2 do is turn on the creation of PC64 files. Like the difference between modes 3 and 4, mode 1 creates PC64 files for SEQ, USR and REL files but not for PRG files, while mode 2 creates PC64 files for every type of C64 file.
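
As a quick summary of the four modes discussed so far, here is a small Python sketch of how a newly created file gets stored. The function name storage_for and the labels it returns are hypothetical, and treating every other mode value as "raw" is an assumption on my part.

```python
def storage_for(mode, cbm_type):
    """Return 'pc64', 'extension' or 'raw' for a newly created file."""
    is_prg = cbm_type.upper() == "PRG"
    if mode in (1, 2):
        # Mode 1 leaves PRG files raw; mode 2 wraps everything.
        return "raw" if (mode == 1 and is_prg) else "pc64"
    if mode in (3, 4):
        # Mode 3 leaves PRG files raw; mode 4 adds an extension to everything.
        return "raw" if (mode == 3 and is_prg) else "extension"
    return "raw"   # assumption: any other mode adds nothing


for mode in (1, 2, 3, 4):
    print(mode, storage_for(mode, "PRG"), storage_for(mode, "SEQ"))
# 1 raw pc64
# 2 pc64 pc64
# 3 raw extension
# 4 extension extension
```

Remember that this only describes how new files are created. Existing PC64 files are read the same way in every mode.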

C64 Compatibility and PC64 Files

PC64 files are designed for maximum C64 compatibility. Ergo, everything complicated about filenames, case sensitivity, long names (in excess of 16 characters) being presented as 8.3 filenames, misinterpreted multibyte UTF-8 characters, PETSCII and ASCII automatic translation, and so on, none of this is relevant when SD2IEC is working with PC64 files.

Breathe a giant breath of relief.

Remember: file extension modes are only about how new files are created. If a file is a PC64 file—whether you created it or whether it came from a PC—it doesn't matter what file extension mode you're in, that particular file will be worked with as all PC64 files are worked with all the time. Many things are made simple and consistent with PC64 files.

Let's look at a list of them:

The CBM file type is always preserved
Case of the filename is always preserved
Two files may differ by case alone (just like on a 1541, et al.)
Opening a file requires specifying the correct case
A file rename that only changes the case will work
A filename longer than 16 characters will not be preserved
No PETSCII/ASCII translation need occur
Filenames may contain all the legal characters that CBM FS allows
Filenames can include .seq, .usr, etc. endings without unexpected side effects

All of this, everything listed above, makes SD2IEC feel and behave just like a CBM or CMD storage device. Hallelujah. There is hardly anything else to say! Turn on mode 1 to get full C64 compatibility, via a PC64 file header, for all SEQ, USR and REL files that you will create. But PRG files will remain in raw crazy-land with all the attendant limitations.

Or, turn on mode 2 to get maximum C64 compatibility for each and every file you create. If you're only going to be using these files on your C64, you'd be a crazy person to not just switch into mode 2 and leave it there.

Compatibility Considerations

There are always side effects, no matter what you choose. So let's think about the side effects of using mode 1 or 2 and creating PC64 files on your SD2IEC. Here's a plus for you and your C64: Even if you switch out of mode 1 or 2, because you want to create some files that do not have the PC64 header, all of the files that already are PC64 files continue to operate as normal. This is unlike files with file type extensions. If you turn off file extension hiding, all of a sudden a file with a file type extension gets reinterpreted, its name suddenly gains extra characters, and if those extra characters exceed 16 the name will revert to the mangled short 8.3 name. This never happens with your PC64 files. That's really good. Once you've created a PC64 file, that file will always look and behave like a proper C64 file, even when you change out of mode 1 or 2.

Here's another nice benefit. All the C64 emulators support PC64 file headers automatically. So, if you create a file in mode 1 or 2, and you transfer it to a PC or Mac and then you try to use that file with VICE or VirtualC64, then you are very much in luck. The filename will remain perfectly intact. The CBM file type will remain intact too. That is really convenient.

While we're on a roll with the benefits, here's another benefit. If you create a PC64 file on your own SD2IEC, and you pop the card out and stick it into the SD2IEC of the guy sitting next to you at the swap meet, or the tradeshow, or retro computer club meetup, guess what? You're guaranteed that that file's name and type will be presented by his SD2IEC exactly as it was presented to you. Because it doesn't matter how their SD2IEC is configured, it will always transparently unwrap PC64 files. Yes! That is so great. Remember how I said, when you pull a 1541 disk out of your disk drive and stick it in the 1541 drive of your friend, you know the filenames and types are going to look the same for him? Well, that's the reliability that you get back when you create files with PC64 headers.

Those are the benefits. What about the drawbacks? Why the heck doesn't SD2IEC always create and work with PC64 files? Well. For starters, you might put a file onto an SD Card from a PC or Mac that is not a PC64 file. That's obvious. So SD2IEC has to have some way of interpreting those non-PC64 files. And, if you rename that file with the C64, frankly, it would be quite annoying if the SD2IEC automatically converted that non-PC64 file into a PC64 file. Because as soon as you transfer the SD Card back to the PC the file would have been mangled by its journey through C64-land. So, it makes sense that SD2IEC has rules for dealing with raw files, no matter how ugly, arcane, and impossible to remember the rules are and no matter how baffling the results of those rules can sometimes be.

In short, it's like this: Do you want the rules to make sense by C64 standards? Then create and use PC64 files. Do you want compatibility with a PC or Mac? Then suck up the shitty naming disaster, and retain a usable file when you put the SD Card back into your PC.

Why is it a problem to have a PC64 file on a PC? I referred to file type extensions as lightweight and non-invasive. That's because, despite the fact that the filename has got an extra and meaningless (to the PC) extension, the data stream of the file is unchanged. A file named My Cat.jpg.SEQ on your PC still has the data stream of a JPEG file. If you try to open that file, (actually I have no idea what will happen in Windows, but) Preview for macOS has no problem opening it. It can tell it is a JPEG data stream by other means.

My Cat.jpg.SEQ opens no problem in Preview for macOS.

Given what we see above, how invasive is the extra file extension? On a Mac, it doesn't seem to be very invasive at all. And, even if your OS or a given application doesn't like the extra .SEQ extension, all you have to do is rename the file and delete that extension, and boom it's back to a regular JPEG file in every way. So, from the PC perspective, file type extensions are very lightweight and they're easy to deal with in the worst case scenario.

But what about a PC64 file? The PC64 file header is invasive. The file header is part of the file's data stream. There is no way that macOS is going to open MYCAT.S00 in Preview or any other application, as just a regular JPEG. Because, it's not a regular JPEG data stream. The data stream of the file itself has been invaded. And what's more, how do you get the original data stream back? You can't just rename it using the OS's built-in file renaming functionality.

If you do end up with a PC64 file on your PC, and you really need to get at the underlying data stream, there are probably a few ways to do it. The way I do it on macOS is with a hex editor. I use Hex Fiend. It's free on the Mac App Store, it's also open source, and it's classified as a developer tool. This should give you some idea of how not user friendly this is.

A PC64 file being edited by Hex Fiend on macOS.

Opening the file in the hex editor, you can see the "C64File" string, as well as the original C64 filename (if it's legible, it's in PETSCII remember). Select the first 26 bytes, that's how long every PC64 file header is. Hit delete, then save the file. You can then rename the file so it has a name and extension that matches the data stream's actual datatype. Or, alternatively, to preserve the original PC64 file, you could Save As... and create a new file.
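
If you have Python on your PC or Mac, the same 26-byte trim can be scripted instead of done by hand. This is just a sketch of the hex editor steps above: the function name unwrap_pc64 and the filenames in the usage line are made up for illustration, and it writes a new file rather than changing the original.

```python
from pathlib import Path


def unwrap_pc64(src, dst):
    """Copy `src` to `dst` without its 26-byte PC64 header; `src` is left untouched."""
    data = Path(src).read_bytes()
    if data[:8] != b"C64File\x00":
        raise ValueError(f"{src} does not start with the PC64 signature")
    Path(dst).write_bytes(data[26:])


# Hypothetical usage:
# unwrap_pc64("MYCAT.S00", "My Cat.jpg")
```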

Frankly, that's a lot of work, and it's technical work. You wouldn't want to have to do this for more than just a couple of files. So, you see, PC64 files are fantastic for compatibility between C64s with SD2IECs, and also really great for compatibility between an SD2IEC equipped C64 and a C64 Emulator. But they are not convenient at all if you need to work with the file data directly using a PC application.

There is one scenario in which PC64 files are inconvenient for real C64s. Only SD2IEC handles PC64 files. Classic CBM devices obviously don't, CMD devices don't, IDE64s don't. Therefore, imagine that you create a PC64 file on your SD2IEC, simply by creating any file while in mode 2. Then you copy the file to your PC via the SD Card. Then you try to send that file to a C64 by some other means than via an SD2IEC. Just as the C64 receiving the file had to deal with the extra file type extension, now it has to deal with some weird wrapper. Just as a PC has to unwrap the PC64 file manually, without an SD2IEC, your C64 would have to manually unwrap a PC64 file too. This is just something to bear in mind. And, as I'll discuss at the very end of this post, it isn't merely an intellectual exercise.

Subdirectory Names

Last but not least. Up until now we've only been talking about the filenames and CBM file types of actual files: PRG, SEQ, USR and REL files. What about subdirectories?

Well, thankfully, there is one thing we don't need to worry about regarding directories, and that is the CBM file type. There is only one type of directory in PC-land, and there is only one file type for subdirectories on CMD devices, IDE64 and SD2IEC. Therefore the mapping is very simple and no special metadata is necessary.

What about the subdirectory name, though? In FAT16, if a subdirectory name conforms to 8.3, that is to say, if it has 8 characters or fewer, it will be written into the 8.3 short name data structures and... you guessed it, it loses its case. Just like with filenames, if you name a directory to have between 9 and 16 characters that name will get stored in VFAT and it will retain the case. If you name a directory with more than 16 characters, either by overloading a rename command on the C64 itself, or more likely simply by naming it with a long name on a Mac or PC, then the C64 directory will list it with its 8 character short name.

All that crap, all that confusing behavior, all over again for subdirectories. The elegant solution for maximum C64 compatibility was to use a PC64 file header which occupies the first 26 bytes of the data stream of the file. Ah, but wait a second, subdirectories aren't data files. They don't have a data stream. It is thus not possible to fall back on a PC64 header to encapsulate the 16-character PETSCII name of a subdirectory.

This has consequences. But they're tricky, they're sneaky.

In C64 OS one of the two Homebase applications is the App Launcher. The system directory contains a Desktop directory, which contains numbered subdirectories 1 through 5 for each of the 5 desktops. Inside one of those are alias files. The filename of the alias matches the name of either an application or a utility, and the data of the alias file specifies its color, position and other metadata about how the alias should be presented on the desktop. Double click the alias and its name is used to open the corresponding app. It works really well and it's very conservative. It's easy to create and delete aliases, and a cinch to move or copy individual aliases between desktops.

However, an alias is a SEQ file. And an application isn't one file, but a subdirectory bundle that follows the prescribed internal structure of all applications. Thus, the application name is actually a subdirectory name. You can probably see where this is going, but it's still got some twists. I do all my development of C64 OS on my C128 working from a CMD HD, which I recently upgraded with an SCSI2SD. Let's say I copy the complete system directory over to my SD2IEC, and let's say I have my SD2IEC in file extension mode 2 for maximum C64 compatibility. The alias file for the "Gallery" application has a capital G, but it has no illegal characters and it's only 7 characters long. It's ripe for fitting into a FAT16 8.3 name. However, mode 2 means this file gets created with a PC64 header and the case and SEQ type are both thus preserved.

Next, the copy moves on to the applications directory and all of the application subdirectories, one of which is "Gallery". This name also fits into FAT16's 8.3, but it's a directory, so it can't be made into a PC64 file. It thus loses its case and becomes "gallery". Now the alias name and the application bundle name differ by case. Here's the unexpected twist: FAT16 is case insensitive, so when we double click the alias and App Launcher tries to open an application named "Gallery", this case-insensitively matches the directory named "gallery" and the damn thing opens! It actually still works.

Sort of. It sort of works. What happens if we now copy the directory tree from that SD2IEC back to a CMD HD or IDE64? The SD2IEC has lost the original case. So, when it gets copied back to an IDE64, say, it copies over as "gallery" not the original "Gallery". But an IDE64 and CMD HD or CMD RamLink, they are case sensitive. So the alias "Gallery" on the IDE64 no longer matches the application "gallery", and it doesn't work. It can't find it, it can't open it. And that is bad. Very bad. Any directory name that has anything other than lowercase characters, and which is 8 characters or less, with no other illegal FAT16 8.3 characters, is going to lose that case, and it's going to break if subsequently copied to a case-sensitive storage device.
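
The whole "Gallery" round trip fits in a short Python sketch. It is a toy model only: find_dir and store_on_sd2iec are hypothetical names, and the test for "fits an 8.3 short name" is reduced to 8 alphanumeric characters or fewer.

```python
def find_dir(wanted, listing, case_sensitive):
    """Return the directory entry that matches `wanted`, or None."""
    for entry in listing:
        if entry == wanted or (not case_sensitive and entry.lower() == wanted.lower()):
            return entry
    return None


def store_on_sd2iec(name):
    """Directory names that fit a plain short name come back without their case."""
    fits_short = len(name) <= 8 and name.isalnum()   # simplified test, an assumption
    return name.lower() if fits_short else name


on_sd = [store_on_sd2iec("Gallery")]                   # ['gallery']
print(find_dir("Gallery", on_sd, case_sensitive=False))  # gallery (still opens on the SD2IEC)

on_ide64 = list(on_sd)                                 # copied back, case already lost
print(find_dir("Gallery", on_ide64, case_sensitive=True))  # None (the alias is broken)
```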

C64 OS Distribution

C64 OS is, like a modern OS, dependent on many parts and components that are organized in a set of nested directories. Drivers in a drivers directory, desktop aliases in the desktop directory, applications in bundles which are inside the applications directory, settings in the settings directory and so on. The ability for the system to understand place, and to navigate and distribute resources across different places, is powerful and is one of C64 OS's main strengths.

But the names of files and the names of directories become critically important. You can't just have a file reference like //os/settings/:mouse.t suddenly become //os/settings/:mouse.t.SEQ! Your brain might be able to deal with that, but a system that knows what to look for, and where to find it, can't handle the file all of a sudden changing its name. Therefore, using file type extensions on SD2IEC is not at all reliable. Imagine that I use file type extensions on my SD2IEC, then I stick the SD Card in a Mac, zip up the folder and send it to you. You unzip it, put it on an SD Card and put it into your SD2IEC that has file extension hiding turned off. Half the filenames will be different. That's never going to boot.

So let's say I copy my files from the CMD HD to the SD2IEC with it in mode 2. All the files get turned into PC64 files on my SD2IEC. I follow the same procedure, zip up the system folder on a Mac and send it to you again. You put it on your SD2IEC, which automatically interprets the PC64 files. Well, that's better. At least all the filenames will be right. But what if you then copy the system from your SD2IEC over to your IDE64? The wrongly cased directory names will break, at the very least, a few things.

What if you have no SD2IEC? What if I send you the zip file from above, and you need to transfer the files directly to your CMD HD or to your IDE64? It doesn't even matter where you unzip, if you unzip on the PC and transfer 200 files and directories over to your CMD HD, all the files are PC64 which neither the PC nor the CMD HD can interpret. Plus the folder names that I zipped from the SD2IEC will still be wrongly cased.

Holy cow. I honestly did not realize until my first beta testers needed to get their hands on a Beta copy that this was going to be so problematic.

Here's the deal. An SD2IEC can be an end point for installation. If it decides to scrap the case of directory names, well, that's not the end of the world for it, because it's also case-insensitive when accessing those directories. But I can't use an SD2IEC as the source of installations for other devices. Because the damn thing doesn't retain directory case properly. This is twice as troubling, because I had planned on distributing C64 OS on 16MB SD Cards, along with a printed User's Manual and some other trinkets. I thought that if a user doesn't have an SD2IEC, then at the very least, they have a copy of the files on the SD Card that they can put in their PC and then transfer to an IDE64 over, say, PCLink. But this is problematic too, because all the files they'll see with their PC on the SD Card will be PC64 files.

Arrg! This is a very tricky problem.

How do other C64 projects that depend on nested directory trees distribute their content? It can't be via .D64, .D71 or .D81 because they don't support subdirectories at all. So how do they do it? I can just copy the technique everyone else is using, right? Well, this may sound crazy, it certainly sounds crazy to me, but I can't think of any other Commodore 64 projects that actually depend on components being found in nested subdirectories. Coming from the modern world of computing, this is almost unimaginable. But it seems to be true. Games are almost always distributed in .D64 or .D81 images. And if the game is too big for one disk image, it gets distributed as multiple disk images. On an SD2IEC you set up a swap list to easily swap the disk images while still inside the game. Sometimes the game will support dumping all the files from all the disk images into a larger device, like a CMD HD or an IDE64 or SD2IEC, but in those cases all the files are put into one common directory.

GEOS doesn't support subdirectories at all. Wheels, the GEOS update from Click Here Software, can use subdirectories, but only just barely. Wheels has no concept of place. Everything was still ultimately designed to run from a single directory. Contiki has all of its files in a single directory. Godot has all its files in one directory. WiNGs used nested subdirectories. That's the only C64 system I can think of that actually did. (No wonder I loved it so much.) But it, unfortunately, never reached the stage of development where it had to worry about distribution.

For this reason, I have been developing C64 Archiver. But, that's a big topic. So it will be the topic of another post.

Final Thoughts

I know this post got technical fast. And it might be hard to follow. But I hope you have gained some deeper insight into the complications that arise from what seems like a simple idea. Why not just have SD2IEC read FAT16 directly, so we can easily swap files back and forth with a PC? It works, and it is incredibly convenient. But it's got plenty of rough spots that any serious user needs to be aware of.

Although the Finder at the time only supported 31 character filenames. It's still a lot more than 8 plus a 3 character extension. [↩]
How the individual bits are used is actually important on an 8-bit computer. The KERNAL makes numerous bit-level optimizations in the code that routes traffic according to device number. [↩]
Newer KERNALs, like JiffyDOS, added the ability to copy files between devices. But to make room for this, and other functionality, JiffyDOS had to drop support for the little-used Datasette. [↩]
Besides automatic block allocation for files and directory entries, it is also possible to manually allocate a block. Validation cannot be used on disks with manually allocated blocks, because the validation procedure will automatically free all blocks that aren't explicitly part of a file or directory. [↩]
Values 128 to 255 are sometimes referred to as extended or high ASCII. However, there is no one standard definition for the meaning of the extended or high ASCII values. [↩]
I love this word. The urbandictionary.com definition is, "Wat: The only proper response to something that makes absolutely no sense." Perfect. [↩]


Greg Naçu — C64OS.com