An Archaeology of Code: Qualitative Analysis and Context of Colossal Cave Adventure

CCA-inspired “nerdlepoint” by Henry Basenji.

Introduction

So far this case study has introduced the concept of code archaeology, using the game Colossal Cave Adventure as its Gilgamesh. The code itself has undergone both stylometric and text analysis to provide a quantified look at how the game grew and changed, and at how authors and versions borrowed from one another over the past 40 years. This section discusses the code sets in a more contextual way. It offers additional methods of examining them and uses the resulting data to confirm or correct past assumptions about the program’s history as told by its code, its text-artifact.

Code Sets

The first two versions of Colossal Cave Adventure were both written in FORTRAN IV for the PDP-10 mainframe computer: William Crowther (1976) and Don Woods (1977). The versions that followed through 2017 included straight translations/ports and updates in over a dozen computer languages, for platforms ranging from mainframes (e.g., DEC) to purely Web-based games. The following table lists the languages (where known), version names, and creation years for CCA versions created in each language. Years marked with an asterisk (*) were previously unknown to the CCA community and have been supplied by the present author from EXIF data, described below.

Language              Version Name     Year
--------------------  ---------------  ----
A-Code                ARNA0550         2001
                      ARNA0770         2006
ANSI C                ANON0340         1994
Atari BASIC           ANAL_XXX         1981
BASIC                 CCS__XXX         1981
C                     ARD_0550         1991
                      KNUT0350         2003
                      WOOD0430         1978
DEC FORTRAN IV        LONG0500         1979
                      SUPN0350         1978
                      WOOD043B         1995
FORTRAN               ANON0501         1979
                      ARNA0440 DOS     2001
                      ARNA0440 Linux   2001
                      ARNA0440 Source  2001
                      WOOD0350         1977
FORTRAN 77            JAME0551         2016
                      OLSS0551         1990
                      OSKA0551         1990
Inform                NELS0350         2006
                      NKMP0350         2002
Intel FORTRAN         VANE0560         2011
MS FORTRAN            BLAC0350         1987
                      DOVE0550         1987*
MS Micro Color BASIC  GERR0000         2015
OMSI Pascal           BREE_XXX         1980
Pascal                BECK0500         2008*
PDS FORTRAN           MUNO0370         1996
PHP                   ADAM0350         2014
Quill System          HARR0235         1985
Windows Batch         BENH0350         2013
Z Code                BAGG0350         1993
                      BALD0350         1996
Unknown               ARNA0660         1995
                      COX_0350         2003
                      CROW0000         1976
                      DAIM0350         1984*
                      EKMA0350         1990
                      GOET0350         1980
                      KENN0000         1992*
                      KINE0350         1996*
                      KINT0350         1997*
                      KINW0550         1996*
                      LUMM0350         1993*
                      MALM0350         1993
                      MALM1000         1990
                      MCDO0551         1990
                      MUNK0430         1996*
                      PLAT0550         1984
                      PLOT0350         1994
                      POHL0350         1990
                      RUSS0000         2007
                      TICM0350         1998
                      WELL0550         1985
                      WHIN0450         2007
                      AUST1100         1982
                      BHAV0565         2000
                      BUTT_XXX         1995*
                      DIAZ0350         1997
                      GASI0350         1991
                      GILL0350         1993*
                      GOET058D         1993
                      GOET0580         1993
                      HAMO0350         2011
                      KINA0660         1996
                      KIND0430         1995*
                      KING0350         1996*
                      KINM0551         1996*
                      PICT0551         2003*
                      RAYM0430         2017
                      WITB0000         1990
                      YONG_XXX         2006
                      ARNO0350         1998
                      CALH0000         1985
                      CONL0000         2011
                      CRAY0350         ?
                      EVIZ0350         1983
                      EWHH0366         ?
                      FULL0000         ?
                      GOLD1000         ?
                      GRAY0375         1980
                      JAME_XXX         2009*
                      JONE0210         1982
                      LETW0350         1979
                      LUPI0440         1978
                      MANO_XXX         1981
                      MCGU_XXX         1985
                      MOMA_XXX         1989
                      PENN0350         2001
                      PICT0701         2013*
                      PLON0350         1996
                      PONT0350         2001
                      RAMS0350         2007
                      RASM0350         ?
                      RAUS0350         1981
                      RICH0500         1979
                      ROBB0350         2002
                      SCBA0350         1980
                      STAD0550         2013*
                      STAD0580         2013*
                      STAN0350         1995
                      VMCM0350         ?

EXIF Data and the CCA Chronology

Computer files contain metadata, some of which is easy to obtain. One can either right-click a file and choose Properties (Windows) or press Command+I (Mac OS) to see who created a file and when, and where it is currently located. These data, however, are not always accurate, sometimes reflecting the date a file was copied from one computer to another rather than the date it was made. Because of this, one must look into the deeper metadata of a file in an attempt to find the actual date it was created and, if possible, by whom. These more reliable data help to build a more stable file chronology, something quite important when determining the order of versions of a software application, Colossal Cave Adventure in this instance. Accessing this deep metadata in text and programming files is not intuitive and requires borrowing a tool built for image file metadata.

Exchangeable Image File format (EXIF) data are embedded into image files by the digital devices used to capture them, and include information ranging from exposure to focal length, as well as the date and time at which an image file was created. The EXIF data of a digital image can be read relatively easily through various image software such as the Adobe suite of products (including Bridge and Photoshop). These image programs cannot, however, open text files or provide access to their metadata.

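I was able to resolve this problem by finding a free, open-source program, ExifTool, by Phil Harvey, which can be installed and run from the Terminal (Mac) or Command Prompt (Windows).[1] ExifTool reads all available metadata for any kind of file, not just images; strictly speaking, plain-text code and data files contain no EXIF block, so for them it reports file-system metadata such as the file’s modification date. One can run the tool through a simple, typed command: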

exiftool -all FILENAME

(where FILENAME is the name of the file to inspect)

For example, to see the creation date of the original advent.dat data file created by Don Woods in 1978, open the Terminal/Console, navigate to the directory storing the file, and then type:

exiftool -all advent.dat

Pressing the Enter/Return key will retrieve the file metadata.
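Because the dates ExifTool reports for plain-text code and data files come from the file system, the same information can also be collected in bulk across a version’s files. The following is a minimal sketch, not the procedure used in this study, assuming Python 3.9 or later and a hypothetical local folder holding one version’s files. It reads each file’s modification time with Path.stat(), the timestamp ExifTool labels “File Modification Date/Time”, and keeps only the year.

```python
from datetime import datetime, timezone
from pathlib import Path


def modification_years(root: str) -> dict[str, int]:
    """Map each file under `root` (as a relative path) to the year it was last modified."""
    years: dict[str, int] = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # st_mtime is the last-modification time, in seconds since the Unix epoch
            mtime = path.stat().st_mtime
            years[str(path.relative_to(root))] = datetime.fromtimestamp(mtime, tz=timezone.utc).year
    return years
```

For example, modification_years("WOOD0350"), with a hypothetical folder of that name, would list every file in it alongside its last-modified year. As the following paragraphs caution, a recent year may only mean that the file was copied or re-saved.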

Collecting EXIF metadata on CCA files allowed me to do three things:

  1. Find missing years for some versions of CCA;
  2. Verify creation dates that the community has taken as fact from the game’s accepted history, which may not have been examined critically after those dates were first accepted;
  3. Correct erroneous creation dates for CCA versions.

These three tasks help to stabilize and correct CCA’s chronology, but they are not infallible. About one-third of the known versions of CCA contain ReadMe files with dates typed by their respective authors (e.g., C. Yong’s ReadMe file gives the creation date as 2006). When the EXIF metadata matches the date(s) given by the author, one can accept that version’s year of creation with reasonable confidence.

Other ReadMe files contain several dates, indicating the year of the version being ported or recreated, the year of the port, and the years in which the author later updated that recreation. For example, Linards Ticmanis, author of TICM0350, notes that his version, created in 1998, is a “standard FORTRAN 77 port” of Crowther and Woods’ original game. Ticmanis notes further that this particular iteration is a 2001 update of that 1998 port. When I ran the actual game code (FORTRAN) file through ExifTool, it reflected the 1998 date. The 1998 code was paired with the 2001 ReadMe, and it is unclear where other changes were made in the various other code or data files for this version. The 1998 date, however, is confirmed by both the EXIF data and the ReadMe file and can serve at least as a terminus post quem.

EXIF metadata alone is not an infallible way of assigning years to versions. Original file dates can be overwritten when files are copied, or when a file is opened in a simple text editor and saved again. A file last modified in 1980 can acquire the present date through an accidental press of the spacebar followed by a save. While EXIF metadata can be quite helpful in confirming the date of a file’s creation, one must use it in concert with ReadMe text, checking both against the established chronology.
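The cross-checking described above, trusting a date when the file metadata and the author’s ReadMe agree and treating it with suspicion otherwise, can be sketched as a small helper. This is a hypothetical illustration in Python 3.9 or later rather than the study’s own method; the function names and the 1960 to 2029 range used to recognize years in ReadMe text are assumptions.

```python
import re

# Assumed range of plausible years: four-digit numbers from 1960 to 2029
YEAR_PATTERN = re.compile(r"\b(19[6-9]\d|20[0-2]\d)\b")


def readme_years(text: str) -> set[int]:
    """Return every plausible year mentioned in a ReadMe file's text."""
    return {int(year) for year in YEAR_PATTERN.findall(text)}


def cross_check(metadata_year: int, readme_text: str) -> str:
    """Compare a file-metadata year with the years an author typed into the ReadMe."""
    years = readme_years(readme_text)
    if not years:
        return "unverified"  # nothing to check against; fall back on the chronology
    if metadata_year in years:
        return "confirmed"   # metadata and author agree
    return "conflict"        # e.g. a later copy or accidental save reset the date
```

For a ReadMe that mentions both 1998 and 2001, as Ticmanis’ does, a metadata year of 1998 returns "confirmed", whereas a metadata year of 2019 for the same ReadMe returns "conflict" and sends the researcher back to the established chronology.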

A second tool can provide additional verification of files and can also demonstrate file-sharing between versions: the checksum.

Checksums

A checksum is a short verification value computed from a file’s contents by a hashing algorithm such as MD5 or SHA-256; its purpose is to make it easy to determine whether two files are exactly the same. Identical files always produce identical checksums, while changing even a single character in a file produces a completely different one. Because the checksum depends only on a file’s contents, a copy whose date has changed will still match its original. A mismatch might hint at nefarious activity, but it more often means that one is looking at a modified version of a file instead of a 1:1 clone. This is important for CCA because it allows the researcher to see which files (if any) have been shared between versions. This differs from text analysis, which checks for the borrowing of data held within a file as opposed to the borrowing of the file itself.

To compute a checksum in the Terminal on Mac OS, type the following command followed by a space, drag the file to check onto the Terminal window (which pastes in its path), and press Return:

shasum -a 256

Repeat with the file(s) you wish to compare to the original. If the checksums match, the files are identical copies; if not, the non-original files differ in some way.

On Windows, PowerShell’s built-in Get-FileHash cmdlet (which uses SHA-256 by default) serves the same purpose; on older systems you may need to download a separate MD5 or SHA utility to complete your checksum investigations.
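Comparing checksums one pair at a time becomes tedious across dozens of versions. The sketch below, a hypothetical Python 3.9+ illustration rather than the method used in this study, computes the same SHA-256 digest that shasum -a 256 prints for every file in a set of version folders, then groups files whose contents are byte-for-byte identical. This is the kind of file-sharing between versions described above. The folder layout and function names are assumptions.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's contents, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def shared_files(version_dirs: list[str]) -> list[list[str]]:
    """Group files from several version folders whose contents are byte-for-byte identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for directory in version_dirs:
        for path in sorted(Path(directory).rglob("*")):
            if path.is_file():
                groups[sha256_of(path)].append(str(path))
    # Only digests seen more than once point to a file shared between versions
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling shared_files(["WOOD0350", "WOOD0430"]), with hypothetical folder names, would return one list for each file carried over unchanged from one version into the other. Files that differ by even one byte fall into separate groups and are left out.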

CCA Directory Structures and Languages

While I conducted text/stylometric analyses on three main filetypes (code, data, and ReadMe files), nearly all versions of CCA contain other files as well, sometimes dozens of them, depending on the author and the language used. For example, Don Woods’ 1978 version of William Crowther’s original game, written in FORTRAN IV, consists of two files: advent.for (the code) and advent.dat (the narrative data). In 1996, Alan H. Martin presented the original files, adding two more of his own: 1) an .mic executable file (for modern computers instead of the original PDP-10 mainframe), and 2) a ReadMe file that included the checksums of the original Woods files to prove that this 1996 version, WOOD0350, did indeed include the original FOR and DAT files from 18 years earlier. FORTRAN grew and changed as a language. Steve Dover’s 1987 version (DOVE0550) no longer uses the FOR and DAT file extensions but rather a custom ADV filetype, with the files called through an EXE executable file. The ADV files include a symbol table, record index, instructions, and narrative text, and are accompanied by a ReadMe file, all created in Microsoft Fortran, 21 years removed from the FORTRAN IV of Crowther and Woods.

Other languages, such as Inform, which Graham Nelson created specifically for interactive fiction, keep an entire story in a single .inf file. Compare this with versions of CCA written in C (e.g., WOOD0430) that use more than ten files to recreate a working copy of the game. It is the nature of programming languages to behave according to their own grammar and syntax, which is why it is so important to be able to access the files directly for their metadata as well as for their actual contents. Even though the languages and directory structures change, the CCA contents held within stay relatively the same: old DNA carried in new vessels.
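Differences in directory structure like these can be summarized by counting each version’s files by extension. The following minimal sketch assumes Python 3 and a hypothetical layout in which each version sits in its own folder named after its version code; it is illustrative and not part of the analysis described above.

```python
from collections import Counter
from pathlib import Path


def extension_inventory(version_dir: str) -> Counter:
    """Count the files in one version's folder tree by lower-cased extension."""
    counts: Counter = Counter()
    for path in Path(version_dir).rglob("*"):
        if path.is_file():
            # Files without an extension (e.g. a bare README) are counted under ""
            counts[path.suffix.lower()] += 1
    return counts
```

Run on a folder holding only Woods’ advent.for and advent.dat, this would report one .for file and one .dat file, while a folder for a C version such as WOOD0430 would report the more than ten files mentioned above.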

Returning to the original language of CCA, FORTRAN, it is perhaps interesting to see when later versions of the game appear in other iterations of the language in an attempt to remain true to the source: 1976, 1977, 1978, 1979, 1985, 1987, 1990, 1992, 1995, 1996, 2001, 2011, and 2016. One could draw parallels to the appropriation of Greek art and architecture by the Romans, who not only used Greek designs and artistic and architectural vocabulary but improved upon them to suit more modern tastes. Many ancient Greek sculptures are known to us thanks to prolific Roman copying of the originals (see the Apollo Barberini, the Dying Gaul, and the Laocoön for a few of the most famous examples). For the very latest FORTRAN versions, we could be seeing a Classical revival of an archaic form, like playing 18th-century Classical music on instruments of the period (see the music of the contemporary ensemble Europa Galante). Music played on instruments of the period in which it was composed sounds different from the same music played on modern instruments. The music-as-written is the same, but the performance differs based on the instruments used to play it. So it is with games: playing a game is a performance by the player of the code-as-written. Playing CCA in FORTRAN gives the user a more “authentic” gameplay experience than playing it in a Web browser. The notes are the same, but there are differences in the performance. The same could be said of visiting heritage sites. Visitors to the Acropolis of Athens before 1687 would have seen a roofed Parthenon, but after the Venetians shelled it during the Great Turkish War, subsequent visitors experienced the same building in a different state. Now, with the Internet and virtual reconstructions, Parthenon visitors can experience the building through digital mediation: it is the same structure, experienced in different ways. As it is with heritage buildings, so it is with heritage code.

One could argue that there is no longer a need to create another version of CCA in another flavor of FORTRAN, especially when several versions already exist in that language, as well as in other formats for ease of play (e.g., online). For some, playing the game is enough. But for others, recreating CCA in FORTRAN is a labor of love, a thought exercise, and an expression of nostalgia for the days when one had to book time on a mainframe computer at a university in order to compile the code and play. CCA in FORTRAN could be considered by some to be an “authentic” experience, although authenticity in software (as with everything in cultural heritage) exists on a spectrum. To the best of my knowledge, none of the FORTRAN versions after Crowther and Woods’ was ever punched onto cards to be compiled, and versions created after 1980 would likely not have been played on a PDP-10 or PDP-11 in an office or university, but rather run at home or at school. This raises the classic question, dating back to the very first interactive digital entertainment adventure: what is the authentic CCA experience? Is it the story? The hardware? The language? And is that “authenticity” needed in order to enjoy the game as a game, and also as an archaeological artifact? All of these versions, even though they differ in languages, platforms, size, scale, and scoring, are CCA, drawn from the same set of tablets telling the Ur-story of a mammoth cave system in Kentucky and the dwarvish goings-on inside.

Translations

Stepping away from the programming languages used to create versions of CCA, the game’s narrative text has been translated from English into seven other languages, including the invented language Lojban, all of them compiled to Z-code with Graham Nelson’s Inform interactive fiction authoring platform. The translations can be classed as separate versions of CCA, but they do not deviate from the underlying Inform programming. All translated versions reflect Don Woods’ 350-point version as ported into Inform by Graham Nelson in 1996:[2]

  • Spanish: José Luis Diaz (1997)
  • German: Toni Arnold (1998)
  • French: Jean-Luc Pontico (2001)
  • Dutch: Yuri Robbers (2002)
  • Lojban (invented language): Nick Nicholas, et al. (2002)
  • Russian: Denis Gaev (2004)
  • Swedish: Fredrik Ramsberg (2007)

A Geography of CCA Versions

Map of known locations where CCA versions were coded.

Code has context, including geographical context, and location is important for all archaeological artifacts. William Crowther coded CCA in 1975, but in order to play it he had to compile the code on a mainframe computer at the San Francisco Bay Area office of BBN, where it was later discovered and updated by Don Woods in 1977/8. Software, although ephemeral, is made in a place by one or more people writing it in the physical world. Roughly 25% of the known versions of CCA created between 1975 and 2019 include a city, state, and/or country in which the version was produced; this information may be found in the versions’ ReadMe files. I was curious to see how CCA spread, especially before the widespread availability of a global internet ca. 1996, and created this map (Map 1 above) of versions with known development locations.

CCA versions come from five countries: the United States, Canada, the United Kingdom, Germany, and Sweden. The versions did not migrate neatly from west to east, but rather hopscotched around thanks to various BBSs (online bulletin board systems) and early versions of email and networking. This is why CCA could be developed in California in 1975, move to Massachusetts in 1979 and Canada in 1986, and appear in both Sweden and England in 1990 and Germany in 1992, while continuing to blossom in the San Francisco Bay Area, greater Los Angeles, Chicago, College Station (Texas), and the Eastern seaboard in the ’70s, ’80s, ’90s, and early ’00s. Software-sharing does not follow the same patterns as the geographic spread of languages or of ideas shaped by the Earth’s topography, but rather jumps from brain to connected brain as people discover the game. I have been unable to determine exactly how various authors found their way to CCA or why the game reached individual programmers when it did, but the data point to scattershot adoption of the code. The obvious absence of CCA versions from South and Central America, Africa, and Asia raises its own questions, and one wonders whether a language barrier, a difference in cultural interests, the availability of networking, or other issues have kept other parts of the world from interpreting the game. But this is not the only hole in the story.

Gender and CCA Versions

Of the 120 individuals responsible for creating versions of Colossal Cave Adventure from 1975 until the present, 119 are male. The one woman in this list, Toni Arnold, created the German translation of CCA based on Graham Nelson’s 1996 version authored on the Inform interactive fiction programming platform. To see if this gender disparity was mirrored in the interactive fiction (IF) community, I visited the main clearinghouse for all IF (of which CCA is the first example), the Interactive Fiction Database.[3] As of this writing, IFDB hosts 11,146 registered members. Because many members registered with screen names instead of their given names, and chose not to identify as a particular gender, it is difficult to determine the actual gender split, but scrolling through the member roster does suggest a lean towards people identifying as women, based on members’ avatar choices, many of which are gendered. This would seem to reflect the gender split (weighted 3:1 in favor of women authors[4]) in fan fiction, where amateur writers create new stories based on existing characters from various popular media, publishing these stories online for others to read and comment upon. The oldest and most popular fan fiction site is fanfiction.net, with over two million registered users, most of them women, who have written over eight million pages in over 30 languages since 1998. Based on the demographics of creators of original interactive and fan fiction content, one would expect this to spill over into the creation of Colossal Cave Adventure versions by women, either as a classic reworking/reprogramming of the tale or perhaps as a reimagining of it from a woman’s perspective. This is clearly not (yet) the case. Perhaps there is something else at work here.

Of the 120 authors of CCA versions, I have confirmed that 77 (64%) had (or currently have) professional careers in IT (software, programming, development, management, technical writing, game design). The careers of 31 (26%) of the authors are unknown, but judging from the online presence of 20 of them, their careers likely intersected with IT in some way. While careers in computer science have been (and continue to be) ca. 75% male,[5] women have been present from the very beginning in all aspects of professional computing, including programming (Ada Lovelace), wireless communication (Hedy Lamarr), and game design (Carol Shaw, Atari), so it remains unclear why there have not been more women CCA authors over the past 40+ years, especially when the code is open source and easy to find. In 2018, 45% of the active gaming population identified as women,[6] and nearly 40% of these players opted to play various types of role-playing games (RPGs) either online or as solo campaigners (i.e., single-player games).[7] CCA is classed as an RPG, one played in the first person, and although it was written by a man in 1975 as a way to heal from his divorce and to keep a relationship with his two young daughters,[8] the language of the game’s narrative is gender-neutral and largely non-violent, focusing instead on exploration and puzzle-solving. CCA’s story, tropes, and mechanics parallel those of games that remain popular with all players, women and men.

Looking at the ReadMe files for the versions of CCA that have them, one finds that many of the authors explicitly state that their versions port the “original” (or a close variant) to another language, fix bugs in earlier versions, treat the port as a programming challenge (a rite of passage, just like playing CCA itself), or serve as an act of preservation, keeping the game alive in contemporary programming languages and platforms. This kind of nostalgia and preservation is not the province of men, especially after the 1980s. Of the known versions, 86 (72%) were created in 1990 or later, so one might expect more woman-authored versions of CCA to have become available as more women learned to code, either for fun or through classroom instruction; women have, after all, been talented programmers from the early days of computing on machines such as ENIAC through to the present day. It is possible that versions of CCA were written by women but never published for whatever reason, perhaps chiefly male gatekeeping in the coding community or male-dominated threads in CCA forums such as the Colossal Cave Adventure Forum on Delphi Forums, whose visitors and content-posters since 2016 have been primarily (if not exclusively) male.[9] The original versions were written in FORTRAN and were often ported to C, but it remains unclear why women programmers in these languages did not latch onto CCA for reasons similar to those of their male counterparts. While the possible reasons for this are outside the scope of this thesis, the disproportionate number of male authors of CCA versions is statistically significant and deserves to be studied.

As a final effort to find some closure on the significant gender gap, on January 1, 2019, I posed the question about male-dominated CCA authorship to Rick Adams, who is the game’s leading historian. He replied on January 3, 2019:

That’s a good question, and I’m not really any kind of expert who would know!

I can only speculate.

I know from mumble mumble years in the industry that gender is not indicative at all of coding skill, although there do tend to be slightly more men coders than women.

One can again revisit the Classical tradition in the creative arts to see (unfortunately) that the more things change, the more they seem to stay the same. Most of the potters, painters, sculptors, and architects known to history from ancient Greece and Rome are male, which may indicate two things that are not mutually exclusive: 1) that men were the predominant artisans in the ancient Mediterranean world, and 2) that a historical bias in scholarship and in the records of the period amplifies male voices. When looking at the more modern tradition of software engineering, we see the same issues at work: 1) a predominantly male workforce, and 2) an amplification of male voices. Colossal Cave Adventure was the first adventure-style interactive digital role-playing game; it happened to be authored by a man who worked in a predominantly male office focused on computer science, and it was later discovered by another male computer scientist, updated, and circulated by other male programmers, hackers, and IT professionals. Stepping away from CCA itself, we see that contemporary game development studios and gamer culture remain male-dominated, and women who create games or play them competitively are frequently targeted for harassment by men. This toxic culture might very well explain why we have not seen more than one CCA version written by a woman and publicly shared.

The fact that 99% of the authors of CCA versions have been men was unknown to me when I started this case study. The absence of women’s voices did not show up in my quantified work using the text analysis tools. Instead, it required looking at the individual files, finding signatures, and researching the game’s history in order to determine that only one woman created a version of CCA, a translation from English into German. Quantified data are certainly valuable, but they can also leave out a great deal of context, which is important to the understanding of anything archaeological, digital or otherwise.

The final part of this case study will attempt to draw general conclusions and lessons learned about CCA and about code archaeology generally.

—Andrew Reinhard, Archaeogaming

[1] ExifTool can be found at http://owl.phy.queensu.ca/~phil/exiftool/ (accessed January 6, 2019), and is also available in my Github repository for this case study: http://owl.phy.queensu.ca/~phil/exiftool/.

[2] http://www.ifwiki.org/index.php/Adventure#Z-code_ports_.28350_points.29 (accessed January 6, 2019).

[3] https://ifdb.tads.org/ (accessed January 6, 2019).

[4] http://ffnresearch.blogspot.com/2011/03/fan-fiction-demographics-in-2010-age.html (accessed January 1, 2019).

[5] See the January 2018 report from the Pew Research Center on STEM careers: http://www.pewresearch.org/fact-tank/2018/01/09/7-facts-about-the-stem-workforce/ (accessed January 6, 2019).

[6] https://www.statista.com/statistics/232383/gender-split-of-us-computer-and-video-gamers/ (accessed January 1, 2019).

[7] https://quanticfoundry.com/2017/01/19/female-gamers-by-genre/ (accessed January 1, 2019).

[8] http://rickadams.org/adventure/a_history.html (accessed January 1, 2019).

[9] https://forums.delphiforums.com/xyzzy/start (accessed January 6, 2019).
