I have divided my conclusions into two parts. The first part interprets the data gathered on Colossal Cave Adventure and takes the long view on what makes this digital built environment noteworthy and important to human history. The second part reflects on the digital tools and methods used for this case study, followed by an apology for code archaeology. When I planned this case study, I expected to focus solely on stylometric and text analysis to see if they could be done on computer code. The relative ease with which this was possible produced results, sometimes messy, that led me to question how I could verify or correct those results through other means. This in turn led me to understand the code within wider contexts, both human and non-human, as I traced the origins and subsequent growth of not just a game, but a community dedicated to it.
Conclusions about Colossal Cave Adventure
The archaeological evidence throughout this case study suggests that Colossal Cave Adventure is “Patient Zero” for open source coding and viral gaming. We know where and when the game originated and who the creator is. There is a robust oral history surrounding the origin of the game and its initial growth as the trunk of a tree in the late 1970s that later grew a number of branches in the 1980s up to the present day. Each of these branches corresponds to central figures in the coding history of versions of CCA and those who followed using these later iterations to inform their own work as they made the game their own. By using text analysis and stylometric tools, I was able to better understand the “genetics” of various versions in order to determine who the main influencers were in the game’s history, and to see what code survived between versions over time that called back to the original.
The game is fun and challenging both to play and to program, which explains its appeal to generations of players and coders over 44 years. The fact that CCA was the first of its kind as an interactive digital adventure game that used natural language input and output to advance an exploratory narrative also draws significant attention from players, programmers, and now archaeologists. As often happens in studying the archaeology of the recent past, several creators of the digital artifacts are still alive, as is the case with CCA and its two most famous programmers, Will Crowther and Don Woods. Although I did not contact either person for this case study, a number of my sources did as they wrote their histories of the game. The thing about histories (even oral histories taken from the creators themselves) is that they rely on the memories and ephemera of individuals. Through archaeology, one can supplement the history with artifactual evidence (in this case computer files) that can further add to the history of one of the most famous and influential games ever made. As explained above, some of the results of this case study will update current scholarship about the game, adjusting the chronology and proving the influence of some versions over others as the game’s history grew.
Through the study of CCA, one can begin to understand the early days of software networking and the open source community (before such a thing was called “open source”). It is not enough to demonstrate that the artifacts changed over time (and how they changed). Because people both designed and played these versions of the game, we see the growth of informal file-sharing networks where programmers could stash their code for others to discover, reverse-engineer, and use. CCA earned its reputation through this kind of discovery and play, “play” meaning both gameplay and the activity of coding, largely by hobbyists who would go on to share their work. Based on the number of creators who ended up as IT professionals, some of whom went on to do great things (or who did great things and then decided to write an iteration of CCA), the game clearly attracted a certain social group of technically minded men both young and old who took it upon themselves to preserve and care for the game’s legacy while also adding their own signatures to it, creating a “family tree.”
Colossal Cave Adventure’s gameplay, narrative, sense of humor, and also its sense of adventure proved so popular among its early players that it inspired them with the possibility of what one could do for entertainment on computers. After the initial iterations in either FORTRAN or C in the 1970s, the early 1980s witnessed the availability and relative affordability of personal computers, specifically the TRS-80, the Spectrum, the Amiga, “IBM-compatible” desktop machines running MS-DOS, and the first Macintosh model. One no longer needed to run FORTRAN through a compiler on a mainframe at a university. One could design and play games at home that did not necessarily need graphics. The growth of CCA’s family tree can be attributed in part to affordable personal computing, emerging bulletin board (BBS) services and modem technology, and the ease with which files could be discovered and shared. Digital rights management (DRM) and copy protection were in their infancy, and as young people at the time, we (myself included) wanted to find as many games and other programs as we could for free in order to play, and also to take apart so we could learn how to make our own in the programming languages we were teaching ourselves. These early skills would translate into careers for some.
CCA was there at the beginning of interactive entertainment and remains a pillar in digital game history. Conducting archaeological research into how the game evolved provides some answers to “why.” Understanding how the game spread and how it changed, how versions freely borrowed code and other text from other iterations, lends itself to discovering that these same events happened for other games, too, in other environments. This still occurs in the present day, although instead of borrowing code outright, the more visual game-design engines (such as Unity) contain elements that can be taken and re-used in what is now called “asset-flipping.” Being able to trace borrowed assets creates a kind of genealogy between games and the people who made them, artifacts containing “code-spolia” from other digital artifacts not unlike finding column drums built into fortification walls. What we see all the time as archaeologists of the ancient world is reflected in the habits of people from the present and the recent past. People have always had the same needs and will resort to time-honored shortcuts in order to meet those needs in the most efficient way. For programmers, sometimes the best way to learn is to take something apart and then put it back together again, perhaps improving the original in the process. This is exactly what happened (and continues to happen) with Colossal Cave Adventure.
Conclusions about Tools, Methods, and Code Archaeology
As I conducted my research on this case study, I regularly tweeted my progress from both my personal account (@adreinhard, c. 3,700 followers) and video game archaeology account (@archaeogaming, c. 2,000 followers). Those tweets included quotes from the game code, screenshots, threads describing my work so far, all combined with hashtags (#archaeology, #archaeogaming, #cca, #colossalcaveadventure) to encourage discovery and also dialogue. At times I would get stuck on how to use the software I had selected, or needed someone to check my work, and Twitter provided access to a diverse pool of users who would, in the spirit of open scholarship, critique the work and answer questions.
I also published my preliminary theorizing and findings here on the blog, to serve as a dry run for the thesis proper. I wanted to see what readers’ reactions to the research would be, and to exercise complete transparency in my work by presenting my preliminary findings in public. 73 people read my thoughts on code archaeology; 56 read about my tools and methodology; 47 read about the quantified results; 34 read about the qualitative results. Unfortunately, readers left no comments on any of these posts. The scarcity of readers could reflect the specialized nature of the project. Compare the number of readers of general archaeological topics with the number interested in epigraphy and palaeography. The more specialized the research, the smaller the readership.
To further encourage open scholarship and data-sharing among archaeologists as well as video game historians and other researchers, I placed all of my data, tools, visualizations, and results as well as all of the versions of CCA that I used into a publicly accessible GitHub repository and assigned it a permanent Digital Object Identifier (DOI) to be used in bibliographic citations and to assist in discovery. The best thing I can possibly do with my research is to make it freely available online to anyone who wants it. Researchers can then test my work, challenge it, and/or add to it.
The software tools I used in this case study were also all open source and open access, a conscious choice I made in order to lower the barrier to entry for anyone interested in pursuing similar work (or in reviewing mine). CCA itself is open source code, which I purposely chose so that anyone could access, play, and examine this game in its many versions without having to go out-of-pocket. So many modern video games have strong Digital Rights Management (DRM) and restrictive copyright that games researchers often have to sign non-disclosure agreements (NDAs) in order to access video game digital assets (including code). By studying an open source game, I was able to conduct my work freely while finding tools and methods that can now be applied to any game, copy-protected or not.
This case study is now a part of the open source community of CCA, and of open source (digital) archaeology generally. CCA was written and distributed as open source from its earliest days, and because most versions were created by other programmers, they followed an unwritten code of etiquette. In the open source community it is bad manners to use someone else’s work and either a) not cite them, or b) not thank them for their work into which new code was introduced, even if the work is in the public domain. The attribution rule (i.e., citing the source) would later be introduced into codified Creative Commons licensing as “BY”, used in conjunction with NC (non-commercial) and ND (no derivative works). By publishing my own work as CC0 (public domain) and hosting all of my data in an open repository, I can ensure that anyone at any time can use the data, visualizations, findings, and source code for their own work without needing to ask permission or wait for a response from me before initiating their own project. By making my work as open and easily accessible as possible, I am contributing new contemporary archaeology in a helpful and non-proprietary way. It is my hope to publish this thesis (or a version of it) as open access (likely public domain). It serves no purpose to hide data and research in a silo or behind a paywall. The benefit of publishing in the public domain is this: others can critique and improve upon my work in the coming months and years without any barriers to access, making the methods better, inventing new and better tools, and adding to the corpus of my data as versions of CCA are either created or discovered. When my work ceases to be “my” work and instead becomes the work of the community, it takes the ego out of research in the service of the topic being researched. I think that all of archaeology would benefit from this open, non-proprietary approach.
Tools and Methods: Lessons Learned
I realize that the stylometric, text analysis, and visualization software applications I used (not to mention all of the CCA versions) are not eternal and live online in a state of flux. Tools change over time, as does their online availability. The research questions prompting the need for these tools, however, do not change. People will continue to ask about author-attribution for text corpora, and will continue to analyze texts. We use the digital tools available to us, or we make our own, or we appropriate tools used for work similar to ours in an attempt to achieve results for new kinds of data not envisaged by the tools’ original designers. In the case of CCA, I applied text analysis and stylometric tools created for human-readable text (novels, legal documents, etc.) to computer code.
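As a rough illustration of what applying frequency-based text analysis to code can look like, the following is a minimal sketch in Python. It is not the stylometric software used in this case study, and it does not reproduce any of the results reported here. The tokenizer, the function names, and the distance measure (mean absolute difference in relative frequency over the most common tokens) are all assumptions chosen for simplicity, and the two snippets at the bottom are invented samples rather than lines taken from any CCA version.

```python
import re
from collections import Counter

# Assumption: a token is an identifier, a run of digits, or one punctuation mark.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|[^\sA-Za-z0-9_]")


def tokenize(source):
    """Split source code into lowercase tokens, discarding whitespace."""
    return [tok.lower() for tok in TOKEN.findall(source)]


def relative_frequencies(tokens):
    """Map each token to its share of all tokens in the text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tok: n / total for tok, n in counts.items()}


def profile_distance(source_a, source_b, top_n=50):
    """Mean absolute difference in relative frequency over the top_n most
    common tokens of both texts combined. 0.0 means identical profiles."""
    tokens_a, tokens_b = tokenize(source_a), tokenize(source_b)
    combined = Counter(tokens_a) + Counter(tokens_b)
    features = [tok for tok, _ in combined.most_common(top_n)]
    if not features:
        return 0.0
    freq_a = relative_frequencies(tokens_a)
    freq_b = relative_frequencies(tokens_b)
    total = sum(abs(freq_a.get(t, 0.0) - freq_b.get(t, 0.0)) for t in features)
    return total / len(features)


if __name__ == "__main__":
    # Invented samples for illustration only (not taken from any CCA source).
    fortran_like = "IF (X) GOTO 200\nGOTO 100\n"
    c_like = "while (x) { x = x - 1; }\n"
    print(profile_distance(fortran_like, fortran_like))
    print(profile_distance(fortran_like, c_like))
```

A measure like this treats keywords, variable names, and punctuation as if they were the function words of a novel, which is one reason results on code can be messy and need checking against the files themselves.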
Documenting my process and sharing my results might lead to the development of more specific tools created for code epigraphy, something for the Digital Humanities and digital archaeologist’s toolkit. Until that happens, I have written step-by-step instructions on how to use the current stylometric and text analysis tools (see above), and have saved the tools themselves to my GitHub repository so that others can use them even after the tools either evolve or disappear from their current homes online. These acts of preservation and documentation should be part of any Digital Humanities project and certainly any archaeological one. Create a data management plan (DMP) with stable URIs or DOIs for sustainable access on open platforms/repositories to ensure ease of discoverability, use, sharing, and linking.
One of the other lessons learned about digital tools and methods in this case study is that the learning curve for using the tools and getting them to produce results can be quite steep especially if one has either little or no experience working with either the Terminal (Mac) or the DOS prompt/console (Windows). One of the drawbacks of open source digital tools is that they can be quirky and unforgiving in the hands of a novice user thus precipitating the need for a support network. I was able to reach out to the wider Digital Humanities community through Twitter for assistance not only in how to use various digital tools, but also in identifying which tools to use. No one in my greater network had deployed these tools against sets of code, but all thought it was possible. The community’s help and encouragement contributed to a successful project. Current/future archaeologists can (and do) benefit from the assistance of online communities, and should be willing to share their successes and failures openly with one another, documenting these in public. The end goal of any project is to answer its research questions and then to publish those results along with the tools and methods used to achieve them. This public transparency is ethically sound and opens projects up to critique and also to future work by people not affiliated with a given project. New sets of eyes lend themselves to new ways of seeing, and it is a vanity to think that one’s own interpretation of data is the correct one. Conducting research in public makes that research stronger through being in a state of constant peer review while also serving as a check on the biases of the researcher(s). The community makes the individual’s work better.
One of the biggest lessons derived from this case study can be divided into two parts: 1) archaeological data is messy, and 2) to arrive at defensible conclusions, one must use multiple tools and perspectives. When I initially conducted stylometric and text analysis on the CCA code sets, I was pleased with the immediate results: data in, and data out with pictures to prove it. I decided to check my work and in doing so—by going back to the actual code and text within dozens of CCA files—I realized that there were sometimes problems with the results that could be resolved either by tweaking the parameters of the tools I was using, or by reviewing the data in detail via additional means in order to arrive at a more satisfactory, honest interpretation of the output. This was achieved by stepping away from the quantification of data, and instead using human intelligence to perceive trends in file metadata and context, comparing and contrasting them against the output of the R statistics package to get the whole story.
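One simple way to go back to the files themselves is to count how many lines of an earlier version still appear in a later one. The sketch below, in Python, illustrates that kind of cross-check. It is a hypothetical example and not the procedure followed in this case study: the function names are invented, and the normalization (collapsing whitespace and ignoring case) is an assumption that may be too loose or too strict for a given language.

```python
def normalized_lines(source):
    """Return the set of non-blank lines, with whitespace collapsed and
    case folded so that trivial reformatting does not hide a match."""
    lines = set()
    for raw in source.splitlines():
        line = " ".join(raw.split()).lower()
        if line:
            lines.add(line)
    return lines


def surviving_lines(original, later):
    """Lines of the original version that still appear in the later version."""
    return normalized_lines(original) & normalized_lines(later)


def survival_ratio(original, later):
    """Share of the original's distinct lines found in the later version."""
    base = normalized_lines(original)
    if not base:
        return 0.0
    return len(surviving_lines(original, later)) / len(base)


if __name__ == "__main__":
    # Invented samples for illustration only (not taken from any CCA source).
    earlier = "PRINT HELLO\nGOTO 10\n\nCALL   SPEAK(1)\n"
    later = "print hello\ncall speak(1)\nCALL SCORE\n"
    for line in sorted(surviving_lines(earlier, later)):
        print(line)
    print(survival_ratio(earlier, later))
```

An exact-match count like this misses lines that were lightly edited, so it supplements a reading of the files and their metadata rather than replacing it.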
This should not have surprised me. Having excavated in both Greece and Italy, I know that every artifact has a context and can be explained through scientific tests (e.g., XRF, carbon dating, etc.) as well as through its deposition in stratigraphy and its relation to other nearby objects, not to mention any artistic or inscriptional evidence on the artifact itself, as well as its size, shape, and state of preservation/completeness. All of this must be taken together to understand a single object, the assemblage in which the object is situated, and the relationship of that assemblage to the excavated unit, related units, and overall site. Archaeology is a study of complex relationships, and through understanding that complexity one can derive emergent behaviors between people and objects, objects and objects, objects and locations, landscapes, and so on. As it is with “dirt” archaeology, so it is with the digital.
Code Archaeology Revisited
As described in the introduction to this case study, software code is a text-artifact created by one or more people. The code itself is text that is keyed in to a file—or in the case of the earliest versions of CCA, punched into cards—and once saved, that file becomes an artifact. In the case of punchcards, this is quite literal and physical, not unlike Mesopotamian clay bullae inscribed on the outside with data held on tablets within. Digital files are artifacts as well, something created by people to address a specific need (or set of needs), saved with a timestamp and a datestamp to lock it securely within a context of related files or a history of development and use. These files—text-artifacts—can then be copied, shared, sold, manipulated, and used by both people and machines to discharge their intended purpose and, ultimately, to be discarded, abandoned, corrupted, deleted. Just as with papyrus or clay tablets, digital files themselves are artifacts within an archaeological context, findspots. The information encoded within the files, however, is a communicative virus that contains instructions to manipulate not only machines that run the files, but also the people using these machines. Because text-artifacts are by their nature manipulative, they can also be considered to be landscapes.
We have seen this before with the other two case studies in this thesis. Skyrim VR places the user within a fully realized, three-dimensional open world where one can explore a synthetic landscape based on natural algorithms coded by designers and engineers. No Man’s Sky offers a more organic and infinite space in which to operate, its procedurally generated landscapes encouraging the human player to build and to explore, these spaces encoded by a “voxel”, a seed of information that blooms upon first contact with a human agent. Colossal Cave Adventure, as simple as it is, also provides players with a space to play, although that space is created as a mental picture by way of textual suggestion. In all three cases the underlying code manipulates human behavior as the player consciously (or perhaps unconsciously) gives up control in order to be moved and directed by each game’s rules.
Code is a landscape, but a manufactured one. The landscape is engineered, even in the case of seemingly randomized procedural generation. As on Earth, these landscapes are governed by rules including time and gravity, the rules stating either directly or indirectly what can or cannot be done by agents within the landscape. The rules present themselves through their underlying code, which creates the landscape and the events unfolding within it. Once the human player-agent arrives in that landscape, they engage with the code and are manipulated by it, being drawn into a maze, to quote CCA, “of twisty little passages, all alike.”
This is the genesis of what I call “machine-created culture,” abbreviated as MCC. When I first imagined MCC, I thought that it would cover synthetic civilizations created by software algorithms. By the time I had completed this third case study, however, I realized that machine-created culture describes how people (individuals and entire populations) are manipulated by code. In digital archaeology (archaeology of digital things and spaces), code is perhaps the most important thing to study. Code builds worlds and affects agency, going beyond merely historical primary texts because code is not only a document but also an executable set of engineered instructions that is the foundation of any game, and by extension, of any software program at all. In considering code archaeologically one not only begins to understand the code’s creator(s), but also the context/environment that caused the code to be produced, the other media artifacts that ran the code, the people who used it, and the code’s personal and cultural impact on both the creator and the user and their wider social circles. We can use the code and clones/iterations of it to look for its viral impact, its distribution networks, its relationships to other software and automata as well as to its human users on a local, regional, or international scale. Code is all at once: architectural blueprint, electrical engineering diagram, DNA/RNA, epic literature, map, snapshot, organism, artifact, site, landscape, Zeitgeist, personal/corporate history. Code is a written document, but also the foundation for a house, and the house itself. It is a pot thrown on a wheel, but is simultaneously the clay, the unfired pot, the finished vessel, and the discarded waster. Code is an artifact in the past persisting into both present and future and when compiled behaves as if no time has passed. It can be experienced as it was.
All of these characteristics lend themselves to understanding something archaeologically as a thing on its own, as something fitting within a wider assemblage of other software and hardware, as something that both precipitates and experiences non-human and human interactions traveling at the speed of light, at the human speed of use, and the technological speed of decay. Code is both text and tool made through human-machine symbiotes creating complex relationships with a shared language to discharge functions both trivial and profound. If we are to understand where humanity is headed as well as what got humanity to this moment in time, then it is the digital archaeologist with a deep understanding of digital built environments and machine-created culture who becomes the indispensable interpreter. All archaeology studies the human by way of non-human residues. The archaeology of the digital is the next stage of evolution within the discipline, one that historically has preoccupied itself with technology from walls to wheels, from henges to the Antikythera Mechanism and beyond.
 Specifically Rick Adams (http://rickadams.org/adventure/a_history.html) and Dennis Jerz (https://jerz.setonhill.edu/intfic/colossal-cave-adventure/) (accessed January 17, 2019).
 See John Aycock and Tara Copplestone, “Entombed: An Archaeological Examination of an Atari 2600 Game”, The Art, Science, and Engineering of Programming 3:2 (2019), p. 16, which shows evidence of code re-use between Atari engineers working on no fewer than five separate games.
 I worked at the Greek site of Isthmia where column drums and other architectural elements had been robbed from areas inside the 1st c. AD sanctuary in order to build the 5th c. AD Hexamilion Wall.
 https://github.com/adreinhard/cca. GitHub is the world’s most widely used online space for open source programming and is currently owned and administered by Microsoft. Placing my case study’s materials on GitHub and assigning a DOI (see n. 37) protects against link rot and loss of my digital assets for this project.
 10.5281/zenodo.2536327. A DOI is a unique, unchanging identifier assigned to a digital asset. In this instance, the DOI was assigned by Zenodo, an online home for open access research and publications managed and administered by CERN.