April 12, 2026

Recovering a Lost Document: Habitat Anecdotes (1988)

While digging through old CVS archives recently, I turned up something I hadn’t seen in decades: my very first paper, written in the fall of 1988 while I was still in the thick of running the Lucasfilm Habitat Beta and Pilot tests.

As best I can tell, this document has been effectively lost to the web. It was hosted on communities.com at some point in the mid-to-late 1990s, but that site is long gone and neither the Wayback Machine nor any search engine turns up a surviving copy. The version below was recovered from a CVS repository archive, where it had been imported in May 1998 and apparently never touched since.

I’ve preserved it exactly as written, including the 1993 preface I added when Electric Communities was getting started. The dead links to communities.com throughout the document are artifacts of their time — that domain is gone, but the paper it once pointed to, The Lessons of Lucasfilm’s Habitat, remains available through other channels.

Reading it now, I’m struck by how much of what we were figuring out in 1987–88 still holds. The five user types, the economic emergent behavior, the tension between operator power and community trust — these weren’t abstractions. They were things happening in front of us, in real time, with real people paying real money. This paper is where I first tried to write them down.

Habitat Anecdotes

and other boastings by F. Randall Farmer
Fall 1988

Preface (Fall 1993)

This, my very first paper, documents my early observations of the Lucasfilm’s Habitat Beta and Pilot tests in 1987 and 1988. These observations served as raw material for several published papers that became the inspiration for the formation of Electric Communities, a cyberspace research company founded by Chip Morningstar, Douglas Crockford and myself in 1993. If you don’t know anything about Habitat, this paper won’t make much sense. The unfamiliar should first read The Lessons of Lucasfilm’s Habitat in Cyberspace: First Steps from MIT Press.

This paper is adapted from a hypertext document.

The People

The entire point of Habitat is The People. It is an interactive environment where people define the parameters of their experience. Chip likes to call it “A Social Crucible”: throw some people in a room with some fun toys, and see what happens. If a situation arises that requires modification, first let them try to sort it out –avoid changing the rules– and if they can’t, take their input on how to change things. From this, it is clear that to understand Habitat, we must first understand its users.

There are basically 5 types of people in the Habitat universe:

1) The Passive

Easily 50% of the number of users fall into this category, but they probably use only 20% of the connect time (rough estimates). They tend to “cross over” to Habitat only to read their mail, collect their 100t bonus, and read the weekly newspaper. They tend to show up for events ad-hoc and when the mood strikes. This is the most important area for development. Special events and activities need to target this “on for just a few minutes” group. This group must be lead by the hand to participate. They tend to want to “be entertained” with no effort, like watching TV. The trick here is to encourage active participation.

2) The Active

This group is the next largest, and made up the bulk of the paying user-hours. The active user participates in 2-5 hours of activities a week. They tend to log into Habitat right after connecting. They send out ESP messages to others on-line to find out what is going on. They ALWAYS have a copy of the latest paper (and gripe if it comes out late). This group’s biggest problem is overspending. They really like Habitat, and lose track of the time spent “out there”. The watch word here is “be thrifty”. (See Quests for more on this)

3) The Motivators

The real heroes of Habitat. The Motivators understand that Habitat is what they make of it. They set out to change it. They throw parties, start institutions, open businesses, run for office, start moral debates, become outlaws, and win contests. Motivators are worth their weight in gold. One motivator for every 50 Passive/Active users is wonderful. Nurture these people. (See Motivators & Caretakers at Work)

4) The Caretakers

Usually already employees. The Caretakers are “mature” Motivators. They tend to help the new users, control personal conflicts, record bugs, suggest improvements, run their own contests, officiate at functions, and in general keep things running smoothly. There are far fewer of these than Motivators. Out of a Pilot group of about 400, we had 3. What you want to do with a Caretaker is groom him for Geek Godhood. (See Motivators & Caretakers at work)

5) The Geek Gods (System Operators)

I was the first Oracle/Operator. (I talk about that experience in Geek Gods Revisited). The operator’s job is most important. It really is like being a Greek God from the ancient writings. The Oracle grants wishes and introduces new items/rules into the world. With one bold stroke of the keyboard, the operator can create/eliminate bank accounts, entire city blocks, or the family business. This is a difficult task as one must consider the repercussions of any “external” effects to the world. Think about this: Would you be mad at “God” if one day suddenly electricity didn’t work anymore? Habitat IS a world. As such, someone should run it that has experience in that area. I suggest at least 10 years experience in Fantasy Role Playing and 2 years on telecommunications networks (specifically ones with CHAT programs). A Geek God must understand both consistency in fictional worlds, and the people who inhabit it.

To optimize the Habitat funativity experience, the goal is to move the user from his/her present category to the next one up:Passive->Active->Motivator->Caretaker->Geek God.

Move everyone one role to the right, and you will have a successful, self maintaining system. (Read: you will make bags of money.)

Real Money

The Habitat Beta Test was actually a paying pilot-test. The testers would be paying $0.08 per minute to play and in this way we could see if Habitat was financially feasible. There were exceptions; about 25% of the testers would be QLink staff, who either had free accounts or were given a certain number of free hours. This distinction caused some difficulty in deciding if any Habitat activity was a success (see The Scheduled Events). We wanted to see if Habitat was fun enough for paying customers.

Read these (don’t forget to read between the lines):

A certain user posted this message (edited for brevity):

As of today I am quitting Habitat. It costs too much. I have been a Q-Link subscriber for 2 years. The first year I used only 2 plus hours. ($10) The next year I used only 5. ($25) But in the last month, while I was playing Habitat I spent $270!!! I can’t afford that. You need to make it cheaper.

$270 = 57 hours or over 100 times his previous peak usage!

We must have made it “too much fun!”

another user said:

I didn’t realize that I was going to want to play 50 hours/month!

Habitat (for some) was addictive. Because of this, there was a call for “Bulk Discounts” and various schemes were proposed by the users. None of them were implementable, and all of them would have resulted in significant losses. I fully expect the call to go out again when it is released.

Yet another spent over $1000 in one month in Habitat. At around $300 and $600 dollars, he was mailed a message suggesting he “check out his usage in the billing section”. If we could get 20 more of this type of “rich” user, we would be profitable!

Habitat Money

The Habitat official currency is the Token.

The Economic Model

You are “hatched” with 2000t, and every day you log in, you receive 100t. Money can be won in contests/quests. You can buy and sell objects using automated machinery. The Vendroid sells stuff. The Pawn Machines buy it back. Each Vendroid makes the purchased item out of thin air. That’s right, no production costs. This leads to an interesting problem of runaway inflation. We never got enough people in the system to understand this effect, but got a taste of in when “The Big-Money Scam” happened:

The Big-Money Scam

During the Alpha test, “The Big Doll-Crystal Ball Scam” took place. In order to make the automated economy interesting, we made Vendroids so that the could have any price for any item. This was so we could have local, specialized economies (i.e. a widget could cost a little less if you bought it at Jack’s Place instead of The Emporium). In two vending machines across town from each other were two items that were for sale for less than the pawn machine would buy them back for: Dolls (for sale at 75t, hock at 100t) and Crystal Balls (for sale at 18000t, hock at 30000t). One weekend several persons participated in the Scam, they took their money, purchased many boxes, walked to the Doll Vendroid and bought as many as they could afford, walked back to town and pawned them. They repeated this process until they had enough to purchase Crystal Balls. This took many hours. The final result was at least 3 people with 100,000t – 500,000t. In one night the economy had been diluted as the T1 (the Token Supply) has jumped 5 times! (for more on this Scam, see They Cheated!).

What the Wealthy Did

The new rich class now began to distribute their wealth by having treasure hunts. There were other quests and hunts that gave many users fat bank accounts. Soon a true economy began to emerge: Heads. Since you can change heads in Habitat, and unique heads were often prizes or gifts from the oracle or very expensive, their value skyrocketed. This would definitely be true when thousands of users came along, as there are only 200 or so styles of heads, and each user is initially given a choice from about 30 of those. Heads are the only obvious form of customization an Avatar has.

The Issues

Introduction

As I have said before, Habitat is a society, and as such, has spawned many debates about how the Habitat world should be. Very few “rules” were imposed on the world from the start.

A theme at the core of many of the arguments is philosophical. Is an Avatar an extension of a human being, a Pac-man like critter — destined to die 1000 deaths — or something else. Our answer is all of the above and none. Again the people decide what is right. In reading about the issues, keep in mind that our sample was very small, and skewed towards Actives and Motivators.

Early Thieving

At first, during early testing, we found out that people were taking stuff out of others hands and shooting people in their own homes. We changed the system to allow thievery and gunplay only in non-city regions. (That one was easy! It gets more complicated from here)

Dial H for Murder

The hottest issue was, by far, murder. In Habitat, if an Avatar is “killed” he is teleported back home, with his pocket emptied, what he was holding dropped, his hit-points restored, and his head put in his hand. However, only what he had with him and his position in the universe has changed. One of the Motivators took to randomly shooting people roaming outside of town. A debate arose: Is Habi-Murder a crime? Should all weapons be banned? Is it all “just a game”? There was such a debate on the issue, that a vote was taken. We were surprised by the results. 50% said “A crime” and 50% said “no — it is part of the fun”. Our outlaw had in fact demonstrated that human-human interactive combat was fun for over half the audience. And since anyone who didn’t want to fight could just “ghost” and run away, there was no reason to consider the banning of weapons. (For more on personal combat, see Combat)

The Order of the Holy Walnut

One of the outstanding proponents of the anti-violence-in-Habitat view was also the first Habitat Minister. A Greek Orthodox Minister opened the first church in Habitat. His canons forbid his disciples to carry weapons, steal, or participate in violence of any kind. It was unfortunate that I had to eventually put a lock on the Church’s front door because every time he decorated (with flowers), someone would steal and pawn theme while he was not logged in!

Wedded Bliss?

Three Habitat weddings took place in that church. These were not human-human weddings, but Avatar-Avatar. Their turfs were joined so that they could cohabit. There were some technical problems with this that should be resolved in any new versions. Only one account could enter a turf if the owner were not home. We hadn’t properly handled cohabitation.

The first Habitat divorce occurred 2 weeks after the third wedding. I guess Habitat is a bit too close to the real world for my taste! The first habitat lawyers handled the divorce, including public postings all about town.

Entertaining the neighbors

The Party was one of my favorite activities. I liked to throw them at new Avatars’ houses. I would ESP a known “Passive” Avatar, and ask him where he lived. If he told me, I would send ESP to “Actives” and “Motivators” that were on-line teleport to the address. Great fun.

A close cousin to parties was the Sleep-Over. The users invented this on their own. Often private discussions would take place in a turf. It was considered a minor social honor to be invited to sleep-over. This meant to log-out while still in another’s turf. This was an honor because you would be able to log in later even if the host was not on. This would leave the host’s belongings open to plunder.

More on Stealing

Speaking of plunder… Stealing is still possible, even within city limits, as once an item is placed on the ground, it has no owner. Like murder, opinions on this issue are deeply divided and we think the best way to resolve it is to let (help) the players devise a limiting mechanism.

Secret Identities

In the original proposal, all Avatars would be able to have unique names (separate from their log-in names) and they could say they were anybody they wanted. Like a big costume party, no one would know who was who. I lost the battle for unique names, and QuantumLink wanted an “identify” function. It seemed the anonymity I wanted was lost. But I suggested a counter-proposal. A tit-for-tat rule. If you “peek” at someone else’s secret identity, you will be unmasked to that Avatar, and no one else would know the results. Some very interesting dynamics developed. Some people were offended if they were ID’ed right away. And others never bothered, if you said “HI! I’m WINGO”. I remember one time that I convinced someone that I was another person by sending ESP as “myself” to the person in the same region.

Business

The economy was a minor issue. Most everybody had plenty of tokens (except the Passives). In an attempt to open the retail business to Avatars, a Drug store was opened, with a locked room in back that only the owner could enter that contained the only vendroid that sold Habitat healing potions and poisons. The shopkeeper would pay the fixed price, and could charge whatever he wanted for resale. It was a success except for the fact that the owner logged in at strange hours.

To Govern or Not to Govern

Our design directive was not to interfere in Habitat politics or set up a government or law establishment. Many people thought that crimes of killing and theft ought to be punished. We decided to hold sheriff elections. The favorite candidate was a friendly guy, but many didn’t know that this very same Avatar was the brains behind The Scam. There was a public debate in the Populopolis Meeting Hall with the 3 AvaCandidates making statements and fielding questions. I was among the ghosted attendees. I would pre-type some comment like “Vote for Foon!”, de-ghost quickly, press return to send my message, and become a ghost again. No one would have any time to tell who I was before I was gone. This was fun. During the Question and answer period I, before appearing, typed this question: “Please explain to us why we should vote for a sheriff who obtained his campaign fees rather -ah- UNUSUALLY?”. This started a real-life-like mud slinging fight. As it turns out, he won by a landslide anyway. Populopolis had a sheriff.

For weeks he was nothing but a figurehead. We were stumped about what powers to give him. Should we give him the right to shoot anyone anywhere? Give him a more powerful gun? A wand to >zap< people to jail? What about courts? Lawyers? Laws? Late in the test the answer struck me: ask the users! A “Committee for a Safer Habitat” sent out a mailer to everyone asking this question: “What should the sheriff be able to do?”. Then another election was held “What is a crime?” and “What should the sheriff be empowered to do?”. The results were unable to be acted on before the test ended. An interesting side effect of this was that it became apparent there are two basic camps: anarchy and government. It will be great to see what happens with thousands of users facing this decision. Habitat need not be set up with a “default” government (like reality).

Magic Inflation

Besides economic inflation, we also had Magic inflation. In the Dungeon of Death, the designer had a vending machine that sold magic wand that teleport to the oracle anyone you point them at for only 1000t. At this time magic wands worked forever. Soon everyone had one of these wands and people were zapping each other all over the place. Crime got really out of hand when criminals would travel as ghosts, wait for people to put their belongings down for a second, de-ghost – zap – and steal. I had always planned on implementing a limited “charges” feature but was to busy tracking down bugs. Soon it was clear it was time to act. “God” changed the rules, and limited magic. The issue became foremost in the discussion arena: Some people were using these rods for the “good” cause of rescuing people when they got lost. Many were outraged that the rules changed. Ask yourself this question: What would you have done? This is a tricky question, fundamental for the chief operator to understand.

Motivators & Caretakers at work

By far the Caretaker who had the greatest on his fellow users was the editor of the Habitat newspaper The Weekly Rant. This user tirelessly spent 20-40 hours a week (free account) composing a 20, 30, 40 or even 50 page tabloid with containing the latest news, events, rumors, and even fictional articles. This was no small feat, he had only the barbaric Habitat paper editor, and no other tools. After he had composed the pages of an issue, he would arrange them in several chests of drawers in The Rant office and send me mail. I would publish it by using a special host program that would bind them into a book object and distribute it to the news vendroids, check the copy by hand for errors, and deliver a copy to the office (in Habitat). This worked great, but took massive amounts of his personal time. I began to automate the process further just as Habitat operation changed hands. The new publisher didn’t publish on time, delayed getting the tools ready to speed up creation, made editorial changes (he wanted it to be shorter, less fiction), and didn’t hand-deliver a copy of the final product. The editor quit. Just like real life: Someone new runs the show and the sensitive leave. Again, these people are rare and should be handled carefully. The Rant will never be the same.

Duels

One of the wands we implemented caused the victim to perform the “jump” gesture, accompanied by a “Hah!” word balloon. It was fun for a while, mostly because you could really effect another Avatar, but it got old fast. Soon a game was developed completely by the users involving these wands: The Duel. The rules were simple: two combatants, two wands, one judge. When the judge says “go” the first to “hit” the other with the wand 3 times wins. Not as easy as it sounds, since the duelists are allowed to run around.

Tours

Another Caretaker was the number #1 all-time most-traveled Avatar. He also was the longest lived. When new people started logging in, he took them on guided tours of this strange new world. He made them feel like they had a “friend” in town.

Combat

“Conflict is the essence of drama”. We used this quote in the initial Habitat design document. Habitat (it was then named “Microcosm”) was to have personal combat in the forms of weapons. Most computer games had combat, and we were offering a chance for users to effect each other!

Here I will explain how it actually ended up working. There were ranged weapons and hand-to-hand. An Avatar is born with 255 hit points (the actual number is masked from the user, and a “general state of health” message gives the user some idea how bad off he is.) While holding the weapon, you select a target and DO (attack). There is a telecommunications delay that may effect the hit-or-miss result. Each successful attack does some small amount of damage (i.e. 20 points.). You are always informed when you are shot, as your Avatar is knocked onto his rump.

As you can see, it would take quite a few hits to “kill” a healthy Avatar. Not only that, but you can avoid being damaged if the attacker can’t “touch” you in 2 ways: 1) by turning into a ghost or 2) running around (not standing still). You use #2 when you are in a gunfight where you are shooting back. This seems to be a working dynamic. If you really, really are low on hit points, you travel the “wild” regions as a ghost. There are also devices that will restore your hit points. The real problem is communicating this to new users, who are often standing around in a region when a bandit comes along with a gun. The neophyte hears a “bang” and sees his Avatar knocked on his can. Instead of acting, he types a message like “What was that? Why am I sitting down?”. Meanwhile, the bandit cranks out another 12 bullets…. Dead beginner probably had all of his money and stuff in his pocket too! This problem should be corrected in the Avatar Handbook, explaining that guns are dangerous (something we thought people would assume on their own).

For more on special types of combat see Magic Inflation, Duels, and Dungeon of Death.

The Scheduled Events

The D’nalsi Island Adventure

The first treasure-hunt ever planned for Habitat was mine, the D’nalsi Island Adventure. I took me hours to design, weeks to build (including a 100-region island), and days to coordinate the actors involved. I had taken several guesses as to what how long it would take the players to perform each “segment” of the quest. The mission: recover the lost “Amulet of Salesh”. First: A trial, introducing the characters and the first clues. Second: Salesh hires the adventurers. Third: The players needed to figure out the “secret” teleport address. Fourth, they must find the door to the hidden cave, solve the riddle. Last: find the hidden crawlway and the buried chest containing the amulet. The prize was 25,000t.

The first part was in the form of a “dinner theater”-like play, set in the county courthouse. It was heavily attended. Since it was set up as an introduction, there was no appropriate “time” for the players to discover anything.

On the day that Salesh “hired” adventurers to find his amulet, he gave out copies of a map of the island. Hidden on this map was a word that was the teleport address to the island. After about 15 minutes of hiring, when about the tenth Avatar was hired, Salesh (me) received an ESP from one of the Motivators: He had discovered the teleport address. Darn! It seemed that the others had no idea where to start, so I sent ESP to all the players announcing that the teleport address had been discovered to be a word on the map.

Within 8 hours the treasure had been recovered by that person who had first discovered the island. This was so soon that almost half the adventurers (the novices) had not yet even discovered the teleport address! It was clear that there is a very wide range of “adventuring” skills in the Habitat audience, and various events need to be better targeted, and should include handicapping mechanisms so that those behind don’t get more and more behind.

The Dungeon of Death

This “combat oriented” dungeon was the brainchild of a Caretaker that had recently become a Q-Link in-house employee. It shows that experienced “insider” could design an successful event using his understanding gained through being a player first. (Note: I had nothing to do with this design, so it was my first event as a participant)

For weeks ads appeared in The Rant announcing that Duo of Dread, DEATH and THE SHADOW were challenging the adventurers to come to their lair. Soon, on the outskirts of town, a dungeon was discovered. Outside a sign read “Danger, enter at your own risk.”. Two operators were logged in as DEATH and SHADOW, armed with guns that could kill in 1 shot (instead of the usual 12). The dungeon had totally dark (light did not help), dead end (trapped), and duplicate regions. It was clear that any explorer had better be prepared to “die” several times before mastering the dungeon. The rewards were pretty good: 1,000t minimum and access to a vending machine that sold “teleport” wands (see Magic Inflation). I even got a chance to play DEATH for one night. It was a slaughter. Avatars were dropping like flies… but most of them had prepared by emptying their pockets. When I got to play DEATH, I found him in one of the “dead ends” with four other trapped Avatars. I de-ghosted and started shooting, but was shot twice myself and died. Shoot! The last operator had not healed damage from his last encounters! The worst part of this is that “when you die, what is in your hands is dropped”. Yep. Some normal Avatar now had the “elephant” gun that could kill in one shot. The most valuable weapon in Habitat. What should I do? I later found out that this was not the first time this happened, it happened to a Q-Link operator and they “forced” the Avatar to give it back. I did something else: As DEATH (never identifying my true self) I threatened to kill the new owner. She replied that she would never leave town, thus being safe. OK, I think, she’s smart. After about an hour we settle on a deal, 10,000t to buy the gun back. We meet at The Oracle in town, where it is safe and make the exchange. It was great. The entire “operations accident” was handled within the game universe with no “external” interference.

R&R weekend adventures

These were short (1-2 hours) quests where a user pressed one of ten magic buttons to receive a clue to find one of ten hidden keys to be used in one of ten hidden safes. This were the all-around best quests to run (there were 3 of them) because there were always 7-10 winners. The only problem here was the Time Zone problem: The event had to be scheduled so that as many people as possible could participate from the moment it started. Q-Link access started at 6pm local time. This meant that for the Californians to have a chance, the adventure would have to start at 9pm East coast time at the earliest.

The Money Tree

The Quest for the Money Tree is the first quest an Avatar learns about from reading his free Welcome Wagon version of The Rant placed in his Turf. There is a tree in a forest that will dispense 100t for every Avatar once. Everyone can feel like they have “found” the magic tree.

The Tome of Wealth and Fame

This was also one of the originally conceived of quests. A certain set of tablets contained the Tome of Wealth and Fame. If you found it, you were to hide it somewhere else. You would receive a reward based on how long it took another to find it. The problem with this was that the world was so large that it often took weeks for someone to find the tome. It wasn’t an active process because, if you tried, it would take days of on-line time to find.

The Long and Short of Quests

A trend became clear about quests in Habitat. The winners of the “long range” quests like The D’nalsi Island Adventure were almost always people with free accounts. The freebies would stay on for hours on end to gain wealth, things and status (See Habitat Money:”The Scam”). The paying customers could only come on 1-2 hours/week. The idea that people would be able to “work on” a quest for weeks is bogus. The long range quest must be something that either “everyone” can win or does not provide some significant advantage in the world. (See The Money Tree)

Grand Openings

A real surprise was the popularity of the “Grand Opening”. This the ribbon-cutting event when new regions were added to the world. Tokens and prizes were often hidden in the new regions, but it seems that the audience (especially the Passives) had an insatiable hunger to see new places and things. The Grand Opening of the Popustop Arms apartment building was the most heavily attended event of the Pilot test.

Disease

One of the more successful “games” we invented for Habitat was the disease. There are three strains currently defined:

Cooties
Happy Face
Mutant (AKA The Fly)

We only were able to test Cooties with live players, but it was a hit. It works like this: Several initial Avatars are infected with a “Cootie” head. This head replaces the current one, and cannot be removed except by touching another non-infected Avatar. Once infected, you can not be infected again that day. In effect, this game is “tag” and “keep away” at the same time. Often people would allow themselves be infected just so he could infect “that special person that they know would just hate it!” Every time the disease was spread, there was an announcement at least a week before, and for at least a week afterward it was the subject of major discussions. One day that the plague was spread, a female Avatar that was getting married got infected 1 hour before her wedding! Needless to say, she was very excited, and in a panic until a friend offered to take it off her hands.

Some interesting variations to try on this are: Touch 2 people to cure; this would cause quite a preponderance of infected people late in the day. The “Happy Face” plague: This simple head has the side effect of changing any talk message (word balloons) to come out as “HAVE A NICE DAY!”… can you imagine infecting some unsuspecting soul, and him saying back to you HAVE A NICE DAY! ??? ESP and mail still work normally, so the user is not without communications channels. The Mutant Plague: The head looks like the head of a giant housefly and it has the effect of changing talk text to “Bzzz zzzz zzzz”. We think these all will be great fun.

Deception & Trickery

These were fun things to do to your fellow Avatar.

My invention – Type this: “You have *mail* in your pocket.” and watch the fun as people say “That’s strange! I don’t have mail!”

Chip thought this up – Send this ESP message “ESP from: yournamehere”, then quickly send a “Hello” also. Your “Hello” ESP will be announced 3 times!

We developed a form of communication “harassment”. You can do this on almost any network. Just coordinate a few people all sending very short ESP messages to the victim. His screen will scroll faster than he can read. This was used against the social outcast mentioned in Dial H for Murder.

Geek Gods Revisited

“They Cheated!”

As the One and Only Habitat Oracle and Lead Programmer I was subject to some interesting conflicts of interest in operations. I cared intensely about the experience each user was having. I worried about bugs. When The Scam happened, I flipped my lid! “They Cheated! And they didn’t report it in a bug report!”. First indication I had was looking at the record for most tokens. Then I searched for bug reports. Then I sent Habitat mail to the two newest rich people asking them where they got all that money overnight. The reply I got was “I got it fair and square! And I am not going to tell you how.” At this point I should have realized that my role as Oracle and Programmer were at odds, and that the users were not aware of the relationship of my Avatar (the one who mail them asking about the money). A Geek God must not lose his temper. Remember, Habitat is its own little world.

Fan Mail

The greatest reward for being The System Operator is Fan Mail. When the Caretakers (who always end up finding out who the oracle really is) tell you that the world is a fun place to be because of you, it makes it all worthwhile. It does matter what you do (see The Rant). It all starts at the top of the pyramid. A bad Operator can drag the system down by not keeping track of details and promises. Trust feedback.

Ideas to be tried

Monster of the Month Club

Since we (theoretically) could remotely update the disk we considered a “Monster of the Month Club” scheme where we would download a new Avatar body style for special events. There were some images that were not put in the current implementation of the system that would be great: a car and a motorcycle. Some other interesting ideas included a giant (full screen height) foot that would hop around, animals, and floating objects.

Machivelli

This game was designed by Chip. It required no programming support. The game is about politics and secret organizations. The idea was that the Operators and Caretakers would start the game by making up two separate secret organizations whose goals were to “take over” Habitat. They could only do this by recruiting more and more new members (while still keeping the organization a secret!). Secret “handshakes” could be set up. Meetings. Recruiting drives. Of course, soon there would be gang warfare. Who knows where it might go?

A Final Word

As I close this document I find I keep remembering dozens of other stories to tell. And all of these come from my experiences with only 200 or so people! Imagine what it will be like with tens of thousands of creative minds at work! Though as of this writing Habitat is still not a released product, I still am proud of the world we created. I really expect to be meeting you soon “On The Other Side” in a world not unlike Habitat.

Avatar History Lessons Learned | Posted by Randy at 9:42 PM | No Comments

February 18, 2026

Adventures In LLM Land, With Thoughts On The AI Revolution

For the past year and a half or so I’ve been experimenting with AI tools for software development. This began with a single, small personal project, but has now grown to encompass a couple more very large personal projects. Hopefully, this means I will soon have some cool new stuff to unleash upon an unsuspecting world, but no promises before shipping. In my working life, my whole team has begun incorporating these tools as an increasingly central part of our regular software development workflow. At this point I’m completely convinced that this is just the way software development is going to be done henceforth, at least until the next turn of the paradigmatic wheel (though at the pace things are going, this could be any day now). This post is an attempt to summarize what I’ve learned and my thoughts about it.

There’s an emerging practice that I’ve been hearing a lot of developers refer to as “vibe coding”. This is both a terrible piece of jargon and a disappointingly sloppy way of approaching things. Recently I’ve also started encountering more and more people talking about “agentic” development. Possibly this terminology shift has to do with the fact that these tools make it easy to have multiple balls in the air at once, leading you to have multiple independent entities (“agents”!) doing work on your behalf at the same time, but I have a sneaking suspicion it may just be folks trying to sound more serious and respectable than “vibe coding” suggests.

My experience has been that, rather than letting you be lazy (which can be either an accusation or a touted payoff, depending on whether you are talking to a critic or an enthusiast), getting the maximum benefit out of these tools has taken a surprising amount of discipline. It’s just that the discipline required is very different from what you need for traditional programming.

As I’ve been recounting my AI coding adventures to friends and co-workers, it has slowly dawned on me that a lot of what I’ve been learning probably applies to a much broader range of applications than just software development. So I’m going to attempt to articulate this broader view in the hope that it might make a useful contribution to the wider conversation that pretty much everyone by now has been having about AI, its meaning, consequences, and proper role in human civilization. I’m increasingly dismayed by the “let’s sprinkle magic AI pixie dust on everything” mindset that seems to have consumed the souls of the current cohort of herd following executives, clueless bosses, and idiot marketeers.

A caveat: the following will be rather meandering with a lot of digressions. That’s just how I roll here; this is a personal blog that Randy and I write largely for our own psychic satisfaction, not some journal article submitted for academic peer review. If you just want the high order bits you can always skim, though you’ll probably miss a lot of the fun parts if you do that.

My first AI testbed project was a system to catalog and organize my family’s home library. Both Janice and I are book addicts. At this point I think our collection is somewhere north of 10,000 volumes, though how far north, at this point, I’m a bit scared to find out. A few years ago we moved into a bigger house, and in the unpacking we dumped many, many boxes of books straight onto shelves without any effort to sort or arrange them, because we needed to quickly get all those boxes off the floor just to have room to live in. This added an extra element of chaos to an already disorderly mess. Also, as part of this move we emptied out a self-storage unit that had been slowly and semi-invisibly accumulating books for about fifteen years, as a result of which we now have a 20-foot shipping container in our driveway packed wall-to-wall, floor-to-ceiling, with boxes of books. (How we could more than double our household square footage and still end up with less free space than we started with is another interesting story, but even more of a digression.)

Part of our problem is that used books have become incredibly cheap, thanks to various Friends Of The Fill-In-The-City Library fundraisers, the irresistible seduction of used bookstores, and the explosion of online dealers, who have taken advantage of database automation and cheap real estate in depressed parts of the country to cost effectively warehouse enormous inventories, even though e-books and the ongoing devaluation of reading in our society have rendered physical books increasingly valueless. All these enablers mean it is incredibly easy to accumulate a sizable book hoard at comparatively little effort or expense, which is actually terrible for us book addicts.

Setting aside the physical aspects of actually arranging to have shelves to put all these books on (I think I can now unpack and assemble an Ikea BILLY bookcase in about 10 minutes with my eyes closed) and repeatedly moving vast quantities of books around to sort them into some kind of rational order, if we want to catalog all of these books to know what we actually have and where it is, this whole situation poses a giant data entry nightmare.

If you’re a nerd, as everyone in my family certainly is, this was obviously a call for More Technology. And there’s an enormous amount of book scanning and book cataloging software out there. Alas, all of it appears to be one or more of (a) targeted at the aforementioned used book dealers, who are mostly interested in determining valuations and managing their warehouse inventories, rather than in maintaining a proper library, or (b) some kind of SaaS product where you have to pay by the month to rent your own data in the cloud while at the mercy of somebody who may at any moment go out of business, “pivot” and leave you orphaned, or have some deranged product manager redesign it all into unusability (all of which have happened to me, sometimes repeatedly, with various as-a-service products), or (c) so incredibly dumbed down in order to appeal to a consumer mass market as to have been rendered unsuitable for the task.

On the other hand, if you’re a nerd you’re not so deterred by the idea of building your own. It’s one of the hallmarks of nerd-dom — awareness that you can just make things. On the other other hand, I’m not particularly interested in spending the time and effort to master a bunch of the incidental building blocks that will necessarily be involved, such as putting a web-services frontend onto a database, dinking with all the CSS and HTML minutiae required for a proper UI, figuring out how and where to obtain things like ISBN data, or how to do optical data capture with a mobile app using the phone’s camera. I’m passingly familiar with all these things, but none of them to the degree needed to develop a clean, complete, integrated, end-to-end solution. So, a perfect foil for trying out some of these newfangled AI coding tools, I thought.

I started with the rough formula laid out by my friend Monica Anderson, who has been paying attention to this AI stuff far longer than almost anybody else I know. As best I can tell, in the subsequent year and half, her formula has become pretty much the de facto pattern that everybody doing “agentic” stuff uses. I don’t know if this is due to her direct influence or convergent evolution, but at any rate a lot of this will sound familiar to people who are already somewhere down this road.

For the AI bits, Monica at the time (the latter half of 2024) recommended using the Cline VSCode plugin with the Anthropic Claude 3.7 Sonnet model. These days she’s advocating Claude Code using the latest and greatest model, whatever that is; as of today’s writing this appears to be 4.6 Opus, which I have now switched to, but much of what I did was using older stuff. I suspect you’d get similar results with any of the other major tools for this sort of thing that various companies are promoting, but I was following a recipe, and the first time I cook something from a recipe I don’t deviate much from it. Tinkering comes later.

Just as an aside, a substantial majority of all of the AI coding tools I’ve looked at, including Cline, Copilot, Cursor, and Windsurf, seem to be based on VSCode or forks of it. I’m not the world’s biggest fan of VSCode, but this was one of those things where trying to cut against the grain seemed like it would be a bad idea. Stacked up against competing IDEs, I think VSCode is actually pretty good. It’s just that as an old school Unix hacker I have an ongoing beef with Integrated Development Environments per se — basically, they’re too damn integrated, but I digress again. Mercifully, Claude Code is a CLI tool, and that has made me much happier.

Following Monica’s recipe, I wrote a five page spec document that laid out the problem and described the shape of the desired solution, including a fairly concrete outline of the kind of system I wanted. There were three components:

A backend catalog database, fronted by a web-accessible service API running on a computer entirely under my control (which is to say, no cloud entanglements, or at least none that can’t be replaced by a competitor on a moment’s notice).
A browser-based web app for displaying, searching, and maintaining the catalog, including the ability to manually enter and update catalog entries.
A mobile app for scanning books, looking up the relevant metadata from online sources, and from that generating catalog entries and storing them into the database. I want this app to be able to (in order of sophistication) scan ISBN barcodes, OCR ISBNs as text if the book predates such barcodes (ISBNs and their barcodes were standardized at the same time, but it took a few years longer for the bar codes themselves to become ubiquitous), or OCR the author/title text from the book’s cover or title page if the book predates ISBNs (ISBNs date from around 1970, and many of our books are older than that). Also, it should have a manual data entry interface to correct the inevitable errors that will no doubt be present in the available online data sources.

For genuinely weird stuff that’s too exotic to be scanned automatically (and there definitely will be such cases, given some of the volumes in our collection), I’d prefer the fully manual data entry fallback to be the web app, where I get to use a proper keyboard, rather than trying to enter it on my phone as if it was a tweet or something. In the case of very old books, we might have to resort to all manner of obscure clues to figure out what a particular volume actually is. I don’t think there’s a good payoff for the engineering effort to automate these edge cases, and I highly doubt that AI magic pixie dust is going to come our rescue here. At least not this year.

A lot of this spec detailed the kinds of information I wanted to capture and how I wanted to be able to organize it. I was also a bit more prescriptive than was probably strictly necessary with respect to a few platform choices, not because I was convinced those were the very best ways to do things but so that whatever the machine produced would be based on in things I was already reasonably familiar with (e.g., create a NodeJS express app for the backend server, use SQLite for the database, and so on). These choices were so that I could assess what had been produced and tell the machine to fix things I didn’t like. All of this was placed in a document called library.md.

I decided to begin with the backend/web-app combo, on the theory that this entrained less exotic weird stuff. Per Monica’s advice, the first prompt to the AI was "read library.md", followed by "create the web app and database backend just described", and we were off to the races. There followed a couple minutes of the screen twitching and flashing, and then, boom! Something came up in my web browser.

Instant software! But did it work? Well… sort of?

The machine did produce a web app that presented tables of stuff and an interface for creating and editing entries, but there were lots and lots of things wrong with it. The wrong things fell into two broad categories: (a) things that just didn’t work, and (b) bad UX and functionality decisions.

A few people I know have likened these tools to a junior programmer who your team hired straight out of school: someone who is super smart, very knowledgeable about All The Latest Things The Cool Kids Are Using, and energetic in the way that only naive young people can be, but also prone to leaping before they look and completely lacking in the kinds of common sense and taste that come with having spent a few years in the trenches.

This is pretty much consistent with my experience. It really is like managing a flock of recent MIT grads with masters degrees in computer science but no real world work experience. Fortunately for me, managing energetic, scary smart, but absurdly naive developers is something I’ve done previously in my career with reasonably good success, so I’m pretty comfortable with this as a process. In a lot of ways this is better; I’ve always found the engineering-management-as-software-development-at-a-higher-level-of-abstraction mindset very enjoyable and satisfying, but now you can do it without HR procedures or organizational politics.

So it took only the briefest time to generate a backend server with its associated database and a bare bones web UI that talked to it. Then I proceeded to spend the next two months of my spare time coaxing and prodding the fool thing to get the basic UX to be sane and the basic functionality to work properly. Things like: “when you update one field of a record, don’t change any of the other fields”, or “when you have several different related pages on a site all displaying textual data, they should all use the same font”, or “when you make a change to the code to add or fix a feature, all the other stuff that previously worked should continue to work, and in the same way as before”. It all eventually got working to my satisfaction, but the exercise was aggravating and tedious. Nevertheless, on net it all took considerably less of my time than doing all that stuff on my own would have; it’s just I would have made completely different mistakes and gone down completely different blind alleys. Though I successfully reached my goal, by the end I was quite ready to put the whole thing aside for a while and take a break from dueling with the bot.

A month or two later I was upgrading our household internet, switching from Comcast Business to AT&T Fiber — even though when we’d gotten Comcast in the first place because I’d sworn to never again do business with AT&T, they came in and offered me 10 times the bandwidth at half the price so I caved. The one hard requirement was that I needed a static IP address, which seems to be a weird and unusual service request even though I consider any internet hookup without one to be broken. Nevertheless, they were able to accommodate me “for a small monthly fee” once my AT&T rep found the right place to poke their system. I have a little FreeBSD box that has been our reliable inbound network gateway for years, and since the only thing that would be different with the new network provider was literally what the IP address itself was, I figured switching this over should be straightforward.

Ha. Those whom the gods would destroy they first make mad.

We got the AT&T fiber connection and router installed, configured everything to match the local network that all our devices already understood, and everything seemed to work. You could browse the web, my laptop could send and receive email, I could SSH back and forth amongst the various excessively numerous computers in our house, my son could play World of Warcraft, and so on. Hunky dory. All that remained was setting up that static connection, which should have been simple: Unplug the ethernet cable between the FreeBSD box and the Comcast router, reconfigure the FreeBSD box’s IP address, then plug the ethernet cable into the AT&T router. Nope.

It appeared I could make outward connections from the FreeBSD box to the rest of the world and I could access the external IP address from our LAN, but the rest of the world couldn’t see it. From the outside I couldn’t SSH to it, I couldn’t even ping it. What to do? Well, I’d been through an exercise very much like this before, back in the previous millennium when I got the very first AT&T (well, PacBell, but, you know…) DSL connection in my part of Palo Alto. It took six months of them fiddling around to get it all to work right, with them the whole while insisting that everything was fine until I brought them the next piece of evidence that it was, in fact, not fine, at which point they’d fiddle around some more and pronounce that, yes, ok, it wasn’t working before, but we fixed that and now everything is fine. Lather rinse repeat. They had to do all kinds of things to make it work, up to and including replacing all of the outside telephone wiring for my entire neighborhood when the wires turned out to be a bunch of ancient, rotted crap. Eventually they made it work, but until they did, every time I complained, the first thing they’d do was run some kind of remote connectivity test from their central office that always said everything was all working properly, even when it wasn’t. So when my new connection didn’t seem to be working right, I was wise to their game — obviously they’d made some kind of configuration error on their end or something. This time, when I called up AT&T and complained and they did their remote test and of course it said all was fine, I totally didn’t believe them, up until the service tech on the call recited to me the MAC address off the ethernet NIC on my FreeBSD box, which he had just read remotely. Oh. At which point my sense of reality seemed to shimmer a little. OK, the problem really is me. Hmm. There then followed days of Googling, going down blind alleys, testing various hypotheses. I found lots of websites very confidently stating that when this kind of failure happened the problem was definitely X and what you do about it was definitely Y, and they were always, always wrong.

It was then that I said, let’s ask the AI, it couldn’t possibly make things worse. I fired up Claude. The dialog went something like this:

Me: I have this networking problem <explanation of networking problem>. What do I do?

Claude: Type ifconfig and show me the output.

I think: yeah, that’s where I started too…

Me: <ifconfig output>

Claude: Type netstat and show me the output.

I think: yeah, and that also…

Me: <netstat output>

Claude: Type netstat -r and show me the output.

I think: dude, I’ve been down this road already…

Me: <netstat -r output>

Claude: You need to adjust your routing tables as follows: sudo route ...

Me: That’s obvious nonsense and can’t possibly work. <Tightly reasoned explanation of why it was obvious nonsense and couldn’t possibly work.>

Claude: You’re absolutely right! But I really think it’s this routing thing.

Sighing, I type in the route command to my FreeBSD command line.

Me: OK, I entered that route command.

Claude: Check if it’s now working.

I SSH to an offsite machine that I have an account on, and from there ping the new IP address. It works. I try SSHing to the FreeBSD box. That works too.

Damn.

Me: That worked! Why did that work?

Claude: <a paragraph of text explaining why it was the right thing to do>

Suddenly my mental model of what was going on shifted, I understood exactly what had been wrong, why I had been wrong about it, and why the routing tweak was the right fix. It all made sense. The world came back into focus.

This had me feeling quite a bit better about Claude’s eptness at dealing with subtle and confusing problems, so I decided to risk another foray into using it to actually create software. I told it to produce a first cut of the iPhone book scanner app. Once again, this took it about 5 minutes. At the time, none of the AI tooling I’d been using was integrated with Xcode, Apple’s IDE for iOS and Mac, which meant Claude could generate the code but it couldn’t actually try things out. So I fired up Xcode and was reminded that my installation was about five years out of date and my Apple developer account had expired. After a time consuming exercise in getting that all updated, I tried to build and install the code, but this failed miserably.

It turned out that the issue was another manifestation of out-of-dateness: my credentials and whatnot needed to be synced back up with the modern state of the world. Fortunately, there didn’t seem to be any actual build problems with the app code per se. The biggest complication in getting a successful build was all the certificate signing and permissions tweaking required to enable me to actually install the resulting app onto my physical hardware — and I needed to test with the actual hardware since the app would be using the phone’s camera. Configuring all this crap has always been a rough spot in the Apple tooling ecosystem, not because their technology for securing access to the phone is flawed (it seems to be quite well thought out, actually), but because their tooling is confusing and execrably documented (I’ll spare you my whole long rant about modern tech writing fads, since it’s another digression on a digression). So I asked Claude to walk me through it, and it just did. The whole song and dance was fussy, highly non-intuitive, and ridiculously complicated, but by following Claude’s step-by-step instructions precisely the whole thing went off without a hitch.

I plugged my phone into my Mac, hit the button to build, install, and launch the app, and moments later my phone was displaying the view from its camera and asking to be pointed at a book. I grabbed a book at random from the shelf next to my desk and held it up to the phone, which immediately beeped and replaced the display with a nicely formatted summary of the book’s title, author, and publication data, along with a crisp thumbnail of the cover art. I involuntarily burst out laughing, loudly enough to cause my son to come running in from the other room to see if I was OK. The thing had worked perfectly on the first try. This was magic.

What differentiated this from my first exercise in AI coded software was that the target was defined in almost purely functional terms: do this. The thing I wanted done was technically very difficult, but essentially straightforward, whereas the earlier thing was technically very pedestrian but contained a whole lot more stuff that needed to be just so for me to to be happy with it.

When you write a spec for something you intend to create yourself, you end up leaving out a lot of details, especially aesthetic details (both the aesthetics of how the thing will appear on the outside as well as the engineering aesthetics of how it will be put together on the inside). You can get away with omitting that stuff because you’re going to automatically follow your own instincts anyway as you proceed to implement it. You almost can’t help yourself. Very often you might not even realize that you left out details because you just create things the way you create things and the result is pretty much what you were expecting. On the other hand, if you’ve ever had to specify something for somebody else to implement, you’ve no doubt had the experience of getting results very different from what you wanted.

There’s a famous line from the world of politics:

“I can explain it to you, but I can’t understand it for you.”

I’ve heard this attributed to various people, but since it’s one of those highly quotable sound bites that’s been spread around so much (often with mutations) I found it a little tricky to track down for sure where it came from. As best I’ve been able to dig up, it seems to have originated (in slightly less eloquent form) with former New York City mayor Ed Koch. I had remembered it as some physicist testifying in front of a Senate committee about funding for some Big Science project, but that may just be narrative bias coming into play since it makes the story better. Or it could be that I’m just having a hard time imagining Ed “I am the mayor” Koch saying something clever and insightful.

Anyway.

The idea here is that although someone else can give you an explanation of something, the task of internalizing that explanation and developing an understanding of what was explained is ultimately up to you. Nobody else can do that for you – it needs to be your understanding.

I think this insight is itself directly applicable to the AI experience, but it also inspired for me a slightly parallel variant:

“The AI can make things for you, but it can’t want things for you.”

A lot of AI skeptics and critics have commented that using AI for artistic endeavors generally produces things that range somewhere between mediocre and terrible. And they’re not wrong. Creative expression is an almost pure manifestation of the creator’s wants and desires. When you write a story, for example, every element of it at every level, from the shape of the overall plot down to the structure and word choice of individual sentences, is a reflection of what you want those things to be. You have to be the one who determines what all those things are, based on your own desires. Now, you might still produce a mediocre or terrible story yourself because you’re bad at this in some way, but it still comes from your choices about what you wanted at each step. You can’t outsource that wanting to somebody else because then you’ll get what that somebody else wants and then it’s their story. And anyway the AI doesn’t want anything. At best, the AI can try to guess what you want. If what you want is vague and mushy then its guess will be vague and mushy too, and you’re going to get the kind of slop that everybody is criticizing. On the other hand, if what you want is not vague and mushy, then you have to communicate this with completeness and precision, at which point you don’t need the AI to write the story for you because you’ve just written it yourself.

I’ve seen a number of AI commentors quite appropriately point to David McCullough’s lovely aphorism: “Writing is thinking.” To an astonishing degree, the idea a lot of people have that writing is “just putting your thoughts into words” is a mischaracterization, as if the words were already sitting there in your head and just needed to be recorded. It is the very act of generating the words that constitutes the mental process that solidifies the ideas that the words are expressing. If you’re like me, you also do a lot of fiddling and editing after the first words have come out of your fingers, as you try to make what you’re saying clear, not only to others but also to yourself. And that clarification is really clarification of the very thoughts being expressed. I’m certainly not the first person to quip “I don’t really know what I think until I hear what I say”, but I do quote that line a lot. I find that writing, editing, and figuring stuff out all are pretty entangled. You can’t outsource your writing because you can’t outsource your thinking.

Well, I should qualify that last assertion. Of course you can outsource your thinking, but this amounts to handing control of your mind over to someone else: what you get is what they want. Certainly the world is filled with people who would be very happy to do your thinking for you, but it’s almost certainly a bad idea to let them. And, as I said, the AI itself doesn’t want anything. Or rather, it might seem to want things, but those largely reflect the wants of whoever set it up. Which might align with what you want too, but probably not. Or, much likelier, it might be something that whoever set the AI up just doesn’t care about, so you’ll get some random crap. We seem to be seeing a lot of that sort of thing these days too.

Compared to a work of art, however, a piece of software is much more of a functional mechanism than a pure act of expression. Certainly the act of coding will often have a significant expressive element, but there are also critical parts of it that can be evaluated on a fairly objective works/doesn’t work basis. The big challenge in using AI tools for software creation is maintaining the proper division of labor between you and the machine. The machine’s job is doing all the heavy lifting involved in making the mechanical parts (which for a human could consume hours or days or even years of somebody’s life to implement), while your job is to do the expressive parts. And even though AI is somewhat exotic at this point, this pattern of technology use fits a lot of our existing ways to doing things. I know a very talented sculptor who uses CAD tools, NC-machining, and 3D printers to fabricate her work, and nobody would argue that the things she makes this way are not her creations.

This gets back to my earlier comment about wanting. In a world with AI, your job is to want things. The key to successfully creating things is the ability to know or figure out what you want, and then the ability to express this. Both the knowing and the expression are hard, and the sad truth is that the ability to do them well is unevenly distributed among the human population, but they’re still 100% human. Further, if you’re interested in more than just yourself (and I hope you are), you need to be able to want things that your customers or clients or audience or the world at large will want, even (or especially) if they don’t know it yet. The ability to consistently do that is very rare indeed (vanishingly few of us are Steve Jobs, for example), and nothing about AI seems likely to soon change this. Note that this formulation goes beyond AI assisted software development. I think it applies equally well to using AI for anything.

These days I’m seeing a lot of irritation with the corporate world’s seemingly relentless drive to blindly stuff AI into just about everything. I share people’s annoyance, but a lot of the complaining seems to frame it as “this stuff is stupid and nobody wants it, so AI is bad”. I agree with the premise but not the conclusion. I think we’re in the midst of one of those once-in-a-generation technological upheavals that always takes the
conventionally minded among the executive class (which is to say, most of them) by surprise. The current tidal wave of foolishness reflects the general unpreparedness that people who are highly adapted to the status quo tend to be prone to when confronted with something outside their experience. It’s basically Kuhn, only for business. I’m old enough to have seen this pattern play out repeatedly over the course of my career. It happened when minicomputers wiped out the mainframe business, then again when personal computers wiped out the minicomputer business. It happened very dramatically with the advent of the Internet and the World Wide Web — remember all those stupid pets.com and MCI ads during the Superbowl back in the day, when “Internet!” was the magic pixie dust of the hour?

In the earliest days of the Internet takeoff, 1993- or 1994-ish, when everyone in the business world could see it was coming but it hadn’t really hit yet, there was an astonishing amount of stupid stuff being rolled out, with truly eye watering quantities of money behind it. It seemed like every few days you’d see another bizarre announcement of a joint venture or a “strategic initiative” from one or more giant companies in the telecommunications and media sectors. My business partners and I referred to this era as “The Dance of The Dinosaurs”, with the doomed incumbents frantically pairing off in unlikely combinations in hopes of finding the DNA mix to survive the New World Order. I once facetiously suggested to my colleagues that we should pitch a proposal to AT&T that was roughly: “You give us $150 million. We’ll do whatever the hell we feel like with it and you’ll never see your money ever again. This is basically the same deal you’ve been making with everybody else; our value-add proposition to you is that we promise we won’t tell anybody that you did this, so that, unlike your other ventures, you’ll be spared the public embarrassment when all the money is gone.” I sometimes wish we had at least made the attempt. I’m pretty sure they would have said no, but I’m entirely confident we would have ended up with some great stories before they did.

The current era of ludicrous ferment around AI reminds me a lot of that time.

The important thing to keep in mind is that just because many of the people currently in charge of things lack a good mental model of what is possible and therefor do a bunch of dumb stuff, it doesn’t mean there isn’t a lot smart stuff on its way to us from the some of the less dumb people. Also, criticism of the dumb things shouldn’t be reflexively generalized. Some of the things that some of the less dumb people are about to do might also warrant criticism, but these things will need to be confronted on their own terms for what they actually are, rather than regarding them as just more of the same. Plus, a lot of the criticism I see of the dumb stuff is also pretty dumb itself, which is easy to overlook if the thing being criticized is already self evidently stupid — somebody can correctly point out that a thing is bad, while at the same time giving a largely mistaken accounting of why it’s bad. For example, I’m about ready to start throwing dishes the next time I hear somebody say “stochastic parrot”.

Finally, I have an issue with the broad movement to inject AI into absolutely everything. As suspected by many who are concerned about AI consequences, one source of the AI push at a lot of companies is a desire to reduce costs by substituting AI for headcount. There is a Big Discussion to be had about the possible and likely economic disruptions that flow from this, but that’s not at the heart of what I want to focus on right now. And in any case that Big Discussion is already going on all around us with some vigor at the moment, so I’m going set the Very Important Questions aside for now on the theory that other people are already deeply engaged with those.

Instead, I want to give a warning about a possible misreading of what I just advocated a few paragraphs above.

If you’re a CEO, particularly if you’re the CEO of a big company, your control over what happens is largely indirect. You can issue arbitrary instructions to pretty much any employee to do particular things or to do things in a particular way, but micromanagement doesn’t scale. Direct command is generally limited to narrow interventions for specific purposes in special circumstances. Most likely, most of your company tends to operate pretty autonomously most of the time, even if the organization has a strict chain of command and you imagine that you run things as an iron fisted dictator. Instead, your main job consists of establishing a vision and a purpose, and a strategy to achieve these, and then trying to communicate all this to the rest of the company, so that people in different parts of the organization can align what they do without constantly having to explicitly coordinate with you or with each other. To the extent that you do exercise direct control, most of it still runs through the senior management team, to whom you have delegated almost everything. A lot of CEOs aren’t very good at this, but that’s how it works. So, to summarize: you’re a person who (hopefully) has an idea in their head of what they want, which might be something big and complicated. You give general directions to a group of semi-autonomous actors who then go and try to do whatever it is, under your guidance. Does this pattern sound familiar?

If you’re a CEO, it may be pretty easy to fall into mapping what you do, and your relationship to your organization, to more or less the same model I articulated above for a human overseeing a flock of AI agents. From there it’s a pretty small step to thinking you can swap out humans for AIs in various boxes on your org chart. They sound like the same thing, right? I can see how this might be seductive, but it’s also likely to be a terrible mistake. The analogy breaks down because a CEO has to delegate the wanting of things to lower levels in the organizational hierarchy — one of the things you are delegating is not just the performance of tasks but judgement and taste. Even if you are brilliant as a CEO, you might well lack a deep sense of what’s good or bad in the realm of, say, graphic design or advertising strategy. Knowing what you want at one level of abstraction does not imply that you know what you should want at another level to get to your desired end state. In a complex organization, there may be many layers of recursively ramified desire unfolding before reaching the point of doing something concrete, like writing a piece of code or purchasing a building or moving things around in a warehouse or ordering parts to build some widget.

My offhand quip above that “most of us aren’t Steve Jobs” hints at another dimension of what is going on here.

If you aren’t a Steve Jobs, you probably don’t have a very strong product vision. It won’t do to tell your AI to “conceive of a product for our company that nobody has ever heard of but that lots of people will want when they do”. It’s too vague, and yet at the same time too anchored in the particulars of what your company is already doing. Instead, you have people for that. Maybe AIs will develop to where they can handle a directive of this sort, but at that point I think they’ve probably become people themselves and then we’re moving on to a whole new conversation. That’s certainly not where we are now nor where we are likely to be for the next couple of years.

If you are a Steve Jobs, it still takes a huge number of creative acts to translate your very strong product vision into an actual artifact that can be manufactured and sold, and no matter how good you are you do not have the breadth of talent to do all the things, and you absolutely don’t have the capacity for the sheer quantity of creative work that’s going to be involved. And this before we even get to the critical ancillary activities like marketing and advertising and branding. And then riding the wave of culture hacking you’re going to find your company doing if you’re ultimately successful.

Even a very conventional, run-of-the-mill business, like, say, a grocery store chain or an electric power utility, still has a lot of creative processes going on at many levels between the executive suite and the completely mechanical stuff on the ground. It seems likely to me that much of that mechanical stuff on the ground is destined to become automated, but this still leaves a lot of other stuff that you’re going to want some person in the driver’s seat for.

Returning to the theme I started with, my adventures with AI and what I’ve learned from it: We have been granted a giant lever, but we’re still very new at figuring out what to do with it.

I’m seeing a lot of social media traffic about what the current AI frenzy is doing to developers. My friend Perry Metzger says “the bulk of the programmers I know are giddy about AI coding”. That matches my own observations, of both the people around me and myself. At the same time, I’m also hearing a number of developers report feeling highly stressed by the situation — not, it’s important to note, by fear of job loss but by a compulsion they feel to be constantly feeding new tasks to their cloud of AI agents whose appetite for work seems bottomless. This isn’t exactly FOMO, but it feels psychologically related. I’m also seeing quite a few reports (mostly from AI critics and skeptics, but not entirely) of teams of developers holed up in an office or apartment somewhere, feverishly cranking around the clock with only the barest breaks to eat or nap, unable to pause or relax. This sounds more like the behavior of drug addicts. There have been times in my past when I also worked frantically around the clock in startup mode. However, in my case it was mostly because I was having a vast amount of fun doing what I was doing rather than the withdrawal pain I’d suffer if I didn’t get my next fix. I’d be a bit inclined to invoke Herb Stein’s Law here — if something cannot go on forever it will eventually stop — if it weren’t that the mode of eventually stopping might involve people doing serious harm to themselves.

I confess I do feel a little bit of this compulsive draw myself too, though I can also report that I’ve found that the way my cheapo Claude Pro account works naturally enforces a pleasant kind of work-life balance: it usually only takes me 20-30 minutes to prompt Claude Code to use up all of my account’s available capacity before it hits the built-in rate limit, at which point I have to wait for the next 5 hour window to open before I can do anything further with the thing — unless I spend money, of course, which is no doubt what they’re trying to stimulate, but I have so far resisted the siren call. In the meantime I write, or work on sorting the enormous book collection I mentioned at the start of this essay, or read, or do some photography, or work on a puzzle, or take a hike, or cook some dinner, or spend time with my family. This is just an artifact of Anthropic’s current billing model, so it’s almost certainly transient and will likely soon mutate as the capacity of their system grows, but it hints at an idea of what’s going on here and possibly how to cope with it.

We developers have been habituated to a world in which the major productivity bottleneck was our own capacity to write and debug code. Having that bottleneck suddenly get vastly wider while you’re in the midst of pushing hard against it is like stepping on the clutch and the gas pedal at the same time — likely to burn up the engine if you don’t catch yourself soon enough.

We need to learn how to accommodate ourselves to our new role, which is wanting things. It will help to have good taste and good judgement; in my experience the very best developers do tend to have both of these, though our industry certainly has no shortage of people who … don’t. As with every paradigm shift, the qualities that lead to success are going to change from what they historically were. This will be painful for the people who had the qualities the old world wanted but not the ones the new world wants, and we would do well to be compassionate to the folks who have difficulty with this transition, but in this regards AI is no different from any of the other twists and turns that have accompanied the progress of humanity. But don’t fall into the trap that thinking these qualities are fixed and immutable, that you are either one of the elect or you are not — a lot of what you are good at now follows from the habits of thought that you have learned and practiced, and most of you are capable of learning and practicing new habits of thought too.

Learn to be a good wanter.

Lessons Learned Technology | Posted by Chip at 2:47 AM | 1 Comment

April 13, 2022

Game Governance Domains: a NFT Support Nightmare

“I was working on an online trading-card game in the early days that had player-to-player card trades enabled through our servers. The vast majority of our customer support emails dealt with requests to reverse a trade because of some kind of trade scams. When I saw Hearthstone’s dust system, I realized it was genius; they probably cut their support costs by around 90% with that move alone.”
Ian Schreiber

A Game’s Governance Domain

There have always been key governance requirements for object trading economies in online games, even before user-generated-content enters the picture. I call this the game’s object governance domain.

Typically, an online game object governance domain has the following features (amongst others omitted for brevity):

There is usually at least one fungible token currency
There is often a mechanism for player-to-player direct exchange
There is often one or more automattic markets to exchange between tokens and objects
1. May be player to player transactions
2. May be operator to player transactions (aka vending and recycling machinery)
3. Managed by the game operator
There is a mechanism for reporting problems/disputes
There is a mechanism for adjudicating conflicts
There are mechanisms for resolving a disputes, including:
1. Reversing transactions
2. Destroying objects
3. Minting and distributing objects
4. Minting and distributing tokens
5. Account, Character, and Legal sanctions
6. Rarely: Changes to TOS and Community Guidelines

In short, the economy is entirely in the ultimate control of the game operator. In effect, anything can be “undone” and injured parties can be “made whole” through an entire range of solutions.

Scary Future: Crypto? Where’s Undo?

Introducing blockchain tokens (BTC, for example) means that certain transactions become “irreversible”, since all transactions on the chain are 1) Atomic and 2) Expensive. In contrast, many thousands of credit-card transactions are reversed every minute of every day (accidental double charges, stolen cards, etc.) Having a market to sell an in-game object for BTC will require extending the governance domain to cover very specific rules about what happens when the purchaser has a conflict with a transaction. Are you really going to tell customers “All BTC transactions are final. No refunds. Even if your kid spent the money without permission. Even if someone stole your wallet”?

Nightmare Future: Game UGC & NFTs? Ack!

At least with your own game governance domain, you had complete control over IP presented in your game and some control, or at least influence, over the games economy. But it gets pretty intense to think about objects/resources created by non-employees being purchased/traded on markets outside of your game governance domain.

When your game allows content that was not created within that game’s governance domain, all bets are off when it comes to trying to service customer support calls. And there will be several orders of magnitude more complaints. Look at Twitter, Facebook, and Youtube and all of the mechanisms they need to support IP-related complaints, abuse complaints, and robot-spam content. Huge teams of folks spending millions of dollars in support of Machine Learning are not able to stem the tide. Those companies’ revenue depends primarily on UGC, so that’s what they have to deal with.

NFTs are no help. They don’t come with any governance support whatsoever. They are an unreliable resource pointer. There is no way to make any testable claims about any single attribute of the resource. When they point to media resources (video, jpg, etc.) there is no way to verify that the resource reference is valid or legal in any governance domain. Might as well be whatever someone randomly uploaded to a photo service – oh wait, it is.

NFTs have been stolen, confused, hijacked, phished, rug-pulled, wash-traded, etc. NFT Images (like all internet images) have been copied, flipped, stolen, misappropriated, and explicitly transformed. There is no undo, and there is no governance domain. OpenSea, because they run a market, gets constant complaints when there is a problem, but they can’t reverse anything. So they madly try to “prevent bad listings” and “punish bad accounts” – all closing the barn door after the horse has left. Oh, and now they are blocking IDs/IPs from sanctioned countries.

So, even if a game tries to accept NFT resources into their game – they end up in the same situation as OpenSea – inheriting all the problems of irreversibility, IP abuse, plus new kinds of harassment with no real way to resolve complaints.

Until blockchain tokens have RL-bank-style undo, and decentralized trading systems provide mechanisms for a reasonable standard of governance, online games should probably just stick with what they know: “If we made it, we’ll deal with any governance problems ourselves.”

Business Education Technology Theory | Posted by Randy at 2:14 PM | No Comments

August 5, 2021

Living Worlds Considered Harmful

A Response to the Documentation of the Living Worlds Working Group
1997-02-27

[A post by Douglas Crockford, recovered from the internet archive.]

Introduction

The Livings Worlds Initiative is the work of a small but dedicated group of VRML developers who have a deep interest in extending VRML into the basis for interconnected virtual worlds. This project has been inextricably bound to a very effective public relations campaign and standards setting effort. The project is still in development, but is already being promoted as an industry standard for virtual worlds.

The Living Worlds Working Group has been signing up a large number of companies as supporters of the effort, including IBM and Apple. What is not clear to most observers is that support means nothing more than agreeing that standards are desirable.

Within the industry, there is common misunderstanding of what support for Living Worlds means, even among the supporters. The result is that support for a Livings Worlds Standard is starting to snowball. It is being regarded as a proposed standard, but it has not had to face any sort of rigorous review. The purpose of this response is to begin the belated review of the Living Worlds Documentation as a proposed standard.

Premature Standardization

There is a growing list of companies that are developing VRML-based virtual worlds. The sector is attracting a lot of attention. Even so, most of the social activity on the Internet today is in IRC and the chat rooms of AOL. The most successfully socialized avatar worlds are WorldsAway and The Palace, neither of which are VRML-based. The VRML worlds have seen a lot of churn, but are not creating significant sustaining communities.

The weakness of community formation in many of the VRML worlds may be the result of the newness of the worlds and the inexperience of the world designers, who have hampered themselves by putting 3D graphics ahead of socialization.

It is too early to be standardizing 3D social architecture. More experimentation is required. If the Living Worlds Initiative is an experiment conducted by a group of cooperating companies, then it is a good thing. If it is a standard-setting activity, then it is premature and harmful.

The operation of 3D worlds has not been shown to be a profitable activity. The business model is driven by affection for Neal Stephenson’s satiric cyberpunk novel, Snow Crash. Snow Crash describes a virtual world called the Metaverse. Some day, we may have a real Metaverse, and it might even be as important in our world as it was in the novel.

Living Worlds does not implement the Metaverse. It only makes something that looks like it, a meta-virtual world.

VRML itself is still new. VRML 2.0 was announced at Siggraph 96, and complete implementations are only now coming on line. The VRML 2.0 initiative was as frenzied as the Living Worlds Initiative, and because of the haste, the result was suboptimal. A consequence is that part of the Living Worlds Initiative contains some workarounds for VRML 2.0 limitations.

Security

The word “security” does not occur in the Living Worlds Documentation except to point out a security hole in VRML 2.0. The lack of attention to security by the Living Worlds Working Group is not a problem if the Initiative is viewed as an experiment. One of the benefits of the experiment will be to demonstrate why security is critical in the design of distributed systems. If the Living Worlds Initiative is setting a standard, then it is harmful.

Security is a very complicated and subtle subject. Absolute security is never achievable, but diligent design and application can produce systems which are worthy of trust.

The Living Worlds Documentation identifies three issues in which distributed security is critical.

handle everything via dynamically downloaded Java applets
protect the local scene from damage by imported objects
support authentication certificates (dice, business cards)

The Documentation does not adequately address any of those issues.

Lacking security at the lowest levels, Living Worlds is not strong enough to offer a credible trust model at the human-interaction level. In systems which can be hacked, concepts like identity, credentials, and accountability are meaningless.

This severely limits the application scope of Living Worlds. Environments which permit interpersonal commerce or confidential collaboration should not be implemented in Living Worlds.

Secure software distribution

Software is the lifeblood of virtual communities. The value and diversity of these systems depend on the ability to dynamically load and execute software from the network. Unfortunately, this raises a huge security concern. Software from the network could contain viruses, trojan horses, and other security threats. Because of the dynamic and interconnected nature of virtual communities, the protection mechanisms provided by Java are not adequate.

The Living Worlds Documentation notes that

…at present, most systems prohibit Java from accessing local files, which makes it impossible, for example, to connect to locally installed third party software features. Until this problem is generically solved by the Java community, the management of downloads and local access are left to proprietary MUtech solutions.

The proprietary MUtech solutions will create a security hole, and possibly compromise the goal of interoperability at the same time. In order for the dynamic, distributed virtual community to be viable, the issue of secure software distribution must be solved from the outset. Class signing is not a solution. A secure, distributed architecture is required. It is doubtful that credible security mechanisms can be retrofitted later.

Protect the local scene

Related to the problem of software distribution is the question of rights granted to objects. Objects that are valued in some places might be obnoxious or dangerous in others. The Living Worlds Documentation describes an incontinent puppy as an example of such an object. A secure architecture needs to deal with these sorts of issues from the outset. The Living Worlds Documentation identifies the problem, but does not solve it.

Authentication

The Living Worlds Documentation calls for the use of authentication certificates as a mechanism for assuring confidence in certain objects. Unfortunately, if the architecture is not secure, there is not a reliable context in which certificates can be trusted. Because Living Worlds is hackable, certified objects can be compromised.

Community

Communities need tools with which they can create policies to regulate the social order. Different communities will have different needs, conventions, standards. The Living Worlds Documentation says this about the task of designing those tools:

Two things seem clear. First, that designing a persuasively complete infrastructure for managing user rights, roles and rules is an essentially open-ended task. Second that building a simple, open-ended framework for the same domain can probably be completed with very little effort.

Unfortunately, the Working Group does not adequately understand the issues involved. They will create a tool based on a limited understanding of the problem, attempt to drive it into a standard, and leave to others the social and administrative headaches it will cause.

This general strategy applies to the rest of the Living Worlds effort as well. Our goal is to reach quick consensus on a minimal subset, and then to encourage the rapid creation of several reference implementations of that proposed feature set. Refinement of the concepts can then proceed in an atmosphere usefully disciplined by implementation experience.

Problems of this kind cannot be solved by refinement.

Incomplete

If the Living Worlds Documentation were just the work in progress of a working group, then it is appropriate that they publish their materials on the net for comment by interested parties, and it would be absurd to point out that the work is unfinished. But because it is also being presented publicly as a networking standard, and because the Living Worlds Working Group has already begun the work of standard setting, the Documentation needs to be tested for its fitness as a standard.

If the Living Worlds Documentation is read as a proposed standard, then it should be rejected out-of-hand, simply because it is incomplete. In its present form, the Living Worlds Documentation is not even complete enough to criticize.

Principles

The Living Worlds Working Group selected a set of principles to guide the development process. Membership in the working group is open to anyone who can accept the principles. This is a reasonable way for a working group to define itself. Unfortunately, the principles of the Working Group are problematic for a standards body. While the Living World Documentation is not complete enough to criticize, the principles and basic architecture can be criticized.

Build on VRML 2.0.Use VRML 2.0 would have been a better first principle. By Building on VRML 2.0, the Working Group is hoping to work some or all of the Livings Worlds work into the VRML 3.0 standard, thereby increasing the importance of the Living Worlds Standard.This component-oriented principle led the Working Group to put the display processor in the center of a distributed architecture, ignoring decades of experience in the separation of I/O from other computational goals.Fortunately, the recent moderating influence of the Mitsubishi Electric Research Laboratory (MERL) has opened the Living Worlds Working Group to the possibility of other presentation models. Unfortunately, the Living Worlds Architecture is already fixed on a set of unwieldy interfaces which were motivated by a VRML-centric design space.
Standards, not designs.The second principle is intended to give implementers a large amount of leeway in realizing the standard. The amount of leeway is so great that it might not be possible for independent implementations to interoperate with implementations developed by the Working Group. Since that is specifically what a standard is supposed to accomplish, the second principle is self-defeating.The other benefit of the second principle to the Working Group is to provide an expedient method of dealing with disputes. When the members of the Working Group do not agree on an architectural point, they agree to disagree and leave the choice to the implementer. Sometimes the reason they do not agree is because they were confronting an essential, hard problem.
Architectural Agnosticism.The third principle concerns the question of centralized (server-based) or decentralized (distributed) architecture. Centralized social networking systems often suffer from performance problems because the server can become a bottleneck. The Working Group therefore wants to keep the option of decentralization open:
A centralized architecture cannot be transformed into a decentralized architecture simply by being vague about the role of the server. Decentralized design requires the solution of a large number of hard problems, particularly problems of security. An insecure architecture can facilitate the victimization of avatar owners by miscreants. Insecurity will also limit the architecture to supporting only limited interactions, such as chat. Higher value services like cooperative work and interpersonal commerce require a secure platform. Such a platform is not a goal or result of the Living Worlds Initiative.Because the third principle does not explicitly call for the solution to the problems of secure decentralization, it is self-defeating, resulting in an implementations which are either insecure or devoutly centralist or both.
Respect the role of the market.In the fourth principle, the Working Group chooses this unfortunate non-goal:
The process does not pay adequate attention to the consequences of the design. The goal of the Working Group is to establish a standard early, relying on iteration in the maintenance of the standard to make it, if not the best imaginable, then good enough for commercial use. The process is not forward-looking enough to provide confidence that the standard can be corrected later. Significant architectural features, such as security, are extremely difficult to insert compatibly into existing systems.
Require running code.The fifth principle appears to be the most respectable, but when coupled with the urgency and recklessness of the fourth principle, it becomes the most dangerous.A standards development process that requires demonstration of new techniques before incorporating them into the standard can be a very good thing because it provides assurance that the standard is implementable. It can also provide early evidence of the utility of the new techniques.But if such a process is driven by extreme time pressure, as the Living Worlds Working Group is, then the fifth principle has a terrible result: only ideas with quick and dirty implementations can be considered. The Working Group will finish its work before hard problems can be understood and real solutions can be produced.So, by principle, the Working Group is open, but not to good ideas that will require time and effort to realize.

Conclusion

The software industry sometimes observes that its problems are due to not having standards, or to having too many standards. Often, its problems are due to having standards that are not good enough.

Premature standardization in the area of virtual worlds will not assure success.

The Living Worlds Initiative is a model for cooperative research, and as such it should be encouraged. The Working Group is using the net to create a virtual community of software developers working together on a common project. This is very good.

Unfortunately, the Living Worlds Initiative is also a standards-setting initiative, building on the momentum of the recent VRML 2.0 standard. It would be harmful to adopt the Living Worlds Initiative as a standard at this time.

History Lessons Learned Open Theory | Posted by Randy at 4:37 PM | 2 Comments

August 28, 2019

The Unum Pattern

Warning: absurd technical geekery ahead — even compared to the kinds of things I normally talk about. Seriously. But things will be back to my normal level of still-pretty-geeky-but-basically-approachable soon enough.

[Historical note: This post has been a long time in the making — the roots of the fundamental idea being described here go back to the original Habitat system (although we didn’t realize it at the time). It describes a software design pattern for distributed objects — which we call the “unum” — that I and some of my co-conspirators at various companies have used to good effect in many different systems, but which is still obscure even among the people who do the kinds of things I do. In particular, I’ve described this stuff in conversation with lots of people over the years and a few of them have published descriptions of what they understood, but their writeups haven’t, to my sensibilities at least, quite captured the idea as I conceive of it. But I’ve only got myself to blame for this as I’ve been lax in actually writing down my own understanding of the idea, for all the usual reasons one has for not getting around to doing things one should be doing.]

Consider a distributed, multi-participant virtual world such as Habitat or one of its myriad descendants. This world is by its nature very object oriented, but not in quite the same way that we mean when we talk about, for example, object oriented programming. This is confusing because the implementation is, itself, very object oriented, in exactly the object oriented programming sense.

Imagine being in this virtual world somewhere, say, in a room in a building in downtown Populopolis. And there is a table in the room and sitting on the table is a teacup. Well, I said you were in the virtual world, but you’re not really in it, your avatar is in it, and you are just interacting with it through the mediation of some kind of client software running on your local computer (or perhaps these days on your phone), which is in turn communicating over a network connection to a server somewhere. So the question arises, where is the teacup, really? Certainly there is a representation of the teacup inside your local computer, but there is also a representation of the teacup inside the server. And if I am in the room with you (well, my avatar, but that’s not important right now), then there’s also a representation of the teacup inside my local computer. So is the teacup in your computer or in my computer or in the server? One reasonable answer is “all of the above”, but in my experience a lot of technical people will say that it’s “really” in the server, since they regard the server as the source of truth. But the correct answer is that the teacup is on a table in a room inside a building in Populopolis. The teacup occupies a different plane of existence from the software objects that are used to realize it. It has an objective identity of its own — if you and I each refer to it, we are talking about the same teacup — but this identity is entirely distinct from the identities of any of those software objects. And it has such an identity, because even though it’s on a different plane there still needs to be some kind of actual identifier that can be used in the communications protocols that the clients and the server use to talk to each other, so that they can refer to the teacup when they describe their manipulations of it and the things that happen to it.

You might distinguish between these two senses of “object” by using phrases with modifiers; for example, you might say “world object” versus “OOP object”, and in fact that is what we did for several years. However, this terminology made it easy to fall back on the shorthand of just talking about “objects” when it was clear from context which of these two meanings of “object” you meant. Of course, it often turned out that this context wasn’t actually clear to somebody in the conversation, with confusion and misunderstanding as the common result. So after a few false starts at crafting alternative jargon we settled on using the term “object” to always refer to an OOP object in an implementation and the term “unum”, from the latin meaning a single thing, to refer to a world object. This term has worked well for us, aside from endless debates about whether the plural is properly “una” or “unums” (my opinion is: take your pick; people will know what you mean either way).

Of course, we still have to explain the relationship between the unum and its implementation. The objects (using that word from now on according to our revised terminology) that realize the unum do live at particular memory addresses in particular computers. We think of the unum, in contrast, as having a distributed existence. We speak of the portion of the unum that resides in a particular machine as a “presence”. So to go back to the example I started with, the teacup unum has a presence on the server and presences on each of our client machines.

(As an aside, for the first few years of trying to explain to people how Habitat worked, I would sometimes find myself in confused discussions about “distributed objects”, by which the people with whom I was talking meant individual objects that were located at different places on the network, whereas I meant objects that were themselves distributed entities. I didn’t at first realize these conversations were at cross purposes because the model we had developed for Habitat seemed natural and obvious to me at the time — how else could it possibly work, after all? — and it took me a while to twig to the fact that other people conceived of things in a very different way. Another reason for introducing a new word.)

In the teacup example, we have a server presence and some number of client presences. The client presences are concerned with presenting the unum to their local users while the server presence is concerned with keeping track of that portion of the unum’s state which all the users share. Phrased this way, many people find the presence abstraction very natural, but it sometimes leads them to jump to conclusions about what is going on, resulting in still more confusion and conversation at cross purposes. People who implement distributed systems often build on top of frameworks that provide services like data replication, and so it is easy to fall into thinking of the server presence as the “real” version of the unum and the client presences as shadow copies that maintain a (perhaps slightly out of date) cached representation of the true state. Or thinking of the client presences as proxies of some kind. This is not exactly wrong, in the sense that you can certainly build systems that work this way, as many distributed applications — possibly including most commercially successful MMOs — actually do. However, it’s not the model I’m describing here.

One problem with data replication based schemes is that they don’t gracefully accommodate use cases that require some information be withheld from some participants (it’s not that you absolutely can’t do this, but it’s awkward and cuts against the grain). It’s not just that the server is authoritative about shared state, but also that the server is allowed to take into account private state that the clients don’t have, in order to determine how the shared state changes over time and in response to events.

A server presence and a client presence are not doing the same job. The fundamental underlying concept that presences embody is not some notion of master vs. replica, but division of labor. Each has distinct responsibilities in the joint work of being the unum. Each is authoritative about different aspects of the unum’s existence (and typically each will maintain private state of their own that they do not share with the other). In the case of the client-server model in our example, the client presence manages client-side concerns such as the state of the display. It worries about things like 3D rendering, animation sequencing, and presenting controls to the human user to manipulate the teacup with. The server keeps track of things like the physical model of the teacup within the virtual world. It worries about the interactions between the teacup and the table, for example. Each presence knows things that are none of the other presence’s business, either because that information is simply outside the scope of what the other presence does (such as the current animation frame or the force being applied to the table) or because it’s something the other presence is not supposed to know (such as the server knowing that this particular teacup has a hidden flaw that will cause it to break into several pieces if you pour hot water into it, revealing a secret message inscribed on the edges where it comes apart). The various different client presences may also have information they do not share with each other for reasons of function or privacy. For example, one client might do 3D rendering in a GUI window while another presents only a textual description with a command line interface. Perhaps the server has revealed the secret message hidden in the teacup to my client (and to none of the others) because I possess a magic amulet that lets me see such things.

We can loosely talk about “sending a message to an unum”, but the sending of messages is an OOP concept rather than a world model concept. Sending a message to an unum (which is not an object) is really sending a message to some presence of that unum (since a presence is an object). This means that to designate the target of such a message, the address needs two components: (1) the identity of the unum and (2) an indicator of which presences of that unum you want to talk to.

In the systems I’ve implemented (including Habitat, but also, perhaps more usefully for anyone who wants to play with these ideas, its Nth generation descendant, the Elko server framework), the objects on a machine within a given application all run inside a common execution environment — what we now call a “vat”. Cross-machine messages are transported over communications channels established between vats. In such a system, from a vat’s perspective the external presences of a given unum (that is, presences other than the local one) are thus in one-to-one correspondence with the message channels to the other vats that host those presences, so you can designate a presence by indicating the channel that leads to its vat. (For those presences you can talk to, anyway: the unum model does not require that a presence be able to directly communicate with all the other presences. For example, in the case of a Habitat or Elko-style system such as I am describing here, clients don’t talk to other clients, but only to the server.)

Here we encounter an asymmetry between client and server that is another frequent source of confusion. From the client’s perspective, there is only one open message channel — the one that talks to the server — and so the only other unum presence a client knows about is the server presence. In this situation, the identifier of the unum is sufficient to determine where a message should be sent, since there is only one possibility. Developers working on client-side code don’t have to distinguish between “send a message to the unum” and “send a message to the server presence of the unum”. Consequently, they can program to the conventional model of “send messages to objects on the other end of the connection” and everything works more or less the way they are used to. On the server side, however, things get more interesting. Here we encounter something that people accustomed to developing in the web world have usually never experienced: server code that is simultaneously in communication with multiple clients. This is where working with the unum pattern suddenly becomes very different, and also where it acquires much of its power and usefulness.

In the client-server unum model, the server can communicate with all of an unum’s client presences. Although a given message could be sent to any of them, or to all of them, or to any arbitrary subset of them, in practice we’ve found that a small number of messaging patterns suffice to capture everything we’ve wanted to do. More specifically, there are four patterns that in our experience are repeatedly useful, to the point where we’ve codified these in the messaging libraries we use to implement distributed applications. We call these four messaging patterns Reply, Neighbor, Broadcast, and Point, all framed in the context of processing some message that has been received by the server presence from one of the clients; among other things, this context identifies which client it was who sent it. A Reply message is directed back to the client presence that sent the message the server is processing. A Point message is directed to a specific client presence chosen by the server; this is similar to a Reply message except that the recipient is explicit rather than implied and could be any client regardless of context. A Broadcast message is sent to all the client presences, while a Neighbor message is directed to all the client presences except the sender of the message that the server is processing. The latter pattern is the one that people coming to the unum model for the first time tend to find weird; I’ll explain its use in a moment.

(Some people jump to the idea these four are all generalizations of the Point message, thinking it a good primitive to actually implement the other three, but in the systems we’ve built the messaging primitive is a lower level construct that handles fanout and routing for one or many recipients with a single, common mechanism so that we don’t have to multiply buffer the message if it has more than one target. In practice, we use Point messages rather rarely; in fact, using a Point message usually indicates that you’re doing something odd.)

The reason for there being multiple client presences in the first place is that the presences all share a common context in which the actions of one client can affect the others. This is in contrast to the classic web model in which each client is engaged in its own one-on-one dialog with the server, pretty much unrelated to any simultaneous dialogs the server might be having with other clients that just happen to be connected to it at the same time. However, the multiple-clients-in-a-shared-context model is a very good match for the kinds of online game and virtual world applications for which it was originated (it’s not that you can’t realize those kinds of applications using the web model, but, like the comment I made above about data replication, it’s cutting against the grain — it’s not a natural thing for web servers to do).

Actions initiated by a client typically take the form of a request message from that client to an unum’s server presence. The server’s handler for this message takes whatever actions are appropriate, then sends a Reply message back informing the requestor of the results of the action, along with a Neighbor message to the other client presences informing them of what just happened. The Reply and Neighbor messages generally have different payloads since the requestor typically already knows what’s going on and often merely needs a status result, whereas the other clients need to be informed of the action de novo. It is also common for the requestor to be a client that is in some privileged role with respect to the unum (perhaps the sending client is associated with the unum’s owner or holder, for example), and thus entitled to be given additional information in the Reply that is not shared with the other clients.

Actions initiated by the server, on the other hand, typically will be communicated to all the clients using the Broadcast pattern, since in this case none of the clients start out knowing what’s going on and thus all require the same information. The fact that the server can autonomously initiate actions is another difference between these kinds of systems and traditional web applications (server initiated actions are now supported by HTTP/2, albeit in a strange, inside out kind of way, but as far as I can tell they have yet to become part of the typical web application developer’s toolkit).

A direction that some people immediately want to go is to attempt to reduce the variety of messaging patterns by treating the coordination among presences as a data replication problem, which I’ve already said is not what we’re doing here. At the heart of this idea is a sense that you might make the development of presences simpler by reducing the differences between them — that rather than developing a client presence and a server presence as separate pieces of code, you could have a single implementation that will serve both ends of the connection (I can’t count the number of times I’ve seen game companies try to turn single player games into multiplayer games this way, and the results are usually pretty awful). Alternatively, one could implement one end and have the other be some kind of standardized one-side-fits-all thing that has no type-specific logic of its own. One issue with either of these approaches is how you handle the asymmetric information patterns inherent in the world model, but another is the division of labor itself. Systems built on the unum pattern tend to have message interfaces that are fundamentally about behavior rather than about data. That is, what is important about an unum is what it does. Habitat’s design was driven to a very large degree by the need for it to work effectively over 300 and 1200 baud connections. Behavioral protocols are vastly more effective at economizing on bandwidth than data based protocols. One way to think of this is as a form of highly optimized, knowledge-based data compression: if you already know what the possible actions are that can transform the state of something, a parameterized operation can often be represented much more compactly than can all state that is changed as a consequence of the action’s execution. In some sense, the unum pattern is about as anti-REST as you can be.

One idea that I think merits a lot more exploration is this: given the fundamental idea that an unum’s presences are factored according to a division of labor, are there other divisions of labor besides client-server that might be useful? I have a strong intuition that the answer is yes, but I don’t as yet have a lot of justification for that intuition. One obvious pattern to look into is a pure peer-to-peer model, where all presences are equally authoritative and the “true” state of reality is determined by some kind of distributed consensus mechanism. This is a notion we tinkered with a little bit at Electric Communities, but not to any particular conclusion. For the moment, this remains a research question.

One of the things we did do at Electric Communities was build a system where the client-server distinction was on a per-unum basis, rather than “client” and “server” being roles assigned to the two ends of a network connection. To return to our example of a teacup on a table in a room, you might have the server presence of the teacup be on machine A, with machines B and C acting as clients, while machine B is the server for the table and machine C is the server for the room. Obviously, this can only happen if there is N-way connectivity among all the participants, in contrast to the traditional two-way connectivity we use in the web, though whether this is realized via pairwise connections to a central routing hub or as a true crossbar is left as an implementation detail. This kind of per-unum relationship typing was one of the keys to our strategy for making our framework support a world that was both decentralized and openly extensible. (Continuing with the question raised in the last paragraph, an obvious generalization would be to allow the division of labor scheme itself vary from one unum to another. This suggests that a system whose unums are all initially structured according to the client-server model could still potentially act as a test bed for different schemes for dividing up functionality over the network.)

Having the locus of authoritativeness with respect to shared state vary from one unum to another opens up lots of interesting questions about the semantics of inter-unum relationships. In particular, there is a fairly broad set of issues that at Electric Communities we came to refer to as “the containership problem”, concerning how to model one unum containing another when the una are hosted on separate machines, and especially how to deal with changes in the containership relation. For example, let’s say we want to take our teacup that’s sitting on the table and put it into a box that’s on the table next to it. Is that an operation on the teacup or on the box? If we have the teacup be authoritative about what its container is, it could conceivably teleport itself from one place to another, or insert itself into places it doesn’t belong. On the other hand, if we have the box be authoritative about what it contains, then it could claim to contain (or not contain) anything it decides it wants. Obviously there needs to be some kind of handshake between the two (or between the three, if what we’re doing is moving an unum from one container to another, since both containers may have an interest — or among the two or three and whatever entity is initiating the change of containership, since that entity too may have something to say about things), but what form that handshake takes leads to a research program probably worthy of being somebody’s PhD thesis project.

Setting aside some of these more exotic possibilities for a moment, we have found the unum pattern to be a powerful and effective tool for implementing virtual worlds and lots of other applications that have some kind of world-like flavor, which, once you start looking at things through a world builder’s lens, is a fairly diverse lot, including smart contracts, multi-party negotiations, auctions, chat systems, presentation and conferencing systems, and, of course, all kinds of multiplayer games. And if you dig into some of the weirder things that we never had the chance to get into in too much depth, I think you have a rich universe of possibilities that is still ripe for exploration.

History Technology Theory | Posted by Chip at 11:42 PM | 7 Comments

May 1, 2019

Another Thing Found While Packing to Move

Getting ready to move has turned up all kinds of lost treasures. Here’s a publicity photo of the original Habitat programming team, taken next to a storage shed at Skywalker Ranch in 1987:

Left to right: Aric Wilmunder, Chip Morningstar, Janet Hunter, Randy Farmer

There were a couple of other developers who coded bits and pieces, but these four are the ones who lived and breathed the project full time for almost three years.

I particularly like this picture because it’s the only one I have that includes Janet Hunter. Janet was the main Habitat developer at QuantumLink. I think we shot this during one of Janet’s rare visits out west, since she was based in the Washington DC area where QuantumLink was. She wrote most of the non-game-specific parts of the original Habitat server and set the architectural pattern for nearly all the servers I’ve implemented since then.

It’s hard to believe I was ever that young, that thin, or had that much hair.

History | Posted by Chip at 7:36 PM | No Comments

March 9, 2019

A Lost Treasure of Xanadu

Some years ago I found the cover sheet to a lost Xanadu architecture document, which I turned into this blog post for your amusement. Several people commented to me at the time that they wished they could see the whole document it was attached to. Alas, it appeared to have vanished forever.

Last weekend I found it! I turned up a copy of the complete document while sorting through old crap in preparation for having to move in the next few months. Now that I’ve found it I’m putting it online so it can get hoovered up by the internet hive mind. This is the paradox of the internet — nothing is permanent and nothing ever goes away.

It is here.

This is a document I wrote in early 1984 at the behest of the System Development Foundation as part of Xanadu’s quest for funding. It is a detailed explanation of the Xanadu architecture, its core data structures, and the theory that underlies those data structures, along with a (really quite laughable) project plan for completing the system.

At the time, we regarded all the internal details of how Xanadu worked as deep and dark trade secrets, mostly because in that pre- open source era we were stupid about intellectual property. As a consequence of this foolish secretive stance, it was never widely circulated and subsequently disappeared into the archives, apparently lost for all time. Until today!

What I found was a bound printout, which I’ve scanned and OCR’d. The quality of the OCR is not 100% wonderful, but as far as I know no vestige of the original electronic form remains, so this is what we’ve got. I’ve applied minimal editing, aside from removing a section containing personal information about several of the people in the project, in the interest of those folks’ privacy.

Anyone so inclined is quite welcome, indeed encouraged, to attempt a better conversion to a more suitable format. I’d do that myself but I really don’t have the time at the moment.

This should be of interest to anyone who is curious about the history of Project Xanadu or its technical particulars. I’m not sure where the data structures rank given the subsequent 35 or so years of advance in computer science, but I think it’s still possible there’s some genuinely groundbreaking stuff in there.

History Technology | Posted by Chip at 11:57 PM | 12 Comments

January 8, 2018

Pool Abuse

Early in my tenure at PayPal I had a conversation with somebody who was having some kind of problem with a cromfelter pool (he didn’t actually say “cromfelter”, that’s a placeholder word I just made up; he was talking about a specific thing, but which specific thing he was talking about is of no concern here). I had a moment of unease and realized that my career has conditioned me to see any use of the word “pool” as a red flag — or maybe a yellow flag: it’s not that it means something is necessarily wrong, but it signals something wrong often enough that it’s probably worth taking a closer look, because the odds of finding something a bit off are reasonably good.

First, I should explain what I mean by the word “pool” in the context of this discussion. I mean some collection of some resource that is needed frequently but usually temporarily for some purpose by various entities. Things that are pooled in this sense include memory, disk space, IP addresses, port numbers, threads, worker processes, data connections, bandwidth, and cache space, as well as non-computational resources like cars, bicycles, bricklayers, journalists, and substitute teachers.

The general pattern is that a collection of resources of some type are placed under the control of an entity we’ll call the “pool manager”. The pool manager holds onto the collection (the “pool”) and regulates access to it. When some other entity, which we’ll call the “customer”, needs one of these resources (or some quantity of resources — the concept is basically the same regardless of whether the resources in question are discrete or not), they request it from the pool manager. The pool manager removes an instance of the resource from the pool and hands it to the customer. The customer then makes use of the resource, treating it as their own for the duration. When the customer is finished with the resource, they hand it back to the pool manager, who puts it back into the pool. There are many variations of this pattern, depending on how (or even whether) the pool manager keeps track of which customers have which resources, whether the pool manager has the power to take the resources back unilaterally, what happens when the pool is exhausted (create more, buy more, just say no, etc.), what happens when the pool gets overly full (destroy some, sell some, just keep them, etc.), whether there is some notion of priority, or pricing, or grades of resource (QOS, etc.) and so on. The range of possible variations is basically infinite.

As a generic pattern, there is nothing intrinsically objectionable about this. There are many resources that necessarily have to be managed like this somehow if we are to have any hope of arbitrating among the various entities that want to use them. (There’s a large literature exploring the many varied and interesting approaches to how you do this arbitrating, but that’s outside the scope of this essay.) However, in my experience it’s fairly common for the pool pattern to be seized upon as the solution to some problem, when it is merely a way to obtain symptomatic relief without confronting the underlying reality that gives rise to the problem in the first place, thus introducing complication and other costs that might have been avoided if we had actually attacked the fundamental underlying issues directly. Of course, sometimes symptomatic relief is the best we can do. If the underlying problem, even if well understood, lies in some domain over which we have no control ourselves (for example, if it’s in a component provided by an outside vendor who we are powerless to influence), we may just be stuck. But part of my purpose here is to encourage you to question whether or not this is actually your situation.

An example of a reasonable use of the pool pattern is a storage allocator. Memory is a fundamental resource. It makes sense to delegate its management to an external entity that can encapsulate all the particulars and idiosyncrasies of dealing with the operating system and the underlying memory management hardware, for mastering the complexities of increasingly sophisticated allocation algorithms, for arbitrating among diverse clients, and for adjusting to the changing demands imposed by changing use patterns over time. Note that one of the key indicators of the pool pattern’s appropriateness is that the resource is truly fundamental, in the sense that it isn’t a higher level abstraction that we could assemble ourselves if we needed to. Memory certainly fits this criterion — remarkably few applications are in a position, when discovering they are low on memory, to place an order with NewEgg for more RAM, and have it shipped to them and installed in the computer they are running on.

An example of a more questionable use of the pool pattern is a thread pool, where we maintain a collection of quiescent computational threads from which members are pulled to do work as work arrives to be done (such as responding to incoming HTTP requests). A thread is not a fundamental resource at all, but something we can create on demand according to our own specifications. The usual motivation for thread pools is that the cost of launching a new thread is seen, rightly or wrongly, to be high compared to the cost of retasking an existing thread. However, though a thread pool may address this cost, there can be other issues that it then introduces.

The first, of course, is that your analysis of the cost problem could be wrong. It is frequently the case that we develop intuitions about such things that are rooted in technological particulars of time and place in which we learned them, but technology evolves independent of whether or not we bother to keep our intuitions in synch with it. For example, here are some beliefs that used to be true but are no longer, but which are still held by some of my contemporaries: integer arithmetic is always much faster than floating point, operations on individual bytes are faster than operations on larger word sizes, doing a table lookup is faster than doing a multiplication, and writable CDs are an efficient and cost effective backup medium. In the case of threads, it turns out that some operating systems can launch a new process faster than you could pull a thread out of some kind of pool data structure and transfer control to it (though this does not account for the fact that you might also have a bunch of time consuming, application specific initialization that you would then have to do, which could be a problem and tilt the balance in favor of pooling anyway).

Another potential problem is that because a thread is not a fundamental resource, you are introducing an opportunity for unnecessary resource contention at a different level of abstraction. Each thread consumes memory and processor time, contending with other threads in the system, including ones that are not part of your thread pool. For instance, you might have two applications running, each of which wants to maintain a thread pool. The thread pool idea encourages each application’s developer to imagine that he or she is in complete control of the available thread resources, which of course they are not. Idle threads in one application tie up resources that could be used for doing work in the other. All the strategies for coping with this yourself are unsatisfactory in some way. You could say “don’t run more than one thread-pooled application on a machine”, but this is just relocating the boundary between the application with the unused thread resources and the application that needs them, forcing it to correspond to the machine boundary. Or you could require the allotment of threads to the two applications to be coordinated, but this introduces a complicated coupling between them when there might be no other reason for the people responsible for each to ever have to interact at all. Or you could combine the two applications in some way so that they share a single thread pool, but this introduces a potential bonanza of new ways for the two applications to interfere with each other in normal operation. If threads really were a fundamental resource, you would delegate their management to some kind of external manager; nominally this would be the operating system, but in fact is a big part of the service provided by so called “application containers” such as Java application servers like JBoss or Tomcat. However, these containers often provide much poorer isolation across application boundaries than do, say, operating system processes in a reasonable OS, and so even in these environments we still find ourselves falling back on one of the unsatisfactory approaches just discussed above, such as dedicating a separate app server process or even machine to each application. Note, by the way, that the problem of different uses contending for blocks of a common resource type is a perennial issue with all kinds of caching strategies, even within the scope of a single application in a single address space!

A final, related problem with things like thread pools is configuration management — basically, how many threads do you tell the application to create for its thread pool? The answer, of course, is that you have no a priori or analytic way to know, so you end up with an additional test-and-tune cycle as part of your development and deployment process, when you would like to be able to just tell the OS, “here, you take care of this.” Either that or you end up trying to have your pool manager figure this out dynamically, which ultimately leads to recapitulating the history of operating system kernel development but without the benefit of all the lessons that OS developers have learned over the years.

Despite all of these issues, it may still prove the case that a thread pool is the best option available given the functional constraints that you must operate within. However, it’s at least worth asking the question of whether the cost benefit tradeoff is favorable rather than simply assuming that you have to do things in a particular way because you just know something that maybe you don’t.

More generally, the kinds of potential issues I just described in the context of thread pools may apply in any kind of circumstance where what you are pooling is not a fundamental resource. Such pooling is often motivated by unexamined theories of resource cost. For example, network connections are sometimes pooled because they are perceived as expensive to have open, whereas often the expense is not the connection per se but some other expensive component that happens to be correlated with the connection, such as a process or a thread or a database cursor. In such a case, rearchitecting to break this correlation may be a superior approach to introducing another kind of pool. Pools can be used to amortize the cost of something expensive to create over multiple uses, or to share a resource that is expensive to hold. In either case, it may be more effective to confront that cost directly. It is possible that your beliefs about the cost are simply wrong, or that by creating or holding the resource in some other way the actual cost can be reduced or eliminated.

For all these reasons, when I hear that you have a pool, my first instinct is to ask if you really need it and if it is a good idea or not. It is entirely possible that the answer to both questions will be “yes”, but you should at least consider the matter.

Technology | Posted by Chip at 5:24 PM | 3 Comments

November 24, 2017

A Slightly Skeptical Perspective on REST

A few years ago, the set of design principles traveling under the banner of REST became the New Hotness in the arena of network services architecture. I followed the development of these ideas pretty casually, until as part of my work in the PayPal Central Architecture organization, I ended up having to do a deeper dive.

We were then experiencing an increase in the frequency with which various parts of the organization were tilting up new services behind REST interfaces, or were considering how or whether to put RESTful facades on existing services, or who were simply seeking guidance as to how they should proceed with respect to this whole REST fad.

For me, a key piece of absorbing new technological ideas is getting my own head straight as to how I feel about it all. By way of background, I’m one of those people who isn’t entirely sure what he thinks until he hears what he says, so a big part of this was working out my own ideas through the act of trying to articulate them. What follows is a collection of thoughts I have on REST that I formulated in the course of my work on this. I don’t at this point have a particular overarching thesis that unifies these, aside from a vague hope that these musings may have some value to other folks’ cogitations on the topic.

I’ll warn you in advance that some of this may make some of you a little cranky and driven to try to ‘splain things to me. But I’m already a little cranky about this, so that’ll just make us even.

Is it REST?

One of the first things that strikes me is the level of dogma that seems to surround the concept of REST. Ironically, this seems to be rooted in the lack of a clear and reasonably objective declaration of what it actually is, leaving open a wide spectrum of interpretations for people to accuse each other of not hewing to. Roy Fielding, the originator of these ideas, for his part, has succeeded in being a rather enigmatic oracle, issuing declarations that are at once assured and obscure. I once quipped to somebody after a meeting on the topic that a key requirement for whatever we came up with be that it enable people to continue arguing about whether any particular service interface was or was not RESTful, since the pervasiveness of such debates seems to be the signature element of the REST movement. Although I am generally a very opinionated person with respect to all matters technical, I have zero or less interest in such debates. I don’t think we should be concerned with whether something passes a definitional litmus test when people can’t entirely agree on what the definition even is. I’m much more concerned with whether the thing works well for the purposes to which it is put. Consequently, I don’t think the “is it really REST?” question need loom very large in one’s deliberations, though it inevitably always seems to.

Pro and Con

There are several ideas from REST that I think are extremely helpful. In no particular order, I’d say these include:

You should avoid adding layers of intermediation that don’t actually add value. In particular, if HTTP already does a particular job adequately well, just go ahead and let that job be done by HTTP itself. Implicit in this is the recognition that HTTP already does a lot of the jobs that we build application frameworks to do.
It is better to embed the designators for relevant resources in the actual information that is communicated rather than in a protocol specification document.
More generally, it can be worthwhile to trade bandwidth for a reduction in interpretive latitude; that is, explicitly communicate more stuff rather than letting the other end figure things out implicitly, because if one end needs to figure something out it could figure it out wrong. Bandwidth is getting cheaper all the time, whereas bugs are eternal.
To designate all kinds of things, URIs are good enough.
Simple, robust, extensible, general purpose data representations are better than fussy and precise protocol-specific ones.

Other ideas from REST I think are less helpful. For example:

The belief that the fundamental method verbs of HTTP are mostly all you need for just about every purpose. I’ve observed this is sometimes accompanied by the odd and rather ahistorical assertion that the verbs of HTTP constitute a carefully conceived set with tight, well defined semantics, as opposed to an ad hoc collection of operations that seemed more or less sufficient at the time. This historical conceit is not actually material to the substance or value of the principles involved, but does seem to generate a lot of unproductive discussion with people talking at cross purposes about irrelevant background details.
The boundless faith in the efficacy of caching as the all purpose solvent for scalability and performance issues.
The sometimes awkward shoehorning of application-level abstractions onto the available affordances of HTTP. For example, mapping the defined set of standard HTTP error codes onto the set of possible problems in a specific application often seems to do violence to both sets. Another example would be the sometime layering of magical and unintuitive secondary application semantics onto operations like PUT or DELETE.

I got yer resource right here…

One of the fundamental ideas that underlies REST is the conceptualization of every important application abstraction as a resource that is manipulated by passing around representations of it. Although the term “resource” is sufficiently vague and general that it can be repurposed to suit almost any need, REST applications typically relate to their resources in a descriptive mode. The worlds that REST applications deal with tend to consist mainly of data objects whose primary characteristic is their current state; for this, REST is a good match. I think that a key reason why REST has gotten as much traction as it has is because many of the applications we want to develop fit this mold.

In contrast, more traditional architectures — the ones that REST self-consciously sets itself apart from — typically relate to their resources in an imperative mode. This aligns well with an application universe that consists mainly of functional objects whose primary characteristic is their behavior. Since many kinds of applications can be just as well conceived of either way (i.e., representational or behavioral), the advocates of REST would argue that you are better off adopting the representational, descriptive stance as it is a better fit to the natural affordances of the web ecosystem. However, in cases where the important entities you are dealing really do exhibit a complex behavioral repertoire at a fundamental level, a RESTful interpretation is much more strained.

Note also that there is a profound asymmetry between these two ways of conceptualizing things. If you think of a distributed application protocol abstractly in terms of nouns and verbs, REST deals in a small, fixed set of verbs while the space of nouns remains wide open. In contrast, traditional, imperative frameworks also support a completely open space of nouns but then go on to allow the space of verbs to be similarly unconstrained. It is very easy to frame everything that RESTful systems do in imperative terms, but it can be quite difficult to go in the other direction. This may account for why so many systems supposedly designed on RESTful principles fall short when examined with a critical eye by those who take their REST seriously. The mere adoption of design elements taken from REST (HTTP! XML! JSON!) does not prevent the easy drift back into the imperative camp.

Representational? Differentiable!

The idea that resources are manipulated by passing around representations of them, rather than by operating on them directly, tends to result in confusion between visibility and authoritativeness. That is, the model does not innately distinguish representations of reality (i.e., the way things actually are) from representations of intent (i.e., the way somebody would like them to be). In particular, since the different parties who are concerned with a resource may be considered authoritative with respect to different aspects of that resource, we can easily end up with a situation where the lines of authority are blurry and confused. In some cases this can be addressed by breaking the resource into pieces, each of which embodies a different area of concern, but this only works when the authority boundaries correspond to identifiable, discrete subsets of the resource’s representation. If authority boundaries are based on considerations that don’t correspond to specific portions of the resource’s state, then this won’t work; instead, you’d have to invent specialized resources that represent distinct control channels of some kind, which starts looking a lot more like the imperative model that REST eschews.

That’s all a bit abstract, so let me try to illustrate with an example. Imagine a product catalog, where we treat each offered product as a resource. This resource includes descriptive information, like promotional text and pointers to images, that is used for displaying the catalog entry on a web page, as well as a price that is used both for display and billing. We’d like the marketing department to be able to update the descriptive information but not the price, whereas we’d like the accounting department to be able to update the price but not the description (well, maybe not the accounting department, but somebody other than the person who writes ad copy; bear with me, it’s just an illustration). One way to deal with this would be to have the product resource’s POST handler try to infer which portions of the resource a request is attempting to update and then accept or reject the request based on logic that checks the results of this inspection against the credentials of the requester. However, a more RESTful approach might be to represent the descriptive information and the pricing information as distinct resources. This latter model aligns the authority boundaries with the resource boundaries, resulting in a cleaner and simpler system. This kind of pattern generalizes in all sorts of interesting and useful ways.

On the other hand, consider a case where such a boundary cannot be so readily drawn. Let’s say the descriptive text can be edited both by the marketing department and the legal department. Marketing tries to make sure that the text does the best job describing the product to potential buyers in a way that maximizes sales, whereas legal tries to make sure that the text doesn’t say anything that would expose the company to liability (say for misrepresentation or copyright infringement or libel). The kinds of things the two departments are concerned with are very different, but we can’t have one resource representing the marketing aspects of the text and another for the legal aspects. This kind of separation isn’t feasible because the text is a singular body of work and anyway the distinction between these two aspects is way too subjective to be automated. However, we could say that the legal department has final approval authority, and so have one resource that represents the text in draft form and another that represents the approval authorization (and probably a third that represents the text in its published form). I think this is a reasonable approach, but the approval resource looks a lot more like a control signal than it looks like a data representation.

Client representation vs. Server representation

One particular concern I have with REST’s representational conceit is that it doesn’t necessarily take into account the separation of concerns between the client and the server. The typical division of labor between client and server places the client in charge of presentation and user interface, and considers the client authoritative with respect to the user’s intentions, while placing the server in charge of data storage and business logic, considering the server authoritative with respect to the “true” state of reality. This relationship is profoundly asymmetrical, as you would expect any useful division of labor to be, but it means that the kinds of things the client and the server have to say to each other will be very different, even when they are talking about the same thing.

I don’t believe there is anything fundamental in the REST concept that says that the representations that the client transfers to the server and the ones that the server transfers to the client need to be the same. Nevertheless, once you adopt the CRUD-ish verb set that REST is based on, where every operation is framed in terms of the representation of a specific resource with a specific URI, you easily begin to think of these things as platonic data objects whose existence transcends the client/server boundary. This begins a descent into solipsism and madness.

This mindset can lead to confusing and inappropriate comingling among information that is of purely client-side concern, information that is of purely server-side concern, and information that is truly pertinent to both the client and the server. In particular, it is often the case that one side or the other is authoritative with respect to facts that are visible to both, but the representation that they exchange is defined in a way that blurs or conceals this authority. Note that this kind of confusion is not inherent in REST per se (nor is it unknown to competing approaches), but it is a weakness that REST designs are prone to unless some care is taken.

REST vs. The code monkeys

One of the side benefits, touted by some REST enthusiasts, of using HTTP as the application protocol, is that developers can leverage the web browser. Since the browser speaks HTTP, it can speak to a RESTful service directly and the browser UI can play the role of client (note that this is distinct from JavaScript code running in the browser that is the client). This makes it easy to try things out, for testing, debugging, and experimentation, by typing in URLs and looking at what comes back. Unfortunately, with browsers the only verb you typically have ready access to in this mode is GET, and even then you don’t have the opportunity to control the various headers that the browser will tack onto the request, nor any graceful way to see everything that the server sends back. HTTP is a complex protocol with lots of bells and whistles, and the browser UI gives you direct access to only a portion of it. You could write a JavaScript application that runs in the browser and provides you some or all of these missing affordances (indeed, a quick Google search suggests that there are quite a lot of these kinds of tools out there already), but sometimes it’s easier, when you are a lazy developer, to simply violate the rules and do everything in a way that leverages the affordances you have ready access to. Thus we get service APIs that put everything you have to say to the server into the URL and do everything using GET. This kind of behavior is aided and abetted by backend tools like PHP, that allow the server-side application developer to transparently interchange GET and POST and treat form submission parameters (sent in the body) and query string parameters (sent in the URL) equivalently. All this is decried as unforgivable sloppiness by REST partisans, horrified by the way it violates the rules for safety and idempotency and interferes with the web’s caching semantics. To the extent that you care about these things, the REST partisans are surely right, and prudence suggests that you probably should care about these things if you know what’s good for you. On the other hand, I’m not sure we can glibly say that the sloppy developers are completely wrong either. RFC2616 (a writ that is about as close as the REST movement gets to the Old Testament) speaks of the safety and idempotency of HTTP GET using the timeless IETF admonishment of “SHOULD” rather than the more sacrosanct “MUST”. There is nothing in many embodiments of the conventional web pipeline that actually enforces these rules (perhaps some Java web containers do, but the grotty old PHP environment that runs much of the world certainly doesn’t); rather, these rules are principally observed as conventions of the medium. I also suspect that many web developers’ attitudes towards caching would shock many REST proponents.

[[Irrelevant historical tangent: Rather than benefitting from caching, on several occasions, I or people I was working with have had to go to some lengths to defeat caching, in the face of misbehaving intermediaries that inappropriately ignored or removed the various headers and tags saying “don’t cache this”, by resorting to the trick of embedding large random numbers in URLs to ensure that each URL that passed through the network was unique. This probably meant that some machine somewhere was filling its memory with millions of HTTP responses that would never be asked for again (and no doubt taking up space that could have been used for stuff whose caching might actually have been useful). Take that, RESTies!]]

HATEOAS is a lie (but there will be cake)

A big idea behind REST is expressed in the phrase “hypermedia as the engine of application state”, often rendered as the dreadful abbreviation HATEOAS. In this we find a really good idea and a really bad idea all tangled up in each other. The key notion is to draw an analogy to the way the web works with human beings: The user is presented with a web page, on which are found various options as to what they can do next: click on a button, submit a form, abandon the whole interaction, etc. Behind each of those elements is a link, used for a POST or GET as appropriate, which results in the fetching of a new page from the linked server, presenting the user with a new set of options. Lather, rinse, repeat. Each page represents the current state of the dialog between the user and the web. Implicit in this is the idea that the allowed state transitions are explicitly represented on the page itself, without reference to some additional context that is held someplace else. In REST, the HATEOAS concept embraces the same model with respect to interaction between machines: the reply the client receives to each HTTP request should contain within it the links to the possible next states in a coherent conversation between the client and the server, rather than having these be baked into the client implementation directly or computed by the client on the basis of an externally specified set of rules.

We’ll gloss over for a moment the fact that the model just presented of human-machine interaction in the web is a bit of a fiction (and more so every day as Ajax increasingly becomes the dominant paradigm). It’s still close enough to be very useful. And in the context of machine-to-machine interaction, it’s still really useful, but with some important caveats. The big benefit is that we build fewer dependencies into the client: because we deliver specific URIs with each service response, we have more flexibility to change these around even after the client implementation is locked down. This means we are free to rearrange our server configuration, for example, by relocating services to different hosts or even different vendors. We are free to refactor things, taking services that were previously all together and provide them separately from different places, or conversely to merge services that were previously independent and provide them jointly. All this is quite valuable and is a major benefit that REST brings to the enterprise.

However, the model presented in the human case doesn’t entirely work in the machine case. In particular, the idea of: we present you with all the options, you pick one, causing us to present you with a new set of options; this idea is broken. Machines can’t quite do this.

Humans are general purpose context interpreters, whereas machines are typically much more limited in what they can understand. A particular piece of software typically has a specific purpose and typically only contains enough interpretive ability to make sense of the information that it is, in some sense, already expecting to receive. Web interfaces only have to present the humans with a set of alternatives that make enough sense, in terms of the humans’ likely task requirements, that people can figure out what to do, whereas interfaces for machines must present this information in a form that is more or less exactly what the recipient was expecting. Another way of looking at this is that the human comes prepackaged with a large, general purpose ontology, whereas client software does not. Much of what appears on a web page is documentary information that enables a person to figure things out; computers are not yet able to parse these at a semantic level in the general case. This is papered over by the REST enthusiasts’ tendency to use passive verbs: “the application transitions to the state identified by URI X”. Nothing is captured here that expresses how the application made the determination as to which URI to pick next. We know how humans do it, but machines are blind.

This is especially the case when an operation entails not merely an endpoint (i.e., a URI) but a set of parameters that must be encoded into a resource that will be sent as the body of a POST. On the web, the server vends a <FORM> to the user, which contains within it a specification for what should be submitted, represented by a series of <INPUT> elements. The human can make sense of this, but absent some kind of AI, this is beyond the means of a programmed client — while an automated client could certainly handle some kind of form-like specification in a generic way that allows it to delegate the problem of understanding to some other entity, it’s not going to be in a position to interpret this information directly absent some pre-existing expectation as to what kinds of things might be there, and why, and what to do with them.

There is a whole layer of protocol and metadata that the human/web interaction incorporates inline that a REST service interface has to handle out of band. The HATEOS principle does not entirely work because while the available state transitions are referenced, they are not actually described. More importantly, to the extent that descriptive information is actually provided (e.g., via a URI contained in the “rel” attribute of an XML <link> element), the interpretation of this metadata is done by a client-side developer at application development time, whereas in the case of a human-oriented web application it is done by the end user at application use time. This is profoundly different; in particular, there is a huge difference in the respective time-of-analysis to time-of-use gaps of the two cases, not to mention a vast gulf between the respective goals of the these two kinds of people. Moreover, in the web case the entity doing the analyzing and the entity doing the using are the same, whereas in the REST case they are not. In particular, in the REST case the client must be anticipatory whereas in the web case the client (which is to say, the human user) can be reactive.

PUT? POST? I’m so confused

The mapping of the standard HTTP method verbs onto a complete set of operations for managing an application resource is tricky and fraught with complications. This is another artifact of the impedance mismatch between HTTP and specific application needs that I alluded to in my “not so helpful” list above.

One particular source of continuing confusion for me by is captured in the tension between PUT and POST. In simplified form, a PUT is a Write whereas a POST is an Update. Or maybe it is the other way around. Which is which is not really clear. The specification of HTTP says that PUT should store the resource as the given URI, whereas POST should store the resource subordinate to the given URI, but “The action performed by the POST method might not result in a resource that can be identified by a URI.” The one thing that is clearly specified is that PUT is supposed to be an idempotent operation, whereas POST is not so constrained. The result is that POST tends to end up as the universal catch-all verb. One problem is that a resource creation operation wants to use PUT if the identity of the resource is being specified by the requestor, whereas it wants to use POST if the identity is being generated by the server. A write operation that is a complete write (i.e., a replacement operation) wants to use PUT, whereas if the writer wants to do some other kind of side-effectful operation it should be expressed as a POST. Except that we normally wouldn’t really want a server to allow a PUT of the raw resource without interposing some level of validity checking and canonicalization, but if this results in any kind of transformation that is in any way context sensitive (including sensitive to the previous state of the resource), then the operation might not be idempotent, so maybe you should really use POST. Except that we can use ETag headers to ensure that the thing we are writing to is really what we expected it to be, in which case maybe we can use PUT after all. At root the problem is that most real world data update operations are not pure writes but read-modify-write cycles. If there’s anything in the resource’s state that is not something that the client should be messing with, it pretty much falls to the server to be responsible for the modify portion of the cycle, which suggests that we should only ever use POST. Except that if the purpose of locating the modify step in the server is just to ensure data sanity, then maybe the operation really will be idempotent and so it should be a PUT. And I think I can keep waffling back and forth between these two options indefinitely. It’s not that a very expert HTTP pedant wouldn’t be able to tell me precisely which operation is the correct one to use in a given case, but two different very expert HTTP pedants quite possibly might not give the same answer.

If we regard the PUT operation as delivering a total replacement for the current state of a resource, using PUT for update requires the client to have a complete understanding of the state representation, because it needs to provide it in the PUT body. This includes, one presumes, all the various linked URIs embedded within the representation that are presumed to be there to drive the state engine, thus inviting the client to rewrite the set of available state transitions. If one alternatively allows the server to reinterpret or augment the representation as part of processing the PUT, then the idempotency of PUT is again at risk (I don’t personally care much about this, but REST fans clearly do). Moreover, one of the benefits a server brings to the table is the potential to support a more elaborate state representation than the client needs to have a complete understanding of. Representations such as XML, JSON, and the like are particularly suited to this: in generating the representation it sends to the server, the client can simply elide the parts it doesn’t understand. However, if a read-modify-write cycle requires passing the complete state through the client, the client now potentially needs to model the entire state (at least to the extent of understanding which portions it is permitted to edit). In particular, the REST model accommodates races in state change by having the server reject incompatible PUTs with a 409 Conflict error. However, this means that if the server wants to concurrently fiddle with disjoint portions of state that the client might or might not care about, it risks introducing such 409 errors in cases where no conflict actually exists from a practical perspective. More recently, some implementations of HTTP have now incorporated the PATCH verb to perform partial updates; this is not universally supported, however. I think PATCH (as described by RFC 5789) is the only update semantics that makes sense at all, and thus we’d probably be better off having the semantics of PUT be what they call PATCH and get rid of the extra verb. However, such a model means that PUT would no longer be idempotent. The alternative approach is to simply require that PATCH always be used and to deprecate PUT entirely.

PUT update failures also throw the client into a mode of dealing with “here’s an updated copy of the state you couldn’t change; you guess what we didn’t like about your request”, instead of enabling the server to transmit a response that indicates what the particular problem was. This requires the client to have a model of what all the possible failure cases are, whereas a more explicit protocol only requires the client to understand the failure conditions that it can do something about, and lump all the rest into a catch-all category of “problem I can’t deal with”. As things stand now, the poverty of available error feedback effectively turns everything that goes wrong into “problem I can’t deal with”.

DELETE has similar trouble communicating details about failure reasons. For example a 405 Not Allowed could result either from the resource being in a state from which deletion is not permitted for semantic reasons (e.g., the resource in question is defined to be in some kind of immutable state) or from the requester having insufficient authority to perform the operation. Actually, this sort of concern probably applies to most if not all of the HTTP methods.

What is this “state” of which you speak?

The word “state”, especially as captured in the phrase “hypermedia as the engine of application state,” has two related but different meanings. One sense (let’s call it “state1”) is the kind of thing we mean when we describe a classic state machine: a node in a graph, usually a reasonably small and finite graph, that has edges between some nodes and not others that enable us to reason rigorously about how the system can change in response to events. The other sense (“state2”) is the collection of all the values of all the mutable bits that collectively define the way the system is — the state — at some point in time; when used in this sense, the number of possible states of even a fairly simple system can be very very large, perhaps effectively infinite. In a strictly mathematical sense these two things are the same, but in practice they really aren’t since we rarely use state machines to formally describe entire systems (the analysis would be awkward and perhaps intractable) but we use them all the time to model things at higher levels of abstraction. I take the word “state” as used in HATEOAS to generally conform to the first definition, except when people find it rhetorically convenient to mean the second. We often gloss over the distinction, though I can’t say whether this is due to sloppiness or ignorance. However, if we use the term in one way in one sentence and then in the other way in the next, it is easy to be oblivious to major conceptual confusion.

There are a couple of important issues with protocols built on HTTP (equally applicable to conventional RPC APIs as to designs based on REST principles) that arise from the asymmetry of the client and server roles with respect to the communications relationship. While the server is in control of the state of the data, the client is in control of the state of the conversation. This is justified on the basis of the dogma of statelessness — since the server is “stateless”, outside the context of a given request-response cycle the server has no concept that there even is a client. From the server’s perspective, the universe is born anew with each HTTP request.

It’s not that the HTTP standard actually requires the server to have the attention span of a goldfish, but it does insist that the state of long-lived non-client processes (that is, objects containing state whose lifetime transcends that of a given HTTP request) be captured in resources that are (conceptually, at least) external to the server. The theory identifies this as the key to scalability, since decoupling the resources from the server lets us handle load with parallel collections of interchangeable servers. The problem with this is that in the real world we often do have a notion of session state, meaning that there will be server-side resources that are genuinely coupled to the state of the client even if the protocol standard pretends that this is not so. This is the place where the distinction between the two definitions of state becomes important. If we mean state1 when we talk about “the state of the application”, then the REST conceit that the client drives the state through its selective choice of link following behavior (i.e., HATEOAS) is a coherent way of looking at what is going on (modulo my earlier caveat about the client software needing a priori understanding of what all those links actually mean). However, if we mean state2, then the HATEOAS concept looks more dubious, since we recognize that there can be knowledge that the server maintains about its relationship with the client that can alter the server’s response to particular HTTP requests, and yet this knowledge is not explicitly represented in any information exchanged as part of the HTTP dialog itself. This would seem to be a violation of the fundamental web principles that REST is based on, but given that it is how nearly every non-trivial website in the world actually works (for example, through values in cookies being used as keys into databases to access information that is never actually revealed to the client), this would be like saying nothing on the web works like the web. You might actually have a pedantic point, but it would be a silly thing to say. It would be more accurate just to say that REST is patterned on a simplified and idealized model of the web. This does not invalidate the arguments for REST, but I think it should cause us to consider them a bit more critically.

The second issue with HTTP’s client/server asymmetry arises from the fact that the server has no means to communicate with the client aside from responding to a client-initiated request. However, given the practical reality of server-side session state, we not uncommonly find ourselves in a situation where there are processes running on the server side that want to initiate communication with the client, despite the fact that the dominant paradigm insists that this is not a meaningful concept.

The result is that for autonomous server-side processes, instead of allowing the server to transmit a notification to the client, the client must poll. Polling is horrendously inefficient. If you poll too frequently, you put excess load on the network and the servers. If you don’t poll frequently enough, you introduce latency and delay that degrades the user experience. REST proponents counter that the solution to the load problem caused by frequent polling is caching, which HTTP supports quite well. But this is nonsense on stilts. Caching can help as a kind of denial-of-service mitigation, but to the extent that the current state of an asynchronous process is cached, poll queries are not asking “got any news for me?”, they are asking “has the cache timed out?” If the answer to the first question is yes but the answer to the second is no, then the overall system is giving the wrong answer. If the answer to the first question is no but the answer to the second question is yes, then the client is hitting the server directly and the cache has added nothing but infrastructure cost. It makes no sense for the client to poll more frequently than the cache timeout interval, since polling faster will provide no gain, but if you poll slower then the cache is never getting hit and might as well not be there.

Technology | Posted by Chip at 5:30 PM | 1 Comment

May 7, 2017

What Are Capabilities?

Some preliminary remarks

You can skip this initial section, which just sets some context, without loss to the technical substance of the essay that follows, though perhaps at some loss in entertainment value.

At a gathering of some of my ~~coconspirators~~ friends a couple months ago, Alan Karp lamented the lack of a good, basic introduction to capabilities for folks who aren’t already familiar with the paradigm. There’s been a lot written on the topic, but it seems like everything is either really long (though if you’re up for a great nerdy read I recommend Mark Miller’s PhD thesis), or really old (I think the root of the family tree is probably Dennis and Van Horn’s seminal paper from 1966), or embedded in explanations of specific programming languages (such as Marc Stiegler’s excellent E In A Walnut or the capabilities section of the Pony language tutorial) or operating systems (such as KeyKOS or seL4), or descriptions of things that use capabilities (like smart contracts or distributed file storage), or discussions of aspects of capabilities (Norm Hardy has written of ton of useful fragments on his website). But nothing that’s just a good “here, read this” that you can toss at curious people who are technically able but unfamiliar with the ideas. So Alan says, “somebody should write something like that,” while giving me a meaningful stare. Somebody, huh? OK, I can take a hint. I’ll give it a shot. Given my tendency to Explain Things this will probably end up being a lot longer than what Alan wants, but ya gotta start somewhere.

The first thing to confront is that term, “capabilities”, itself. It’s confusing. The word has a perfectly useful everyday meaning, even in the context of software engineering. When I was at PayPal, for example, people would regularly talk about our system’s capabilities, meaning what it can do. And this everyday meaning is actually pretty close to the technical meaning, because in both cases we’re talking about what a system “can” do, but usually what people mean by that is the functionality it realizes rather than the permissions it has been given. One path out of this terminological confusion takes its cue from the natural alignment between capabilities and object oriented programming, since it’s very easy to express capability concepts with object oriented abstractions (I’ll get into this shortly). This has lead, without too much loss of meaning, to the term “object capabilities”, which embraces this affinity. This phrase has the virtue that we can talk about it in abbreviated form as “ocaps” and slough off some of the lexical confusion even further. It does have the downside that there are some historically important capability systems that aren’t really what you’d think of as object oriented, but sometimes oversimplification is the price of clarity. The main thing is, just don’t let the word “capabilities” lead you down the garden path; instead, focus on the underlying ideas.

The other thing to be aware of is that there’s some controversy surrounding capabilities. Part of this is a natural immune response to criticism (nobody likes being told that they’re doing things all wrong), part of it is academic tribalism at work, and part of it is the engineer’s instinctive and often healthy wariness of novelty. I almost hesitate to mention this (some of my colleagues might argue I shouldn’t have), but it’s important to understand the historical context if you read through the literature. Some of the pushback these ideas have received doesn’t really have as much to do with their actual merits or lack thereof as one might hope; some of it is profoundly incorrect nonsense and should be called out as such.

The idea

Norm Hardy summarizes the capability mindset with the admonition “don’t separate designation from authority”. I like this a lot, but it’s the kind of zen aphorism that’s mainly useful to people who already understand it. To everybody else, it just invites questions: (a) What does that mean? and (b) Why should I care? So let’s take this apart and see…

The capability paradigm is about access control. When a system, such as an OS or a website, is presented with a request for a service it provides, it needs to decide if it should actually do what the requestor is asking for. The way it decides is what we’re talking about when we talk about access control. If you’re like most people, the first thing you’re likely to think of is to ask the requestor “who are you?” The fundamental insight of the capabilities paradigm is to recognize that this question is the first step on the road to perdition. That’s highly counterintuitive to most people, hence the related controversy.

For example, let’s say you’re editing a document in Microsoft Word, and you click on the “Save” button. This causes Word to issue a request to the operating system to write to the document file. The OS checks if you have write permission for that file and then allows or forbids the operation accordingly. Everybody thinks this is normal and natural. And in this case, it is: you asked Word, a program you chose to run, to write your changes to a file you own. The write succeeded because the operating system’s access control mechanism allowed it on account of it being your file, but that mechanism wasn’t doing quite what you might think. In particular, it didn’t check whether the specific file write operation in question was the one you asked for (because it can’t actually tell), it just checked if you were allowed to do it.

The access control model here is what’s known as an ACL, which stands for Access Control List. The basic idea is that for each thing the operating system wants to control access to (like a file, for example), it keeps a list of who is allowed to do what. The ACL model is how every current mainstream operating system handles this, so it doesn’t matter if we’re talking about Windows, macOS, Linux, FreeBSD, iOS, Android, or whatever. While there are a lot of variations in the details of how they they handle access control (e.g., the Unix file owner/group/others model, or the principal-per-app model common on phone OSs), in this respect they’re all fundamentally the same.

As I said, this all seems natural and intuitive to most people. It’s also fatally flawed. When you run an application, as far as the OS is concerned, everything the application does is done by you. Another way to put this is, an application you run can do anything you can do. This seems OK in the example we gave of Word saving your file. But what if Word did something else, like transmit the contents of your file over the internet to a server in Macedonia run by the mafia, or erase any of your files whose names begin with a vowel, or encrypt all your files and demand payment in bitcoins to decrypt them? Well, you’re allowed to do all those things, if for some crazy reason you wanted to, so it can too. Now, you might say, we trust Word not to do evil stuff like that. Microsoft would get in trouble. People would talk. And that’s true. But it’s not just Microsoft Word, it’s every single piece of software in your computer, including lots of stuff you don’t even know is there, much of it originating from sources far more difficult to hold accountable than Microsoft Corporation, if you even know who they are at all.

The underlying problem is that the access control mechanism has no way to determine what you really wanted. One way to deal with this might be to have the operating system ask you for confirmation each time a program wants to do something that is access controlled: “Is it OK for Word to write to this file, yes or no?” Experience with this approach has been pretty dismal. Completely aside from the fact that this is profoundly annoying, people quickly become trained to reflexively answer “yes” without a moment’s thought, since that’s almost always the right answer anyway and they just want to get on with whatever they’re doing. Plus, a lot of the access controlled operations a typical program does are internal things (like fiddling with a configuration file, for example) whose appropriateness the user has no way to determine anyhow.

An alternative approach starts by considering how you told Word what you wanted in the first place. When you first opened the document for editing, you typically either double-clicked on an icon representing the file, or picked the file from an Open File dialog. Note, by the way, that both of these user interface interactions are typically implemented by the operating system (or by libraries provided by the operating system), not by Word. The way current APIs work, what happens in either of these cases is that the operating system provides the application with a character string: the pathname of the file you chose. The application is then free to use this string however it likes, typically passing it as a parameter to another operating system API call to open the file. But this is actually a little weird: you designated a file, but the operating system turned this into a character string which it gave to Word, and then when Word actually wanted to open the file it passed the string back to the operating system, which converted it back into a file again. As I’ve said, this works fine in the normal case. But Word is not actually limited to using just the string that names the particular file you specified. It can pass any string it chooses to the Open File call, and the only access limitation it has is what permissions you have. If it’s your own computer, that’s likely to be permissions to everything on the machine, but certainly it’s at least permissions to all your stuff.

Now imagine things working a little differently. Imagine that when Word starts running it has no access at all to anything that’s access controlled – no files, peripheral devices, networks, nothing. When you double click the file icon or pick from the open file dialog, instead of giving Word a pathname string, the operating system itself opens the file and gives Word a handle to it (that is, it gives Word basically the same thing it would have given Word in response to the Open File API call when doing things the old way). Now Word has access to your document, but that’s all. It can’t send your file to Macedonia, because it doesn’t have access to the network – you didn’t give it that, you just gave it the document. It can’t delete or encrypt any of your other files, because it wasn’t given access to any of them either. It can mess up the one file you told it to edit, but it’s just the one file, and if it did that you’d stop using Word and not suffer any further damage. And notice that the user experience – your experience – is exactly the same as it was before. You didn’t have to answer any “mother may I?” security questions or put up with any of the other annoying stuff that people normally associate with security. In this world, that handle to the open file is an example of what we call a “capability”.

This is where we get back to Norm Hardy’s “don’t separate designation from authority” motto. By “designation” we mean how we indicate to, for example, the OS, which thing we are talking about. By “authority” we mean what we are allowed by the OS to do with that thing. In the traditional ACL world, these are two largely disconnected concepts. In the case of a file, the designator is typically a pathname – a character string – that you use to refer to the file when operating upon it. The OS provides operations like Write File or Delete File that are parameterized by the path name of the file to be written to or deleted. Authority is managed separately as an ACL that the OS maintains in association with each file. This means that the decision to grant access to a file is unrelated to the decision to make use of it. But this in turn means that the decision to grant access has to be made without knowledge of the specific uses. It means that the two pieces of information the operating system needs in order to make its access control decision travel to it via separate routes, with no assurance that they will be properly associated with each other when they arrive. In particular, it means that a program can often do things (or be fooled into doing things) that were never intended to be allowed.

Here’s the original example of the kind of thing I’m talking about, a tale from Norm. It’s important to note, by the way, that this is an actual true story, not something I just made up for pedagogical purposes.

Once upon a time, Norm worked for a company that ran a bunch of timeshared computers, kind of like what we now call “cloud computing” only with an earlier generation’s buzzwords. One service they provided was a FORTRAN compiler, so customers could write their own software.

It being so many generations of Moore’s Law ago, computing was expensive, so each time the compiler ran it wrote a billing record to a system accounting file noting the resources used, so the customer could be charged for them. Since this was a shared system, the operators knew to be careful with file permissions. So, for example, if you told the compiler to output to a file that belonged to somebody else, this would fail because you didn’t have permission. They also took care to make sure that only the compiler itself could write to the system accounting file – you wouldn’t want random users to mess with the billing records, that would obviously be bad.

Then one day somebody figured out they could tell the compiler the name of the system accounting file as the name of the file to write the compilation output to. The access control system looked at this and asked, “does this program have permission to write to this file?” – and it did! And so the compiler was allowed to overwrite the billing records and the billing information was lost and everybody got all their compilations for free that day.

Fixing this turned out to be surprisingly slippery. Norm named the underlying problem “The Confused Deputy”. At heart, the FORTRAN compiler was deputized by two different masters: the customer and the system operators. To serve the customer, it had been given permission to access the customer’s files. To serve the operators, it had been given permission to access the accounting file. But it was confused about which master it was serving for which purpose, because it had no way to associate the permissions it had with their intended uses. It couldn’t specify “use this permission for this file, use that permission for that file”, because the permissions themselves were not distinct things it could wield selectively – the compiler never actually saw or handled them directly. We call this sort of thing “ambient authority”, because it’s just sitting there in the environment, waiting to be used automatically without regard to intent or context.

If this system had been built on capability principles, rather than accessing the files by name, the compiler would instead have been given a capability by the system operators to access the accounting file with, which it would use to update the billing records, and then gotten a different capability from the customer to access the output file, which it would use when outputting the result of the compilation. There would have been no confusion and no exploit.

You might think this is some obscure problem those old people had back somewhere at the dawn of the industry, but a whole bunch of security problems plaguing us today – which you might think are all very different – fit this template, including many kinds of injection attacks, cross-site request forgery, cross site scripting attacks, click-jacking – including, depending on how you look at it, somewhere between 5 and 8 members of the OWASP top 10 list. These are all arguably confused deputy problems, manifestations of this one conceptual flaw first noticed in the 1970s!

Getting more precise

We said separating designation from authority is dangerous, and that instead these two things should be combined, but we didn’t really say much about what it actually means to combine them. So at this point I think it’s time to get a bit more precise about what a capability actually is.

A capability is single thing that both designates a resource and authorizes some kind of access to it.

There’s a bunch of abstract words in there, so let’s unpack it a bit.

By resource we just mean something the access control mechanism controls access to. It’s some specific thing we have that somebody might want to use somehow, whose use we seek to regulate. It could be a file, an I/O device, a network connection, a database record, or really any kind of object. The access control mechanism itself doesn’t much care what kind of thing the resource is or what someone wants to do with it. In specific use cases, of course, we care very much about those things, but then we’re talking about what we use the access control mechanism for, not about how it works.

In the same vein, when we talk about access, we just mean actually doing something that can be done with the resource. Access could be reading, writing, invoking, using, destroying, activating, or whatever. Once again, which of these it is is important for specific uses but not for the mechanism itself. Also, keep in mind that the specific kind of access that’s authorized is one of the things the capability embodies. Thus, for example, a read capability to a file is a different thing from a write capability to the same file (and of course, there might be a read+write capability to that file, which would be yet a third thing).

By designation, we mean indicating, somehow, specifically which resource we’re talking about. And by authorizing we mean that we are allowing the access to happen. Hopefully, none of this is any surprise.

Because the capability combines designation with authority, the possessor of the capability exercises their authority – that is, does whatever it is they are allowed to do with the resource the capability is a capability to – by wielding the capability itself. (What that means in practice should be clearer after a few examples). If you don’t possess the capability, you can’t use it, and thus you don’t have access. Access is regulated by controlling possession.

A key idea is that capabilities are transferrable, that someone who possesses a capability can convey it to someone else. An important implication that falls out of this is that capabilities fundamentally enable delegation of authority. If you are able to do something, it means you possess a capability for that something. If you pass this capability to somebody else, then they are now also able do whatever it is. Delegation is one of the main things that make capabilities powerful and useful. However, it also tends to cause a lot of people to freak out at the apparent loss of control. A common response is to try to invent mechanisms to limit or forbid delegation, which is a terrible idea and won’t work anyway, for reasons I’ll get into.

If you’re one of these people, please don’t freak out yourself; I’ll come back to this shortly and explain some important capability patterns that hopefully will address your concerns. In the meantime, a detail that might be helpful to meditate on: two capabilities that authorize the same access to the same resource are not necessarily the same capability (note: this is just a hint to tease the folks who are trying to guess where this goes, so if you’re not one of those people, don’t worry if it’s not obvious).

Another implication of our definition is that capabilities must be unforgeable. By this we mean that you can’t by yourself create a capability to a resource that you don’t already have access to. This is a basic requirement that any capability system must satisfy. For example, using pathnames to designate files is problematic because anybody can create any character string they want, which means they can designate any file they want if pathnames are how you do it. Pathnames are highly forgeable. They work fine as designators, but can’t by themselves be used to authorize access. In the same vein, an object pointer in C++ is forgeable, since you can typecast an integer into a pointer and thus produce a pointer to any kind of object at any memory address of your choosing, whereas in Java, Smalltalk, or pretty much any other memory-safe language where this kind of casting is not available, an object reference is unforgeable.

As I’ve talked about all this, I’ve tended to personify the entities that possess, transfer, and wield capabilities – for example, sometimes by referring to one of them as “you”. This has let me avoid saying much about what kind of entities these are. I did this so you wouldn’t get too anchored in specifics, because there are many different ways capability systems can work, and the kinds of actors that populate these systems vary. In particular, personification let me gloss over whether these actors were bits of software or actual people. However, we’re ultimately interested in building software, so now lets talk about these entities as “objects”, in the traditional way we speak of objects in object oriented programming. By getting under the hood a bit, I hope things may be made a little easier to understand. Later on we can generalize to other kinds of systems beyond OOP.

I’ll alert you now that I’ll still tend to personify these things a bit. It’s helpful for us humans, in trying to understand the actions of an intentional agent, to think of it as if it’s a person even if it’s really code. Plus – and I’ll admit to some conceptual ju-jitsu here – we really do want to talk about objects as distinct intentional agents. Another of the weaknesses of the ACL approach is that it roots everything in the identity of the user (or other vaguely user-like abstractions like roles, groups, service accounts, and so on) as if that user was the one doing things, that is, as if the user is the intentional agent. However, when an object actually does something it does it in a particular way that depends on how it is coded. While this behavior might reflect the intentions of the specific user who ultimately set it in motion, it might as easily reflect the intentions of the programmers who wrote it – more often, in fact, because most of what a typical piece of software does involves internal mechanics that we often work very hard to shield the user from having to know anything about.

In what we’re calling an “object capability” system (or “ocap” system, to use the convenient contraction I mentioned in the beginning), a reference to an object is a capability. The interesting thing about objects in such a system is that they are both wielders of capabilities and resources themselves. An object wields a capability – an object reference – by invoking methods on it. You transfer a capability by passing an object reference as a parameter in a method invocation, returning it from a method, or by storing it in a variable. An ocap system goes beyond an ordinary OOP system by imposing a couple additional requirements: (1) that object references be unforgeable, as just discussed, and (2) that there be some means of strong encapsulation, so that one object can hold onto references to other objects in a way that these can’t be accessed from outside it. For example, you can implement ocap principles in Java using ordinary Java object references held in private instance variables (to make Java itself into a pure ocap language – which you can totally do, by the way – requires introducing a few additional rules, but that’s more detail than we have time for here).

In an ocap system, there are only three possible ways you can come to have a capability to some resource, which is to say, to have a reference to some object: creation, transfer, and endowment.

Creation means you created the resource yourself. We follow the convention that, as a byproduct of the act of creation, the creator receives a capability that provides full access to the new resource. This is natural in an OOP system, where an object constructor typically returns a reference to the new object it constructed. In a sense, creation is an optional feature, because it’s not actually a requirement that a capability system have a way to produce new resources at all (that is, it might be limited to resources that already exist), but if it does, there needs to be way for the new resources to enter into the capability world, and handing them to their creator is a good way to do it.

Transfer means somebody else gave the capability to you. This is the most important and interesting case. Capability passing is how the authority graph – the map of who has what authority to do what with what – can change over time (by the way, the lack of a principled way to talk about how authorities change over time is another big problem with the ACL model). The simple idea is: Alice has a capability to Bob, Alice passes this capability to Carol, now Carol also has a capability to Bob. That simple narrative, however, conceals some important subtleties. First, Alice can only do this if she actually possesses the capability to Bob in the first place. Hopefully this isn’t surprising, but it is important. Second, Alice also has to have a capability to Carol (or some capability to communicate with Carol, which amounts to the same thing). Now things get interesting; it means we have a form of confinement, in that you can’t leak a capability unless you have another capability that lets you communicate with someone to whom you’d leak it. Third, Alice had to choose to pass the capability on; neither Bob nor Carol (nor anyone else) could cause the transfer without Alice’s participation (this is what motivates the requirement for strong encapsulation).

Endowment means you were born with the capability. An object’s creator can give it a reference to some other object as part of its initial state. In one sense, this is just creation followed by transfer. However, we treat endowment as its own thing for a couple of reasons. First, it’s how we can have an immutable object that holds a capability. Second, it’s how we avoid infinite regress when we follow the rules to their logical conclusion.

Endowment is how objects end up with capabilities that are realized by the ocap system implementation itself rather by code executing within it. What this means varies depending on the nature of the system; for example, an ocap language framework running on a conventional OS might provide a capability-oriented interface to the OS’s non-capability-oriented file system. An ocap operating system (such as KeyKOS or seL4) might provide capability-oriented access to primitive hardware resources such as disk blocks or network interfaces. In both cases we’re talking about things that exist outside the ocap model, which must be wrapped in special privileged objects that have native access to those things. Such objects can’t be created within the ocap rules, so they have to be endowed by the system itself.

So, to summarize: in the ocap model, a resource is an object and a capability is an object reference. The access that a given capability enables is the method interface that the object reference exposes. Another way to think of this is: ocaps are just object oriented programming with some additional strictness.

Here we come to another key difference from the ACL model: in the ocap world, the kinds of resources that may be access controlled, and the varieties of access to them that can be provided, are typically more diverse and more finely grained. They’re also generally more dynamic, since it’s usually possible, and indeed normal, to introduce new kinds of resources over time, with new kinds of access affordances, simply by defining new object classes. In contrast, the typical ACL framework has a fixed set of resource types baked into it, along with a small set of access modes that can be separately controlled. This difference is not fundamental – you could certainly create an extensible ACL system or an ocap framework based on a small, static set of object types – but it points to an important philosophical divergence between the two approaches.

In the ACL model, access decisions are made on the basis of access configuration settings associated with the resources. These settings must be administered, often manually, by direct interaction with the access control machinery, typically using tools that are part of the access control system itself. While policy abstractions (such as groups or roles, for example) can reduce the need for humans to make large numbers of individual configuration decisions, it is typically the case that each resource acquires its access control settings as the consequence of people making deliberate access configuration choices for it.

In contrast, the ocap approach dispenses with most of this configuration information and its attendant administrative activity. The vast majority of access control decisions are realized by the logic of how the resources themselves operate. Most access control choices are subsumed by the code of the corresponding objects. At the granularity of individual objects, the decisions to be made are usually simple and clear from context, further reducing the cognitive burden. Only at the periphery, where the system comes into actual contact with its human users, do questions of policy and human intent arise. And in many of these cases, intent can be inferred from the normal acts of control and designation that users make through their normal UI interactions (such as picking a file from a dialog or clicking on a save button, to return to the example we began with).

Consequently, thinking about access control policy and administration is an entirely different activity in an ocap system than in an ACL system. This thinking extends into the fundamental architecture of applications themselves, as well as that of things like programming languages, application frameworks, network protocols, and operating systems.

Capability patterns

To give you a taste of what I mean by affecting fundamental architecture, let’s fulfill the promise I made earlier to talk about how we address some of the concerns that someone from a more traditional ACL background might have.

The ocap approach both enables and relies on compositionality – putting things together in different ways to make new kinds of things. This isn’t really part of the ACL toolbox at all. The word “compositionality” is kind of vague, so I’ll illustrate what I’m talking about with some specific capability patterns. For discussion purposes, I’m going to group these patterns into a few rough categories: modulation, attenuation, abstraction, and combination. Note that there’s nothing fundamental about these, they’re just useful for presentation.

Modulation

By modulation, I mean having one object modulate access to another. The most important example of this is called a revoker. A major source of the anxiety that some people from an ACL background have about capabilities is the feeling that a capability can easily escape their control. If I’ve given someone access to some resource, what happens if later I decide it’s inappropriate for them to have it? In the ACL model, the answer appears to be simple: I merely remove that person’s entry from the resource’s ACL. In the ocap model, if I’ve given them one of my capabilities, then now they have it too, so what can I do if I don’t want them to have it any more? The answer is that I didn’t give them my capability. Instead I gave them a new capability that I created, a reference to an intermediate object that holds my capability but remains controlled by me in a way that lets me disable it later. We call such a thing a revoker, because it can revoke access. A rudimentary form of this is just a simple message forwarder that can be commanded to drop its forwarding pointer.

Modulation can be more sophisticated than simple revocation. For example, I could provide someone with a capability that I can switch on or off at will. I could make access conditional on time or date or location. I could put controls on the frequency or quantity of use (a use-once capability with a built-in expiration date might be particularly useful). I could even make an intermediary object that requires payment in exchange for access. The possibilities are limited only by need and imagination.

The revoker pattern solves the problem of taking away access, but what about controlling delegation? Capabilities are essentially bearer instruments – they convey their authority to whoever holds them, without regard to who the holder is. This means that if I give someone a capability, they could pass it to someone else whose access I don’t approve of. This is another big source of anxiety for people in the ACL camp: the idea that in the capability model there’s no way to know who has access. This is not rooted in some misunderstanding of capabilities either; it’s actually true. But the ACL model doesn’t really help with this, because it has the same problem.

In real world use cases, the need to share resources and to delegate access is very common. Since the ACL model provides no mechanism for this, people fall back on sharing credentials, often in the face of corporate policies or terms of use that specifically forbid this. When presented with the choice between following the rules and getting their job done, people will often pick the latter. Consider, for example, how common it is for a secretary or executive assistant to know their boss’s password – in my experience, it’s almost universal.

There’s a widespread belief that an ACL tells you who has access, but this is just an illusion, due to the fact that credential sharing is invisible to the access control system. What you really have is something that tells you who to hold responsible if a resource is used inappropriately. And if you think about it, this is what you actually want anyway. The ocap model also supports this type of accountability, but can do a much better job of it.

The first problem with credential sharing is that it’s far too permissive. If my boss gives me their company LDAP password so I can access their calendar and email, they’re also giving me access to everything else that’s protected by that password, which might extend to things like sensitive financial or personnel records, or even the power to spend money from the company bank account. Capabilities, in contrast, allow them to selectively grant me access to specific things.

The second problem with credential sharing is that if I use my access inappropriately, there’s no way to distinguish my accesses from theirs. It’s hard for my boss to claim “my flunky did it!” if the activity logs are tied to the boss’s name, especially if they weren’t supposed to have shared the credentials in the first place. And of course this risk applies in the other direction as well: if it’s an open secret that I have my boss’s password, suspicion for their misbehavior can fall on me; indeed, if my boss was malicious they might share credentials just to gain plausible deniability when they abscond with company assets. The revoker pattern, however, can be extended to enable delegation to be auditable. I delegate by passing someone an intermediary object that takes note of who is being delegated to and why, and then it can record this information in an audit log when it is used. Now, if the resource is misused, we actually know who to blame.

Keep in mind also that credential sharing isn’t limited to shared passwords. For example, if somebody asks me to run a program for them, then whatever it is that they wanted done by that program gets done using my credentials. Even if what the program did was benign and the request was made with the best of intentions, we’ve still lost track of who was responsible. This is the reason why some companies forbid running software they haven’t approved on company computers.

Attenuation

When I talk about attenuation, I mean reducing what a capability lets you do – its scope of authority. The scope of authority can encompass both the operations that are enabled and the range of resources that can be accessed. The later is particularly important, because it’s quite common for methods on an object’s API to return references to other objects as a result (once again, a concept that is foreign to the ACL world). For example, one might have a capability that gives access to a computer’s file system. Using this, an attenuator object might instead provide access only to a specific file, or perhaps to some discrete sub-directory tree in a file hierarchy (i.e., a less clumsy version of what the Unix chroot operation does).

Attenuating functionality is also possible. For example, the base capability to a file might allow any operation the underlying file system supports: read, write, append, delete, truncate, etc. From this you can readily produce a read-only capability to the same file: simply have the intermediary object support read requests without providing any other file API methods.

Of course, these are composable: one could readily produce a read-only capability to a particular file from a capability providing unlimited access to an entire file system. Attenuators are particularly useful for packaging access to the existing, non-capability oriented world into capabilities. In addition to the hierarchical file system wrapper just described, attenuators are helpful for mediating access to network communications (for example, limiting connections to particular domains, allowing applications to be securely distributed across datacenters without also enabling them talk to arbitrary hosts on the internet – the sort of thing that would normally be regulated by firewall configuration, but without the operational overhead or administrative inconvenience). Another use would be controlling access to specific portions of the rendering surface of a display device, something that many window systems already do in an almost capability-like fashion anyway.

Abstraction

Abstraction enters the picture because once we have narrowed what authority a given capability represents, it often makes sense to refactor what it does into something with a more appropriately narrow set of affordances. For example, it might make sense to package the read-only file capability mentioned above into an input stream object, rather than something that represents a file per se. At this point you might ask if this is really any different from ordinary good object oriented programming practice. The short answer is, it’s not – capabilities and OOP are strongly aligned, as I’ve mentioned several times already. A somewhat longer answer is that the capability perspective usefully shapes how you design interfaces.

A core idea that capability enthusiasts use heavily is the Principle of Least Authority (abbreviated POLA, happily pronounceable). The principle states that objects should be given only the specific authority necessary to do their jobs, and no more. The idea is that the fewer things an object can do, the less harm can result if it misbehaves or if its integrity is breached.

Least Authority is related to the notions of Least Privilege or Least Permission that you’ll frequently see in a lot of the traditional (non-capability) security literature. In part, this difference in jargon is just a cultural marker that separates the two camps. Often the traditional literature will tell you that authority and permission and privilege all mean more or less the same thing.

However, we really do prefer to talk about “authority”, which we take to represent the full scope of what someone or something is able to do, whereas “permission” refers to a particular set of access settings. For example, on a Unix system I typically don’t have permission to modify the /etc/passwd file, but I do typically have permission to execute the passwd command, which does have permission to modify the file. This command will make selected changes to the file on my behalf, thus giving me the authority to change my password. We also think of authority in terms of what you can actually do. To continue the example of the passwd command, it has permission to delete the password file entirely, but it does not make this available to me, thus it does not convey that authority to me even though it could if it were programmed to do so.

The passwd command is an example of abstracting away the low level details of file access and data formats, instead repackaging them into a more specific set of operations that is more directly meaningful to its user. This kind of functionality refactoring is very natural from a programming perspective, but using it to also refactor access is awkward in the ACL case. ACL systems typically have to leverage slippery abstractions like the Unix setuid mechanism. Setuid is what makes the Unix passwd command possible in the first place, but it’s a potent source of confused deputy problems that’s difficult to use safely; an astonishing number of Unix security exploits over the years have involved setuid missteps. The ocap approach avoids these missteps because the appropriate reduction in authority often comes for free as a natural consequence of the straightforward implementation of the operation being provided.

Combination

When I talk about combination, I mean using two or more capabilities together to create a new capability to some specific joint functionality. In some cases, this is simply the intersection of two different authorities. However, the more interesting cases are when we put things together to create something truly new.

For example, imagine a smartphone running a capability oriented operating system instead of iOS or Android. The hardware features of such a phone would, of course, be accessed via capabilities, which the OS would hand out to applications according to configuration rules or user input. So we could imagine combining three important capabilities: the authority to capture images using the camera, the authority to obtain the device’s geographic position via its built-in GPS receiver, and the authority to read the system clock. These could be encapsulated inside an object, along with a (possibly manufacturer provided) private cryptographic key, yielding a new capability that when invoked provides signed, authenticated, time stamped, geo-referenced images from the camera. This capability could then be granted to applications that require high integrity imaging, like police body cameras, automobile dash cams, journalistic apps, and so on. If this capability is the only way for such applications to get access to the camera at all, then the applications’ developers don’t have to be trusted to maintain a secure chain of evidence for the imagery. This both simplifies their implementation task – they can focus their efforts on their applications’ unique needs instead of fiddling with signatures and image formats – and makes their output far more trustworthy, since they don’t have prove their application code doesn’t tamper with the data (you still have to trust the phone and the OS, but that’s at least a separable problem).

What can we do with this?

I’ve talked at length about the virtues of the capability approach, but at the same time observed repeatedly (if only in criticism) that this is not how most contemporary systems work. So even if these ideas are as virtuous as I maintain they are, we’re still left with the question of what use we can make of them absent some counterfactual universe of technical wonderfulness.

There are several ways these ideas can provide direct value without first demanding that we replace the entire installed base of software that makes the world go. This is not to say that the installed base never gets replaced, but it’s a gradual, incremental process. It’s driven by small, local changes rather than by the unfolding of some kind of authoritative master plan. So here are a few incremental ways to apply these ideas to the current world. My hope is that these can deliver enough immediate value to bias practitioners in a positive direction, shaping the incentive landscape so it tilts towards a less dysfunctional software ecosystem. Four areas in particular seem salient to me in this regard: embedded systems, compartmentalized computation, distributed services, and software engineering practices.

Embedded systems

Capability principles are a very good way to organize an operating system. Two of the most noteworthy examples, in my opinion, are KeyKOS and seL4.

KeyKOS was developed in the 1980s for IBM mainframes by Key Logic, a spinoff from Tymshare. In addition to being a fully capability secure OS, it attained extraordinarily high reliability via an amazing, high performance orthogonal persistence mechanism that allowed processes to run indefinitely, surviving things like loss of power or hardware failure. Some commercial KeyKOS installations had processes that ran for years, in a few cases even spanning replacement of the underlying computer on which they were running. Towards the end of its commercial life, KeyKOS was also ported to several other processor architectures, making it a potentially interesting jumping off point for further development. KeyKOS has inspired a number of follow ons, including Eros, CapROS, and Coyotos. Unfortunately most of these efforts have been significantly resource starved and consequently have not yet had much real world impact. But the code for KeyKOS and its descendants is out there for the taking if anybody wants to give it a go.

seL4 is a secure variant of the L4 operating system, developed by NICTA in Australia. While building on the earlier L3 and L4 microkernels, seL4 is a from scratch design heavily influenced by KeyKOS. seL4 notably has a formal proof of functional correctness, making it an extremely sound basis for building secure and reliable systems. It’s starting to make promising inroads into applications that demand this kind of assurance, such as military avionics. Like KeyKOS, seL4, as well as seL4’s associated suite of proofs, is available as open source software.

Embedded systems, including much of the so called “Internet of Things”, are sometimes less constrained by installed base issues on account of being standalone products with narrow functionality, rather than general purpose computational systems. They often have fewer points where legacy interoperability is as important. Moreover, they’re usually cross-developed with tools that already expect the development and runtime environments to be completely different, allowing them to be bootstrapped via legacy toolchains. In other words, you don’t have to port your entire development system to the new OS in order to take advantage of it, but rather can continue using most of your existing tools and workflow processes. This is certainly true of the capability OS efforts I just mentioned, which have all dealt with these issues.

Furthermore, embedded software is often found in mission critical systems that must function reliably in a high threat environment. In these applications, reliability and security can take priority over cost minimization, making the assurances that a capability OS can offer comparatively more attractive. Consequently, using one of these operating systems as the basis for a new embedded application platform seems like an opportunity, particularly in areas where reliability is important.

A number of recent security incidents on the internet have revolved around compromised IoT devices. A big part of the problem is that the application code in these products typically has complete access to everything in the device, largely as a convenience to the developers. This massive violation of least privilege then makes these devices highly vulnerable to exploitation when an attacker finds flaws in the application code.

Rigorously compartmentalizing available functionality would greatly reduce the chances of these kinds of vulnerabilities, but this usually doesn’t happen. Partly this is just ignorance – most of these developers are not generally also security experts, especially when the things they are working on are not, on their face, security sensitive applications. However, I think a bigger issue is that the effort and inconvenience involved in building a secure system with current building blocks doesn’t seem justified by the payoff.

No doubt the developers of these products would prefer to produce more secure systems than they often do, all other things being equal, but all other things are rarely equal. One way to tilt the balance in our favor would be to give them a platform that more or less automatically delivers desirable security and reliability properties as a consequence of developers simply following the path of least resistance. This is the payoff that building on top of a capability OS offers.

Compartmentalized computation

Norm Hardy – one of the primary architects of KeyKOS, who I’ve already mentioned several times – has quipped that “the last piece of software anyone ever writes for a secure, capability OS is always the Unix virtualization layer.” This is a depressing testimony to the power that the installed base has over the software ecosystem. However, it also suggests an important benefit that these kinds of OS’s can provide, even in an era when Linux is the defacto standard.

In the new world of cloud computing, virtualization is increasingly how everything gets done. Safety-through-compartmentalization has long been one of the key selling points driving this trend. The idea is that even if an individual VM is compromised due to an exploitable flaw in the particular mix of application code, libraries, and OS services that it happens to be running, this does not gain the attacker access to other, adjacent VMs running on the same hardware.

The underlying idea – isolate independent pieces of computation so they can’t interfere with each other – is not new. It is to computer science what vision is to evolutionary biology, an immensely useful trick that gets reinvented over and over again in different contexts. In particular, it’s a key idea motivating the architecture of most multitasking operating systems in the first place. Process isolation has long been the standard way for keeping one application from messing up another. What virtualization brings to the table is to give application and service operators control over a raft of version and configuration management issues that were traditionally out of their hands, typically in the domain of the operators of the underlying systems on which they were running. Thus, for example, even if everyone in your company is using Linux it could still be the case that a service you manage depends on some Apache module that only works on Linux version X, while another some other wing of your company has a service requiring a version of MySQL that only works with Linux version Y. But with virtualization you don’t need to fight about which version of Linux to run on your company server machines. Instead, you can each have your own VMs running whichever version you need. More significantly, even if the virtualization system itself requires Linux version Z, it’s still not a problem, because it’s at a different layer of abstraction.

Virtualization doesn’t just free us from fights over which version of Linux to use, but which operating system entirely. With virtualization you can run Linux on Windows, or Windows on Mac, or FreeBSD on Linux, or whatever. In particular, it means you can run Linux on seL4. This is interesting because all the mainstream operating systems have structural vulnerabilities that mean they inevitably tend to get breached, and when somebody gets into the OS that’s running the virtualization layer it means they get into all the hosted VMs as well, regardless of their OS. While it’s still early days, initial indications are that seL4 makes a much more solid base for the virtualization layer than Linux or the others, while still allowing the vast bulk of the code that needs to run to continue working in its familiar environment.

By providing a secure base for the virtualization layer, you can provide a safe place to stand for datacenter operators and other providers of virtualized services. You have to replace some of the software that manages your datacenter, but the datacenter’s customers don’t have to change anything to benefit from this; indeed, they need not even be aware that you’ve done it.

This idea of giving applications a secure place to run, a place where the rules make sense and critical invariants can be relied upon – what I like to call an island of sanity – is not limited to hardware virtualization. “Frozen Realms”, currently working its slow way through the JavaScript standardization process, is a proposal to apply ocap-based compartmentalization principles to the execution environment of JavaScript code in the web browser.

The stock JavaScript environment is highly plastic; code can rearrange, reconfigure, redefine, and extend what’s there to an extraordinary degree. This massive flexibility is both blessing and curse. On the blessing side, it’s just plain useful. In particular, a piece of code that relies on features or behavior from a newer version of the language standard can patch the environment of an older implementation to emulate the newer pieces that it needs (albeit sometimes with a substantial performance penalty). This malleability is essential to how the language evolves without breaking the web. On the other hand, it makes it treacherous to combine code from different providers, since it’s very easy for one chunk of code to undermine the invariants that another part relies on. This is a substantial maintenance burden for application developers, and especially for the creators of application frameworks and widely used libraries. And this before we even consider what can happen if code behaves maliciously.

Frozen Realms is a scheme to let you to create an isolated execution environment, configure it with whatever modifications and enhancements it requires, lock it down so that it is henceforth immutable, and then load and execute code within it. One of the goals of Frozen Realms is to enable defensively consistent software – code that can protect its invariants against arbitrary or improper behavior by things it’s interoperating with. In a frozen realm, you can rely on things not to change beneath you unpredictably. In particular, you could load independent pieces of software from separate developers (who perhaps don’t entirely trust each other) into a common realm, and then allow these to interact safely. Ocaps are key to making this work. All of the ocap coding patterns mentioned earlier become available as trustworthy tools, since the requisite invariants are guaranteed. Because the environment is immutable, the only way pieces of code can affect each other is via object references they pass between them. Because all external authority enters the environment via object references originating outside it, rather than being ambiently available, you have control over what any piece of code will be allowed to do. Most significantly, you can have assurances about what it will not be allowed to do.

Distributed services

There are many problems in the distributed services arena for which the capability approach can be helpful. In the interest of not making this already long essay even longer, I’ll just talk here about one of the most important: the service chaining problem, for which the ACL approach has no satisfactory solution at all.

The web is a vast ecosystem of services using services using services. This is especially true in the corporate world, where companies routinely contract with specialized service providers to administer non-core operational functions like benefits, payroll, travel, and so on. These service providers often call upon even more specialized services from a range of other providers. Thus, for example, booking a business trip may involve going through your company’s corporate intranet to the website of the company’s contracted travel agency, which in turn invokes services provided by airlines, hotels, and car rental companies to book reservations or purchase tickets. Those services may themselves call out to yet other services to do things like email you your travel itinerary or arrange to send you a text if your flight is delayed.

Now we have the question: if you invoke one service that makes use of another, whose credentials should be used to access the second one? If the upstream service uses its own credentials, then it might be fooled, by intention or accident, into doing something on your behalf that it is allowed to do but which the downstream service wouldn’t let you do (a classic instance of the Confused Deputy problem). On the other hand, if the upstream service needs your credentials to invoke the downstream service, it can now do things that you wouldn’t allow. In fact, by giving it your credentials, you’ve empowered it to impersonate you however it likes. And the same issues arise for each service invocation farther down the chain.

Consider, for example, a service like Mint that keeps track of your finances and gives you money management advice. In order to do this, they need to access banks, brokerages, and credit card companies to obtain your financial data. When you sign up for Mint, you give them the login names and passwords for all your accounts at all the financial institutions you do business with, so they can fetch your information and organize it for you. While they promise they’re only going to use these credentials to read your data, you’re actually giving them unlimited access and trusting them not to abuse it. There’s no reason to believe they have any intention of breaking their promise, and they do, in fact, take security very seriously. But in the end the guarantee you get comes down to “we’re a big company, if we messed with you too badly we might get in trouble”; there are no technical assurances they can really provide. Instead, they display the logos of various security promoting consortia and double pinky swear they’ll guard your credentials with like really strong encryption and stuff. Moreover, their terms of use work really hard to persuade you that you have no recourse if they fail (though who actually gets stuck holding the bag in the event they have a major security breach is, I suspect, virgin territory, legally speaking).

While I’m quite critical of them here, I’m not actually writing this to beat up on them. I’m reasonably confident (and I say this without knowing or having spoken to anyone there, merely on the basis of having been in management at various companies myself) that they would strongly prefer not to be in the business of running a giant single point of failure for millions of people’s finances. It’s just that given the legacy security architecture of the web, they have no practical alternative, so they accept the risk as a cost of doing business, and then invest in a lot of complex, messy, and very expensive security infrastructure to try to minimize that risk.

To someone steeped in capability concepts, the idea that you would willingly give strangers on the web unlimited access to all your financial accounts seems like madness. I suspect it also seems like madness to lay people who haven’t drunk the conventional computer security kool aid, evidence of which is the lengths Mint has to go to in its marketing materials trying to persuade prospective customers that, no really, this is OK, trust us, please pay no attention to the elephant in the room.

The capability alternative (which, I stress, is not an option currently available to you), would be to obtain a separate credential – a capability! – from each of your financial institutions that you could pass along to a data management service like Mint. These credentials would grant read access to the relevant portions of your data, while providing no other authority. They would also be revocable, so that you could unilaterally withdraw this access later, say in the event of a security breach at the data manager, without disrupting your own normal access to your financial institutions. And there would be distinct credentials to give to each data manager that you use (say, you’re trying out a competitor to Mint) so that they could be independently controlled.

There are no particular technical obstacles to doing any of this. Alan Karp worked out much of the mechanics at HP Labs in a very informative paper called “Zebra Copy: A reference implementation of federated access management” that should be on everyone’s must read list.

Even with existing infrastructure and tools there are many available implementation options. Alan worked it out using SAML certificates, but you can do this just as well with OAuth2 bearer tokens, or even just special URLs. There are some new user interface things that would have to be done to make this easy and relatively transparent for users, but there’s been a fair bit of experimentation and prototyping done in this area that have pointed to a number of very satisfactory and practical UI options. The real problem is that the various providers and consumers of data and services would all have to agree that this new way of doing things is desirable, and then commit to switching over to it, and then standardize the protocols involved, and then actually change their systems, whereas the status quo doesn’t require any such coordination. In other words, we’re back to the installed base problem mentioned earlier.

However, web companies and other large enterprises are constantly developing and deploying hordes of new services that need to interoperate, so even if the capability approach is an overreach for the incumbents, it looks to me like a competitive opportunity for ambitious upstarts. Enterprises in particular currently lack a satisfactory service chaining solution, even though they’re in dire need of one.

In practice, the main defense against bad things happening in these sorts of systems is not the access control mechanisms at all, it’s the contractual obligations between the various parties. This can be adequate for big companies doing business with other big companies, but it’s not a sound basis for a robust service ecosystem. In particular, you’d like software developers to be able to build services by combining other services without the involvement of lawyers. And in any case, when something does go wrong, with or without lawyers it can be hard to determine who to hold at fault because confused deputy problems are rooted in losing track of who was trying to do what. In essence we have engineered everything with built in accountability laundering.

ACL proponents typically try to patch the inherent problems of identity-based access controls (that is, ones rooted in the “who are you?” question) by piling on more complicated mechanisms like role-based access control or attribute-based access control or policy-based access control (Google these if you want to know more; they’re not worth describing here). None of these schemes actually solves the problem, because they’re all at heart just variations of the same one broken thing: ambient authority. I think it’s time for somebody to try to get away from that.

Software engineering practices

At Electric Communities we set out to create technology for building a fully decentralized, fully user extensible virtual world. By “fully decentralized” we meant that different people and organizations could run different parts of the world on their own machines for specific purposes or with specific creative intentions of their own. By “fully user extensible” we meant that folks could add new objects to the world over time, and not just new objects but new kinds of objects.

Accomplishing this requires solving some rather hairy authority management problems. One example that we used as a touchstone was the notion of a fantasy role playing game environment adjacent to an online stock exchange. Obviously you don’t want someone taking their dwarf axe into the stock exchange and whacking peoples’ heads off, nor do you want a stock trader visiting the RPG during their lunch break to have their portfolio stolen by brigands. While interconnecting these two disparate applications doesn’t actually make a lot of sense, it does vividly capture the flavor of problems we were trying to solve. For example, if the dwarf axe is a user programmed object, how does it acquire the ability to whack people’s heads off in one place but not have that ability in another place?

Naturally, ocaps became our power tool of choice, and lots of interesting and innovative technology resulted (the E programming language is one notable example that actually made it to the outside world). However, all the necessary infrastructure was ridiculously ambitious, and consequently development was expensive and time consuming. Unfortunately, extensible virtual world technology was not actually something the market desperately craved, so being expensive and time consuming was a problem. As a result, the company was forced to pivot for survival, turning its attentions to other, related businesses and applications we hoped would be more viable.

I mention all of the above as preamble to what happened next. When we shifted to other lines of business, we did so with a portfolio of aggressive technologies and paranoid techniques forged in the crucible of this outrageous virtual world project. We went on using these tools mainly because they were familiar and readily at hand, rather than due to some profound motivating principle, though we did retain our embrace of the ocap mindset. However, when we did this, we made an unexpected discovery: code produced with these tools and techniques had greater odds of being correct on the first try compared to historical experience. A higher proportion of development time went to designing and coding, a smaller fraction of time to debugging, things came together much more quickly, and the resulting product tended to be considerably more robust and reliable. Hmmm, it seems we were onto something here. The key insight is that measures that prevent deliberate misbehavior tend to be good at preventing accidental misbehavior also. Since a bug is, almost by definition, a form of accidental misbehavior, the result was fewer bugs.

Shorn of all the exotic technology, however, the principles at work here are quite simple and very easy to apply to ordinary programming practice, though the consequences you experience may be far reaching.

As an example, I’ll explain how we apply these ideas to Java. There’s nothing magic or special about Java per se, other than it can be tamed – some languages cannot – and that we’ve had a lot of experience doing so. In Java we can reduce most of it to three simple rules:

Rule #1: All instance variables must be private
Rule #2: No mutable static state or statically accessible authority
Rule #3: No mutable state accessible across thread boundaries

That’s basically it, though Rule #2 does merit a little further explication. Rule #2 means that all static variables must be declared final and may only reference objects that are themselves transitively immutable. Moreover, constructors and static methods must not provide access to any mutable state or side-effects on the outside world (such as I/O), unless these are obtained via objects passed into them as parameters.

These rules simply ensure the qualities of reference unforgeability and encapsulation that I mentioned a while back. The avoidance of static state and static authority is because Java class names constitute a form of forgeable reference, since anyone can access a class just by typing its name. For example, anybody at all can try to read a file by creating an instance of java.io.FileInputStream, since this will open a file given a string. The only limitations on opening a file this way are imposed by the operating system’s ACL mechanism, the very thing we are trying to avoid relying on. On the other hand, a specific instance of java.io.InputStream is essentially a read capability, since the only authority it exposes is on its instance methods.

These rules cover most of what you need. If you want to get really extreme about having a pure ocap language, in Java there are a few additional edge case things you’d like to be careful of. And, of course, it would also be nice if the rules could be enforced automatically by your development tools. If your thinking runs along these lines, I highly recommend checking out Adrian Mettler’s Joe-E, which defines a pure ocap subset of Java much more rigorously, and provides an Eclipse plugin that supports it. However, simply following these three rules in your ordinary coding will give you about 90% of the benefits if what you care about is improving your code rather than security per se.

Applying these rules in practice will change your code in profound ways. In particular, many of the standard Java class libraries don’t follow them – for example lots of I/O authority is accessed via constructors and static methods. In practice, what you do is quarantine all the unavoidable violations of Rule #2 into carefully considered factory classes that you use during your program’s startup phase. This can feel awkward at first, but it’s an experience rather like using a strongly typed programming language for the first time: in the beginning you keep wondering why you’re blocked from doing obvious things you want to do, and then it slowly dawns on you that actually you’ve been blocked from doing things that tend to get you into trouble. Plus, the discipline forces you to think through things like your I/O architecture, and the result is generally improved structure and greater robustness.

(Dealing with the standard Java class libraries is a bit of an open issue. The approach taken by Joe-E and its brethren has been to use the standard libraries pruned of the dangerous stuff, a process we call “taming”. But while this yields safety, it’s less than ideal from an ergonomics perspective. A project to produce a good set of capability-oriented wrappers for the functionality in the core Java library classes would probably be a valuable contribution to the community, if anyone out there is so inclined.)

Something like the three rules for Java can be often devised for other languages as well, though the feasibility of this does vary quite a bit depending on how disciplined the language in question is. For example, people have done this for Scala and OCaml, and it should be quite straightforward for C#, but probably hopeless for PHP or Fortran. Whether C++ is redeemable in this sense is an open question; it seems plausible to me, although the requisite discipline somewhat cuts against the grain of how people use C++. It’s definitely possible for JavaScript, as a number of features in recent versions of the language standard were put there expressly to enable this kind of thing. It’s probably also worth pointing out that there’s a vibrant community of open source developers creating new languages that apply these ideas. In particular, you should check out Monte, which takes Python as its jumping off point, and Pony, which is really its own thing but very promising.

There’s a fairly soft boundary here between practices that simply improve the robustness and reliability of your code if you follow them, and things that actively block various species of bad outcomes from happening. Obviously, the stronger the discipline enforced by the tools is, the stronger the assurances you’ll have about the resulting product. Once again, the analogy to data types comes to mind, where there are best practices that are basically just conventions to be followed, and then there are things enforced, to a greater or lesser degree, by the programming language itself. From my own perspective, the good news is that in the short term you can start applying these practices in the places where it’s practical to do so and get immediate benefit, without having to change everything else. In the long term, I expect language, library, and framework developers to deliver us increasingly strong tools that will enforce these kinds of rules natively.

Conclusion

At its heart, the capability paradigm is about organizing access control around specific acts of authorization rather than around identity. Identity is so fundamental to how we interact with each other as human beings, and how we have historically interacted with our institutions, that it is easy to automatically assume it should be central to organizing our interactions with our tools as well. But identity is not always the best place to start, because it often fails to tell us what we really need to know to make an access decision (plus, it often says far too much about other things, but that’s an entirely separate discussion).

Organizing access control around the question “who are you?” is incoherent, because the answer is fundamentally fuzzy. The driving intuition is that the human who clicked a button or typed a command is the person who wanted whatever it was to happen. But this is not obviously true, and in some important cases it’s not true at all. Consider, for example, interacting with a website using your browser. Who is the intentional agent in this scenario? Well, obviously there’s you. But also there are the authors of every piece of software that sits between you and the site. This includes your operating system, the web browser itself, the web server, its operating system, and any other major subsystems involved on your end or theirs, plus the thousands of commercial and open source libraries that have been incorporated into these systems. And possibly other stuff running on your (or their) computers at the time. Plus intermediaries like your household or corporate wireless access point, not to mention endless proxies, routers, switches, and whatnot in the communications path from here to there. And since you’re on a web page, there’s whatever scripting is on the page itself, which includes not only the main content provided by the site operators but any of the other third party stuff one typically finds on the web, such as ads, plus another unpredictably large bundle of libraries and frameworks that were used to cobble the whole site together. Is it really correct to say that any action taken by this vast pile of software was taken by you? Even though the software has literally tens of thousands of authors with a breathtakingly wide scope of interests and objectives? Do you really want to grant all those people the power to act as you? I’m fairly sure you don’t, but that’s pretty much what you’re actually doing, quite possibly thousands of times per day. The question that the capability crowd keeps asking is, “why?”

Several years ago, the computer security guru Marcus Ranum wrote: “After all, if the conventional wisdom was working, the rate of systems being compromised would be going down, wouldn’t it?” I have no idea where he stands on capabilities, nor if he’s even aware of them, but this assertion still seems on point to me.

I’m on record comparing the current state of computer security to the state of medicine at the dawn of the germ theory of disease. I’d like to think of capabilities as computer security’s germ theory. The analogy is imperfect, of course, since germ theory talks about causality whereas here we’re talking about the right sort of building blocks to use. But I keep being drawn back to the parallel largely because of the ugly and painful history of germ theory’s slow acceptance. On countless occasions I’ve presented capability ideas to folks who I think ought know about them – security people, engineers, bosses. The typical response is not argument, but indifference. The most common pushback, when it happens, is some variation of “you may well be right, but…”, usually followed by some expression of helplessness or passive acceptance of the status quo. I’ve had people enthusiastically agree with everything I’ve said, then go on to behave as if these ideas had never ever entered their brains. People have trouble absorbing ideas that they don’t already have at least some tentative place for in their mental model of the world; this is just how human minds work. My hope is that some of the stuff I’ve written here will have given these ideas a toehold in your head.

Acknowledgements

This essay benefitted from a lot of helpful feedback from various members of the Capabilities Mafia, the Friam group, and the cap-talk mailing list, notably David Bruant, Raoul Duke, Bill Frantz, Norm Hardy, Carl Hewitt, Chris Hibbert, Baldur Jóhannsson, Alan Karp, Kris Kowal, William Leslie, Mark Miller, David Nicol, Kevin Reid, and Dale Schumacher. My thanks to all of them, whose collective input improved things considerably, though of course any remaining errors and infelicities are mine.

Technology Theory | Posted by Chip at 1:11 PM | 3 Comments

| Older posts >>

Cyberspace. Virtual communities. Online games. Distributed systems.Opinion, history, advice, and silliness from two guys who've been building this stuff for a long, long time.

Say what?

Collected History

Blogs We Like

Worthy Things

Search

Categories

Archives

Feed

April 12, 2026

Recovering a Lost Document: Habitat Anecdotes (1988)

Habitat Anecdotes

Preface (Fall 1993)

The People

1) The Passive

2) The Active

3) The Motivators

4) The Caretakers

5) The Geek Gods (System Operators)

Real Money

Habitat Money

The Economic Model

What the Wealthy Did

The Issues

Introduction

Early Thieving

Dial H for Murder

The Order of the Holy Walnut

Wedded Bliss?

Entertaining the neighbors

More on Stealing

Secret Identities

Business

To Govern or Not to Govern

Tours

The D’nalsi Island Adventure

R&R weekend adventures

The Tome of Wealth and Fame

Grand Openings

Disease

Deception & Trickery

Fan Mail

Ideas to be tried

Monster of the Month Club

Machivelli

A Final Word

February 18, 2026

Adventures In LLM Land, With Thoughts On The AI Revolution

April 13, 2022

Game Governance Domains: a NFT Support Nightmare

A Game’s Governance Domain

Scary Future: Crypto? Where’s Undo?

Nightmare Future: Game UGC & NFTs? Ack!

August 5, 2021

Living Worlds Considered Harmful

Introduction

Premature Standardization

Security

Secure software distribution

Protect the local scene

Authentication

Community

Incomplete

Principles

Conclusion

August 28, 2019

The Unum Pattern

May 1, 2019

Another Thing Found While Packing to Move

March 9, 2019

A Lost Treasure of Xanadu

January 8, 2018

Pool Abuse

November 24, 2017

A Slightly Skeptical Perspective on REST

Is it REST?

Pro and Con

I got yer resource right here…

Representational? Differentiable!

Client representation vs. Server representation

Cyberspace. Virtual communities. Online games. Distributed systems.
Opinion, history, advice, and silliness from two guys who've been building this stuff for a long, long time.