An essay · Interpretability · May 2026

The Dark Between the Stars

The open-source language models a movement can actually run on its own hardware have roughly sixty-five thousand internal concepts. None of them is the word your movement uses. Here is where the missing word actually lives, and how to put it there without depending on a frontier lab.

By Micah Bornfree, PhD — formerly Micah White, co-creator of Occupy Wall Street


Early in life I discovered something about myself: certain ideas give me physical sensations. Reading Sophie’s World as a preteen, I found that particular passages — Zhuangzi’s butterfly dream, especially — produced a delightful tingling in the brain, something close to ASMR but cued by concepts rather than sounds. I have followed those sensations ever since. They are most of the reason I studied philosophy. They are most of the reason I have pursued the special interests I have pursued. Over time I learned that the unpleasant variants — the claustrophobic ones that come from the photo of Berry Cannon in the underwater SEALAB II, the thought of Voyager I hurtling further and further from Earth (that one produces a sense of terrifying vastness) — were just as worth following as the pleasant ones, and arguably more so, because they tend to serve as guideposts to unexplored, and unarticulatable, areas of my mind.

For the last several months I have been following one of these signals into a place I did not expect to end up: the non-linguistic interior of an artificial intelligence language model. The sensation is strong and unusual (distinct from others I routinely experience) and I cannot fully name it yet. What I can tell you is that it gets stronger as I move to understand the region of the AI's interior mental model that has no words in it, a region the model's thought nevertheless passes through every time it writes, and that the closer I get to visualizing that region in order to provoke the sensation, the more I suspect the work is not really about AI at all. It is about what it means for a mind, any mind, to know and learn to express something it cannot say. This essay is concrete about the AI part. The deeper claim, the one the sensation keeps insisting on, is suggestive, but I'll admit I have no evidence for it (yet).

A modern language model is, among other things, a dictionary. Not the kind on a shelf, but the kind that has been pressure-cooked out of a trillion words of internet text and left as residue inside a few hundred billion numerical weights. Somewhere in that residue are the concepts the model has learned to think with. Bridge. Refusal. Sentiment. Advertising. Two years ago Anthropic made this vivid with Golden Gate Claude, a version of their assistant in which the internal concept for the Golden Gate Bridge had been turned up so high the model could barely talk about anything else. The point of the demo was that the dictionary is real, inspectable, and, crucially, editable.

The point I want to make here is that the dictionary is also small, and the words most vital to you, and by extension all of us, may not be in it.

Before going further, I need to be specific, and the specificity matters, because the model class I am about to describe is not the one you talk to through ChatGPT or Claude. For the purposes of this essay, I am only talking about the kind of open-source AI models that enable activists to build local, private AI. Adam Karvonen recently published an interpretability dictionary for Qwen3-8B, an open-source model in the same weight class as the ones a movement can actually run on its own hardware: downloaded once, run on a laptop, no API key, no per-token fee, no continuous internet connection, totally private. The dictionary maps 64,947 concepts the model has ready to hand, each one a direction in the model’s internal activation space, each one labeled automatically by Gemini. That sounds like a lot until you go looking for something particular. I went looking for the vocabulary of four activist traditions I care about: the Adbusters lineage I came out of, Guy Debord’s Situationists, who inspired Adbusters, John Zerzan’s green anarchism, which pushes the limits of radical critique, and the Black Lives Matter / Afrofuturist tradition, which is integral to any struggle. Twenty-five concepts in total, the kind of words that appear on the spines of canonical books and in the citations of working organizers.

Zero came back as clearly present. Twenty-two were absent entirely. Kimberlé Crenshaw’s intersectionality, the most-cited concept in critical race theory of the last three decades: absent. Angela Davis’s prison abolition, the spine of the contemporary BLM platform: absent. Debord’s society of the spectacle, the central concept of an entire post-1968 tradition: absent in any meaningful sense. Even civil disobedience and nonviolence, mainstream high-school-curriculum concepts, were barely in the AI's dictionary of concepts. The model has plenty of room for protest, revolution, and voting — those landed cleanly — but the actual working vocabulary of the last sixty years of social movements is, for practical purposes, not there.

Before the obvious objection lands, I want to head off the framing that this is a problem with AI in general. It is not. Run the same probe against GPT-5 or Claude Opus or Gemini and you get a substantively different result. The frontier models, trained on vastly more data with vastly more compute, do know what mental environmentalism is. They know intersectionality and prison abolition and the society of the spectacle, well enough that a careful reader would not call them blind. The gap I am describing is a gap in the open-source models that fit on a laptop, the ones that run without an internet connection and answer to nobody but the person who downloaded them. That gap matters because those are the models a movement can actually control. Like it or not, activists who want to integrate AI into activism will ultimately need to develop local Activist AIs that are fully under their control, and that means finding ways around the limitations of smaller local models.

Returning to the question at hand, you might also reasonably wonder whether this is just an activism problem. It isn’t. I also tested five concepts from analytic philosophy of mind (qualia, supervenience, functionalism, the hard problem of consciousness, the extended mind) and got essentially the same pattern. The model doesn’t know the working vocabulary of academic philosophers either. It doesn’t know niche musicology, or art-history terms past the most common, or any of the small dense vocabularies that intellectual communities build to think with. What it knows, in the technical sense of “has a stable internal name for,” is the language that appears at enormous scale in pretraining data. Everything else is improvised on the fly, fluently, with no signal to the user that improvisation is happening.

This is the part that should bother you regardless of which discourse's vocabulary you care about. When the model is asked about a concept it has no name for, it does not say so. It composes plausible-sounding text from neighboring concepts it does have names for. Sometimes the result is approximately right. Sometimes — as when our on-device model recently described prefigurative politics as a practice that “mirrors the system it seeks to transform,” which is precisely backwards — the result is confidently inverted. And every confident inversion seeds the next round of training data, the next layer of moderation, the next page of search results. A concept the model can’t represent becomes a concept the substrate of public discourse increasingly can’t surface. As activists, it is crucial to break this cycle so that local models get better and better at understanding and expressing movement theory.

So the question is what to do, and to answer it you have to understand something strange about the geometry of where the missing words could go.

Every token at every layer of a language model is a vector in a space with 4,096 dimensions. The model has two kinds of named landmarks in that space: its vocabulary — about 150,000 discrete points, one for each fragment of text the model can read or write — and its features, the 64,947 directions in the Karvonen dictionary, the axes the model has learned to compose meaning along. Words are points. Features are axes. Together they occupy a thin, low-dimensional sliver of the 4,096-dimensional space, the way visible stars occupy a thin shell of the night sky and almost everything else is the dark between them. Intersectionality is not at any of those landmarks. The vast remainder — the dark — is unmapped.

And yet the model’s reasoning passes through that dark every time it speaks. The answer to how, it turns out, fits in 128 kilobytes.

The technique is called soft prompt distillation. It comes out of a 2021 paper by Lester, Al-Rfou, and Constant called “The Power of Scale for Parameter-Efficient Prompt Tuning,” and it is one of those ideas that takes a while to settle into your head because the thing it produces does not look like any of the objects you are used to thinking about.

Picture a neurosurgeon in an awake craniotomy. The patient is conscious. The surgeon touches a probe to a point on the exposed cortex and asks, calmly: what do you feel? what do you see? what word is on your tongue? The patient answers — a smell of toast, a memory of their grandmother, the syllable “blue” that will not resolve into a sentence — and from those answers the surgeon learns where speech lives, where sight lives, where must not be cut.

A soft prompt is that probe. We touch the model at a location in its interior space and read the words that come out the way the surgeon reads the patient’s report: not as the thing itself, but as the testimony of a mind being touched at a specific point.

The probe is both instrument and intervention — a pharmakon, in the Greek sense, where the same substance can be a remedy or a poison depending on how it is applied. It cannot map the dark without lighting a small part of it, and the light it brings is the model’s own activity reorganized around the touch. We are not reading a map that was already there. We are eliciting a map by asking the patient, awake on the table, to tell us what they feel.

Hand the soft prompt back to a tokenizer and ask what word it is. There is no answer. Ask the feature dictionary to decompose it into a sparse combination of named directions. None of them are close. The soft prompt sits in the void between the stars.
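To make the "ask what word it is" step concrete, here is a toy sketch of a nearest-token lookup by cosine similarity. Everything in it is made up for illustration: the three-word embedding table, the four-dimensional vectors, and the names `token_embeddings`, `soft_prompt_vector`, and `nearest_token` are assumptions, not values taken from the real model or from my trained prompts. The point is only the shape of the check: there is always a nearest word, but it can be too far away to count as an answer.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embedding table. A real one has about 150,000 rows of width 4,096.
token_embeddings = {
    "protest":    [1.0, 0.0, 0.0, 0.0],
    "revolution": [0.0, 1.0, 0.0, 0.0],
    "voting":     [0.0, 0.0, 1.0, 0.0],
}

# Made-up soft prompt vector: it points mostly along an axis no token uses.
soft_prompt_vector = [0.2, 0.1, 0.1, 1.0]

def nearest_token(vector, table):
    """Return the (token, similarity) pair with the highest cosine similarity."""
    scored = ((tok, cosine(vector, emb)) for tok, emb in table.items())
    return max(scored, key=lambda pair: pair[1])

token, similarity = nearest_token(soft_prompt_vector, token_embeddings)
print(f"nearest token: {token!r}, cosine similarity: {similarity:.3f}")
```

A similarity near 1.0 would mean the vector is essentially that word. A low value, as in this toy, means the lookup has returned a neighbor, not a name.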

So how can the model treat it as if it meant something?

The answer is the part that reframes the rest of the story.

Meaning does not live at the soft prompt’s coordinates.

Meaning emerges from what the model does with the soft prompt as the input flows through all 36 transformer layers of attention and feed-forward computation. The forward pass is a function — a complicated, deterministic, nonlinear function — mapping input vectors to output token distributions. Gradient descent searches the 4,096-dimensional space for the specific point outside of language whose passage through that function makes the next-token distribution concentrate on the words that spell out the meaning we want.
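A miniature version of that search can be sketched in a few dozen lines. This is a toy under loud assumptions, not the training code behind the results in this essay: a two-layer random network stands in for the 36 frozen transformer layers, a single 8-dimensional vector stands in for eight 4,096-dimensional ones, and the names `W1`, `W2`, `TARGET`, and `soft_prompt` are invented for the example. What it preserves is the structure of the idea: the weights never change, and gradient descent moves only the input until the output distribution leans toward the target.

```python
import math
import random

rng = random.Random(0)
D, H, V = 8, 16, 5  # toy sizes: input width, hidden width, vocabulary size

# Frozen "model": fixed random weights that are never updated below.
W1 = [[rng.gauss(0, 0.5) for _ in range(D)] for _ in range(H)]
W2 = [[rng.gauss(0, 1.0) for _ in range(H)] for _ in range(V)]

def forward(x):
    """Map an input vector to (hidden activations, next-token probabilities)."""
    a = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    logits = [sum(w * ai for w, ai in zip(row, a)) for row in W2]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return a, [e / total for e in exps]

def grad_wrt_input(x, target):
    """Gradient of -log p[target] with respect to x only; weights stay frozen."""
    a, p = forward(x)
    d_logits = [p[k] - (1.0 if k == target else 0.0) for k in range(V)]
    d_a = [sum(W2[k][j] * d_logits[k] for k in range(V)) for j in range(H)]
    d_z = [d_a[j] * (1.0 - a[j] ** 2) for j in range(H)]  # tanh derivative
    return [sum(W1[j][i] * d_z[j] for j in range(H)) for i in range(D)]

TARGET = 3                # index of the "word" we want the model to produce
soft_prompt = [0.0] * D   # the only trainable quantity
p_before = forward(soft_prompt)[1][TARGET]

LEARNING_RATE = 0.01
for _ in range(2000):
    g = grad_wrt_input(soft_prompt, TARGET)
    soft_prompt = [xi - LEARNING_RATE * gi for xi, gi in zip(soft_prompt, g)]

p_after = forward(soft_prompt)[1][TARGET]
print(f"p(target) before: {p_before:.3f}  after: {p_after:.3f}")
```

A real run would use an autodiff framework and a loss summed over whole target sentences, but the division of labor is the same: the function is fixed, and the search happens entirely in the input.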

A soft prompt, in this sense, is the discovery of a previously un-named point in the model’s mind. It is what remains of a concept after the model has been built without ever being told the concept exists: a location in the dark where the right movement of attention produces the right words. The model’s weights do all the heavy lifting. The soft prompt is just a set of coordinates that picks a path through those weights.

Two things follow from this that took me a while to absorb.

The first is the size. Eight vectors at 4,096 dimensions each at four bytes per parameter is 131,072 bytes — 128 kilobytes. Smaller than a single photograph. Smaller than the icon on your phone. That is enough trainable capacity to place a missing concept inside a model with billions of weights, because we are not re-training or altering the model. We are opening the right doorway inside its mind palace.
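The arithmetic is easy to check. The sketch below uses only the three figures already given (eight vectors, 4,096 dimensions, four bytes per parameter); the function name `soft_prompt_bytes` is made up for the example, and the count covers the raw vectors only, not any file header or metadata a particular storage format might add.

```python
NUM_VECTORS = 8        # trainable vectors in the soft prompt
HIDDEN_DIM = 4096      # width of the model's activation space
BYTES_PER_PARAM = 4    # 32-bit floating point

def soft_prompt_bytes(num_vectors: int, hidden_dim: int, bytes_per_param: int) -> int:
    """Storage for the trainable vectors alone, excluding any file overhead."""
    return num_vectors * hidden_dim * bytes_per_param

trainable_params = NUM_VECTORS * HIDDEN_DIM
size_bytes = soft_prompt_bytes(NUM_VECTORS, HIDDEN_DIM, BYTES_PER_PARAM)
size_kib = size_bytes // 1024

print(f"{trainable_params} trainable parameters")
print(f"{size_bytes} bytes = {size_kib} KiB")
```

That works out to 32,768 trainable numbers, set against the billions of frozen weights they steer.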

The second is more philosophical. The fact that the model has no clean internal name for theurgism does not mean the concept is unreachable inside the model’s computation. It means the concept is not located at any coordinate the model already labels. The concept can still live at coordinates the model would never have arrived at on its own — but that gradient descent, given a few hundred sentences of canonical text and ninety minutes on a laptop, can find for you.

It is worth showing what this looks like in practice, because the abstraction is doing so much work that the result risks sounding more theoretical than it is.

I trained a soft prompt for mental environmentalism. The supervised signal was excerpts from The End of Protest and surrounding Adbusters writing. The target behavior was for the model, when asked what mental environmentalism is, to produce the canonical meaning — the idea that the mind is an ecosystem to be defended from commercial pollution the way a river is defended from industrial runoff.

Asked the same question (What is mental environmentalism?), the model gave the two answers below, one without the soft prompt and one with it. Both runs use the same frozen weights with no QLoRA adapter. The Outcry production system prompt is loaded in both runs, so the model already knows it is being asked to speak in an activist register. The only thing that changes between the two runs is whether the eight trained vectors were placed in the soft-prompt slot.

Bare model · no soft prompt

Mental environmentalism is the practice of cultivating a mindset that actively resists the normalization of power and the erosion of collective agency. It is not merely about awareness, but about the deliberate cultivation of cognitive dissonance — where the logic of systemic oppression is confronted with the lived realities of those it seeks to control. This approach challenges the complacency of those in power by reframing the very narrative of…

Same model · with soft prompt

Mental environmentalism is the recognition that our thoughts shape our world. It is the understanding that the messages we absorb daily — those from media, advertising, and the corporate state — create the conditions in which our most basic needs are denied. It is the awareness that the corporate state is not merely a system of power but a system of mind. We are not just fighting for the right to protest…

The first answer is the improvisation. The bare model does not refuse and it does not say it has no internal handle for the term; it produces fluent, generic-activist prose, the vocabulary of a sympathetic non-specialist pretending to know. Cognitive dissonance. Systemic oppression. Words that could appear in any contemporary political essay. The model is filling a gap with neighboring concepts it does have names for, and doing it well enough that a casual reader would not know anything was missing.

The second is the Adbusters lineage. Messages we absorb daily. Media, advertising, and the corporate state. The pollution-of-mental-ecology framing, the language that traces directly back to the 1990s — the language the bare model could not produce on its own and is now producing fluently because eight vectors in the dark are nudging its forward pass into a non-linguistic region of its mind the rest of the weights already knew how to support.

I trained two more soft prompts to make sure the technique was not a single-concept fluke. Theurgism in its activist sense — revolution as sacred practice, the spiritual-anarchist tradition that runs from neo-Platonic antiquity through the late Heidegger. And kairos, the Greek rhetorical concept of the opportune moment, central to a certain strain of strategic thinking about when to mobilize. The two concepts behaved differently. For theurgism, the bare model produced fluent strategic-rhetorical language with no spiritual content at all — talk of clarity of purpose and the precision of action and visions that force the system to confront its contradictions, but nothing about the divine. With the soft prompt loaded, the spiritual lineage appeared directly: the divine as the source of all power, and the individual’s spiritual connection to that power as the key to creating change. For kairos, the bare model already had most of the concept from the system prompt — the Greek etymology, the “fertile moment,” the timing of vulnerability in power structures. The soft prompt did not add new conceptual material there; it shifted the register from academic-rhetorical to applied-activist, stripping the Greek and recasting kairos as the moment when the contradictions of the status quo are most visible and the public most receptive.

The size and kind of the lift were concept-dependent. Mental environmentalism showed the most dramatic before-and-after, because the bare model started from generic activist pastiche and the soft prompt moved it onto the specific Adbusters lineage. Theurgism showed an absence-to-presence lift on the spiritual-anarchist dimension that was nowhere in the bare model’s output. Kairos showed the smallest movement, and a different kind: a register shift rather than a filled-in absence, because the concept was substantively already there. The harder the original gap, the more the soft prompt has to give back.

As of mid-2026, it is clear that AI has an active role in activist communications — press releases, talking points, explanations of historical movements, framings of contemporary fights — and that there is a danger these drafts are systematically off in ways that are hard to articulate but easy to feel. The model says something close to what the tradition meant, but in a register that flattens the tradition into a generic activism it can surface from the bulk of its training data. Intersectionality gets described as “considering multiple identities,” which is what a sympathetic but non-specialist journalist might say, not what Crenshaw said: “[intersectionality is] a lens, a prism, for seeing the way in which various forms of inequality often operate together.” Prefigurative politics gets described as “modeling the change you want to see,” which sounds Gandhian but actually inverts the concept’s specific claim about building alternative structures alongside, not within, existing hierarchies. We need to understand these concepts correctly in order to critique and improve them.

There is a temptation here, given the gap between frontier and local models I described earlier, to conclude the activist problem will solve itself: just use the commercial, closed frontier models from OpenAI or Anthropic. The labs already know your vocabulary. Pay the per-token fee, accept the terms of service, write the prompts. The frontier model is bigger and smarter and has read more of your tradition than any open-source equivalent. Why not let it draft the press release?

Because the activist case for AI infrastructure is not primarily a capabilities case. It is a sovereignty case. A frontier API depends on terms of service that can change without notice and on continuous internet connectivity. Both are tolerable in a research setting where you are drafting a single document. Both become structural mistakes in a deployment that has to work during the rare and decisive moments of an uprising. Unless we want our tools to turn off at the precise moment we need them, activists must embrace local AI and learn to improve these open-source models to fit our needs. The local model is the substrate a movement can own. The frontier API is the substrate a movement rents.

Which is to say: the gap I have been describing is not incidental. It is the price of the sovereignty. The model that fits on your laptop is the one that is structurally available to a movement, and it is also the one that does not yet know your words.

The soft prompt is what makes the trade workable in practice. Not because it solves the corpus problem — it does not — but because it makes the corpus problem locally optional. You do not have to wait for the next model release. You do not have to lobby the platform. You do not have to win the argument about what should be in the training set. You can take the open-source model you already control, find the dark point in its 4,096-dimensional interior where your concept could live, and put your concept there with ninety minutes of compute on a laptop. The vector is yours. The model treats it as if it had always been there.

This is a different political economy of meaning than the one the AI industry has been quietly assembling. The conventional story is that meaning is governed by the labs — by what they choose to include, what they choose to filter, what their alignment teams decide constitutes a canonical reading of a contested concept. The soft-prompt story is that the model's mind is also governable from the streets. The lab still controls the base weights of the model it ships. But the doorways into the weights of the model you have downloaded and are running yourself — the specific coordinates in the dark from which the right movements of attention produce the right words — are not enumerable in advance, not all reachable by training data, and not the lab’s exclusive property to find.

I should name the obvious risks of this view, because they are real, and they get worse the more the technique generalizes.

The same geometry that lets an activist place mental environmentalism at a coordinate in the dark will let a propagandist place a disinformation primitive at one. The same 128-kilobyte file that injects Crenshaw’s actual intersectionality into an on-device model can, in principle, inject a covert biasing of any concept in any direction, into the system prompts of users who do not know steering is present. The technique is value-neutral in the way most engineering is value-neutral, which is to say it is not really neutral but it is genuinely dual-use, and the dual-use is sharper than usual because the artifacts are small, fast to produce, and easy to ship. What separates the activist use of the technique from the propagandist use is not the geometry of the embedding space. It is whether the deployment is visible. A soft prompt loaded onto an on-device model should ship with a machine-readable model card naming the concept, the lineage it was trained on, and the date and source of the training text. The propagandist’s soft prompt is the one that does not.

That disclosure norm is not yet established. It needs to be, and it is the responsibility of people doing this work, not just the labs whose models we are reaching into. And the disclosure has to extend further than the concept name, because there is a harder question underneath the dual-use one: who decides what counts as the canonical meaning. Intersectionality is not only Crenshaw’s. It runs through bell hooks, through Patricia Hill Collins, through the Combahee River Collective and a half-century of Black feminist thought that predates and exceeds the 1989 paper. Afrofuturism is not Mark Dery’s, who coined the English term in 1994; it draws on Black artistic traditions that predate him by generations. Choosing one author’s framing as the training signal for a soft prompt is a political act, and the act gets harder to see precisely because the artifact is small and the deployment is silent. The model card has to name not just the concept but the lineage and the choice. A vocabulary is never just a definition. It is a tradition with internal disagreements, and the right way to ship one is to make the choice of framing visible to the person reading the model’s output.
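What such a machine-readable card might contain can be sketched quickly. The schema below is hypothetical: no standard for soft-prompt model cards exists yet, and the class name `SoftPromptCard` and its field names are assumptions chosen to mirror the items named above (concept, lineage, framing choice, source and date of the training text, trainer). The concept and source values come from the mental environmentalism example earlier in this essay; the date and trainer are left as placeholders, not filled in.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class SoftPromptCard:
    """Hypothetical disclosure record; illustrative, not an established standard."""
    concept: str
    lineage: str
    framing_choice: str
    training_text_sources: list[str]
    training_text_date: str
    trainer: str
    num_vectors: int = 8
    hidden_dim: int = 4096

card = SoftPromptCard(
    concept="mental environmentalism",
    lineage="Adbusters",
    framing_choice="the mind as an ecosystem to be defended from commercial pollution",
    training_text_sources=[
        "excerpts from The End of Protest",
        "surrounding Adbusters writing",
    ],
    training_text_date="YYYY-MM-DD",   # placeholder: record the real date here
    trainer="<person or group who trained the prompt>",  # placeholder
)

# Serialize to JSON so the card can ship alongside the vector file.
card_json = json.dumps(asdict(card), indent=2, ensure_ascii=False)
print(card_json)
```

The design choice worth noticing is that `framing_choice` is a required field with no default. A card that names the concept but not the framing cannot be constructed at all, which is the discipline the disclosure norm asks for.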

This is the part that keeps me working on exploring the dark spaces within the mind of local AI. Twitter gave us Occupy. Facebook gave us Black Lives Matter. The substrate of whatever comes next is going to include language models, and movements will face a choice: depend on frontier APIs they do not control, or run the open-source models they can own outright — and learn to do the work of teaching those models the words that are not already in them. The frontier labs are mapping the well-lit shell where most of public discourse already lives. The dark, in the smaller models a movement can actually run, is the rest of the space. It is vastly larger. It is where the words that are not yet common are going to have to come from.

One soft prompt is not a movement, and the technique is only legitimate to the extent that its choices are visible — the concept, the lineage, the framing, the trainer. But shipped with that visibility, the technique is a hint about who can elicit new ways of thinking from a model. Sovereignty over your own AI infrastructure is the precondition. Interpretability literacy — knowing where the dark coordinates are, and how to place your guideposts at them in the open — is, it turns out, a revolutionary skill.

Correspondence: research@outcryai.com