Max Kless

December 28, 2025

Liberty Needs Paperwork

What a risk score owes the person it scores.


Nine-oh-seven. The judge sees a number.

It sits in the corner of a screen like a weather forecast: tidy, confident, ready for reuse. The defendant is still talking, still trying to be understood. But the number has already started doing its work.

Risk score: 8/10.

If a system can put someone in a cage, that system must be easier to inspect, challenge, and appeal than most institutions are comfortable with.

If that sounds expensive, good. Liberty is an expensive interface. The question is: can the person push back?


a number that pretends to be a reason

Pretrial risk assessments were sold as a corrective to messy human judgment and crude money-bail schedules. Replace gut feel with actuarial estimates. Reduce arbitrary disparities. On paper, a reasonable upgrade.

In practice, most tools do something narrower. The Public Safety Assessment (PSA), for example, estimates the likelihood of three outcomes for a person on pretrial release: failure to appear, new criminal arrest, and new violent criminal arrest. Judges consider those estimates alongside other information. [1]

That framing is honest. It says: this is a forecast, not a verdict.

Then the real world happens. Dockets are crowded. Time is short. A number looks like relief.

And relief is exactly how authority sneaks in.

If you want to understand algorithmic judging, don’t imagine a robot with a gavel. Imagine something more boring, more plausible, and more dangerous: a score that becomes hard to disagree with, even when nobody can explain it.

A score can be fast. A reason is slower. A remedy is slower than that.

The rest of this essay is about what happens when we confuse the three.

first switchback: prediction isn’t judgment

A model that outputs “8/10” is doing a familiar machine thing: mapping inputs to outputs. Functionalism, in broad strokes, treats mental states by what they do, not what they’re made of. If something plays the right functional role, maybe it is the same kind of thing.

Tempting lens for AI ethics. If the system behaves like judgment, perhaps it is judgment.

But bail isn’t just behavior. Bail is state power.

A risk assessment can estimate probabilities. It can’t tell you what you are allowed to do to a person on the basis of those probabilities. That step is moral and legal. That step is judgment.

Judgment carries reasons that can be argued with. It sits inside procedures that recognize fallibility. It assigns responsibility when harm occurs.

A model can’t apologize. It can’t make restitution. It can’t be disbarred or voted out. It can’t be cross-examined in the way due process expects.

So when we ask “can AI make fair ethical decisions,” we’re already slipping. We’re asking a prediction engine to inherit moral authority because it outputs a number in a font we trust.

Use AI to estimate. Use law and human responsibility to decide.

That’s not sentiment. It’s an engineering claim about where accountability lives.

second switchback: “fair” isn’t one setting

Even if the model is just estimating risk, we still have to ask what kind of “fair” it’s optimized for. Here’s where the story gets uncomfortable for anyone who wants a single metric.

Fairness has no shortage of definitions. It has a surplus, and they conflict.

Kleinberg, Mullainathan, and Raghavan formalized a core impossibility: except in constrained special cases, you can’t satisfy multiple intuitive fairness conditions simultaneously. [6] Chouldechova makes the point directly for recidivism prediction: when prevalence differs across groups, several fairness criteria can’t all hold at once. [8]

Street-level translation:

You can tune a system so a given score means the same thing across groups (calibration). Or you can tune it so the system makes similar mistakes across groups (equalized error rates). When base rates differ, you usually don’t get both.
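That tradeoff isn’t rhetoric; it’s arithmetic. Here is a minimal sketch with invented numbers: a two-level score, perfectly calibrated within each group, produces diverging error rates the moment base rates differ.

```python
# Hypothetical numbers illustrating the Kleinberg/Chouldechova result:
# a score that is perfectly calibrated within each group still yields
# unequal error rates when the groups' base rates differ.

def p_high_for(base_rate, s_low=0.2, s_high=0.8):
    """Mixing weight on the high score level that reproduces the base rate."""
    return (base_rate - s_low) / (s_high - s_low)

def error_rates(base_rate, p_high, s_low=0.2, s_high=0.8):
    """For a two-level score with P(outcome | score) == score (calibration),
    error rates of the rule 'flag everyone with the high score'."""
    fpr = p_high * (1 - s_high) / (1 - base_rate)   # P(high score | no outcome)
    fnr = (1 - p_high) * s_low / base_rate          # P(low score | outcome)
    return fpr, fnr

for group, base in [("A", 0.5), ("B", 0.3)]:
    fpr, fnr = error_rates(base, p_high_for(base))
    print(f"group {group}: base rate {base:.0%}, FPR {fpr:.1%}, FNR {fnr:.1%}")
# calibration holds in both groups, yet their error rates come apart
```

Group A here has a 50% base rate and group B 30%; the same calibrated score gives them very different false-negative rates. No amount of tuning removes the gap without breaking calibration.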

So when someone says “the model is fair,” the only honest response is:

Which fairness, traded for which other, and who holds the receipt?

Bring that back into the courtroom. If the defendant sees “8/10,” do they also see the definition of the outcome being predicted, the error rates for people like them, and the fairness tradeoffs chosen by the jurisdiction?

Most of the time, no. They see a number and the posture changes in the room.

A score isn’t neutral. It’s compressed policy.

third switchback: the data is a footprint, not a mirror

Risk models don’t learn from “crime.” They learn from records: arrests, convictions, failures to appear, supervision violations. Those records aren’t a clean window into behavior. They’re a footprint left by enforcement choices, prosecutorial discretion, plea bargaining, policing intensity, and resource disparities.

If you’ve ever debugged a production system, you know the feeling: the logs tell you what happened, but they also tell you where the logging is broken.

Prediction systems inherit that problem. They can look objective while replaying historical attention.

This is one reason the COMPAS controversy still matters. ProPublica’s “Machine Bias” investigation argued that COMPAS produced racially disparate error patterns. [3] Flores, Bechtel, and Lowenkamp published a rejoinder arguing that ProPublica used faulty statistics. [15]

Sit with that dispute. Even if you think ProPublica’s framing was wrong, the governance problem survives: the score still carries power, still compresses tradeoffs, and the affected person still struggles to contest it.

The math argument doesn’t dissolve the legitimacy argument.

what the number does to humans

Back at the bail hearing. Nobody has to say “the model decides.” The model can “advise” and still become the decision. Humans anchor on numbers. Humans defer to instruments. In a pressured environment, the easiest path becomes the default path.

That’s not a moral failure of judges. It’s a systems reality.

When a score is present, three quiet shifts occur:

Attention narrows. The hearing becomes a search for reasons to justify the score rather than a full investigation of the person.

Disagreement becomes expensive. If the judge departs from the score, they may fear blame later. If they follow it, responsibility diffuses into “the tool.”

The defendant becomes a profile. The room starts to treat a feature vector as a person, because the feature vector speaks in numbers and the person speaks in sentences.

A score doesn’t need a robe to exert judicial gravity. It just needs a busy docket and a lack of contestability.

In State v. Loomis, the Wisconsin Supreme Court held that a trial court’s use of a COMPAS risk assessment at sentencing did not violate due process, while emphasizing the tool’s limitations and requiring cautionary warnings about how courts may use such scores. [5]

If you read Loomis as “courts are fine with black boxes,” you’re missing the flinch. The opinion contains caution language precisely because the legitimacy problem is obvious.

GDPR Article 22 encodes a similar instinct more explicitly: a person has the right not to be subject to a decision based solely on automated processing that produces legal or similarly significant effects, with safeguards including the right to obtain human intervention, express a view, and contest the decision. [13]

Different domain, same underlying need: if a system can shape your life, you need a handle to grab.

the strict rule

Here’s the rule, stated plainly:

The more an AI system participates in deciding, the more transparent the process must become, and the easier recourse must be.

Not “should.” Must.

This isn’t an abstract moral preference. It’s a stability requirement. Systems that can’t be contested eventually lose legitimacy, and legitimacy is what keeps compliance from hardening into force.

Think of it like building a bridge on a public trail. The heavier the load, the more you overbuild the supports. You don’t get to say “it usually holds.”

what strict looks like when you stop hand-waving

If a risk tool is in the loop, the loop needs mechanical guarantees. Some technical, some procedural, all costly in the way serious safeguards are.

Transparency.

  • Disclose the target: what outcome is being predicted, exactly. “New arrest” isn’t “new crime”; don’t hide that distinction behind a label.
  • Disclose the inputs: if prior arrests matter, say so; if age matters, say so; if the system uses proxies for protected attributes, name them.
  • Disclose performance over time, not once. [12]
  • Prefer simpler models when they perform similarly. Dressel and Farid found that a simple linear predictor using just two features can match a widely used commercial tool. [9] If you can get similar predictive performance with far more transparency, you don’t earn ethical points for complexity.
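To make the simplicity point concrete, here is a sketch in the spirit of Dressel and Farid’s finding, not a reproduction of their study: the data is synthetic and the feature names are invented. A two-feature logistic model has nowhere to hide; the whole model is one disclosable line.

```python
import math
import random

# Sketch only: synthetic population, invented features ("age in decades",
# "prior count"), labels generated from a known rule plus noise so we can
# check that the fitted model recovers the directions.

def fit_logistic(xs, ys, lr=0.1, epochs=500):
    """Plain SGD logistic regression: learns w, b so that
    sigmoid(w . x + b) approximates P(y = 1 | x)."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            z = max(-30.0, min(30.0, sum(wi * xi for wi, xi in zip(w, x)) + b))
            g = 1 / (1 + math.exp(-z)) - y   # dLoss/dz for log loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

random.seed(0)
xs = [(random.uniform(1.8, 7.0), random.randint(0, 8)) for _ in range(400)]
ys = [1 if 0.6 * x[1] - 0.3 * x[0] + random.gauss(0, 0.5) > 0 else 0 for x in xs]

w, b = fit_logistic(xs, ys)
# the entire model, printable in a court filing:
print(f"score = sigmoid({w[0]:+.2f} * age + {w[1]:+.2f} * priors {b:+.2f})")
```

Whether two features suffice is an empirical question the paper answers for its dataset. The governance point is narrower: a model this small can be disclosed, checked, and argued with in full.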

Contestability.

  • The defendant sees what the system “thinks” it knows: the score and the underlying factors.
  • The defendant can correct errors in the data.
  • The defense can challenge the model’s validity; if the model is proprietary and can’t be meaningfully interrogated, that is a procedural defect.
  • The judge must write a reason that stands without the score. If the score disappeared, the justification would stay coherent. That’s how you prevent “the score ate the hearing.”

Governance.

  • Independent audits, not vendor assurances. Partnership on AI documented serious shortcomings in these tools; their partner consensus was that current tools must not automate pretrial detention decisions. [10] [11]
  • Ongoing monitoring with public reporting; populations change, enforcement changes, policies change.
  • Clear limits on how the score is used; if the score is “advisory,” make departures ordinary and defensible, not rare and risky.
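Ongoing monitoring, at its simplest, is mechanical. A hypothetical sketch (the buckets, rates, and five-point tolerance are all invented): compare each score bucket’s average predicted risk against the observed outcome rate for the reporting period, and flag the buckets that drift.

```python
# Hypothetical monitoring sketch: per-bucket calibration check for one
# reporting period. All values below are invented for illustration.

def calibration_report(records, tolerance=0.05):
    """records: iterable of (score_bucket, predicted_risk, outcome in {0, 1}).
    Returns {bucket: (mean_predicted, observed_rate, drifted)}."""
    by_bucket = {}
    for bucket, pred, outcome in records:
        by_bucket.setdefault(bucket, []).append((pred, outcome))
    report = {}
    for bucket, rows in sorted(by_bucket.items()):
        predicted = sum(p for p, _ in rows) / len(rows)
        observed = sum(o for _, o in rows) / len(rows)
        report[bucket] = (predicted, observed, abs(predicted - observed) > tolerance)
    return report

# one quarter's worth of toy records: bucket 8 ran hotter than predicted
records = ([(8, 0.40, 1)] * 6 + [(8, 0.40, 0)] * 4
           + [(3, 0.15, 1)] * 1 + [(3, 0.15, 0)] * 9)
for bucket, (pred, obs, drifted) in calibration_report(records).items():
    print(f"bucket {bucket}: predicted {pred:.0%}, observed {obs:.0%},",
          "DRIFT" if drifted else "ok")
```

The hard part isn’t the code; it’s the mandate to publish the report and act on the flags.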

Human-in-the-loop that isn’t theater. A human who clicks “approve” isn’t a safeguard. That’s a decorative checkbox. Real human intervention requires authority to override without penalty, time to consider evidence, and a culture that treats disagreement with the model as a professional act, not a liability event.

Otherwise the model is the judge and the human is the printer.

the uncomfortable twist

Even if you can pick a fairness definition and optimize for it, the “cost” of fairness can show up as more detention, not less. Corbett-Davies and Goel discuss how fairness constraints interact with decision thresholds and can impose costs. [14]

This breaks a comforting story. Fairness work doesn’t automatically push toward more humane outcomes. Sometimes it pushes toward a different distribution of harm.

So the ethical question is never “did we meet the metric?”

It’s “who paid, and can they contest the bill?”

can AI make fair ethical decisions?

Back in the courtroom. The number still sits there: 8/10. The person still stands at the table.

The honest answer:

AI can help estimate. AI cannot, on its own, be a fair ethical decider. Fairness here includes contestability, explanation, and responsibility. Those live in institutions, not in tensors.

If we insist on involving AI, then we need to make the system legible enough that the defendant can fight it, and structured enough that the judge can’t hide behind it.

A fair system isn’t one that never errs. A fair system is one that can admit error without trapping people inside it.

So the test for algorithmic judging isn’t whether the score is calibrated, or whether it beats human accuracy by two points.

The test is simpler and harsher:

When the number is wrong, can the person make it stop being wrong in time to matter?

That’s what justice looks like when you stop treating a score as a reason.

  • [1] About the Public Safety Assessment (PSA). Advancing Pretrial (2020)
  • [3] Machine Bias: There’s software used across the country to predict future criminals. And it’s biased against blacks. Angwin et al., ProPublica (2016)
  • [5] State v. Loomis (Wisconsin Supreme Court opinion). Supreme Court of Wisconsin (2016)
  • [6] Inherent Trade-Offs in the Fair Determination of Risk Scores. Kleinberg, Mullainathan, Raghavan (2016)
  • [8] Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Chouldechova (2016)
  • [9] The accuracy, fairness, and limits of predicting recidivism. Dressel & Farid, Science Advances (2018)
  • [10] Report on Algorithmic Risk Assessment Tools in the U.S. Criminal Justice System. Partnership on AI (2019)
  • [11] The Partnership on AI Response to NIST AI RFI (risk assessment tools requirements). Partnership on AI (2021)
  • [12] Algorithmic fairness. Hellman, Stanford Encyclopedia of Philosophy (2025)
  • [13] GDPR (Regulation (EU) 2016/679), Article 22: Automated individual decision-making. EUR-Lex (2016)
  • [14] Algorithmic decision making and the cost of fairness. Corbett-Davies & Goel (2017)
  • [15] False Positives, False Negatives, and False Analyses: A Rejoinder to “Machine Bias…”. Flores, Bechtel, Lowenkamp (2016)
