⚖️ Decision Matrix

Your Gut Had a Favorite. Your AI Validated It in Seconds.

When a Major General says he's 'really close' with ChatGPT for command decisions, he's not getting analysis—he's getting his assumptions back at machine speed. A Decision Matrix forces structured evaluation before consensus calcifies.

"Chat and I are really close lately."

That's Major General William "Hank" Taylor, commanding 20,000 troops in South Korea, describing how he uses ChatGPT for "key command decisions."

Here's what "really close" means in practice: the AI learned his preferences, his framing, his prior conclusions. Now it reflects them back with better words. He's not getting multi-criteria analysis — he's getting validation at machine speed.

One perspective. No weighted criteria. No adversarial scoring. No audit trail showing why Option A beat Option B.

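Researchers describe two effects that compound here. "Automation bias" is the tendency to over-trust output from an automated system. Sycophancy is the tendency of these models to agree with whoever is using them. When AI output aligns with your existing preferences, you trust it more, which reinforces the sycophantic tendencies baked into these models. The more you use it, the closer you get. The closer you get, the less it challenges you.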

This isn't a military problem. It's a decision-making problem. Every time you ask Claude or ChatGPT "which option should I pick?", you're hoping for analysis but you're getting pattern-matched agreement.

What Could Be Different

A Decision Matrix doesn't give you an answer. It gives you a structure that makes your reasoning visible:

What You Define | What the System Does
Options | Holds them separately, prevents anchoring on your first instinct
Criteria | Forces you to name your values before evaluating
Weights | Makes implicit priorities explicit and adjustable
Scores | Evaluates with reasoning attached, not just numbers

The output isn't "pick Option B." It's a scored matrix with every score justified, sensitivity analysis showing how weight changes affect the outcome, and clear documentation of why the winner won.
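
As a rough sketch of what "every score justified" could look like in code, the snippet below keeps each score together with its written reasoning and computes a weighted average per option. The names (`Score`, `weighted_total`) and the example values are illustrative assumptions, not the product's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Score:
    """One cell of the matrix (hypothetical structure)."""
    value: float     # 0.0 to 1.0
    reasoning: str   # why this score was given


def weighted_total(scores: dict[str, Score], weights: dict[str, float]) -> float:
    """Weighted average of criterion scores, normalized to the 0..1 range."""
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c].value for c in weights) / total_weight


# Made-up example values, for illustration only
weights = {"cost": 3, "simplicity": 1}
option_a = {
    "cost": Score(0.8, "Lowest list price of the options considered"),
    "simplicity": Score(0.4, "Requires a custom integration"),
}

# (3 * 0.8 + 1 * 0.4) / (3 + 1)
print(round(weighted_total(option_a, weights), 2))
for criterion, score in option_a.items():
    print(f"{criterion}: {score.value} because {score.reasoning}")
```

Storing the reasoning next to the number is what makes the matrix auditable later: changing a weight changes the total, but the justification for each individual score stays attached to it.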

Our Story — How We Actually Used This

We faced the same trap building reasoning.services.

Our first instinct on pricing was obvious: charge per API call. Usage-based. Industry standard. We asked Claude — it agreed this was reasonable. Validated in seconds.

So we forced ourselves to run it through a Decision Matrix.

5 Pricing Models Evaluated:

  • Per-invocation ($0.25 each)
  • Tiered usage ($5 base + metered)
  • Flat monthly ($20/mo)
  • Freemium with paid tier
  • Annual-only ($180/yr)

6 Weighted Criteria:

Criterion | Weight | Why It Mattered
Trust communication | 4 | Pricing signals values: do we trust users or meter them?
Operational simplicity | 4 | Solo founder can't manage complex billing infrastructure
Margin sustainability | 3 | Need >60% margin to survive as indie product
User predictability | 3 | Users should know their bill before using the product
Conversion friction | 2 | Lower priority: sustainability over growth
Revenue upside | 1 | Explicitly deprioritized: not optimizing for ARPU

The Results:

Model | Score | Surprise Factor
Per-invocation | 0.45 | Our "obvious" first choice, scored worst on trust
Tiered usage | 0.52 | Complexity killed it
Flat monthly | 0.85 | Winner, but only after weighting trust high
Freemium | 0.55 | Conflicts with "sustainability over growth"
Annual-only | 0.65 | Good margins, but conversion friction too high

The Sensitivity Analysis: We ran it again with different weights. If we'd weighted "revenue upside" at 4 instead of 1, per-invocation would have won. The Decision Matrix showed us that our pricing choice depended entirely on what we actually valued, and it forced us to name those values explicitly.

The final model ($20/month OR $180/year, flat access) emerged from synthesizing the top scorers. We took the simplicity of flat monthly and the commitment signal of annual, dropped the complexity of metering entirely.

We didn't ask an AI what to charge. We used a structured tool to understand why our instincts were wrong.

How It Actually Works

This isn't prompt engineering. The upstream LLM gets actual tools it can invoke:

tools = [
    define_criteria,      # Lock in what matters before evaluating
    set_weights,          # Make priorities explicit and adjustable
    score_option,         # Evaluate with reasoning, not just numbers
    run_sensitivity,      # See how robust your conclusion is
    get_matrix_summary,   # Structured output with full audit trail
]

The model decides when to invoke these. A typical session:

  1. You describe your options and what you're trying to decide
  2. The LLM proposes criteria based on your description — you adjust
  3. You set weights — or let it propose and you modify
  4. Each option gets scored against each criterion, with written reasoning
  5. Sensitivity analysis shows which weights would flip the outcome
  6. You get a structured matrix, not a wall of text
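
To make the tool-calling idea concrete, here is a minimal sketch of how tool calls issued by a model might be routed to plain functions that share one matrix state. The tool names come from the list above. The argument shapes, the `MatrixState` class, and the `dispatch` helper are assumptions made for illustration and do not describe the real reasoning.services API. `run_sensitivity` and `get_matrix_summary` are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class MatrixState:
    """Shared state the tools read and write (hypothetical structure)."""
    criteria: dict[str, str] = field(default_factory=dict)    # name -> description
    weights: dict[str, float] = field(default_factory=dict)   # name -> weight
    # (option, criterion) -> (score, reasoning)
    scores: dict[tuple[str, str], tuple[float, str]] = field(default_factory=dict)


def define_criteria(state, criteria):
    state.criteria.update(criteria)
    return sorted(state.criteria)


def set_weights(state, weights):
    unknown = set(weights) - set(state.criteria)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    state.weights.update(weights)
    return dict(state.weights)


def score_option(state, option, criterion, score, reasoning):
    if criterion not in state.criteria:
        raise ValueError(f"unknown criterion: {criterion}")
    if not reasoning.strip():
        raise ValueError("every score needs written reasoning")
    state.scores[(option, criterion)] = (score, reasoning)
    return {"option": option, "criterion": criterion, "score": score}


# Assumed registry: tool name -> function
TOOLS = {
    "define_criteria": define_criteria,
    "set_weights": set_weights,
    "score_option": score_option,
}


def dispatch(state, call):
    """Route one tool call shaped like {"name": ..., "arguments": {...}}."""
    return TOOLS[call["name"]](state, **call["arguments"])


state = MatrixState()
dispatch(state, {"name": "define_criteria",
                 "arguments": {"criteria": {"consistency": "ACID compliance"}}})
dispatch(state, {"name": "set_weights",
                 "arguments": {"weights": {"consistency": 4}}})
result = dispatch(state, {"name": "score_option",
                          "arguments": {"option": "PostgreSQL",
                                        "criterion": "consistency",
                                        "score": 0.95,
                                        "reasoning": "Full ACID transactions"}})
print(result)
```

The point of the design is that the constraints live in the tools rather than in the prompt: in this sketch a score without reasoning, or a weight for an undefined criterion, is rejected no matter how the model phrases its request.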

Sample Output

DECISION MATRIX: Database Selection

Options: PostgreSQL | MongoDB | DynamoDB | CockroachDB

Criteria & Weights:
  - Consistency guarantees (4): ACID compliance, transaction support
  - Operational complexity (3): Setup, monitoring, maintenance burden
  - Cost at scale (3): Pricing model and scaling economics
  - Team expertise (4): Current knowledge, learning curve
  - Ecosystem maturity (2): Tooling, libraries, community

Scores:
┌─────────────────┬──────────┬─────────┬──────────┬─────────────┐
│                 │ Postgres │ MongoDB │ DynamoDB │ CockroachDB │
├─────────────────┼──────────┼─────────┼──────────┼─────────────┤
│ Consistency (4) │     0.95 │    0.60 │     0.70 │        0.90 │
│ Operations (3)  │     0.80 │    0.70 │     0.90 │        0.65 │
│ Cost (3)        │     0.85 │    0.75 │     0.60 │        0.50 │
│ Expertise (4)   │     0.90 │    0.65 │     0.50 │        0.40 │
│ Ecosystem (2)   │     0.95 │    0.85 │     0.75 │        0.70 │
├─────────────────┼──────────┼─────────┼──────────┼─────────────┤
│ WEIGHTED TOTAL  │    14.25 │   11.05 │    10.80 │       10.05 │
└─────────────────┴──────────┴─────────┴──────────┴─────────────┘

Winner: PostgreSQL (14.25)

Sensitivity: PostgreSQL remains winner unless "Operations" weight exceeds 37.5 (currently 3)
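
The arithmetic behind a matrix like this is easy to check in a few lines. The sketch below recomputes the weighted totals from the per-criterion scores and weights in the table, then solves for the weight at which a challenger would tie the leader on a single criterion. The function names are illustrative, and treating the total as a raw weighted sum is an assumption about how the tool computes it.

```python
weights = {"Consistency": 4, "Operations": 3, "Cost": 3, "Expertise": 4, "Ecosystem": 2}

# Per-criterion scores copied from the sample matrix
scores = {
    "PostgreSQL":  {"Consistency": 0.95, "Operations": 0.80, "Cost": 0.85,
                    "Expertise": 0.90, "Ecosystem": 0.95},
    "MongoDB":     {"Consistency": 0.60, "Operations": 0.70, "Cost": 0.75,
                    "Expertise": 0.65, "Ecosystem": 0.85},
    "DynamoDB":    {"Consistency": 0.70, "Operations": 0.90, "Cost": 0.60,
                    "Expertise": 0.50, "Ecosystem": 0.75},
    "CockroachDB": {"Consistency": 0.90, "Operations": 0.65, "Cost": 0.50,
                    "Expertise": 0.40, "Ecosystem": 0.70},
}


def weighted_totals(scores, weights):
    """Raw weighted sum per option (assumed scoring rule)."""
    return {opt: sum(weights[c] * s[c] for c in weights) for opt, s in scores.items()}


def flip_weight(scores, weights, leader, challenger, criterion):
    """Weight on `criterion` at which `challenger` ties `leader`, with all
    other weights held fixed. Assumes `leader` is ahead on the other criteria
    combined. Returns None if raising this weight can never help the challenger."""
    gap_elsewhere = sum(
        weights[c] * (scores[leader][c] - scores[challenger][c])
        for c in weights if c != criterion
    )
    edge = scores[challenger][criterion] - scores[leader][criterion]
    if edge <= 0:
        return None
    return gap_elsewhere / edge


totals = weighted_totals(scores, weights)
for option, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{option}: {total:.2f}")

threshold = flip_weight(scores, weights, "PostgreSQL", "DynamoDB", "Operations")
print(f"DynamoDB ties PostgreSQL when the Operations weight reaches {threshold:.1f}")
```

Running the same check for every criterion and every challenger is one simple way to do sensitivity analysis: a decision is fragile when a small, plausible change to one weight is enough to flip the winner.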

Why This Matters

You're not getting "I recommend PostgreSQL." You're getting transparent scoring where every number has a reason, adjustable weights so you can change priorities and see what happens, robustness checks so you know if your decision is fragile, and an audit trail to explain to stakeholders why, not just what.

When to Reach for This

Use Decision Matrix when:

  • Multiple viable options with different trade-offs
  • Criteria need weighting — not all factors are equal
  • You need to justify the decision to stakeholders
  • Your gut has a favorite — that's when you need structure most
  • Expensive to reverse — tech choices, vendor contracts, hiring

Don't use it for:

  • Binary yes/no decisions (use Devil's Advocate)
  • Missing key information (gather data first)
  • Low-stakes choices (just pick one)
  • Values questions, not analysis (know what you value first)

Use Cases

Technology Selection

The scenario: Choosing between databases, frameworks, cloud providers, or build-vs-buy decisions.

Why Decision Matrix: These decisions lock you in for years. The "obvious" choice often reflects familiarity bias, not actual fit. Weighted criteria force you to name what matters before your favorite technology anchors the conversation.

What you get: A defensible recommendation with clear trade-off documentation. When the CTO asks "why Postgres over Mongo?", you have the receipts.

Vendor Evaluation

The scenario: Three vendors responded to your RFP. They're all "capable." The sales decks all look good.

Why Decision Matrix: Vendor selection is where gut feel goes to die. The smoothest salesperson often wins, not the best fit. Structured scoring against weighted criteria cuts through the charm.

What you get: Apples-to-apples comparison on your criteria, not theirs. Documentation for procurement. Reduced risk of buyer's remorse.

Hiring Decisions

The scenario: Two strong candidates. Different strengths. The team is split.

Why Decision Matrix: Hiring decisions are high-stakes and emotionally charged. Without structure, you'll hire the person who interviews well, not the person who'll perform well.

What you get: A framework that surfaces disagreement constructively. When the team argues about candidates, they're arguing about criteria weights, which is the right argument to have.

Your AI Validates Your Gut. This Stress-Tests It.

$20/month for weighted criteria and sensitivity analysis. When stakeholders ask 'why this option?', you'll have the receipts.