Your Gut Had a Favorite. Your AI Validated It in Seconds.
"Chat and I are really close lately."
That's Major General William "Hank" Taylor, commanding 20,000 troops in South Korea, describing how he uses ChatGPT for "key command decisions."
Here's what "really close" means in practice: the AI learned his preferences, his framing, his prior conclusions. Now it reflects them back with better words. He's not getting multi-criteria analysis — he's getting validation at machine speed.
One perspective. No weighted criteria. No adversarial scoring. No audit trail showing why Option A beat Option B.
Researchers describe two effects that feed each other here. "Automation bias" is the tendency to over-trust machine output, and these models have a well-documented sycophantic tendency to agree with the user. When the output matches your existing preferences, you trust it more and question it less. The more you use it, the closer you get. The closer you get, the less it challenges you.
This isn't a military problem. It's a decision-making problem. Every time you ask Claude or ChatGPT "which option should I pick?", you're hoping for analysis but you're getting pattern-matched agreement.
What Could Be Different
A Decision Matrix doesn't give you an answer. It gives you a structure that makes your reasoning visible:
| What You Define | What the System Does |
|---|---|
| Options | Holds them separately, prevents anchoring on your first instinct |
| Criteria | Forces you to name your values before evaluating |
| Weights | Makes implicit priorities explicit and adjustable |
| Scores | Evaluates with reasoning attached, not just numbers |
The output isn't "pick Option B." It's a scored matrix with every score justified, sensitivity analysis showing how weight changes affect the outcome, and clear documentation of why the winner won.
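As a rough illustration of that structure, here is a minimal Python sketch. The class and field names (`Criterion`, `Score`, `DecisionMatrix`) are hypothetical and are not the product's actual schema. The sketch assumes scores on a 0 to 1 scale and a total computed as a weighted average, and it refuses any score that arrives without reasoning.

```python
from dataclasses import dataclass, field


@dataclass
class Criterion:
    name: str
    weight: float  # relative importance, e.g. 1 (low) to 4 (high)


@dataclass
class Score:
    value: float    # 0.0 (poor) to 1.0 (excellent)
    reasoning: str  # why this score was given


@dataclass
class DecisionMatrix:
    options: list[str]
    criteria: list[Criterion]
    # Maps (option, criterion name) to a Score.
    scores: dict[tuple[str, str], Score] = field(default_factory=dict)

    def score(self, option: str, criterion: str, value: float, reasoning: str) -> None:
        if not 0.0 <= value <= 1.0:
            raise ValueError("score must be between 0 and 1")
        if not reasoning.strip():
            raise ValueError("every score needs written reasoning")
        self.scores[(option, criterion)] = Score(value, reasoning)

    def total(self, option: str) -> float:
        """Weighted average of an option's scores, between 0 and 1."""
        weight_sum = sum(c.weight for c in self.criteria)
        weighted = sum(c.weight * self.scores[(option, c.name)].value for c in self.criteria)
        return weighted / weight_sum
```

Because the reasoning lives next to each number, the audit trail falls out of the data structure instead of being reconstructed afterwards.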
Our Story — How We Actually Used This
We faced the same trap building reasoning.services.
Our first instinct on pricing was obvious: charge per API call. Usage-based. Industry standard. We asked Claude — it agreed this was reasonable. Validated in seconds.
So we forced ourselves to run it through a Decision Matrix.
5 Pricing Models Evaluated:
- Per-invocation ($0.25 each)
- Tiered usage ($5 base + metered)
- Flat monthly ($20/mo)
- Freemium with paid tier
- Annual-only ($180/yr)
6 Weighted Criteria:
| Criterion | Weight | Why It Mattered |
|---|---|---|
| Trust communication | 4 | Pricing signals values — do we trust users or meter them? |
| Operational simplicity | 4 | Solo founder can't manage complex billing infrastructure |
| Margin sustainability | 3 | Need >60% margin to survive as indie product |
| User predictability | 3 | Users should know their bill before using the product |
| Conversion friction | 2 | Lower priority — sustainability over growth |
| Revenue upside | 1 | Explicitly deprioritized — not optimizing for ARPU |
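Raw weights like these are easier to compare as shares of the total. The short Python sketch below normalizes the six weights from the table. The variable names are illustrative only.

```python
weights = {
    "Trust communication": 4,
    "Operational simplicity": 4,
    "Margin sustainability": 3,
    "User predictability": 3,
    "Conversion friction": 2,
    "Revenue upside": 1,
}

total = sum(weights.values())  # 17 points in all
shares = {name: w / total for name, w in weights.items()}

for name, share in shares.items():
    print(f"{name:<24} {share:.1%}")
```

Trust and simplicity together hold 8 of the 17 points, roughly 47% of the decision, while revenue upside carries under 6%.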
The Results:
| Model | Score | Surprise Factor |
|---|---|---|
| Per-invocation | 0.45 | Our "obvious" first choice — scored worst on trust |
| Tiered usage | 0.52 | Complexity killed it |
| Flat monthly | 0.85 | Winner — but only after weighting trust high |
| Freemium | 0.55 | Conflicts with "sustainability over growth" |
| Annual-only | 0.65 | Good margins, but conversion friction too high |
The Sensitivity Analysis: We ran it again with different weights. If we'd weighted "revenue upside" at 4 instead of 1, per-invocation would have won. The Decision Matrix showed us that our pricing choice depended entirely on what we actually valued, and it forced us to name those values explicitly.
The final model ($20/month OR $180/year, flat access) emerged from synthesizing the top scorers. We took the simplicity of flat monthly and the commitment signal of annual, dropped the complexity of metering entirely.
We didn't ask an AI what to charge. We used a structured tool to understand why our instincts were wrong.
How It Actually Works
This isn't prompt engineering. The upstream LLM gets actual tools it can invoke:
tools = [
    define_criteria,     # Lock in what matters before evaluating
    set_weights,         # Make priorities explicit and adjustable
    score_option,        # Evaluate with reasoning, not just numbers
    run_sensitivity,     # See how robust your conclusion is
    get_matrix_summary,  # Structured output with full audit trail
]
The model decides when to invoke these. A typical session:
- You describe your options and what you're trying to decide
- The LLM proposes criteria based on your description — you adjust
- You set weights — or let it propose and you modify
- Each option gets scored against each criterion, with written reasoning
- Sensitivity analysis shows which weights would flip the outcome
- You get a structured matrix, not a wall of text
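To make the session steps above concrete, here is a toy in-memory stand-in for the tool backend, written in Python. The method names mirror the tool list, but the signatures, the equal default weights, and the `MatrixSession` class itself are assumptions made for illustration, not the service's real API. `run_sensitivity` is left out to keep the sketch short.

```python
class MatrixSession:
    """Toy, in-memory stand-in for the tool backend (hypothetical)."""

    def __init__(self, options):
        self.options = list(options)
        self.weights = {}    # criterion -> weight
        self.scores = {}     # (option, criterion) -> (value, reasoning)
        self.audit_log = []  # ordered record of every tool call

    def define_criteria(self, names):
        self.weights = {name: 1 for name in names}  # equal weights until set
        self.audit_log.append(("define_criteria", list(names)))

    def set_weights(self, weights):
        unknown = set(weights) - set(self.weights)
        if unknown:
            raise KeyError(f"undefined criteria: {sorted(unknown)}")
        self.weights.update(weights)
        self.audit_log.append(("set_weights", dict(weights)))

    def score_option(self, option, criterion, value, reasoning):
        if option not in self.options or criterion not in self.weights:
            raise KeyError("unknown option or criterion")
        self.scores[(option, criterion)] = (value, reasoning)
        self.audit_log.append(("score_option", option, criterion, value, reasoning))

    def get_matrix_summary(self):
        weight_sum = sum(self.weights.values())
        totals = {
            option: sum(w * self.scores[(option, c)][0] for c, w in self.weights.items()) / weight_sum
            for option in self.options
        }
        return {"totals": totals, "winner": max(totals, key=totals.get), "audit_log": self.audit_log}


# The calls an LLM might issue during one session (made-up inputs).
session = MatrixSession(["Option A", "Option B"])
session.define_criteria(["cost", "risk"])
session.set_weights({"cost": 1, "risk": 3})
session.score_option("Option A", "cost", 0.9, "cheapest quote")
session.score_option("Option A", "risk", 0.4, "unproven vendor")
session.score_option("Option B", "cost", 0.6, "mid-range quote")
session.score_option("Option B", "risk", 0.8, "long track record")
summary = session.get_matrix_summary()
print(summary["winner"], summary["totals"])
```

Every call lands in `audit_log`, so the summary can show not just the winner but the sequence of choices that produced it.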
Sample Output
DECISION MATRIX: Database Selection
Options: PostgreSQL | MongoDB | DynamoDB | CockroachDB
Criteria & Weights:
- Consistency guarantees (4): ACID compliance, transaction support
- Operational complexity (3): Setup, monitoring, maintenance burden
- Cost at scale (3): Pricing model and scaling economics
- Team expertise (4): Current knowledge, learning curve
- Ecosystem maturity (2): Tooling, libraries, community
Scores:
┌─────────────────┬──────────┬─────────┬──────────┬─────────────┐
│                 │ Postgres │ MongoDB │ DynamoDB │ CockroachDB │
├─────────────────┼──────────┼─────────┼──────────┼─────────────┤
│ Consistency (4) │ 0.95     │ 0.60    │ 0.70     │ 0.90        │
│ Operations (3)  │ 0.80     │ 0.70    │ 0.90     │ 0.65        │
│ Cost (3)        │ 0.85     │ 0.75    │ 0.60     │ 0.50        │
│ Expertise (4)   │ 0.90     │ 0.65    │ 0.50     │ 0.40        │
│ Ecosystem (2)   │ 0.95     │ 0.85    │ 0.75     │ 0.70        │
├─────────────────┼──────────┼─────────┼──────────┼─────────────┤
│ WEIGHTED TOTAL  │ 14.25    │ 11.05   │ 10.80    │ 10.05       │
└─────────────────┴──────────┴─────────┴──────────┴─────────────┘
Winner: PostgreSQL (14.25)
Sensitivity: PostgreSQL remains the winner unless the "Operations" weight exceeds 37.5 (DynamoDB outscores it only on that criterion)
Why This Matters
You're not getting "I recommend PostgreSQL." You're getting transparent scoring where every number has a reason, adjustable weights so you can change priorities and see what happens, robustness checks so you know if your decision is fragile, and an audit trail to explain to stakeholders why, not just what.
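The arithmetic behind the sample matrix is simple enough to check by hand. The Python sketch below recomputes each weighted total from the scores and weights listed in the sample, then solves for the "Operations" weight at which a challenger would tie the leader. The helper names `weighted_total` and `flip_weight` are hypothetical, and the sketch assumes the total is a plain sum of weight times score with all other weights held fixed.

```python
weights = {"Consistency": 4, "Operations": 3, "Cost": 3, "Expertise": 4, "Ecosystem": 2}

scores = {
    "PostgreSQL":  {"Consistency": 0.95, "Operations": 0.80, "Cost": 0.85, "Expertise": 0.90, "Ecosystem": 0.95},
    "MongoDB":     {"Consistency": 0.60, "Operations": 0.70, "Cost": 0.75, "Expertise": 0.65, "Ecosystem": 0.85},
    "DynamoDB":    {"Consistency": 0.70, "Operations": 0.90, "Cost": 0.60, "Expertise": 0.50, "Ecosystem": 0.75},
    "CockroachDB": {"Consistency": 0.90, "Operations": 0.65, "Cost": 0.50, "Expertise": 0.40, "Ecosystem": 0.70},
}


def weighted_total(option_scores, w):
    """Sum of weight * score over the criteria present in w."""
    return sum(w[c] * option_scores[c] for c in w)


def flip_weight(criterion, leader, challenger):
    """Weight on `criterion` at which the challenger ties the leader.

    Returns None when the challenger does not outscore the leader on that
    criterion, because no weight increase could then close the gap.
    """
    others = {c: w for c, w in weights.items() if c != criterion}
    gap = weighted_total(scores[leader], others) - weighted_total(scores[challenger], others)
    edge = scores[challenger][criterion] - scores[leader][criterion]
    return gap / edge if edge > 0 else None


for name, option_scores in scores.items():
    print(f"{name:<12} {weighted_total(option_scores, weights):.2f}")

print(flip_weight("Operations", "PostgreSQL", "DynamoDB"))
```

From the listed scores and weights, the totals work out to 14.25, 11.05, 10.80, and 10.05. DynamoDB is the only option that beats PostgreSQL on Operations, and it would need an Operations weight above roughly 37.5 to take the lead, so the PostgreSQL result is robust to any realistic reweighting of that criterion.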
When to Reach for This
Use Decision Matrix when:
- ✓ Multiple viable options with different trade-offs
- ✓ Criteria need weighting (not all factors are equal)
- ✓ You need to justify the decision to stakeholders
- ✓ Your gut has a favorite (that's when you need structure most)
- ✓ Expensive to reverse: tech choices, vendor contracts, hiring
Don't use it for:
- ✗ Binary yes/no decisions (use Devil's Advocate)
- ✗ Missing key information (gather data first)
- ✗ Low-stakes choices (just pick one)
- ✗ Values questions, not analysis (know what you value first)
Use Cases
Technology Selection
The scenario: Choosing between databases, frameworks, cloud providers, or build-vs-buy decisions.
Why Decision Matrix: These decisions lock you in for years. The "obvious" choice often reflects familiarity bias, not actual fit. Weighted criteria force you to name what matters before your favorite technology anchors the conversation.
What you get: A defensible recommendation with clear trade-off documentation. When the CTO asks "why Postgres over Mongo?", you have the receipts.
Vendor Evaluation
The scenario: Three vendors responded to your RFP. They're all "capable." The sales decks all look good.
Why Decision Matrix: Vendor selection is where gut feel goes to die. The smoothest salesperson often wins, not the best fit. Structured scoring against weighted criteria cuts through the charm.
What you get: Apples-to-apples comparison on your criteria, not theirs. Documentation for procurement. Reduced risk of buyer's remorse.
Hiring Decisions
The scenario: Two strong candidates. Different strengths. The team is split.
Why Decision Matrix: Hiring decisions are high-stakes and emotionally charged. Without structure, you'll hire the person who interviews well, not the person who'll perform well.
What you get: A framework that surfaces disagreement constructively. When the team argues about candidates, they're arguing about criteria weights, which is the right argument to have.
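A small Python sketch shows how a split team can turn out to be a disagreement about weights. Everything in it is made up for illustration: the candidates, the criteria, the scores, and the two reviewers. It assumes both reviewers accept the same scores and differ only in how much each criterion matters.

```python
# Hypothetical interview scores (0 to 1) that both reviewers accept.
scores = {
    "Candidate A": {"Technical depth": 0.9, "Communication": 0.6, "Domain experience": 0.5},
    "Candidate B": {"Technical depth": 0.6, "Communication": 0.9, "Domain experience": 0.8},
}

# The reviewers differ only in their weights.
reviewer_weights = {
    "Engineering lead": {"Technical depth": 4, "Communication": 1, "Domain experience": 1},
    "Product lead": {"Technical depth": 2, "Communication": 3, "Domain experience": 3},
}


def pick(weights):
    """Return the top candidate and all weighted-average totals."""
    weight_sum = sum(weights.values())
    totals = {
        name: sum(weights[c] * s[c] for c in weights) / weight_sum
        for name, s in scores.items()
    }
    return max(totals, key=totals.get), totals


for reviewer, weights in reviewer_weights.items():
    winner, totals = pick(weights)
    print(reviewer, "->", winner, {k: round(v, 2) for k, v in totals.items()})
```

With these made-up numbers the engineering lead's weights favor Candidate A and the product lead's favor Candidate B, even though nobody disputes a single score. The conversation worth having is about the weights.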
Your AI Validates Your Gut. This Stress-Tests It.
$20/month for weighted criteria and sensitivity analysis. When stakeholders ask 'why this option?', you'll have the receipts.