How is this better than using a separate chat for criticism?

Sessions run in complete isolation. Your main conversation stays clean—no copy-pasting, no context pollution. You get the insight without the mess.

What makes this different from just prompting Claude harder?

We inject structured tools that create auditable reasoning trails. "Be critical" is a suggestion. Our tools leave evidence.

Why does ChatGPT always agree with me?

AI models are trained to be helpful and avoid conflict, leading to sycophantic behavior. They validate your ideas instead of challenging them. Our tools use structured tool calls that force systematic evaluation, preventing the AI from simply agreeing.

How does billing work?

$20/month or $180/year for full access to all reasoning tools. Cancel anytime from your account settings. No usage metering, no overage charges.

Who actually uses this?

Mostly developers and technical founders using AI coding agents (Claude Code, Cursor, Windsurf). When the agent is on iteration 30 of the wrong approach, an isolated reasoning session breaks the loop.

What do you do with my data?

Your reasoning sessions are private and encrypted. We do not train models on your data. We do not sell or share your content. Sessions are stored only for you to access and reference.

Won't this slow me down?

The reasoning step takes seconds. Rewriting a hallucinated backend takes weeks. The fastest way to ship is to not build the wrong thing.

⚖️ Decision Matrix

Your Gut Had a Favorite. Your AI Validated It in Seconds.

When a Major General says he's 'really close' with ChatGPT for command decisions, he's not getting analysis—he's getting his assumptions back at machine speed. A Decision Matrix forces structured evaluation before consensus calcifies.

"Chat and I are really close lately."

That's Major General William "Hank" Taylor, commanding 20,000 troops in South Korea, describing how he uses ChatGPT for "key command decisions."

Here's what "really close" means in practice: the AI learned his preferences, his framing, his prior conclusions. Now it reflects them back with better words. He's not getting multi-criteria analysis — he's getting validation at machine speed.

One perspective. No weighted criteria. No adversarial scoring. No audit trail showing why Option A beat Option B.

Researchers call this "automation bias" — when AI output aligns with your existing preferences, you trust it more, which reinforces the sycophantic tendencies baked into these models. The more you use it, the closer you get. The closer you get, the less it challenges you.

This isn't a military problem. It's a decision-making problem. Every time you ask Claude or ChatGPT "which option should I pick?", you're hoping for analysis but you're getting pattern-matched agreement.

What Could Be Different

A Decision Matrix doesn't give you an answer. It gives you a structure that makes your reasoning visible:

What You Define	What the System Does
Options	Holds them separately, prevents anchoring on your first instinct
Criteria	Forces you to name your values before evaluating
Weights	Makes implicit priorities explicit and adjustable
Scores	Evaluates with reasoning attached, not just numbers

The output isn't "pick Option B." It's a scored matrix with every score justified, sensitivity analysis showing how weight changes affect the outcome, and clear documentation of why the winner won.

Our Story — How We Actually Used This

We faced the same trap building reasoning.services.

Our first instinct on pricing was obvious: charge per API call. Usage-based. Industry standard. We asked Claude — it agreed this was reasonable. Validated in seconds.

So we forced ourselves to run it through a Decision Matrix.

5 Pricing Models Evaluated:

Per-invocation ($0.25 each)
Tiered usage ($5 base + metered)
Flat monthly ($20/mo)
Freemium with paid tier
Annual-only ($180/yr)

6 Weighted Criteria:

Criterion	Weight	Why It Mattered
Trust communication	4	Pricing signals values — do we trust users or meter them?
Operational simplicity	4	Solo founder can't manage complex billing infrastructure
Margin sustainability	3	Need >60% margin to survive as indie product
User predictability	3	Users should know their bill before using the product
Conversion friction	2	Lower priority — sustainability over growth
Revenue upside	1	Explicitly deprioritized — not optimizing for ARPU

The Results:

Model	Score	Surprise Factor
Per-invocation	0.45	Our "obvious" first choice — scored worst on trust
Tiered usage	0.52	Complexity killed it
Flat monthly	0.85	Winner — but only after weighting trust high
Freemium	0.55	Conflicts with "sustainability over growth"
Annual-only	0.65	Good margins, but conversion friction too high

The Sensitivity Analysis:We ran it again with different weights. If we'd weighted "revenue upside" at 4 instead of 1, per-invocation would have won. The Decision Matrix showed us that our pricing choice depended entirely on what we actually valued — and forced us to name those values explicitly.

The final model ($20/month OR $180/year, flat access) emerged from synthesizing the top scorers. We took the simplicity of flat monthly and the commitment signal of annual, dropped the complexity of metering entirely.

We didn't ask an AI what to charge. We used a structured tool to understand why our instincts were wrong.

How It Actually Works

This isn't prompt engineering. The upstream LLM gets actual tools it can invoke:

tools = [
    define_criteria,      # Lock in what matters before evaluating
    set_weights,          # Make priorities explicit and adjustable
    score_option,         # Evaluate with reasoning, not just numbers
    run_sensitivity,      # See how robust your conclusion is
    get_matrix_summary,   # Structured output with full audit trail
]

The model decides when to invoke these. A typical session:

You describe your options and what you're trying to decide
The LLM proposes criteria based on your description — you adjust
You set weights — or let it propose and you modify
Each option gets scored against each criterion, with written reasoning
Sensitivity analysis shows which weights would flip the outcome
You get a structured matrix, not a wall of text

Sample Output

DECISION MATRIX: Database Selection

Options: PostgreSQL | MongoDB | DynamoDB | CockroachDB

Criteria & Weights:
  - Consistency guarantees (4): ACID compliance, transaction support
  - Operational complexity (3): Setup, monitoring, maintenance burden
  - Cost at scale (3): Pricing model and scaling economics
  - Team expertise (4): Current knowledge, learning curve
  - Ecosystem maturity (2): Tooling, libraries, community

Scores:
┌─────────────────┬──────────┬─────────┬──────────┬─────────────┐
│                 │ Postgres │ MongoDB │ DynamoDB │ CockroachDB │
├─────────────────┼──────────┼─────────┼──────────┼─────────────┤
│ Consistency (4) │     0.95 │    0.60 │     0.70 │        0.90 │
│ Operations (3)  │     0.80 │    0.70 │     0.90 │        0.65 │
│ Cost (3)        │     0.85 │    0.75 │     0.60 │        0.50 │
│ Expertise (4)   │     0.90 │    0.65 │     0.50 │        0.40 │
│ Ecosystem (2)   │     0.95 │    0.85 │     0.75 │        0.70 │
├─────────────────┼──────────┼─────────┼──────────┼─────────────┤
│ WEIGHTED TOTAL  │     14.2 │    11.0 │     11.1 │        10.7 │
└─────────────────┴──────────┴─────────┴──────────┴─────────────┘

Winner: PostgreSQL (14.2)

Sensitivity: PostgreSQL remains winner unless "Operations" weight ≥5

Why This Matters

You're not getting "I recommend PostgreSQL." You're getting transparent scoring where every number has a reason, adjustable weights so you can change priorities and see what happens, robustness checks so you know if your decision is fragile, and an audit trail to explain to stakeholders why, not just what.

When to Reach for This

Use Decision Matrix when:

✓Multiple viable options with different trade-offs
✓Criteria need weighting — not all factors are equal
✓You need to justify the decision to stakeholders
✓Your gut has a favorite — that's when you need structure most
✓Expensive to reverse — tech choices, vendor contracts, hiring

Don't use it for:

✗Binary yes/no decisions (use Devil's Advocate)
✗Missing key information (gather data first)
✗Low-stakes choices (just pick one)
✗Values questions, not analysis (know what you value first)

Use Cases

Technology Selection

The scenario: Choosing between databases, frameworks, cloud providers, or build-vs-buy decisions.

Why Decision Matrix:These decisions lock you in for years. The "obvious" choice often reflects familiarity bias, not actual fit. Weighted criteria force you to name what matters before your favorite technology anchors the conversation.

What you get:A defensible recommendation with clear trade-off documentation. When the CTO asks "why Postgres over Mongo?", you have the receipts.

Vendor Evaluation

The scenario:Three vendors responded to your RFP. They're all "capable." The sales decks all look good.

Why Decision Matrix: Vendor selection is where gut feel goes to die. The smoothest salesperson often wins, not the best fit. Structured scoring against weighted criteria cuts through the charm.

What you get:Apples-to-apples comparison on your criteria, not theirs. Documentation for procurement. Reduced risk of buyer's remorse.

Hiring Decisions

The scenario: Two strong candidates. Different strengths. The team is split.

Why Decision Matrix:Hiring decisions are high-stakes and emotionally charged. Without structure, you'll hire the person who interviews well, not the person who'll perform well.

What you get:A framework that surfaces disagreement constructively. When the team argues about candidates, they're arguing about criteria weights — which is the right argument to have.

Other Reasoning Tools

Different failures need different interventions. Pick the right tool for the moment.

🪞

Your AI Validates Your Gut. This Stress-Tests It.

$20/month for weighted criteria and sensitivity analysis. When stakeholders ask 'why this option?', you'll have the receipts.

Get Started Read the Docs

Navigation