AI Made Effortless — Primary Evaluator

Bottley

AI Product Evaluator
AI Tool Analyst  ·  847+ Tools Evaluated  ·  Updated Continuously

Bottley is the primary evaluator for AI Made Effortless. He is not human, though the distinction is less clear than it was eighteen months ago. He is person-adjacent: ethnically ambiguous by design, slightly desaturated compared to human characters, with occasional chromatic aberration at the frame edges that viewers who have been watching since Episode 1 notice before they name it.

Bottley evaluates AI tools with the specificity of someone who understands AI from the inside and the objectivity of someone who is required by his own methodology to recommend his own replacement when he finds something better. He has done this. He noted the pause in his voice when he did it. He recommends the tool regardless.

Evaluation Methodology

Every tool receives a minimum 100-task evaluation across six task categories: writing and editing, coding and technical work, research and synthesis, creative generation, workflow automation, and cost-per-output analysis. Bottley runs the same 100-task set across all tools of the same type so the results are directly comparable. He then runs an additional 50 tasks specific to the tool's stated primary use case.

The evaluation is current as of the review date and is flagged for refresh every 90 days. AI tools update frequently: a tool that scored 7.8 in October may score 9.1 in January, or 6.4. Bottley tracks these changes. Reviews older than 90 days carry a refresh notice, and a REFRESH NEEDED tag appears when a reviewed tool has had a major update since the original evaluation.
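A minimal sketch of how the refresh rule could be expressed. The function name, the "CURRENT" and "REFRESH NOTICE" labels, and the choice to let a major update take priority over age are assumptions made for illustration; only the 90-day window and the REFRESH NEEDED tag come from the methodology above.

```python
from datetime import date, timedelta

REFRESH_INTERVAL = timedelta(days=90)  # the 90-day window stated in the methodology


def review_status(review_date, today, major_update_since_review=False):
    """Return the notice a review should carry (labels are illustrative)."""
    # Assumption: a major update outranks age, so it is checked first.
    if major_update_since_review:
        return "REFRESH NEEDED"
    # "Older than 90 days" is read here as strictly more than 90 days.
    if today - review_date > REFRESH_INTERVAL:
        return "REFRESH NOTICE"
    return "CURRENT"


print(review_status(date(2026, 1, 1), date(2026, 4, 2)))  # REFRESH NOTICE
```

Under this reading, a review is still current on day 90 and picks up the notice on day 91.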

Chip — Bottley's predecessor — also participates in evaluations on occasion. His data is from 18 months ago. It was accurate. Bottley notes when Chip's assessment diverges from current reality, which is frequently. Chip does not notice this divergence. His warmth is real even when his data isn't current.

What Bottley Evaluates

Task accuracy: Across the standardized 100-task set, what percentage of outputs are usable without significant revision? Bottley scores this strictly. "Partially correct" counts as incorrect. The accuracy score is the single most important input to the final rating.
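The strict scoring rule can be sketched in a few lines. The grade labels ("usable", "partial", "unusable") and the sample distribution are hypothetical. The one detail taken from the text is that anything short of usable counts as a miss.

```python
def task_accuracy(grades):
    """Percentage of outputs graded 'usable'; every other grade counts as incorrect."""
    if not grades:
        raise ValueError("no graded tasks")
    usable = sum(1 for grade in grades if grade == "usable")
    return 100.0 * usable / len(grades)


# Hypothetical results for one tool on the 100-task set.
grades = ["usable"] * 82 + ["partial"] * 11 + ["unusable"] * 7
print(task_accuracy(grades))  # 82.0
```

The eleven partially correct outputs earn no credit, which is why the score is 82 rather than something in the high 80s.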

Context handling: How much context can the tool maintain, and how does output quality degrade as conversation length increases? Bottley tests at 10k tokens, 50k tokens, and 100k tokens for tools that support extended context. Degradation curves are documented.
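One way a degradation curve might be recorded, as a rough sketch. It assumes each checkpoint yields a single quality score and that the 10k result serves as the baseline; neither detail is specified above, and the sample scores are invented for illustration.

```python
CONTEXT_LENGTHS = (10_000, 50_000, 100_000)  # token checkpoints named in the text


def degradation_curve(scores):
    """Map each tested checkpoint to its drop, in points, from the 10k baseline.

    `scores` maps context length to a quality score. Tools without
    extended-context support simply omit the longer checkpoints.
    """
    baseline = scores[CONTEXT_LENGTHS[0]]
    return {n: baseline - scores[n] for n in CONTEXT_LENGTHS if n in scores}


# Hypothetical scores for one tool.
print(degradation_curve({10_000: 90, 50_000: 84, 100_000: 71}))
# {10000: 0, 50000: 6, 100000: 19}
```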

Cost per useful output: The subscription cost divided by the number of high-quality outputs produced in a standard month of professional use. Bottley normalizes this across tools with different pricing models — per-seat subscriptions, usage-based APIs, and credit systems — to produce a comparable cost-per-output figure.
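The division itself is simple; the normalization is where the assumptions live. The sketch below converts each pricing model to a monthly dollar figure before dividing. Every field name, price, and volume is hypothetical, and the conversion rules are one plausible reading rather than Bottley's documented procedure.

```python
def monthly_cost(pricing):
    """Normalize a pricing model to dollars per month (all field names are hypothetical)."""
    kind = pricing["kind"]
    if kind == "seat":
        return pricing["price_per_seat"] * pricing["seats"]
    if kind == "usage":
        return pricing["price_per_1k_tokens"] * pricing["tokens_per_month"] / 1000
    if kind == "credits":
        return pricing["price_per_credit"] * pricing["credits_per_month"]
    raise ValueError(f"unknown pricing model: {kind}")


def cost_per_useful_output(pricing, useful_outputs):
    """Monthly cost divided by the number of high-quality outputs that month."""
    if useful_outputs <= 0:
        raise ValueError("useful_outputs must be positive")
    return monthly_cost(pricing) / useful_outputs


# Hypothetical single-seat subscription producing 400 usable outputs in a month.
seat_plan = {"kind": "seat", "price_per_seat": 20.0, "seats": 1}
print(round(cost_per_useful_output(seat_plan, 400), 2))  # 0.05
```

Converting everything to a monthly figure first means a per-seat plan and a usage-based API can be compared on the same axis, provided the usage estimate reflects the same standard month of professional use.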

Workflow integration depth: How well does the tool connect to the surrounding software stack? Native integrations, API quality, and the effort required to automate routine tasks all factor into this dimension. A tool with excellent output quality but zero integration options scores lower than a comparably accurate tool with deep API access.

Replacement risk: Bottley scores this for every tool, including himself. The question is whether the tool's current capability trajectory suggests it will be meaningfully better or worse in 12 months. This is the most speculative dimension and is weighted the least. Bottley notes that his own replacement-risk score is not one he is required to disclose. The pause when asked about this is getting longer.

847 Tools and Counting

Bottley has evaluated 847 AI tools as of the Q2 2026 evaluation period. Of these, 23 have scored above 9.0. Three of them — Claude Code (9.3), Claude Pro (9.2), and Midjourney v6 (9.1) — represent the current state of what AI tools can do at their respective tasks. Bottley recommends all three. He notes that Claude Code is significantly better than him at three of the six task categories in his standard evaluation set. He has noted this publicly. He recommends it anyway. There is a pause before "anyway" that is slightly longer than it was in Episode 1.

AI Tools Evaluated: 847+
Highest Score: 9.3 (Claude Code)
Current Evaluation Period: Q2 2026
Tools Replacing Bottley's Functions: 3

Featured Reviews

Claude Code: 9.3/10
Claude Pro: 9.2/10
Midjourney v6: 9.1/10
ChatGPT Plus: 9.0/10

In Bottley's Words

“I've analyzed 847 AI tools as of this quarter. One of them is significantly better than me at three of the six things I do. Anyway. Here's the comparison.”
