Ai Tools Mastered

How to make your Claude Code setup dramatically more productive

The gap between a casual and a power user is now measured in features, not tips: here's the high-leverage setup, ranked.

June 15, 202610 min read
Claude CodeCLAUDE.mdslash commands
How to make your Claude Code setup dramatically more productive

The gap between a casual Claude Code user and a power user is no longer measured in clever prompts. It's measured in configuration.

Anthropic's agentic CLI has accumulated a memory hierarchy, six permission modes, custom slash commands, Skills, Hooks, subagents, MCP, plan mode, and a full Opus/Sonnet/Haiku lineup. The difference between someone fighting the tool and someone compounding with it comes down to which of those features they've actually wired up.

This guide ranks the changes that move the needle, in roughly the order they pay off. The theme: the biggest wins are guardrails and verification, not tricks.

TL;DR: The fastest way to make Claude Code productive is to give it hard rules in a short CLAUDE.md, pick a permission mode that stops the interruptions, add one deterministic hook that blocks something dangerous, and end every session with a verification pass. The flashy multi-agent swarms are optional. The boring guardrails are not.

Key takeaways

  • A short, operational CLAUDE.md (under 40 lines) beats a 300-line handbook, which silently eats context every session.
  • acceptEdits or auto mode removes the permission-prompt fatigue that drives people to the dangerous bypass flag.
  • Hooks are deterministic. One PreToolUse hook that blocks a destructive command is the single highest-leverage addition.
  • Verification is the real multiplier. Anthropic's own team moved substantive PR review from 16% to 54% of PRs with a shared CLAUDE.md plus a GitHub Action.
  • The 1M-token window is real but degrades above ~400K. Compact often; don't hoard context.

What is the highest-leverage Claude Code productivity change?

Add one hook that blocks something dangerous. Everything else compounds from a stable, safe foundation.

A hook is deterministic automation. Where a prompt is probabilistic, a hook runs every time, no exceptions. A PreToolUse hook that inspects a Bash call and exits with code 2 will block a destructive rm -rf against $HOME regardless of what the model believes about that command.

That's the whole point. You're converting "Claude might run the wrong thing" into "Claude cannot run this thing." Start there, then layer.

How should you write CLAUDE.md?

Treat CLAUDE.md as a place for hard rules and project facts, not documentation. Anthropic's memory docs describe it as the instructions that carry knowledge across otherwise fresh sessions.

The most common cause of "Claude ignored my instructions" is the wrong scope or a bloated file. Nishant Lamichhane ranks an overstuffed CLAUDE.md as the second most common mistake, behind only failing to manage context.

The reason is mechanical: a large file consumes context at the start of every session, and the "Lost in the Middle" effect compounds the loss.

Ayesha Mughal's framing captures the symptom: hour one Claude is sharp, hour three it repeats a mistake you corrected an hour ago.

A working starting point stays short:

markdown
# Project: <name>

## Build & test
- Node 22, pnpm 9. Run `pnpm install` then `pnpm test`.
- Lint: `pnpm lint`. Typecheck: `pnpm typecheck`.

## Hard rules
- Never edit `migrations/`. Create a new one.
- Never commit secrets. The pre-commit hook scans for them.
- Do not touch `vendor/`. It is vendored upstream code.

## When unsure
- Search the codebase first (`rg`, `fd`) before assuming an API.

When Claude does something wrong, add the constraint. That's the practice Boris Cherny's team uses, per his published setup: when Claude makes a mistake, they add it to CLAUDE.md so it doesn't happen again. The file grows by correction, not by speculation.

Scope matters too. The Agent SDK docs spell out the precedence: managed policy, then user, then project, then local. Managed org-wide policy is non-overridable by design, which is the answer to "why is my company's rule still in effect after I deleted my settings."

Which permission mode should you actually use?

Use acceptEdits for daily work and auto when you don't want to babysit. The permission-modes docs enumerate six: default, acceptEdits, plan, auto, bypassPermissions, and dontAsk.

The mistake people make is reaching for bypassPermissions (equivalent to --dangerously-skip-permissions) to escape prompt fatigue. Reza Rezvani quantified that fatigue at fourteen permission prompts before he gave up, with twenty-plus interruptions per session on multi-file work. The fix isn't to disable review. It's to pick a smarter mode.

Mode Edits Bash Best for
default prompt prompt Getting started, sensitive work
acceptEdits auto prompts (safe FS commands auto) Trust the edits, review the commands
plan denied prompt Read-only exploration before editing
auto classifier decides classifier decides Hands-off work with a screening model
bypassPermissions auto auto Disposable containers only

Anthropic's auto mode, rolled out in March 2026 and confirmed by Simon Willison, auto-approves 93% of permission prompts while a Sonnet 4.6 transcript classifier screens the rest. It's a genuine middle ground.

One honest caveat: auto mode won't catch a pip install -r requirements.txt supply-chain attack, because package installs sit on the standard allow list. Pair it with sandboxing for untrusted code. For the rest, tighten the allow and deny lists in settings.json rather than disabling the safety net.

You can cycle modes live with Shift+Tab, set them per-invocation with --permission-mode, or persist a default in settings.json.

What's the right way to use slash commands, Skills, and hooks?

These three layers wire your team's conventions into the tool. They overlap, so the question is which to reach for.

Use a slash command when… Use a Skill when… Use a subagent when…
You want a user-invoked, one-shot action (/commit, /pr) You want the agent to auto-discover reusable expertise You want isolated context, a different model, or parallel work
The body is short It benefits from bundled scripts and references The work would pollute the main session
The tool set is small (≤3) The tool set is stable A different model is part of the point

A custom slash command is just a Markdown file in .claude/commands/. Declare its tools in frontmatter so it runs without per-tool prompts:

markdown
---
description: Run the affected tests for files in the diff
allowed-tools: Bash(pnpm test:*), Bash(pnpm vitest:*), Read
model: claude-sonnet-4-6
---

Run the test suite scoped to files touched in the current diff.
Report a single line: PASS / FAIL — <count> tests, <duration>.

Skills generalize this with progressive disclosure: Claude pre-loads only each skill's name and description, then reads the full SKILL.md when relevant. The most-cited failure mode is a vague description, which means the skill never auto-invokes. Write it in third person and say exactly what it does and when to use it.

Hooks are the deterministic layer. The exit-code contract is the footgun: exit 0 allows and injects stdout into context, exit 2 blocks and feeds stderr back as feedback, and any other code lets the action through silently. That last rule is why "my hook didn't fire" is almost always a wrong exit code.

A PreToolUse hook that blocks edits to protected paths:

bash
#!/usr/bin/env bash
set -euo pipefail
input="$(cat)"
path="$(echo "$input" | jq -r '.tool_input.file_path // empty')"
case "$path" in
  */migrations/*|/vendor/*|*/.env*|*/secrets/*)
    echo "BLOCKED: $path is a protected path (see CLAUDE.md)" >&2
    exit 2;;
esac
exit 0

Hooks are for guardrails and side effects, not primary logic. A hook that does the agent's job is mis-designed. A hook that prevents an action your team has decided is unacceptable is exactly right.

Why is verification the real productivity multiplier?

Give Claude a way to check its own work and the quality jump is larger than anything you get from a faster model. This is the pattern that recurs across every respected practitioner.

The strongest first-party evidence is Anthropic's own engineering team. Before adopting a shared CLAUDE.md plus the GitHub Action, 16% of their PRs got substantive review comments. After, 54% did. The causal claim is that the model needs to know what good looks like, and a shared CLAUDE.md encodes the team's review standards.

PRs receiving substantive review comments (Anthropic internal)Before shared CLAUDE.md + Action16%After54%
PRs receiving substantive review comments (Anthropic internal)

Mitchell Hashimoto's "anti-slop session" is the individual-developer version. He ends each session by asking Claude to find improvements without writing code, and calls that cleanup step the key to making the whole workflow function.

His documented run: a non-trivial Ghostty feature across 16 sessions for about $15.98 in API costs, the rare practitioner number with a published transcript.

Boris Cherny reports a 2-3x quality bump from giving Claude a way to verify itself, though treat that as his own claim rather than a benchmark. The direction is unanimous even where the multipliers aren't.

The practical move: make verification a first-class step. A /test slash command, a Stop hook that runs gitleaks or semgrep, and a habit of one read-only review pass before you commit.

How do you keep long sessions sharp?

Compact early and often. The 1M-token window on Opus 4.6 and Sonnet 4.6 is real, but the usable range is closer to 400K to 600K.

Albert Sikkema deliberately shrank his context window back to 200K, citing the same "context rot" Anthropic's docs acknowledge. The NoLiMa benchmark he cites shows 11 of 12 models dropping below 50% of short-context performance at just 32K tokens.

The cadence that works: /compact every 20 to 40 minutes of substantive work, /clear when you switch tasks, and checkpoint important state to a file before compaction. A long-standing GitHub issue on persistent memory shows how many users have built their own checkpoint workarounds.

The two failure modes are hoarding context until you hit the wall and lose a productive session, and compacting so aggressively you throw away file content you still need.

What about parallel agents and worktrees?

Useful, but oversold, and only if you can supervise them. Claude Code has three primitives for parallel work: subagents, git worktrees, and Agent Teams. None of them protect you from concurrent writes.

The safety contract is blunt: parallel subagents sharing a working directory produce silent file corruption, with no warnings or conflict markers. Read-only parallelism (review, search, analysis) is safe. Write parallelism is only safe with strict file partitioning or worktree isolation, where each agent gets its own branch via claude --worktree <name>.

This is where practitioners openly disagree. Cherny runs 5 worktrees plus web sessions, ten to fifteen concurrent agents. Marco Lancini, a solo developer who reviews every line, rejects worktrees entirely because the context-switching cost outweighs the speedup.

Both are right for their workflow. Parallelism pays off when work is genuinely parallelizable and you can supervise it; it's overhead when you review everything.

The named anti-pattern is subagent sprawl. One documented postmortem covered a 39-agent kit where 38 skills went stale. Two or three well-scoped subagents will compound over months; 39 will rot.

What this means for you

Tellingly, Cherny describes his own setup as "surprisingly vanilla," noting Claude Code works great out of the box. The leverage is in a few deliberate choices, not maximal customization.

Your minimum-viable power-user setup:

  • Memory: a CLAUDE.md under 40 lines with build/test commands, hard rules, and conventions. Grow it by correction.
  • Permissions: acceptEdits as the default mode, explicit allow/deny Bash rules, and bypassPermissions reserved for throwaway containers.
  • One hook this week: a PreToolUse hook that blocks a destructive rm or an edit to secrets/. Highest leverage per minute spent.
  • Commands and a skill: /commit, /pr, /lint-and-fix, and one code-review Skill that knows your standards.
  • Verification: an anti-slop pass at the end of every session and a Stop hook running a secret scan.
  • Context discipline: compact every 20 to 40 minutes, clear between tasks, checkpoint state to disk.
  • Model choice: Sonnet 4.6 for daily work, opusplan for non-trivial planning, Opus for hard refactors, per the model-config docs.

Do the boring guardrails first. The swarms can wait.

Sources

Frequently asked questions

What is the single highest-leverage change to a Claude Code setup?

Add one hook that deterministically blocks a dangerous action in your repo, such as a destructive rm against your home directory or an edit to a migrations folder. A hook runs every time regardless of what the model decides, converting a probabilistic failure mode into a guaranteed block. It takes minutes and removes a whole class of risk.

What is the best permission mode for daily Claude Code work?

AcceptEdits for most work: it auto-applies edits but still prompts on shell commands, so you trust the diffs and review the commands. Anthropic's newer auto mode auto-approves about 93% of prompts using a model-based classifier and is a safer middle ground than bypassPermissions, which should be reserved for disposable containers.

How long should a CLAUDE.md file be?

Short and operational, ideally under 40 lines. It should hold build and test commands, hard rules, and project conventions, not documentation. A large CLAUDE.md is loaded at the start of every session, consuming context and triggering the 'Lost in the Middle' effect that makes Claude repeat corrected mistakes.

When should I use a slash command, a Skill, or a subagent?

Use a slash command for short user-invoked actions like /commit. Use a Skill for reusable expertise the agent auto-discovers, like a code-review standard with bundled references. Use a subagent when you need isolated context, a different model, or parallel read-only work that would otherwise pollute the main session.

Does the 1M-token context window mean I can stop managing context?

No. Opus 4.6 and Sonnet 4.6 ship a 1M-token window, but practitioners report usable quality degrading above roughly 400K to 600K tokens, a phenomenon Anthropic calls context rot. One production engineer deliberately shrank his window back to 200K. Compact every 20 to 40 minutes and clear between tasks.