# GenAlphAI

> GenAlphAI is a research-driven AI publication for engineers and operators: deep, evidence-backed analysis of agentic systems, model evaluation, and the economics of AI software.

## Pillars

- [SWE-bench Pro vs Verified: Can You Trust Coding Benchmarks?](https://genalphai.com/swe-bench-pro-vs-verified/): OpenAI deprecated SWE-bench Verified after finding flawed tests in 59.4% of hard tasks. How SWE-bench Pro and DeepSWE's 32.5% verifier error rate change agent evaluation.

## Articles

- [Context Rot and the Dumb Zone: Engineering Past 100k Tokens](https://genalphai.com/context-rot-and-the-dumb-zone/): Context rot degrades LLM agents well inside advertised windows. Why the ~100k-token dumb zone exists, what 'lost in the middle' research shows, and the inner-loop/outer-loop architecture that fixes it.
- [AGENTS.md vs CLAUDE.md vs Cursor Rules: Agent Config Done Right](https://genalphai.com/agents-md-vs-claude-md/): AGENTS.md, CLAUDE.md, and .cursor/rules compared: three-tier permissions, context budgeting, and the canonical-plus-adapters pattern that keeps coding agents obedient.
- [The Ralph Wiggum Loop: Why Stateless Agents Beat Smart Ones](https://genalphai.com/ralph-wiggum-loop-stateless-agents/): The Ralph Wiggum loop re-feeds one prompt to a fresh agent process forever, using files and git as the only memory. Why this dumb pattern keeps winning.
- [Reasoning-First LLMs: Make Models Reason, Not Rationalize](https://genalphai.com/reasoning-first-llms/): LLMs rationalize answers they already chose. Process supervision, self-consistency, and faithfulness probes force models to reason to the right answer.

## Important Links

- [llms-full.txt](https://genalphai.com/llms-full.txt): Consolidated Markdown archive of all articles.
- [RSS](https://genalphai.com/rss.xml): RSS feed for the publication.