<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>The Data Praxis</title><description>Insights on Data Governance, AI Governance, AI Products, Data Architecture, and Product Strategy. By Vikas Pratap Singh.</description><link>https://thedatapraxis.com/</link><language>en-us</language><item><title>The Virtue of Laziness: Why AI-Generated Code Is Making Systems Larger, Not Better</title><link>https://thedatapraxis.com/blog/virtue-of-laziness-ai-complexity-governance/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/virtue-of-laziness-ai-complexity-governance/</guid><description>Four independent studies confirm AI coding tools increase code churn, bug rates, and system complexity. Bryan Cantrill&apos;s argument that LLMs lack the &apos;virtue of laziness&apos; is not a hot take; it is a testable hypothesis with growing empirical support. Here is the governance framework to manage it.</description><pubDate>Thu, 30 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: The empirical case that AI coding tools increase system complexity, not just productivity. Four independent studies (GitClear, Uplevel, CodeRabbit, Google DORA), Bryan Cantrill&apos;s &apos;virtue of laziness&apos; thesis, historical precedent from prior constraint removals, and a complexity governance framework with metrics and practices.&lt;/li&gt;&lt;li&gt;**Who should read it**: Engineering leaders, principal architects, AI Governance teams, and anyone responsible for codebase health in organizations adopting AI coding tools.&lt;/li&gt;&lt;li&gt;**Key takeaway**: AI coding tools amplify whatever incentive structure you already have. If you measure velocity, they will make you faster. If you measure simplicity, they will make you simpler. Most organizations only measure velocity. The complexity is accruing unchecked.&lt;/li&gt;&lt;li&gt;**Bottom line**: Every prior removal of an engineering constraint (cheap storage, cheap compute, cheap bandwidth) expanded systems without improving them. AI-generated code is following the same pattern. You need a complexity governance framework alongside your productivity metrics.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Four independent studies confirm AI coding tools increase code churn, bug rates, and system complexity. Bryan Cantrill&apos;s argument that LLMs lack the &apos;virtue of laziness&apos; is not a hot take; it is a testable hypothesis with growing empirical support. Here is the governance framework to manage it.&lt;/p&gt;</content:encoded><category>AI Governance &amp; Safety</category><category>ai-governance</category><category>code-quality</category><category>ai-agents</category><category>software-complexity</category><category>engineering-leadership</category></item><item><title>Gemma 4, Decoded: Why Google Released It Free and How It Actually Works</title><link>https://thedatapraxis.com/blog/gemma-4-decoded-open-weight-strategy/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/gemma-4-decoded-open-weight-strategy/</guid><description>Google released Gemma 4 under Apache 2.0 on April 2, 2026. The license change is the real story, not the benchmarks. 
This is the three-tier framework for &apos;open&apos; AI (closed, open-weight, open-source), a technical breakdown of how Gemma 4&apos;s MoE and multimodal pipeline work, and a practitioner decision flow for picking the right tier.</description><pubDate>Sun, 19 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: A decoded view of Gemma 4 across three dimensions. The strategy (why Google released it free and what the Apache 2.0 license change actually signals), the architecture (MoE routing, interleaved attention, multimodal encoders, Gemini-to-Gemma distillation), and the practitioner framework (when to pick closed, open-weight, or truly open-source).&lt;/li&gt;&lt;li&gt;**Who should read it**: VPs of Data and AI, Principal Data Architects, Platform leaders picking foundation models, and Governance/Risk leads who need a defensible framework for &apos;is this model safe to build on?&apos; conversations.&lt;/li&gt;&lt;li&gt;**Key takeaway**: &apos;Open&apos; is a three-tier category, not a binary. Closed (GPT-5, Claude Opus, Gemini) keeps weights private. Open-weight (Gemma 4, Llama 4, Qwen, Mistral, DeepSeek) releases weights but not training data or code. OSI-compliant open-source (OLMo 3, Pythia, BLOOM) releases all three. Gemma 4 is now in tier two with a genuinely permissive license, a meaningful shift, but still not tier three.&lt;/li&gt;&lt;li&gt;**Bottom line**: Google did not release Gemma 4 for free out of generosity. It released Gemma 4 to deny Meta the open-weight monopoly, match Chinese permissive licensing, and funnel developers into Google Cloud. The practitioner&apos;s job is not to celebrate the free model; it is to choose the right tier for the workload and read the license before the vendor reads it to you.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Google released Gemma 4 under Apache 2.0 on April 2, 2026. The license change is the real story, not the benchmarks. This is the three-tier framework for &apos;open&apos; AI (closed, open-weight, open-source), a technical breakdown of how Gemma 4&apos;s MoE and multimodal pipeline work, and a practitioner decision flow for picking the right tier.&lt;/p&gt;</content:encoded><category>AI Products &amp; Strategy</category><category>gemma-4</category><category>open-weight</category><category>open-source</category><category>ai-strategy</category><category>llm-architecture</category><category>licensing</category><category>ai-governance</category></item><item><title>The $0.20 Jailbreak: Why LLM Safety Alignment Is Shallow</title><link>https://thedatapraxis.com/blog/llm-safety-alignment-shallow/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/llm-safety-alignment-shallow/</guid><description>Fine-tuning GPT-3.5 on 10 examples for $0.20 strips its safety guardrails. Removing safety from Llama 3 takes 5 minutes on one GPU. This article explains the mechanism: safety alignment concentrates in the first few output tokens, creating a shallow defense that fine-tuning, prefilling attacks, and adversarial suffixes bypass trivially.</description><pubDate>Sat, 11 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: Why safety alignment in LLMs is fragile by design. The mechanism behind guardrail collapse after fine-tuning. Five documented cases of safety bypass, from GPT-3.5 ($0.20) to GPT-4o (3.6% refusal rate). 
The dataset similarity finding that predicts which fine-tuning jobs will degrade safety most.&lt;/li&gt;&lt;li&gt;**Who should read it**: AI Engineers fine-tuning foundation models, Model Risk teams validating AI deployments, and AI Governance leads responsible for safe AI operations.&lt;/li&gt;&lt;li&gt;**Key takeaway**: Safety alignment in current LLMs is concentrated in the first few output tokens. It is a learned reflex, not deep understanding. Fine-tuning overwrites this reflex whether or not the training data contains harmful content.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Fine-tuning GPT-3.5 on 10 examples for $0.20 strips its safety guardrails. Removing safety from Llama 3 takes 5 minutes on one GPU. This article explains the mechanism: safety alignment concentrates in the first few output tokens, creating a shallow defense that fine-tuning, prefilling attacks, and adversarial suffixes bypass trivially.&lt;/p&gt;</content:encoded><category>AI Governance &amp; Safety</category><category>ai-governance</category><category>ai-safety</category><category>llm-alignment</category><category>model-risk</category><category>fine-tuning</category></item><item><title>The Benchmark Illusion: Why Passing Safety Tests Means Almost Nothing</title><link>https://thedatapraxis.com/blog/llm-safety-benchmark-illusion/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/llm-safety-benchmark-illusion/</guid><description>A study of 32 models across 56 jailbreak techniques found attack success rates jumping from 0.6% to 96.3% depending on the attack type. The Safety Tax costs 7-31% accuracy. Ten alignment techniques are competing to solve this. None covers the full attack surface.</description><pubDate>Sat, 11 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: The most comprehensive safety evaluation published to date (32 models, 56 attacks, 4.6 million API calls), the Goodhart&apos;s Law problem in safety benchmarks, the Safety Tax (7-31% accuracy drop), and a comparative analysis of 10 alignment techniques from RLHF through DOOR, SPF, and LoRA-based alignment.&lt;/li&gt;&lt;li&gt;**Who should read it**: ML Engineers selecting and fine-tuning models, AI Governance teams defining safety evaluation standards, and Model Risk teams validating AI deployments.&lt;/li&gt;&lt;li&gt;**Key takeaway**: No single alignment technique covers the full attack surface. The most practical defense today is a three-layer approach: training-time alignment, fine-tuning-time preservation, and inference-time guardrails.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;A study of 32 models across 56 jailbreak techniques found attack success rates jumping from 0.6% to 96.3% depending on the attack type. The Safety Tax costs 7-31% accuracy. Ten alignment techniques are competing to solve this. None covers the full attack surface.&lt;/p&gt;</content:encoded><category>AI Governance &amp; Safety</category><category>ai-governance</category><category>ai-safety</category><category>llm-alignment</category><category>model-risk</category><category>evaluation</category></item><item><title>LLM Safety After Fine-Tuning: Governance, Regulation, and What To Do</title><link>https://thedatapraxis.com/blog/llm-safety-governance-practitioners/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/llm-safety-governance-practitioners/</guid><description>The EU AI Act makes you responsible for safety when you fine-tune. 
Reasoning models can autonomously jailbreak other models at 97% success. Half of organizations have no formal AI guardrails. This article provides the regulatory map, the liability analysis, and a minimum viable safety governance checklist.</description><pubDate>Sat, 11 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: EU AI Act obligations for fine-tuned models (August 2, 2026 deadline), the provider-to-deployer liability shift, reasoning models as autonomous jailbreak agents (97% success rate), agentic AI safety amplification, vendor safety landscape (OpenAI, Anthropic, Meta, Google, Amazon), and a minimum viable safety governance checklist.&lt;/li&gt;&lt;li&gt;**Who should read it**: AI Governance leads, Chief AI Officers, compliance officers, and Model Risk Management teams responsible for regulatory readiness and safe AI operations.&lt;/li&gt;&lt;li&gt;**Key takeaway**: When you fine-tune a foundation model, the original provider&apos;s safety evaluation no longer applies and regulatory responsibility can shift to you. Only 50% of organizations have formal AI guardrails. The checklist at the end of this article is where to start.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;The EU AI Act makes you responsible for safety when you fine-tune. Reasoning models can autonomously jailbreak other models at 97% success. Half of organizations have no formal AI guardrails. This article provides the regulatory map, the liability analysis, and a minimum viable safety governance checklist.&lt;/p&gt;</content:encoded><category>AI Governance &amp; Safety</category><category>ai-governance</category><category>ai-safety</category><category>model-risk</category><category>eu-ai-act</category><category>compliance</category></item><item><title>Privacy-Preserving Computation: Encrypted Processing, Federated Learning, and the Explainability Paradox</title><link>https://thedatapraxis.com/blog/privacy-pet-computation/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/privacy-pet-computation/</guid><description>Part 6 showed Meridian rejecting federated learning (single-tenant architecture) and deferring homomorphic encryption (47x latency). This article explains the mechanics behind those decisions, introduces secure multi-party computation, and reveals the tension between GDPR&apos;s explainability mandate and privacy protection. Concludes with a capstone PET decision framework spanning Parts 7 through 9.</description><pubDate>Sat, 11 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: Homomorphic encryption (computing on encrypted data), federated learning (training models without centralizing data), secure multi-party computation (joint analysis across organizations), and the SHAP/LIME explainability-privacy tension created by GDPR Article 22 and EU AI Act Article 86. Part 6 showed Meridian deferring HE (47x latency) and rejecting FL (single-tenant architecture). 
This article explains the mechanics behind those decisions and closes with a capstone PET decision framework spanning Parts 7 through 9.&lt;/li&gt;&lt;li&gt;**Who should read it**: Privacy Architects, ML platform engineers, and Data Governance leads evaluating advanced privacy-preserving technologies for Restricted and Highly Confidential data.&lt;/li&gt;&lt;li&gt;**Key takeaway**: SHAP explanations, mandated by regulation, can leak training data. Membership inference attacks using SHAP values can determine whether a specific person&apos;s data was in the training set. The new frontier of privacy engineering is building systems that are simultaneously private, useful, and explainable.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Part 6 showed Meridian rejecting federated learning (single-tenant architecture) and deferring homomorphic encryption (47x latency). This article explains the mechanics behind those decisions, introduces secure multi-party computation, and reveals the tension between GDPR&apos;s explainability mandate and privacy protection. Concludes with a capstone PET decision framework spanning Parts 7 through 9.&lt;/p&gt;</content:encoded><category>AI Governance &amp; Safety</category><category>data-governance</category><category>data-privacy</category><category>data-protection</category><category>privacy-engineering</category><category>federated-learning</category><category>homomorphic-encryption</category><category>ai-governance</category></item><item><title>Mathematical Privacy Guarantees: Differential Privacy and Synthetic Data</title><link>https://thedatapraxis.com/blog/privacy-pet-mathematical/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/privacy-pet-mathematical/</guid><description>Part 6 showed Meridian adopting differential privacy for query analytics. This article explains why: what epsilon means, how noise is calibrated, what Apple and the Census Bureau chose, and when to use synthetic data instead.</description><pubDate>Sat, 11 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: Differential privacy (epsilon explained, Laplace vs Gaussian mechanisms, local vs central models), production deployments at Apple, Google, and the US Census Bureau, synthetic data generation (CTGAN, vendor landscape), evaluation metrics, and a decision framework for technique selection. Part 6 introduced Meridian&apos;s epsilon decision for query analytics; this article provides the full technical foundation behind that choice.&lt;/li&gt;&lt;li&gt;**Who should read it**: Data Architects evaluating privacy-preserving analytics, privacy officers building data-sharing agreements, and ML engineers training models on sensitive data.&lt;/li&gt;&lt;li&gt;**Key takeaway**: Epsilon is the number that quantifies how much privacy you are trading for utility. Apple uses epsilon 2-8 per use case. The US Census chose epsilon 19.61. There is no universally correct value. Your PET assessment from the Privacy Guide framework (Part 3) should drive epsilon selection based on regulatory context, threat model, and data sensitivity.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Part 6 showed Meridian adopting differential privacy for query analytics. 
This article explains why: what epsilon means, how noise is calibrated, what Apple and the Census Bureau chose, and when to use synthetic data instead.&lt;/p&gt;</content:encoded><category>AI Governance &amp; Safety</category><category>data-governance</category><category>data-privacy</category><category>data-protection</category><category>privacy-engineering</category><category>differential-privacy</category><category>synthetic-data</category></item><item><title>Privacy-Enhancing Technologies: Masking, Tokenization, and De-identification</title><link>https://thedatapraxis.com/blog/privacy-pet-operational/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/privacy-pet-operational/</guid><description>Part 3 introduced PETs as governance decisions. Part 6 showed Meridian evaluating them. This article explains how each technique actually works: static and dynamic masking, vault-based and format-preserving tokenization, and the k-anonymity family of de-identification methods.</description><pubDate>Sat, 11 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: Three families of operational privacy protection techniques: data masking (static and dynamic), tokenization (vault-based and format-preserving), and statistical de-identification (k-anonymity, l-diversity, t-closeness). Includes Airbnb&apos;s classification-to-enforcement pipeline as a production case study.&lt;/li&gt;&lt;li&gt;**Who should read it**: Data Architects, platform engineers, privacy officers, and governance leads who need to translate the PET assessment pattern from Parts 3 and 6 into deployed protection mechanisms.&lt;/li&gt;&lt;li&gt;**Key takeaway**: Parts 3 and 6 of this guide treated PETs as governance decisions: adopt, reject, or defer. This article provides the operational depth behind those decisions: which technique to deploy for which context, and where each technique breaks down.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Part 3 introduced PETs as governance decisions. Part 6 showed Meridian evaluating them. This article explains how each technique actually works: static and dynamic masking, vault-based and format-preserving tokenization, and the k-anonymity family of de-identification methods.&lt;/p&gt;</content:encoded><category>AI Governance &amp; Safety</category><category>data-governance</category><category>data-privacy</category><category>data-protection</category><category>privacy-engineering</category><category>data-masking</category><category>tokenization</category></item><item><title>Willison&apos;s Agentic Engineering Patterns: What Data Practitioners Should Steal</title><link>https://thedatapraxis.com/blog/agentic-engineering-patterns-willison/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/agentic-engineering-patterns-willison/</guid><description>Bad code crashes visibly. Bad data looks plausible. That asymmetry makes agent-assisted data work riskier than software, and it is why Simon Willison&apos;s Agentic Engineering Patterns guide matters for data practitioners. His Red/Green TDD maps to data contracts before transformation. His testing discipline gives teams a framework for agent verification. 
But some patterns need adaptation: &apos;pipelines are cheap&apos; is only half true, and hoarding knowledge is harder when institutional context lives in people&apos;s heads, not in code.</description><pubDate>Sat, 04 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**The core risk**: Bad code crashes visibly. Bad data looks plausible. Agent-assisted data work is riskier than agent-assisted software because failures are silent.&lt;/li&gt;&lt;li&gt;**What transfers**: Willison&apos;s Red/Green TDD maps to data contracts before transformation. His testing discipline gives data teams a concrete framework for making existing tools (dbt tests, Great Expectations) the primary interface with agents.&lt;/li&gt;&lt;li&gt;**What doesn&apos;t transfer cleanly**: &apos;Pipelines are cheap&apos; is true for authoring cost but not compute cost. &apos;Hoard knowledge&apos; is harder for data because the most valuable institutional knowledge lives in people&apos;s heads, not in code.&lt;/li&gt;&lt;li&gt;**The uncomfortable truth**: Data engineering already has testing tools. What most teams lack is the discipline of using them as acceptance criteria for agent output.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Bad code crashes visibly. Bad data looks plausible. That asymmetry makes agent-assisted data work riskier than software, and it is why Simon Willison&apos;s Agentic Engineering Patterns guide matters for data practitioners. His Red/Green TDD maps to data contracts before transformation. His testing discipline gives teams a framework for agent verification. But some patterns need adaptation: &apos;pipelines are cheap&apos; is only half true, and hoarding knowledge is harder when institutional context lives in people&apos;s heads, not in code.&lt;/p&gt;</content:encoded><category>AI Products &amp; Strategy</category><category>agentic-engineering</category><category>coding-agents</category><category>data-quality</category><category>judgment-in-the-loop</category><category>testing</category><category>patterns</category></item><item><title>Context Engineering, Formalized: Five Criteria That Validate the Agent Quality Thesis</title><link>https://thedatapraxis.com/blog/context-engineering-paper-decoded/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/context-engineering-paper-decoded/</guid><description>Vishnyakova&apos;s &apos;Context Engineering&apos; paper (arXiv 2603.09619) proposes five production-grade quality criteria for agent context and a four-level maturity pyramid. 
The framework independently validates the thesis from our three-part agent quality series and extends it with Isolation, Economy, and two higher-order disciplines: Intent Engineering and Specification Engineering.</description><pubDate>Sat, 04 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**The paper**: Vishnyakova (2026) introduces Context Engineering as a standalone discipline with five quality criteria (Relevance, Sufficiency, Isolation, Economy, Provenance) and frames context as &apos;the agent&apos;s operating system.&apos; Published March 2026, arXiv 2603.09619.&lt;/li&gt;&lt;li&gt;**Why it matters**: The paper independently validates the thesis from this blog&apos;s three-part agent quality series: the context window is an unmonitored data pipeline, it needs engineered quality controls, and automated checks are necessary but not sufficient without human judgment.&lt;/li&gt;&lt;li&gt;**What it adds**: Two concepts absent from the blog&apos;s framework: Isolation (sub-agents must not see each other&apos;s context) and a four-level maturity pyramid where Context Engineering is necessary but not sufficient. Intent Engineering and Specification Engineering sit above it.&lt;/li&gt;&lt;li&gt;**The convergence signal**: Multiple recent analyses are pointing toward the same structural insight in early 2026. When independent analyses converge, the underlying thesis is likely correct.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Vishnyakova&apos;s &apos;Context Engineering&apos; paper (arXiv 2603.09619) proposes five production-grade quality criteria for agent context and a four-level maturity pyramid. The framework independently validates the thesis from our three-part agent quality series and extends it with Isolation, Economy, and two higher-order disciplines: Intent Engineering and Specification Engineering.&lt;/p&gt;</content:encoded><category>AI Governance &amp; Safety</category><category>context-engineering</category><category>ai-agents</category><category>data-quality</category><category>ai-governance</category><category>paper-decoded</category><category>multi-agent-systems</category></item><item><title>Multi-Agent Systems: When One Agent Isn&apos;t Enough</title><link>https://thedatapraxis.com/blog/multi-agent-systems-guide/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/multi-agent-systems-guide/</guid><description>Nine articles in this series used a single agent. This one explains when that stops being sufficient and what to do about it. Four signals tell you it is time. Three patterns handle 90% of cases. The hardest part is not building the system; it is debugging it when something goes wrong.</description><pubDate>Sat, 04 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: When to move from single-agent to multi-agent architecture, the three orchestration patterns that handle most production use cases, task decomposition principles, and why debugging multi-agent systems is fundamentally harder.&lt;/li&gt;&lt;li&gt;**Who should read it**: Engineers and architects who have a working single agent and are evaluating whether to add more. Not a starting point for beginners.&lt;/li&gt;&lt;li&gt;**Key takeaway**: Research shows a single agent with well-organized skills can match multi-agent accuracy while using fewer tokens and lower latency. 
Most teams that think they need three agents actually need one agent with better context.&lt;/li&gt;&lt;li&gt;**The uncomfortable truth**: Anthropic&apos;s own multi-agent research system initially spawned 50 subagents for simple queries and endlessly searched for nonexistent sources. If Anthropic&apos;s team over-engineered multi-agent, your team probably will too.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Nine articles in this series used a single agent. This one explains when that stops being sufficient and what to do about it. Four signals tell you it is time. Three patterns handle 90% of cases. The hardest part is not building the system; it is debugging it when something goes wrong.&lt;/p&gt;</content:encoded><category>AI Products &amp; Strategy</category><category>ai-agents</category><category>multi-agent-systems</category><category>agent-orchestration</category><category>agent-architecture</category><category>agentic-engineering</category></item><item><title>Privacy in Practice: Diagnosing the Gaps and Building the Foundation</title><link>https://thedatapraxis.com/blog/privacy-implementation-foundation/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/privacy-implementation-foundation/</guid><description>A fictitious B2B SaaS company receives a DPIA request it cannot answer. This walkthrough applies the privacy framework from Part 3 to build Data Classification, retention schedules, consent architecture, and sub-processor transparency from scratch.</description><pubDate>Sat, 04 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**The scenario**: Meridian Analytics, a B2B SaaS company with an AI copilot feature, receives a DPIA request from a major EU client. Their privacy team cannot answer basic questions about where data goes, who processes it, or how long it is retained.&lt;/li&gt;&lt;li&gt;**The diagnosis**: Mapping Meridian&apos;s current state against the 8-component privacy framework reveals gaps in every layer. No AI-specific Data Classification, blanket consent, vague retention, undocumented sub-processors.&lt;/li&gt;&lt;li&gt;**What they build**: This article walks through implementing the Foundation Layer (Data Classification with AI categories, ML-aware retention schedules) and Control Layer (three-tier consent architecture, sub-processor registry). Every artifact is shown populated, not as an empty template.&lt;/li&gt;&lt;li&gt;**For practitioners**: If your organization has shipped AI features without updating your privacy infrastructure, Meridian&apos;s gaps are likely your gaps. Start with the diagnostic checklist at the end.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;A fictitious B2B SaaS company receives a DPIA request it cannot answer. This walkthrough applies the privacy framework from Part 3 to build Data Classification, retention schedules, consent architecture, and sub-processor transparency from scratch.&lt;/p&gt;</content:encoded><category>AI Governance &amp; Safety</category><category>data-privacy</category><category>data-governance</category><category>privacy-engineering</category><category>ai-governance</category><category>implementation</category></item><item><title>Privacy in Practice: From Compliant to Operationally Ready</title><link>https://thedatapraxis.com/blog/privacy-implementation-operations/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/privacy-implementation-operations/</guid><description>Meridian Analytics completes its privacy transformation. 
This walkthrough covers cross-border transfer documentation, EU AI Act compliance mapping, PET assessments, the governance operating model, and what the company looks like six months later when Allianz asks the same DPIA questions.</description><pubDate>Sat, 04 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**Continuing the Meridian walkthrough**: Part 5 built the foundation (Classification, retention, consent, sub-processors). This article completes the privacy program with the Compliance and Governance layers.&lt;/li&gt;&lt;li&gt;**Cross-border transfers**: Meridian maps every data flow for Copilot across EU and US infrastructure, documenting legal mechanisms per jurisdiction pair. The 36% visibility stat from InCountry becomes personal.&lt;/li&gt;&lt;li&gt;**EU AI Act readiness**: Copilot is classified as limited risk, but Meridian&apos;s insurance analytics feature qualifies as high-risk under Annex III. Full Article 10 compliance required by August 2026.&lt;/li&gt;&lt;li&gt;**The payoff**: Six months later, Meridian answers the same Allianz DPIA in three business days. The article closes with the before/after comparison.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Meridian Analytics completes its privacy transformation. This walkthrough covers cross-border transfer documentation, EU AI Act compliance mapping, PET assessments, the governance operating model, and what the company looks like six months later when Allianz asks the same DPIA questions.&lt;/p&gt;</content:encoded><category>AI Governance &amp; Safety</category><category>data-privacy</category><category>data-governance</category><category>privacy-engineering</category><category>ai-governance</category><category>implementation</category></item><item><title>The Data Privacy Practitioner&apos;s Guide</title><link>https://thedatapraxis.com/blog/privacy-practitioners-guide/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/privacy-practitioners-guide/</guid><description>A ten-part series from teardown to framework to implementation. Two company analyses (Netflix, Apple), an 8-component privacy program framework, the 2026 regulatory landscape, a complete implementation walkthrough using a fictitious B2B SaaS company, a three-part deep dive into Privacy-Enhancing Technologies, and a synthesis of what it all means for practitioners.</description><pubDate>Sat, 04 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this is**: A ten-part guide covering Data Privacy from company teardown through program design to full implementation, plus a three-part deep dive into Privacy-Enhancing Technologies. Total read time across all ten parts: about 2 hours 45 minutes.&lt;/li&gt;&lt;li&gt;**Who it&apos;s for**: Chief Privacy Officers, Data Governance leads, privacy engineers, data architects, and anyone building, reforming, or evaluating a privacy program in an organization that collects user data or deploys AI.&lt;/li&gt;&lt;li&gt;**Core thesis**: Your privacy posture reflects your revenue model whether you design for it or not. Netflix and Apple prove this from opposite directions. The framework articles show how to build a program that accounts for AI from day one, grounded in real enforcement actions and the regulatory landscape as it stands in 2026. 
The PET deep-dive (Parts 7-9) provides the technical depth behind every protection technique referenced in the framework.&lt;/li&gt;&lt;li&gt;**Scope**: ~33,000 words across ten articles covering 2 company teardowns, an 8-component privacy framework, 20+ US state privacy laws, GDPR enforcement data, EU AI Act obligations, a complete implementation walkthrough with populated artifacts, and a three-part technical series on Privacy-Enhancing Technologies from masking through differential privacy to encrypted computation.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;A ten-part series from teardown to framework to implementation. Two company analyses (Netflix, Apple), an 8-component privacy program framework, the 2026 regulatory landscape, a complete implementation walkthrough using a fictitious B2B SaaS company, a three-part deep dive into Privacy-Enhancing Technologies, and a synthesis of what it all means for practitioners.&lt;/p&gt;</content:encoded><category>AI Governance &amp; Safety</category><category>data-privacy</category><category>data-governance</category><category>gdpr</category><category>ai-governance</category><category>privacy-engineering</category></item><item><title>What This Series Taught Me About Privacy</title><link>https://thedatapraxis.com/blog/privacy-series-conclusion/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/privacy-series-conclusion/</guid><description>The conclusion to the Data Privacy series. Two company teardowns, a framework, a regulatory map, two implementation walkthroughs, and a three-part deep dive into Privacy-Enhancing Technologies. Here is what surprised me, what I got wrong, and what practitioners should do next.</description><pubDate>Sat, 04 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**The arc**: From Netflix&apos;s data monetization gaps through Apple&apos;s structural privacy limits, a complete implementation walkthrough with Meridian Analytics, and a three-part deep dive into Privacy-Enhancing Technologies, this series maps what privacy actually looks like in practice.&lt;/li&gt;&lt;li&gt;**What surprised me**: Apple&apos;s privacy investment is genuine, not marketing. Enforcement is faster than expected. AI privacy is a present enforcement target, not a 2027 concern. The hardest implementation problem was cross-border transfers, not classification. And the regulation that demands model explanations directly conflicts with the regulation that demands privacy protection.&lt;/li&gt;&lt;li&gt;**What to do**: Start with the Do Next table at the end. Seven actions, from this week to this year, synthesized from all ten parts.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;The conclusion to the Data Privacy series. Two company teardowns, a framework, a regulatory map, two implementation walkthroughs, and a three-part deep dive into Privacy-Enhancing Technologies. 
Here is what surprised me, what I got wrong, and what practitioners should do next.&lt;/p&gt;</content:encoded><category>AI Governance &amp; Safety</category><category>data-privacy</category><category>data-governance</category><category>gdpr</category><category>ai-governance</category><category>privacy-engineering</category></item><item><title>The Evolution of AI Agents: From AutoGPT to Production (2023-2026)</title><link>https://thedatapraxis.com/blog/agent-evolution-timeline/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/agent-evolution-timeline/</guid><description>A practitioner&apos;s timeline of how AI agents evolved from viral GitHub demos to production infrastructure in three years. The hype, the correction, the protocols, and the lessons that survived.</description><pubDate>Thu, 02 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: A chronological narrative of AI agent development from the AutoGPT explosion (March 2023) through production maturity (2026), organized into four distinct phases.&lt;/li&gt;&lt;li&gt;**Who should read it**: Engineering leaders, product managers, and architects evaluating where their agent initiatives sit on the maturity curve.&lt;/li&gt;&lt;li&gt;**Key takeaway**: The field matured from &apos;let the agent loop until it works&apos; to disciplined engineering with evals, guardrails, observability, and human-in-the-loop design in under three years.&lt;/li&gt;&lt;li&gt;**The uncomfortable truth**: Most organizations are still building Phase 1 agents while the industry has moved to Phase 4 patterns. Gartner predicts over 40% of agentic AI projects will be canceled by 2027.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;A practitioner&apos;s timeline of how AI agents evolved from viral GitHub demos to production infrastructure in three years. The hype, the correction, the protocols, and the lessons that survived.&lt;/p&gt;</content:encoded><category>AI Products &amp; Strategy</category><category>ai-agents</category><category>agent-evolution</category><category>agentic-engineering</category><category>agent-architecture</category><category>timeline</category><category>industry-analysis</category></item><item><title>Observability: Seeing What Your Agent Actually Does</title><link>https://thedatapraxis.com/blog/agent-observability/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/agent-observability/</guid><description>Your monitoring says 200 OK. The agent returned the wrong answer. Traditional APM was designed for deterministic software. Agents reason, branch, and call tools in sequences they decide at runtime. This article covers the five dimensions of agent observability, the tooling landscape, and a practical instrumentation plan.</description><pubDate>Wed, 01 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: The five dimensions of agent observability (execution tracing, token economics, tool call monitoring, context window health, behavioral drift), the tooling landscape (open-source and commercial), and a week-by-week instrumentation plan.&lt;/li&gt;&lt;li&gt;**Who should read it**: Platform engineers, SREs, data architects, and engineering leaders operating or evaluating AI agents in production environments.&lt;/li&gt;&lt;li&gt;**Key takeaway**: Evals tell you if the output is good. Guardrails prevent known failures. 
Observability reveals the failures you did not anticipate. Without it, you are operating a system that makes autonomous decisions with no visibility into how or why.&lt;/li&gt;&lt;li&gt;**Why it matters now**: 89% of agent teams have some form of observability, but only 52% run offline evals. Most teams can see that the agent responded. They cannot see how it got there. The gap between &apos;it ran&apos; and &apos;it reasoned correctly&apos; is where production failures hide.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Your monitoring says 200 OK. The agent returned the wrong answer. Traditional APM was designed for deterministic software. Agents reason, branch, and call tools in sequences they decide at runtime. This article covers the five dimensions of agent observability, the tooling landscape, and a practical instrumentation plan.&lt;/p&gt;</content:encoded><category>AI Products &amp; Strategy</category><category>ai-agents</category><category>agent-observability</category><category>monitoring</category><category>data-observability</category><category>context-engineering</category><category>tool-calling</category><category>ai-governance</category></item><item><title>Prompt Engineering for Production Agents</title><link>https://thedatapraxis.com/blog/prompt-engineering-production-agents/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/prompt-engineering-production-agents/</guid><description>Production agents need prompts that produce consistent, structured output under adversarial conditions. This article covers the five patterns that separate production prompt engineering from tutorial-grade prompting: explicit criteria, few-shot examples, nullable fields, enum-with-fallback, and output format contracts.</description><pubDate>Wed, 01 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: Five production prompt engineering patterns that eliminate the inconsistency, hallucination, and schema drift that plague agent systems beyond the prototype stage.&lt;/li&gt;&lt;li&gt;**Who should read it**: Engineers building agent tool chains, data practitioners designing extraction pipelines, and technical leaders evaluating agent reliability in production.&lt;/li&gt;&lt;li&gt;**Key takeaway**: Most prompt failures in production are not model failures. They are specification failures: vague criteria, missing examples, schemas without escape hatches. Fixing the prompt specification fixes the output.&lt;/li&gt;&lt;li&gt;**Why it matters now**: As agents move from demos to production, the gap between &apos;works in my notebook&apos; and &apos;works reliably at scale&apos; is almost entirely a prompt engineering gap. The five patterns in this article close it.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Production agents need prompts that produce consistent, structured output under adversarial conditions. 
This article covers the five patterns that separate production prompt engineering from tutorial-grade prompting: explicit criteria, few-shot examples, nullable fields, enum-with-fallback, and output format contracts.&lt;/p&gt;</content:encoded><category>AI Products &amp; Strategy</category><category>ai-agents</category><category>prompt-engineering</category><category>structured-output</category><category>tool-calling</category><category>context-engineering</category></item><item><title>Build a Real Agent This Weekend: From Zero to a Working Research Assistant</title><link>https://thedatapraxis.com/blog/build-real-agent-weekend/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/build-real-agent-weekend/</guid><description>The series has defined agents, established design principles, and mapped failure modes. This article builds one. A complete research assistant agent with three tools, structured error handling with error categories and retry logic, context management, and a basic eval, all in one runnable Python file using the Anthropic SDK.</description><pubDate>Wed, 25 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: A complete, runnable research assistant agent built with the Anthropic SDK. Three tools (web search, page reader, note writer), structured error handling with error categories and retry logic, context management, loop termination, and a basic output eval.&lt;/li&gt;&lt;li&gt;**Who should read it**: Engineers, data practitioners, and technical leaders who have read about agents but have not built one end-to-end. This article bridges the gap between theory and working code.&lt;/li&gt;&lt;li&gt;**Key takeaway**: A non-trivial agent is roughly 200 lines of Python. The code is straightforward. The hard parts are the ones you do not see in tutorials: classifying errors so the agent knows whether to retry or escalate, managing context growth, knowing when to stop, and measuring whether the output is any good.&lt;/li&gt;&lt;li&gt;**Why it matters now**: The series so far has defined agents (Article 1), established design principles (Article 3), but has not built anything beyond a 15-line weather lookup. You cannot internalize agent patterns by reading about them. You internalize them by building.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;The series has defined agents, established design principles, and mapped failure modes. This article builds one. A complete research assistant agent with three tools, structured error handling with error categories and retry logic, context management, and a basic eval, all in one runnable Python file using the Anthropic SDK.&lt;/p&gt;</content:encoded><category>AI Products &amp; Strategy</category><category>ai-agents</category><category>agent-development</category><category>tool-calling</category><category>anthropic-sdk</category><category>context-engineering</category><category>python</category></item><item><title>Context Is the Program: Why Data Quality Inside the Agent Matters More Than the Model</title><link>https://thedatapraxis.com/blog/context-is-the-program/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/context-is-the-program/</guid><description>Pike&apos;s Rule 5 says data dominates. In AI agents, the context window IS the data structure. 
This article traces why context quality determines agent behavior more than model capability, maps the five criteria that define good context, and shows what happens when stale data enters the reasoning loop unchecked.</description><pubDate>Wed, 25 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: A deep dive on Pike&apos;s Rule 5 (&apos;Data dominates&apos;) applied to AI agents. The context window is the agent&apos;s data structure, and its quality determines behavior more than the model architecture.&lt;/li&gt;&lt;li&gt;**Who should read it**: Data leaders, AI platform engineers, and product managers building or evaluating agentic AI systems who want to understand where to invest for the highest reliability gains.&lt;/li&gt;&lt;li&gt;**Key takeaway**: Upgrading the model is the wrong optimization target. Chroma Research tested 18 models and found that performance degrades continuously as context grows, regardless of model capability. The right data, structured and validated, beats more data in a bigger window every time.&lt;/li&gt;&lt;li&gt;**The practical implication**: A freshness check on tool results takes 10 lines of code and catches the most common silent failure mode in agent architectures. Context quality controls are cheap to build and expensive to skip.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Pike&apos;s Rule 5 says data dominates. In AI agents, the context window IS the data structure. This article traces why context quality determines agent behavior more than model capability, maps the five criteria that define good context, and shows what happens when stale data enters the reasoning loop unchecked.&lt;/p&gt;</content:encoded><category>AI Governance &amp; Safety</category><category>ai-agents</category><category>context-engineering</category><category>data-quality</category><category>ai-governance</category><category>pike-rules</category><category>tool-calling</category></item><item><title>Evals: How to Know If Your Agent Actually Works</title><link>https://thedatapraxis.com/blog/evals-how-to-know-agent-works/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/evals-how-to-know-agent-works/</guid><description>Most agent teams ship without evals and rely on &apos;looks right&apos; testing. Pike&apos;s first two rules apply directly: you cannot tell where an agent fails, and you cannot fix what you have not measured. Here is how to build an eval strategy that catches what demos miss.</description><pubDate>Wed, 25 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: Why most AI agent teams have no evaluation strategy, what to measure, and how to build evals that catch failures before users do.&lt;/li&gt;&lt;li&gt;**Who should read it**: Engineering leads, AI product owners, data architects, and anyone deploying or evaluating agentic AI systems in production.&lt;/li&gt;&lt;li&gt;**Key takeaway**: Developers using AI coding agents believed they were 20% faster. A randomized controlled trial found they were actually 19% slower. If you cannot trust your own perception of agent performance, you need evals.&lt;/li&gt;&lt;li&gt;**The uncomfortable truth**: Only 52% of agent teams run any offline evals at all, per LangChain&apos;s 2025 survey of 1,340 practitioners. 
The rest rely on observability alone, which tells you the agent ran but not whether its output was correct.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Most agent teams ship without evals and rely on &apos;looks right&apos; testing. Pike&apos;s first two rules apply directly: you cannot tell where an agent fails, and you cannot fix what you have not measured. Here is how to build an eval strategy that catches what demos miss.&lt;/p&gt;</content:encoded><category>AI Governance &amp; Safety</category><category>ai-agents</category><category>evals</category><category>ai-governance</category><category>agent-safety</category><category>llm-evaluation</category><category>testing</category></item><item><title>Guardrails and Safety: The Boundaries Every Agent Needs</title><link>https://thedatapraxis.com/blog/guardrails-safety-agent-boundaries/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/guardrails-safety-agent-boundaries/</guid><description>Pike&apos;s Rule 4 says fancy algorithms are buggier. In agent systems, complexity multiplies failure surfaces. This article maps the three guardrail layers every agent needs, identifies the gap most frameworks miss, covers escalation patterns and workflow gates, and explains why simpler architectures are safer.</description><pubDate>Wed, 25 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: A practical framework for the three guardrail layers every AI agent needs: input safety, reasoning validation, and output quality. Maps where current tooling exists, where it does not, and what to prioritize.&lt;/li&gt;&lt;li&gt;**Who should read it**: Data architects, ML engineers, platform engineers, AI product owners, and risk officers building or deploying agentic AI systems.&lt;/li&gt;&lt;li&gt;**Key takeaway**: Most guardrail frameworks defend the input and the output but leave the reasoning layer, where tool results enter the context window, unprotected. This is where the compound error problem lives.&lt;/li&gt;&lt;li&gt;**The uncomfortable truth**: An agent with 85% per-step accuracy fails 80% of the time over 10 steps. Every additional tool, hop, or agent in a chain is a new failure surface. Simpler architectures are not a compromise; they are a safety strategy.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Pike&apos;s Rule 4 says fancy algorithms are buggier. In agent systems, complexity multiplies failure surfaces. This article maps the three guardrail layers every agent needs, identifies the gap most frameworks miss, covers escalation patterns and workflow gates, and explains why simpler architectures are safer.&lt;/p&gt;</content:encoded><category>AI Governance &amp; Safety</category><category>ai-agents</category><category>ai-governance</category><category>agent-safety</category><category>guardrails</category><category>prompt-injection</category><category>data-quality</category><category>eu-ai-act</category></item><item><title>Pike&apos;s Five Rules Are Now the Five Rules of Agent Development</title><link>https://thedatapraxis.com/blog/pike-rules-agent-development/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/pike-rules-agent-development/</guid><description>Rob Pike wrote five rules of programming in 1989 at Bell Labs. Thirty-seven years later, they map onto AI agent development with striking precision: measure before tuning, start simple, and get the data right. Nobody has made this connection explicitly. 
Here is the mapping, the evidence, and the framework it gives you.</description><pubDate>Wed, 25 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: A rule-by-rule mapping of Rob Pike&apos;s five rules of programming (1989) onto the five dominant failure modes in AI agent development, with evidence from 2025-2026 research.&lt;/li&gt;&lt;li&gt;**Who should read it**: Engineering leaders, AI product managers, data architects, and practitioners building or evaluating agent systems who need a principled decision framework.&lt;/li&gt;&lt;li&gt;**Key takeaway**: The three principles that made Pike&apos;s rules durable for 37 years (measure before optimizing, prefer simplicity, get the data right) are the same three principles that separate agent projects that ship from agent projects that stall.&lt;/li&gt;&lt;li&gt;**The uncomfortable truth**: Most teams building agents are violating all five rules simultaneously. They upgrade models without measuring, build multi-agent architectures without proving a single agent fails, and invest in model selection while ignoring context quality.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Rob Pike wrote five rules of programming in 1989 at Bell Labs. Thirty-seven years later, they map onto AI agent development with striking precision: measure before tuning, start simple, and get the data right. Nobody has made this connection explicitly. Here is the mapping, the evidence, and the framework it gives you.&lt;/p&gt;</content:encoded><category>AI Products &amp; Strategy</category><category>ai-agents</category><category>agent-development</category><category>context-engineering</category><category>agentic-engineering</category><category>data-quality</category><category>software-engineering</category></item><item><title>The Practitioner&apos;s Guide to AI Agents</title><link>https://thedatapraxis.com/blog/practitioners-guide-ai-agents/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/practitioners-guide-ai-agents/</guid><description>A twelve-part series that takes you from &apos;what is an agent?&apos; to building self-improving systems. Pick your starting point based on where you are today.</description><pubDate>Wed, 25 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this is**: A twelve-part series on understanding, building, and improving AI agents. Total read time across all twelve: about 2.5 hours.&lt;/li&gt;&lt;li&gt;**Who it&apos;s for**: Anyone from &apos;I keep hearing about agents but haven&apos;t built one&apos; to &apos;I have agents in production and want to validate my architecture.&apos;&lt;/li&gt;&lt;li&gt;**The backbone**: Rob Pike&apos;s five rules of programming (1989), mapped to agent development. Pike was a systems programmer at Bell Labs whose design principles shaped Go, Unix, and Plan 9. His rules are famous because they keep being right. Old wisdom applied to new problems.&lt;/li&gt;&lt;li&gt;**How to read it**: You do not need to read all twelve. Pick your starting point from the guide below.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;A twelve-part series that takes you from &apos;what is an agent?&apos; to building self-improving systems.
Pick your starting point based on where you are today.&lt;/p&gt;</content:encoded><category>AI Products &amp; Strategy</category><category>ai-agents</category><category>series-guide</category><category>agent-architecture</category><category>ai-fundamentals</category></item><item><title>The Self-Improving Agent: From Static Prompts to Learning Systems</title><link>https://thedatapraxis.com/blog/self-improving-agent-learning-systems/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/self-improving-agent-learning-systems/</guid><description>Most AI agents run the same prompt every time. The best ones evolve. This article maps the spectrum from static to self-improving agents, introduces the inner loop / outer loop architecture, and walks through a real system that learns from feedback weekly. Pike&apos;s Rules 3-4 set the boundary: start simple, add complexity only when measurement demands it.</description><pubDate>Wed, 25 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: How to build agents that improve over time, from the simplest feedback loop (one agent, one file, one metric) through inner/outer loop architectures that separate daily execution from periodic learning.&lt;/li&gt;&lt;li&gt;**Who should read it**: AI engineers, data architects, platform engineers, and product leaders designing agent systems intended to run for months or years, not just once.&lt;/li&gt;&lt;li&gt;**Key takeaway**: Self-improvement follows Pike&apos;s Rules 3-4. Start with the simplest possible learning loop. Complexity compounds bugs. Add sophistication only when you have measured evidence that the simple version is insufficient.&lt;/li&gt;&lt;li&gt;**The uncomfortable truth**: Most teams skip straight to multi-agent orchestration, retrieval-augmented memory, and autonomous prompt evolution. The Karpathy Loop produced an 11% speedup with one agent, one file, and one metric. Fancy is not a feature.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Most AI agents run the same prompt every time. The best ones evolve. This article maps the spectrum from static to self-improving agents, introduces the inner loop / outer loop architecture, and walks through a real system that learns from feedback weekly. Pike&apos;s Rules 3-4 set the boundary: start simple, add complexity only when measurement demands it.&lt;/p&gt;</content:encoded><category>AI Products &amp; Strategy</category><category>ai-agents</category><category>self-improving-agents</category><category>agentic-engineering</category><category>learning-systems</category><category>judgment-in-the-loop</category><category>context-engineering</category></item><item><title>From Problem to Agent: An Implementation Reference Guide</title><link>https://thedatapraxis.com/blog/problem-to-agent-implementation-guide/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/problem-to-agent-implementation-guide/</guid><description>The series taught ten concepts across ten articles. This capstone walks through all of them applied to one problem: building a Data Quality monitoring agent. 
Seven steps, from problem definition through production deployment, showing the decision-making process that separates agent projects that ship from agent projects that stall.</description><pubDate>Wed, 25 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: A seven-step decision framework for building an AI agent, applied end-to-end to a Data Quality monitoring agent. Each step maps to a previous article in the series.&lt;/li&gt;&lt;li&gt;**Who should read it**: Data architects, platform engineers, AI product owners, and anyone who has read the earlier articles and wants to see the concepts composed into a single implementation.&lt;/li&gt;&lt;li&gt;**Key takeaway**: Building a good agent is 20% coding and 80% decision-making. The seven steps force you to answer the hard questions before you write the first line of code: Should this even be an agent? What context does it need? How will you know if it works?&lt;/li&gt;&lt;li&gt;**Why it matters now**: Most agent projects fail not from bad models but from skipped steps. This guide makes every step explicit so you can audit your own process against it.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;The series taught ten concepts across ten articles. This capstone walks through all of them applied to one problem: building a Data Quality monitoring agent. Seven steps, from problem definition through production deployment, showing the decision-making process that separates agent projects that ship from agent projects that stall.&lt;/p&gt;</content:encoded><category>AI Products &amp; Strategy</category><category>ai-agents</category><category>data-quality</category><category>implementation-guide</category><category>context-engineering</category><category>agent-architecture</category><category>agentic-engineering</category></item><item><title>What Is an AI Agent (and What Isn&apos;t)?</title><link>https://thedatapraxis.com/blog/what-is-an-ai-agent/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/what-is-an-ai-agent/</guid><description>An AI agent is a system that uses an LLM to decide which actions to take in a loop until a goal is met. This article breaks down the four components every agent shares, the spectrum from chatbot to autonomous agent, what tool calling actually looks like in code, and the design principles that separate good tool definitions from bad ones.</description><pubDate>Wed, 25 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: A clear definition of AI agents, the four components every agent shares (LLM, tools, memory, loop), and the spectrum from chatbot to autonomous agent.&lt;/li&gt;&lt;li&gt;**Who should read it**: Anyone hearing &apos;AI agents&apos; in meetings and wanting a precise mental model, from data practitioners to engineering leaders evaluating agent-based products.&lt;/li&gt;&lt;li&gt;**Key takeaway**: An agent is not a smarter chatbot. It is a loop: the LLM observes context, decides on an action, executes it via a tool, and repeats until the goal is met. The loop is what separates agents from everything else.&lt;/li&gt;&lt;li&gt;**Why it matters now**: Agents are becoming the primary interface to data platforms, APIs, and enterprise systems. 
Understanding what they are (and what they are not) is the prerequisite for building, evaluating, or governing them.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;An AI agent is a system that uses an LLM to decide which actions to take in a loop until a goal is met. This article breaks down the four components every agent shares, the spectrum from chatbot to autonomous agent, what tool calling actually looks like in code, and the design principles that separate good tool definitions from bad ones.&lt;/p&gt;</content:encoded><category>AI Products &amp; Strategy</category><category>ai-agents</category><category>agent-architecture</category><category>tool-calling</category><category>ai-fundamentals</category></item><item><title>When NOT to Build an Agent</title><link>https://thedatapraxis.com/blog/when-not-to-build-agent/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/when-not-to-build-agent/</guid><description>Not every problem needs an AI agent. This article gives you a decision framework for when agents are the wrong choice, with a comparison table, anti-patterns, and the Klarna case study that proves the cost of over-engineering.</description><pubDate>Wed, 25 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: A decision framework for when AI agents are the wrong tool, with six disqualifying criteria, an agent-vs-script comparison table, common anti-patterns, and the Klarna case study.&lt;/li&gt;&lt;li&gt;**Who should read it**: Engineering leaders evaluating agent proposals, product managers scoping AI features, and architects deciding between agents and traditional automation.&lt;/li&gt;&lt;li&gt;**Key takeaway**: If your task has deterministic logic, low-latency requirements, or failure modes where &apos;almost right&apos; is unacceptable, an agent will cost more, run slower, and fail in ways a script never would.&lt;/li&gt;&lt;li&gt;**Why it matters now**: Gartner estimates only 130 of thousands of &apos;agentic AI&apos; vendors have genuine agent capabilities. Most agent projects that get cancelled were agent projects that should never have been agent projects. The cheapest agent bug to fix is the one you avoid by not building an agent.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Not every problem needs an AI agent. This article gives you a decision framework for when agents are the wrong choice, with a comparison table, anti-patterns, and the Klarna case study that proves the cost of over-engineering.&lt;/p&gt;</content:encoded><category>AI Products &amp; Strategy</category><category>ai-agents</category><category>agent-architecture</category><category>decision-framework</category><category>ai-strategy</category><category>automation</category></item><item><title>From Vibe Coding to Agentic Engineering: What Karpathy&apos;s Shift Means for Data Work</title><link>https://thedatapraxis.com/blog/vibe-coding-to-agentic-engineering/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/vibe-coding-to-agentic-engineering/</guid><description>Andrej Karpathy hasn&apos;t written a line of code since December. His 80/20 flip from manual coding to agent orchestration is not a personal anecdote. 
It is the clearest signal yet that the value in data and AI work has shifted from execution to judgment.</description><pubDate>Tue, 24 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: How the shift from manual coding to agent orchestration, as described by Andrej Karpathy in his March 2026 No Priors interview, changes the role of data and AI practitioners.&lt;/li&gt;&lt;li&gt;**Who should read it**: Data architects, AI engineers, platform engineers, and engineering leaders navigating the transition from building systems to orchestrating agents.&lt;/li&gt;&lt;li&gt;**Key takeaway**: The value shift Karpathy describes (from writing code to directing agents) has a direct parallel in data work: from building pipelines to ensuring the quality of what flows through agent context windows.&lt;/li&gt;&lt;li&gt;**The uncomfortable truth**: Karpathy says agent failures are &apos;skill issues,&apos; not model limitations. The practitioners who thrive will not be the fastest coders but the ones who know how to structure context, evaluate output, and catch errors the agent cannot see.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Andrej Karpathy hasn&apos;t written a line of code since December. His 80/20 flip from manual coding to agent orchestration is not a personal anecdote. It is the clearest signal yet that the value in data and AI work has shifted from execution to judgment.&lt;/p&gt;</content:encoded><category>AI Products &amp; Strategy</category><category>ai-agents</category><category>agentic-engineering</category><category>context-engineering</category><category>coding-agents</category><category>judgment-in-the-loop</category><category>data-architecture</category></item><item><title>Your AI Agent Has a Data Quality Problem and No One Is Checking</title><link>https://thedatapraxis.com/blog/data-quality-problem-ai-agents/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/data-quality-problem-ai-agents/</guid><description>AI agents trust every tool response they receive, with no standardized quality controls between tool-calling outputs and LLM reasoning. This article maps the 6 traditional Data Quality dimensions onto the context window, exposing the most consequential unmonitored data pipeline in enterprise AI.</description><pubDate>Sat, 21 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: Why the context window is the newest and most consequential data pipeline in enterprise AI, and why it has no standardized Data Quality controls between tool outputs and LLM reasoning.&lt;/li&gt;&lt;li&gt;**Who should read it**: Data leaders, AI platform engineers, product managers, and risk officers deploying or evaluating agentic AI systems.&lt;/li&gt;&lt;li&gt;**Key takeaway**: The six Data Quality dimensions that enterprises spent two decades building into warehouses and pipelines (Accuracy, Completeness, Timeliness, Consistency, Validity, Uniqueness) have no standardized equivalent inside the context window. Most agents operate without input quality visibility.&lt;/li&gt;&lt;li&gt;**The uncomfortable truth**: Across 1,563 contaminated tool-output turns and 7 LLMs, no agent ever questioned tool-data reliability. 
If your agent cannot validate what it reads, your 85%-accurate system fails 80% of the time on a 10-step task.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;AI agents trust every tool response they receive, with no standardized quality controls between tool-calling outputs and LLM reasoning. This article maps the 6 traditional Data Quality dimensions onto the context window, exposing the most consequential unmonitored data pipeline in enterprise AI.&lt;/p&gt;</content:encoded><category>AI Governance &amp; Safety</category><category>ai-agents</category><category>data-quality</category><category>context-window</category><category>ai-governance</category><category>tool-calling</category><category>agent-safety</category></item><item><title>Judgment-in-the-Loop: The Human Role AI Cannot Automate</title><link>https://thedatapraxis.com/blog/judgment-in-the-loop-human-role-ai/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/judgment-in-the-loop-human-role-ai/</guid><description>Everyone talks about keeping a human in the loop. But which human, and what do they bring? The answer is judgment: domain knowledge, institutional memory, and the ability to recognize when AI output looks right but is wrong. This article defines that role and the evidence behind it.</description><pubDate>Sat, 21 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: A framework for understanding the human role in AI-augmented work, one defined not by execution speed but by applied judgment.&lt;/li&gt;&lt;li&gt;**Who should read it**: Data leaders, engineering managers, AI product owners, and individual contributors rethinking their value proposition in an AI-native world.&lt;/li&gt;&lt;li&gt;**Key takeaway**: The differentiator is not seniority itself but the applied judgment that experience brings. AI often amplifies existing judgment, domain knowledge, and direction more than it replaces them.&lt;/li&gt;&lt;li&gt;**The uncomfortable truth**: 89% of senior executives across four countries report no labor-productivity impact from AI over the last three years, yet wages in AI-exposed industries are rising 2x faster. In many workflows, the first visible value is capability expansion rather than raw throughput, and only people with domain expertise can direct it.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Everyone talks about keeping a human in the loop. But which human, and what do they bring? The answer is judgment: domain knowledge, institutional memory, and the ability to recognize when AI output looks right but is wrong. This article defines that role and the evidence behind it.&lt;/p&gt;</content:encoded><category>AI Governance &amp; Safety</category><category>ai-governance</category><category>judgment-in-the-loop</category><category>human-ai-collaboration</category><category>data-quality</category><category>capability-expansion</category></item><item><title>The Missing Data Quality Layer in AI Agent Architecture</title><link>https://thedatapraxis.com/blog/missing-quality-layer-ai-agent-architecture/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/missing-quality-layer-ai-agent-architecture/</guid><description>AI agent architectures have quality checks for input safety and output toxicity, but no standardized layer validates whether tool-calling results are accurate before they enter the context window. 
Here is what that missing layer should look like.</description><pubDate>Sat, 21 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: A technical analysis of where Data Quality checks exist in AI agent architectures, where they are missing, and what a fix looks like.&lt;/li&gt;&lt;li&gt;**Who should read it**: Data architects, ML engineers, platform engineers, and AI product owners building or deploying agentic AI systems.&lt;/li&gt;&lt;li&gt;**Key takeaway**: The data flow inside an AI agent crosses four boundaries. Three have validation. The one between tool results and the context window has no standardized quality gate, and it is the one that matters most.&lt;/li&gt;&lt;li&gt;**The uncomfortable truth**: RAG evaluation developed partial patterns for context quality (relevance scoring, faithfulness checking, contradiction detection). Nobody has adapted those patterns for general tool-calling results.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;AI agent architectures have quality checks for input safety and output toxicity, but no standardized layer validates whether tool-calling results are accurate before they enter the context window. Here is what that missing layer should look like.&lt;/p&gt;</content:encoded><category>AI Governance &amp; Safety</category><category>ai-agents</category><category>data-quality</category><category>context-engineering</category><category>ai-architecture</category><category>ai-governance</category><category>tool-calling</category></item><item><title>Welcome to The Data Praxis</title><link>https://thedatapraxis.com/blog/welcome-to-the-data-praxis/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/welcome-to-the-data-praxis/</guid><description>Why this blog exists, what it covers, and where to start. The Data Praxis bridges the gap between what Data Governance frameworks promise in conference talks and what actually works when you build these systems at scale.</description><pubDate>Sat, 21 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this is**: The launch post for The Data Praxis. Explains the blog&apos;s focus on practitioner-grounded analysis of Data Governance, AI, privacy, and the companies building these systems at scale.&lt;/li&gt;&lt;li&gt;**Who should read it**: Anyone new to the blog. Start here for orientation, then follow the links to the content that interests you.&lt;/li&gt;&lt;li&gt;**What you will find**: Company teardowns (Netflix, Apple, Uber), implementable frameworks (privacy programs, governance maturity models, AI oversight), regulatory analysis, and perspective essays on career and decision-making.&lt;/li&gt;&lt;li&gt;**Cadence**: Two articles per week, ongoing.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Why this blog exists, what it covers, and where to start. 
The Data Praxis bridges the gap between what Data Governance frameworks promise in conference talks and what actually works when you build these systems at scale.&lt;/p&gt;</content:encoded><category>Data Governance &amp; Management</category><category>data-governance</category><category>data-architecture</category><category>ai-governance</category><category>data-privacy</category></item><item><title>The Data Privacy Regulatory Landscape in 2026: GDPR, CCPA, AI Laws, and the Insurance Market for When AI Goes Wrong</title><link>https://thedatapraxis.com/blog/data-privacy-regulatory-landscape-2026/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/data-privacy-regulatory-landscape-2026/</guid><description>A practitioner&apos;s reference to the global privacy regulatory landscape. GDPR fines have crossed EUR 5.6 billion. Twenty US states have privacy laws with no federal standard. The EU AI Act is phasing in. And a new insurance market is emerging for AI agents that go off script. This is where the rules stand, what they require, and what is coming next.</description><pubDate>Wed, 18 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**Regulatory scope**: GDPR fines have reached EUR 5.65 billion across 2,245 enforcement actions. Twenty US states now have comprehensive privacy laws, with no federal standard in sight. The EU AI Act is phasing in high-risk obligations by August 2026.&lt;/li&gt;&lt;li&gt;**The fragmentation problem**: US organizations must now comply with up to 20 different state privacy regimes. Most follow the Virginia template, but key differences in data minimization requirements, applicability thresholds, and enforcement mechanisms create real compliance complexity.&lt;/li&gt;&lt;li&gt;**AI changes the equation**: The EU AI Act, NIST AI RMF, and ISO 42001 are creating a new layer of Data Governance obligations on top of existing privacy law. California&apos;s AB 316 eliminates the &apos;autonomous AI&apos; defense. AI liability insurance is an emerging market with policies covering up to $50 million.&lt;/li&gt;&lt;li&gt;**What to do**: Design for the most restrictive jurisdiction you operate in. Integrate AI governance into your existing privacy program. Evaluate AI liability insurance. The gap between regulation and enforcement is shrinking, not growing.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;A practitioner&apos;s reference to the global privacy regulatory landscape. GDPR fines have crossed EUR 5.6 billion. Twenty US states have privacy laws with no federal standard. The EU AI Act is phasing in. And a new insurance market is emerging for AI agents that go off script. This is where the rules stand, what they require, and what is coming next.&lt;/p&gt;</content:encoded><category>Data Governance &amp; Management</category><category>data-privacy</category><category>data-governance</category><category>gdpr</category><category>ccpa</category><category>ai-governance</category><category>eu-ai-act</category><category>regulatory-compliance</category><category>ai-liability</category><category>data-protection</category></item><item><title>How to Build a Privacy Program in the Age of AI</title><link>https://thedatapraxis.com/blog/privacy-program-age-of-ai/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/privacy-program-age-of-ai/</guid><description>A practitioner&apos;s framework for building a privacy program that treats AI data as a first-class concern. 
Covers Data Classification for training data, retention schedules for ML pipelines, consent architecture, third-party transparency, cross-border transfers, EU AI Act Article 10, NIST AI RMF, privacy-enhancing technologies, and governance operating models.</description><pubDate>Tue, 17 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: A practical framework for building a privacy program that accounts for AI and ML from day one. Includes Data Classification taxonomies, retention schedules, consent architecture, third-party transparency requirements, and a governance operating model with clear ownership.&lt;/li&gt;&lt;li&gt;**Who should read it**: Chief Privacy Officers, Data Governance leads, AI/ML engineering managers, and anyone responsible for standing up or modernizing a privacy program in an organization that builds or deploys AI systems.&lt;/li&gt;&lt;li&gt;**Key argument**: The EU AI Act (Article 10, effective August 2026), NIST AI RMF, and EDPB opinions are creating new Data Governance obligations that traditional privacy programs were never designed to handle. Bolting AI requirements onto an existing program will fail. The framework must be rebuilt with AI data as a first-class category.&lt;/li&gt;&lt;li&gt;**For practitioners**: Every section includes what to build, how to implement it, and what &apos;good&apos; looks like, grounded in real enforcement actions (Netflix&apos;s EUR 4.75M fine, OpenAI&apos;s EUR 15M fine, Clearview AI&apos;s EUR 100M+ in penalties) and regulatory requirements.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;A practitioner&apos;s framework for building a privacy program that treats AI data as a first-class concern. Covers Data Classification for training data, retention schedules for ML pipelines, consent architecture, third-party transparency, cross-border transfers, EU AI Act Article 10, NIST AI RMF, privacy-enhancing technologies, and governance operating models.&lt;/p&gt;</content:encoded><category>Data Governance &amp; Management</category><category>data-privacy</category><category>data-governance</category><category>ai-governance</category><category>eu-ai-act</category><category>gdpr</category><category>privacy-engineering</category><category>data-classification</category><category>machine-learning</category></item><item><title>Apple Privacy Teardown: When Privacy Is the Product, Where Does It Break Down?</title><link>https://thedatapraxis.com/blog/apple-privacy-teardown/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/apple-privacy-teardown/</guid><description>A Data Governance teardown of Apple&apos;s privacy practices. What Apple actually collects, how hardware margins fund privacy positioning, where Apple falls short on Siri, China, and its own ad network, and what practitioners can learn from privacy as a business strategy.</description><pubDate>Mon, 16 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: A teardown of Apple&apos;s privacy practices, from the policy itself to the business model that enables privacy positioning. 
Covers App Tracking Transparency, iCloud China, Siri&apos;s $95M settlement, Apple Intelligence&apos;s on-device architecture, and the growing Apple Ads business.&lt;/li&gt;&lt;li&gt;**Who should read it**: Data architects, privacy engineers, governance leaders, and anyone who uses Apple products and assumes &apos;privacy by design&apos; means privacy without gaps.&lt;/li&gt;&lt;li&gt;**Key finding**: Apple earns the highest privacy score among digital platforms (Ranking Digital Rights, 2025) and 79/100 from Common Sense Media for Apple TV+. But Apple&apos;s advertising revenue has grown to an estimated $7.4B in 2025, Siri recorded users without disclosure for over a decade, and iCloud data in China is operated by a state-owned entity. Privacy positioning has structural limits.&lt;/li&gt;&lt;li&gt;**For practitioners**: Apple&apos;s approach demonstrates that privacy is a business architecture decision, not a moral one. The lesson is not to copy Apple&apos;s features but to understand how incentive structures shape what is possible. Your privacy posture will reflect your revenue model whether you design for it or not.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;A Data Governance teardown of Apple&apos;s privacy practices. What Apple actually collects, how hardware margins fund privacy positioning, where Apple falls short on Siri, China, and its own ad network, and what practitioners can learn from privacy as a business strategy.&lt;/p&gt;</content:encoded><category>Data Governance &amp; Management</category><category>data-privacy</category><category>data-governance</category><category>apple</category><category>app-tracking-transparency</category><category>streaming</category><category>ai-governance</category></item><item><title>Netflix Privacy Policy Teardown: What 325 Million Subscribers Actually Agreed To</title><link>https://thedatapraxis.com/blog/netflix-privacy-policy-teardown/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/netflix-privacy-policy-teardown/</guid><description>A Data Governance teardown of Netflix&apos;s privacy policy. What they collect, who they share it with, how they compare to Apple TV+, what the €4.75M GDPR fine revealed, and what practitioners can learn from how Netflix structures its data practices.</description><pubDate>Sat, 14 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: A teardown of Netflix&apos;s privacy policy, benchmarked against Apple TV+ and scored by independent evaluators. Includes the €4.75M GDPR fine, the ad-tier data shift, and actionable lessons for Data Governance teams.&lt;/li&gt;&lt;li&gt;**Who should read it**: Data architects, privacy engineers, governance leaders, and anyone who subscribes to Netflix and has never read the policy they agreed to.&lt;/li&gt;&lt;li&gt;**Key finding**: Netflix scores 38/100 on privacy (Privacy Watchdog, Grade D) and 23.7/100 on readability (VPN Overview, 2024). Apple TV+ scores 79/100 on privacy (Common Sense Media, 2024). These are separate evaluators with different methodologies, but the direction is consistent: Netflix trails its peers.&lt;/li&gt;&lt;li&gt;**For practitioners**: Netflix&apos;s policy is a case study in the gap between legal disclosure and genuine user understanding. The data categorization structure is strong. The transparency is not.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;A Data Governance teardown of Netflix&apos;s privacy policy. 
What they collect, who they share it with, how they compare to Apple TV+, what the €4.75M GDPR fine revealed, and what practitioners can learn from how Netflix structures its data practices.&lt;/p&gt;</content:encoded><category>Data Governance &amp; Management</category><category>data-privacy</category><category>data-governance</category><category>gdpr</category><category>ccpa</category><category>netflix</category><category>streaming</category><category>ad-tech</category></item><item><title>Harness Engineering: The Real Lesson from OpenAI&apos;s Million-Line Experiment</title><link>https://thedatapraxis.com/blog/harness-engineering-product-teams/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/harness-engineering-product-teams/</guid><description>OpenAI built a million-line product in five months without writing code manually. Most coverage focused on the spectacle. The real insight is that harness engineering principles apply to every team building products today, with or without AI agents.</description><pubDate>Fri, 13 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: OpenAI&apos;s harness engineering experiment, where AI agents wrote 1 million lines of production code with zero manual coding, and the practical takeaways for product and data teams.&lt;/li&gt;&lt;li&gt;**Who should read it**: Engineering leaders, data architects, product managers, and anyone building with AI agents or wondering how AI changes their team&apos;s workflow.&lt;/li&gt;&lt;li&gt;**Key takeaway**: The most valuable engineering skill is shifting from writing code to designing constraints: linters, structural tests, and feedback loops that encode team standards into automated enforcement.&lt;/li&gt;&lt;li&gt;**The uncomfortable truth**: Most teams debating which AI model to use are optimizing the wrong variable. The harness, not the model, determines whether AI agents produce reliable output.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;OpenAI built a million-line product in five months without writing code manually. Most coverage focused on the spectacle. 
The real insight is that harness engineering principles apply to every team building products today, with or without AI agents.&lt;/p&gt;</content:encoded><category>AI Governance &amp; Safety</category><category>harness-engineering</category><category>ai-agents</category><category>engineering-practices</category><category>code-quality</category><category>developer-productivity</category></item><item><title>Metadata Management in 2026: Why Lineage Without Context Is Just Expensive Decoration</title><link>https://thedatapraxis.com/blog/metadata-management/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/metadata-management/</guid><description>Why most Metadata Management investments deliver lineage graphs nobody uses, and what an actionable metadata strategy actually looks like in 2026.</description><pubDate>Fri, 13 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: Why most Metadata Management investments deliver lineage graphs nobody uses, and what an actionable metadata strategy actually looks like in 2026&lt;/li&gt;&lt;li&gt;**Who should read it**: Data architects, platform engineers, governance leads, and anyone who has ever been asked &quot;why did we spend $500K on a Data Catalog nobody opens?&quot;&lt;/li&gt;&lt;li&gt;**Key takeaway**: Lineage tells you WHERE data flows. Without business context, quality signals, and usage analytics layered on top, it is expensive decoration. The shift from passive cataloging to active metadata orchestration is not optional. Multiple secondary sources attribute to Gartner the prediction that active metadata adoption will reach 30% by 2026&lt;/li&gt;&lt;li&gt;**Bottom line**: Stop buying tools. Start building a metadata strategy that connects lineage to meaning, meaning to trust, and trust to action&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Why most Metadata Management investments deliver lineage graphs nobody uses, and what an actionable metadata strategy actually looks like in 2026.&lt;/p&gt;</content:encoded><category>Metadata &amp; Data Quality</category><category>metadata-management</category><category>data-lineage</category><category>data-catalog</category><category>active-metadata</category><category>data-governance</category><category>semantic-layer</category></item><item><title>Netflix Got the Hard Parts Right: A Teardown of Their LLM Post-Training Framework</title><link>https://thedatapraxis.com/blog/netflix-post-training-teardown/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/netflix-post-training-teardown/</guid><description>Netflix published a detailed article on scaling LLM post-training. 
Here is what they built, what the engineering decisions reveal, what five peer companies are doing differently, and five open questions I would love their team to answer next.</description><pubDate>Mon, 23 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: A collaborative teardown of Netflix&apos;s February 2026 article on scaling LLM post-training, including peer company comparisons (Spotify, LinkedIn, Airbnb, Pinterest, Uber) and a build-vs-buy decision framework.&lt;/li&gt;&lt;li&gt;**Who should read it**: ML platform engineers, Data Architects, and engineering leaders evaluating whether to build or buy LLM fine-tuning infrastructure.&lt;/li&gt;&lt;li&gt;**Key takeaway**: Netflix&apos;s four-pillar framework and their SFT-to-RL pivot reflect mature infrastructure thinking. The engineering insights (4.7x throughput from async packing, HuggingFace as ecosystem anchor, Verl integration for RL) are applicable well beyond Netflix&apos;s scale.&lt;/li&gt;&lt;li&gt;**The uncomfortable truth**: The industry has robust frameworks for training LLMs and for guardrailing chatbots. It has almost nothing for guardrailing recommendation LLMs, and the offline-to-online eval gap remains the weakest link in every post-training pipeline.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Netflix published a detailed article on scaling LLM post-training. Here is what they built, what the engineering decisions reveal, what five peer companies are doing differently, and five open questions I would love their team to answer next.&lt;/p&gt;</content:encoded><category>Industry Teardowns</category><category>llm-fine-tuning</category><category>post-training</category><category>netflix</category><category>ml-infrastructure</category><category>reinforcement-learning</category><category>recommendation-systems</category></item><item><title>AI Governance Is Not AI Ethics: A Practical Framework for Enterprise AI Oversight</title><link>https://thedatapraxis.com/blog/ai-governance-practical-framework/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/ai-governance-practical-framework/</guid><description>A practical framework for enterprise AI Governance that maps NIST AI RMF, EU AI Act risk tiers, and SR 11-7 into a unified operating model with clear decision rights, risk classification, validation processes, and monitoring capabilities.</description><pubDate>Wed, 18 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: A practical framework for enterprise AI Governance that maps NIST AI RMF, EU AI Act risk tiers, and banking regulation (SR 11-7) into a unified operating model, not another ethics manifesto.&lt;/li&gt;&lt;li&gt;**Who should read it**: Data leaders, AI/ML platform owners, risk managers, and compliance officers responsible for standing up or maturing AI oversight programs.&lt;/li&gt;&lt;li&gt;**Key takeaway**: AI ethics tells you what you *believe*. AI Governance tells you what you *do*, and whether you can prove it to a regulator. Organizations that treat these as the same thing are building on sand.&lt;/li&gt;&lt;li&gt;**Bottom line**: PwC&apos;s 2024 US Responsible AI Survey found that only 11% of executives have fully implemented essential responsible AI capabilities. 
The gap between stated principles and operational controls is where regulatory, financial, and reputational risk lives.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;A practical framework for enterprise AI Governance that maps NIST AI RMF, EU AI Act risk tiers, and SR 11-7 into a unified operating model with clear decision rights, risk classification, validation processes, and monitoring capabilities.&lt;/p&gt;</content:encoded><category>AI Governance &amp; Safety</category><category>ai-governance</category><category>nist-ai-rmf</category><category>eu-ai-act</category><category>model-risk-management</category><category>enterprise-ai</category><category>three-lines-of-defense</category><category>iso-42001</category></item><item><title>Breaking Down Uber&apos;s Data Quality Platform: What Works, What Doesn&apos;t, and What It Means for the Rest of Us</title><link>https://thedatapraxis.com/blog/uber-data-quality-platform-teardown/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/uber-data-quality-platform-teardown/</guid><description>A critical analysis of Uber&apos;s five-system Data Quality ecosystem -- DQM, D3, UDQ, Databook, and DataCentral -- examining what is genuinely innovative, what is standard practice in disguise, and which patterns are worth adopting at any scale.</description><pubDate>Sun, 15 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: A critical analysis of Uber&apos;s multi-layered Data Quality platform (DQM, D3, UDQ, Databook, and DataCentral), what each component does, and where the engineering meets (or exceeds) the hype.&lt;/li&gt;&lt;li&gt;**Who should read it**: Data engineers, platform architects, and Data Quality leads evaluating whether Big Tech patterns apply to their organizations.&lt;/li&gt;&lt;li&gt;**Key takeaway**: Uber&apos;s Data Quality platform is genuinely innovative in its statistical approach to anomaly detection and its &quot;data as code&quot; cultural framework. 
But 80% of what makes it work is organizational commitment, not technology, and that is the part nobody can copy-paste.&lt;/li&gt;&lt;li&gt;**The honest assessment**: Most organizations will benefit more from Uber&apos;s *principles* than from replicating their *architecture*.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;A critical analysis of Uber&apos;s five-system Data Quality ecosystem -- DQM, D3, UDQ, Databook, and DataCentral -- examining what is genuinely innovative, what is standard practice in disguise, and which patterns are worth adopting at any scale.&lt;/p&gt;</content:encoded><category>Industry Teardowns</category><category>data-quality</category><category>uber</category><category>data-observability</category><category>DQM</category><category>D3</category><category>data-platform</category><category>industry-teardown</category></item><item><title>The Data Governance Maturity Model Most Organizations Get Wrong (And a Practical Alternative)</title><link>https://thedatapraxis.com/blog/data-governance-maturity-model/</link><guid isPermaLink="true">https://thedatapraxis.com/blog/data-governance-maturity-model/</guid><description>Why dominant Data Governance maturity models like CMMI DMM, DCAM, and Gartner measure documentation completeness instead of governance effectiveness, and a practical outcome-driven framework built around decision rights, Data Literacy, and measurable business impact.</description><pubDate>Thu, 12 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Executive Briefing:&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;**What this covers**: Why the dominant Data Governance maturity models (CMMI DMM, DCAM, Stanford, Gartner) measure documentation completeness rather than governance effectiveness, and what to do instead.&lt;/li&gt;&lt;li&gt;**Who should read it**: Data Governance leads, CDOs, data architects, and anyone tasked with standing up or rescuing a governance program.&lt;/li&gt;&lt;li&gt;**Key takeaway**: Maturity models that focus on process formalization create compliance theater. An outcome-driven governance framework built around decision rights, Data Literacy adoption, and measurable business impact delivers actual results.&lt;/li&gt;&lt;li&gt;**The uncomfortable truth**: Gartner predicts 80% of D&amp;A governance initiatives will fail by 2027. The maturity model you are using might be the reason.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Why dominant Data Governance maturity models like CMMI DMM, DCAM, and Gartner measure documentation completeness instead of governance effectiveness, and a practical outcome-driven framework built around decision rights, Data Literacy, and measurable business impact.&lt;/p&gt;</content:encoded><category>Data Governance &amp; Management</category><category>data-governance</category><category>maturity-model</category><category>CMMI-DMM</category><category>DCAM</category><category>decision-rights</category><category>data-literacy</category><category>business-outcomes</category></item></channel></rss>