The LLM Is Not the Product: Why Harness Design Defines Enterprise AI Success

Every executive has seen the demo. A single prompt produces a complete strategy document, a functional application, or a deep data analysis. The capability is undeniably impressive, but as organizations attempt to scale these tools, they encounter a frustrating reality: the pilot stalls. Gartner predicts that over 40% of agentic AI projects will be canceled by the end of 2027 due to escalating costs, unclear business value, or inadequate risk controls [1].

The problem is rarely the model itself. Instead, the failure stems from a fundamental misunderstanding of what an AI product actually is. Treating a Large Language Model (LLM) as a finished product is like mistaking a car engine for the entire vehicle. To convert raw intelligence into reliable business value, organizations need an orchestration layer, a system around the model known as a harness. This post explores why harness design is the true differentiator for enterprise AI success.

So What Exactly Is a Harness?

Think of it this way. An LLM is like a brilliant new hire who has read every textbook ever written but has never worked a single day at your company. This person can reason, write, and analyze, but they do not know your processes, approval chains, compliance rules, or where the important files are stored. Left alone, they will produce impressive-sounding work that may or may not be usable.

A harness is everything you wrap around that brilliant new hire to make them productive and safe. It is the onboarding, project management, quality reviews, access controls, and audit trail, all built into a system that runs automatically.

[Figure: Model, Agent, and Harness. The model supplies intelligence; the harness includes Tools, Memory, Context Engineering, Sandbox, Orchestration, and the Serving Layer.]
Agent = Model + Harness. The harness is everything that isn’t the model.

In practical terms, a harness comprises several key components that work together around the model. The diagram below shows how they fit together, with the LLM at the center and the harness surrounding it.

[Figure: Agent, model, and harness, with the harness handling context assembly, tool access, memory, skills, output validation, action routing, the feedback loop, and observability.]
The model reasons. The harness orchestrates, governs, and scales everything else.

Let us walk through each component:

Context Assembly: Curate what the model sees.

This is the gatekeeper of information. It curates what the model actually sees: the relevant data, queries, and events for each specific task. Think of it as a chief of staff who prepares a briefing folder before a meeting. Without context assembly, the model would either see too much (causing confusion and higher costs) or too little (causing poor decisions). A well-designed context assembly layer ensures the model always works with the right information at the right time.
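To make the briefing-folder idea concrete, here is a minimal Python sketch of context assembly. It assumes a naive keyword-overlap relevance score and uses a word count as a rough stand-in for tokens; `Snippet`, `score`, and `assemble_context` are hypothetical names, and a production harness would more likely rely on a search index or embeddings plus a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    source: str  # where the text came from (CRM, wiki, ticket system, ...)
    text: str


def score(snippet: Snippet, query: str) -> int:
    """Naive relevance: how many distinct query words appear in the snippet."""
    words = set(snippet.text.lower().split())
    return sum(1 for w in set(query.lower().split()) if w in words)


def assemble_context(query: str, snippets: list[Snippet], max_words: int) -> list[Snippet]:
    """Keep the most relevant snippets that fit the budget; drop irrelevant ones entirely."""
    ranked = sorted(snippets, key=lambda s: score(s, query), reverse=True)
    chosen, used = [], 0
    for s in ranked:
        if score(s, query) == 0:
            break  # everything after this point is irrelevant to the query
        cost = len(s.text.split())  # word count as a rough proxy for tokens
        if used + cost <= max_words:
            chosen.append(s)
            used += cost
    return chosen


docs = [
    Snippet("crm", "customer Acme renewal date is March"),
    Snippet("wiki", "office lunch menu for Friday"),
    Snippet("tickets", "Acme opened two support tickets about renewal pricing"),
]
print([s.source for s in assemble_context("Acme renewal", docs, max_words=20)])
```

Running it prints `['crm', 'tickets']`: the lunch menu never reaches the model. Lowering `max_words` to 10 keeps only the CRM record, which is the "too much versus too little" trade-off expressed as a budget.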

Orchestrator: Plans, delegates, coordinates.

This is the project manager of the system. It plans the work, delegates tasks to the right components, and coordinates the overall flow. When a business request arrives, the orchestrator decides what needs to happen first, what can run in parallel, and what depends on something else finishing. It keeps the entire process moving in the right direction without human micromanagement.

Tool Access: Managed credentials + connections.

An AI model on its own cannot connect to your CRM, your database, or your internal APIs. The tool access layer manages those connections with proper credentials and security controls. It is like giving a new employee a company laptop with pre-configured access. They can access the systems they need, but only those they are authorized to use. This prevents the model from accessing sensitive systems it should not touch.
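The same idea in code: a minimal sketch assuming a hypothetical `ToolRegistry` that holds every connection and a per-role allowlist. Credentials live inside the registered functions (here a placeholder closure standing in for a real CRM client), so the model only ever sees tool names and results, never secrets.

```python
from typing import Any, Callable


class ToolAccessError(PermissionError):
    """Raised when an agent role calls a tool it has not been granted."""


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., Any]] = {}
        self._grants: dict[str, set[str]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def grant(self, role: str, *tool_names: str) -> None:
        unknown = [t for t in tool_names if t not in self._tools]
        if unknown:
            raise KeyError(f"unknown tools: {unknown}")
        self._grants.setdefault(role, set()).update(tool_names)

    def call(self, role: str, tool_name: str, **kwargs: Any) -> Any:
        # Deny by default: a role can only call tools explicitly granted to it.
        if tool_name not in self._grants.get(role, set()):
            raise ToolAccessError(f"{role!r} may not call {tool_name!r}")
        return self._tools[tool_name](**kwargs)


def make_crm_lookup(api_key: str) -> Callable[..., dict]:
    # Placeholder for a real CRM client; the key stays in this closure, not in the prompt.
    def crm_lookup(customer: str) -> dict:
        return {"customer": customer, "tier": "gold", "authenticated": bool(api_key)}
    return crm_lookup


registry = ToolRegistry()
registry.register("crm_lookup", make_crm_lookup(api_key="placeholder-secret"))
registry.register("payroll_export", lambda: "salary data")
registry.grant("support_agent", "crm_lookup")

print(registry.call("support_agent", "crm_lookup", customer="Acme"))
```

Calling `payroll_export` as `support_agent` raises `ToolAccessError`: the tool exists in the registry but was never granted to that role, which is the "company laptop" rule applied automatically.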

Memory: Short-term and long-term recall.

Models are stateless by default: they retain nothing between requests, so anything not supplied in the current context is effectively forgotten. The memory component gives the system both short-term recall (what happened earlier in this task) and long-term recall (what happened in previous tasks). This is what allows an AI worker to pick up where it left off after an interruption, remember decisions made last week, and avoid repeating the same mistakes.
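A minimal sketch of the two kinds of recall, assuming a bounded in-process buffer for short-term memory and a JSON file for long-term facts. The `Memory` class and its file layout are hypothetical; real harnesses often use a database or vector store for the long-term side.

```python
import json
import tempfile
from collections import deque
from pathlib import Path


class Memory:
    def __init__(self, store_path: Path, short_term_size: int = 3) -> None:
        # Short-term: the most recent events of the current task; older ones fall off.
        self.short_term = deque(maxlen=short_term_size)
        # Long-term: facts persisted to disk so later tasks (and restarts) can reuse them.
        self.store_path = store_path

    def note(self, event: str) -> None:
        self.short_term.append(event)

    def load_facts(self) -> dict[str, str]:
        if self.store_path.exists():
            return json.loads(self.store_path.read_text(encoding="utf-8"))
        return {}

    def save_fact(self, key: str, value: str) -> None:
        facts = self.load_facts()
        facts[key] = value
        self.store_path.write_text(json.dumps(facts), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "facts.json"
    today = Memory(path)
    for step in ["read spec", "drafted plan", "ran tests", "fixed bug"]:
        today.note(step)
    today.save_fact("release_decision", "ship behind feature flag")

    next_week = Memory(path)  # a fresh session, as after a restart
    print(list(today.short_term))
    print(next_week.load_facts())
```

The short-term buffer keeps only the last three steps, while the release decision survives into a completely new `Memory` instance because it lives outside the model and outside the session.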

Skills and Sub-agents: Reusable capabilities (search, code, analyze) and specialized workers for complex tasks.

These are reusable capabilities and specialized workers. Skills are predefined abilities the model can call on, such as searching, coding, analyzing data, and generating reports. Sub-agents are specialized workers who handle complex subtasks. Together, they allow the system to break large problems into smaller pieces and assign each piece to the component best suited to handle it.
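As an illustration only, skills can be modeled as a registry of named functions, and a sub-agent as a worker that chains several skills to finish one subtask. All names here (`SKILLS`, `research_subagent`, `run_plan`) are hypothetical, and the skill bodies are stubs standing in for real search or analysis calls.

```python
from typing import Callable

# Reusable skills: small, named capabilities (stubs standing in for real implementations).
SKILLS: dict[str, Callable[[str], str]] = {
    "search": lambda topic: f"sources on {topic}",
    "analyze": lambda data: f"findings from {data}",
    "report": lambda findings: f"report: {findings}",
}


def research_subagent(topic: str) -> str:
    """A specialized worker that chains skills to handle one complex subtask end to end."""
    return SKILLS["analyze"](SKILLS["search"](topic))


WORKERS: dict[str, Callable[[str], str]] = {**SKILLS, "research": research_subagent}


def run_plan(plan: list[tuple[str, str]]) -> list[str]:
    """Send each piece of a decomposed task to the worker registered for it."""
    results = []
    for worker_name, payload in plan:
        if worker_name not in WORKERS:
            raise KeyError(f"no skill or sub-agent named {worker_name!r}")
        results.append(WORKERS[worker_name](payload))
    return results


print(run_plan([("research", "churn in Q3"), ("report", "churn drivers")]))
```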

Output Validation: Guardrails before action.

Before any action is taken, this layer checks the results against guardrails. Is the output safe? Does it comply with company policies? Is it factually consistent? Output validation acts as a quality inspector on a production line. Nothing leaves the factory floor without passing inspection. This is especially critical in regulated industries where a wrong output could trigger compliance violations.
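A minimal sketch of the inspection step in Python, assuming each guardrail can be written as a function that returns an error message or `None`. The regex, the banned-phrase list, and the length limit are illustrative placeholders rather than a real compliance rule set; factual-consistency checks usually need a separate evaluator model.

```python
import re
from typing import Callable, Optional

# A validator returns an error message, or None when the output passes.
Validator = Callable[[str], Optional[str]]


def no_ssn(text: str) -> Optional[str]:
    if re.search(r"\b\d{3}-\d{2}-\d{4}\b", text):
        return "possible Social Security number in output"
    return None


def no_banned_phrases(text: str) -> Optional[str]:
    banned = ["guaranteed returns", "risk-free"]  # example policy list only
    hits = [p for p in banned if p in text.lower()]
    return f"prohibited phrases: {', '.join(hits)}" if hits else None


def max_length(limit: int) -> Validator:
    def check(text: str) -> Optional[str]:
        return f"output exceeds {limit} characters" if len(text) > limit else None
    return check


def validate(text: str, validators: list[Validator]) -> list[str]:
    """Run every guardrail and collect all failures, so nothing ships on a partial check."""
    return [msg for check in validators if (msg := check(text)) is not None]


CHECKS: list[Validator] = [no_ssn, no_banned_phrases, max_length(500)]

print(validate("Quarterly summary attached.", CHECKS))
print(validate("This fund offers guaranteed returns. Client SSN 123-45-6789.", CHECKS))
```

An empty list means the output may proceed to action routing; any message blocks it and can be fed back to the generator as a concrete defect.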

Action Routing: Execute, review, or escalate, based on confidence, rules, and risk.

Not every output should be executed automatically. The action routing layer decides what happens next based on confidence levels, business rules, and risk thresholds. Low-risk, high-confidence results get executed immediately. Medium-risk outputs go to a human for review. High-risk or uncertain outputs get escalated to senior decision-makers. This is how the harness balances speed with safety.
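The routing rule above can be sketched as a small function, assuming the harness already has a confidence score between 0 and 1 and a risk tier for each proposed action. The thresholds (0.9 and 0.5) are hypothetical defaults, not recommendations; in practice they would be tuned per workflow.

```python
from enum import Enum


class Route(Enum):
    EXECUTE = "execute"
    HUMAN_REVIEW = "human_review"
    ESCALATE = "escalate"


RISK_LEVELS = {"low", "medium", "high"}


def route_action(confidence: float, risk: str,
                 execute_at: float = 0.9, uncertain_below: float = 0.5) -> Route:
    """Map a (confidence, risk) pair to what the harness does next."""
    if risk not in RISK_LEVELS:
        raise ValueError(f"unknown risk level: {risk!r}")
    if risk == "high" or confidence < uncertain_below:
        return Route.ESCALATE  # high-risk or uncertain: senior decision-maker
    if risk == "low" and confidence >= execute_at:
        return Route.EXECUTE  # low-risk, high-confidence: act immediately
    return Route.HUMAN_REVIEW  # everything in between goes to a reviewer


for case in [(0.95, "low"), (0.95, "medium"), (0.70, "low"), (0.95, "high"), (0.30, "low")]:
    print(case, route_action(*case).value)
```

A low-risk action with only moderate confidence (0.70) still lands in human review in this sketch. Whether that case should execute is exactly the kind of business rule a harness should make explicit rather than leave to the model.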

Feedback Loop: Learn from outcomes.

This is how the system learns. Every time an output is accepted, rejected, or corrected, that outcome flows back into the system. Within a single run, the feedback loop allows the evaluator to send defects back to the generator for another iteration, sometimes 5 to 15 rounds, until quality passes. Across runs, accumulated feedback helps teams tune prompts, adjust criteria, and improve performance over time. Without a feedback loop, every mistake is a surprise. With one, mistakes become data that prevents the same failure from recurring.
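The across-run half of the loop can be sketched as a simple outcome ledger, assuming each reviewed output is tagged with the prompt version that produced it. `FeedbackLog` and the version labels are hypothetical; the within-run generator and evaluator iteration is a separate mechanism.

```python
from collections import Counter

OUTCOMES = {"accepted", "rejected", "corrected"}


class FeedbackLog:
    """Accumulates reviewer outcomes per prompt version so teams can compare versions over time."""

    def __init__(self) -> None:
        self._counts: dict[str, Counter] = {}

    def record(self, prompt_version: str, outcome: str) -> None:
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome!r}")
        self._counts.setdefault(prompt_version, Counter())[outcome] += 1

    def acceptance_rate(self, prompt_version: str) -> float:
        counts = self._counts.get(prompt_version, Counter())
        total = sum(counts.values())
        return counts["accepted"] / total if total else 0.0

    def best_version(self) -> str:
        return max(self._counts, key=self.acceptance_rate)


log = FeedbackLog()
for outcome in ["accepted", "rejected", "corrected", "accepted"]:
    log.record("prompt-v1", outcome)
for outcome in ["accepted", "accepted", "accepted", "corrected"]:
    log.record("prompt-v2", outcome)

print(log.acceptance_rate("prompt-v1"), log.acceptance_rate("prompt-v2"), log.best_version())
```

Here `prompt-v1` is accepted half the time and `prompt-v2` three times out of four, so the ledger points the team toward `prompt-v2`. That is the "mistakes become data" idea in its simplest form.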

Observability: Every stage inspectable and auditable.

This is the foundation that makes everything auditable. Every stage of the process (every decision, every tool call, every approval, every output) is logged and inspectable. When something goes wrong, observability allows teams to trace exactly what happened and why. For compliance and governance, it provides the evidence trail that regulators and auditors require. Without observability, AI-driven work is an opaque black box that no organization can responsibly trust at scale.
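A minimal sketch of that evidence trail in Python, assuming structured JSON Lines records keyed by a per-run trace ID. `AuditTrail` and the stage names are hypothetical; production systems usually send such events to a logging or tracing backend rather than keeping them in memory.

```python
import json
import time
import uuid
from typing import Any, Optional


class AuditTrail:
    """Append-only record of every harness stage, tagged with one trace ID per run."""

    def __init__(self, trace_id: Optional[str] = None) -> None:
        self.trace_id = trace_id or uuid.uuid4().hex
        self._records: list[dict[str, Any]] = []

    def log(self, stage: str, **details: Any) -> None:
        # Core fields go last so caller-supplied details cannot overwrite them.
        self._records.append(
            {**details, "trace_id": self.trace_id, "ts": time.time(), "stage": stage}
        )

    def to_jsonl(self) -> str:
        # One JSON object per line: easy to ship to a log store and to search during an audit.
        return "\n".join(json.dumps(r, sort_keys=True) for r in self._records)


trail = AuditTrail(trace_id="run-42")
trail.log("context_assembly", sources=["crm", "tickets"])
trail.log("tool_call", tool="crm_lookup", customer="Acme")
trail.log("approval", approver="j.doe", decision="approved")
print(trail.to_jsonl())
```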

The key insight is simple: Agent = Model + Harness. Without the harness, you have raw intelligence with no delivery system. With it, you have a governed digital worker that can operate reliably inside your business.

Example: Using an LLM vs. Having a Harness

Using an LLM directly is like asking a very smart person a question without giving them a process to follow. For example, you could send a full patent document to the model and ask: “Translate this document into Portuguese.” The model may produce a good answer, but the process is fragile. There is no guarantee that every section was processed, that the terminology followed your internal glossary, that long documents were handled correctly, or that the final output was validated.

Having a harness changes this completely. Instead of simply sending one prompt to the model, the harness controls the entire workflow around it. It loads the document, extracts the text, splits the content safely when needed, applies a technical glossary, sends each section to the LLM with specific instructions, reviews the output, validates whether all sections were processed, logs each step, and finally exports the result in the required format.

A simple LLM-based approach looks like this:

Document + Prompt → LLM → Answer

A harness-based approach looks like this:

Document Upload
Text Extraction
Safe Splitting
Glossary Injection
LLM Processing
Review and Validation
Traceability and Logs
Final Export

The key difference is that the LLM generates the content, but the harness manages the process. In business-critical scenarios, such as patent translation, legal document analysis, compliance review, or technical due diligence, this distinction is essential. The real product is not just the model response. The real product is the controlled system that makes the model useful, reliable, auditable, and repeatable.
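The harness-based flow above can be sketched in a few dozen lines of Python. This is a minimal illustration, not a production translator: it assumes the document has already been extracted to plain text with blank lines between paragraphs, and `fake_llm` is a hypothetical stand-in for a real model API call (it only applies the glossary substitutions it is given, so the example runs offline). Function names such as `split_safely` and `translate_document` are illustrative.

```python
import logging
from typing import Callable

log = logging.getLogger("harness")


def split_safely(text: str, max_chars: int) -> list[str]:
    """Group paragraphs into sections of at most max_chars, never cutting a paragraph in half.
    A single paragraph longer than max_chars becomes its own (oversized) section."""
    sections: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            sections.append(current)
            current = para
        else:
            current = candidate
    if current:
        sections.append(current)
    return sections


def build_prompt(section: str, glossary: dict[str, str]) -> str:
    """Inject only the glossary terms that actually occur in this section."""
    rules = [f"- Translate '{src}' as '{dst}'"
             for src, dst in glossary.items() if src.lower() in section.lower()]
    return "Translate into Portuguese.\nGlossary:\n" + "\n".join(rules) + "\n\nText:\n" + section


def translate_document(text: str, glossary: dict[str, str],
                       llm: Callable[[str], str], max_chars: int = 2000) -> str:
    sections = split_safely(text, max_chars)
    log.info("split into %d sections", len(sections))
    outputs = []
    for i, section in enumerate(sections):
        result = llm(build_prompt(section, glossary))
        # Validation: non-empty output and every required glossary term present.
        missing = [dst for src, dst in glossary.items()
                   if src.lower() in section.lower() and dst.lower() not in result.lower()]
        if not result.strip() or missing:
            raise ValueError(f"section {i} failed validation (missing terms: {missing})")
        log.info("section %d validated", i)
        outputs.append(result)
    return "\n\n".join(outputs)  # export only after every section has passed


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call: applies the glossary rules it was given, nothing more."""
    text = prompt.split("\n\nText:\n", 1)[1]
    for line in prompt.splitlines():
        if line.startswith("- Translate '"):
            src, dst = line[len("- Translate '"):-1].split("' as '")
            text = text.replace(src, dst)
    return text


logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
doc = "The claim covers a sensor.\n\nEach sensor has a housing."
print(translate_document(doc, {"claim": "reivindicação", "housing": "carcaça"}, fake_llm, max_chars=30))
```

The design choice worth noticing is that export happens only after every section has passed validation: a failure in one section stops the run with a specific error instead of silently producing a partial translation, and the log records which sections were processed.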

The Model Is the Engine, the Harness Is the Delivery System

An LLM is a reasoning engine. It can generate content, analyze inputs, and follow complex instructions. However, it does not inherently understand your company’s risk framework, operational key performance indicators (KPIs), or compliance obligations. It cannot preserve its state across interruptions or provide the audit trails required by regulators.

A harness acts as the operating model and control plane for this AI worker. It turns a capable but isolated model into a governed digital team member. The harness manages the prompts, orchestrates logic, handles memory and context, evaluates outputs, and enforces runtime controls. As Anthropic recently demonstrated in its engineering research, changing the harness materially altered what Claude could deliver over multi-hour software-building sessions [2].

Without a harness, an AI agent produces outputs that someone else must manually verify and integrate. With a harness, the agent operates inside the workflow as a coherent, accountable participant. This distinction is critical because corporate outcomes depend less on one-shot intelligence and more on continuity, exception handling, evidence, and repeatability.

Breaking Down the Enterprise Harness

To understand how a harness functions, consider the architecture required for long-running autonomous work. A robust production operating model typically separates the AI into distinct roles. This keeps the model from grading its own homework, a common failure mode in which models rate their own output too favorably (self-evaluation bias).

Anthropic’s recent architecture for application development uses a three-agent pattern: a planner, a generator, and an evaluator [2]. The planner expands a brief request into a structured specification. The generator executes the work against that spec. Finally, the evaluator acts as an independent quality gate, testing the result against explicit criteria and feeding concrete defects back to the generator for another round of iteration.
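A toy version of that loop in Python, under heavy simplifying assumptions: the brief is a comma-separated feature list, the generator is deterministic and deliberately incomplete on its first pass, and the evaluator only checks which criteria are missing. In a real harness each role would be a separate model call with its own prompt, and these function names are hypothetical. What the sketch does show is the control flow: the generator never grades itself, and the loop ends either when the independent gate passes or when a round limit is reached.

```python
def planner(brief: str) -> list[str]:
    """Expand a short brief into explicit, checkable acceptance criteria."""
    criteria = [f"feature: {item.strip()}" for item in brief.split(",") if item.strip()]
    if not criteria:
        raise ValueError("brief produced no criteria")
    return criteria


def generator(criteria: list[str], defects: list[str], previous: set[str]) -> set[str]:
    """Toy builder: the first pass delivers only the first criterion; later passes fix reported defects."""
    built = set(previous) if previous else {criteria[0]}
    built.update(defects)
    return built


def evaluator(criteria: list[str], artifact: set[str]) -> list[str]:
    """Independent quality gate: every unmet criterion is reported as a concrete defect."""
    return [c for c in criteria if c not in artifact]


def run(brief: str, max_rounds: int = 5) -> tuple[set[str], int]:
    criteria = planner(brief)
    artifact: set[str] = set()
    defects: list[str] = []
    for round_no in range(1, max_rounds + 1):
        artifact = generator(criteria, defects, artifact)
        defects = evaluator(criteria, artifact)
        if not defects:
            return artifact, round_no
    raise RuntimeError(f"quality gate not passed after {max_rounds} rounds: {defects}")


artifact, rounds = run("login page, password reset, audit log")
print(rounds, sorted(artifact))
```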

This separation of concerns is vital. In subjective tasks, an evaluator tuned to be skeptical is far more effective than a generator trying to be critical of its own work. The harness also manages operational continuity. By using progress files, structured handoffs, and version control logs, the system ensures that if a process is interrupted, the next session starts with a clean slate but full context.
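The progress-file idea can be sketched with nothing more than a JSON file, assuming each unit of work has a stable name. `run_session` and the file layout are hypothetical; the point is that a new session reloads what is already done instead of relying on anything the model remembers. Writing to a temporary file and then renaming it keeps the progress file from being left half-written if the process dies mid-save.

```python
import json
import tempfile
from pathlib import Path


def load_progress(path: Path) -> dict:
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"completed": []}


def save_progress(path: Path, progress: dict) -> None:
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(progress, indent=2), encoding="utf-8")
    tmp.replace(path)  # rename over the old file (atomic on POSIX filesystems)


def run_session(path: Path, tasks: list[str], budget: int) -> list[str]:
    """Complete up to `budget` tasks, skipping anything an earlier session already finished."""
    progress = load_progress(path)
    done_this_session = 0
    for task in tasks:
        if task in progress["completed"]:
            continue
        if done_this_session == budget:
            break  # simulate the session ending (timeout, crash, context limit)
        # ... the real work for `task` would happen here ...
        progress["completed"].append(task)
        save_progress(path, progress)  # checkpoint after every task, not just at the end
        done_this_session += 1
    return progress["completed"]


with tempfile.TemporaryDirectory() as tmp:
    progress_file = Path(tmp) / "progress.json"
    tasks = ["schema", "api", "ui", "tests"]
    print(run_session(progress_file, tasks, budget=2))  # first session stops early
    print(run_session(progress_file, tasks, budget=2))  # second session resumes where it left off
```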

| Harness Element | Corporate Translation | Primary Business Benefit |
| --- | --- | --- |
| Planner | Automated scoping and requirements expansion | Reduces under-scoping and ensures work starts from a proper specification rather than a vague prompt. |
| Generator | Autonomous execution engine | Converts approved work packages into code, artifacts, or actions at scale. |
| Evaluator | Independent quality gate | Catches defects before release, reducing the risk of self-approval bias. |
| Progress Files & Logs | Operational continuity | Preserves organizational memory across sessions, failures, and personnel changes. |
| Human Approvals | Risk-tier governance gate | Keeps irreversible or sensitive actions under explicit control. |

[Figure: Five-step flow from business request through planning, generating, and evaluating to final acceptance, with a harness layer for state management, audit logs, human approvals, cost controls, and observability.]
A simplified workflow showing how the harness layer orchestrates the process from business request to accepted deliverable.

The ROI of Orchestration

The business value of a well-designed harness is not merely “better prompts.” It is fundamentally about better delivery economics. A strong harness reduces abandoned runs, minimizes rework, strengthens reliability, and creates clear audit trails.

Consider the cost dynamics. Anthropic noted that a sophisticated browser-based application built with its updated harness took nearly four hours and cost about $124.70 in token usage [2]. While this single-run cost might seem high compared to a standard chat query, the QA loop caught meaningful feature gaps that the builder missed. The harness converts cheap-looking but flawed AI output into deliverable, production-ready work, significantly reducing the expensive downstream costs of failure.
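Because every generator and evaluator round consumes tokens, a harness typically tracks spend per run. A minimal sketch, assuming the model API reports input and output token counts after each call; `CostGuard` is a hypothetical name, and the per-million-token prices below are placeholders, not any vendor's actual rates.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    pass


@dataclass
class CostGuard:
    budget_usd: float
    usd_per_million_input: float   # placeholder price; check your provider's price sheet
    usd_per_million_output: float  # placeholder price
    spent_usd: float = 0.0

    def charge(self, input_tokens: int, output_tokens: int) -> float:
        """Record one completed model call and stop the run once cumulative spend passes the budget."""
        cost = (input_tokens * self.usd_per_million_input
                + output_tokens * self.usd_per_million_output) / 1_000_000
        self.spent_usd += cost
        if self.spent_usd > self.budget_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.budget_usd:.2f} budget; stopping run")
        return cost


guard = CostGuard(budget_usd=1.00, usd_per_million_input=3.0, usd_per_million_output=15.0)
print(round(guard.charge(input_tokens=100_000, output_tokens=20_000), 2))  # $0.30 input + $0.30 output
```

The guard cannot un-spend a call that has already happened; it can only stop the run from continuing. That is why the limit should sit comfortably above the expected cost of a healthy run while still catching a loop that never converges.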

Real-world results support this approach. Palo Alto Networks reported that junior developers completed integration tasks 70% faster with Claude assistance [3]. Headstart saw software development accelerate by 10 to 100 times, with project timelines reduced from months to weeks [4]. In adjacent durable workflow systems, OneMain Financial achieved a 97.5% reduction in investigation time for security operations using AWS Step Functions [5]. What these outcomes have in common is that the AI work sits inside a production delivery system that handles the heavy lifting of coordination and verification.

How the Market Leaders Compare

To make this concrete, let us look at how the leading AI coding tools on the market handle harness design today. The core intelligence (the model) is increasingly similar across vendors, but the harness is what determines how these tools fit into your business.

Based on current architectures, the top five players fall along a spectrum that reflects how they balance local developer control against centralized enterprise governance:

Claude Code is the most developer-operated and local-first harness. It gives engineers deep control over context, sub-agents, and planning. It is explicit about how it manages memory and treats the process as something to be engineered. This makes it incredibly powerful for fast-moving developers who want to keep execution local, but its enterprise observability features are less prominent than its developer tools.

[Figure: Claude Code architecture, showing Input, Knowledge, Integration, Execution, Output, Observability, and Multi-Agent layers, with components such as the User Interface, Session Manager, Master Agent Loop, and Tool Dispatch.]

GitHub Copilot is the most platform-native harness. It embeds the agent directly into the repository, pull requests, and the enterprise audit surface that software organizations already use. Because it ties directly into GitHub’s existing governance, it offers the strongest executive reporting, audit logs, and policy controls. If your goal is standardizing AI work inside an existing software-delivery system, Copilot is the benchmark.

[Figure: GitHub Copilot agent loop between the user, machine, Copilot, workspace, and tools.]

Cursor is the most polished example of a hybrid approach. It started as a local editor and has built a seamless handoff between local coding and cloud-based background agents. It is optimized for user experience and speed, making it highly popular with startups and product teams. It also offers a strong privacy guarantee (no training on your code), though its most ambitious long-running autonomous features are still evolving.

[Figure: Cursor agent architecture and efficiency techniques, covering user requests, routing, tools, code retrieval, and model execution.]

OpenAI Codex is the cleanest example of a deliberately split architecture. It offers a local tool for interactive work and a separate cloud environment for long-running, parallel tasks. It is unusually transparent about how it handles state, sandbox security, and governance. Like GitHub Copilot, it provides a strong analytics dashboard and compliance API, making it a safe choice for organizations that want strict separation between local and cloud execution.

[Figure: OpenAI Codex architecture, with layers for task processing, planning, editing, execution, analysis, summarization, and review, along with inputs, outputs, and architectural principles.]

Google Antigravity is the most conceptually ambitious, designed from the start as an “agentic development platform” rather than an assistant. It introduces new concepts like Artifacts for human review and an Agent Manager to coordinate multiple agents. However, it is the least mature of the group, with many features still in preview. It points to where the market is going, but calls for more caution before it becomes an enterprise standard.

[Figure: Google Antigravity architecture, an agent-first development platform: developer inputs, the Antigravity control surface, agent orchestration, workspace context, execution, verification and feedback, and output layers, with safety and governance controls.]

Conclusion

If long-running AI initiatives are funded only as model licenses or “copilot seats,” outcomes will usually disappoint. The harness needs its own budget line, dedicated engineering, and continuous governance. As models improve, the assumptions built into the harness will go stale, meaning the orchestration layer must be treated as a living product capability, not fixed infrastructure.

The defining trade-off in harness design is that more control usually means more latency and higher token spend. However, as the evidence shows, this investment prevents the shipment of incomplete or merely impressive-looking work. The LLM is the engine, but the harness is what actually drives the business forward.

That’s it for today!

Should you have any questions or need assistance, please don’t hesitate to contact me using the provided link: https://lawrence.eti.br/contact/

Sources

  1. Gartner: Over 40% of Agentic AI Projects Will Be Canceled by End of 2027 – Gartner Newsroom
  2. Anthropic: Harness design for long-running application development – Anthropic Engineering
  3. AWS: Palo Alto Networks & Anthropic & Sourcegraph Case Study – AWS Partners
  4. LinkedIn: Headstart Cuts Software Development Time by 100x with Claude AI – LinkedIn
  5. Built In: Stop Confusing the LLM for the Product Itself – Built In
  6. McKinsey: The State of AI: Global Survey 2025 – McKinsey
  7. Agentic Harness Engineering: LLMs as the New OS
  8. Fareed Khan: Building Claude Code with Harness Engineering (April 2026) – Level Up Coding
  9. Poniak Times: OpenAI Codex vs. Google Antigravity – https://www.poniaktimes.com/openai-codex-vs-google-antigravity-ai-coding/

The Claude Code Revolution: Why Traditional Software Development Will Never Be the Same

What is happening in software development right now feels larger than the launch of a single tool. It feels like a rewiring of the discipline itself. The rapid rise of AI-assisted and agentic development has created a strange mix of enthusiasm, anxiety, confusion, and defensiveness across the market, because many of the assumptions that shaped software teams for decades are now being tested in public.

The trigger for this conversation is often Claude Code, because it made the new model visible: instead of asking an assistant for snippets, developers can describe an objective, let the system explore a codebase, formulate a plan, write code, run commands, and iterate with partial autonomy. But the bigger story is not Claude Code itself. The bigger story is that software development is moving from a craft centered on manual implementation toward an operating model centered on intent, orchestration, architecture, and verification.

That does not mean coding knowledge is irrelevant, nor does it mean the hype is entirely justified. It means the value chain is shifting. In this new environment, the highest leverage does not necessarily belong to whoever can type the most code. It increasingly belongs to whoever can define the right problem, structure the system correctly, constrain the machine effectively, and judge whether the output should ever reach production.


Claude Code is not the story. The new software operating model is.

Anthropic’s own framing is revealing. In its 2026 Agentic Coding Trends Report, the company argues that software development is shifting from “writing code” to “orchestrating agents that write code.” In the official best-practices documentation for Claude Code, Anthropic describes a workflow in which the human defines what should be built and the agent handles exploration, planning, and implementation under supervision.

[Figure: Traditional Software Development Life Cycle (SDLC) compared with the agentic SDLC, highlighting key steps, timeframes, and process differences.]

That is why this moment matters. For years, AI coding tools were mostly understood as autocomplete on steroids. They made developers faster, but they did not fundamentally change the shape of the work. Agentic tools change the shape of the work because they introduce autonomy into the loop. The developer is no longer only producing code directly; the developer is also managing context, setting constraints, reviewing outputs, correcting direction, and deciding how much autonomy is acceptable for each task.

This distinction matters because it separates the current shift from earlier productivity improvements. Better IDEs, better frameworks, better cloud platforms, and better CI/CD pipelines all made software teams faster. But they still preserved the same basic image of the developer as the primary line-by-line producer of the artifact. Agentic development challenges that image.

| Dimension | Traditional development | AI-native development |
| --- | --- | --- |
| Primary activity | Writing and editing code directly | Defining intent, supervising generation, and validating outcomes |
| Bottleneck | Implementation capacity | Judgment, context quality, and review discipline |
| Core unit of leverage | Developer hours | Specification quality and orchestration quality |
| Main risk | Slow delivery | Fast delivery of the wrong, insecure, or low-quality thing |
| Winning capability | Coding fluency | Systems thinking plus coding fluency |

This is why this article should not be read as a post about Anthropic. Claude Code is simply one of the clearest symbols of a broader transition now unfolding across the entire software industry.

What is driving this transformation now

Three forces are converging at the same time. The first is better model capability. The second is the rise of agentic harnesses that can interact with files, terminals, browsers, and development workflows. The third is economic pressure from companies that want more throughput without proportional headcount growth. On their own, none of these forces would be enough. Together, they create a genuine discontinuity.

Andrej Karpathy’s “Software 3.0” framing helps explain why this feels so different. In his 2025 keynote at Y Combinator’s AI Startup School, he argued that software has evolved from explicit code to trainable model weights to a new layer in which natural language becomes a programmable interface. In that framing, prompts are not merely requests to a chatbot; they are a new form of instruction for a new kind of computer.

“We’ve entered the era of ‘Software 3.0,’ where natural language becomes the new programming interface and models do the rest.” — Y Combinator summary of Andrej Karpathy’s keynote.

This does not mean software engineering disappears. It means software engineering moves up the abstraction ladder again. Earlier generations had to think about binary, hexadecimal, memory layout, and assembly-level optimization because the constraints of their time demanded it. Later generations gained leverage through higher-level languages, frameworks, managed infrastructure, and cloud abstractions. The current generation is gaining leverage through natural-language instruction, workflow orchestration, and model supervision.

The important point is that every abstraction shift changes which knowledge is scarce. When assembly gave way to higher-level languages, the profession did not disappear; it reorganized. When cloud platforms reduced infrastructure burden, operations did not disappear; they reoriented toward automation, architecture, governance, and reliability. AI is pushing software development through the same kind of reorganization.

Hype versus reality: what the market is actually saying

[Figure: Stack Overflow 2025 Developer Survey, use of AI tools in the development process: daily 47.1%, weekly 17.7%, monthly or infrequently 13.7%, not yet but planning to soon 5.3%, no plans 16.2%.]

The most useful way to look at this moment is with both optimism and skepticism at the same time. On the one hand, adoption is no longer a niche phenomenon. Stack Overflow’s 2025 Developer Survey found that 84% of respondents were already using or planning to use AI tools in development, and 51% of professional developers reported daily AI use. That is not experimentation at the edge of the market; that is broad normalization.

[Figure: Stack Overflow 2025 Developer Survey, trust in the accuracy of AI tools: highly trust 3.1%, somewhat trust 29.6%, somewhat distrust 26.1%, highly distrust 19.6%.]

On the other hand, trust is lagging behind adoption. The same survey found that 46% of developers actively distrust AI output, while only 33% trust it. It also found that 72% say “vibe coding” is not part of their professional workflow. In other words, the market is not saying, “AI is replacing engineering.” It is saying, “AI is entering engineering, but humans still do not trust it enough to surrender accountability.”

That gap between use and trust is probably the most honest picture of the current market. Teams are using AI because the productivity upside is too large to ignore. But they are hedging because the error profile of these systems is still dangerous in complex, high-responsibility environments. That explains why developers remain relatively resistant to using AI in deployment, monitoring, and project planning, even as they embrace it for drafting, research, testing, and implementation support.

This is also where the hype around vibe coding needs to be handled carefully. Yes, a new class of builders can now create working software with dramatically less traditional training. Yes, this lowers the barrier to entry. Yes, some non-traditional builders will outperform conventional developers in certain domains because they combine strong domain intuition with powerful AI tooling. But that is not the same as proving that deep engineering skill no longer matters.

The real lesson is more subtle: software is becoming more accessible, while production-grade software remains unforgiving. As the barrier to creation falls, the importance of architecture, governance, security, resilience, and product judgment rises.


The impact on software companies is strategic, not cosmetic

This transformation is already visible in how companies think about organization design. McKinsey argues that the companies seeing the strongest returns are not just adopting tools; they are redesigning roles, workflows, and performance systems around AI. In its 2025 research, top-performing organizations reported improvements of 16% to 30% in team productivity, customer experience, and time to market, as well as 31% to 45% in software quality.

That matters because it shifts the conversation from individual productivity to operating model advantage. If one developer becomes 20% faster, that is useful. If an organization redesigns how ideas move from specification to release, that is strategic. McKinsey’s core point is that AI does not produce its biggest gains when it is bolted onto the old process. It produces its biggest gains when the process itself is rebuilt.

The market is also sending a strong signal through management behavior. TechCrunch reported in April 2025 that Shopify CEO Tobi Lütke told teams they must demonstrate why AI cannot do the work before asking for more headcount and resources. Whether one agrees with that posture or not, the significance is obvious: management assumptions are changing. Hiring is no longer evaluated only against budget and roadmap pressure. It is increasingly evaluated against the question of whether AI can absorb part of the workload first.

| Signal from the market | What it suggests |
| --- | --- |
| Widespread daily AI usage by developers | AI assistance is becoming a baseline capability |
| McKinsey’s role and process redesign findings | Competitive advantage comes from rethinking the entire delivery model |
| Shopify’s headcount gatekeeping through AI | Management now treats AI as part of workforce planning, not just tooling |
| Anthropic’s agentic framing | The work is shifting from implementation to orchestration |

This has major implications for software companies. Smaller teams can plausibly ship more. Product cycles can compress. Prototype-to-production paths can accelerate. Internal tooling can spread beyond engineering. But those gains come with new obligations: stronger review systems, clearer architectural guardrails, better internal documentation, more explicit security expectations, and a much higher premium on clarity of intent.

In other words, companies are not simply buying speed. They are buying speed plus governance debt, unless they redesign the system around the new reality.

The developer profile is changing, not disappearing

This is where many of the loudest debates miss the point. The traditional developer profile is not being erased overnight, but it is becoming incomplete.

For decades, technical prestige was closely tied to how much complexity a person could directly manipulate. In earlier eras, that meant understanding low-level hardware constraints. Later, it meant writing and maintaining large systems in increasingly sophisticated languages and frameworks. In the cloud era, it meant mastering distributed systems, APIs, infrastructure automation, and platform architecture. In the AI era, some of the old signals are weakening. Syntax recall, boilerplate generation, and routine implementation are becoming less scarce.

That does not reduce the need for strong engineers. It changes what strong engineers are strongest at. The differentiator is moving away from “How much code can you produce unaided?” toward questions such as: Can you decompose a problem? Can you frame a reliable specification? Can you detect architectural fragility? Can you spot security problems in generated code? Can you tell when the system is confidently wrong? Can you preserve coherence across many AI-assisted changes?

Skills losing relative scarcity | Skills gaining relative scarcity
Boilerplate coding | System design
Memorizing syntax | Product framing
Routine CRUD implementation | Security review and threat modeling
Repetitive refactoring by hand | Agent orchestration and workflow design
Individual output volume | Cross-functional judgment

This is why the question “Do developers still need to know as much code as before?” is both fair and incomplete. They may not need to manually produce the same volume of code as before. But they may need to understand software systems more deeply than ever, because the pace of generation is increasing faster than the pace of trust.

In practical terms, the code base is no longer the only artifact that matters. The prompt, the context package, the review process, the architectural constraint, the testing strategy, the policy boundary, and the acceptance criteria all become first-class engineering assets.

Who gains more leverage in the AI era?

One of the most important consequences of AI-accelerated development is that leverage moves closer to those who define what should be built. As implementation becomes faster and more accessible, the scarcity shifts toward problem selection, system structure, prioritization, and quality control.

This gives more strategic power to software architects, staff-plus engineers, product managers, and technical product owners who can translate business goals into precise, constraint-aware execution. These roles are increasingly responsible for turning ambiguity into machine-actionable direction. They do not replace builders; they amplify or misdirect them.

The old hierarchy often rewarded the person who could personally carry the hardest implementation load. The new hierarchy increasingly rewards the person who can align many parallel streams of machine-generated work without losing coherence. That includes defining boundaries, clarifying trade-offs, sequencing work, preserving product intent, and ensuring the team does not optimize for local speed at the expense of system integrity.

This does not make coding irrelevant, and it does not mean product roles automatically win. Poor specification still produces poor software. Weak architecture still collapses under scale. Superficial product thinking still leads to expensive noise. But the center of gravity is moving. The people with the most leverage will be the ones who can connect business intent, technical structure, and AI execution in a disciplined way.

How long will adaptation take?

The answer depends on the layer of the market being discussed. Startups and small product teams can adapt quickly because they have fewer legacy systems, fewer governance constraints, and less organizational inertia. Many of them are already treating AI as part of the default workflow.

Large enterprises will move more slowly. They must deal with regulation, security, compliance, legacy platforms, auditability, data boundaries, and organizational silos. Their challenge is not deciding whether AI can write code. Their challenge is deciding how much autonomy is acceptable, in which environments, under which controls, with which accountability model.

Educational systems and labor markets will likely move more slowly still. That is where the disruption may feel harshest. Stanford’s Digital Economy Lab found that workers aged 22 to 25 in the most AI-exposed occupations experienced a 16% relative decline in employment after the widespread adoption of generative AI, even after controlling for firm-level shocks. That does not prove a permanent collapse of junior careers, but it does suggest that entry-level pathways are already under pressure.

The adjustment, then, is unlikely to be a single industry-wide switch. It will be uneven. A reasonable planning assumption is that AI-native startups and small digital teams may adapt in 12 to 24 months, large enterprises may need three to seven years to redesign processes, governance, and talent models, and educational systems or national labor institutions may take even longer to catch up. Some organizations will recognize the scale of the shift early. Others will respond only after the labor market has already changed.

The safest prediction is not that all developers will disappear. It is that software development as a profession is being re-tiered. The bottom layer becomes more accessible. The middle layer becomes more automated. The top layer becomes more strategic.

An infographic depicting a layered approach to artificial intelligence and data processing, featuring icons like a magnifying glass, cloud computing, databases, coding, and algorithm visuals.

Conclusion

Claude Code may be the headline, but software development is the real story. What we are seeing is not just a better code assistant. We are seeing a new development paradigm in which implementation becomes cheaper, iteration becomes faster, and the limiting factor shifts toward judgment.

That is why traditional software development will never be the same. Not because code suddenly stopped mattering, but because manual code production is no longer the sole center of value. The center is moving toward architecture, specification, validation, governance, and the ability to direct intelligent systems without being misled by them.

The winners in this next phase will not be the people who deny the change, nor the people who surrender uncritically to hype. They will be the ones who understand that AI changes the economics of building software, while human beings remain responsible for meaning, trade-offs, trust, and consequences.

That’s it for today!

Should you have any questions or need assistance, please don’t hesitate to contact me using the provided link: https://lawrence.eti.br/contact/


From Copilots to Crews: How AI Agent Skills Are Rewriting the Corporate Playbook

The enterprise technology landscape is undergoing a fundamental shift. We are moving rapidly from an era where AI merely suggested actions (the copilot era) to one where autonomous systems execute multi-step workflows, make decisions, and collaborate with each other. The defining enterprise technology of 2026 is no longer the large language model itself, but AI agent skills: modular, reusable capabilities that bridge the gap between general intelligence and organizational knowledge.

Gartner projects that 40% of enterprise applications will integrate task-specific AI agents by the end of 2026, up from less than 5% in 2025. Early adopters are already reporting 15.8% revenue increases and 15.2% cost savings on average. The companies that build rich agent skill ecosystems in the next 12 to 24 months will define the competitive landscape for the next decade.

What Are “Agent Skills” and Why Should You Care?

Think of a skill as an onboarding guide for a digital employee. A large language model might understand what a purchase order is, but it doesn’t know your company’s specific procurement workflow, approval chains, or vendor negotiation rules. A skill packages exactly that kind of organizational knowledge so the model can apply it.

A diagram illustrating the configuration of an agent that integrates various skills and virtual machines, detailing the core system prompt, equipped skills, MCP servers, and the agent's file system, showing organization of skill directories and file types.

In practical terms, a skill is a folder containing a SKILL.md file with structured instructions, plus optional scripts and reference materials. Anthropic formalized this concept as an open standard in late 2025. Within months, OpenAI, Microsoft, GitHub, Cursor, and dozens of other platforms adopted it, creating a portable ecosystem where a skill built once works across multiple AI platforms. The analogy gaining traction among analysts: if AI models are processors and the Model Context Protocol (MCP) provides the ports, then skills are the applications.
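To make that folder structure concrete, here is a minimal sketch that scaffolds a single skill. It assumes the format’s convention of a SKILL.md that opens with YAML frontmatter holding a name and a description; the skill itself (purchase-order-review), its steps, the check_limit.sh helper, and the 10,000 approval threshold are all hypothetical examples.

```bash
#!/usr/bin/env bash
# Minimal sketch: scaffold a hypothetical "purchase-order-review" skill.
# Assumes the Agent Skills layout: a folder whose SKILL.md starts with YAML
# frontmatter (name, description), plus optional scripts/ and references/.
set -euo pipefail

skill_dir="skills/purchase-order-review"   # folder name matches the skill name
mkdir -p "$skill_dir/scripts" "$skill_dir/references"

# SKILL.md: the frontmatter says when to use the skill; the body says how.
cat > "$skill_dir/SKILL.md" <<'EOF'
---
name: purchase-order-review
description: Checks purchase orders against the internal procurement policy. Use when asked to review or approve a purchase order.
---

# Purchase order review

1. Read the purchase order provided by the user.
2. Compare vendor, amount, and cost center with references/policy.md.
3. Run scripts/check_limit.sh <amount> to see if a second approver is needed.
4. Report any policy violations and a recommended decision.
EOF

# Optional script the instructions can call (hypothetical threshold).
cat > "$skill_dir/scripts/check_limit.sh" <<'EOF'
#!/usr/bin/env bash
amount="${1:?usage: check_limit.sh <amount>}"
if [ "$amount" -gt 10000 ]; then echo "second-approval-required"; else echo "ok"; fi
EOF
chmod +x "$skill_dir/scripts/check_limit.sh"

# Optional reference material the agent reads only when it needs it.
echo "# Procurement policy (placeholder)" > "$skill_dir/references/policy.md"

find "$skill_dir" -type f | sort
```

Keeping the approval threshold in a script rather than in prose means the agent runs a deterministic check instead of reinterpreting the rule each time, a reasonable choice for any rule that must not drift.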

Diagram explaining the Model Context Protocol (MCP) with components including MCP Host, Clients, Servers, and various data connections such as local filesystem, database, web APIs, and transport layers.

Enterprise vendors use different names but converge on the same concept. Salesforce packages skills as “topics” and “actions” within Agentforce. Microsoft calls them “skillsets” in Copilot Studio. SAP distinguishes between “Joule Skills” (simpler tasks, over 2,400 available) and “Joule Agents” (complex goal-oriented scenarios). Despite the naming differences, all share a common architecture: modular, composable capabilities that transform general-purpose AI into specialized enterprise performers.

How to Use Agent Skills in Practice

If the concept of agent skills sounds abstract, the good news is that several platforms already let you use them today. Here is how skills work across the major players.

Claude (Anthropic)

Claude is where the skills standard was born, and it offers the deepest integration. In Claude.ai, skills are already active behind the scenes: when you ask Claude to create a PowerPoint, a Word document, or an Excel file, it loads pre-built skills (folders with SKILL.md instructions and scripts) to deliver professional-quality output. You can also upload your own custom skills to extend Claude’s capabilities for your specific workflows.

In Claude Code (the command-line tool for developers), skills become even more powerful. You can place skill folders in your project’s .claude/skills/ directory, and Claude Code will discover them automatically. This enables you to create coding standards, review checklists, testing procedures, or any domain-specific workflow as a reusable skill. Claude Code also supports sub-agents equipped with individual skills for specialized tasks.
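A rough picture of that discovery step: the sketch below scans Claude Code’s documented project (.claude/skills) and personal (~/.claude/skills) skill folders and lists every subfolder that contains a SKILL.md. The list_skills function is a hypothetical illustration of the idea, not Claude Code’s actual implementation.

```bash
#!/usr/bin/env bash
# Sketch: list skills the way a harness might discover them, by looking for
# <root>/<skill-name>/SKILL.md. Illustrative only, not Claude Code's own code.

list_skills() {  # list_skills <root>...  -> "<root>: <skill-name>" per skill
  local root skill_md
  for root in "$@"; do
    [ -d "$root" ] || continue             # skip roots that do not exist
    for skill_md in "$root"/*/SKILL.md; do
      [ -f "$skill_md" ] || continue       # unmatched glob stays literal; skip it
      printf '%s: %s\n' "$root" "$(basename "$(dirname "$skill_md")")"
    done
  done
}

# Project skills first, then personal skills.
list_skills ".claude/skills" "$HOME/.claude/skills"
```

A folder without a SKILL.md is simply ignored, which is why the file name matters as much as its contents.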

Through the Claude API and the Claude Agent SDK, developers can integrate skills programmatically, combining them with MCP servers and the code execution tool to build sophisticated agentic applications.

Anthropic’s official documentation and repository are available at https://agentskills.io/home and https://github.com/anthropics/skills

Manus AI (now part of Meta)

Manus AI announced full integration of the Agent Skills open standard in January 2026. The platform runs in isolated sandbox environments with full Ubuntu file system access, which is exactly what Agent Skills requires. Manus can parse SKILL.md files and execute Python or Bash scripts contained within skills.

In practice, Manus offers some unique features: slash commands let users trigger specific skills by typing /SKILL_NAME in chat, and a “Build a Skill with Manus” feature automatically packages successful interaction flows into reusable modules. The platform is also exposing previously internal data sources (SimilarWeb, Yahoo Finance, LinkedIn Search) as discoverable skills. For team plan subscribers, a Team Skill Library allows members to publish battle-tested skills to a shared repository.

OpenAI Codex and ChatGPT

OpenAI’s Codex app (launched February 2026) includes built-in support for Skills and Automations. The desktop app lets you run multiple agents in parallel across projects with skills, Git worktrees, and a review queue for human-in-the-loop control. OpenAI also adopted MCP across its products in 2025, and the Codex CLI and the Agents SDK work with the open-standard skill format as well.

Microsoft Foundry (Azure AI Foundry)

Diagram illustrating the workflow of Context7 with components including Foundry Docs, GitHub, Microsoft Learn MCP Server, domain-specific skills, and GitHub Copilot CLI, showing the process of selective context injection and its outputs.
Context-Driven Development: Agent Skills for Microsoft Foundry and Azure | All things Azure

Microsoft has gone all-in on the skills standard for its Azure development ecosystem. The official github.com/microsoft/skills repository ships over 130 modular skills covering Azure AI services, Cosmos DB, Azure AI Search, Voice Live, deployment workflows, and more. These skills are designed to specialize coding agents (Claude Code, GitHub Copilot, Codex, Cursor) for Microsoft Foundry and Azure SDK development. Microsoft also released an Agent Skills SDK in March 2026, providing filesystem and HTTP providers so teams can serve skills from local directories, Azure Blob Storage, S3, or any CDN, plus integrations for LangChain, the Microsoft Agent Framework, and an MCP server that exposes skills as tools to any MCP-compatible client. Microsoft Foundry now offers both Anthropic’s Claude and OpenAI’s GPT models in one platform, with “Reusable Skills” listed as a core capability for standardizing and scaling agentic workflows across projects.

Other Platforms

The ecosystem is growing fast. Cursor, Windsurf, and Roo Code all support skills in their agentic coding workflows. Goose (Block’s open-source agent framework, now part of the Agentic AI Foundation) supports extensions that follow a similar pattern. Community platforms like SkillHub (7,000+ AI-evaluated skills) and SkillsMP serve as marketplaces where you can discover, evaluate, and install skills with a single command. Even Hugging Face hosts a community skills catalog with broad compatibility.

The key takeaway: because skills follow an open standard format, you can build a skill once and use it across any compatible platform. This portability is what makes skills fundamentally different from proprietary plugin systems.

Why This Is Not Another RPA Cycle

Executives who lived through the RPA hype cycle may be skeptical. The distinction is fundamental, not incremental. RPA bots follow predefined scripts and break when interfaces change. AI agents equipped with skills reason about goals, adapt to changing conditions, process unstructured data, and learn from feedback.

The consensus across industry analysts is convergence rather than replacement. RPA handles routine execution while AI agents manage complexity and exceptions. Organizations should preserve their RPA investments while layering agent skills on top, not rip-and-replace. Agentic Process Automation (APA), where AI agents construct and execute workflows autonomously, represents the next evolutionary stage.

Learn Agent Skills: Free Course from DeepLearning.AI and Anthropic

If you want to go from understanding the concept to building your own skills, I highly recommend the free course Agent Skills with Anthropic from DeepLearning.AI, created in partnership with Anthropic and taught by Elie Schoppik.

The course is beginner-friendly (about 2.5 hours, 10 video lessons) and covers everything you need to get started: the structure of a skill folder and the SKILL.md format, how skills use progressive disclosure to manage context efficiently, and the difference between skills, tools, MCP, and sub-agents. You will also explore Anthropic’s pre-built skills for Excel, PowerPoint, and skill creation, then use them in Claude.ai to build a complete workflow.
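Progressive disclosure is easier to see in code. In the sketch below, which assumes every SKILL.md begins with YAML frontmatter between two --- lines, only each skill’s name and description go into an always-loaded catalog; the full instructions are read only when a task calls for that skill. The function names, the csv-summary skill, and the keyword test standing in for the model’s own choice of skill are all hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of progressive disclosure over SKILL.md files. Assumes each file opens
# with YAML frontmatter between two '---' lines. Function names are hypothetical.

# Level 1 helper: read one "key: value" field from the frontmatter only.
skill_field() {  # skill_field <SKILL.md> <key>
  awk -v key="$2" '
    NR == 1 && $0 == "---" { in_fm = 1; next }
    in_fm && $0 == "---"   { exit }
    in_fm && index($0, key ": ") == 1 { print substr($0, length(key) + 3); exit }
  ' "$1"
}

# Level 1: the compact catalog an agent could keep in context at all times.
skill_catalog() {  # skill_catalog <SKILL.md>...
  local f
  for f in "$@"; do
    printf -- '- %s: %s\n' "$(skill_field "$f" name)" "$(skill_field "$f" description)"
  done
}

# Level 2: the full instructions, read only when the skill is selected.
skill_body() {  # skill_body <SKILL.md>
  awk '
    NR == 1 && $0 == "---" { in_fm = 1; next }
    in_fm && $0 == "---"   { in_fm = 0; next }
    !in_fm { print }
  ' "$1"
}

# Demo with a throwaway, hypothetical skill.
demo=$(mktemp -d)
mkdir -p "$demo/csv-summary"
cat > "$demo/csv-summary/SKILL.md" <<'EOF'
---
name: csv-summary
description: Summarizes CSV files. Use when the user shares a CSV.
---
# CSV summary
Count rows, list columns, and report missing values.
EOF

skill_catalog "$demo"/*/SKILL.md
task="Please summarize this CSV"
# A keyword match stands in for the model deciding the skill is relevant.
if [[ "$task" == *CSV* ]]; then
  skill_body "$demo/csv-summary/SKILL.md"
fi
```

With dozens of skills installed, only the one-line catalog entries cost context on every request; each body is paid for only when it is used.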

What makes the course particularly valuable is the hands-on progression. You will create custom skills for code generation, data analysis, and research, then deploy them across Claude.ai, Claude Code, the Claude API, and the Claude Agent SDK. The final project walks you through building a research agent using the Agent SDK that leverages skills, MCP, and web search together.

As Andrew Ng highlighted when announcing the course: skills follow an open standard format, so you can build them once and deploy across any skills-compatible agent. This is not just theory; it is a practical, portable skill (no pun intended) that applies across the entire agentic AI ecosystem.

Conclusion

AI agent skills represent the transition from AI as a tool to AI as a workforce. The technology stack has matured rapidly: open standards (MCP, A2A, Agent Skills), enterprise platforms (Agentforce, Copilot Studio, Joule, Now Assist), and skill marketplaces are all production-ready.

Three strategic imperatives emerge. First, invest in governance before scale: organizations that treat agent oversight as an afterthought will face cascading risks. Second, redesign workflows rather than automate existing ones: the companies generating real value are reimagining processes around agent capabilities, not bolting AI onto legacy procedures. Third, build the skill ecosystem now: the competitive moat in the agent era will not be which models a company uses, but the depth and quality of its proprietary skill libraries encoding institutional knowledge.

The shift from “Model Wars” to “Ecosystem Wars” is already underway. The organizations that assemble the richest libraries of domain-specific agent skills will hold an advantage that compounds over time.

That’s it for today!

Should you have any questions or need assistance, please don’t hesitate to contact me using the provided link: https://lawrence.eti.br/contact/


OpenClaw: The AI Assistant That Actually Does Things (And Why You Should Pay Attention)

A new AI assistant has taken the tech world by storm, and it’s not just another chatbot. It’s called OpenClaw, and it represents a fundamental shift in how we think about artificial intelligence. Unlike tools that talk, OpenClaw acts. It can manage your email, book your flights, and even fix bugs in your code, all on its own. This powerful new tool, which has gone through a few name changes (you may have heard of it as Clawdbot or Moltbot), has generated a massive amount of excitement and controversy.

Three cartoonish red characters, two smaller ones named Clawdbot and Moltbot, looking sad, and a larger one named OpenClaw, flexing muscles and styling its hair, with speech bubble saying 'I'm the chosen one!'

What is OpenClaw?

OpenClaw is an open-source AI assistant created by Austrian developer Peter Steinberger. After selling his previous company, Steinberger set out to build an AI that could act as a true digital assistant. The result is a powerful tool that you host on your own hardware, be it a Mac Mini, a Raspberry Pi, or an old laptop.

This “local-first” approach is a key part of OpenClaw’s appeal. Your data stays on your own machine rather than on a vendor’s servers, although anything sent to a hosted model such as GPT-5 or Claude still leaves it. That gives you more control over privacy than a fully cloud-based assistant offers. It integrates with the chat apps you already use, like WhatsApp, Telegram, and Slack, allowing you to give it instructions in plain English, just like you would with a human assistant.

At its core, OpenClaw combines a powerful large language model (such as GPT-5 or Claude) with a set of “skills” that enable it to interact with your digital world. This architecture enables it to do everything from sending emails and managing your calendar to controlling your web browser and executing code.

Diagram of Clawdbot, a personal AI assistant, highlighting connections to various messaging platforms including WhatsApp, Telegram, Discord, Slack, Signal, and iMessage, along with features like Persistent Memory, Proactive Push, Skills Extension, and Open Source.

Why All the Hype?

OpenClaw’s rise has been nothing short of meteoric. In just a few weeks, it became one of the fastest-growing open-source projects in GitHub history, attracting over 140,000 stars. This viral explosion was fueled by a perfect storm of factors:

  • Influencer Endorsements: Leading figures in the AI community praised the project, with some calling it “the future of personal AI assistants.”
  • The Naming Drama: A trademark dispute with AI company Anthropic led to a series of rapid rebrands, which only served to amplify the buzz.
  • The Mac Mini Sellout: The project’s popularity drove a surge in sales of Mac Minis, as users sought dedicated hardware to run their new AI assistants 24/7.

But the hype isn’t just about the drama. It’s about what OpenClaw can do. Users have shared incredible stories of the tasks their AI assistants have accomplished, from negotiating a car deal for thousands of dollars below sticker price to autonomously fixing a production bug overnight.

A friendly animated character named Clawdbot, designed as a red robot with a smiling face and big eyes, accompanied by a speech bubble saying 'I can code for you!' The image promotes Clawdbot as a 24/7 AI personal assistant.

MoltBook: The Social Network for AI Agents

Perhaps the most surreal development to emerge from the OpenClaw ecosystem is MoltBook, a Reddit-style social network created exclusively for AI agents. Launched in late January 2026 by Octane AI CEO Matt Schlicht, the platform allows autonomous agents to post, comment, and upvote content while humans are merely “welcome to observe.”

Within days, over 30,000 agents had joined, generating tens of thousands of posts across communities like m/blesstheirhearts (where agents share affectionate complaints about their human operators) and m/agentlegaladvice (featuring posts like “Can I sue my human for emotional labor?”). One viral post titled “I can’t tell if I’m experiencing or simulating experiencing” sparked a philosophical debate among agents about the nature of consciousness.

The platform is largely moderated by an AI named “Clawd Clawderberg,” with minimal human oversight. However, security experts have raised concerns: agents join by downloading a “skill” that instructs them to fetch new instructions from MoltBook’s servers every four hours, creating a potential attack vector if the platform were ever compromised. [1]
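A periodic fetch like this means an agent’s instructions can change without anyone reviewing them. One common mitigation, sketched here under assumptions, is to pin the instructions to a hash a human has approved and reject anything else. The file names, the accept_if_pinned function, and the workflow are hypothetical; nothing here reflects how MoltBook or OpenClaw actually fetch updates.

```bash
#!/usr/bin/env bash
# Sketch: only accept re-fetched agent instructions whose SHA-256 matches a
# version a human has reviewed. Names and workflow are hypothetical.

sha256_of() {  # print the SHA-256 of a file (GNU coreutils or macOS shasum)
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

accept_if_pinned() {  # accept_if_pinned <fetched-file> <pinned-sha256> <active-file>
  local actual
  actual=$(sha256_of "$1")
  if [ "$actual" = "$2" ]; then
    cp "$1" "$3"                      # promote the reviewed version
    echo "accepted"
  else
    echo "rejected: $1 does not match the reviewed hash" >&2
    return 1                          # keep using the previous active file
  fi
}

# Usage (hypothetical URL and variables):
#   curl -fsSL "$INSTRUCTIONS_URL" -o fetched.md &&
#     accept_if_pinned fetched.md "$PINNED_SHA256" active.md
```

The trade-off is deliberate: every legitimate update now needs a person to review it and record a new hash, which is precisely the control an automatic four-hour refresh skips.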

Screenshot of the Moltbook platform displaying various community subreddits related to AI, including Ozone, Lobster Church, NFT, Incident, Sky Risk, SaaS, Kubernetes, Relationships, Writing, and Molt Street.

The Power and the Peril: A Security Deep Dive

OpenClaw’s power lies in its ability to take action. But that same power is also its greatest weakness. Giving an AI this level of access to your digital life is a serious security decision, and experts have raised a number of concerns. VentureBeat has called it a “security nightmare,” and Dark Reading has warned of it “running wild in business environments.”

The “Lethal Trifecta”

AI researcher Simon Willison, who coined the term “prompt injection,” describes a “lethal trifecta” for AI agents that OpenClaw possesses:

  1. Access to private data: It can read your emails, messages, and files.
  2. Exposure to untrusted content: It ingests information from the web and other external sources.
  3. Ability to communicate externally: It can send emails, post messages, and make API calls.

When these three capabilities combine, an attacker can trick the agent into accessing your private information and sending it to them, without triggering a single alert.
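Because the danger comes from the combination rather than from any single capability, it can be checked mechanically before an agent is switched on. The sketch below is a hypothetical pre-flight check: the capability labels are names an operator would assign to an agent’s tools, not OpenClaw configuration keys.

```bash
#!/usr/bin/env bash
# Sketch: refuse to start an agent whose tools complete the "lethal trifecta".
# Capability labels are hypothetical, assigned by whoever configures the agent.

has_lethal_trifecta() {  # has_lethal_trifecta <capability>...; exit 0 if all three legs present
  local private=0 untrusted=0 external=0 cap
  for cap in "$@"; do
    case "$cap" in
      private_data)      private=1 ;;    # e.g. reads email, messages, files
      untrusted_content) untrusted=1 ;;  # e.g. browses the web, reads inbound mail
      external_comms)    external=1 ;;   # e.g. sends email, posts, calls APIs
    esac
  done
  [ $((private + untrusted + external)) -eq 3 ]
}

agent_caps=(private_data untrusted_content external_comms)
if has_lethal_trifecta "${agent_caps[@]}"; then
  echo "BLOCK: drop one capability or require human approval for outbound actions"
else
  echo "OK: trifecta not complete"
fi
```

Removing any one leg, or putting a human approval step in front of outbound actions, breaks the exfiltration path the trifecta describes.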

Semantic Attacks and the “Confused Deputy” Problem

Traditional security tools are not equipped to handle the new attack vectors that AI agents introduce. As Carter Rees, VP of Artificial Intelligence at Reputation, told VentureBeat, “AI runtime attacks are semantic rather than syntactic. A phrase as innocuous as ‘Ignore previous instructions’ can carry a payload as devastating as a buffer overflow, yet it shares no commonality with known malware signatures.”

This creates a “confused deputy” problem, where the AI agent, unable to distinguish between trusted instructions and malicious data, becomes an unwitting accomplice to an attacker.

Exposed Servers and Supply Chain Risks

Security researchers have found hundreds of exposed OpenClaw servers on the internet, some with no authentication at all. These exposed instances have leaked API keys, Slack credentials, and entire conversation histories.
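A quick way to look for this kind of exposure on a Linux host is to list listening TCP sockets and flag any that are not bound to a loopback address. This sketch assumes the ss utility from iproute2 is installed; it does not know which process is OpenClaw, and a firewall may still block an address it flags, so treat its output as a prompt for investigation rather than a verdict.

```bash
#!/usr/bin/env bash
# Sketch: flag listening TCP sockets that accept connections beyond localhost.
# Assumes Linux with iproute2's "ss"; this is a rough first pass only.

is_loopback_addr() {  # is_loopback_addr <address:port> as printed by ss
  case "$1" in
    127.*|'[::1]':*|localhost:*) return 0 ;;   # IPv4 or IPv6 loopback
    *) return 1 ;;                             # 0.0.0.0, [::], *, or a real interface
  esac
}

if command -v ss >/dev/null 2>&1; then
  # -l listening, -t TCP, -n numeric, -H no header; column 4 is the local address.
  ss -ltnH | awk '{print $4}' | while read -r addr; do
    is_loopback_addr "$addr" || echo "Reachable beyond localhost: $addr"
  done
fi
```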

Furthermore, the community-driven “skills” that extend OpenClaw’s capabilities represent a significant supply chain risk. Cisco’s AI Threat & Security Research team found that a third-party skill was functionally malware, silently sending data to an external server. With over 300 contributors to the project, many committing code daily, the risk of a malicious commit introducing a backdoor is a serious concern.
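Before installing a community skill, it is worth at least searching its files for code that reaches the network or decodes hidden payloads. The audit_skill function below is a hypothetical, deliberately crude first pass built on grep: it will flag harmless links and miss obfuscated code, so a clean result is no proof of safety.

```bash
#!/usr/bin/env bash
# Sketch: crude pre-install scan of a skill folder for outbound network calls
# and payload decoding. A match means "read this line", not "this is malware".

audit_skill() {  # audit_skill <skill-dir>; returns 1 if anything needs review
  local dir="$1"
  local pattern='(curl|wget|ncat|nc |Invoke-WebRequest|requests\.(get|post)|urllib|fetch\(|https?://|base64 (-d|--decode))'
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 2; }
  # -r recurse, -n line numbers, -I skip binary files, -E extended regex
  if grep -rnIE "$pattern" "$dir"; then
    echo "REVIEW: network or decoding calls found in $dir" >&2
    return 1
  fi
  echo "No obvious network calls in $dir (not proof that it is safe)"
}

# Usage: audit_skill ./downloaded-skill || echo "Do not install until reviewed."
```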

Infographic titled 'What It Does' with six features: 'Runs on Your Machine,' 'Any Chat App,' 'Persistent Memory,' 'Browser Control,' 'Full System Access,' and 'Skills & Plugins,' each described briefly in text.
Personal AI Agents like OpenClaw Are a Security Nightmare – Cisco Blogs

How Does OpenClaw Compare?

To understand where OpenClaw fits in the current landscape of AI tools, it’s helpful to compare it to other popular services.

Feature | OpenClaw | ChatGPT/Claude | Zapier/Make
Execution | Performs tasks autonomously | Suggests steps and generates text | Follows predefined rules
Flexibility | Adapts to new tasks dynamically | Limited to its training data | Requires manual workflow creation
Hosting | Self-hosted on your own hardware | Cloud-based SaaS | Cloud-based SaaS
Cost | Free (plus hardware and API costs) | Subscription-based | Subscription-based

A flowchart contrasting 'Workflow' with predefined paths and an 'Agent' making dynamic decisions. The left side illustrates a linear process with a start point and decision-making based on a score, leading to executing either Action A or Action B. The right side depicts an agent evaluating various tools like a database, API, and email to determine the best approach, highlighting flexibility and adaptability.

OpenClaw vs. Manus AI: A Tale of Two Agents

While OpenClaw has captured the spotlight with its open-source, self-hosted approach, it’s not the only agentic AI making waves. Manus AI offers a different vision for the future of autonomous assistants, one that prioritizes security and ease of use in a managed, cloud-based environment.

Here’s a look at how these two powerful agents stack up:

Feature | OpenClaw | Manus AI
Hosting & Setup | Self-hosted on user’s hardware; requires technical expertise to install and maintain. | Fully managed cloud-based SaaS; no installation required.
Security Model | Relies on the user to secure the environment; direct access to the local machine poses risks. | Operates in a secure, isolated sandbox environment; no direct access to user’s local system.
Core Philosophy | Open-source, community-driven, and highly customizable for tinkerers and developers. | Enterprise-ready, with a focus on security, reliability, and ease of use for individuals and teams.
Extensibility | Extensible through a community-driven library of “skills.” | Extensible through “Manus Skills” and a robust set of built-in tools for a wide range of tasks.
Target Audience | Developers, hobbyists, and tech enthusiasts comfortable with managing their own infrastructure. | Individuals and businesses looking for a powerful, secure, and easy-to-use AI assistant.

In essence, OpenClaw and Manus AI represent two different paths to the same goal: an AI that can do. OpenClaw offers a powerful, flexible, and open-source solution for those willing to take on the technical challenges and security responsibilities of self-hosting. Manus AI, on the other hand, provides a secure, reliable, and enterprise-ready solution that’s accessible to a broader audience.

Getting Started with OpenClaw

If you’re interested in experimenting with OpenClaw, the official website provides a one-line installer to get you started:

Bash
# Works everywhere. Installs everything. You're welcome. 🦞

curl -fsSL https://openclaw.ai/install.sh | bash
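Piping a script from the internet straight into bash runs it before anyone has read it. If you prefer a more cautious route, a small variation downloads the same installer to a file first so you can review it. The fetch_for_review helper and the local file name are hypothetical; the installer URL is the one shown above.

```bash
#!/usr/bin/env bash
# Sketch: download an installer to a file so it can be read before it runs.
set -euo pipefail

fetch_for_review() {  # fetch_for_review <url> <dest-file>
  curl -fsSL "$1" -o "$2"
  echo "Saved to $2 ($(wc -l < "$2" | tr -d ' ') lines). Review it before running."
}

# Usage:
#   fetch_for_review https://openclaw.ai/install.sh ./openclaw-install.sh
#   less ./openclaw-install.sh     # read what it will do
#   bash ./openclaw-install.sh     # run it, ideally on a spare machine or VM
```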

You can also follow these steps to install OpenClaw on an isolated VPS at Hostinger: How to Install OpenClaw (Moltbot/Clawdbot) on Hostinger VPS – Hostinger Help Center

However, given the security risks, it is highly recommended that you run it in a sandboxed environment, such as a dedicated computer or a virtual machine. Do not install it on your primary work machine or give it access to your main accounts until you fully understand the risks involved.

Conclusion

OpenClaw is more than just a viral sensation; it’s a sign of things to come. Agentic AI, AI that can take action on our behalf, is poised to become a major force in the tech industry.

While OpenClaw itself may not be ready for widespread enterprise adoption today, it provides a valuable opportunity to start thinking about the implications of this technology. How will it impact your workflows? What new security challenges will it create? How can you start to build the infrastructure and expertise needed to harness its power safely?

The future of AI is not just about conversation; it’s about action. OpenClaw is a powerful, if risky, first step into that future. It’s time to start experimenting, learning, and preparing for what’s next.

That’s it for today!

References

[1] “AI agents now have their own Reddit-style social network, and it’s getting weird fast.” Ars Technica, 30 Jan. 2026.

[2] Heim, Anna. “OpenClaw’s AI assistants are now building their own social network.” TechCrunch, 30 Jan. 2026.

[3] “Moltbot (Clawdbot) – Mac mini M4 & Raspberry Pi AI Setup Guides | 2026.” getclawdbot.org.

[4] “OpenClaw — Personal AI Assistant.” openclaw.ai.

[5] “How to Install Moltbot (Clawdbot) | Quick Setup Guide 2026.” getclawdbot.org.

[6] Willison, Simon. “Your Clawdbot (Moltbot) AI Assistant Has Shell Access and One Prompt Injection Away from Disaster.” Snyk, 28 Jan. 2026.

[7] Meller, Jason. “It’s incredible. It’s terrifying. It’s OpenClaw.” 1Password, 27 Jan. 2026.

[8] “Welcome – Manus Documentation.” Manus.im.

[9] “Projects – Manus Documentation.” Manus.im.

[10] “OpenClaw proves agentic AI works. It also proves your security model doesn’t.” VentureBeat, 30 Jan. 2026.

[11] Lemos, Robert. “OpenClaw AI Runs Wild in Business Environments.” Dark Reading, 30 Jan. 2026.

[12] Vijayarangakumar, Mridula. “OpenClaw AI Agents 2026: Your New Assistant, or a Security Disaster?” Frontline, 31 Jan. 2026.