Most weeks, the signal is mixed—a model update here, a feature launch there, a lot of noise dressed up as progress.
This week was different.
Every major AI platform shipped autonomous agents to real users. Not demos. Not waitlists. Not “coming soon” blog posts with concept videos. Production software, in people’s hands, doing work without being told each step.
OpenAI merged three products into one agent that operates its own virtual computer. Anthropic launched an enterprise plugin marketplace that sent IBM’s stock down 13.2%. Cursor announced that 35% of its pull requests are now written by autonomous AI agents. Google shipped Gemini doing background tasks on Samsung phones. And ads landed inside ChatGPT at $60 CPM.
We’ve crossed from “AI can do things” to “AI is doing things.” Without asking permission. Without a human in the loop for every step.
Here’s what happened, why it matters, and what you should actually do about it.
ChatGPT Agent: The “One Agent” Moment
Let’s start with OpenAI, because what they shipped this week is the clearest signal of where this is all heading.
They took three separate products—Operator (which could browse and click on websites), Deep Research (which synthesized information across the web), and ChatGPT itself—and collapsed them into a single thing called ChatGPT Agent.
What’s different from everything that came before: ChatGPT Agent runs its own virtual computer. It doesn’t just search the web and summarize results. It reasons about what it’s seeing on screen, then takes action—clicking buttons, filling out forms, navigating between sites, completing multi-step workflows inside a contained environment.
It’s rolling out to Pro users now, with Plus and Team following in days.
Here’s why I keep coming back to this. OpenAI launched Operator as a standalone product just thirteen months ago. It’s already been absorbed into ChatGPT. That tells you how fast the product strategy is evolving—and how quickly these capabilities are becoming table stakes rather than premium features.
The practical implications are immediate. Think about the tasks your team runs every week. Competitive research across ten websites. Vendor comparison. Travel booking. Pulling data from multiple dashboards into a summary. All of that is now a candidate for delegation to an agent.
And the upgraded Deep Research is genuinely useful for anyone doing serious analytical work. You can now focus it on specific websites and connected apps, create and edit a research plan before it starts, and track progress mid-run. A task that used to take a junior analyst a full day—competitive pricing analysis across five websites, synthesized into a comparison—can now be handed to an agent in a single prompt.
The question I keep asking myself: at what point does an AI agent replace a workflow—not just assist with one? If ChatGPT Agent can research, analyze, and take action on websites... what happens to the competitive intelligence platforms, the travel management software, the procurement tools we currently pay for?
Fair counterpoint: Operator had mixed reviews when it launched. The Computer-Using Agent model still struggles with complex enterprise UIs. And merging three products into one doesn’t automatically make the result better—there’s a real risk of feature bloat.
But here’s the thing. ChatGPT has over 200 million weekly active users. When you ship agent capabilities to that base, you’re not announcing a feature. You’re creating a behavior at scale.
Anthropic’s Enterprise App Store—and Why IBM Lost $20 Billion
This is the story that flew under the radar for most people outside enterprise tech. And it might be the most consequential move of the month.
On February 24th, Anthropic held a briefing called “Enterprise Agents” and announced thirteen new enterprise plugins for Claude—connectors to Google Workspace, DocuSign, WordPress, and more. But the plugins themselves aren’t the story. The plugin marketplace is.
Organizations can now build private plugin stores. Connect them to internal GitHub repositories. Control exactly which AI-powered workflows their employees can access. And Anthropic rolled out prebuilt templates for specific departments: HR, legal, finance, investment banking, equity research, private equity, wealth management.
IBM’s stock dropped 13.2% the same day. Over $20 billion in market cap, gone. DocuSign and Thomson Reuters—Anthropic’s integration partners—rallied.
Why? Because Anthropic is no longer competing on model quality alone. They’re competing on platform reach. The pitch is straightforward: “Your legal team gets contract review plugins. Your finance team gets reconciliation tools. Your HR team gets onboarding automation. All managed by your IT admin, with guardrails.”
That’s department-by-department AI deployment. Which is exactly how enterprise software adoption actually scales.
I’ve seen this pattern play out across every major platform shift in the last two decades. The winner isn’t always the company with the best technology. It’s the company that makes adoption easiest for the buyer. And in enterprise, the buyer is an IT admin who needs control, audit trails, and the ability to say “legal gets these tools, marketing gets those tools.”
Anthropic just built that.
Now—the skeptic’s counterpoint is fair. Plugin marketplaces have been promised before. Remember ChatGPT Plugins? Huge fanfare. Quietly faded. The question is whether Anthropic’s approach, which is admin-controlled and enterprise-focused, solves the discovery and quality problems that killed the first wave.
And thirteen connectors at launch isn’t Salesforce’s AppExchange. Scale matters here.
But that IBM stock drop tells you something. The market is pricing in the possibility that AI-native platforms could replace entire categories of enterprise software. Not eventually. Starting now.
If you’re a founder building tools, this is a distribution channel worth watching closely. If you’re an exec, audit which department workflows could be automated with plugins this quarter. The window to be early is small.
35% of Cursor’s Code Is Now Written by Agents
Here’s a stat from this week that I can’t stop thinking about.
35% of Cursor’s pull requests are now generated by autonomous AI agents.
Not copilot suggestions a human accepts. Not autocomplete. Fully autonomous agents—each running on its own isolated virtual machine, setting up its own dev environment, writing code, testing it, recording video demos of its work, and producing merge-ready PRs.
You can run 20 of them in parallel. From web, desktop, mobile, Slack, or GitHub.
I’ve spent 21 years scaling teams. And the mental model I keep coming back to is this: instead of hiring five engineers to work on five features, one senior engineer could supervise twenty parallel agents. That’s not a productivity improvement. That’s a different org chart.
The talent conversation shifts fundamentally. It’s no longer “how many engineers do we need?” It’s “how many engineers do we need who can review and steer agent output?” And that’s a completely different skill set. The ability to write code matters less than the ability to recognize good code. Taste. Architecture. Judgment. Knowing what “done right” looks like.
MIT Technology Review named “vibe coding” one of the ten breakthrough technologies of 2026. I think they undersold it. What’s happening isn’t just developers coding faster. It’s a structural change in how software gets built—from individual craft to fleet management.
And this isn’t just Cursor. Claude Code is generating over $2.5 billion in annualized run-rate revenue. OpenAI’s Codex has surpassed 1.5 million weekly active users. GPT-5.3-Codex was the first model that helped build itself—the team used early versions to debug its own training pipeline.
The competitive dynamics here are fascinating. Cursor’s approach is architecturally different from OpenAI’s—dedicated VMs per agent, parallel execution, artifact generation with video evidence. OpenAI’s Codex is more single-agent focused. It’ll be one of the more interesting races to watch over the next twelve months.
Now, the honest counterpoint: 35% of PRs doesn’t mean 35% of engineering value. Many of those could be simple fixes, boilerplate, or repetitive tasks. The hard problems—system design, architectural decisions, understanding what customers actually need—those remain deeply human. And there’s a real question about whether agent-generated code creates technical debt that humans clean up later.
But if you’re an engineering leader, here’s the experiment worth running this week. Take five low-risk tasks—bug fixes, small feature additions, test coverage improvements. Spin up Cursor Cloud Agents on each one. Measure time-to-PR, code quality, and human review time required. The data will tell you whether autonomous agents are ready for your team. My bet is the answer will surprise you.
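If you want a concrete way to run that experiment, here's a minimal sketch of a tracking sheet in Python. Every task name and number below is a made-up placeholder—record your own measurements; the point is to capture the same three metrics per task and compare against a rough manual baseline.

```python
from statistics import mean

# Hypothetical results for the five-task agent experiment.
# time_to_pr and review are minutes; quality is a 1-5 reviewer score.
# All values are illustrative placeholders, not real benchmarks.
tasks = [
    {"task": "bug fix A",       "time_to_pr": 18, "review": 12, "quality": 4},
    {"task": "bug fix B",       "time_to_pr": 25, "review": 20, "quality": 3},
    {"task": "small feature",   "time_to_pr": 41, "review": 30, "quality": 4},
    {"task": "test coverage",   "time_to_pr": 15, "review": 8,  "quality": 5},
    {"task": "refactor helper", "time_to_pr": 33, "review": 25, "quality": 3},
]

baseline_minutes = 120  # assumed average human time per task, for comparison

for metric in ("time_to_pr", "review", "quality"):
    print(f"avg {metric}: {mean(t[metric] for t in tasks):.1f}")

# Total human-facing cost of the agent path: waiting for the PR plus reviewing it.
total_agent = sum(t["time_to_pr"] + t["review"] for t in tasks)
print(f"agent total: {total_agent} min vs ~{baseline_minutes * len(tasks)} min manual")
```

Even with fake numbers, the shape of the comparison is the useful part: if agent time plus review time comes in well under your manual baseline at acceptable quality, you have your answer.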
The Lightning Round: Five More Stories That Matter
Grok 4.20’s Multi-Agent Brain. xAI launched a four-agent debate system where specialized agents collaborate and cross-check each other before answering. Early users report 47–65% hallucination reduction. If independently verified, this architectural pattern could reshape how every AI provider approaches reliability. The catch: self-reported hallucination numbers are like self-reported customer satisfaction scores. Take them with a grain of salt until someone else runs the benchmarks.
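The general debate-and-verify pattern is worth understanding even if xAI's specific architecture isn't public. Here's a toy sketch—not xAI's implementation—where each "agent" is a stub function standing in for a separate model call, occasionally returning a wrong answer, and a majority vote resolves disagreements:

```python
import random
from collections import Counter

def make_agent(error_rate):
    """Stub agent: answers the question, sometimes incorrectly.
    In a real system this would be a model call with its own prompt."""
    def agent(question, rng):
        return "4" if rng.random() > error_rate else "5"
    return agent

def debate(question, agents, rounds=2, seed=0):
    """Agents answer independently, then cross-check: dissenters
    revise toward the majority over a few rounds."""
    rng = random.Random(seed)
    answers = [agent(question, rng) for agent in agents]
    for _ in range(rounds):
        majority, _ = Counter(answers).most_common(1)[0]
        answers = [majority for _ in answers]  # dissenters concede
    return Counter(answers).most_common(1)[0][0]

agents = [make_agent(rate) for rate in (0.1, 0.2, 0.15, 0.1)]
print(debate("What is 2 + 2?", agents))
```

The intuition: independent errors rarely agree, so forcing agents to reconcile filters out a chunk of them. The open question is whether that holds when the errors are correlated—which, for models trained on similar data, they often are.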
Gemini 3.1 Pro Doubles Its Reasoning Score. Google quietly shipped a model that scored 77.1% on ARC-AGI-2—more than double its predecessor. If you’ve been defaulting to GPT or Claude for complex analytical tasks, it’s worth benchmarking Gemini again. Google keeps shipping strong models while everyone talks about the other two. Don’t sleep on this for enterprise analytical workloads.
Ads Arrive in ChatGPT. $60 CPM, $200K minimum spend. Brands like Expedia and Qualcomm are already in. Two implications: if you’re a marketer, this is a new premium channel worth testing. If you’re an exec, know that your free-tier employees are now seeing ads in their AI assistant. Worth a conversation about whether paid plans make sense for your team.
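If you want to gut-check what that pricing actually buys, the math is simple. Using the reported figures (not an official rate card):

```python
# Back-of-envelope math on the reported ChatGPT ad pricing.
# CPM = cost per 1,000 impressions. Figures are the ones reported
# this week, not confirmed rate-card numbers.
cpm = 60             # dollars per 1,000 impressions
min_spend = 200_000  # reported minimum commitment, dollars

impressions = min_spend / cpm * 1_000
print(f"${min_spend:,} at ${cpm} CPM buys ~{impressions:,.0f} impressions")
# → $200,000 at $60 CPM buys ~3,333,333 impressions
```

For comparison, $60 CPM sits well above typical display rates and closer to premium video—which tells you OpenAI is pricing this as high-intent inventory, not commodity reach.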
NotebookLM Becomes a Presentation Machine. Prompt-based slide revisions plus PowerPoint export. Research a topic, generate slides, edit with natural language, export to PPTX—all in one tool. The research-to-presentation pipeline just collapsed into a single product. If you spend money on consultants making slide decks, take a serious look.
n8n 2.0 Goes Enterprise. Draft vs. published workflow states, human-in-the-loop approval flows, sandboxed code execution. This takes n8n from a developer tool to a legitimate enterprise automation platform. Zapier and Make should be paying attention—especially on the self-hosted, security-conscious side.
The Pattern: Agentic Convergence
Step back from the individual stories and the pattern is impossible to miss.
Every major AI platform shipped autonomous agents this week. ChatGPT Agent runs your virtual computer. Cursor agents write your code. Gemini runs background tasks on your phone. Meta embedded an autonomous ad operations agent into Ads Manager. n8n added orchestration for multi-agent workflows.
This isn’t a coincidence. And it’s not marketing. It’s a convergence.
We’re watching the transition from “AI as a tool you use” to “AI as a coworker you manage.” The interface is shifting from prompts to delegation. The skill is shifting from “how to use AI” to “how to supervise AI.”
Three other patterns worth noting from this week:
AI commerce is becoming a battleground. Both OpenAI and Google now let users buy products inside their AI chatbots. Ads arrived in ChatGPT. Google launched “Direct Offers.” AI is becoming a transaction layer, not just an information layer.
Enterprise integrations are the new competitive moat. Anthropic’s marketplace, OpenAI’s connected apps, Cursor’s integrations—platforms are competing on how many tools they plug into, not just model quality. The “best model” matters less if the other platform plugs into your existing tools.
Model consolidation is accelerating. OpenAI retired four models in a single day. Claude Sonnet 4.6 replaced older versions as the default. The “too many models” era is ending. Vendors are converging on fewer, better models with broader capabilities.
Five Things You Can Do This Week
I try to end every brief with things you can actually do—not just read about.
1. Test ChatGPT Agent on a real workflow. Pick something you do every week. Competitive research, vendor comparison, travel booking. Hand it to ChatGPT Agent. Time it. Compare to your manual process. Don’t judge on perfection—judge on “80% as good in 10% of the time.” Because if it is, that changes your operating model.
2. Audit your AI plugin stack. Anthropic and OpenAI both launched enterprise integrations this month. Map which department workflows in your org are repetitive and document-heavy. Check if the new plugins cover them. If you’re on Claude Enterprise, talk to your admin about a pilot.
3. Benchmark Gemini 3.1 Pro. Take your hardest analytical prompt and run it through Gemini, Claude, and GPT. The results might surprise you. Google’s reasoning jump is real.
4. Run a Cursor Cloud Agent experiment. Five low-risk tasks. Measure time-to-PR, quality, and human review time. The data will tell you whether autonomous agents are ready for your engineering team.
5. Understand your AI advertising exposure. Ads landed in ChatGPT at $60 CPM. If you’re a marketer, test it. If you’re an exec, your free-tier employees are seeing ads in their AI tools now. Decide if paid plans are worth the conversation.
What to Watch Next Week
Keep an eye on the ChatGPT Agent rollout to Plus and Team users—that’s when we’ll see mainstream adoption data.
Watch for enterprise reactions to Anthropic’s plugin marketplace, especially in financial services and legal. Those sectors move first when the tools are right.
And pay attention to the White House’s “Rate Payer Protection Pledge” event on March 4th. Data center companies and AI labs will be formalizing how they share power costs. That’s the infrastructure story nobody’s talking about—and it will shape AI economics for years.