3/3/26: Forget Hiring More BDRs. Sendoso Built 3 AI Agents Instead. (Here's What Happened)
Welcome to yet another day in the GTM AI podcast world. Today we went deep, and I mean REALLY deep, with the www.sendoso.com team, including:
Kris Rudeegraap, Co-CEO and Cofounder
Austin Sandmeyer, Director of Growth and GTM Engineering
Egan Callahan, GTM Engineer (and Sendoso Secret Weapon)
You will love this deep dive! Below you'll find two playbooks and a goodie:
1. How to create a moat with your own data
2. An AI agent-building guide
3. This NotebookLM, pre-loaded with some goodies for you
Let's get into the podcast!
Sendoso Cut Their BDR Team from 15 to 4. Pipeline Doubled.
You can listen to the podcast or watch the video interview on YouTube, Apple, Spotify, and a host of other platforms.
Most companies respond to declining outbound results by hiring more BDRs. Sendoso fired 73% of theirs and built an AI system that uses internal data nobody else has access to. The result: 4 BDRs now generate over 30% of pipeline, double what 15 were producing.
I brought three of the Sendoso team onto the GTM AI podcast: Co-CEO Kris Rudeegraap, Austin Sandmeyer, and their internal AI builder Egan Callahan (I call him Sendoso's secret weapon). What they showed me wasn't flashy demos; it was practical, revenue-generating AI infrastructure that any GTM org can learn from. Here's what matters:
1) Internal data is the only defensible moat in AI outbound.
Every AI SDR tool on the market scrapes LinkedIn for job changes and hiring signals. You know who else gets that data? The 40 other vendors emailing your prospect the same thing.
Sendoso’s system (built on UserGems’ Gemmy) pulls from Snowflake product usage data and Salesforce history: closed-lost opportunities, previous champions at new companies, product engagement patterns, multi-threading contacts. Nobody else has this data.
The output: emails that reference a specific internal champion by name, surface a competitor case study relevant to the prospect’s stack, and pair it with a personalized gift (like a hands-free running leash because AI found the prospect’s dog on social media). Prospects reply thinking a human spent 30 minutes researching them. The BDR didn’t even know the email went out.
2) The BDR role didn’t shrink. It evolved into pure relationship-building.
Austin’s team made a bet: strip email completely off the BDR’s plate. Let AI handle email and LinkedIn sequences. BDRs now spend 100% of their time on calls, responding to warm replies, and building consensus across buying committees.
The logic: emails are ad impressions now. Open, delete, junk. The human advantage is voice, relationship, and real-time conversation. So stop wasting humans on what AI does better, and let humans do what AI can’t.
The transformation timeline:
Month 0-2: Replicated existing BDR email output with AI (same results, fewer people)
Month 2-6: Optimized AI sequences using internal data signals (results surpassed human output)
Month 6+: Scaled team back to 4-5 BDRs focused exclusively on human touchpoints
Austin’s advice for teams wanting to do this: grab one excited BDR, pilot the AI motion, prove the wins, then scale. Trying to flip a 15-person SDR org overnight without executive alignment will stall.
3) Egan’s agent-building framework separates production-grade AI from expensive demos.
Egan builds all of Sendoso’s internal AI systems on Anthropic’s agent harness framework (the same architecture behind Claude Code). Three systems he walked through:
Contract Scraper: Pulls closed-won contracts from Salesforce, OCRs the PDF, converts to PNG, and feeds all three formats (image + OCR + text) to Claude Opus. JavaScript auditor validates output against deterministic business rules before pushing to Salesforce. Saves AMs roughly 45 minutes per renewal and surfaces upsell opportunities that were buried in PDFs nobody read. Result: they hit their net revenue retention goal two quarters running.
Deep Research Engine: Self-hosted vector database with hybrid re-ranking (text search + semantic search + score aggregation). Analyzes every deal individually, builds context over time, then synthesizes across the full dataset. Solves the chatbot hallucination problem where most tools claim to analyze “all your deals” but actually only pull a handful due to context window limits.
AI Proposal Generator: Single-file HTML proposals (not PowerPoints) that pull customer goals from sales conversations, auto-select relevant testimonials, and include e-signature capability. Reps can edit inline or let AI decide. Hosted behind Cloudflare with email-restricted access. Egan’s theory: slides are the past, HTML is the future of sales collateral.
His build-vs-buy decision tree:
Can I build it in n8n? Does it scale? Ship it.
Too complex for n8n? Can we build it in Python and deploy on internal infrastructure?
Cost equation doesn’t work? Now buy software that solves the proven use case.
4) “Hallucination” is a context problem, not an AI problem.
Egan dropped the best analogy of the episode: take the smartest person in the world, sit them at a desk with email and a phone, give them zero context about your business, and ask them to answer a customer email. They’ll make things up too.
Most AI pilots fail because teams stuff an agent with data and “hope for the best.” Egan’s approach: structured validation layers (JavaScript auditors checking deterministic rules), rich context injection (product details, contract terminology, business-specific definitions), and hybrid retrieval that scores results across multiple search methods.
Prompt engineering isn’t dead. For single-shot conversations with Claude, sure, simple prompts work fine. For production agentic workflows with real business consequences, prompt engineering is more important than ever.
The tactical shift: what to do this week:
Audit your BDR team’s time allocation. What percentage is email vs. calls vs. relationship-building? If email dominates, that’s your AI opportunity.
Inventory your internal data assets (Salesforce history, product usage, support tickets). This is your moat. External scraping data is table stakes.
Test one agent use case using the harness framework: structured context in, deterministic validation, human audit on edge cases.
The companies winning at AI outbound aren't using better AI models. They're feeding better data into the same models everyone else has access to. Your CRM history is the competitive advantage you're sitting on, and Sendoso uses tools like momentum.io to take that CRM data to the next level.
Here’s every piece of tech mentioned on the call, organized by category:
🤖 AI Models & Platforms
Anthropic / Claude — Core AI model powering their agent harnesses; specifically referenced Claude Code and Anthropic’s agent harness paper
Claude Opus — The specific Claude model used for contract scraping
Gemmy (UserGems AI) — AI tool writing personalized emails using internal data signals
Piper (Qualified AI) — AI agent running on website and in email flows
Momentum — Unstructured conversational data analysis and deal research
📊 Data & CRM
Salesforce — Primary CRM for opportunity tracking, data extraction, and proposal triggering
Snowflake — Data warehouse feeding product activity and user metrics into AI workflows
Google Sheets — Used as a lightweight temporary database for scraping results
Vector Store / Vector Database — Self-hosted database storing customer communication embeddings for deep research
🔧 Automation & Agent Building
n8n — Self-hosted workflow automation tool; Egan's primary agent builder
Zapier — Mentioned as a common no-code alternative to n8n
Python — Used for custom agent development and data pipelines
JavaScript — Used for deterministic validation and business rule auditing
📣 Sales & Outbound Tools
Qualified — AI platform for website engagement and pipeline routing
UserGems — Contact acquisition and job-change tracking
Clay — Data enrichment and scraping (used in previous workflows)
Orum — Dialer and AI-ranked call queue for BDRs
SalesLoft — Traditional sales engagement platform (referenced as legacy)
Gong — Sales conversation intelligence (referenced for comms tracking)
🏗️ Infrastructure & Dev
Cloudflare — Securing and restricting access to generated sales proposals
DocuSign — Proposal signing integration
React — Frontend framework for building proposal components
HTML/CSS — Used to generate portable single-page sales proposals
OCR — Technology for extracting text from PDF contracts
🎯 The Core Stack Summary
Snowflake + Salesforce → Gemmy (UserGems) → Outreach/SalesLoft
↓
N8N agents (Python + JS validation)
↓
Anthropic Claude → Proposals (React/HTML) → Cloudflare → DocuSign
↓
Qualified (Piper) + Orum (dialing)
↓
Momentum (conversation intel)
📘 PLAYBOOK 1: The GTM AI Agent Guide
How to Build Production-Grade AI Agents Using the Harness Framework
THE CORE PROBLEM
90% of AI agent pilots in GTM orgs die the same death. The agent doesn't fail because the AI is bad; it fails because there is no harness around it. A harness is the infrastructure that wraps around the AI model to handle context management, session recovery, progress tracking, and validation. Without it, you have a chatbot. With it, you have a production system.
THE FIVE FAILURE MODES (And How to Fix Them)
| Failure Mode | What Happens | The Fix |
| --- | --- | --- |
| No Persistent Context | Agent repeats the same mistakes | Progress files and logs read at startup |
| Premature Completion | Agent skips edge cases | Structured JSON feature lists with pass/fail criteria |
| Broken Environment | System left in unrecoverable state | Init scripts and Git rollbacks |
| No Validation Layer | Errors compound silently | Code checks + AI re-verification + human audits |
| Context Hallucination | Agent fills gaps with fiction | Hybrid retrieval and rich context injection |
THE AGENT HARNESS FRAMEWORK (5 Architecture Patterns)
Pattern 1: Session Startup Sequence
This comes from the Claude Harness paper.
Before doing anything, every agent must:
Run an environment check — confirm all APIs, credentials, and dependencies are live
Recover context from logs — read what was done last session, what failed, what’s pending
Select tasks from a structured queue — never let the agent decide what to work on
Run a smoke test — validate one sample task before full execution
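The four startup steps above can be sketched in Python. This is a minimal illustration, not Sendoso's actual code; the environment-variable names, file paths, and queue schema are assumptions for the example.

```python
import json
import os

def startup(agent_env: dict, log_path: str, queue_path: str) -> dict:
    """Hypothetical session-startup sequence following the four steps above."""
    # 1. Environment check: confirm required credentials are present.
    #    (Variable names here are illustrative.)
    missing = [k for k in ("SALESFORCE_TOKEN", "ANTHROPIC_API_KEY") if k not in agent_env]
    if missing:
        raise RuntimeError(f"Environment check failed, missing: {missing}")

    # 2. Recover context: read the progress log left by the last session.
    history = []
    if os.path.exists(log_path):
        with open(log_path) as f:
            history = [json.loads(line) for line in f if line.strip()]

    # 3. Select a task from the structured queue; never let the model
    #    decide on its own what to work on.
    with open(queue_path) as f:
        queue = json.load(f)
    pending = [t for t in queue if t["status"] == "pending"]
    task = pending[0] if pending else None

    # 4. A smoke test (running `task` against one sample input) would go
    #    here before committing to the full run.
    return {"history": history, "task": task}
```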
Pattern 2: Structured Task Lists
Tasks are defined in JSON format with explicit criteria. Nothing is “done” unless validated.
```json
{
  "task_id": "contract_001",
  "task": "Extract renewal date from Acme Corp contract",
  "status": "pending",
  "boolean_passes": false,
  "pass_criteria": "Date extracted, formatted MM/DD/YYYY, pushed to Salesforce field"
}
```
The boolean_passes field only flips to true after validation clears — not when the AI thinks it’s done.
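A deterministic check that flips that field might look like this sketch. It assumes the MM/DD/YYYY pass criterion from the example task; the function name and status values are illustrative.

```python
import re

def validate_task(task: dict, extracted_date: str) -> dict:
    """Hypothetical Layer-1 check: flip boolean_passes only when the
    pass criteria are met, not when the model claims completion."""
    ok = bool(re.fullmatch(r"\d{2}/\d{2}/\d{4}", extracted_date))
    task["boolean_passes"] = ok
    task["status"] = "done" if ok else "failed_validation"
    return task
```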
Pattern 3: Incremental Progress with Rollback
Work one task at a time (one contract, one account, one proposal)
Commit progress before moving to the next task
Maintain a rollback point at each commit so failures don’t cascade
Think Git commits — small, auditable, reversible
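The commit-and-rollback loop can be sketched as a small class. This is an in-memory illustration of the pattern, not Sendoso's implementation; a real system would persist checkpoints to disk or Git.

```python
import copy

class CheckpointedRun:
    """Minimal sketch of one-task-at-a-time execution with rollback,
    mirroring the Git-commit analogy above."""

    def __init__(self, state: dict):
        self.state = state
        self.commits = [copy.deepcopy(state)]  # rollback points

    def run_task(self, task, apply_fn):
        try:
            apply_fn(self.state, task)  # mutate state for ONE task only
            self.commits.append(copy.deepcopy(self.state))  # commit progress
        except Exception:
            # Failure: restore the last committed state so errors don't cascade.
            self.state = copy.deepcopy(self.commits[-1])
            raise
```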
Pattern 4: Multi-Layer Validation
Three layers, every time:
Layer 1 — Code (Deterministic): JavaScript or Python auditors check business rules. Did the date parse correctly? Is the field populated? Does the output match the expected format?
Layer 2 — AI (Re-verification): A second AI pass reviews Layer 1 failures with full error context. The AI re-attempts with the failure reason injected.
Layer 3 — Human (Escalation): Anything below confidence threshold gets routed to a human reviewer queue. Not bypassed — escalated with full context.
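The three layers compose into one routing function, sketched below. The `ai_recheck` callable stands in for a second model pass; the confidence threshold and route names are assumptions for the example.

```python
def validate(output: dict, code_checks: dict, ai_recheck, confidence_threshold=0.8):
    """Sketch of the three-layer flow: deterministic code checks,
    AI re-verification with the failure reason injected, then human
    escalation for anything still failing or low-confidence."""
    # Layer 1: deterministic business-rule checks.
    errors = [name for name, check in code_checks.items() if not check(output)]
    if errors:
        # Layer 2: re-attempt with full error context.
        output = ai_recheck(output, errors)
        errors = [name for name, check in code_checks.items() if not check(output)]
    # Layer 3: escalate, never bypass.
    if errors or output.get("confidence", 0) < confidence_threshold:
        return {"route": "human_review", "errors": errors, "output": output}
    return {"route": "push_to_crm", "errors": [], "output": output}
```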
Pattern 5: Rich Context Injection
Every agent interaction must include:
Product definitions — what your product does, features, terminology
CRM field definitions — what “Stage 3” means, what fields matter
Brand voice guidelines — tone, language, what not to say
Historical performance — what worked, what failed, with examples
This is injected at the start of every task execution — not once at setup.
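Injection can be as simple as prepending the four context sections to every task prompt. The section names below are illustrative, not a documented format:

```python
def build_task_prompt(task: str, context: dict) -> str:
    """Assemble the per-task prompt. The point is that business context
    rides along with EVERY task execution, not just initial setup."""
    sections = [
        ("PRODUCT DEFINITIONS", context["product"]),
        ("CRM FIELD DEFINITIONS", context["crm_fields"]),
        ("BRAND VOICE", context["voice"]),
        ("HISTORICAL EXAMPLES", context["history"]),
    ]
    blocks = [f"## {title}\n{body}" for title, body in sections]
    return "\n\n".join(blocks) + f"\n\n## TASK\n{task}"
```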
ANATOMY OF A PRODUCTION GTM AGENT
[TRIGGER]
↓ Scheduled run OR event (e.g., Salesforce Closed-Won)
[CONTEXT LOADER]
↓ Recover session state + inject business rules/glossary
[TASK QUEUE]
↓ Structured JSON list, prioritized, one item selected
[EXECUTOR]
↓ AI model (Claude) completes one task
[VALIDATOR]
→ Layer 1: Code checks
→ Layer 2: AI re-check on failures
→ Layer 3: Human escalation for low-confidence outputs
[OUTPUT]
↓ Validated results pushed to CRM / Email / Reports
SENDOSO’S THREE PRODUCTION AGENTS
Agent 1: Contract Scraper
What it does: Automates renewal research by OCR-ing PDF contracts and extracting key fields (renewal date, contract value, upsell signals)
Model: Claude Opus
Result: 45 minutes saved per renewal
Output: Directly pushed to Salesforce renewal fields
Agent 2: Deep Research Engine
What it does: Analyzes all deal communications using a self-hosted vector store with hybrid retrieval (vector search + re-ranking)
Use case: Win/loss pattern analysis, deal coaching, rep enablement
Output: Insights surfaced for leadership and AEs in structured reports
Agent 3: AI Proposal Generator
What it does: Pulls data from Salesforce + call transcripts → generates HTML proposals with auto-selected testimonials and customer goals
Infrastructure: Hosted behind Cloudflare
Result: Reps send personalized proposals in minutes, not hours
THE BUILD VS. BUY DECISION TREE
Step 1: Validate the Use Case
Ask these three questions:
Is a human currently doing this task?
Is the output measurable (time, accuracy, revenue impact)?
Does it compound — does getting better at it create a durable advantage?
If yes to all three → proceed.
Step 2: Choose Your Path
| Scenario | Decision |
| --- | --- |
| Can be prototyped in n8n | Build it — ship fast |
| Needs Python + internal infrastructure | Build it — bring an engineer |
| Maintenance cost exceeds build benefit | Buy software |
Step 3: The 5-Day Proof-of-Value Sprint
Day 1: Map the workflow end-to-end (inputs, outputs, edge cases)
Day 2: Build v0.1 in n8n or Python
Day 3: Test against 10 real examples, document every failure
Day 4: Add Layer 1 + Layer 2 validation
Day 5: Calculate actual ROI — time saved × volume × cost-per-hour
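The Day 5 ROI math is straightforward; here it is as a one-liner with example numbers (the 45 minutes comes from the contract scraper; the volume and hourly cost are placeholder assumptions):

```python
def weekly_roi_usd(minutes_saved_per_task: float, tasks_per_week: int,
                   cost_per_hour: float) -> float:
    """Day-5 sprint math: time saved x volume x cost-per-hour."""
    return (minutes_saved_per_task / 60) * tasks_per_week * cost_per_hour

# e.g. 45 min saved per renewal, 20 renewals/week, $60/hr fully loaded
# -> $900/week of recovered AM time
```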
PRODUCTION-GRADE AGENT CHECKLIST
✅ Context
[ ] Business context injected into every task
[ ] Multi-format input handling (PDF, email, Slack, CSV)
[ ] Hybrid retrieval (vector search + keyword re-ranking)
[ ] Session recovery on restart
✅ Validation
[ ] Deterministic code checks (Layer 1)
[ ] AI re-verification pass (Layer 2)
[ ] Human escalation path defined (Layer 3)
[ ] Structured JSON task list with explicit pass criteria
[ ] Rollback capability at each task commit
✅ Infrastructure
[ ] Self-hosted for sensitive data
[ ] Provider-agnostic data layer (not locked to one LLM)
[ ] Access controls and permissions defined
[ ] Audit trails for every agent action
[ ] Cost monitoring per agent run
✅ Measurement
[ ] OKR alignment — what north star metric does this move?
[ ] Accuracy tracking (% tasks passing Layer 1 without escalation)
[ ] Time-to-value per task
[ ] Failure mode catalog — documented list of known edge cases
THE 30-DAY AGENT ROADMAP
Week 1 — Scope
List every manual, repetitive task your GTM team does
Score each one: impact × frequency × measurability
Pick the single highest-scoring task to automate first
Write explicit success criteria before building anything
Week 2 — Prototype
Set up n8n (recommended starting point)
Build v0.1 of the agent
Test it against 10 real examples
Document every failure — these become your validation rules
Week 3 — Harden
Build Layer 1 deterministic validation (code checks)
Build Layer 2 AI re-verification
Define your human escalation queue and routing rules
Test against 50 examples — document edge cases
Week 4 — Deploy
Deploy with 100% human-in-the-loop (every output reviewed)
Calculate actual ROI from real usage
As confidence grows, reduce human review percentage
Build the business case to scale to agent #2
📗 PLAYBOOK 2: The Internal Data Playbook
How Sendoso Cut Their BDR Team 73% and Doubled Pipeline with AI
THE BROKEN OUTBOUND MODEL
Sendoso’s reality before the transformation:
15+ BDRs generating less than 15% of total pipeline
Mass personalization that was neither mass nor personal
Reps spending 80% of time on tasks AI can do better
After transformation:
4–5 BDRs generating 30%+ of pipeline
AI handling outbound research, sequencing, and proposal generation
Reps focused exclusively on phone calls, warm replies, and multi-threading
FRAMEWORK 1: THE INTERNAL DATA ADVANTAGE
Most teams compete on the same external signals. Here’s the signal hierarchy:
| Tier | Data Type | Examples | Competitive Advantage |
| --- | --- | --- | --- |
| Tier 1 — Table Stakes | External signals | Job changes, funding, hiring (LinkedIn, Crunchbase) | LOW — everyone has it |
| Tier 2 — Differentiator | Internal CRM history | Closed-lost records, previous champions, competitor intel | MEDIUM |
| Tier 3 — Moat | Behavioral/product data | Feature adoption, usage patterns, engagement data | HIGH |
| Tier 4 — Unfair Advantage | Combined signal layer | Usage + CRM + external + conversation intelligence unified | VERY HIGH |
The goal: Get to Tier 4. Your internal data is the unfair advantage that no competitor can replicate.
FRAMEWORK 2: THE BDR TRANSFORMATION ROADMAP (6 Months)
Phase 1: REPLICATE (Months 0–2)
Goal: Prove AI can match current BDR email output
Pull your top 10 performing email sequences from your sequencing tool
Use them as baseline prompts — train AI on what’s actually working
Run AI-generated emails in parallel with human-written ones (A/B test)
Success Metric: AI emails match or exceed existing reply rates
What you’re doing: Replacing low-value output tasks, not the human.
Phase 2: OPTIMIZE (Months 2–4)
Goal: Connect internal data to AI to surpass human personalization
Connect Snowflake/BigQuery (product usage) + Salesforce (deal history) to your AI email engine
AI now personalizes using behavioral signals humans can’t manually access at scale
Sequences reference: product feature usage, days since last login, previous deal history, current champion movements
Success Metric: AI sequences outperform human-written emails by 20%+
What you’re doing: Replacing low-value research tasks, not just writing tasks.
Phase 3: TRANSFORM (Months 4–6+)
Goal: Strip automated outreach from BDR job description entirely
Remove email sequencing, list building, and research from BDR responsibilities
BDRs now focus 100% on:
Phone calls (AI-ranked priority queues)
Responding to warm inbound replies
Multi-threading relationships within accounts
Success Metric: Fewer BDRs generating more pipeline
What you’re doing: Redefining what a BDR is.
FRAMEWORK 3: THE AI OUTBOUND TECH STACK
| Layer | Tool Options | Function |
| --- | --- | --- |
| Data Warehouse | Snowflake, BigQuery | Product usage data source |
| CRM | Salesforce, HubSpot | Deal history, champion tracking |
| AI Email Engine | UserGems (Gemmy), Clay, Regie.ai | Signal-based personalized generation |
| Website AI | Qualified (Piper), Drift | Real-time engagement & routing |
| Gifting Layer | Sendoso SmartSend | AI-suggested gift timing and selection |
| Calling/Dialer | Orum, Nooks | AI-ranked call priority queues |
| Conversation Intel | Momentum, Gong | Call transcripts → signals |
| Agent Builder | n8n, custom Python | Workflow orchestration |
Architecture flow:
Snowflake (usage data)
+ Salesforce (CRM history)
+ Conversation Intel (Momentum/Gong)
↓
AI Email Engine (Clay/UserGems)
↓
Sequencer (Outreach/Apollo)
↓
BDR responds to warm replies only
FRAMEWORK 4: THE AGENT ARCHITECTURE DECISION TREE
Step 1: Validate the Use Case
Is a human currently doing this?
Do they dislike doing it? (good signal for automation)
Is the output clearly measurable?
Step 2: Choose Your Build Path
| Scenario | Path |
| --- | --- |
| Can be built in n8n/no-code | Build it — ship in days |
| Requires Python + internal infra | Build it — bring an engineer |
| Maintenance cost too high | Buy software |
Step 3: Production-Grade Checklist
[ ] Structured context injection (business rules, CRM field definitions)
[ ] Layer 1 deterministic validation (JavaScript/Python business rule checks)
[ ] Hybrid retrieval (vector search + keyword re-ranking)
[ ] Human audit queue for edge cases
[ ] Audit trail for every agent output
THE THREE READY-TO-IMPLEMENT USE CASES
Use Case 1: AI Contract Scraper
The problem: Renewal prep takes 45+ minutes per account — pulling PDFs, finding dates, entering Salesforce fields manually.
The solution:
Trigger: Salesforce Closed-Won or renewal date field within 90 days
Agent pulls contract PDF from Salesforce attachments
OCR processes PDF → extracts renewal date, contract value, tier, add-ons
Claude Opus validates extracted fields against expected format
Layer 1 validation: Are all required fields populated and correctly formatted?
Pushes directly to Salesforce renewal fields
Escalates to human if confidence is below threshold
Result: 45 minutes saved per renewal
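The seven steps above can be sketched as one orchestration function. The `extract_fields` callable stands in for the OCR + Claude Opus extraction step, and the field names, formats, and threshold are assumptions for illustration:

```python
import re

def process_renewal(contract_pdf: bytes, extract_fields, push_to_sfdc,
                    escalate, confidence_threshold=0.8):
    """Orchestration sketch of the contract-scraper flow above."""
    # OCR + model extraction (stubbed): returns renewal fields + a confidence.
    result = extract_fields(contract_pdf)

    # Layer 1: deterministic checks on required fields and formats.
    valid = (
        bool(re.fullmatch(r"\d{2}/\d{2}/\d{4}", result.get("renewal_date", "")))
        and isinstance(result.get("value"), (int, float))
    )
    if valid and result.get("confidence", 0) >= confidence_threshold:
        push_to_sfdc(result)   # push directly to Salesforce renewal fields
        return "pushed"
    escalate(result)           # below threshold: route to a human, with context
    return "escalated"
```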
Use Case 2: Deep Research Engine
The problem: Win/loss patterns are buried in emails, calls, and Slack — invisible to reps and leadership.
The solution:
Ingest all deal communications: emails, call transcripts (Gong/Momentum), Slack threads
Chunk and embed into a self-hosted vector store (not OpenAI’s — you control the data)
Hybrid retrieval: vector similarity search + BM25 keyword re-ranking
Query: “What objections did we lose to Competitor X in Q3?”
Output: Structured win/loss report surfaced in Slack or Salesforce
Result: Leadership makes coaching decisions from evidence, not gut feel
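One common way to aggregate the two ranked lists from step 3 is reciprocal rank fusion; the episode did not name the exact scoring method, so treat this as an illustrative stand-in:

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse multiple ranked result lists (e.g. one from vector search,
    one from BM25 keyword search) into a single ranking. Each input is
    a list of doc ids, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Documents ranked highly in ANY list accumulate more score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```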
Use Case 3: AI Proposal Generator
The problem: Custom proposals take hours — pulling data, writing goals, finding testimonials, formatting.
The solution:
Trigger: Salesforce opportunity moves to “Proposal Sent” stage
Agent pulls: Account data, deal history, call transcript summary, use case category
AI selects relevant testimonials from a testimonial library based on use case match
Generates HTML proposal with: customer goals (from call transcript), relevant features, social proof
Hosted behind Cloudflare — rep gets a link, can review and send in minutes
Result: Reps spend time on calls, not on document formatting
THE 30-DAY ACTION PLAN
Week 1: Audit
Map exactly how BDRs spend their time (time audit by task category)
Inventory all internal data sources: CRM fields, data warehouse tables, call recordings
Calculate your current cost-per-meeting (BDR salary ÷ meetings booked per month)
This number is your baseline — everything you build is measured against it
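The baseline metric is simple division; the salary and meeting numbers below are placeholder assumptions:

```python
def cost_per_meeting(monthly_fully_loaded_cost: float,
                     meetings_per_month: int) -> float:
    """Week-1 baseline: BDR cost divided by meetings booked per month."""
    return monthly_fully_loaded_cost / meetings_per_month

# e.g. $7,500/month fully loaded, 10 meetings booked -> $750 per meeting
```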
Week 2: Pilot Design
Pick one excited BDR — not a skeptic, not the top performer (they’ll resist)
Select one AI tool to test (start with Clay or UserGems for email)
Load the top 10 performing sequences as training data / prompt templates
Define the A/B test setup: AI sequences vs. human sequences, same ICP, same volume
Week 3: Launch and Measure
Run the A/B test live
Track: reply rate, meeting rate, reply quality (manually reviewed), time spent per BDR
Document every failure or awkward AI output — these are your edge cases to solve
Do NOT optimize during this week — just observe and document
Week 4: Decide and Expand
Build the business case from actual data: What did the AI sequence cost? What did it produce?
Present to leadership with three options:
Stop — if results don’t support investment
Continue pilot — if directionally positive but need more data
Expand — if results clearly show ROI, define the next three accounts/BDRs to roll out
Begin Phase 2 planning: connecting internal data (Snowflake/Salesforce) to your AI engine
THE NORTH STAR METRICS TO TRACK
| Metric | Why It Matters |
| --- | --- |
| Cost-per-meeting | True unit economics of your outbound motion |
| AI email reply rate vs. human | Validates the core thesis |
| BDR time on phone vs. admin | Leading indicator of transformation |
| Pipeline generated per BDR | Ultimate measure of leverage |
| Agent accuracy rate | % of outputs that pass Layer 1 without human escalation |
Tune in Thursday for another amazing podcast!


