6/4/26: Why and How to run AI with NO Internet
Once again we are here digging into some goldness goodness on the GTM AI Podcast.
My man Jonathan Moss interviews John Williams and they get detailed on how and why you should run AI without internet and how to keep more of your own data.
As per usual, we have podcasts, articles, and notes every week and you can get the rundown here of what to expect.
Now lets get into it.
You can go to Youtube, Apple, Spotify as well as a whole other host of locations to hear the podcast or see the video interview.
Have you ever asked yourself who actually owns your AI conversations?
I hadn’t. Not really. Then John Williams said this on the podcast and I haven’t stopped thinking about it since:
“Possession is nine-tenths of the law. If you can’t access it, then perhaps you don’t own it.”
Sit with that for a second. You’ve spent a year, maybe two, doing your best thinking inside Claude, ChatGPT, Gemini, and Groq. Prompts that work. Decisions you reasoned through out loud. Whole projects planned turn by turn. Where does all of it live? Behind someone else’s login, under someone else’s terms of service, dependent on someone else’s uptime.
John is a 20-year GTM operator who’s spent the last five years running an independent practice, and he came into the kitchen this week and actually cooked: live demos, real repos, receipts on screen. No keynote fluff. What he showed adds up to something bigger than any single tool, and I want to walk you through all five layers of it, because the through-line is a strategy most operators haven’t named yet.
The strategy is ownership.
1) We just entered the toolbox era of GTM.
John opened with an analogy that reframed how I think about operator careers. An HVAC mechanic shows up to your house with their own toolbox. An automotive tech brings their own tools to the shop. John’s argument: GTM operators are next.
“When we arrive at a situation, whether that’s our next FTE role or as an independent operator, you’re expected to bring some of your own tool stack with you.”
Read that again if you’re job hunting or running fractional. The expectation is shifting from “can you use our stack?” to “what do you bring with you?” And the proof of what you bring lives in public. When someone from John’s network pitches him for project work, his first question is “Would you share your GitHub repo link with me?” Your repo is becoming your resume.
He took it one step further, and this is the part I loved: if he applied for his next full-time role, he’d do the entire application in public. Document the approach, publish the work, and let the employer find him instead of being applicant 847 in the first hour.
The tactical move this week:
Create a GitHub account if you don’t have one (private repos are free)
Push one thing you’ve built: a prompt library, a workflow doc, a custom skill
Side benefit I learned the hard way: it also syncs your work across machines. I once lost two days to an iCloud sync death loop between my Mac Mini and laptop. A repo would have saved me both days.
2) Your conversation history is an appreciating asset. Treat it like one.
Here’s the mental shift: every AI conversation you have produces two outputs. The answer you needed today, and a record of how you think. Almost everyone keeps the first and throws away the second. John argues the second is worth more.
His tool, Chat Archive, is a free open-source browser extension that exports any AI conversation to JSON or markdown with one click. He demoed it live: exported a full Claude conversation about a technical project, uploaded the JSON into Groq, and Groq picked up the project with complete context. It even summarized an overnight run he hadn’t reviewed yet. Mid-project model switching, solved. The thing that matters most architecturally: zero outbound calls. Nothing leaves your machine.
But the convenience play is the small play. Three bigger ones:
The audit trail. If you work in a regulated industry, raw inference files prove you acted in your client’s best interest. John compared it to preserving security camera footage. When AI-assisted work gets questioned (and it will), the operators with receipts win.
The digital twin. John is accumulating his full conversational history so a model can mine it for patterns. His framing stuck with me: “We try to do our best as organic LLMs to remember to do all of the right things... but we do forget. We do lose our own context.” A year of archived chats becomes a dataset about your own thinking, and models are exceptional at surfacing patterns you can’t see from inside your own head.
Federated intelligence on your terms. Each archived conversation becomes a node you control. Stitch them together and you’ve built an intelligence layer that follows YOUR privacy policy, not a vendor’s.
One more detail worth stealing even if you never install anything: John keeps a master markdown file per project. After every session, he updates what got done against what he hoped to get done, then feeds that file into whatever model he opens next. Every conversation starts fully informed. You stop re-explaining yourself to the AI forever.
3) Own your own inference.
John runs local models on his laptop through Ollama, and his reasoning goes well past privacy.
Privacy first, though, because his framing is the cleanest I’ve heard: a cloud model “may be very well-protected, but it’s not contained.” Local inference on a laptop with the wifi off is contained by definition. For client work, for sensitive strategy, for anything you’d hesitate to put in an email, that distinction is everything.
Then redundancy. We watched GPU brownouts hit in March. Think about the supply math John laid out: the GPUs serving you today were purchased and installed two years ago, and the demand curve just went vertical. Every agentic workflow spun up in the past 30 days is token demand that did not exist at the start of the year. Supply is fixed in the short run. Demand is compounding. You don’t need a PhD in economics to see what happens to availability and price. I use Claude every single day, and when it goes down I lose real money in productivity. A local model is the backup generator.
And then the reason nobody talks about: craft.
“In your learning journey, you want to move past being a prompt jockey.”
Working with a small local model teaches you how these systems actually behave: what context does, where models break, when to push back on a plan. John described the aha moments where he’d challenge the model’s direction and it would respond with “that actually is a way better path.” That judgment, knowing when to redirect the machine, compounds into every project after it. You can’t read your way to it. You have to run the reps.
Small models now run fine on a normal laptop. The hardware barrier you remember from a year ago is gone. And one detail that stung: Ollama lets you switch models between turns without losing context. Start a turn with one model, answer with another, context intact. Opus to Sonnet mid-conversation still can’t do that. Open source is ahead of the labs on this one.
4) The token economics nobody is pricing in.
This was the most quietly important stretch of the episode. Two facts, one collision course:
Fact one: the labs lose money on inference. John put it plainly: “the inference costs for them are actually way more than they earn on their subscriptions.” Your $20/month plan is subsidized. That doesn’t go on forever, and we should expect a rebalancing.
Fact two: the work AI is absorbing didn’t get free. It moved. John’s framing deserves to be quoted in full:
“Where we maybe previously paid the W-2 of a human to do this necessary thing for the business, that cost didn’t really go away. It just transferred from a W-2 to an inference provider.”
Put those together and “token efficiency” stops being a nerd concern and becomes a line item your CFO will eventually ask about. The operators who get ahead of it will do three things: route work to the cheapest model that can handle it (I plan in Sonnet, build in Opus; the planning tokens are cheap, the building tokens earn their cost), batch non-urgent work to lower-cost processing, and move private, repetitive, high-volume work to local models where marginal token cost rounds to zero.
John calls the end state “token authority”: the ability to keep processing work on your own terms when the meter, the grid, or the vendor says no.
And on the jobs doomerism that usually hijacks this conversation: we used to employ switchboard operators and lamplighters. Was that the best use of a human mind? Every platform shift in history has produced more jobs than it destroyed, and the W-2-to-inference transfer is the mechanism, watching it happen in real time. The question worth asking is John’s: if nothing prevented you from doing anything, where would you actually spend your time?
5) Agents need contracts before they need apologies.
Your agents are about to spend money and agree to terms on your behalf. Most people have given exactly zero thought to the rules.
John’s Agent Commerce open spec codifies the transaction layer: how much an agent can spend without checking in, which terms and conditions it can accept, how an IP owner on the other side exposes pricing and terms in a language agents understand. It rides existing payment rails. It just makes the rules of the deal machine-readable, so the agent can check itself before committing you.
His AI Acceptable Use Policy spec solves the company-side version of the same problem. Enablement teams got overrun by shadow AI, and most companies are starting from zero. The AUP is an open-source base layer they can adapt, so AI gets embraced responsibly instead of banned badly or ungoverned entirely. Both are open source at github.com/fxops-ai, and notably, the contributing models (Groq, Claude Opus) are listed as authors. That transparency is the point.
Same energy applied to OpenClaw: huge respect for the project, real caution on the blast radius. A tool that can log in as you, write files, and legally commit you deserves scrutiny before trust. John’s filter is one question: “Would my security director approve of my use of this tool?” If the answer is maybe, paste the repo into your model first and ask it to flag the security risks. API keys leak. Prompt injection hides in agent skills. Two minutes of vetting beats a horror story. Or take my preferred play: point Claude Code at the repo, have it understand the concept, and build your own version. You get the capability without inheriting the attack surface.
The through-line
Five layers, one strategy: own your proof of work (GitHub), own your context (archives), own your inference (local models), own your economics (token authority), own your agents’ behavior (guardrails). Each one is small on its own. Stacked, they’re the difference between operators who negotiate from strength when the rebalancing comes and operators who pay whatever the meter says.
John named the urgency early in the episode, and it’s the most honest sentence anyone’s said on this show: “We probably would’ve chosen a slower pace, but we didn’t get to make that choice.”
My challenge to you this week: pick ONE layer and claim it. Easiest start: export one important AI conversation and store it where you control it. Five minutes. Then look at it and ask what a year of those is worth to you.
I hope this one shifts how you think about ownership, because it shifted mine. Reply and tell me which layer you’re starting with. I read every response.
Find John at github.com/fxops-ai and on Hugging Face as johnwilliamsatl. He’s in the Pavilion AI>M channel, and if you’re building an independent practice, he and Henning teach the Be Fractional course every six weeks.
The AI Ownership Playbook
Own your chats, your models, your economics, and your agents in 30 days
You’ve spent the last year building your best thinking inside AI tools you don’t control. Your prompts, your workflows, your decisions, your context. All of it lives behind someone else’s login, someone else’s terms, and someone else’s uptime.
Here’s the sentence that should bother you, courtesy of 20-year GTM operator John Williams: “Possession is nine-tenths of the law. If you can’t access it, then perhaps you don’t own it.”
This playbook fixes that in five moves. Each move stands alone, includes copy-paste templates, and tells you exactly what “done” looks like. Work through all five and you’ll have something most operators won’t have for years: full custody of your AI work, a backup plan for the next outage, and a real answer when someone asks what your AI spend is buying.
Inspired by the GTM and AI Podcast episode with John Williams (github.com/fxops-ai). He cooked. This is the recipe.
Start here: The 5-question ownership audit
Score yourself honestly. 1 point per “yes.”
If your main AI provider deleted your account tonight, would you still have your conversation history tomorrow?
If every cloud AI went down for 48 hours, could you still get AI-assisted work done?
Do you know (roughly) what you spent on AI tokens/subscriptions last month, and what it replaced?
Have you security-vetted every AI tool and extension you currently have installed?
If your agent spent $500 or accepted a terms-of-service agreement tomorrow, would it have been following written rules you set?
Score 4-5: You’re ahead of 95% of operators. Skim for the templates. Score 2-3: Normal. The moves below close the gaps in order of impact. Score 0-1: Good news: you’re one weekend away from a different position entirely.
Move 1: Archive every AI conversation (15 minutes to start, lifetime payoff)
The problem: every AI conversation produces two outputs. The answer you needed today, and a record of how you think. Almost everyone keeps the first and throws away the second. The second is worth more, and right now it sits in someone else’s vault. Lose the account, lose the context, lose the year.
The fix:
Install Chat Archive, John Williams’ free open-source browser extension (find it via github.com/fxops-ai). Works in Chromium browsers: Chrome and Edge.
Open any AI conversation (it auto-detects Claude, ChatGPT, Gemini, Groq; Perplexity support came from community contributor Nathan Spear, who also added bulk export). Refresh the page so the extension can read the DOM.
Export to BOTH formats: JSON (machine-readable, for feeding other models) and markdown (human-readable, for your notes).
Save to a consistent local structure:
/ai-archive/[tool]/[project]/[YYYY-MM-DD]-[topic]Back the folder up to a private GitHub repo. Private repos are free, and you get cross-machine sync without cloud-sync nightmares. (I once lost two days to an iCloud sync death loop between two computers. A repo would have saved both days.)
What the export captures: the full URL, timestamps, and every turn between you and the model, so a different model can reconstruct not just what was said but when and in what sequence.
The master markdown ritual (the highest-leverage 3 minutes of your week):
John keeps one master markdown file per project. After every working session, he updates it. Then he uploads that file into whatever model he opens next, and every new conversation starts fully informed. You stop re-explaining yourself to AI forever. Copy this template:
# [Project Name]: Master Context File
Last updated: [date] Goal: [one sentence: what done looks like] Current status: [one sentence]
## Session log
[date]: Planned: [what I hoped to get done]. Actual: [what got done]. Next: [first task of next session]
## Decisions made (and why)
[decision]: [reasoning in one line]
## Open questions
[question]
## Things that didn’t work (don’t retry)
[approach]: [why it failed]
Why both formats matter: the JSON export is portable context. Start a project in Claude, hit an outage or a rate limit, upload the JSON to Groq or Gemini, and the new model picks up exactly where you left off. On the episode, Groq summarized John’s overnight Claude run before he’d even reviewed it himself. Mid-project model switching, unlocked.
The compounding play: mine your archive. Once you have 90+ days of archived conversations, feed batches into a model with prompts like these:
“Here are 3 months of my AI conversations. What topics do I keep circling back to without finishing? What does that suggest I should prioritize or drop?”
“Identify the 5 prompts or framings in these conversations that produced my best outputs. Turn each into a reusable template.”
“What patterns do you see in how I make decisions? Where do I consistently lose time?”
“Based on these conversations, what’s an opportunity or connection I appear to be missing?”
This is John’s “digital twin” concept in miniature: a year of archived chats is a dataset about your own thinking, and models are exceptional at seeing patterns you can’t see from inside your own head. As John put it: “We try to do our best as organic LLMs to remember to do all of the right things... but we do forget. We do lose our own context.”
Bonus use case for regulated work: raw inference files are an audit trail. If you handle financial stewardship or client funds, the original conversation files prove you acted in good faith. John compares it to preserving security camera footage. When AI-assisted work gets questioned, the operator with receipts wins.
Done looks like: extension installed, top 5 conversations exported in both formats, archive folder backed up to a private repo, master markdown file started for your most active project.
Move 2: Set up local inference (45 minutes)
The problem: GPU brownouts arrived in March. The GPUs serving you today were bought and installed two years ago, and every agentic workflow spun up in the past 30 days is new token demand that didn’t exist at the start of the year. Fixed supply, compounding demand. When the meter, the grid, or the vendor decides your day, you don’t have authority over your own work.
The fix: run a small model on your own laptop.
Download Ollama (ollama.com). Free. Mac, Windows, Linux.
Open a terminal and pull a model sized to your machine:
Your machine Start with Why 8GB RAM ollama run llama3.2 (3B) Small, fast, surprisingly capable 16GB RAM ollama run mistral or ollama run gemma3 Strong reasoning for the size 32GB+ RAM ollama run llama3.1 (8B+) or larger Handles longer context and harder tasks
Talk to it. Then disconnect your wifi and talk to it again. That feeling is what John calls token authority.
Note the trick the big labs haven’t shipped: Ollama lets you switch models BETWEEN TURNS without losing conversation context. Opus to Sonnet mid-chat still can’t do that. Open source is ahead here.
Your first 5 reps (this is how you move past prompt jockey):
Give it a real task from your week (summarize notes, draft an email) and compare against your cloud model. Notice the gaps. The gaps teach you what the expensive models are actually doing for you.
Challenge its plan mid-task: “Does it really make sense that we’re headed down this path? Why wouldn’t we do it this way instead?” Watch it either defend the approach with reasons or fold to the better path. That judgment loop is the skill.
Paste in an archived conversation (Move 1) and ask it to continue the project.
Switch models mid-conversation and watch the context survive.
Run something you’d never send to the cloud: comp planning, a sensitive client situation, a negotiation strategy. Contained by definition.
The local vs. cloud decision matrix:
Local: sensitive client data, regulated work, anything you wouldn’t put in an email, drafts and brainstorming, learning reps, outage backup, high-volume repetitive tasks where marginal cost should be zero
Cloud: complex multi-step builds, long-context reasoning, final-pass quality, anything where the best model materially changes the outcome
Why this matters beyond the backup plan: as John framed it, a cloud model “may be very well-protected, but it’s not contained.” And the craft argument is real: “In your learning journey, you want to move past being a prompt jockey.” Working with a small model on your own machine teaches you how these systems behave: what context does, where models break, when to redirect. That learning compounds into every project after it.
Done looks like: Ollama installed, one model pulled, one full work task completed offline, one mid-conversation model switch performed.
Move 3: Vet before you trust (10 minutes per tool)
The problem: open tools are powerful and the ecosystem moves fast. So do bad actors. Leaked API keys and prompt injection hiding inside agent skills are not hypotheticals. They’ve happened. And the more powerful the tool (OpenClaw-class agents can open files, write files, log in as you, and commit you legally), the bigger the blast radius.
The fix: the Security Director Test. Before adopting any open tool, ask one question: “Would my security director approve of my use of this tool?” If the answer is maybe or not sure, run this prompt before you install anything:
“Review this GitHub repository: [URL]. Act as a cautious security engineer. Flag any information security risks including: outbound network calls and where they go, credential or API key handling, permissions requested, prompt injection surface in any skills or instruction files, code that writes files or executes commands, and anything that could commit the user financially or legally. Rate overall risk low/medium/high and explain your top 3 concerns in plain English.”
The red flags checklist:
[ ] Outbound network calls you can’t explain (the best privacy tools make zero; everything stays local)
[ ] API keys or credentials stored in plain text or sent anywhere
[ ] Permissions broader than the job requires
[ ] Instruction or skill files that could carry prompt injection
[ ] Ability to transact, agree to terms, or act as you without an approval step
[ ] No visible community, contributors, or commit history (ghost repos)
The rebuild play (often better than installing): point Claude Code at the repo and ask it to understand the concept, then build YOU a version scoped to exactly what you need. You get the capability without inheriting the attack surface. Don’t copy the code. Copy the idea.
Calibration note: this is not a reason to avoid open source. John’s entire stack is built on it, and he credits the builders he learned from by name (Jaron at TriFall’s chat export approach became the foundation of Chat Archive). Trust but verify. Use the test, then move with confidence.
Done looks like: Security Director Test run on every AI tool and extension currently installed, anything that fails removed or rebuilt.
Move 4: Give your agents a budget (30 minutes)
The problem: agents now research, transact, and agree to terms on your behalf. Most people deploy them with no written rules at all. That works right up until it really doesn’t, and “it really doesn’t” looks like an agent accepting exclusivity terms or recurring billing you never saw.
The fix: write the guardrails before the first incident. Copy this and fill in your numbers:
Agent Spending & Terms Policy (personal)
My agent may spend up to $___ per task and $___ per month without asking me.
Any single purchase over $___ requires my explicit approval before checkout.
My agent may accept standard terms of service for: [research data, content access, API usage]. It may never accept terms involving: exclusivity, sharing of client or personal data, recurring billing over $___/month, or legal commitments beyond the purchase itself.
My agent identifies itself as an agent wherever disclosure is required.
Every transaction gets logged: date, vendor, amount, terms accepted, task it served.
I audit the log every [week/month].
The transaction log (one row per event, keep it in the same repo as your archive):
| Date | Agent/tool | Vendor | Amount | Terms accepted | Task served | Flag? |
If you’re doing this for a company, not just yourself: John’s open-source AI Acceptable Use Policy is the base layer. Enablement teams got overrun by shadow AI, and most companies are starting from zero. The AUP gives L&D and HR a starting point that embraces AI responsibly without being overly restrictive (the two failure modes: ban it badly, or let shadow AI run the show). His Agent Commerce spec covers the transaction layer: machine-readable rules for what agents can buy and agree to, riding existing payment rails. Both at github.com/fxops-ai.
Done looks like: policy filled in and saved, transaction log created, and (if applicable) the AUP forwarded to whoever owns enablement at your company.
Move 5: Run a token budget (20 minutes, then 10 minutes monthly)
The problem: the labs lose money on inference. Your $20/month subscription is subsidized, and a rebalancing is coming. Meanwhile, every task AI absorbs from a human didn’t get free; as John put it, the cost “just transferred from a W-2 to an inference provider.” When prices correct, token efficiency becomes a line item. Get ahead of it now.
The worksheet (15 minutes, once):
List your AI spend: subscriptions + API costs + agent/tool costs = $___/month
List what it replaced or produces: hours saved/week × your effective hourly rate = $___/month
Your ratio: if line 2 isn’t at least 3x line 1, your usage is a hobby, not a system. Fix usage before cutting spend.
The 3 efficiency moves, in order of impact:
Route by cost. Plan with a cheaper model, build with the expensive one. (My pattern: Sonnet plans, Opus builds. Planning tokens are cheap; building tokens earn their cost.) The cheapest model that can handle the task gets the task.
Batch the non-urgent. Overnight and batch processing run at lower cost. Anything that doesn’t need an answer in real time shouldn’t pay real-time prices.
Go local for the repetitive. High-volume, private, repetitive work goes to your local model (Move 2), where marginal token cost rounds to zero.
Done looks like: you know your number, your ratio, and which workloads move to cheap/batch/local this month.
The 30-day ownership plan
Week 1: Custody. Run the audit. Install Chat Archive. Export your top 5 conversations. Create the private repo. Start one master markdown file.
Week 2: Authority. Install Ollama. Pull one model. Run the 5 reps. Complete one real task fully offline.
Week 3: Security. Run the Security Director Test on every installed AI tool. Remove or rebuild anything that fails. Write your agent spending policy.
Week 4: Economics. Run the token budget worksheet. Move one workload each to cheap-model, batch, and local. Calendar a monthly 10-minute review.
Then keep two habits forever: update the master markdown file after every session, and export important conversations as you go.
The final checklist
[ ] Ownership audit scored
[ ] Chat Archive installed, top 5 conversations exported (JSON + markdown)
[ ] Private GitHub repo created, archive backed up
[ ] Master markdown file live for your most active project
[ ] Ollama installed, one model pulled, one task done offline
[ ] One mid-conversation model switch performed
[ ] Security Director Test run on every installed tool
[ ] Agent spending policy written, transaction log created
[ ] Token budget worksheet done, ratio known
[ ] First archive-mining prompt run (after 90 days of archiving)
Ten boxes. Thirty days. Full custody of your AI work.
My challenge to you: check the first two boxes today. Export one conversation that matters and put it where you control it. Then come back and tell me what it felt like to hold your own data for the first time.
I hope this saves you the two days I once lost to a sync death loop. Learn from my pain ;)
Stay curious.
Coach K GTM AI Academy

