Mastermind recap
AIMM Session — July 17, 2025: Don's LinkedIn Machine and the Legal AI Demo
“You can have an application like this running with Groq in half a day — and all of your people are going to have access to your stuff for like 5, 10 bucks a month. Your cost.” — Lou, demoing his live legal AI app
30-Second Summary
Lou demoed a working legal AI application — RAG, multi-tenant user management, slash-command prompts, hybrid search, and sub-3ms inference — and then explained how the whole thing costs less per month than a tank of gas. Meanwhile, the group traded notes on Kimi’s surprisingly un-AI-like writing style, the AI video arms race (HeyGen, Cling, Runway, and a dozen others), a killer prompt engineering breakthrough involving investor role-play, and a content workflow that’s getting real LinkedIn engagement without turning your voice into a chatbot.
Topic 1: Kimi Writes Like a Human. We’re Still Figuring Out Why.
Kimi (from Moonshot AI, also accessible via Groq’s model playground). Jamie W: “It gets me to a reasonable result very quickly where I don’t have to go back and change a lot of stuff.” Lou put it up against Claude and found something different: “There’s something about Kimi that just feels somehow more personal, more natural.” His hot take: “The only reason AI uses em dashes is because it’s in the training data — which means a lot of people must be using them.” Lou’s reminder: “Every day there’s a major announcement. It’s tempting to transfer everything to the next best AI — but then it comes right back around to the one you were using.”
Topic 2: The AI Video Arms Race Is Getting Uncomfortably Real
Current landscape: Runway, HeyGen, Synthesia, Fal (with LoRA), Veo 3, Cling. HeyGen avatars trained on multiple camera angles now sync expression, movement, and tone in a way that even a critical eye struggles to catch. Dark side Lou flagged: AI “influencer” accounts with tens of millions of followers and no disclosure.
Topic 3: Role-Play Your Prompts — The Investor Persona That Changed Everything
Dirk tried “Act like a Big 4 analyst” — better, but not good enough. Then, out of frustration: “Just behave like an investor.” The model shifted. Suddenly framing the analysis as if someone who had hired Bain & Company to do due diligence was reading an investor dashboard. Five companies in, immediately identifying the blind spots of CEOs.
Lou unpacked why the role flip worked:
“When you give it a role, you’re removing a whole bunch of options for it to return. You’re narrowing the amount of space in which it’s looking.”
Dirk also found: he’d been assuming more prompt = more precision. He studied what the best prompt engineers were actually doing — and their prompts were short. He cut his to one-third of its original length and got better output.
Topic 4: Don’s LinkedIn Machine — AI as Draft Engine, Human as Editor
The stack:
- Builds ideal client psychographic profile in ChatGPT — 11 breakout areas, expert panel recruited
- From the profile, generates 10 content pillars
- Picks one pillar, generates 6 months of topic ideas
- For each topic: ChatGPT generates an outline → Don edits the outline → ChatGPT writes a draft → three versions come out (conservative / middle-of-the-road / edgy stop-the-scroll)
- He prints the draft. Picks up a pen. Rewrites it in his own voice.
Lou’s suggestion: take five pairs of (AI draft → final edited version), put them in Claude, and ask it to build a voice profile. Feed new drafts through that profile and cut 45 minutes of editing per article.
Topic 5: The Legal AI App Demo — This Is What “Half a Day” Can Build
What it does:
- RAG database seeded with construction law (Bruner & O’Connor) and engineering specifications
- Slash commands for common prompts (e.g.,
/contract-review) - Hybrid search (BM25 metadata + dense embeddings) with re-ranking
- Multi-tenant user management
- Per-user conversation history, notes, and response ratings for future reinforcement learning
- Side-by-side model comparison (local vs. Groq-hosted inference)
The architecture:
- Frontend: Open Web UI (open source, Docker-based, runs in a $5-6/month Digital Ocean droplet)
- Inference: Groq API — same open-source models running at 400–850 tokens/second vs. 13-20 locally
- Data privacy: RAG database and conversation history stay local. Only inference context is sent to Groq — ephemerally, then gone.
The economics: Third-party alternative priced at $22,000–$35,000/year. Self-hosted version: development time + ~$5-10/month in cloud costs. Contract value: $35-40K to build and deliver. Market rate: $100K+.
Groq Deep Dive: What Is an LPU?
Groq built the Language Processing Unit (LPU) from the ground up — not repurposed GPUs — designed specifically for sequential token-prediction. The result: 400–850 tokens per second. Time to first token: ~2ms. Why often free? Groq is trying to drive LPU adoption.