Mastermind recap

AIMM Session — July 17, 2025: Don's LinkedIn Machine and the Legal AI Demo

· AIMM 2025 · 90 min

Facilitators: Lou D'Alo

“You can have an application like this running with Groq in half a day — and all of your people are going to have access to your stuff for like 5, 10 bucks a month. Your cost.” — Lou, demoing his live legal AI app

30-Second Summary

Lou demoed a working legal AI application — RAG, multi-tenant user management, slash-command prompts, hybrid search, and sub-3ms inference — and then explained how the whole thing costs less per month than a tank of gas. Meanwhile, the group traded notes on Kimi’s surprisingly un-AI-like writing style, the AI video arms race (HeyGen, Cling, Runway, and a dozen others), a killer prompt engineering breakthrough involving investor role-play, and a content workflow that’s getting real LinkedIn engagement without turning your voice into a chatbot.

Topic 1: Kimi Writes Like a Human. We’re Still Figuring Out Why.

Kimi (from Moonshot AI, also accessible via Groq’s model playground). Jamie W: “It gets me to a reasonable result very quickly where I don’t have to go back and change a lot of stuff.” Lou put it up against Claude and found something different: “There’s something about Kimi that just feels somehow more personal, more natural.” His hot take: “The only reason AI uses em dashes is because it’s in the training data — which means a lot of people must be using them.” Lou’s reminder: “Every day there’s a major announcement. It’s tempting to transfer everything to the next best AI — but then it comes right back around to the one you were using.”

Topic 2: The AI Video Arms Race Is Getting Uncomfortably Real

Current landscape: Runway, HeyGen, Synthesia, Fal (with LoRA), Veo 3, Cling. HeyGen avatars trained on multiple camera angles now sync expression, movement, and tone in a way that even a critical eye struggles to catch. Dark side Lou flagged: AI “influencer” accounts with tens of millions of followers and no disclosure.

Topic 3: Role-Play Your Prompts — The Investor Persona That Changed Everything

Dirk tried “Act like a Big 4 analyst” — better, but not good enough. Then, out of frustration: “Just behave like an investor.” The model shifted. Suddenly framing the analysis as if someone who had hired Bain & Company to do due diligence was reading an investor dashboard. Five companies in, immediately identifying the blind spots of CEOs.

Lou unpacked why the role flip worked:

“When you give it a role, you’re removing a whole bunch of options for it to return. You’re narrowing the amount of space in which it’s looking.”

Dirk also found: he’d been assuming more prompt = more precision. He studied what the best prompt engineers were actually doing — and their prompts were short. He cut his to one-third of its original length and got better output.

Topic 4: Don’s LinkedIn Machine — AI as Draft Engine, Human as Editor

The stack:

  1. Builds ideal client psychographic profile in ChatGPT — 11 breakout areas, expert panel recruited
  2. From the profile, generates 10 content pillars
  3. Picks one pillar, generates 6 months of topic ideas
  4. For each topic: ChatGPT generates an outline → Don edits the outline → ChatGPT writes a draft → three versions come out (conservative / middle-of-the-road / edgy stop-the-scroll)
  5. He prints the draft. Picks up a pen. Rewrites it in his own voice.

Lou’s suggestion: take five pairs of (AI draft → final edited version), put them in Claude, and ask it to build a voice profile. Feed new drafts through that profile and cut 45 minutes of editing per article.

What it does:

  • RAG database seeded with construction law (Bruner & O’Connor) and engineering specifications
  • Slash commands for common prompts (e.g., /contract-review)
  • Hybrid search (BM25 metadata + dense embeddings) with re-ranking
  • Multi-tenant user management
  • Per-user conversation history, notes, and response ratings for future reinforcement learning
  • Side-by-side model comparison (local vs. Groq-hosted inference)

The architecture:

  • Frontend: Open Web UI (open source, Docker-based, runs in a $5-6/month Digital Ocean droplet)
  • Inference: Groq API — same open-source models running at 400–850 tokens/second vs. 13-20 locally
  • Data privacy: RAG database and conversation history stay local. Only inference context is sent to Groq — ephemerally, then gone.

The economics: Third-party alternative priced at $22,000–$35,000/year. Self-hosted version: development time + ~$5-10/month in cloud costs. Contract value: $35-40K to build and deliver. Market rate: $100K+.

Groq Deep Dive: What Is an LPU?

Groq built the Language Processing Unit (LPU) from the ground up — not repurposed GPUs — designed specifically for sequential token-prediction. The result: 400–850 tokens per second. Time to first token: ~2ms. Why often free? Groq is trying to drive LPU adoption.