
We Built an AI Insight Agent to Cure Content Overload (and it runs locally)

Updated: Jun 28, 2025

There’s a quiet guilt that builds up in every ambitious PM’s day. It shows up as unread Substack issues, bookmarked interviews, and podcasts you “really need to listen to.” More often than not, they go untouched.


Not because the content lacks value—it's actually gold.


The problem lies in the packed schedule. You have back-to-back meetings, a half-written PRD, and perhaps a few stolen minutes at night before your brain checks out. Learning gets postponed repeatedly.


I’m no different. I follow absolute powerhouses in the product world—Akash Gupta, Claire Vo, Lenny Rachitsky, Peter Yang, Pawel Huryn, Shreyas Doshi, and many more. They drop insight-packed YouTube interviews, podcasts, and breakdowns faster than I can log in to Notion!


One Saturday morning, I found myself staring at 13 open YouTube tabs from all these creators. I had saved them for weeks, maybe months. Each promised insight bombs, yet none had been watched. Like many PM aspirants or professionals trying to level up, I suffer from content overload paralysis.


From Overload to Output: The Spark


I turned to my co-builder, Summer (my AI sidekick), and said:


“What if we built something that listens for me, watches the content, extracts insights, and sends back a crisp, structured brief? Like a McKinsey consultant on Red Bull.”

No fancy interfaces. No friction. Just input → output. To her credit, Summer didn’t flinch. She replied, “Let’s build it.”


And just like that, we were off. A 48-hour rabbit hole began. The first working version was ready in under three hours. That’s what happens when you remove distractions and just build.


Thinking Like Product People First


Before we wrote a single line of code, we put on our PM hats. This wasn’t just a vibe coding session; it was a real product build. So we did what good PMs do: we specced.


Problem Statement


PMs and PM aspirants are overwhelmed with long-form content but lack the time to extract and retain the insights hidden within. There’s no scalable way to condense a 60-minute interview into structured takeaways.


User Personas


  • Prem – a PM aspirant juggling family and coursework, looking for summaries to stay sharp.

  • Sonal – a mid-level PM who wants to stay relevant but can’t sit through ten interviews weekly.

  • Abhinandan – a mentor curating PM content for her students, needing structured summaries at scale.


MVP Scope


  • Input: YouTube URL

  • Fetch video audio

  • Transcribe using OpenAI's Whisper

  • Summarize using GPT-4 into key themes

  • Output: branded `.docx` with headers, takeaways, and a "Learn More" ChatGPT prompt under each theme.



Envisioned Phases


  • v1.0 = CLI-based (command line interface, a.k.a. the terminal) local script

  • v1.1 = Add cost tracking, usage logging

  • v1.2 = Multi-video batch, Notion export


We even scoped a pricing model: cost-only mode and profit-margin mode, based on OpenAI API burn estimates.


We explored various ways to make this public:


  • API wrappers using FastAPI or Flask

  • Limited-access forms powered by Make.com

  • Slack bots or Notion auto-summarizers
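Of these, the FastAPI wrapper felt like the lightest lift. Here’s a minimal sketch of what that endpoint could look like (the route name and the `run_pipeline` stub are illustrative, not our actual code):

```python
# api.py: a minimal FastAPI wrapper around the pipeline (sketch)
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SummarizeRequest(BaseModel):
    url: str  # YouTube URL to process

def run_pipeline(url: str) -> str:
    """Placeholder: chain download -> transcribe -> summarize -> .docx here."""
    return "outputs/InsightDoc_demo.docx"

@app.post("/summarize")
def summarize(req: SummarizeRequest):
    path = run_pipeline(req.url)
    return {"status": "done", "file": path}
```

Run it with `uvicorn api:app` and you have a one-endpoint service.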


Beyond the build, we envisioned platform potential.


The idea was simple: Kill content guilt and spark actual learning.


The Build Wasn’t Sexy — But It Worked


We began scrappy. My inner PM switched into junior-SWE mode, setting up Python libraries on my Mac. Here’s what the agent script flow looked like:



First, we got `yt-dlp` installed. It takes a YouTube URL as input, downloads a `.webm` file, and converts it into an `.mp3` audio file. Initially, it threw SSL errors on the Mac, but we fixed it in about 15 minutes.
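For the curious, the whole download step boils down to a few lines with `yt-dlp`’s Python API (a sketch; the output template is illustrative, and ffmpeg must be installed for the conversion):

```python
# download_audio.py: fetch a YouTube video's audio as .mp3 (sketch)
from yt_dlp import YoutubeDL

def download_audio(url: str, out_dir: str = "downloads") -> None:
    opts = {
        "format": "bestaudio/best",              # grab the best audio-only stream
        "outtmpl": f"{out_dir}/%(title)s.%(ext)s",
        "postprocessors": [{                     # hand off to ffmpeg for .webm -> .mp3
            "key": "FFmpegExtractAudio",
            "preferredcodec": "mp3",
        }],
    }
    with YoutubeDL(opts) as ydl:
        ydl.download([url])
```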


Next, we turned to `Whisper` for transcription. It takes an `.mp3` file as input and produces a `.txt` transcript as output. We installed the latest Whisper model locally, and it was up and running in short order. After some initial hiccups with filenames containing emojis, we were getting accurate transcripts.
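The transcription step is barely longer (a sketch using the open-source `whisper` package; the model size and paths are illustrative):

```python
# transcribe.py: turn an .mp3 into a plain-text transcript (sketch)
import whisper

def transcribe(mp3_path: str, txt_path: str) -> None:
    model = whisper.load_model("base")   # trade up to "medium"/"large" for accuracy
    result = model.transcribe(mp3_path)
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write(result["text"])
```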


Once we cracked that, we wired it into a Python script for summarizing the transcript, `summarizer.py`, and included `.env` support to secure our OpenAI key—a crucial lesson on the importance of `.env` handling in real-world Python scripting. The summarizer uses GPT-4 with a carefully crafted prompt, extracting ten key themes and summarizing each in approximately 200 words, including a “Learn More” prompt under each theme.
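The skeleton of that script looks roughly like this (a sketch; our real prompt is longer, but the ten-themes-plus-“Learn More” structure is the same):

```python
# summarizer.py: extract key themes from a transcript via GPT-4 (sketch)
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # reads OPENAI_API_KEY from .env instead of hardcoding it
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

PROMPT = (
    "Extract the 10 key themes from this transcript. For each theme, write a "
    "~200-word summary and end with a 'Learn More' prompt the reader can paste "
    "into ChatGPT.\n\nTranscript:\n{transcript}"
)

def summarize(transcript: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT.format(transcript=transcript)}],
    )
    return resp.choices[0].message.content
```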


Then, we faced the formatting war.


We experimented with `fpdf`, but it stumbled on emojis. `python-docx` balked at colored headers at first, too. Eventually we rewired everything in `python-docx`: justified alignment, brand-red headers, and social footer links.
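The recipe that finally worked looks roughly like this (a sketch; the brand-red hex value is a stand-in):

```python
# formatter.py: write themed summaries into a branded .docx (sketch)
from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.shared import RGBColor

BRAND_RED = RGBColor(0xC0, 0x1E, 0x1E)  # stand-in for the actual brand color

def write_doc(themes: list[tuple[str, str]], out_path: str) -> None:
    doc = Document()
    for title, body in themes:
        heading = doc.add_heading(title, level=2)
        heading.runs[0].font.color.rgb = BRAND_RED    # color the heading run directly
        para = doc.add_paragraph(body)
        para.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY   # justified body text
    doc.save(out_path)
```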



After several broken files and one near-burnout rant, the moment of glory arrived:


> “Done. Final output saved at: outputs/InsightDoc_xxxxxx.docx”


Victory!


Maiden Run: A Badge of Honor 🏅


Here’s what our very first real-world test run looked like:



That’s it. ₹6 worth of AI magic (less than a chai-sutta 🚬 ☕ break!). Completely reusable.


Here’s a screengrab of the full agent run:



Here’s how the full agent structure looked:



I barely had time to catch my breath; we didn’t stop. We pushed ourselves to think about how we would price it and limit usage.


Cost Estimation


  • Average run = ~25,000–30,000 tokens for a 60–90 minute podcast/video

  • GPT-4 cost: ~$0.15–$0.20 per run

  • Infra + storage buffer: $0.05

  • Total cost per insight: ~$0.25
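The back-of-envelope math, in Python for good measure (the per-1K-token price is an assumption; check OpenAI’s current pricing page before quoting anyone):

```python
# cost_estimate.py: rough per-run cost math (prices are assumptions)
TOKENS_PER_RUN = 30_000   # upper end for a 60-90 minute video
PRICE_PER_1K = 0.006      # assumed blended $/1K tokens; verify against current pricing
INFRA_BUFFER = 0.05       # storage + misc, in USD

api_cost = (TOKENS_PER_RUN / 1000) * PRICE_PER_1K
total = api_cost + INFRA_BUFFER
print(f"API: ${api_cost:.2f}, total per insight: ${total:.2f}")  # ~$0.18 + $0.05
```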


Pricing Models


  1. Cost-only: ₹25–30/run

  2. Profit mode (10%): ₹30–35/run


We developed a usage logging script, `usage_logger.py`, to track:


  • Video link

  • Token count

  • Cost

  • File delivered
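A plain CSV ledger was all it took. Here’s a sketch with the same columns (plus a timestamp):

```python
# usage_logger.py: append one row per run to a CSV ledger (sketch)
import csv
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path("usage_log.csv")
FIELDS = ["timestamp", "video_url", "tokens", "cost_usd", "output_file"]

def log_run(video_url: str, tokens: int, cost_usd: float, output_file: str) -> None:
    new_file = not LOG_PATH.exists()
    with LOG_PATH.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()   # write the header only on the first run
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "video_url": video_url,
            "tokens": tokens,
            "cost_usd": round(cost_usd, 4),
            "output_file": output_file,
        })
```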


We considered Razorpay integration, quota limiting via Google Sheets, and even Make.com webhooks. As I sipped my café crème, this weekend project evolved into a tool with a pseudo business plan!


The Local LLM Question


The mentor in me was keeping a watchful eye. Naturally, we asked: what if we ditch the OpenAI API and run our own GPT locally? Can I let my students use this without incurring costs?


We explored models like Mixtral, LLaMA 3, and Mistral 7B. We looked into `llama-cpp`, `vllm`, and GGUF quantizations. The pros included:


  • ✅ Total control

  • ✅ No API cost

  • ✅ Full offline privacy


But there were drawbacks:


  • ❌ 13B models ≈ GPT-3.5 at best

  • ❌ 1 token/sec on CPU = slow

  • ❌ Requires GPUs for practical runs

  • ❌ Setup complexity for non-devs


For now, we’re sticking with OpenAI—but the fork is ready. This solution will go fully local when necessary.
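If you’re wondering what the fork amounts to, it’s essentially a one-import swap (a sketch using `llama-cpp-python`; the GGUF path is a placeholder for whatever quantized model you download):

```python
# local_summarizer.py: the offline fork of summarizer.py (sketch)
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct.Q4_K_M.gguf",  # placeholder GGUF file
    n_ctx=8192,   # context window; long transcripts may still need chunking
)

def summarize_local(transcript: str) -> str:
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": f"Summarize the key themes:\n{transcript}"}],
        max_tokens=2048,
    )
    return out["choices"][0]["message"]["content"]
```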


What We Learned (a.k.a. Why You Should Do This Too)


  • You don’t need a platform to build a product. You need a problem and persistence.

  • You don’t need a UI to test value. We tested it on CLI and shipped a result.

  • You don’t need permission. You just need one solved pain.

  • `.env` > hardcoded keys. Never going back.

  • You can go from problem to working agent in three hours. No excuses.


We didn’t build for everyone; we built for ourselves. Consequently, we created something ten times more valuable than another theory-packed blog post.


Ready to Try the Agent?


Built for PMs who learn like builders. Want to test it? Reach out and DM me “INSIGHT” on LinkedIn.

We built it because we needed it.

You should build yours too.


 
 
 
