
We Built an AI Insight Agent to Cure Content Overload (and it runs locally)

Updated: Jun 28, 2025

There’s a quiet guilt that builds up in every ambitious PM’s day. It shows up as unread Substack issues, bookmarked interviews, and podcasts you “really need to listen to.” More often than not, they go untouched.


Not because the content lacks value—it's actually gold.


The problem lies in the packed schedule. You have back-to-back meetings, a half-written PRD, and perhaps a few stolen minutes at night before your brain checks out. Learning gets postponed repeatedly.


I’m no different. I follow absolute powerhouses in the product world—Akash Gupta, Claire Vo, Lenny Rachitsky, Peter Yang, Pawel Huryn, Shreyas Doshi, and many more. They drop insight-packed YouTube interviews, podcasts, and breakdowns faster than I can log in to Notion!


One Saturday morning, I found myself staring at 13 open YouTube tabs from all these creators. I had saved them for weeks, maybe months. Each promised insight bombs, yet none had been watched. Like many PM aspirants or professionals trying to level up, I suffer from content overload paralysis.


From Overload to Output: The Spark


I turned to my co-builder, Summer (my AI sidekick), and said:


“What if we built something that listens for me, watches the content, extracts insights, and sends back a crisp, structured brief? Like a McKinsey consultant on Red Bull.”

No fancy interfaces. No friction. Just input → output. To her credit, Summer didn’t flinch. She replied, “Let’s build it.”


And just like that, we were off. A 48-hour rabbit hole began. The first working version was ready in under three hours. That’s what happens when you remove distractions and just build.


Thinking Like Product People First


Before we wrote a single line of code, we put on our PM hats. This wasn’t just a vibe coding session; it was a real product build. So we did what good PMs do: we specced.


Problem Statement


PMs and PM aspirants are overwhelmed with long-form content but lack the time to extract and retain the insights hidden within. There’s no scalable way to condense a 60-minute interview into structured takeaways.


User Personas


  • Prem – a PM aspirant juggling family and coursework, looking for summaries to stay sharp.

  • Sonal – a mid-level PM who wants to stay relevant but can’t sit through ten interviews weekly.

  • Abhinandan – a mentor curating PM content for her students, needing structured summaries at scale.


MVP Scope


  • Input: YouTube URL

  • Fetch video audio

  • Transcribe using OpenAI's Whisper

  • Summarize using GPT-4 into key themes

  • Output: branded `.docx` with headers, takeaways, and a "Learn More" ChatGPT prompt under each theme.



Envisioned Phases


  • v1.0 = CLI-based (command line interface, a.k.a. the terminal) local script

  • v1.1 = Add cost tracking, usage logging

  • v1.2 = Multi-video batch, Notion export


We even scoped a pricing model: cost-only mode and profit-margin mode, based on OpenAI API burn estimates.


We explored various ways to make this public:


  • API wrappers using FastAPI or Flask

  • Limited-access forms powered by Make.com

  • Slack bots or Notion auto-summarizers
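Of these, the FastAPI wrapper felt like the lightest lift. Here’s a minimal sketch of what that endpoint could look like (the route name and the `run_pipeline` stub are illustrative, not our actual code):

```python
# api.py: a minimal FastAPI wrapper around the pipeline (sketch)
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SummarizeRequest(BaseModel):
    url: str  # YouTube URL to process

def run_pipeline(url: str) -> str:
    """Placeholder: chain download -> transcribe -> summarize -> .docx here."""
    return "outputs/InsightDoc_demo.docx"

@app.post("/summarize")
def summarize(req: SummarizeRequest):
    path = run_pipeline(req.url)
    return {"status": "done", "file": path}
```

Run it with `uvicorn api:app` and you have a one-endpoint service.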


Beyond the build, we envisioned platform potential.


The idea was simple: Kill content guilt and spark actual learning.


The Build Wasn’t Sexy — But It Worked


We began scrappy. My inner PM switched into junior-SWE mode, setting up Python libraries on my Mac. Here’s what the agent script flow looked like:



First, we got `yt-dlp` installed. It takes a YouTube URL as input, downloads a `.webm` file, and converts it into an `.mp3` audio file. Initially, it threw SSL errors on the Mac, but we fixed it in about 15 minutes.
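For the curious, the whole download step boils down to a few lines with `yt-dlp`’s Python API (a sketch; the output template is illustrative, and ffmpeg must be installed for the conversion):

```python
# download_audio.py: fetch a YouTube video's audio as .mp3 (sketch)
from yt_dlp import YoutubeDL

def download_audio(url: str, out_dir: str = "downloads") -> None:
    opts = {
        "format": "bestaudio/best",              # grab the best audio-only stream
        "outtmpl": f"{out_dir}/%(title)s.%(ext)s",
        "postprocessors": [{                     # hand off to ffmpeg for .webm -> .mp3
            "key": "FFmpegExtractAudio",
            "preferredcodec": "mp3",
        }],
    }
    with YoutubeDL(opts) as ydl:
        ydl.download([url])
```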


Next, we turned to `Whisper` for transcription. It takes an `.mp3` file as input and produces a `.txt` transcript as output. We installed the latest Whisper model locally, and it was up and running in short order. After some initial hiccups with filenames containing emojis, we were getting accurate transcripts.
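The transcription step is barely longer (a sketch using the open-source `whisper` package; the model size and paths are illustrative):

```python
# transcribe.py: turn an .mp3 into a plain-text transcript (sketch)
import whisper

def transcribe(mp3_path: str, txt_path: str) -> None:
    model = whisper.load_model("base")   # trade up to "medium"/"large" for accuracy
    result = model.transcribe(mp3_path)
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write(result["text"])
```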


Once we cracked that, we wired it into a Python script for summarizing the transcript, `summarizer.py`, and included `.env` support to secure our OpenAI key—a crucial lesson on the importance of `.env` handling in real-world Python scripting. The summarizer uses GPT-4 with a carefully crafted prompt, extracting ten key themes and summarizing each in approximately 200 words, including a “Learn More” prompt under each theme.
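The skeleton of that script looks roughly like this (a sketch; our real prompt is longer, but the ten-themes-plus-“Learn More” structure is the same):

```python
# summarizer.py: extract key themes from a transcript via GPT-4 (sketch)
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # reads OPENAI_API_KEY from .env instead of hardcoding it
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

PROMPT = (
    "Extract the 10 key themes from this transcript. For each theme, write a "
    "~200-word summary and end with a 'Learn More' prompt the reader can paste "
    "into ChatGPT.\n\nTranscript:\n{transcript}"
)

def summarize(transcript: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT.format(transcript=transcript)}],
    )
    return resp.choices[0].message.content
```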


Then, we faced the formatting war.


We experimented with `fpdf`, but it stumbled on emojis. `python-docx` balked at colored headers at first, too. Eventually we rewired everything in `python-docx`: justified alignment, brand-red headers, and social footer links.
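The recipe that finally worked looks roughly like this (a sketch; the brand-red hex value is a stand-in):

```python
# formatter.py: write themed summaries into a branded .docx (sketch)
from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.shared import RGBColor

BRAND_RED = RGBColor(0xC0, 0x1E, 0x1E)  # stand-in for the actual brand color

def write_doc(themes: list[tuple[str, str]], out_path: str) -> None:
    doc = Document()
    for title, body in themes:
        heading = doc.add_heading(title, level=2)
        heading.runs[0].font.color.rgb = BRAND_RED    # color the heading run directly
        para = doc.add_paragraph(body)
        para.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY   # justified body text
    doc.save(out_path)
```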



After several broken files and one near-burnout rant, the moment of glory arrived:


> “Done. Final output saved at: outputs/InsightDoc_xxxxxx.docx”


Victory!


Maiden Run: A Badge of Honor 🏅


Here’s what our very first real-world test run looked like:



That’s it. ₹6 worth of AI magic (less than a chai-sutta 🚬 ☕ break!). Completely reusable.


Here’s a screengrab of the full agent run:



Here’s how the full agent structure looked:



I barely had time to catch my breath; we didn’t stop. We pushed ourselves to think about how we would price it and limit usage.


Cost Estimation


  • Average run = ~25,000–30,000 tokens for a 60–90 minute podcast/video

  • GPT-4 cost: ~$0.15–$0.20 per run

  • Infra + storage buffer: $0.05

  • Total cost per insight: ~$0.25
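The back-of-envelope math, in Python for good measure (the per-1K-token price is an assumption; check OpenAI’s current pricing page before quoting anyone):

```python
# cost_estimate.py: rough per-run cost math (prices are assumptions)
TOKENS_PER_RUN = 30_000   # upper end for a 60-90 minute video
PRICE_PER_1K = 0.006      # assumed blended $/1K tokens; verify against current pricing
INFRA_BUFFER = 0.05       # storage + misc, in USD

api_cost = (TOKENS_PER_RUN / 1000) * PRICE_PER_1K
total = api_cost + INFRA_BUFFER
print(f"API: ${api_cost:.2f}, total per insight: ${total:.2f}")  # ~$0.18 + $0.05
```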


Pricing Models


  1. Cost-only: ₹25–30/run

  2. Profit mode (10%): ₹30–35/run


We developed a usage logging script, `usage_logger.py`, to track:


  • Video link

  • Token count

  • Cost

  • File delivered
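A plain CSV ledger was all it took. Here’s a sketch with the same columns (plus a timestamp):

```python
# usage_logger.py: append one row per run to a CSV ledger (sketch)
import csv
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path("usage_log.csv")
FIELDS = ["timestamp", "video_url", "tokens", "cost_usd", "output_file"]

def log_run(video_url: str, tokens: int, cost_usd: float, output_file: str) -> None:
    new_file = not LOG_PATH.exists()
    with LOG_PATH.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()   # write the header only on the first run
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "video_url": video_url,
            "tokens": tokens,
            "cost_usd": round(cost_usd, 4),
            "output_file": output_file,
        })
```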


We considered Razorpay integration, quota limiting via Google Sheets, and even Make.com webhooks. As I sipped my café crème, this weekend project evolved into a tool with a pseudo business plan!


The Local LLM Question


The mentor in me was keeping a watchful eye. Naturally, we asked: what if we ditch the OpenAI API and run our own GPT locally? Can I let my students use this without incurring costs?


We explored models like Mixtral, LLaMA 3, and Mistral 7B. We looked into `llama-cpp`, `vllm`, and GGUF quantizations. The pros included:


  • ✅ Total control

  • ✅ No API cost

  • ✅ Full offline privacy


But there were drawbacks:


  • ❌ 13B models ≈ GPT-3.5 at best

  • ❌ 1 token/sec on CPU = slow

  • ❌ Requires GPUs for practical runs

  • ❌ Setup complexity for non-devs


For now, we’re sticking with OpenAI—but the fork is ready. This solution will go fully local when necessary.
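If you’re wondering what the fork amounts to, it’s essentially a one-import swap (a sketch using `llama-cpp-python`; the GGUF path is a placeholder for whatever quantized model you download):

```python
# local_summarizer.py: the offline fork of summarizer.py (sketch)
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct.Q4_K_M.gguf",  # placeholder GGUF file
    n_ctx=8192,   # context window; long transcripts may still need chunking
)

def summarize_local(transcript: str) -> str:
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": f"Summarize the key themes:\n{transcript}"}],
        max_tokens=2048,
    )
    return out["choices"][0]["message"]["content"]
```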


What We Learned (a.k.a. Why You Should Do This Too)


  • You don’t need a platform to build a product. You need a problem and persistence.

  • You don’t need a UI to test value. We tested it on CLI and shipped a result.

  • You don’t need permission. You just need one solved pain.

  • `.env` > hardcoded keys. Never going back.

  • You can go from problem to working agent in three hours. No excuses.


We didn’t build for everyone; we built for ourselves. Consequently, we created something ten times more valuable than another theory-packed blog post.


Ready to Try the Agent?


Built for PMs who learn like builders. Want to test it? Reach out and DM me “INSIGHT” on LinkedIn.

We built it because we needed it.

You should build yours too.


 
 
 
