Learning Pipeline
Passive consumption → Searchable knowledge
"I built an external brain because context switching was destroying me. I have ADHD. Context switching is my nemesis. Every interruption = 30-60 minutes rebuilding mental state."
Watch and forget → transcribe, embed, retrieve.
"My best teachers have often been YouTube tutorials, reviews and demos. I found myself spending more than five hours a day watching videos at 2-3x speed."
External memory for minds that can't hold it all. The cognitive prosthetic for video consumption.
Why This Exists
The Hyperfocus Learning Pattern
"It's just my normal pattern—getting excited, getting hyperfocused. It's novelty and dopamine, not getting down to executive function."
Problem: Absorb massive amounts of content during hyperfocus sessions. Lose it all when context switches.
Solution: Make every video searchable forever. Never lose what you learned.
The Real Numbers
31,832
Videos tracked
15,456
Transcribed
6,152
Channels
4,142
Hours watched
The Problem
- Video is inefficient. A 1-hour video carries roughly 10 minutes of reading, but video covers material that never makes it into text.
- Consumed ≠ retained. Watching at 2x speed helps throughput but not recall.
- No cross-reference. What did that tutorial say about X? Lost in watch history.
- API costs. Cloud transcription is expensive at scale (10K+ videos).
The Solution
Local-first learning infrastructure. Capture → Transcribe → Structure → Store → Retrieve. Zero API costs. Searchable knowledge base from video consumption.
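The Capture and Transcribe steps reduce to two local CLI invocations, sketched here as argv lists. The yt-dlp and Whisper flags are real; the output template and model size are illustrative assumptions, not the pipeline's actual settings:

```python
# Sketch: build the capture + transcribe commands the pipeline would run.
# Output template and model size are assumptions for illustration.

def capture_cmd(url: str) -> list[str]:
    # yt-dlp: download audio only; "%(id)s.%(ext)s" names the file by video ID.
    return ["yt-dlp", "-x", "--audio-format", "mp3",
            "-o", "%(id)s.%(ext)s", url]

def transcribe_cmd(audio_path: str) -> list[str]:
    # Local Whisper run; JSON output keeps per-segment timestamps.
    return ["whisper", audio_path, "--model", "base",
            "--output_format", "json"]

# In the real pipeline these would be executed in sequence, e.g.:
#   subprocess.run(capture_cmd(url), check=True)
```

Running everything locally is what makes "zero API costs" hold at 10K+ videos: the only spend is disk and GPU/CPU time.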
Research Questions
- Retention: Does searchable transcription improve recall vs. passive watching?
- Speed vs. depth: What's the optimal consumption speed for different content types?
- Active retrieval: How often do people actually search their knowledge base?
- Compression: Can AI summarization replace full consumption for some content?
Content Categories
The pipeline reveals learning patterns across three distinct domains:
AI & Tech
• 1littlecoder (1,256 videos)
• TwoMinutePapers (947)
• AI Daily Brief (792)
• Matt Williams, Theo
Tutorials, model releases, coding
Neurodivergence
• ADDitude Magazine (1,181)
• ADHD podcasts
• Autism resources
Understanding my own brain
Torah
• Vayimaen (823)
• Living Lchaim (781)
• Ohr Somayach Q&A
Shiurim, hashkafa, halacha
Preliminary Data
31,832 videos tracked. 15,456 transcribed. 6,152 channels. 1,407 rewatched. Local ML transcription via Whisper/Parakeet—zero API costs.
"YouTube Pipeline Saved Me 4 Hours a Day"
"What did that tutorial say about X?" answered in seconds instead of scrubbing through video.
Primary content: tutorials, tech reviews, lectures. Average watch speed: 2-3x. Peak consumption: late evening hyperfocus sessions.
Pipeline Architecture
YouTube/Podcast → yt-dlp → Audio file
↓
Whisper/Parakeet (local)
↓
Transcript + timestamps
↓
LLM extraction (topics, summary)
↓
Supabase + embeddings
↓
Semantic search interface
Roadmap
- Build transcription pipeline
- Process 10K+ videos
- Semantic search interface
- Retention study (before/after)
- Speed optimization research
- Open source pipeline
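The Structure step between "Transcript + timestamps" and "Supabase + embeddings" can be sketched as grouping timestamped segments into chunks sized for an embedding model. The segment shape matches Whisper's JSON output; the chunk size is an illustrative assumption:

```python
# Sketch: group Whisper-style segments ({"start", "end", "text"}) into
# chunks for embedding. max_chars is an assumed tuning knob.

def chunk_segments(segments, max_chars=500):
    chunks, current, size = [], [], 0

    def flush():
        # Merge buffered segments into one chunk, keeping the time span
        # so search results can deep-link back into the video.
        chunks.append({
            "start": current[0]["start"],
            "end": current[-1]["end"],
            "text": " ".join(s["text"] for s in current),
        })

    for seg in segments:
        current.append(seg)
        size += len(seg["text"])
        if size >= max_chars:
            flush()
            current, size = [], 0
    if current:
        flush()
    return chunks
```

Keeping `start`/`end` on every chunk is what lets "answered in seconds" mean jumping to the exact timestamp rather than rereading a whole transcript.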
Documentation
Part of the Cognitive Prosthetic
"Reimagine the way to consume content."
The YouTube pipeline is one piece of the larger external brain infrastructure:
Conversations
353K messages, 106K embedded
YouTube
32K videos, 15K transcripts
GitHub
132 repos, 1,427 commits
All queryable through the same brain-mcp interface. Semantic search across everything consumed.
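A minimal sketch of that unified retrieval idea, with toy bag-of-words vectors and cosine similarity standing in for the real embeddings and the Supabase vector store (the documents and source labels here are illustrative):

```python
import math
from collections import Counter

# Toy index mixing the three sources. The real system uses learned
# embeddings in Supabase; bag-of-words cosine similarity stands in here.

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

DOCS = [
    ("youtube",       "whisper transcription of a pytorch tutorial"),
    ("conversations", "chat about supabase row level security"),
    ("github",        "commit adding yt-dlp audio capture step"),
]
INDEX = [(src, text, embed(text)) for src, text in DOCS]

def search(query, k=1):
    # Rank every document from every source against one query vector.
    q = embed(query)
    ranked = sorted(INDEX, key=lambda d: cosine(q, d[2]), reverse=True)
    return [(src, text) for src, text, _ in ranked[:k]]
```

One index over all sources is the point: a query does not need to know whether the answer came from a video, a chat log, or a commit message.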
Contribute
Share your own learning infrastructure, consumption data, or retention studies.
Open an issue →
Built with yt-dlp, Whisper, and local ML. · All tools → · How I think →