Learning Pipeline

Passive consumption → Searchable knowledge

"I built an external brain because context switching was destroying me. I have ADHD. Context switching is my nemesis. Every interruption = 30-60 minutes rebuilding mental state."
Watch and forget → transcribe, embed, retrieve.
"My best teachers have often been YouTube tutorials, reviews and demos. I found myself spending more than five hours a day watching videos at 2-3x speed."

External memory for minds that can't hold it all. The cognitive prosthetic for video consumption.

Why This Exists

The Hyperfocus Learning Pattern

"It's just my normal pattern—getting excited, getting hyperfocused. It's novelty and dopamine, not getting down to executive function."

Problem: Absorb massive amounts of content during hyperfocus sessions. Lose it all when the context switches.

Solution: Make every video searchable forever. Never lose what you learned.

The Real Numbers

31,832 videos tracked
15,456 transcribed
6,152 channels
4,142 hours watched

The Problem

The Solution

Local-first learning infrastructure. Capture → Transcribe → Structure → Store → Retrieve. Zero API costs. Searchable knowledge base from video consumption.

CAPTURE · Download video/audio from any source · yt-dlp, browser extensions
TRANSCRIBE · Convert audio to searchable text · Whisper, Parakeet (local ML)
STRUCTURE · Extract topics, timestamps, key points · LLM processing
STORE · Searchable database with embeddings · Supabase, vector DB
RETRIEVE · Query across all consumed content · Semantic search
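
As a rough illustration of the CAPTURE stage, here is a minimal sketch that builds a yt-dlp command for saving only a video's audio track. The function name build_capture_command, the audio/ output folder, and the choice of mp3 are assumptions made for this example, not details taken from the actual pipeline. The command is printed rather than executed, so the sketch runs without yt-dlp installed.

```python
import shlex


def build_capture_command(url: str, out_dir: str = "audio") -> list[str]:
    """Return a yt-dlp argv list that saves only the audio track of `url`.

    Hypothetical helper for illustration; the real pipeline may use
    different options.
    """
    # %(id)s and %(ext)s are yt-dlp output-template fields.
    output_template = f"{out_dir}/%(id)s.%(ext)s"
    return [
        "yt-dlp",
        "--extract-audio",        # keep the audio, drop the video stream
        "--audio-format", "mp3",  # assumed format; any supported one works
        "--output", output_template,
        url,
    ]


if __name__ == "__main__":
    # VIDEO_ID is a placeholder, not a real video.
    cmd = build_capture_command("https://www.youtube.com/watch?v=VIDEO_ID")
    print(shlex.join(cmd))
```

Returning an argument list instead of a shell string means the same value can later be passed to subprocess.run without any quoting concerns.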

Research Questions

Content Categories

The pipeline reveals learning patterns across three distinct domains:

AI & Tech

• 1littlecoder (1,256 videos)

• TwoMinutePapers (947)

• AI Daily Brief (792)

• Matt Williams, Theo

Tutorials, model releases, coding

Neurodivergence

• ADDitude Magazine (1,181)

• ADHD podcasts

• Autism resources

Understanding my own brain

Torah

• Vayimaen (823)

• Living Lchaim (781)

• Ohr Somayach Q&A

Shiurim, hashkafa, halacha

Preliminary Data

Scale Achieved

31,832 videos tracked. 15,456 transcribed. 6,152 channels. 1,407 rewatched. Local ML transcription via Whisper/Parakeet—zero API costs.
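
A few ratios follow directly from these counts. The sketch below only does arithmetic on the figures quoted on this page; the variable names are illustrative, and the per-video average assumes the hours figure covers all tracked videos, which the page does not state.

```python
# Figures quoted on this page.
videos_tracked = 31_832
transcribed = 15_456
channels = 6_152
rewatched = 1_407
hours_watched = 4_142


def share(part: int, whole: int) -> float:
    """Fraction of `whole` represented by `part`."""
    return part / whole


transcription_coverage = share(transcribed, videos_tracked)
rewatch_rate = share(rewatched, videos_tracked)
videos_per_channel = videos_tracked / channels
# Assumption: the hours figure is spread across every tracked video.
minutes_per_video = hours_watched * 60 / videos_tracked

if __name__ == "__main__":
    print(f"Transcribed: {transcription_coverage:.1%} of tracked videos")
    print(f"Rewatched:   {rewatch_rate:.1%} of tracked videos")
    print(f"Videos per channel: {videos_per_channel:.1f}")
    print(f"Minutes per video:  {minutes_per_video:.1f}")
```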

Time Savings
"YouTube Pipeline Saved Me 4 Hours a Day"

"What did that tutorial say about X?" answered in seconds instead of scrubbing through video.
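
To make the "answered in seconds" idea concrete, here is a minimal sketch of a timestamp lookup over transcript segments. It uses plain substring matching as a stand-in for the semantic search the pipeline describes. The Segment type, find_mentions, deep_link, and the sample data are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    video_id: str
    start: float  # seconds from the start of the video
    text: str


def find_mentions(segments: list[Segment], query: str) -> list[Segment]:
    """Return segments whose text contains `query`, case-insensitively."""
    needle = query.lower()
    return [s for s in segments if needle in s.text.lower()]


def deep_link(segment: Segment) -> str:
    """Build a YouTube URL that starts playback at the segment's timestamp."""
    seconds = int(segment.start)
    return f"https://www.youtube.com/watch?v={segment.video_id}&t={seconds}s"


if __name__ == "__main__":
    # Made-up sample data; real segments would come from the transcriber.
    segments = [
        Segment("abc123", 0.0, "Welcome to the tutorial."),
        Segment("abc123", 312.4, "Now we configure the vector index."),
    ]
    for hit in find_mentions(segments, "vector index"):
        print(deep_link(hit), "|", hit.text)
```

Keeping the start time on every segment is what turns a text hit into a jump-to-the-moment link instead of a pointer to a whole video.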

Consumption Patterns

Primary content: tutorials, tech reviews, lectures. Average watch speed: 2-3x. Peak consumption: late evening hyperfocus sessions.

Pipeline Architecture

YouTube/Podcast → yt-dlp → Audio file
                              ↓
                    Whisper/Parakeet (local)
                              ↓
                    Transcript + timestamps
                              ↓
                    LLM extraction (topics, summary)
                              ↓
                    Supabase + embeddings
                              ↓
                    Semantic search interface
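
The last two steps of the diagram (store with embeddings, then search) can be sketched in miniature. This is a toy under stated assumptions: embed() is a bag-of-words counter standing in for a real embedding model, and an in-memory list stands in for Supabase. The names embed, cosine, and search are illustrative and are not the pipeline's actual interface.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def search(index, query: str, top_k: int = 3):
    """Rank stored (chunk, vector) pairs by similarity to the query."""
    q = embed(query)
    scored = [(cosine(q, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Made-up transcript chunks for illustration.
    chunks = [
        "whisper runs locally on the gpu",
        "adhd and working memory",
        "the weekly shiur on halacha",
    ]
    index = [(chunk, embed(chunk)) for chunk in chunks]  # the "store" step
    for score, chunk in search(index, "working memory adhd", top_k=2):
        print(f"{score:.2f}  {chunk}")
```

Swapping embed() for a real model and the list for a vector column changes the storage layer, but the ranking step keeps the same shape: embed the query, score every stored vector, return the top matches.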

Roadmap

Documentation

Part of the Cognitive Prosthetic

"Reimagine the way to consume content."

The YouTube pipeline is one piece of the larger external brain infrastructure:

Conversations: 353K messages, 106K embedded
YouTube: 32K videos, 15K transcripts
GitHub: 132 repos, 1,427 commits

All queryable through the same brain-mcp interface. Semantic search across everything consumed.

Contribute

Share your own learning infrastructure, consumption data, or retention studies.

Open an issue →

Built with yt-dlp, Whisper, and local ML.