
THE BROADER THINKING

A perspective on AI in education · strong views, loosely held

Flight Lab is a single demo: a kids' physics experience where Claude vision grounds the feedback. This page is the larger thesis it sits inside.

Education is where each generation learns what to expect from AI — the AI a six-year-old works alongside today shapes what she demands, tolerates, and governs about AI systems decades from now. Every other AI policy lever — alignment, regulation, governance — operates against a backdrop set in the formative years of this technology. Educational AI isn't one product category among many; it's foundational infrastructure for the human–AI relationship at civilizational scale.

Education isn't education at all if it isn't centered on human relationships and people's connection to society. The classroom, teacher, and peer group are not inefficiencies to personalize around — they are community infrastructure. An educational AI that atomizes learning erodes the one institution uniquely positioned to be the community foundation of a post-AGI world.

Underneath every theme below is a single commitment — AI in education should build agency, not dependency. Autonomy plus education compounds into agency: autonomy that knows what to do with itself. Structured friction is the method. Agency is what it's for.

None of this is certain. The space is genuinely emergent — the design questions are open, the evidence is partial, and the most interesting problems don't have settled answers yet. These are working beliefs, offered with conviction but held with open hands. That tension — strong perspective, genuine uncertainty — is what makes the problem worth working on.

Foundations

Designing for learner agency.

Every theme below is in service of one thesis. These are the four commitments I keep returning to: working positions, held with conviction and open to revision. They are the sub-postures underneath "agency, not dependency."

  • 01 Commitment

    Education is for second-order wants, not first-order impulses.

    Consumer AI optimizes for "I want the answer now." Education optimizes for "I want to be someone who can reason carefully" even when the two are in tension, which in a learning moment is most of the time. The product's job is to protect the reflective want from the impulsive one, especially when the learner can't yet see the difference.

  • 02 Commitment

    The learner is always the primary actor in their own learning.

    AI is a co-actor in a specific interaction moment — not in the learning process. The failure mode isn't "AI takes over"; it's subtler: AI fluency and learner literacy are not the same thing. When AI's fluency substitutes for the learner's own noticing, struggling, and articulating, literacy erodes — invisibly, while every surface metric looks fine.

  • 03 Commitment

    Pedagogical authority is borrowed, not inherent.

    A textbook is authoritative because five conditions hold: expert authorship, editorial review, institutional adoption, stability and inspectability, and bounded scope. AI inherits the same chain — no shortcuts. The inspectability leg carries more weight for AI than for textbooks: a book is inspectable by default; AI is inspectable only if built to be.

  • 04 Commitment

    "Lasting agency" is measurable.

    Transfer, retention, epistemic behaviour, and autonomous problem-solving are the measures; completion rates, engagement time, and self-report are not. The sharpest test: a product that leaves a learner more capable in its presence but less capable in its absence has failed, regardless of every other metric. Discernment is the current-period proxy; agency is the underlying goal, and unlike discernment, it does not expire with any particular model threshold.
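
    The "sharpest test" above can be written down as a simple comparison. The sketch below is only an illustration: the `Scores` fields and the `fails_absence_test` function are hypothetical names, and it assumes comparable unaided assessments exist before and after product use, which the page does not specify.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    """Hypothetical assessment scores on a common scale (e.g. 0.0 to 1.0)."""
    unaided_before: float  # without the product, before using it
    unaided_after: float   # without the product, after using it
    aided_after: float     # with the product present, after using it


def fails_absence_test(s: Scores) -> bool:
    """True when the learner looks stronger with the product but is weaker without it."""
    looks_stronger_with_product = s.aided_after > s.unaided_before
    weaker_without_product = s.unaided_after < s.unaided_before
    return looks_stronger_with_product and weaker_without_product
```

    A check like this says nothing about transfer or retention on its own; it only encodes the presence-versus-absence comparison.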

Thesis

Educational AI must leverage AI to demand rigor rather than reliance.

Structured friction is the method. Agency is the end it serves.

Themes

Seven themes that execute the Foundations.

Each theme makes a Foundations commitment operational as a category-level design posture, grounded in the learning-science literature. These are positions, not conclusions. Applying learning science to AI-native educational products is genuinely new work, and the most interesting parts of the design space are still open. Some themes show up concretely in Flight Lab; some are beyond what a single demo can evidence. Where the demo embodies a theme, there's a short note.

  1. 01 Theme

    Ask, don't show.

    AI's greatest unforced error is giving users the exact answer they ask for. Educational AI should ask the questions that generate learning, not hand over the answers.

    • Epistemic prompts beat exposition. "How do you know?", "What would change your mind?", "What evidence would distinguish those?" — the moves that drove the largest learning gains in the inquiry-science meta-analysis.
    • Hint laddering over direct answers. Mistakes are essential, not failures to prevent. Premature help undermines motivation and produces cognitive debt.
    • Neither tell nor abandon. AI that tells short-circuits schema construction; AI that abandons the learner to open exploration overloads working memory. The defensible posture is the middle — scaffolded questioning with fading hints.
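
    Hint laddering with fading support could be sketched roughly as follows. This is a minimal illustration, not Flight Lab's implementation: the `HintLadder` class, its method names, and the fading rule (one fewer rung after each solved problem) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HintLadder:
    """Hypothetical ladder: rungs run from the most open question to the most specific hint."""
    rungs: List[str]
    max_rung: int = -1  # highest rung index currently allowed; -1 means "all rungs"
    shown: int = 0      # hints already shown on the current problem

    def __post_init__(self) -> None:
        if self.max_rung < 0:
            self.max_rung = len(self.rungs) - 1

    def next_hint(self) -> Optional[str]:
        """After a failed attempt, return the next rung, or None when support is used up."""
        if self.shown > self.max_rung:
            return None
        hint = self.rungs[self.shown]
        self.shown += 1
        return hint

    def problem_solved(self) -> None:
        """Fade support: the next problem offers one fewer rung (never fewer than one)."""
        self.max_rung = max(0, self.max_rung - 1)
        self.shown = 0


ladder = HintLadder(rungs=[
    "What did you notice about how it flew?",
    "What is different about the two wings?",
    "Try comparing the left and right wing folds.",
])
```

    Note that no rung contains the answer itself; the ladder only narrows the question.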

    The 50-year inquiry-learning literature converges on a single design principle: ask, don't show.

    → In Flight Lab: Beat 02 — the child commits to a hypothesis before Claude reveals anything. Beat 03 — review grounds in a photo of the plane the kid actually folded, and the question ties to a feature in their specific plane.

  2. 02 Theme

    Scaffolded, not unbounded.

    Open-ended digital environments invite multitasking and the fiction of "digital natives." Education needs structure; AI's job is to provide it, not retreat from it.

    • Not an open search bar. No "do my homework" machine, no unstructured chat. The interaction should bound what the learner can ask and require active input before the system synthesizes anything.
    • Adaptive tutoring lowers load. Good AI tailors pace, format, and feedback — reducing extraneous cognitive load while preserving the germane load that produces learning.
    • The curvilinear PISA finding. Classrooms where inquiry is frequent but unstructured post lower science achievement than classrooms with moderate, teacher-guided inquiry. "Let students freely explore with the chatbot" maps directly onto this failure mode.

    The productive zone is moderate, structured, and guided — not maximum freedom.

    → In Flight Lab: A fixed five-beat structure — no chat box. Every interaction is a bounded choice or a photo-grounded review; the learner is never dropped into an empty text field.
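
    A bounded, fixed-sequence interaction of this kind can be modeled as a small state machine. The sketch below is hypothetical: the beat names and the option menus are placeholders invented for the example (the page only says there are five beats and no chat box), and `Session` is not Flight Lab's code.

```python
from enum import Enum
from typing import Dict


class Beat(Enum):
    # Placeholder names; only the fixed ordering matters here.
    GOAL = 1
    PREDICT = 2
    REVIEW = 3
    EXPLAIN = 4
    HANDOFF = 5


# Each beat accepts only a bounded menu of inputs; there is no free-text field.
ALLOWED_INPUTS = {
    Beat.GOAL: {"fly_far", "fly_straight"},
    Beat.PREDICT: {"wide_wings", "narrow_wings"},
    Beat.REVIEW: {"photo_uploaded"},
    Beat.EXPLAIN: {"ready"},
    Beat.HANDOFF: {"done"},
}


class Session:
    def __init__(self) -> None:
        self.beat = Beat.GOAL
        self.inputs: Dict[Beat, str] = {}

    def submit(self, choice: str) -> Beat:
        """Record a bounded choice and advance; reject anything outside the menu."""
        if choice not in ALLOWED_INPUTS[self.beat]:
            raise ValueError(f"{choice!r} is not an option in beat {self.beat.name}")
        self.inputs[self.beat] = choice
        if self.beat is not Beat.HANDOFF:
            self.beat = Beat(self.beat.value + 1)
        return self.beat

    def can_reveal(self) -> bool:
        """The system may explain only after the learner has committed a prediction."""
        return Beat.PREDICT in self.inputs
```

    The design choice worth noticing is that `can_reveal` depends on the learner's input, not on the model: synthesis is gated on active commitment.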

  3. 03 Theme

    Supplement, not replacement.

    AI excels at cognitive tutoring but fails at the biological, social, and relational foundations of learning. The product has to protect the conditions it can't create.

    • No encroachment on foundations. Sleep, recess, physical handwriting, student-teacher and family relationships — each carries outsized effects on learning that AI cannot substitute for. Time a product takes from these is debt, not surplus.
    • Never a wedge. Student-teacher relationships are the single largest instructional multiplier in the literature. AI must act as a relationship facilitator — surfacing when a human is needed — not a replacement for the human.
    • Structural nudges where they help. Monitor attention decay, suggest microbreaks, enforce focus-mode shells, ignore debunked pseudo-science like "learning styles." Restructure the environment rather than lecture the learner.

    The classroom, teacher, and peer group are community infrastructure — not inefficiencies to personalize around. The product protects the conditions of its own goal.

    → In Flight Lab: Assumed unit is kid + caregiver, not kid alone. The session ends pointing the child back at physical paper — off-screen, with their hands, not another app.

  4. 04 Theme

    Teachers are the point.

    AI's most durable impact is giving teachers back the time from admin so they can do what only humans do well — build relationships with students. Everything else flows from there.

    • Teacher Copilot for differentiation, leveling, rubric generation, IEP drafting, and baseline grading — offload the administrative middle-management that eats planning time.
    • Interaction Rehearsal Simulator with AI student avatars for classroom-management PD and difficult-conversation practice — because rehearsal with coaching outperforms theory.
    • Teacher–Student Relationship Radar that surfaces which student needs a human right now. The anti-wedge feature — AI as a relationship facilitator, not a substitute.

    The goal of saving a teacher's time is to reallocate it back to the single highest-effect-size lever in education: trusting teacher-student relationships.

  5. 05 Theme

    Trust is the priority feature.

    Most conversations about safety in edtech start and end with COPPA/FERPA. That's the wrong starting point.

    • Trust across the ecosystem. Students who trust the system ask the questions they're afraid to ask elsewhere. Teachers who trust it surface struggling learners. Families who trust it consent to richer support. Trust is the substrate on which every other pedagogical opportunity compounds.
    • Trust breaks asymmetrically. A single breach or creepy interaction doesn't trigger a fine — it severs the relationship the product exists to strengthen. Once broken, the trust of a teacher or a 15-year-old is not recovered by a patch release.
    • Architect for it. Treat privacy as the product's most important feature, not a regulatory hurdle. The most sensitive moments — a student's frustration, a teacher's candid reflection, a parent's concern — must be handled with the same care a trusted adult would bring.

    The real competition isn't another AI platform — it's non-adoption. Districts face pressure from parents and boards who may simply say no. Trust isn't a feature that beats a competitor; it's the precondition for AI being welcomed into classrooms at all.

  6. 06 Theme

    Purpose-built architecture for education.

    The market's answer is a frontier LLM with fences — RLHF guardrails bolted onto a model trained on the open web. It trades safety for depth, or depth for safety. Neither feels like the right answer.

    • Reject the false dichotomy. Train a Medium Language Model (MLM, roughly 10–70B parameters) from the ground up on curated pedagogical corpora: textbooks, peer-reviewed learning science, tutoring transcripts, standards frameworks. This is not a retrofit of a generalist model trained on the open web.
    • Local-first architecture. An on-device small language model (SLM) gatekeeper handles PII redaction, egress review, and the most sensitive categories (self-harm signals, IEP content, counselor disclosures) entirely locally. The MLM runs in a district VPC, a privacy-preserving enclave, or at the edge; it is never a general-purpose cloud LLM.
    • The Truth Layer. Authority-ranked RAG across verified curriculum, learner context, and style/delivery, with a Natural Language Inference check that validates factual claims against the Factual Core before the learner sees them.
    • Cost and latency at scale. Frontier API pricing applied across millions of classroom interactions per day is economically unviable without massive subsidy. And the 2–3 second latency that's tolerable for a knowledge worker breaks a 6-year-old's attention mid-thought. A purpose-built model deployed closer to the student addresses both.
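
    The gatekeeper's egress decision could look something like the sketch below. It is deliberately crude and entirely hypothetical: the regular expressions and the keyword list stand in for an on-device model, and the `gatekeep` function and route names are invented for the example. Redaction here minimizes what leaves the device; it does not make a query anonymous.

```python
import re
from dataclasses import dataclass

# Simplified patterns for illustration; real PII detection needs far more than this.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

# Stand-in for an on-device classifier over sensitive categories.
LOCAL_ONLY_TERMS = ("hurt myself", "iep", "counselor")


@dataclass
class Decision:
    route: str  # "local" or "district_model"
    text: str   # text permitted to leave the device ("" when handled locally)


def gatekeep(prompt: str) -> Decision:
    """Keep sensitive prompts on-device; redact obvious PII from everything else."""
    lowered = prompt.lower()
    if any(term in lowered for term in LOCAL_ONLY_TERMS):
        return Decision(route="local", text="")
    redacted = EMAIL.sub("[EMAIL]", prompt)
    redacted = PHONE.sub("[PHONE]", redacted)
    return Decision(route="district_model", text=redacted)
```

    The ordering matters: the sensitivity check runs before any redaction, so a sensitive prompt never produces an egress payload at all.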

    Two honest claims this does NOT buy: tokenizing names doesn't make queries anonymous (educational prompts leak too much context), and RAG doesn't eliminate hallucination (it substantially reduces parametric-memory fabrication; models still drift within retrieved context). The honest framing is minimization, bounded egress, and substantially-reduced-and-bounded — not de-identification and not elimination.
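
    The retrieve-then-verify shape of the Truth Layer might be sketched as below. Everything here is an assumption made for illustration: the `Passage` type, the toy word-overlap retrieval, and especially `entails`, which is a caller-supplied placeholder where a real Natural Language Inference model would sit. The sketch shows only the control flow: a claim is released when a retrieved, authority-ranked passage supports it, and withheld otherwise.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Passage:
    text: str
    authority: int  # higher means more trusted, e.g. adopted curriculum over notes


def retrieve(query: str, corpus: List[Passage], k: int = 2) -> List[Passage]:
    """Toy retrieval: keep passages sharing a word with the query, rank by authority first."""
    q_words = set(query.lower().split())
    hits = [(len(q_words & set(p.text.lower().split())), p) for p in corpus]
    hits = [(overlap, p) for overlap, p in hits if overlap > 0]
    hits.sort(key=lambda h: (h[1].authority, h[0]), reverse=True)
    return [p for _, p in hits[:k]]


def verified_answer(
    claim: str,
    query: str,
    corpus: List[Passage],
    entails: Callable[[str, str], bool],  # placeholder for an NLI model: (premise, hypothesis)
) -> Optional[str]:
    """Return the claim only if some retrieved passage entails it; otherwise None."""
    for passage in retrieve(query, corpus):
        if entails(passage.text, claim):
            return claim
    return None
```

    Returning `None` rather than a best guess is the point: the check bounds what reaches the learner, but it is only as good as the corpus and the entailment model behind it.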

  7. 07 Theme

    Pro-human interface design.

    The interface has to champion the human learner on academic efficacy and non-academic well-being at once. How AI presents itself is as critical as how it reasons.

    • Intrinsic motivation over dopamine loops. No points, badges, or streaks. Engagement has to come from the learner's fascination with the subject, not cheap behavioural hooks that generate attention decay and dependency.
    • Strict non-personification. AI as a sophisticated tool — not a friend, an entity, or an emotional companion. Young learners should never form attachments to algorithms; the interface must make the tool-ness legible.
    • Beyond the screen. Multimodal and physical integration where possible: voice, vision of physical artifacts, handwriting over typing for early grades. A learning product shouldn't default to a glowing rectangle and a keyboard.
    • The invisible AI. The product centers the learning, not the technology. The best educational tools disappear into the background; the novelty of AI capability is never the focal point.

    World-class products are deeply loved by users without ever pretending to be human.

    → In Flight Lab: No mascot, no badges, one calm voice that belongs to the lab and not a character. The plane flying is the reward. Physics takes the stage in Beat 04 — not the model.

The longer arc

The AGI horizon: from competence to eudaimonia.

The themes above are calibrated to the critical period — a window in which AI is powerful but fallible, discernment is the operational proxy for agency, and the economic stakes of skill-building are immediate. But the underlying logic does not expire with the critical period. It deepens — and at this point it becomes more speculative: following an argument further than the evidence currently reaches, offered in that spirit.

The philosophical tradition this vision is recovering — against the industrial-era reduction of education to productive competence — is Aristotelian eudaimonia: flourishing understood not as pleasure or output, but as the actualization of human potential in community. In a world where safe, benevolent AGI resolves material scarcity, the question shifts from “what do we produce?” to “how do we live?” The growth instinct shifts from quantitative to qualitative: identity, belonging, the development of others, societal fulfillment.

Paideia — education, in the Greek frame — was not a life phase that prepared you for the polis. Education was life. It was continuous participation in the life of the polis itself — not separate from community, but constitutive of it. The institution of learning and the community it sustained were not two things.

The civilizational implication is sharp. School and work are currently the two dominant community-forming institutions in modern life, and both are economically compelled. Friendships formed in school tend to be deeper and more durable than those formed at work, because school provides proximity and repetition without an instrumental agenda in the relationship itself. In a post-AGI world where economic compulsion no longer drives either school attendance or work participation, the question becomes: what institution has the structural properties to generate community at scale, without requiring economic pressure or shared belief?

Education — reconceived as continuous, community-embedded practice rather than an age-gated credential phase — is the strongest candidate. It carries intrinsic value: development, identity, belonging. Its growth is non-zero-sum. And it already has the Aristotelian structure: paideia was always the mechanism by which the polis reproduced itself and developed its members.

An educational AI that atomizes learning — me and my AI tutor, asynchronous, individualized, no classroom, no peers, no teacher — is not just pedagogically suboptimal. It actively erodes the one institution uniquely positioned to be the community foundation of the post-AGI world. The classroom, the teacher, the peer group are the community infrastructure. A product that dissolves them in pursuit of personalization is solving the wrong problem at civilizational cost.

This is why Themes 3 and 4 — protecting human relationships and centering teachers — are not conservative design choices. They are the long-horizon commitments, the ones whose stakes become visible only when you follow the AGI frame all the way through.

What I haven't done

Honesty about the gap.

The demo is real. The rest of this page is a category thesis. I think it's worth more to say that clearly than to imply otherwise.

  • This is a thesis, not a pilot.

    I haven't shipped this thesis in a district. Flight Lab validates the interaction model and vision pipeline at the scale of a single kid working on a single concept; everything else here is reasoned from the learning-science literature and category observation.

  • The purpose-built model is aspirational.

    Theme 06 argues for a mid-sized, pedagogically aligned model trained on curated corpora. That's a multi-year program, not a weekend project. I can argue it should exist; I can't argue I've built it.

  • Architecture and go-to-market aren't ready for public defense.

    Earlier drafts of this page had a full-stack architecture diagram and a phased commercial rollout. I pulled both — I can gesture at the shapes, but the rigor isn't there yet. I'd rather leave them off than ship claims I can't defend.

  • Behavioural efficacy is unproven.

    How 4- to 8-year-olds actually respond to contingent, photo-grounded prompts, voice-paced goal-setting, and prediction capture is an empirical question. The demo shows that all three are feasible; it does not demonstrate learning gains.

  • The AGI scenario doesn't kill the thesis — it reveals it was understated.

    The weakest leg of the bet was whether discernment bottlenecks durable performance when models become very capable. Honest concession: in a true AGI world, discernment-as-output-evaluation is moot. But that proves the bet was stated at the wrong level. Discernment is the current-period proxy for agency. Agency is the underlying goal — and it becomes more urgent, not less, as model capability increases.

Back to the demo

See where the thesis meets the ground.