I’ve been using AI in my Tools in Data Science course for over two years - to teach AI, and to teach with AI.
I told GitHub Copilot (prompt) to go through my transcripts, blog posts, code, and things I learned since 2024 and list every experiment of mine in AI education, rating each on importance and novelty.
Here is the full list of my experiments.
1. Teach using exams and prompts, not content
- ⭐ Use exams to teach. The typical student is busy. They want grades, not learning. They’ll write the exams, but not read the content. So, I moved the course material into the questions. If they can answer the question, great. Skip the content.
- Use AI to generate the content. I used to write content. Then I linked to the best content online – it’s better than mine. Now, AI drafts comics, interactive explainers, and simulators. My job is to pick good topics and generate them in good formats.
- Give them the prompts directly. Skip the content! I generated it with prompts anyway, so hand students the prompts themselves. They can use better AI models, revise the prompts, and learn how to learn with AI.
- ⭐ Add an “Ask AI” button. Make it easy for students to use ChatGPT. Stop pretending that real-world problem solving is closed-book and solo.
- ⭐ Make test cases teach, not just grade. Automate the testing (with code or AI). Good test cases show students the kind of mistake they may be making - teaching them, not just grading them. That’s great for teachers to analyze, too.
- Test first, then teach from the mistakes. Let them solve problems first. Then teach them, focusing on what failed. AI does the work; humans handle what AI can’t. This lets us teach really useful skills based on real mistakes.
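A test case that teaches carries a diagnosis, not just a pass/fail. Here’s a minimal sketch of the idea: `clean_ages` stands in for a hypothetical student submission (it should drop missing values and clip negatives to zero), and each check pairs a predicate with the lesson a failure implies. Names and the rubric are invented for illustration.

```python
# Stand-in "student" submission with a deliberate bug: it drops
# missing values but forgets to clip negative ages to zero.
def clean_ages(ages):
    return [a for a in ages if a is not None]

# Each check pairs a predicate with a teaching message, so a failure
# diagnoses the likely mistake instead of just deducting marks.
CHECKS = [
    (lambda f: f([25, None, 30]) == [25, 30],
     "Missing values not dropped: did you filter out None?"),
    (lambda f: f([25, -3]) == [25, 0],
     "Negative ages survived: clip them to 0 instead of keeping them."),
]

def grade(submission):
    feedback = [msg for check, msg in CHECKS if not check(submission)]
    score = 1 - len(feedback) / len(CHECKS)
    return score, feedback

score, feedback = grade(clean_ages)
```

The failure messages double as teaching material, and aggregating them across a cohort shows the teacher which misconception is most common.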
2. Make cheating pointless through design, not detection
- ⭐ Allow copying, collaboration, and hacking. In real work, nobody gets bonus points for working alone or re-inventing the wheel. Collaboration, using available resources well, verifying inputs, disclosed shortcuts – all are rewarded.
- Reward originality without punishing collaboration. Blanket anti-copying rules assume that all similarity is bad. A more AI-native approach is to allow learning from others openly, but give extra credit for genuine variation, initiative, and novel improvement.
- ⭐ Give each student a unique variant. If everyone sees the same problem with the same visible answer path, answer-sharing becomes the dominant strategy. Deterministic but unique variants shift the game from leaking answers to actually solving the problem.
- Make process logs part of the evidence. When outputs can be copied or AI-generated, the trace becomes more valuable than the final artifact. Logs, verification notes, session recordings, and agent traces show whether the student can actually orchestrate the work.
- Use repo-grounded vivas for authenticity. If you really want to know whether a student owns their project, ask them questions drawn from their own repo and make them change something live. That is much harder to fake than polished submitted output.
- Use structural similarity, not string matching. Strip docstrings, tokenize, MinHash. Students who rename variables are still caught; students who genuinely collaborated produce detectable clusters rather than suspicious pairs.
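The strip-tokenize-MinHash pipeline above can be sketched in a few lines. This is an illustrative toy, not a production checker (real systems use libraries like `datasketch` and larger signatures): identifiers are collapsed to their token type so renaming variables changes nothing, and MinHash signatures over token n-grams estimate structural overlap.

```python
import hashlib
import io
import tokenize

def structure_tokens(source):
    """Token types plus operator text; comments, strings, and names ignored."""
    toks = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.STRING, tokenize.NL,
                        tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT):
            continue
        # keep operators verbatim, collapse every identifier to "NAME"
        toks.append(tok.string if tok.type == tokenize.OP
                    else tokenize.tok_name[tok.type])
    return toks

def minhash(tokens, num_hashes=64, ngram=3):
    """MinHash signature over token n-gram shingles (md5 as a toy hash family)."""
    shingles = {" ".join(tokens[i:i + ngram])
                for i in range(len(tokens) - ngram + 1)}
    return [min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
                for s in shingles)
            for seed in range(num_hashes)]

def similarity(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = "def total(xs):\n    return sum(x * 2 for x in xs)\n"
b = "def grand_total(values):\n    return sum(v * 2 for v in values)\n"
sim = similarity(minhash(structure_tokens(a)), minhash(structure_tokens(b)))
```

Here the two functions differ only in variable names, so their structural token streams are identical and the similarity is 1.0: exactly the rename-and-resubmit case string matching misses.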
3. Test skills that matter in an AI world
- ⭐ Teach what AI still cannot do well. Syntax and routine execution are declining in value. Judgment, debugging, orchestration, validation, integration, and taste are rising. The curriculum should move upward, not cling to the parts AI is already eating.
- Use hard, messy problems to build real resilience. Some questions should be intentionally tricky, partly wrong, hidden in the UI, or out of syllabus. The students who find and solve them anyway are demonstrating exactly the adaptability that real work demands. Smooth progression alone doesn’t build that.
- Test live, hands-on AI skills. Don’t just lecture about embeddings, vision, structured outputs, or hallucinations. Put students in live API-driven tasks where they have to use these things under time pressure and genuine uncertainty.
- Grade students on designing AI workflows. In many real settings, the important skill is not “give the answer” but “design the chain of steps that gets to the answer reliably.” That includes tools, prompts, datasets, quality checks, fallbacks, and output formats.
- Use game-like tasks to teach agentic work. Mazes, escape rooms, and API games force state tracking, exploration strategy, and backtracking — exactly the behaviors agentic systems require. They’re not gimmicks; they’re the syllabus.
- Test prompt attacks and defenses. Security and adversarial literacy should not be abstract topics. Make students jailbreak, defend, manipulate, and harden model behavior. That turns “prompt security” from a lecture topic into a measurable skill.
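To make the maze point concrete, here is a toy version of the kind of task such API games pose. The maze layout is invented; the point is the shape of the solution: the solver must track visited state, explore, and explicitly backtrack from dead ends, which is exactly what agentic systems have to do.

```python
# Toy grid maze: S = start, G = goal, # = wall, . = open.
MAZE = [
    "S.#",
    "#.#",
    "#.G",
]

def solve(maze):
    """Depth-first exploration with explicit state tracking and backtracking."""
    rows, cols = len(maze), len(maze[0])
    start = next((r, c) for r in range(rows) for c in range(cols)
                 if maze[r][c] == "S")
    path, visited = [start], {start}
    while path:
        r, c = path[-1]
        if maze[r][c] == "G":
            return path
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and maze[nr][nc] != "#" and (nr, nc) not in visited):
                visited.add((nr, nc))
                path.append((nr, nc))
                break
        else:
            path.pop()  # dead end: backtrack to the previous cell
    return None

route = solve(MAZE)
```

In the graded version, the maze sits behind an API and the agent only learns whether each move succeeded, so the state tracking cannot be skipped.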
4. Make assessment more like real work
- Grade richer work, not just one-line answers. Real work is often multimodal: images, stories, APIs, analyses, and dashboards. If assessment automation cannot handle those, it will keep pushing education toward fake neatness.
- Grade the spec, not the code. When AI writes the code, the real artifact is the machine-readable brief: goal, constraints, done-when, counter-examples, eval suite. That is often where the actual thinking lives anyway.
- Count real open-source contributions as coursework. A merged PR to a public repo is harder, messier, and more educational than most sealed academic assignments. It teaches scoping, etiquette, usefulness, and real external standards.
- Let students build virtual TAs from real course material. The project is educational, and the output becomes infrastructure for the next cohort. Good assignments should create assets, not just submissions.
- Reward originality structurally, not just rhetorically. Most courses praise creativity but grade only correctness. Use embeddings to measure cohort-level similarity and explicitly reward meaningfully distinct outputs. Originality becomes real, not decorative.
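Structurally rewarding originality can be as simple as this sketch: embed each submission, then score each student by their distance from their nearest neighbour in the cohort. The names and three-dimensional vectors below are invented stand-ins; real embeddings would come from an embedding model and have hundreds of dimensions.

```python
import math

# Hypothetical embedding vectors for four submissions.
EMBEDDINGS = {
    "asha": [0.9, 0.1, 0.0],
    "ben":  [0.9, 0.1, 0.0],   # near-duplicate of asha's work
    "chen": [0.1, 0.9, 0.1],
    "dina": [0.0, 0.2, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def originality(name):
    """1 minus the highest similarity to any other submission in the cohort."""
    others = [cosine(EMBEDDINGS[name], v)
              for k, v in EMBEDDINGS.items() if k != name]
    return 1 - max(others)

scores = {name: round(originality(name), 3) for name in EMBEDDINGS}
```

The near-duplicate pair scores zero originality while the distinct submissions score well, so collaboration isn’t punished but genuine variation earns the extra credit.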
5. Use AI to build the course, not just teach inside it
- Let AI write, test, and fix draft questions. The interesting move is not just “AI drafts items.” It is “AI drafts, runs, breaks, revises, and improves them before any student sees them.” That dramatically changes how fast a course can evolve.
- Use coding agents to test the exam before students do. If an agent solves a question instantly, you should ask what the question is actually measuring. Agents become both QA tools and mirrors for curricular relevance.
- Use AI-generated comics to explain why the question exists. Students often resist tasks they do not understand. A comic can smuggle in the pedagogical point with very low friction and high memorability.
- Use interactive explainers for unfamiliar concepts. AI can generate not just text answers but visual, animated, intuitive explanations. That makes concept onboarding faster for both students and new faculty.
- Keep teacher adoption in familiar formats. A good innovation that slots into slides, handouts, OCR flows, and short feedback loops will beat a brilliant system nobody can actually use next semester.
6. Build the infrastructure for AI in education
- ⭐ Break rubrics into binary sub-criteria; reason before judging. Open-ended project grading becomes more auditable when you decompose it into binary yes/no criteria and ask the model to explain its reasoning before delivering a verdict. High or suspicious scores get re-evaluated with stronger guardrails.
- Give every student shared, budgeted AI access. If AI access depends on personal subscriptions, the institution is quietly grading wealth, not skill. Shared governed access makes AI a course capability, not a private advantage.
- Let AI handle routine support; keep humans for judgment. AI handles repetitive, searchable, first-pass questions. Humans handle ambiguity, reassurance, escalation, and final accountability. Neither alone is the right model at scale.
- Turn recurring answers into canonical Q&A cards. Once the same question appears three times, it should stop living in somebody’s head or an old thread. Convert it into a canonical artifact that both humans and bots can cite consistently.
- Govern with green/amber/red review levels. Not every decision needs the same scrutiny. Auto-ship the low-risk, spot-check the medium-risk, always human-review the high-stakes. This is how you scale without losing trust.
- Roll out in shadow mode first. High-stakes academic workflows should not be launched with fingers crossed. Run the AI system quietly in parallel with human judgment and learn before turning it loose.
- Turn policy into executable checks. A policy that cannot be operationalized at scale is mostly theater. If you can translate rules and rubrics into machine-checkable form, governance becomes consistent rather than person-dependent.
- Make the course publicly inspectable. Openness raises the bar. It invites scrutiny, reuse, criticism, and improvement, and it turns the course into a visible institutional experiment rather than a sealed classroom.
- Use reasoning models only for the borderline cases. Cheap screening first, expensive verification for the high-stakes or suspicious. Increasing reasoning effort on even a small model can flip an evaluator from sloppy to reliable — the cost curve makes this the natural operating model.
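Several of the ideas above (binary sub-criteria, executable policy, green/amber/red routing) compose naturally. Here is a minimal sketch under invented field names and thresholds: a rubric decomposed into yes/no machine-checkable criteria, a score from the fraction passed, and a routing rule in which perfect scores get spot-checked as suspicious, middling scores auto-ship, and low scores always go to a human.

```python
# Rubric as binary, machine-checkable criteria. The submission fields
# (readme_words, failed_tests, ...) and thresholds are illustrative.
RUBRIC = [
    ("has_readme",    lambda s: s["readme_words"] >= 100),
    ("tests_pass",    lambda s: s["failed_tests"] == 0),
    ("cites_sources", lambda s: s["citations"] >= 3),
    ("within_budget", lambda s: s["api_cost_usd"] <= 1.0),
]

def review_level(submission):
    """Score = fraction of binary criteria passed; route by risk."""
    passed = sum(check(submission) for _, check in RUBRIC)
    score = passed / len(RUBRIC)
    if score == 1.0:
        return score, "amber"   # suspiciously perfect: spot-check
    if score >= 0.5:
        return score, "green"   # low risk: auto-ship
    return score, "red"         # high stakes: always human-review

score, level = review_level(
    {"readme_words": 250, "failed_tests": 2, "citations": 4, "api_cost_usd": 0.4})
```

In the AI-judged version, each lambda becomes one narrow yes/no question to the model, asked to explain its reasoning before answering; the aggregation and routing stay exactly this mechanical.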
7. Analyze and research learning exhaust
- Track which AI tools students actually use. Once AI use is instrumented, you stop guessing. You can see which models students choose, when they ask for help, and what behavior actually correlates with success.
- Redesign exams from behavior data, not intuition. Model choice, timing, retry patterns, and deadline behavior all reveal how students really work. That should feed back into question design, support strategy, and pacing.
- Analyze broken code before it compiles. Novices often fail at syntax long before you reach the real misconception. Structural parsing of broken code lets you give feedback on thought process instead of just rejecting the submission.
- Use code traces to surface hidden misconceptions. Timestamped traces reveal overfitting, thrashing, structural confusion, and missing invariants — patterns that polished final submissions hide entirely.
- Turn replay galleries into faculty-readable stories. Raw logs do not change policy. Narrated replays and plain-language error-pattern reports do. The point is to make evidence legible to decision-makers, not just analysts.
- Break problem-solving into coachable steps. “Weak student” is too vague to be useful. Better to ask: did they fail at reading the givens, choosing a strategy, surviving a multi-select trap, or knowing when to cut losses? Each is a trainable failure mode.
- Study bias in peer review itself. If peer assessment matters, reviewer quality matters too. You can detect generous, timid, extreme, and calibrated graders from the data, then moderate or train accordingly.
- Treat learning analytics as a reusable research programme, not a one-off dashboard. The infrastructure for tracking misconceptions, prerequisite transfer, and course-to-course movement can be built once and reused across cohorts. That turns isolated AI experiments into institute-level knowledge and publishable educational research.
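Detecting generous and timid peer reviewers needs nothing fancier than each reviewer’s mean offset from the per-submission consensus. A sketch with invented reviewers and grades:

```python
from statistics import mean

# Hypothetical peer-review grid: GRADES[reviewer][submission] = mark /10.
GRADES = {
    "r1": {"a": 9, "b": 9, "c": 8},   # generous
    "r2": {"a": 5, "b": 6, "c": 4},   # timid
    "r3": {"a": 7, "b": 8, "c": 6},   # calibrated
}

def reviewer_bias(grades):
    """Mean offset of each reviewer from the per-submission consensus."""
    subs = next(iter(grades.values())).keys()
    consensus = {s: mean(g[s] for g in grades.values()) for s in subs}
    return {r: round(mean(g[s] - consensus[s] for s in subs), 2)
            for r, g in grades.items()}

bias = reviewer_bias(GRADES)
```

Positive bias flags generous graders, negative flags timid ones, and a large spread of offsets (not shown here) flags extreme ones; any of these can then be moderated or coached. Extreme graders would show up in the variance of offsets rather than the mean.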
8. Upgrade the human role
- Make judgment and taste explicit learning goals. AI makes average output cheap. The premium moves to selecting what is worth doing, recognizing quality, and knowing what to reject. That is a teachable skill, not a vague aspiration.
- Teach directional feedback as a skill. You do not always need to micromanage AI with detailed corrections. The higher-order skill is directional steering: “more concrete,” “less jargon,” “optimize for faculty adoption,” or “make this defensible.” That is learnable, and often more effective than line-by-line correction.
- Teach faculty to manage agents, not just chat with them. Institutional AI does not scale on prompting alone. People need to learn specs, budgets, kill switches, and escalation rules — orchestration literacy, not just chatbot familiarity.
- Use AI as a personalized coach. The model is not just an answer engine. It can become a research guide, curiosity amplifier, and next-step recommender tailored to the individual learner’s gaps and goals.
- Let non-coders build interactive learning tools. AI lowers the cost of making timelines, maps, biographies, and interactive explainers. That opens AI-native pedagogy far beyond computer science into humanities and social sciences.
- Teach students to run many AI attempts in parallel. One of the biggest AI-native workflow shifts is from single-path effort to portfolio thinking — run several attempts, compare them, and converge faster. That is a teachable habit, not an obvious default.
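The portfolio habit is mechanically simple, which is exactly why it is teachable. In this sketch, `attempt` is a deterministic stand-in for a model call (in practice each strategy would be a parallel API call with a varied prompt, scored by an eval function rather than a baked-in number):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for an AI call: each "strategy" yields a (draft, score) pair.
# Drafts and scores are hard-coded here purely for illustration.
def attempt(strategy):
    drafts = {
        "direct":   ("terse answer", 0.6),
        "stepwise": ("worked solution with checks", 0.9),
        "analogy":  ("explanation by analogy", 0.7),
    }
    return drafts[strategy]

def best_of(strategies):
    """Portfolio thinking: run attempts in parallel, keep the best-scoring one."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(attempt, strategies))
    return max(results, key=lambda r: r[1])

answer, score = best_of(["direct", "stepwise", "analogy"])
```

The shift being taught is in `best_of`: effort goes into generating and comparing several cheap attempts instead of polishing a single path.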