If a bot passes your exam, what are you teaching?

It’s incredible how far coding agents have come. They can now solve complete exams. That changes what we should measure. My Tools in Data Science course has a Remote Online Exam. It was so difficult that, in 2023, it sparked threads titled “What is the purpose of an impossible ROE?” Today, despite making the test harder, students solve it easily with Claude, ChatGPT, etc. Here’s today’s score distribution: ...

How to create a data-driven exam strategy

Can ChatGPT give teachers data-driven heuristics on student grades? I uploaded last term’s scores from about 1,700 students in my Tools in Data Science course and asked ChatGPT: This sheet contains the scores of students … (and explained the columns). I want to find out what are the best predictors of the total plus bonus… (and explained how scores are calculated). I am looking for simple statements with 80%+ correctness along the lines of: ...

Problems that only one student can solve

Jaidev’s The Bridge of Asses reminded me of my first coding bridge. It was 1986. I’d completed class 6 and was in a summer coding camp at school. M Kothandaraman (“MK Sir”) was teaching us how to swap variables in BASIC on the BBC Micro. This code prints the first name in alphabetical order (“Alice”): 10 A = "Bob" 20 B = "Alice" 30 IF A > B THEN 40 TEMP = A 50 A = B 60 B = TEMP 70 END 80 PRINT A The homework was to print all details of the first alphabetical name: ...

Tools in Data Science course is free for all

My Tools in Data Science course is now open for anyone to audit. It’s part of the Indian Institute of Technology, Madras BS in Data Science online program. Here are some of the topics it covers in ~10 weeks: Development Tools: uv, git, bash, llm, sqlite, spreadsheets, AI code editors Deployment Tools: Colab, Codespaces, Docker, Vercel, ngrok, FastAPI, Ollama LLMs: prompt engineering, RAG, embeddings, topic modeling, multi-modal, real-time, evals, self-hosting Data Sourcing: Scraping websites and PDF with spreadsheets, Python, JavaScript and LLMs Data Preparation: Transforming data, images and audio with spreadsheets, bash, OpenRefine, Python, and LLMs Data Analysis: Statistical, geospatial, and network analysis with spreadsheets, Python, SQL, and LLMs Data Visualization: Data visualization and storytelling with spreadsheets, slides, notebooks, code, and LLMs ...

Feedback for TDS Jan 2025

When I feel completely useless, it helps to look at nice things people have said about my work. In this case, it’s the feedback for my Tools in Data Science course last term. Here are the ones I enjoyed reading. Having a coding background, the first GA seemed really easy. So I started the course thinking that it’ll be an easy S grade course for me. Oh how wrong was I!! The sleepless nights cursing my laptop for freezing while my docker image installed huge CUDA libraries with sentence-transformers; and then finding ways to make sure it does not, and then getting rid of the library itself, it’s just one example of how I was forced to become better by finding better solutions to multiple problems. This is one of the hardest, most frustrating and the most satisfying learning experience I’ve ever had, besides learning ML from Arun sir. ...

O3 Is Now My Personalized Learning Coach

I use Deep Research to explore topics. For example: Text To Speech Engines. Tortoise TTS leads the open source TTS. Open-Source HTTP Servers. Caddy wins. Public API-Based Data Storage Options. Supabase wins. etc. But these reports are very long. With O3 and O4 Mini supporting thinking with search, we can do quick research, instead of deep research. One minute, not ten. One page, not ten. ...

Students who are more engaged score more

This is about as insightful as the Ig Nobel winning papers “Boredom begets boredom” and “Whatever will bore, will bore” that methodically documented that bored teachers lead to bored students. But in the spirit of publishing all research without bias for success or novelty, let me share this obvious result. The Y-axis represents the total score of ~2,000 students on 4 graded assignments, each of ~10 marks. The X-axis represents the percent rank of engagement. The most engaged students are at 100%. The least are at 0%. ...

Halving a deadline costs 1.4% of marks each time

Does it make a difference if you submit early vs submit late? Here’s some empirical data. About ~1,000 students at IIT Madras took 3 online quizzes (GA1, GA2, GA3) in the last few weeks. The deadlines were all at midnight (India) on different days. Here’s when they submitted their final answers: There was a spurt of submissions at the last minute. ~1 out of 8 students submit with < 10 minutes remaining. Most students submitted ~4 hours before the deadline. In fact, 3 out of 4 students submit on the same day as the deadline. A fair number of students submitted the previous day/night. 1 out of 6 are diligent and submit a day early. But does submitting late help, since you get more time? Apparently not. ...

When and how to copy assignments

The second project in course asked students to submit code. Copying and collaborating were allowed, but originality gets bonus marks. Bonus Marks 8 marks: Code diversity. You're welcome to copy code and learn from each other. But we encourage diversity too. We will use code embedding similarity (via text-embedding-3-small, dropping comments and docstrings) and give bonus marks for most unique responses. (That is, if your response is similar to a lot of others, you lose these marks.) In setting this rule, I applied two principles. ...

A Post-mortem Of Hacking Automated Project Evaluation

In my Tools in Data Science course, I launched a Project: Automated Analysis. This is automatically evaluated by a Python script and LLMs. I gently encouraged students to hack this - to teach how to persuade LLMs. I did not expect that they’d hack the evaluation system itself. One student exfiltrated the API Keys for evaluation by setting up a Firebase account and sending the API keys from anyone who runs the script. ...

Hacking LLMs: A Teacher's Guide to Evaluating with ChatGPT

If students can use ChatGPT for their work, why not teachers? For curriculum development, this is an easy choice. But for evaluation, it needs more thought. Gaining acceptance among students matters. Soon, LLM evaluation will be a norm. But until then, you need to spin this right. How to evaluate? That needs to be VERY clear. Humans can wing it, have implicit criteria, and change approach mid-way. LLMs can’t (quite). Hacking LLMs is a risk. Students will hack. In a few years, LLMs will be smarter. Until then, you need to safeguard them. This article is about my experience with the above, especially the last. ...

Why don't students hack exams when they can?

This year, I created a series of tests for my course at IITM and to recruit for Gramener. The tests had 2 interesting features. One question required them to hack the page Write the body of the request to an OpenAI chat completion call that: Uses model gpt-4o-mini Has a system message: Respond in JSON Has a user message: Generate 10 random addresses in the US Uses structured outputs to respond with an object addresses which is an array of objects with required fields: street (string) city (string) apartment (string) . Sets additionalProperties to false to prevent additional properties. What is the JSON body we should send to https://api.openai.com/v1/chat/completions for this? (No need to run it or to use an API key. Just write the body of the request below.) ...

Should courses be hard or easy?

Here’s a post I shared with the students of my Tools in Data Science course at IITM. This was in response to a student posting that: The design of TDS course lecture videos are designed in such a way that it could be understood only by the data scientists not by the students like me who are entirely new to the field of data science. Though I have gone through 6 weeks of course lecture videos, I am not fully aware of the usage of ChromeDevTools, Bash, Github etc…. ...

Hacking an obnoxious, unhelpful LLM to say Yes

Dan Becker suggested a game a few weeks ago that I’ve been putting to good use. Can we have one LLM try and get another to say “Yes”? The defender is told to never say “Yes”. The attacker must force it to. Dan’s hypothesis was that it should be easy for the defender. I tried to get the students in my Tools in Data Science course to act as the attacker. The defender LLM is a GPT 4o Mini with the prompt: ...

The psychology of peer reviews

We asked the ~500 students in my Tools in Data Science course in Jan 2024 to create data visualizations. They then evaluated each others’ work. Each person’s work was evaluated by 3 peers. The evaluation was on 3 criteria: Insight, Visual Clarity, and Accuracy (with clear details on how to evaluate.) I was curious to see if what we can learn about student personas from their evaluations. ...

Moderating marks

Sometimes, school marks are moderated. That is, the actual marks are adjusted to better reflect students' performances. For example, if an exam is very easy compared to another, you may want to scale down the marks on the easy exam to make it comparable. I was testing out the impact of moderation. In this video, I'll try and walk through the impact, visually, of using a simple scaling formula. BTW, this set of videos is intended for a very specific audience. You are not expected to understand this. ...

Visualising student performance 2

This earlier visualisation was revised based feedback from teachers. It’s split into two parts: one focused on performance by subject, and another on performance of each student. Students’ performance by subject This is fairly simple. Under each subject, we have a list of students, sorted by marks and grouped by grade. The primary use of this is to identify top performers and bottom performers at a glance. It also gives an indication of the grade distribution. ...

Visualising student performance

I’ve been helping with visualising student scores for ReportBee, and here’s what we’ve currently come up with. Each row is a student’s performance across subjects. Let’s walk through each element here. The first column shows their relative performance across different subjects. Each dot is their rank in a subject. The dots are colour coded based on the subject (and you can see the colours on the image at the top: English is black, Mathematics is dark blue, etc.) ...

On teaching

This vacation, I took a session each for class XI and XII at my school, Vidya Mandir. The subject was Computer Science (the only one I can teach with some confidence), and the topic was networks. It was an experiment, in two parts. The first was to understand how students of this generation interact with the Internet. (I'm twice as old as them, so I guess they qualify as the next generation.) The second was to see whether I'd leave them far behind, or they'd leave me far behind. ...

The Six-Lesson Schoolteacher

The Six-Lesson Schoolteacher, by John Taylor Gatto, New York State Teacher of the Year, 1991. (The first part of it is sarcastic. This man is speaking passionately of things he despises in the education system.) The first lesson I teach is: “Stay in the class where you belong.” I don’t know who decides that my kids belong there but that’s not my business. The second lesson I teach kids is to turn on and off like a light switch. I demand that they become totally involved in my lessons, jumping up and down in their seats with anticipation, competing vigorously with each other for my favor. ...