Tools in Data Science has a remote online exam (ROE). It has a tough reputation. We conducted one today.

Here’s how today’s ROE unfolded.

The TAs had created 13 questions and shared them with me yesterday. This morning, I tried solving them.

At first glance, it looked scarily hard! But I just jumped down a few questions and found that five were trivial, i.e. I just used the “Ask AI” button to copy the question into ChatGPT and it gave me the answer.

For another four, I just needed to make sure I uploaded the files or HTML.

Two questions confused ChatGPT a bit, and it needed some nudges. There is real learning here.

One was pretty hard and ChatGPT struggled with it.

  • 🔴 02. Korean Speech Dataset API Validation (5.0) actually requires work. At first, ChatGPT refused on ethical grounds. When I reframed it, it tried, but the human-in-the-loop (me) was too slow. So I used Codex, which literally hacks towards the solution! It searched online for existing solutions, read the GitHub discussions on this topic, found my browser tab and started testing itself, … and solved it in 10.5 min!

This leaves the one question that AI can’t solve:

  • 🔴 01. Collaborative Token Exchange (5.0) asks you to collect “tokens” from other people and share them. I asked Codex to hack it, but after an hour (of logging into my personal account, my IITM account, even my father’s account, exhausting my token limits, and totally psyching me out), it declared the question unhackable.

Based on this, the instructors, teaching assistants, and I decided to release the exam early. We deployed it at 12:00 pm IST, an hour before the scheduled time.

The hackers (e.g. those who scan the source code or change their system clocks) could see the questions early, and they started sharing and solving them.

The TAs said, “Anand, shall we add more questions to make it tougher?”

I said, “No, it’s OK. The ROE has built a reputation for difficulty. Let that change. Let them have an easy exam.”

“If they’re going to split this in groups and have coding agents solve it, they’ll score full marks in 10-15 minutes,” I said to myself.


When the exam ended, this was the score distribution.

There were several surprises here. First, 9 questions were repeats. Yet, barring one, students scored lower on them this time, even though they had appeared in equally tough ROEs and assignments before.

| # | Question | This ROE | Previous | Previous exam |
|--:|----------|---------:|---------:|---------------|
| 14 | FastAPI Time Series Caching | 8% | 39% | 2025 Sep ROE |
| 15 | AI Video Attendee Extraction | 12% | 64% | 2026 Jan GA3 |
| 11 | 🟢 Cross-entity disambiguation | 13% | 64% | 2026 Jan GA4 |
| 7 | 🟢 Region Containing Point | 15% | 38% | 2025 Sep ROE |
| 9 | 🟠 Refactor Python Code with VS Code | 17% | 57% | 2026 Jan GA1 |
| 13 | 🟢 Record Terminal Session with asciinema | 21% | 66% | 2026 Jan GA1 |
| 10 | 🟢 Fix Broken JSON File | 30% | 64% | 2026 Jan GA1 |
| 12 | 🟡 Simple Question | 41% | 64% | 2026 Jan GA1 |
| 8 | 🟠 Reorganize Files with Shell Commands | 53% | 38% | 2026 Jan GA1 |

Just as surprisingly, students scored higher on 3 of the 5 new questions than on these repeats.

One new question ended up being almost the hardest, despite being one-shot-able for ChatGPT (GPT 5.4, extended thinking).

The toughest had a 1% success rate. Codex could solve it in 10 min, but it’s a genuinely hard question.

Which leaves us with the collaboration question - the one that AI can’t solve.

This question is a whole new dynamic altogether. There were about 5 clear clusters of collaborating students, ranging from 5 to 35 students each. They were trading bundles of tokens among themselves. A few “super-collaborators” did the bulk of the lifting. But even with this, the largest correct submission had 84 tokens. The strongest was a 51-token bundle that 6 students submitted.

(I need to study this far more!)

Yet even that is far smaller than the original 100-token target I had set. Clearly, they aren’t collaborating enough.
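
For the curious, here’s a minimal sketch of how such clusters could be detected. It assumes records mapping each student to the set of tokens they submitted (the data shape and the `submissions` variable are my assumptions, not the actual grading pipeline): link two students whenever their submissions share a token, then take connected components.

```python
# A rough sketch, not the actual analysis: students sharing a token
# are linked, and each connected component is one collaboration cluster.
from itertools import combinations

import networkx as nx

# Hypothetical records: student ID -> set of tokens they submitted.
submissions = {
    "s1": {"t1", "t2", "t3"},
    "s2": {"t2", "t4"},
    "s3": {"t5"},
}

G = nx.Graph()
G.add_nodes_from(submissions)
for a, b in combinations(submissions, 2):
    shared = submissions[a] & submissions[b]
    if shared:  # any common token suggests they traded
        G.add_edge(a, b, weight=len(shared))

# Largest clusters first; with the sample data this prints [2, 1].
clusters = sorted(nx.connected_components(G), key=len, reverse=True)
print([len(c) for c in clusters])
```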


It’s surprising how little students used the “Ask AI” button.

  • 🔴 ~100 students didn’t use it at all.
  • 🟠 ~100 students clicked on it JUST once.
  • 🟡 ~100 students used it just 2-5 times. For 15 questions, that’s clearly low.
  • 🟢 ~100 students used it 6-20 times. That’s OK.
  • 🔵 ~15 students used it 20+ times. (1 clicked on it 45 times. Clearly loves AI.)

Finally, most students saved their results for the first time just before the deadline.

The problem is that some of their system clocks were off, so they got a “late submission” error.

BTW, some students used a timing trick for hacking. By setting their system clock ahead, they could see the exam questions before release. But submissions are checked only against the server clock. So if your clock is off, waiting until what looks like the last minute to save is a terrible idea: the server may already be past the deadline. That hurt some students.
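
To make the mechanics concrete, here’s a minimal sketch of why the trick fails at submission time. This is purely illustrative, not the actual exam server: the endpoint, times, and field names are my assumptions. The point is that the server validates release and deadline against its own clock and ignores whatever the client’s clock says.

```python
# Illustrative only: the server trusts its own clock, never the client's.
from datetime import datetime, timezone

from fastapi import FastAPI, HTTPException

app = FastAPI()

# Hypothetical release and deadline, stored server-side in UTC.
RELEASE = datetime(2026, 2, 1, 6, 30, tzinfo=timezone.utc)   # 12:00 pm IST
DEADLINE = datetime(2026, 2, 1, 9, 30, tzinfo=timezone.utc)  # 3:00 pm IST

@app.post("/save")
def save(answers: dict, client_time: str | None = None):
    # client_time (whatever the student's system clock says) is ignored.
    now = datetime.now(timezone.utc)  # the server clock is the only authority
    if now < RELEASE:
        raise HTTPException(403, "Exam not released yet")
    if now > DEADLINE:
        raise HTTPException(403, "Late submission")
    return {"saved_at": now.isoformat()}
```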


Based on this, here’s what I learnt:

Pressure makes a difference. In past exams, with similar time pressure, students solved the same questions much better. I think they panicked on the first two questions. To be fair, so did I when I saw them. That’s why I started solving from the bottom.

LESSON 1: Scan end-to-end. Solve quick-win (high-impact, low-effort) problems first.

We have no clue what’s easy or tough. When different students use different tools, what’s easy for ChatGPT might be hard for Claude, and vice versa. Without knowing each tool’s capabilities and how students use them, this is hard to assess.

LESSON 2: With AI, no one knows what’s easy or hard. Try for yourself.

They aren’t using AI enough. Our advice is to use the “Ask AI” button every time. Half the students barely used it once.

LESSON 3: Use AI first. Focus on what AI can’t do well.

They aren’t collaborating enough. The collaboration question was designed to encourage collaboration. Yet the largest bundle of tokens shared was 84, far smaller than the 500-token target.

LESSON 4: Make friends with classmates. Work together. It helps: now, and in the future.


That’s worth repeating:

  1. Scan end-to-end. Solve quick-win (high-impact, low-effort) problems first.
  2. With AI, no one knows what’s easy or hard. Try for yourself.
  3. Use AI first. Focus on what AI can’t do well.
  4. Make friends with classmates. Work together. It helps: now, and in the future.