My Tools in Data Science course uses LLMs for assessments. We use LLMs to:

- Suggest project ideas (I pick), e.g. https://chatgpt.com/share/6741d870-73f4-800c-a741-af127d20eec7
- Draft the project brief (we edit), e.g. https://docs.google.com/document/d/1VgtVtypnVyPWiXied5q0_CcAt3zufOdFwIhvDDCmPXk/edit
- Propose scoring rubrics (we tweak), e.g. https://chatgpt.com/share/68b8eef6-60ec-800c-8b10-cfff1a571590
- Score code against the rubric (we test), e.g. https://github.com/sanand0/tds-evals/blob/5cfabf09c21c2884623e0774eae9a01db212c76a/llm-browser-agent/process_submissions.py
- Analyze the results (we refine), e.g. https://chatgpt.com/share/68b8f962-16a4-800c-84ff-fb9e3f0c779a

This changed our assessment process. It’s easier and better. Earlier, TAs took 2 weeks to evaluate 500 code submissions. In the example above, it took 2 hours. Quality held up: LLMs match my judgement as closely as TAs do, but run fast and at scale. ...
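To make the rubric-scoring step concrete, here is a minimal sketch of grading one submission against a rubric with an LLM. It is not the actual process_submissions.py; the model name, rubric text, JSON reply format, and file names are assumptions for illustration.

```python
# Minimal sketch: score one submission against a rubric with an LLM.
# Assumptions (not the course's actual setup): OpenAI's Python SDK,
# the gpt-4o-mini model, a toy plain-text rubric, and a JSON reply.
import json
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = """\
1. Code runs without errors (0-2 points)
2. Uses the required API correctly (0-2 points)
3. Handles errors and edge cases (0-1 point)
"""


def score_submission(code_path: str) -> dict:
    """Ask the LLM to grade one file against RUBRIC and return JSON scores."""
    code = Path(code_path).read_text()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a strict grader. Reply only with JSON: "
                    '{"scores": [{"criterion": ..., "points": ..., "reason": ...}]}'
                ),
            },
            {
                "role": "user",
                "content": f"Rubric:\n{RUBRIC}\n\nSubmission:\n```python\n{code}\n```",
            },
        ],
    )
    return json.loads(response.choices[0].message.content)


if __name__ == "__main__":
    print(score_submission("submission.py"))  # hypothetical file name
```

In practice you would loop this over all submissions and spot-check a sample against human scores, which is the "we test" step above.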