GDPval is a benchmark that compares how well AI does (vs experts without AI) on useful real-world tasks.

In several areas, the agents outperform experts.

For example, AI beats personal financial advisors, but not accountants and auditors. So I used ChatGPT / Claude to decide where to invest, but am having an accountant file my taxes. That’s a high leverage activity, especially since I might not have hired a personal financial advisor by default, and ChatGPT is certainly better than me (I’m not an expert) at personal financial advice.

Financial management is just one aspect of life. There are several. I don’t hire professionals in many areas where I’m not an expert, so hiring AI agents here is almost a no-brainer.

  • Doctor: Symptom triage, report analysis, drug check, health planning.
  • Financial advisor: Budgeting, investing, tax saving.
  • Lawyer: Contract review (rental, employment, …), disputes (claims, employment, property), planning (will, power of attorney), legal compliance.
  • Real-estate broker: Property search, lease negotiation, regulatory compliance.
  • Editor: Presentations, documents, emails, code.
  • Investigator: Client, vendor, partner, competitor, consultant, product.
  • Teacher: Skill development, test prep, concept learning, project guidance.
  • Counselor: Mental health, relationship, career, life coaching.

But these are the more obvious ones. I had Claude list what AI agents can be “hired” of some of the less obvious but high-leverage “hires” and what I could ask them:

  • Philosopher.
    • Why does this feel uncomfortable?
    • What do these choices reveal about my implicit values?
    • AI may cut jobs but also improve lives. Help me resolve this.
    • Play devil’s advocate from three ethical frameworks.
  • Historian.
    • Generate structured questions to ask parents/elders.
    • Here’s a transcript. Extract themes and suggest follow-up questions.
    • Connect this family story to its broader historical context.
    • Turn these fragmented stories into a coherent narrative.
  • Relationship architect.
    • Brief me on my history with this person before I meet them.
    • I haven’t contacted X in months. Draft a warm reconnection message.
    • Who in my network could help me reach [goal]? What’s the warmest path?
    • What patterns do you notice in the relationships I find energizing?
  • Scenario planner.
    • Create three 2030 scenarios for [domain]. Stress-test my current strategy against each.
    • What robust moves work across all scenarios?
    • What early warning indicators should I watch for?
    • I’m choosing between X and Y. What would I need to believe for each to be correct?
  • Epistemologist.
    • What are my load-bearing beliefs? Which haven’t I stress-tested recently?
    • Steelman the strongest case against my view on [topic].
    • Here’s a prediction I made. Track it. What’s my calibration like?
    • Where might I have galaxy-brained myself into an unusual position?
  • Diplomat.
    • I’m meeting [context]. What unwritten expectations might I miss?
    • Review this email. Am I being appropriately [direct/indirect] for this culture?
    • What mistakes do Indians typically make in [context]?
    • The meeting felt off. Here’s what happened. What did I miss?
  • Taste curator.
    • I liked [X, Y, Z] but not [A, B]. What does this reveal about my aesthetic?
    • What’s adjacent to my current taste that would stretch me without losing me?
    • I want to develop taste in [domain]. What’s the learning path? What do I experience first?
    • My taste feels derivative. What would make it more authentically mine?
    • Recommend [books/films/music] that would give me vocabulary for [aesthetic/emotion/idea].
    • I’m designing [space/event/gift]. What references should I draw from given my taste?
    • My taste in [domain] is developed. My taste in [other domain] is naive. Bridge them.
  • Rhetorician.
    • Analyze this transcript. What’s my default argumentation style? Its blind spots?
    • I need to convince [skeptical audience] of X. What’s the optimal structure?
    • Help me turn this observation into a memorable, quotable framework.
    • Steelman their likely objections and give me responses.
  • Archivist.
    • What have I previously thought about [topic]?
    • This new idea connects to something—find the link in my past work.
    • I’ve written about X and Z separately. Synthesize them.
    • What gaps exist in my thinking on [domain]?
  • Liturgist.
    • Design a family ritual for [transition] that fits our values.
    • Create a weekly reflection practice for our family.
    • We lost [person]. Design a remembrance practice that feels genuine.
    • I want to mark [milestone] meaningfully, not performatively. How?
  • Activist.
    • I care about [issue]. Map the power structure. Who actually decides?
    • What’s the smallest intervention with the largest leverage on this system?
    • Who are unlikely allies? What would make opponents neutral?
    • I have [resources/reach]. What’s my highest-impact move?
    • Craft a narrative frame that makes [change] feel inevitable, not radical.
    • What’s the history of successful change in similar domains? What worked?
  • Intelligence agent.
    • What weak signals should I monitor for [risk/opportunity]?
    • Here’s what [competitor/market] did this quarter. What does it reveal about their strategy?
    • What am I not seeing because of my position? Where are my blind spots?
    • This seems like noise. Is there a pattern I’m missing?
    • Red team this: How could [scenario] hurt me? What would I not see coming?
    • Verify this claim. What would make it false? What’s the source quality?
  • Bodyguard.
    • Audit my digital footprint. What’s publicly available that shouldn’t be?
    • I’m traveling to [place]. What’s the threat profile? What precautions matter?
    • Review this [email/message/request]. Is this social engineering?
    • What’s my current weakest security link—physical, digital, financial?
    • Someone determined wants to harm me. What’s their easiest path? How do I close it?
    • Design a security protocol for [context: travel/home/data] that I’ll actually follow.
  • Lab assistant.
    • I want to test whether [intervention] affects [outcome]. Design an N=1 experiment.
    • Here’s two weeks of data. What’s the signal? What’s noise?
    • I think X causes Y in my life. What confounders should I control for?
    • This experiment failed. Was it the hypothesis or the method?
    • What’s the minimum viable test before I commit to [major change]?
    • I’ve tried [interventions]. Synthesize: what actually works for me?

AI Advisory Cabinet