Data

What does Gramener ask ChatGPT?

I looked at how Gramener uses ChatGPT Plus by evaluating 600+ chats asked over 3 months from Oct 2023 to Jan 2024.

The team asks 6 questions a day. We don’t track who or how many actively use ChatGPT Plus. This also excludes personal ChatGPT accounts. Still, 6/day is low for an entire team put together.

The questions fall into 9 categories.

Category | %
--- | ---
Excel, data exploration & analysis | 25%
Text extraction and summarization | 13%
HTML, CSS, or JavaScript code | 13%
Python code | 13%
LLMs, AI and use cases | 9%
OCR and image analysis | 9%
Generate images, logos, and designs | 7%
General knowledge, policy & environment | 5%
Audio and translation | 5%

Here are some questions from each category – to give you an idea of emergent ChatGPT Plus usage.

Excel, data exploration & analysis (25%)

  • Excel clean and merge. There are 2 worksheets in this excel with data, can you clean up the data and merge the data in both the sheets
  • Excel CO2 Data Analysis. You are an expert Data Analyst who is capable of extracting insights out of data. Analyze this sheet and let me know the findings
  • Excel Chi-Square Analysis Guide. how to perform chi square analysis in excel
  • Log Data Insights & KPIs. Looking at the columns from this excel, what kind of insights are possible, what are key KPIs to be looked at

Text extraction and summarization (13%)

  • Complaint Investigation Summary. The following is the summary of an internal investigation for a customer complaint. Now this internal summary is to be paraphrased (in 3-4 lines) as part of a closure
  • Extracting Tables from RTF. Can you write a script to extract the tables from this document
  • Extracting Entities from Text. [{'word1': '(P)', 'nearest_word1': 'P/N:', 'nearest_word2': '0150-25034', 'nearest_word3': 'CARTIRIDGE'}, {'word1': 'P/N:', 'nearest_word1': '(P)', 'nearest_word2': '015...
  • Extract PDF Font Details. Extract text formatting information from this document. Especially find font styles, families and sizes.

HTML, CSS, or JavaScript code (13%)

  • HTML/CSS Chart Template. Give me HTML, CSS and chart code for this design.
  • CSS Font Stack: Explanation. Explain this CSS font convention: Arial, Helvetica, Segoe UI, sans-serif
  • Checkbox Validation with JavaScript. In HTML form, I have a set of checkboxes. How do I write the form so that at least one of them being checked is mandatory?
  • Prevent Text Wrapping CSS. <span class="text">Chief Communications Officer</span> I need CSS such the text inside should not wrap create new line
  • ReactJS App with Routing. Give me developed version using ReactJS use react router for sidebar section navigation to the pages use Tailwind css for styling. Use styled components for conditional …

Python code (13%)

  • Python Code Documentation Guide. Can you generate documentation for a project code written in python?
  • Linux Commands for Python. Give me list of linux commands to work on python coding
  • Code explanation request. What’s this code about? …
  • FastAPI Async Testing. Write a fastapi code and a python client to test the asynchronous nature of the fastapi package.
  • Streamlit App for Translation. Given the following python code, give me a simple streamlit app that takes file upload and converts that into a target language: …

An interesting sub-topic was interview question generation.

  • Python Decorator for Database Queries. Create one medium level question for Decorators in python Industryy usecase specific with solution

LLMs, AI and use cases (9%)

  • LLMs for Data “What Ifs”. You are an LLM Expert. Can you tell me how can we leverage LLM for implementing What IF scenarios on Data?
  • LLMs: Current Challenges & Concerns. what are current challenges with LLMs
  • LLM Applications in Marketing. Show LLM applications for the marketing function of a music company.
  • Gen AI usage. What industries are using Gen AI the most
  • Best LLMs in 2023. Search the internet for the most recent LLMs and list the best LLMs in terms of performance
  • Best Image Classification Models. suggest best models to tell what there in the image

OCR and image analysis (9%)

  • Browser history OCR. This is a screenshot of my browser history. Convert that to text. Categorize these into common topics.
  • Extracted C Code. This image contains C code. Extract it.
  • Image text extraction and annotation. Extract the text from this image and annotate the boundaries of the text
  • Detecting Document Image Orientation. oreientation detection of documnet image
  • AI Project with OpenCV & YOLO. Consider yourself as Open CV and Yolo expert and help me with AI project
  • Image Correction Techniques. what are the approaches we have in computer vision where my image is tilted or rotated in reverse or image is not in readable format

Generate images, logos, and designs (7%)

  • Google Chacha and ChatGPT Bhatija. Generate an image of Google Chacha and ChatGPT Bhatija
  • Regenerative Systems Group Image. Generate an Image with below context > “A group of people interested in Regenerative systems. The focus is on reusing food, energy and mental health”
  • Twitter Reply Icons Design. Give me three icons: icon16.png, icon48.png, icon128.png for an extension that I’m building that suggests replies to tweets
  • Generate flowcharts. Make a flowchart of the underlying working of a web app. Here’s how it works. 1. The user uploads a document – a PDF or an image. They then select the language that …
  • Create Animated GIF from Photos. I have 4 photos I want to make an animated gif out of them. How can i do that?
  • Climate Impact Illustration. An illustration showcasing the impact of climate change on daily life, focusing on a rural setting near the coast. In the foreground, a small farm is visibly struggling, …

General knowledge, policy & environment (5%)

  • Design Thinking Overview. What is Design thinking
  • Arthashastra. What can Arthashastra teach us about modern politics?
  • Community Impact on Habits. Is there research to suggest the impact of community on habit building?
  • Focus at Age 28. What should a 28 year old focus on?
  • Superconductors. Explain superconductors like I’m five years old.
  • Climate Career: Impactful Choices. You a career counsellor at a University campus. You want to create 4 to 5 talking points for students to consider a career in Climate space.
  • Sustainability Division Vision. I run a software outsourced product development company. I want to start a new division that focuses on sustainability services offerings. Please draft a vision…

Audio and translation (5%)

  • Audio Timestamp Mapping. timestamp mapping for transcribed audio
  • Transcribe Lengthy Audio: Segment. Transcribe this audio file.
  • Traducción del MOU al Español. Translate this document to Spanish, and create a new translated document. Maintain text formatting.
  • Telugu Transcription into Hindi. Transcribe the following telugu text into hindi. You are supposed to transcribe, not translate. శ్రీనివాస పూజావిధానము …
  • GPT lacks native audio support. Does gpt support audio in audio out natively?

Learning to speak better

Microsoft ported its PowerPoint Speaker Coach to Teams. Since September, it’s given me suggestions covering 11 hours in 77 calls (I speak ~10 min/call.)

I say “uhh” a lot. That’s intentional

I use the filler word “uhh” in 70% of my calls. That did not surprise me. I do that intentionally.

  1. On a poor network, they know I’m still connected
  2. They know I’m going to say something
  3. I sound less confident. That invites critique I can learn from

But I also use filler words like “You know” and “I mean” in half the calls, and “like”, “actually”, and “basically” in a fifth. That’s NOT intentional, and I’ll be more conscious of it.

Filler word | % of calls | # / call
--- | --- | ---
uhh | 70% | 3.6
You know | 48% | 2.4
I mean | 43% | 2
like | 22% | 1.4
actually | 19% | 1
basically | 18% | 1.2
anyway | 14% | 1.1
hmm | 16% | 1.1
umm | 9% | 1.4
ah | 4% | 1.3
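Speaker Coach reports these numbers itself. If you only had raw transcripts, the two columns could be computed roughly as below. This is a sketch, not how Speaker Coach works: `filler_stats` and the sample calls are made up for illustration, and it assumes “# / call” averages only over the calls where the word appears.

```python
import re

def filler_stats(calls, fillers):
    """For each filler: (% of calls it appears in, average uses in those calls)."""
    stats = {}
    for filler in fillers:
        pattern = r"\b" + re.escape(filler) + r"\b"
        # Count whole-word matches in each call transcript.
        counts = [len(re.findall(pattern, call.lower())) for call in calls]
        hits = [c for c in counts if c > 0]
        if hits:
            pct_of_calls = round(100 * len(hits) / len(calls))
            per_call = round(sum(hits) / len(hits), 1)
            stats[filler] = (pct_of_calls, per_call)
    return stats

# Hypothetical transcripts, one string per call.
calls = [
    "uhh I think, you know, uhh, it works",
    "I mean it is fine",
    "uhh ok",
    "nothing here",
]
stats = filler_stats(calls, ["uhh", "you know", "i mean"])
print(stats)
```

Here “uhh” appears in 2 of 4 calls (50%), 1.5 times per call on average.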

I say “maybe” a lot. That’s surprising

What did surprise me was “maybe”. I use it in every fourth call, but when I do, I say “maybe” ten times per call. That’s a lot of maybe!

Sometimes, I say maybe because I’m communicating uncertainty.

Maybe we’ll have 20-30% success rate…

So and I had to switch 3 laptops or maybe 4.

… then she said, “OK, maybe it’s some other Sam”

Sometimes I’m proposing tentatively.

… one of the reasons why I’m nudging towards that is maybe a large reuse initiative is high return,

We can even put this in as part of the project by maybe offering it to different teams…

Maybe by having dedicated support…

Maybe I’ll drop off. Bye

But sometimes, they’re testable hypotheses.

Uh, maybe I’m getting the names wrong, but I think it was Socrates…

Maybe it’s me, but yeah, I guess…

You know, maybe it’s because I don’t store any of my stuff in…

One of my year’s goals is to run 50 experiments. I’d been doing well until April, and then fizzled out. Partly motivation. Partly a lack of testable hypotheses.

And now, in October, I discovered that I literally speak out one testable hypothesis every call — roughly every 10 minutes I speak! I’m amazed at how blind I’ve been, and how easy it can be to find experiments to test. I guess I need more of a scientific mindset. (Or just plain curiosity.)

The next time I say, “maybe” (or see it in my transcript), I’ll write it down as a hypothesis to test.
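Pulling these out of a transcript is easy to automate. Here is a minimal sketch, assuming the transcript is plain text with ordinary sentence punctuation; `maybe_sentences` is a name I made up, and the sample text is stitched together from the quotes above.

```python
import re

def maybe_sentences(transcript):
    """Return the sentences in a transcript that contain the word 'maybe'."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    return [s for s in sentences if re.search(r"\bmaybe\b", s, re.IGNORECASE)]

transcript = (
    "Let's start. Maybe we'll have 20-30% success rate. "
    "I had to switch 3 laptops or maybe 4. That's all."
)
hypotheses = maybe_sentences(transcript)
for sentence in hypotheses:
    print(sentence)
```

Each line it prints is a candidate for the experiment list. Deciding which ones are actually testable is still manual.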

Repetitive words cluster

Another discovery was: I tend to pick a phrase and use it repeatedly in calls. For example, I said “let’s say” twelve times in just one call of 15 minutes. I said “main” 20 times over 2 calls of 8 minutes each. I said “cool” 7 times in an 11-minute call.

Repetitive word | # calls | # / call
--- | --- | ---
let’s say | 1 | 12
main | 2 | 10
also | 1 | 8
only | 2 | 7.5
correct | 7 | 7.4
in terms of | 1 | 7
alright | 3 | 6.3
that is | 3 | 6
cool | 2 | 5

Clearly it’s something to watch out for. But maybe repetition of words isn’t a bad thing if it’s not the same phrase repeated across calls? (There! I said “maybe”. Let me find out!)

Modulate the pace

In a third of my calls, I need to speed up. In a third of my calls, I need to slow down. (On some calls, I need to do both!)

Clearly, I need to vary my pace a lot more, consciously. It’s not that I talk fast or slow. I do both. But I get stuck in one mode of speaking for too long.

Takeaways

I used to think I was a pretty good speaker. That’s not a bad thought, but it can blind me to feedback and improvements. There’s no end to learning how to speak. Speaker Coach is a great “in-your-face” feedback mechanism. I hope Microsoft adds more features to it.

But what I’m going to do now is:

  1. Every time I say “maybe”, write down an experiment
  2. Speed up and slow down more in calls
  3. Watch for words I use repeatedly

Old songs in my music library

My music library has around 1,000 songs (mostly Tamil and Hindi, with some Telugu and English film songs).

I spent this morning tagging them by year with mp3tag. (Manually. You don’t automate the pleasures of life.)

I thought my 1990s collection would be the largest. I was in college, listening to lots of music then. But surprisingly, my collection has grown post the 1990s.

I have 3 guesses why.

  1. Recency bias. I re-built this collection recently. Maybe I forgot older songs?
  2. Digitization bias. Maybe I listened to more songs as the cost of transmission/storage fell?
  3. Worsening standards. Maybe I used to be choosier about music?

Though I’m not sure of the above, there’s another interesting anomaly.

There is a spike in the 1960s.

I don’t need to guess this one. I know why. Those are the songs my parents liked. I grew up hearing them.

The oldest Tamil song is from Thiruneelakantar (1939). It’s from my father’s collection. I’ve heard it often enough to still enjoy it.

The oldest Hindi song is from Jaal (1952). He has a fondness for Dev Anand’s songs. So do I. This one is a beauty.

The oldest Tamil song my mother introduced me to is from Parasakthi (1952). She used to dance to this song when young.

The earliest Hindi song she introduced me to was from Jhanak Jhanak Payal Baaje (1955). It’s the song I grew up on, and it’s still among my favorites. What a melody!


My wife prefers newer songs. But I have low standards and few preferences. It makes my life rather happy.

So, in celebration of Make Music Day on 21 June, I’m treating myself to 2 weeks of my collection from the 1960s!

PS: My full collection is at https://gist.github.com/sanand0/877637165b17239aa27beac03749c9a6

How to find a Chinese actor to cast in Hollywood

Film actors mostly act within their own industry.

For example, Hollywood actors act outside Hollywood just 10% of the time. Chinese actors act with non-Chinese actors just 1% of the time.

So, if you’re a Hollywood producer trying to cast a Chinese actor, how would you find them?

One way is to list Chinese actors with the largest number of Hollywood co-stars. Let’s see who tops that list.
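As a sketch of that ranking: given a list of co-star pairs and a group label for each actor, count each actor’s distinct co-stars from the other group. The function name, the actor IDs (`C1`, `H1`, ...) and the data are placeholders for illustration, not the dataset behind this post.

```python
from collections import defaultdict

def rank_by_outside_costars(costar_pairs, group_of, home, target):
    """Rank actors in `home` by their number of distinct co-stars in `target`."""
    costars = defaultdict(set)
    for a, b in costar_pairs:
        # A pair is unordered, so check it in both directions.
        for actor, other in ((a, b), (b, a)):
            if group_of.get(actor) == home and group_of.get(other) == target:
                costars[actor].add(other)
    return sorted(
        ((actor, len(others)) for actor, others in costars.items()),
        key=lambda item: (-item[1], item[0]),
    )

# Placeholder data: C* are in the Chinese group, H* in the Hollywood group.
group_of = {"C1": "Chinese", "C2": "Chinese",
            "H1": "Hollywood", "H2": "Hollywood", "H3": "Hollywood"}
costar_pairs = [("C1", "H1"), ("C1", "H2"), ("C2", "H1"), ("C1", "C2"),
                ("H1", "H2"), ("H3", "C1"), ("C1", "H1")]
ranking = rank_by_outside_costars(costar_pairs, group_of, "Chinese", "Hollywood")
print(ranking)
```

Using a set means a repeated pairing (like `C1` and `H1` above) counts once. Counting appearances instead of distinct co-stars would be an equally reasonable choice.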

#5. Pei-Pei Cheng

You may know her as Jade Fox, the sly governess in Ang Lee’s Crouching Tiger, Hidden Dragon (2000), or Golden Swallow, the skilled swordsman sister in Come Drink With Me (1966), or even as the matchmaker who disgraces Mulan in Mulan (2020).

She mainly acts in Chinese films, co-starring nearly 180 times with actors like Hua Yueh, Lieh Lo, and Chung-Hsin Huang. But she’s also co-starred over 20 times with Hollywood actors like Jaime King (of Sin City), Peter Bowles (of The Bank Job), and Sandra Oh (of Grey’s Anatomy).

#4. Jet Li

You may know him as Han Sing, the martial artist and ex-cop in Romeo Must Die (2000), or Gabe Law, the former MultiVerse Authority agent in The One (2001), or Yin Yang, the unarmed member of The Expendables (2010).

He has co-starred over 100 times with Chinese actors like Jackie Chan, Simon Yam, and Sammo Kam-Bo Hung. But he’s also co-starred 30 times with Hollywood actors like Antonio Banderas, Morgan Freeman, and Sylvester Stallone.

#3. Joan Chen

She’s famous as Wanrong, the Chinese empress in The Last Emperor (1987), Josie Packard, the owner of the Twin Peaks mill in Twin Peaks (1989), or Dr Ilsa Hayden, assistant to the villain Rico Dredd in Judge Dredd (1995).

She’s co-starred over 80 times with Chinese actors like Tony Chiu-Wai Leung, Leon Lai, and Tony Ka Fai Leung. But she’s co-starred over 40 times with Hollywood actors like Michael Caine, Peter O’Toole, and Christopher Walken.

#2. Jackie Chan

The most famous Chinese martial arts actor in the world, and one of the highest-paid actors in the world, is famous as Detective Inspector Lee in Rush Hour (1998), Mr Han in The Karate Kid (2010), and the voice of Monkey in Kung Fu Panda (2008).

He has co-starred nearly 200 times with Chinese actors like Sammo Kam-Bo Hung, Maggie Cheung, and Kent Cheng. But he’s co-starred over 50 times with Hollywood actors like Arnold Schwarzenegger, Owen Wilson, and Chris Tucker.

#1. Michelle Yeoh

You may know her as Wai Lin, the Chinese spy and James Bond’s ally in Tomorrow Never Dies (1997), Yu Shu Lien, the warrior swordswoman in Crouching Tiger, Hidden Dragon (2000), or as Eleanor Young, the domineering mother-in-law in Crazy Rich Asians (2018).

She’s an actress at the borderline of the Chinese – Hollywood clusters. She’s acted ~60 times with Chinese actors like Maggie Cheung, Chow Yun-Fat and Jet Li. But she’s acted almost as many times with Hollywood actors like Sigourney Weaver, Zoe Saldana and Sam Worthington.

More actors

Here are half a dozen more Chinese actors who have often acted with Hollywood actors.

Chow Yun-Fat
Donnie Yen
Andy Lau
Simon Yam
Gong Li
Josie Ho

It’s interesting to see that 3 of the top 6 (Chow Yun-Fat, Pei-Pei Cheng, and Michelle Yeoh) had all acted in the blockbuster Crouching Tiger, Hidden Dragon (2000).

So, perhaps the simple message to our Hollywood producer is to “look no further than the cast of the first foreign-language film to break the $100mn mark in the USA.”

How isolated is Bollywood from world cinema?

These are the major groups of actors, based on who they act with most.

Actors mostly act with other actors in the same…
  1. Language. Not country. For example, the Spanish / Mexican group is across countries. But Indian actors divide into North Indian and South Indian. It’s language, not country.
  2. Time period. Old American actors are a separate group from Hollywood. (Naturally. Brad Pitt was born after Humphrey Bogart died. They couldn’t have acted together.)
  3. Genre. Hollywood Porn actors don’t act with mainstream Hollywood. Same with Japanese Porn, Hollywood TV, and Hollywood Horror actors.

How are these groups themselves connected? Do Chinese actors act with Hollywood often? How isolated is Bollywood from world cinema?

Hollywood is the core group

Take groups that act with other groups at least 5% of the time. Mainstream Hollywood acts with British and Hollywood TV/Horror actors. All other clusters are isolated.
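The thresholding step can be sketched as follows: keep only the links between groups whose share is at or above the cutoff, then find the connected clusters that remain. The function name and the shares below are illustrative assumptions, not the real figures, and the sketch treats each link as symmetric, which the real data may not be.

```python
def clusters_at_threshold(groups, link_share, threshold):
    """Connected clusters of groups, using only links with share >= threshold."""
    neighbours = {g: set() for g in groups}
    for (a, b), share in link_share.items():
        if share >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, clusters = set(), []
    for start in sorted(groups):
        if start in seen:
            continue
        # Depth-first walk over the links that survived the cutoff.
        stack, cluster = [start], set()
        while stack:
            g = stack.pop()
            if g in cluster:
                continue
            cluster.add(g)
            stack.extend(neighbours[g] - cluster)
        seen |= cluster
        clusters.append(sorted(cluster))
    return clusters

groups = ["Hollywood", "British", "North Indian", "South Indian", "Filipino"]
# Illustrative shares of how often two groups act together.
link_share = {
    ("Hollywood", "British"): 0.06,
    ("North Indian", "South Indian"): 0.03,
    ("Hollywood", "Filipino"): 0.002,
}
at_5_percent = clusters_at_threshold(groups, link_share, 0.05)
at_2_percent = clusters_at_threshold(groups, link_share, 0.02)
at_02_percent = clusters_at_threshold(groups, link_share, 0.002)
print(at_5_percent, at_2_percent, at_02_percent, sep="\n")
```

Lowering the cutoff can only merge clusters, never split them. That is why the picture grows more connected at each step below.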


Indian & Japanese clusters emerge

Let’s go more liberal. Take groups that act with other groups at least 2% of the time. Hollywood forms a big connected cluster. It includes most of Europe — British, German, French, Czech, Yugoslavian & Italian actors.

North & South Indian actors form the first non-Hollywood cross-language cluster.

The Japanese and Japanese porn actors form a cluster too. (Interestingly, it’s easy for a Japanese porn actor to act with mainstream Japanese actors. Hollywood porn actors find it far harder to act with Hollywood.)

Chinese & Korean cluster emerges

Among groups that act with other groups at least 1% of the time, we have:

Chinese & South Korean actors form the first cross-country cross-language cluster.

Hollywood expands to act with Scandinavian, Spanish, Polish, Brazilian & Nigerian films.

Other film industries (Russian, Greek, Egyptian, even Hollywood Porn) are still isolated.


World Cinema vs the rest

Among groups that act with other groups at least 0.5% of the time, we have:

  1. Turkish & Iranian groups coming together
  2. Indonesian actors acting with the Chinese
  3. Hollywood expanding to cover Russian, Greek, Egyptian, and finally, Hollywood Porn. (It’s easier for Brazilian / Nigerian to act with Hollywood than to be a Hollywood Porn actor.)

At this point, there are 6 actor groups that act with each other at least 1 out of 200 times (0.5%).

  1. World Cinema (Hollywood & friends)
  2. Japanese (mainstream & porn)
  3. Indian (North & South)
  4. Chinese, South Korean & Indonesian
  5. Turkish & Iranian
  6. Filipino

One world of cinema

If we look at groups that act with other groups at least 0.25% of the time, we have a far more unified picture. Almost every actor group acts with another group at least 1 out of 400 times.

But even here, there’s an exception. Filipino actors — the most insular major actor group in the world.


So, how isolated is Bollywood from World Cinema? For its size, it’s one of the most isolated actor groups. (But not as much as Iranian/Turkish or Filipino.)

Can foreigners enter Hollywood?

An aspiring Malaysian actor posted on Reddit:

I am a 18-year old biracial Malaysian kid who wants to be an actor in Hollywood. I’m taking a diploma for performing arts in a college called Sunway University in 8 days and I’m considering pulling out of it because why do something that I like when my dreams might never be fulfilled and the price for taking this diploma is seriously expensive. I am starting to doubt my chances of making it to Hollywood and I suffer from extreme anxiety. Is it possible for someone like me to enter Hollywood? What are my chances?

Breaking into Hollywood is hard. As a foreigner, it would be even harder. So I asked myself:

Do Hollywood actors act with foreigners?

Let’s take Will Smith. He frequently acts with Martin Lawrence, Tommy Lee Jones, Jaden Smith, Jon Voight, and 84 other actors.

Every one of his co-stars is a Hollywood actor, except the Spanish actor Jordi Mollà in Bad Boys II, and the Dutch actor Marwan Kenzari in Aladdin. Just 2% of Will Smith’s co-stars are foreign.

On the other hand, Jackie Chan is more cosmopolitan. Of his 224 co-stars, 70 are non-Chinese. Over 30% of Jackie Chan’s co-stars are foreign.

Are Chinese films more foreigner-friendly? Should our Malaysian friend try there instead?

Is Hollywood less open to foreigners than other countries?

I took all movie actors across the world and broke them into groups by detecting community structure in the co-star network. Actors within a group act mostly with each other, and less with other groups.
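The post doesn’t say which community-detection algorithm was used. Purely as an illustration, here is label propagation, one simple method: every actor starts with their own label and repeatedly adopts the label that carries the most weight among their co-stars. The function name, tie-breaking rule, actor IDs and film counts are all assumptions for this sketch.

```python
from collections import defaultdict

def label_propagation(edges, max_rounds=100):
    """Group actors by weighted label propagation. edges: (actor, actor, films)."""
    weight = defaultdict(dict)
    for a, b, films in edges:
        weight[a][b] = weight[a].get(b, 0) + films
        weight[b][a] = weight[b].get(a, 0) + films
    label = {actor: actor for actor in weight}
    for _ in range(max_rounds):
        changed = False
        for actor in sorted(weight):
            # Each co-star votes for their current label, weighted by films together.
            votes = defaultdict(int)
            for other, films in weight[actor].items():
                votes[label[other]] += films
            top = max(votes.values())
            best = sorted(l for l, v in votes.items() if v == top)
            # On a tie, keep the current label if possible, else take the first.
            new = label[actor] if label[actor] in best else best[0]
            if new != label[actor]:
                label[actor], changed = new, True
        if not changed:
            break
    communities = defaultdict(set)
    for actor, l in label.items():
        communities[l].add(actor)
    return sorted(sorted(members) for members in communities.values())

# Two tight trios joined by a single weak link (placeholder actors).
edges = [("a1", "a2", 5), ("a1", "a3", 5), ("a2", "a3", 5),
         ("b1", "b2", 5), ("b1", "b3", 5), ("b2", "b3", 5),
         ("a3", "b1", 1)]
communities = label_propagation(edges)
print(communities)
```

The two trios come out as separate groups because the one film linking `a3` and `b1` is outweighed by the films each has made within their own trio.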

The largest group is Hollywood, with ~80,000 actors (mostly American). They act with each other 90% of the time and act with other groups only 10% of the time.

In comparison, the Chinese group has ~20,000 actors. They act with each other 98% of the time. When they do act outside the group, it’s mostly with Hollywood (0.5%), Japanese (0.3%), South Korean (0.3%), and Indonesian (0.1%) actors.

Clearly, Jackie Chan is more the exception than the norm.

But among the large groups, there are 2 groups that are even more insular than Chinese actors.

The ~8,200 Turkish actors act with each other 99.1% of the time, occasionally venturing to act with Iranian actors (0.2%).

Even more insular are the ~7,000 Filipino actors who act with each other 99.3% of the time. They occasionally venture out to act in Hollywood 0.2% of the time.

There are no other sizeable groups of actors that are as insular.

Hollywood is actually among the most cosmopolitan groups, along with the West European films. So, to our budding Malaysian actor, I’d say:

It’s hard to get an acting break. As a foreigner, it’s 10 times harder in Hollywood. But you’re better off in Hollywood or Western Europe than in any other country, where it would be 50 to 100 times as hard!
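One way to read those multipliers, and this is my assumption about the arithmetic rather than something stated above: if a group acts with outsiders only a share p of the time, an outsider’s odds are roughly 1/p times worse than an insider’s.

```python
def times_harder(outside_share):
    """Rough multiplier: how much rarer acting with outsiders is than with insiders."""
    return round(1 / outside_share)

hollywood = times_harder(0.10)    # Hollywood acts outside its group 10% of the time
chinese = times_harder(0.02)      # the Chinese group, 2% of the time
one_percent = times_harder(0.01)  # a group that acts outside 1% of the time
print(hollywood, chinese, one_percent)
```

That gives 10 for Hollywood, 50 for the Chinese group, and 100 for a group that acts with outsiders 1% of the time.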

Releasing modified mosquitoes precisely

At PyCon Indonesia, I spoke about a project we worked on with the World Mosquito Program.

The World Mosquito Program (WMP) modifies mosquitoes with a bacterium called Wolbachia. This reduces their ability to carry deadly viruses. (It makes me perversely happy that we’re infecting mosquitoes now 😉.)

Modifying mosquitoes is an expensive process. With a limited set of “good mosquitoes”, it is critical to find the best release points that will help them replicate rapidly.

But planning the release points took weeks of manual effort. It involved ground personnel going through several iterations.

So our team took high-resolution satellite images, figured out the building density, estimated population density based on that, and generated a release plan. This model is 70% more accurate and reduced the time from 3 weeks to 2 hours.

More details at the Gramener website.

The slides for the talk are below.

Jolie No. 1

There are more Bollywood actors in Hollywood. Some are even turning down Hollywood roles.

So we wondered: How easily can a Bollywood actor connect to a Hollywood actor?

As part of the Oct 2019 Gramener data story hackathon, Anand, Kishore, and Niyas created Jolie No. 1: a data video where Govinda announces (in our imagination) that he will act with Angelina Jolie in Jolie No. 1, but declines to comment on who introduced them.

We picked a theme first

The hackathon theme was “movies”. We explored 5 themes:

  1. Who acts most in cameo roles, and what’s the impact on revenue? (Based on The Numbers)
  2. Which actors acted often together? (Based on IMDb data)
  3. Which movies become hits on TV? (Based on BARC TV data)
  4. What is the social network of actors in individual movies (https://www.xkcd.com/657/)
  5. Correlation of TV series actors and their revenues

We explored insights next

We picked the first two themes because we liked them.

1. Cameo appearances

Some observations were:

  • Stan Lee starred in 45 cameo roles. No one even comes close. Some roles are:
    • A school bus driver in Avengers: Infinity War (2018)
    • A strip club DJ in Deadpool (2016)
    • A hot-dog vendor in X-Men (2000)
  • Jay Leno (25) and Larry King (21) follow, mostly starring as themselves
  • Alfred Hitchcock (16) has famous cameo appearances in most of his films, such as:
    • Man mailing letter in Suspicion (1941)
    • Man winding the clock in Rear Window (1954)
    • Man walking the dogs in The Birds (1963)

We didn’t have inflation-adjusted box-office revenues, so we couldn’t compare the revenues.

2. Which actors acted often together

Some observations were:

  • Top hero-heroine combo:
    • Overall: Prem Nazir & Jayabharati
    • Hollywood: Billy Dee & Mike Horner (pornstars)
    • Tollywood: Krishna Ghattamaneni & Jaya Prada
    • Bollywood: Jeetendra & Rekha
  • Top male combo: Sivaji Ganesan & Nagesh (more recently, Senthil & Goundamani)
  • Top female combination: Lalitha & Padmini
  • Top pair of:
    • Shah Rukh Khan: Rani Mukherji
    • Amitabh Bachchan: Hema Malini
    • Kamal Haasan: Sridevi
    • Rajinikanth: Sridevi
    • Sridevi: Krishna Ghattamaneni
    • Chiranjeevi: Vijayshanti
    • Dev Anand: Madhubala

The observations focus on Bollywood and Hollywood (because of our familiarity), but there are a number of insights on Japanese and French films too.

We decided to go with this theme because it offered multiple storylines:

  • Some actors pair up with each other, e.g. Gemini – Savithri
  • Some actors have a big “following”, e.g. Rajinikanth, Kamal Haasan, and Jeetendra have all acted most with Sridevi
  • Some actors form cliques — working only with each other
  • Often, comedians are the bridge between cliques
  • It’s interesting to see how actors from one clique can connect to another

Creating the storyline

When exploring actors’ connections, we found a clearly delineated network structure.

Actor SNA

The group of densely clustered actors is the Bollywood-Tollywood-Mollywood-Kollywood nexus. It appears disconnected from the Hollywood cluster. (We excluded pairs of actors who hadn’t acted together in at least 4 films.)

The data was created using this Jupyter notebook.
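The notebook isn’t reproduced here. As a minimal sketch of the “at least 4 films together” filter, assuming the input is one cast list per film (the function name and the single-letter actors are placeholders):

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(casts, min_films=4):
    """Count films each pair of actors shares; keep pairs with >= min_films."""
    films_together = Counter()
    for cast in casts:
        # Sort so (A, B) and (B, A) count as the same pair.
        for pair in combinations(sorted(set(cast)), 2):
            films_together[pair] += 1
    return {pair: n for pair, n in films_together.items() if n >= min_films}

# Placeholder cast lists, one per film.
casts = [
    ["A", "B"],
    ["A", "B", "C"],
    ["B", "A"],
    ["A", "B", "C"],
    ["A", "C"],
]
pairs = frequent_pairs(casts)
print(pairs)
```

Only A and B survive the cut: they share 4 films, while A and C share 3 and B and C share 2.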

We realized that it’s tough for someone in Bollywood to connect to Hollywood. Maybe that could be the plot? For example, what if Amitabh Bachchan wants to act with Meryl Streep?

But this isn’t an interesting story. So we asked:

The plot summary was: Govinda wants to act with Angelina Jolie. Who can connect them?

The analysis is in this Jupyter notebook.
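The notebook isn’t shown here, but “who can connect them?” is a shortest-path question on the co-star network. Below is a sketch using breadth-first search. The function name is mine, and the edges are only the co-star links mentioned in the screenplay that follows, not the full dataset.

```python
from collections import defaultdict, deque

def shortest_path(costars, start, goal):
    """Breadth-first search for a shortest chain of co-stars from start to goal."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        actor = queue.popleft()
        if actor == goal:
            # Walk back through `previous` to rebuild the chain.
            path = []
            while actor is not None:
                path.append(actor)
                actor = previous[actor]
            return path[::-1]
        for other in costars.get(actor, ()):
            if other not in previous:
                previous[other] = actor
                queue.append(other)
    return None

costar_edges = [
    ("Govinda", "Tabu"), ("Govinda", "Gulshan Grover"), ("Govinda", "Sanjay Dutt"),
    ("Tabu", "Irrfan Khan"), ("Gulshan Grover", "Irrfan Khan"),
    ("Sanjay Dutt", "Irrfan Khan"), ("Irrfan Khan", "Angelina Jolie"),
]
costars = defaultdict(list)
for a, b in costar_edges:
    costars[a].append(b)
    costars[b].append(a)

path = shortest_path(costars, "Govinda", "Angelina Jolie")
print(" -> ".join(path))
```

Breadth-first search returns one shortest chain. Here all three routes have the same length, and Tabu comes first only because she is listed first. Choosing between equally short routes by the number of films together, as the screenplay does, needs a separate step.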

Write the screenplay

The morning of the hackathon was spent finalizing the screenplay and dialogues, written on Dropbox Paper.

CUT TO:
    - Video of Govinda "declining James Cameron's Avatar" on Aap Ki Adalat
    - Niyas: On July 29, 2019, Govinda announces he declined a role in Avatar.
    - Video: https://youtu.be/NyFF18a7e-Y
    - Picture: https://twitter.com/mohan_rajkeshav/status/1156148768049262592

CUT TO:
    - Visual: Show an interview video of Govinda and of Angelina
    - Niyas: Today, he announced his next film with Angelina Jolie.
             A “close friend” connected them, but didn't say who.
    - Kishore: Who is this close friend? Why is he not naming them?
    - Video: https://youtu.be/NyFF18a7e-Y (Govinda)
    - Video: https://youtu.be/JNrH1W7aKc8 (Angelina)

CUT TO:
    - Visual: Show the top 9 heroines Govinda has acted with.
              Visualize this data with animation.
              One option is to have Govinda’s pic in the center,
              and have each of these 9 heroines’ images appear around him
              as a circle, with the number of pictures in a link.
              Or as the inverse link distance (e.g. 11 is closest)

    11 Neelam Kothari
    10 Kimi Katkar
    10 Karisma Kapoor
     9 Raveena Tandon
     9 Farha Naaz
     8 Juhi Chawla
     6 Anita Raj
     6 Mandakini
     5 Shilpa Shetty Kundra

    - Niyas: Maybe it’s because it’s one of his heroines?
             He’s mostly acted with Neelam, Kimi and Karisma.
             But none of them has acted with any Hollywood actor.

MORPH TO: 
    - Visual: Add these actors with pics to the same visual,
              but clearly differentiated by gender. Also add their names.

    22 Shakti Kapoor
    18 Kader Khan
    13 Gulshan Grover
     9 Anupam Kher
     8 Dharmendra
     7 Johnny Lever
     6 Sadashiv Amrapurkar
     6 Vikas Anand
     6 Sanjay Dutt
     6 Prem Chopra
     6 Asrani

    - Kishore: So maybe this “close friend” is a male actor?
    - Niyas: He’s acted with Gulshan Grover, Kader Khan and Shakti Kapoor a lot.
    - Kishore: Shakti Kapoor is practically his boyfriend!

MORPH TO:
    - Visual: Zoom into Gulshan Grover and Anupam Kher.
              Build a network of film posters around them
              with their Hollywood films (max 2-4)
        - Anupam Kher
            - Bend It Like Beckham
            - Lust & Caution
            - Silver Linings Playbook
            - A Family Man
        - Gulshan Grover
            - Prisoners of the Sun
            - The Second Jungle Book
            - Marigold
            - Monsoon
    - Niyas: Gulshan Grover and Anupam Kher have acted in a number of Hollywood films
    - Kishore: But have they acted with Angelina Jolie?
    - Niyas: No, never with Angelina Jolie.
    - Kishore: But what if any of them connected him to someone who connected him to Angelina?

CUT TO:
    - Visual: Show Angelina Jolie with ~100 actors around her. Highlight the following:
        - Jack Black, 3
        - Dustin Hoffman, 3
        - Giovanni Ribisi, 2
        - Robert De Niro, 2
        - Brad Pitt, 2
        - Elle Fanning, 2
        - Bryan Cranston, 2
        - 92 other actors with only 1 film each
        - Highlight Irrfan Khan — A Mighty Heart
    - Niyas: Angelina Jolie has acted with fewer than 100 actors.
             Dustin Hoffman and Jack Black, mostly.
             Only one of them is an Indian actor: Irrfan Khan

MORPH TO:
    - Visual: Expand the connection between Angelina and Irrfan
    - Kishore: So, Govinda needs to connect to Irrfan Khan somehow.

MORPH TO:
    - Visual: Connect Govinda to Irrfan Khan via
        - Gulshan Grover via Knock Out
        - Sanjay Dutt via Knock Out
        - Tabu via Saajan Chale Sasural, Dil Ne Phir Yaad Kiya (and 2 others)    
    - Niyas: That should be easy.
             Gulshan Grover and Irrfan Khan have acted together in Knock Out.
             So has Sanjay Dutt.
             But Tabu will be a better option. Govinda and Irrfan Khan have acted with her in 4 movies each.

MORPH TO:
    - Visual: Show path from Govinda to Tabu to Irrfan to Angelina.
    - Kishore: Then, Govinda must have connected to Tabu
               who introduced him to Irrfan Khan,
               who in turn connected him with Angelina Jolie.

Create the video

Anand and Niyas created the visuals on PowerPoint, collaborating on Dropbox.

This is the first version of the presentation. It uses morph transitions extensively.

PPT screenshot

Niyas and Kishore recorded the audio in two parts on their phones and shared it with Anand via WhatsApp.

We integrated these using the Windows 10 video editor. It’s simple, but not powerful. For our use, simplicity was more important.

The process took 6 hours (from 8 am to 2 pm).

  • Writing the screenplay and dialogues: 1.5 hours
  • Creating the presentation: 2 hours
  • Recording the audio: 1 hour
  • Integrating into the video: 1.5 hours

At the last minute, we picked the title “Jolie No. 1” as a parody of Govinda’s No. 1 film series.

We published this on Google Drive, and then on YouTube.

How to direct a data movie

Ganes and I created a data movie on speed-cubing records as part of a Gramener hackathon.

Here’s a video of us talking about how we created it.

Anand: We picked the Rubik’s cube story for this hackathon. Tell me more about how this excited you.

Ganes: Since my son started solving the Rubik’s cube a few months back, I’ve been fascinated with these competitions. I still don’t know how to solve it, but I like watching it.

Anand: But he does?

Ganes: Yeah, he does. So, in the competitions, I’ve seen kids solving the Rubik’s cube in under 10 seconds. So that was the first source of amazement. I’ve seen kids doing it with one hand, blindfolded. I first couldn’t believe it. Doing it with their legs. So that got me really interested.

When we were talking about this, and I was sharing my amazement, we were talking about the hackathon and the conversations kind of merged. So that, I think, the curiosity around it led to picking this as the story.

Anand: And what was the next step?

Ganes: I have always seen the World Cube Association publishing these records. Their website is great. So I thought maybe we could scrape from that, and that’s when I started looking at the website and the competitions we can pick. And then I stumbled on the export feature where they have multiple formats neatly curated that you can take and directly start the analysis.

Anand: Which was actually a big factor in deciding to go for this. Big data set. Very rich, interesting possibilities.

Ganes: So we had had some five or six ideas. This immediately shot up to the top. So after we got the idea, you kind of took over. I think after I mentioned that all these formats were available, it got you excited. So what did you do after that?

Anand: Then it became a question of what all interesting things we can find. It’s almost an exploratory data analysis, but my approach to EDA (exploratory data analysis) is: let’s formulate the hypotheses and then validate, and see if there is an interesting story behind it.

So it begins with, for instance, the speed at which records have been broken. Today, it’s at 3½ seconds. We know that. But how fast did it fall? Or: what’s the spread of solving-speed for somebody who solves it fast? Does the same person solve it really fast sometimes and really slow sometimes? Is there a movement in their average? You said, “Let’s see how much longer it takes to solve bigger cubes.” Nikhil was going to take the demographics of solvers and see how they’re spread out. There are definitely a lot of Chinese solvers in the spread. So, the thing was, let’s look at possible ideas that could lead to an interesting answer, and then validate those.

Ganes: It was almost like “What would we be interested in finding out” and not necessarily like looking at the column of data.

Anand: Yes. And that I think is important, because, from the data, there may be some ideas. But after absorbing it, knowing what’s interesting is what should drive the story.

Ganes: Right. Yeah. So that was a good starting point. We listed all of these on the board. Then, what did you do next?

Anand: Then it’s about proving these. So, we know here are some possible interesting stories, and let us explore and validate whether these are, in fact, interesting, or can be turned into something interesting. So, when I looked at the speed at which records were broken, for instance, I thought that would be an interesting story. But it wasn’t. It was just getting broken at a steadily successive pace.

But something that I did not expect emerged, which is that Yusheng Du, who holds the world record, is not the person who was there in the records consistently. In fact, Feliks Zemdegs has been the consistent winner for the last 10 years and is the only cubing champion who’s won the WCA twice. So, that was something that emerged from doing the analysis. So, that has the ability, therefore, of both proving what we’re looking to prove (or disproving), and also coming up with new stuff that we can choose to incorporate into the story.

Ganes: Almost like starting with a business hypothesis, or what, in the enterprise world, the business wants to know, and then once you get into the data, the data is revealing a few interesting insights, and then you kind of marry both. Looks just like that.

Anand: Exactly. Exactly.

Ganes: So, we identified the insights. And then, the target here was to come up with a 2 minute video. So how did you plan from insights to the video?

Anand: So, one of my cousins is a director, and she tried explaining to me the concept of a screenplay. I never really understood it, even though I’ve read a number of screenplays. So, in the last hackathon, when I was creating a (data) movie, that’s when I realised: as I started writing what I want to shoot (because it requires a whole lot of planning), I was effectively writing a screenplay.

The steps are, basically, you have to decide what are the frames or the sequences you want to shoot. So, one sequence was: we want to introduce this Rubik’s cube win. Another sequence was: we want to show how quickly different types of cubes can be solved, etc.

So, for each of these, what I do is: create a storyline that has the following structure. One: what is the message I want people to take away from that.

Ganes: The headline from there.

Anand: Exactly.

And then, in order to do that, what are the words I would narrate on top of it? That literally forms the dialogue. The third thing is, what are the visuals that prove the dialogue. That I structure in the form of a video. The fourth thing is the transition — from one video to another, or from one sequence to another, how do I flow. These are the 4 things that I captured.

When I write down the full dialogue, I speak it out, put on a timer, and then say “OK, this took 10 seconds, this took 15 seconds, this took 14 seconds” and so on.

Then comes the process of recording (the audio). Assembling the visuals, yes, but timing it and sequencing it based on the recording is pretty critical. So, actually, I wanted your voice – it’s better. And initially, I wanted you to do the recording, but because you were busy in the Dell workshop, I had to do the recording to make sure that I get the timing. Then you re-recorded post that.

That recording makes a huge difference. The audio quality on my iPhone is better than the laptop. I transfer it via Dropbox on to the system.

Ganes: Were there some issues because you have some insights and you have a certain sequence, but it may not add up to 2 minutes. Or, there might be something which will just not flow. How do you correct those issues?

Anand: I found that I consistently underestimate (the time). I thought that we only have material for 1½ minutes, but I knew at that point that invariably, because of this bloat, it will somehow add up to 2 minutes. Which is exactly what happened. It moved to 2 minutes 4 seconds.

Ganes: Yes. Exactly. Yeah.

Anand: So, once you’ve done it once or twice, that amount of correction is there. It’s in fact a whole lot easier to control a video than something as crazy as a (software) program, for instance. The estimation error in programming is much higher than this.

The good part is that post production or editing can take care of a lot of stuff. That 2-minute video can be cut to 1½ if required.

Ganes: Yeah, it can be improved, but my biggest fear is: after recording, the post production is a nightmare. It takes hours and hours of effort. A five-minute video, to post, probably takes 2 hours.

Anand: That is true.

Ganes: How do you go about it? After having these audio clippings, videos and images, how do you stitch all together into a video?

Anand: My workflow is on PowerPoint, mostly, and then on Windows Video Editor. And then you introduced iMovie into the mix.

PowerPoint makes it fairly simple. I can put in an audio in the background. I can handle the animations. It’s not a great tool at all, but it’s a tool I’m very familiar with. So, my workflow is: one slide is one shot or one headline in the storyline. Then I record the video independently or download it from YouTube, put it in the background or wherever. Create all the visuals, create the animations around it, put it there. At this point, the raw material is in. Then I insert the audio and let it play in the background for that particular slide. Then I time the animation to the audio.

This is a slow process because PowerPoint doesn’t have the right tools. So I play the audio till that point and then set the animation. Then I start from the beginning again, play the audio to the next point, and then set that animation. Which takes a long duration. But once that’s sorted out, I play that full slide and it works out, I then go back and correct.

The good part is that the audio is the time keeper. I pre-recorded the audio. So I know that the entire duration is only going to be 1.8 minutes (and then towards the end we added a few more videos that took it to 2 minutes). So the audio keeps you in control, and if you synchronize everything to the audio, then it becomes easier.

Then I exported it into a video file from PowerPoint directly, and then did a little bit of post-processing, adding a background music and adding a few captions, mostly, on Windows Video Editor, and then gave it to you. Which was at around 9 o’clock or so. What did you do from 9 o’clock to 3 o’clock?

Ganes: So, the first thing — on the PowerPoint, I couldn’t believe that you’d done all this on PowerPoint. Yes, you’re taking the tool beyond the limit it was designed for.

I’ve been working with iMovie for a year, and I find it very powerful. For someone who doesn’t come from that background, it was very easy for me to pick up. I had the images and raw video footage for the different portions we were trying to introduce. I was able to split the audio that you recorded from the video, and then was able to record mine and add it. iMovie has these multiple streams you can insert and remove. I had one stream for my audio for my voice over. And there was this video which you had.

On top of that, I could overlay the pictures and other videos that I had towards the end: two videos playing side-by-side. So all of that was possible. And then I could also introduce background music at the very end. iMovie makes it very easy to move all of these things around. And even the synchronization issue which you talked about, that’s much easier to resolve in iMovie.

So, all of this finally coming together, I think, at 3 o’clock… when I had all of this, at 3 o’clock I was hunting for the background music (laughs). I was playing all kinds of clips and finally I chose one. So that’s how we got the final YouTube video.

Anand: My lesson from this is: make sure you have a team member who has a Mac!

Ganes: Right, yeah. So let’s go back and look at our video and see what we can learn from it. Thank you!

2 inches will change my life

I walked ~11 million steps in the last 3 years, at ~10K steps daily.

Since 1 Jan 2018, I’ve steadily increased my walking average until Aug 2018. Then my legs started aching. So I cut it down until Jan 2019. In Feb, I resumed and was fairly steady until May 2020.

In May, my wife refused to let me walk for more than an hour a day. It took me a few months to convince her and level up. I ended 2020 averaging a little over 10K steps for the year.

I’m becoming more regular. I walked 10K/day 15% more in 2020 than in 2018.

2018: I walked 10K steps almost half the time.
2019: it grew to a bit more, to 56%.
2020: I walked 10K steps a day almost two-thirds of the time.
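This analysis was done in Excel, but the same "share of days at 10K or more" figure could be computed along these lines in Python. This is only a rough sketch: the function name `share_of_days_at_least` and the `daily_steps` sample are invented for illustration and are not the real step counts behind the percentages above.

```python
from collections import defaultdict
from datetime import date


def share_of_days_at_least(records, threshold=10_000):
    """Return {year: fraction of recorded days with steps >= threshold}."""
    hits = defaultdict(int)
    days = defaultdict(int)
    for day, steps in records:
        days[day.year] += 1
        if steps >= threshold:
            hits[day.year] += 1
    return {year: hits[year] / days[year] for year in days}


# Hypothetical sample, NOT the real data.
daily_steps = [
    (date(2018, 1, 1), 9_500),
    (date(2018, 1, 2), 10_400),
    (date(2020, 6, 1), 12_000),
    (date(2020, 6, 2), 10_100),
    (date(2020, 6, 3), 8_000),
    (date(2020, 6, 4), 11_000),
]
shares = share_of_days_at_least(daily_steps)
print(shares)
```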

But in May 2020, I went for 5 days without walking even 3K steps.

In 2018, I started being more and more regular until my leg started aching.
2019 was fairly consistent.
2020 is when I applied brakes again — for very different reasons.

I’ve never gone for 5 days without walking even 3K/day before, since 2018. At most, it was 3 days at a stretch.

But when my wife refused to let me walk for more than an hour a day in May 2020, I went on strike! 😉
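A "longest stretch below 3K steps" figure like the one above is a simple run-length count. Here is a minimal Python sketch of that idea (the original analysis was in Excel). The function name `longest_streak_below` and the `sample` list are made up for the example.

```python
def longest_streak_below(steps_per_day, threshold=3_000):
    """Length of the longest run of consecutive days with steps < threshold."""
    longest = current = 0
    for steps in steps_per_day:
        # Extend the run on a low day, reset it otherwise.
        current = current + 1 if steps < threshold else 0
        longest = max(longest, current)
    return longest


# Hypothetical daily step counts, invented for illustration.
sample = [10_200, 2_500, 1_800, 2_900, 9_000,
          2_000, 2_100, 2_700, 1_500, 2_999, 11_000]
print(longest_streak_below(sample))
```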

I walk ~77 min daily. This has increased over the years.

In 2020, this has gone up slightly to 84 min — but it’s still under an hour-and-half. I spend most of this time on calls or listening to audio books / podcasts.
Instead of spending it with my family.

Sometimes, I lose myself in calls and walk for almost 3 hrs and 20K steps.

Naveen is usually to blame. But this happens rarely. I walked 20K steps just 6 times over the last 3 years.

Though the longest walk here indicates over 3 hrs, I’ve never walked 3 hrs in a day.

On 21 Nov, my daughter borrowed my phone and went for her walk. So my phone shows our combined walks, not mine. Many of the other long walks are spread out during the day when I commute by walking in Singapore.

Date       hrs   km    #  Why?
21-Nov-20  3.46  15.5  1  My daughter took my phone. These are her + my walking stats.
15-Nov-19  2.98  11.5  2  Walked to meetings in Singapore.
17-Sep-19  2.96  10.7  3  Walked to meetings in Singapore.
11-Jul-20  2.89  13.9  4  Was talking to Pratap & Ganes.
15-Oct-18  2.83   9.5  5  Walked to meetings in Singapore.
03-Sep-20  2.82  13.0  6  Was talking to Naveen & my coach.

I want to walk faster. I walk at ~4.4 km/hr. My target is 5 km/hr.

Walking at over 5 km/hr speeds the heart up and improves metabolism. (Or so I’ve heard.)

I was steadily going towards 5 km/hr in my early days of walking. I slowed down starting Aug 2018, since my legs were aching. Then I picked up speed in end-2018.

I slowed down again in Nov 2019 — and I don’t remember why.

In Jun 2020, I started walking much faster — mainly to complete 10K steps within the hour my wife gave me. That seems to have had a lasting impact. I walked faster overall in 2020.

I’ve managed fast walking 66 times in 2020, a bit more than before.

In Jun 2020, I walked at over 5 km/hr on 20 of 30 days, a very consistent high speed. I’ve never gotten close to this any other month.
(Clearly, there are adverse effects of being able to convince my wife.)

The fastest I walked was in 2018, at 6.8 km/hr. It might have led to my leg aches.

My top 5 walking speeds were in 2018. In 2020, I’ve managed to walk faster than 6 km/hr just once.

Fastest days  km/hr
07-Jun-2018   6.80
05-Jan-2019   6.65
16-Mar-2018   6.34
08-Jun-2018   6.31
06-Feb-2018   6.19
05-Jun-2020   6.02

The normal stride/height ratio is 0.43. I’m 5’8″. My stride is 2.4 ft. That’s almost exactly 0.43 times my height. So all is well.
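The stride check above is just unit conversion. Spelled out as a small Python sketch, using only the numbers quoted in the text (the variable names are mine):

```python
INCHES_PER_FOOT = 12

height_in = 5 * INCHES_PER_FOOT + 8      # 5'8" = 68 inches
stride_in = 2.4 * INCHES_PER_FOOT        # 2.4 ft = 28.8 inches

expected_stride_in = 0.43 * height_in    # stride predicted by the 0.43 ratio
actual_ratio = stride_in / height_in     # ratio implied by the measured stride

print(f"expected stride: {expected_stride_in:.1f} in")
print(f"actual ratio: {actual_ratio:.3f}")
```

The predicted stride comes to a little over 29″, and the measured ratio to a little over 0.42, which is why the two count as "almost exactly" the same.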

By increasing my stride by 2 inches, I can cover 10,000 steps in 8 min less time.

For every inch I lengthen my stride, I walk ~0.2 km/hr faster.

I’ve walked with a stride as long as 32″, which is 3″ more than my 2020 average stride. By walking with a 2″ longer stride, I can be 9.2% faster.
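As a rough cross-check of these numbers, here is a Python sketch that uses the rounded slope of 0.2 km/hr per inch and the ~4.4 km/hr and 84 min figures quoted earlier. It assumes walking time scales inversely with speed (the same distance is covered), which is my simplification rather than something taken from the Excel sheet.

```python
KMPH_PER_INCH = 0.2     # rounded slope from the text: ~0.2 km/hr per extra inch
current_speed = 4.4     # km/hr, the typical speed quoted in the text
current_minutes = 84    # daily walking time in 2020, from the text
extra_inches = 2

new_speed = current_speed + KMPH_PER_INCH * extra_inches
pct_faster = (new_speed / current_speed - 1) * 100

# Assumption: time scales inversely with speed (same distance covered).
new_minutes = current_minutes * current_speed / new_speed

print(f"new speed: {new_speed:.1f} km/hr")
print(f"faster by: {pct_faster:.1f}%")
print(f"time: {new_minutes:.0f} min")
```

With the rounded slope, the gain works out to about 9.1%, close to the 9.2% quoted above; the small gap is presumably rounding in the slope. The walking time drops from 84 to about 77 minutes.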

So in 2021, I plan to get healthier (and scolded less) with a 2″ longer stride.

A longer stride means a faster walk. That’s a good cardio exercise.
A faster walk also means that it takes less time. So I’ll get beaten up less.
All it takes is stretching my legs 2″ more. Might hurt a bit. I’ll report on this when I know better.

                         Now   New   Change  Benefit
Longer stride            29″   31″   2″      Builds character?
Faster walk (kmph)       4.5   5.0   0.5     Better cardio exercise
Time to 10K steps (min)  84    77    -8      Less scolding from wife

PostScript: This analysis was done in Excel. Download or see the sheet below.