Many languages have functions to process lists (array). These functions usually return a list, so you can pass that to another list function. This chaining of functions is really powerful.
UNIX provides this sort of chaining capability. If I had a cities (with some repetitions) and I wanted to find out how many started with the letter ‘A’, I’d just type:
cat cities | sort | uniq | grep "^A" | wc
cat: types the cities. sort: sorts the cities alphabetically. uniq: finds unique cities (works only if sorted - that's why we had to sort the list). grep: filters the cities. Only allows cities beginning with A. wc: word count
To do this on Excel, the only way is to
- get the unique values. Data – Filter – Advanced Filter, and select “Unique records only”, “Copy the list to another location”, and select a location
- get the first letter. =LEFT(cell,1) returns the first letter of the cell.
- count the number of “A”s. =COUNT(range, “A”) counts the number of “A”s.
But ideally, I’d like a 1-line formula like
Excel doesn’t provide these functions by default, but you can add them as user defined functions. Doing this lets you condense several cells into one. Instead of having to copy all your data into a set of unique values, and then adding a column for the first cell, the entire operation can be condensed into one formula.
I consider the following functions the a basic set for list processing.
- LENGTH(list) counts the number of elements in a list
- INDEX(list, n) returns the nth element of the list
- GREP(string, list) returns elements of the list that have the string
- UNIQUE(list) filters unique values
- UNION(list, list) returns elements in at least one of the lists
- INTERSECTION(list, list) returns elements in both lists
- DIFFERENCE(list, list) returns the elements in the first list but not the second
- REVERSE(list) reverses the order of the list
- STRJOIN(separator, list) joins the elements of the list into a string, separated by a separator
- STRSPLIT(separator, string) splits the string into a list, using a separator
- MVLOOKUP(value, lookup, result) looks up value in “lookup”, and returns the corresponding MULTIPLE values from “result”
I created these UDFs. You can download the functions and play with them. Below are some tasks that you can do with them, that are difficult otherwise.
- Get the file name from a path.
=INDEX(REVERSE(STRSPLIT("\", filename)), 1)
- Count the number of unique elements in a range.
- How many common elements are there in range 1 and range 2?
- How many words are there in a string?
=LENGTH(STRSPLIT(" ", string))
- Get the smallest unique numbers in a range
- Count the number of mismatches between two lists.
=COUNT(Range1)+COUNT(Range2) - COUNT(INTERSECTION(Range1,Range2))
- Get a list of mismatches between two lists.
- Count duplicate entries in a range.
- VLOOKUP multiple values
- Count the unique matches in a VLOOKUP
This is a small sample. The power of list processing is phenomenal, especially when combined with array formulas. Download these macros and play with them!