Faster data crunching

I’ve been playing with big data lately. The good part is that it’s easy to get interesting results: the data is so unwieldy that even a simple average provokes an “Amazing! I didn’t know that” response. (No exaggeration. I heard this from two separate ~$1bn businesses this month.) The bad part is that calculating even that simple average is slow. For example, take this 40MB file (380MB unzipped) and extract the first column. ...
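To make the task concrete, here is a minimal Python sketch of the "extract the first column" step. It is a baseline for illustration, not a tuned solution. It rests on assumptions the post does not confirm: that the file is gzip-compressed and that it is comma-delimited text. The function name `first_column` and the `delimiter` parameter are made up for this example.

```python
import csv
import gzip


def first_column(path, delimiter=","):
    """Yield the first field of each row in a gzip-compressed delimited file.

    Assumptions (not confirmed by the post): gzip compression and a
    single-character delimiter, comma by default.
    """
    # Text mode ("rt") decompresses and decodes the file as it streams,
    # so the unzipped data never has to sit in memory all at once.
    with gzip.open(path, mode="rt", newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            if row:  # csv.reader returns an empty list for blank lines
                yield row[0]
```

Calling `list(first_column("data.csv.gz"))`, where `data.csv.gz` is a placeholder filename, would collect the column into a list. Because the function is a generator, you could also feed it straight into a running sum and count to get the average without holding every value in memory.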