Software for my new laptop 2

Time for a new laptop, and to replace software. Here’s my new list.

A lot has changed in the last 5 years. Mainly, I use the browser, cygwin and Portable Apps a lot more. (The last is to escape jailers, not registry bloat.)

Media

  • Chrome [new]: For browsing and development. Fast, light, and stays out of the way.
  • Firefox: I keep it just for printing. Chrome sucks at printing.
  • Media Player Classic: Nothing against it, but I decided to stick to just one app, which is…
  • VLC: Continues to be the best media player, IMHO.
  • WinAmp: I just manage my playlists as M3U files, using Python programs.
  • Audacity: Still the easiest way to record audio.
  • Camstudio: The simplest free portable screen capture software I know.
  • PicPick [new]: Lightweight, powerful screenshot grabber
  • VirtualDub: Not the simplest, but still good for what I need: cropping and joining video.
  • MediaCoder [new]: Good for video/audio conversions. Maybe I’ll install this later.
  • Foxit Reader: The simplest free portable PDF reader I know, better than…
  • NitroPDF Reader [new]: … which is good for Printing PDFs – better than…
  • Primo PDF: … which has trouble on rare occasions.
  • Microsoft Reader: I have a lot of ebooks in .LIT.
  • Kindle for PC [new]: I don’t own a Kindle, but I’ve bought a few ebooks.
  • Paint.NET: Good enough for cropping and adjusting colours on images.
  • Windows Live Writer [new]: The best way to write this blog WYSIWYG
  • Inkscape [new]: I occasionally edit vector graphics.
  • Google Earth: Google Maps is good enough.
  • ImgBurn: I no longer use CDs/DVDs. Just flash drives and external hard disks.
  • Picasa: I’ve stopped browsing pictures. No time.

Sharing

  • Dropbox [new]: Simplest way of sharing files.
  • Skype: I use it more than my phone.
  • Google Talk: For those friends who have chat enabled on Gmail.
  • TeamViewer [new]: Pretty efficient screen sharing. Works better than Skype, I think.
  • Google Calendar Sync: To keep Outlook in sync with Google Calendar.

Utilities

  • 7-Zip [new]: Covers all compressed formats, and has the best compression ratio.
  • WinRAR: 7-Zip has it covered.
  • AutoHotKey [new]: Shockingly powerful macro functionality. Shockingly underused.
  • Clip [new]: Command line clipboard. dir | clip copies the directory listing to the clipboard.
  • ClipX [new]: Stores multiple clipboard entries and history. Invaluable.
  • DiskTT [new]: I’m paranoid about disk speed. I keep measuring it.
  • WinDirStat [new]: Best way to find what’s taking up space on disk.
  • ProcessExplorer [new]: Just in case Task Manager doesn’t show you everything.
  • Google Desktop: Well, it’s dead.
  • mDesktop [new]: A Virtual Desktop Manager (multiple screens) for Windows 7.
  • PowerToys: doesn’t work on Windows 7, but I got X-Mouse working.
  • Teracopy: I don’t worry too much about copying files any more. Maybe later.
  • Junction Link Magic [new]: To map folders. But I now use Cygwin, and symlinks rock.
  • uTorrent [new]: For bittorrent.
  • ntlmaps [new]: Exposes a proxy that requires a password as a local proxy that doesn’t.
  • Putty [new]: SSH for Windows, but can also act as an SSH tunnel
  • TrueCrypt [new]: To securely back up my bank details on the cloud.

Development

Data Visualisation

  • R [new]: The God of all statistical packages. Install reshape and ggplot2.
  • Gephi [new]: Does network visualisations quite well. 
  • GraphViz [new]: Does network visualisations not quite as well.
  • Google Refine [new]: Helps clean up messy data.
  • qhull [new]: For voronoi treemaps. Don’t ask.
  • wkhtmltopdf [new]: To print web pages as PDF.

What am I missing that you really like?

Faster data crunching

I’ve been playing with big data lately.

The good part is, it’s easy to get interesting results. The data is so unwieldy that even average value calculations provoke an “Amazing! I didn’t know that” response. (No exaggeration. I heard this from two separate ~$1bn businesses this month.)

The bad part is that calculating even that simple average is slow.

For example, take this 40MB file (380MB unzipped) and extract the first column.

The simplest Python script to get the first column looks like this:

import csv, fileinput
for row in csv.reader(fileinput.input(), delimiter='\t'):
    if len(row) > 0: print(row[0])

That took a good 3 minutes to execute on my laptop.
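If you want to stay in Python, one variation that may be worth timing is to skip csv parsing and cut each line at its first tab. This is only a sketch, not something measured in this post: first_column is a made-up helper name, and it assumes no field is quoted or contains a tab (the csv module handles those cases, this does not).

```python
import io

def first_column(lines, delimiter="\t"):
    """Yield the first field of each non-empty line, without csv parsing."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line:
            # partition stops at the first delimiter, so the rest of the
            # row is never split into fields
            yield line.partition(delimiter)[0]

# Tiny in-memory sample standing in for the real file
sample = io.StringIO("alpha\t1\nbeta\t2\n\ngamma\t3\n")
print(list(first_column(sample)))  # ['alpha', 'beta', 'gamma']
```

For the real file, pass sys.stdin (or an open file object) instead of the sample. Whether this beats the csv version is something to measure, not assume.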

Since I’m used to UNIX data processing, I tried cut -f1. Weirdly, that’s worse: over 5 minutes. Paradoxically, awk '{print $1}' takes only 17 seconds. That’s about 11 times faster than the Python script. Clearly the tool makes a big difference. And we always knew UNIX was fast.

But I also ran these on an Amazon EC2 server, and a Hostgator server. Here’re the results.

Machine          python       cut            awk
My Dell E5400    3:04 (1x)    5:42 (0.5x)    0:17 (11x)
EC2 standard     0:33 (6x)    0:5.6 (33x)    0:16 (11x)
Hostgator        0:19 (10x)   0:2.5 (74x)    0:0.7 (265x)

What took 3 minutes with Python on my Dell E5400 took less than a second on Hostgator’s server with awk. Over 250 times faster. (Not 250%. 250 times.)

And it’s not just hardware. A good tool (awk) made things 11x faster on my machine. Good hardware (Hostgator) made the same Python script 10x faster. But choosing the right combination can make things go faster than 11 x 10 = 110 times. Much faster.
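The arithmetic is easy to check from the timings in the table. Here is a rough sketch: the to_seconds and speedup helpers are names made up for this example, and because the listed times are rounded, the computed ratios come out slightly different from the multipliers in the table (about 263x instead of 265x).

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '3:04' or '0:0.7' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + float(seconds)

# Timings copied from the table above
timings = {
    "Dell E5400": {"python": "3:04", "cut": "5:42", "awk": "0:17"},
    "EC2 standard": {"python": "0:33", "cut": "0:5.6", "awk": "0:16"},
    "Hostgator": {"python": "0:19", "cut": "0:2.5", "awk": "0:0.7"},
}

# Baseline: the Python script on the Dell (184 seconds)
baseline = to_seconds(timings["Dell E5400"]["python"])

def speedup(machine, tool):
    """How many times faster than the baseline a machine/tool pair ran."""
    return baseline / to_seconds(timings[machine][tool])

tool_gain = speedup("Dell E5400", "awk")        # better tool, same machine
hardware_gain = speedup("Hostgator", "python")  # better machine, same tool
combined = speedup("Hostgator", "awk")          # both at once

print(round(tool_gain, 1), round(hardware_gain, 1))  # 10.8 9.7
print(round(tool_gain * hardware_gain))              # 105
print(round(combined))                               # 263
```

If the two gains were independent, multiplying them would predict roughly 105x. The measured combination is about 2.5 times better than that.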

There are a few things I’m taking away from this.

  1. Good hardware can speed you up as much as (or more than) choosing the right tool.
  2. Good hardware can be rented. From many places. Cheaply.
  3. Always test what’s fast. awk’s fastest on my machine and Hostgator, but not on EC2.
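Testing what’s fast on a given machine can be scripted. Below is a minimal sketch of a timing harness, not the way the numbers above were produced: time_command and rank_tools are made-up names, the input path is a placeholder, and it assumes each tool reads the file on standard input.

```python
import subprocess
import sys
import time

def time_command(args, input_path):
    """Run a command with input_path on stdin and return elapsed seconds."""
    with open(input_path, "rb") as source:
        start = time.perf_counter()
        # Output is discarded so terminal speed does not skew the timing
        subprocess.run(args, stdin=source, stdout=subprocess.DEVNULL, check=True)
        return time.perf_counter() - start

def rank_tools(commands, input_path):
    """Time each named command once and return (name, seconds), fastest first."""
    results = [(name, time_command(args, input_path))
               for name, args in commands.items()]
    return sorted(results, key=lambda pair: pair[1])

# Example call (data.tsv and first_column.py are hypothetical file names):
# rank_tools({
#     "cut": ["cut", "-f1"],
#     "awk": ["awk", "{print $1}"],
#     "python": [sys.executable, "first_column.py"],
# }, "data.tsv")
```

A single run per tool is noisy, since disk caching alone can change the order. Running each command a few times and taking the best time gives a fairer comparison.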