Poor Miss Wormwood

It’s hard not to feel sorry for Miss Wormwood.

Comments

Suraj, 13 Feb 2016 4:34 am: I pity his babysitter Rosalyn more!

Software I currently use

Every few years, I review the software I use. Here are some of my earlier lists. Right now, among browsers, Chrome is my primary browser. What’s interesting is that IE 11 has overtaken Firefox in terms of usage. That’s partly because we’re working with Microsoft a lot, but also because Firefox has a number of weird bugs like IE6 used to have, and is slowly lagging in the race. Next to browsers, I spend most of my time on the command prompt. I use Console2 for tabbed console windows. Given the number of command prompts I open, this is often necessary. I use bash in Cygwin as the default shell. Haven’t had the need for PowerShell. ...

A-Z of my browsing history

When you start typing in the address bar, Chrome suggests a link to visit, based on frecency. What do my recommendations look like?

A is for airtel.in/smartbyte-s/page.html – the page where you can check your bandwidth usage. I used to check it infrequently until I upgraded to a 125GB connection. Now I check it every few days and feel miserable that I’ve nowhere near used up my quota. This has coerced me into watching many Telugu movies, of which I don’t understand a word.
B is for blog.gramener.com – I blog there on data stories. The last month or so has been fairly active thanks to the elections.
C is for calendar.google.com – which has become primarily a shared calendar. It was always indispensable to manage my time. Now it helps my colleagues pick when to call me. Right now, my calendar has events booked about two months in advance.
D is for docs.google.com – for effectively one single purpose: shared spreadsheets. This is such a common and powerful use case, and I’m surprised it hasn’t become much easier to use.
E is for epaper.timesofindia.com – some of our content has been published by The Economic Times, and I keep doing ego-searches in the print edition. But close behind is eci.nic.in which I’ve been scraping a lot, and election-results.ibnlive.in.com which we created for CNN-IBN.
F is for flipkart.com – not facebook.com. I’m not often on Facebook.
G is for gramener.com. Naturally. (It’s not surprising that it’s not google.com: I search directly from the address bar.)
H is for handsontable.com – a library that I’ve been using a lot recently, followed by html5please.com that tells me which HTML5 features are ready for use.
I is for ibn.gramener.com – another property we created, but it only just beats irctc.co.in.
J is for join.me – a clean way to share your screen without the audience having to install anything (though you, the sharer, do have to install the software).
K is for kraken.io – an amazingly efficient image compressor. As you might have guessed, I lead a strange life.
L is for learn.gramener.com – our Intranet. Sorry, you can’t access this one.
M is for mail.google.com. I’ll probably be moving away from gmail as a backend this weekend to Mail-in-a-box, though. Google’s pulling the plug on Google Reader has shaken my faith.
N is for news.ycombinator.com. When I’m bored and want to watch something while I have dinner, I don’t open YouTube. I open Hacker News.
O is for odc.datameet.org – the Open Data Camp. I’m quite into open data.
P is for pay.airtel.com, but if you ignore the number of bills I pay, it would be pandas.pydata.org, the home page of a remarkable data processing library.
Q is for quirksmode.org, PPK’s remarkable browser-compatibility guide.
R is for reader.s-anand.net, my self-hosted RSS reader. It used to be reader.google.com, but Google let me down there.
S is for s-anand.net – this blog.
T is for twitter.com. Unlike Facebook, I don’t dislike Twitter so much.
U is for underscorejs.org. Clearly I need to get a life.
V is for visualizing.org. They have a number of interesting data visualisations.
W is for webpagetest.org – it helps measure the speed of web pages.
X is for xem.github.io. I’ve probably visited this page once, but it’s the only one in my recent history that starts with X.
Y is for youtube.com. I lied. I spend an order of magnitude more time watching Telugu movies on YouTube than on Hacker News.
Z is for zoemob.com. Again, a page I visited only once, but there’s nothing else in Z at the moment.

Comments

Software I currently use | s-anand.net, 9 May 2014 6:24 pm (pingback): […] course, some of my apps have moved online, and my earlier post on the A-Z of my browsing history covers that. But there are a few applications that I’ve hosted which I must talk about. […]

chandigarh, 13 Oct 2015 7:27 pm: you can delete your web search history through link https://history.google.com/history

Why I’m blogging less

My blog’s been through a number of phases. Between 1996 and 1999, it was just a website with a few facts about me and some of my juvenile ramblings. Inspired by robotwisdom.com, I converted it into a blog – except that I didn’t know what blogging was and just called it “updating my site every day.” It was mostly a link blog. In 2006, around the time when I moved from Mumbai to London, I reduced my link-blogging and started writing longer articles talking about my experiences. This was a fairly productive phase, and I was churning out a few dozen articles every year until 2012. ...

A utilitarian’s apology

A couple of years ago, my HTC Explorer’s screen died. I bought a Micromax A50. This triggered a series of reactions prompting this post. I have many defects. Like most men, I can’t tell colours apart – like the difference between pink and purple – and am constantly corrected by my six-year-old. I can’t hear two people at the same time – or even in-between each other. I can’t find things outside of my narrow field of vision. I can’t recognise faces, and need at least three one-on-one interactions before I place people. (If you ask me “Do you recognise me?” and I say “Yes, of course!”, I’m usually lying.) I can’t place voices on the phone. My memory is terrible – my wife’s learnt to make me write errands on my laptop. I cannot identify cars – in fact, I couldn’t drive until recently. ...

Weight lines, again

A few years ago, I ended up losing weight, mostly by dieting. That worked out rather well up to a point: I lost about 20 kg rapidly. But I ended up putting it back on almost as rapidly. What I learnt from this was that dieting made me more short-tempered. It also reduced my metabolic rate. My body would adjust to the hunger and enter a “starvation-mode”, using the limited food ridiculously efficiently. So I’d have to eat even less to continue losing weight. ...

Motorbike science lab

My cousin’s working on an interesting project at the Agastya Foundation. A group of scientifically inclined volunteers go around on a bike to schools, taking with them a science lab kit, and show children in rural schools a variety of experiments. Google will award this and 3 other projects (out of 10) Rs 3 crores based on public votes. You can vote for it and read more at https://impactchallenge.withgoogle.com/india2013#/agastya|vote ...

Courtesy

We are often subject to body searches, baggage inspections, and identity verifications. At malls. At airports. At offices. These are to ensure that no one carries ammunition inside, or goods or secrets outside. In other words, to deter terrorists and thieves. It’s nothing personal, of course. When someone does not know me, I can choose to accept that (or not; the choice is mine). When I’m invited somewhere, however, I assume that I am not deemed a security threat. Therefore, I expect that: ...

Open source in corporates

[This is a post that I’d published internally in InfyBlogs in Dec 2009. Time to share it.] Last month, my first application went live. I’ve been writing code for 20 years. Until then, not one line of my code had been officially deployed in a corporate. (Loser…) It’s a happy feeling. Someone defined happiness as the intersection of pleasure and meaning. Writing code is pleasurable. Others using it is meaningful. But this post isn’t quite about that. It’s about the hoops I’ve had to jump through to make this happen. ...

The scary Internet

I’m not that difficult to scare, and this log message certainly didn’t help: ip223.hichina.com [223.4.183.127] failed - POSSIBLE BREAK-IN ATTEMPT! That’s the message I saw – one thousand five hundred and seventy times yesterday in /var/log/auth.log on one of my Amazon EC2 instances. Someone, presumably from China, has been patiently trying out a variety of SSH keys to log into this system. These were grouped as batches. There were exactly 314 attempts at 8am yesterday, then 314 at 12noon, then 314 at 4pm, then 314 at 8pm, then 232 at 3am today. (All times are in UTC – that is, UK time without daylight saving). Every burst took 9 minutes to run through all 314 attempts. The worst part was, when I tried using SSH this morning, I wasn’t able to log in. (It turned out that I had made a configuration error, but this is the sort of thing that gets me quite worried.) Perhaps I shouldn’t be complaining. I’ve written enough scrapers to make most webmasters cringe at their logs. I remember a few years ago, when I was working on a project at Tesco, and was scraping bestsellers lists from most sites. (Here’s a blog post about it.) We were putting together a prototype to see how real-time competitive pricing could help. The scraper was a pretty mild one. It would visit a hundred links, roughly at the pace of one a second. No images were loaded, of course, just the HTML. One fine day, a few weeks after this had started, I got a call from Andy. “Hi Anand, are you running any scrapers on our books website?” “Yes, why?” “Oh! The site’s very slow. Could you shut it down immediately?” Turns out that not a single page on the site loaded, and it had almost crawled to a halt. Now, obviously, my little 100-page script could hardly cause damage, but it’s easy to understand their reactions. No unauthorised scraping! After a few days of trying to figure out what the problem was, they increased the memory and things went back to normal. 
Not a bad solution, actually – throw hardware at the problem, and if it vanishes, it’s probably the cheapest solution. But anyway, I’m sure it’s some nice chap who’s just curious to know what I’ve got on my servers. I’d be happy to share some of it. And even if it’s not so nice a chap, there’s little that I can do, is there? Update (1pm India, 3rd June): Actually, I now realise that this has been happening every four hours since May 29th, as regular as clockwork. Wish I knew enough UNIX programming to pull a prank… ...
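Spotting the four-hourly bursts by eye in /var/log/auth.log is tedious, so here is a rough sketch of how the lines could be tallied per hour. It assumes the classic syslog timestamp prefix ("Jun  2 08:00:01"), and the sample lines and host name are made up for illustration; only the message text is taken from the log quoted above.

```javascript
// Count "POSSIBLE BREAK-IN ATTEMPT" lines per hour in auth.log text.
// Assumes each line starts with a syslog timestamp such as "Jun  2 08:00:01".
function countBreakInAttemptsByHour(logText) {
  const counts = new Map();
  for (const line of logText.split("\n")) {
    if (!line.includes("POSSIBLE BREAK-IN ATTEMPT")) continue;
    // Capture month, day and hour from the start of the line.
    const match = line.match(/^(\w{3}\s+\d{1,2}\s+\d{2}):\d{2}:\d{2}/);
    if (!match) continue;
    const hour = match[1].replace(/\s+/g, " "); // normalise "Jun  2" to "Jun 2"
    counts.set(hour, (counts.get(hour) || 0) + 1);
  }
  return counts;
}

// Hypothetical sample lines, modelled on the message quoted in the post.
const sampleLog = [
  "Jun  2 08:00:01 host sshd[101]: ip223.hichina.com [223.4.183.127] failed - POSSIBLE BREAK-IN ATTEMPT!",
  "Jun  2 08:00:03 host sshd[102]: ip223.hichina.com [223.4.183.127] failed - POSSIBLE BREAK-IN ATTEMPT!",
  "Jun  2 08:05:00 host sshd[103]: Accepted publickey for ubuntu",
  "Jun  2 12:00:02 host sshd[104]: ip223.hichina.com [223.4.183.127] failed - POSSIBLE BREAK-IN ATTEMPT!",
].join("\n");

const sampleCounts = countBreakInAttemptsByHour(sampleLog);
for (const [hour, count] of sampleCounts) {
  console.log(hour + ":00 -> " + count + " attempts");
}
```

On a real server, the text could be read with Node's fs.readFileSync("/var/log/auth.log", "utf8") (usually as root) and passed to the same function.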

Hosting options

I've been trying out a number of options for hosting recently, and have settled on Amazon spot instances. Here were my options:

Application hosting, like Google AppEngine. I used this a lot until 2 years ago. Then they changed their pricing, and I realised what “lock-in” means. I can’t just take that code and move it to another server. Besides, I’m a bit wary of Google pulling the plug. Heroku? Same problem. I just want to take the code elsewhere and run it.
Shared hosting, like Hostgator. This blog is run on Hostgator and I’m extremely happy with them. But the trouble is, with shared hosting, I don’t get to run long-running processes on any ports I like.
Run your own servers. The problem here is quite simple: power cuts in India.
Dedicated hosting, like Amazon EC2, Azure, GCE, etc. This remains pretty much the main hosting option.

I’m a price optimisation freak. So I ran the numbers for a year’s worth of usage. I was looking at the CPU cost of a large machine with 7-8GB RAM. Bandwidth and storage are negligible. The cost per hour worked out to: ...

Visualising networks

Some slides from my talks on visualising networks. (These are part of a series of talks I’m giving at a number of forums; the one at The Fifth Elephant is open to the public.)

Geocoding in Excel

It’s easy to convert addresses into latitudes and longitudes in Excel. Here’s the Github project with a downloadable Excel file. This is via Visual Basic code for a GoogleGeocode function that geocodes addresses.

    ' Requires a reference to Microsoft XML (in the VBE: Tools -> References);
    ' without it, "Dim xDoc As New MSXML2.DOMDocument" fails to compile.
    Function GoogleGeocode(address As String) As String
        Dim xDoc As New MSXML2.DOMDocument
        Dim lat As String, lng As String
        xDoc.async = False
        ' The address is passed to the API as-is (it is not URL-encoded here)
        xDoc.Load ("http://maps.googleapis.com/maps/api/geocode/" + _
            "xml?address=" + address + "&sensor=false")
        If xDoc.parseError.ErrorCode <> 0 Then
            ' Return the parser's error message if the response could not be read
            GoogleGeocode = xDoc.parseError.reason
        Else
            xDoc.setProperty "SelectionLanguage", "XPath"
            lat = xDoc.SelectSingleNode("//lat").Text
            lng = xDoc.SelectSingleNode("//lng").Text
            GoogleGeocode = lat & "," & lng
        End If
    End Function

Comments

Ryan, 8 Jun 2015 9:28 pm: I find this isn’t working and says, Compile Error; User defined type not defined xDoc As New MSXML2.DOMDocument what do I change to fix it? Thank you

Richie Lionell, 27 Jul 2016 6:40 am: Ryan, Inside the VBE, Go to Tools -> References, then Select Microsoft XML, v6.0. If that doesn’t work unselect that and select Microsoft XML, v3.0
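The GoogleGeocode function pastes the address straight into the URL, so an address containing spaces, commas or an ampersand may not survive the request intact. As a rough illustration in JavaScript (not the post's VBA), the two steps could look like the sketch below. The endpoint is copied from the function above and may since require HTTPS and an API key; the sample XML is a made-up, trimmed response, and the regex parsing is a shortcut that a proper XML parser would replace.

```javascript
// Build the request URL used by GoogleGeocode, URL-encoding the address first.
function buildGeocodeUrl(address) {
  return "http://maps.googleapis.com/maps/api/geocode/xml?address=" +
    encodeURIComponent(address) + "&sensor=false";
}

// Pull the first <lat> and <lng> values out of an XML response string.
// Returns "lat,lng" like the VBA function, or null if either is missing.
function extractLatLng(xml) {
  const lat = xml.match(/<lat>([^<]+)<\/lat>/);
  const lng = xml.match(/<lng>([^<]+)<\/lng>/);
  return lat && lng ? lat[1] + "," + lng[1] : null;
}

// Hypothetical, trimmed-down response for illustration only.
const sampleXml = "<GeocodeResponse><result><geometry><location>" +
  "<lat>12.9715987</lat><lng>77.5945627</lng>" +
  "</location></geometry></result></GeocodeResponse>";

const sampleUrl = buildGeocodeUrl("MG Road, Bangalore & more");
console.log(sampleUrl);
console.log(extractLatLng(sampleXml));
```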

Goodbye Google

Google Reader was where I spent most of my browsing time, but now, it’s shutting down. Time for alternatives, but not just for Reader: for all Google products. I’m not sure when one of these might go down, become paid, or become unusable. I just uninstalled Google Drive and Google Talk, but I don’t use Talk much (I use Skype), so no loss. I’ll leave Chrome for the time being, but I’m hearing reports that Firefox is improving faster than Chrome is. Or there’s Chromium. ...

Github page-only repository

Github offers Github Pages that let you host web pages on Github. You create these by adding a branch to git called gh-pages, and this is often in addition to the default branch master. I just needed the gh-pages branch. So thanks to YJL, here’s the simplest way to do it.

1. Create the repository on Github.
2. Create your local repository and git commit into it.
3. Type git push -u origin master:gh-pages
4. In .git/config, under the [remote "origin"] section, add push = +refs/heads/master:refs/heads/gh-pages

The magic is the last :gh-pages.

The most popular scientific Python modules

I just scraped the scientific packages on pypi. Here are the top 50 by downloads.

Name | Description | Size | Downloads
numpy | NumPy: array processing for numbers, strings, records, and objects. | 2000000 | 133076
scipy | SciPy: Scientific Library for Python | 7000000 | 33990
pygraphviz | Python interface to Graphviz | 99000 | 22828
geopy | Python Geocoding Toolbox | 32000 | 18617
googlemaps | Easy geocoding, reverse geocoding, driving directions, and local search in Python via Google. | 69000 | 15135
Rtree | R-Tree spatial index for Python GIS | 495000 | 14370
nltk | Natural Language Toolkit | 1000000 | 12844
Shapely | Geometric objects, predicates, and operations | 93000 | 12635
pyutilib.component.doc | Documentation for the PyUtilib Component Architecture. | 372000 | 10181
geojson | Encoder/decoder for simple GIS features | 12000 | 9407
GDAL | GDAL: Geospatial Data Abstraction Library | 410000 | 8957
scikits.audiolab | A python module to make noise from numpy arrays | 1000000 | 8856
pupynere | NetCDF file reader and writer. | 16000 | 8809
scikits.statsmodels | Statistical computations and models for use with SciPy | 3000000 | 8761
munkres | munkres algorithm for the Assignment Problem | 42000 | 8409
scikit-learn | A set of python modules for machine learning and data mining | 2000000 | 7735
networkx | Python package for creating and manipulating graphs and networks | 1009000 | 7652
pyephem | Scientific-grade astronomy routines | 927000 | 7644
PyBrain | PyBrain is the swiss army knife for neural networking. | 255000 | 7313
scikits.learn | A set of python modules for machine learning and data mining | 1000000 | 7088
obspy.seisan | SEISAN read support for ObsPy. | 3000000 | 6990
obspy.wav | WAV(audio) read and write support for ObsPy. | 241000 | 6985
obspy.seishub | SeisHub database client for ObsPy. | 237000 | 6941
obspy.sh | Q and ASC (Seismic Handler) read and write support for ObsPy. | 285000 | 6926
crcmod | CRC Generator | 128000 | 6714
obspy.fissures | DHI/Fissures request client for ObsPy. | 1000000 | 6339
stsci.distutils | distutils/packaging-related utilities used by some of STScI’s packages | 25000 | 6215
pyopencl | Python wrapper for OpenCL | 1000000 | 6124
Kivy | A software library for rapid development of hardware-accelerated multitouch applications. | 11000000 | 5879
speech | A clean interface to Windows speech recognition and text-to-speech capabilities. | 17000 | 5809
patsy | A Python package for describing statistical models and for building design matrices. | 276000 | 5517
periodictable | Extensible periodic table of the elements | 775000 | 5498
pymorphy | Morphological analyzer (POS tagger + inflection engine) for Russian and English (+perhaps German) languages. | 70000 | 5174
imposm.parser | Fast and easy OpenStreetMap XML/PBF parser. | 31000 | 4940
hcluster | A hierarchical clustering package for Scipy. | 442000 | 4761
obspy.core | ObsPy - a Python framework for seismological observatories. | 487000 | 4608
Pyevolve | A complete python genetic algorithm framework | 99000 | 4509
scikits.ann | Approximate Nearest Neighbor library wrapper for Numpy | 82000 | 4368
obspy.imaging | Plotting routines for ObsPy. | 324000 | 4356
obspy.xseed | Dataless SEED, RESP and XML-SEED read and write support for ObsPy. | 2000000 | 4331
obspy.sac | SAC read and write support for ObsPy. | 306000 | 4319
obspy.arclink | ArcLink/WebDC client for ObsPy. | 247000 | 4164
obspy.iris | IRIS Web service client for ObsPy. | 261000 | 4153
Orange | Machine learning and interactive data mining toolbox. | 14000000 | 4099
obspy.neries | NERIES Web service client for ObsPy. | 239000 | 4066
pandas | Powerful data structures for data analysis, time series,and statistics | 2000000 | 4037
pycuda | Python wrapper for Nvidia CUDA | 1000000 | 4030
GeoAlchemy | Using SQLAlchemy with Spatial Databases | 159000 | 3881
pyfits | Reads FITS images and tables into numpy arrays and manipulates FITS headers | 748000 | 3746
HTSeq | A framework to process and analyze data from high-throughput sequencing (HTS) assays | 523000 | 3720
pyopencv | PyOpenCV - A Python wrapper for OpenCV 2.x using Boost.Python and NumPy | 354000 | 3660
thredds | THREDDS catalog generator. | 25000 | 3622
hachoir-subfile | Find subfile in any binary stream | 16000 | 3540
fluid | Procedures to study geophysical fluids on Python. | 210000 | 3520
pygeocoder | Python interface for Google Geocoding API V3. Can be used to easily geocode, reverse geocode, validate and format addresses. | 7000 | 3514
csc-pysparse | A fast sparse matrix library for Python (Commonsense Computing version) | 111000 | 3455
topex | A very simple library to interpret and load TOPEX/JASON altimetry data | 7000 | 3378
arrayterator | Buffered iterator for big arrays. | 7000 | 3320
python-igraph | High performance graph data structures and algorithms | 3000000 | 3260
csvkit | A library of utilities for working with CSV, the king of tabular file formats. | 29000 | 3236
PyVISA | Python VISA bindings for GPIB, RS232, and USB instruments | 237000 | 3201
Quadtree | Quadtree spatial index for Python GIS | 40000 | 3000
ProxyHTTPServer | ProxyHTTPServer – from the creator of PyWebRun | 3000 | 2991
mpmath | Python library for arbitrary-precision floating-point arithmetic | 1000000 | 2901
bigfloat | Arbitrary precision correctly-rounded floating point arithmetic, via MPFR. | 126000 | 2879
SimPy | Event discrete, process based simulation for Python. | 5000000 | 2871
Delny | Delaunay triangulation | 18000 | 2790
pymc | Markov Chain Monte Carlo sampling toolkit. | 1000000 | 2727
PyBUFR | Pure Python library to encode and decode BUFR. | 10000 | 2676
collective.geo.bundle | Plone Maps (collective.geo) | 11000 | 2676
dap | DAP (Data Access Protocol) client and server for Python. | 125000 | 2598
rq | RQ is a simple, lightweight, library for creating background jobs, and processing them. | 29000 | 2590
pyinterval | Interval arithmetic in Python | 397000 | 2558
StarCluster | StarCluster is a utility for creating and managing computing clusters hosted on Amazon’s Elastic Compute Cloud (EC2). | 2000000 | 2521
fisher | Fast Fisher’s Exact Test | 43000 | 2503
mathdom | MathDOM - Content MathML in Python | 169000 | 2482
img2txt | superseded by asciiporn, http://pypi.python.org/pypi/asciiporn | 443000 | 2436
DendroPy | A Python library for phylogenetics and phylogenetic computing: reading, writing, simulation, processing and manipulation of phylogenetic trees (phylogenies) and characters. | 6000000 | 2349
geolocator | geolocator library: locate places and calculate distances between them | 26000 | 2342
MyProxyClient | MyProxy Client | 67000 | 2325
PyUblas | Seamless Numpy-UBlas interoperability | 51000 | 2252
oroboros | Astrology software | 1000000 | 2228
textmining | Python Text Mining Utilities | 1000000 | 2198
scikits.talkbox | Talkbox, a set of python modules for speech/signal processing | 147000 | 2188
asciitable | Extensible ASCII table reader and writer | 312000 | 2160
scikits.samplerate | A python module for high quality audio resampling | 368000 | 2151
tabular | Tabular data container and associated convenience routines in Python | 52000 | 2114
pywcs | Python wrappers to WCSLIB | 2000000 | 2081
DeliciousAPI | Unofficial Python API for retrieving data from Delicious.com | 19000 | 2038
hachoir-regex | Manipulation of regular expressions (regex) | 31000 | 2031
Kamaelia | Kamaelia - Multimedia & Server Development Kit | 2000000 | 2007
seawater | Seawater Libray for Python | 2000000 | 1985
descartes | Use geometric objects as matplotlib paths and patches | 3000 | 1983
vectorformats | geographic data serialization/deserialization library | 10000 | 1949
PyMT | A framework for making accelerated multitouch UI | 18000000 | 1945
times | Times is a small, minimalistic, Python library for dealing with time conversions between universal time and arbitrary timezones. | 4000 | 1929
CocoPy | Python implementation of the famous CoCo/R LL(k) compiler generator. | 302000 | 1913
django-shapes | Upload and export shapefiles using GeoDjango. | 9000 | 1901
sympy | Computer algebra system (CAS) in Python | 5000000 | 1842
pyfasta | fast, memory-efficient, pythonic (and command-line) access to fasta sequence files | 14000 | 1836
...
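The ranking step itself is small. As a sketch (in JavaScript rather than whatever the scraper was written in, which the post does not show), here is how rows could be ordered by the Downloads column. The four rows are copied from the list above; the function name and the row layout are assumptions made for illustration.

```javascript
// A few rows copied from the list above: [name, size, downloads].
const packages = [
  ["scipy", 7000000, 33990],
  ["geopy", 32000, 18617],
  ["numpy", 2000000, 133076],
  ["pygraphviz", 99000, 22828],
];

// Return the n most-downloaded packages, highest first.
function topByDownloads(rows, n) {
  return rows
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b[2] - a[2])
    .slice(0, n);
}

const top3 = topByDownloads(packages, 3);
console.log(top3.map((row) => row[0]).join(", "));
```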

Streaming audio to iOS via VLC

You can play a song on your PC and listen to it on your iPhone / iPad – converting your PC into a radio station. As with most things VLC related, it’s tough to figure out but obvious in retrospect. The first thing to do is set up the MIME type for the streaming. This is a bug that has been fixed, but might not have made it into your version of VLC. ...

Magnetix

I wasn’t entirely sure, but now I’m somewhat convinced: Magnetix magnets can form an infinite chain that won’t break under its own weight. (This is not true, however, if you introduce the steel bearing balls between them. That structure collapses pretty quickly if you pull it up like a chain.) So, this would be a really nice question for What If, IMHO. What if you made a 1 light-year chain of Magnetix? Well, to begin with, we’d need nearly 40 million trillion pieces. That’d cost at least 10 million trillion dollars based on the current prices at Amazon, and would be about 140,000 times the world’s GDP. I’m sure Randall could take this a lot further. ...
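One way to sanity-check these figures is to work backwards from them. The sketch below uses only the three numbers quoted in the post plus the length of a light-year; it assumes nothing about the real size or price of a Magnetix piece. The cost and GDP figures hang together (about 25 cents a piece, and a world GDP of roughly 71 trillion dollars), but the implied length per piece comes out at about 0.24 mm, so the piece count may be worth re-deriving; a metres-versus-centimetres slip would account for a factor of 100.

```javascript
// Figures quoted in the post.
const pieces = 40e6 * 1e12;       // "nearly 40 million trillion pieces"
const totalCostUsd = 10e6 * 1e12; // "at least 10 million trillion dollars"
const gdpMultiple = 140000;       // "about 140,000 times the world's GDP"

// One light-year in metres (a Julian year times the speed of light).
const LIGHT_YEAR_M = 9.4607304725808e15;

// What the quoted figures imply, working backwards.
const impliedPieceLengthM = LIGHT_YEAR_M / pieces;
const impliedPricePerPieceUsd = totalCostUsd / pieces;
const impliedWorldGdpUsd = totalCostUsd / gdpMultiple;

console.log("Implied length per piece: " + (impliedPieceLengthM * 1000).toFixed(2) + " mm");
console.log("Implied price per piece: $" + impliedPricePerPieceUsd.toFixed(2));
console.log("Implied world GDP: $" + (impliedWorldGdpUsd / 1e12).toFixed(1) + " trillion");
```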

Auto reloading pages

After watching Bret Victor’s Inventing on Principle, I just had to figure out a way of getting live reloading to work. I know about LiveReload, of course, and everything I’ve heard about it is good. But their Windows version is in alpha, and I’m not about to experiment just yet. This little script does it for me instead:

    (function(interval, location) {
      var lastdate = "";
      function updateIfChanged() {
        // Synchronous HEAD request: fetch only the headers of the current page
        var req = new XMLHttpRequest();
        req.open("HEAD", location.href, false);
        req.send(null);
        var date = req.getResponseHeader("Last-Modified");
        if (!lastdate) {
          // First check: remember the timestamp as the baseline
          lastdate = date;
        } else if (lastdate != date) {
          location.reload();
        }
      }
      setInterval(updateIfChanged, interval);
    })(300, window.location);

It checks the current page every 300 milliseconds and reloads it if the Last-Modified header has changed. I usually include it as a minified script: ...
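The script above blocks the page briefly on every check because the request is synchronous. A possible variant, sketched here with fetch instead of XMLHttpRequest, keeps the same idea (poll Last-Modified, reload when it differs from the first value seen) without blocking. The function names are made up for this sketch, and it assumes the server actually sends a Last-Modified header.

```javascript
// Returns a function that reports whether a value differs from the baseline.
// The first non-null value it sees becomes the baseline.
function makeChangeDetector() {
  let last = null;
  return function hasChanged(value) {
    if (last === null) {
      last = value; // no baseline yet, so nothing can have changed
      return false;
    }
    return value !== last;
  };
}

// Poll `url` with HEAD requests and call `onChange` when Last-Modified differs.
function watchLastModified(url, intervalMs, onChange) {
  const hasChanged = makeChangeDetector();
  return setInterval(async function () {
    try {
      const res = await fetch(url, { method: "HEAD", cache: "no-store" });
      if (hasChanged(res.headers.get("Last-Modified"))) onChange();
    } catch (err) {
      // Ignore transient network errors and try again on the next tick.
    }
  }, intervalMs);
}

// Browser-only wiring; skipped when the file is loaded outside a page.
if (typeof window !== "undefined") {
  watchLastModified(window.location.href, 300, function () {
    window.location.reload();
  });
}
```

Keeping the comparison in makeChangeDetector separate from the network call makes the logic easy to check without a browser.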

Windows XP virtual machine

Here’s the easiest way to set up a Windows XP virtual machine that I could find. (This is useful if you want to try out programs without installing them on your main machine; test your code on a new machine; or test your website on IE6 / IE7 / IE8.)

1. Go to the Virtual PC download site. (I tried VirtualBox and VMWare Player. Virtual PC is better if you’re running Windows on Windows.) If you have Windows 7 Starter or Home, select “Don’t need XP Mode and want VPC only? Download Windows Virtual PC without Windows XP Mode.” If you have Windows Vista or Windows 7, select “Looking for Virtual PC 2007?”
2. Download it. (You may have to jump through a few hoops like activation.)
3. Download Windows XP and run it to extract the files. (It’s a 400MB download.)
4. Open the “Windows XP.vmc” file – just double-clicking ought to work. At this point, you have a working Windows XP version. (The Administrator password is “Password1”.)
5. Under Tools – Settings – Networking – Adapter 1, select “Shared Networking (NAT)”.

That’s pretty much it. You’ve got a Windows XP machine running inside your other Windows machine. ...