The Random Quotes Generator is a simple tool that creates quotes by mixing up words on a web page. The results are often funny, but sometimes surprisingly insightful.
Yes, this is the equivalent of a million monkeys typing Shakespeare, except that they’re using the works of Shakespeare as a starting point. And it doesn’t have to be Shakespeare. It could be you or your friends.
To try it out, visit this page, select the link and “Add to Favorites” or drag it into your browser’s bookmark toolbar. Then go to any web page that has a lot of text, and click the link to generate random quotes.
The net will find monetization models of theater and sporting events before them. Indeed, there has to be some way to create websites that do other than Advertising. The expected drop in internet advertising will rapidly lose its value and its impact, for reasons that can easily be understood.
For the technically minded, Programming Pearls has a section on Generating Text that explains the concept. The bookmarklet uses an Order-2 word-level Markov chain. Translated into English, what that means is: I look at every pair of words on the page and find out which words are likely to follow that pair.
For example, in the Generating Text page, the pair of words “we can” is followed by the words “extend”, “also”, “get” and “write” with equal probability. We pick one randomly (say “also”) and write “we can also”. Then we look at the word pair “can also”, see what word follows that, pick one at random, and so on.
This is Order-2 because we pick pairs of words. And it’s word-level rather than letter-level because we use words instead of letters as the basic building blocks.
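The procedure described above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the bookmarklet's actual source: the function names buildTable and generate, the sample sentence, and the injectable random parameter are all assumptions made for the example.

```javascript
// Build an order-2, word-level Markov table: each pair of consecutive
// words maps to the list of words seen right after that pair.
// Repeated followers are kept, so common followers are picked more often.
function buildTable(text) {
  const words = text.split(/\s+/).filter(Boolean);
  const table = new Map();
  for (let i = 0; i + 2 < words.length; i++) {
    const key = words[i] + ' ' + words[i + 1];
    if (!table.has(key)) table.set(key, []);
    table.get(key).push(words[i + 2]);
  }
  return table;
}

// Generate up to maxWords words, starting from a given pair of words.
// `random` is injectable so the output can be made repeatable.
function generate(table, first, second, maxWords, random = Math.random) {
  const out = [first, second];
  while (out.length < maxWords) {
    const key = out[out.length - 2] + ' ' + out[out.length - 1];
    const followers = table.get(key);
    if (!followers) break; // this pair was never followed by anything
    out.push(followers[Math.floor(random() * followers.length)]);
  }
  return out.join(' ');
}

// Made-up sample text, for illustration only
const sample = 'we can extend the idea and we can also write code and we can get results';
const table = buildTable(sample);
console.log(table.get('we can'));             // words that follow "we can"
console.log(generate(table, 'we', 'can', 8)); // a random 8-word quote
```

With a short text like this sample, most pairs have only one follower, which is why a small page tends to be reproduced verbatim.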
When you’re trying it out, make sure that the page is large enough. If not, you may find that the page’s content is reproduced verbatim.
The bookmarklet is built on top of the excellent Readability bookmarklet by Arc90, which helps identify the main content to be randomized.
Creating motion charts in Excel is a simple four-step process.
1. Get the data in a tabular format with the columns [date, item, x, y, size]
2. Make a “today” cell, and create a lookup table for “today”
3. Make a bubble chart with that lookup table
4. Add a scroll bar and a play button linked to the “today” cell
For the impatient, here’s a motion chart spreadsheet that you can tailor to your needs. For the patient and the puzzled, here’s a quick introduction to bubble and motion charts.
What is a bubble chart?
A bubble chart is a way of capturing 3 dimensions. For example, the chart below could be the birth rate, literacy rate and population of countries (X-axis, Y-axis and size). Or the growth, margin and market cap of companies.
It lets you compare three dimensions at a glance. The size dimension is different from the X and Y axes, though. It’s not easy to compare differences in size. And the eye tends to focus on the big objects. So usually, size is used to highlight important things, and the X and Y axes are used to measure the performance of these things.
If I were to summarise bubble charts in a sentence, it would be: bubble charts show the performance of important things (in two dimensions). (In contrast, Variwide charts show the same on one dimension.)
Say you’re a services firm. You want to track the productivity of your most expensive groups (“the important things”). Productivity is measured by 2 parameters: utilisation and margin. The bubble chart would then have the expense of each group as the size, and its utilisation and margin as the X and Y axes.
What is a motion chart?
Motion charts are animated bubble charts. They track the performance of important things over time (in two dimensions). This is a chart with 4 dimensions. But not all data with 4 dimensions can be plotted as a motion chart. One dimension has to be time, and another has to be linked to the importance of the item.
Motion charts were pioneered by Hans Rosling and his TED Talk shows you the true power of motion charts.
How do I create these charts?
Use the Motion Chart Gadget to display any of your data on a web page. Or use Google Spreadsheets if you need to see the chart on a spreadsheet: motion charts are built in.
If you or your viewer don’t have access to these, and you want to use Excel, here’s how.
1. Get the data in a tabular format
Get the data in the format below. You need the X, Y and size for each thing, for each date.
Date        Thing  X    Y    Size
08/02/2009  A      64%  11%  1
08/02/2009  B      14%  33%  2
08/02/2009  C      78%  55%  3
08/02/2009  D      57%  73%  4
08/02/2009  E      39%  32%  5
08/02/2009  F      40%  81%  6
09/02/2009  A      64%  12%  1
09/02/2009  B      14%  33%  2
09/02/2009  C      78%  56%  3
09/02/2009  D      57%  73%  4
09/02/2009  E      39%  32%  5
09/02/2009  F      40%  81%  6
…           …      …    …    …
To make life (and lookups) easier, add a column called “Key” which concatenates the date and the thing. Typing “=A2&B2” will concatenate cells A2 and B2. (Red cells use formulas.)
Date        Thing  Key     X    Y    Size
08/02/2009  A      39852A  64%  11%  1
08/02/2009  B      39852B  14%  33%  2
08/02/2009  C      39852C  78%  55%  3
08/02/2009  D      39852D  57%  73%  4
…           …      …       …    …    …
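The key reads 39852A rather than 08/02/2009A because Excel stores a date as a serial day count, and & concatenates that underlying number. The sketch below reproduces the arithmetic in JavaScript, assuming Excel's default 1900 date system, in which 1 January 1970 is serial 25569. The function names excelSerial and makeKey are made up for this illustration.

```javascript
// Excel's default (1900) date system stores a date as a day count.
// 1 January 1970 is serial 25569, so we can convert from Unix time.
const MS_PER_DAY = 24 * 60 * 60 * 1000;
const EXCEL_SERIAL_OF_UNIX_EPOCH = 25569;

function excelSerial(year, month, day) {
  // month is 1-based here; Date.UTC expects a 0-based month
  return Date.UTC(year, month - 1, day) / MS_PER_DAY + EXCEL_SERIAL_OF_UNIX_EPOCH;
}

// Mirrors the spreadsheet formula =A2&B2 (date cell & thing cell)
function makeKey(year, month, day, thing) {
  return String(excelSerial(year, month, day)) + thing;
}

console.log(makeKey(2009, 2, 8, 'A')); // 39852A
console.log(makeKey(2009, 2, 9, 'B')); // 39853B
```

Because consecutive days have consecutive serial numbers, adding 1 to the date moves the key to the next day's rows.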
2. Make a “today” cell, and create a lookup table for “today”
Create a cell called “Offset” and type in 0 as its value. Add another cell called “Today” whose value is the start date (08/02/2009 in this case) plus the offset (0 in this case).
Offset  0           (Just type 0)
Today   08/02/2009  Use a formula: =STARTDATE + OFFSET
Now, if you change the offset from 0 to 1, “Today” changes to 09/02/2009. By changing just this one cell, we can create a table that holds the bubble chart details for that day, like below.
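The lookup logic amounts to matching on the Key column: for each thing, find the row whose key is Today followed by the thing's name. Here is a minimal sketch of that logic in JavaScript, using a few rows from the data table in step 1. The names rows, START_DATE and lookupTable are assumptions for the example. In Excel the same match would typically be done with a lookup formula such as VLOOKUP on the Key column, which is also an assumption, since the formulas are not shown here.

```javascript
// Sample rows from the data table, keyed the same way as the "Key"
// column (date serial number followed by the thing's name).
const rows = new Map([
  ['39852A', { x: 0.64, y: 0.11, size: 1 }],
  ['39852B', { x: 0.14, y: 0.33, size: 2 }],
  ['39853A', { x: 0.64, y: 0.12, size: 1 }],
  ['39853B', { x: 0.14, y: 0.33, size: 2 }],
]);

const START_DATE = 39852; // serial number for 08/02/2009

// Return the rows the bubble chart should plot for a given offset.
function lookupTable(offset, things) {
  const today = START_DATE + offset; // same as the "Today" cell
  return things.map((thing) => ({ thing, ...rows.get(String(today) + thing) }));
}

console.log(lookupTable(0, ['A', 'B'])); // values for 08/02/2009
console.log(lookupTable(1, ['A', 'B'])); // values for 09/02/2009
```

Changing the offset is the only input, which is what lets a single scroll bar drive the whole chart.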
3. Make a bubble chart with that lookup table
This is a simple Insert – Chart. Go through the chart types and select bubble. Play around with the data selection until you get the X, Y and Size columns right.
4. Add a scroll bar and a play button linked to the “today” cell
Now for the magic. Add a scroll bar below the chart. Excel 2007 users: Go to Developer – Insert and add a scroll bar. Excel 2003 users: Go to View – Toolbars – Control Toolbox and add a scroll bar.
Right click on the scroll bar, go to Format Control… and link the scroll bar to the “Offset” cell. Now, as you move the scroll bar, the value in the offset cell will change to reflect it. So the “today” cell will change too. So will the lookup table. And so will the chart.
Next, create a button called “Play” and edit its code. Excel 2007 users: Right click the button, go to Developer – View Code. Excel 2003 users: Right click the button and select View Code.
Type in the following code for the button’s click event:
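' Windows API call that pauses execution for the given number of milliseconds.
' On 64-bit Office, use "Declare PtrSafe Sub" instead of "Declare Sub".
Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)

Sub Button1_Click()
    Dim i As Integer
    For i = 0 To 40                 ' Replace 40 with your range
        Range("J1").Value = i       ' Replace J1 with your offset cell
        Application.Calculate       ' Recalculate so the lookup table and chart update
        Sleep 100                   ' Wait 100 ms between frames
    Next
End Sub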
Now clicking on the Play button will give you a glorious motion chart in Excel.
One of the reasons I moved to WordPress was the ability to write posts offline, for which I use Windows Live Writer most of the time. The beauty of this is that I can preview the post exactly as it will appear on my site. Nothing else that I know is as WYSIWYG, and it’s very useful to be able to type knowing exactly where each word will be.
The only hitch is: if you write your own WordPress theme, Live Writer probably won’t be able to detect your theme — unless you’re an expert theme writer.
I hunted on Google to see how to get my theme to work with Live Writer. I didn’t find any tutorials. So after a bit of hit-and-miss, I’m sharing a quick primer of what worked for me.
Open any post on your blog (using your new theme) and save that as view.html in your theme folder. Now replace the page’s title with {post-title} and the page’s content with {post-body}. For example:
This is the file Live Writer will be using as its theme. This page will be displayed exactly as it is by Live Writer, with {post-title} and {post-body} replaced with what you type. You can put in anything you want in this page — but at least make sure you include your CSS files.
To let Live Writer know that view.html is what it should display, copy WordPress’ /wp-includes/wlwmanifest.xml to your theme folder and add the following lines just before </manifest>.
Live Writer searches for wlwmanifest.xml in the <link rel="wlwmanifest"> tag of your home page. Since WordPress already links to its default wlwmanifest.xml, we need to remove that link and add our own. So add the following code to your functions.php:
function my_wlwmanifest_link() {
    // Replace "name" with your theme's folder name
    echo '<link rel="wlwmanifest" type="application/wlwmanifest+xml" href="' . get_bloginfo('wpurl') . '/wp-content/themes/name/wlwmanifest.xml" />';
}
remove_action('wp_head', 'wlwmanifest_link');
add_action('wp_head', 'my_wlwmanifest_link');
That’s it. Now if you add your blog to Live Writer, it will automatically detect the theme.
This morning, I was watching an episode of Finley the Fire Engine in which one of the trucks had hiccups. Reminded me of this Calvin & Hobbes — especially Hobbes’ remark in the second strip.
By curious coincidence, just a day after my post on client side scraping, I had a chance to demo this to a client. They were making a contacts database. Now, there are two big problems with managing contacts.
Getting complete information
Keeping it up to date
Now, people are happy to fill out information about themselves in great detail. If you look at the public profiles on LinkedIn, you’ll find enough and more details about most people.
Normally, when getting contact details about someone, I search for their name on Google with a “site:linkedin.com” and look at that information.
Could this be automated?
I spent a couple of hours and came up with a primitive contacts scraper. Click on the link, type in a name, and you should get the LinkedIn profile for that person. (Caveat: It’s very primitive. It works only for specific URL public profiles. Try ‘Peter Peverelli’ as an example.)
It uses two technologies: the Google AJAX Search API and YQL. The search() function searches for a phrase…
From this result, it displays all the <LI> tags which have a class and a <H3> element inside them (that’s what the //li[@class][h3] XPath does).
The real value of this is in bulk usage. When there’s a big list of contacts, you don’t need to scan each of them for updates. They can be automatically updated — even if all you know is the person’s name, and perhaps where they worked at some point in time.
“Scraping” is extracting content from a website. It’s often used to build something on top of the existing content. For example, I’ve built a site that tracks movies on the IMDb 250 by scraping content.
There are libraries that simplify scraping in most languages.
But all of these are on the server side. That is, the program scrapes from your machine. Can you write a web page where the viewer’s machine does the scraping?
Let’s take an example. I want to display Amazon’s bestsellers that cost less than $10. I could write a program that scrapes the site and get that information. But since the list updates hourly, I’ll have to run it every hour.
That may not be so bad. But consider Twitter. I want to display the latest iPhone tweets from http://search.twitter.com/search.atom?q=iPhone, but the results change so fast that your server can’t keep up.
Nor do you want it to. Ideally, your scraper should just be Javascript on your web page. Any time someone visits, their machine does the scraping. The bandwidth is theirs, and you avoid the popularity tax.
This is quite easily done using Yahoo Query Language. YQL converts the web into a database. All web pages are in a table called html, which has 2 fields: url and xpath. You can get IBM’s home page using:
select * from html where url="http://www.ibm.com"
Try it at Yahoo’s developer console. The whole page is loaded into the query.results element. This can be retrieved using JSONP. Assuming you have jQuery, try the following on Firebug. You should see the contents of IBM’s site on your page.
$.getJSON('http://query.yahooapis.com/v1/public/yql?callback=?', {
    q: 'select * from html where url="http://www.ibm.com"',
    format: 'json'
}, function(data) {
    console.log(data.query.results);
});
That’s it! Now, it’s pretty easy to scrape, especially with XPath. To get the links on IBM’s page, just change the query to
select * from html where url="http://www.ibm.com" and xpath="//a"
Or to get all external links from IBM’s site:
select * from html where url="http://www.ibm.com" and xpath="//a[not(contains(@href,'ibm.com'))][contains(@href,'http')]"
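To see what the two XPath predicates in that last query select, here is the same filter written in plain JavaScript over a list of href strings. It is only an illustration of the predicate logic. The function name externalLinks and the sample hrefs are made up for the example.

```javascript
// Plain-JavaScript equivalent of the XPath predicates
//   [not(contains(@href,'ibm.com'))][contains(@href,'http')]
// applied to a list of href strings.
function externalLinks(hrefs, domain) {
  return hrefs.filter((href) => !href.includes(domain) && href.includes('http'));
}

// Hypothetical hrefs, for illustration only
const hrefs = [
  'http://www.ibm.com/products',    // dropped: contains the domain
  '/support',                       // dropped: relative link, no "http"
  'http://www.example.org/partner', // kept
  'https://example.com/news',       // kept
];
console.log(externalLinks(hrefs, 'ibm.com'));
```

The second predicate matters because relative links such as /support do not contain the domain either, so the first predicate alone would treat them as external.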
Now you can display this on your own site, using jQuery.
This leads to interesting possibilities, such as Map-Reduce in the browser. Here’s one example. Each movie on the IMDb (e.g. The Dark Knight) comes with a list of recommendations (like this). I want to build a repository of recommendations based on the IMDb Top 250. So here’s the algorithm. First, I’ll get the IMDb Top 250 using:
select * from html where url="http://www.imdb.com/chart/top" and xpath="//tr//tr//tr//td[3]//a"
Then I’ll get a random movie’s recommendations like this:
select * from html where url="http://www.imdb.com/title/tt0468569/recommendations" and xpath="//td/font//a[contains(@href,'/title/')]"
In fact, if you visited my IMDb Top 250 tracker, you already ran this code. You didn’t know it, but you just shared a bit of your bandwidth and computation power with me. (Thank you.)
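The two halves of that algorithm can be sketched roughly as follows. Each visitor's browser picks one movie and fetches its recommendations, and the results are merged into a shared repository. This is not the tracker's actual code: the function names, the in-memory Map, and all movie IDs other than tt0468569 are hypothetical.

```javascript
// Pick one movie at random: the unit of work a single visitor takes on.
// `random` is injectable so the choice can be made repeatable.
function pickRandom(items, random = Math.random) {
  return items[Math.floor(random() * items.length)];
}

// Merge one visitor's result into the shared repository (the "reduce" step).
// A Set is used so that repeated visits do not create duplicates.
function mergeRecommendations(repository, movieId, recommendedIds) {
  const existing = repository.get(movieId) || new Set();
  recommendedIds.forEach((id) => existing.add(id));
  repository.set(movieId, existing);
  return repository;
}

const repository = new Map();
// Two visitors happen to scrape the same movie; the IDs in the lists are hypothetical
mergeRecommendations(repository, 'tt0468569', ['tt0000001', 'tt0000002']);
mergeRecommendations(repository, 'tt0468569', ['tt0000002', 'tt0000003']);
console.log([...repository.get('tt0468569')]);
```

Picking movies at random means no coordination between visitors is needed, at the cost of some repeated work.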
And, if you think a little further, here’s another way of monetising content: by borrowing a bit of the user’s computation power to perform complex tasks. There already are startups built around this concept.
Let me clarify: I don’t care what you do with my content. Feel free. You don’t have to ask. You don’t have to attribute it to me. You can change it. You can misquote me. Whatever.
This says you can do what you want as long as you attribute my content to me.
But that creates a constraint. And if I had a choice, I’d rather have my content quoted than be attributed.
The license that best captures this is the WTFPL, or Do What The Fuck You Want To Public License.
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
Version 2, December 2004
Copyright (C) 2004 Sam Hocevar
14 rue de Plaisance, 75014 Paris, France
Everyone is permitted to copy and distribute verbatim or modified
copies of this license document, and changing it is allowed as long
as the name is changed.
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. You just DO WHAT THE FUCK YOU WANT TO.
So, in the spirit of a happy and open Internet, the contents and code in this site are released under the WTFPL. Do what you want with it.