
Spreading the Load: A Decade of Distributed Computing

They say a problem shared is a problem halved. It follows, then, that a problem shared amongst 10,000 would end up being just 0.01% of its original size, meaning, hey, much smaller problem. That’s more or less been the thinking behind SETI@Home, Folding@Home, climateprediction.net and a raft of other distributed computing projects the world over.

Over the past decade, such projects have sought answers to some big questions: Is there intelligent life on other planets? What will our climate get up to over the next century? Why do some proteins misfold and cause diseases such as Creutzfeldt-Jakob, Parkinson’s and Alzheimer’s?

Quantities of computing power with names that sound more like creatures from In the Night Garden have been harnessed to carry out the required processing. The last ten years have seen teraflops and petaflops of calculation whirring around vast, global networks of ordinary machines. Common folk like you or me have had the chance to take part in research which may turn out to be of staggering importance to the human race. It’s inspiring and ambitious. But what actually is it, and what does its future hold?

The Basics

At its simplest, distributed or volunteer-based computing is just a matter of sharing computational tasks between multiple machines to speed up a process. Like cutting up a piece of steak to make it easier to swallow, distributed programs parcel up jobs and spread them across a network so it takes a fraction of the time a sole machine would need to chug through it alone. It’s a sweet idea really, conjuring up images of those woodland creatures who help Snow White with the cleaning, just using silicon chips instead of chipmunks and bandwidth, not bluebirds.

Projects like Stanford’s Folding@home harness the idle processing power of computers around the globe to create a virtual supercomputer. Instead of waiting a million days to solve a problem on just one machine, share it between 100,000 machines and you’ll have your answer in just ten days. By using the web to create a linked network of nodes, all of which take on a little bit of the problem, everybody lends a hand until, hey presto, job done. It’s the big society in action.
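
For the programmers among you, here’s a toy sketch of the idea in Python: one big job chopped into independent ‘work units’ and farmed out to a pool of workers standing in for the volunteers’ machines. The numbers and function names are invented for illustration; it’s the shape of the thing, not any project’s actual code.

# A toy illustration of the divide-and-conquer idea behind volunteer computing:
# one big job is chopped into independent "work units", each of which could be
# handed to a different machine. Here a multiprocessing pool stands in for the
# volunteers' PCs. (All names and numbers are invented.)
from multiprocessing import Pool

def crunch_work_unit(unit):
    """Pretend analysis: sum of squares over one slice of the data."""
    start, end = unit
    return sum(n * n for n in range(start, end))

def make_work_units(total, chunk):
    """Split the range 0..total into independent chunks."""
    return [(i, min(i + chunk, total)) for i in range(0, total, chunk)]

if __name__ == "__main__":
    units = make_work_units(total=10_000_000, chunk=250_000)
    with Pool() as volunteers:          # each worker plays one volunteer PC
        results = volunteers.map(crunch_work_unit, units)
    print(sum(results))                 # combine the returned results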

Talking of the big society, some distributed computing projects have been brought to their knees by a budget crisis. In April of this year, UC Berkeley’s SETI (Search for Extra-Terrestrial Intelligence) project, which pioneered home-user participation in distributed computing with the launch of SETI@Home in 1999, was forced to hibernate its telescope array due to a lack of funding.

With its future hanging in the balance, we look back at the ingenious trail SETI blazed, and its seminal role in the use of ordinary volunteers to take on incredible work.

SETI’s Legacy

Nothing quite excites our collective imagination like the possibility of extra-terrestrial contact, and that’s exactly what the SETI Institute has been scanning the skies for. Although the institute has been forced temporarily to shut down its radio telescopes, the SETI@Home project is still recruiting participants to help analyse the years of data already collected. Launched in 1999, the project asks volunteers who feel like getting their ET on to download free software which digitally scans noise captured from celestial sources for evidence of alien broadcasts. As far as you or I are concerned, it’s just a screensaver which makes use of your CPU’s spare cycles. Instead of flying toasters or Justin Bieber slideshows though, your processor’s down-time is used to detect narrow-bandwidth radio signals which would point towards the existence of extra-terrestrial technology.
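
For the curious, here’s a rough Python sketch of the underlying trick (nothing like SETI@home’s real pipeline, and all the parameters are made up): a weak, steady carrier buried in noise shows up as a sharp spike once you look at the recording in the frequency domain.

# Rough sketch of the core idea behind narrow-band signal detection: a steady
# carrier hidden in noise is invisible in the raw samples but stands out as a
# sharp spike in the frequency domain. (Toy parameters, not SETI@home's code.)
import numpy as np

rng = np.random.default_rng(42)
sample_rate = 10_000                    # samples per second
t = np.arange(0, 1.0, 1 / sample_rate)

noise = rng.normal(0, 1.0, t.size)
carrier = 0.2 * np.sin(2 * np.pi * 1_234 * t)   # weak narrow-band "signal"
recording = noise + carrier

spectrum = np.abs(np.fft.rfft(recording))
freqs = np.fft.rfftfreq(t.size, 1 / sample_rate)

threshold = spectrum.mean() + 5 * spectrum.std()
hits = freqs[spectrum > threshold]
print("candidate narrow-band frequencies:", hits)   # expect a hit near 1234 Hz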

It’s a gamble as to whether anything will eventually be found (nothing concrete in terms of ETI has turned up so far), but you can’t deny it’s a cool one.

Or can you? Amidst general disappointment, some popular reactions to the hibernation of SETI’s Allen Telescope Array have been less than kind. People without a jones for the potential of life on other planets have suggested that money and processing power are better put to other, more humanitarian uses.

Since 2004, the World Community Grid has used distributed computing to run research projects aimed at crucial humanitarian missions such as looking for a cure for cancer, fighting AIDS, and searching for sources of clean energy. As of last month, the equivalent of more than 450,000 years of computing time had been spent on its various projects, volunteered by more than half a million registered users.

You can certainly argue that those goals may be more pressing than SETI’s, but the debt other projects owe to the alien-seeking institute needs to be acknowledged. A huge number of projects, including the World Community Grid, run on the BOINC (Berkeley Open Infrastructure for Network Computing) platform, originally developed to support the SETI@home project. Without BOINC, the World Community Grid and others like it wouldn’t have been accessible to Mac OS X and Linux users. So whether or not SETI comes up with evidence for alien intelligence, it leaves behind a sizeable legacy in distributed computing’s history.

Citizen Science

Understandably, people have different attitudes to volunteering depending on what’s required of them. When a magician scans a crowd for a willing participant, or a clipboard carrier tries to make eye contact in a busy high street, most of us take the sane course of action and keep our eyes firmly on the floor. That’s because there’s nothing heroic or philanthropic about being sawn in half or questioned on your yoghurt-buying habits. Taking part in a citizen science experiment is different: these things are actually important, world-changing even, and, best of all, most of them require hardly any effort on our part. It’s a karmic win-win.

Climateprediction.net is another BOINC-run project, based at Oxford University and launched in 2003. It runs along more or less the same lines as SETI@home in terms of volunteer involvement (you download the free software and a screensaver kicks in with some behind-the-scenes calculations when your CPU is idle), but its feet are much more solidly planted on terra firma.

By repeatedly running, tweaking and re-running simulations of weather conditions on a network of volunteer machines, climateprediction.net seeks to predict the future behaviour of our climate. Its aims stretch a little further than the usual five-day forecast: the project plans to produce climate predictions as far ahead as the year 2100.
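
The approach is what climate scientists call a perturbed-parameter ensemble: run the same model again and again with slightly different settings and see how widely the answers spread. The toy Python sketch below, with an entirely made-up one-line ‘model’, gives the flavour of it; CPDN does the real thing with full climate models, one run per volunteer machine.

# Toy illustration of a perturbed-parameter ensemble: run the same simple model
# many times with slightly different parameter values and look at the spread of
# outcomes. The "model" here is a made-up warming formula, purely for illustration.
import random

def toy_climate_run(sensitivity, years=90, forcing_per_year=0.05):
    """Project a temperature change given a climate-sensitivity-like parameter."""
    temp_anomaly = 0.0
    for _ in range(years):
        temp_anomaly += sensitivity * forcing_per_year
    return temp_anomaly

random.seed(1)
ensemble = [toy_climate_run(sensitivity=random.uniform(0.5, 1.5))
            for _ in range(10_000)]

print(f"projected warming by 2100: "
      f"{min(ensemble):.2f} to {max(ensemble):.2f} degrees C "
      f"(mean {sum(ensemble) / len(ensemble):.2f})")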

Conclusions have already been drawn from CPDN’s research. In February of this year, a paper published in the scientific journal Nature reported that the project’s data had been used to find evidence it was "very likely" that human greenhouse gas emissions "substantially increased" the risk of UK flooding in the year 2000. It’s a big finding, forming part of a global discourse which affects international policy and the responsibilities of big business, and which could ultimately save people’s lives. The wonder of it all is that you can play a part just by getting up off your computer chair to go to the loo or make a cup of tea.

Charity Begins @Home

Arguably the best known global distributed computing project is Stanford’s Folding@Home, thanks in no small part to Sony’s philanthropic venture in 2008 of pre-installing the software onto its PS3 consoles. According to Jane McGonigal’s Reality is Broken, published in March of this year, one out of every 25 gamers on the PS3 network has contributed their spare computing cycles to the project. She calculates that "PS3 users account for 74% of the processing power used by Folding@home".

Once again using a screensaver-based application, Folding@Home’s software simulates the various possible shapes that long-chain protein molecules (more or less the building blocks of biology) assemble or ‘fold’ themselves into. By doing so, the Stanford team behind it can study how these proteins work and understand more about why ‘misfolding’ can occur, which is believed to be responsible for diseases such as CJD, Alzheimer’s and some forms of cancer.

According to its website, it takes approximately a day for a computer to simulate one nanosecond of protein folding, yet proteins take over 10,000 nanoseconds to fold, meaning a single computer would have to wait 30 years or so for just one result. Distributed computing has sped things up massively, and in 2010 the project boasted access to more than 350,000 active CPUs. Impressive stuff, but Stanford’s scientists soon decided they weren’t merely after spare CPU cycles. They needed something much more complex.
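
If you like your arithmetic spelled out, here’s the back-of-the-envelope version in Python. The perfectly even split at the end is an idealisation, of course; real folding work can’t be carved up quite that cleanly.

# Back-of-the-envelope version of the folding arithmetic quoted above:
# one simulated nanosecond per day, roughly 10,000 nanoseconds per fold.
days_per_ns = 1
ns_per_fold = 10_000
active_cpus = 350_000        # figure quoted for 2010

single_machine_years = days_per_ns * ns_per_fold / 365
print(f"one machine: about {single_machine_years:.0f} years per result")

# With a perfectly even split across the volunteer pool (an idealisation;
# real folding trajectories can't be divided this cleanly), the same work
# shrinks to minutes of wall-clock time.
ideal_days = days_per_ns * ns_per_fold / active_cpus
print(f"ideally shared: about {ideal_days * 24 * 60:.0f} minutes of wall-clock work")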

People, Not Processors

Once upon a time, your computer’s spare processing power was all volunteer computing projects were after. In recent years however, they’ve started to come over a bit B movie: nowadays, they want brains.

Computers may be able to calculate pi to trillions of decimal digits and beat us at Jeopardy, but there is still a range of tasks at which our chaotic brains have the upper hand over their binary logic. Pattern recognition and puzzle solving are thought to be two such areas.

In 2008, a University of Washington team launched Foldit, an online game which challenges players to fold digital proteins by hand and learn which patterns are the most stable and useful for various jobs. It makes use of human powers of invention and creativity, studying human strategies with the eventual goal of teaching the most efficient methods to computers.

A year before Foldit was released, an Oxford-led team of astronomers came out with Galaxy Zoo, a crowdsourcing project which asks real people to examine and classify images of galaxies. The project relies on volunteers to sort galaxies by shape and flag up anything unusual that might merit further investigation.

With several projects acknowledging the superiority of eyeballs over algorithms when it comes to pattern detection, research which began with distributed computing now, more often than not, has a crowdsourcing element too.

Crowdsourcing

A host of crowdsourcing projects have emerged since Galaxy Zoo, each making use of human skills to get the job done. Some of the cleverer ones are cannily tapping into the human thirst to play games, and converting their workload into fun, competitive activities. One such, still in beta, is the Google Image Labeler.

Up until this point, I hadn’t really considered how Google images were labelled, happy was I just to type in “cute sloth baby” and enjoy the results. It occurs to me now that cuteness is probably quite tricky to define by algorithm, and in the absence of detailed captions, those subjective labels must have got there some other way.

The Google Image Labeler game is an interesting crowdsourcing concept. Designed to hone the quality of image search results by getting human players to do its work, it converts the process of identifying and labelling the millions of images uploaded to the web into a game.

Essentially, it’s a ‘say what you see’ task, in which you and an anonymous partner somewhere else in the world have to describe random images using words or phrases within a set time limit, and win points when your descriptions match. To make you work a little harder, a few words commonly associated with the picture are not accepted, so for instance, a picture of Jeremy Clarkson might not accept the words ‘man’ and ‘corduroy’, meaning you’d have to try a bit harder with things like ‘curly hair’, ‘dough-like’ or ‘fatuous gasbag’, depending on your views on Clarkson. You only win points if your description tallies with one of your partner’s, though, so it’s a question of getting on the right wavelength. The more you play, the more Google’s image search terms are honed. Ingenious, really.
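
In code terms, the scoring boils down to something like the hypothetical Python sketch below: taboo words stripped out, points awarded only where both players’ labels overlap. The names and point values are invented, not Google’s.

# Hypothetical sketch of the matching mechanic described above: two players
# label the same image, certain "taboo" words are banned, and points are
# scored only for labels both players chose. (All names and scores invented.)
def score_round(labels_a, labels_b, taboo_words, points_per_match=100):
    allowed_a = {w.lower() for w in labels_a} - taboo_words
    allowed_b = {w.lower() for w in labels_b} - taboo_words
    matches = allowed_a & allowed_b
    return matches, len(matches) * points_per_match

taboo = {"man", "corduroy"}
player_one = ["man", "curly hair", "presenter"]
player_two = ["corduroy", "Curly Hair", "car"]

matches, points = score_round(player_one, player_two, taboo)
print(matches, points)      # {'curly hair'} 100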

Political Scandal

Investigative journalism, too, has called upon the hive mind to carry out its work. The UK’s high-profile MPs’ expenses scandal, which broke in summer 2009, made stacks of scanned documents available to the public in the interests of ‘transparency’. Unhelpfully, these documents were made available only as image files, meaning digital searching and cross-referencing were not an option. Cue the arrival of human ingenuity.

To solve the problem of making sense of some half a million pages of documents, The Guardian newspaper launched the ‘Investigate Your MP’s Expenses’ project. Using human scrutiny to assess the content of each page, the project asked readers to label the documents, noting any which merited further enquiry. This meant individuals trawling, in their own time, through endless pages of receipts, invoices and otherwise incredibly dull accounting paperwork looking for evidence of anyone on the fiddle. Jane McGonigal in Reality is Broken called it “the world’s first massively multiplayer investigative journalism project”.

It was reported that in only the first three days, more than 20,000 players had analysed more than 170,000 electronic documents, thanks to a visitor participation rate of 56%. For purposes of comparison, Wikipedia has an average visitor participation rate of 4.6%.

You already know the results. Overall, 28 MPs resigned and criminal investigations were launched against four more. Hundreds of MPs were ordered to repay a total of £1.12 million, and a new parliamentary expenses code was drawn up. It just goes to show that you can get people to volunteer for even the most mind-numbing of tasks if there’s even the smallest chance of sticking it to the man. Which brings us to our last point on the potential of distributed computing.

A Darker Side

So far we’ve tracked a number of charitable, heart-warming applications for distributed or volunteer computing, but it’s not all curing cancer and alien-spotting. The processing power of thousands, sometimes hundreds of thousands, of individual machines can be put to more harmful use, as MasterCard found out in December of last year.

DDoS (Distributed Denial of Service) attacks could be seen as the less fluffy side of distributed computing. So-called botnets are capable of directing massive streams of traffic from huge networks of machines at a single target, disrupting services and rendering them unusable. MasterCard’s website was temporarily brought down by just such an attack, in retaliation for the company’s decision not to handle donations to WikiLeaks.

A more recent example of a DDoS attack, with possibly more long-term consequences, is the one waged on Sony’s PlayStation Network in early April 2011 by the Anonymous hacker collective. Sony has cited the attack as the reason a more serious incursion into the network’s security wasn’t spotted, one which led to millions of customers’ credit card details allegedly being compromised.

Whether distributed computing will continue to be used in future for benevolent or shadier purposes, it remains an incredibly powerful tool. Watch this space.

This article originally appeared in Issue 1159 of Micro Mart, 26 May – 1 June 2011