Correlative Analytics♠

Once again, Kevin Kelly explains the intersection of computer science, mathematics, large datasets, and science in a way that few can. The link will take you to the entire post, but these juicy tidbits are here to tease:

There’s a dawning sense that extremely large databases of information, starting in the petabyte level, could change how we learn things. The traditional way of doing science entails constructing a hypothesis to match observed data or to solicit new data. Here’s a bunch of observations; what theory explains the data sufficiently so that we can predict the next observation?…

In a cover article in Wired this month Chris Anderson explores the idea that perhaps you could do science without having theories.

This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.

Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

There may be something to this observation. Many sciences such as astronomy, physics, genomics, linguistics, and geology are generating extremely huge datasets and constant streams of data in the petabyte level today. They’ll be in the exabyte level in a decade. Using old fashioned “machine learning,” computers can extract patterns in this ocean of data that no human could ever possibly detect. These patterns are correlations. They may or may not be causative, but we can learn new things. Therefore they accomplish what science does, although not in the traditional manner…

My guess is that this emerging method will be one additional tool in the evolution of the scientific method. It will not replace any current methods (sorry, no end of science!) but will complement established theory-driven science. Let’s call this data intensive approach to problem solving Correlative Analytics…

Perhaps understanding and answers are overrated. “The problem with computers,” Pablo Picasso is rumored to have said, “is that they only give you answers.”  These huge data-driven correlative systems will give us lots of answers — good answers — but that is all they will give us. That’s what the OneComputer does —  gives us good answers. In the coming world of cloud computing perfectly good answers will become a commodity. The real value of the rest of science then becomes asking good questions…

This is the clearest expression yet of what I think the Discovery Informatics degree at my school can offer to those interested in these emerging fields. And remember, where science leads, business opportunities follow closely behind. There is much to be done.


1 Response to “Correlative Analytics♠”

  1. Pam, July 1, 2008 at 8:44 pm

    You know, I love hypotheses. I was trained that they should guide you – and while I’m not all that sensitive about whether it is proven right or wrong – I like hanging onto them as a guidepost (or security blanket?). But geez Ad, our datasets are growing and growing and they’re turning everything upside down. We have one of the DI grads working with us, and he is beyond great – and I am learning so much from him. This week: Support Vector Machines. I’m enamored. We need new ways to look at our data, we need you guys to help us. My dream? To find a way to integrate meta-genomic, meta-proteomic, and metabolomic data in a visual way – to create some kind of multi-dimensional map that integrates all of these data types into an ‘image’ of our microbial communities (so combining who, what, and where, why and hopefully how!). It’s a brave new world indeed.

    (This from a scientist with her first genome in hand, first functional genomics dataset, first proteome, and first metabolome…yeah, we’re overwhelmed for sure).


“Life’s hard, son. It’s harder when you’re stupid.” — The Duke.

Education is a companion which no misfortune can depress, no crime can destroy, no enemy can alienate, no despotism can enslave. At home, a friend, abroad, an introduction, in solitude a solace and in society an ornament. It chastens vice, it guides virtue, it gives at once grace and government to genius. Without it, what is man? A splendid slave, a reasoning savage. - Joseph Addison
The term informavore (also spelled informivore) characterizes an organism that consumes information. It is meant to be a description of human behavior in modern information society, in comparison to omnivore, as a description of humans consuming food. George A. Miller [1] coined the term in 1983 as an analogy to how organisms survive by consuming negative entropy (as suggested by Erwin Schrödinger [2]). Miller states, "Just as the body survives by ingesting negative entropy, so the mind survives by ingesting information. In a very general sense, all higher organisms are informavores." - Wikipedia
