Posts Tagged 'Large Datasets'

The Scientific Method Pushes Back…

In my last post, here, I linked to a very interesting article by Chris Anderson, of Wired Magazine. Anderson posited that Google is fundamentally changing science and the scientific method.

Well, it didn’t take long for the scientific community to weigh in on the issue:

From Ars Technica, the other side of the argument:

Every so often, someone (generally not a practicing scientist) suggests that it’s time to replace science with something better. The desire often seems to be a product of either an exaggerated sense of the potential of new approaches, or a lack of understanding of what’s actually going on in the world of science. This week’s version, which comes courtesy of Chris Anderson, the Editor-in-Chief of Wired, manages to combine both of these features in suggesting that the advent of a cloud of scientific data may free us from the need to use the standard scientific method.

…Overall, the foundation of the argument for a replacement for science is correct: the data cloud is changing science, and leaving us in many cases with a Google-level understanding of the connections between things. Where Anderson stumbles is in his conclusions about what this means for science. The fact is that we couldn’t have even reached this Google-level understanding without the models and mechanisms that he suggests are doomed to irrelevance. But, more importantly, nobody, including Anderson himself if he had thought about it, should be happy with stopping at this level of understanding of the natural world.

Obviously, there is a lot more, so follow the link for the full post.

I’m not a scientist; I’m a student. Nevertheless, it is fascinating to watch the conflicting viewpoints that arise from the inevitable clash between orthodoxy and revolution. I suspect that the way forward in this discussion will bring us to a harmonic convergence of new research methods and a revised version of the hallowed Scientific Method.


Correlative Analytics

Once again, Kevin Kelly explains the intersection of computer science, mathematics, large datasets, and science in a way that few can. The link will take you to the entire post, but these juicy tidbits are here to tease:

There’s a dawning sense that extremely large databases of information, starting in the petabyte level, could change how we learn things. The traditional way of doing science entails constructing a hypothesis to match observed data or to solicit new data. Here’s a bunch of observations; what theory explains the data sufficiently so that we can predict the next observation?…

In a cover article in Wired this month Chris Anderson explores the idea that perhaps you could do science without having theories.

This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.

Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

There may be something to this observation. Many sciences such as astronomy, physics, genomics, linguistics, and geology are generating extremely huge datasets and constant streams of data in the petabyte level today. They’ll be in the exabyte level in a decade. Using old fashioned “machine learning,” computers can extract patterns in this ocean of data that no human could ever possibly detect. These patterns are correlations. They may or may not be causative, but we can learn new things. Therefore they accomplish what science does, although not in the traditional manner…

My guess is that this emerging method will be one additional tool in the evolution of the scientific method. It will not replace any current methods (sorry, no end of science!) but will complement established theory-driven science. Let’s call this data-intensive approach to problem solving Correlative Analytics…

Perhaps understanding and answers are overrated. “The problem with computers,” Pablo Picasso is rumored to have said, “is that they only give you answers.”  These huge data-driven correlative systems will give us lots of answers — good answers — but that is all they will give us. That’s what the OneComputer does —  gives us good answers. In the coming world of cloud computing perfectly good answers will become a commodity. The real value of the rest of science then becomes asking good questions…

This is the clearest expression yet of what I think the Discovery Informatics degree at my school can offer to those interested in these emerging fields. And remember, where science leads, business opportunities follow closely behind. There is much to be done…


“Life’s hard, son. It’s harder when you’re stupid.” — The Duke.

Education is a companion which no misfortune can depress, no crime can destroy, no enemy can alienate, no despotism can enslave. At home, a friend, abroad, an introduction, in solitude a solace and in society an ornament. It chastens vice, it guides virtue, it gives at once grace and government to genius. Without it, what is man? A splendid slave, a reasoning savage. - Joseph Addison
The term informavore (also spelled informivore) characterizes an organism that consumes information. It is meant to be a description of human behavior in modern information society, in comparison to omnivore, as a description of humans consuming food. George A. Miller [1] coined the term in 1983 as an analogy to how organisms survive by consuming negative entropy (as suggested by Erwin Schrödinger [2]). Miller states, "Just as the body survives by ingesting negative entropy, so the mind survives by ingesting information. In a very general sense, all higher organisms are informavores." - Wikipedia
