Once again, Kevin Kelly explains the intersection of computer science, mathematics, large datasets, and science in a way that few can. The link will take you to the entire post, but these juicy tidbits are here to tease:
There’s a dawning sense that extremely large databases of information, starting in the petabyte level, could change how we learn things. The traditional way of doing science entails constructing a hypothesis to match observed data or to solicit new data. Here’s a bunch of observations; what theory explains the data sufficiently so that we can predict the next observation?…
In a cover article in Wired this month Chris Anderson explores the idea that perhaps you could do science without having theories.
This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.
Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
There may be something to this observation. Many sciences such as astronomy, physics, genomics, linguistics, and geology are generating extremely huge datasets and constant streams of data in the petabyte level today. They’ll be in the exabyte level in a decade. Using old fashioned “machine learning,” computers can extract patterns in this ocean of data that no human could ever possibly detect. These patterns are correlations. They may or may not be causative, but we can learn new things. Therefore they accomplish what science does, although not in the traditional manner…
My guess is that this emerging method will be one additional tool in the evolution of the scientific method. It will not replace any current methods (sorry, no end of science!) but will compliment established theory-driven science. Let’s call this data intensive approach to problem solving Correlative Analytics…
Perhaps understanding and answers are overrated. “The problem with computers,” Pablo Picasso is rumored to have said, “is that they only give you answers.” These huge data-driven correlative systems will give us lots of answers — good answers — but that is all they will give us. That’s what the OneComputer does – gives us good answers. In the coming world of cloud computing perfectly good answers will become a commodity. The real value of the rest of science then becomes asking good questions…
This is the clearest expression yet of what I think the Discovery Informatics degree at my school can offer to those interested in these emerging fields. And remember, where science leads, business opportunities follow closely behind. There is much to be done…………….


Recent Comments