Posts Tagged 'Discovery Informatics'

Math Wars…Resumed

Before I resumed my education and began this journey in Discovery Informatics, I did as much research as possible. Among those efforts was a meeting with the Assistant Chairman of the Mathematics Department. I disclosed my dream, my background, and then got to the point: Could I, at my age and with my lack of background in math, possibly get through the DI program? His response was brief, brutal, and very honest. If you struggle with pre-calculus and algebra, you probably shouldn’t be in the program.

Fair enough. The A in algebra boosted my spirits, but the C+ in pre-calculus scared me. Then it was on to Calculus I…….a mighty battle from which I emerged scarred, and, to a certain extent, wiser.

Today, I walked into my Calc II class. Yes, there stood my old friend, the Assistant Chairman. He began the class with a brief slide presentation: the last dozen or so semesters of Calc I students who earned either A, A-, or B+ in the class. I know from my Statistics classes that they represent a sample of sufficient size so that we can assume a normal distribution, aka the bell curve, in the grade distribution. Note, too, that he did not include in his sample population those students who earned a grade less than B+ (like me). He then showed a grade distribution of those students in Calc II.

The median was a C+. There were plenty of grades worse than that (I know, and you should too, the median is the 50th percentile). Some freshman whippersnapper, fresh off his AP SAT score, and thus placed in this class, and heretofore considered by his high school classmates as a genius, stated to the professor that he would, without doubt, get an A. The prof begged to differ, stating that half of us will drop or fail, and of the rest, only 2 or 3 will get an A. Added the prof, “You might get an A, and I hope you do, but numbers don’t lie.”
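To make the median remark concrete, here is a minimal sketch. The grade list and the 4.0-scale mapping are invented for illustration; they are not the professor's data. It only shows that at least half of the values sit at or below the median.

```python
from statistics import median

# Hypothetical grade-point mapping and class results (NOT the professor's data).
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C+": 2.3, "C": 2.0, "D": 1.0, "F": 0.0}
grades = ["A", "B", "B", "C+", "C+", "C", "D", "F", "F"]

# Convert letters to points and sort so the middle value is easy to see.
points = sorted(GRADE_POINTS[g] for g in grades)

# The median is the 50th percentile: the middle value of the sorted list.
median_points = median(points)

# Count how many students scored at or below the median.
at_or_below = sum(1 for p in points if p <= median_points)

print("Median grade points:", median_points)
print("Students at or below the median:", at_or_below, "of", len(points))
```

With a median of C+, at least half the class finishes at C+ or worse, which is the professor's point.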

Whatever sangfroid I might have felt disappeared completely during this exchange of data, to be replaced with that old familiar sensation….gut-wrenching fear. Pulse racing, blood pressure elevated, the room suddenly became too warm and I struggled to breathe. I thought that I had trained myself to suppress these periods of anxiety (that primarily arrived just before any tests), but NO!

So the battle resumes. Visits to the math lab, visits to the professor’s office, Sundays spent studying, and anxiety like you don’t know in the days before each test (4 and a Final that is cumulative); these will be my routines this semester.

Wish me luck, I’m gonna need a lot of it……..


Correlative Analytics

Once again, Kevin Kelly explains the intersection of computer science, mathematics, large datasets, and science in a way that few can. The link will take you to the entire post, but these juicy tidbits are here to tease:

There’s a dawning sense that extremely large databases of information, starting in the petabyte level, could change how we learn things. The traditional way of doing science entails constructing a hypothesis to match observed data or to solicit new data. Here’s a bunch of observations; what theory explains the data sufficiently so that we can predict the next observation?…

In a cover article in Wired this month Chris Anderson explores the idea that perhaps you could do science without having theories.

This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.

Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

There may be something to this observation. Many sciences such as astronomy, physics, genomics, linguistics, and geology are generating extremely huge datasets and constant streams of data in the petabyte level today. They’ll be in the exabyte level in a decade. Using old fashioned “machine learning,” computers can extract patterns in this ocean of data that no human could ever possibly detect. These patterns are correlations. They may or may not be causative, but we can learn new things. Therefore they accomplish what science does, although not in the traditional manner…

My guess is that this emerging method will be one additional tool in the evolution of the scientific method. It will not replace any current methods (sorry, no end of science!) but will complement established theory-driven science. Let’s call this data intensive approach to problem solving Correlative Analytics…

Perhaps understanding and answers are overrated. “The problem with computers,” Pablo Picasso is rumored to have said, “is that they only give you answers.”  These huge data-driven correlative systems will give us lots of answers — good answers — but that is all they will give us. That’s what the OneComputer does —  gives us good answers. In the coming world of cloud computing perfectly good answers will become a commodity. The real value of the rest of science then becomes asking good questions…

This is the clearest expression yet of what I think the Discovery Informatics degree at my school can offer to those interested in these emerging fields. And remember, where science leads, business opportunities follow closely behind. There is much to be done…………….
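For anyone wondering what a "correlation" in the passage above looks like in practice, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The function name and the study-hours data are made up for illustration; nothing here comes from Kelly's or Anderson's articles.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: hours studied versus test score.
hours_studied = [1, 2, 3, 4, 5]
test_scores = [52, 58, 65, 71, 80]

r = pearson_r(hours_studied, test_scores)
print("Pearson r = %.3f" % r)
```

A value near 1 says the two columns rise together. It does not say why, which is exactly the caveat in the quoted passage: the pattern may or may not be causative.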

Future Computing

My major, Discovery Informatics, is, I hope and believe, the future of computing. A hybrid kind of major, encompassing programming skills, mathematics and statistics, and a cognate (an area of specialization), the acquired skills should enable a graduate to apply the skill-set to a variety of disciplines.

As someone who spent the better part of his working life in business, it makes sense to think that I can return to that area, ready to contribute (and earn) in a new, meaningful, and interesting way to the corporate weal.

Articles like this provide encouragement that this bold move may yet pay off in the near term:

Workplace social networks and cloud computing means that the need for a centralized IT department will go away. Firms will no longer need to own/maintain the boxes that they use to run their firm’s apps. With no need to touch a box, there will be no need to have the IT staff co-located with the boxes. Oh, oh — can you hear your job going away?

What does this all mean, and more importantly what should a successful IT staffer (or CIO) do today? The key to your future success is to understand how IT is going to change and what you need to do to change with it. IT is going to become much more about information and how it can be used to help the business grow and prosper. This IT function is going to leave the IT department as we know it today and will migrate into the business unit itself. What this means to you is that you need to know what your firm does, and even more importantly, how it does it. The next question will be what information is needed by the business units to improve how they do their work. This is what tomorrow’s IT staff will provide. Thanks Gartner for the peek into the future!

Can you dig it? I can………

Big, Big News!

After 14 months of hard work, aggravation beyond description, despair of the blackest shade, and, yes, tears, a watershed event. I, erstwhile DI major, terrible math scholar, and programmer par non-excellence, have finally accomplished something that I have long dreamed about and talked about.

Send in the band, please

I have accessed a database, extracted some data, and reported the results, using Python (nifty programming language). Just now, I literally dance for joy. The wife is stunned.

Herewith the details…..

The Code:

# A program to access the MaryRichards database, retrieve
# the tuples for the job table, and display as output the job id,
# job description, and amount billed for each job.

import pyodbc

# Connect through the ODBC data source; stop if the connection fails.
try:
    cnxn = pyodbc.connect("DSN=MaryRichardsBackup;UID=The Tortoise;PWD=lucky7")
except pyodbc.Error:
    raise SystemExit("Error -- No access")

print("Connected")
cursor = cnxn.cursor()

# Select some values from the database and print them:

cursor.execute("select * from job")
allrecs = cursor.fetchall()
for row in allrecs:
    # pyodbc rows expose each column as an attribute.
    print("%s %s %s" % (row.JOB_ID, row.Description, row.AmountBilled))

# Release the connection when done.
cnxn.close()


The Result:

Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.

Personal firewall software may warn about the connection IDLE
makes to its subprocess using this computer's internal loopback
interface. This connection is not visible on any external
interface and no data is sent to or received from the Internet.

IDLE 1.2.1
>>> ================================ RESTART ================================
1 Paint exterior in 794 White 2750.0000
2 Paint dining room and kitchen 1778.0000
3 Prep and paint upstairs bath 550.0000
4 Paint exterior doors in 633 Red 885.0000
5 Prep and paint interior wood trim 1299.0000

I Exult………
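For readers without the MaryRichardsBackup data source, here is a minimal stand-in sketch that runs anywhere. It swaps pyodbc for Python's built-in sqlite3 module and an in-memory database, which is an assumption for illustration and not how the original program connects. The five rows are copied from the output above, the column names come from the original print statement, and the helper names `load_jobs` and `fetch_jobs` are hypothetical.

```python
import sqlite3

# Rows copied from the IDLE output in the post.
JOB_ROWS = [
    (1, "Paint exterior in 794 White", 2750.0),
    (2, "Paint dining room and kitchen", 1778.0),
    (3, "Prep and paint upstairs bath", 550.0),
    (4, "Paint exterior doors in 633 Red", 885.0),
    (5, "Prep and paint interior wood trim", 1299.0),
]


def load_jobs(connection):
    """Create the job table and fill it with the sample rows."""
    connection.execute(
        "CREATE TABLE job ("
        "JOB_ID INTEGER PRIMARY KEY, Description TEXT, AmountBilled REAL)"
    )
    connection.executemany("INSERT INTO job VALUES (?, ?, ?)", JOB_ROWS)
    connection.commit()


def fetch_jobs(connection, min_billed=0.0):
    """Return jobs billed at or above min_billed, using a bound parameter."""
    cursor = connection.execute(
        "SELECT JOB_ID, Description, AmountBilled FROM job "
        "WHERE AmountBilled >= ? ORDER BY JOB_ID",
        (min_billed,),
    )
    return cursor.fetchall()


connection = sqlite3.connect(":memory:")
load_jobs(connection)

jobs = fetch_jobs(connection)
for job_id, description, amount in jobs:
    print(job_id, description, "%.2f" % amount)

# The same query, restricted to the larger jobs.
big_jobs = fetch_jobs(connection, min_billed=1000.0)
total_billed = sum(amount for _, _, amount in jobs)
print("Jobs of 1000 or more:", len(big_jobs))
print("Total billed: %.2f" % total_billed)

connection.close()
```

Naming the columns instead of using `select *` keeps the output stable if the table grows, and the `?` placeholder lets the database driver handle the value safely instead of pasting it into the SQL string.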

How I Got To This Point

Those few who read this site know what I’m trying to achieve. Most don’t know how I got to this point. Cocktail mutterings, explanations to acquaintances, and other vague comments aside, I embarked on this long, strange trip because of one person. The weird thing is that I didn’t know him until recently, and but for a couple of extreme coincidences, our paths might never have crossed. And yet, here I am, a middle-aged man chasing a dream…..a quixotic pursuit for sure.

My friend and his wife are among the few who are peering intently into the near future, looking at a world awash in data, information, and the myriad methods that are being created to deal with this fundamental shift in our existence. They are excited, hungry, and impatient. They have lit the fire in me, as well.

Today, in a post, she writes about the change that is occurring all around us, and says:

All over the world, people are trying things and talking about problems and solutions and possibilities in almost any arena you can name: medicine, technology, politics, business, media, art … you get the picture. The vast amounts of information on Teh Interwebs let me check in with Effect Measure, Seth Godin and TED and BoingBoing and many other sites that show intelligence, understanding and, most importantly, movement.

and

Things are happening. Companies are removing restrictive coding from music files. Candidates are finding new ways to fund campaigns. Whether you would vote for Obama or not, the method employed by his campaign to raise money is brilliant, innovative and new. People are exploring technology frontiers in creative ways (check out this and this and this, if you don’t believe me.)

The big challenge for right now, for anyone who wants something better, is to keep exploring, keep learning, keep moving along. To be open to changing our minds as we get new information. We are no longer envisioning a new world, we’re paving the road to get there.

…keep exploring, keep learning, keep moving along. To be open to changing our minds as we get new information…….

Thanks for putting the inspiration into words.

What To Do With A Degree in Discovery Informatics?

See, that’s a question that comes up a lot. My wife is the originator of most of the questions, followed closely by her parents and then by my father. Then there’s the party question: “What do you do?”.


Initially, instead of trying to explain my somewhat questionable fancy for numbers, data, and analysis of same, I responded to inquiries with a deluge of techno-speak. The hope was that I would intimidate the uninitiated and cow them into refusing to ask tough follow-up questions. Plus, I couldn’t seem to construct a response that adequately explained my primary aptitude…….extreme curiosity.

But a new day has dawned. In its own inevitable way, data from the WWW has come to the rescue and supplied me with the key to happiness, a ready response to the pesky question, and a job that will be the envy of my peers. Here is the dream job. A slice:

The model was developed using SAS software and information provided by, and relies primarily on historical data. It was built on a database that captured characteristics of the choices of 3,395 recruits between 2002 and 2004. A large amount of player and team data was gathered for the task. The researchers then developed a special form of a statistical model known as a probit to try and capture the decision making process of recruits.

Yes, friends, assembling data, building a database, and then using statistical analysis to arrive at information not previously known. The work that will carry me, happily, into the sunset. And to do it in football recruiting!
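For the curious, a probit model turns a weighted sum of inputs into a probability by passing it through the standard normal CDF. Here is a minimal sketch of that one step. The two inputs (distance from campus and an official-visit flag), the coefficients, and the intercept are all invented for illustration; the quoted article does not list the researchers' variables or fitted values, and a real model would estimate the coefficients from the recruiting data.

```python
import math


def standard_normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(features, coefficients, intercept):
    """Probit link: P(choice = 1) = Phi(intercept + sum of coefficient * feature)."""
    score = intercept + sum(c * x for c, x in zip(coefficients, features))
    return standard_normal_cdf(score)


# Hypothetical coefficients, NOT taken from the study:
# each mile from campus lowers the score; an official visit raises it.
coefficients = [-0.004, 0.8]
intercept = -0.5

# A recruit 100 miles away who took an official visit (visit flag = 1).
far_recruit = probit_probability([100.0, 1.0], coefficients, intercept)

# The same recruit, but only 10 miles from campus.
near_recruit = probit_probability([10.0, 1.0], coefficients, intercept)

print("Recruit 100 miles away: %.3f" % far_recruit)
print("Recruit 10 miles away:  %.3f" % near_recruit)
```

Whatever the inputs, the result always lands between 0 and 1, which is why this form suits a yes-or-no decision such as whether a recruit picks a given school.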

I wonder if they have an online job application form?

Inescapable Data

One of my courses this past semester was an introduction to the concept of discovery informatics. Recently, I wrote about one aspect of the discipline. Via WestHawk, here’s a post that provides another example of massive data collection, disciplined analysis of that data, and the provocative applications that result.

The U.S. Federal Bureau of Investigation is working on a $1 billion biometric database, containing fingerprints, palm prints, digital face scans, and, in the future, iris scans, scars, voice data, and records of people’s walking gaits.

After September 11, 2001, the military-technical task changed from precisely targeting a discrete object to finding a discrete person, hidden in either a teeming population or deep in the hinterland. This was traditional detective work. But ancient and highly disciplined codes of silence have long thwarted traditional detective work requiring human sources. Thus the urge for a technological solution, also a classic American impulse.

What are the components of this technical solution to finding someone? First, the person’s biometric characteristics, soon to be found in the FBI’s database. Second, continuous overhead observation, eventually to be provided by long-endurance drones, such as Global Hawk. Third, high-level computing power, now available in abundance. Finally, and still missing, extremely perceptive electro-optical sensors, to be mounted on the overhead drones.

We can be sure that engineers are working on the problem.

This is slightly scary. Applied to an external threat, like Al-Qaeda, or terrorism in general, this sounds like a good idea. Applied to domestic criminal activity, it seems to be an overly aggressive response to a not-quite quantified threat.

And besides, the FBI? They still can’t send each other e-mail, or access the internet. Why should we think that they can manage this project?

“Life’s hard, son. It’s harder when you’re stupid.” — The Duke.

Education is a companion which no misfortune can depress, no crime can destroy, no enemy can alienate, no despotism can enslave. At home, a friend, abroad, an introduction, in solitude a solace and in society an ornament. It chastens vice, it guides virtue, it gives at once grace and government to genius. Without it, what is man? A splendid slave, a reasoning savage. - Joseph Addison

The term informavore (also spelled informivore) characterizes an organism that consumes information. It is meant to be a description of human behavior in modern information society, in comparison to omnivore, as a description of humans consuming food. George A. Miller [1] coined the term in 1983 as an analogy to how organisms survive by consuming negative entropy (as suggested by Erwin Schrödinger [2]). Miller states, "Just as the body survives by ingesting negative entropy, so the mind survives by ingesting information. In a very general sense, all higher organisms are informavores." - Wikipedia
