Critical importance of data visualization
photo by David Blackwell.
You may have experienced a meeting like this: you bring in your Statspack or AWR report, all 30 pages of it, point out some glaring issues that anybody could see, and propose some precise solutions, only to watch the management team’s eyes glaze over. Then, after you finish your pitch, they all start arguing about what the problem might be, despite your clear presentation of both the problem and the solution.
Have you ever had that same meeting with a printout of Top Activity from Oracle Enterprise Manager, with its load graph of average active sessions, its breakdown of where that load comes from in terms of CPU and waits, and its lists of the top SQL and top sessions? You explain the problem and the solution, and they all nod their heads.
Clear graphical presentation of data is critical to how quickly people understand information and how comfortable they are interpreting it.
Edward Tufte wrote a seminal analysis of the decision to launch the space shuttle Challenger on January 28, 1986. Some have criticized the analysis, but for reasons that are orthogonal to what I find important: the shockingly large impact that the presentation format of data can have on the viewer’s interpretation.
On the night before the launch, the engineers who designed the solid rocket boosters were concerned that it would be too cold to launch. Cold mattered because the joints in the solid rocket boosters were sealed by O-rings made of a rubber that becomes stiffer the colder it is. As the rubber stiffened, its ability to seal the joints declined, increasing the danger that the burning solid rocket fuel would blow through them.
The engineers stayed up late putting together information and faxing it to launch control in Florida, trying to prevent the launch the next day. They had data on damage to the solid rocket boosters from previous flights: after each launch, the boosters fell back into the ocean, where they were collected and analyzed for damage. The engineers used this data to show that damage on past launches had been related to temperature.
Here is a fax showing the “History of O-Ring damage on SRM field joints”:
The first problem, as Tufte points out, is that this fax uses three different naming conventions for the previous launches, which is confusing. Circled in red are the three conventions: date, flight number, and SRM number.
The fax gives overwhelmingly detailed information on the damage but none on temperature, even though the goal was to show a correlation between temperature and damage.
The next fax shows temperatures, but it omits many of the damaged flights and includes damage from test firings in the desert, where the boosters were fired horizontally rather than vertically and without the same stresses as actual flight.
Finally, including this tangential data while excluding other data led to the comment that there had been damage on both the hottest flight and the coldest flight.
But the conclusions in the faxes were clear: the estimated temperature at launch was 29 to 38 degrees Fahrenheit, and the shuttle should not be launched below 53 degrees.
If we take the data that was faxed and plot the number of damage incidents against the temperature at which they occurred, we get a graph like this:
Based on this information, do you think there is a correlation between temperature and damage? Would you have launched the shuttle the next day? Remember that there was tremendous pressure to launch the next day.
Well, they did launch, and the rest is history. In the picture below, a white flame can be seen coming from one of the O-ring joints in the solid rocket booster. This flame burned through to the external tank holding the liquid fuel, and the space shuttle broke apart.
It was a national tragedy that led to a formal investigation by a presidential commission. As part of that investigation, the data was drawn up into graphics. The graphics were actually worse than the original faxes because they introduced so much chartjunk.
OK, let’s look back at the original data:
Now let’s take that data and change the y-axis to represent not a simple count of damage incidents but a scale of how bad the damage was, and we get:
Now include the flights that had no damage. This is a major piece of information, and it already makes a huge difference.
Now mark the damage of a different type in a different color; the only incident of that type occurred at 75 degrees.
Now, at 70 degrees there were both successes and failures, so normalize (average) the damage at that temperature.
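Averaging at a repeated temperature is a simple group-by. The sketch below uses hypothetical (temperature, damage score) pairs, not the real flight data, and a hand-written helper, `average_by_temperature`, which is an illustrative name rather than anything from the original analysis. It collapses flights that share a launch temperature into their mean damage score, so each temperature appears once on the chart.

```python
from collections import defaultdict

# Hypothetical (temperature_F, damage_score) pairs for illustration only;
# these are NOT the actual shuttle flight records.
flights = [(53, 11), (57, 4), (63, 4), (70, 4), (70, 0), (70, 0), (70, 4), (75, 0), (81, 0)]


def average_by_temperature(records):
    """Collapse flights launched at the same temperature into their mean damage score."""
    totals = defaultdict(lambda: [0, 0])  # temperature -> [damage sum, flight count]
    for temp, damage in records:
        totals[temp][0] += damage
        totals[temp][1] += 1
    # Return one point per temperature, ordered from coldest to warmest.
    return {temp: total / count for temp, (total, count) in sorted(totals.items())}


for temp, mean_damage in average_by_temperature(flights).items():
    print(f"{temp} F: mean damage {mean_damage:.1f}")
```

Here the four hypothetical 70-degree flights, two damaged and two undamaged, collapse to a single point with a mean score of 2.0 instead of stacking up as separate marks.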
Now we are starting to see some important information.
We are also starting to see a stronger indication of a correlation.
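To make the effect of the zero-damage flights concrete, here is a minimal Python sketch that computes the Pearson correlation between temperature and damage twice: once using only the flights with damage, and once using every flight. The numbers are hypothetical values chosen for illustration, not the actual flight records, and `pearson` is a hand-written helper rather than a library function.

```python
from math import sqrt

# Hypothetical (temperature_F, damage_incidents) pairs for illustration only;
# these are NOT the actual shuttle flight records.
DAMAGED_FLIGHTS = [(53, 3), (57, 1), (58, 1), (63, 1), (70, 1), (70, 1), (75, 2)]
UNDAMAGED_FLIGHTS = [(t, 0) for t in
                     (66, 67, 67, 67, 68, 69, 70, 70, 72, 73, 75, 76, 76, 78, 79, 81)]


def pearson(pairs):
    """Pearson correlation coefficient between the two columns of `pairs`."""
    n = len(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Correlation when the undamaged flights are left out, as in the faxed chart...
r_damaged_only = pearson(DAMAGED_FLIGHTS)
# ...and when every flight is included.
r_all_flights = pearson(DAMAGED_FLIGHTS + UNDAMAGED_FLIGHTS)
print(f"damaged flights only: r = {r_damaged_only:.2f}")
print(f"all flights:          r = {r_all_flights:.2f}")
```

With these made-up numbers, the damaged flights alone give a weak correlation of about -0.26, while adding the undamaged flights, all launched at 66 degrees or warmer, strengthens it to about -0.56. Leaving out the zeros hides exactly the pattern the engineers were trying to show.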
But probably the most important piece of information is still missing: the temperature at which the next day’s launch would take place.
X marks the predicted launch temperature for the next day, January 28, 1986. The launch was well outside the known world: it was so far below every previous launch temperature that the leap away from the known data was almost as large as the entire range the known data covered.
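The size of that leap is simple arithmetic. The sketch below assumes, for illustration, that the 53-degree limit in the faxes matches the coldest previous launch and that the warmest previous launch was 81 degrees (a value not given in the faxes above), and it takes the low end of the 29 to 38 degree forecast. `extrapolation_ratio` is a hypothetical helper name.

```python
def extrapolation_ratio(observed_min, observed_max, forecast):
    """How far `forecast` lies below the observed range, as a fraction of the range's width."""
    observed_width = observed_max - observed_min
    gap_below = observed_min - forecast
    return gap_below / observed_width


# 53 F and the 29 F forecast come from the faxes; treating 53 F as the coldest
# previous launch and 81 F as the warmest are assumptions for this sketch.
ratio = extrapolation_ratio(observed_min=53, observed_max=81, forecast=29)
print(f"gap below the known range = {ratio:.0%} of the range's width")
```

With those numbers the forecast sits 24 degrees below the coldest launch on record, about 86% of the 28-degree range that all previous launches covered.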
In summary:
- Space program engineers, the guys who blew us away by putting a man on the moon, can still fail at communicating data clearly.
- Government investigators, some of the top lawyers in the country, can still fail at communicating data clearly.
- Data visualization seems obvious, but it is difficult, and a lack of clarity can be devastating.
Further reading
photo by Steve Jurvetson
related story
http://www.contextmattersinc.com/every-picture-tells-a-story-visualization-matters-especially-when-the-alternative-is-data-deluge/
Recently, ModernHealthcare.com published an article about data deluge in the emergency room, in which poor displays and a plethora of alerts on patient-safety issues may be contributing to creating errors in physician orders. This is a great example of how without better visualization incorporated into a workflow, it doesn’t matter how much data you actually track. If the data does not lend itself to easy (and not overwhelming) comprehension, it is the equivalent of having no data at all.
some good charts: http://www.economist.com/blogs/graphicdetail
some bad charts: http://junkcharts.typepad.com
some more bad charts: http://wtfviz.net
another good visualization site: http://vizual-statistix.tumblr.com/
another good reference: http://vizcup2.splashthat.com/