
What the hell have I been doing? Part 2: Data Representation

Like it or not, any analysis work that you do is pretty much worthless unless you are able to present the data effectively. Effective data presentation becomes more difficult when new data has to be consumed on a regular basis. Hand-massaging the information has to take a back seat to automation; otherwise you (the analyst) will spend your entire life recreating the same report. The data also has to be extremely accessible, or your customers will not even bother looking at it.

For example, let's consider the story of some data analyst named... Rudiger. Rudiger has a large volume of numbers about... virus outbreaks locked up in SQL somewhere. Using the tried and true methods acquired as a grad student, Rudiger glues some Perl scripts together, followed by smoothing and other cherry-picking using Matlab or, god forbid, Excel. As people ask for the data more frequently, our intrepid hero adds more automation to make his report generation easier, with graphs e-mailed to him and other concerned parties on a regular basis. He quickly discovers that no one is reading his data-laden e-mails anymore, leaving poor Rudiger to announce conclusions that others could have drawn by simply looking at a graph provided for them.

What Rudiger doesn't quite realize is that people need to feel that they own the data and can manipulate it until it tells them a story, and not just the story that Rudiger's chosen graph tells. In much the same way that many "technical" (absurdity!) stock analysts generate multiple forms of charts rather than looking at the standard data provided by financial news sites, data consumers want to feel they can draw their own conclusions and interact in the process rather than be shown some static information. There are several interweb startups based upon this very concept.

For those of you who haven't figured this out by now, I'm Rudiger. Rather than send out static graph after static graph that no one looks at, I learned a web language and threw together an internal website that allows people of multiple technical levels to explore information about virus outbreaks. While it is nowhere near as sophisticated as ATLAS, the service tries to emulate Flickr's content and tag navigation structure, where viruses are the content and tags are what we know about the specific threat. The architecture is easy to use and has a low barrier to entry, as everyone knows how to use a web page. The "friction" associated with the data is also low, as anyone who is really interested can subscribe to an RSS feed which goes right to a web page on the virus: two mouse clicks versus pulling data from SQL.
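To make the content-and-tag idea a bit more concrete, here is a minimal sketch of that kind of navigation in plain Python. It is not the code behind the internal site (which is not shown here and was built differently); the `TagIndex` class, its method names, and the virus names and tags are all made up for illustration.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical two-way index: tags -> viruses and viruses -> tags."""

    def __init__(self):
        self._by_tag = defaultdict(set)
        self._by_virus = defaultdict(set)

    def add(self, virus, tags):
        """Record that `virus` carries each tag in `tags`."""
        for tag in tags:
            self._by_tag[tag].add(virus)
            self._by_virus[virus].add(tag)

    def viruses_with(self, *tags):
        """Return the viruses that carry every one of the given tags."""
        if not tags:
            return set(self._by_virus)
        matches = [self._by_tag.get(tag, set()) for tag in tags]
        return set.intersection(*matches)

    def related_tags(self, tag):
        """Return other tags that appear alongside `tag` on some virus."""
        related = set()
        for virus in self._by_tag.get(tag, set()):
            related |= self._by_virus[virus]
        related.discard(tag)
        return related


# Made-up sample data, purely for illustration.
index = TagIndex()
index.add("ExampleWorm.A", ["worm", "email", "windows"])
index.add("ExampleTrojan.B", ["trojan", "email"])

email_borne = index.viruses_with("email")
email_worms = index.viruses_with("email", "worm")
next_clicks = index.related_tags("email")
```

Clicking a tag maps to `viruses_with`, and the "related tags" shown next to the results map to `related_tags`, so a reader can narrow down or wander sideways without ever writing a query.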

I am generally more accustomed to writing English or algorithms than web code. Frankly, I hadn't produced a web app since PHP 3.x was the hotness. After consulting with some of my coworkers and my old friend Jacqui Maher, I decided to throw the site together using Ruby on Rails. With Jacqui on IM and a copy of Ruby on Rails: Up and Running in hand, I went from a cold start to a functioning prototype in about 2 weeks. I was pretty surprised by how far web development has come since 2000, as ad-hoc methods for presenting data from a table have been replaced with formalized architectures integrated deeply into the popular coding frameworks.

Moral(s) of the story?: Reduce the cost and barriers to analyzing your own data. Put your data in the hands of the consumer in a digestible, navigable form. Remove yourself from the loop. Don't worry, you will still be valuable even when you aren't the go-to guy for generating graphs, as there is plenty of work to go around right now.

[Sidenote: The sad thing is I learned this lesson about reducing the burden of analyzing regularly generated data once before. The entire motivation behind a project I consulted on many moons ago, namely Sourcefire's RNA Visualization Module, was to provide attack analysts with an easy-to-absorb presentation of underlying complex data.]

Comments (1)

You're better than Rudiger and that's why you're awesome :)

Where is this site anyway? Did you finish it? Is it internal? Would love to see it. I'm sorry I wasn't able to help you more - it might be easier to do that in person, or you could just give me access to your company's svn repository, because hey what could go wrong?


This page contains a single entry from the blog posted on May 18, 2007 12:23 PM.

The previous post in this blog was What the hell have I been doing? Part 1: Poor Man's MBA.

The next post in this blog is What the hell have I been doing? Part 3: Living Room, Redux.
