Tuesday, March 16, 2010

Box plots

So I am in the process of putting together a manuscript my lab has been working on over the course of the past few months. We've done a lot of quantitative PCR to get at an integral question about an environment we've (the royal 'we') been interested in for years. So, my first order of business is to put together all my figures and tables. The tables were easy peasy, and most of the figures were too. I've come to the qPCR data, however, and I have a couple of options. I could present the data as tables or as figures. The journal I'm sending the manuscript to has no problem with an excess of tables (I've seen manuscripts with up to nine tables and no figures), but I don't like them.

Now, if you look at a lot of qPCR manuscripts, people use columns, which I suppose is fine because you'll see the mean and standard deviation ... but you also lose a lot of data. I mean, say your Y axis goes from 1 to 100, and the mean goes to 65. You know, probably for a fact, that you didn't have a reading of 0, or 1, or 2 ... but there is that area ... shaded out as a part of your column. Not really accurate, right?

So I've become enamored with box and whisker plots. They really do give you more information than your traditional column graph. Just take a look at the figure below as an example.

Which one gives you more information?
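To put rough numbers on that comparison, here is a minimal sketch of what each kind of chart actually summarizes. The readings are made up for illustration (they are not from the manuscript), and the quartiles come from Python's `statistics.quantiles` with its default method, which may differ slightly from what Excel plug-ins or other plotting packages compute.

```python
import statistics

# Hypothetical qPCR readings (arbitrary units); NOT data from the manuscript.
readings = [52, 58, 61, 63, 65, 66, 68, 70, 72, 75]


def column_summary(values):
    """What a column chart with error bars conveys: mean and standard deviation."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
    }


def box_summary(values):
    """What a basic box and whisker plot conveys: the five-number summary."""
    # quantiles(n=4) returns the three cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


print(column_summary(readings))
print(box_summary(readings))
```

The column summary boils the ten readings down to two numbers, while the box summary keeps five, including the lowest and highest values actually observed, so there is no shaded region implying readings that never occurred.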


Now, there are ways to make the box and whisker plot even more descriptive (I'm using the Analyze-It! plug-in for Excel), but on the face of it, I think box and whisker plots are a great tool for presenting qPCR data. So I looked to see who presents their data that way ... and almost no one does.

So, what am I missing here? Is it really a good way to present qPCR data, or isn't it? If it is, why isn't anyone else doing it? If not, what is the best way to present the data?

6 comments:

Genomic Repairman said...

I'm pro-B&W plot. You get a lot of data for the ink, much more than the column chart.

soil mama said...

I'm also working on a manuscript that will include some QPCR data. I was looking to see how others have presented it, and it's nearly all normal bar charts or tables. I get what you're saying about the bar chart seeming misleading by shading in the whole area, but I think most people still just look at the mean and error bars.
One reason people probably don't present box plots is that they show the "shape" of the data, so it would be really obvious whether it has a normal distribution or not. I know for a fact that many folks in our field don't know jack about stats and don't log transform before doing stats, even when their data don't fit the normality assumptions of the statistical test. If they were to present box plots, it would be very obvious that they didn't do the stats right. I've also found that some people are just presenting means and SE and not even doing stats on QPCR data.

So, I think you could/should go for the box and whisker plots. I think it's similar to soil enzyme data, where most people do standard bar charts but some do box plots since they like them better. Just be aware that if your data doesn't look pretty, it will be much more obvious with the box plots.

*I use SigmaPlot :)

Thomas Joseph said...

Good points soil mama, and I have nothing to hide! B&W plots for me (and I did log transform my data before doing my statistics). We use SAS for our statistics, but their graphing options stink so I splurged for Analyze-It! :)
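For anyone following along, the log transform mentioned here can be sketched in a few lines. This is only an illustration under stated assumptions: the copy numbers are hypothetical, base 10 is assumed, and it is not the SAS workflow used for the manuscript.

```python
import math
import statistics

# Hypothetical gene copy numbers per sample; NOT data from the manuscript.
copies = [2.0e5, 8.5e5, 1.3e6, 4.0e6, 9.1e6]


def log10_transform(values):
    """Return log10 of each value; every value must be positive."""
    if any(v <= 0 for v in values):
        raise ValueError("log10 is undefined for zero or negative copy numbers")
    return [math.log10(v) for v in values]


log_copies = log10_transform(copies)

# The raw mean is pulled toward the largest samples. Averaging the logs and
# back-transforming gives the geometric mean, which is less sensitive to them.
raw_mean = statistics.mean(copies)
geometric_mean = 10 ** statistics.mean(log_copies)

print(log_copies)
print(raw_mean, geometric_mean)
```

Data spanning several orders of magnitude are usually strongly right-skewed on the raw scale, which is why the transform is applied before tests that assume normality.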

Philip H. said...

What - you're rebelling against SAS-Graph? Oh the burn . . .

Ok, back to reality. I think box and whisker is an excellent choice for ranges of data over a population where your results don't trend to zero (so your right-hand figure). If they do trend to zero, or you have data skewed to zero one way or the other, then they aren't as descriptive because they won't do as good a job of showing standard deviation or anything like that.

Thomas Joseph said...

None of my data trends to zero. They typically ranged from 10^5 to 10^7 copies of gene (ABC), (DEF), or (XYZ) in a particular environment, so even log transformed, the values range from 4.xx to 7.xx.

And while SAS is great at doing analysis ... I'm not too fond of the lengths I need to go to to get that data. *grumble*

PS: Saw your comment over at The Big Stick. Next time I'm in DC I'll drop you an email and we can meet for lunch or something.

Philip H. said...

That would be fantastic, but my office is actually out in Silver Spring. I do run into town a lot, however, so with enough notice I can easily be there.