Friday, June 20, 2008

GenBank is Broken

Meet GenBank. NCBI describes it as "the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences."

GenBank might be annotated, but it certainly isn't curated. As I intend to blog about in more depth eventually, GenBank contains many incorrect sequences. One example is the multitude of chimeric 16S sequences scattered throughout the database.

The question is ... what to do with them? Sandra, over at DigitalBio, has written several blog entries on this topic lately. As someone who is critical of GenBank (which maintains a non-redundant ("nr") library that it admits is no longer non-redundant), I offered my own solution.

Appointed editors.

Here is what I wrote (slightly edited to make my point clearer):
Does GenBank need to be fixed? Yes, at least in the sense that sequences which are clearly proven to be erroneous (such as chimeric 16S sequences) need to be marked as such. I wouldn't necessarily remove them, because they can still serve a useful purpose (such as being test samples for chimeric sequence detectors). However, they need to be re-annotated.

My recommendation? Have GenBank editors, members of the scientific community, who field reports on sequences from the scientific community and make determinations as to the "final call" on a particular reported sequence. This process should be non-anonymous, and it should be documented in the gene record (to avoid constant reporting of the same sequences). Someone spots a problem, they file a report. The report is handled by the editor who then contacts the individual who submitted the sequence (i.e., the submitter). The submitter can then defend their submission, or admit the error. If they defend their submission, the editor gets to make the final call, weighing all the evidence. This way it doesn't fall on the GenBank staff to handle everything, and also gives people in the community a chance to put something else on their CV.
Sandra (who liked the idea) described it as follows:
Temporary editor positions could be analogous to being a program officer at the NSF, or serving on a study section or being on committee for reviewing grant proposals.
Which is exactly what I had in mind. A large portion of the scientific community already volunteers its time in such ways, so this would be nothing extraordinary.
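
For the programmers in the audience, the report-handling process I proposed above can be sketched in a few lines of Python. This is only an illustration of the workflow, not anything GenBank actually runs: the `Report` class, the `Status` values, the accession "XX000000", and the names of the people involved are all made up for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    REPORTED = "reported"
    AWAITING_SUBMITTER = "awaiting submitter response"
    FLAGGED = "re-annotated as erroneous"
    UPHELD = "upheld as valid"


@dataclass
class Report:
    """One problem report against a sequence record (hypothetical model)."""
    accession: str   # placeholder accession, not a real record
    reporter: str    # reports are non-anonymous
    editor: str      # community editor handling the report
    reason: str
    status: Status = Status.REPORTED
    history: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The history would live in the record, to avoid repeat reports.
        self.history.append(
            f"{self.reporter} reported {self.accession}: {self.reason}"
        )

    def contact_submitter(self, submitter: str) -> None:
        self.status = Status.AWAITING_SUBMITTER
        self.history.append(f"{self.editor} contacted submitter {submitter}")

    def resolve(self, submitter_admits_error: bool,
                editor_finds_error: bool = False) -> Status:
        if submitter_admits_error:
            self.status = Status.FLAGGED
            self.history.append("submitter admitted the error")
        else:
            # Submitter defends the sequence, so the editor makes the final call.
            self.status = Status.FLAGGED if editor_finds_error else Status.UPHELD
            self.history.append(
                f"{self.editor} made the final call: {self.status.value}"
            )
        return self.status


report = Report("XX000000", "reader_a", "editor_b",
                "suspected chimeric 16S sequence")
report.contact_submitter("submitter_c")
report.resolve(submitter_admits_error=False, editor_finds_error=True)
for entry in report.history:
    print(entry)
```

Note that a flagged sequence is re-annotated rather than deleted, so it stays available as a test case for chimera detectors.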
