Tuesday, November 11, 2008

Bioinformatics

So, for the microbiologists/molecular biologists/geneticists who read this blog (I know there are a couple), what programs/software do you use for sequence analysis/protein analysis/molecular biological applications?

Here's my list:

IN HOUSE

Geneious ($249 subscription/year)
Used: Daily
Official description:
Geneious Pro is an integrated, cross-platform bioinformatics software suite for manipulating, finding, sharing, and exploring biological data such as DNA sequences or proteins, phylogenies, 3D structure information, publications, etc. It features sequence alignment and phylogenetic analysis, contig assembly, primer design and restriction analysis, access to NCBI and UniProt, BLAST, protein structure viewing, automated PubMed searching, and more. It even includes an API for creating your own plugins.

What I use it for:
Geneious is the workhorse application in my laboratory for DNA sequence analysis (chromatogram/sequence quality) and editing (vector and quality trimming). I also use it for contig assembly of genes/organisms and for alignment of 16S sequences for downstream phylogenetic analysis (see MEGA, DnaSP, and DAMBE below). It can also construct quick phylogenetic trees for routine examination. The subscription package allows me to receive regular updates. The only comparable application I’ve found that works well on Windows XP is Sequencher (the 2007 purchase quote was $2,975, and major updates would require another purchase).
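For readers who have not seen quality trimming spelled out, here is a minimal Python sketch of the idea. It is not how Geneious does it internally. It assumes the basecaller has given one Phred-style quality score per base, and the function name and the cutoff of 20 are my own illustrative choices.

```python
def trim_low_quality_ends(seq, quals, min_q=20):
    """Trim bases from both ends until a base with quality >= min_q is reached.

    seq   : string of base calls
    quals : list of integer quality scores, one per base (assumed Phred-style)
    min_q : hypothetical cutoff; pick whatever suits your data
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must be the same length")

    # Walk in from the 5' end past low-quality bases.
    start = 0
    while start < len(seq) and quals[start] < min_q:
        start += 1

    # Walk in from the 3' end past low-quality bases.
    end = len(seq)
    while end > start and quals[end - 1] < min_q:
        end -= 1

    return seq[start:end], quals[start:end]
```

Real trimmers usually look at a window of bases rather than single positions, so treat this as the simplest possible version of the idea.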

Artemis (freeware)
Used: Moderately (several times a month)
Official description:
Artemis is a free genome viewer and annotation tool that allows visualization of sequence features and the results of analyses within the context of the sequence, and its six-frame translation. Artemis is written in Java, and is available for UNIX, GNU/Linux, BSD, Macintosh and MS Windows systems. It can read complete EMBL and GENBANK database entries or sequence in FASTA or raw format. Extra sequence features can be in EMBL, GENBANK or GFF format.

What I use it for:
Artemis is a valuable tool for examining completed genomes. I can search by gene, sequence, or functional category for items of interest. GenBank houses over 630 completed microbial genomes (631 as of 02/08/08).
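The six-frame translation mentioned in the Artemis description can be sketched in a few lines of Python. This is only an illustration of the concept, not Artemis code. It assumes the standard genetic code, uppercase A/C/G/T input, and marks any codon it cannot look up as "X"; the function names are mine.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only are swapped)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from the start of seq; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

Calling `six_frame_translation("ATGGCCTAA")` gives "MA*" for frame +1, with the other five frames keyed the same way.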

MEGA ver4.0 (freeware)
Used: Moderately
Official description:
MEGA is an integrated tool for conducting automatic and manual sequence alignment, inferring phylogenetic trees, mining web-based databases, estimating rates of molecular evolution, and testing evolutionary hypotheses.

What I use it for:
MEGA is my primary phylogenetic tree-building program. It constructs publication-quality phylogenetic trees and is also used for molecular evolution and population genetic analysis. The MEGA alignment format is well established, and most programs can import and export it. Geneious exports alignment data in MEGA format, so the two programs can be used together. MEGA also has a sequence editor for quick, minor alignment editing.

DnaSP (freeware)
Used: Infrequently/Rarely
Official description:
DnaSP, DNA Sequence Polymorphism, is a software package for the analysis of nucleotide polymorphism from aligned DNA sequence data. DnaSP can estimate several measures of DNA sequence variation within and between populations (in noncoding, synonymous or nonsynonymous sites, or in various sorts of codon positions), as well as linkage disequilibrium, recombination, gene flow and gene conversion parameters. DnaSP can also carry out several tests of neutrality: Hudson, Kreitman and Aguadé, Tajima, McDonald and Kreitman, Fu and Li, and Fu tests. Additionally, DnaSP can estimate the confidence intervals of some test-statistics by the coalescent. The results of the analyses are displayed on tabular and graphic form.

What I use it for:
DnaSP is primarily used to determine genotype numbers. Genotypes are based on SNP information derived from DNA sequencing (typically MLST – multi-locus sequence typing) of closely related strains/isolates.
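To make "genotype numbers" concrete, here is a small Python sketch that groups aligned sequences into distinct genotypes and lists the variable (SNP) sites. It is not DnaSP's implementation. It assumes the sequences are already aligned to the same length (for MLST, the loci would be concatenated first), and the strain names in the usage note are made up.

```python
from collections import defaultdict


def count_genotypes(aligned):
    """Group strains that share an identical aligned sequence.

    aligned : dict mapping strain name -> aligned sequence (all the same length)
    Returns a list of groups; each group is a list of strain names.
    """
    if len({len(s) for s in aligned.values()}) > 1:
        raise ValueError("all sequences must be aligned to the same length")

    groups = defaultdict(list)
    for name, seq in aligned.items():
        groups[seq.upper()].append(name)
    return list(groups.values())


def variable_sites(aligned):
    """Return 0-based alignment columns where more than one character occurs."""
    seqs = [s.upper() for s in aligned.values()]
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]
```

With four hypothetical strains, `{"A": "ACGTAC", "B": "ACGTAC", "C": "ACTTAC", "D": "ACGTAA"}`, this reports three genotypes (A and B are identical) and variable sites at columns 2 and 5. Note that gaps are treated as just another character here, which may or may not be what you want.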

DAMBE (freeware)
Used: Infrequently/Rarely
Official description:
Data analysis in molecular biology and evolution. It is an integrated software package for retrieving, organizing, manipulating, aligning, and analyzing molecular sequence data. Allele frequency data can also be used by DAMBE for calculating genetic distances or phylogenetic reconstruction.

What I use it for:
DAMBE does not see frequent usage in the lab, but it is sometimes useful for determining genotype numbers from complex sequences (for example, it ignores gapped sequences).

TotalLab 120 DM ($6,000)
Used: Moderately
Official description:
The TL120 version in the TotalLab range is an advanced image analysis solution which offers an extensive range of features for the in-depth analysis of 1D electrophoresis gels and performing band pattern matching studies. TL120 DM is the TL120 analysis software complete with the DM database component so you can archive all your analysed results and perform cross experiment investigations.

What I use it for:
This program is integral to the analysis of our ribosomal intergenic spacer analysis (RISA) data, which is collected on a LiCor DNA sequencer. It allows us to look at archaeal, eubacterial, and fungal population patterns in a sample and then compare that gel image to other images/samples to construct a phylogenetic relationship between them. The DM option allows us to store this information in a database for comparison of data between experiments. This will enhance our ability to compare samples across time (date of analysis and time of collection) and space (place of collection).
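Comparing band patterns between lanes can be sketched with a Dice similarity score. This is a common way to compare fingerprint-style gel data, but I am not claiming it is what TotalLab computes. The sketch assumes each lane has already been reduced to a list of band sizes, and the 2-unit matching tolerance is an arbitrary illustrative value.

```python
def match_bands(lane_a, lane_b, tolerance=2.0):
    """Count bands shared by two lanes, pairing bands within `tolerance` of each other.

    Uses a simple greedy walk over the sorted band sizes; each band is
    matched at most once.
    """
    a, b = sorted(lane_a), sorted(lane_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


def dice_similarity(lane_a, lane_b, tolerance=2.0):
    """Dice coefficient: 2 * shared bands / total bands in both lanes."""
    total = len(lane_a) + len(lane_b)
    if total == 0:
        return 0.0
    return 2 * match_bands(lane_a, lane_b, tolerance) / total
```

A matrix of `1 - dice_similarity(...)` values between all pairs of samples is the kind of distance table that a clustering or tree-building step could then work from.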

ONLINE RESOURCES

NCBI (National Center for Biotechnology Information)
Site: http://www.ncbi.nlm.nih.gov/
Used: Daily
Official Description:
Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information - all for the better understanding of molecular processes affecting human health and disease.

What I use it for:
PubMed, which is linked to the U.S. National Library of Medicine, serves as my primary reference search tool. NCBI houses several major databases, ranging from nucleotide and protein to taxonomic and structure/function. NCBI also has databases dedicated to SNP (Single Nucleotide Polymorphism), EST (Expressed Sequence Tag), and GEO (Gene Expression Omnibus) analysis. The databases cover all forms of life, from eukaryotic (animal and plant) to prokaryotic (archaeal and eubacterial). NCBI also hosts BLAST (Basic Local Alignment Search Tool), which is used to examine the similarity of a sequence (nucleotide or protein) to previously identified sequences.

Ribosomal Database Project
(Michigan State University, J.M. Tiedje)
Site: https://rdp.cme.msu.edu/
Used: Daily
Official Description:
The Ribosomal Database Project (RDP) provides ribosome related data and services to the scientific community, including online data analysis and aligned and annotated Bacterial small-subunit 16S rRNA sequences.

What I use it for:
After sequencing 16S clones, we use the RDP database to classify them (Phylum/Class/Order/Family/Genus/Species) so they can be separated for further phylogenetic analysis. The RDP also has sequence match functions that identify closely related sequences, which are useful when building phylogenetic trees (typically using MEGA).

Bellerophon
Site: http://foo.maths.uq.edu.au/~huber/bellerophon.pl
Used: Daily
Official Description:
Bellerophon is a program for detecting chimeric sequences in a multiple sequence dataset by comparative analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries but can be applied to other gene datasets. A chimeric sequence, or chimera for short, is a sequence comprised of two or more phylogenetically distinct parent sequences. Chimeras are usually PCR artifacts thought to occur when a prematurely terminated amplicon reanneals to a foreign DNA strand and is copied to completion in the following PCR cycles. The point at which the chimeric sequence changes from one parent to the next is called the breakpoint or conversion point.

What I use it for:
Chimera detection in 16S sequences.
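To illustrate what a chimera looks like in sequence terms, here is a toy Python check. It is emphatically not Bellerophon's algorithm, just a sketch of the underlying idea: if the part of a sequence to the left of a breakpoint is most similar to one reference and the part to the right is most similar to a different reference, the sequence is worth a closer look. It assumes the query and references are already aligned to the same length, and all names are hypothetical.

```python
def identity(a, b):
    """Fraction of positions that match between two equal-length strings."""
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def chimera_check(query, refs, breakpoint):
    """Compare the best-matching reference on each side of a candidate breakpoint.

    query      : aligned query sequence
    refs       : dict mapping reference name -> aligned sequence (same length)
    breakpoint : 0-based position splitting the alignment into left and right
    Returns (left_parent, right_parent, parents_differ).
    """
    left = {n: identity(query[:breakpoint], s[:breakpoint]) for n, s in refs.items()}
    right = {n: identity(query[breakpoint:], s[breakpoint:]) for n, s in refs.items()}

    left_parent = max(left, key=left.get)
    right_parent = max(right, key=right.get)
    return left_parent, right_parent, left_parent != right_parent
```

With closely related references, the two sides can disagree by chance, so a flag from something this simple is a hint rather than a verdict. A real tool weighs the evidence statistically across many breakpoints.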

1 comment:

Sandra Porter said...

Hi TomJoe,

These days, I'm spending more time in the classroom, but when I am actually doing bioinformatics instead of teaching it, I am using:

1. Phred or KB for basecalling and quality trimming, and Cross_Match for vector masking - of course this is through our software system, so that I can sort high vs. low quality sequence.

2. FinchTV to view polymorphisms and BLAST high quality regions of sequence.

3. JalView with Clustal to make, view, and color multiple alignments.

4. Cn3D to view 3D structures of 16S rRNA.

5. Phrap to assemble sequences and identify SNPs and chimeras.

6. Phylip programs to make different kinds of trees.

7. Maq to align data from Next Gen sequencing experiments.

and other things, but that's enough for the moment.