Bioinformatic Resources
Here is an incomplete compilation of various bioinformatic resources
accessible on the web. The major categories include:
- databases: genomes, sequences, structures, promoters
- tools: sequence search, RNA folding, gene finding, DNA motif finding, multiple alignment, cellular processes
- miscellaneous links: on-line tutorials, conferences, public institutions, genomic companies, journals
Please e-mail Nick Buchler if there are additional links you would like to see included.
Genomic Databases
- E. coli Genome Database: the most up-to-date, consolidated information about E. coli genes and proteins, drawn from both traditional experimental research and computational analysis.
- Saccharomyces Genome Database: a scientific database of the molecular biology and genetics of the yeast S. cerevisiae.
- C. elegans Genome Database: a repository of mapping, sequencing, and phenotypic information about the nematode C. elegans.
- Drosophila Genome Database: a database of genetic and molecular data for Drosophila, covering all species in the family Drosophilidae.
- Mouse Genome Informatics: provides integrated access to data on the genetics, genomics, and biology of the laboratory mouse.
- Human-Mouse Homology Map: constructed by integrating orthologs curated by the Mouse Genome Database with putative orthologs identified by sequence homology.
- Ensembl.Org: a joint project between EMBL-EBI and the Sanger Centre to develop a software system that produces and maintains automatic annotation of eukaryotic genomes.
Sequence Databases
- GenBank: the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.
- SwissProt: a curated protein sequence database that strives to provide a high level of annotation (such as the description of a protein's function, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy, and a high level of integration with other databases.
- ExPASy: the Expert Protein Analysis System, the proteomics server of the Swiss Institute of Bioinformatics, which also hosts SwissProt. It is dedicated to the analysis of protein sequences and structures, as well as 2-D PAGE (two-dimensional gel electrophoresis).
- Pfam: a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains. Version 5.5 of Pfam (Sept 2000) contains alignments and models for 2478 protein families, based on the SwissProt 38 and SP-TrEMBL 11 protein sequence databases.
Structural Databases
- Protein Data Bank (PDB): the international repository for the processing and distribution of 3-D macromolecular structure data, primarily determined experimentally by X-ray crystallography and NMR.
- SCOP: built by manual inspection aided by a battery of automated methods, the Structural Classification of Proteins aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known.
- CATH: a database of structural domains, organized as a hierarchical classification of protein domain structures by class, architecture, topology, and homologous superfamily.
- Nucleic Acid Database: assembles and distributes structural information about nucleic acids and is integrated with a DNA-Binding Protein Database.
- DPInteract: a database of DNA-protein interactions in E. coli, with putative extensions to other organisms.
Promoter/Gene Regulation Databases
- TRANSFAC: compiles data about gene regulatory DNA sequences and the protein factors that bind to and act through them. Associated programs help identify putative promoter or enhancer structures and suggest their features.
- RegulonDB Database: a database on transcriptional regulation in E. coli.
- SCPD: the Saccharomyces cerevisiae promoter database provides information on the promoter regions of 6000 genes and ORFs (open reading frames), along with the regulatory elements and transcription factors involved.
- Eukaryotic Promoter Database (EPD): an annotated, non-redundant collection of eukaryotic Pol II promoters whose transcription start sites have been determined experimentally. EPD is structured to facilitate dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis.
Sequence Alignment Tools
- BLAST: a set of similarity search programs designed to explore all of the available sequence databases in GenBank, regardless of whether the query is protein or DNA.
- FastA: compares a protein sequence to another protein sequence or to a protein database, or a DNA sequence to another DNA sequence or a DNA library.
- HMMer: uses profile hidden Markov models, statistical descriptions of a sequence family's consensus, for sensitive database searching.
- SAM: the Sequence Alignment and Modeling system, based on hidden Markov models (HMMs) and Dirichlet mixtures.
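BLAST and FastA use word-based heuristics to find high-scoring local alignments quickly; the exact dynamic-programming method they approximate is Smith-Waterman local alignment. A minimal Python sketch of the Smith-Waterman score follows. The scoring values are illustrative assumptions (not the defaults of either program), and gaps use a simple linear penalty rather than the affine gaps real tools use.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of a and b (linear gap penalty)."""
    # prev holds row i-1 of the DP matrix; cell j is the best score of a
    # local alignment ending at a[i-1] and b[j-1].
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned against a gap
            left = curr[j - 1] + gap  # b[j-1] aligned against a gap
            # Clamping at 0 makes the alignment local: a poor prefix is
            # dropped instead of dragging the score down.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Shared core "ACGT" scores 4 matches x 2 = 8.
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Only the score is computed here; recovering the aligned residues would require a traceback from the best cell.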
RNA Folding and Stretching
- IMB Jena RNA: a veritable compendium of RNA sites on the web.
- RNA/DNA Folding: an interactive server for folding query sequences with the state-of-the-art RNA and DNA folding algorithms of the Zuker group.
- Vienna RNA Package: a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures, at both zero and finite temperatures.
- RNA Puller: this server makes quantitative predictions of force-extension curves of RNA or ssDNA molecules. It takes the secondary structure of the molecule fully into account, except for pseudoknots, and models the single-stranded pieces of the molecule as an elastic freely jointed chain.
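RNA Puller treats single-stranded segments as an elastic freely jointed chain. A minimal Python sketch of one standard form of the extensible freely jointed chain (FJC), x(F) = Lc [coth(F b / kT) - kT / (F b)] (1 + F / S), with contour length Lc, Kuhn length b, and stretch modulus S, is given below. It is an assumption that this matches the server's exact model; the sketch covers only a single-stranded segment (the secondary-structure treatment is not modeled), and the parameter values are illustrative, not taken from RNA Puller.

```python
import math

# Thermal energy k_B*T at about 298 K, in pN*nm (assumed room temperature).
KT_PN_NM = 4.11


def langevin(x: float) -> float:
    """Langevin function L(x) = coth(x) - 1/x, using its Taylor series near 0."""
    if abs(x) < 1e-4:
        return x / 3.0  # leading series term; avoids cancellation at small x
    return 1.0 / math.tanh(x) - 1.0 / x


def efjc_extension(force_pn: float, contour_nm: float, kuhn_nm: float,
                   stretch_modulus_pn: float, kt: float = KT_PN_NM) -> float:
    """Mean extension (nm) of an extensible freely jointed chain at a given force.

    x(F) = Lc * L(F*b / kT) * (1 + F / S)
    """
    return (contour_nm * langevin(force_pn * kuhn_nm / kt)
            * (1.0 + force_pn / stretch_modulus_pn))


if __name__ == "__main__":
    # Illustrative, assumed parameters: 50 nt at 0.6 nm per nucleotide,
    # Kuhn length 1.5 nm, stretch modulus 800 pN.
    for f in (1.0, 5.0, 10.0, 20.0):
        print(f, round(efjc_extension(f, 50 * 0.6, 1.5, 800.0), 2))
```

The Langevin factor captures entropic straightening of the chain, while the (1 + F/S) factor lets the chain stretch slightly beyond its contour length at high force.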
Tools for Gene Finding
- GenScan: provides access to the program GenScan for predicting the locations and exon-intron structures of genes in genomic sequences from a variety of organisms.
- GENEID: a program to predict genes (splice sites, start/stop codons, exon assembly) in anonymous genomic sequences.
- GeneParser: a program for identifying protein-coding regions in genomic DNA sequence.
- GeneSCAN: uses the Fourier transform of DNA sequences to find coding regions, which show a characteristic period-3 signal.
- GeneMark: uses a hidden Markov model (HMM) approach to find genes. Although originally written for bacterial genomes, it now has a version that works with eukaryotes.
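The Fourier idea behind GeneSCAN rests on the period-3 property of protein-coding DNA: codon-position bias makes base composition repeat every three nucleotides, which shows up as a peak at frequency 1/3 in the spectrum. A minimal Python sketch of that spectral measure is shown below; the window length, step, and normalization are assumptions for illustration, not GeneSCAN's actual parameters or scoring.

```python
import cmath


def period3_power(seq: str) -> float:
    """Spectral power of a DNA sequence at frequency 1/3.

    Each base b in ACGT gets a 0/1 indicator sequence u_b(n); the total power is
    S(1/3) = sum_b |sum_n u_b(n) * exp(-2*pi*i*n/3)|^2.
    """
    total = 0.0
    for base in "ACGT":
        coeff = sum(cmath.exp(-2j * cmath.pi * n / 3)
                    for n, ch in enumerate(seq.upper()) if ch == base)
        total += abs(coeff) ** 2
    return total


def period3_profile(seq: str, window: int = 120, step: int = 30):
    """Slide a window along seq; return (start, S(1/3) / window) per window.

    A uniformly random sequence averages about 0.75 on this scale, so clearly
    higher values hint at coding regions. Window, step, and any cutoff here are
    illustrative assumptions.
    """
    return [(start, period3_power(seq[start:start + window]) / window)
            for start in range(0, len(seq) - window + 1, step)]


if __name__ == "__main__":
    periodic = "ATG" * 40   # strongly period-3
    flat = "AC" * 60        # period 2, no power at 1/3
    print(period3_power(periodic) / 120, period3_power(flat) / 120)
```

Summing the power over all four indicator sequences avoids having to choose a numeric encoding for the bases.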
Tools for Motif Finding
- MEME and MAST: MEME discovers motifs (highly conserved regions) in groups of related DNA or protein sequences via multiple alignment; given such a motif, MAST searches for it in a sequence database.
- Meta-MEME: an extension of MEME for building and using motif-based hidden Markov models of DNA and proteins.
- Gibbs Motif Sampler: identifies motifs, i.e. conserved regions, in both DNA and protein sequences.
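The basic step in searching for a known motif, scoring a motif model at every position of a sequence, can be illustrated with a log-odds position weight matrix (PWM). This Python sketch is a simplification under assumed choices (uniform background, pseudocount 0.5, no p-values); MEME's motif models and MAST's statistics are more elaborate, and the example sites are made up for illustration.

```python
import math

BASES = "ACGT"


def pwm_from_sites(sites, pseudocount=0.5, background=None):
    """Build a log2-odds position weight matrix from aligned, equal-length sites."""
    background = background or {b: 0.25 for b in BASES}
    if len({len(s) for s in sites}) != 1:
        raise ValueError("sites must all have the same length")
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i].upper() for s in sites]
        total = len(column) + pseudocount * len(BASES)
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm


def scan(seq, pwm):
    """Score every window of seq against pwm; return (start, score), best first.

    Windows containing characters other than A, C, G, T are skipped.
    """
    width = len(pwm)
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if all(ch in BASES for ch in window):
            hits.append((start, sum(col[ch] for col, ch in zip(pwm, window))))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]  # hypothetical example sites
    print(scan("GGGGTATAATGGGG", pwm_from_sites(sites))[:3])
```

The pseudocount keeps unseen bases from scoring negative infinity, which matters when only a handful of sites are available.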
Tools for Multiple Alignment and Phylogenetics
- CLUSTAL-W: a general-purpose multiple alignment program for DNA or proteins.
- PHYLIP: a package of programs for inferring phylogenies by parsimony, distance matrix, and likelihood methods, with bootstrapping and consensus trees for assessing the reliability of the inferred trees.
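Distance-matrix methods start from pairwise evolutionary distances between aligned sequences. A minimal Python sketch of one classic choice, the Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3) applied to the observed proportion p of differing sites, is shown below. Skipping gapped columns is an assumption made here; PHYLIP's own programs offer several other distance models. The resulting matrix would then feed a tree-building method such as neighbor joining, which is not shown.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(a: str, b: str) -> float:
    """Jukes-Cantor corrected distance d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # correction undefined: sequences look saturated
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs):
    """Pairwise Jukes-Cantor distances for a dict {name: aligned sequence}."""
    names = list(seqs)
    return {(m, n): jukes_cantor(seqs[m], seqs[n]) for m in names for n in names}


if __name__ == "__main__":
    aligned = {"s1": "ACGTACGTAC", "s2": "ACGTACGAAC", "s3": "ACGAACTTAC"}
    for pair, d in sorted(distance_matrix(aligned).items()):
        print(pair, round(d, 4))
```

The correction grows faster than p itself because a single observed difference may hide several substitutions at the same site.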
Metabolism & Genomes
- KEGG: the Kyoto Encyclopedia of Genes and Genomes is an effort to computerize current knowledge of molecular and cellular biology in terms of information pathways of interacting molecules or genes, and to provide links from the gene catalogs produced by genome sequencing projects.
- E-Cell Project: aims to build models that simulate intracellular molecular processes in order to predict the dynamic behavior of living cells.
Online Courses and Tutorials
Many of these links are to past bioinformatics courses. The course websites contain both lecture notes and, more relevantly, judiciously chosen problem sets.
Annual Conferences/Symposia
- PSB (Pacific Symposium on Biocomputing): in Hawaii on January 3rd-7th, 2001, with links to future conferences.
- Biophysical Society: in Boston, Massachusetts on February 17th-21st, 2001.
- RECOMB: in Montreal, Canada on April 22nd-25th, 2001.
- RECOMB satellite: in Los Angeles on May 19th-20th, 2001.
- RNA: in Banff, Canada on May 29th-June 3rd, 2001.
- ISMB: in Copenhagen, Denmark on July 21st-25th, 2001.
- CSHL: Computational Biology meeting at Cold Spring Harbor Laboratory on September 28th-30th, 2001.
Public Institutions
- NIGMS: the National Institute of General Medical Sciences. Its research and links are geared toward basic biomedical research that is not targeted to specific diseases but that increases understanding of life processes and lays the foundation for advances in disease diagnosis, treatment, and prevention.
- DOE-OBER: the DOE Office of Biological and Environmental Research.
- NCBI: the National Center for Biotechnology Information.
- Sanger Centre: a genome research centre founded by the Wellcome Trust and the Medical Research Council. Its purpose is to further knowledge of genomes, particularly through large-scale sequencing and analysis.
- EMBL: the European Molecular Biology Laboratory, with links to genomes and computational resources.
Genomics Companies
Journals