Peptide/Protein Identification

Presentation by Lennart Martens (University of Gent, Computational Omics and Systems Biology Group) illustrating three ways in which his group reuses public proteomics data, including publicly available phosphorylation data in PRIDE. May 30, 2013.

Comprehensive annotation of gene sets and gene set enrichment analysis

The Molecular Signatures Database (MSigDB) is a collection of annotated gene sets in 7 categories:

  1. positional gene sets for each human chromosome and cytogenetic band.
  2. curated gene sets from online pathway databases, publications in PubMed, and knowledge of domain experts.
  3. motif gene sets based on conserved cis-regulatory motifs from a comparative analysis of the human, mouse, rat, and dog genomes.
  4. computational gene sets defined by mining large collections of cancer-oriented microarray data.
  5. GO gene sets consisting of genes annotated with the same GO terms.
  6. oncogenic signatures defined directly from microarray gene expression data from cancer gene perturbations.
  7. immunologic signatures defined directly from microarray gene expression data from immunologic studies.

This annotation can be used together with the Gene Set Enrichment Analysis (GSEA) approach, which determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).
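The idea behind the enrichment test can be illustrated with a minimal sketch. It assumes the genes have already been ranked by their difference between the two biological states, and it uses an unweighted running-sum statistic with gene-set permutation. This is a simplification: the published GSEA method uses a weighted statistic and, by default, permutes phenotype labels. The function names `enrichment_score` and `permutation_p_value` are hypothetical and are not part of the GSEA software.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (simplified, not the official GSEA code).

    Walk down the ranked list, stepping up at each gene in the set and
    down at each gene outside it; return the largest deviation from zero.
    """
    members = set(gene_set)
    n_total = len(ranked_genes)
    n_hits = sum(1 for gene in ranked_genes if gene in members)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("gene set must overlap the ranked list without covering it")

    hit_step = 1.0 / n_hits
    miss_step = 1.0 / (n_total - n_hits)

    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        running += hit_step if gene in members else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Empirical p-value from random gene sets of the same size (an assumption;
    the published method permutes phenotype labels by default)."""
    rng = random.Random(seed)
    members = set(gene_set)
    observed = abs(enrichment_score(ranked_genes, members))
    size = sum(1 for gene in ranked_genes if gene in members)

    at_least_as_extreme = 0
    for _ in range(n_perm):
        random_set = rng.sample(ranked_genes, size)
        if abs(enrichment_score(ranked_genes, random_set)) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)
```

A gene set concentrated at the top of the ranking gives a score close to 1, one concentrated at the bottom gives a score close to -1, and a set scattered evenly through the list stays near 0.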

Comprehensive encyclopedia of genomic functional elements in the model organisms

The modENCODE Project aims to identify all of the sequence-based functional elements in the Caenorhabditis elegans and Drosophila melanogaster genomes.

LocTree3 prediction tool for protein subcellular localisation

The LocTree3 tool from Rostlab can be used to predict the subcellular localisation of proteins. Protein accession numbers can be provided in a text file or entered individually in an online text box. Further details are available in Goldberg et al. This tool can be used to identify Dataset E, defined at the Busan Workshop.