Chromosome 13
1.1. Group composition
Participating country(ies): South Korea
Head: Min-Sik Kim
Members:
- Keun Na (Functional study, C. elegans)
- Ju-Wan Kim (LC-MS analysis)
- Jin-Young Cho (Proteomics, LC-MS analysis)
- Chae-Yeon Kim (Bioinformatics)
1.2. Missing proteins
Protein evidence according to neXtProt for Chromosome 13:
Protein evidence status in text format is available here.
1.3. Bioinformatics protocols
Reference Databases (version or release date)
- neXtProt (Newest Release): for protein information
- Ensembl (Newest Release): for gene information
- Guide to the Human Proteome (Newest Release) from the Global Proteome Machine database (GPMdb): for MS information.
- PeptideAtlas (Newest Release): for MS information.
- Human Protein Atlas (HPA; Newest Release): referenced for information regarding antibody availability and tissue expression.
- Online Mendelian Inheritance in Man (OMIM): for disease-related information
- Cancer Gene Census (Newest Release): for oncogene product information
Protein identification
- The tandem mass spectrometry (MS/MS) spectra were extracted and searched using MASCOT software (version 2.6.0, http://www.matrixscience.com/) against human sequences from neXtProt (Newest Release).
- The search parameters were:
- enzyme specificity: trypsin
- two maximum missed cleavages
- carbamidomethyl (C) as fixed modification
- acetyl (K), acetyl (protein N-term), and oxidation (M) as variable modifications
- peptide mass tolerance of 10 ppm
- MS/MS mass tolerance of 1.2 Da
- PeptideProphet and ProteinProphet were used to estimate the false discovery rate (FDR).
- We identified proteins using two or more unique peptides with an FDR < 1% at the protein level.
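The tolerance and acceptance criteria above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ProteinHit` record and the function names are hypothetical, and the protein-level FDR is taken as an input that an upstream tool (such as ProteinProphet) has already estimated; the sketch does not reproduce how that estimate is computed.

```python
from dataclasses import dataclass

PEPTIDE_TOL_PPM = 10.0    # peptide (precursor) mass tolerance from the search parameters
MIN_UNIQUE_PEPTIDES = 2   # "two or more unique peptides"
MAX_PROTEIN_FDR = 0.01    # "FDR < 1% at the protein level"


def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_peptide_tolerance(observed_mass: float, theoretical_mass: float,
                             tol_ppm: float = PEPTIDE_TOL_PPM) -> bool:
    """True if the absolute mass error does not exceed the ppm tolerance."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tol_ppm


@dataclass
class ProteinHit:
    # Hypothetical record layout, used for illustration only.
    accession: str
    unique_peptides: frozenset   # distinct peptide sequences unique to this protein
    protein_fdr: float           # protein-level FDR estimated upstream


def accept(hit: ProteinHit) -> bool:
    """Apply the two acceptance criteria stated in the protocol."""
    return (len(hit.unique_peptides) >= MIN_UNIQUE_PEPTIDES
            and hit.protein_fdr < MAX_PROTEIN_FDR)
```

For example, a precursor observed at 1000.005 Da against a theoretical mass of 1000.000 Da has an error of about 5 ppm and passes the 10 ppm window, whereas one observed at 1000.020 Da (about 20 ppm) does not.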
Protein quantification
- Proteome Discoverer software (version 1.3; Thermo Fisher) was used for protein identification and quantification.
- For TMT-labeled peptides, the TMT6 modification (+229 Da) at peptide N termini and at lysines was set as a fixed modification.
- Quantification was performed by calculating the ratio between the peak areas of the TMT reporter groups.
- To eliminate masking of changes in expression due to peptides that are shared between proteins, we calculated the protein ratio using only ratios from the spectra that are distinct to each protein.
- All quantitative results were normalized using protein medians (minimum protein count: 20).
- Quantification values were rejected unless all quantification channels were present.
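The quantification rules above can be illustrated with a minimal Python sketch. It is not the Proteome Discoverer implementation. Several points are assumptions made for illustration: the input layout (a set of matched protein accessions plus a dictionary of reporter peak areas per spectrum), the use of the median to combine spectrum ratios into a protein ratio, and the reading of "minimum protein count: 20" as the smallest number of proteins for which median normalization is applied.

```python
from statistics import median

MIN_PROTEIN_COUNT = 20  # "minimum protein count: 20" (interpretation assumed here)


def spectrum_ratio(reporter_areas, numerator, denominator):
    """Return the numerator/denominator reporter ratio for one spectrum.

    Every channel is expected as a key; a value of None or <= 0 marks a
    missing channel, in which case the whole spectrum is rejected (None).
    """
    if any(area is None or area <= 0 for area in reporter_areas.values()):
        return None
    return reporter_areas[numerator] / reporter_areas[denominator]


def protein_ratios(spectra, numerator, denominator):
    """Combine spectrum ratios into protein ratios.

    spectra: iterable of (matched_proteins, reporter_areas) pairs.
    Only spectra that match exactly one protein are used.
    """
    per_protein = {}
    for proteins, areas in spectra:
        if len(proteins) != 1:          # shared between proteins: skip
            continue
        ratio = spectrum_ratio(areas, numerator, denominator)
        if ratio is None:               # at least one channel missing: skip
            continue
        (protein,) = proteins
        per_protein.setdefault(protein, []).append(ratio)
    # Assumption: the median summarizes the spectrum ratios of each protein.
    return {p: median(r) for p, r in per_protein.items()}


def normalize_by_protein_median(ratios, min_count=MIN_PROTEIN_COUNT):
    """Divide every protein ratio by the median protein ratio.

    Returns the ratios unchanged when fewer than min_count proteins exist.
    """
    if len(ratios) < min_count:
        return dict(ratios)
    center = median(ratios.values())
    return {p: r / center for p, r in ratios.items()}
```

A usage example with two hypothetical reporter channels, "126" and "127":

```python
spectra = [
    ({"A"}, {"126": 100.0, "127": 200.0}),
    ({"A"}, {"126": 100.0, "127": 400.0}),
    ({"A", "B"}, {"126": 1.0, "127": 9.0}),   # shared spectrum, ignored
    ({"B"}, {"126": None, "127": 5.0}),       # missing channel, rejected
    ({"B"}, {"126": 50.0, "127": 50.0}),
]
ratios = protein_ratios(spectra, "127", "126")
normalized = normalize_by_protein_median(ratios, min_count=2)
```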
1.4. Analytical protocols
- For proteomic profiling, tryptic digests of placental tissue proteins were fractionated using three different methods: hydrophilic interaction chromatography, strong cation exchange chromatography, and OFFGEL electrophoresis, according to the manufacturers’ protocols (Agilent Technologies). In addition, membrane fractionation for identifying membrane proteins in placental tissue was conducted as previously described. An LTQ-Orbitrap mass spectrometer (Thermo Fisher, San Jose, CA) was used to acquire mass spectra for protein identification.
1.5. Biological projects
- Identification of missing-protein-driven bio-signatures for cancer, metabolic disease, and other biological regulation.
1.6. Biomaterials
- Placental tissue samples from healthy individuals as well as from patients with pre-eclampsia.
- Paired cancer tissue and plasma samples.
1.7. Data sets
1.8. Special expertise
- Construction of database
- Bioinformatic tool development
- Cancer proteomics
- Developmental biology
- Functional studies in model systems (C. elegans, cell lines)
1.9. Major achievements
- Set up the home page for C-HPP (April 2012)
- Constructed GenomewidePDB (December 2012)
- Updated to GenomewidePDB 2.0 (November 2015)
Publications
- Paik, Y. K. et al. (2012) The Chromosome-Centric Human Proteome Project for cataloging proteins encoded in the genome. Nat. Biotechnol., 30, 221-223. (PMID 22398612)
- Paik, Y. K. and Hancock, W. S. (2012) Uniting ENCODE with genome-wide proteomics. Nat. Biotechnol., 30, 1065-1067. (PMID 23138303)
- Lee, H. J. et al. (2013) Comprehensive genome-wide proteomic analysis of human placental tissue for the Chromosome-Centric Human Proteome Project. J. Proteome Res., 12, 2458-2466. (PMID 23362793)
- Jeong, S. K. et al. (2013) GenomewidePDB, a proteomic database exploring the comprehensive protein parts list and transcriptome landscape in human chromosomes. J. Proteome Res., 12, 106-111. (PMID 23252913)
- Paik, Y. K. et al. (2014) Genome-wide proteomics, Chromosome-Centric Human Proteome Project (C-HPP), part II. J. Proteome Res., 13, 1-4. (PMID 24328071)
- Paik, Y. K. et al. (2015) Recent Advances in the Chromosome-Centric Human Proteome Project: Missing Proteins in the Spot Light. J. Proteome Res., 14, 3409-3414. (PMID 26337862)
- Jeong, S. K. et al. (2015) GenomewidePDB 2.0: A Newly Upgraded Versatile Proteogenomic Database for the Chromosome-Centric Human Proteome Project. J. Proteome Res., 14, 3710-3719. (PMID 26272709)
- Cho, J. Y. et al. (2015) Combination of Multiple Spectral Libraries Improves the Current Search Methods Used to Identify Missing Proteins in the Chromosome-Centric Human Proteome Project. J. Proteome Res., 14, 4959-4966. (PMID 26330117)
- Paik, Y. K. et al. (2016) Progress in the Chromosome-Centric Human Proteome Project as Highlighted in the Annual Special Issue IV. J. Proteome Res., 15, 3945-3950. (PMID 27809547)
- Kim, J. W. et al. (2016) gFinder: A Web-Based Bioinformatics Tool for the Analysis of N-Glycopeptides. J. Proteome Res., 15, 4116-4125. (PMID 27573070)
- Paik, Y. K. et al. (2018) Launching the C-HPP neXt-CP50 Pilot Project for Functional Characterization of Identified Proteins with No Known Function. J. Proteome Res., 17, 4042-4050. (PMID 30269496)
- Paik, Y. K. et al. (2018) Toward Completion of the Human Proteome Parts List: Progress Uncovering Proteins That Are Missing or Have Unknown Function and Developing Analytical Methods. J. Proteome Res., 17, 4023-4030. (PMID 30985145)
- Jeong, S. K. et al. (2018) ASV-ID, a Proteogenomic Workflow to Predict the Candidate Protein Isoforms based on Transcript Evidence. J. Proteome Res., 17, 4235-4242. (PMID 30289715)
1.10. The neXt-MP50 challenge
Top-Down Approaches
- Steps
- 20~30 missing proteins from liver (normal-like, tumor, and/or cell lines), mapped across all chromosomes
- 20~30 missing proteins from pancreas (normal-like, tumor, and/or cell lines), mapped across all chromosomes
- 10~20 missing proteins from placenta (normal, pre-eclampsia), mapped across all chromosomes
- Total missing proteins: 50, mapped across all chromosomes
- Milestone dates:
- Step 1. Jul. 2017
- Step 2. Feb. 2018
- Step 3. Aug. 2018
Bottom-Up Approaches
- Steps
- Select target missing proteins according to defined criteria (available/target sample, transcript expression level, number of proteotypic peptides, etc.).
- Acquire MS spectra of synthetic peptides for the target proteins.
- Discover and validate missing proteins.
- Total missing proteins: 50, mapped across all chromosomes
- Milestone dates:
- Step 1. Feb. 2017
- Step 2. Feb. 2018
- Step 3. Aug. 2018