Chromosome 18

1.1. Group composition

Participating country(ies): Russia
Principal Investigator: Alexander Archakov
co-PIs: Elena Ponomarenko (bioinformatics), Andrey Lisitsa (standardization)
Members:
Institute of Biomedical Chemistry (core facility), Moscow
Institute of Physical-Chemical Medicine of the FMBA, Moscow
Centre "Bioengineering", Moscow
Institute of Bioorganic Chemistry, Moscow
Institute of Molecular Biology, Moscow
Institute of Biomedical Problems, Moscow
International Tomography Center, Novosibirsk

Roadmap available at http://www.ibmc.msk.ru/content/intelligence/RHUPO_Roadmap.pdf
Key issues:

Estimating the size of the human proteome requires estimates of the number of protein species (proteome width) and of the number of copies of each protein molecule in a biosample (proteome depth).
The master proteome of a single chromosome is the result of identifying and measuring all master proteins encoded by the selected chromosome and expressed in the selected type of biological material, where a master protein is the primary translation product of the coding sequence resembling at least one of the known protein forms encoded by the gene.
The complete human proteome in this context is the result of (1) the constellation of master proteomes of different chromosomes and types of biological material and (2) data about modified protein species arising from alternative splicing (AS), single amino acid polymorphisms (SAP) and post-translational modifications (PTM).
If PTMs and SAPs can occur both in proteins encoded by the canonical gene sequence and in splice variants, the number of human protein species (proteome width) is estimated from the NeXtProt (v2013) data as 1.8 million, which corresponds to 2-DE-based experimental estimates. Under the assumption that PTMs appear exclusively in sequences of master proteins, but not in splice variants, the width of the human proteome is approximately 650 000 protein species. If all types of modification (AS, SAP and PTM) are independent events, the potential proteome width is approximately 8.5 million.
One of the most sensitive SRM technologies available reaches a detection limit of 10^-14 M for a BSA standard, but additional sample treatment with irreversible binding onto BrCN-Sepharose beads allowed a sensitivity of 10^-18 M to be achieved (the proteins CYP102 and BSA were used for calibration), which corresponds to about one protein copy/µL of blood plasma.
When operating within an ultra-low concentration range, it is convenient to refer to protein copies rather than concentrations, because this allows the results of transcriptomic and proteomic experiments to be compared.
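
The relation between molar concentration and protein copies used above can be illustrated with a short calculation. This is a minimal sketch, not part of the group's software: the function name copies_per_microliter is hypothetical, and the only inputs are the Avogadro constant and the conversion 1 L = 10^6 µL.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def copies_per_microliter(molar_concentration: float) -> float:
    """Convert a molar concentration (mol/L) into molecule copies per microliter.

    Hypothetical helper written for illustration only.
    """
    moles_per_microliter = molar_concentration * 1e-6  # 1 L = 1e6 µL
    return moles_per_microliter * AVOGADRO


if __name__ == "__main__":
    # Concentrations quoted in this document for different detection limits.
    for conc in (1e-12, 1e-14, 1e-18):
        print(f"{conc:.0e} M -> {copies_per_microliter(conc):.3g} copies/µL")
```

For 10^-18 M the result is about 0.6 copies per µL, which is of the order of the one copy per µL quoted above.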

1.2. Missing proteins

Protein evidence according to NeXtProt for Chromosome 18:
Protein evidence status in text format is available here.

Number of genes: 275
Number of proteins: 275
Number of validated proteins: 227
Number of proteins awaiting validation: 39
Number of uncertain proteins: 6

Source	Ensembl prot.-coding genes	neXtProt entries	neXtProt PE1	PeptideAtlas	GPMdb green	HPA evid. supportive	Our data
Baseline*	276	272	227	195	219	175	275
Missing	0	0	50	83	58	102	6

* from Lane et al., JPR, 2014 (cumulatively for different types of tissues and cell lines)

«In a broad sense, the scope of “missing” master proteins is restricted to unidentified proteins, which detection does not meet the gold threshold of neXtProt, the green threshold of GPMdb, the FDR of 1% at protein level in Peptide Atlas, or confident detection in Human ProteinAtlas».

In our understanding, “missing” master proteins are those absent from particular tissues and cells: these master proteins are either not expressed in the human body or cannot be detected because of the limited sensitivity of the method (Zgoda et al., JPR 2013).

LIST OF ‘MISSING’ MASTER PROTEINS FOR CHR18 IN RESULTS OF OUR SRM PROTEOME PROFILING

GENE	AC	Protein name	Our results of transcriptome profiling
			LIVER	HepG2 cell line
C18ORF64	J3KSC0	Putative uncharacterized protein encoded by LINC01387	Not expressed	Not expressed
LDLRAD4	O15165	Low-density lipoprotein receptor class A domain-containing protein 4	Expressed	Expressed
PPP4R1	Q8TF05	Serine/threonine-protein phosphatase 4 regulatory subunit 1	Expressed	Expressed
ST8SIA5	O15466	Alpha-2,8-sialyltransferase 8E	Below thresholds	Below thresholds
SERPINB10	P48595	Serpin B10	Below thresholds	Not expressed
TTC39C	Q8N584	Tetratricopeptide repeat protein 39C	Expressed	Expressed

1.3. Bioinformatics protocols

SRM data processing
SRM spectra are processed manually using MassHunter Data Analysis (Agilent) to annotate the peak groups and to assign them to peptides. The data processing protocol is available on request. Raw data are also exported and analyzed using a modified version of mQuest/mProphet (to avoid the use of decoy transitions) and by the geometrical analyzer in the proprietary SRM2Prot software. Annotated peak groups are uploaded to Panorama for browsing and to PASSEL for quality assignment and long-term storage.
We provide the following quality ranking of SRM data:

«green» for data of the highest quality: protein detected using 2 peptides, with protein copy number estimates varying by ≤ 1 order of magnitude
«yellow» for protein detected using 2 peptides, with protein copy number estimates varying by > 1 order of magnitude
«red» for protein detected using 1 peptide
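
The ranking above can be expressed as a small function. This is a minimal sketch under stated assumptions, not the group's actual implementation: the function name srm_quality is hypothetical, the "variance" is interpreted as the log10 ratio between the largest and smallest copy number estimates, and proteins with more than 2 peptides are treated like the 2-peptide case.

```python
import math
from typing import Optional, Sequence


def srm_quality(copies_by_peptide: Sequence[float]) -> Optional[str]:
    """Rank one protein from per-peptide copy number estimates (hypothetical helper).

    copies_by_peptide holds one positive copy number estimate per detected peptide.
    """
    n_peptides = len(copies_by_peptide)
    if n_peptides == 0:
        return None  # not detected; the ranking does not cover this case
    if n_peptides == 1:
        return "red"
    # Assumption: spread = orders of magnitude between the extreme estimates.
    spread = math.log10(max(copies_by_peptide) / min(copies_by_peptide))
    return "green" if spread <= 1.0 else "yellow"


if __name__ == "__main__":
    print(srm_quality([1e5, 5e5]))  # two peptides, estimates within one order
    print(srm_quality([1e3, 1e5]))  # two peptides, two orders apart
    print(srm_quality([1e4]))       # single peptide
```

Working on a log10 scale keeps the threshold independent of the absolute copy number, which spans many orders of magnitude in these measurements.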

Resources
Gene-Centric Knowledgebase (kb18.ru) – the information collected from C-HPP recommended resources is processed with respect to chromosome 18 genes and presented as a color-coded Web-matrix.
SRM Registry (pikb18.ru) – raw data on SRM measurements are stored and displayed on the Web as a temporary staging point before the spectra are transmitted to PASSEL. It currently holds 1568 spectra of endogenous and synthetic peptides.

1.4. Analytical protocols

Instrument / parameter	Q-exactive	Agilent 6490 QQQ	TSQ-Vantage QQQ
LC flow rate	Nanoflow-Dionex 300 nL/min	Microflow Agilent 1200 300 μL/min	Nanoflow-Dionex 300 nL/min
Trap	15 cm x 0.75 mm RP C18 Thermo (2 μm, 100 Å)	Nanobore RP C18 5 cm x 2.1 mm (2 μm, 100 Å)	15 cm x 0.75 mm RP C18 Thermo (2 μm, 100 Å)
Sample injected Mass / Max volume	1 μg / 1 μL	10 μg / 20 μL	1 μg / 1 μL
Sample injected (trapping column)	1 µg (C18 0.05 x 20 mm)	10 µg (none)	1 µg (C18 0.05 x 20 mm)
Gradient length for data acquisition	120 min	45 min	120 min
Peak width at half height	20 sec	5 sec	20 sec
Max injected volume	1 µL	20 µL	1 µL
Sensitivity (by BSA as a standard)	10^-11 M (10 amole/μL)	10^-14 M* (0.1 amole/μL)	10^-12 M (1 amole/μL)
Survey scan resolution	140 000	2000	2000
Protein identification	MASCOT (score>50+decoy)	Coincidence RT ± 10 s; ≥3 transitions with S/N > 7; identical transitional profiles	Coincidence RT ± 60 s; ≥3 transitions with S/N > 7; identical transitional profiles

* Sensitivity with irreversible binding: 10^-18 M (0.1 ymole/μL), with BSA and CYP102 as standards (Kopylov et al., 2013)
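
The SRM identification criteria in the table above (retention time coincidence and at least 3 transitions with S/N > 7) can be sketched as a simple check. This is an illustrative sketch only: the function name peptide_identified and its parameters are hypothetical, the reference retention time is assumed to come from a standard peptide, and the "identical transitional profiles" criterion is passed in as a precomputed flag because the source does not define how it is evaluated.

```python
from typing import Sequence


def peptide_identified(
    observed_rt_s: float,
    reference_rt_s: float,
    transition_sn: Sequence[float],
    rt_tolerance_s: float,
    profiles_match: bool,
    min_transitions: int = 3,
    min_sn: float = 7.0,
) -> bool:
    """Apply the table's SRM criteria to one peptide (hypothetical helper).

    rt_tolerance_s is 10 s for the Agilent 6490 column of the table
    and 60 s for the TSQ-Vantage column.
    """
    # Retention time must coincide with the reference within the tolerance.
    rt_ok = abs(observed_rt_s - reference_rt_s) <= rt_tolerance_s
    # Count transitions whose signal-to-noise ratio exceeds the threshold.
    confident = [sn for sn in transition_sn if sn > min_sn]
    return bool(rt_ok and len(confident) >= min_transitions and profiles_match)


if __name__ == "__main__":
    # Same peak group evaluated with the two retention time tolerances.
    print(peptide_identified(1230.0, 1200.0, [12.0, 9.5, 8.1], 10.0, True))
    print(peptide_identified(1230.0, 1200.0, [12.0, 9.5, 8.1], 60.0, True))
```

The example shows that a 30 s retention time shift is rejected under the ± 10 s window and accepted under the ± 60 s window, with all other criteria unchanged.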

1.5. Biological projects

The ongoing Plasma master proteome project aims to establish normal concentration levels for each protein using blood plasma of healthy volunteers (astronaut candidates). The creation of a multiplex SRM assay for 200 disease-associated plasma proteins is planned within the project. The clinical proof-of-principle study is based on meta-analysis and our SRM analysis of the following collections of clinical samples:

glioblastoma
colorectal cancer
risk of stroke

1.6. Biomaterials

Blood Plasma Samples – obtained from healthy men aged 18-25 years who had undergone a thorough clinical examination by the aerospace commission. Each sample is annotated with 53 clinical and biochemical parameters.
Liver Tissues – obtained from men and women aged 25-30 years who died in car accidents, and collected according to the ILS BioBanking (http://www.ilsbio.com/) protocols.
HepG2 cells – provided by the Laboratory of Proteomics, IBMC, where the HepG2 cell culture was grown in Dulbecco's modified Eagle's medium supplemented with 10% fetal bovine serum and the antibiotics gentamicin at 30 µg/ml (Invitrogen) and penicillin at 25 µg/ml (Invitrogen).

1.7. Data sets

1.8. Special expertise

Comparative ranking of chromosomes based on post-genomic data shows that the chromosomes differ little from one another, except ChrY and ChrMt, which are the shortest. Chr18 was selected for the Russian portion of the HPP based on the combination of proposed criteria (Ponomarenko et al., 2012).
meta-analysis of proteomics data from publications and databases for Chr18 shows that only 37%, 43% and 7% of the total number of master proteins encoded on the chromosome have been detected in blood plasma, liver tissue and HepG2 cells, respectively, and very few of them have been measured.
detection of ultra-low-abundance proteins at concentrations as low as 10^-18 M (about 1 molecule per 1 μL of plasma). These low-copy-number proteins are enriched from a large sample volume by irreversible binding to the beads. See Kopylov et al., 2013 for details.
quantitative correlations of transcriptome-to-proteome data to evaluate the quality of measurements obtained by RNAseq, survey and targeted proteomics. See Ponomarenko et al., 2014.
development of technology for direct molecular fishing on paramagnetic particles for protein interactomics (Ivanov et al., 2014)
cataloguing the non-canonical protein species (proteoforms) by in-depth analysis of transcriptome data (Shargunov et al., 2013) and prediction of post-translational modifications (Lisitsa et al., 2014).

1.9. Major achievements

Transcriptome profiling for chr18 in liver tissue and HepG2 cell line using RNA-Seq (Illumina and SOLiD) and RT-PCR protocols.
Proteome profiling for 276 master proteins encoded on Chr18 in blood plasma, liver tissue and HepG2 cell line using SRM.

Venn diagram representing the number of chromosome 18 proteins detected by SRM in three types of biomaterial.

Construction of the Gene-Centric Knowledgebase (kb18) and SRM Registry (pikb18.ru) databases for post-genomic data of Chr18.

Publications

Alexander I. Archakov, Elena A. Ponomarenko, Ekaterina V. Poverennaya, Andrey V. Lisitsa, Ekaterina V. Ilgisonis, Mikhail A. Pyatnitskiy, Arthur T. Kopylov, Victor G. Zgoda. The size of the human proteome: the width and depth. 2014 (submitted).
Lisitsa A, Moshkovskii S, Chernobrovkin A, Ponomarenko E, Archakov A. Profiling proteoforms: promising follow-up of proteomics for biomarker discovery. Expert Rev Proteomics. 2014 Feb;11(1):121-9. doi: 10.1586/14789450.2014.878652. PubMed PMID: 24437377.
Ponomarenko EA, Kopylov AT, Lisitsa AV, Radko SP, Kiseleva YY, Kurbatov LK, Ptitsyn KG, Tikhonova OV, Moisa AA, Novikova SE, Poverennaya EV, Ilgisonis EV, Filimonov AD, Bogolubova NA, Averchuk VV, Karalkin PA, Vakhrushev IV, Yarygin KN, Moshkovskii SA, Zgoda VG, Sokolov AS, Mazur AM, Prokhortchouck EB, Skryabin KG, Ilina EN, Kostrjukova ES, Alexeev DG, Tyakht AV, Gorbachev AY, Govorun VM, Archakov AI. Chromosome 18 transcriptoproteome of liver tissue and HepG2 cells and targeted proteome mapping in depleted plasma: update 2013. J Proteome Res. 2014 Jan 3;13(1):183-90. doi: 10.1021/pr400883x. Epub 2013 Dec 13. PubMed PMID: 24328317.
Shargunov AV, Krasnov GS, Ponomarenko EA, Lisitsa AV, Shurdov MA, Zverev VV, Archakov AI, Blinov VM. Tissue-specific alternative splicing analysis reveals the diversity of chromosome 18 transcriptome. J Proteome Res. 2014 Jan 3;13(1):173-82. doi: 10.1021/pr400808u. Epub 2013 Dec 9. PubMed PMID: 24320163.
Poverennaya EV, Bogolubova NA, Bylko NN, Ponomarenko EA, Lisitsa AV, Archakov AI. Gene-centric content management system. Biochim Biophys Acta. 2014 Jan;1844(1 Pt A):77-81. doi: 10.1016/j.bbapap.2013.08.006. Epub 2013 Aug 27. PubMed PMID: 23994227.
Ponomarenko E, Baranova A, Lisitsa A, Albar JP, Archakov A. The Chromosome-Centric Human Proteome Project at FEBS Congress. Proteomics. 2013 Nov 28. doi: 10.1002/pmic.201300373. Epub ahead of print PubMed PMID: 24285571.
Naryzhny SN, Lisitsa AV, Zgoda VG, Ponomarenko EA, Archakov AI. 2DE-based approach for estimation of number of protein species in cell. Electrophoresis. 2013 Nov 20. doi: 10.1002/elps.201300525. Epub ahead of print PubMed PMID: 24259369.
Kopylov AT, Zgoda VG, Lisitsa AV, Archakov AI. Combined use of irreversible binding and MRM technology for low- and ultralow copy-number protein detection and quantitation. Proteomics. 2013 Mar;13(5):727-42. doi: 10.1002/pmic.201100460. PubMed PMID: 23281252.
Zgoda VG, Kopylov AT, Tikhonova OV, Moisa AA, Pyndyk NV, Farafonova TE, Novikova SE, Lisitsa AV, Ponomarenko EA, Poverennaya EV, Radko SP, Khmeleva SA, Kurbatov LK, Filimonov AD, Bogolyubova NA, Ilgisonis EV, Chernobrovkin AL, Ivanov AS, Medvedev AE, Mezentsev YV, Moshkovskii SA, Naryzhny SN, Ilina EN, Kostrjukova ES, Alexeev DG, Tyakht AV, Govorun VM, Archakov AI. Chromosome 18 transcriptome profiling and targeted proteome mapping in depleted plasma, liver tissue and HepG2 cells. J Proteome Res. 2013 Jan 4;12(1):123-34. doi: 10.1021/pr300821n. Epub 2012 Dec 20. PubMed PMID: 23256950.
Archakov A, Zgoda V, Kopylov A, Naryzhny S, Chernobrovkin A, Ponomarenko E, Lisitsa A. Chromosome-centric approach to overcoming bottlenecks in the Human Proteome Project. Expert Rev Proteomics. 2012 Dec;9(6):667-76. doi: 10.1586/epr.12.54. Review. PubMed PMID: 23256676.
Ponomarenko E, Poverennaya E, Pyatnitskiy M, Lisitsa A, Moshkovskii S, Ilgisonis E, Chernobrovkin A, Archakov A. Comparative ranking of human chromosomes based on post-genomic data. OMICS. 2012 Nov;16(11):604-11. doi: 10.1089/omi.2012.0034. Epub 2012 Sep 11. PubMed PMID: 22966780.
Ershov P, Mezentsev Y, Gnedenko O, Mukha D, Yantsevich A, Britikov V, Kaluzhskiy L, Yablokov E, Molnar A, Ivanov A, Lisitsa A, Gilep A, Usanov S, Archakov A. Protein interactomics based on direct molecular fishing on paramagnetic particles: experimental simulation and SPR validation. Proteomics. 2012 Nov;12(22):3295-8. doi: 10.1002/pmic.201200135. Epub 2012 Nov 2. PubMed PMID: 23001861.
Medvedev A, Kopylov A, Buneeva O, Zgoda V, Archakov A. Affinity-based proteomic profiling: problems and achievements. Proteomics. 2012 Feb;12(4-5):621-37. doi: 10.1002/pmic.201100373. Epub 2012 Jan 19. Review. PubMed PMID: 22246677.
Archakov A, Aseev A, Bykov V, Grigoriev A, Govorun V, Ivanov V, Khlunov A, Lisitsa A, Mazurenko S, Makarov AA, Ponomarenko E, Sagdeev R, Skryabin K. Gene-centric view on the human proteome project: the example of the Russian roadmap for chromosome 18. Proteomics. 2011 May;11(10):1853-6. doi: 10.1002/pmic.201000540. PubMed PMID: 21563312.
Archakov A, Ivanov Y, Lisitsa A, Zgoda V. Biospecific irreversible fishing coupled with atomic force microscopy for detection of extremely low-abundant proteins. Proteomics. 2009 Mar;9(5):1326-43. doi: 10.1002/pmic.200800598. PubMed PMID: 19253286.
Archakov AI, Ivanov YD, Lisitsa AV, Zgoda VG. AFM fishing nanotechnology is the way to reverse the Avogadro number in proteomics. Proteomics. 2007 Jan;7(1):4-9. PubMed PMID: 17154275.