EPSILON-CP: using deep learning to combine information from multiple sources for protein contact prediction

dc.contributor.author: Stahl, Kolja
dc.contributor.author: Schneider, Michael
dc.contributor.author: Brock, Oliver
dc.date.accessioned: 2018-12-03T14:11:28Z
dc.date.available: 2018-12-03T14:11:28Z
dc.date.issued: 2017-06-17
dc.description.abstract: Background: Accurately predicted contacts make it possible to compute the 3D structure of a protein. Because the solution space of native residue-residue contact pairs is very large, additional information is needed to identify the relevant regions of that space, i.e. the correct contacts. Every additional source of information can help narrow down candidate regions. Recent methods have therefore combined evolutionary and sequence-based information, or evolutionary and physicochemical information. We develop a new contact predictor (EPSILON-CP) that goes beyond current methods by combining evolutionary, physicochemical, and sequence-based information. The problems resulting from the increased dimensionality and complexity of the learning problem are addressed with a careful feature analysis, which results in a drastically reduced feature set. The different information sources are combined using deep neural networks. Results: On 21 hard CASP11 FM targets, EPSILON-CP achieves a mean precision of 35.7% for the top-L/10 predicted long-range contacts, which is 11% better than the CASP11 winning version of MetaPSICOV. The improvement on 1.5L is 17%. Furthermore, in this study we find that the amino acid composition, a commonly used feature, is rendered ineffective in the context of meta approaches. The refined feature set is 75% smaller, which enabled a significant increase in training data for machine learning and contributed significantly to the observed improvements. Conclusions: Exploiting as much and as diverse information as possible is key to accurate contact prediction. Simply merging the information introduces new challenges. Our study suggests that critical feature analysis can improve the performance of contact prediction methods that combine multiple information sources. EPSILON-CP is available as a webservice: http://compbio.robotics.tu-berlin.de/epsilon/ [en]
dc.description.sponsorship: TU Berlin, Open-Access-Mittel - 2017 [de]
dc.identifier.issn: 1471-2105
dc.identifier.uri: https://depositonce.tu-berlin.de/handle/11303/8645
dc.identifier.uri: http://dx.doi.org/10.14279/depositonce-7779
dc.language.iso: en [en]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en]
dc.subject.ddc: 570 Biowissenschaften; Biologie [de]
dc.subject.ddc: 004 Datenverarbeitung; Informatik [de]
dc.subject.other: contact prediction [en]
dc.subject.other: meta algorithms [en]
dc.subject.other: deep learning [en]
dc.title: EPSILON-CP: using deep learning to combine information from multiple sources for protein contact prediction [en]
dc.type: Article [en]
dc.type.version: publishedVersion [en]
dcterms.bibliographicCitation.articlenumber: 303 [en]
dcterms.bibliographicCitation.doi: 10.1186/s12859-017-1713-x [en]
dcterms.bibliographicCitation.journaltitle: BMC Bioinformatics [en]
dcterms.bibliographicCitation.originalpublishername: BioMed Central [en]
dcterms.bibliographicCitation.originalpublisherplace: London [en]
dcterms.bibliographicCitation.volume: 18 [en]
tub.accessrights.dnb: free [en]
tub.affiliation: Fak. 4 Elektrotechnik und Informatik::Inst. Technische Informatik und Mikroelektronik::FG Robotics [de]
tub.affiliation.faculty: Fak. 4 Elektrotechnik und Informatik [de]
tub.affiliation.group: FG Robotics [de]
tub.affiliation.institute: Inst. Technische Informatik und Mikroelektronik [de]
tub.publisher.universityorinstitution: Technische Universität Berlin [en]
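
Note on the evaluation metric: the abstract reports precision for the top-L/10 predicted long-range contacts, where L is the sequence length. The Python sketch below illustrates how such a figure is commonly computed. It is not the authors' evaluation code. The function name `top_k_precision`, its parameters, and the minimum sequence separation of 24 residues used to define "long-range" are assumptions that follow common CASP practice and are not stated in this record.

```python
def top_k_precision(scores, true_contacts, seq_len, frac=0.1, min_sep=24):
    """Precision of the top (frac * seq_len) predicted long-range contacts.

    scores        -- dict mapping a residue pair (i, j) to a predicted score
    true_contacts -- set of native contact pairs (i, j) with i < j
    seq_len       -- sequence length L
    frac          -- fraction of L to evaluate (0.1 for top-L/10)
    min_sep       -- assumed minimum |i - j| for a long-range pair
    """
    # Keep only pairs that are far enough apart in sequence.
    long_range = [
        (pair, score)
        for pair, score in scores.items()
        if abs(pair[0] - pair[1]) >= min_sep
    ]
    # Rank by predicted score, highest first.
    long_range.sort(key=lambda item: item[1], reverse=True)

    k = max(1, int(seq_len * frac))
    top = long_range[:k]
    if not top:
        return 0.0

    # Count how many of the top-ranked pairs are native contacts.
    hits = sum(
        1 for (i, j), _ in top if (min(i, j), max(i, j)) in true_contacts
    )
    return hits / len(top)
```

Calling the function with `frac=1.5` would give the corresponding figure for a top-1.5L list under the same assumptions.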

Files

Original bundle
Name: s12859-017-1713-x.pdf
Size: 697.49 KB
Format: Adobe Portable Document Format

Collections