Glycosylphosphatidylinositol (GPI) anchoring is a common post-translational modification, found mainly on extracellular eukaryotic proteins. The GPI moiety is attached to the carboxyl terminus of a protein after proteolytic cleavage of its C-terminal propeptide and is involved in recognition and signaling processes. At CD BioGlyco, we are fully equipped with glycoinformatics technology to provide Structure or Site Analysis/Prediction Services for various types of compounds and glycosylation processes.
Here, we utilize informatics methods to identify GPI-anchored proteins and provide GPI-anchored site prediction services to our clients to help them distinguish GPI-anchored proteins and determine GPI-anchored site locations.
We start from proteins annotated with GPI-anchored sites. Proteins containing known GPI-anchored sites are collected into a GPI-anchored protein database, and each entry is checked against the literature to confirm both the presence of a GPI anchor and the location of the anchor site.
The proteins in the database are divided into groups according to sequence identity. Each group consists of one or two proteins, and proteins in different groups share no detectable sequence similarity. A computer program is then used to evaluate how well the method distinguishes GPI-anchored proteins.
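The grouping step above can be sketched as follows. This is only an illustration, not our production pipeline: the ungapped C-terminal identity measure, the 0.3 cutoff, and the function names are assumptions made for the example, and a real workflow would use a proper alignment tool to measure identity.

```python
from itertools import combinations

def identity(a, b):
    """Crude ungapped identity over the shorter length (hypothetical measure).

    Sequences are compared right-aligned, because the GPI signal is C-terminal.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(a[-n:], b[-n:]))
    return matches / n

def group_by_identity(seqs, cutoff=0.3):
    """Single-linkage grouping: proteins at or above the cutoff share a group."""
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if identity(seqs[a], seqs[b]) >= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

Keeping similar sequences in the same group means that, during cross-validation, no test protein has a close relative in the training groups.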
To locate GPI-anchored sites, we use a hidden Markov model (HMM). An HMM is a graphical model made up of states, where each state represents a position in the sequence.
Using the HMM, we predict the site at which the C-terminal sequence of a protein begins. We also cross-validate the model on the sequences with known GPI-anchored sites in the database and use decoding algorithms to interpret the model output.
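To make the decoding idea concrete, here is a minimal sketch of Viterbi decoding on a toy three-state, left-to-right HMM. Everything in it is an assumption for illustration: the state names, the transition and emission probabilities, and the choice of Viterbi itself are not taken from our production model. The only element borrowed from this page is the set of five residues it describes as typical for anchor sites.

```python
import math

NEG_INF = float("-inf")
STATES = ["upstream", "omega", "tail"]   # hypothetical state names
OMEGA_RESIDUES = set("CDGNS")            # residues named on this page
HYDROPHOBIC = set("AVLIFMW")             # assumed hydrophobic tail residues

# Toy left-to-right transitions (log-probabilities); values are made up.
LOG_TRANS = {
    ("upstream", "upstream"): math.log(0.9),
    ("upstream", "omega"): math.log(0.1),
    ("omega", "tail"): 0.0,
    ("tail", "tail"): 0.0,
}

def log_emission(state, residue):
    """Toy emission model; each state's probabilities sum to 1 over 20 residues."""
    if state == "upstream":
        return math.log(1 / 20)
    if state == "omega":
        return math.log(0.18 if residue in OMEGA_RESIDUES else 0.1 / 15)
    return math.log(0.1 if residue in HYDROPHOBIC else 0.3 / 13)

def viterbi(seq):
    """Return the most probable state path that starts upstream and ends in tail."""
    if len(seq) < 3:
        raise ValueError("need at least three residues")
    score = {s: NEG_INF for s in STATES}
    score["upstream"] = log_emission("upstream", seq[0])
    back = []  # back[k][s] = best predecessor of state s at position k + 1
    for residue in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev, best = None, NEG_INF
            for p in STATES:
                t = LOG_TRANS.get((p, s))
                if t is None or score[p] == NEG_INF:
                    continue
                if score[p] + t > best:
                    best_prev, best = p, score[p] + t
            pointers[s] = best_prev
            new_score[s] = NEG_INF if best_prev is None else best + log_emission(s, residue)
        back.append(pointers)
        score = new_score
    # Trace back from the final tail state to recover the full path.
    path = ["tail"]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

def predict_omega_index(seq):
    """0-based index of the residue assigned to the omega state."""
    return viterbi(seq).index("omega")
```

For example, `predict_omega_index("MKTEEQKRPQESLLVVAILLAV")` places the anchor site on the serine that precedes the hydrophobic stretch. A trained model would learn its probabilities from the curated database rather than use hand-set values like these.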
At present, our approach correctly predicts the set of experimentally known GPI-anchored proteins, and most of the predicted GPI-anchored sites fall on one of five typical residues: cysteine, aspartic acid, glycine, asparagine, and serine.
In addition, for each protein we compute the specificity associated with the prediction threshold. This specificity value indicates the GPI-anchoring potential of the protein. Clients can then judge the predicted location of the GPI-anchored site using both the specificity value and the specifics of their experiment.
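As a rough illustration of how a specificity value can be attached to a prediction, the sketch below measures specificity as the fraction of known non-GPI proteins that score below a given threshold. The function names and the example scores are hypothetical, and the scoring scale is an assumption; it is not a description of our internal scoring.

```python
def specificity(negative_scores, threshold):
    """Fraction of known negatives scoring below the threshold: TN / (TN + FP)."""
    if not negative_scores:
        raise ValueError("need at least one negative example")
    true_negatives = sum(score < threshold for score in negative_scores)
    return true_negatives / len(negative_scores)

def specificity_for_protein(protein_score, negative_scores):
    """Specificity obtained if the protein's own score were used as the threshold."""
    return specificity(negative_scores, protein_score)
```

With negatives scoring `[0.1, 0.2, 0.3, 0.4, 0.9]`, a threshold of 0.5 gives a specificity of 0.8, because four of the five negatives fall below it. A protein whose own score corresponds to a high specificity is a stronger GPI-anchor candidate.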
Technology: HMM and support vector machine (SVM)
Journal: BMC Bioinformatics
IF: 2.9
Published: 2008
Results: In this article, the authors present PredGPI, a prediction method that couples an HMM with an SVM to predict both the presence of GPI anchors and the location of ω sites. The authors combined the probabilistic output of the HMM with information from the entire sequence, the carboxy-terminal region, and the amino-terminal region. The SVM was used to capture the overall composition of the protein sequence, including the N-terminal region, which carries the signal peptide, and the C-terminal region, which contains the cleaved GPI-anchor signal. PredGPI correctly reproduced the results of previously published high-throughput experiments with a much lower false-positive prediction rate, making it a cost-effective, fast, and accurate method for screening entire proteomes.
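The idea of combining an HMM score with regional composition can be sketched as a feature vector built before classification. This is not the PredGPI implementation: the region lengths (20 N-terminal and 40 C-terminal residues), the feature order, and the function names are assumptions chosen only for the example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(region):
    """Relative frequency of each of the 20 standard amino acids in a region."""
    if not region:
        return [0.0] * len(AMINO_ACIDS)
    return [region.count(aa) / len(region) for aa in AMINO_ACIDS]

def build_features(seq, hmm_log_score, n_len=20, c_len=40):
    """Concatenate the HMM score with whole-sequence, N- and C-terminal compositions.

    n_len and c_len are hypothetical region lengths and must be positive.
    """
    return (
        [hmm_log_score]
        + composition(seq)
        + composition(seq[:n_len])
        + composition(seq[-c_len:])
    )
```

Each protein yields a 61-value vector (one score plus three 20-value compositions), which could then be passed to an SVM classifier such as `sklearn.svm.SVC` in scikit-learn.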
Fig.1 HMM at the protein GPI-anchored ω site. (Pierleoni, et al., 2008)
CD BioGlyco has mastered leading glycoinformatics technology and formed a multidisciplinary team of professionals to provide Structure and Function Prediction Services for glycan biomolecules to our clients. We are constantly innovating to improve our data integration and analysis capabilities to ensure high efficiency and accuracy in the prediction of GPI-anchored sites. Please feel free to contact us if you have any questions. Our staff will reply to you promptly.
Reference