

About I-TASSER-MD Pipeline



What is I-TASSER-MD?

    I-TASSER-MD is a pipeline designed to automatically generate high-quality structure and biological function predictions for proteins containing multiple domains, starting from the amino acid sequence alone. It is an extended protocol of I-TASSER that integrates state-of-the-art algorithms for protein domain splitting, domain modeling, domain assembly, and domain/full-chain function prediction in a fully automated pipeline. I-TASSER-MD includes all new features of the current version of I-TASSER and several new deep learning-based components.

How does I-TASSER-MD generate multi-domain protein structure predictions?

    When a user submits an amino acid sequence, the server first predicts the domain boundaries with FUpred and ThreaDom, two locally installed methods for protein domain boundary prediction. Meanwhile, the inter-domain distances and contacts are predicted by DeepPotential, a deep convolutional neural-network predictor extended from TripletRes, and the threading templates are identified by LOMETS.

    In the second step, if the query sequence is predicted to be a multi-domain protein and none of the top 10 threading templates covers all domains (i.e., each template has at least one domain with an alignment coverage smaller than the cutoff C=0.95), each domain model is generated independently by D-I-TASSER, a new version of I-TASSER improved by integrating deep learning-predicted spatial restraints. Otherwise, if the query is predicted to be a single-domain protein or at least one top template covers all domains, the full-length structure is modelled directly by D-I-TASSER.
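
    The branching rule of the second step can be sketched as follows. This is only an illustration of the decision described above, not the server's code: the function names, the route labels, and the input layout (one list of per-domain coverages per template, in rank order) are assumptions made for the example. Only the cutoff C=0.95 and the top-10 limit come from the text.

```python
from typing import List

COVERAGE_CUTOFF = 0.95   # cutoff C quoted in the text
TOP_N_TEMPLATES = 10     # only the top 10 threading templates are checked


def template_covers_all_domains(domain_coverage: List[float],
                                cutoff: float = COVERAGE_CUTOFF) -> bool:
    """Return True when every domain reaches the coverage cutoff."""
    return all(cov >= cutoff for cov in domain_coverage)


def choose_modeling_route(n_domains: int,
                          templates: List[List[float]]) -> str:
    """Pick a modeling route (hypothetical labels).

    templates[i][j] is the alignment coverage of domain j by the
    i-th ranked threading template.
    """
    # Single-domain queries are always modelled as one chain.
    if n_domains <= 1:
        return "full-chain"
    # A multi-domain query is modelled as one chain only if at least
    # one of the top templates covers every domain.
    top = templates[:TOP_N_TEMPLATES]
    if any(template_covers_all_domains(t) for t in top):
        return "full-chain"
    # Otherwise each domain is modelled separately and assembled later.
    return "per-domain"


if __name__ == "__main__":
    # Two-domain query; no template covers both domains well enough.
    print(choose_modeling_route(2, [[0.99, 0.50], [0.60, 0.97]]))
```

    In this sketch a "per-domain" result corresponds to independent domain modeling followed by assembly, and "full-chain" to direct modeling of the whole sequence.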

    In the third step, the domain models are assembled into the full-length model by DEMO, guided by structurally analogous templates, the inter-domain distances and domain-domain interfaces predicted by DeepPotential, and knowledge-based inter-domain potentials.

    In the last step, the protein functions, including the Enzyme Commission (EC) numbers, Gene Ontology (GO) terms, and ligand-binding sites, are predicted by COFACTOR for the individual domains and the full-chain protein based on the modeled structures, sequences, and protein-protein interactions (PPIs).


    Figure 1. Pipeline of I-TASSER-MD for automated multi-domain protein structure and function prediction.

How does the I-TASSER-MD server perform compared with other methods?

    CASP (Critical Assessment of Techniques for Protein Structure Prediction) is a community-wide experiment for testing the state of the art in protein structure prediction, which has taken place every two years since 1994. The experiment (often referred to as a competition) is strictly blind because the structures of the test proteins are unknown to the predictors.

    We used the I-TASSER-MD protocol (as ‘Zhang-Server’) to predict the structures of all targets in CASP13 and CASP14, and the protocol was ranked as the number one server. Figures 2a and 2b compare the I-TASSER-MD protocol with the other top four servers for protein structure prediction in CASP13 and CASP14, respectively, where the servers are sorted according to the GDT-score of the full-length models of multi-domain proteins. As shown in the figure, I-TASSER-MD outperforms the other servers on full-length structure prediction of multi-domain proteins for both CASP13 and CASP14 targets. The accuracy of the individual domain models of multi-domain proteins is also better than that of the other servers, as each domain is modelled independently. In particular, I-TASSER-MD achieves an average GDT-score of 65.5 for all individual domain models of multi-domain proteins in CASP14, which is 8.3% higher than that of the second-best server, Yang-Server (60.5). In addition to multi-domain protein structure prediction, the protocol also achieves superior performance on single-domain protein structure modeling; in CASP14 in particular, Zhang-Server clearly outperforms the other third-party servers. Furthermore, I-TASSER-MD can accurately distinguish multi-domain proteins from single-domain proteins and predict the domain boundaries with high accuracy.

    The domain boundary prediction of I-TASSER-MD was also compared with two state-of-the-art methods, ConDo and DoBo, on the CASP13 and CASP14 targets. As shown in Figs. 2c and 2d, I-TASSER-MD performs significantly better than these two methods in terms of the normalized domain overlap (NDO) score for domain boundary prediction, and in terms of accuracy (ACC) and Matthews correlation coefficient (MCC) for classifying proteins as single-domain or multi-domain. The improvement is more pronounced in CASP14 due to the use of FUpred, which is guided by the contact map predicted by deep learning.
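
    For readers unfamiliar with the two classification metrics, the sketch below computes ACC and MCC from their standard confusion-matrix definitions for a single-domain versus multi-domain classification. The function name and the toy labels are made up for illustration and are not CASP data; the NDO score is not covered here.

```python
import math
from typing import Sequence, Tuple


def accuracy_and_mcc(truth: Sequence[bool],
                     predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (ACC, MCC); True means 'multi-domain' in this example."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # multi predicted as multi
    tn = sum(1 for t, p in pairs if not t and not p)  # single predicted as single
    fp = sum(1 for t, p in pairs if not t and p)      # single predicted as multi
    fn = sum(1 for t, p in pairs if t and not p)      # multi predicted as single

    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the confusion matrix is
    # empty; returning 0.0 in that case is a common convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


if __name__ == "__main__":
    # Toy labels only, not taken from any benchmark.
    truth = [True, True, False, False]
    predicted = [True, False, False, False]
    acc, mcc = accuracy_and_mcc(truth, predicted)
    print(f"ACC={acc:.2f} MCC={mcc:.2f}")
```

    MCC is useful alongside ACC because it stays informative when the two classes are unbalanced, which is typical for sets of single-domain and multi-domain targets.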


    Figure 2. Performance of the I-TASSER-MD protocol on the CASP13 and CASP14 targets. a and b, Comparison of I-TASSER-MD (Zhang-Server) with the other top four servers in CASP13 (a) and CASP14 (b) on full-length multi-domain proteins, individual domains of multi-domain proteins, and single-domain proteins in terms of the global distance test (GDT) score, where the servers are sorted according to the GDT score of the full-length models of multi-domain proteins. c and d, Comparison of I-TASSER-MD with ConDo and DoBo for protein domain boundary prediction on the CASP13 (c) and CASP14 (d) targets.

What is the output of the I-TASSER-MD server if you submit a sequence?

    The output of the I-TASSER-MD server includes:
    • Up to five full-length atomic models (ranked based on the energy)
    • Estimated accuracy of the predicted models (including a confidence score of all models, and predicted TM-score and RMSD for the first model)
    • Predicted domain boundaries
    • Individual domain models by D-I-TASSER
    • Predicted secondary structures
    • Predicted solvent accessibility
    • Top 10 full-length templates for domain assembly
    • Predicted distance and interface maps for domain assembly
    • Top 10 proteins in the PDB that are structurally closest to the full-length models and to the individual domain models
    • Predicted GO terms of the full-chain protein and individual domains
    • Predicted enzyme classification of the full-chain protein and individual domains
    • Predicted ligand-binding sites of the full-chain protein and individual domains
    An illustrative example of the I-TASSER-MD output can be seen from here.

How to interpret the output data generated by the I-TASSER-MD server?

    The outputs of the I-TASSER-MD modeling results are generally summarized in a webpage, the link of which is sent to the users by email after the modeling is completed (see an example of I-TASSER-MD output). In the following, we present answers to several most frequently asked questions in interpreting the I-TASSER-MD results.

    • What is the 'top 5 models constructed by I-TASSER-MD'?

      For each target, I-TASSER-MD reports up to five full-length models ranked by total energy. Because the ranking is based on energy rather than C-score, a lower-ranked model may have a higher C-score. Although the first model has the highest C-score and the best quality in most cases, it is not unusual for a lower-ranked model to be of better quality than a higher-ranked one. If only one full-length model is reported, the model was generated directly by D-I-TASSER because the threading templates cover all domains and the top templates identified by LOMETS have consistent topologies. In these cases, the final model usually has a relatively high C-score, indicating a high-quality final model.

    • What are the "top 10 full-length templates for domain assembly"?

      I-TASSER-MD identifies the analogous full-length templates from a non-redundant multi-domain protein library using TM-align structural alignments. All domain models are aligned to each template of the library by TM-align, and the TM-score of all domains is defined as the score of a template. The top 10 templates with the highest score are selected to generate the initial full-length model and deduce the inter-domain distance restraints to guide the domain assembly.
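
      The template selection can be pictured with the short sketch below. It is not the server's implementation: the text does not state how the per-domain TM-scores are combined into one template score, so the mean is used here purely as an assumption, and the function name, template identifiers, and scores are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def rank_templates(domain_tm_scores: Dict[str, List[float]],
                   top_n: int = 10) -> List[str]:
    """Return the identifiers of the top_n templates.

    domain_tm_scores maps a template identifier to the TM-scores
    obtained by aligning each domain model to that template.
    ASSUMPTION: the template score is the mean of those TM-scores.
    """
    template_score = {tid: mean(scores)
                      for tid, scores in domain_tm_scores.items()}
    # Sort template identifiers from highest to lowest score.
    ranked = sorted(template_score, key=template_score.get, reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical templates with TM-scores for a two-domain query.
    scores = {
        "tmplA": [0.80, 0.60],
        "tmplB": [0.90, 0.90],
        "tmplC": [0.30, 0.40],
    }
    print(rank_templates(scores, top_n=2))
```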

    • What is C-score?

      C-score is a confidence score for estimating the quality of the models predicted by I-TASSER-MD. It is calculated from the convergence parameters of the domain assembly simulations, the quality of the full-length templates for domain assembly, the degree to which the inter-domain distances are satisfied, and the estimated accuracy of the individual domain models. C-score is typically in the range of [-5,2], where a higher C-score signifies higher confidence in the model, and vice versa.

    • What is TM-score?

      TM-score is a metric for measuring the structural similarity between two structures (see Zhang and Skolnick, Scoring function for automated assessment of protein structure template quality, Proteins, 2004 57: 702-710). TM-score was proposed to address the sensitivity of RMSD to local errors. Because RMSD is a root-mean-square average of the distances of all residue pairs in two structures, a local error (e.g. a misorientation of the tail) will produce a large RMSD value even though the global topology is correct. In TM-score, however, small distances are weighted more strongly than large distances, which makes the score insensitive to local modeling errors. A TM-score >0.5 indicates a model of correct topology and a TM-score <0.17 means a random similarity. These cutoffs do not depend on the protein length.
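
      The contrast between the two measures can be made concrete with a small sketch. It evaluates the published TM-score formula, TM = (1/L) Σ 1/(1 + (d_i/d0)^2) with d0 = 1.24 (L-15)^(1/3) - 1.8, on a fixed list of per-residue distances. The real TM-score additionally searches for the superposition that maximizes this sum, which is omitted here; the 0.5 Å floor on d0 for short chains is a guard assumed for this example, and the distances are invented.

```python
import math
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (in Angstrom)."""
    # Guard for short chains, where the cube-root formula breaks down
    # (an assumption of this sketch).
    if target_length <= 21:
        return 0.5
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score_fixed(distances: Sequence[float], target_length: int) -> float:
    """TM-score sum for one given superposition (no optimization)."""
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def rmsd(distances: Sequence[float]) -> float:
    """Root-mean-square of the per-residue distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


if __name__ == "__main__":
    # Invented example: 90 well-modelled residues (1 A) and a
    # misoriented 10-residue tail (30 A) in a 100-residue protein.
    dists = [1.0] * 90 + [30.0] * 10
    print(f"RMSD     = {rmsd(dists):.2f} A")
    print(f"TM-score = {tm_score_fixed(dists, 100):.2f}")
```

      In this toy case the misplaced tail dominates the RMSD, while the TM-score stays well above 0.5 because the 90 well-placed residues each contribute close to 1/L.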

      Here, the 'Estimated TM-score' is a TM-score value estimated from the correlation between TM-score and C-score observed on a non-redundant training set.

    • What are 'Proteins structurally close to the query protein'?

      After the full-length models are generated, I-TASSER-MD uses the TM-align program to match the first model against all structures in the PDB library. This section reports the top 10 proteins from the PDB that have the closest structural similarity (i.e. the highest TM-score) to the predicted I-TASSER-MD model. Because of the structural similarity, these proteins often have functions similar to that of the target. However, users are encouraged to use the functions predicted by COFACTOR to infer the biological function of the target protein, since COFACTOR has been extensively trained to derive function from multiple sources of sequence and structure features and has, on average, a much higher accuracy than function annotations derived only from global structure comparison.

How to use known information (e.g. full-length templates, experimental data) to improve I-TASSER-MD modeling?

    If users have some information or experimental data about the structure of the query protein, the information can be conveniently uploaded to the I-TASSER-MD server. The information can significantly improve the quality of structural and function predictions.

    The I-TASSER-MD server currently accepts the following information:

    • Domain definition
    • Up to 20 full-length templates
    • Experimental cross-linking data
    • Inter-domain contact/interface restraints
    • Cryo-EM density map

How long does it take for I-TASSER-MD to generate the predictions for your protein?

    It usually takes anywhere from several hours to 4 days from submitting a sequence to receiving the prediction results. If many sequences have accumulated in the queue, the procedure may take longer. The time also depends on the protein size: a smaller protein takes less time than a larger one.

    The procedure takes less time if you provide the domain definition, since the program does not need to predict the domain boundaries.

How to cite I-TASSER-MD

    You are requested to cite the following article when you use the I-TASSER-MD server:

    • Xiaogen Zhou, Wei Zheng, Yang Li, Robin Pearce, Chengxin Zhang, Eric W. Bell, Guijun Zhang, and Yang Zhang. I-TASSER-MD: A deep-learning based platform for multi-domain protein structure and function prediction, submitted, 2021.

Funding support

    The development of the I-TASSER-MD server is supported by the National Institute of General Medical Sciences (GM136422 and S10OD026825), the National Institute of Allergy and Infectious Diseases (AI134678), and the National Science Foundation (IIS1901191 and DBI2030790). This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by the National Science Foundation (ACI1548562).

Contact information

    The I-TASSER-MD server is in active development with the goal of providing the most accurate structure modeling for multi-domain and large proteins. Please help us achieve this goal by sending your questions, feedback, and comments to yangzhanglab@umich.edu.

yangzhanglab@umich.edu | (734) 647-1549 | 100 Washtenaw Avenue, Ann Arbor, MI 48109-2218