tmtools
- Structural superposition¶
The tmtools
module provides access to the structural
superposition programs TMscore and Tmalign developed by Y. Zhang
and J. Skolnick. These programs superpose a model onto a reference structure,
using the positions of the Calpha atoms only. While at their core, these
programs essentially use the same algorithm, they differ on how the Calphas are
paired. TMscore pairs the Calpha atom based on the residue number, TMalign
calculates an optimal pairing of Calpha atom based on heuristics.
Citation:
Yang Zhang and Jeffrey Skolnick, Proteins 2004 57: 702-710 Y. Zhang and J. Skolnick, Nucl. Acids Res. 2005 33, 2302-9
Besides using the standalone TM-align program, ost also provides wrappers around USalign as published in:
Chengxin Zhang, Morgan Shine, Anna Marie Pyle, Yang Zhang (2022) Nat Methods
USalign can be used as external standalone tool. Alternatively, ost allows to directly inject structural data in the USAlign c++ layer. The advantage of that is that no intermediate files must be generated.
Distance measures used by TMscore¶
There are many different ways to describe the structural similarity of two protein structures at the Calpha level. TMscore calculate several of these measures. The most common is to describe the difference in terms of the root mean square deviation of the Calpha positions, the RMSD. Despite its common use, RMSD has several drawbacks when working with incomplete models. Since the RMSD highly depends on the set of included atoms, it is relatively easy to obtain a smaller RMSD by omitting flexible parts of a protein structure. This has lead to the introduction of the global distance test (GDT). A model is compared to a reference by calculating the fraction of Calpha atoms that can be superposed below a certain cutoff, e.g. 1Å. The fractions of several such cutoffs are combined into the GDT_TS (1, 2, 4 and 8Å) and GDT_HA (0.5, 1, 2, 4Å) and divided by four to obtain the final measure. In contrast to RSMD, GDT is an agreement measure. The higher the value, the more similar the two structures are. TM-score (not to be confused by TMscore, the program), additionally adds a size dependences to the GDT measure by taking the protein length into account. As with GDT, the bigger the value, the more similar the two structures are.
Common Usage¶
The following example shows how to use TMscore to superpose two protein structures and print the RMSD as well as the GDT_TS and GDT_HA similarity measures.
from ost.bindings import tmtools
pdb1=io.LoadPDB('1ake.pdb', restrict_chains='A')
pdb2=io.LoadPDB('4ake.pdb', restrict_chains='A')
result=tmtools.TMScore(pdb1, pdb2)
print(result.rmsd_below_five) # 1.9
print(result.gdt_ha) # 0.41
print(result.gdt_ts) # 0.56
Usage of TMalign¶
- TMAlign(model1, model2, tmalign=None)¶
Performs a sequence independent superposition of model1 onto model2, the reference.
- Parameters:
model1 (
EntityView
orEntityHandle
) – The model structure. If the superposition is successful, will be superposed onto the reference structuremodel2 (
EntityView
orEntityHandle
) – The reference structuretmalign – If not None, the path to the tmalign executable.
- Returns:
The result of the tmscore superposition
- Return type:
- Raises:
FileNotFound
if tmalign could not be located.- Raises:
RuntimeError
if the superposition failed
Usage of TMscore¶
- TMScore(model1, model2, tmscore=None)¶
Performs a sequence dependent superposition of model1 onto model2, the reference.
- Parameters:
model1 (
EntityView
orEntityHandle
) – The model structure. If the superposition is successful, will be superposed onto the reference structuremodel2 (
EntityView
orEntityHandle
) – The reference structuretmscore – If not None, the path to the tmscore executable.
- Returns:
The result of the tmscore superposition
- Return type:
- Raises:
FileNotFound
if tmalign could not be located.- Raises:
RuntimeError
if the superposition failed
- class TMScoreResult(rmsd_common, tm_score, max_sub, gdt_ts, gdt_ha, rmsd_below_five, transform)¶
Holds the result of running TMscore
- rmsd_common¶
The RMSD of the common Calpha atoms of both structures
- rmsd_below_five¶
The RMSD of all Calpha atoms that can be superposed below five Angstroem
- tm_score¶
The TM-score of the structural superposition
- gdt_ha¶
The GDT_HA of the model to the reference structure.
- gdt_ts¶
The GDT_TS of the model to the reference structure.
Usage of USalign¶
For higher order complexes, ost provides access to USalign. This corresponds to calling USalign with the preferred way of comparing full biounits:
USalign mdl.pdb ref.pdb -mm 1 -ter 0
- USAlign(model1, model2, usalign=None, custom_chain_mapping=None)¶
Performs a sequence independent superposition of model1 onto model2, the reference. Can deal with multimeric complexes and RNA.
Creates temporary model files on disk and runs USalign with:
USalign model1.pdb model2.pdb -mm 1 -ter 0 -m rotmat.txt
- Parameters:
model1 (
EntityView
orEntityHandle
) – The model structure. If the superposition is successful, will be superposed onto the reference structuremodel2 (
EntityView
orEntityHandle
) – The reference structureusalign – If not None, the path to the USalign executable. Searches for executable with name
USalign
in PATH if not given.custom_chain_mapping (
dict
) – Custom chain mapping that is passed as -chainmap to USalign executable. Raises an error is this is not supported by the USalign executable you’re using (introduced in July 2023). It’s a dict with reference chain names as key (model2) and model chain names as values (model1).
- Returns:
The result of the superposition
- Return type:
- Raises:
FileNotFound
if executable could not be located.- Raises:
RuntimeError
if the superposition failed
C++ wrappers¶
Instead of calling the external executables, ost also provides a wrapper around the USalign c++ implementation which is shipped with the ost source code. The advantage is that no intermediate files need to be generated.
from ost import bindings
pdb1=io.LoadPDB('1ake.pdb').Select("peptide=true")
pdb2=io.LoadPDB('4ake.pdb').Select("peptide=true")
result = bindings.WrappedTMAlign(pdb1.chains[0], pdb2.chains[0],
fast=True)
print(result.tm_score)
print(result.alignment.ToString(80))
- class TMAlignResult(rmsd, tm_score, aligned_length, transform, alignment)¶
All parameters of the constructor are available as attributes of the class
- Parameters:
rmsd (
float
) – RMSD of the superposed residuestm_score (
float
) – TMScore of the superposed residuestm_score_swapped – TMScore when reference is swapped
aligned_length (
int
) – Number of superposed residuestransform (
geom.Mat4
) – Transformation matrix to superpose first chain onto referencealignment (
ost.seq.AlignmentHandle
) – The sequence alignment given the structural superposition
- WrappedTMAlign(chain1, chain2[, fast=False])¶
Takes two chain views and runs TMalign from USAlign with chain2 as reference. The positions and sequences are directly extracted from the chain residues for every residue that fulfills:
peptide linking and valid CA atom OR nucleotide linking and valid C3’ atom
valid one letter code(no ‘?’)
The function automatically identifies whether the chains consist of peptide or RNA residues. An error is raised if the two types are mixed.
- Parameters:
chain1 (
ost.mol.ChainView
) – Chain from which position and sequence are extracted to run TMalign.chain2 (
ost.mol.ChainView
) – Chain from which position and sequence are extracted to run TMalign, this is the reference.fast (
bool
) – Whether to apply the fast flag to TMAlign
- Return type:
- WrappedTMAlign(pos1, pos2, seq1, seq2 [fast=False, rna=False])
Similar as described above, but directly feeding in raw data.
- Parameters:
pos1 (
ost.geom.Vec3List
) – CA/C3’ positions of the first chainpos2 (
ost.geom.Vec3List
) – CA/C3’ positions of the second chain, this is the reference.seq1 (
ost.seq.SequenceHandle
) – Sequence of first chainseq2 (
ost.seq.SequenceHandle
) – Sequence of second chainfast (
bool
) – Whether to apply the fast flag to TMAlignrna (
bool
) – Whether to treat as RNA
- Return type:
- Raises:
ost.Error
if pos1 and seq1, pos2 and seq2 respectively are not consistent in size.
For higher order complexes, ost provides access to the MMalign functionality from USalign.
- class MMAlignResult(rmsd, tm_score, transform, aligned_length, alignments, ent1_mapped_chains, ent2_mapped_chains)¶
All parameters of the constructor are available as attributes of the class
- Parameters:
rmsd (
float
) – RMSD of the superposed residuestm_score (
float
) – TMScore of the superposed residuestm_score_swapped – TMScore when reference is swapped
aligned_length (
int
) – Number of superposed residuestransform (
geom.Mat4
) – Transformation matrix to superpose mdl onto referencealignments (
ost.seq.AlignmentList
) – Alignments of all mapped chains, with first sequence being from ent1 and second sequence from ent2ent1_mapped_chains (
ost.StringList
) – All mapped chains from ent1ent2_mapped_chains (
ost.StringList
) – The respective mapped chains from ent2
- WrappedMMAlign(ent1, ent2[, fast=False])¶
Takes two entity views and runs MMalign with ent2 as reference. The positions and sequences are directly extracted from the chain residues for every residue that fulfills:
peptide linking and valid CA atom OR nucleotide linking and valid C3’ atom
valid one letter code(no ‘?’)
The function automatically identifies whether the chains consist of peptide or RNA residues. An error is raised if the two types are mixed in the same chain.
- Parameters:
ent1 (
ost.mol.EntityView
) – Entity from which position and sequence are extracted to run MMalign.ent2 (
ost.mol.EntityView
) – Entity from which position and sequence are extracted to run MMalign, this is the reference.fast (
bool
) – Whether to apply the fast flag to MMAlign
- Return type: