Quaternary Structure (QS) scores (deprecated)¶
qsscoring – Quaternary Structure (QS) scores¶
Scoring of quaternary structures (QS). The QS scoring is according to the paper by Bertoni et al..
Note
Requirements for use:
A default
compound librarymust be defined and accessible viaGetDefaultLib(). This is set by default when executing scripts withost. Otherwise, you must set this withSetDefaultLib().ClustalW must be installed (unless you provide chain mappings)
Python modules numpy and scipy must be installed and available (e.g. use
pip install scipy numpy)
- FilterContacts(contacts, chain_names)¶
Filter contacts to contain only contacts for chains in chain_names.
- Parameters:
contacts (
dict) – Connectivity dictionary as produced byGetContacts().chain_names (
listor (better)set) – Chain names to keep.
- Returns:
New connectivity dictionary (format as in
GetContacts())- Return type:
dict
- GetContacts(entity, calpha_only, dist_thr=12.0)¶
Get inter-chain contacts of a macromolecular entity.
Contacts are pairs of residues within a given distance belonging to different chains. They are stored once per pair and include the CA/CB-CA/CB distance.
- Parameters:
entity (
EntityHandleorEntityView) – An entity to check connectivity for.calpha_only (
bool) – If True, we only consider CA-CA distances. Else, we use CB unless the residue is a GLY.dist_thr (
float) – Maximal CA/CB-CA/CB distance to be considered in contact.
- Returns:
A connectivity dictionary. A pair of residues with chain names ch_name1 & ch_name2 (ch_name1 < ch_name2), residue numbers res_num1 & res_num2 and distance dist (<= dist_thr) are stored as result[ch_name1][ch_name2][res_num1][res_num2] = dist.
- Return type:
dict
- class MappedLDDTScorer(alignment, calpha_only, settings)¶
A simple class to calculate a single-chain lDDT score on a given chain to chain mapping as extracted from
OligoLDDTScorer.- Parameters:
alignment – Sets
alignmentcalpha_only – Sets
calpha_onlysettings – Sets
settings
- alignment¶
Alignment with two sequences named according to the mapped chains and with views attached to both sequences (e.g. one of the items of
QSscorer.alignments).The first sequence is assumed to be the reference and the second one the model. Since the lDDT score is not symmetric (extra residues in model are ignored), the order is important.
- Type:
- calpha_only¶
If True, restricts lDDT score to CA only.
- Type:
bool
- settings¶
Settings to use for lDDT scoring.
- Type:
- lddt_scorer¶
lDDT Scorer object for the given chains.
- Type:
lDDTScorer
- reference_chain_name¶
Chain name of the reference.
- Type:
str
- model_chain_name¶
Chain name of the model.
- Type:
str
- GetPerResidueScores()¶
- Returns:
Scores for each residue
- Return type:
listofdictwith one item for each residue existing in model and reference:”residue_number”: Residue number in reference chain
”residue_name”: Residue name in reference chain
”lddt”: local lDDT
”conserved_contacts”: number of conserved contacts
”total_contacts”: total number of contacts
- class OligoLDDTScorer(ref, mdl, alignments, calpha_only, settings, penalize_extra_chains=False, chem_mapping=None)¶
Helper class to calculate oligomeric lDDT scores.
This class can be used independently, but commonly it will be created by calling
QSscorer.GetOligoLDDTScorer().Note
By construction, lDDT scores are not symmetric and hence it matters which structure is the reference (
ref) and which one is the model (mdl). Extra residues in the model are generally not considered. Extra chains in both model and reference can be considered by setting thepenalize_extra_chainsflag to True.- Parameters:
ref – Sets
refmdl – Sets
mdlalignments – Sets
alignmentscalpha_only – Sets
calpha_onlysettings – Sets
settingspenalize_extra_chains – Sets
penalize_extra_chainschem_mapping – Sets
chem_mapping. Must be given if penalize_extra_chains is True.
- ref¶
- mdl¶
Full reference/model entity to be scored. The entity must contain all chains mapped in
alignmentsand may also contain additional ones which are considered ifpenalize_extra_chainsis True.- Type:
- alignments¶
One alignment for each mapped chain of
ref/mdlas defined inQSscorer.alignments. The first sequence of each alignment belongs torefand the second one tomdl. Sequences must have sequence naming and attached views as defined inQSscorer.alignments.- Type:
listofAlignmentHandle
- calpha_only¶
If True, restricts lDDT score to CA only.
- Type:
bool
- settings¶
Settings to use for lDDT scoring.
- Type:
- penalize_extra_chains¶
If True, extra chains in both
refandmdlwill penalize the lDDT scores.- Type:
bool
- chem_mapping¶
Inter-complex mapping of chemical groups as defined in
QSscorer.chem_mapping. Used to find “chem-mapped” chains inreffor unmapped chains inmdlwhen penalizing scores. Each unmapped model chain can add extra reference-contacts according to the average total contacts of each single “chem-mapped” reference chain. If there is no “chem-mapped” reference chain, a warning is shown and the model chain is ignored.Only relevant if
penalize_extra_chainsis True.
- property lddt_mdl¶
The model entity used for oligomeric lDDT scoring (
oligo_lddt/oligo_lddt_scorer).Like
lddt_ref, this is a single chain X containing all chains ofmdl. The residue numbers match the ones inlddt_refwhere aligned and have unique numbers for additional residues.- Getter:
Computed on first use (cached)
- Type:
- property lddt_ref¶
The reference entity used for oligomeric lDDT scoring (
oligo_lddt/oligo_lddt_scorer).Since the lDDT computation requires a single chain with mapped residue numbering, all chains of
refare appended into a single chain X with unique residue numbers according to the column-index in the alignment. The alignments are in the same order as they appear inalignments. Additional residues are appended at the end of the chain with unique residue numbers. Unmapped chains are only added ifpenalize_extra_chainsis True. Only CA atoms are considered ifcalpha_onlyis True.- Getter:
Computed on first use (cached)
- Type:
- property mapped_lddt_scorers¶
List of scorer objects for each chain mapped in
alignments.- Getter:
Computed on first use (cached)
- Type:
listofMappedLDDTScorer
- property oligo_lddt¶
Oligomeric lDDT score.
The score is computed as conserved contacts divided by the total contacts in the reference using the
oligo_lddt_scorer, which uses the full complex as reference/model structure. Ifpenalize_extra_chainsis True, the reference/model complexes contain all chains (otherwise only the mapped ones) and additional contacts are added to the reference’s total contacts for unmapped model chains according to thechem_mapping.The main difference with
weighted_lddtis that the lDDT scorer “sees” the full complex here (incl. inter-chain contacts), while the weighted single chain score looks at each chain separately.- Getter:
Computed on first use (cached)
- Type:
float
- property oligo_lddt_scorer¶
lDDT Scorer object for
lddt_refandlddt_mdl.- Getter:
Computed on first use (cached)
- Type:
lDDTScorer
- property sc_lddt¶
List of global scores extracted from
sc_lddt_scorers.If scoring for a mapped chain fails, an error is displayed and a score of 0 is assigned.
- Getter:
Computed on first use (cached)
- Type:
listoffloat
- property sc_lddt_scorers¶
List of lDDT scorer objects extracted from
mapped_lddt_scorers.- Type:
listoflDDTScorer
- property weighted_lddt¶
Weighted average of single chain lDDT scores.
The score is computed as a weighted average of single chain lDDT scores (see
sc_lddt_scorers) using the total contacts of each single reference chain as weights. Ifpenalize_extra_chainsis True, unmapped chains are added with a 0 score and total contacts taken from the actual reference chains or (for unmapped model chains) using thechem_mapping.See
oligo_lddtfor a comparison of the two scores.- Getter:
Computed on first use (cached)
- Type:
float
- class QSscoreEntity(ent)¶
Entity with cached entries for QS scoring.
Any known / precomputed information can be filled into the appropriate attribute here as long as they are labelled as read/write. Otherwise the quantities are computed on first access and cached (lazy evaluation). The heaviest load is expected when computing
contactsandcontacts_ca.- Parameters:
ent (
EntityHandleorEntityView) – Entity to be used for QS scoring. A copy of it will be processed.
- is_valid¶
True, if successfully initialized. False, if input structure has no protein chains with >= 20 residues.
- Type:
bool
- original_name¶
Name set for ent when object was created.
- Type:
str
- ent¶
Cleaned version of ent passed at construction. Hydrogens are removed, the entity is processed with a
RuleBasedProcessorand chains listed inremoved_chainshave been removed. The name of this entity might change during scoring (seeGetName()). Otherwise, this will be fixed.- Type:
- removed_chains¶
Chains removed from ent passed at construction. These are ligand and water chains as well as small (< 20 res.) peptides or chains with no amino acids (determined by chem. type, which is set by rule based processor).
- Type:
listofstr
- calpha_only¶
Whether entity is CA-only (i.e. it has 0 CB atoms)
- Type:
bool
- GetAlignment(c1, c2)¶
Get sequence alignment of chain c1 with chain c2. Computed on first use based on
ca_chains(cached).- Parameters:
c1 (
str) – Chain name for first chain to align.c2 (
str) – Chain name for second chain to align.
- Return type:
AlignmentHandleor None if it failed.
- GetAngles(c1, c2)¶
Get Euler angles from superposition of chain c1 with chain c2. Computed on first use based on
ca_chains(cached).- Parameters:
c1 (
str) – Chain name for first chain to superpose.c2 (
str) – Chain name for second chain to superpose.
- Returns:
3 Euler angles (may contain nan if something fails).
- Return type:
numpy.array
- GetAxis(c1, c2)¶
Get axis of symmetry from superposition of chain c1 with chain c2. Computed on first use based on
ca_chains(cached).- Parameters:
c1 (
str) – Chain name for first chain to superpose.c2 (
str) – Chain name for second chain to superpose.
- Returns:
Rotational axis (may contain nan if something fails).
- Return type:
numpy.array
- GetName()¶
Wrapper to
GetName()ofent. This is used to uniquely identify the entity while scoring. The name may therefore change whileoriginal_nameremains fixed.
- SetName(new_name)¶
Wrapper to
SetName()ofent. Use this to change unique identifier while scoring (seeGetName()).
- property ca_chains¶
Map of chain names in
entto sequences with attached view to CA-only chains (intoca_entity). Useful for alignments and superpositions.- Getter:
Computed on first use (cached)
- Type:
dict(key =str, value =SequenceHandle)
- property ca_entity¶
Reduced representation of
entwith only CA atoms. This guarantees that each included residue has exactly one atom.- Getter:
Computed on first use (cached)
- Type:
- property chem_groups¶
Intra-complex group of chemically identical (seq. id. > 95%) polypeptide chains as extracted from
ca_chains. First chain in group is the one with the longest sequence.- Getter:
Computed on first use (cached)
- Type:
listoflistofstr(chain names)
- property contacts¶
Connectivity dictionary (read/write). As given by
GetContacts()with calpha_only = False onent.- Getter:
Computed on first use (cached)
- Setter:
Uses
FilterContacts()to ensure that we only keep contacts for chains in the cleaned entity.- Type:
See return type of
GetContacts()
- property contacts_ca¶
CA-only connectivity dictionary (read/write). Like
contactsbut with calpha_only = True inGetContacts().
- exception QSscoreError¶
Exception to be raised for “acceptable” exceptions in QS scoring.
Those are cases we might want to capture for default behavior.
- class QSscorer(ent_1, ent_2, res_num_alignment=False)¶
Object to compute QS scores.
Simple usage without any precomputed contacts, symmetries and mappings:
import ost from ost.mol.alg import qsscoring # load two biounits to compare ent_full = ost.io.LoadPDB('3ia3', remote=True) ent_1 = ent_full.Select('cname=A,D') ent_2 = ent_full.Select('cname=B,C') # get score ost.PushVerbosityLevel(3) try: qs_scorer = qsscoring.QSscorer(ent_1, ent_2) ost.LogScript('QSscore:', str(qs_scorer.global_score)) ost.LogScript('Chain mapping used:', str(qs_scorer.chain_mapping)) # commonly you want the QS global score as output qs_score = qs_scorer.global_score except qsscoring.QSscoreError as ex: # default handling: report failure and set score to 0 ost.LogError('QSscore failed:', str(ex)) qs_score = 0
For maximal performance when computing QS scores of the same entity with many others, it is advisable to construct and reuse
QSscoreEntityobjects.Any known / precomputed information can be filled into the appropriate attribute here (no checks done!). Otherwise most quantities are computed on first access and cached (lazy evaluation). Setters are provided to set values with extra checks (e.g.
SetSymmetries()).All necessary seq. alignments are done by global BLOSUM62-based alignment. A multiple sequence alignment is performed with ClustalW unless
chain_mappingis provided manually. You will need to have an executableclustalworclustalw2in yourPATHor you must setclustalw_binaccordingly. Otherwise an exception (ost.settings.FileNotFound) is thrown.Formulas for QS scores:
- QS_best = weighted_scores / (weight_sum + weight_extra_mapped) - QS_global = weighted_scores / (weight_sum + weight_extra_all) -> weighted_scores = sum(w(min(d1,d2)) * (1 - abs(d1-d2)/12)) for shared -> weight_sum = sum(w(min(d1,d2))) for shared -> weight_extra_mapped = sum(w(d)) for all mapped but non-shared -> weight_extra_all = sum(w(d)) for all non-shared -> w(d) = 1 if d <= 5, exp(-2 * ((d-5.0)/4.28)^2) else
In the formulas above:
“d”: CA/CB-CA/CB distance of an “inter-chain contact” (“d1”, “d2” for “shared” contacts).
“mapped”: we could map chains of two structures and align residues in
alignments.“shared”: pairs of residues which are “mapped” and have “inter-chain contact” in both structures.
“inter-chain contact”: CB-CB pairs (CA for GLY) with distance <= 12 A (fallback to CA-CA if
calpha_onlyis True).“w(d)”: weighting function (prob. of 2 res. to interact given CB distance) from Xu et al. 2009.
- Parameters:
ent_1 (
QSscoreEntity,EntityHandleorEntityView) – First structure to be scored.ent_2 (
QSscoreEntity,EntityHandleorEntityView) – Second structure to be scored.res_num_alignment – Sets
res_num_alignment
- Raises:
QSscoreErrorif input structures are invalid or are monomers or have issues that make it impossible for a QS score to be computed.
- qs_ent_1¶
QSscoreEntityobject for ent_1 given at construction. If entity names (original_name) are not unique, we set it to ‘pdb_1’ usingSetName().
- qs_ent_2¶
QSscoreEntityobject for ent_2 given at construction. If entity names (original_name) are not unique, we set it to ‘pdb_2’ usingSetName().
- calpha_only¶
True if any of the two structures is CA-only (after cleanup).
- Type:
bool
- max_ca_per_chain_for_cm¶
Maximal number of CA atoms to use in each chain to determine chain mappings. Setting this to -1 disables the limit. Limiting it speeds up determination of symmetries and chain mappings. By default it is set to 100.
- Type:
int
- max_mappings_extensive¶
Maximal number of chain mappings to test for ‘extensive’
chain_mapping_scheme. The extensive chain mapping search must in the worst case check O(N^2) * O(N!) possible mappings for complexes with N chains. Two octamers without symmetry would require 322560 mappings to be checked. To limit computations, aQSscoreErroris thrown if we try more than the maximal number of chain mappings. The value must be set before the first use ofchain_mapping. By default it is set to 100000.- Type:
int
- res_num_alignment¶
Forces each alignment in
alignmentsto be based on residue numbers instead of using a global BLOSUM62-based alignment.- Type:
bool
- GetOligoLDDTScorer(settings, penalize_extra_chains=True)¶
- Returns:
OligoLDDTScorerobject, setup for this QS scoring problem. The scorer is set up withqs_ent_1as the reference andqs_ent_2as the model.- Parameters:
settings – Passed to
OligoLDDTScorerconstructor.penalize_extra_chains – Passed to
OligoLDDTScorerconstructor.
- SetSymmetries(symm_1, symm_2)¶
Set user-provided symmetry groups.
These groups are restricted to chain names appearing in
ent_to_cm_1andent_to_cm_2respectively. They are only valid if they cover all chains and both symm_1 and symm_2 have same lengths of symmetry group tuples. Otherwise trivial symmetry group used (seesymm_1).
- property alignments¶
List of successful sequence alignments using
chain_mapping.There will be one alignment for each mapped chain and they are ordered by their chain names in
qs_ent_1.The first sequence of each alignment belongs to
qs_ent_1and the second one toqs_ent_2. The sequences are named according to the mapped chain names and have views attached intoQSscoreEntity.entofqs_ent_1andqs_ent_2.If
res_num_alignmentis False, each alignment is performed using a global BLOSUM62-based alignment. Otherwise, the positions in the alignment sequences are simply given by the residue number so that residues with matching numbers are aligned.- Getter:
Computed on first use (cached)
- Type:
listofAlignmentHandle
- property best_score¶
QS-score without penalties.
Like
global_score, but neglecting additional residues or chains in one of the biounits (i.e. the score is calculated considering only mapped chains and residues).- Getter:
Computed on first use (cached)
- Type:
float- Raises:
QSscoreErrorif only one chain is mapped
- property chain_mapping¶
Mapping from
ent_to_cm_1toent_to_cm_2.Properties:
Mapping is between chains of same chem. group (see
chem_mapping)Each chain can appear only once in mapping
All chains of complex with less chains are mapped
Details on algorithms used to find mapping:
We try all pairs of chem. mapped chains within symmetry group and get superpose-transformation for them
First option: check for “sufficient overlap” of other chain-pairs
For each chain-pair defined above: apply superposition to full oligomer and map chains based on structural overlap
Structural overlap = X% of residues in second oligomer covered within Y Angstrom of a (chem. mapped) chain in first oligomer. We successively try (X,Y) = (80,4), (40,6) and (20,8) to be less and less strict in mapping (warning shown for most permissive one).
If multiple possible mappings are found, we choose the one which leads to the lowest multi-chain-RMSD given the superposition
Fallback option: try all mappings to find minimal multi-chain-RMSD (warning shown)
For each chain-pair defined above: apply superposition, try all (!) possible chain mappings (within symmetry group) and keep mapping with lowest multi-chain-RMSD
Repeat procedure above to resolve symmetry. Within the symmetry group we can use the chain mapping computed before and we just need to find which symmetry group in first oligomer maps to which in the second one. We again try all possible combinations…
Limitations:
Trying all possible mappings is a combinatorial nightmare (factorial). We throw an exception if too many combinations (e.g. octomer vs octomer with no usable symmetry)
The mapping is forced: the “best” mapping will be chosen independently of how badly they fit in terms of multi-chain-RMSD
As a result, such a forced mapping can lead to a large range of resulting QS scores. An extreme example was observed between 1on3.1 and 3u9r.1, where
global_scorecan range from 0.12 to 0.43 for mappings with very similar multi-chain-RMSD.
- Getter:
Computed on first use (cached)
- Type:
dictwith key / value =str(chain names, key forent_to_cm_1, value forent_to_cm_2)- Raises:
QSscoreErrorif there are too many combinations to check to find a chain mapping (seemax_mappings_extensive).
- property chain_mapping_scheme¶
Mapping scheme used to get
chain_mapping.Possible values:
‘strict’: 80% overlap needed within 4 Angstrom (overlap based mapping).
‘tolerant’: 40% overlap needed within 6 Angstrom (overlap based mapping).
‘permissive’: 20% overlap needed within 8 Angstrom (overlap based mapping). It’s best if you check mapping manually!
‘extensive’: Extensive search used for mapping detection (fallback). This approach has known limitations and may be removed in future versions. Mapping should be checked manually!
‘user’:
chain_mappingwas set by user before first use of this attribute.
- Getter:
Computed with
chain_mappingon first use (cached)- Type:
str- Raises:
QSscoreErroras inchain_mapping.
- property chem_mapping¶
Inter-complex mapping of chemical groups.
Each group (see
QSscoreEntity.chem_groups) is mapped according to highest sequence identity. Alignment is between longest sequences in groups.Limitations:
If different numbers of groups, we map only the groups for the complex with less groups (rest considered unmapped and shown as warning)
The mapping is forced: the “best” mapping will be chosen independently of how low the seq. identity may be
- Getter:
Computed on first use (cached)
- Type:
dictwith key =tupleof chain names inqs_ent_1and value =tupleof chain names inqs_ent_2.- Raises:
QSscoreErrorif we end up having no chains for either entity in the mapping (can happen if chains do not have CA atoms).
- property clustalw_bin¶
Full path to
clustalworclustalw2executable to use for multiple sequence alignments (unlesschain_mappingis provided manually).- Getter:
Located in path on first use (cached)
- Type:
str
- property ent_to_cm_1¶
Subset of
qs_ent_1used to compute chain mapping and symmetries.Properties:
Includes only residues aligned according to
chem_mappingIncludes only 1 CA atom per residue
Has at least 5 and at most
max_ca_per_chain_for_cmatoms per chainAll chains of the same chemical group have the same number of atoms (also in
ent_to_cm_2according tochem_mapping)All chains appearing in
chem_mappingappear in this entity (so the two can be safely used together)
This entity might be transformed (i.e. all positions rotated/translated by same transformation matrix) if this can speed up computations. So do not assume fixed global positions (but relative distances will remain fixed).
- Getter:
Computed on first use (cached)
- Type:
- Raises:
QSscoreErrorif any chain ends up having less than 5 res.
- property ent_to_cm_2¶
Subset of
qs_ent_1used to compute chain mapping and symmetries (seeent_to_cm_1for details).
- property global_score¶
QS-score with penalties.
The range of the score is between 0 (i.e. no interface residues are shared between biounits) and 1 (i.e. the interfaces are identical).
The global QS-score is computed applying penalties when interface residues or entire chains are missing (i.e. anything that is not mapped in
mapped_residues/chain_mapping) in one of the biounits.- Getter:
Computed on first use (cached)
- Type:
float- Raises:
QSscoreErrorif only one chain is mapped
- property mapped_residues¶
Mapping of shared residues in
alignments.- Getter:
Computed on first use (cached)
- Type:
dictmapped_residues[c1][r1] = r2 with: c1 = Chain name in first entity (= first sequence in aln), r1 = Residue number in first entity, r2 = Residue number in second entity
- property superposition¶
Superposition result based on shared CA atoms in
alignments.The superposition can be used to map
QSscoreEntity.entofqs_ent_1onto the one ofqs_ent_2. Useost.geom.Invert()if you need the opposite transformation.- Getter:
Computed on first use (cached)
- Type:
- property symm_1¶
Symmetry groups for
qs_ent_1used to speed up chain mapping.This is a list of chain-lists where each chain-list can be used reconstruct the others via cyclic C or dihedral D symmetry. The first chain-list is used as a representative symmetry group. For heteromers, the group-members must contain all different seqres in oligomer.
Example: symm. groups [(A,B,C), (D,E,F), (G,H,I)] means that there are symmetry transformations to get (D,E,F) and (G,H,I) from (A,B,C).
Properties:
All symmetry group tuples have the same length (num. of chains)
All chains in
ent_to_cm_1appear (w/o duplicates)For heteros: symmetry group tuples have all different chem. groups
Trivial symmetry group = one tuple with all chains (used if inconsistent data provided or if no symmetry is found)
Either compatible to
symm_2or trivial symmetry groups used. Compatibility requires same lengths of symmetry group tuples and it must be possible to get an overlap (80% of residues covered within 6 A of a (chem. mapped) chain) of all chains in representative symmetry groups by superposing one pair of chains.
- Getter:
Computed on first use (cached)
- Type:
listoftupleofstr(chain names)