Quaternary Structure (QS) scores (deprecated)¶
qsscoring
– Quaternary Structure (QS) scores¶
Scoring of quaternary structures (QS). The QS scoring is according to the paper by Bertoni et al..
Note
Requirements for use:
A default
compound library
must be defined and accessible viaGetDefaultLib()
. This is set by default when executing scripts withost
. Otherwise, you must set this withSetDefaultLib()
.ClustalW must be installed (unless you provide chain mappings)
Python modules numpy and scipy must be installed and available (e.g. use
pip install scipy numpy
)
- FilterContacts(contacts, chain_names)¶
Filter contacts to contain only contacts for chains in chain_names.
- Parameters:
contacts (
dict
) – Connectivity dictionary as produced byGetContacts()
.chain_names (
list
or (better)set
) – Chain names to keep.
- Returns:
New connectivity dictionary (format as in
GetContacts()
)- Return type:
dict
- GetContacts(entity, calpha_only, dist_thr=12.0)¶
Get inter-chain contacts of a macromolecular entity.
Contacts are pairs of residues within a given distance belonging to different chains. They are stored once per pair and include the CA/CB-CA/CB distance.
- Parameters:
entity (
EntityHandle
orEntityView
) – An entity to check connectivity for.calpha_only (
bool
) – If True, we only consider CA-CA distances. Else, we use CB unless the residue is a GLY.dist_thr (
float
) – Maximal CA/CB-CA/CB distance to be considered in contact.
- Returns:
A connectivity dictionary. A pair of residues with chain names ch_name1 & ch_name2 (ch_name1 < ch_name2), residue numbers res_num1 & res_num2 and distance dist (<= dist_thr) are stored as result[ch_name1][ch_name2][res_num1][res_num2] = dist.
- Return type:
dict
- class MappedLDDTScorer(alignment, calpha_only, settings)¶
A simple class to calculate a single-chain lDDT score on a given chain to chain mapping as extracted from
OligoLDDTScorer
.- Parameters:
alignment – Sets
alignment
calpha_only – Sets
calpha_only
settings – Sets
settings
- alignment¶
Alignment with two sequences named according to the mapped chains and with views attached to both sequences (e.g. one of the items of
QSscorer.alignments
).The first sequence is assumed to be the reference and the second one the model. Since the lDDT score is not symmetric (extra residues in model are ignored), the order is important.
- Type:
- calpha_only¶
If True, restricts lDDT score to CA only.
- Type:
bool
- settings¶
Settings to use for lDDT scoring.
- Type:
- lddt_scorer¶
lDDT Scorer object for the given chains.
- Type:
lDDTScorer
- reference_chain_name¶
Chain name of the reference.
- Type:
str
- model_chain_name¶
Chain name of the model.
- Type:
str
- GetPerResidueScores()¶
- Returns:
Scores for each residue
- Return type:
list
ofdict
with one item for each residue existing in model and reference:”residue_number”: Residue number in reference chain
”residue_name”: Residue name in reference chain
”lddt”: local lDDT
”conserved_contacts”: number of conserved contacts
”total_contacts”: total number of contacts
- class OligoLDDTScorer(ref, mdl, alignments, calpha_only, settings, penalize_extra_chains=False, chem_mapping=None)¶
Helper class to calculate oligomeric lDDT scores.
This class can be used independently, but commonly it will be created by calling
QSscorer.GetOligoLDDTScorer()
.Note
By construction, lDDT scores are not symmetric and hence it matters which structure is the reference (
ref
) and which one is the model (mdl
). Extra residues in the model are generally not considered. Extra chains in both model and reference can be considered by setting thepenalize_extra_chains
flag to True.- Parameters:
ref – Sets
ref
mdl – Sets
mdl
alignments – Sets
alignments
calpha_only – Sets
calpha_only
settings – Sets
settings
penalize_extra_chains – Sets
penalize_extra_chains
chem_mapping – Sets
chem_mapping
. Must be given if penalize_extra_chains is True.
- ref¶
- mdl¶
Full reference/model entity to be scored. The entity must contain all chains mapped in
alignments
and may also contain additional ones which are considered ifpenalize_extra_chains
is True.- Type:
- alignments¶
One alignment for each mapped chain of
ref
/mdl
as defined inQSscorer.alignments
. The first sequence of each alignment belongs toref
and the second one tomdl
. Sequences must have sequence naming and attached views as defined inQSscorer.alignments
.- Type:
list
ofAlignmentHandle
- calpha_only¶
If True, restricts lDDT score to CA only.
- Type:
bool
- settings¶
Settings to use for lDDT scoring.
- Type:
- penalize_extra_chains¶
If True, extra chains in both
ref
andmdl
will penalize the lDDT scores.- Type:
bool
- chem_mapping¶
Inter-complex mapping of chemical groups as defined in
QSscorer.chem_mapping
. Used to find “chem-mapped” chains inref
for unmapped chains inmdl
when penalizing scores. Each unmapped model chain can add extra reference-contacts according to the average total contacts of each single “chem-mapped” reference chain. If there is no “chem-mapped” reference chain, a warning is shown and the model chain is ignored.Only relevant if
penalize_extra_chains
is True.
- property lddt_mdl¶
The model entity used for oligomeric lDDT scoring (
oligo_lddt
/oligo_lddt_scorer
).Like
lddt_ref
, this is a single chain X containing all chains ofmdl
. The residue numbers match the ones inlddt_ref
where aligned and have unique numbers for additional residues.- Getter:
Computed on first use (cached)
- Type:
- property lddt_ref¶
The reference entity used for oligomeric lDDT scoring (
oligo_lddt
/oligo_lddt_scorer
).Since the lDDT computation requires a single chain with mapped residue numbering, all chains of
ref
are appended into a single chain X with unique residue numbers according to the column-index in the alignment. The alignments are in the same order as they appear inalignments
. Additional residues are appended at the end of the chain with unique residue numbers. Unmapped chains are only added ifpenalize_extra_chains
is True. Only CA atoms are considered ifcalpha_only
is True.- Getter:
Computed on first use (cached)
- Type:
- property mapped_lddt_scorers¶
List of scorer objects for each chain mapped in
alignments
.- Getter:
Computed on first use (cached)
- Type:
list
ofMappedLDDTScorer
- property oligo_lddt¶
Oligomeric lDDT score.
The score is computed as conserved contacts divided by the total contacts in the reference using the
oligo_lddt_scorer
, which uses the full complex as reference/model structure. Ifpenalize_extra_chains
is True, the reference/model complexes contain all chains (otherwise only the mapped ones) and additional contacts are added to the reference’s total contacts for unmapped model chains according to thechem_mapping
.The main difference with
weighted_lddt
is that the lDDT scorer “sees” the full complex here (incl. inter-chain contacts), while the weighted single chain score looks at each chain separately.- Getter:
Computed on first use (cached)
- Type:
float
- property oligo_lddt_scorer¶
lDDT Scorer object for
lddt_ref
andlddt_mdl
.- Getter:
Computed on first use (cached)
- Type:
lDDTScorer
- property sc_lddt¶
List of global scores extracted from
sc_lddt_scorers
.If scoring for a mapped chain fails, an error is displayed and a score of 0 is assigned.
- Getter:
Computed on first use (cached)
- Type:
list
offloat
- property sc_lddt_scorers¶
List of lDDT scorer objects extracted from
mapped_lddt_scorers
.- Type:
list
oflDDTScorer
- property weighted_lddt¶
Weighted average of single chain lDDT scores.
The score is computed as a weighted average of single chain lDDT scores (see
sc_lddt_scorers
) using the total contacts of each single reference chain as weights. Ifpenalize_extra_chains
is True, unmapped chains are added with a 0 score and total contacts taken from the actual reference chains or (for unmapped model chains) using thechem_mapping
.See
oligo_lddt
for a comparison of the two scores.- Getter:
Computed on first use (cached)
- Type:
float
- class QSscoreEntity(ent)¶
Entity with cached entries for QS scoring.
Any known / precomputed information can be filled into the appropriate attribute here as long as they are labelled as read/write. Otherwise the quantities are computed on first access and cached (lazy evaluation). The heaviest load is expected when computing
contacts
andcontacts_ca
.- Parameters:
ent (
EntityHandle
orEntityView
) – Entity to be used for QS scoring. A copy of it will be processed.
- is_valid¶
True, if successfully initialized. False, if input structure has no protein chains with >= 20 residues.
- Type:
bool
- original_name¶
Name set for ent when object was created.
- Type:
str
- ent¶
Cleaned version of ent passed at construction. Hydrogens are removed, the entity is processed with a
RuleBasedProcessor
and chains listed inremoved_chains
have been removed. The name of this entity might change during scoring (seeGetName()
). Otherwise, this will be fixed.- Type:
- removed_chains¶
Chains removed from ent passed at construction. These are ligand and water chains as well as small (< 20 res.) peptides or chains with no amino acids (determined by chem. type, which is set by rule based processor).
- Type:
list
ofstr
- calpha_only¶
Whether entity is CA-only (i.e. it has 0 CB atoms)
- Type:
bool
- GetAlignment(c1, c2)¶
Get sequence alignment of chain c1 with chain c2. Computed on first use based on
ca_chains
(cached).- Parameters:
c1 (
str
) – Chain name for first chain to align.c2 (
str
) – Chain name for second chain to align.
- Return type:
AlignmentHandle
or None if it failed.
- GetAngles(c1, c2)¶
Get Euler angles from superposition of chain c1 with chain c2. Computed on first use based on
ca_chains
(cached).- Parameters:
c1 (
str
) – Chain name for first chain to superpose.c2 (
str
) – Chain name for second chain to superpose.
- Returns:
3 Euler angles (may contain nan if something fails).
- Return type:
numpy.array
- GetAxis(c1, c2)¶
Get axis of symmetry from superposition of chain c1 with chain c2. Computed on first use based on
ca_chains
(cached).- Parameters:
c1 (
str
) – Chain name for first chain to superpose.c2 (
str
) – Chain name for second chain to superpose.
- Returns:
Rotational axis (may contain nan if something fails).
- Return type:
numpy.array
- GetName()¶
Wrapper to
GetName()
ofent
. This is used to uniquely identify the entity while scoring. The name may therefore change whileoriginal_name
remains fixed.
- SetName(new_name)¶
Wrapper to
SetName()
ofent
. Use this to change unique identifier while scoring (seeGetName()
).
- property ca_chains¶
Map of chain names in
ent
to sequences with attached view to CA-only chains (intoca_entity
). Useful for alignments and superpositions.- Getter:
Computed on first use (cached)
- Type:
dict
(key =str
, value =SequenceHandle
)
- property ca_entity¶
Reduced representation of
ent
with only CA atoms. This guarantees that each included residue has exactly one atom.- Getter:
Computed on first use (cached)
- Type:
- property chem_groups¶
Intra-complex group of chemically identical (seq. id. > 95%) polypeptide chains as extracted from
ca_chains
. First chain in group is the one with the longest sequence.- Getter:
Computed on first use (cached)
- Type:
list
oflist
ofstr
(chain names)
- property contacts¶
Connectivity dictionary (read/write). As given by
GetContacts()
with calpha_only = False onent
.- Getter:
Computed on first use (cached)
- Setter:
Uses
FilterContacts()
to ensure that we only keep contacts for chains in the cleaned entity.- Type:
See return type of
GetContacts()
- property contacts_ca¶
CA-only connectivity dictionary (read/write). Like
contacts
but with calpha_only = True inGetContacts()
.
- exception QSscoreError¶
Exception to be raised for “acceptable” exceptions in QS scoring.
Those are cases we might want to capture for default behavior.
- class QSscorer(ent_1, ent_2, res_num_alignment=False)¶
Object to compute QS scores.
Simple usage without any precomputed contacts, symmetries and mappings:
import ost from ost.mol.alg import qsscoring # load two biounits to compare ent_full = ost.io.LoadPDB('3ia3', remote=True) ent_1 = ent_full.Select('cname=A,D') ent_2 = ent_full.Select('cname=B,C') # get score ost.PushVerbosityLevel(3) try: qs_scorer = qsscoring.QSscorer(ent_1, ent_2) ost.LogScript('QSscore:', str(qs_scorer.global_score)) ost.LogScript('Chain mapping used:', str(qs_scorer.chain_mapping)) # commonly you want the QS global score as output qs_score = qs_scorer.global_score except qsscoring.QSscoreError as ex: # default handling: report failure and set score to 0 ost.LogError('QSscore failed:', str(ex)) qs_score = 0
For maximal performance when computing QS scores of the same entity with many others, it is advisable to construct and reuse
QSscoreEntity
objects.Any known / precomputed information can be filled into the appropriate attribute here (no checks done!). Otherwise most quantities are computed on first access and cached (lazy evaluation). Setters are provided to set values with extra checks (e.g.
SetSymmetries()
).All necessary seq. alignments are done by global BLOSUM62-based alignment. A multiple sequence alignment is performed with ClustalW unless
chain_mapping
is provided manually. You will need to have an executableclustalw
orclustalw2
in yourPATH
or you must setclustalw_bin
accordingly. Otherwise an exception (ost.settings.FileNotFound
) is thrown.Formulas for QS scores:
- QS_best = weighted_scores / (weight_sum + weight_extra_mapped) - QS_global = weighted_scores / (weight_sum + weight_extra_all) -> weighted_scores = sum(w(min(d1,d2)) * (1 - abs(d1-d2)/12)) for shared -> weight_sum = sum(w(min(d1,d2))) for shared -> weight_extra_mapped = sum(w(d)) for all mapped but non-shared -> weight_extra_all = sum(w(d)) for all non-shared -> w(d) = 1 if d <= 5, exp(-2 * ((d-5.0)/4.28)^2) else
In the formulas above:
“d”: CA/CB-CA/CB distance of an “inter-chain contact” (“d1”, “d2” for “shared” contacts).
“mapped”: we could map chains of two structures and align residues in
alignments
.“shared”: pairs of residues which are “mapped” and have “inter-chain contact” in both structures.
“inter-chain contact”: CB-CB pairs (CA for GLY) with distance <= 12 A (fallback to CA-CA if
calpha_only
is True).“w(d)”: weighting function (prob. of 2 res. to interact given CB distance) from Xu et al. 2009.
- Parameters:
ent_1 (
QSscoreEntity
,EntityHandle
orEntityView
) – First structure to be scored.ent_2 (
QSscoreEntity
,EntityHandle
orEntityView
) – Second structure to be scored.res_num_alignment – Sets
res_num_alignment
- Raises:
QSscoreError
if input structures are invalid or are monomers or have issues that make it impossible for a QS score to be computed.
- qs_ent_1¶
QSscoreEntity
object for ent_1 given at construction. If entity names (original_name
) are not unique, we set it to ‘pdb_1’ usingSetName()
.
- qs_ent_2¶
QSscoreEntity
object for ent_2 given at construction. If entity names (original_name
) are not unique, we set it to ‘pdb_2’ usingSetName()
.
- calpha_only¶
True if any of the two structures is CA-only (after cleanup).
- Type:
bool
- max_ca_per_chain_for_cm¶
Maximal number of CA atoms to use in each chain to determine chain mappings. Setting this to -1 disables the limit. Limiting it speeds up determination of symmetries and chain mappings. By default it is set to 100.
- Type:
int
- max_mappings_extensive¶
Maximal number of chain mappings to test for ‘extensive’
chain_mapping_scheme
. The extensive chain mapping search must in the worst case check O(N^2) * O(N!) possible mappings for complexes with N chains. Two octamers without symmetry would require 322560 mappings to be checked. To limit computations, aQSscoreError
is thrown if we try more than the maximal number of chain mappings. The value must be set before the first use ofchain_mapping
. By default it is set to 100000.- Type:
int
- res_num_alignment¶
Forces each alignment in
alignments
to be based on residue numbers instead of using a global BLOSUM62-based alignment.- Type:
bool
- GetOligoLDDTScorer(settings, penalize_extra_chains=True)¶
- Returns:
OligoLDDTScorer
object, setup for this QS scoring problem. The scorer is set up withqs_ent_1
as the reference andqs_ent_2
as the model.- Parameters:
settings – Passed to
OligoLDDTScorer
constructor.penalize_extra_chains – Passed to
OligoLDDTScorer
constructor.
- SetSymmetries(symm_1, symm_2)¶
Set user-provided symmetry groups.
These groups are restricted to chain names appearing in
ent_to_cm_1
andent_to_cm_2
respectively. They are only valid if they cover all chains and both symm_1 and symm_2 have same lengths of symmetry group tuples. Otherwise trivial symmetry group used (seesymm_1
).
- property alignments¶
List of successful sequence alignments using
chain_mapping
.There will be one alignment for each mapped chain and they are ordered by their chain names in
qs_ent_1
.The first sequence of each alignment belongs to
qs_ent_1
and the second one toqs_ent_2
. The sequences are named according to the mapped chain names and have views attached intoQSscoreEntity.ent
ofqs_ent_1
andqs_ent_2
.If
res_num_alignment
is False, each alignment is performed using a global BLOSUM62-based alignment. Otherwise, the positions in the alignment sequences are simply given by the residue number so that residues with matching numbers are aligned.- Getter:
Computed on first use (cached)
- Type:
list
ofAlignmentHandle
- property best_score¶
QS-score without penalties.
Like
global_score
, but neglecting additional residues or chains in one of the biounits (i.e. the score is calculated considering only mapped chains and residues).- Getter:
Computed on first use (cached)
- Type:
float
- Raises:
QSscoreError
if only one chain is mapped
- property chain_mapping¶
Mapping from
ent_to_cm_1
toent_to_cm_2
.Properties:
Mapping is between chains of same chem. group (see
chem_mapping
)Each chain can appear only once in mapping
All chains of complex with less chains are mapped
Details on algorithms used to find mapping:
We try all pairs of chem. mapped chains within symmetry group and get superpose-transformation for them
First option: check for “sufficient overlap” of other chain-pairs
For each chain-pair defined above: apply superposition to full oligomer and map chains based on structural overlap
Structural overlap = X% of residues in second oligomer covered within Y Angstrom of a (chem. mapped) chain in first oligomer. We successively try (X,Y) = (80,4), (40,6) and (20,8) to be less and less strict in mapping (warning shown for most permissive one).
If multiple possible mappings are found, we choose the one which leads to the lowest multi-chain-RMSD given the superposition
Fallback option: try all mappings to find minimal multi-chain-RMSD (warning shown)
For each chain-pair defined above: apply superposition, try all (!) possible chain mappings (within symmetry group) and keep mapping with lowest multi-chain-RMSD
Repeat procedure above to resolve symmetry. Within the symmetry group we can use the chain mapping computed before and we just need to find which symmetry group in first oligomer maps to which in the second one. We again try all possible combinations…
Limitations:
Trying all possible mappings is a combinatorial nightmare (factorial). We throw an exception if too many combinations (e.g. octomer vs octomer with no usable symmetry)
The mapping is forced: the “best” mapping will be chosen independently of how badly they fit in terms of multi-chain-RMSD
As a result, such a forced mapping can lead to a large range of resulting QS scores. An extreme example was observed between 1on3.1 and 3u9r.1, where
global_score
can range from 0.12 to 0.43 for mappings with very similar multi-chain-RMSD.
- Getter:
Computed on first use (cached)
- Type:
dict
with key / value =str
(chain names, key forent_to_cm_1
, value forent_to_cm_2
)- Raises:
QSscoreError
if there are too many combinations to check to find a chain mapping (seemax_mappings_extensive
).
- property chain_mapping_scheme¶
Mapping scheme used to get
chain_mapping
.Possible values:
‘strict’: 80% overlap needed within 4 Angstrom (overlap based mapping).
‘tolerant’: 40% overlap needed within 6 Angstrom (overlap based mapping).
‘permissive’: 20% overlap needed within 8 Angstrom (overlap based mapping). It’s best if you check mapping manually!
‘extensive’: Extensive search used for mapping detection (fallback). This approach has known limitations and may be removed in future versions. Mapping should be checked manually!
‘user’:
chain_mapping
was set by user before first use of this attribute.
- Getter:
Computed with
chain_mapping
on first use (cached)- Type:
str
- Raises:
QSscoreError
as inchain_mapping
.
- property chem_mapping¶
Inter-complex mapping of chemical groups.
Each group (see
QSscoreEntity.chem_groups
) is mapped according to highest sequence identity. Alignment is between longest sequences in groups.Limitations:
If different numbers of groups, we map only the groups for the complex with less groups (rest considered unmapped and shown as warning)
The mapping is forced: the “best” mapping will be chosen independently of how low the seq. identity may be
- Getter:
Computed on first use (cached)
- Type:
dict
with key =tuple
of chain names inqs_ent_1
and value =tuple
of chain names inqs_ent_2
.- Raises:
QSscoreError
if we end up having no chains for either entity in the mapping (can happen if chains do not have CA atoms).
- property clustalw_bin¶
Full path to
clustalw
orclustalw2
executable to use for multiple sequence alignments (unlesschain_mapping
is provided manually).- Getter:
Located in path on first use (cached)
- Type:
str
- property ent_to_cm_1¶
Subset of
qs_ent_1
used to compute chain mapping and symmetries.Properties:
Includes only residues aligned according to
chem_mapping
Includes only 1 CA atom per residue
Has at least 5 and at most
max_ca_per_chain_for_cm
atoms per chainAll chains of the same chemical group have the same number of atoms (also in
ent_to_cm_2
according tochem_mapping
)All chains appearing in
chem_mapping
appear in this entity (so the two can be safely used together)
This entity might be transformed (i.e. all positions rotated/translated by same transformation matrix) if this can speed up computations. So do not assume fixed global positions (but relative distances will remain fixed).
- Getter:
Computed on first use (cached)
- Type:
- Raises:
QSscoreError
if any chain ends up having less than 5 res.
- property ent_to_cm_2¶
Subset of
qs_ent_1
used to compute chain mapping and symmetries (seeent_to_cm_1
for details).
- property global_score¶
QS-score with penalties.
The range of the score is between 0 (i.e. no interface residues are shared between biounits) and 1 (i.e. the interfaces are identical).
The global QS-score is computed applying penalties when interface residues or entire chains are missing (i.e. anything that is not mapped in
mapped_residues
/chain_mapping
) in one of the biounits.- Getter:
Computed on first use (cached)
- Type:
float
- Raises:
QSscoreError
if only one chain is mapped
- property mapped_residues¶
Mapping of shared residues in
alignments
.- Getter:
Computed on first use (cached)
- Type:
dict
mapped_residues[c1][r1] = r2 with: c1 = Chain name in first entity (= first sequence in aln), r1 = Residue number in first entity, r2 = Residue number in second entity
- property superposition¶
Superposition result based on shared CA atoms in
alignments
.The superposition can be used to map
QSscoreEntity.ent
ofqs_ent_1
onto the one ofqs_ent_2
. Useost.geom.Invert()
if you need the opposite transformation.- Getter:
Computed on first use (cached)
- Type:
- property symm_1¶
Symmetry groups for
qs_ent_1
used to speed up chain mapping.This is a list of chain-lists where each chain-list can be used reconstruct the others via cyclic C or dihedral D symmetry. The first chain-list is used as a representative symmetry group. For heteromers, the group-members must contain all different seqres in oligomer.
Example: symm. groups [(A,B,C), (D,E,F), (G,H,I)] means that there are symmetry transformations to get (D,E,F) and (G,H,I) from (A,B,C).
Properties:
All symmetry group tuples have the same length (num. of chains)
All chains in
ent_to_cm_1
appear (w/o duplicates)For heteros: symmetry group tuples have all different chem. groups
Trivial symmetry group = one tuple with all chains (used if inconsistent data provided or if no symmetry is found)
Either compatible to
symm_2
or trivial symmetry groups used. Compatibility requires same lengths of symmetry group tuples and it must be possible to get an overlap (80% of residues covered within 6 A of a (chem. mapped) chain) of all chains in representative symmetry groups by superposing one pair of chains.
- Getter:
Computed on first use (cached)
- Type:
list
oftuple
ofstr
(chain names)