You are reading the documentation for version 2.4 of OpenStructure. You may also want to read the documentation for:
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.7.1
1.8
1.9
1.10
1.11
2.0
2.1
2.2
2.3
2.3.1
devel
|
Parameters: |
|
---|---|
Raises: |
|
GetNChainContacts
(target_chain, no_interchain=False)¶Returns number of contacts expected for a certain chain in target
Parameters: |
|
---|---|
Raises: |
|
lDDT
(model, thresholds=[0.5, 1.0, 2.0, 4.0], local_lddt_prop=None, local_contact_prop=None, chain_mapping=None, no_interchain=False, no_intrachain=False, penalize_extra_chains=False, residue_mapping=None, return_dist_test=False, check_resnames=True)¶Computes lDDT of model - globally and per-residue
Parameters: |
|
---|---|
Returns: | global and per-residue lDDT scores as a tuple - first element is global lDDT score and second element a list of per-residue scores with length len(model.residues) None is assigned to residues that are not covered by target |
SymmetrySettings
¶Container for symmetric compounds
lDDT considers symmetries and selects the one resulting in the highest possible score.
A symmetry is defined as a renaming operation on one or more atoms that leads to a chemically equivalent residue. Example would be OD1 and OD2 in ASP => renaming OD1 to OD2 and vice versa gives a chemically equivalent residue.
Use AddSymmetricCompound()
to define a symmetry which can then
directly be accessed through the symmetric_compounds member.
AddSymmetricCompound
(name, symmetric_atoms)¶Adds symmetry for compound with name
Parameters: |
|
---|
GetDefaultSymmetrySettings
()¶Constructs and returns SymmetrySettings
object for natural amino
acids
CustomCompound
(atom_names)¶Defines atoms for custom compounds
lDDT requires the reference atoms of a compound which are typically
extracted from a ost.conop.CompoundLib
. This lightweight
container allows to handle arbitrary compounds which are not
necessarily in the compound library.
Parameters: | atom_names (list of str ) – Names of atoms of custom compound |
---|
FromResidue
(res)¶Construct custom compound from residue
Parameters: | res (ost.mol.ResidueView /ost.mol.ResidueHandle ) – Residue from which reference atom names are extracted,
hydrogen/deuterium atoms are filtered out |
---|---|
Returns: | CustomCompound |
lDDTSettings
(radius=15, sequence_separation=0, cutoffs=(0.5, 1.0, 2.0, 4.0), label="locallddt")¶Object containing the settings used for lDDT calculations.
Parameters: |
|
---|
radius
¶Distance inclusion radius.
Type: | float |
---|
sequence_separation
¶Sequence separation.
Type: | int |
---|
cutoffs
¶List of thresholds used to determine distance conservation.
Type: | list of float |
---|
label
¶The base name for the ResidueHandle properties that store the local scores.
Type: | str |
---|
PrintParameters
()¶Print settings.
ToString
()¶Returns: | String representation of the lDDTSettings object. |
---|---|
Return type: | str |
stereochemistry
– Stereochemistry Checks¶Warning
Stereochemistry checks described in Mariani et al. are considered deprecated. They have been re-implemented and now support nucleotides. The old code is still available and documented here.
AngleViolationInfo
(a1, a2, a3, angle, exp_angle, std)¶Object to hold info on angle violation
Constructor arguments are available as attributes:
ost.mol.AtomHandle
)ost.mol.AtomHandle
)ost.mol.AtomHandle
)float
)float
)float
)ToJSON
()¶Return JSON serializable dict
BondViolationInfo
(a1, a2, length, exp_length, std)¶Object to hold info on bond violation
Constructor arguments are available as attributes:
ost.mol.AtomHandle
)ost.mol.AtomHandle
)float
)float
)float
)ToJSON
()¶Return JSON serializable dict
ClashInfo
(a1, a2, dist, tolerated_dist)¶Object to hold info on clash
Constructor arguments are available as attributes:
ost.mol.AtomHandle
)ost.mol.AtomHandle
)float
)float
)ToJSON
()¶Return JSON serializable dict
GetAngleParam
(a1, a2, a3, stereo_data=None, stereo_link_data=None)¶Returns mean and standard deviation for angle
Parameters: |
|
---|---|
Returns: |
|
GetBadAngles
(ent, stereo_data=None, stereo_link_data=None, tolerance=12)¶Identify unrealistic angles
Parameters: |
|
---|---|
Returns: |
|
GetBadBonds
(ent, stereo_data=None, stereo_link_data=None, tolerance=12)¶Identify unrealistic bonds
Parameters: |
|
---|---|
Returns: |
|
GetBondParam
(a1, a2, stereo_data=None, stereo_link_data=None)¶Returns mean and standard deviation for bond
Parameters: |
|
---|---|
Returns: |
|
GetClashes
(ent, vdw_radii=None, tolerance=1.5, disulfid_dist=2.03, disulfid_tolerance=1.0)¶Identifies clashing atoms
A clash between two non-bonded atoms is defined as their distance d being below the sum of their vdw radii with some subtracted tolerance value.
The default values are not very sensitive.
Parameters: |
|
---|---|
Returns: | A |
GetDefaultStereoData
()¶Get default stereo data derived from CCP4 MON_LIB
Used as default if not provided in GetBadBonds()
, GetBadAngles()
and StereoCheck()
.
MON_LIB is licensed under GNU LESSER GENERAL PUBLIC LICENSE Version 3. Consult the latest CCP4 for the full license text.
GetDefaultStereoLinkData
()¶Get default stereo data for links between compounds
Hardcoded from arbitrary sources, see comments in the code.
Returns: | Data for peptide bonds, nucleotide links and disulfid bonds that
are used as default if not provided in GetBadBonds() ,
GetBadAngles() and StereoCheck() . |
---|
StereoCheck
(ent, stereo_data=None, stereo_link_data=None)¶Remove atoms with stereochemical problems
Selects for peptide/nucleotides and calls GetClashes()
,
GetBadBonds()
and GetBadAngles()
with default
parameters.
Parameters: |
|
---|---|
Returns: | Tuple with four elements: 1) |
StereoDataFromMON_LIB
(mon_lib_path, compounds=None)¶Parses stereochemistry parameters from CCP4 MON_LIB
CCP4 MON_LIB contains data on ideal bond lengths/angles for compounds.
Original data (several updates in the meantime) come from:
This function adds a dependency to the gemmi library to read cif files.
Parameters: |
|
---|---|
Returns: |
|
scoring
– Specialized scoring functions¶lDDTBSScorer
(reference, model, residue_number_alignment=False)¶Scorer specific for a reference/model pair
Finds best possible binding site representation of reference in model given
lDDT score. Uses ost.mol.alg.chain_mapping.ChainMapper
to deal with
chain mapping.
Parameters: |
|
---|
ScoreBS
(ligand, radius=4.0, lddt_radius=10.0)¶Computes binding site lDDT score given ligand. Best possible binding site representation is selected by lDDT but other scores such as CA based RMSD and GDT are computed too and returned.
Parameters: |
|
---|---|
Returns: | Object of type |
Scorer
(model, target, resnum_alignments=False, molck_settings=None, naive_chain_mapping_thresh=12, cad_score_exec=None, custom_mapping=None)¶Helper class to access the various scores available from ost.mol.alg
Deals with structure cleanup, chain mapping, interface identification etc. Intermediate results are available as attributes.
Parameters: |
|
---|
aln
¶Alignments of model
/target
chains
Alignments for each pair of chains mapped in mapping
.
First sequence is target sequence, second sequence the model sequence.
Type: | list of ost.seq.AlignmentHandle |
---|
cad_score
¶The global CAD atom-atom (AA) score
Computed based on model
. In case of oligomers, mapping
is used.
Type: | float |
---|
chain_mapper
¶Chain mapper object for given target
Type: | ost.mol.alg.chain_mapping.ChainMapper |
---|
dockq_ave
¶Average of DockQ scores in dockq_scores
In its original implementation, DockQ only operates on single interfaces. Thus the requirement to combine scores for higher order oligomers.
Type: | float |
---|
dockq_ave_full
¶Same as dockq_ave
but penalizing for missing interfaces
Interfaces in nonmapped_interfaces
are added as 0.0
in average computation.
Type: | float |
---|
dockq_scores
¶DockQ scores for interfaces in interfaces
list
of float
dockq_wave
¶Same as dockq_ave
, weighted by native_contacts
Type: | float |
---|
dockq_wave_full
¶Same as dockq_ave_full
, but weighted
Interfaces in nonmapped_interfaces
are added as 0.0 in
average computations and the respective weights are derived from
nonmapped_interfaces_contacts
fnat
¶fnat scores for interfaces in interfaces
fnat: Fraction of native contacts that are also present in model
list
of float
fnonnat
¶fnonnat scores for interfaces in interfaces
fnat: Fraction of model contacts that are not present in target
list
of float
gdtha
¶GDT with thresholds: 4.0A, 2.0A, 1.0A and 0.5A
Computed on transformed_mapped_model_pos
and
mapped_target_pos
Type: | float |
---|
gdtts
¶GDT with thresholds: 8.0A, 4.0A, 2.0A and 1.0A
Computed on transformed_mapped_model_pos
and
mapped_target_pos
Type: | float |
---|
interface_qs_best
¶QS-score for each interface in interfaces
Only computed on aligned residues
Type: | list of float |
---|
interface_qs_global
¶QS-score for each interface in interfaces
Type: | list of float |
---|
interfaces
¶Interfaces with nonzero native_contacts
Type: | list of tuple with 4 elements each:
(trg_ch1, trg_ch2, mdl_ch1, mdl_ch2) |
---|
irmsd
¶irmsd scores for interfaces in interfaces
irmsd: RMSD of interface (RMSD computed on N, CA, C, O atoms) which consists of each residue that has at least one heavy atom within 10A of other chain.
list
of float
lddt
¶Global lDDT score in range [0.0, 1.0]
Computed based on stereochecked_model
. In case of oligomers,
mapping
is used.
Type: | float |
---|
lddt_scorer
¶lDDT scorer for stereochecked_target
(default parameters)
Type: | ost.mol.alg.lddt.lDDTScorer |
---|
local_cad_score
¶The per-residue CAD atom-atom (AA) scores
Computed based on model
. In case of oligomers, mapping
is used.
Type: | dict |
---|
local_lddt
¶Per residue lDDT scores in range [0.0, 1.0]
Computed based on stereochecked_model
but scores for all
residues in model
are reported. If a residue has been removed
by stereochemistry checks, the respective score is set to 0.0. If a
residue is not covered by the target or is in a chain skipped by the
chain mapping procedure (happens for super short chains), the respective
score is set to None. In case of oligomers, mapping
is used.
Type: | dict |
---|
lrmsd
¶lrmsd scores for interfaces in interfaces
lrmsd: The interfaces are superposed based on the receptor (rigid min RMSD superposition) and RMSD for the ligand is reported. Superposition and RMSD are based on N, CA, C and O positions, receptor is the chain contributing to the interface with more residues in total.
list
of float
mapped_model_pos
¶Mapped representative positions in model
Thats CA positions for peptide residues and C3’ positions for
nucleotides. Has same length as mapped_target_pos
and mapping
is based on mapping
.
Type: | ost.geom.Vec3List |
---|
mapped_target_pos
¶Mapped representative positions in target
Thats CA positions for peptide residues and C3’ positions for
nucleotides. Has same length as mapped_model_pos
and mapping
is based on mapping
.
Type: | ost.geom.Vec3List |
---|
mapping
¶Full chain mapping result for target
/model
Type: | ost.mol.alg.chain_mapping.MappingResult |
---|
model
¶Model with Molck cleanup
Type: | ost.mol.EntityHandle |
---|
model_bad_angles
¶Model angles with unexpected stereochemistry
Type: | list of
ost.mol.alg.stereochemistry.AngleViolationInfo |
---|
model_bad_bonds
¶Model bonds with unexpected stereochemistry
Type: | list of
ost.mol.alg.stereochemistry.BondViolationInfo |
---|
model_clashes
¶Clashing model atoms
Type: | list of ost.mol.alg.stereochemistry.ClashInfo |
---|
model_contacts
¶N model contacts for interfaces in interfaces
A contact is a pair or residues from distinct chains that have a minimal heavy atom distance < 5A
Type: | list of int |
---|
model_interface_residues
¶Interface residues in model
Thats all residues having a contact with at least one residue from another chain (CB-CB distance <= 8A, CA in case of Glycine)
Type: | dict with chain names as key and and list
with residue numbers of the respective interface residues. |
---|
n_target_not_mapped
¶Number of target residues which have no mapping to model
Type: | int |
---|
native_contacts
¶N native contacts for interfaces in interfaces
A contact is a pair or residues from distinct chains that have a minimal heavy atom distance < 5A
Type: | list of int |
---|
nonmapped_interfaces
¶Interfaces present in target that are not mapped
At least one of the chains is not present in target
Type: | list of tuple with two elements each:
(trg_ch1, trg_ch2) |
---|
nonmapped_interfaces_contacts
¶Number of native contacts in nonmapped_interfaces
Type: | list of int |
---|
patch_dockq
¶Same as patch_qs
but for DockQ scores
patch_qs
¶Patch QS-scores for each residue in model_interface_residues
Representative patches for each residue r in chain c are computed as follows:
Results are stored in the same manner as
model_interface_residues
, with corresponding scores instead of
residue numbers. Scores for residues which are not
mol.ChemType.AMINOACIDS
are set to None. Additionally,
interface patches are derived from model
. If they contain
residues which are not covered by target
, the score is set to
None too.
Type: | dict with chain names as key and and list
with scores of the respective interface residues. |
---|
qs_best
¶Global QS-score - only computed on aligned residues
Computed based on model
using mapping
. The QS-score
computation only considers contacts between residues with a mapping
between target and model. As a result, the score won’t be lowered in
case of additional chains/residues in any of the structures.
Type: | float |
---|
qs_global
¶Global QS-score
Computed based on model
using mapping
Type: | float |
---|
qs_scorer
¶QS scorer constructed from mapping
The scorer object is constructed with default parameters and relates to
model
and target
(no stereochecks).
Type: | ost.mol.alg.qsscore.QSScorer |
---|
rmsd
¶RMSD
Computed on transformed_mapped_model_pos
and
mapped_target_pos
Type: | float |
---|
stereochecked_aln
¶Stereochecked equivalent of aln
The alignments may differ, as stereochecks potentially remove residues
Type: | :class:`` |
---|
stereochecked_model
¶View of model
that has stereochemistry checks applied
First, a selection for peptide/nucleotide residues is performed, secondly peptide sidechains with stereochemical irregularities are removed (full residue if backbone atoms are involved). Irregularities are clashes or bond lengths/angles more than 12 standard deviations from expected values.
Type: | ost.mol.EntityView |
---|
stereochecked_target
¶Same as stereochecked_model
for target
Type: | ost.mol.EntityView |
---|
target
¶Target with Molck cleanup
Type: | ost.mol.EntityHandle |
---|
target_bad_angles
¶Target angles with unexpected stereochemistry
Type: | list of
ost.mol.alg.stereochemistry.AngleViolationInfo |
---|
target_bad_bonds
¶Target bonds with unexpected stereochemistry
Type: | list of
ost.mol.alg.stereochemistry.BondViolationInfo |
---|
target_clashes
¶Clashing target atoms
Type: | list of ost.mol.alg.stereochemistry.ClashInfo |
---|
target_interface_residues
¶Same as model_interface_residues
for target
Type: | dict with chain names as key and and list
with residue numbers of the respective interface residues. |
---|
transform
¶Transform: mapped_model_pos
onto mapped_target_pos
Computed using Kabsch minimal rmsd algorithm
Type: | ost.geom.Mat4 |
---|
transformed_mapped_model_pos
¶mapped_model_pos
with transform
applied
Type: | ost.geom.Vec3List |
---|
ligand_scoring
– Ligand scoring functions¶LigandScorer
(model, target, model_ligands=None, target_ligands=None, resnum_alignments=False, check_resnames=True, rename_ligand_chain=False, chain_mapper=None, substructure_match=False, radius=4.0, lddt_pli_radius=6.0, lddt_bs_radius=10.0, binding_sites_topn=100000)¶Scorer class to compute various small molecule ligand (non polymer) scores.
Note
Extra requirements:
pip install numpy networkx
)At the moment, two scores are available:
lDDT
.Both scores involve local chain mapping of the reference binding site onto the model, symmetry-correction, and finally assignment (mapping) of model and target ligands, as described in (Manuscript in preparation).
Results are available as matrices ((lddt_pli|rmsd)_matrix), where every target-model score is reported in a matrix; as (lddt_pli|rmsd) where a model-target assignment has been determined, starting from the “best” possible mapping and using each target and model ligand in a single assignment, and the results are reported in a dictionary; and as ((lddt_pli|rmsd)_details) methods, which report additional details about different aspects of the scoring such as chain mapping.
The class generally assumes that the
is_ligand
property is properly set on all
the ligand atoms, and only ligand atoms. This is typically the case for
entities loaded from mmCIF (tested with mmCIF files from the PDB and
SWISS-MODEL), but it will most likely not work for most entities loaded
from PDB files.
The class doesn’t perform any cleanup of the provided structures. It is up to the caller to ensure that the data is clean and suitable for scoring. Molck should be used with extra care, as many of the options (such as rm_non_std or map_nonstd_res) can cause ligands to be removed from the structure. If cleanup with Molck is needed, ligands should be kept and passed separately. Non-ligand residues should be valid compounds with atom names following the naming conventions of the component dictionary. Non-standard residues are acceptable, and if the model contains a standard residue at that position, only atoms with matching names will be considered.
Unlike most of OpenStructure, this class does not assume that the ligands (either in the model or the target) are part of the PDB component dictionary. They may have arbitrary residue names. Residue names do not have to match between the model and the target. Matching is based on the calculation of isomorphisms which depend on the atom element name and atom connectivity (bond order is ignored). It is up to the caller to ensure that the connectivity of atoms is properly set before passing any ligands to this class. Ligands with improper connectivity will lead to bogus results.
Note, however, that atom names should be unique within a residue (ie two distinct atoms cannot have the same atom name).
This only applies to the ligand. The rest of the model and target structures (protein, nucleic acids) must still follow the usual rules and contain only residues from the compound library.
Although it isn’t a requirement, hydrogen atoms should be removed from the structures. Here is an example code snippet that will perform a reasonable cleanup. Keep in mind that this is most likely not going to work as expected with entities loaded from PDB files, as the is_ligand flag is probably not set properly.
Here is a snippet example of how to use this code:
from ost.mol.alg.ligand_scoring import LigandScorer
from ost.mol.alg import Molck, MolckSettings
# Load data
# Structure model in PDB format, containing the receptor only
model = io.LoadPDB("path_to_model.pdb")
# Ligand model as SDF file
model_ligand = io.LoadEntity("path_to_ligand.sdf", format="sdf")
# Target loaded from mmCIF, containing the ligand
target, _ = io.LoadMMCIF("path_to_target.cif", seqres=True)
# Cleanup a copy of the structures
cleaned_model = model.Copy()
cleaned_target = target.Copy()
molck_settings = MolckSettings(rm_unk_atoms=True,
rm_non_std=False,
rm_hyd_atoms=True,
rm_oxt_atoms=False,
rm_zero_occ_atoms=False,
colored=False,
map_nonstd_res=False,
assign_elem=True)
Molck(cleaned_model, conop.GetDefaultLib(), molck_settings)
Molck(cleaned_target, conop.GetDefaultLib(), molck_settings)
# Setup scorer object and compute lDDT-PLI
model_ligands = [model_ligand.Select("ele != H")]
ls = LigandScorer(model=cleaned_model, target=cleaned_target, model_ligands=model_ligands)
print("lDDT-PLI:", ls.lddt_pli)
print("RMSD:", ls.rmsd)
Parameters: |
|
---|
chain_mapper
¶Chain mapper object for the given target
.
Type: | ost.mol.alg.chain_mapping.ChainMapper |
---|
rmsd_matrix
¶Get the matrix of RMSD values.
Target ligands are in rows, model ligands in columns.
NaN values indicate that no RMSD could be computed (i.e. different ligands).
Return type: | ndarray |
---|
lddt_pli_matrix
¶Get the matrix of lDDT-PLI values.
Target ligands are in rows, model ligands in columns.
NaN values indicate that no lDDT-PLI could be computed (i.e. different ligands).
Return type: | ndarray |
---|
rmsd
¶Get a dictionary of RMSD score values, keyed by model ligand
(chain name, ResNum
).
Return type: | dict |
---|
rmsd_details
¶Get a dictionary of RMSD score details (dictionaries), keyed by
model ligand (chain name, ResNum
).
Each sub-dictionary contains the following information:
ResidueHandle
)
that define the binding site in the reference.ResidueHandle
) in the reference binding site
that could be mapped to the model.ResidueHandle
) in the model that were mapped to
the reference binding site. The residues are in the same order as
bs_ref_res_mapped.ResidueView
) with residue names that differ
between the reference and the model, respectively.
The list is empty if all residue names match, which is guaranteed
if check_resnames=True.
Note: more binding site mappings may be explored during scoring,
but only inconsistencies in the selected mapping are reported.Return type: | dict |
---|
lddt_pli
¶Get a dictionary of lDDT-PLI score values, keyed by model ligand
(chain name, ResNum
).
Return type: | dict |
---|
lddt_pli_details
¶Get a dictionary of lDDT-PLI score details (dictionaries), keyed by
model ligand (chain name, ResNum
).
Each sub-dictionary contains the following information:
ResidueHandle
)
that define the binding site in the reference.ResidueHandle
) in the reference binding site
that could be mapped to the model.ResidueHandle
) in the model that were mapped to
the reference binding site. The residues are in the same order as
bs_ref_res_mapped.ResidueView
) with residue names that differ
between the reference and the model, respectively.
The list is empty if all residue names match, which is guaranteed
if check_resnames=True.
Note: more binding site mappings may be explored during scoring,
but only inconsistencies in the selected mapping are reported.Return type: | dict |
---|
SCRMSD
(model_ligand, target_ligand, transformation=geom.Mat4(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1), substructure_match=False)¶Calculate symmetry-corrected RMSD.
Binding site superposition must be computed separately and passed as transformation.
Parameters: |
|
---|---|
Return type: |
|
Raises: |
|
NoSymmetryError
¶Exception to be raised when no symmetry can be found.
chain_mapping
– Chain Mapping¶Chain mapping aims to identify a one-to-one relationship between chains in a reference structure and a model.
ChainMapper
(target, resnum_alignments=False, pep_seqid_thr=95.0, pep_gap_thr=1.0, nuc_seqid_thr=95.0, nuc_gap_thr=1.0, pep_subst_mat=<ost.seq.alg._ost_seq_alg.SubstWeightMatrix object>, pep_gap_open=-11, pep_gap_ext=-1, nuc_subst_mat=<ost.seq.alg._ost_seq_alg.SubstWeightMatrix object>, nuc_gap_open=-4, nuc_gap_ext=-4, min_pep_length=10, min_nuc_length=4, n_max_naive=100000000.0)¶Class to compute chain mappings
All algorithms are performed on processed structures which fulfill criteria as given in constructor arguments (min_pep_length, “min_nuc_length”) and only contain residues which have all required backbone atoms. for peptide residues thats N, CA, C and CB (no CB for GLY), for nucleotide residues thats O5’, C5’, C4’, C3’ and O3’.
Chain mapping is a three step process:
GetChemMapping()
or directly call
one of the full fletched one-to-one chain mapping functions which
execute that step internally.8! = 40320
possible chain mappings.Parameters: |
|
---|
Align
(s1, s2, stype)¶Access to internal sequence alignment functionality
Alignment parameterization is setup at ChainMapper construction
Parameters: |
|
---|---|
Returns: | Pairwise alignment of s1 and s2 |
GetChemMapping
(model)¶Maps sequences in model to chem_groups of target
Parameters: | model (ost.mol.EntityView /ost.mol.EntityHandle ) – Model from which to extract sequences, a
selection that only includes peptides and nucleotides
is performed and returned along other results. |
---|---|
Returns: | Tuple with two lists of length len(self.chem_groups) and
an ost.mol.EntityView representing model:
1) Each element is a list with mdl chain names that
map to the chem group at that position.
2) Each element is a ost.seq.AlignmentList aligning
these mdl chain sequences to the chem group ref sequences.
3) A selection of model that only contains polypeptides and
polynucleotides whose ATOMSEQ exactly matches the sequence
info in the returned alignments. |
GetNMappings
(model)¶Returns number of possible mappings
Parameters: | model (ost.mol.EntityView /ost.mol.EntityHandle ) – Model with chains that are mapped onto
chem_groups |
---|
GetQSScoreMapping
(model, contact_d=12.0, strategy='naive', full_n_mdl_chains=None, block_seed_size=5, block_blocks_per_chem_group=5, steep_opt_rate=None, chem_mapping_result=None)¶Identify chain mapping based on QSScore
Scoring is based on CA/C3’ positions which are present in all chains of
a chem_groups
as well as the model chains which are mapped to
that respective chem group. QS score is not defined for single chains.
The greedy strategies that require to identify starting seeds thus
often rely on single chain lDDTs.
The following strategies are available:
Sets MappingResult.opt_score
in case of no trivial one-to-one
mapping.
Parameters: |
|
---|---|
Returns: |
GetRepr
(substructure, model, topn=1, inclusion_radius=15.0, thresholds=[0.5, 1.0, 2.0, 4.0], bb_only=False, only_interchain=False, chem_mapping_result=None)¶Identify topn representations of substructure in model
substructure defines a subset of target
for which one
wants the topn representations in model. Representations are scored
and sorted by lDDT.
Parameters: |
|
---|---|
Returns: |
|
GetRigidMapping
(model, strategy='greedy_single_gdtts', single_chain_gdtts_thresh=0.4, subsampling=None, first_complete=False, iterative_superposition=False, chem_mapping_result=None)¶Identify chain mapping based on rigid superposition
Superposition and scoring is based on CA/C3’ positions which are present
in all chains of a chem_groups
as well as the model
chains which are mapped to that respective chem group.
Transformations to superpose model onto ChainMapper.target
are estimated using all possible combinations of target and model chains
within the same chem groups and build the basis for further extension.
There are four extension strategies:
Parameters: |
|
---|---|
Returns: |
GetlDDTMapping
(model, inclusion_radius=15.0, thresholds=[0.5, 1.0, 2.0, 4.0], strategy='naive', steep_opt_rate=None, full_n_mdl_chains=None, block_seed_size=5, block_blocks_per_chem_group=5, chem_mapping_result=None)¶Identify chain mapping by optimizing lDDT score
Maps model chain sequences to chem_groups
and find mapping
based on backbone only lDDT score (CA for amino acids C3’ for
Nucleotides).
Either performs a naive search, i.e. enumerate all possible mappings or executes a greedy strategy that tries to identify a (close to) optimal mapping in an iterative way by starting from a start mapping (seed). In each iteration, the one-to-one mapping that leads to highest increase in number of conserved contacts is added with the additional requirement that this added mapping must have non-zero interface counts towards the already mapped chains. So basically we’re “growing” the mapped structure by only adding connected stuff.
The available strategies:
Sets MappingResult.opt_score
in case of no trivial one-to-one
mapping.
Parameters: |
|
---|---|
Returns: |
ProcessStructure
(ent)¶Entity processing for chain mapping
ost.mol.EntityView
to each sequenceParameters: | ent (ost.mol.EntityView /ost.mol.EntityHandle ) – Entity to process |
---|---|
Returns: | Tuple with 3 elements: 1) ost.mol.EntityView
containing peptide and nucleotide residues 2)
ost.seq.SequenceList containing ATOMSEQ sequences
for each polypeptide chain in returned view, sequences have
ost.mol.EntityView of according chains attached
3) same for polynucleotide chains |
chem_group_alignments
¶MSA for each group in chem_groups
Sequences in MSAs exhibit same order as in chem_groups
and
have the respective ost.mol.EntityView
from target attached.
Getter: | Computed on first use (cached) |
---|---|
Type: | ost.seq.AlignmentList |
chem_group_ref_seqs
¶Reference (longest) sequence for each group in chem_groups
Respective EntityView
from target for each sequence s are
available as s.GetAttachedView()
Getter: | Computed on first use (cached) |
---|---|
Type: | ost.seq.SequenceList |
chem_group_types
¶ChemType of each group in chem_groups
Specifying if groups are poly-peptides/nucleotides, i.e.
ost.mol.ChemType.AMINOACIDS
or
ost.mol.ChemType.NUCLEOTIDES
Getter: | Computed on first use (cached) |
---|---|
Type: | list of ost.mol.ChemType |
chem_groups
¶Groups of chemically equivalent chains in target
First chain in group is the one with longest sequence.
Getter: | Computed on first use (cached) |
---|---|
Type: | list of list of str (chain names) |
polynuc_seqs
¶Sequences of nucleotide chains in target
Respective EntityView
from target for each sequence s are
available as s.GetAttachedView()
Type: | ost.seq.SequenceList |
---|
polypep_seqs
¶Sequences of peptide chains in target
Respective EntityView
from target for each sequence s are
available as s.GetAttachedView()
Type: | ost.seq.SequenceList |
---|
target
¶Target structure that only contains peptides/nucleotides
Contains only residues that have the backbone representatives (CA for peptide and C3’ for nucleotides) to avoid ATOMSEQ alignment inconsistencies when switching between all atom and backbone only representations.
Type: | ost.mol.EntityView |
---|
ReprResult
(lDDT, substructure, ref_view, mdl_view)¶Result object for ChainMapper.GetRepr()
Constructor is directly called within the function, no need to construct such objects yourself.
Parameters: |
|
---|
GetFlatChainMapping
(mdl_as_key=False)¶Returns flat mapping of all chains in the representation
Parameters: | mdl_as_key – Default is target chain name as key and model chain name as value. This can be reversed with this flag. |
---|---|
Returns: | dict with str as key/value that describe
one-to-one mapping |
JSONSummary
()¶Returns JSON serializable summary of results
bb_rmsd
¶RMSD between ref_bb_pos
and superposed_mdl_bb_pos
Type: | float |
---|
gdt_1
¶GDT with one single threshold: 1.0
Type: | float |
---|
gdt_2
¶GDT with one single threshold: 2.0
Type: | float |
---|
gdt_4
¶GDT with one single threshold: 4.0
Type: | float |
---|
gdt_8
¶GDT with one single threshold: 8.0
Type: | float |
---|
inconsistent_residues
¶A list of mapped residue whose names do not match (eg. ALA in the reference and LEU in the model).
The mismatches are reported as a tuple of ResidueView
(reference, model), or as an empty list if all the residue names match.
Type: | list |
---|
lDDT
¶lDDT of representation result
Depends on how you call ChainMapper.GetRepr()
whether this is
backbone only or full atom lDDT.
Type: | float |
---|
mdl_bb_pos
¶Representative backbone positions for model residues.
Thats CA positions for peptides and C3’ positions for Nucleotides.
Type: | geom.Vec3List |
---|
mdl_full_bb_pos
¶Representative backbone positions for reference residues.
Thats N, CA and C positions for peptides and O5’, C5’, C4’, C3’, O3’ positions for Nucleotides.
Type: | geom.Vec3List |
---|
mdl_residues
¶The model residues
Type: | mol.ResidueViewList |
---|
mdl_view
¶The ref_view
representation in the model
Type: | ost.mol.EntityView |
---|
ost_query
¶query for mdl residues in OpenStructure query language
Repr can be selected as full_mdl.Select(ost_query)
Returns invalid query if residue numbers have insertion codes.
Type: | str |
---|
ref_bb_pos
¶Representative backbone positions for reference residues.
Thats CA positions for peptides and C3’ positions for Nucleotides.
Type: | geom.Vec3List |
---|
ref_full_bb_pos
¶Representative backbone positions for reference residues.
Thats N, CA and C positions for peptides and O5’, C5’, C4’, C3’, O3’ positions for Nucleotides.
Type: | geom.Vec3List |
---|
ref_residues
¶The reference residues
Type: | class:mol.ResidueViewList |
---|
ref_view
¶View which contains the mapped subset of substructure
Type: | ost.mol.EntityView |
---|
substructure
¶The full substructure for which we searched for a representation
Type: | ost.mol.EntityView |
---|
superposed_mdl_bb_pos
¶mdl_bb_pos
with transform applied
Type: | geom.Vec3List |
---|
transform
¶Transformation to superpose mdl residues onto ref residues
Superposition computed as minimal RMSD superposition on
ref_bb_pos
and mdl_bb_pos
. If number of positions is
smaller 3, the full_bb_pos equivalents are used instead.
Type: | ost.geom.Mat4 |
---|
MappingResult
(target, model, chem_groups, chem_mapping, mapping, alns, opt_score=None)¶Result object for the chain mapping functions in ChainMapper
Constructor is directly called within the functions, no need to construct such objects yourself.
GetFlatMapping
(mdl_as_key=False)¶Returns flat mapping as dict
for all mapable chains
Parameters: | mdl_as_key – Default is target chain name as key and model chain name as value. This can be reversed with this flag. |
---|---|
Returns: | dict with str as key/value that describe
one-to-one mapping |
JSONSummary
()¶Returns JSON serializable summary of results
alns
¶Alignments of mapped chains in target
and model
Each alignment is accessible with alns[(t_chain,m_chain)]
. First
sequence is the sequence of target
chain, second sequence the
one from model
. The respective ost.mol.EntityView
are
attached with ost.seq.ConstSequenceHandle.AttachView()
.
Type: | dict with key: tuple of str , value:
ost.seq.AlignmentHandle |
---|
chem_groups
¶Groups of chemically equivalent chains in target
Same as ChainMapper.chem_group
list
of list
of str
(chain names)
chem_mapping
¶Assigns chains in model
to chem_groups
.
list
of list
of str
(chain names)
mapping
¶Mapping of model
chains onto target
Exact same shape as chem_groups
but containing the names of the
mapped chains in model
. May contain None for target
chains that are not covered. No guarantee that all chains in
model
are mapped.
list
of list
of str
(chain names)
model
¶Model structure that gets mapped onto target
Underwent same processing as ChainMapper.target
, i.e.
only contains peptide/nucleotide chains of sufficient size.
Type: | ost.mol.EntityView |
---|
opt_score
¶Placeholder property without any guarantee of being set
Different scores get optimized in the various chain mapping algorithms.
Some of them may set their final optimal score in that property.
Consult the documentation of the respective chain mapping algorithm
for more information. Won’t be in the return dict of
JSONSummary()
.
target
¶Target/reference structure, i.e. ChainMapper.target
Type: | ost.mol.EntityView |
---|
qsscoring
– QS score implementation¶Warning
The code that comes with Bertoni et al. is considered deprecated. QS score has been re-implemented to be tightly integrated with the chain mapping algorithms. The old code is still available and documented here.
QSEntity
(ent, contact_d=12.0)¶Helper object for QS-score computation
Holds structural information and getters for interacting chains, i.e. interfaces. Peptide residues are represented by their CB position (CA for GLY) and nucleotides by C3’.
Parameters: |
|
---|
GetPos
(chain_name)¶Get representative positions of chain
That’s CB positions for peptide residues (CA for GLY) and C3’ for nucleotides. Returns positions as a numpy array of shape (n_residues, 3).
Parameters: | chain_name (str ) – Chain in view |
---|
GetSequence
(chain_name)¶Get sequence of chain
Returns sequence of specified chain as raw str
Parameters: | chain_name (str ) – Chain in view |
---|
PairDist
(chain_name_one, chain_name_two)¶Get pairwise distances between two chains
Returns distances as numpy array of shape (a, b).
Where a is the number of residues of the chain that comes BEFORE the
other in chain_names
contact_d
¶Pairwise distance of residues to be considered as contacts
Given at QSEntity
construction
Type: | float |
---|
view
¶Processed structure
View that only contains representative atoms. That’s CB for peptide residues (CA for GLY) and C3’ for nucleotides.
Type: | ost.mol.EntityView |
---|
QSScorer
(target, chem_groups, model, alns, contact_d=12.0)¶Helper object to compute QS-score
Tightly integrated into the mechanisms from the chain_mapping module.
The prefered way to derive an object of type QSScorer
is through
the static constructor: FromMappingResult()
. Example score
computation including mapping:
from ost.mol.alg.qsscore import QSScorer
from ost.mol.alg.chain_mapping import ChainMapper
ent_1 = io.LoadPDB("path_to_assembly_1.pdb")
ent_2 = io.LoadPDB("path_to_assembly_2.pdb")
chain_mapper = ChainMapper(ent_1)
mapping_result = chain_mapper.GetlDDTMapping(ent_2)
qs_scorer = QSScorer.FromMappingResult(mapping_result)
score_result = qs_scorer.Score(mapping_result.mapping)
print("score:", score_result.QS_global)
QS-score computation in QSScorer.Score()
implements caching.
Repeated computations with alternative chain mappings thus become faster.
Parameters: |
|
---|
FromFlatMapping
(flat_mapping)¶Same as Score()
but with flat mapping
Parameters: | flat_mapping (dict with str as key and value) – Dictionary with target chain names as keys and
the mapped model chain names as value |
---|---|
Returns: | Result object of type QSScorerResult |
FromMappingResult
(mapping_result)¶The preferred way to get a QSScorer
Static constructor that derives an object of type QSScorer
using a ost.mol.alg.chain_mapping.MappingResult
Parameters: | mapping_result (ost.mol.alg.chain_mapping.MappingResult ) – Data source |
---|
Score
(mapping, check=True)¶Computes QS-score given chain mapping
Again, the preferred way is to get mapping is from an object
of type ost.mol.alg.chain_mapping.MappingResult
.
Parameters: |
|
---|---|
Returns: | Result object of type |
ScoreInterface
(trg_ch1, trg_ch2, mdl_ch1, mdl_ch2)¶Computes QS-score only considering one interface
This only works for interfaces that are computed in Score()
, i.e.
interfaces for which the alignments are set up correctly.
Parameters: |
|
---|---|
Returns: | Result object of type |
Raises: |
|
alns
¶Alignments between chains in qsent1
and qsent2
Provided at object construction. Each alignment is accessible with
alns[(t_chain,m_chain)]
. First sequence is the sequence of the
respective chain in qsent1
, second sequence the one from
qsent2
.
Type: | dict with key: tuple of str , value:
ost.seq.AlignmentHandle |
---|
chem_groups
¶Groups of chemically equivalent chains in target
Provided at object construction
Type: | list of list of str |
---|
DockQ
– DockQ implementation¶DockQ
(mdl, ref, mdl_ch1, mdl_ch2, ref_ch1, ref_ch2, ch1_aln=None, ch2_aln=None)¶Computes DockQ for specified interface
DockQ is described in: Sankar Basu and Bjoern Wallner (2016), “DockQ: A Quality Measure for Protein-Protein Docking Models”, PLOS one
Residues are mapped based on residue numbers by default. If you provide ch1_aln and ch2_aln you can enforce an arbitrary mapping.
Parameters: |
|
---|---|
Returns: |
|
The following function detects steric clashes in atomic structures. Two atoms are clashing if their euclidian distance is smaller than a threshold value (minus a tolerance offset).
FilterClashes
(entity, clashing_distances, always_remove_bb=False)¶This function filters out residues with non-bonded clashing atoms. If the clashing atom is a backbone atom, the complete residue is removed from the structure, if the atom is part of the sidechain, only the sidechain atoms are removed. This behavior is changed by the always_remove_bb flag: when the flag is set to True the whole residue is removed even if a clash is just detected in the side-chain.
The function returns a view containing all elements (residues, atoms) that
have not been removed from the input structure, plus a
ClashingInfo
object containing information about the
detected clashes.
Two atoms are defined as clashing if their distance is shorter than the reference distance minus a tolerance threshold. The information about the clashing distances and the tolerance thresholds for all possible pairs of atoms is passed to the function as a parameter.
Hydrogen and deuterium atoms are ignored by this function.
Parameters: |
|
---|---|
Returns: | A tuple of two elements: The filtered |
CheckStereoChemistry
(entity, bond_stats, angle_stats, bond_tolerance, angle_tolerance, always_remove_bb=False)¶This function filters out residues with severe stereo-chemical violations. If the violation involves a backbone atom, the complete residue is removed from the structure, if it involves an atom that is part of the sidechain, only the sidechain is removed. This behavior is changed by the always_remove_bb flag: when the flag is set to True the whole residue is removed even if a violation is just detected in the side-chain.
The function returns a view containing all elements (residues, atoms) that
have not been removed from the input structure, plus a
StereoChemistryInfo
object containing information about
the detected stereo-chemical violations.
A violation is defined as a bond length that lies outside of the range: [mean_length-std_dev*bond_tolerance, mean_length+std_dev*bond_tolerance] or an angle width outside of the range [mean_width-std_dev*angle_tolerance, mean_width+std_dev*angle_tolerance ]. The information about the mean lengths and widths and the corresponding standard deviations is passed to the function using two parameters.
Hydrogen and deuterium atoms are ignored by this function.
Parameters: |
|
---|---|
Returns: | A tuple of two elements: The filtered |
ClashingInfo
¶This object is returned by the FilterClashes()
function, and contains
information about the clashes detected by the function.
GetClashCount
()¶Returns: | number of clashes between non-bonded atoms detected in the input structure |
---|
GetAverageOffset
()¶Returns: | a value in Angstroms representing the average offset by which clashing atoms lie closer than the minimum acceptable distance (which of course differs for each possible pair of elements) |
---|
GetClashList
()¶Returns: | list of detected inter-atomic clashes |
---|---|
Return type: | list of ClashEvent |
ClashEvent
¶This object contains all the information relative to a single clash detected
by the FilterClashes()
function
GetModelDistance
()¶Returns: | distance (in Angstroms) between the two clashing atoms as observed in the model |
---|
GetAdjustedReferenceDistance
()¶Returns: | minimum acceptable distance (in Angstroms) between the two atoms
involved in the clash, as defined in ClashingDistances |
---|
StereoChemistryInfo
¶This object is returned by the CheckStereoChemistry()
function, and
contains information about bond lengths and planar angle widths in the
structure that diverge from the parameters tabulated by Engh and Huber in the
International Tables of Crystallography. Only elements that diverge from the
tabulated value by a minimumnumber of standard deviations (defined when the
CheckStereoChemistry function is called) are reported.
GetBadBondCount
()¶Returns: | number of bonds where a serious violation was detected |
---|
GetBondCount
()¶Returns: | total number of bonds in the structure checked by the CheckStereoChemistry function |
---|
GetAvgZscoreBonds
()¶Returns: | average z-score of all the bond lengths in the structure, computed using Engh and Huber’s mean and standard deviation values |
---|
GetBadAngleCount
()¶Returns: | number of planar angles where a serious violation was detected |
---|
GetAngleCount
()¶Returns: | total number of planar angles in the structure checked by the CheckStereoChemistry function |
---|
GetAvgZscoreAngles
()¶Returns: | average z-score of all the planar angle widths, computed using Engh and Huber’s mean and standard deviation values. |
---|
GetBondViolationList
()¶Returns: | list of bond length violations detected in the structure |
---|---|
Return type: | list of StereoChemicalBondViolation |
GetAngleViolationList
()¶Returns: | list of angle width violations detected in the structure |
---|---|
Return type: | list of StereoChemicalAngleViolation |
StereoChemicalBondViolation
¶This object contains all the information relative to a single detected violation of stereo-chemical parameters in a bond length
GetFirstAtom
()¶GetSecondAtom
()¶Returns: | first / second atom of the bond |
---|---|
Return type: | UniqueAtomIdentifier |
GetBondLength
()¶Returns: | length of the bond (in Angstroms) as observed in the model |
---|
GetAllowedRange
()¶Returns: | allowed range of bond lengths (in Angstroms), according to Engh and
Huber’s tabulated parameters and the tolerance threshold used when
the CheckStereoChemistry() function was called |
---|---|
Return type: | tuple (minimum and maximum) |
StereoChemicalAngleViolation
¶This object contains all the information relative to a single detected violation of stereo-chemical parameters in a planar angle width
GetFirstAtom
()¶GetSecondAtom
()¶GetThirdAtom
()¶Returns: | first / second (vertex) / third atom that defines the planar angle |
---|---|
Return type: | UniqueAtomIdentifier |
GetAngleWidth
()¶Returns: | width of the planar angle (in degrees) as observed in the model |
---|
GetAllowedRange
()¶Returns: | allowed range of angle widths (in degrees), according to Engh and
Huber’s tabulated parameters and the tolerance threshold used when
the CheckStereoChemistry() function was called |
---|---|
Return type: | tuple (minimum and maximum) |
ClashingDistances
¶Object containing information about clashing distances between non-bonded atoms
ClashingDistances
()¶Creates an empty distance list
SetClashingDistance
(ele1, ele2, clash_distance, tolerance)¶Adds or replaces an entry in the list
Parameters: |
|
---|
GetClashingDistance
(ele1, ele2)¶Returns: | reference distance and a tolerance threshold (both in Angstroms) for two elements |
---|---|
Return type: |
|
Parameters: |
|
GetAdjustedClashingDistance
(ele1, ele2)¶Returns: | reference distance (in Angstroms) for two elements, already adjusted by the tolerance threshold |
---|---|
Parameters: |
|
GetMaxAdjustedDistance
()¶Returns: | longest clashing distance (in Angstroms) in the list, after adjustment with tolerance threshold |
---|
IsEmpty
()¶Returns: | True if the list is empty (i.e. in an invalid, useless state) |
---|
PrintAllDistances
()¶Prints all distances in the list to standard output
StereoChemicalParams
¶Object containing stereo-chemical information about bonds and angles. For each item (bond or angle in a specific residue), stores the mean and standard deviation
StereoChemicalParams
()¶Creates an empty parameter list
SetParam
(item, residue, mean, standard_dev)¶Adds or replaces an entry in the list
Parameters: |
|
---|
IsEmpty
()¶Returns: | True if the list is empty (i.e. in an invalid, useless state) |
---|
PrintAllParameters
()¶Prints all entries in the list to standard output
FillClashingDistances
(file_content)¶FillBondStereoChemicalParams
(file_content)¶FillAngleStereoChemicalParams
(file_content)¶These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, starting from the content of a parameter file.
Parameters: | file_content (list of str ) – list of lines from the parameter file |
---|---|
Return type: | ClashingDistances or
StereoChemicalParams |
FillClashingDistancesFromFile
(filename)¶FillBondStereoChemicalParamsFromFile
(filename)¶FillAngleStereoChemicalParamsFromFile
(filename)¶These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, starting from a file path.
Parameters: | filename (str ) – path to parameter file |
---|---|
Return type: | ClashingDistances or
StereoChemicalParams |
DefaultClashingDistances
()¶DefaultBondStereoChemicalParams
()¶DefaultAngleStereoChemicalParams
()¶These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, using the default parameter files distributed with OpenStructure.
Return type: | ClashingDistances or
StereoChemicalParams |
---|
ResidueNamesMatch
(probe, reference)¶The function requires a reference structure and a probe structure. The function checks that all the residues in the reference structure that appear in the probe structure (i.e., that have the same ResNum) are of the same residue type. Chains are compared by order, not by chain name (i.e.: the first chain of the reference will be compared with the first chain of the probe structure, etc.)
Parameters: |
|
---|---|
Returns: | True if the residue names are the same, False otherwise |
Superpose
(ent_a, ent_b, match='number', atoms='all', iterative=False, max_iterations=5, distance_threshold=3.0)¶Superposes the model entity onto the reference. To do so, two views are
created, returned with the result. atoms describes what goes into these
views and match the selection method. For superposition,
SuperposeSVD()
or IterativeSuperposeSVD()
are called (depending on
iterative). For matching, the following methods are recognised:
number
- select residues by residue number, includes atoms, calls
MatchResidueByNum()
index
- select residues by index in chain, includes atoms, calls
MatchResidueByIdx()
local-aln
- select residues from a Smith/Waterman alignment, includes
atoms, calls MatchResidueByLocalAln()
global-aln
- select residues from a Needleman/Wunsch alignment, includes
atoms, calls MatchResidueByGlobalAln()
Parameters: |
|
---|---|
Returns: | An instance of |
ParseAtomNames
(atoms)¶Parses different representations of a list of atom names and returns a
set
, understandable by MatchResidueByNum()
. In
essence, this function translates
None
None
set(['N', 'CA', 'C', 'O'])
set(['aname1', 'aname2'])
['aname1', 'aname2']
to set(['aname1', 'aname2'])
Parameters: | atoms (str , list , set ) – Identifier or list of atoms |
---|---|
Returns: | A set of atoms. |
MatchResidueByNum
(ent_a, ent_b, atoms='all')¶Returns a tuple of views containing exactly the same number of atoms. Residues are matched by residue number. A subset of atoms to be included in the views can be specified in the atoms argument. Regardless of what the list of atoms says, only those present in two matched residues will be included in the views. Chains are processed in the order they occur in the entities. If ent_a and ent_b contain a different number of chains, processing stops with the lower count.
Parameters: |
|
---|---|
Returns: | Two |
MatchResidueByIdx
(ent_a, ent_b, atoms='all')¶Returns a tuple of views containing exactly the same number of atoms. Residues are matched by position in the chains of an entity. A subset of atoms to be included in the views can be specified in the atoms argument. Regardless of what the list of atoms says, only those present in two matched residues will be included in the views. Chains are processed in order of appearance. If ent_a and ent_b contain a different number of chains, processing stops with the lower count. The number of residues per chain is supposed to be the same.
Parameters: |
|
---|---|
Returns: | Two |
MatchResidueByLocalAln
(ent_a, ent_b, atoms='all')¶Match residues by local alignment. Takes ent_a and ent_b, extracts the
sequences chain-wise and aligns them in Smith/Waterman manner using the
BLOSUM62 matrix for scoring. Only residues which are marked as peptide
linking
are considered for alignment.
The residues of the entities are then matched based on this alignment. Only
atoms present in both residues are included in the views. Chains are processed
in order of appearance. If ent_a and ent_b contain a different number
of chains, processing stops with the lower count.
Parameters: |
|
---|---|
Returns: | Two |
MatchResidueByGlobalAln
(ent_a, ent_b, atoms='all')¶Match residues by global alignment.
Same as MatchResidueByLocalAln()
but performs a global Needleman/Wunsch
alignment of the sequences using the BLOSUM62 matrix for scoring.
Parameters: |
|
---|---|
Returns: | Two |
SuperpositionResult
¶rmsd
¶RMSD of the superposed entities.
view1
¶view2
¶Two EntityView
used in superposition (not set if methods
with Vec3List
used).
fraction_superposed
¶rmsd_superposed_atoms
¶ncycles
¶For iterative superposition (IterativeSuperposeSVD()
): fraction and
RMSD of atoms that were superposed with a distance below the given
threshold and the number of iteration cycles performed.
SuperposeSVD
(view1, view2, apply_transform=True)¶SuperposeSVD
(list1, list2)Superposition of two sets of atoms minimizing RMSD using a classic SVD based algorithm.
Note that the atom positions in the view are taken blindly in the order in which the atoms appear.
Parameters: |
|
---|---|
Returns: | An instance of |
IterativeSuperposeSVD
(view1, view2, max_iterations=5, distance_threshold=3.0, apply_transform=True)¶IterativeSuperposeSVD
(list1, list2, max_iterations=5, distance_threshold=3.0)Iterative superposition of two sets of atoms. In each iteration cycle, we keep a fraction of atoms with distances below distance_threshold and get the superposition considering only those atoms.
Note that the atom positions in the view are taken blindly in the order in which the atoms appear.
Parameters: |
|
---|---|
Returns: | An instance of |
Raises: | Exception if atom counts do not match or if less than 3 atoms. |
CalculateRMSD
(view1, view2, transformation=geom.Mat4())¶Returns: | RMSD of atom positions (taken blindly in the order in which the atoms appear) in the two given views. |
---|---|
Return type: |
|
Parameters: |
|
Accessibility
(ent, probe_radius=1.4, include_hydrogens=False, include_hetatm=False, include_water=False, oligo_mode=False, selection="", asa_abs="asaAbs", asa_rel="asaRel", asa_atom="asaAtom", algorithm = NACCESS)¶Calculates the accesssible surface area for ever atom in ent. The algorithm mimics the behaviour of the bindings available for the NACCESS and DSSP tools and has been tested to reproduce the numbers accordingly.
Parameters: |
|
---|---|
Returns: | The summed solvent accessibilty of each atom in ent. |
AccessibilityAlgorithm
¶The accessibility algorithm enum specifies the algorithm used by the respective tools. Available are:
NACCESS, DSSP
AssignSecStruct
(ent)¶Assigns secondary structures to all residues based on hydrogen bond patterns as described by DSSP.
Parameters: | ent (EntityView /
EntityHandle ) – Entity on which to assign secondary structures |
---|
FindMemParam
¶Result object for the membrane detection algorithm described below
axis
¶initial search axis from which optimal membrane slab could be found
tilt_axis
¶Axis around which we tilt the membrane starting from the initial axis
tilt
¶Angle to tilt around tilt axis
angle
¶After the tilt operation we perform a rotation around the initial axis with this angle to get the final membrane axis
membrane_axis
¶The result of applying the tilt and rotation procedure described above. The membrane_axis is orthogonal to the membrane plane and has unit length.
pos
¶Real number that describes the membrane center point. To get the actual position you can do: pos * membrane_axis
width
¶Total width of the membrane in A
energy
¶Pseudo energy of the implicit solvation model
membrane_asa
¶Membrane accessible surface area
membrane_representation
¶Dummy atoms that represent the membrane. This entity is only valid if the according flag has been set to True when calling FindMembrane.
FindMembrane
(ent, assign_membrane_representation=True, fast=False)¶Estimates the optimal membrane position of a protein by using an implicit solvation model. The original algorithm and the used energy function are described in: Lomize AL, Pogozheva ID, Lomize MA, Mosberg HI (2006) Positioning of proteins in membranes: A computational approach.
There are some modifications in this implementation and the procedure is as follows:
FindMemParam
). The top 20 parametrizations
(only top parametrization if fast is True) are stored for further
processing.Parameters: |
|
---|---|
Returns: | The results object |
Return type: |
This is a set of functions used for basic trajectory analysis such as extracting
positions, distances, angles and RMSDs. The organization is such that most
functions have their counterpart at the individual frame level
so that they can also be called on one frame instead of
the whole trajectory.
All these functions have a “stride” argument that defaults to stride=1, which is used to skip frames in the analysis.
SuperposeFrames
(frames, sel, from=0, to=-1, ref=-1)¶This function superposes the frames of the given coord group and returns them as a new coord group.
Parameters: |
|
---|---|
Returns: | A newly created coord group containing the superposed frames. |
SuperposeFrames
(frames, sel, ref_view, from=0, to=-1)Same as SuperposeFrames above, but the superposition is done on a reference view and not on another frame of the trajectory.
Parameters: |
|
---|---|
Returns: | A newly created coord group containing the superposed frames. |
AnalyzeAtomPos
(traj, atom1, stride=1)¶This function extracts the position of an atom from a trajectory. It returns a vector containing the position of the atom for each analyzed frame.
Parameters: |
|
---|
AnalyzeCenterOfMassPos
(traj, sele, stride=1)¶This function extracts the position of the center-of-mass of a selection
(EntityView
) from a trajectory and returns it as a vector.
Parameters: |
|
---|
AnalyzeDistanceBetwAtoms
(traj, atom1, atom2, stride=1)¶This function extracts the distance between two atoms from a trajectory and returns it as a vector.
Parameters: |
|
---|
AnalyzeAngle
(traj, atom1, atom2, atom3, stride=1)¶This function extracts the angle between three atoms from a trajectory and returns it as a vector. The second atom is taken as being the central atom, so that the angle is between the vectors (atom1.pos-atom2.pos) and (atom3.pos-atom2.pos).
Parameters: |
|
---|
AnalyzeDihedralAngle
(traj, atom1, atom2, atom3, atom4, stride=1)¶This function extracts the dihedral angle between four atoms from a trajectory and returns it as a vector. The angle is between the planes containing the first three and the last three atoms.
Parameters: |
|
---|
AnalyzeDistanceBetwCenterOfMass
(traj, sele1, sele2, stride=1)¶This function extracts the distance between the center-of-mass of two
selections (EntityView
) from a trajectory and returns it as
a vector.
Parameters: |
|
---|
AnalyzeRMSD
(traj, reference_view, sele_view, stride=1)¶This function extracts the rmsd between two EntityView
and
returns it as a vector. The views don’t have to be from the same entity. The
reference positions are taken directly from the reference_view, evaluated only
once. The positions from the sele_view are evaluated for each frame.
If you want to compare to frame i of the trajectory t, first use
t.CopyFrame(i) for example:
eh = io.LoadPDB(...)
t = io.LoadCHARMMTraj(eh, ...)
sele = eh.Select(...)
t.CopyFrame(0)
mol.alg.AnalyzeRMSD(t, sele, sele)
Parameters: |
|
---|
AnalyzeMinDistance
(traj, view1, view2, stride=1)¶This function extracts the minimal distance between two sets of atoms (view1 and view2) for each frame in a trajectory and returns it as a vector.
Parameters: |
|
---|
AnalyzeMinDistanceBetwCenterOfMassAndView
(traj, view_cm, view_atoms, stride=1)¶This function extracts the minimal distance between a set of atoms (view_atoms) and the center of mass of a second set of atoms (view_cm) for each frame in a trajectory and returns it as a vector.
Parameters: |
|
---|
AnalyzeAromaticRingInteraction
(traj, view_ring1, view_ring2, stride=1)¶This function is a crude analysis of aromatic ring interactions. For each frame in a trajectory, it calculates the minimal distance between the atoms in one view and the center of mass of the other and vice versa, and returns the minimum between these two minimal distances. Concretely, if the two views are the heavy atoms of two rings, then it returns the minimal center of mass - heavy atom distance betweent he two rings
Parameters: |
|
---|
helix_kinks
– Algorithms to calculate Helix Kinks¶Functions to calculate helix kinks: bend, face shift and wobble angles
Author: Niklaus Johner
AnalyzeHelixKink
(t, sele, proline=False)¶This function calculates the bend, wobble and face-shift angles in an alpha- helix over a trajectory. The determination is more stable if there are at least 4 residues on each side (8 is even better) of the proline around which the helix is kinked. The selection should contain all residues in the correct order and with no gaps and no missing C-alphas.
Parameters: |
|
---|---|
Returns: | A tuple (bend_angle, face_shift, wobble_angle). |
Return type: | (FloatList, FLoatList, FloatList) |
CalculateHelixKink
(sele, proline=False)¶This function calculates the bend, wobble and face-shift angles in an alpha- helix of an EntityView. The determination is more stable if there are at least 4 residues on each side (8 is even better) of the proline around which the helix is kinked. The selection should contain all residues in the correct order and with no gaps and no missing C-alphas.
Parameters: |
|
---|---|
Returns: | A tuple (bend_angle, face_shift, wobble_angle). |
Return type: | (float, float, float) |
trajectory_analysis
– DRMSD, pairwise distances and more¶This Module requires numpy
This module contains functions to analyze trajectories, mainly similiraty measures baed on RMSDS and pairwise distances.
Author: Niklaus Johner (niklaus.johner@unibas.ch)
AverageDistanceMatrixFromTraj
(t, sele, first=0, last=-1)¶This function calcultes the distance between each pair of atoms in sele, averaged over the trajectory t.
Parameters: |
|
---|---|
Returns: | a numpy NpairsxNpairs matrix, where Npairs is the number of atom pairs in sele. |
DistRMSDFromTraj
(t, sele, ref_sele, radius=7.0, average=False, seq_sep=4, first=0, last=-1)¶This function calculates the distance RMSD from a trajectory. The distances selected for the calculation are all the distances between pair of atoms from residues that are at least seq_sep apart in the sequence and that are smaller than radius in ref_sel. The number and order of atoms in ref_sele and sele should be the same.
Parameters: |
|
---|---|
Returns: | a numpy vecor dist_rmsd(Nframes). |
DistanceMatrixFromPairwiseDistances
(distances, p=2)¶This function calculates an distance matrix M(NframesxNframes) from the pairwise distances matrix D(NpairsxNframes), where Nframes is the number of frames in the trajectory and Npairs the number of atom pairs. M[i,j] is the distance between frame i and frame j calculated as a p-norm of the differences in distances from the two frames (distance-RMSD for p=2).
Parameters: |
|
---|---|
Returns: | a numpy NframesxNframes matrix, where Nframes is the number of frames. |
PairwiseDistancesFromTraj
(t, sele, first=0, last=-1, seq_sep=1)¶This function calculates the distances between any pair of atoms in sele with sequence separation larger than seq_sep from a trajectory t. It return a matrix containing one line for each atom pair and Nframes columns, where Nframes is the number of frames in the trajectory.
Parameters: |
|
---|---|
Returns: | a numpy NpairsxNframes matrix. |
RMSD_Matrix_From_Traj
(t, sele, first=0, last=-1, align=True, align_sele=None)¶This function calculates a matrix M such that M[i,j] is the RMSD (calculated on sele) between frames i and j of the trajectory t aligned on sele.
Parameters: |
|
---|---|
Returns: | Returns a numpy NframesxNframes matrix, where Nframes is the number of frames. |
structure_analysis
– Functions to analyze structures¶Some functions for analyzing structures
Author: Niklaus Johner (Niklaus.Johner@unibas.ch)
CalculateBestFitLine
(sele1)¶This function calculates the best fit line to the atoms in sele1.
Parameters: | sele1 (EntityView ) – |
---|---|
Returns: | Line3 |
CalculateBestFitPlane
(sele1)¶This function calculates the best fit plane to the atoms in sele1.
Parameters: | sele1 (EntityView ) – |
---|---|
Returns: | Plane |
CalculateDistanceDifferenceMatrix
(sele1, sele2)¶This function calculates the pairwise distance differences between two selections (EntityView
).
The two selections should have the same number of atoms
It returns an NxN DistanceDifferenceMatrix M (where N is the number of atoms in sele1)
where M[i,j]=||(sele2.atoms[i].pos-sele2.atoms[j].pos)||-||(sele1.atoms[i].pos-sele1.atoms[j].pos)||
Parameters: |
|
---|---|
Returns: | NxN numpy matrix |
CalculateHelixAxis
(sele1)¶This function calculates the best fit cylinder to the CA atoms in sele1, and returns its axis. Residues should be ordered correctly in sele1.
Parameters: | sele1 (EntityView ) – |
---|---|
Returns: | Line3 |
GetAlphaHelixContent
(sele1)¶This function calculates the content of alpha helix in a view. All residues in the view have to ordered and adjacent (no gaps allowed)
Parameters: | sele1 (EntityView ) – |
---|---|
Returns: | float |
GetDistanceBetwCenterOfMass
(sele1, sele2)¶This function calculates the distance between the centers of mass of sele1 and sele2, two selections from the same Entity.
Parameters: |
|
---|---|
Returns: |
|
GetFrameFromEntity
(eh)¶This function returns a CoordFrame from an EntityHandle
Parameters: | eh (EntityHandle ) – |
---|---|
Returns: | ost.mol.CoordFrame |
GetMinDistBetwCenterOfMassAndView
(sele1, sele2)¶This function calculates the minimal distance between sele2 and the center of mass of sele1, two selections from the same Entity.
Parameters: |
|
---|---|
Returns: | distance ( |
GetMinDistanceBetweenViews
(sele1, sele2)¶This function calculates the minimal distance between sele1 and sele2, two selections from the same Entity.
Parameters: |
|
---|---|
Returns: |
|
The following functions help to convert one residue into another by reusing as much as possible from the present atoms. They are mainly meant to map from standard amino acid to other standard amino acids or from modified amino acids to standard amino acids.
CopyResidue
(src_res, dst_res, editor)¶Copies the atoms of src_res
to dst_res
using the residue names
as guide to decide which of the atoms should be copied. If src_res
and
dst_res
have the same name, or src_res
is a modified version of
dst_res
(i.e. have the same single letter code), CopyConserved()
will be called, otherwise CopyNonConserved()
.
If a CBeta atom wasn’t already copied from src_res
, a new one at a
reconstructed position will be added to dst_res
if it is not GLY
and
all backbone positions are available to do it.
Parameters: |
|
---|---|
Returns: | True if the residue could be copied as a conserved residue,
False if it had to fallback to |
CopyConserved
(src_res, dst_res, editor)¶Copies the atoms of src_res
to dst_res
assuming that the parent
amino acid of src_res
(or src_res
itself) are identical to dst_res
.
If src_res
and dst_res
are identical, all heavy atoms are copied
to dst_res
. If src_res
is a modified version of dst_res
and the
modification is a pure addition (e.g. the phosphate group of phosphoserine),
the modification is stripped off and all other heavy atoms are copied to
dst_res
. If the modification is not a pure addition, it falls back to
CopyNonConserved()
.
Additionally, the selenium atom of MSE
is converted to sulphur to map
MSE
to MET
.
Parameters: |
|
---|---|
Returns: | A tuple of bools stating whether the residue could be copied without
falling back to |
CopyNonConserved
(src_res, dst_res, editor)¶Copies the heavy backbone atoms and CBeta (except for GLY
) of src_res
to dst_res
.
Parameters: |
|
---|---|
Returns: | A tuple of bools as in |
Molecular Checker (Molck) could be called directly from the code using Molck function:
#! /bin/env python
"""Run Molck with Python API.
This is an exemplary procedure on how to run Molck using Python API which is
equivalent to the command line:
molck <PDB PATH> --rm=hyd,oxt,nonstd,unk \
--fix-ele --out=<OUTPUT PATH> \
--complib=<PATH TO compounds.chemlib>
"""
from ost.io import LoadPDB, SavePDB
from ost.mol.alg import MolckSettings, Molck
from ost.conop import CompoundLib
pdbid = "<PDB PATH>"
lib = CompoundLib.Load("<PATH TO compounds.chemlib>")
# Using Molck function
ent = LoadPDB(pdbid)
ms = MolckSettings(rm_unk_atoms=True,
rm_non_std=True,
rm_hyd_atoms=True,
rm_oxt_atoms=True,
rm_zero_occ_atoms=False,
colored=False,
map_nonstd_res=False,
assign_elem=True)
Molck(ent, lib, ms)
SavePDB(ent, "<OUTPUT PATH>")
It can also be split into subsequent commands for greater controll:
#! /bin/env python
"""Run Molck with Python API.
This is an exemplary procedure on how to run Molck using Python API which is
equivalent to the command line:
molck <PDB PATH> --rm=hyd,oxt,nonstd,unk \
--fix-ele --out=<OUTPUT PATH> \
--complib=<PATH TO compounds.chemlib>
"""
from ost.io import LoadPDB, SavePDB
from ost.mol.alg import (RemoveAtoms, MapNonStandardResidues,
CleanUpElementColumn)
from ost.conop import CompoundLib
pdbid = "<PDB PATH>"
lib = CompoundLib.Load("<PATH TO compounds.chemlib>")
map_nonstd = False
# Using function chain
ent = LoadPDB(pdbid)
if map_nonstd:
MapNonStandardResidues(lib=lib, ent=ent)
RemoveAtoms(lib=lib,
ent=ent,
rm_unk_atoms=True,
rm_non_std=True,
rm_hyd_atoms=True,
rm_oxt_atoms=True,
rm_zero_occ_atoms=False,
colored=False)
CleanUpElementColumn(lib=lib, ent=ent)
SavePDB(ent, "<OUTPUT PATH>")
MolckSettings
(rm_unk_atoms=False, rm_non_std=False, rm_hyd_atoms=True, rm_oxt_atoms=False, rm_zero_occ_atoms=False, colored=False, map_nonstd_res=True, assign_elem=True)¶Stores settings used for Molecular Checker.
Parameters: |
|
---|
rm_unk_atoms
¶Remove unknown and atoms not following the nomenclature.
Type: | bool |
---|
rm_non_std
¶Remove all residues not one of the 20 standard amino acids
Type: | bool |
---|
rm_hyd_atoms
¶Remove hydrogen atoms
Type: | bool |
---|
rm_oxt_atoms
¶Remove terminal oxygens
Type: | bool |
---|
rm_zero_occ_atoms
¶Remove atoms with zero occupancy
Type: | bool |
---|
colored
¶Whether output should be colored
Type: | bool |
---|
map_nonstd_res
¶Maps modified residues back to the parent amino acid, for example MSE -> MET, SEP -> SER
Type: | bool |
---|
assign_elem
¶Clean up element column
Type: | bool |
---|
ToString
()¶Returns: | String representation of the MolckSettings. |
---|---|
Return type: | str |
Warning
The API here is set such that the functions modify the passed structure ent in-place. If this is not ok, please work on a copy of the structure.
Molck
(ent, lib, settings[, prune=True])¶Runs Molck on provided entity. Reprocesses ent with
ost.conop.HeuristicProcessor
and given lib once done.
Parameters: |
|
---|
MapNonStandardResidues
(ent, lib, reprocess=True)¶Maps modified residues back to the parent amino acid, for example MSE -> MET.
Parameters: |
|
---|
RemoveAtoms(ent, lib, rm_unk_atoms=False, rm_non_std=False, rm_hyd_atoms=True, rm_oxt_atoms=False, rm_zero_occ_atoms=False, colored=False,
reprocess=True)
Removes atoms and residues according to some criteria.
Parameters: |
|
---|
CleanUpElementColumn
(ent, lib)¶Clean up element column.
Parameters: |
|
---|
mol.alg
– Algorithms for Structuresstereochemistry
– Stereochemistry Checksscoring
– Specialized scoring functionsligand_scoring
– Ligand scoring functionschain_mapping
– Chain Mappingqsscoring
– QS score implementationDockQ
– DockQ implementationhelix_kinks
– Algorithms to calculate Helix Kinkstrajectory_analysis
– DRMSD, pairwise distances and morestructure_analysis
– Functions to analyze structuresEnter search terms or a module, class or function name.
mol.alg
– Algorithms for Structures