alg
– Algorithms for Structures¶
Submodules¶
chain_mapping
– Chain Mappingcontact_score
– Contact-Based Scoresdockq
– DockQ Implementationhelix_kinks
– Algorithms to Calculate Helix Kinksligand_scoring
– Ligand scoring functionsqsscore
– New QS Score Implementationscoring
– Specialized Scoring Functionsstereochemistry
– Stereochemistry Checksstructure_analysis
– Functions to Analyze Structurestrajectory_analysis
– DRMSD, Pairwise Distances and More
Local Distance Test scores (lDDT, DRMSD)¶
Note
This is a new implementation of lDDT, introduced in OpenStructure 2.4 with focus on supporting quaternary structure and compounds beyond the 20 standard proteinogenic amino acids. The previous lDDT code that comes with Mariani et al. is considered deprecated.
Note
lddt.lDDTScorer
provides the raw Python API to compute lDDT but
stereochemistry checks as described in
Mariani et al.
must be done seperately. You may want to check out the
compare-structures
action (Comparing two structures) to
compute lDDT with pre-processing and support for quaternary structures.
- class lDDTScorer(target, compound_lib=None, custom_compounds=None, inclusion_radius=15, sequence_separation=0, symmetry_settings=None, seqres_mapping={}, bb_only=False)¶
lDDT scorer object for a specific target
Sets up everything to score models of that target. lDDT (local distance difference test) is defined as fraction of pairwise distances which exhibit a difference < threshold when considering target and model. In case of multiple thresholds, the average is returned. See
V. Mariani, M. Biasini, A. Barbato, T. Schwede, lDDT : A local superposition-free score for comparing protein structures and models using distance difference tests, Bioinformatics, 2013
- Parameters:
target (
ost.mol.EntityHandle
/ost.mol.EntityView
) – The targetcompound_lib (
ost.conop.CompoundLib
) – Compound library from which a compound for each residue is extracted based on its name. Usesost.conop.GetDefaultLib()
if not given, raises if this returns no valid compound library. Atoms defined in the compound are searched in the residue and build the reference for scoring. If the residue has atoms with names [“A”, “B”, “C”] but the corresponding compound only has [“A”, “B”], “A” and “B” are considered for scoring. If the residue has atoms [“A”, “B”] but the compound has [“A”, “B”, “C”], “C” is considered missing and does not influence scoring, even if present in the model.custom_compounds (
dict
with residue names (str
) as key andCustomCompound
as value.) – Custom compounds defining reference atoms. If given, custom_compounds take precedent over compound_lib.inclusion_radius (
float
) – All pairwise distances < inclusion_radius are considered for scoringsequence_separation (
int
) – Only pairwise distances between atoms of residues which are further apart than this threshold are considered. Residue distance is based on resnum. The default (0) considers all pairwise distances except intra-residue distances.symmetry_settings (
SymmetrySettings
) – Define residues exhibiting internal symmetry, usesGetDefaultSymmetrySettings()
if not given.seqres_mapping (
dict
(key:str
, value:ost.seq.AlignmentHandle
)) – Mapping of model residues at the scoring stage happens with residue numbers defining their location in a reference sequence (SEQRES) using one based indexing. If the residue numbers in target don’t correspond to that SEQRES, you can specify the mapping manually. You can provide a dictionary to specify a reference sequence (SEQRES) for one or more chain(s). Key: chain name, value: alignment (seq1: SEQRES, seq2: sequence of residues in chain). Example: The residues in a chain with name “A” have sequence “YEAH” and residue numbers [42,43,44,45]. You can provide an alignment with seq1 “HELLYEAH
” and seq2 “----YEAH
”. “Y” gets assigned residue number 5, “E” gets assigned 6 and so on no matter what the original residue numbers were.bb_only (
bool
) – Only consider atoms with name “CA” in case of amino acids and “C3’” for Nucleotides. this invalidates compound_lib. Raises if any residue in target is not r.chem_class.IsPeptideLinking() or r.chem_class.IsNucleotideLinking()
- Raises:
RuntimeError
if target contains compound which is not in compound_lib,RuntimeError
if symmetry_settings specifies symmetric atoms that are not present in the according compound in compound_lib,RuntimeError
if seqres_mapping is not provided and target contains residue numbers with insertion codes or the residue numbers for each chain are not monotonically increasing,RuntimeError
if seqres_mapping is provided but an alignment is invalid (seq1 contains gaps, mismatch in seq1/seq2, seq2 does not match residues in corresponding chains).
- GetNChainContacts(target_chain, no_interchain=False)¶
Returns number of contacts expected for a certain chain in target
- Parameters:
target_chain (
str
) – Chain in target for which you want the number of expected contactsno_interchain (
bool
) – Whether to exclude interchain contacts
- Raises:
RuntimeError
if specified chain doesnt exist
- lDDT(model, thresholds=[0.5, 1.0, 2.0, 4.0], local_lddt_prop=None, local_contact_prop=None, chain_mapping=None, no_interchain=False, no_intrachain=False, penalize_extra_chains=False, residue_mapping=None, return_dist_test=False, check_resnames=True, add_mdl_contacts=False, interaction_data=None, set_atom_props=False)¶
Computes lDDT of model - globally and per-residue
- Parameters:
model (
ost.mol.EntityHandle
/ost.mol.EntityView
) – Model to be scored - models are preferably scored upon performing stereo-chemistry checks in order to punish for non-sensical irregularities. This must be done separately as a pre-processing step. Target contacts that are not covered by model are considered not conserved, thus decreasing lDDT score. This also includes missing model chains or model chains for which no mapping is provided in chain_mapping.thresholds (
list
offloats
) – Thresholds of distance differences to be considered as correct - see docs in constructor for more info. default: [0.5, 1.0, 2.0, 4.0]local_lddt_prop (
str
) – If set, per-residue scores will be assigned as generic float property of that namelocal_contact_prop (
str
) – If set, number of expected contacts as well as number of conserved contacts will be assigned as generic int property. Excected contacts will be set as <local_contact_prop>_exp, conserved contacts as <local_contact_prop>_cons. Values are summed over all thresholds.chain_mapping (
dict
withstr
as keys/values) – Mapping of model chains (key) onto target chains (value). This is required if target or model have more than one chain.no_interchain (
bool
) – Whether to exclude interchain contactsno_intrachain (
bool
) – Whether to exclude intrachain contacts (i.e. only consider interface related contacts)penalize_extra_chains (
bool
) – Whether to include a fixed penalty for additional chains in the model that are not mapped to the target. ONLY AFFECTS RETURNED GLOBAL SCORE. In detail: adds the number of intra-chain contacts of each extra chain to the expected contacts, thus adding a penalty.residue_mapping (
dict
with key:str
, value:ost.seq.AlignmentHandle
) – By default, residue mapping is based on residue numbers. That means, a model chain and the respective target chain map to the same underlying reference sequence (SEQRES). Alternatively, you can specify one or several alignment(s) between model and target chains by providing a dictionary. key: Name of chain in model (respective target chain is extracted from chain_mapping), value: Alignment with first sequence corresponding to target chain and second sequence to model chain. There is NO reference sequence involved, so the two sequences MUST exactly match the actual residues observed in the respective target/model chains (ATOMSEQ).return_dist_test – Whether to additionally return the underlying per-residue data for the distance difference test. Adds five objects to the return tuple. First: Number of total contacts summed over all thresholds Second: Number of conserved contacts summed over all thresholds Third: list with length of scored residues. Contains indices referring to model.residues. Fourth: numpy array of size len(scored_residues) containing the number of total contacts, Fifth: numpy matrix of shape (len(scored_residues), len(thresholds)) specifying how many for each threshold are conserved.
check_resnames (
bool
) – On by default. Enforces residue name matches between mapped model and target residues.add_mdl_contacts (
bool
) – Adds model contacts - Only using contacts that are within a certain distance threshold in the target does not penalize for added model contacts. If set to True, this flag will also consider target contacts that are within the specified distance threshold in the model but not necessarily in the target. No contact will be added if the respective atom pair is not resolved in the target.interaction_data (
tuple
) – Pro param - don’t useset_atom_props (
bool
) – If True, sets generic properties on a per atom level if local_lddt_prop/local_contact_prop are set as well. In other words: this is the only way you can get per-atom lDDT values.
- Returns:
global and per-residue lDDT scores as a tuple - first element is global lDDT score (None if target has no contacts) and second element a list of per-residue scores with length len(model.residues). None is assigned to residues that are not covered by target. If a residue is covered but has no contacts in target, 0.0 is assigned.
- class SymmetrySettings¶
Container for symmetric compounds
lDDT considers symmetries and selects the one resulting in the highest possible score.
A symmetry is defined as a renaming operation on one or more atoms that leads to a chemically equivalent residue. Example would be OD1 and OD2 in ASP => renaming OD1 to OD2 and vice versa gives a chemically equivalent residue.
Use
AddSymmetricCompound()
to define a symmetry which can then directly be accessed through the symmetric_compounds member.- AddSymmetricCompound(name, symmetric_atoms)¶
Adds symmetry for compound with name
- Parameters:
name (
str
) – Name of compound with symmetrysymmetric_atoms (
list
oftuple
) – Pairs of atom names that define renaming operation, i.e. after applying all switches defined in the tuples, the resulting residue should be chemically equivalent. Atom names must refer to the PDB component dictionary.
- GetDefaultSymmetrySettings()¶
Constructs and returns
SymmetrySettings
object for natural amino acids
- class CustomCompound(atom_names)¶
Defines atoms for custom compounds
lDDT requires the reference atoms of a compound which are typically extracted from a
ost.conop.CompoundLib
. This lightweight container allows to handle arbitrary compounds which are not necessarily in the compound library.- Parameters:
atom_names (
list
ofstr
) – Names of atoms of custom compound
- static FromResidue(res)¶
Construct custom compound from residue
- Parameters:
res (
ost.mol.ResidueView
/ost.mol.ResidueHandle
) – Residue from which reference atom names are extracted, hydrogen/deuterium atoms are filtered out- Returns:
- class lDDTSettings(radius=15, sequence_separation=0, cutoffs=(0.5, 1.0, 2.0, 4.0), label='locallddt')¶
Object containing the settings used for lDDT calculations.
- Parameters:
radius – Sets
radius
.sequence_separation – Sets
sequence_separation
.cutoffs – Sets
cutoffs
.label – Sets
label
.
- radius¶
Distance inclusion radius.
- Type:
float
- sequence_separation¶
Sequence separation.
- Type:
int
- cutoffs¶
List of thresholds used to determine distance conservation.
- Type:
list
offloat
- label¶
The base name for the ResidueHandle properties that store the local scores.
- Type:
str
- PrintParameters()¶
Print settings.
- ToString()¶
- Returns:
String representation of the lDDTSettings object.
- Return type:
str
GDT - Global Distance Test¶
Implements the GDT score, i.e. identifies the largest number of positions that can be superposed within a given distance threshold. The final GDT score is then the returned number divided by the total number of reference positioons. The algorithm is similar to what is described for the LGA tool but simpler. Therefore, the fractions reported by OpenStructure tend to be systematically lower. For benchmarking we computed the full GDT_TS, i.e. average GDT for distance thresholds [1, 2, 4, 8], on all CASP15 TS models. 96.5% of differences to the LGA results from the predictioncenter are within 2 GDT points and 99.2% are within 3 GDT points. The max difference is 7.39 GDT points.
The algorithm expects two position lists of same length and applies a sliding window with specified length to define a subset of position pairs as starting point for iterative superposition. Each iterative superposition applies the following steps:
Compute minimal RMSD superposition on subset of position pairs
Apply superposition on all model positions
Compute pairwise distances of all model positions and reference positions
Define new subset of position pairs: pairs within distance threshold
Stop if subset doesn’t change anymore
The subset in any of the iterations which is largest is stored.
This is done for each sliding window position and the largest subset ever observed is reported. To avoid long runtimes for large problem sizes, the sliding window is not applied on each possible position but is capped. If the number of positions is larger than this threshold, the sliding window is only applied on N equidistant locations.
- GDT(mdl_pos, ref_pos, window_size, max_windows, distance_thresh)¶
Returns number of positions that can be superposed within distance_thresh and the respective transformation matrix.
- Parameters:
mdl_pos (
ost.geom.Vec3List
) – Positions representing the model, typically alpha-carbon positionsref_pos (
ost.geom.Vec3List
) – Positions representing the reference, typically alpha-carbon positionswindow_size (
int
) – Size of the sliding window that is used to serve as starting point for iterative superposition. The described benchmark was done with a value of 7.max_windows (
int
) – Cap for number of starting points. The described benchmark was done with a value of 1000.distance_thresh (
float
) – Distance threshold for GDT algorithm- Returns:
tuple
with first element being the number of superposable positions (int
) and the second element the transformation matrix (ost.geom.Mat4
)
Steric Clashes¶
The following function detects steric clashes in atomic structures. Two atoms are clashing if their euclidian distance is smaller than a threshold value (minus a tolerance offset).
- FilterClashes(entity, clashing_distances, always_remove_bb=False)¶
This function filters out residues with non-bonded clashing atoms. If the clashing atom is a backbone atom, the complete residue is removed from the structure, if the atom is part of the sidechain, only the sidechain atoms are removed. This behavior is changed by the always_remove_bb flag: when the flag is set to True the whole residue is removed even if a clash is just detected in the side-chain.
The function returns a view containing all elements (residues, atoms) that have not been removed from the input structure, plus a
ClashingInfo
object containing information about the detected clashes.Two atoms are defined as clashing if their distance is shorter than the reference distance minus a tolerance threshold. The information about the clashing distances and the tolerance thresholds for all possible pairs of atoms is passed to the function as a parameter.
Hydrogen and deuterium atoms are ignored by this function.
- Parameters:
entity (
EntityView
orEntityHandle
) – The input entityclashing_distances (
ClashingDistances
) – information about the clashing distancesalways_remove_bb (
bool
) – if set to True, the whole residue is removed even if the clash happens in the side-chain
- Returns:
A tuple of two elements: The filtered
EntityView
, and aClashingInfo
object
- CheckStereoChemistry(entity, bond_stats, angle_stats, bond_tolerance, angle_tolerance, always_remove_bb=False)¶
This function filters out residues with severe stereo-chemical violations. If the violation involves a backbone atom, the complete residue is removed from the structure, if it involves an atom that is part of the sidechain, only the sidechain is removed. This behavior is changed by the always_remove_bb flag: when the flag is set to True the whole residue is removed even if a violation is just detected in the side-chain.
The function returns a view containing all elements (residues, atoms) that have not been removed from the input structure, plus a
StereoChemistryInfo
object containing information about the detected stereo-chemical violations.A violation is defined as a bond length that lies outside of the range: [mean_length-std_dev*bond_tolerance, mean_length+std_dev*bond_tolerance] or an angle width outside of the range [mean_width-std_dev*angle_tolerance, mean_width+std_dev*angle_tolerance ]. The information about the mean lengths and widths and the corresponding standard deviations is passed to the function using two parameters.
Hydrogen and deuterium atoms are ignored by this function.
- Parameters:
entity (
EntityView
orEntityHandle
) – The input entitybond_stats (
StereoChemicalParams
) – statistics about bond lengthsangle_stats (
StereoChemicalParams
) – statistics about angle widthsbond_tolerance (
float
) – tolerance for bond lengths (in standard deviations)angle_tolerance (
float
) – tolerance for angle widths (in standard deviations)always_remove_bb (
bool
) – if set to True, the whole residue is removed even if the clash happens in the side-chain
- Returns:
A tuple of two elements: The filtered
EntityView
, and aStereoChemistryInfo
object
- class ClashingInfo¶
This object is returned by the
FilterClashes()
function, and contains information about the clashes detected by the function.- GetClashCount()¶
- Returns:
number of clashes between non-bonded atoms detected in the input structure
- GetAverageOffset()¶
- Returns:
a value in Angstroms representing the average offset by which clashing atoms lie closer than the minimum acceptable distance (which of course differs for each possible pair of elements)
- GetClashList()¶
- Returns:
list of detected inter-atomic clashes
- Return type:
list
ofClashEvent
- class ClashEvent¶
This object contains all the information relative to a single clash detected by the
FilterClashes()
function- GetModelDistance()¶
- Returns:
distance (in Angstroms) between the two clashing atoms as observed in the model
- GetAdjustedReferenceDistance()¶
- Returns:
minimum acceptable distance (in Angstroms) between the two atoms involved in the clash, as defined in
ClashingDistances
- class StereoChemistryInfo¶
This object is returned by the
CheckStereoChemistry()
function, and contains information about bond lengths and planar angle widths in the structure that diverge from the parameters tabulated by Engh and Huber in the International Tables of Crystallography. Only elements that diverge from the tabulated value by a minimumnumber of standard deviations (defined when the CheckStereoChemistry function is called) are reported.- GetBadBondCount()¶
- Returns:
number of bonds where a serious violation was detected
- GetBondCount()¶
- Returns:
total number of bonds in the structure checked by the CheckStereoChemistry function
- GetAvgZscoreBonds()¶
- Returns:
average z-score of all the bond lengths in the structure, computed using Engh and Huber’s mean and standard deviation values
- GetBadAngleCount()¶
- Returns:
number of planar angles where a serious violation was detected
- GetAngleCount()¶
- Returns:
total number of planar angles in the structure checked by the CheckStereoChemistry function
- GetAvgZscoreAngles()¶
- Returns:
average z-score of all the planar angle widths, computed using Engh and Huber’s mean and standard deviation values.
- GetBondViolationList()¶
- Returns:
list of bond length violations detected in the structure
- Return type:
list
ofStereoChemicalBondViolation
- GetAngleViolationList()¶
- Returns:
list of angle width violations detected in the structure
- Return type:
list
ofStereoChemicalAngleViolation
- class StereoChemicalBondViolation¶
This object contains all the information relative to a single detected violation of stereo-chemical parameters in a bond length
- GetFirstAtom()¶
- GetSecondAtom()¶
- Returns:
first / second atom of the bond
- Return type:
UniqueAtomIdentifier
- GetBondLength()¶
- Returns:
length of the bond (in Angstroms) as observed in the model
- GetAllowedRange()¶
- Returns:
allowed range of bond lengths (in Angstroms), according to Engh and Huber’s tabulated parameters and the tolerance threshold used when the
CheckStereoChemistry()
function was called- Return type:
tuple
(minimum and maximum)
- class StereoChemicalAngleViolation¶
This object contains all the information relative to a single detected violation of stereo-chemical parameters in a planar angle width
- GetFirstAtom()¶
- GetSecondAtom()¶
- GetThirdAtom()¶
- Returns:
first / second (vertex) / third atom that defines the planar angle
- Return type:
- GetAngleWidth()¶
- Returns:
width of the planar angle (in degrees) as observed in the model
- GetAllowedRange()¶
- Returns:
allowed range of angle widths (in degrees), according to Engh and Huber’s tabulated parameters and the tolerance threshold used when the
CheckStereoChemistry()
function was called- Return type:
tuple
(minimum and maximum)
- class ClashingDistances¶
Object containing information about clashing distances between non-bonded atoms
- ClashingDistances()¶
Creates an empty distance list
- SetClashingDistance(ele1, ele2, clash_distance, tolerance)¶
Adds or replaces an entry in the list
- Parameters:
ele1 – string containing the first element’s name
ele2 – string containing the second element’s name
clash_distance – minimum clashing distance (in Angstroms)
tolerance – tolerance threshold (in Angstroms)
- GetClashingDistance(ele1, ele2)¶
- Returns:
reference distance and a tolerance threshold (both in Angstroms) for two elements
- Return type:
tuple
(minimum clashing distance, tolerance threshold)- Parameters:
ele1 – string containing the first element’s name
ele2 – string containing the second element’s name
- GetAdjustedClashingDistance(ele1, ele2)¶
- Returns:
reference distance (in Angstroms) for two elements, already adjusted by the tolerance threshold
- Parameters:
ele1 – string containing the first element’s name
ele2 – string containing the second element’s name
- GetMaxAdjustedDistance()¶
- Returns:
longest clashing distance (in Angstroms) in the list, after adjustment with tolerance threshold
- IsEmpty()¶
- Returns:
True if the list is empty (i.e. in an invalid, useless state)
- PrintAllDistances()¶
Prints all distances in the list to standard output
- class StereoChemicalParams¶
Object containing stereo-chemical information about bonds and angles. For each item (bond or angle in a specific residue), stores the mean and standard deviation
- StereoChemicalParams()¶
Creates an empty parameter list
- SetParam(item, residue, mean, standard_dev)¶
Adds or replaces an entry in the list
- Parameters:
item – string defining a bond (format: X-Y) or an angle (format: X-Y-Z), where X,Y an Z are atom names
residue – string containing the residue type for this entry
mean – mean bond length (in Angstroms) or angle width (in degrees)
standard_dev – standard deviation of the bond length (in Angstroms) or of the angle width (in degrees)
- IsEmpty()¶
- Returns:
True if the list is empty (i.e. in an invalid, useless state)
- PrintAllParameters()¶
Prints all entries in the list to standard output
- FillClashingDistances(file_content)¶
- FillBondStereoChemicalParams(file_content)¶
- FillAngleStereoChemicalParams(file_content)¶
These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, starting from the content of a parameter file.
- Parameters:
file_content (
list
ofstr
) – list of lines from the parameter file- Return type:
- FillClashingDistancesFromFile(filename)¶
- FillBondStereoChemicalParamsFromFile(filename)¶
- FillAngleStereoChemicalParamsFromFile(filename)¶
These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, starting from a file path.
- Parameters:
filename (
str
) – path to parameter file- Return type:
- DefaultClashingDistances()¶
- DefaultBondStereoChemicalParams()¶
- DefaultAngleStereoChemicalParams()¶
These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, using the default parameter files distributed with OpenStructure.
- Return type:
- ResidueNamesMatch(probe, reference)¶
The function requires a reference structure and a probe structure. The function checks that all the residues in the reference structure that appear in the probe structure (i.e., that have the same ResNum) are of the same residue type. Chains are compared by order, not by chain name (i.e.: the first chain of the reference will be compared with the first chain of the probe structure, etc.)
- Parameters:
probe (
EntityView
) – the structure to testreference (
EntityView
) – the reference structure
- Returns:
True if the residue names are the same, False otherwise
Superposing structures¶
- Superpose(ent_a, ent_b, match='number', atoms='all', iterative=False, max_iterations=5, distance_threshold=3.0)¶
Superposes the model entity onto the reference. To do so, two views are created, returned with the result. atoms describes what goes into these views and match the selection method. For superposition,
SuperposeSVD()
orIterativeSuperposeSVD()
are called (depending on iterative). For matching, the following methods are recognised:number
- select residues by residue number, includes atoms, callsMatchResidueByNum()
index
- select residues by index in chain, includes atoms, callsMatchResidueByIdx()
local-aln
- select residues from a Smith/Waterman alignment, includes atoms, callsMatchResidueByLocalAln()
global-aln
- select residues from a Needleman/Wunsch alignment, includes atoms, callsMatchResidueByGlobalAln()
- Parameters:
ent_a (
EntityView
orEntityHandle
) – The model entity (superposition transform is applied on full entity handle here)ent_b (
EntityView
orEntityHandle
) – The reference entitymatch (
str
) – Method to gather residues/ atomsatoms (
str
,list
,set
) – The subset of atoms to be used in the superpositioniterative (
bool
) – Whether or not to use iterative superpositon.max_iterations (
int
) – Max. number of iterations forIterativeSuperposeSVD()
(only if iterative = True)distance_threshold (
float
) – Distance threshold forIterativeSuperposeSVD()
(only if iterative = True)
- Returns:
An instance of
SuperpositionResult
.
- ParseAtomNames(atoms)¶
Parses different representations of a list of atom names and returns a
set
, understandable byMatchResidueByNum()
. In essence, this function translatesNone to
None
‘all’ to
None
‘backbone’ to
set(['N', 'CA', 'C', 'O'])
‘aname1, aname2’ to
set(['aname1', 'aname2'])
['aname1', 'aname2']
toset(['aname1', 'aname2'])
- Parameters:
atoms (
str
,list
,set
) – Identifier or list of atoms- Returns:
A
set
of atoms.
- MatchResidueByNum(ent_a, ent_b, atoms='all')¶
Returns a tuple of views containing exactly the same number of atoms. Residues are matched by residue number. A subset of atoms to be included in the views can be specified in the atoms argument. Regardless of what the list of atoms says, only those present in two matched residues will be included in the views. Chains are processed in the order they occur in the entities. If ent_a and ent_b contain a different number of chains, processing stops with the lower count.
- Parameters:
ent_a (
EntityView
orEntityHandle
) – The first entityent_b (
EntityView
orEntityHandle
) – The second entityatoms (
str
,list
,set
) – The subset of atoms to be included in the two views.
- Returns:
Two
EntityView
instances with the same amount of residues matched by number. Each residue will have the same number & type of atoms.
- MatchResidueByIdx(ent_a, ent_b, atoms='all')¶
Returns a tuple of views containing exactly the same number of atoms. Residues are matched by position in the chains of an entity. A subset of atoms to be included in the views can be specified in the atoms argument. Regardless of what the list of atoms says, only those present in two matched residues will be included in the views. Chains are processed in order of appearance. If ent_a and ent_b contain a different number of chains, processing stops with the lower count. The number of residues per chain is supposed to be the same.
- Parameters:
ent_a (
EntityView
orEntityHandle
) – The first entityent_b (
EntityView
orEntityHandle
) – The second entityatoms (
str
,list
,set
) – The subset of atoms to be included in the two views.
- Returns:
Two
EntityView
instances with the same amount of residues matched by position. Each residue will have the same number & type of atoms.
- MatchResidueByLocalAln(ent_a, ent_b, atoms='all')¶
Match residues by local alignment. Takes ent_a and ent_b, extracts the sequences chain-wise and aligns them in Smith/Waterman manner using the BLOSUM62 matrix for scoring. Only residues which are marked as
peptide linking
are considered for alignment. The residues of the entities are then matched based on this alignment. Only atoms present in both residues are included in the views. Chains are processed in order of appearance. If ent_a and ent_b contain a different number of chains, processing stops with the lower count.- Parameters:
ent_a (
EntityView
orEntityHandle
) – The first entityent_b (
EntityView
orEntityHandle
) – The second entityatoms (
str
,list
,set
) – The subset of atoms to be included in the two views.
- Returns:
Two
EntityView
instances with the same number of residues. Each residue will have the same number & type of atoms.
- MatchResidueByGlobalAln(ent_a, ent_b, atoms='all')¶
Match residues by global alignment. Same as
MatchResidueByLocalAln()
but performs a global Needleman/Wunsch alignment of the sequences using the BLOSUM62 matrix for scoring.- Parameters:
ent_a (
EntityView
orEntityHandle
) – The first entityent_b (
EntityView
orEntityHandle
) – The second entityatoms (
str
,list
,set
) – The subset of atoms to be included in the two views.
- Returns:
Two
EntityView
instances with the same number of residues. Each residue will have the same number & type of atoms.
- class SuperpositionResult¶
- rmsd¶
RMSD of the superposed entities.
- view1¶
- view2¶
Two
EntityView
used in superposition (not set if methods withVec3List
used).
- fraction_superposed¶
- rmsd_superposed_atoms¶
- ncycles¶
For iterative superposition (
IterativeSuperposeSVD()
): fraction and RMSD of atoms that were superposed with a distance below the given threshold and the number of iteration cycles performed.
- SuperposeSVD(view1, view2, apply_transform=True)¶
- SuperposeSVD(list1, list2)
Superposition of two sets of atoms minimizing RMSD using a classic SVD based algorithm.
Note that the atom positions in the view are taken blindly in the order in which the atoms appear.
- Parameters:
view1 (
EntityView
) – View on the model entityview2 (
EntityView
) – View on the reference entitylist1 (
Vec3List
) – List of atom positions for model entitylist2 (
Vec3List
) – List of atom positions for reference entityapply_transform (
bool
) – If True, the superposition transform is applied to the (full!) entity handle linked to view1.
- Returns:
An instance of
SuperpositionResult
.
- IterativeSuperposeSVD(view1, view2, max_iterations=5, distance_threshold=3.0, apply_transform=True)¶
- IterativeSuperposeSVD(list1, list2, max_iterations=5, distance_threshold=3.0)
Iterative superposition of two sets of atoms. In each iteration cycle, we keep a fraction of atoms with distances below distance_threshold and get the superposition considering only those atoms.
Note that the atom positions in the view are taken blindly in the order in which the atoms appear.
- Parameters:
view1 (
EntityView
) – View on the model entityview2 (
EntityView
) – View on the reference entitylist1 (
Vec3List
) – List of atom positions for model entitylist2 (
Vec3List
) – List of atom positions for reference entitymax_iterations (
int
) – Max. number of iterations to be performeddistance_threshold (
float
) – Distance threshold defining superposed atomsapply_transform (
bool
) – If True, the superposition transform is applied to the (full!) entity handle linked to view1.
- Returns:
An instance of
SuperpositionResult
.- Raises:
Exception if atom counts do not match or if less than 3 atoms.
- CalculateRMSD(view1, view2, transformation=geom.Mat4())¶
- Returns:
RMSD of atom positions (taken blindly in the order in which the atoms appear) in the two given views.
- Return type:
float
- Parameters:
view1 (
EntityView
) – View on the model entityview2 (
EntityView
) – View on the reference entitytransformation (
Mat4
) – Optional transformation to apply on each atom position of view1.
Algorithms on Structures¶
- Accessibility(ent, probe_radius=1.4, include_hydrogens=False, include_hetatm=False, include_water=False, oligo_mode=False, selection='', asa_abs='asaAbs', asa_rel='asaRel', asa_atom='asaAtom', algorithm=NACCESS)¶
Calculates the accesssible surface area for ever atom in ent. The algorithm mimics the behaviour of the bindings available for the NACCESS and DSSP tools and has been tested to reproduce the numbers accordingly.
- Parameters:
ent (
EntityView
/EntityHandle
) – Entity on which to calculate the surfaceprobe_radius (
float
) – Radius of probe to determine accessible surface areainclude_hydrogens (
bool
) – Whether to include hydrogens in the solvent accessibility calculations. By default, every atom with ele=H,D is simply neglected.include_hetatms (
bool
) – Whether to include atoms flagged as hetatms , i.e. ligands, in the solvent accessibility calculations. They are neglected by default.include_water (
bool
) – Whether to include water in the solvent accessibility calculations. By default, every residue with name “HOH” is neglected.oligo_mode (
bool
) – A typical used case of accessibility calculations is to determine the solvent accessibility of a full complex and then the accessibility of each chain individually. Lots of calculations can be cached because only the values of the atoms close to an interface change. This is exactly what happens when you activate the oligo mode. It returns exactly the same value but adds, additionally to the values estimated in full complex, the values from each individual chain as float properties on every residue and atom. Example for atom accessible surface if the according property name is set to “asaAtom”: Accessibility in the full complex is stored as “asaAtom”, the accessibility when only considering that particular chain is stored as “asaAtom_single_chain”. The other properties follow the same naming scheme.selection (
str
) – Selection statement, that gets applied on ent before doing anything. Everything that is not selected is neglected. The default value of “” results in no selection at all.asa_abs (
str
) – Float property name to assign the summed solvent accessible surface from each atom to a residue.asa_rel (
str
) –Float property name to assign the relative solvent accessibility to a residue. This is the absolute accessibility divided by the maximum solvent accessibility of that particular residue. This maximum solvent accessibility is dependent on the chosen
AccessibilityAlgorithm
. Only residues of the 20 standarad amino acids can be handled.In case of the NACCESS algorithm you can expect a value in range [0.0, 100.0] and a value of -99.9 for non standard residues.
In case of the DSSP algorithm you can expect a value in range [0.0, 1.0], no float property is assigned in case of a non standard residue.
asa_atom (
str
) – Float property name to assign the solvent accessible area to each atom.algorithm (
AccessibilityAlgorithm
) – Specifies the used algorithm for solvent accessibility calculations
- Returns:
The summed solvent accessibilty of each atom in ent.
- class AccessibilityAlgorithm¶
The accessibility algorithm enum specifies the algorithm used by the respective tools. Available are:
NACCESS, DSSP
- AssignSecStruct(ent)¶
Assigns secondary structures to all residues based on hydrogen bond patterns as described by DSSP.
- Parameters:
ent (
EntityView
/EntityHandle
) – Entity on which to assign secondary structures
- class FindMemParam¶
Result object for the membrane detection algorithm described below
- axis¶
initial search axis from which optimal membrane slab could be found
- tilt_axis¶
Axis around which we tilt the membrane starting from the initial axis
- tilt¶
Angle to tilt around tilt axis
- angle¶
After the tilt operation we perform a rotation around the initial axis with this angle to get the final membrane axis
- membrane_axis¶
The result of applying the tilt and rotation procedure described above. The membrane_axis is orthogonal to the membrane plane and has unit length.
- pos¶
Real number that describes the membrane center point. To get the actual position you can do: pos * membrane_axis
- width¶
Total width of the membrane in A
- energy¶
Pseudo energy of the implicit solvation model
- membrane_asa¶
Membrane accessible surface area
- membrane_representation¶
Dummy atoms that represent the membrane. This entity is only valid if the according flag has been set to True when calling FindMembrane.
- FindMembrane(ent, assign_membrane_representation=True, fast=False)¶
Estimates the optimal membrane position of a protein by using an implicit solvation model. The original algorithm and the used energy function are described in: Lomize AL, Pogozheva ID, Lomize MA, Mosberg HI (2006) Positioning of proteins in membranes: A computational approach.
There are some modifications in this implementation and the procedure is as follows:
Initial axis are constructed that build the starting point for initial parameter grid searches.
For every axis, the protein is rotated so that the axis builds the z-axis
In order to exclude internal hydrophilic pores, only the outermost atoms with respect the the z-axis enter an initial grid search
The width and position of the membrane is optimized for different combinations of tilt and rotation angles (further described in
FindMemParam
). The top 20 parametrizations (only top parametrization if fast is True) are stored for further processing.
The 20 best membrane parametrizations from the initial grid search (only the best if fast is set to True) enter a final minimization step using a Levenberg-Marquardt minimizer.
- Parameters:
ent (
ost.mol.EntityHandle
/ost.mol.EntityView
) –Entity of a transmembrane protein, you’ll get weird results if this is not the case. The energy term of the result is typically a good indicator whether ent is an actual transmembrane protein. The following float properties will be set on the atoms:
’asaAtom’ on all atoms that are selected with ent.Select(‘peptide=true and ele!=H’) as a result of envoking
Accessibility()
.’membrane_e’ the contribution of the potentially membrane facing atoms to the energy function.
assign_membrane_representation (
bool
) – Whether to construct a membrane representation using dummy atomsfast – If set to false, the 20 best results of the initial grid search undergo a Levenberg-Marquardt minimization and the parametrization with optimal minimized energy is returned. If set to yes, only the best result of the initial grid search is selected and returned after Levenberg-Marquardt minimization.
- Returns:
The results object
- Return type:
Trajectory Analysis¶
This is a set of functions used for basic trajectory analysis such as extracting
positions, distances, angles and RMSDs. The organization is such that most
functions have their counterpart at the individual frame level
so that they can also be called on one frame instead of
the whole trajectory.
All these functions have a “stride” argument that defaults to stride=1, which is used to skip frames in the analysis.
- SuperposeFrames(frames, sel, from=0, to=-1, ref=-1)¶
This function superposes the frames of the given coord group and returns them as a new coord group.
- Parameters:
frames (
CoordGroupHandle
) – The source coord group.sel (
ost.mol.EntityView
) – An entity view containing the selection of atoms to be used for superposition. If set to an invalid view, all atoms in the coord group are used.from – index of the first frame
to – index of the last frame plus one. If set to -1, the value is set to the number of frames in the coord group
ref – The index of the reference frame to use for superposition. If set to -1, the each frame is superposed to the previous frame.
- Returns:
A newly created coord group containing the superposed frames.
- SuperposeFrames(frames, sel, ref_view, from=0, to=-1)
Same as SuperposeFrames above, but the superposition is done on a reference view and not on another frame of the trajectory.
- Parameters:
frames (
CoordGroupHandle
) – The source coord group.sel (
ost.mol.EntityView
) – An entity view containing the selection of atoms of the frames to be used for superposition.ref_view (
ost.mol.EntityView
) – The reference view on which the frames will be superposed. The number of atoms in this reference view should be equal to the number of atoms in sel.from – index of the first frame
to – index of the last frame plus one. If set to -1, the value is set to the number of frames in the coord group
- Returns:
A newly created coord group containing the superposed frames.
- AnalyzeAtomPos(traj, atom1, stride=1)¶
This function extracts the position of an atom from a trajectory. It returns a vector containing the position of the atom for each analyzed frame.
- Parameters:
traj (
CoordGroupHandle
) – The trajectory to be analyzed.atom1 – The
AtomHandle
.stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeCenterOfMassPos(traj, sele, stride=1)¶
This function extracts the position of the center-of-mass of a selection (
EntityView
) from a trajectory and returns it as a vector.- Parameters:
traj (
CoordGroupHandle
) – The trajectory to be analyzed.sele (
EntityView
.) – The selection from which the center of mass is computedstride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeDistanceBetwAtoms(traj, atom1, atom2, stride=1)¶
This function extracts the distance between two atoms from a trajectory and returns it as a vector.
- Parameters:
traj (
CoordGroupHandle
) – The trajectory to be analyzed.atom1 – The first
AtomHandle
.atom2 – The second
AtomHandle
.stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeAngle(traj, atom1, atom2, atom3, stride=1)¶
This function extracts the angle between three atoms from a trajectory and returns it as a vector. The second atom is taken as being the central atom, so that the angle is between the vectors (atom1.pos-atom2.pos) and (atom3.pos-atom2.pos).
- Parameters:
traj (
CoordGroupHandle
) – The trajectory to be analyzed.atom1 – The first
AtomHandle
.atom2 – The second
AtomHandle
.atom3 – The third
AtomHandle
.stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeDihedralAngle(traj, atom1, atom2, atom3, atom4, stride=1)¶
This function extracts the dihedral angle between four atoms from a trajectory and returns it as a vector. The angle is between the planes containing the first three and the last three atoms.
- Parameters:
traj (
CoordGroupHandle
) – The trajectory to be analyzed.atom1 – The first
AtomHandle
.atom2 – The second
AtomHandle
.atom3 – The third
AtomHandle
.atom4 – The fourth
AtomHandle
.stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeDistanceBetwCenterOfMass(traj, sele1, sele2, stride=1)¶
This function extracts the distance between the center-of-mass of two selections (
EntityView
) from a trajectory and returns it as a vector.- Parameters:
traj (
CoordGroupHandle
) – The trajectory to be analyzed.sele1 (
EntityView
.) – The selection from which the first center of mass is computedsele2 (
EntityView
.) – The selection from which the second center of mass is computedstride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeRMSD(traj, reference_view, sele_view, stride=1)¶
This function extracts the rmsd between two
EntityView
and returns it as a vector. The views don’t have to be from the same entity. The reference positions are taken directly from the reference_view, evaluated only once. The positions from the sele_view are evaluated for each frame. If you want to compare to frame i of the trajectory t, first use t.CopyFrame(i) for example:eh = io.LoadPDB(...) t = io.LoadCHARMMTraj(eh, ...) sele = eh.Select(...) t.CopyFrame(0) mol.alg.AnalyzeRMSD(t, sele, sele)
- Parameters:
traj (
CoordGroupHandle
) – The trajectory to be analyzed.reference_view (
EntityView
.) – The selection used as reference structuresele_view (
EntityView
.) – The selection compared to the reference_viewstride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeMinDistance(traj, view1, view2, stride=1)¶
This function extracts the minimal distance between two sets of atoms (view1 and view2) for each frame in a trajectory and returns it as a vector.
- Parameters:
traj (
CoordGroupHandle
) – The trajectory to be analyzed.view1 (
EntityView
.) – The first group of atomsview2 (
EntityView
.) – The second group of atomsstride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeMinDistanceBetwCenterOfMassAndView(traj, view_cm, view_atoms, stride=1)¶
This function extracts the minimal distance between a set of atoms (view_atoms) and the center of mass of a second set of atoms (view_cm) for each frame in a trajectory and returns it as a vector.
- Parameters:
traj (
CoordGroupHandle
) – The trajectory to be analyzed.view_cm (
EntityView
.) – The group of atoms from which the center of mass is takenview_atoms (
EntityView
.) – The second group of atomsstride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeAromaticRingInteraction(traj, view_ring1, view_ring2, stride=1)¶
This function is a crude analysis of aromatic ring interactions. For each frame in a trajectory, it calculates the minimal distance between the atoms in one view and the center of mass of the other and vice versa, and returns the minimum between these two minimal distances. Concretely, if the two views are the heavy atoms of two rings, then it returns the minimal center of mass - heavy atom distance betweent he two rings
- Parameters:
traj (
CoordGroupHandle
) – The trajectory to be analyzed.view_ring1 (
EntityView
.) – First group of atomsview_ring2 (
EntityView
.) – Second group of atomsstride – Size of the increment of the frame’s index between two consecutive frames analyzed.
Mapping functions¶
The following functions help to convert one residue into another by reusing as much as possible from the present atoms. They are mainly meant to map from standard amino acid to other standard amino acids or from modified amino acids to standard amino acids.
- CopyResidue(src_res, dst_res, editor)¶
Copies the atoms of
src_res
todst_res
using the residue names as guide to decide which of the atoms should be copied. Ifsrc_res
anddst_res
have the same name, orsrc_res
is a modified version ofdst_res
(i.e. have the same single letter code),CopyConserved()
will be called, otherwiseCopyNonConserved()
.If a CBeta atom wasn’t already copied from
src_res
, a new one at a reconstructed position will be added todst_res
if it is notGLY
and all backbone positions are available to do it.- Parameters:
src_res (
ResidueHandle
) – The source residuedst_res (
ResidueHandle
) – The destination residue (expected to be a standard amino acid)editor (
XCSEditor
) – Editor used to modify dst_res.
- Returns:
True if the residue could be copied as a conserved residue, False if it had to fallback to
CopyNonConserved()
.
- CopyConserved(src_res, dst_res, editor)¶
Copies the atoms of
src_res
todst_res
assuming that the parent amino acid ofsrc_res
(orsrc_res
itself) are identical todst_res
.If
src_res
anddst_res
are identical, all heavy atoms are copied todst_res
. Ifsrc_res
is a modified version ofdst_res
and the modification is a pure addition (e.g. the phosphate group of phosphoserine), the modification is stripped off and all other heavy atoms are copied todst_res
. If the modification is not a pure addition, it falls back toCopyNonConserved()
.Additionally, the selenium atom of
MSE
is converted to sulphur to mapMSE
toMET
.- Parameters:
src_res (
ResidueHandle
) – The source residuedst_res (
ResidueHandle
) – The destination residue (expected to be a standard amino acid)editor (
XCSEditor
) – Editor used to modify dst_res.
- Returns:
A tuple of bools stating whether the residue could be copied without falling back to
CopyNonConserved()
and whether the CBeta atom was copied fromsrc_res
todst_res
.
- CopyNonConserved(src_res, dst_res, editor)¶
Copies the heavy backbone atoms and CBeta (except for
GLY
) ofsrc_res
todst_res
.- Parameters:
src_res (
ResidueHandle
) – The source residuedst_res (
ResidueHandle
) – The destination residue (expected to be a standard amino acid)editor (
XCSEditor
) – Editor used to modify dst_res.
- Returns:
A tuple of bools as in
CopyConserved()
with the first bool always being False.
Molecular Checker (Molck)¶
Programmatic usage¶
Molecular Checker (Molck) could be called directly from the code using Molck function:
#! /bin/env python
"""Run Molck with Python API.
This is an exemplary procedure on how to run Molck using Python API which is
equivalent to the command line:
molck <PDB PATH> --rm=hyd,oxt,nonstd,unk \
--fix-ele --out=<OUTPUT PATH> \
--complib=<PATH TO compounds.chemlib>
"""
from ost.io import LoadPDB, SavePDB
from ost.mol.alg import MolckSettings, Molck
from ost.conop import CompoundLib
pdbid = "<PDB PATH>"
lib = CompoundLib.Load("<PATH TO compounds.chemlib>")
# Using Molck function
ent = LoadPDB(pdbid)
ms = MolckSettings(rm_unk_atoms=True,
rm_non_std=True,
rm_hyd_atoms=True,
rm_oxt_atoms=True,
rm_zero_occ_atoms=False,
colored=False,
map_nonstd_res=False,
assign_elem=True)
Molck(ent, lib, ms)
SavePDB(ent, "<OUTPUT PATH>")
It can also be split into subsequent commands for greater controll:
#! /bin/env python
"""Run Molck with Python API.
This is an exemplary procedure on how to run Molck using Python API which is
equivalent to the command line:
molck <PDB PATH> --rm=hyd,oxt,nonstd,unk \
--fix-ele --out=<OUTPUT PATH> \
--complib=<PATH TO compounds.chemlib>
"""
from ost.io import LoadPDB, SavePDB
from ost.mol.alg import (RemoveAtoms, MapNonStandardResidues,
CleanUpElementColumn)
from ost.conop import CompoundLib
pdbid = "<PDB PATH>"
lib = CompoundLib.Load("<PATH TO compounds.chemlib>")
map_nonstd = False
# Using function chain
ent = LoadPDB(pdbid)
if map_nonstd:
MapNonStandardResidues(lib=lib, ent=ent)
RemoveAtoms(lib=lib,
ent=ent,
rm_unk_atoms=True,
rm_non_std=True,
rm_hyd_atoms=True,
rm_oxt_atoms=True,
rm_zero_occ_atoms=False,
colored=False)
CleanUpElementColumn(lib=lib, ent=ent)
SavePDB(ent, "<OUTPUT PATH>")
API¶
- class MolckSettings(rm_unk_atoms=False, rm_non_std=False, rm_hyd_atoms=True, rm_oxt_atoms=False, rm_zero_occ_atoms=False, colored=False, map_nonstd_res=True, assign_elem=True)¶
Stores settings used for Molecular Checker.
- Parameters:
rm_unk_atoms – Sets
rm_unk_atoms
.rm_non_std – Sets
rm_non_std
.rm_hyd_atoms – Sets
rm_hyd_atoms
.rm_oxt_atoms – Sets
rm_oxt_atoms
.rm_zero_occ_atoms – Sets
rm_zero_occ_atoms
.colored – Sets
colored
.map_nonstd_res – Sets
map_nonstd_res
.assign_elem – Sets
assign_elem
.
- rm_unk_atoms¶
Remove unknown and atoms not following the nomenclature.
- Type:
bool
- rm_non_std¶
Remove all residues not one of the 20 standard amino acids
- Type:
bool
- rm_hyd_atoms¶
Remove hydrogen atoms
- Type:
bool
- rm_oxt_atoms¶
Remove terminal oxygens
- Type:
bool
- rm_zero_occ_atoms¶
Remove atoms with zero occupancy
- Type:
bool
- colored¶
Whether output should be colored
- Type:
bool
- map_nonstd_res¶
Maps modified residues back to the parent amino acid, for example MSE -> MET, SEP -> SER
- Type:
bool
- assign_elem¶
Clean up element column
- Type:
bool
- ToString()¶
- Returns:
String representation of the MolckSettings.
- Return type:
str
Warning
The API here is set such that the functions modify the passed structure ent in-place. If this is not ok, please work on a copy of the structure.
- Molck(ent, lib, settings[, prune=True])¶
Runs Molck on provided entity. Reprocesses ent with
ost.conop.HeuristicProcessor
and given lib once done.- Parameters:
ent (
EntityHandle
) – Structure to checklib (
CompoundLib
) – Compound librarysettings (
MolckSettings
) – Molck settingsprune (
bool
) – Whether to remove residues/chains that don’t contain atoms anymore after Molck cleanup
- MapNonStandardResidues(ent, lib, reprocess=True)¶
Maps modified residues back to the parent amino acid, for example MSE -> MET.
- Parameters:
ent (
EntityHandle
) – Structure to checklib (
CompoundLib
) – Compound libraryreprocess – The function generates a deep copy of ent. Highly recommended to enable reprocess that runs
ost.conop.HeuristicProcessor
with given lib. If set to False, you’ll have no connectivity etc. after calling this function.
- RemoveAtoms(ent, lib, rm_unk_atoms=False, rm_non_std=False, rm_hyd_atoms=True, rm_oxt_atoms=False, rm_zero_occ_atoms=False, colored=False,
- reprocess=True)
Removes atoms and residues according to some criteria.
- Parameters:
ent (
EntityHandle
) – Structure to checklib (
CompoundLib
) – Compound libraryrm_unk_atoms – See
MolckSettings.rm_unk_atoms
rm_non_std – See
MolckSettings.rm_non_std
rm_hyd_atoms – See
MolckSettings.rm_hyd_atoms
rm_oxt_atoms – See
MolckSettings.rm_oxt_atoms
rm_zero_occ_atoms – See
MolckSettings.rm_zero_occ_atoms
colored – See
MolckSettings.colored
reprocess – Removing atoms may impact certain annotations on the structure (chem class etc.) which are set by
ost.conop.Processor
. If set to True, aost.conop.HeuristicProcessor
with given lib reprocesses ent.
- CleanUpElementColumn(ent, lib)¶
Clean up element column.
- Parameters:
ent (
EntityHandle
) – Structure to checklib (
CompoundLib
) – Compound library
Biounits¶
Biological assemblies, i.e. biounits, are an integral part of mmCIF files and
their construction is fully defined in ost.io.MMCifInfoBioUnit
.
ost.io.MMCifInfoBioUnit.PDBize()
provides one possibility to construct
such biounits with compatibility with the PDB format in mind. That is single
character chain names, dumping all ligands in one chain etc. Here we provide a
more mmCIF-style way of constructing biounits. This can either be done starting
from a ost.io.MMCifInfoBioUnit
or the derived
ost.mol.alg.BUInfo
. The latter is a minimalistic representation of
ost.io.MMCifInfoBioUnit
and can be serialized to a byte string.
- BUInfo(mmcif_buinfo):
Preprocesses data from
ost.io.MMCifInfoBioUnit
that are required to construct a biounit from an assymetric unit. Can be serialized.- Parameters:
mmcif_buinfo (
ost.io.MMCifInfoBioUnit
) – Biounit definition
- ToBytes()¶
- Returns:
A byte string from which the object can be reconstructed.
- CreateBU(asu, bu_info)¶
Constructs a biounit given an assymetric unit and transformation information. The following properties are copied from the assymetric unit and are expected to be set (this is the case if you use
ost.io.LoadMMCIF()
for the assymetric unit):Chain level:
Chain type (see
ost.mol.ChainHandle.type
)
Residue level:
Chem type (see
ost.mol.ResidueHandle.chem_type
)Chem class (
ost.mol.ResidueHandle.chem_class
)One letter code (see
ost.mol.ResidueHandle.one_letter_code
)Secondary structure (see
ost.mol.ResidueHandle.sec_structure
)IsProtein and IsLigand properties (see
ost.mol.ResidueHandle.is_protein
/ost.mol.ResidueHandle.is_ligand
)
Each chain in the returned biounit can be referenced back to the assymetric unit as they follow a standardised naming scheme: <idx>.<asu_cname>, where asu_cname is the chain name in the assymetric unit and idx is the nth occurence of that chain in the biounit with one based indexing. There is a quirk though. An index of 1, for example 1.A, is reserved for the original AU chain with identity transform (read: no transform) applied. If a certain AU chain only occurs with an actual transform applied, numbering starts at 2.
Warning
There is the (rare) possibility that a AU chain that has only identity transform applied is not named 1.<au_cname>. As of january 2024, there are 3 pdb entries (8qn6, 8x1h, 2c0x) where the same AU chain with identity transform occurs several times in the same biounit. This is likely an error in the respective mmCIF files as the resulting chains sit on top of each other. OST just names the FIRST occurence as 1.<au_cname>.
- Parameters:
asu (
ost.mol.EntityHandle
) – The assymetric unitbu_info (
MMCifInfoBioUnit
/BUInfo
) – Info object
- Returns:
A
ost.mol.EntityHandle
of the requested biounit