Structural Data

The StructureDB serves as a container for structural backbone and sequence data. Custom accessor objects can be implemented that relate arbitrary features to structural data. Examples provided by ProMod3 include accession using matching stem geometry (see: FragDB) or sequence features (see: Fragger). Besides backbone and sequence data, derived features can optionally be stored. E.g. sequence profiles or secondary structure information. Optional data includes:

  • The phi/psi dihedral angles

  • The secondary structure state as defined by dssp

  • The solvent accessibility in square Angstrom

  • The amino acid frequencies as given by an input sequence profile

  • The residue depth - The residue depth is defined as the minimum distance of a residue towards any of the exposed residues. Distances are calculated using CB positions (artificially constructed in case of glycine) and exposed is defined as: relative solvent accessibility > 25% and at least one atom being exposed to the OUTER surface. To determine whether an atom is part of that outer surface, the full structure is placed into a 3D grid and a flood fill algorithm is used to determine the atoms of interest. Internal cavities are excluded by using this approach. This is a simplified version of the residue depth as discussed in [chakravarty1999] and gets directly calculated when structural information is added to the StructureDB.

  • The amino acid frequency derived from structural alignments as described in [zhou2005] - Since the calculation of such a profile already requires a StructureDB, we end up in a hen and egg problem here… When adding structural information to the StructureDB, the according memory gets just allocated and set to zero. The usage of this information is therefore only meaningful if you calculate these profiles and manually set them (or load the provided default database).

Defining Chains and Fragments

class promod3.loop.CoordInfo

The CoordInfo gets automatically generated when new chains are added to a StructureDB. It contains internal information of how a connected stretch of residues is stored in the database.

id

An id string specified when adding the corresponding stretch to the structure db

chain_name

A chain name string specified when adding the corresponding stretch to the structure db

offset

All residues of the added stretch are stored in a linear memory layout. The offset parameter tells us where it exactly starts in the global data structure. (int)

size

The number of residues in that stretch (int)

start_resnum

Residue number of first residue in the added stretch. The residue number is relative to the SEQRES provided in the input profile when adding the stuff to the structure db. (int)

shift

Translation from original coordinates that has been applied before storing structural information in db. (ost.geom.Vec3)

class promod3.loop.FragmentInfo(chain_index, offset, length)

The FragmentInfo defines any fragment in the StructureDB. If you implement your own accessor object, thats the information you want to store.

Parameters:
chain_index

The index of the chain (defined by CoordInfo) in the StructureDB this particle belongs to. (int)

offset

Index of residue in chain the fragment starts. (0-based, int)

length

Length of the fragment (int)

The Structure Database

The following code example demonstrates how to create a structural database and fill it with content.

from promod3 import loop
from ost import io, seq
import os

# StructureDB where all data get extracted
structure_db_one = loop.StructureDB(loop.StructureDBDataType.All)

# StructureDB where we only have the default data 
# (positions and sequence) plus residue depths and dihedrals.
# In order to pass the required flags, we use a bitwise or.
structure_db_two = loop.StructureDB(
                   loop.StructureDBDataType.ResidueDepths |
                   loop.StructureDBDataType.Dihedrals)

# Lets fill in some structures. It gets assumed, that all required
# data lies in the following directories.
structure_dir = "data"
prof_dir = "data"

# The naming of the files in the directories is e.g. 1CRN.pdb for 
# the structure and 1CRNA.hhm for the profile. 
# The structure possibly contain several chains, whereas the hhm 
# file is only for that specific chain.
structure_ids = ["1CRN", "1AKI"]
chain_names = ["A", "A"]

for s_id, ch_name in zip(structure_ids, chain_names):

    # Join together the data paths.
    structure_path = os.path.join(structure_dir, s_id + ".pdb")
    prof_path = os.path.join(prof_dir, s_id + ch_name + ".hhm")

    # Let's load the structure.
    structure = io.LoadPDB(structure_path).Select("peptide=True")
    
    # And the according profile in hhm format.
    prof = io.LoadSequenceProfile(prof_path)

    # For simplicity we use as SEQRES the sequence from the profile.
    # In this case the numbering of the structures already matches.
    seqres = seq.CreateSequence(ch_name, prof.sequence)

    # Add the stuff to the first StructureDB
    structure_db_one.AddCoordinates(s_id, ch_name, structure, 
                                    seqres, prof)

    # Add the stuff to the second StructureDB, 
    # No profile required here...
    structure_db_two.AddCoordinates(s_id, ch_name, structure, 
                                    seqres)

                                
# We now have two structures in both databases...
# Lets get a summary of whats actually in there
structure_db_one.PrintStatistics()
structure_db_two.PrintStatistics()

# There is no profile derived from structures assigned to 
# structure_db_one yet, the memory is only allocated and set to 
# zero. In structure_db_two, there'll never be stored a structure 
# profile as we did not initialize it accordingly. 
# However, we can still use its coordinates and residue depths to
# generate profiles!  
# To demonstrate, we use our structure_db_two to derive profiles 
# and set them in structure_db_one.

for i in range(structure_db_one.GetNumCoords()):

    # extract all required information
    bb_list = structure_db_one.GetBackboneList(i)
    res_depths = structure_db_one.GetResidueDepths(i)

    # generate structure profiles based on structure_db_two
    prof = structure_db_two.GenerateStructureProfile(bb_list, 
                                                     res_depths)
    # and add it to the previously created structure_db
    structure_db_one.SetStructureProfile(i, prof)

# That's it! Let's save both databases down.
# structure_db_two will use much less memory, as it contains less data. 
# Sidenote: We're saving the portable version. If you intent to use 
# your database only on one system, the Save / Load functions should
# be preferred, as the loading will be much faster.
structure_db_one.SavePortable("my_db_one.dat")
structure_db_two.SavePortable("my_db_two.dat")

Calculating the structural profiles is expensive and heavily depends on the size of the database used as source. If you want to do this for a larger database, you might want to consider two things:

  1. Use a database of limited size to generate the actual profiles (something in between 5000 and 10000 nonredundant chains is enough)

  2. Use the ost.seq.ProfileDB to gather profiles produced from jobs running in parallel

class promod3.loop.StructureDBDataType

The StructureDBDataType enum has to be passed at initialization of a StructureDB in order to define what data you want to store additionally to backbone coordinates and sequence. For the bare minimum (only backbone coordinates and sequence), use Minimal. If you want to store all data possible, use All. If you only want a subset, you can combine some of the datatypes with a bitwise or operation (see example script for StructureDB). One important note: If you enable AAFrequenciesStruct, the actual information is not automatically assigned. Only the according memory is allocated and set to zero, the actual information must be assigned manually (see example script again…).

Minimal, All, Dihedrals, SolventAccessibilities, ResidueDepths, DSSP, AAFrequencies, AAFrequenciesStruct

class promod3.loop.StructureDB(data_to_store)

Generates an empty StructureDB that can be filled with content through AddCoordinates(). The information extracted there is defined by data_to_store. Have a look at the StructureDBDataType documentation and at the example script…

Parameters:

data_to_store (StructureDBDataType) – Specifies what data to store in the database, several flags can be combined with a bitwise or operator.

static Load(filename)
static LoadPortable(filename)

Loads raw binary file generated with Save() (optimized for fast reading) / portable file generated with SavePortable() (slower but less machine-dependent).

Parameters:

filename (str) – Path to the file from which to load the database.

Returns:

The loaded data base

Return type:

StructureDB

Raises:

RuntimeError if file cannot be opened or if file cannot be parsed (see here for details).

Save(filename)
SavePortable(filename)

Saves a raw / portable binary representation. Use portable files for distribution and convert locally to raw files. See here for details.

Parameters:

filename (str) – Path to the file where the database will be saved

Raises:

RuntimeError if file cannot be opened

HasData(data_type)

Checks, whether requested data type is available in the current database.

Parameters:

data_type (StructureDBDataType) – Data type to check

Returns:

Whether the requested datatype is available

Return type:

bool

AddCoordinates(id, chain_name, ent, seqres, prof=None, only_longest_stretch=True)

This method takes an entity and adds coordinates and the sequence of one of its chains to the structural database. Additionally, all data as specified at the initialization of the database is extracted fully automatically by considering the full ent (e.g. when calculating solvent accessibilities etc.). The only exception is AAFrequencies, where a valid sequence profile is expected in prof that has matching sequence with seqres All residues in chain with name chain_name must have residue numbers that directly relate them to the seqres with an indexing scheme starting from one. If this is not the case, an error gets thrown. You might want to consider to use ost.seq.Renumber() for proper numbering. Based on consecutive numbering and additionally checking for valid peptide bonds, connected stretches are identified and every added connected stretch gets its own entry with CoordInfo as a descriptor. To avoid cluttering the database with many small fragments, the flag: only_longest_stretch can be used. Set it to False if all connected stretches of chain with name chain_name should be added. There is one final catch you have to consider: Due to the internal lossy data compression for the positions, the extent in x, y and z - direction for every connected stretch is limited to 655A. This should be sufficient for most structures, but stretches exceeding this maximum are discarded. For storing the structural data given these restraints, a translation is applied that gets stored as the shift attribute in the according CoordInfo object.

Parameters:
  • id (str) – identifier of the added structure (e.g. pdb id)

  • chain_name (str) – Name of the chain in ent you want to add

  • ent (ost.mol.EntityHandle / ost.mol.EntityView) – The full entity that must contain a chain named as specified by chain_name.

  • seqres (ost.seq.SequenceHandle) – The reference sequence of chain with name chain_name

  • prof (ost.seq.ProfileHandle) – Profile information for the chain with name chain_name. The profile sequence must match seqres.

  • only_longest_stretch – Flag whether you want to add only the longest connected stretch of residues are all connected stretches of residues

Returns:

indices of added stretches in db

Return type:

list of int

Raises:

RuntimeError if the residues in chain with name chain_name do not match seqres given the residue numbers, when AAFrequencies have to to be extracted and the sequence in prof does not match the seqres or prof is invalid.

RemoveCoordinates(coord_idx)

Removes coordinates at specified location and all its associated data. This has an impact on the offset values of all CoordInfo objects that are internally stored afterwards and on the actual coord indices (all shifted by one). So make sure that you adapt your data access accordingly!

Parameters:

coord_idx (int) – Specifies coordinates to be removed

Raises:

RuntimeError if coord_idx is invalid

GetCoordIdx(id, chain_name)
Returns:

The StructureDB indices (in [0, GetNumCoords()-1]) of all coords (connected stretches) with matching id / chain_name.

Return type:

list of int

Parameters:
GetCoordInfo(idx)
Returns:

Object describing the stretch of connected residues with index idx.

Return type:

CoordInfo

Parameters:

idx (int) – The StructureDB index (in [0, GetNumCoords()-1])

GetNumCoords()
Returns:

Number of connected stretches of residues that have been added to the database.

Return type:

int

PrintStatistics()

Prints out some information about the db.

GetBackboneList(fragment, sequence)
GetBackboneList(n_stem, c_stem, fragment, sequence='')
GetBackboneList(coord_idx, sequence='')
GetBackboneList(n_stem, c_stem, coord_idx, sequence='')
Returns:

Backbone list with positions extracted from fragment or full entry at coord_idx

Return type:

BackboneList

Parameters:
  • fragment (FragmentInfo) – Fragment definition from which to extract positions.

  • coord_idx (int) – Idx of entry from which to extract positions.

  • sequence (str) – Sequence of the returned backbone list. If not set, the original sequence at specified location in the database is used.

  • n_stem (ost.mol.ResidueHandle) – Positions on which the backbone list’s N-terminus should be superposed onto.

  • c_stem (ost.mol.ResidueHandle) – Positions on which the backbone list’s C-terminus should be superposed onto.

Raises:

RuntimeError if the length of sequence does not match with the desired backbone list, if sequence contains a character which does not belong to the 20 proteinogenic amino acids or if fragment or coord_idx is invalid. Fragment can be invalid when it does not fully fit into one of the connected stretches of residues in the database.

GetSequence(fragment)
GetSequence(coord_idx)
Returns:

The sequence of fragment or full entry at coord_idx

Return type:

str

Parameters:
  • fragment (FragmentInfo) – Fragment definition from which to extract the sequence.

  • coord_idx (int) – Idx of entry from which to extract the sequence

Raises:

RuntimeError if fragment or coord_idx is invalid. Fragment can be invalid when it does not fully fit into one of the connected stretches of residues in the database.

GetDSSPStates(fragment)
GetDSSPStates(coord_idx)
Returns:

The dssp states of fragment or full entry at coord_idx

Return type:

str

Parameters:
  • fragment (FragmentInfo) – Fragment definition from which to extract the states.

  • coord_idx (int) – Idx of entry from which to extract the dssp states

Raises:

RuntimeError if database does not contain dssp data or if fragment/ coord_idx is invalid. Fragment can be invalid when it does not fully fit into one of the connected stretches of residues in the database.

GetDihedralAngles(fragment)
GetDihedralAngles(coord_idx)
Returns:

The phi and psi dihedral angles of every residue of fragment or full entry at coord_idx

Return type:

list of pairs (tuple)

Parameters:
  • fragment (FragmentInfo) – Fragment definition from which to extract the dihedrals.

  • coord_idx (int) – Idx of entry from which to extract the dihedral angles

Raises:

RuntimeError if database does not contain dihedral angle data or if fragment/ coord_idx is invalid. Fragment can be invalid when it does not fully fit into one of the connected stretches of residues in the database.

GetResidueDepths(fragment)
GetResidueDepths(coord_idx)
Returns:

Residue depth for each residue of fragment or full entry at coord_idx

Return type:

list of float

Parameters:
  • fragment (FragmentInfo) – Fragment definition from which to extract the residue depths

  • coord_idx (int) – Idx of entry from which to extract the residue depths

Raises:

RuntimeError if database does not contain residue depth data or if fragment/ coord_idx is invalid. Fragment can be invalid when it does not fully fit into one of the connected stretches of residues in the database.

GetSolventAccessibilitites(fragment)
GetSolventAccessibilitites(coord_idx)
Returns:

Solvent accessibility for each residue of fragment or full entry at coord_idx in square A as calculated by Accessibility() when adding the structure to the database.

Return type:

list of float

Parameters:
  • fragment (FragmentInfo) – Fragment definition from which to extract the solvent accessibilities

  • coord_idx (int) – Idx of entry from which to extract the solvent accessibilities

Raises:

RuntimeError if database does not contain solvent accessibility data or if fragment/ coord_idx is invalid. Fragment can be invalid when it does not fully fit into one of the connected stretches of residues in the database.

GetSequenceProfile(fragment)
GetSequenceProfile(coord_idx)
Returns:

The sequence profile for the residues defined by fragment or full entry at coord_idx with the BLOSUM62 probabilities as NULL model.

Return type:

ost.seq.ProfileHandle

Parameters:
  • fragment (FragmentInfo) – Fragment definition from which to extract the sequence profile

  • coord_idx (int) – Idx of entry from which to extract the sequence profile

Raises:

RuntimeError if database does not contain sequence profile data or if fragment/ coord_idx is invalid. Fragment can be invalid when it does not fully fit into one of the connected stretches of residues in the database.

GetStructureProfile(fragment)
GetStructureProfile(coord_idx)
Returns:

The structure profile for the residues defined by fragment or full entry at coord_idx with the BLOSUM62 probabilities as NULL model.

Return type:

ost.seq.ProfileHandle

Parameters:
  • fragment (FragmentInfo) – Fragment definition from which to extract the structure profile

  • coord_idx (int) – Idx of entry from which to extract the structure profile

Raises:

RuntimeError if database does not contain structure profile data or if fragment/ coord_idx is invalid. Fragment can be invalid when it does not fully fit into one of the connected stretches of residues in the database.

GenerateStructureProfile(bb_list, residue_depths)

Calculates a structure profile for bb_list with given residue_depths using the full internal data of this StructureDB.

Parameters:
  • bb_list (BackboneList) – Positions for which to calculate the structural profile

  • residue_depths (list of float) – The residue depth for each residue in bb_list as you would extract it from any StructureDB containing that data.

Returns:

The structure profile for the input with the BLOSUM62 probabilities as NULL model.

Return type:

ost.seq.ProfileHandle

Raises:

RuntimeError if bb_list and residue_depths differ in size, when their size is 0 or when database does not contain residue depth data.

SetStructureProfile(coord_idx, prof)

Takes the prof and sets the corresponding StructureProfile frequencies in entry with coord_idx

Parameters:
Raises:

RuntimeError if coord_idx does not match any entry in the db, when the size of the prof does not exactly match the size of entry at coord_idx or when database does not contain aa frequency struct data.

GetSubDB(indices)
Returns:

A new database containing only the structural infos specified by your input. This might be useful if you’re testing stuff and want to make sure that you have no close homologue in the database.

Return type:

StructureDB

Parameters:

indices (list) – Indices of chains to be added to the sub database (in [0, GetNumCoords()-1])

Raises:

RuntimeError if you provide an invalid index

Finding Fragments based on Geometric Features

The fragment database allows to organize, search and access the information stored in a structural database (StructureDB). In its current form it groups fragments in bins according to their length (incl. stems) and the geometry of their N-stem and C-stem (described by 4 angles and the distance between the N-stem C atom and the C-stem N atom). It can therefore be searched for fragments matching a certain geometry of N and C stems. The bins are accessed through a hash table, making searching the database ultra fast.

This example illustrates how to create a custom FragDB based on a StructureDB:

from ost import io
from promod3 import loop

# let's load the default structure_db 
structure_db = loop.StructureDB.LoadPortable("data/port_str_db.dat")
# in practice you might want to use the default StructureDB instead:
# structure_db = loop.LoadStructureDB()

# and our beloved crambin...
structure = io.LoadPDB('data/1CRN.pdb')

# we now want to connect the residue with index 17 and 21
sub_res_list = structure.residues[17:22]
n_stem = sub_res_list[0]
c_stem = sub_res_list[-1]
frag_seq = ''.join([r.one_letter_code for r in sub_res_list])

# a custom FragDB can be built to identify fragments
# fulfilling these particular geometric constraints
frag_db = loop.FragDB(1.0, 20)

# at this point we add all possible fragments of length 5 
# with an RMSD threshold of 0.5
frag_db.AddFragments(5, 0.5, structure_db)

# the FragDB can now be used to extract FragmentInfo objects to
# finally query the StructureDB
fragment_infos = frag_db.SearchDB(n_stem, c_stem, 5)

# get the fragments in form of BackboneList objects and store them
for i,f_i in enumerate(fragment_infos):
    bb_list = structure_db.GetBackboneList(n_stem, c_stem,
                                           f_i, frag_seq)
    io.SavePDB(bb_list.ToEntity(), str(i) + ".pdb")
class promod3.loop.FragDB(dist_bin_size, angle_bin_size)
Parameters:
  • dist_bin_size (float) – Size of the distance parameter binning in A

  • angle_bin_size (int) – Size of the angle parameter binning in degree

static Load(filename)
static LoadPortable(filename)

Loads raw binary file generated with Save() (optimized for fast reading) / portable file generated with SavePortable() (slower but less machine-dependent).

Parameters:

filename (str) – Path to the file from which to load the database.

Returns:

The loaded database

Return type:

FragDB

Raises:

RuntimeError if file cannot be opened or if file cannot be parsed (see here for details).

Save(filename)
SavePortable(filename)

Saves a raw / portable binary representation. Use portable files for distribution and convert locally to raw files. See here for details.

Parameters:

filename (str) – path to the file where the database will be saved

Raises:

RuntimeError if file cannot be opened.

GetAngularBinSize()

The size of the bins for the 4 angles describing the stem geometry and used to organize the fragments in the database.

Returns:

The bin size in degrees

Return type:

int

GetDistBinSize()

The size of the bins for the distance describing the stem geometry and used to organize the fragments in the database.

Returns:

The bin size

Return type:

float

AddFragments(fragment_length, rmsd_cutoff, structure_db)

Iterates over all fragments of length fragment_length in structure_db and adds them to the fragment database. Fragments will be skipped if there is already a fragment in the database that has an RMSD smaller than rmsd_cutoff, where RMSD is calculated upon superposing the stem residues. As the fragments are added they are organized in bins described by their length and the geometry of their N and C stem.

Parameters:
  • fragment_length (int) – The length of the fragments that should be added to the databse

  • rmsd_cutoff (float) – The minimal RMSD between two fragments in the fragment database

  • structure_db (StructureDB) – Database delivering the structural info

PrintStatistics()

Prints statistics about the fragment database, notably:

  1. the number of different stem groups (number of bins used to group the fragments according to the geometry of their stem residues)

  2. The total number of fragments in the database

  3. The minimal and maximal number of fragments found in a stem group.

GetNumStemPairs()
GetNumStemPairs(loop_length)

Returns the number of stem groups (number of bins used to group the fragments according to the geometry of their stem residues) for the whole db or for fragments of a given length.

Parameters:

loop_length (int) – The length of the fragments

Returns:

The number of groups

Return type:

int

GetNumFragments()
GetNumFragments(loop_length)

Returns the number of fragments in the database in total or for fragments of a given length.

Parameters:

loop_length (int) – The length of the fragments

Returns:

Number of fragments

Return type:

int

HasFragLength(loop_length)
Parameters:

loop_length (int) – The length of the fragments

Returns:

True if fragments of given length exist.

Return type:

bool

MaxFragLength()
Returns:

Maximal fragment length contained in db.

Return type:

int

SearchDB(n_stem, c_stem, frag_size, extra_bins=0)

Search the database for fragments matching the geometry of the n_stem and c_stem and of the same length as the frag_size.

Parameters:
  • n_stem (ost.mol.ResidueHandle) – The N-stem

  • c_stem (ost.mol.ResidueHandle) – The C-stem

  • frag_size (int) – Number of residues of the fragment

  • extra_bins (int) – Whether to extend the search to include fragments from extra_bins additional bins surrounding the bin given by the n_stem and c_stem geometry. If odd, we extend to the closer bin, otherwise symmetrically.

Returns:

A list of FragmentInfo objects. These objects are related to the structural database with which you called the AddFragments function.

Finding Fragments based on Sequence Features

In some cases you might want to use the StructureDB to search for fragments that possibly represent the structural conformation of interest. The Fragger searches a StructureDB for n fragments, that maximize a certain score and gathers a set of fragments with a guaranteed structural diversity based on an rmsd threshold. You can use the Fragger wrapped in a full fletched pipeline implemented in FraggerHandle or search for fragments from scratch using an arbitrary linear combination of scores:

  • SeqID: Calculates the fraction of amino acids being identical when comparing a potential fragment from the StructureDB and the target sequence

  • SeqSim: Calculates the avg. substitution matrix based sequence similarity of amino acids when comparing a potential fragment from the StructureDB and the target sequence

  • SSAgree: Calculates the avg. agreement of the predicted secondary structure by PSIPRED [Jones1999] and the dssp [kabsch1983] assignment stored in the StructureDB. The Agreement term is based on a probabilistic approach also used in HHSearch [soding2005].

  • TorsionProbability: Calculates the avg. probability of observing the phi/psi dihedral angles of a potential fragment from the StructureDB given the target sequence. The probabilities are extracted from the TorsionSampler class.

  • SequenceProfile: Calculates the avg. profile score between the amino acid frequencies of a potential fragment from the StructureDB and a target profile assuming a gapfree alignment in between them. The scores are calculated as L1 distances between the profile columns.

  • StructureProfile: Calculates the avg. profile score between the amino acid frequencies of a potential fragment from the StructureDB and a target profile assuming a gapfree alignment in between them. The scores are calculated as L1 distances between the profile columns. In this case, the amino acid frequencies extracted from structural alignments are used.

from ost import io, seq
from promod3 import loop

# load an example structure
prot = io.LoadPDB('data/1CRN.pdb')

# extract some additional information
seqres = ''.join([r.one_letter_code for r in prot.residues])
frag_pos = 35
frag_length = 9
frag_seq = seqres[frag_pos:frag_pos+frag_length]
frag_residues = prot.residues[frag_pos:frag_pos+frag_length]
ref_backbone = loop.BackboneList(frag_seq, frag_residues)

# let's load the StructureDB and a substitution matrix
db = loop.LoadStructureDB()
subst_matrix = seq.alg.BLOSUM62

# the fragger object needs to be initialized with its target sequence
fragger = loop.Fragger(frag_seq)

# we could now add an arbitrary number of parameters from different
# features for now we only go for the sequence similarity score
fragger.AddSeqSimParameters(1.0, subst_matrix)

# the Fragger can finally be filled by providing a StructureDB
fragger.Fill(db, 1.0, 100)

# let's see how good the fragments are...
below_three = 0
for i in range(len(fragger)):
    ca_rmsd = fragger[i].CARMSD(ref_backbone,True)
    print("Fragment %d has CA RMSD of %.3f" % (i, ca_rmsd))
    if ca_rmsd < 3.0:
        below_three += 1

fraction = float(below_three)/len(fragger)
print("Fraction of fragments below 3A: %.2f" % fraction)

# add into a cached map with ID based on frag_pos
fragger_map = loop.FraggerMap()
if not fragger_map.Contains(frag_pos):
	fragger_map[frag_pos] = fragger
# store it for future use
fragger_map.SaveBB("frag_map.dat")
class promod3.loop.Fragger(seq)

A Fragger object to search a StructureDB for fragments with seq as target sequence. You need to add some score components before you can finally call the Fill function.

Parameters:

seq (str) – Sequence of fragments to be searched

Fill(db, rmsd_thresh, num_fragments)

Searches db for num_fragments fragments with maximal score based on the defined score components. There will be no pair of fragments with RMSD below rmsd_thresh.

Parameters:
  • db (StructureDB) – Source of structural data

  • rmsd_thresh (float) – To guarantee structural diversity

  • num_fragments (int) – Number of fragments to be extracted

AddSeqIDParameters(w)

Add SeqID score component with linear weight w

Parameters:

w (float) – linear weight

AddSeqSimParameters(w, subst)

Add SeqSim score component with linear weight w

Parameters:
  • w (float) – linear weight

  • subst (ost.seq.SubstWeightMatrix) – Substitution matrix to calculate sequence similarity

AddSSAgreeParameters(w, psipred_prediction)

Add SSAgree score component with linear weight w

Parameters:
  • w (str) – linear weight

  • psipred_prediction (PsipredPrediction) – Psipred prediction for fraggers target_sequence

AddTorsionProbabilityParameters(w, torsion_sampler, aa_before, aa_after)
AddTorsionProbabilityParameters(w, torsion_sampler_list, aa_before, aa_after)

Add TorsionProbability component of linear weight w

Parameters:
  • w (float) – linear weight

  • torsion_sampler (TorsionSampler) – Torsion sampler to be used for all residues.

  • torsion_sampler_list (list of TorsionSampler) – One torsion sampler for each residue.

  • aa_before (str) – Name (3 letter code) of the residue before the sequence linked to this object.

  • aa_after (str) – Name (3 letter code) of the residue after the sequence linked to this object.

AddSequenceProfileParameters(w, prof)

Add SequenceProfile component of linear weight w

Parameters:
AddStructureProfileParameters(w, prof)

Add StructureProfile component of linear weight w

Parameters:
__len__()
Returns:

Number of fragments stored in fragger.

__getitem__(index)
Parameters:

index – Item to extract

Returns:

Fragment at given position

Return type:

BackboneList

GetFragmentInfo(index)
Parameters:

index (int) – Index of fragment

Returns:

FragmentInfo of fragment at position index

GetScore(index)

Returns the final score of fragment defined by index

Parameters:

index (int) – Index of fragment

Returns:

Score of fragment at position index

GetScore(parameter_index, index)

Returns the single feature score defined by parameter_index of fragment defined by index with parameter indexing based on the order you added them to the Fragger

Parameters:
  • parameter_index (int) – Index of score (0-indexed in order of score components that were added)

  • index (int) – Index of fragment

class promod3.loop.FraggerMap

A simple storable map of Fragger objects. The idea is that one can use the map to cache fragger lists that have already been generated.

You can use Contains() to check if an item with a given key (int) already exists and access items with the [] operator (see __getitem__() and __setitem__()).

Serialization is meant to be temporary and is not guaranteed to be portable.

Load(filename, db)

Loads raw binary file generated with Save().

Parameters:
  • filename (str) – Path to the file.

  • db (StructureDB) – Source of structural data used when filling the fragments.

Returns:

The loaded map.

Return type:

FraggerMap

Raises:

RuntimeError if file cannot be opened.

Save(filename)

Saves raw binary representation of this map. Only fragment infos and scores are stored and not the parameters for scoring. The coordinates are to be reread from a structure db.

Parameters:

filename (str) – Path to the file.

Raises:

RuntimeError if file cannot be opened.

LoadBB(filename)

Loads raw binary file generated with SaveBB().

Parameters:

filename (str) – Path to the file.

Returns:

The loaded map.

Return type:

FraggerMap

Raises:

RuntimeError if file cannot be opened.

SaveBB(filename)

Saves raw binary representation of this map. Only fragments and scores are stored and not the parameters for scoring. Here, we also store the coordinates. This file will hence be much larger than the one saved with Save().

Parameters:

filename (str) – Path to the file.

Raises:

RuntimeError if file cannot be opened.

Contains(id)
Returns:

True, iff a fragger object for this id is already in the map.

Return type:

bool

__getitem__(id)
__setitem__(id)

Allow read/write access (with [id]) to fragger object with given ID.

The PsipredPrediction class

class promod3.loop.PsipredPrediction

A container for the secondary structure prediction by PSIPRED [Jones1999].

PsipredPrediction()

Constructs empty container

PsipredPrediction(prediction, confidence)

Constructs container with given content

Parameters:
  • prediction (list) – Secondary structure prediction as element in [‘H’,’E’,’C’]

  • confidence (list) – Confidence of prediction as element in [0,9]

Raises:

RuntimeError if size of prediction and confidence are inconsistent or if they contain an invalid element

FromHHM(filename)

Static function to Load a PsipredPrediction object from hhm file, as they are provided by the hhsearch suite

Parameters:

filename (str) – Name of file

FromHoriz(filename)

Static function to Load a PsipredPrediction object from horiz file, as they are produced by the psipred executable

Parameters:

filename (str) – Name of file

Add(prediction, confidence)

Adds and appends a single residue psipred prediction at the end

Parameters:
  • prediction (str) – Prediction, must be one in [‘H’,’E’,’C’]

  • confidence (int) – Confidence of prediction, must be in [0,9]

Raises:

RuntimeError if input contains invalid elements

Extract(from, to)

Extracts content and returns a sub-PsipredPrediction with range from to to, not including to itself

Parameters:
  • from (int) – Idx to start

  • to (int) – Idx to end

Returns:

PsipredPrediction with the specified range

Raises:

RuntimeError if from or to are invalid

GetPrediction(idx)
Parameters:

idx (int) – Index to get prediction from

Returns:

Psipred prediction at pos idx

Raises:

RuntimeError if idx is invalid

GetConfidence(idx)
Parameters:

idx (int) – Index to get confidence from

Returns:

Psipred confidence at pos idx

Raises:

RuntimeError if idx is invalid

GetPredictions()

Get all the predictions in the container

Returns:

list containing all the predictions in the container

GetConfidences()

Get all the confidences in the container

Returns:

list containing all the confidences in the container

__len__()
Returns:

Number of elements in container

Search

Enter search terms or a module, class or function name.

Contents