Modelling Algorithms
A collection of algorithms that can be useful in modelling
Rigid Blocks
RMSD is a typical measure for the similarity of two structures. Given an
atom-atom mapping between two structures, the minimum RMSD and the
corresponding superposition can be calculated efficiently using an approach
based on singular value decomposition. This approach is problematic if there
are very dissimilar regions or when domain movement events occur. One can
therefore perform an iterative superposition: the two structures undergo an
initial superposition, and in every iteration we select the subset of atoms
that are within a certain distance threshold to serve as input for the next
superposition. This iterative superposition typically converges to the largest
common subpart but is non-deterministic since it depends on the initial
superposition.
The RigidBlocks algorithm is based only on the CA positions and performs this
iterative superposition multiple times, using a sliding window to select the
initial subset and gathering all unique results. These results can be very
similar and differ only in single positions. The algorithm therefore reduces
the number of solutions by merging them based on a similarity threshold.
The similarity is defined by the fraction of positions in solution A that are
also present in solution B. As a final result, the algorithm thus detects
common rigid subsets of positions.
-
promod3.modelling.RigidBlocks(bb_list_one, bb_list_two[, window_length=12, max_iterations=20, distance_thresh=3.0, cluster_thresh=0.9])
Performs the RigidBlocks algorithm on the given input.
Parameters:
- bb_list_one (promod3.loop.BackboneList) – First piece of structural information from which the CA positions will be extracted
- bb_list_two (promod3.loop.BackboneList) – Second piece of structural information from which the CA positions will be extracted
- window_length (int) – Length of the sliding window used to generate the initial subsets
- max_iterations (int) – Maximal number of iterations for every single iterative superposition
- distance_thresh (float) – Maximal distance two CA positions can have to be considered part of the same rigid block and to be selected for the common subset in the next iteration of the iterative superposition
- cluster_thresh (float) – Similarity threshold for the final merging of the solutions
Returns: tuple with the first element being a list of lists defining the indices of the common subsets (rigid blocks) relative to the input promod3.loop.BackboneList objects and the second element being a list of ost.geom.Mat4 defining the transformations to superpose the according positions in bb_list_one onto bb_list_two
-
promod3.modelling.RigidBlocks(aln[, seq_one_idx=0, seq_two_idx=1, window_length=12, max_iterations=20, distance_thresh=3.0, cluster_thresh=0.9])
Performs the RigidBlocks algorithm on the given input.
Parameters:
- aln (ost.seq.AlignmentHandle) – An alignment with attached ost.mol.EntityView objects from which the positions are extracted
- seq_one_idx (int) – The idx of the first sequence from which the CA positions will be extracted
- seq_two_idx (int) – The idx of the second sequence from which the CA positions will be extracted
- window_length (int) – Length of the sliding window used to generate the initial subsets
- max_iterations (int) – Maximal number of iterations for every single iterative superposition
- distance_thresh (float) – Maximal distance two CA positions can have to be considered part of the same rigid block and to be selected for the common subset in the next iteration of the iterative superposition
- cluster_thresh (float) – Similarity threshold for the final merging of the solutions
Returns: tuple with the first element being a list of lists defining the column indices of the common subsets (rigid blocks) relative to the input ost.seq.AlignmentHandle and the second element being a list of ost.geom.Mat4 defining the transformations to superpose the according positions from the first sequence onto the second sequence.
-
promod3.modelling.RigidBlocks(pos_one, pos_two[, window_length=12, max_iterations=20, distance_thresh=3.0, cluster_thresh=0.9])
Performs the RigidBlocks algorithm on the given input.
Parameters:
- pos_one (ost.geom.Vec3List) – First piece of position information
- pos_two (ost.geom.Vec3List) – Second piece of position information
- window_length (int) – Length of the sliding window used to generate the initial subsets
- max_iterations (int) – Maximal number of iterations for every single iterative superposition
- distance_thresh (float) – Maximal distance two CA positions can have to be considered part of the same rigid block and to be selected for the common subset in the next iteration of the iterative superposition
- cluster_thresh (float) – Similarity threshold for the final merging of the solutions
Returns: tuple with the first element being a list of lists defining the indices of the common subsets (rigid blocks) relative to the input ost.geom.Vec3List objects and the second element being a list of ost.geom.Mat4 defining the transformations to superpose the according positions in pos_one onto pos_two
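A minimal usage sketch for the coordinate-based overload; pos_one and pos_two
stand in for your own, equally sized CA position lists:

from ost import geom
from promod3 import modelling

# pos_one / pos_two: ost.geom.Vec3List objects of equal size with CA positions
blocks, transformations = modelling.RigidBlocks(pos_one, pos_two,
                                                window_length=12,
                                                distance_thresh=3.0)
for indices, t in zip(blocks, transformations):
    print("rigid block covering %i positions" % len(indices))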
De Novo Modelling
ProMod3 provides algorithms for sampling and fragment detection.
Here we provide an object that facilitates fragment detection and caching,
as well as a convenience function that combines these functionalities into an
example pipeline.
-
class
promod3.modelling.FraggerHandle(sequence, profile=None, psipred_pred=None, fragment_length=9, fragments_per_position=100, rmsd_thresh=0.0, structure_db=None, torsion_sampler_coil=None, torsion_sampler_helix=None, torsion_sampler_extended=None)
Handler for Fragger objects linked to a specific chain.
Tries to get the most accurate fragments given your input. If you provide only
a SEQRES, the returned fragments are searched using sequence similarity as the
only target value. You can massively increase the accuracy of the found
fragments by providing a secondary structure prediction and/or a sequence
profile.
The following features influence the fragment search, depending on your input:
- sequence:
  - Sequence Similarity with BLOSUM62
- sequence, psipred_pred:
  - Sequence Similarity with BLOSUM62
  - Secondary Structure Agreement
  - Secondary Structure Dependent Torsion Probabilities
- sequence, profile:
  - Sequence Profile Score
  - Structure Profile Score
- sequence, psipred_pred, profile:
  - Secondary Structure Agreement
  - Secondary Structure Dependent Torsion Probabilities
  - Sequence Profile Score
  - Structure Profile Score
The FraggerHandle internally uses the promod3.loop.FraggerMap for caching. You
can therefore request fragments for a certain position several times while the
search is performed only once. This also makes it possible to save the
FraggerHandle to disk. When loading the FraggerHandle again, you need to
provide all parameters again. These parameters must be exactly the same as the
ones you used when initially constructing the FraggerHandle, especially the
structure database; weird things happen otherwise.
-
Get(frag_pos)
Get Fragger for the sequence at indices frag_pos..frag_pos+fragment_length-1.
Parameters: frag_pos (int) – Start index (note that sequence indexing starts at 0)
Returns: A Fragger object.
Raises: ValueError if index is out of bounds.
-
GetList(pos_start=0, pos_end=-1)
Get list of Fragger objects covering sequence indices pos_start..pos_end.
This will return an empty list if the range is smaller than fragment_length.
Parameters:
- pos_start (int) – Start index (note that sequence indexing starts at 0)
- pos_end (int) – End index, or -1 to go to the end of the sequence
Returns: A list of Fragger objects.
Raises: ValueError if indices are out of bounds.
-
LoadCached(filename)
Load Fragger objects stored with SaveCached().
Note that the same structure database must be set as was used when filename
was saved.
-
SaveCached(filename)
Save cached fraggers.
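A minimal usage sketch, assuming the default structure database and torsion
samplers are acceptable; the sequence is made up:

from promod3 import modelling

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # made-up example sequence
handle = modelling.FraggerHandle(seq, fragment_length=9,
                                 fragments_per_position=100)

# fragments for residues 0..8; repeated calls hit the internal cache
fragger = handle.Get(0)

# persist the cached fraggers; loading requires the exact same parameters
handle.SaveCached("fraggers.dat")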
-
promod3.modelling.GenerateDeNovoTrajectories(sequence, num_trajectories=200, avg_sampling_per_position=600, profile=None, psipred_prediction=None, fragment_handler=None, scorer=None, scorer_env=None, scoring_weights=None)
Example de novo modelling pipeline based on fragment sampling and backbone
scoring. Take this as a starting point for more advanced de novo procedures.
Parameters:
- sequence (str) – The sequence you want to sample
- num_trajectories (int) – The number of sampling trajectories you want to generate
- avg_sampling_per_position (int) – Number of Monte Carlo sampling steps per position; the total number is len(sequence) * avg_sampling_per_position
- profile (ost.seq.ProfileHandle) – The sequence profile for sequence. This increases the fragment search performance.
- psipred_prediction (promod3.loop.PsipredPrediction) – The psipred prediction for sequence. This increases the fragment search performance.
- fragment_handler (promod3.modelling.FraggerHandle) – You can provide already initialized fragments. If you pass this parameter, profile and psipred_prediction are ignored and do not influence the fragment search; the ones you initialized fragment_handler with are used instead.
- scorer (promod3.scoring.BackboneOverallScorer) – Scorer doing the backbone scoring. If not provided, a default one is loaded with default objects with the following keys: clash, reduced, cb_packing, hbond, cbeta, torsion and pairwise
- scorer_env (promod3.scoring.BackboneScoreEnv) – The scoring environment that relates to scorer. This environment will be changed!
- scoring_weights (dict) – Linear weights for the different scores. If not provided, the output of ScoringWeights.GetWeights() is used. Please note that the weights must be consistent with the keys of the scores in scorer.
Returns: A promod3.loop.LoopCandidates object containing num_trajectories elements for further processing
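A minimal invocation sketch (sequence and trajectory count are illustrative):

from promod3 import modelling

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # made-up example sequence
# returns a promod3.loop.LoopCandidates object with one entry per trajectory
candidates = modelling.GenerateDeNovoTrajectories(seq, num_trajectories=20)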
Motif Finder
Distinct spatial arrangements of atoms or functional groups are key for protein
function. For their detection, ProMod3 implements the MotifFinder algorithm
which is based on geometric hashing as described by Nussinov and Wolfson
[nussinov1991]. The algorithm consists of a learning stage, a detection stage
and a refinement stage.
Learning Stage: A motif (query) is represented by a set of coordinates.
Triplets (p1, p2, p3) of coordinates are selected that define triangles. For
each triangle one can define an orthogonal vector basis (in our case
v1 = norm(p2-p1), v3 = norm(cross(v1, p3-p1)), v2 = norm(cross(v1, v3))). For
each coordinate not in [p1, p2, p3], we add the identity of the query/triangle
as value to a hash map. The corresponding key consists of discretized values
describing the edge lengths of the triangle, as well as the coordinate
transformed into the triangle-specific orthogonal vector basis. That's 6
numbers in total.
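For illustration, the triangle-specific basis expressed in NumPy (a sketch,
not the actual ProMod3 implementation):

import numpy as np

def triangle_basis(p1, p2, p3):
    # orthonormal basis of a coordinate triplet as described above
    v1 = (p2 - p1) / np.linalg.norm(p2 - p1)
    v3 = np.cross(v1, p3 - p1)
    v3 /= np.linalg.norm(v3)
    v2 = np.cross(v1, v3)  # unit length since v1 and v3 are orthonormal
    return v1, v2, v3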
Detection Stage: The goal is to identify one or several subsets of target
coordinates that resemble an input query.
We first set up an accumulator containing a counter for each triangle observed
in the input query. We then iterate over each possible triangle with vertices
p1, p2 and p3 in the target coordinates. At the beginning of each iteration,
all counters in the accumulator are set to zero. Again, we build a vector basis
given that triangle and transform all coordinates not in [p1, p2, p3] into that
vector space. For each transformed coordinate we obtain a key for the query
hash map. If there are one or several values at that location in the hash map,
we increment the corresponding locations in the accumulator.
Once all coordinates are processed, we search for high counts in the
accumulator. Given N query coordinates, we keep a solution for further
refinement if count/(N-3) >= hash_thresh. This is repeated until all
triangles in the target are processed. One key problem with this approach is
the discretization of the floating point numbers that give rise to the hash
map keys. Two extremely close values might end up in different bins just
because they are close to the bin boundaries. For each of the 6 relevant
numbers we therefore estimate the actual bin as well as the closest
neighbouring bin. Processing all possible combinations results in 64 hash map
lookups instead of only one.
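The combinatorics of that lookup can be illustrated in a few lines (the bin
values are made up):

import itertools

# for each of the 6 numbers: (actual bin, closest neighbouring bin)
bins = [(3, 4), (7, 6), (2, 3), (10, 11), (5, 4), (8, 9)]
candidate_keys = list(itertools.product(*bins))
assert len(candidate_keys) == 2**6  # 64 hash map lookups instead of one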
Refinement Stage: Every potential solution identified in the detection stage is
further refined based on the distance_thresh and refine_thresh parameters.
A potential solution found in the detection stage is a pair of triangles, one
in the query and one in the target, for which we find many matching coordinates
in their respective vector space. We start with a coordinate mapping based on
the triangle vertices from the query and the target (3 pairs).
This coordinate mapping is iteratively updated by estimating the minimum RMSD
superposition of the mapped query coordinates onto the target, applying that
superposition to the query, finding the closest target coordinate for each
coordinate in the query and redoing the mapping by including all pairs with
minimum distance < distance_thresh. Iteration stops when nothing changes
anymore. The solution is returned to the user if the final fraction of mapped
query coordinates is larger than or equal to refine_thresh.
The larger the mapping, the more accurate the superposition. As we start with
only the three triangle vertices, distance_thresh is doubled for the initial
iteration.
# Example script that loads protein structures that contain ATP and
# generates motif queries describing their binding pockets.
# In a second step, those pockets are matched against a protein
# structure that only contains an ATP analog. The ATP from every
# match is transformed and stored to disk for further processing.

from ost import io, geom, mol
from promod3 import modelling

files = ['data/1E2Q.pdb', 'data/1KO5.pdb', 'data/2IYW.pdb']

atp_list = list()
query_list = list()

for f in files:
    prot = io.LoadPDB(f)
    peptide_sel = prot.Select("peptide=true")
    atp_sel = prot.Select("rname=ATP")

    # generate a single query for each ATP pocket
    for atp_idx, atp_r in enumerate(atp_sel.residues):
        pocket_view = peptide_sel.CreateEmptyView()
        for atp_at in atp_r.atoms:
            close_at = peptide_sel.FindWithin(atp_at.GetPos(), 4.5)
            for at in close_at:
                r = at.handle.GetResidue()
                add_flag = mol.INCLUDE_ATOMS | mol.CHECK_DUPLICATES
                pocket_view.AddResidue(r, add_flag)
        ca_positions = geom.Vec3List()
        for res in pocket_view.residues:
            ca_positions.append(res.FindAtom("CA").GetPos())
        i = "%s_%i" % (f, atp_idx)
        query = modelling.MotifQuery(ca_positions, i, 4.0, 9.0, 1.0)
        query_list.append(query)

        # create an entity from atp for later use
        atp_view = prot.CreateEmptyView()
        atp_view.AddResidue(atp_r, mol.INCLUDE_ATOMS)
        atp_list.append(mol.CreateEntityFromView(atp_view, True))

# That's it, let's combine the single queries
full_query = modelling.MotifQuery(query_list)

prot = io.LoadPDB("data/1AKE.pdb")
peptide_sel = prot.Select("peptide=true")
ca_positions = geom.Vec3List()
for r in peptide_sel.residues:
    ca_positions.append(r.FindAtom("CA").GetPos())

# search all matches, fetch the corresponding atps,
# transform them onto the 1ake binding site and dump them to disk
matches = modelling.FindMotifs(full_query, ca_positions)
for m_idx, m in enumerate(matches):
    atp = atp_list[m.query_idx].Copy()
    atp.EditXCS().ApplyTransform(m.mat)
    io.SavePDB(atp, "m_%i.pdb" % (m_idx))
-
class
promod3.modelling.MotifQuery(positions, identifier, min_triangle_edge_length, max_triangle_edge_length, bin_size)
-
class
promod3.modelling.MotifQuery(positions, identifier, min_triangle_edge_length, max_triangle_edge_length, bin_size, flags)
-
class
promod3.modelling.MotifQuery(query_list)
A single query or a container of queries.
The constructor performs the learning stage of a single query or combines
several queries, so they can be searched at once.
Parameters:
- positions (ost.geom.Vec3List) – Coordinates of the query
- identifier (str) – Descriptor of the query
- min_triangle_edge_length (float) – To avoid the full O(n^3) hell, triangles with any edge length below min_triangle_edge_length are skipped
- max_triangle_edge_length (float) – Same as min_triangle_edge_length but as upper bound
- bin_size (float) – Bin size in Å, relevant to generate the hash map keys
- flags (list of int) – Flag in range [0,63] for every coordinate in positions. They're also added to the hash map keys (default: 0). This means that in addition to having a matching relative position, the coordinates must also have a matching flag in the detection/refinement stage. If not provided (in the query and in the search), only the coordinates matter.
- query_list (list of MotifQuery) – E pluribus unum
-
Save(filename)
Saves the query to disk.
Parameters: filename (str) – filename
-
static
Load(filename)
Loads a query from disk.
Parameters: filename (str) – filename
-
GetPositions(query_idx)
Returns the coordinates of the specified query.
Parameters: query_idx (int) – Query from which you want the positions
-
GetIdentifiers()
Returns a list of all query identifiers.
-
GetN()
Returns the number of queries.
-
class
promod3.modelling.MotifMatch
Object that holds information about a match found in FindMotifs()
-
query_idx
Index of matching query
-
mat
Transformation matrix to superpose matching query onto target
-
alignment
List of tuples which define matching pairs of query/target coordinates
-
promod3.modelling.FindMotifs(query, target_positions, hash_thresh=0.4, distance_thresh=1.0, refine_thresh=0.7, flags=list(), swap_thresh=False)
Performs the detection and refinement stages of the geometric hashing
algorithm.
Parameters:
- query (MotifQuery) – Query to be searched
- target_positions (ost.geom.Vec3List) – Coordinates of the target
- hash_thresh (float) – Parameter relevant for the detection stage
- distance_thresh (float) – Parameter relevant for the refinement stage
- refine_thresh (float) – Parameter relevant for the refinement stage
- flags (list of int) – Equivalent to flags in the MotifQuery constructor. If you didn't provide anything there, this can be ignored; only the actual coordinates matter in this case.
- swap_thresh (bool) – hash_thresh and refine_thresh refer to the fraction of covered positions in query. When setting this to True, they refer to the fraction of covered positions in target_positions.
Returns: All found matches
Return type: list of MotifMatch
AFDB Modelling
Template based modelling using AFDB as the template library comes with two
challenges for which ProMod3 provides solutions:
- Fast sequence searches in the full AFDB: PentaMatch
- Fast access to any stored template structure: FSStructureServer
The creation of these two objects requires extensive preprocessing. The
required scripts and documentation are available in
<GIT_ROOT>/extras/data_generation/afdb_modelling.
Basic modelling functionality is available in the following two functions:
-
promod3.modelling.AFDBTPLSearch(fs_server, pentamatch, trg_seq, pentamatch_n=100, seqid_thresh=70, tpl_n=5)
Searches up to tpl_n templates in fs_server / pentamatch.
Step 1: Identifies the pentamatch_n sequences in pentamatch with the largest
number of matching pentamers with respect to trg_seq.
Step 2: Generates pairwise alignments with ost.seq.alg.LocalAlign() and only
retains the ones with seqid >= seqid_thresh.
Step 3: Extracts the respective templates from fs_server and scores them by
the sum of the plDDT of the aligned residues divided by the trg_seq length.
Step 4: Returns the top tpl_n (or fewer).
Parameters:
- fs_server (FSStructureServer) – Structure database - the AFDB
- pentamatch (PentaMatch) – PentaMatch object specific for fs_server
- trg_seq (ost.seq.SequenceHandle / str) – Target sequence
- pentamatch_n (int) – Number of sequences that are initially searched in pentamatch
- seqid_thresh (int) – Sequence identity threshold [0-100] for an alignment to be considered further
- tpl_n (int) – Number of templates that are finally returned based on the described scoring
Returns: list of pairs with the first element being the tpl score and the second element being an ost.seq.AlignmentHandle with the first sequence being trg_seq and the second sequence the hit found in fs_server with the structure attached. If fs_server has been generated with the default procedure described in the docs, additional info is available in the name of the attached structure. That's accessible with aln.GetSequence(1).GetAttachedView().GetName() and is structured as "<UniprotAC> <Fragment> <AFDB version> <Idx>", where idx refers to the raw idx of the template in fs_server.
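A minimal sketch of a template search; directory names and the sequence are
placeholders:

from promod3 import modelling

fs_server = modelling.FSStructureServer("afdb_db_dir")    # placeholder dir
pentamatch = modelling.PentaMatch("pentamatch_db_dir")    # placeholder dir
trg_seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"             # made-up sequence

hits = modelling.AFDBTPLSearch(fs_server, pentamatch, trg_seq, tpl_n=5)
for score, aln in hits:
    print(score, aln.GetSequence(1).GetAttachedView().GetName())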
-
promod3.modelling.AFDBModel(fs_server, pentamatch, trg_seq, transfer_bfactors=False)
Builds a model with AFDB as the template library.
Parameters:
- fs_server (FSStructureServer) – Structure database - the AFDB
- pentamatch (PentaMatch) – PentaMatch object specific for fs_server
- trg_seq (ost.seq.SequenceHandle / str) – Target sequence
- transfer_bfactors (bool) – Simple heuristic to transfer bfactors (plDDT) to the model. In case of an aligned residue, the bfactor (i.e. plDDT) is simply transferred. Gaps are treated with a heuristic that operates on the full stretch of remodelled amino acids and not solely on the gap indicated by the alignment. We first derive a minimum plDDT which is 0.5*min(n_stem_plddt, c_stem_plddt). The plDDT values of the processed amino acids then decrease linearly from the stem towards that minimum with a slope of 0.25 (i.e. they reach the minimum value when they're 4 residues away).
Returns: tuple with 4 elements. 1: The model 2: The model score based on plDDT 3: Pairwise alignment (first seq: trg_seq, second seq: tpl seq) 4: Template name (formatted as "<uniprot AC> <AFDB_fragment> <AFDB_version> <chain name>"). If no appropriate template can be found, all 4 elements are None.
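Continuing the sketch above, the corresponding one-shot modelling call:

from ost import io

model, score, aln, tpl_name = modelling.AFDBModel(fs_server, pentamatch,
                                                  trg_seq,
                                                  transfer_bfactors=True)
if model is not None:
    io.SavePDB(model, "model.pdb")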
-
class
promod3.modelling.FSStructureServer(db_dir)
FSStructureServer - a filesystem based AFDB structure server.
Stores OMF data entries in huge binary files and extracts the respective
binary data blobs with an indexing mechanism. Efficient reading of these
huge files is delegated to the OS using the Python mmap module.
Data creation happens with the static FromDataChunks() function.
The required preprocessing must be done with external scripts.
Parameters: db_dir (str) – Directory containing the required files: 1) pos.dat (see pos) 2) length.dat (see length) 3) chunk.dat (see chunk) 4) indexer.dat (see indexer) 5) one or several files with pattern fs_data_[x].dat (see data)
-
static
FromDataChunks(chunk_dir, db_dir, chunk_bytes=None)
Static method to create a new database from preprocessed data.
Data preprocessing consists of creating several data chunks that are
pickled to disk.
In detail: each chunk is a pickled file containing a list, where each
entry is a tuple with 4 elements: 1) uniprot_ac (str) 2) fragment (str)
3) version (str) 4) structure data (an ost.io.OMF object which has been
written to a bytes object and compressed with gzip).
The data itself is again stored in chunks of binary data which are
indexed.
Parameters:
- chunk_dir (str) – Path to directory containing the described data chunks
- db_dir (str) – Output directory - creates all files that are needed for FSStructureServer
- chunk_bytes (int) – Size in bytes of the binary data chunks in the final database - default None: everything ends up in one single chunk
Returns: FSStructureServer with all data from chunk_dir
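A hedged sketch of how one such chunk file might be produced; my_structures is
a hypothetical iterable of your own data, and the OMF calls
(FromEntity()/ToBytes()) are assumed to be available as in recent OST releases:

import gzip
import pickle
from ost import io

# my_structures: hypothetical iterable of
# (uniprot_ac, fragment, version, ost.mol.EntityHandle) tuples
entries = list()
for uniprot_ac, fragment, version, ent in my_structures:
    omf = io.OMF.FromEntity(ent)         # assumed OMF API
    blob = gzip.compress(omf.ToBytes())  # assumed OMF API
    entries.append((uniprot_ac, fragment, version, blob))

with open("chunk_dir/chunk_0.pkl", "wb") as fh:
    pickle.dump(entries, fh)

# afterwards:
# fs_server = modelling.FSStructureServer.FromDataChunks("chunk_dir", "db_dir")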
-
GetIdx(uniprot_ac, fragment='F1', version='v4')
Get the internal idx of stored data.
Can be used for data retrieval with GetOMFByIdx().
Parameters:
- uniprot_ac (str) – Uniprot AC
- fragment (str) – AFDB entries are potentially split into fragments; 99.999% of all entries only have one fragment: F1
- version (str) – Version of the entry
Returns: Internal idx of that entry
Raises: RuntimeError if no such entry exists
-
GetOMF(uniprot_ac, fragment='F1', version='v4')
Get the stored OMF data structure.
Parameters:
- uniprot_ac (str) – Uniprot AC
- fragment (str) – AFDB entries are potentially split into fragments; 99.999% of all entries only have one fragment: F1
- version (str) – Version of the entry
Returns: OMF data structure of type ost.io.OMF
Raises: RuntimeError if no such entry exists
-
GetOMFByIdx(idx)
Get the stored OMF data structure.
Parameters: idx (int) – Internal index which can be derived from GetIdx()
Returns: OMF data structure of type ost.io.OMF
Raises: RuntimeError if idx is out of range
-
chunk
Chunk in which each entry is stored
Type: np.ndarray with dtype np.int16
-
data
Internal binary data in memory mapped files
-
indexer
Internal data structure - Relates entries with data indices
Type: np.ndarray with dtype np.int32
-
length
Lengths of data in internal linear memory layout
Type: np.ndarray with dtype np.int32
-
n_entries
Number of entries
-
pos
Start positions of data in internal linear memory layout
Type: np.ndarray with dtype np.int64
-
search_keys
Internal data structure - Relates entries with data indices
-
class
promod3.modelling.PentaMatch(db_dir)
Pentamer matching for fast sequence searches.
PentaMatch is designed for fast sequence searches with low sensitivity,
specifically for searching the full AFDB. It stores all unique pentamers for
each search sequence. Given a query sequence, it computes the number of
matching pentamers with respect to each search sequence and returns the top
hits.
Parameters: db_dir (str) – Directory containing all required files (indexer.dat, pos.dat, length.dat, meta.dat). New PentaMatch objects can be derived using the static FromSeqList() creator.
Raises: RuntimeError if any required file is missing in db_dir
-
static
FromSeqList(fasta_file, db_dir, entries_from_seqnames=False)
Creates a PentaMatch object from a FASTA file.
Parameters:
- fasta_file (str) – Path to FASTA file with sequences
- db_dir (str) – The files required for PentaMatch will be dumped here; the directory will be created if it does not exist.
- entries_from_seqnames (bool) – If set to False, indices returned by TopN() refer to the position in fasta_file. If set to True, integer indices are parsed from the sequence names.
Returns: PentaMatch
Raises: ost.Error if entries_from_seqnames is True but a sequence name cannot be cast to int.
-
N
Number of entries in underlying FSStructureServer
-
TopN(N, sequence, return_counts=False, unique_pentamers=True)
Find the top-N matches for the given sequence.
Identifies unique pentamers in sequence and counts the number of
occurrences in each entry. Returns the top-N entries with respect
to the counts.
Parameters:
- N (int) – Number of results to return
- sequence (str) – Sequence to search
- return_counts (bool) – Additionally return the underlying pentamer counts
- unique_pentamers (bool) – Set to True if the indexer contains only unique pentamers for each entry. This way we can use faster methods to accumulate the counts. In detail: accumulator[indices] += 1 is much faster than np.add.at(accumulator, indices, 1), but the latter is required if there are duplicates.
Returns: list of int with length N specifying the entry indices. If return_counts is True, the list contains tuples with two elements: 1) count 2) index.
Raises: RuntimeError if N is invalid or sequence is shorter than 5 characters
-
indexer
Entry indices for the pentamers. The entry data for one pentamer can be
extracted with the respective values in pos and length.
Type: memory mapped np.ndarray of dtype np.int32
-
length
Length for each pentamer in indexer
Type: np.ndarray of dtype np.int32 with 3.2E6 entries
-
pos
Start position for each pentamer in indexer
Type: np.ndarray of dtype np.int64 with 3.2E6 entries
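A minimal sketch of a raw pentamer search (placeholder directory, made-up
sequence):

from promod3 import modelling

pm = modelling.PentaMatch("pentamatch_db_dir")  # placeholder dir
hits = pm.TopN(10, "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", return_counts=True)
for count, idx in hits:
    print("entry %i shares %i pentamers with the query" % (idx, count))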