Linear Database
Many applications need to load large numbers of structures. Especially on distributed file systems, I/O becomes a bottleneck. OST provides a linear database to dump position data (e.g. CA positions) or character data (e.g. sequences) for fast retrieval. The actual data containers behave like linear data arrays, and an indexer keeps track of where to find the data for a certain entry.
- class LinearIndexer
The LinearIndexer keeps track of data locations, assuming a linear memory layout. Entries in the indexer are assemblies, each of which can contain an arbitrary number of chains of varying length. Whenever a new assembly is added, a range enclosing all residues of that assembly is defined directly after the range of the previously added assembly. It is then possible to access not only the range of the full assembly but also the range of single chains. Whenever an assembly with n residues is deleted, the ranges of all assemblies that were added later are shifted down by n.
- Load(filename)
Loads indexer from file
- Parameters:
filename (str) – Path to file to be loaded
- Returns:
The loaded indexer
- Return type:
LinearIndexer
- Raises:
ost.Error if filename cannot be opened
- Save(filename)
Saves indexer to file
- Parameters:
filename (str) – Path to file where the indexer is stored
- Raises:
ost.Error if filename cannot be created
- AddAssembly(name, chain_names, chain_lengths)
Adds a new assembly to the indexer. The range assigned to that assembly directly follows the range of the previously added assembly.
- Parameters:
name (str) – Name of the added assembly
chain_names (list of str) – Names of all chains of the added assembly
chain_lengths (list of int) – The corresponding lengths of the chains
- Raises:
ost.Error if the lengths of chain_names and chain_lengths are inconsistent
- RemoveAssembly(name)
Removes an assembly from the indexer. Assuming that the assembly contains a total of n residues, all ranges of the subsequent assemblies are shifted down by n.
- Parameters:
name (str) – Name of the assembly to be removed
- Raises:
ost.Error if name is not present
- GetAssemblies()
- Returns:
The names of all added assemblies
- Return type:
list of str
- Raises:
ost.Error if name is not present
- GetChainNames(name)
- Parameters:
name (str) – Name of assembly from which you want the chain names
- Returns:
The chain names of the specified assembly
- Return type:
list of str
- Raises:
ost.Error if name is not present
- GetChainLengths(name)
- Parameters:
name (str) – Name of assembly from which you want the chain lengths
- Returns:
The chain lengths of the specified assembly
- Return type:
list of int
- Raises:
ost.Error if name is not present
- GetDataRange(name)
Get the range for a full assembly
- Parameters:
name (str) – Name of the assembly from which you want the range
- Returns:
Two values defining the range as [from, to[
- Return type:
tuple of int
- Raises:
ost.Error if name is not present
- GetDataRange(name, chain_name)
Get the range for a chain of an assembly
- Parameters:
name (str) – Name of the assembly from which you want the range
chain_name (str) – Name of the chain from which you want the range
- Returns:
Two values defining the range as [from, to[
- Return type:
tuple of int
- Raises:
ost.Error if name is not present or the corresponding assembly has no chain with the specified chain name
- GetNumResidues()
- Returns:
The total number of residues in all added assemblies
- Return type:
int
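The range bookkeeping described for the indexer can be illustrated with a small pure-Python model. This is only a sketch of the documented behaviour, not the OST implementation: the class `ToyIndexer`, its method names and the assembly names are hypothetical, and assembly names are assumed to be unique.

```python
class ToyIndexer:
    """Hypothetical pure-Python model of the range bookkeeping (not the OST class)."""

    def __init__(self):
        # Insertion-ordered mapping: assembly name -> (chain names, chain lengths)
        self._assemblies = {}

    def add_assembly(self, name, chain_names, chain_lengths):
        if len(chain_names) != len(chain_lengths):
            raise ValueError("chain_names and chain_lengths differ in length")
        self._assemblies[name] = (list(chain_names), list(chain_lengths))

    def remove_assembly(self, name):
        del self._assemblies[name]  # raises KeyError if name is not present

    def get_data_range(self, name, chain_name=None):
        if name not in self._assemblies:
            raise KeyError(name)
        # The assembly starts where all previously added assemblies end.
        start = 0
        for other, (_, lengths) in self._assemblies.items():
            if other == name:
                break
            start += sum(lengths)
        chain_names, chain_lengths = self._assemblies[name]
        if chain_name is None:
            return (start, start + sum(chain_lengths))
        # Walk through the chains to find the requested sub-range.
        for cname, clen in zip(chain_names, chain_lengths):
            if cname == chain_name:
                return (start, start + clen)
            start += clen
        raise KeyError(chain_name)

    def get_num_residues(self):
        return sum(sum(lengths) for _, lengths in self._assemblies.values())


indexer = ToyIndexer()
indexer.add_assembly("1abc", ["A", "B"], [3, 2])
indexer.add_assembly("2xyz", ["A"], [4])

range_before = indexer.get_data_range("2xyz")   # follows the 5 residues of "1abc"
chain_b = indexer.get_data_range("1abc", "B")   # sub-range of a single chain
indexer.remove_assembly("1abc")
range_after = indexer.get_data_range("2xyz")    # shifted down by 5
```

In this model the range of `"2xyz"` moves from `(5, 9)` to `(0, 4)` once the five residues of `"1abc"` are removed, which is the shift the class description refers to.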
- class LinearCharacterContainer
The LinearCharacterContainer stores characters in a linear memory layout that can represent sequences such as SEQRES or ATOMSEQ. It is accessed using range parameters and is meant to be kept in sync with a LinearIndexer.
- Load(filename)
Loads container from file
- Parameters:
filename (str) – Path to file to be loaded
- Returns:
The loaded container
- Return type:
LinearCharacterContainer
- Raises:
ost.Error if filename cannot be opened
- Save(filename)
Saves container to file
- Parameters:
filename (str) – Path to file where the container is stored
- Raises:
ost.Error if filename cannot be created
- AddCharacters(characters)
Adds characters at the end of the internal data. Call this function with appropriate data whenever you add an assembly to the associated LinearIndexer.
- Parameters:
characters (str) – Characters to be added
- ClearRange(range)
Removes all characters specified by range in form [from, to[ from the internal data. Because the internal data layout is linear, all characters starting from to are shifted to the location defined by from. Call this function with the appropriate range whenever you remove an assembly from the associated LinearIndexer.
- Parameters:
range (tuple of int) – Range to be deleted in form [from, to[
- Raises:
ost.Error if range does not specify a valid range
- GetCharacter(idx)
- Returns:
The character at the specified location
- Return type:
str
- Raises:
ost.Error if idx does not specify a valid position
- GetCharacters(range)
- Returns:
The characters from the specified range
- Return type:
str
- Raises:
ost.Error if range does not specify a valid range
- GetNumElements()
- Returns:
The number of stored characters
- Return type:
int
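The effect of clearing a range on a linear character layout can be sketched in pure Python. This is a hypothetical model (`ToyCharacterContainer` and its method names are not part of OST), and the ranges are written out by hand where a LinearIndexer would normally supply them.

```python
class ToyCharacterContainer:
    """Hypothetical pure-Python model of a linear character store (not the OST class)."""

    def __init__(self):
        self._data = ""

    def _check(self, data_range):
        start, end = data_range
        if not 0 <= start <= end <= len(self._data):
            raise ValueError("range does not specify a valid range")
        return start, end

    def add_characters(self, characters):
        # New data always goes to the end of the linear layout.
        self._data += characters

    def clear_range(self, data_range):
        start, end = self._check(data_range)
        # Everything from `end` onwards moves to the location `start`.
        self._data = self._data[:start] + self._data[end:]

    def get_characters(self, data_range):
        start, end = self._check(data_range)
        return self._data[start:end]

    def get_num_elements(self):
        return len(self._data)


container = ToyCharacterContainer()
container.add_characters("MKVLA")  # first assembly, occupies [0, 5[
container.add_characters("GSHM")   # second assembly, occupies [5, 9[

second_before = container.get_characters((5, 9))
container.clear_range((0, 5))      # remove the first assembly
second_after = container.get_characters((0, 4))
```

After the first assembly is cleared, the second one is found at `[0, 4[`, so the indexer and the container stay consistent only if the assembly is removed from both.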
- class LinearPositionContainer
The LinearPositionContainer stores positions in a linear memory layout. It is accessed using range parameters and is meant to be kept in sync with a LinearIndexer. To save memory, a lossy compression is applied that limits the accuracy to two decimal digits. If the absolute value of an added position is very large (> ~10000), the accuracy is further lowered to one digit. This is all handled internally.
- Load(filename)
Loads container from file
- Parameters:
filename (str) – Path to file to be loaded
- Returns:
The loaded container
- Return type:
LinearPositionContainer
- Raises:
ost.Error if filename cannot be opened
- Save(filename)
Saves container to file
- Parameters:
filename (str) – Path to file where the container is stored
- Raises:
ost.Error if filename cannot be created
- AddPositions(positions)
Adds positions at the end of the internal data. Call this function with appropriate data whenever you add an assembly to the associated LinearIndexer.
- Parameters:
positions (ost.geom.Vec3List) – Positions to be added
- ClearRange(range)
Removes all positions specified by range in form [from, to[ from the internal data. Because the internal data layout is linear, all positions starting from to are shifted to the location defined by from. Call this function with the appropriate range whenever you remove an assembly from the associated LinearIndexer.
- Parameters:
range (tuple of int) – Range to be deleted in form [from, to[
- Raises:
ost.Error if range does not specify a valid range
- GetPosition(idx, pos)
Extracts a position at the specified location. For efficiency reasons, the function requires the position to be passed as reference.
- Parameters:
idx (int) – Specifies location
pos (ost.geom.Vec3) – Will be altered to the desired position
- Raises:
ost.Error if idx does not specify a valid position
- GetPositions(range, positions)
Extracts positions in the specified range. For efficiency reasons, the function requires the positions to be passed as reference.
- Parameters:
range (tuple of int) – Range in form [from, to[ that defines positions to be extracted
positions (ost.geom.Vec3List) – Will be altered to the desired positions
- Raises:
ost.Error if range does not specify a valid range
- GetNumElements()
- Returns:
The number of stored positions
- Return type:
int
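The accuracy loss mentioned for the position container can be pictured with a fixed-point rounding sketch. The actual encoding used by OST is not described here, so everything below is an assumption made for illustration: the function `toy_roundtrip`, the per-coordinate treatment, and the exact threshold of 10000 are hypothetical.

```python
def toy_roundtrip(coordinate, threshold=10000.0):
    """Return the value a coordinate would have after a lossy store and reload.

    Assumed scheme (not taken from OST): the coordinate is stored as an integer
    with two decimal digits, or with one decimal digit if its absolute value
    exceeds `threshold`.
    """
    scale = 100 if abs(coordinate) <= threshold else 10
    stored = round(coordinate * scale)  # the integer that would be stored
    return stored / scale


original = (12.3456, -7.891, 12345.678)
restored = tuple(toy_roundtrip(c) for c in original)

# Rounding error per coordinate: at most 0.005 with two digits, 0.05 with one.
errors = [abs(a - b) for a, b in zip(original, restored)]
```

Under these assumptions the first two coordinates come back with two decimal digits and the third with one, so values read from the container should be compared with a tolerance, not for exact equality.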
Data Extraction
OpenStructure provides data extraction functionality for the following scenario:
There are three binary containers: a position container holding CA positions
(LinearPositionContainer), a SEQRES container and
an ATOMSEQ container (both LinearCharacterContainer).
They contain entries from the protein structure database,
and sequence/position data is relative to the SEQRES of those entries.
This means that if the SEQRES has more characters than there are resolved residues
in the structure, the entry in the position container still contains exactly
as many elements as the SEQRES has characters, but some positions remain invalid. That's where the
ATOMSEQ container comes in. It contains only residues matching the SEQRES but
marks non-resolved residues with ‘-’.
- ExtractValidPositions(entry_name, chain_name, indexer, atomseq_container, position_container, seq, positions)
Iterates over all data for a chain specified by entry_name and chain_name. For every data point marked as valid in the atomseq_container (character at that position is not ‘-’), the character and the corresponding position are added to seq and positions.
- Parameters:
entry_name (str) – Name of assembly you want the data from
chain_name (str) – Name of chain you want the data from
indexer (LinearIndexer) – Used to access atomseq_container and position_container
atomseq_container (LinearCharacterContainer) – Container that marks locations with invalid position data with ‘-’
position_container (LinearPositionContainer) – Container containing position data
seq (ost.seq.SequenceHandle) – The sequence of residues with valid positions gets stored in here
positions (ost.geom.Vec3List) – The extracted valid positions get stored in here
- Raises:
ost.Error if requested data is not present
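The filtering step can be sketched in pure Python on plain strings and tuples. This is a hypothetical model of the documented behaviour, not the OST function: `toy_extract_valid` and the example data are made up, and the chain's ATOMSEQ and positions are assumed to have already been cut out of the containers using the range from the indexer.

```python
def toy_extract_valid(atomseq, positions):
    """Keep only residues whose ATOMSEQ character is not '-'.

    `atomseq` is the ATOMSEQ string of one chain, `positions` a list with one
    (x, y, z) tuple per SEQRES residue of that chain.
    """
    if len(atomseq) != len(positions):
        raise ValueError("atomseq and positions differ in length")
    valid_chars = []
    valid_positions = []
    for char, pos in zip(atomseq, positions):
        if char != "-":  # '-' marks a residue without a resolved position
            valid_chars.append(char)
            valid_positions.append(pos)
    return "".join(valid_chars), valid_positions


# Chain with five SEQRES residues; the third one is not resolved.
atomseq = "MK-LA"
positions = [(float(i), 0.0, 0.0) for i in range(5)]

seq, valid_positions = toy_extract_valid(atomseq, positions)
```

Unlike this sketch, the OST function fills the `seq` and `positions` arguments passed by the caller and does not return new objects.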
- ExtractTemplateData(entry_name, chain_name, aln, indexer, seqres_container, atomseq_container, position_container)
Assume a target-template alignment in aln (first sequence: target, second sequence: template). This function extracts all valid template positions for the entry specified by entry_name and chain_name. The template sequence in aln must match the sequence in seqres_container. Again, the atomseq_container is used to identify valid positions. The corresponding residue numbers relative to the target sequence in aln are also returned.
- Parameters:
entry_name (str) – Name of assembly you want the data from
chain_name (str) – Name of chain you want the data from
aln – Target-template sequence alignment
indexer (LinearIndexer) – Used to access atomseq_container, seqres_container and position_container
seqres_container (LinearCharacterContainer) – Container containing the full sequence data
atomseq_container (LinearCharacterContainer) – Container that marks locations with invalid position data with ‘-’
position_container (LinearPositionContainer) – Container containing position data
- Returns:
First element: list of residue numbers that relate each entry in the second element to the target sequence specified in aln. The numbering scheme starts from one. Second element: geom.Vec3List with the corresponding positions.
- Return type:
tuple
- Raises:
ost.Error if requested data is not present in the container or if the template sequence in aln doesn’t match the sequence in seqres_container
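How residue numbers and template positions relate can be sketched with plain strings. This is a hypothetical model, not the OST function: `toy_extract_template_data` and the example sequences are invented, and it assumes that alignment columns with a gap in the target are skipped, which is not stated explicitly above.

```python
def toy_extract_template_data(target_aln, template_aln, seqres, atomseq, positions):
    """Map valid template positions onto 1-based target residue numbers.

    `target_aln` and `template_aln` are the two aligned sequences (gaps as '-'),
    `seqres`/`atomseq`/`positions` hold the template chain data, with one
    position per SEQRES residue.
    """
    if template_aln.replace("-", "") != seqres:
        raise ValueError("template sequence in alignment does not match SEQRES")
    residue_numbers = []
    template_positions = []
    target_num = 0    # 1-based residue number in the target sequence
    template_idx = 0  # 0-based index into seqres / atomseq / positions
    for target_char, template_char in zip(target_aln, template_aln):
        if target_char != "-":
            target_num += 1
        if template_char != "-":
            # Keep the column only if the target has a residue here and the
            # template residue has a resolved position.
            if target_char != "-" and atomseq[template_idx] != "-":
                residue_numbers.append(target_num)
                template_positions.append(positions[template_idx])
            template_idx += 1
    return residue_numbers, template_positions


target_aln = "MK-VLA"
template_aln = "MRG-LA"
seqres = "MRGLA"
atomseq = "M-GLA"  # second template residue is not resolved
positions = [(float(i), 0.0, 0.0) for i in range(5)]

residue_numbers, template_positions = toy_extract_template_data(
    target_aln, template_aln, seqres, atomseq, positions)
```

In this example only target residues 1, 4 and 5 receive a template position: residue 2 aligns to an unresolved template residue, and residue 3 aligns to a gap in the template.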