You are reading the documentation for version 1.10 of OpenStructure.

Linear Database

Many applications need to load large numbers of structures. Especially on distributed file systems, I/O becomes a bottleneck. OST provides a linear database to dump position data (e.g. CA positions) or character data (e.g. sequences) for fast retrieval. The actual data containers behave like linear data arrays, and an indexer keeps track of where to find the data for a given entry.

class LinearIndexer

The LinearIndexer keeps track of the locations of data, assuming a linear memory layout. Entries in the indexer are assemblies, each of which can contain an arbitrary number of chains of varying length. Whenever a new assembly is added, a range enclosing all residues of that assembly is defined directly after the range of the previously added assembly. This makes it possible to access not only the range of the full assembly but also the range of a single chain. Whenever an assembly with n residues is deleted, the ranges of all assemblies added after it are shifted down by n.
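The range bookkeeping described above can be illustrated with a small pure-Python model. This is only a sketch: ToyIndexer and its method names are hypothetical and are not part of OST, and the real LinearIndexer may store its data differently. The model recomputes ranges from the stored chain lengths, so removing an assembly shifts all later ranges automatically.

```python
class ToyIndexer:
    """Hypothetical pure-Python model of the range bookkeeping (not the OST class)."""

    def __init__(self):
        # Insertion-ordered mapping: assembly name -> (chain_names, chain_lengths)
        self._entries = {}

    def add_assembly(self, name, chain_names, chain_lengths):
        if len(chain_names) != len(chain_lengths):
            raise ValueError("chain_names and chain_lengths differ in length")
        self._entries[name] = (list(chain_names), list(chain_lengths))

    def remove_assembly(self, name):
        # Ranges are derived from the remaining lengths, so every assembly
        # added later moves down by the size of the removed one.
        del self._entries[name]

    def _start(self, name):
        # Sum the sizes of all assemblies added before `name`.
        start = 0
        for other, (_, lengths) in self._entries.items():
            if other == name:
                return start
            start += sum(lengths)
        raise KeyError(name)

    def data_range(self, name, chain_name=None):
        """Return the half-open range (from, to) of an assembly or one chain."""
        start = self._start(name)
        chain_names, lengths = self._entries[name]
        if chain_name is None:
            return (start, start + sum(lengths))
        idx = chain_names.index(chain_name)  # ValueError if the chain is missing
        chain_start = start + sum(lengths[:idx])
        return (chain_start, chain_start + lengths[idx])


indexer = ToyIndexer()
indexer.add_assembly("1abc", ["A", "B"], [100, 50])
indexer.add_assembly("2xyz", ["A"], [30])

range_full = indexer.data_range("1abc")          # whole assembly
range_chain_b = indexer.data_range("1abc", "B")  # second chain only
range_before = indexer.data_range("2xyz")        # directly follows "1abc"

indexer.remove_assembly("1abc")                  # 150 residues removed
range_after = indexer.data_range("2xyz")         # shifted down by 150
```

In this toy example the second assembly occupies (150, 180) before the removal and (0, 30) afterwards, which mirrors the shift by n described above.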

Load(filename)

Loads indexer from file

Parameters:filename (str) – Path to file to be loaded
Returns:The loaded indexer
Return type:LinearIndexer
Raises:ost.Error if filename cannot be opened
Save(filename)

Saves indexer to file

Parameters:filename (str) – Path to file where the indexer is stored
Raises:ost.Error if filename cannot be created
AddAssembly(name, chain_names, chain_lengths)

Adds a new assembly to the indexer. The range assigned to that assembly is subsequent to the previously added assembly.

Parameters:
  • name (str) – Name of the added assembly
  • chain_names (list of str) – Names of all chains of the added assembly
  • chain_lengths (list of int) – The corresponding lengths of the chains
Raises:

ost.Error if the lengths of chain_names and chain_lengths are inconsistent

RemoveAssembly(name)

Removes an assembly from the indexer. Assuming that assembly contains a total of n residues, all ranges of the subsequent assemblies are reduced by n.

Parameters:name (str) – Name of the assembly to be removed
Raises:ost.Error if name is not present
GetAssemblies()
Returns:The names of all added assemblies
Return type:list of str
GetChainNames(name)
Parameters:name (str) – Name of assembly from which you want the chain names
Returns:The chain names of the specified assembly
Return type:list of str
Raises:ost.Error if name is not present
GetChainLengths(name)
Parameters:name (str) – Name of assembly from which you want the chain lengths
Returns:The chain lengths of the specified assembly
Return type:list of int
Raises:ost.Error if name is not present
GetDataRange(name)

Get the range for a full assembly

Parameters:name (str) – Name of the assembly from which you want the range
Returns:Two values defining the range as [from, to[
Return type:tuple of int
Raises:ost.Error if name is not present
GetDataRange(name, chain_name)

Get the range for a chain of an assembly

Parameters:
  • name (str) – Name of the assembly from which you want the range
  • chain_name (str) – Name of the chain from which you want the range
Returns:

Two values defining the range as [from, to[

Return type:

tuple of int

Raises:

ost.Error if name is not present or the corresponding assembly has no chain with the specified chain name

GetNumResidues()
Returns:The total number of residues in all added assemblies
Return type:int
class LinearCharacterContainer

The LinearCharacterContainer stores characters in a linear memory layout that can represent sequences such as SEQRES or ATOMSEQ. It can be accessed using range parameters and the idea is to keep it in sync with a LinearIndexer.

Load(filename)

Loads container from file

Parameters:filename (str) – Path to file to be loaded
Returns:The loaded container
Return type:LinearCharacterContainer
Raises:ost.Error if filename cannot be opened
Save(filename)

Saves container to file

Parameters:filename (str) – Path to file where the container is stored
Raises:ost.Error if filename cannot be created
AddCharacters(characters)

Adds characters at the end of the internal data. Call this function with appropriate data whenever you add an assembly to the associated LinearIndexer

Parameters:characters (str) – Characters to be added
ClearRange(range)

Removes all characters specified by range, in the form [from, to[, from the internal data. Because the internal data layout is linear, all characters starting at to are shifted to the location defined by from. Call this function with the appropriate range whenever you remove an assembly from the associated LinearIndexer.

Parameters:range (tuple of int) – Range to be deleted in form [from, to[
Raises:ost.Error if range does not specify a valid range
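The shifting behaviour of ClearRange can be sketched with a plain Python string. ToyCharacterContainer below is a hypothetical stand-in, not the OST class, and it raises ValueError where OST raises ost.Error. Ranges are treated as half-open, i.e. from is included and to is excluded.

```python
class ToyCharacterContainer:
    """Hypothetical string-backed model of a linear character container."""

    def __init__(self):
        self._data = ""

    def _check(self, data_range):
        start, end = data_range
        if not (0 <= start <= end <= len(self._data)):
            raise ValueError("range does not specify a valid range")
        return start, end

    def add_characters(self, characters):
        # New data always goes to the end of the linear layout.
        self._data += characters

    def clear_range(self, data_range):
        # Everything from `end` onwards moves down to `start`.
        start, end = self._check(data_range)
        self._data = self._data[:start] + self._data[end:]

    def get_characters(self, data_range):
        start, end = self._check(data_range)
        return self._data[start:end]

    def num_elements(self):
        return len(self._data)


container = ToyCharacterContainer()
container.add_characters("MKTAY")   # first entry, occupies (0, 5)
container.add_characters("GSHM")    # second entry, occupies (5, 9)

second_before = container.get_characters((5, 9))
container.clear_range((0, 5))       # remove the first entry
second_after = container.get_characters((0, 4))
remaining = container.num_elements()
```

After the first entry is cleared, the second entry is found at (0, 4), which is the same shift an indexer would report once the corresponding assembly has been removed from it.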
GetCharacter(idx)
Returns:The character at the specified location
Return type:str
Raises:ost.Error if idx does not specify a valid position
GetCharacters(range)
Returns:The characters from the specified range
Return type:str
Raises:ost.Error if range does not specify a valid range
GetNumElements()
Returns:The number of stored characters
Return type:int
class LinearPositionContainer

The LinearPositionContainer stores positions in a linear memory layout. It can be accessed using range parameters and the idea is to keep it in sync with a LinearIndexer. To save memory, a lossy compression is applied that limits the accuracy to two digits. If the absolute value of an added position is very large (> ~10000), the accuracy is further lowered to one digit. This is all handled internally.
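To give a feeling for what a limited accuracy of two digits means in practice, the sketch below rounds coordinates to a fixed number of decimal places by storing them as scaled integers. This is an assumption made purely for illustration: the text does not describe how OST compresses positions internally, and quantize and dequantize are hypothetical helper names.

```python
def quantize(value, decimals=2):
    """Store a coordinate as an integer count of 10**-decimals units."""
    return round(value * 10 ** decimals)


def dequantize(stored, decimals=2):
    """Recover the (rounded) coordinate from its integer representation."""
    return stored / 10 ** decimals


original = (12.3456, -7.891, 0.004)
stored = tuple(quantize(v) for v in original)
restored = tuple(dequantize(s) for s in stored)

# With two decimals, the round-trip error cannot exceed half a unit (0.005).
max_error = max(abs(a - b) for a, b in zip(original, restored))

# A very large coordinate kept with only one decimal, as a coarser example.
large_stored = quantize(12345.678, decimals=1)
large_restored = dequantize(large_stored, decimals=1)
```

Whatever scheme is used internally, values read back from the container should be expected to differ slightly from the values that were added.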

Load(filename)

Loads container from file

Parameters:filename (str) – Path to file to be loaded
Returns:The loaded container
Return type:LinearPositionContainer
Raises:ost.Error if filename cannot be opened
Save(filename)

Saves container to file

Parameters:filename (str) – Path to file where the container is stored
Raises:ost.Error if filename cannot be created
AddPositions(positions)

Adds positions at the end of the internal data. Call this function with appropriate data whenever you add an assembly to the associated LinearIndexer

Parameters:positions (ost.geom.Vec3List) – Positions to be added
ClearRange(range)

Removes all positions specified by range, in the form [from, to[, from the internal data. Because the internal data layout is linear, all positions starting at to are shifted to the location defined by from. Call this function with the appropriate range whenever you remove an assembly from the associated LinearIndexer.

Parameters:range (tuple of int) – Range to be deleted in form [from, to[
Raises:ost.Error if range does not specify a valid range
GetPosition(idx, pos)

Extracts the position at the specified location. For efficiency reasons, the function requires the position to be passed by reference.

Parameters:
  • idx (int) – Specifies location
  • pos (ost.geom.Vec3) – Will be altered to the desired position
Raises:

ost.Error if idx does not specify a valid position

GetPositions(range, positions)

Extracts the positions in the specified range. For efficiency reasons, the function requires the positions to be passed by reference.

Parameters:
  • range (tuple of int) – Range in form [from,to[ that defines positions to be extracted
  • positions (ost.geom.Vec3List) – Will be altered to the desired positions
Raises:

ost.Error if range does not specify a valid range

GetNumElements()
Returns:The number of stored positions
Return type:int

Data Extraction

OpenStructure provides data extraction functionality for the following scenario: there are three binary containers: a position container holding CA positions (LinearPositionContainer), a SEQRES container and an ATOMSEQ container (both LinearCharacterContainer). They contain entries from the protein structure database, and sequence/position data is relative to the SEQRES of those entries. This means that if the SEQRES has more characters than there are resolved residues in the structure, the entry in the position container still contains exactly as many elements as there are SEQRES characters, but some positions remain invalid. That is where the ATOMSEQ container comes in. It contains only residues matching the SEQRES but marks non-resolved residues with '-'.

ExtractValidPositions(entry_name, chain_name, indexer, atomseq_container, position_container, seq, positions)

Iterates over all data for a chain specified by entry_name and chain_name. For every data point marked as valid in the atomseq_container (the character at that position is not '-'), the character and the corresponding position are added to seq and positions.

Parameters:
  • entry_name (str) – Name of assembly you want the data from
  • chain_name (str) – Name of chain you want the data from
  • indexer (LinearIndexer) – Used to access atomseq_container and position_container
  • atomseq_container (LinearCharacterContainer) – Container that marks locations with invalid position data with '-'
  • position_container (LinearPositionContainer) – Container containing position data
  • seq (ost.seq.SequenceHandle) – Sequence with extracted valid positions gets stored in here.
  • positions (ost.geom.Vec3List) – The extracted valid positions get stored in here
Raises:

ost.Error if requested data is not present
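The filtering step can be sketched on plain Python data. The function below is a hypothetical illustration, not the OST implementation: it takes the ATOMSEQ string of one chain and an equally long list of coordinate tuples, and it returns new objects instead of filling seq and positions in place.

```python
def extract_valid(atomseq, positions):
    """Keep only residues whose ATOMSEQ character is not '-'."""
    if len(atomseq) != len(positions):
        raise ValueError("atomseq and positions must have the same length")
    valid_chars = []
    valid_positions = []
    for char, pos in zip(atomseq, positions):
        if char != "-":  # '-' marks a non-resolved residue
            valid_chars.append(char)
            valid_positions.append(pos)
    return "".join(valid_chars), valid_positions


# SEQRES would be "MKTAY"; the third residue is not resolved in the structure.
atomseq = "MK-AY"
positions = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (0.0, 0.0, 0.0),   # placeholder entry for the unresolved residue
    (11.4, 0.0, 0.0),
    (15.2, 0.0, 0.0),
]

seq, valid = extract_valid(atomseq, positions)
```

Here seq becomes "MKAY" and valid contains the four positions that belong to resolved residues, in their original order.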

ExtractTemplateData(entry_name, chain_name, aln, indexer, seqres_container, atomseq_container, position_container)

Assume a target-template alignment in aln (first sequence: target, second sequence: template). This function extracts all valid template positions for the entry specified by entry_name and chain_name. The template sequence in aln must match the sequence in seqres_container. As in ExtractValidPositions, the atomseq_container is used to identify valid positions. The corresponding residue numbers relative to the target sequence in aln are also returned.

Parameters:
  • entry_name (str) – Name of assembly you want the data from
  • chain_name (str) – Name of chain you want the data from
  • aln – Target-template sequence alignment
  • indexer (LinearIndexer) – Used to access atomseq_container, seqres_container and position_container
  • seqres_container (LinearCharacterContainer) – Container containing the full sequence data
  • atomseq_container (LinearCharacterContainer) – Container that marks locations with invalid position data with '-'
  • position_container (LinearPositionContainer) – Container containing position data
Returns:

First element: list of residue numbers that relate each entry in the second element to the target sequence specified in aln. The numbering scheme starts from one. Second element: ost.geom.Vec3List with the corresponding positions.

Return type:

tuple

Raises:

ost.Error if the requested data is not present in the container or if the template sequence in aln does not match the sequence in seqres_container
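The mapping from alignment columns to target residue numbers can be sketched as follows. This is a simplified, hypothetical model rather than the OST implementation: the alignment is given as two gapped strings, the ungapped template sequence is assumed to cover the full SEQRES of the chain, and positions are plain tuples. Residue numbers are reported only for columns where both sequences have a residue and the template residue is resolved.

```python
def extract_template_data(target_aln, template_aln, atomseq, positions):
    """Return (target residue numbers, template positions) for valid columns."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    if len(template_aln.replace("-", "")) != len(atomseq):
        raise ValueError("template sequence does not match stored data")
    residue_numbers = []
    template_positions = []
    target_num = 0    # 1-based residue number in the target sequence
    template_idx = 0  # 0-based index into atomseq and positions
    for target_char, template_char in zip(target_aln, template_aln):
        if target_char != "-":
            target_num += 1
        if template_char != "-":
            resolved = atomseq[template_idx] != "-"
            if target_char != "-" and resolved:
                residue_numbers.append(target_num)
                template_positions.append(positions[template_idx])
            template_idx += 1
    return residue_numbers, template_positions


target_aln = "MKT-AYL"
template_aln = "MR-GAYL"   # ungapped: "MRGAYL", assumed equal to the SEQRES
atomseq = "MRG-YL"         # the template residue A is not resolved
positions = [(3.8 * i, 0.0, 0.0) for i in range(6)]

numbers, coords = extract_template_data(
    target_aln, template_aln, atomseq, positions
)
```

In this example numbers is [1, 2, 5, 6]: target residue 3 is aligned to a gap, the template residue G has no target partner, and target residue 4 is aligned to a template residue that is not resolved.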
