You are reading the documentation for version 1.11 of OpenStructure.

# Linear Database

Many applications need to load large numbers of structures. Especially on distributed file systems, I/O becomes a bottleneck. OST provides a linear database to dump position data (e.g. CA positions) or character data (e.g. sequences) for fast retrieval. The actual data containers behave like linear data arrays, and an indexer keeps track of where to find the data for a certain entry.

class LinearIndexer

The LinearIndexer keeps track of the locations of data, assuming a linear memory layout. Entries in the indexer are assemblies, each of which can contain an arbitrary number of chains of varying length. Whenever a new assembly is added, a range enclosing all residues of that assembly is defined directly after the range of the previously added assembly. This makes it possible to access not only the range of the full assembly but also the range of single chains. Whenever an assembly with n residues is deleted, the ranges of all assemblies added after it are shifted down by n.
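
The range bookkeeping described above can be pictured with a small pure-Python model. This is only a sketch for illustration: `ToyIndexer` and its method names are hypothetical and not part of OST, and recomputing the ranges on every query (instead of storing them) is an assumption made for brevity.

```python
class ToyIndexer:
    """Toy model of the range bookkeeping (hypothetical, not OST code)."""

    def __init__(self):
        # Insertion-ordered list of (assembly name, chain names, chain lengths).
        self._entries = []

    def add_assembly(self, name, chain_names, chain_lengths):
        if len(chain_names) != len(chain_lengths):
            raise ValueError("chain_names and chain_lengths differ in length")
        self._entries.append((name, list(chain_names), list(chain_lengths)))

    def remove_assembly(self, name):
        for i, entry in enumerate(self._entries):
            if entry[0] == name:
                # Later assemblies implicitly move down by the removed size.
                del self._entries[i]
                return
        raise KeyError(name)

    def get_data_range(self, name, chain_name=None):
        """Return the half-open range [from, to[ of an assembly or a chain."""
        start = 0
        for entry_name, chain_names, chain_lengths in self._entries:
            if entry_name == name:
                if chain_name is None:
                    return (start, start + sum(chain_lengths))
                for cname, clen in zip(chain_names, chain_lengths):
                    if cname == chain_name:
                        return (start, start + clen)
                    start += clen
                raise KeyError(chain_name)
            start += sum(chain_lengths)
        raise KeyError(name)


indexer = ToyIndexer()
indexer.add_assembly("first", ["A", "B"], [3, 2])
indexer.add_assembly("second", ["A"], [4])

range_before = indexer.get_data_range("second")    # range while "first" exists
chain_b_range = indexer.get_data_range("first", "B")
indexer.remove_assembly("first")
range_after = indexer.get_data_range("second")     # range after the removal
```

In this model, removing the five-residue assembly moves the range of the second assembly down by five, which is why the associated data containers have to clear the same range.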

Load(filename)

Parameters: filename (str) – Path to file to be loaded
Returns: The loaded indexer
Return type: LinearIndexer
Raises: ost.Error if filename cannot be opened

Save(filename)

Saves indexer to file

Parameters: filename (str) – Path to file where the indexer is stored
Raises: ost.Error if filename cannot be created

AddAssembly(name, chain_names, chain_lengths)

Adds a new assembly to the indexer. The range assigned to that assembly directly follows the range of the previously added assembly.

Parameters:
- name (str) – Name of the added assembly
- chain_names (list of str) – Names of all chains of the added assembly
- chain_lengths (list of int) – The corresponding lengths of the chains
Raises: ost.Error if the lengths of chain_names and chain_lengths are inconsistent

RemoveAssembly(name)

Removes an assembly from the indexer. If that assembly contains a total of n residues, the ranges of all subsequent assemblies are shifted down by n.

Parameters: name (str) – Name of the assembly to be removed
Raises: ost.Error if name is not present

GetAssemblies()
Returns: The names of all added assemblies
Return type: list of str
Raises: ost.Error if name is not present

GetChainNames(name)
Parameters: name (str) – Name of assembly from which you want the chain names
Returns: The chain names of the specified assembly
Return type: list of str
Raises: ost.Error if name is not present

GetChainLengths(name)
Parameters: name (str) – Name of assembly from which you want the chain lengths
Returns: The chain lengths of the specified assembly
Return type: list of int
Raises: ost.Error if name is not present

GetDataRange(name)

Get the range for a full assembly

Parameters: name (str) – Name of the assembly from which you want the range
Returns: Two values defining the range as [from, to[
Return type: tuple of int
Raises: ost.Error if name is not present

GetDataRange(name, chain_name)

Get the range for a chain of an assembly

Parameters:
- name (str) – Name of the assembly from which you want the range
- chain_name (str) – Name of the chain from which you want the range
Returns: Two values defining the range as [from, to[
Return type: tuple of int
Raises: ost.Error if name is not present or the corresponding assembly has no chain with the specified chain name

GetNumResidues()
Returns: The total number of residues in all added assemblies
Return type: int

class LinearCharacterContainer

The LinearCharacterContainer stores characters in a linear memory layout that can represent sequences such as SEQRES or ATOMSEQ. It can be accessed using range parameters and the idea is to keep it in sync with a LinearIndexer.

Load(filename)

Parameters: filename (str) – Path to file to be loaded
Returns: The loaded container
Return type: LinearCharacterContainer
Raises: ost.Error if filename cannot be opened

Save(filename)

Saves container to file

Parameters: filename (str) – Path to file where the container is stored
Raises: ost.Error if filename cannot be created

AddCharacters(characters)

Adds characters at the end of the internal data. Call this function with the appropriate data whenever you add an assembly to the associated LinearIndexer.

Parameters: characters (str) – Characters to be added
ClearRange(range)

Removes all characters specified by range, given in the form [from, to[, from the internal data. Because the internal data layout is linear, all characters starting at to are shifted to the location defined by from. Call this function with the appropriate range whenever you remove an assembly from the associated LinearIndexer.

Parameters: range (tuple of int) – Range to be deleted in form [from, to[
Raises: ost.Error if range does not specify a valid range
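
The shifting behaviour of ClearRange can be mimicked on a plain Python string. This is a sketch under assumptions, not the OST implementation: `clear_range` is a hypothetical helper, and treating an empty range (from equal to to) as valid is a choice made here only to keep the example short.

```python
def clear_range(data, rng):
    """Remove the half-open range [from, to[ from a linear string (toy model)."""
    start, end = rng
    if not (0 <= start <= end <= len(data)):
        raise ValueError("range does not specify a valid range")
    # Everything starting at `end` moves down to the location `start`.
    return data[:start] + data[end:]


# Three entries stored back to back: "AAA", "BB" and "CCCC".
data = "AAABBCCCC"
data_after = clear_range(data, (3, 5))  # drop the middle entry
```

After the call, the characters of the last entry start at index 3 instead of index 5, matching the shift an indexer would apply to the later ranges.
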
GetCharacter(idx)
Returns: The character at the specified location
Return type: str
Raises: ost.Error if idx does not specify a valid position

GetCharacters(range)
Returns: The characters from the specified range
Return type: str
Raises: ost.Error if range does not specify a valid range

GetNumElements()
Returns: The number of stored characters
Return type: int

class LinearPositionContainer

The LinearPositionContainer stores positions in a linear memory layout. It is accessed using range parameters, and the idea is to keep it in sync with a LinearIndexer. To save memory, a lossy compression is applied that limits the accuracy to two digits. If the absolute value of an added position is very large (> ~10000), the accuracy is lowered further to one digit. This is all handled internally.
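
To get a feeling for what such a limited accuracy means in practice, the following sketch stores coordinates as scaled integers and converts them back. It is an illustration under assumptions only: the text above does not describe the encoding actually used by LinearPositionContainer, `quantize` and `dequantize` are hypothetical helpers rather than OST functions, and reading "digits" as decimal places is likewise an assumption.

```python
def quantize(value, digits=2):
    """Represent a coordinate as an integer count of 10**-digits units."""
    return round(value * 10 ** digits)


def dequantize(stored, digits=2):
    """Convert the stored integer back to a floating point coordinate."""
    return stored / 10 ** digits


# A typical coordinate, kept with two decimal places.
original = 12.34567
restored = dequantize(quantize(original))

# A very large coordinate, kept with only one decimal place.
large = 12345.678
restored_large = dequantize(quantize(large, digits=1), digits=1)
```

With this scheme, the round-trip error is at most half a unit of the last kept decimal place, so coordinates read back from such a container should not be expected to match the input exactly.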

Load(filename)

Parameters: filename (str) – Path to file to be loaded
Returns: The loaded container
Return type: LinearPositionContainer
Raises: ost.Error if filename cannot be opened

Save(filename)

Saves container to file

Parameters: filename (str) – Path to file where the container is stored
Raises: ost.Error if filename cannot be created

AddPositions(positions)

Adds positions at the end of the internal data. Call this function with the appropriate data whenever you add an assembly to the associated LinearIndexer.

Parameters: positions (ost.geom.Vec3List) – Positions to be added
ClearRange(range)

Removes all positions specified by range, given in the form [from, to[, from the internal data. Because the internal data layout is linear, all positions starting at to are shifted to the location defined by from. Call this function with the appropriate range whenever you remove an assembly from the associated LinearIndexer.

Parameters: range (tuple of int) – Range to be deleted in form [from, to[
Raises: ost.Error if range does not specify a valid range

GetPosition(idx, pos)

Extracts the position at the specified location. For efficiency reasons, the position must be passed by reference: the function overwrites pos instead of returning a new object.

Parameters:
- idx (int) – Specifies location
- pos (ost.geom.Vec3) – Will be altered to the desired position
Raises: ost.Error if idx does not specify a valid position

GetPositions(range, positions)

Extracts the positions in the specified range. For efficiency reasons, the positions must be passed by reference: the function overwrites positions instead of returning a new list.

Parameters:
- range (tuple of int) – Range in form [from, to[ that defines positions to be extracted
- positions (ost.geom.Vec3List) – Will be altered to the desired positions
Raises: ost.Error if range does not specify a valid range

GetNumElements()
Returns: The number of stored positions
Return type: int

## Data Extraction

OpenStructure provides data extraction functionality for the following scenario: there are three binary containers, namely a position container holding CA positions (LinearPositionContainer), a SEQRES container and an ATOMSEQ container (both LinearCharacterContainer). They contain entries from the protein structure database, and sequence/position data is relative to the SEQRES of those entries. This means that if the SEQRES has more characters than there are resolved residues in the structure, the entry in the position container still contains exactly as many positions as there are SEQRES characters, but some positions remain invalid. That is where the ATOMSEQ container comes in. It contains only residues matching the SEQRES but marks non-resolved residues with '-'.

ExtractValidPositions(entry_name, chain_name, indexer, atomseq_container, position_container, seq, positions)

Iterates over all data for a chain specified by entry_name and chain_name. For every data point marked as valid in the atomseq_container (the character at that position is not '-'), the character and the corresponding position are added to seq and positions.

Parameters:
- entry_name (str) – Name of assembly you want the data from
- chain_name (str) – Name of chain you want the data from
- indexer (LinearIndexer) – Used to access atomseq_container and position_container
- atomseq_container (LinearCharacterContainer) – Container that marks locations with invalid position data with '-'
- position_container (LinearPositionContainer) – Container containing position data
- seq (ost.seq.SequenceHandle) – Sequence with extracted valid positions gets stored in here
- positions (ost.geom.Vec3List) – The extracted valid positions get stored in here
Raises: ost.Error if requested data is not present
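
The filtering step described above can be sketched on plain Python data. This is a toy model, not the OST function: `extract_valid` is a hypothetical helper, positions are simple tuples instead of ost.geom.Vec3 objects, and the results are returned instead of being written into seq and positions arguments.

```python
def extract_valid(atomseq, positions):
    """Keep only characters and positions whose ATOMSEQ character is not '-'."""
    if len(atomseq) != len(positions):
        raise ValueError("atomseq and positions must have the same length")
    seq_chars = []
    valid_positions = []
    for char, pos in zip(atomseq, positions):
        if char != "-":
            seq_chars.append(char)
            valid_positions.append(pos)
    return "".join(seq_chars), valid_positions


# One chain with five SEQRES residues; the third one is not resolved.
atomseq = "MK-LV"
positions = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (-1.0, -1.0, -1.0),  # placeholder for the unresolved residue
    (11.4, 0.0, 0.0),
    (15.2, 0.0, 0.0),
]

seq, valid_positions = extract_valid(atomseq, positions)
```

The placeholder coordinate of the unresolved residue is dropped together with its '-' character, so seq and valid_positions stay aligned element by element.
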
ExtractTemplateData(entry_name, chain_name, aln, indexer, seqres_container, atomseq_container, position_container)

Given a target-template alignment in aln (first sequence: target, second sequence: template), this function extracts all valid template positions for the entry specified by entry_name and chain_name. The template sequence in aln must match the sequence in seqres_container. As before, the atomseq_container is used to identify valid positions. The corresponding residue numbers relative to the target sequence in aln are also returned.

Parameters:
- entry_name (str) – Name of assembly you want the data from
- chain_name (str) – Name of chain you want the data from
- aln – Target-template sequence alignment
- indexer (LinearIndexer) – Used to access atomseq_container, seqres_container and position_container
- seqres_container (LinearCharacterContainer) – Container containing the full sequence data
- atomseq_container (LinearCharacterContainer) – Container that marks locations with invalid position data with '-'
- position_container (LinearPositionContainer) – Container containing position data
Returns: First element: list of residue numbers that relate each entry in the second element to the target sequence specified in aln. The numbering scheme starts from one. Second element: geom.Vec3List with the corresponding positions.
Return type: tuple
Raises: ost.Error if requested data is not present in the container or if the template sequence in aln does not match the sequence in seqres_container
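
The mapping from template positions to target residue numbers can be sketched with plain strings. This is an illustrative model only: `extract_template_data` below is a hypothetical pure-Python helper that takes the gapped alignment rows and the chain data directly, and skipping template residues that are aligned to a gap in the target is an assumption, since the text above does not spell out that case.

```python
def extract_template_data(target_aln, template_aln, seqres, atomseq, positions):
    """Return (target residue numbers, template positions) for valid columns."""
    if len(target_aln) != len(template_aln):
        raise ValueError("alignment rows must have the same length")
    if template_aln.replace("-", "") != seqres:
        raise ValueError("template sequence does not match SEQRES")
    residue_numbers = []
    valid_positions = []
    target_num = 0    # residue number in the target, starting from one
    template_idx = 0  # index into seqres, atomseq and positions
    for t_char, tpl_char in zip(target_aln, template_aln):
        if t_char != "-":
            target_num += 1
        if tpl_char != "-":
            # Keep the column only if it is aligned and the residue is resolved.
            if t_char != "-" and atomseq[template_idx] != "-":
                residue_numbers.append(target_num)
                valid_positions.append(positions[template_idx])
            template_idx += 1
    return residue_numbers, valid_positions


target_aln = "AC-DEF"
template_aln = "ACGD-F"
seqres = "ACGDF"
atomseq = "A-GDF"  # the second template residue is not resolved
positions = [(float(i), 0.0, 0.0) for i in range(len(seqres))]

residue_numbers, template_positions = extract_template_data(
    target_aln, template_aln, seqres, atomseq, positions
)
```

In this example, target residue 2 is lost because its template partner is unresolved, target residue 4 faces a gap in the template, and the template residue at the third alignment column has no target partner.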
