OpenStructure
Table Class Reference

Public Member Functions

def __init__
def __getattr__
def SetName
def GetName
def RenameCol
def GetColIndex
def GetColNames
def SearchColNames
def HasCol
def __getitem__
def __setitem__
def ToString
def __str__
def Stats
def PairedTTest
def AddRow
def RemoveCol
def AddCol
def Filter
def Select
def Sort
def GetUnique
def Zip
def Plot
def PlotHistogram
def PlotBar
def PlotHexbin
def MaxRow
def Max
def MaxIdx
def Min
def MinRow
def MinIdx
def Sum
def Mean
def RowMean
def Percentiles
def Median
def StdDev
def Count
def Correl
def SpearmanCorrel
def Save
def GetNumpyMatrix
def GaussianSmooth
def GetOptimalPrefactors
def PlotEnrichment
def ComputeEnrichment
def ComputeEnrichmentAUC
def ComputeROC
def ComputeROCAUC
def ComputeLogROCAUC
def PlotROC
def PlotLogROC
def ComputeMCC
def IsEmpty
def Extend

Static Public Member Functions

def Load

Data Fields

 col_names
 comment
 name
 col_types
 rows

Static Public Attributes

tuple SUPPORTED_TYPES = ('int', 'float', 'bool', 'string',)

Detailed Description

The table class provides convenient access to data in tabular form. An empty 
table can be easily constructed as follows

.. code-block:: python

  tab = Table()
  
If you want to add columns directly when creating the table, column names
and *column types* can be specified as follows

.. code-block:: python

  tab = Table(['nameX','nameY','nameZ'], 'sfb')
  
this will create three columns called nameX, nameY and nameZ of type string,
float and bool, respectively. There will be no data in the table and thus,
the table will not contain any rows.

The following *column types* are supported:

======= ========
name     abbrev
======= ========
string     s
float      f
int        i
bool       b
======= ========
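
The abbreviations in the table above can be read one letter per column. The following is a minimal sketch of that expansion in plain Python; the helper name ``expand_col_types`` and its error handling are hypothetical and not part of the :class:`Table` API.

```python
# Mapping copied from the table above; everything else is illustrative.
ABBREVIATIONS = {'s': 'string', 'f': 'float', 'i': 'int', 'b': 'bool'}

def expand_col_types(abbrevs):
    """Expand a string of one-letter type codes into long type names."""
    try:
        return [ABBREVIATIONS[code] for code in abbrevs]
    except KeyError as err:
        raise ValueError('unknown column type abbreviation: %s' % err)

print(expand_col_types('sfb'))  # ['string', 'float', 'bool']
```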

If you want to add data to the table in addition, use the following:

.. code-block:: python

  tab=Table(['nameX','nameY','nameZ'],
            'sfb',
            nameX = ['a','b','c'],
            nameY = [0.1, 1.2, 3.414],
            nameZ = [True, False, False])
            
If the values for a column are left out, the column is filled with NA. If
values are specified, all of them must be specified (i.e. the same number of
values for each column).

Definition at line 170 of file table.py.


Constructor & Destructor Documentation

def __init__ (   self,
  col_names = [],
  col_types = None,
  **kwargs 
)

Definition at line 221 of file table.py.


Member Function Documentation

def __getattr__ (   self,
  col_name 
)

Definition at line 237 of file table.py.

def __getitem__ (   self,
  k 
)

Definition at line 407 of file table.py.

def __setitem__ (   self,
  k,
  value 
)

Definition at line 413 of file table.py.

def __str__ (   self)

Definition at line 479 of file table.py.

def AddCol (   self,
  col_name,
  col_type,
  data = None 
)
Add a column to the right of the table.

:param col_name: name of new column
:type col_name: :class:`str`

:param col_type: type of new column (long versions: *int*, *float*, *bool*,
             *string* or short versions: *i*, *f*, *b*, *s*)
:type col_type: :class:`str`

:param data: data to add to new column
:type data: scalar or iterable

**Example:**

.. code-block:: python

  import itertools

  tab = Table(['x'], 'f', x=range(5))
  tab.AddCol('even', 'bool', itertools.cycle([True, False]))
  print(tab)

  '''
  will produce the table

  ====  ====
  x     even
  ====  ====
  0     True
  1     False
  2     True
  3     False
  4     True
  ====  ====
  '''

If data is a constant instead of an iterable object, its value
will be written into each row:

.. code-block:: python

  tab = Table(['x'], 'f', x=range(5))
  tab.AddCol('num', 'i', 1)
  print(tab)

  '''
  will produce the table

  ====  ====
  x     num
  ====  ====
  0     1
  1     1
  2     1
  3     1
  4     1
  ====  ====
  '''

As a special case, if there are no previous rows, and data is not 
None, rows are added for every item in data.
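
The three cases for *data* (constant, iterable, and iterable on an empty table) can be sketched in plain Python as follows. The helper ``fill_column`` is hypothetical and only models the rules stated above. Treating strings as constants is an assumption, and iterables shorter than the table are not covered.

```python
import itertools

def fill_column(num_rows, data):
    """Return the cell values for a new column (illustrative helper)."""
    if data is None:
        return [None] * num_rows
    # assumption: strings count as constants, not as iterables
    is_scalar = isinstance(data, str) or not hasattr(data, '__iter__')
    if is_scalar:
        # a constant is written into each existing row
        return [data] * num_rows
    if num_rows == 0:
        # special case: no previous rows, one row per item (data must be finite)
        return list(data)
    # one value per existing row; also works for endless iterators
    return list(itertools.islice(data, num_rows))

print(fill_column(5, itertools.cycle([True, False])))
print(fill_column(5, 1))
print(fill_column(0, [1, 2, 3]))
```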

Definition at line 699 of file table.py.

def AddRow (   self,
  data,
  overwrite = None 
)
Add a row to the table.

*data* may either be a dictionary or a list-like object:

 - If *data* is a dictionary, the keys in the dictionary must match the
   column names. Columns not found in the dict will be initialized to None.
   If the dict contains list-like objects, multiple rows will be added, if
   the number of items in all list-like objects is the same, otherwise a
   :class:`ValueError` is raised.

 - If *data* is a list-like object, the row is initialized from the values
   in *data*. The number of items in *data* must match the number of
   columns in the table. A :class:`ValueError` is raised otherwise. The
   values are added in the order specified in the list, thus, the order of
   the data must match the columns.
  
If *overwrite* is not None and set to an existing column name, the specified 
column in the table is searched for the first occurrence of a value matching
the value of the column with the same name in the dictionary. If a matching
value is found, the row is overwritten with the dictionary. If no matching
row is found, a new row is appended to the table.
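
The *overwrite* rule can be illustrated with a toy model in which the table is a plain list of rows. This is a sketch, not the actual implementation: ``add_row`` is a hypothetical helper that handles one row per call, and it sets columns missing from the dictionary to None when overwriting, which is an assumption.

```python
def add_row(rows, col_names, data, overwrite=None):
    """Append one row, or overwrite the first row matching on *overwrite*."""
    new_row = [data.get(name) for name in col_names]
    if overwrite is not None:
        idx = col_names.index(overwrite)
        for row in rows:
            if row[idx] == new_row[idx]:
                row[:] = new_row  # first match wins, row replaced in place
                return
    rows.append(new_row)

cols = ['x', 'y', 'z']
rows = [[1.2, None, 1.6], [1.6, None, 5.3]]
add_row(rows, cols, {'x': 1.2, 'z': 7.9}, overwrite='x')  # overwrites row 0
add_row(rows, cols, {'x': 1.9, 'z': 3.5}, overwrite='x')  # no match, appended
print(rows)
```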

:param data: data to add
:type data: :class:`dict` or *list-like* object

:param overwrite: column name to overwrite existing row if value in
              column *overwrite* matches
:type overwrite: :class:`str`

:raises: :class:`ValueError` if *list-like* object is used and number of
     items does *not* match number of columns in table.

:raises: :class:`ValueError` if *dict* is used and multiple rows are added
     but the number of data items is different for different columns.

**Example:** add multiple data rows to a subset of columns using a dictionary

.. code-block:: python

  # create table with three float columns
  tab = Table(['x','y','z'], 'fff')

  # add rows from dict
  data = {'x': [1.2, 1.6], 'z': [1.6, 5.3]}
  tab.AddRow(data)
  print(tab)

  '''
  will produce the table

  ====  ====  ====
  x     y     z
  ====  ====  ====
  1.20  NA    1.60
  1.60  NA    5.30
  ====  ====  ====
  '''

  # overwrite the row with x=1.2 and add row with x=1.9
  data = {'x': [1.2, 1.9], 'z': [7.9, 3.5]}
  tab.AddRow(data, overwrite='x')
  print(tab)

  '''
  will produce the table

  ====  ====  ====
  x     y     z
  ====  ====  ====
  1.20  NA    7.90
  1.60  NA    5.30
  1.90  NA    3.50
  ====  ====  ====
  '''

Definition at line 587 of file table.py.

def ComputeEnrichment (   self,
  score_col,
  class_col,
  score_dir = '-',
  class_dir = '-',
  class_cutoff = 2.0 
)
Computes the enrichment of column *score_col* classified according to
*class_col*.

This requires the datapoints to be classified into positive and negative
points. This can be done in two ways:

 - by using one 'bool' type column (*class_col*) which contains *True* for
   positives and *False* for negatives
   
 - by specifying a classification column (*class_col*), a cutoff value
   (*class_cutoff*) and the classification column's direction (*class_dir*).
   This will generate the classification on the fly

   * if ``class_dir=='-'``: values in the classification column that are less than or equal to class_cutoff will be counted as positives
   * if ``class_dir=='+'``: values in the classification column that are larger than or equal to class_cutoff will be counted as positives
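
The on-the-fly classification described in the list above amounts to a single comparison per value. A minimal sketch follows. The helper name ``is_positive`` is hypothetical, and raising an error for any other direction is an assumption of the sketch.

```python
def is_positive(value, class_dir='-', class_cutoff=2.0):
    """Classify one value of the classification column (illustrative helper)."""
    if isinstance(value, bool):
        # 'bool' columns already carry the classification
        return value
    if class_dir == '-':
        return value <= class_cutoff
    if class_dir == '+':
        return value >= class_cutoff
    raise ValueError("class_dir must be '-' or '+'")

print(is_positive(1.5))        # True: 1.5 <= 2.0
print(is_positive(2.5))        # False
print(is_positive(2.5, '+'))   # True: 2.5 >= 2.0
```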

During the calculation, the table will be sorted according to *score_dir*,
where '-' means smallest values first, i.e. the smaller the value, the
better.

:warning: If either the value of *class_col* or *score_col* is *None*, the
      data in this row is ignored.

Definition at line 2513 of file table.py.

def ComputeEnrichmentAUC (   self,
  score_col,
  class_col,
  score_dir = '-',
  class_dir = '-',
  class_cutoff = 2.0 
)
Computes the area under the curve of the enrichment using the trapezoidal
rule.
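
For reference, the trapezoidal rule sums the areas of the trapezoids between consecutive curve points. The sketch below is written in plain Python rather than *numpy* to stay self-contained, and the function name ``trapezoid_auc`` is hypothetical.

```python
def trapezoid_auc(xs, ys):
    """Area under the piecewise-linear curve through the points (xs[i], ys[i])."""
    area = 0.0
    for i in range(1, len(xs)):
        # width of the interval times the mean height of its two end points
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0
    return area

print(trapezoid_auc([0, 1], [0, 1]))            # 0.5 (diagonal)
print(trapezoid_auc([0, 0.5, 1], [0, 1, 1]))    # 0.75
```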

For more information about parameters of the enrichment, see
:meth:`ComputeEnrichment`.

:warning: The function depends on *numpy*

Definition at line 2593 of file table.py.

def ComputeLogROCAUC (   self,
  score_col,
  class_col,
  score_dir = '-',
  class_dir = '-',
  class_cutoff = 2.0 
)
Computes the area under the curve of the log receiver operating 
characteristics (logROC) where the x-axis is semilogarithmic
using the trapezoidal rule.

The logROC is computed with a lambda of 0.001 according to 
Rapid Context-Dependent Ligand Desolvation in Molecular Docking
Mysinger M. and Shoichet B., Journal of Chemical Information and Modeling
2010 50 (9), 1561-1573

For more information about parameters of the ROC, see
:meth:`ComputeROC`.

:warning: The function depends on *numpy*

Definition at line 2728 of file table.py.

def ComputeMCC (   self,
  score_col,
  class_col,
  score_dir = '-',
  class_dir = '-',
  score_cutoff = 2.0,
  class_cutoff = 2.0 
)
Compute Matthews correlation coefficient (MCC) for one column (*score_col*)
with the points classified into true positives, false positives, true
negatives and false negatives according to a specified classification
column (*class_col*).

The datapoints in *score_col* and *class_col* are classified into
positive and negative points. This can be done in two ways:

 - by using 'bool' columns which contain True for positives and False
   for negatives
   
 - by using 'float' or 'int' columns and specifying a cutoff value and the
   column's direction. This will generate the classification on the fly
   
   * if ``class_dir``/``score_dir=='-'``: values in the respective column that are less than or equal to *class_cutoff*/*score_cutoff* will be counted as positives
   * if ``class_dir``/``score_dir=='+'``: values in the respective column that are larger than or equal to *class_cutoff*/*score_cutoff* will be counted as positives
                            
The two possibilities can be used together, i.e. 'bool' type for one column
and 'float'/'int' type and cutoff/direction for the other column.
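
Once both columns are classified, the four counts determine the coefficient through the standard MCC formula. The sketch below shows that formula only. The function name ``mcc`` is hypothetical, and returning None for a zero denominator is an assumption of the sketch, not documented behaviour.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # assumption: the coefficient is treated as undefined in this case
        return None
    return (tp * tn - fp * fn) / float(denom)

print(mcc(5, 0, 5, 0))   # 1.0, perfect agreement
print(mcc(0, 5, 0, 5))   # -1.0, perfect disagreement
print(mcc(1, 1, 1, 1))   # 0.0, no correlation
```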

Definition at line 2892 of file table.py.

def ComputeROC (   self,
  score_col,
  class_col,
  score_dir = '-',
  class_dir = '-',
  class_cutoff = 2.0 
)
Computes the receiver operating characteristics (ROC) of column *score_col*
classified according to *class_col*.

This requires the datapoints to be classified into positive and negative
points. This can be done in two ways:

 - by using one 'bool' column (*class_col*) which contains True for positives
   and False for negatives
 - by using a non-bool column (*class_col*), a cutoff value (*class_cutoff*)
   and the classification column's direction (*class_dir*). This will generate
   the classification on the fly

   - if ``class_dir=='-'``: values in the classification column that are less than or equal to *class_cutoff* will be counted as positives
   - if ``class_dir=='+'``: values in the classification column that are larger than or equal to *class_cutoff* will be counted as positives

During the calculation, the table will be sorted according to *score_dir*,
where '-' means smallest values first, i.e. the smaller the value, the
better.

If *class_col* does not contain any positives, the ROC is not defined and
the function will return *None*. A value counts as positive if it is True
(column of type bool) or if it is classified as positive according to
*class_dir* and *class_cutoff* (column of type int or float).

:warning: If either the value of *class_col* or *score_col* is *None*, the
      data in this row is ignored.
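
The construction described above (skip rows with None, sort by score, then accumulate true and false positives) can be sketched in plain Python as follows. This is an illustration, not the actual implementation: ``compute_roc`` is a hypothetical function that takes already classified values, gives tied scores no special treatment, and also returns None when there are no negatives, which is an assumption made to avoid a division by zero.

```python
def compute_roc(scores, positives, score_dir='-'):
    """Return (fpr, tpr) lists, or None if the ROC is not defined."""
    # rows where either value is None are ignored
    pairs = [(s, p) for s, p in zip(scores, positives)
             if s is not None and p is not None]
    num_pos = sum(1 for _, p in pairs if p)
    num_neg = len(pairs) - num_pos
    if num_pos == 0 or num_neg == 0:
        return None
    # '-' means smallest scores first, '+' largest first
    pairs.sort(key=lambda sp: sp[0], reverse=(score_dir == '+'))
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for _, is_pos in pairs:
        if is_pos:
            tp += 1
        else:
            fp += 1
        fpr.append(fp / float(num_neg))
        tpr.append(tp / float(num_pos))
    return fpr, tpr

fpr, tpr = compute_roc([0.1, 0.4, 0.35, 0.8], [True, False, True, False])
print(fpr)  # [0.0, 0.0, 0.0, 0.5, 1.0]
print(tpr)  # [0.0, 0.5, 1.0, 1.0, 1.0]
```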

Definition at line 2617 of file table.py.

def ComputeROCAUC (   self,
  score_col,
  class_col,
  score_dir = '-',
  class_dir = '-',
  class_cutoff = 2.0 
)
Computes the area under the curve of the receiver operating characteristics
using the trapezoidal rule.

For more information about parameters of the ROC, see
:meth:`ComputeROC`.

:warning: The function depends on *numpy*

Definition at line 2704 of file table.py.

def Correl (   self,
  col1,
  col2 
)
Calculate the Pearson correlation coefficient between *col1* and *col2*, only
taking rows into account where both of the values are not equal to *None*.
If there are not enough data points to calculate a correlation coefficient,
*None* is returned.
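
The handling of *None* values can be made concrete with a plain-Python sketch of the Pearson coefficient. The function name ``pearson`` is hypothetical. Requiring at least two complete pairs and returning None for a column without variance are assumptions about what "not enough data points" means.

```python
import math

def pearson(xs, ys):
    """Pearson correlation over the rows where both values are not None."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None  # assumption: fewer than two pairs is "not enough"
    mean_x = sum(x for x, _ in pairs) / float(n)
    mean_y = sum(y for _, y in pairs) / float(n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None  # assumption: undefined for a constant column
    return cov / math.sqrt(var_x * var_y)

# the last row is skipped because its first value is None
print(pearson([1, 2, 3, None], [2, 4, 6, 8]))  # 1.0
```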

:param col1: column name for first column
:type col1: :class:`str`

:param col2: column name for second column
:type col2: :class:`str`

Definition at line 2091 of file table.py.

def Count (   self,
  col,
  ignore_nan = True 
)
Count the number of cells in column that are not equal to ''None''.

:param col: column name
:type col: :class:`str`

:param ignore_nan: ignore all *None* values
:type ignore_nan: :class:`bool`

Definition at line 2071 of file table.py.

def Extend (   self,
  tab,
  overwrite = None 
)
Append each row of *tab* to the current table. The data is appended based
on the column names, thus the order of the table columns is *not* relevant,
only the header names.

If there is a column in *tab* that is not present in the current table,
it is added to the current table and filled with *None* for all the rows
present in the current table.

If the type of any column in *tab* is not the same as in the current table
a *TypeError* is raised.

If *overwrite* is not None and set to an existing column name, the specified 
column in the current table is searched for the first occurrence of a value
matching the value of the column with the same name in the row being appended.
If a matching value is found, that row is overwritten with the row from *tab*.
If no matching row is found, a new row is appended to the table.

Definition at line 3004 of file table.py.

def Filter (   self,
  *args,
  **kwargs 
)
Returns a filtered table only containing rows matching all the predicates 
in *kwargs* and *args*. For example,

.. code-block:: python

  tab.Filter(town='Basel')

will return all the rows where the value of the column "town" is equal to 
"Basel". Several predicates may be combined, i.e.

.. code-block:: python

  tab.Filter(town='Basel', male=True)
  
will return the rows with "town" equal to "Basel" and "male" equal to true.
args are unary callables returning true if the row should be included in the
result and false if not.
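
How keyword predicates and callables combine can be sketched on a plain list of rows. In this illustration each row is a dict and each callable receives that dict. Both the helper ``filter_rows`` and this row representation are assumptions and may differ from what :meth:`Filter` actually passes to the callables.

```python
def filter_rows(rows, *args, **kwargs):
    """Keep the rows that satisfy every keyword and every callable predicate."""
    result = []
    for row in rows:
        matches_kwargs = all(row.get(col) == val for col, val in kwargs.items())
        matches_args = all(pred(row) for pred in args)
        if matches_kwargs and matches_args:
            result.append(row)
    return result

people = [{'town': 'Basel', 'male': True},
          {'town': 'Basel', 'male': False},
          {'town': 'Bern', 'male': True}]
print(len(filter_rows(people, town='Basel')))              # 2
print(len(filter_rows(people, town='Basel', male=True)))   # 1
print(len(filter_rows(people, lambda row: row['male'])))   # 2
```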

Definition at line 789 of file table.py.

def GaussianSmooth (   self,
  col,
  std = 1.0,
  na_value = 0.0,
  padding = 'reflect',
  c = 0.0 
)
In-place Gaussian smoothing of a column in the table with a given standard deviation.
All NA (None) values are set to *na_value* before smoothing.

:param col: column name
:type col: :class:`str`

:param std: standard deviation for gaussian kernel
:type std: `scalar` 

:param na_value: all NA (None) values of the specified column are set to na_value before smoothing
:type na_value: `scalar`

:param padding: padding mode used at the borders; see the scipy ndimage.gaussian_filter1d documentation for more information. The default is 'reflect'
:type padding: :class:`str`

:param c: constant value used for padding if padding mode is constant
:type c: `scalar`



:warning: The function depends on *scipy*

Definition at line 2332 of file table.py.

def GetColIndex (   self,
  col 
)
Returns the column index for the column with the given name.

:raises: ValueError if no column with the name is found.

Definition at line 369 of file table.py.

def GetColNames (   self)
Returns a list containing all column names.

Definition at line 379 of file table.py.

def GetName (   self)
Get name of table

Definition at line 323 of file table.py.

def GetNumpyMatrix (   self,
  *args 
)
Returns a numpy matrix containing the selected columns from the table as 
columns in the matrix.

Only columns of type *int* or *float* are supported. *NA* values in the
table will be converted to *None* values.

:param \*args: column names to include in numpy matrix

:warning: The function depends on *numpy*

Definition at line 2298 of file table.py.

def GetOptimalPrefactors (   self,
  ref_col,
  *args,
  **kwargs 
)
This returns the optimal prefactor values (i.e. a, b, c, ...) for the
following equation

.. math::
  :label: op1
  
  a*u + b*v + c*w + ... = z

where u, v, w and z are vectors. In matrix notation

.. math::
  :label: op2
  
  A*p = z

where A contains the data from the table (u,v,w,...), p are the prefactors 
to optimize (a,b,c,...) and z is the vector containing the result of
equation :eq:`op1`.

The parameter ref_col corresponds to z in both equations, and \*args are the
columns u, v and w (or A in :eq:`op2`). All columns must be specified by their names.

**Example:**

.. code-block:: python

  tab.GetOptimalPrefactors('colC', 'colA', 'colB')

The function returns a list containing the prefactors a, b, c, ... in 
the correct order (i.e. same as columns were specified in \*args).

Weighting:
If the kwarg weights="columnX" is specified, the equations are weighted by
the values in that column. Each row is multiplied by the weight in that row,
which leads to :eq:`op3`:

.. math::
  :label: op3
  
  weight*a*u + weight*b*v + weight*c*w + ... = weight*z

Weights must be float or int and can have any value. A value of 0 ignores
this equation, a value of 1 means the same as no weight. If all weights are
the same for each row, the same result will be obtained as with no weights.
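
The statements about weights can be checked on the closed-form solution. The following is a sketch under the assumption, not stated above, that "optimal" means optimal in the least-squares sense. W is a symbol introduced here for the diagonal matrix holding the weight of each row. The first line is the unweighted solution of :eq:`op2`, the second the solution after each row has been multiplied by its weight:

.. math::

  p = (A^T A)^{-1} A^T z

  W A p = W z \quad\Rightarrow\quad p = (A^T W^2 A)^{-1} A^T W^2 z

If all weights are equal, W = c I with c \neq 0, the factor c^2 cancels and the second line reduces to the first. A weight of 0 turns the corresponding row of both W A and W z into zeros, so that equation no longer contributes.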

**Example:**

.. code-block:: python

  tab.GetOptimalPrefactors('colC', 'colA', 'colB', weights='colD')

Definition at line 2388 of file table.py.

def GetUnique (   self,
  col,
  ignore_nan = True 
)
Extract a list of all unique values from one column.

:param col: column name
:type col: :class:`str`

:param ignore_nan: ignore all *None* values
:type ignore_nan: :class:`bool`

Definition at line 1046 of file table.py.

def HasCol (   self,
  col 
)
Checks if the column with a given name is present in the table.

Definition at line 401 of file table.py.

def IsEmpty (   self,
  col_name = None,
  ignore_nan = True 
)
Checks if a table is empty.

If no column name is specified, the whole table is checked for being empty,
whereas if a column name is specified, only this column is checked.

By default, all NAN (or None) values are ignored, and thus, a table
containing only NAN values is considered as empty. By specifying the 
option ignore_nan=False, NAN values are counted as 'normal' values.

Definition at line 2967 of file table.py.

def Load (   stream_or_filename,
  format = 'auto',
  sep = ',' 
)
static
Load table from stream or file with given name.

By default, the file format is set to *auto*, which tries to guess the file
format from the file extension. The following file extensions are
recognized:

============    ======================
extension       recognized format
============    ======================
.csv            comma separated values
.pickle         pickled byte stream
<all others>    ost-specific format
============    ======================

Thus, *format* must be specified when reading a file whose filename
extension does not match its format.

The following file formats are understood:

- ost

  This is an ost-specific, but still human readable file format. The file
  (stream) must start with a header line of the form

    col_name1[type1] <col_name2[type2]>...

  The types given in brackets must be one of the data types the
  :class:`Table` class understands. Each following line in the file must
  then contain exactly the same number of data items as listed in the header.
  The data items are automatically converted to the column format. Lines
  starting with a '#' and empty lines are ignored.

- pickle

  Deserializes the table from a pickled byte stream.

- csv

  Reads the table from comma separated values stream. Since there is no
  explicit type information in the csv file, the column types are guessed,
  using the following simple rules:

  * if all values are either NA/NULL/NONE the type is set to string.
  * if all non-null values are convertible to float/int the type is set to
    float/int.
  * if all non-null values are true/false/yes/no, the type is set to bool.
  * for all other cases, the column type is set to string.
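
The guessing rules listed above can be sketched as a small function operating on the raw string values of one column. This is an illustration of the rules as stated, not the actual parser: ``guess_col_type`` is a hypothetical helper, and both the case-insensitive matching and the choice to test int before float are assumptions.

```python
NULL_TOKENS = ('NA', 'NULL', 'NONE')
BOOL_TOKENS = ('TRUE', 'FALSE', 'YES', 'NO')

def _all_convertible(values, converter):
    """True if converter() accepts every value."""
    try:
        for value in values:
            converter(value)
    except ValueError:
        return False
    return True

def guess_col_type(values):
    """Guess the column type from a list of raw csv strings."""
    non_null = [v for v in values if v.upper() not in NULL_TOKENS]
    if not non_null:
        return 'string'          # only NA/NULL/NONE
    if _all_convertible(non_null, int):
        return 'int'
    if _all_convertible(non_null, float):
        return 'float'
    if all(v.upper() in BOOL_TOKENS for v in non_null):
        return 'bool'
    return 'string'              # all other cases

print(guess_col_type(['1', '2', 'NA']))      # int
print(guess_col_type(['1.5', '2']))          # float
print(guess_col_type(['yes', 'no', 'NULL'])) # bool
```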

:returns: A new :class:`Table` instance

Definition at line 964 of file table.py.

def Max (   self,
  col 
)
Returns the maximum value in col. If several rows have the highest value,
only the first one is returned. ''None'' values are ignored.

:param col: column name
:type col: :class:`str`

Definition at line 1780 of file table.py.

def MaxIdx (   self,
  col 
)
Returns the row index of the cell with the maximal value in col. If
several rows have the highest value, only the first one is returned.
''None'' values are ignored.
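
The tie-breaking and None handling can be sketched in a few lines of plain Python operating on the values of one column. The function name ``max_idx`` is hypothetical, and returning None when the column holds no values is an assumption of the sketch.

```python
def max_idx(values):
    """Index of the first maximal value, ignoring None entries."""
    best = None
    for i, value in enumerate(values):
        if value is None:
            continue
        # strict comparison keeps the first of several equal maxima
        if best is None or value > values[best]:
            best = i
    return best

print(max_idx([1, None, 5, 3, 5]))  # 2
```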

:param col: column name
:type col: :class:`str`

Definition at line 1791 of file table.py.

def MaxRow (   self,
  col 
)
Returns the row containing the cell with the maximal value in col. If 
several rows have the highest value, only the first one is returned.
''None'' values are ignored.

:param col: column name
:type col: :class:`str`

:returns: row with maximal col value or None if the table is empty

Definition at line 1765 of file table.py.

def Mean (   self,
  col 
)
Returns the mean of the given column. Cells with ''None'' are ignored. Returns 
None, if the column doesn't contain any elements. Col must be of numeric
('float', 'int') or boolean column type.

If column type is *bool*, the function returns the ratio of
number of 'Trues' by total number of elements.
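
The bool case follows from the numeric one because True counts as 1 and False as 0 when summed. A minimal sketch on the values of one column, with a hypothetical helper ``column_mean`` and with None cells excluded from the element count as described above:

```python
def column_mean(values, col_type):
    """Mean of a column given as a list; None cells are ignored."""
    if col_type == 'string':
        raise TypeError('mean is not defined for string columns')
    present = [v for v in values if v is not None]
    if not present:
        return None
    # for bool columns the sum is the number of True values
    return sum(present) / float(len(present))

print(column_mean([1, 2, None, 3], 'int'))           # 2.0
print(column_mean([True, False, True, True], 'bool')) # 0.75
```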

:param col: column name
:type col: :class:`str`

:raises: :class:`TypeError` if column type is ``string``

Definition at line 1880 of file table.py.

def Median (   self,
  col 
)
Returns the median of the given column. Cells with ''None'' are ignored. Returns 
''None'', if the column doesn't contain any elements. Col must be of numeric
column type ('float', 'int') or boolean column type.

:param col: column name
:type col: :class:`str`

:raises: :class:`TypeError` if column type is ``string``

Definition at line 2020 of file table.py.

def Min (   self,
  col 
)
Returns the minimal value in col. If several rows have the lowest value,
only the first one is returned. ''None'' values are ignored.

:param col: column name
:type col: :class:`str`

Definition at line 1821 of file table.py.

def MinIdx (   self,
  col 
)
Returns the row index of the cell with the minimal value in col. If
several rows have the lowest value, only the first one is returned.
''None'' values are ignored.

:param col: column name
:type col: :class:`str`

Definition at line 1847 of file table.py.

def MinRow (   self,
  col 
)
Returns the row containing the cell with the minimal value in col. If 
several rows have the lowest value, only the first one is returned.
''None'' values are ignored.

:param col: column name
:type col: :class:`str`

:returns: row with minimal col value or None if the table is empty

Definition at line 1832 of file table.py.

def PairedTTest (   self,
  col_a,
  col_b 
)
Two-sided test for the null-hypothesis that two related samples 
have the same average (expected values).

:param col_a: First column
:param col_b: Second column

:returns: P-value between 0 and 1 for the null-hypothesis that the two
   columns have the same average. The smaller the value, the stronger
   the evidence that the averages differ.

Definition at line 565 of file table.py.

def Percentiles (   self,
  col,
  nths 
)
Returns the percentiles of column *col* given in *nths*.

The percentiles are calculated as 

.. code-block:: python

  values[min(len(values), int(round(len(values)*p/100+0.5)-1))]

where values are the sorted values of *col* not equal to ''None''.

:param nths: list of percentiles to be calculated. Each percentile is a number
    between 0 and 100.

:raises: :class:`TypeError` if column type is ``string``
:returns: List of percentiles in the same order as given in *nths*
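
The index formula can be tried out on a plain list. The sketch below makes two assumptions that go beyond the formula as printed: halves are rounded up (as Python 2's ``round`` does for positive numbers), and the index is clamped to the last valid position ``len(values) - 1`` so that the 100th percentile stays in range. The function name ``percentiles`` is hypothetical, and at least one value other than None is expected.

```python
import math

def percentiles(col_values, nths):
    """Percentiles of a column given as a list, ignoring None entries."""
    values = sorted(v for v in col_values if v is not None)
    n = len(values)
    result = []
    for p in nths:
        # round-half-up of (n * p / 100 + 0.5), written with floor
        rounded = int(math.floor(n * p / 100.0 + 0.5 + 0.5))
        # assumption: clamp to the last valid index
        idx = min(n - 1, rounded - 1)
        result.append(values[idx])
    return result

print(percentiles([5, 1, None, 3, 2, 4], [0, 25, 50, 100]))  # [1, 2, 3, 5]
```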

Definition at line 1981 of file table.py.

def Plot (   self,
  x,
  y = None,
  z = None,
  style = '.',
  x_title = None,
  y_title = None,
  z_title = None,
  x_range = None,
  y_range = None,
  z_range = None,
  color = None,
  plot_if = None,
  legend = None,
  num_z_levels = 10,
  z_contour = True,
  z_interpol = 'nn',
  diag_line = False,
  labels = None,
  max_num_labels = None,
  title = None,
  clear = True,
  save = False,
  **kwargs 
)
Function to plot values from your table in 1, 2 or 3 dimensions using
`Matplotlib <http://matplotlib.sourceforge.net>`__

:param x: column name for first dimension
:type x: :class:`str`

:param y: column name for second dimension
:type y: :class:`str`

:param z: column name for third dimension
:type z: :class:`str`

:param style: symbol style (e.g. *.*, *-*, *x*, *o*, *+*, *\**). For a
          complete list check (`matplotlib docu <http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot>`__).
:type style: :class:`str`

:param x_title: title for first dimension, if not specified it is
            automatically derived from column name
:type x_title: :class:`str`

:param y_title: title for second dimension, if not specified it is
            automatically derived from column name
:type y_title: :class:`str`

:param z_title: title for third dimension, if not specified it is
            automatically derived from column name
:type z_title: :class:`str`

:param x_range: start and end value for first dimension (e.g. [start_x, end_x])
:type x_range: :class:`list` of length two

:param y_range: start and end value for second dimension (e.g. [start_y, end_y])
:type y_range: :class:`list` of length two

:param z_range: start and end value for third dimension (e.g. [start_z, end_z])
:type z_range: :class:`list` of length two

:param color: color for data (e.g. *b*, *g*, *r*). For a complete list check
          (`matplotlib docu <http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot>`__).
:type color: :class:`str`

:param plot_if: callable which returns *True* if row should be plotted. Is
            invoked like ``plot_if(self, row)``
:type plot_if: callable

:param legend: legend label for data series
:type legend: :class:`str`

:param num_z_levels: number of levels for third dimension
:type num_z_levels: :class:`int`

:param diag_line: draw diagonal line
:type diag_line: :class:`bool`

:param labels: column name containing labels to put on x-axis for one
           dimensional plot
:type labels: :class:`str`

:param max_num_labels: limit maximum number of labels
:type max_num_labels: :class:`int`

:param title: plot title, if not specified it is automatically derived from
          plotted column names
:type title: :class:`str`

:param clear: clear old data from plot
:type clear: :class:`bool`

:param save: filename for saving plot
:type save: :class:`str`

:param z_contour: draw contour lines
:type z_contour: :class:`bool`

:param z_interpol: interpolation method for 3-dimensional plot (one of 'nn',
               'linear')
:type z_interpol: :class:`str`

:param \*\*kwargs: additional arguments passed to matplotlib

:returns: the ``matplotlib.pyplot`` module 

**Examples:** simple plotting functions

.. code-block:: python

  import math

  tab = Table(['a','b','c','d'],'iffi', a=range(5,0,-1),
                                    b=[x/2.0 for x in range(1,6)],
                                    c=[math.cos(x) for x in range(0,5)],
                                    d=range(3,8))

  # one dimensional plot of column 'd' vs. index
  plt = tab.Plot('d')
  plt.show()

  # two dimensional plot of 'a' vs. 'c'
  plt = tab.Plot('a', y='c', style='o-')
  plt.show()

  # three dimensional plot of 'a' vs. 'c' with values 'b'
  plt = tab.Plot('a', y='c', z='b')
  # manually save plot to file
  plt.savefig("plot.png")

Definition at line 1092 of file table.py.

def PlotBar (   self,
  cols = None,
  rows = None,
  xlabels = None,
  set_xlabels = True,
  xlabels_rotation = 'horizontal',
  y_title = None,
  title = None,
  colors = None,
  width = 0.8,
  bottom = 0,
  legend = False,
  legend_names = None,
  show = False,
  save = False 
)
Create a barplot of the data in cols. Every column will be represented
at one position. If there are several rows, each column will be grouped 
together.

:param cols: List of column names. Every column will be represented as a 
         single bar. If cols is None, every column of the table gets 
         plotted.
:type cols: :class:`list`

:param rows: List of row indices. Values from given rows will be plotted 
         in parallel at one column position. If set to None, all rows 
         of the table will be plotted. Note, that the maximum number 
         of rows is 7.
:type rows: :class:`list`

:param xlabels: Label for every col on x-axis. If set to None, the column 
            names are used. The xlabel plotting can be suppressed by 
            the parameter set_xlabels.
:type xlabels: :class:`list`

:param set_xlabels: Controls whether xlabels are plotted or not.
:type set_xlabels: :class:`bool`

:param xlabels_rotation: Can either be 'horizontal', 'vertical' or an 
                      integer that describes the rotation in degrees.

:param y_title: Y-axis description
:type y_title: :class:`str`

:param title: Title of the plot. No title appears if set to None
:type title: :class:`str`

:param colors: Colors of the different bars in each group. Must be a list 
           of valid colors in matplotlib. Length of color and rows must 
           be consistent.
:type colors: :class:`list`

:param width: The available space for the groups on the x-axis is divided 
          by the exact number of groups. The parameters width is the 
          fraction of what is actually used. If it would be 1.0 the 
          bars of the different groups would touch each other.
          Value must be in the range [0, 1].
:type width: :class:`float`

:param bottom: Bottom
:type bottom: :class:`float`

:param legend: Legend for color explanation, the corresponding row 
           respectively. If set to True, legend_names must be provided.
:type legend: :class:`bool`

:param legend_names: List of names, that describe the differently colored 
                 bars. Length must be consistent with number of rows.

:param show: If set to True, the plot is directly displayed.

:param save: If set, a png image with name save in the current working 
         directory will be saved.
:type save: :class:`str`

Definition at line 1474 of file table.py.

def PlotEnrichment (   self,
  score_col,
  class_col,
  score_dir = '-',
  class_dir = '-',
  class_cutoff = 2.0,
  style = '-',
  title = None,
  x_title = None,
  y_title = None,
  clear = True,
  save = None 
)
Plot an enrichment curve using matplotlib of column *score_col* classified
according to *class_col*.

For more information about parameters of the enrichment, see
:meth:`ComputeEnrichment`, and for plotting see :meth:`Plot`.

:warning: The function depends on *matplotlib*

Definition at line 2470 of file table.py.

def PlotHexbin (   self,
  x,
  y,
  title = None,
  x_title = None,
  y_title = None,
  x_range = None,
  y_range = None,
  binning = 'log',
  colormap = 'jet',
  show_scalebar = False,
  scalebar_label = None,
  clear = True,
  save = False,
  show = False 
)
Create a heatplot of the data in col x vs the data in col y using matplotlib

:param x: column name with x data
:type x: :class:`str`

:param y: column name with y data
:type y: :class:`str`

:param title: title of the plot, will be generated automatically if set to None
:type title: :class:`str`

:param x_title: label of x-axis, will be generated automatically if set to None
:type x_title: :class:`str`

:param y_title: label of y-axis, will be generated automatically if set to None
:type y_title: :class:`str`

:param x_range: start and end value for first dimension (e.g. [start_x, end_x])
:type x_range: :class:`list` of length two

:param y_range: start and end value for second dimension (e.g. [start_y, end_y])
:type y_range: :class:`list` of length two

:param binning: type of binning. If set to None, the value of a hexbin
            corresponds to the number of data points falling into it. If
            set to 'log', the value is the base-10 logarithm of that count
            plus one (log10(i+1)). If an integer is provided, the value of a
            hexbin equals the number of data points falling into it divided
            by that integer. If a list of values is provided, these values
            are used as the lower bounds of the bins.

:param colormap: colormap to use. Any colormap defined in matplotlib or a
             user-defined colormap is accepted. Pass either the name of a
             matplotlib colormap as a string or a colormap object.

:param show_scalebar: If set to True, a scalebar according to the chosen colormap is shown
:type show_scalebar: :class:`bool`

:param scalebar_label: Label of the scalebar
:type scalebar_label: :class:`str`

:param clear: clear old data from plot
:type clear: :class:`bool`

:param save: filename for saving plot
:type save: :class:`str`

:param show: directly show plot
:type show: :class:`bool`
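
The *binning* modes described above can be illustrated with a small standalone sketch. It does not call the Table class or matplotlib; the helper name ``hexbin_value`` is hypothetical, and the list-of-lower-bounds mode is left out because its exact handling is not specified here.

```python
import math


def hexbin_value(count, binning='log'):
    """Map a raw per-bin count to the displayed value (illustrative only)."""
    if binning is None:
        # plain count of data points in the hexbin
        return float(count)
    if binning == 'log':
        # base-10 logarithm of (count + 1), so empty bins map to 0
        return math.log10(count + 1)
    if isinstance(binning, int):
        # count divided by the given integer
        return count / float(binning)
    raise ValueError("binning mode not covered by this sketch: %r" % (binning,))


values = [hexbin_value(9, 'log'), hexbin_value(10, 5), hexbin_value(7, None)]
```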

Definition at line 1636 of file table.py.

def PlotHistogram (   self,
  col,
  x_range = None,
  num_bins = 10,
  normed = False,
  histtype = 'stepfilled',
  align = 'mid',
  x_title = None,
  y_title = None,
  title = None,
  clear = True,
  save = False,
  color = None,
  y_range = None 
)
Create a histogram of the data in col for the range *x_range*, split into
*num_bins* bins and plot it using Matplotlib.

:param col: column name with data
:type col: :class:`str`

:param x_range: start and end value for first dimension (e.g. [start_x, end_x])
:type x_range: :class:`list` of length two

:param y_range: start and end value for second dimension (e.g. [start_y, end_y])
:type y_range: :class:`list` of length two

:param num_bins: number of bins in range
:type num_bins: :class:`int`

:param color: Color to be used for the histogram. If not set, color will be
          determined by matplotlib
:type color: :class:`str`

:param normed: normalize histogram
:type normed: :class:`bool`

:param histtype: type of histogram (i.e. *bar*, *barstacked*, *step*,
             *stepfilled*). See (`matplotlib docu <http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.hist>`__).
:type histtype: :class:`str`

:param align: style of histogram (*left*, *mid*, *right*). See
          (`matplotlib docu <http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.hist>`__).
:type align: :class:`str`

:param x_title: title for first dimension, if not specified it is
            automatically derived from column name
:type x_title: :class:`str`

:param y_title: title for second dimension, if not specified it is
            automatically derived from column name
:type y_title: :class:`str`

:param title: plot title, if not specified it is automatically derived from
          plotted column names
:type title: :class:`str`

:param clear: clear old data from plot
:type clear: :class:`bool`

:param save: filename for saving plot
:type save: :class:`str`

**Examples:** simple plotting functions

.. code-block:: python

  import math
  from ost.table import Table

  tab = Table(['a'],'f', a=[math.cos(x*0.01) for x in range(100)])

  # histogram of the values in column 'a'
  plt = tab.PlotHistogram('a')
  plt.show()
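
To make the roles of *x_range* and *num_bins* concrete, the following standalone sketch counts values in equal-width bins. It is an assumption-laden illustration, not the plotting code itself: the helper name ``histogram_counts`` is hypothetical, values outside the range are dropped, and the upper edge is treated as inclusive.

```python
def histogram_counts(values, x_range, num_bins=10):
    """Count values per equal-width bin within x_range (illustrative only)."""
    lo, hi = x_range
    width = (hi - lo) / float(num_bins)
    counts = [0] * num_bins
    for v in values:
        # skip missing values and values outside the requested range
        if v is None or v < lo or v > hi:
            continue
        # a value equal to the upper edge is assigned to the last bin
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts


counts = histogram_counts([0.0, 0.5, 1.0, 2.0], [0.0, 1.0], num_bins=2)
```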

Definition at line 1346 of file table.py.

def PlotLogROC (   self,
  score_col,
  class_col,
  score_dir = '-',
  class_dir = '-',
  class_cutoff = 2.0,
  style = '-',
  title = None,
  x_title = None,
  y_title = None,
  clear = True,
  save = None 
)
Plot a logROC curve, where the x-axis is semilogarithmic, using matplotlib.

For more information about parameters of the ROC, see
:meth:`ComputeROC`, and for plotting see :meth:`Plot`.

:warning: The function depends on *matplotlib*

Definition at line 2837 of file table.py.

def PlotROC (   self,
  score_col,
  class_col,
  score_dir = '-',
  class_dir = '-',
  class_cutoff = 2.0,
  style = '-',
  title = None,
  x_title = None,
  y_title = None,
  clear = True,
  save = None 
)
Plot an ROC curve using matplotlib.

For more information about parameters of the ROC, see
:meth:`ComputeROC`, and for plotting see :meth:`Plot`.

:warning: The function depends on *matplotlib*

Definition at line 2787 of file table.py.

def RemoveCol (   self,
  col 
)
Remove column with the given name from the table.

:param col: name of column to remove
:type col: :class:`str`

Definition at line 686 of file table.py.

def RenameCol (   self,
  old_name,
  new_name 
)
Rename column *old_name* to *new_name*.

:param old_name: Name of the old column
:param new_name: Name of the new column
:raises: :exc:`ValueError` when *old_name* is not a valid column
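
The documented behaviour (rename in place, :exc:`ValueError` for an unknown column) can be sketched on a plain list of column names. The helper ``rename_col`` is hypothetical and only mirrors the contract stated above.

```python
def rename_col(col_names, old_name, new_name):
    """Rename old_name to new_name in a list of column names (illustrative)."""
    if old_name not in col_names:
        # mirrors the documented error for an invalid column name
        raise ValueError("unknown column: %s" % old_name)
    col_names[col_names.index(old_name)] = new_name


names = ['x', 'y']
rename_col(names, 'x', 'z')
```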

Definition at line 329 of file table.py.

def RowMean (   self,
  mean_col_name,
  cols 
)
Adds a new column of type 'float' with a specified name (*mean_col_name*),
containing the mean of all specified columns for each row.

Cols are specified by their names and must be of numeric column
type ('float', 'int') or boolean column type. Cells with None are ignored.
Adds ''None'' if the row doesn't contain any values.

:param mean_col_name: name of new column containing mean values
:type mean_col_name: :class:`str`

:param cols: name or list of names of columns to include in computation of
         mean
:type cols: :class:`str` or :class:`list` of strings

:raises: :class:`TypeError` if column type of columns in *cols* is ``string``

== Example ==
   
Starting with the following table:

==== ==== ====
x     y    u           
==== ==== ====
 1    10  100 
 2    15  None 
 3    20  400 
==== ==== ====

the code here adds a column with the name 'mean' to yield the table below:

.. code-block:: python

  tab.RowMean('mean', ['x', 'u'])


==== ==== ==== ===== 
x     y    u   mean           
==== ==== ==== =====
 1    10  100  50.5 
 2    15  None 2
 3    20  400  201.5 
==== ==== ==== =====
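
The per-row arithmetic behind this example can be reproduced with a standalone sketch that works on plain lists instead of a Table. The helper ``row_mean`` is hypothetical; it only illustrates that None cells are skipped and that a row without any values yields None.

```python
def row_mean(rows, col_names, cols):
    """Return the mean of the selected columns for each row (illustrative)."""
    if isinstance(cols, str):
        cols = [cols]
    idxs = [col_names.index(c) for c in cols]
    means = []
    for row in rows:
        # cells holding None are ignored
        vals = [row[i] for i in idxs if row[i] is not None]
        means.append(sum(vals) / float(len(vals)) if vals else None)
    return means


col_names = ['x', 'y', 'u']
rows = [[1, 10, 100], [2, 15, None], [3, 20, 400]]
means = row_mean(rows, col_names, ['x', 'u'])
```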

Definition at line 1908 of file table.py.

def Save (   self,
  stream_or_filename,
  format = 'ost',
  sep = ' 
)
Save the table to stream or filename. The following file formats
are supported (for more information on file formats, see :meth:`Load`):

=============   =======================================
ost             ost-specific format (human readable)
csv             comma separated values (human readable)
pickle          pickled byte stream (binary)
html            HTML table
context         ConTeXt table
=============   =======================================

:param stream_or_filename: filename or stream for writing output
:type stream_or_filename: :class:`str` or :class:`file`

:param format: output format (i.e. *ost*, *csv*, *pickle*, *html*, *context*)
:type format: :class:`str`

:raises: :class:`ValueError` if format is unknown

Definition at line 2156 of file table.py.

def SearchColNames (   self,
  regex 
)
Returns a list of column names matching the regex.

:param regex: regex pattern
:type regex: :class:`str`

:returns: :class:`list` of column names (:class:`str`)
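
A minimal sketch of regex-based column lookup, using only the standard ``re`` module. The helper ``search_col_names`` is hypothetical, and it assumes search semantics (the pattern may match anywhere in the name); anchor the pattern with ``^`` or ``$`` if a prefix or suffix match is wanted.

```python
import re


def search_col_names(col_names, regex):
    """Return the column names in which the pattern is found (illustrative)."""
    pattern = re.compile(regex)
    return [name for name in col_names if pattern.search(name)]


matches = search_col_names(['score_a', 'score_b', 'name'], '^score_')
```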

Definition at line 385 of file table.py.

def Select (   self,
  query 
)
Returns a new table object containing all rows matching a logical query expression.

*query* is a string containing the logical expression, that will be evaluated
for every row. 

Operands have to be the name of a column or an expression that can be parsed to 
float, int, bool or string.
Valid operators are: and, or, !=, !, <=, >=, ==, =, <, >, +, -, *, / 

.. code-block:: python

  subtab = tab.Select('col_a>0.5 and (col_b=5 or col_c=5)')

Allowed parentheses are (), [] and {}; mismatched parentheses are detected.
Chained comparisons such as '3<=col_a>=col_b' raise an error, because the
evaluation order is ambiguous.

There are two special expressions:

.. code-block:: python

  #selects rows, where 1.0<=col_a<=1.5
  subtab = tab.Select('col_a=1.0:1.5')

  #selects rows, where col_a=1 or col_a=2 or col_a=3
  subtab = tab.Select('col_a=1,2,3')

Only consistent types can be compared. If col_a is of type string and col_b is of type int,
the following expression raises an error: 'col_a<col_b'
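
The meaning of the two special expressions can be sketched without the query parser. The helper ``matches_special`` below is hypothetical and handles numeric values only; treating None as a non-match is an assumption of this sketch.

```python
def matches_special(value, spec):
    """Evaluate 'lo:hi' (inclusive range) or 'v1,v2,...' (membership)."""
    if value is None:
        return False
    if ':' in spec:
        # range form: lo <= value <= hi
        lo, hi = [float(part) for part in spec.split(':')]
        return lo <= value <= hi
    # list form: value equals one of the listed numbers
    return value in [float(part) for part in spec.split(',')]


col_a = [0.9, 1.0, 1.2, 1.5, 2.0, 3.0]
in_range = [v for v in col_a if matches_special(v, '1.0:1.5')]
in_list = [v for v in col_a if matches_special(v, '1,2,3')]
```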

Definition at line 825 of file table.py.

def SetName (   self,
  name 
)
Set name of the table

:param name: name
:type name: :class:`str`

Definition at line 314 of file table.py.

def Sort (   self,
  by,
  order = '+' 
)
Performs an in-place sort of the table, based on column *by*.

:param by: column name by which to sort
:type by: :class:`str`

:param order: ascending (``-``) or descending (``+``) order
:type order: :class:`str` (i.e. *+*, *-*)
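
The effect of *order* can be illustrated on a plain list of rows. The helper ``sort_rows`` is hypothetical and follows the convention stated above, where ``+`` (the default) sorts in descending order and ``-`` in ascending order; cells holding None are not handled in this sketch.

```python
def sort_rows(rows, col_names, by, order='+'):
    """Sort rows in place by column `by` (illustrative only)."""
    if order not in ('+', '-'):
        raise ValueError("order must be '+' or '-'")
    key_index = col_names.index(by)
    # '+' is descending and '-' is ascending, as documented above
    rows.sort(key=lambda row: row[key_index], reverse=(order == '+'))


rows = [[2, 'b'], [3, 'c'], [1, 'a']]
sort_rows(rows, ['n', 's'], 'n')
descending = [row[0] for row in rows]
sort_rows(rows, ['n', 's'], 'n', order='-')
ascending = [row[0] for row in rows]
```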

Definition at line 1028 of file table.py.

def SpearmanCorrel (   self,
  col1,
  col2 
)
Calculate the Spearman correlation coefficient between col1 and col2, only 
taking rows into account where both of the values are not equal to None. If 
there are not enough data points to calculate a correlation coefficient, 
None is returned.

:warning: The function depends on the following module: *scipy.stats.mstats*

:param col1: column name for first column
:type col1: :class:`str`

:param col2: column name for second column
:type col2: :class:`str`
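
For readers without scipy at hand, the computation can be sketched in plain Python: drop rows where either value is None, rank both columns (ties receive the average rank), and take the Pearson correlation of the ranks. The function names are hypothetical, and returning None for fewer than two pairs or for a constant column is an assumption about what "not enough data points" means.

```python
import math


def _average_ranks(values):
    """Rank values starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(col1, col2):
    """Spearman correlation over rows where both values are not None."""
    pairs = [(a, b) for a, b in zip(col1, col2)
             if a is not None and b is not None]
    if len(pairs) < 2:
        return None
    r1 = _average_ranks([p[0] for p in pairs])
    r2 = _average_ranks([p[1] for p in pairs])
    m1 = sum(r1) / len(r1)
    m2 = sum(r2) / len(r2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(r1, r2))
    var1 = sum((a - m1) ** 2 for a in r1)
    var2 = sum((b - m2) ** 2 for b in r2)
    if var1 == 0 or var2 == 0:
        return None
    return cov / math.sqrt(var1 * var2)


rho = spearman([1, 2, 3, None], [10, 20, 30, 5])
```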

Definition at line 2117 of file table.py.

def Stats (   self,
  col 
)

Definition at line 482 of file table.py.

def StdDev (   self,
  col 
)
Returns the standard deviation of the given column. Cells with ''None'' are
ignored. Returns ''None'', if the column doesn't contain any elements. Col must
be of numeric column type ('float', 'int') or boolean column type.

:param col: column name
:type col: :class:`str`

:raises: :class:`TypeError` if column type is ``string``
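
The handling of None cells and empty columns can be sketched as follows. The helper ``std_dev`` is hypothetical, and the population form (division by the number of values) is an assumption of this sketch; the library may use a different estimator.

```python
import math


def std_dev(values):
    """Standard deviation of the non-None values, or None if there are none."""
    vals = [float(v) for v in values if v is not None]
    if not vals:
        return None
    mean = sum(vals) / len(vals)
    # population standard deviation (assumption, see the note above)
    return math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))


result = std_dev([2, 4, 4, 4, 5, 5, 7, 9, None])
```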

Definition at line 2046 of file table.py.

def Sum (   self,
  col 
)
Returns the sum of the given column. Cells with ''None'' are ignored. Returns 
0.0, if the column doesn't contain any elements. Col must be of numeric
column type ('float', 'int') or boolean column type.

:param col: column name
:type col: :class:`str`

:raises: :class:`TypeError` if column type is ``string``

Definition at line 1859 of file table.py.

def ToString (   self,
  float_format = '%.3f',
  int_format = '%d',
  rows = None 
)
Convert the table into a string representation.

The output format can be modified for int and float type columns by
specifying a formatting string for the parameters *float_format* and
*int_format*.

The option *rows* specifies the range of rows to be printed. The parameter
must be a type that supports indexing (e.g. a :class:`list`) containing the
start and end row *index*, e.g. [start_row_idx, end_row_idx].

:param float_format: formatting string for float columns
:type float_format: :class:`str`

:param int_format: formatting string for int columns
:type int_format: :class:`str`

:param rows: iterable containing start and end row *index*
:type rows: iterable containing :class:`ints <int>`
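
How the format strings and the row range interact can be sketched on plain lists. The helper ``format_rows`` is hypothetical; the ``NA`` placeholder for None cells, the single-space separator and the exclusive end index are assumptions of this sketch rather than documented behaviour.

```python
def format_rows(rows, col_types, float_format='%.3f', int_format='%d',
                row_range=None):
    """Format rows as text, one line per row (illustrative only)."""
    if row_range is not None:
        # end index treated as exclusive here (assumption)
        rows = rows[row_range[0]:row_range[1]]
    lines = []
    for row in rows:
        cells = []
        for value, col_type in zip(row, col_types):
            if value is None:
                cells.append('NA')  # placeholder chosen for this sketch
            elif col_type == 'float':
                cells.append(float_format % value)
            elif col_type == 'int':
                cells.append(int_format % value)
            else:
                cells.append(str(value))
        lines.append(' '.join(cells))
    return '\n'.join(lines)


data = [[1, 0.5, 'a'], [2, None, 'b']]
text = format_rows(data, ['int', 'float', 'string'])
```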

Definition at line 422 of file table.py.

def Zip (   self,
  args 
)
Allows convenient iteration over a selection of columns, e.g.

.. code-block:: python

  tab = Table.Load('...')
  for col1, col2 in tab.Zip('col1', 'col2'):
    print col1, col2

is a shortcut for

.. code-block:: python

  tab = Table.Load('...')
  for col1, col2 in zip(tab['col1'], tab['col2']):
    print col1, col2
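
Since the table keeps its data row-wise (see the *rows* and *col_names* fields), the same iteration can be sketched on plain lists. The helper ``zip_columns`` is hypothetical and only illustrates picking the requested columns out of each row.

```python
def zip_columns(rows, col_names, *names):
    """Return one tuple per row holding the requested columns (illustrative)."""
    idxs = [col_names.index(n) for n in names]
    return [tuple(row[i] for i in idxs) for row in rows]


rows = [[1, 'a', True], [2, 'b', False]]
zipped = zip_columns(rows, ['n', 's', 'f'], 'n', 'f')
```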

Definition at line 1067 of file table.py.


Field Documentation

col_names

Definition at line 223 of file table.py.

col_types

Definition at line 227 of file table.py.

comment

Definition at line 224 of file table.py.

name

Definition at line 225 of file table.py.

rows

Definition at line 228 of file table.py.

tuple SUPPORTED_TYPES = ('int', 'float', 'bool', 'string',)
static

Definition at line 218 of file table.py.


The documentation for this class was generated from the following file: