OpenStructure
|
Public Member Functions | |
def | __init__ |
def | SetName |
def | GetName |
def | GetColIndex |
def | GetColNames |
def | HasCol |
def | __getitem__ |
def | __setitem__ |
def | ToString |
def | __str__ |
def | AddRow |
def | RemoveCol |
def | AddCol |
def | Filter |
def | Sort |
def | GetUnique |
def | Zip |
def | Plot |
def | PlotHistogram |
def | MaxRow |
def | Max |
def | MaxIdx |
def | Min |
def | MinRow |
def | MinIdx |
def | Sum |
def | Mean |
def | RowMean |
def | Median |
def | StdDev |
def | Count |
def | Correl |
def | SpearmanCorrel |
def | Save |
def | GetNumpyMatrix |
def | GetOptimalPrefactors |
def | PlotEnrichment |
def | ComputeEnrichment |
def | ComputeEnrichmentAUC |
def | ComputeROC |
def | ComputeROCAUC |
def | PlotROC |
def | ComputeMCC |
def | IsEmpty |
def | Extend |
Static Public Member Functions | |
def | Load |
Data Fields | |
col_names | |
comment | |
name | |
col_types | |
rows |
Static Public Attributes | |
tuple | SUPPORTED_TYPES = ('int', 'float', 'bool', 'string',) |
The table class provides convenient access to data in tabular form. An empty table can be constructed as follows:

.. code-block:: python

  tab=Table()

If you want to add columns directly when creating the table, column names and *column types* can be specified as follows:

.. code-block:: python

  tab=Table(['nameX','nameY','nameZ'], 'sfb')

This creates three columns called nameX, nameY and nameZ of type string, float and bool, respectively. The table contains no data and therefore no rows.

The following *column types* are supported:

======= ========
name    abbrev
======= ========
string  s
float   f
int     i
bool    b
======= ========

To also add data when creating the table, use the following:

.. code-block:: python

  tab=Table(['nameX','nameY','nameZ'],
            'sfb',
            nameX=['a','b','c'],
            nameY=[0.1, 1.2, 3.414],
            nameZ=[True, False, False])

If the values for a column are left out, the column is filled with NA. If values are specified for several columns, each of these columns must contain the same number of values.
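The abbreviation table above can be read as a simple lookup. The following is a minimal sketch, not the Table implementation: `expand_col_types` and `ABBREVIATIONS` are hypothetical names introduced only for illustration.

```python
# Hypothetical helper, not part of the Table API: expands a type string
# such as 'sfb' into the long type names listed in the table above.
ABBREVIATIONS = {'s': 'string', 'f': 'float', 'i': 'int', 'b': 'bool'}


def expand_col_types(short_types):
    """Return the long type name for every one-letter abbreviation."""
    long_types = []
    for letter in short_types:
        if letter not in ABBREVIATIONS:
            raise ValueError("unknown column type abbreviation: %r" % letter)
        long_types.append(ABBREVIATIONS[letter])
    return long_types


col_names = ['nameX', 'nameY', 'nameZ']
col_types = expand_col_types('sfb')  # one type per column name
```

Here `col_types` pairs up with `col_names` position by position, which mirrors how the constructor example assigns string, float and bool to nameX, nameY and nameZ.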
def __init__(self, col_names=None, col_types=None, **kwargs)
def AddCol(self, col_name, col_type, data=None)

Add a column to the right of the table.

:param col_name: name of new column
:type col_name: :class:`str`
:param col_type: type of new column (long versions: *int*, *float*, *bool*, *string* or short versions: *i*, *f*, *b*, *s*)
:type col_type: :class:`str`
:param data: data to add to new column.
:type data: scalar or iterable

**Example:**

.. code-block:: python

  import itertools

  tab=Table(['x'], 'f', x=range(5))
  tab.AddCol('even', 'bool', itertools.cycle([True, False]))
  print tab

  '''
  will produce the table

  ==== ====
  x    even
  ==== ====
  0    True
  1    False
  2    True
  3    False
  4    True
  ==== ====
  '''

If data is a constant instead of an iterable object, its value will be written into each row:

.. code-block:: python

  tab=Table(['x'], 'f', x=range(5))
  tab.AddCol('num', 'i', 1)
  print tab

  '''
  will produce the table

  ==== ====
  x    num
  ==== ====
  0    1
  1    1
  2    1
  3    1
  4    1
  ==== ====
  '''

.. warning::

  :meth:`AddCol` only adds data to existing rows and does *not* add new rows. Use :meth:`AddRow` to do this. Therefore, the following code snippet does not add any data items:

.. code-block:: python

  tab=Table()
  tab.AddCol('even', 'int', [1,2,3,4,5])
  print tab

  '''
  will produce the empty table

  ====
  even
  ====
  '''
def AddRow(self, data, overwrite=None)

Add a row to the table. *data* may either be a dictionary or a list-like object:

- If *data* is a dictionary, the keys in the dictionary must match the column names. Columns not found in the dict will be initialized to None. If the dict contains list-like objects, multiple rows will be added, provided the number of items in all list-like objects is the same; otherwise a :class:`ValueError` is raised.

- If *data* is a list-like object, the row is initialized from the values in *data*. The number of items in *data* must match the number of columns in the table. A :class:`ValueError` is raised otherwise. The values are added in the order specified in the list, so the order of the data must match the order of the columns.

If *overwrite* is not None and set to an existing column name, the specified column in the table is searched for the first occurrence of a value matching the value of the column with the same name in the dictionary. If a matching value is found, the row is overwritten with the dictionary. If no matching row is found, a new row is appended to the table.

:param data: data to add
:type data: :class:`dict` or *list-like* object
:param overwrite: column name to overwrite existing row if value in column *overwrite* matches
:type overwrite: :class:`str`
:raises: :class:`ValueError` if *list-like* object is used and number of items does *not* match number of columns in table.
:raises: :class:`ValueError` if *dict* is used and multiple rows are added but the number of data items is different for different columns.

**Example:** add multiple data rows to a subset of columns using a dictionary

.. code-block:: python

  # create table with three float columns
  tab = Table(['x','y','z'], 'fff')

  # add rows from dict
  data = {'x': [1.2, 1.6], 'z': [1.6, 5.3]}
  tab.AddRow(data)
  print tab

  '''
  will produce the table

  ==== ==== ====
  x    y    z
  ==== ==== ====
  1.20 NA   1.60
  1.60 NA   5.30
  ==== ==== ====
  '''

  # overwrite the row with x=1.2 and add row with x=1.9
  data = {'x': [1.2, 1.9], 'z': [7.9, 3.5]}
  tab.AddRow(data, overwrite='x')
  print tab

  '''
  will produce the table

  ==== ==== ====
  x    y    z
  ==== ==== ====
  1.20 NA   7.90
  1.60 NA   5.30
  1.90 NA   3.50
  ==== ==== ====
  '''
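The *overwrite* rule can be pictured on a plain list of dictionaries. This is a minimal sketch under the assumption that a matching row is replaced as a whole; `add_row` is a hypothetical stand-in and not the Table method itself.

```python
def add_row(rows, data, overwrite=None):
    """Append the dict *data* to *rows*, or replace the first row whose
    value in column *overwrite* equals data[overwrite]."""
    if overwrite is not None:
        for i, row in enumerate(rows):
            if row[overwrite] == data[overwrite]:
                rows[i] = data  # only the first matching row is replaced
                return
    rows.append(data)  # no match (or no overwrite column): append


rows = [{'x': 1.2, 'y': None, 'z': 1.6},
        {'x': 1.6, 'y': None, 'z': 5.3}]

# x=1.2 already exists, so that row is overwritten
add_row(rows, {'x': 1.2, 'y': None, 'z': 7.9}, overwrite='x')
# x=1.9 does not exist, so a new row is appended
add_row(rows, {'x': 1.9, 'y': None, 'z': 3.5}, overwrite='x')
```

After both calls the rows hold the same x and z values as the second table in the example above.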
def ComputeEnrichment(self, score_col, class_col, score_dir='-', class_dir='-', class_cutoff=2.0)

Computes the enrichment of column *score_col* classified according to *class_col*.

For this it is necessary that the datapoints are classified into positive and negative points. This can be done in two ways:

- by using one 'bool' type column (*class_col*) which contains *True* for positives and *False* for negatives
- by specifying a classification column (*class_col*), a cutoff value (*class_cutoff*) and the classification column's direction (*class_dir*). This will generate the classification on the fly

  * if ``class_dir=='-'``: values in the classification column that are less than or equal to *class_cutoff* will be counted as positives
  * if ``class_dir=='+'``: values in the classification column that are larger than or equal to *class_cutoff* will be counted as positives

During the calculation, the table will be sorted according to *score_dir*, where a '-' value means smallest values first and therefore, the smaller the value, the better.
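The classification and sorting rules above can be sketched in a few lines. This is an illustration only: `is_positive` and `enrichment_curve` are hypothetical helpers, and the choice of axes (fraction of data points considered against fraction of positives found) is an assumption rather than a statement about what :meth:`ComputeEnrichment` returns.

```python
def is_positive(value, class_dir='-', class_cutoff=2.0):
    """Classify a numeric value on the fly, following the rules above."""
    if class_dir == '-':
        return value <= class_cutoff
    if class_dir == '+':
        return value >= class_cutoff
    raise ValueError("class_dir must be '-' or '+'")


def enrichment_curve(scores, classes, score_dir='-',
                     class_dir='-', class_cutoff=2.0):
    """Return (x, y) lists, or None if there are no positives.

    Assumed axes: x is the fraction of data points considered so far,
    y is the fraction of all positives found so far.
    """
    # '-' sorts smallest scores first, '+' sorts largest scores first
    pairs = sorted(zip(scores, classes), key=lambda p: p[0],
                   reverse=(score_dir == '+'))
    found = 0
    x, y = [0.0], [0.0]
    for i, (_, class_val) in enumerate(pairs):
        if is_positive(class_val, class_dir, class_cutoff):
            found += 1
        x.append(i + 1)
        y.append(found)
    if found == 0:
        return None
    total = float(len(pairs))
    return [v / total for v in x], [v / float(found) for v in y]


scores = [0.1, 0.5, 0.9, 0.3]
rmsds = [1.0, 3.0, 4.0, 1.5]   # classification column, cutoff 2.0
curve = enrichment_curve(scores, rmsds)
```

In this toy data set the two best-scoring points are also the two positives, so the curve reaches 1.0 after half of the data points.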
def ComputeEnrichmentAUC(self, score_col, class_col, score_dir='-', class_dir='-', class_cutoff=2.0)
def ComputeMCC(self, score_col, class_col, score_dir='-', class_dir='-', score_cutoff=2.0, class_cutoff=2.0)

Compute Matthews correlation coefficient (MCC) for one column (*score_col*) with the points classified into true positives, false positives, true negatives and false negatives according to a specified classification column (*class_col*).

The datapoints in *score_col* and *class_col* are classified into positive and negative points. This can be done in two ways:

- by using 'bool' columns which contain True for positives and False for negatives
- by using 'float' or 'int' columns and specifying a cutoff value and the column's direction. This will generate the classification on the fly

  * if ``class_dir``/``score_dir=='-'``: values in the respective column that are less than or equal to *class_cutoff*/*score_cutoff* will be counted as positives
  * if ``class_dir``/``score_dir=='+'``: values in the respective column that are larger than or equal to *class_cutoff*/*score_cutoff* will be counted as positives

The two possibilities can be used together, i.e. 'bool' type for one column and 'float'/'int' type and cutoff/direction for the other column.
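For reference, the MCC is defined from the four counts as (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below applies that definition to two plain lists; `classify` and `mcc` are hypothetical helpers, and returning None for a zero denominator is an assumption, not documented behaviour of :meth:`ComputeMCC`.

```python
import math


def classify(values, direction='-', cutoff=2.0):
    """Turn a bool or numeric column into True/False/None labels."""
    labels = []
    for v in values:
        if v is None or isinstance(v, bool):
            labels.append(v)            # bool columns are used as-is
        elif direction == '-':
            labels.append(v <= cutoff)  # small values are positives
        else:
            labels.append(v >= cutoff)  # large values are positives
    return labels


def mcc(score_labels, class_labels):
    """Matthews correlation coefficient from two label lists."""
    tp = fp = tn = fn = 0
    for predicted, actual in zip(score_labels, class_labels):
        if predicted is None or actual is None:
            continue  # skip rows with missing values
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return None  # assumption: undefined MCC is reported as None
    return (tp * tn - fp * fn) / denom


# numeric score column with cutoff/direction, bool classification column
score_labels = classify([1.0, 3.0, 1.5, 4.0], direction='-', cutoff=2.0)
class_labels = classify([True, False, False, False])
result = mcc(score_labels, class_labels)
```

The example mixes both possibilities described above: the score column is classified with a cutoff while the classification column is already of type bool.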
def ComputeROC(self, score_col, class_col, score_dir='-', class_dir='-', class_cutoff=2.0)

Computes the receiver operating characteristics (ROC) of column *score_col* classified according to *class_col*.

For this it is necessary that the datapoints are classified into positive and negative points. This can be done in two ways:

- by using one 'bool' column (*class_col*) which contains True for positives and False for negatives
- by using a non-bool column (*class_col*), a cutoff value (*class_cutoff*) and the classification column's direction (*class_dir*). This will generate the classification on the fly

  - if ``class_dir=='-'``: values in the classification column that are less than or equal to *class_cutoff* will be counted as positives
  - if ``class_dir=='+'``: values in the classification column that are larger than or equal to *class_cutoff* will be counted as positives

During the calculation, the table will be sorted according to *score_dir*, where a '-' value means smallest values first and therefore, the smaller the value, the better.

If *class_col* does not contain any positives (i.e. no value is True for a column of type bool, or no value is classified as positive by *class_dir* and *class_cutoff* for a column of type int or float), the ROC is not defined and the function will return *None*.
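A ROC curve plots the true positive rate against the false positive rate while walking down the sorted score column. The following sketch shows the idea on plain lists; `roc_curve` and `trapezoid_auc` are hypothetical helpers, tied scores are not treated specially, and returning None when there are no negatives is an added assumption to avoid a division by zero.

```python
def roc_curve(scores, positives, score_dir='-'):
    """Return (fpr, tpr) lists, or None if the ROC is not defined."""
    n_pos = sum(1 for p in positives if p)
    n_neg = len(positives) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None
    # '-' sorts smallest scores first, '+' sorts largest scores first
    pairs = sorted(zip(scores, positives), key=lambda p: p[0],
                   reverse=(score_dir == '+'))
    tp = fp = 0
    fpr, tpr = [0.0], [0.0]
    for _, is_pos in pairs:
        if is_pos:
            tp += 1
        else:
            fp += 1
        fpr.append(fp / float(n_neg))
        tpr.append(tp / float(n_pos))
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the curve using the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(fpr)):
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
    return area


fpr, tpr = roc_curve([0.1, 0.3, 0.5, 0.9], [True, False, True, False])
area = trapezoid_auc(fpr, tpr)
```

An area of 1.0 would mean that every positive is ranked ahead of every negative; 0.5 corresponds to a random ranking.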
def ComputeROCAUC(self, score_col, class_col, score_dir='-', class_dir='-', class_cutoff=2.0)
def Correl(self, col1, col2)

Calculate the Pearson correlation coefficient between *col1* and *col2*, only taking rows into account where both of the values are not equal to *None*. If there are not enough data points to calculate a correlation coefficient, *None* is returned.

:param col1: column name for first column
:type col1: :class:`str`
:param col2: column name for second column
:type col2: :class:`str`
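The row selection described above (use a row only if both values are present) can be sketched as follows. `pearson` is a hypothetical helper working on plain lists; treating fewer than two complete rows or a constant column as "not enough data points" is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation over rows where both values are not None."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None  # assumption: too few complete rows
    mean_x = sum(x for x, _ in pairs) / float(n)
    mean_y = sum(y for _, y in pairs) / float(n)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None  # assumption: a constant column has no correlation
    return sxy / math.sqrt(sxx * syy)


# the third row is skipped because its first value is None
r = pearson([1, 2, None, 4], [2, 4, 6, 8])
```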
def Count(self, col, ignore_nan=True)
def Extend(self, tab, overwrite=None)

Append each row of *tab* to the current table. The data is appended based on the column names, thus the order of the table columns is *not* relevant, only the header names.

If there is a column in *tab* that is not present in the current table, it is added to the current table and filled with *None* for all the rows present in the current table.

If the type of any column in *tab* is not the same as in the current table, a *TypeError* is raised.

If *overwrite* is not None and set to an existing column name, the specified column in the current table is searched for the first occurrence of a value matching the value of the column with the same name in the row of *tab* being appended. If a matching value is found, the row is overwritten with the row from *tab*. If no matching row is found, a new row is appended to the table.
def Filter(self, *args, **kwargs)

Returns a filtered table only containing rows matching all the predicates in kwargs and args. For example,

.. code-block:: python

  tab.Filter(town='Basel')

will return all the rows where the value of the column "town" is equal to "Basel". Several predicates may be combined, i.e.

.. code-block:: python

  tab.Filter(town='Basel', male=True)

will return the rows with "town" equal to "Basel" and "male" equal to true. args are unary callables returning true if the row should be included in the result and false if not.
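The way keyword predicates and callables combine can be sketched on a list of dictionaries. `filter_rows` is a hypothetical stand-in; in particular, passing each row to the callable as a dictionary is an assumption made for this illustration and may differ from what :meth:`Filter` passes.

```python
def filter_rows(rows, *args, **kwargs):
    """Keep the rows that satisfy every keyword and every callable."""
    result = []
    for row in rows:
        # keyword predicates: column value must equal the given value
        kwargs_ok = all(row.get(col) == val for col, val in kwargs.items())
        # positional predicates: unary callables deciding on the row
        args_ok = all(pred(row) for pred in args)
        if kwargs_ok and args_ok:
            result.append(row)
    return result


rows = [{'town': 'Basel', 'male': True, 'age': 30},
        {'town': 'Basel', 'male': False, 'age': 41},
        {'town': 'Zurich', 'male': True, 'age': 25}]

in_basel = filter_rows(rows, town='Basel')
older_men_in_basel = filter_rows(rows, lambda r: r['age'] > 28,
                                 town='Basel', male=True)
```

All predicates are combined with a logical AND, so adding a predicate can only shrink the result.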
def GetColIndex(self, col)

def GetColNames(self)
def GetNumpyMatrix(self, *args)

Returns a numpy matrix containing the selected columns from the table as columns in the matrix. Only columns of type *int* or *float* are supported. *NA* values in the table will be converted to *None* values.

:param \*args: column names to include in numpy matrix

:warning: The function depends on *numpy*
def GetOptimalPrefactors(self, ref_col, *args, **kwargs)

This returns the optimal prefactor values (i.e. a, b, c, ...) for the following equation

.. math::
  :label: op1

  a*u + b*v + c*w + ... = z

where u, v, w and z are vectors. In matrix notation

.. math::
  :label: op2

  A*p = z

where A contains the data from the table (u,v,w,...), p are the prefactors to optimize (a,b,c,...) and z is the vector containing the result of equation :eq:`op1`.

The parameter ref_col corresponds to z in both equations, and \*args are columns u, v and w (or A in :eq:`op2`). All columns must be specified by their names.

**Example:**

.. code-block:: python

  tab.GetOptimalPrefactors('colC', 'colA', 'colB')

The function returns a list containing the prefactors a, b, c, ... in the correct order (i.e. the same order in which the columns were specified in \*args).

Weighting: If the kwarg weights="columX" is specified, the equations are weighted by the values in that column. Each row is multiplied by the weight in that row, which leads to :eq:`op3`:

.. math::
  :label: op3

  weight*a*u + weight*b*v + weight*c*w + ... = weight*z

Weights must be float or int and can have any value. A value of 0 ignores this equation, a value of 1 means the same as no weight. If all weights are the same for each row, the same result will be obtained as with no weights.

**Example:**

.. code-block:: python

  tab.GetOptimalPrefactors('colC', 'colA', 'colB', weights='colD')
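One standard way to solve such an overdetermined system is least squares via the normal equations, (A^T W^2 A) p = A^T W^2 z, where W is the diagonal matrix of row weights. The sketch below follows that approach with the standard library only. It assumes a least-squares interpretation of "optimal" and uses the hypothetical helpers `solve_linear` and `optimal_prefactors`; it is not taken from the Table source.

```python
def solve_linear(m, b):
    """Solve m * p = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the matrix is singular.
    """
    n = len(b)
    aug = [[float(v) for v in m[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        # move the row with the largest pivot candidate into place
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # back substitution
    p = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * p[j] for j in range(i + 1, n))
        p[i] = (aug[i][n] - s) / aug[i][i]
    return p


def optimal_prefactors(z, columns, weights=None):
    """Least-squares prefactors for sum_k p[k] * columns[k] = z."""
    n_rows = len(z)
    if weights is None:
        weights = [1.0] * n_rows
    w2 = [float(w) ** 2 for w in weights]  # each row is scaled by its weight
    k = len(columns)
    ata = [[sum(w2[r] * columns[i][r] * columns[j][r] for r in range(n_rows))
            for j in range(k)] for i in range(k)]
    atz = [sum(w2[r] * columns[i][r] * z[r] for r in range(n_rows))
           for i in range(k)]
    return solve_linear(ata, atz)


u = [1, 0, 1, 2]
v = [0, 1, 1, 1]
z = [2 * a + 3 * b for a, b in zip(u, v)]  # exact relation z = 2u + 3v
prefactors = optimal_prefactors(z, [u, v])
same_with_equal_weights = optimal_prefactors(z, [u, v], weights=[2, 2, 2, 2])
```

Because every row is multiplied by its weight before solving, a weight of 0 drops the row from the sums, and equal weights cancel out, which matches the behaviour described above.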
def GetUnique(self, col, ignore_nan=True)

def HasCol(self, col)
def IsEmpty(self, col_name=None, ignore_nan=True)

Checks if a table is empty.

If no column name is specified, the whole table is checked for being empty, whereas if a column name is specified, only this column is checked.

By default, all NAN (or None) values are ignored, and thus, a table containing only NAN values is considered as empty. By specifying the option ignore_nan=False, NAN values are counted as 'normal' values.
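The effect of *ignore_nan* on a single column can be sketched as follows. `column_is_empty` is a hypothetical helper on a plain list, and treating both None and float NaN as missing is an assumption based on the wording "NAN (or None)".

```python
import math


def _is_missing(value):
    """True for None and for float NaN (assumed meaning of 'NAN')."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def column_is_empty(values, ignore_nan=True):
    """Check whether a column holds any values.

    With ignore_nan=True, missing values do not count as content.
    With ignore_nan=False, they count as 'normal' values.
    """
    if ignore_nan:
        return all(_is_missing(v) for v in values)
    return len(values) == 0


only_missing = [None, float('nan')]
empty_by_default = column_is_empty(only_missing)
empty_when_counting_nan = column_is_empty(only_missing, ignore_nan=False)
```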
|
static |
Load table from stream or file with given name.

By default, the file format is set to *auto*, which tries to guess the file format from the file extension. The following file extensions are recognized:

============ ======================
extension    recognized format
============ ======================
.csv         comma separated values
.pickle      pickled byte stream
<all others> ost-specific format
============ ======================

Thus, *format* must be specified when reading files with other filename extensions.

The following file formats are understood:

- ost

  This is an ost-specific, but still human readable file format. The file (stream) must start with a header line of the form

    col_name1[type1] <col_name2[type2]>...

  The types given in brackets must be one of the data types the :class:`Table` class understands. Each following line in the file must then contain exactly the same number of data items as listed in the header. The data items are automatically converted to the column format. Lines starting with a '#' and empty lines are ignored.

- pickle

  Deserializes the table from a pickled byte stream.

- csv

  Reads the table from a stream of comma separated values. Since there is no explicit type information in the csv file, the column types are guessed, using the following simple rules:

  * if all values are NA/NULL/NONE, the type is set to string
  * if all non-null values are convertible to float/int, the type is set to float/int
  * if all non-null values are true/false/yes/no, the type is set to bool
  * for all other cases, the column type is set to string

:returns: A new :class:`Table` instance
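The type-guessing rules for csv files can be sketched as a small function. `guess_col_type` is a hypothetical helper; the case-insensitive matching and the choice to try int before float are assumptions made for this illustration and are not stated in the text above.

```python
NULL_VALUES = ('NA', 'NULL', 'NONE')
BOOL_VALUES = ('TRUE', 'FALSE', 'YES', 'NO')


def _converts(text, converter):
    """True if converter(text) succeeds."""
    try:
        converter(text)
        return True
    except ValueError:
        return False


def guess_col_type(values):
    """Guess a column type from the string values of one csv column."""
    non_null = [v for v in values if v.strip().upper() not in NULL_VALUES]
    if not non_null:
        return 'string'  # only NA/NULL/NONE values
    if all(_converts(v, int) for v in non_null):
        return 'int'
    if all(_converts(v, float) for v in non_null):
        return 'float'
    if all(v.strip().upper() in BOOL_VALUES for v in non_null):
        return 'bool'
    return 'string'      # everything else


guessed = [guess_col_type(col) for col in (['1', '2', 'NA'],
                                           ['1.5', '2'],
                                           ['yes', 'no', 'NULL'],
                                           ['NA', 'NONE'],
                                           ['a', '1'])]
```

A single value that fits none of the numeric or boolean patterns is enough to make the whole column a string column.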
def Max(self, col)

def MaxIdx(self, col)

def MaxRow(self, col)
def Mean(self, col)

Returns the mean of the given column. Cells with None are ignored. Returns None if the column doesn't contain any elements. The column must be of numeric ('float', 'int') or boolean column type.

If the column type is *bool*, the function returns the ratio of the number of 'Trues' to the total number of elements.

:param col: column name
:type col: :class:`str`
:raises: :class:`TypeError` if column type is ``string``
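The handling of None cells and bool columns can be sketched on a plain list. `column_mean` is a hypothetical helper; counting only non-None cells as "elements" for the bool ratio is an assumption that follows from "Cells with None are ignored".

```python
def column_mean(values, col_type):
    """Mean of a column, ignoring None; None if nothing is left."""
    if col_type == 'string':
        raise TypeError('mean is not defined for string columns')
    present = [v for v in values if v is not None]
    if not present:
        return None
    # float(True) is 1.0 and float(False) is 0.0, so for bool columns
    # this is the fraction of True values among the non-None cells
    return sum(float(v) for v in present) / len(present)


float_mean = column_mean([1.0, None, 3.0], 'float')
true_ratio = column_mean([True, False, None, True], 'bool')
```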
def Median(self, col)

Returns the median of the given column. Cells with None are ignored. Returns None if the column doesn't contain any elements. The column must be of numeric column type ('float', 'int') or boolean column type.

:param col: column name
:type col: :class:`str`
:raises: :class:`TypeError` if column type is ``string``
def Min(self, col)

def MinIdx(self, col)

def MinRow(self, col)
def Plot(self, x, y=None, z=None, style='.', x_title=None, y_title=None,
         z_title=None, x_range=None, y_range=None, z_range=None, color=None,
         plot_if=None, legend=None, num_z_levels=10, diag_line=False,
         labels=None, max_num_labels=None, title=None, clear=True,
         save=False, **kwargs)

Function to plot values from your table in 1, 2 or 3 dimensions using `Matplotlib <http://matplotlib.sourceforge.net>`__

:param x: column name for first dimension
:type x: :class:`str`
:param y: column name for second dimension
:type y: :class:`str`
:param z: column name for third dimension
:type z: :class:`str`
:param style: symbol style (e.g. *.*, *-*, *x*, *o*, *+*, *\**). For a complete list check (`matplotlib docu <http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot>`__).
:type style: :class:`str`
:param x_title: title for first dimension, if not specified it is automatically derived from column name
:type x_title: :class:`str`
:param y_title: title for second dimension, if not specified it is automatically derived from column name
:type y_title: :class:`str`
:param z_title: title for third dimension, if not specified it is automatically derived from column name
:type z_title: :class:`str`
:param x_range: start and end value for first dimension (e.g. [start_x, end_x])
:type x_range: :class:`list` of length two
:param y_range: start and end value for second dimension (e.g. [start_y, end_y])
:type y_range: :class:`list` of length two
:param z_range: start and end value for third dimension (e.g. [start_z, end_z])
:type z_range: :class:`list` of length two
:param color: color for data (e.g. *b*, *g*, *r*). For a complete list check (`matplotlib docu <http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot>`__).
:type color: :class:`str`
:param plot_if: callable which returns *True* if row should be plotted. Is invoked like ``plot_if(self, row)``
:type plot_if: callable
:param legend: legend label for data series
:type legend: :class:`str`
:param num_z_levels: number of levels for third dimension
:type num_z_levels: :class:`int`
:param diag_line: draw diagonal line
:type diag_line: :class:`bool`
:param labels: column name containing labels to put on x-axis for one dimensional plot
:type labels: :class:`str`
:param max_num_labels: limit maximum number of labels
:type max_num_labels: :class:`int`
:param title: plot title, if not specified it is automatically derived from plotted column names
:type title: :class:`str`
:param clear: clear old data from plot
:type clear: :class:`bool`
:param save: filename for saving plot
:type save: :class:`str`
:param \*\*kwargs: additional arguments passed to matplotlib
:returns: the ``matplotlib.pyplot`` module

**Examples:** simple plotting functions

.. code-block:: python

  import math

  tab=Table(['a','b','c','d'],'iffi', a=range(5,0,-1),
            b=[x/2.0 for x in range(1,6)],
            c=[math.cos(x) for x in range(0,5)],
            d=range(3,8))

  # one dimensional plot of column 'd' vs. index
  plt=tab.Plot('d')
  plt.show()

  # two dimensional plot of 'a' vs. 'c'
  plt=tab.Plot('a', y='c', style='o-')
  plt.show()

  # three dimensional plot of 'a' vs. 'c' with values 'b'
  plt=tab.Plot('a', y='c', z='b')
  # manually save plot to file
  plt.savefig("plot.png")
def PlotEnrichment(self, score_col, class_col, score_dir='-', class_dir='-',
                   class_cutoff=2.0, style='-', title=None, x_title=None,
                   y_title=None, clear=True, save=None)

Plot an enrichment curve using matplotlib of column *score_col* classified according to *class_col*.

For more information about parameters of the enrichment, see :meth:`ComputeEnrichment`, and for plotting see :meth:`Plot`.

:warning: The function depends on *matplotlib*
def PlotHistogram(self, col, x_range=None, num_bins=10, normed=False,
                  histtype='stepfilled', align='mid', x_title=None,
                  y_title=None, title=None, clear=True, save=False)

Create a histogram of the data in col for the range *x_range*, split into *num_bins* bins and plot it using Matplotlib.

:param col: column name with data
:type col: :class:`str`
:param x_range: start and end value for first dimension (e.g. [start_x, end_x])
:type x_range: :class:`list` of length two
:param num_bins: number of bins in range
:type num_bins: :class:`int`
:param normed: normalize histogram
:type normed: :class:`bool`
:param histtype: type of histogram (i.e. *bar*, *barstacked*, *step*, *stepfilled*). See (`matplotlib docu <http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.hist>`__).
:type histtype: :class:`str`
:param align: style of histogram (*left*, *mid*, *right*). See (`matplotlib docu <http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.hist>`__).
:type align: :class:`str`
:param x_title: title for first dimension, if not specified it is automatically derived from column name
:type x_title: :class:`str`
:param y_title: title for second dimension, if not specified it is automatically derived from column name
:type y_title: :class:`str`
:param title: plot title, if not specified it is automatically derived from plotted column names
:type title: :class:`str`
:param clear: clear old data from plot
:type clear: :class:`bool`
:param save: filename for saving plot
:type save: :class:`str`

**Examples:** simple plotting functions

.. code-block:: python

  import math

  tab=Table(['a'],'f', a=[math.cos(x*0.01) for x in range(100)])

  # histogram of the values in column 'a'
  plt=tab.PlotHistogram('a')
  plt.show()
def PlotROC(self, score_col, class_col, score_dir='-', class_dir='-',
            class_cutoff=2.0, style='-', title=None, x_title=None,
            y_title=None, clear=True, save=None)

def RemoveCol(self, col)
def RowMean(self, mean_col_name, cols)

Adds a new column of type 'float' with a specified name (*mean_col_name*), containing the mean of all specified columns for each row.

Cols are specified by their names and must be of numeric column type ('float', 'int') or boolean column type. Cells with None are ignored. Adds None if the row doesn't contain any values.

:param mean_col_name: name of new column containing mean values
:type mean_col_name: :class:`str`
:param cols: name or list of names of columns to include in computation of mean
:type cols: :class:`str` or :class:`list` of strings
:raises: :class:`TypeError` if column type of columns in *cols* is ``string``

**Example:**

Starting with the following table:

==== ==== ====
x    y    u
==== ==== ====
1    10   100
2    15   None
3    20   400
==== ==== ====

the code here adds a column with the name 'mean' to yield the table below:

.. code-block:: python

  tab.RowMean('mean', ['x', 'u'])

==== ==== ==== =====
x    y    u    mean
==== ==== ==== =====
1    10   100  50.5
2    15   None 2
3    20   400  201.5
==== ==== ==== =====
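The per-row computation behind the example table can be checked with a short sketch on plain dictionaries. `row_means` is a hypothetical helper and not the Table method; it only reproduces the arithmetic described above.

```python
def row_means(rows, cols):
    """Mean over the given columns for every row, ignoring None cells."""
    means = []
    for row in rows:
        present = [row[c] for c in cols if row[c] is not None]
        if present:
            means.append(sum(present) / float(len(present)))
        else:
            means.append(None)  # no usable value in this row
    return means


rows = [{'x': 1, 'y': 10, 'u': 100},
        {'x': 2, 'y': 15, 'u': None},
        {'x': 3, 'y': 20, 'u': 400}]
means = row_means(rows, ['x', 'u'])
```

The second row has no value for 'u', so its mean is taken over 'x' alone, which explains the value 2 in the table.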
def Save | ( | self, | |
stream_or_filename, | |||
format = 'ost' , |
|||
sep = ' |
|||
) |
Save the table to stream or filename. The following three file formats are supported (for more information on file formats, see :meth:`Load`):

============= =======================================
ost           ost-specific format (human readable)
csv           comma separated values (human readable)
pickle        pickled byte stream (binary)
============= =======================================

:param stream_or_filename: filename or stream for writing output
:type stream_or_filename: :class:`str` or :class:`file`
:param format: output format (i.e. *ost*, *csv*, *pickle*)
:type format: :class:`str`
:raises: :class:`ValueError` if format is unknown
def SetName(self, name)

def Sort(self, by, order='+')
def SpearmanCorrel(self, col1, col2)

Calculate the Spearman correlation coefficient between col1 and col2, only taking rows into account where both of the values are not equal to None. If there are not enough data points to calculate a correlation coefficient, None is returned.

:warning: The function depends on the following module: *scipy.stats.mstats*

:param col1: column name for first column
:type col1: :class:`str`
:param col2: column name for second column
:type col2: :class:`str`
def StdDev(self, col)

Returns the standard deviation of the given column. Cells with None are ignored. Returns None if the column doesn't contain any elements. The column must be of numeric column type ('float', 'int') or boolean column type.

:param col: column name
:type col: :class:`str`
:raises: :class:`TypeError` if column type is ``string``
def Sum(self, col)

Returns the sum of the given column. Cells with None are ignored. Returns 0.0 if the column doesn't contain any elements. The column must be of numeric column type ('float', 'int') or boolean column type.

:param col: column name
:type col: :class:`str`
:raises: :class:`TypeError` if column type is ``string``
def ToString(self, float_format='%.3f', int_format='%d', rows=None)

Convert the table into a string representation.

The output format can be modified for int and float type columns by specifying a formatting string for the parameters 'float_format' and 'int_format'.

The option 'rows' specifies the range of rows to be printed. The parameter must be a type that supports indexing (e.g. a :class:`list`) containing the start and end row *index*, e.g. [start_row_idx, end_row_idx].

:param float_format: formatting string for float columns
:type float_format: :class:`str`
:param int_format: formatting string for int columns
:type int_format: :class:`str`
:param rows: iterable containing start and end row *index*
:type rows: iterable containing :class:`ints <int>`
def Zip(self, *args)

Allows convenient iteration over a selection of columns, e.g.

.. code-block:: python

  tab=Table.Load('...')
  for col1, col2 in tab.Zip('col1', 'col2'):
    print col1, col2

is a shortcut for

.. code-block:: python

  tab=Table.Load('...')
  for col1, col2 in zip(tab['col1'], tab['col2']):
    print col1, col2
|
static |