Contributing
Code contributions are handled through the git repository hosted at sciCORE, University of Basel: https://git.scicore.unibas.ch/schwede/ProMod3. Get in touch with the main developers if you have a fantastic new feature and need an account there. The following explains, in a coarse-grained manner, how to add new features to ProMod3. The most general advice is to use existing bits and pieces as examples and to be consistent with what you already find here.
How To Start Your Own Module
This is a walk-through of how the topics from above work together when you start your own module. As the entry point, let's assume that you have already cloned the repository into a directory and just changed into it.
All new features should take off from the develop branch. That way, they work with all the other new features waiting for release right from the beginning. Therefore, you need to switch branches as a first step. Git will tell you which branch you went for; anything else is a story of failure.
$ git checkout develop
Switched to branch 'develop'
Sitting on top of the right code base, you can now spawn your own branch from it. As an example, your feature will go by the name 'sidechain'.
$ git checkout -b sidechain
Switched to a new branch 'sidechain'
This time, Git should tell you about going for a new branch.
Before starting to create anything for real, now is the perfect moment to install our very own Git hook to check some coding rules on commit.
$ cp extras/pre_commit/pre-commit .git/hooks/
With that in place, changes which break our coding standards will abort any commit.
Now create the directory structure where your project will live. Here is the list of directories which are likely to be used in every project.
$ mkdir -p sidechain/doc
$ mkdir -p sidechain/pymod
$ mkdir -p sidechain/tests
If you run git status at this point, you will see basically nothing: Git does not track empty directories. Before you bring your module under version control, create a couple of files which are always needed.
$ touch sidechain/pymod/__init__.py
$ echo ":mod:\`~promod3.sidechain\` - ProMod3 side chain optimiser" \
>> sidechain/doc/index.rst
$ echo "==========================================================" \
"======================" >> sidechain/doc/index.rst
Having an empty __init__.py is perfectly fine for Python; it just marks a directory as a module. But a blank index.rst can give Sphinx quite a headache, so you fill it right away with a headline for your documentation.
For integration with make, the build system needs to know about the new module and its members. This means setting up new CMake files and extending some existing ones around the repository root.
$ touch sidechain/CMakeLists.txt
$ touch sidechain/pymod/CMakeLists.txt
$ touch sidechain/doc/CMakeLists.txt
Each of those files still needs a bit of content. The simplest one comes from the module's root, sidechain/CMakeLists.txt:
add_subdirectory(pymod)
add_subdirectory(doc)
Those two directives just tell CMake to look in the pymod and doc directories below the current path for more CMake configurations. The next level of CMakeLists.txt magic comes in the doc directory:
set(SIDECHAIN_RST
  index.rst
)

add_doc_source(NAME sidechain RST ${SIDECHAIN_RST})
add_doc_source is our custom CMake macro to register reST files for the documentation. On running make, those files are placed in a doc/source directory tree within the build directory. Each new submodule in your project should be covered by its own documentation entity, extending the list passed as RST. To maintain readability, it's good practice to store this list in a separate variable, called SIDECHAIN_RST here.
For the actual code, keep in mind that a Python module may be rather complex. There is certainly Python code, there could be a bit of C++ and conditional compilation. In rare cases you may also want to modify the directory structure of the package. All this has to be declared in the pymod subtree. We cannot enumerate all specialities here, but there are a couple of examples around in this repository. Here is the most basic CMakeLists.txt:
set(SIDECHAIN_PYMOD
  __init__.py
)

pymod(NAME sidechain PY ${SIDECHAIN_PYMOD})
Source files should again be listed in a dedicated variable. Later, you will probably add some C++ code and settings diverging from the defaults via the pymod macro; this is where things clutter up quite quickly. As set up here, your project will be added as module sidechain in the ProMod3 Python package tree.
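Once the module grows C++ bindings, the pymod() call might be extended roughly as follows. This is only a sketch: the file name export_sidechain.cc is hypothetical, and the exact options accepted by pymod() are best copied from an existing module that already mixes C++ and Python.

set(SIDECHAIN_CPP
  export_sidechain.cc  # hypothetical binding source
)

set(SIDECHAIN_PYMOD
  __init__.py
)

pymod(NAME sidechain CPP ${SIDECHAIN_CPP} PY ${SIDECHAIN_PYMOD})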
The final step towards CMake is to register your module's directory in the top-level CMakeLists.txt:
 1  ## <lots of cmake commands...>
 2
 3  ## sub dirs to be recognised by CMake
 4  ## e.g. add_subdirectory(src), subdirs have their own CMakeLists.txt
 5  add_subdirectory(config)
 6  add_subdirectory(core)
 7  add_subdirectory(modelling)
 8  add_subdirectory(sidechain)
 9  add_subdirectory(loop)
10  add_subdirectory(scripts)
11  add_subdirectory(actions)
12  add_subdirectory(extras)
13  add_subdirectory(cmake_support)
14
15  ## <lots of cmake commands...>
All that needs to be done for CMake to recognise your module is adding its directory as shown in line 8.
This was the final step to set up the build system. Running CMake at this point would create the build environment in place. But building software inside your code repository has several drawbacks. First of all, it puts all kinds of new files in the directory tree, and git status would show them all. Then it's very likely that manual intervention is needed after make clean. Plus, it would be very static: imagine at one point you want to switch on all debugging flags for your C++ code. You would either have to clean the whole repository and rebuild, or juggle two separate repositories, copying code changes from A to B. The solution is to build 'out of source' instead of 'in place'. You can still stay in your repository while being out of the source tree by using sub-directories. ProMod3 comes with a dedicated prefix 'build*' in .gitignore: name your directories build or build-dbg and they will not show up in git status.
$ mkdir build
$ cd build
To actually create all the makefiles and generated files, you may use one of the configuration scripts from the conf-scripts directory. Usually those scripts only need to be pointed to an OST staging tree. Even if you are on a system not covered by the available scripts, their code may help you figure out the right CMake command. Once you have managed to conquer a new system, feel free to add a new configuration script. The following example assumes Fedora 19.
From this point, make should work and you can start adding your files to the repository using git add.
Up to now, we did not cover the tests directory of a new module. But it's good practice to develop new functionality along with tests, right from the beginning. At some point, new code needs testing anyway to see if it does what it should, so just do this by writing unit tests. Test sources are stored in files prefixed with test_ and usually come one per submodule instead of a single monolithic test_sidechain_reconstruction.py.
Python code is tested using Python's own unit testing framework with a little help from OST (C++ uses the Boost Test Library). The basic scheme is to import your module, subclass unittest.TestCase and make the whole file runnable as a script via the usual __name__ check. As an example, we test the promod3.modelling.ReconstructSidechains() function:
import os
import unittest
from promod3 import modelling
from ost import io

class ReconstructTests(unittest.TestCase):
    def testReconstruct(self):
        in_file = os.path.join('data', '1eye.pdb')
        ref_file = os.path.join('data', '1eye_rec.pdb')
        # get and reconstruct 1eye
        prot = io.LoadPDB(in_file)
        modelling.ReconstructSidechains(prot, keep_sidechains=False)
        # compare with reference solution
        prot_rec = io.LoadPDB(ref_file)
        self.assertEqual(prot.GetAtomCount(), prot_rec.GetAtomCount())

if __name__ == "__main__":
    from ost import testutils
    testutils.RunTests()
To hook up your tests with make codetest (and to create a test_reconstruct_sidechains.py_run target), everything has to be introduced to CMake. First, tell CMake to search tests for a CMakeLists.txt file by extending the list of sub-directories in sidechain/CMakeLists.txt:
add_subdirectory(pymod)
add_subdirectory(doc)
add_subdirectory(tests)
Then fill sidechain/tests/CMakeLists.txt with your new test script; make will recognise the changes next time it is run and fix the rest for you.
set(SIDECHAIN_UNIT_TESTS
  test_reconstruct_sidechains.py
)

set(SIDECHAIN_TEST_DATA
  data/1eye.pdb
  data/1eye_rec.pdb
)

promod3_unittest(MODULE sidechain
                 SOURCES "${SIDECHAIN_UNIT_TESTS}"
                 DATA "${SIDECHAIN_TEST_DATA}")
Note how we listed the test data required by the unit test by defining SIDECHAIN_TEST_DATA. Now the tests are available via make check, make codetest and make test_reconstruct_sidechains.py_run.
How To Start Your Own Action
In ProMod3 we call scripts/programs 'actions'. They are started by a launcher found in your staging directory at stage/bin/pm. This little guy keeps the shell environment in the right mood to carry out your job. So usually you will start an action by
$ stage/bin/pm help
To start your own action, follow How To Start Your Own Module up to the point of creating the directory structure for a new module. Also go for a dedicated branch for action development: there you can produce intermediate commits while other branches stay clean, in case you have to do some work on them which needs to go public.
After preparing your repository, it's time to create a file for the action. That works a bit differently than for modules. Assuming we are sitting in the repository's root:
$ touch actions/pm-awesome-action
$ chmod +x actions/pm-awesome-action
Two things are important here: first, actions are prefixed with pm-, so they are recognised by the pm launcher. Secondly, action files need to be executable; the executable bit does not propagate to the staging directory if you set it only after the first call to make.
To get the new action recognised by make and placed in stage/libexec/promod3, it has to be registered with CMake in actions/CMakeLists.txt:
add_custom_target(actions ALL)
add_subdirectory(tests)

pm_action_init()
pm_action(pm-build-rawmodel actions)
pm_action(pm-help actions)
pm_action(pm-awesome-action actions)
Just add your action with its full filename via a call to pm_action() at the end of the file.
Before coding your action, let's set up unit tests for it. Usually when adding features, you will immediately try them, check that everything works as intended, etc. ProMod3 helps you automate those tests so it's rather easy to check later whether code changes break anything. For actions, we use test_actions.ActionTestCase instead of unittest.TestCase. Since testing has a lot in common for different actions, we decided to put a little wrapper around this subject. See the documentation of ActionTestCase for more information, and the hedged sketch below for the general shape of such a test.
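As a rough sketch, a test for our example action could look as follows. The attribute and helper names used here (pm_action, RunExitStatusTest) are assumptions about the ActionTestCase interface; verify them against its documentation before relying on them.

import test_actions

class AwesomeActionTests(test_actions.ActionTestCase):
    def __init__(self, *args, **kwargs):
        test_actions.ActionTestCase.__init__(self, *args, **kwargs)
        # name of the action, without the 'pm-' prefix (assumed convention)
        self.pm_action = 'awesome-action'

    def testExit0(self):
        # run the action without arguments, expect exit code 0 (assumed helper)
        self.RunExitStatusTest(0, list())

if __name__ == "__main__":
    from ost import testutils
    testutils.RunTests()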
Now it's time to fill your action with code. Instead of reading a lot more explanation, it should be easy to go by examples from the actions directory. There are only two really important points:

- No shebang line (#! /usr/bin/python) in your action! Also no #! /usr/bin/env python or anything like this. This may lead to funny side effects, like calling a Python interpreter from outside a virtual environment or a different version of OST. Basically it may mess up the environment your action is running in. Actions are called by pm, that's enough to get everything just right.
- The action of your action happens in the __main__ branch of the script. Your action will have its own function definitions, variables and all the bells and whistles. Hiding behind __main__ keeps everything separated and makes things easier when it comes to debugging. So just after

      import alot

      def functions_specific_to_your_action(...):
          ...

      if __name__ == "__main__":
          # <put together what your action should do here>

  start putting your action together.
How To Write Your Own Scorer
The scoring module contains several classes that make it easy to add new scorers. As usual, you can use existing bits and pieces as examples and try to be consistent with them. Here, we quickly give an overview of the separation of concerns:
- BackboneScorer: Defines the scorer with all its parameters and energies and the functionality to compute scores. Scorers are set up by the user (or loaded from disk) if necessary. Scorers do not store any environment data; if needed, they can be linked via pointers to environment data kept and updated by the score environment. Also, they may be linked to a score environment listener to handle specially organized data.
- BackboneScoreEnv: Handles all model-specific data used by the scorers. The user sets up the environment and updates it whenever something changes. Residue-specific data is kept in arrays of fixed size (see IdxHandler for how the indexing is done). An array of bool-like integers can be accessed with GetEnvSetData() and used to determine whether environment data is available for a certain residue. The list of sequences handled by the environment is fixed, as otherwise pointers into the data storage would be invalidated.
- BackboneScoreEnvListener: Allows score-specific data to be extracted from the model-specific data available in the score environment class. It is commonly used to define spatially organized structures to quickly access other atoms within a given radius. All score environment listeners are attached to a master score environment and get updated when the score environment gets updated. Multiple scorers can use the same listener. Listeners are not accessible to anyone outside of the scorers and the score environment object responsible for them; since the user doesn't see them, there is no Python API for them.
- IdxHandler: Takes care of translating chain indices (range [0, GetNumChains()]) and residue numbers (range [1, GetChainSize(chain_idx)]) into the indexing used internally by the score environment (range [0, GetNumResidues()]). The score environment takes care of this object and makes it accessible to scorers.
As an example, let's look at the CBPackingScorer:

- it contains score-specific parameters and energies which can either be set manually or loaded from disk
- it is linked to a score environment listener of type CBetaEnvListener, which provides a FindWithin() function to quickly access neighboring CB atoms (note that the same listener is also used by the CBetaScorer)
- a pointer to the IdxHandler object of the score environment is extracted when the environment is attached and is used to get sequence-specific data when calculating the score
As a second example, look at the PairwiseScorer:

- it does not require any score-specific setup
- it is linked to residue-specific CA/CB positions and the pairwise functions defined in the score environment
- GetEnvSetData() of the score environment is used to determine if environment data is available for a given residue
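From the user's perspective, this separation of concerns plays out roughly as in the following sketch. It is an illustration under assumptions: the calls shown (BackboneScoreEnv, SetInitialEnvironment, LoadCBPackingScorer, AttachEnvironment, CalculateScore) should be checked against the scoring module documentation before use.

from ost import io
from promod3 import scoring

prot = io.LoadPDB('data/1eye.pdb')
seqres = ''.join(r.one_letter_code for r in prot.residues)

# the user owns the environment and keeps it up to date
env = scoring.BackboneScoreEnv(seqres)
env.SetInitialEnvironment(prot)

# the scorer itself only stores parameters/energies; model data comes
# from the environment (and, internally, from its listeners)
scorer = scoring.LoadCBPackingScorer()
scorer.AttachEnvironment(env)
print(scorer.CalculateScore(1, len(seqres)))  # score residues 1..N of chain 0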
Quick testing of ProMod3 features
High-level features of ProMod3 can be tested directly in an interactive Python shell. First, you need to tell Python where to find the modules by defining the PYTHONPATH environment variable in your shell to include the lib64/python3.6/site-packages folders of the stage folders of ProMod3 and OST. For convenience, you can place the export command in your .bashrc (or similar). Then you can import modules from promod3 and ost as in the example code shown in this documentation.
To test low-level C++ features, you can copy the extras/test_code folder and adapt it for your purposes. First, you will have to fix the paths to ProMod3 and OST in the Makefile by changing the following lines:
# path to OST and ProMod3 stage
OST_ROOT = <DEFINEME>/ost/build/stage
PROMOD3_ROOT = <DEFINEME>/ProMod3/build/stage
Afterwards, you should be able to compile and run small sample codes that use ProMod3 and OST, as in the test.cc example. You can compile your code by executing make and run it with make run. Also, remember to set the PROMOD3_SHARED_DATA_PATH variable if you moved the stage folder.
Unit Tests
Of course your code should contain tests. We cannot give an elaborate tutorial on unit testing here; again, have a look at how other modules treat this topic, and there is quite a lot of educated material to be found on the Internet. Nevertheless, here is a short list of the most important advice:
- Tests go into dedicated scripts/source files in the tests directory.
- No external data dependencies: if tests need data, they find it in tests/data.
- If 'exotic' Python modules are used, make the test aware of the possibility that the module is not available (see the sketch below).
- Tests do not fail on purpose.
- No failing tests that are waved through as 'this does not affect anything'.
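A minimal sketch of such a guard, using a hypothetical module name exotic_module:

import unittest

try:
    import exotic_module
    HAVE_EXOTIC = True
except ImportError:
    HAVE_EXOTIC = False

class AwesomeFeatureTests(unittest.TestCase):
    @unittest.skipUnless(HAVE_EXOTIC, "exotic_module is not available")
    def testWithExoticModule(self):
        # exercise the functionality that depends on the exotic module
        self.assertIsNotNone(exotic_module)

if __name__ == "__main__":
    from ost import testutils
    testutils.RunTests()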
To run the whole test suite, make check is enough. This will also trigger the doctest and linkcheck targets. Alternatively you can run:
- make codetest to run only the unit tests from all modules in ProMod3. Note that make check does nothing more than invoking doctest, linkcheck and codetest as dependencies.
- make check_xml to run tests without stopping after each failure. Failures are reported briefly on the command line and the result of each test is written to 'PYTEST-<TestCaseName>.xml' files in the 'tests' subfolders of your 'build' folder.
- Run single tests: assuming you have your_module/tests/test_awesome_feature.py, CMake will provide you with a target test_awesome_feature.py_run. If your module has C++ tests, those are available via test_suite_your_module_run.
Writing Documentation
To create documentation, we use Sphinx to go from reStructuredText (reST) files and API documentation in source files to HTML or man pages.
For each module, at least one reST document exists that gives an idea of concepts and pulls in interfaces from source. Copying files to the build directory, issuing the Sphinx call and everything else needed to create the actual documentation is done by CMake and its makefiles. Hence, the CMakeLists.txt of a module's doc directory is crucial. For documentation which does not relate to a particular module, the repository comes with a top-level doc directory.
If you write new functionality for ProMod3, or fix bugs, feel free to extend the CHANGELOG file. It will be automatically pulled into the documentation.
It is highly recommended to add code examples to your documentation. For that purpose, write a fully runnable script and place it in the doc/tests/scripts directory. The script must be runnable from within the doc/tests directory as pm SCRIPTPATH and may use data stored in the doc/tests/data directory. The script and any data needed by it must then be referenced in the doc/tests/CMakeLists.txt file. Afterwards, they can be included in the documentation using the literalinclude directive.
For instance, if you add a new example code loop_main.py, you would add it in your module documentation as follows:
.. literalinclude:: ../../../tests/doc/scripts/loop_main.py
If your example does not relate to a specific module and the documentation is in the top-level doc directory, you need to drop one of the .. as follows:
.. literalinclude:: ../../tests/doc/scripts/hello_world.py
To ensure that the code examples keep on working, a unit test has to be defined in doc/tests/test_doctests.py. Each example code is run by a dedicated test function. Usually, the codes are run and the return code is checked. Command-line output or resulting files can also be checked (see existing test codes for examples). A few more guidelines for example codes:
- If it generates files as output, please delete them after checking them.
- If it requires external modules or binaries, check for their availability. If the external dependencies are not available, output a warning and skip the test.
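The shape of such a dedicated test function could look like the following sketch. The real test_doctests.py may use its own helpers; here we simply invoke the pm launcher via subprocess, assuming it is on the PATH.

import os
import subprocess
import unittest

class DocTests(unittest.TestCase):
    def testLoopMainExample(self):
        # run the example script through the pm launcher, check the return code
        rc = subprocess.call(['pm', os.path.join('scripts', 'loop_main.py')])
        self.assertEqual(rc, 0)

if __name__ == "__main__":
    from ost import testutils
    testutils.RunTests()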
A copy of the generated HTML documentation is kept in doc/html so that there is no need to compile ProMod3 to read it. Our policy is to keep that folder in sync with the latest documentation, at least on the master branch (i.e. for every release). You can use the following commands to do the update:
$ cd <PROMOD3_PATH>/build
$ make html
$ rsync -iv -az --exclude=".*" --delete \
"stage/share/promod3/html/" "../doc/html"
Third Party Contributions (License Issues)
For some tasks you may want to make use of third party contributions in your module, for example:

- calling/using the output of third party binaries
- external libraries
- smallish bits of source code included in the ProMod3 directory tree
- Python modules not distributed as part of the Python standard library
Modules from the Python standard library are covered by the Python license, and licenses are what you have to watch out for with this subject. While the Python license is safe to use, in the past several projects went restrictive because of exclusive terms of use. Those issues often came from 'academic licenses', allowing use free of charge, but not for commercial entities. Preventing such issues is one reason for the existence of ProMod3. This means: before utilising external code, third party libraries, basically anything not created within this project (including pictures, test data, etc.), check the licensing. What cannot be used at all are items without any license; those things are not 'free' but rather in a legally uncertain state. Also forbidden are licenses which exclude commercial entities.
There are a lot of rather permissive licenses out there, very often asking for acknowledgements. We definitely support this: either go by example phrases suggested in the license itself or find a nice paragraph yourself and place it in the documentation. We should also not forget to promote those contributions on web pages using ProMod3.