Contributing

Development setup

git clone https://github.com/samuelmurail/pdb_cpp.git
cd pdb_cpp
pip install -r requirements.txt
pip install -e . --no-build-isolation

Project layout

src/pdb_cpp/
├── __init__.py          # Package entry point (Coor, Model)
├── _pyprops.py          # Python property patches on Coor/Model
├── alignment.py         # Sequence alignment wrappers
├── analysis/            # High-level analysis package
│   ├── __init__.py      # Flat compatibility exports + grouped submodules
│   ├── dockq.py         # RMSD, DockQ, interface metrics
│   ├── sasa.py          # SASA namespace bridge
│   └── hbonds.py        # H-bond namespace bridge
├── geom.py              # Distance matrix wrapper
├── select.py            # Backbone cleaning utility
├── sequence.py          # Sequence extraction wrappers
├── TMalign.py           # Secondary structure wrapper
├── _core/               # C++ extension source
│   ├── pybind.cpp       # pybind11 bindings (pdb_cpp.core)
│   ├── Coor.cpp/h       # Multi-model coordinate container
│   ├── Model.cpp/h      # Single-model atom storage
│   ├── align.cpp/h      # Structural alignment algorithms
│   ├── select.cpp/h     # Selection language parser
│   ├── seq_align.cpp/h  # Needleman-Wunsch alignment
│   ├── sequence.cpp/h   # Sequence extraction
│   ├── geom.h           # Geometry (Kabsch, distance matrix)
│   ├── TMalign_wrapper.cpp  # USalign integration
│   ├── format/          # PDB/mmCIF parsers and writers
│   └── usalign/         # Vendored USalign headers
└── data/                # BLOSUM62 matrix, residue dictionaries

Adding a C++ feature

  1. Add .cpp/.h files in src/pdb_cpp/_core/.

  2. Register new .cpp files in setup.py’s ext_modules source list.

  3. Expose Python bindings in src/pdb_cpp/_core/pybind.cpp.

  4. Add a Python wrapper in the appropriate src/pdb_cpp/*.py module.

  5. Add tests in tests/.

  6. Rebuild: pip install -e . --no-build-isolation

  7. Run tests: pytest
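As a rough illustration of step 4, the sketch below shows one way a Python wrapper might validate its input before handing off to compiled code. Both function names are hypothetical and do not come from pdb_cpp. The private function is a pure-Python stand-in so the snippet runs on its own; in a real wrapper that role would be played by the function bound in pybind.cpp.

```python
from typing import Sequence


def _core_radius_of_gyration(xyz: Sequence[Sequence[float]]) -> float:
    """Pure-Python stand-in for a hypothetical pybind11-bound function."""
    n = len(xyz)
    center = [sum(point[i] for point in xyz) / n for i in range(3)]
    squared = sum(
        sum((point[i] - center[i]) ** 2 for i in range(3)) for point in xyz
    )
    return (squared / n) ** 0.5


def radius_of_gyration(xyz: Sequence[Sequence[float]]) -> float:
    """Validate the input, then delegate to the (stand-in) core function."""
    if len(xyz) == 0:
        raise ValueError("xyz must contain at least one atom")
    if any(len(point) != 3 for point in xyz):
        raise ValueError("every atom needs exactly three coordinates")
    return _core_radius_of_gyration(xyz)
```

Checking arguments in the wrapper lets bad user input fail with a readable Python exception before it reaches the extension.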

Running tests

pytest                     # full suite
pytest tests/test_mmcif.py # single file
pytest -v                  # verbose output
pytest -k "tmalign"        # filter by name

Code style

  • Python: follow PEP 8; use type hints in new code.

  • C++: use C++17; prefer std::runtime_error over assert() for error conditions that can occur with user input.

  • Docstrings: use NumPy-style docstrings for all public functions.
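A minimal sketch of these Python conventions, using a hypothetical function that is not part of pdb_cpp: type hints on the signature, a NumPy-style docstring, and an exception rather than an assertion for invalid user input.

```python
from typing import List, Sequence


def translate(
    coords: Sequence[Sequence[float]], shift: Sequence[float]
) -> List[List[float]]:
    """Translate 3D coordinates by a constant vector.

    Parameters
    ----------
    coords : sequence of sequence of float
        Atom positions, one ``(x, y, z)`` triple per atom.
    shift : sequence of float
        Translation vector ``(dx, dy, dz)``.

    Returns
    -------
    list of list of float
        New coordinates; the input is left unchanged.

    Raises
    ------
    ValueError
        If ``shift`` does not have exactly three components.
    """
    if len(shift) != 3:
        raise ValueError("shift must have exactly three components")
    return [[c + s for c, s in zip(point, shift)] for point in coords]
```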

Building documentation

cd docs
pip install -r requirements.txt
make html

The output is in docs/build/html/.

Documentation guidelines

Each documentation page has a distinct responsibility. Respecting these roles keeps the documentation organized and avoids repeated content:

  • README.md: short project overview, installation, and one minimal quick start.

  • docs/source/basic_example.md: beginner walkthrough only.

  • docs/source/functionality.md: canonical explanations of features and behavior.

  • docs/source/quick_recipes.md: compact copy-paste snippets.

  • docs/source/pdb_cpp.rst: API reference entry page.

When adding or updating docs:

  1. Put the full explanation in one canonical page (usually functionality.md).

  2. In tutorial/recipe pages, link to canonical explanations instead of duplicating text.

  3. Add cross-links at the end of a section when a related page exists.

  4. Keep examples runnable and as short as possible.

Memory safety checks

Sanitizer and Valgrind scripts are in scripts/:

bash scripts/asan_core_only.sh      # AddressSanitizer
bash scripts/valgrind_core_only.sh  # Valgrind memcheck