pdb_cpp.analysis package

Submodules

pdb_cpp.analysis.dockq module

DockQ scoring and interface-metric helpers.

The public functions in this module compute RMSD, contact fractions, and the combined DockQ score for protein-protein and protein-nucleic interfaces.

pdb_cpp.analysis.dockq.dockQ(coor, native_coor, rec_chains=None, lig_chains=None, native_rec_chains=None, native_lig_chains=None, back_atom=None, _search_mode=False)

Compute DockQ scores between a model and a native structure.

DockQ combines the fraction of native interface contacts (Fnat), the ligand RMSD (LRMS), and the interface RMSD (iRMS) into a single docking quality metric. When chain roles are not provided, the shortest chain is taken as the ligand and the remaining chains as the receptor.

Parameters:

coor (Coor): Model coordinates.
native_coor (Coor): Native coordinates.
rec_chains (list[str], optional): Model receptor chains. If None, all chains except the shortest one (the ligand) are used.
lig_chains (list[str], optional): Model ligand chains. If None, the shortest chain is used.
native_rec_chains (list[str], optional): Native receptor chains. If None, all chains except the shortest one (the ligand) are used.
native_lig_chains (list[str], optional): Native ligand chains. If None, the shortest chain is used.
back_atom (list[str], optional): Backbone atom names used for alignment and RMSD calculations.

Returns:

dict: Dictionary with keys Fnat, Fnonnat, rRMS, iRMS, LRMS, and DockQ; each value is a list with one entry per model.

Notes

This implementation mirrors the pdb_numpy DockQ pipeline and relies on sequence-based alignment for receptor superposition before computing the ligand RMSD and interface metrics.
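For reference, the original DockQ definition averages Fnat with two scaled RMSD terms, 1/(1 + (iRMS/1.5)^2) and 1/(1 + (LRMS/8.5)^2). The sketch below restates that formula and the shortest-chain-as-ligand default in plain Python. The scaling constants come from the original DockQ definition, and it is an assumption that dockQ() uses the same values; dockq_score and infer_roles are hypothetical helpers, not part of pdb_cpp.

```python
def dockq_score(fnat, irms, lrms, d1=1.5, d2=8.5):
    """Combine Fnat, iRMS and LRMS (Angstrom) into a DockQ score in [0, 1].

    d1 and d2 are the scaling constants of the original DockQ definition;
    that pdb_cpp uses the same values is an assumption.
    """
    def scaled(rms, d):
        # Maps an RMSD of 0 to 1.0 and an RMSD equal to d to 0.5.
        return 1.0 / (1.0 + (rms / d) ** 2)

    return (fnat + scaled(irms, d1) + scaled(lrms, d2)) / 3.0


def infer_roles(chain_lengths):
    """Hypothetical helper: the shortest chain is the ligand, the rest the receptor.

    chain_lengths maps chain ID -> number of residues. Ties are broken by
    dict order here; how dockQ() breaks ties is not documented.
    """
    ligand = min(chain_lengths, key=chain_lengths.get)
    receptor = [ch for ch in chain_lengths if ch != ligand]
    return receptor, [ligand]


print(dockq_score(fnat=1.0, irms=0.0, lrms=0.0))  # 1.0 (perfect model)
print(dockq_score(fnat=0.0, irms=1.5, lrms=8.5))  # 0.3333333333333333
print(infer_roles({"A": 250, "B": 230, "C": 60}))  # (['A', 'B'], ['C'])
```

Because each RMSD term is bounded by 1, a model with no native contacts can still score up to 2/3 if both RMSDs are near zero.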

pdb_cpp.analysis.dockq.dockQ_multimer(coor, native_coor, chain_map=None, back_atom=None, n_cpu=1, _search_mode=False, _native_iface_cache=None, _model_backbone_cache=None)

Compute DockQ over all pairwise native chain interfaces (multimer).

Scores each of the \(\binom{n}{2}\) pairwise interfaces between the n native chains and returns per-interface DockQ metrics together with GlobalDockQ (the average DockQ over all valid interfaces), mirroring the DockQ v2 multimer output.

Parameters:

coor (Coor): Model coordinates.
native_coor (Coor): Native coordinates.
chain_map (dict[str, str], optional): Mapping from native chain IDs to model chain IDs. If None, the function assumes that both structures share the same chain names and builds an identity mapping for the chains present in both.
back_atom (list[str], optional): Backbone atom names used for alignment and RMSD calculations.

Returns:

dict

Dictionary with two keys:

"interfaces": dict[(native_ch1, native_ch2), result] where result is the dict returned by dockQ() for that pair (or None when the interface could not be scored).
"GlobalDockQ": list[float] — average DockQ over all valid interfaces, one value per model frame.

Notes

The larger native chain of each pair is used as the receptor, matching the DockQ v2 convention. For chains of equal length, the pair keeps the order in which the two chains appear when iterating over chain_map.
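The pairing and averaging rules described for dockQ_multimer() can be sketched in plain Python. This is an illustration under stated assumptions: "larger" is taken to mean more residues, interface results are assumed to be dicts holding a "DockQ" list (or None), and both helper names are hypothetical.

```python
from itertools import combinations
from math import comb


def receptor_ligand_pairs(chain_lengths):
    """Hypothetical helper: one (receptor, ligand) pair per native chain pair.

    The longer chain becomes the receptor; on ties, the chain seen first in
    iteration order stays first.
    """
    pairs = []
    for a, b in combinations(chain_lengths, 2):
        if chain_lengths[b] > chain_lengths[a]:
            pairs.append((b, a))
        else:
            pairs.append((a, b))
    return pairs


def global_dockq(interfaces):
    """Average the per-model DockQ over interfaces whose result is not None."""
    valid = [res["DockQ"] for res in interfaces.values() if res is not None]
    if not valid:
        return []
    n_models = len(valid[0])
    return [sum(scores[i] for scores in valid) / len(valid) for i in range(n_models)]


lengths = {"A": 120, "B": 300, "C": 120}
pairs = receptor_ligand_pairs(lengths)
assert len(pairs) == comb(len(lengths), 2)  # n choose 2 interfaces
print(pairs)  # [('B', 'A'), ('A', 'C'), ('B', 'C')]

interfaces = {
    ("A", "B"): {"DockQ": [0.8, 0.6]},
    ("A", "C"): None,  # interface that could not be scored
    ("B", "C"): {"DockQ": [0.4, 0.2]},
}
print(global_dockq(interfaces))  # approximately [0.6, 0.4]
```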

pdb_cpp.analysis.dockq.interface_rmsd(coor, coor_native, rec_chains_native, lig_chains_native, cutoff=10.0, back_atom=None, index_pair=None)

Compute the interface RMSD between a model and the native structure.

The interface is defined as atoms within cutoff Angstrom of the opposite partner chain(s) in the native structure. The RMSD is computed on the selected backbone atoms after aligning the model to the native structure using the interface atoms.

Parameters:

coor (Coor): Model coordinates.
coor_native (Coor): Native coordinates.
rec_chains_native (list[str]): Native receptor chains.
lig_chains_native (list[str]): Native ligand chains.
cutoff (float, default=10.0): Interface distance cutoff in Angstrom.
back_atom (list[str], optional): Backbone atom names used for RMSD (default: ["CA", "N", "C", "O"]).
index_pair (tuple[list[int], list[int]], optional): Pre-computed (model_indices, native_indices) of the interface backbone atoms. When supplied, the function skips the chain-selection and residue-mapping steps entirely and calls align_index_based() directly. This correctly handles non-sequential or non-contiguous index lists (e.g. when every model chain is numbered from 1, so resid values overlap between chains).

Returns:

list[float | None]: Interface RMSD values, one per model; an entry is None when no interface residues are found.
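The interface definition above, native residues with any atom within cutoff of the partner chains, can be sketched with a brute-force distance loop. This sketch assumes atoms are given as (chain, resid, (x, y, z)) tuples. It uses an inclusive cutoff, although whether interface_rmsd() uses < or <= is not stated. interface_residues is a hypothetical helper, and the O(N*M) loop is for illustration only.

```python
import math


def interface_residues(atoms, rec_chains, lig_chains, cutoff=10.0):
    """Return {(chain, resid)} for residues with any atom within cutoff of the partner.

    atoms: iterable of (chain, resid, (x, y, z)) tuples from the native structure.
    Boundary handling (<= here) is an assumption.
    """
    rec = [a for a in atoms if a[0] in rec_chains]
    lig = [a for a in atoms if a[0] in lig_chains]
    selected = set()
    for ch_r, res_r, xyz_r in rec:
        for ch_l, res_l, xyz_l in lig:
            if math.dist(xyz_r, xyz_l) <= cutoff:
                # Both partners contribute residues to the interface.
                selected.add((ch_r, res_r))
                selected.add((ch_l, res_l))
    return selected


atoms = [
    ("A", 1, (0.0, 0.0, 0.0)),
    ("A", 2, (20.0, 0.0, 0.0)),
    ("B", 1, (5.0, 0.0, 0.0)),
    ("B", 2, (40.0, 0.0, 0.0)),
]
print(sorted(interface_residues(atoms, ["A"], ["B"])))  # [('A', 1), ('B', 1)]
```

The interface RMSD would then be the backbone RMSD over the model atoms that map to these residues, computed after superposition on the same atoms.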

pdb_cpp.analysis.dockq.native_contact(coor, native_coor, rec_chains, lig_chains, native_rec_chains, native_lig_chains, cutoff=5.0, residue_id_map=None, native_residue_id_map=None)

Compute native and non-native contact fractions between model and native.

The function builds the set of native receptor-ligand residue contacts within cutoff Angstrom in the native structure, then counts which of those contacts are present in the model (Fnat) and which model contacts are not native (Fnonnat).

Parameters:

coor (Coor): Model coordinates.
native_coor (Coor): Native coordinates.
rec_chains (list[str]): Model receptor chains.
lig_chains (list[str]): Model ligand chains.
native_rec_chains (list[str]): Native receptor chains.
native_lig_chains (list[str]): Native ligand chains.
cutoff (float, default=5.0): Contact distance cutoff in Angstrom.
residue_id_map (dict[int, int], optional): Mapping from model residue IDs to a shared residue ID space.
native_residue_id_map (dict[int, int], optional): Mapping from native residue IDs to the same shared residue ID space.

Returns:

tuple[list[float], list[float]]: (fnat_list, fnonnat_list) for each model in coor.
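Once residue contacts are expressed in a shared residue ID space, Fnat and Fnonnat reduce to set arithmetic. The sketch assumes contacts are (receptor_resid, ligand_resid) pairs. It also assumes that contacts involving residues missing from a map are dropped, and that an empty contact set gives a fraction of 0.0. These are choices made for illustration, and contact_fractions is a hypothetical helper.

```python
def contact_fractions(native_contacts, model_contacts,
                      residue_id_map=None, native_residue_id_map=None):
    """Return (fnat, fnonnat) from sets of (receptor_resid, ligand_resid) pairs."""

    def remap(contacts, mapping):
        if mapping is None:
            return set(contacts)
        # Assumption: contacts with unmapped residues are dropped.
        return {(mapping[r], mapping[l]) for r, l in contacts
                if r in mapping and l in mapping}

    native = remap(native_contacts, native_residue_id_map)
    model = remap(model_contacts, residue_id_map)
    # Fnat: fraction of native contacts reproduced by the model.
    fnat = len(native & model) / len(native) if native else 0.0
    # Fnonnat: fraction of model contacts that are absent from the native.
    fnonnat = len(model - native) / len(model) if model else 0.0
    return fnat, fnonnat


native = {(1, 10), (2, 11), (3, 12), (4, 13)}
model = {(1, 10), (2, 11), (5, 20)}
print(contact_fractions(native, model))  # (0.5, 0.3333333333333333)
```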

pdb_cpp.analysis.dockq.rmsd(coor_1, coor_2, selection='name CA', index_list=None, frame_ref=0)

Compute RMSD between two sets of coordinates.

Parameters:

coor_1 (Coor): First set of coordinates.
coor_2 (Coor): Second set of coordinates.
selection (str, optional): Selection string used when index_list is not provided (default: "name CA").
index_list (list, optional): Pair of index lists [index_1, index_2].
frame_ref (int, optional): Reference frame index in coor_2 (default: 0).

Returns:

list[float]: RMSD values for each model in coor_1.
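The quantity itself is the square root of the mean squared distance over paired atoms. This page does not say whether rmsd() superimposes the structures first. The sketch below therefore computes RMSD on the coordinates as given, which matches the result after a separate superposition step. It assumes plain (x, y, z) tuples instead of Coor objects and mirrors the [index_1, index_2] pairing of index_list; rmsd_paired is a hypothetical helper.

```python
import math


def rmsd_paired(xyz_1, xyz_2, index_list=None):
    """Plain RMSD between paired coordinates, with no superposition.

    index_list: optional [index_1, index_2]; index_1 picks atoms from xyz_1,
    index_2 the matching atoms from xyz_2.
    """
    if index_list is None:
        idx_1, idx_2 = range(len(xyz_1)), range(len(xyz_2))
    else:
        idx_1, idx_2 = index_list
    if len(idx_1) != len(idx_2) or len(idx_1) == 0:
        raise ValueError("index lists must be non-empty and of equal length")
    sq = sum(math.dist(xyz_1[i], xyz_2[j]) ** 2 for i, j in zip(idx_1, idx_2))
    return math.sqrt(sq / len(idx_1))


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd_paired(a, b))  # 1.0: every atom is shifted by 1 along z
```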

pdb_cpp.analysis.hbonds module

Hydrogen-bond computation using Baker & Hubbard geometric criteria.

Virtual hydrogen positions are reconstructed from heavy-atom geometry when H atoms are absent from the coordinate file. Backbone N-H positions are computed from the preceding C and the current CA/N atoms; sidechain hydrogens are placed along the extension of the bond from the nearest heavy-atom neighbour.
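The reconstruction described above can be sketched as follows. This is one common construction, assumed rather than taken from the pdb_cpp source: the backbone H lies on the bisector of the C(i-1)->N and CA->N directions, and a sidechain H extends the neighbour->donor bond. The uniform 1.0 Angstrom bond length and the helper names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def backbone_h(n, ca, c_prev, bond=1.0):
    """Place H on N along the bisector pointing away from C(i-1) and CA(i)."""
    away_from_c = _unit(_sub(n, c_prev))
    away_from_ca = _unit(_sub(n, ca))
    direction = _unit(tuple(p + q for p, q in zip(away_from_c, away_from_ca)))
    return tuple(x + bond * d for x, d in zip(n, direction))


def sidechain_h(donor, neighbour, bond=1.0):
    """Place H on the extension of the neighbour->donor bond."""
    direction = _unit(_sub(donor, neighbour))
    return tuple(x + bond * d for x, d in zip(donor, direction))


# N at the origin, flanked symmetrically by C(i-1) and CA: H points along -y.
print(backbone_h(n=(0.0, 0.0, 0.0), ca=(1.0, 1.0, 0.0), c_prev=(-1.0, 1.0, 0.0)))
# Serine-like OG with CB along -x: H continues along +x.
print(sidechain_h(donor=(0.0, 0.0, 0.0), neighbour=(-1.43, 0.0, 0.0)))
```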

Criteria (all must be satisfied):

D···A distance < dist_DA_cutoff (default 3.5 Å)
H···A distance < dist_HA_cutoff (default 2.5 Å)
D−H···A angle > angle_cutoff (default 90°)
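The three criteria can be checked directly from the donor, hydrogen, and acceptor positions. The sketch below is a minimal illustration, assuming plain (x, y, z) tuples and computing the angle at the hydrogen. Its keyword names mirror the hbonds() defaults, but is_hbond and angle_deg are hypothetical helpers, not pdb_cpp functions.

```python
import math


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with the vertex at b."""
    v1 = [x - y for x, y in zip(a, b)]
    v2 = [x - y for x, y in zip(c, b)]
    dot = sum(p * q for p, q in zip(v1, v2))
    cos = dot / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding


def is_hbond(d_xyz, h_xyz, a_xyz, dist_DA_cutoff=3.5, dist_HA_cutoff=2.5,
             angle_cutoff=90.0):
    """Apply the Baker & Hubbard distance and angle criteria to one D-H...A triple."""
    return (math.dist(d_xyz, a_xyz) < dist_DA_cutoff
            and math.dist(h_xyz, a_xyz) < dist_HA_cutoff
            and angle_deg(d_xyz, h_xyz, a_xyz) > angle_cutoff)


print(is_hbond((0, 0, 0), (1, 0, 0), (2.9, 0, 0)))  # True: linear, D...A = 2.9
print(is_hbond((0, 0, 0), (1, 0, 0), (1, 2, 0)))    # False: angle is exactly 90
```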

Reference

Baker EN & Hubbard RE (1984) Hydrogen bonding in globular proteins. Prog Biophys Mol Biol 44:97-179.

pdb_cpp.analysis.hbonds.hbonds(coor, donor_sel='protein', acceptor_sel='protein', dist_DA_cutoff=3.5, dist_HA_cutoff=2.5, angle_cutoff=90.0)

Compute hydrogen bonds between two selections for every model in coor.

Parameters:

coor (Coor): Coordinate object (one or more models / frames).
donor_sel (str, optional): Atom selection string for donor atoms (default: "protein"). Use "protein or nucleic" or "nucleic" to include nucleic acids.
acceptor_sel (str, optional): Atom selection string for acceptor atoms (default: "protein").
dist_DA_cutoff (float, optional): Maximum distance in Å between the donor heavy atom and the acceptor (default 3.5).
dist_HA_cutoff (float, optional): Maximum hydrogen to acceptor distance in Å (default 2.5).
angle_cutoff (float, optional): Minimum D−H···A angle in degrees (default 90).

Returns:

list[list[HBond]]

One list of HBond objects per model frame. Each HBond has the following read-only attributes:

donor_resid – unique residue ID of the donor
donor_resname – residue name of the donor
donor_chain – chain ID of the donor
donor_heavy_name – heavy donor atom name (e.g. "N", "OG")
donor_h_name – hydrogen atom name (actual or virtual)
donor_heavy_xyz – (x, y, z) of the donor heavy atom
donor_h_xyz – (x, y, z) of the H (actual or reconstructed)
acceptor_resid – unique residue ID of the acceptor
acceptor_resname – residue name of the acceptor
acceptor_chain – chain ID of the acceptor
acceptor_name – acceptor atom name (e.g. "O", "OD1")
acceptor_xyz – (x, y, z) of the acceptor atom
dist_DA – D···A distance (Å)
dist_HA – H···A distance (Å)
angle_DHA – D−H···A angle (degrees)

pdb_cpp.analysis.salt_bridge module

Salt-bridge detection helpers.

The implementation uses explicit charged-atom tables and a simple distance cutoff so the behavior stays predictable across protein and nucleic selections.

class pdb_cpp.analysis.salt_bridge.SaltBridge(cation_resid: int, cation_resname: str, cation_chain: str, cation_name: str, cation_xyz: tuple[float, float, float], anion_resid: int, anion_resname: str, anion_chain: str, anion_name: str, anion_xyz: tuple[float, float, float], distance: float)

Bases: object

Describe a single salt bridge between a cationic and an anionic atom.

anion_chain: str
anion_name: str
anion_resid: int
anion_resname: str
anion_xyz: tuple[float, float, float]
cation_chain: str
cation_name: str
cation_resid: int
cation_resname: str
cation_xyz: tuple[float, float, float]
distance: float

pdb_cpp.analysis.salt_bridge.salt_bridges(coor, cation_sel: str = 'protein', anion_sel: str = 'protein', cutoff: float = 4.0)

Identify salt bridges between two selections for every model in coor.

Salt bridges are detected between explicitly typed cationic and anionic heavy atoms using a simple distance cutoff.
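Table-driven detection with a single distance cutoff can be sketched as below. The charged-atom tables here are an illustrative subset covering only Lys, Arg, Asp, and Glu; pdb_cpp's real tables, which also cover nucleic selections, are not reproduced. The atom tuple layout and find_salt_bridges are hypothetical. As with SaltBridge, each cation/anion atom pair closer than the cutoff is reported separately.

```python
import math

# Illustrative subset only; not pdb_cpp's actual charged-atom tables.
CATION_ATOMS = {"LYS": {"NZ"}, "ARG": {"NE", "NH1", "NH2"}}
ANION_ATOMS = {"ASP": {"OD1", "OD2"}, "GLU": {"OE1", "OE2"}}


def find_salt_bridges(atoms, cutoff=4.0):
    """atoms: list of (chain, resid, resname, name, (x, y, z)).

    Returns (cation_atom, anion_atom, distance) for every typed pair within cutoff.
    """
    cations = [a for a in atoms if a[3] in CATION_ATOMS.get(a[2], ())]
    anions = [a for a in atoms if a[3] in ANION_ATOMS.get(a[2], ())]
    bridges = []
    for cat in cations:
        for an in anions:
            d = math.dist(cat[4], an[4])
            if d <= cutoff:
                bridges.append((cat, an, d))
    return bridges


atoms = [
    ("A", 10, "LYS", "NZ", (0.0, 0.0, 0.0)),
    ("A", 10, "LYS", "CE", (-1.5, 0.0, 0.0)),  # not a charged atom
    ("B", 5, "ASP", "OD1", (3.0, 0.0, 0.0)),
    ("B", 5, "ASP", "OD2", (5.0, 0.0, 0.0)),   # beyond the cutoff
    ("B", 7, "GLU", "OE1", (10.0, 0.0, 0.0)),
]
bridges = find_salt_bridges(atoms)
print(len(bridges))  # 1
```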

pdb_cpp.analysis.sasa module

SASA and interface-SASA helpers.

pdb_cpp.analysis.sasa.buried_surface_area(coor, receptor_sel, ligand_sel, probe_radius=1.4, n_points=960, include_hydrogen=False, by_residue=False)

Compute the buried interface surface area for each model in a Coor object.

pdb_cpp.analysis.sasa.sasa(coor, selection=None, probe_radius=1.4, n_points=960, include_hydrogen=False, by_atom=False, by_residue=False)

Compute the solvent-accessible surface area (SASA) for each model in a Coor object.
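The probe_radius=1.4 and n_points=960 defaults are typical of a Shrake-Rupley dot-sphere calculation. Assuming that is what sasa() uses, the idea can be sketched in pure Python. Buried surface is shown as SASA(receptor) + SASA(ligand) - SASA(complex); this page does not state whether buried_surface_area() reports this total or half of it. The atom format, radii, and helper names are hypothetical.

```python
import math


def sphere_points(n):
    """Roughly uniform unit-sphere points from a golden-angle spiral."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(i * golden), r * math.sin(i * golden), z))
    return points


def shrake_rupley(atoms, probe_radius=1.4, n_points=960):
    """Per-atom SASA in square Angstrom; atoms is a list of ((x, y, z), vdw_radius)."""
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(atoms):
        big_r = ri + probe_radius
        # Only atoms whose probe-expanded spheres overlap can occlude test points.
        neighbours = [(cj, rj + probe_radius) for j, (cj, rj) in enumerate(atoms)
                      if j != i and math.dist(ci, cj) < big_r + rj + probe_radius]
        exposed = 0
        for ux, uy, uz in unit:
            p = (ci[0] + big_r * ux, ci[1] + big_r * uy, ci[2] + big_r * uz)
            if all(math.dist(p, cj) >= rj for cj, rj in neighbours):
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas


def buried_area(receptor, ligand, **kwargs):
    """SASA(receptor) + SASA(ligand) - SASA(complex); some tools report half this."""
    return (sum(shrake_rupley(receptor, **kwargs))
            + sum(shrake_rupley(ligand, **kwargs))
            - sum(shrake_rupley(receptor + ligand, **kwargs)))


print(shrake_rupley([((0.0, 0.0, 0.0), 1.0)]))  # one isolated atom: 4*pi*2.4**2
print(buried_area([((0.0, 0.0, 0.0), 1.7)], [((3.0, 0.0, 0.0), 1.7)]))
```

Sampling more points lowers the discretisation error at a linear cost per atom. A real implementation would also use a spatial grid instead of the all-pairs neighbour search.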

pdb_cpp.analysis.sasa.shape_complementarity(coor, receptor_sel, ligand_sel, probe_radius=1.4, dots_per_sq_angstrom=12.0, search_radius=1.5, include_hydrogen=False, reducer='trimmed_mean', trim_fraction=0.1)

Estimate the shape complementarity of an interface in the style of Lawrence and Colman.

Surface dots are generated independently for each partner using a rolling-probe surface with outward normals. Each interface dot is matched to its closest dot on the opposite partner within search_radius and scored with the normal complementarity term dot(n_a, -n_b).
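The matching and scoring step can be sketched on pre-computed surface dots. Several assumptions apply here. Each dot is a (point, unit_normal) pair, and dots with no partner within search_radius are skipped. trim_fraction is read as the fraction cut from each tail. The two directional values are averaged, as in the original Lawrence and Colman statistic, although that statistic used medians and a distance weight that the description above does not mention. None of this is confirmed to match shape_complementarity() exactly, and all helper names are hypothetical.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def trimmed_mean(values, trim_fraction=0.1):
    """Mean after dropping trim_fraction of the values from each tail (assumed convention)."""
    vals = sorted(values)
    k = int(len(vals) * trim_fraction)
    kept = vals[k:len(vals) - k] or vals
    return sum(kept) / len(kept)


def directional_scores(dots_a, dots_b, search_radius=1.5):
    """Score each dot of A against its closest dot of B within search_radius."""
    scores = []
    for p_a, n_a in dots_a:
        best = None
        for p_b, n_b in dots_b:
            d = math.dist(p_a, p_b)
            if d <= search_radius and (best is None or d < best[0]):
                best = (d, n_b)
        if best is not None:
            scores.append(_dot(n_a, [-c for c in best[1]]))  # dot(n_a, -n_b)
    return scores


def shape_complementarity_sketch(dots_a, dots_b, search_radius=1.5, trim_fraction=0.1):
    s_ab = directional_scores(dots_a, dots_b, search_radius)
    s_ba = directional_scores(dots_b, dots_a, search_radius)
    if not s_ab or not s_ba:
        return float("nan")  # no interface dots matched
    return 0.5 * (trimmed_mean(s_ab, trim_fraction) + trimmed_mean(s_ba, trim_fraction))


# Two facing dots with opposed normals are perfectly complementary.
a = [((0.0, 0.0, 0.0), (0.0, 0.0, 1.0))]
b = [((0.0, 0.0, 1.0), (0.0, 0.0, -1.0))]
print(shape_complementarity_sketch(a, b))  # 1.0
```

A score near 1 means the facing normals are anti-parallel, a score near 0 means they are orthogonal, and negative values mean the surfaces point the same way.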

Module contents

High-level analysis namespace.

This package groups structure-analysis helpers into topic-oriented modules:

pdb_cpp.analysis.dockq
pdb_cpp.analysis.sasa
pdb_cpp.analysis.hbonds
pdb_cpp.analysis.salt_bridge

The historical flat API is preserved, so existing code such as pdb_cpp.analysis.rmsd(...) keeps working.

pdb_cpp.analysis exposes the following names at package level. Each shares the signature and docstring of the submodule function it points to, documented above:

pdb_cpp.analysis.buried_surface_area: see pdb_cpp.analysis.sasa.buried_surface_area
pdb_cpp.analysis.compute_hbonds: see pdb_cpp.analysis.hbonds.hbonds (exposed under a different name)
pdb_cpp.analysis.dockQ: see pdb_cpp.analysis.dockq.dockQ
pdb_cpp.analysis.dockQ_multimer: see pdb_cpp.analysis.dockq.dockQ_multimer
pdb_cpp.analysis.interface_rmsd: see pdb_cpp.analysis.dockq.interface_rmsd
pdb_cpp.analysis.native_contact: see pdb_cpp.analysis.dockq.native_contact
pdb_cpp.analysis.rmsd: see pdb_cpp.analysis.dockq.rmsd
pdb_cpp.analysis.salt_bridges: see pdb_cpp.analysis.salt_bridge.salt_bridges
pdb_cpp.analysis.shape_complementarity: see pdb_cpp.analysis.sasa.shape_complementarity