pdb_cpp.analysis package
Submodules
pdb_cpp.analysis.dockq module
DockQ scoring and interface-metric helpers.
The public functions in this module compute RMSD, contact fractions, and the combined DockQ score for protein-protein and protein-nucleic interfaces.
- pdb_cpp.analysis.dockq.dockQ(coor, native_coor, rec_chains=None, lig_chains=None, native_rec_chains=None, native_lig_chains=None, back_atom=None, _search_mode=False)[source]
Compute DockQ scores between a model and a native structure.
DockQ combines interface contacts (Fnat), ligand RMSD (LRMS), and interface RMSD (iRMS) into a single docking quality metric. Chain roles are inferred by selecting the shortest chain as ligand when not provided.
- Parameters:
- coor : Coor
Model coordinates.
- native_coor : Coor
Native coordinates.
- rec_chains : list[str], optional
Model receptor chains. If None, uses all chains except the shortest chain (ligand).
- lig_chains : list[str], optional
Model ligand chains. If None, uses the shortest chain.
- native_rec_chains : list[str], optional
Native receptor chains. If None, uses all chains except the shortest chain (ligand).
- native_lig_chains : list[str], optional
Native ligand chains. If None, uses the shortest chain.
- back_atom : list[str], optional
Backbone atom names used for alignment and RMSD calculations.
- Returns:
- dict
Dictionary with keys Fnat, Fnonnat, rRMS, iRMS, LRMS, and DockQ, each a list with one value per model.
Notes
This implementation mirrors the pdb_numpy DockQ pipeline and relies on sequence-based alignment for receptor superposition before computing the ligand RMSD and interface metrics.
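As an illustration of how the three metrics are combined, the sketch below follows the formula published with the original DockQ score, which scales iRMS by 1.5 Å and LRMS by 8.5 Å. The helper names scaled_rms and dockq_score are hypothetical and are not part of pdb_cpp; that pdb_cpp uses exactly these constants is an assumption.

```python
def scaled_rms(rms: float, d: float) -> float:
    """Map an RMSD in Angstrom to (0, 1]; 1.0 means a perfect match."""
    return 1.0 / (1.0 + (rms / d) ** 2)


def dockq_score(fnat: float, irms: float, lrms: float,
                d_irms: float = 1.5, d_lrms: float = 8.5) -> float:
    """Average Fnat with the scaled iRMS and LRMS terms (assumed constants)."""
    return (fnat + scaled_rms(irms, d_irms) + scaled_rms(lrms, d_lrms)) / 3.0


if __name__ == "__main__":
    # A model with half of the native contacts and moderate deviations.
    print(dockq_score(fnat=0.5, irms=1.5, lrms=8.5))
```

Each term lies between 0 and 1, so the combined score does too.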
- pdb_cpp.analysis.dockq.dockQ_multimer(coor, native_coor, chain_map=None, back_atom=None, n_cpu=1, _search_mode=False, _native_iface_cache=None, _model_backbone_cache=None)[source]
Compute DockQ over all pairwise native chain interfaces (multimer).
Scores every \(\binom{n}{2}\) interface between the n native chains and returns per-interface DockQ metrics as well as GlobalDockQ (the average DockQ over all interfaces), mirroring the DockQ v2 multimer output.
- Parameters:
- coor : Coor
Model coordinates.
- native_coor : Coor
Native coordinates.
- chain_map : dict[str, str], optional
Mapping from native chain IDs to model chain IDs. If None, the function assumes that both structures share the same chain names and builds an identity mapping for chains present in both.
- back_atom : list[str], optional
Backbone atom names used for alignment and RMSD calculations.
- Returns:
- dict
Dictionary with two keys:
- "interfaces" : dict[(native_ch1, native_ch2), result], where result is the dict returned by dockQ() for that pair (or None when the interface could not be scored).
- "GlobalDockQ" : list[float], the average DockQ over all valid interfaces, one value per model frame.
Notes
The larger native chain of each pair is used as receptor to match the DockQ v2 convention. For chains of equal length, the pair keeps the order in which the chains appear when iterating over chain_map.
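The pair enumeration and averaging described above can be sketched for a single model frame as follows. The function names order_pairs and global_dockq and the plain-dict inputs are hypothetical; this is not the pdb_cpp implementation, only one way to realise the stated convention (larger chain as receptor, ties kept in iteration order, unscored interfaces skipped).

```python
from itertools import combinations


def order_pairs(chain_lengths: dict) -> list:
    """Return (receptor, ligand) for every pair of chains.

    The longer chain is the receptor; equal lengths keep dict order.
    """
    pairs = []
    for first, second in combinations(chain_lengths, 2):
        if chain_lengths[first] >= chain_lengths[second]:
            pairs.append((first, second))
        else:
            pairs.append((second, first))
    return pairs


def global_dockq(interface_scores: dict):
    """Average the DockQ of all interfaces that could be scored."""
    valid = [score for score in interface_scores.values() if score is not None]
    return sum(valid) / len(valid) if valid else None


if __name__ == "__main__":
    lengths = {"A": 120, "B": 80, "C": 120}
    print(order_pairs(lengths))
    print(global_dockq({("A", "B"): 0.8, ("A", "C"): None, ("C", "B"): 0.4}))
```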
- pdb_cpp.analysis.dockq.interface_rmsd(coor, coor_native, rec_chains_native, lig_chains_native, cutoff=10.0, back_atom=None, index_pair=None)[source]
Compute the interface RMSD between two models.
The interface is defined as atoms within cutoff Angstrom of the opposite partner chain(s) in the native structure. The RMSD is computed on the selected backbone atoms after aligning the model to the native structure using the interface atoms.
- Parameters:
- coor : Coor
Model coordinates.
- coor_native : Coor
Native coordinates.
- rec_chains_native : list[str]
Native receptor chains.
- lig_chains_native : list[str]
Native ligand chains.
- cutoff : float, default=10.0
Interface distance cutoff in Angstrom.
- back_atom : list[str], optional
Backbone atom names used for RMSD (default: ["CA", "N", "C", "O"]).
- index_pair : tuple[list[int], list[int]], optional
Pre-computed (model_indices, native_indices) of the interface backbone atoms. When supplied, the function skips the chain-selection and residue-mapping steps entirely, calling align_index_based() directly. This correctly handles non-sequential or non-contiguous index lists (e.g. when model chains are numbered from 1 on every chain and thus have overlapping resid values).
- Returns:
- list[float]
Interface RMSD values for each model. Returns None entries when no interface residues are found.
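To make the interface definition concrete, the sketch below selects the atoms of each partner that lie within the cutoff of any atom of the other partner. It works on plain coordinate tuples, uses a hypothetical helper name (interface_indices), and assumes an inclusive cutoff; it does not reproduce the alignment or residue-mapping steps of pdb_cpp.

```python
from math import dist


def interface_indices(rec_xyz, lig_xyz, cutoff=10.0):
    """Return indices of receptor and ligand atoms near the partner."""
    rec_idx = [
        i for i, rec_atom in enumerate(rec_xyz)
        if any(dist(rec_atom, lig_atom) <= cutoff for lig_atom in lig_xyz)
    ]
    lig_idx = [
        j for j, lig_atom in enumerate(lig_xyz)
        if any(dist(lig_atom, rec_atom) <= cutoff for rec_atom in rec_xyz)
    ]
    return rec_idx, lig_idx


if __name__ == "__main__":
    receptor = [(0.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
    ligand = [(5.0, 0.0, 0.0), (50.0, 0.0, 0.0)]
    print(interface_indices(receptor, ligand))
```

An empty selection here corresponds to the case where the documented function returns None entries.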
- pdb_cpp.analysis.dockq.native_contact(coor, native_coor, rec_chains, lig_chains, native_rec_chains, native_lig_chains, cutoff=5.0, residue_id_map=None, native_residue_id_map=None)[source]
Compute native and non-native contact fractions between model and native.
The function builds the set of native receptor-ligand residue contacts within cutoff Angstrom in the native structure, then counts which of those contacts are present in the model (Fnat) and which model contacts are not native (Fnonnat).
- Parameters:
- coor : Coor
Model coordinates.
- native_coor : Coor
Native coordinates.
- rec_chains : list[str]
Model receptor chains.
- lig_chains : list[str]
Model ligand chains.
- native_rec_chains : list[str]
Native receptor chains.
- native_lig_chains : list[str]
Native ligand chains.
- cutoff : float, default=5.0
Contact distance cutoff in Angstrom.
- residue_id_map : dict[int, int], optional
Mapping from model residue IDs to a shared residue ID space.
- native_residue_id_map : dict[int, int], optional
Mapping from native residue IDs to the same shared residue ID space.
- Returns:
- tuple[list[float], list[float]]
(fnat_list, fnonnat_list) for each model in coor.
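Once both contact sets are expressed in the shared residue ID space, the two fractions reduce to set arithmetic. The sketch below uses a hypothetical helper, contact_fractions, and assumes the usual DockQ normalisation: Fnat is divided by the number of native contacts and Fnonnat by the number of model contacts.

```python
def contact_fractions(native_contacts, model_contacts):
    """Return (Fnat, Fnonnat) for one model.

    Each contact is a (receptor_residue_id, ligand_residue_id) pair.
    """
    native = set(native_contacts)
    model = set(model_contacts)
    # Fraction of native contacts reproduced by the model.
    fnat = len(native & model) / len(native) if native else 0.0
    # Fraction of model contacts that are absent from the native set.
    fnonnat = len(model - native) / len(model) if model else 0.0
    return fnat, fnonnat


if __name__ == "__main__":
    native = {(1, 10), (2, 11), (3, 12), (4, 13)}
    model = {(1, 10), (2, 11), (5, 14)}
    print(contact_fractions(native, model))
```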
- pdb_cpp.analysis.dockq.rmsd(coor_1, coor_2, selection='name CA', index_list=None, frame_ref=0)[source]
Compute RMSD between two sets of coordinates.
- Parameters:
- coor_1 : Coor
First set of coordinates.
- coor_2 : Coor
Second set of coordinates.
- selection : str, optional
Selection string used when index_list is not provided.
- index_list : list, optional
Pair of index lists [index_1, index_2].
- frame_ref : int, optional
Reference frame index in coor_2.
- Returns:
- list[float]
RMSD values for each model in coor_1.
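For reference, the quantity itself can be sketched on two already-paired coordinate lists. The helper rmsd_no_fit is hypothetical and performs no superposition; whether the documented function aligns the structures first is not stated here, so this shows only the distance formula.

```python
from math import sqrt


def rmsd_no_fit(xyz_1, xyz_2):
    """Root-mean-square deviation between paired atoms, without fitting."""
    if len(xyz_1) != len(xyz_2) or not xyz_1:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (a - b) ** 2
        for atom_1, atom_2 in zip(xyz_1, xyz_2)
        for a, b in zip(atom_1, atom_2)
    )
    return sqrt(squared / len(xyz_1))


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd_no_fit(shifted, reference))
```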
pdb_cpp.analysis.hbonds module
Hydrogen-bond computation using Baker & Hubbard geometric criteria.
Virtual hydrogen positions are reconstructed from heavy-atom geometry when H atoms are absent from the coordinate file. Backbone N-H positions are computed from the preceding C and the current CA/N atoms; sidechain hydrogens are placed along the extension of the bond from the nearest heavy-atom neighbour.
- Criteria (all must be satisfied):
- D···A distance < dist_DA_cutoff (default 3.5 Å)
- H···A distance < dist_HA_cutoff (default 2.5 Å)
- D−H···A angle > angle_cutoff (default 90°)
Reference
Baker EN & Hubbard RE (1984) Hydrogen bonding in globular proteins. Prog Biophys Mol Biol 44:97-179.
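The three criteria can be checked for a single donor, hydrogen, and acceptor triplet as sketched below. The function is_hbond is a hypothetical stand-alone helper working on coordinate tuples; it assumes the hydrogen position is already known and does not cover the virtual-hydrogen reconstruction described above.

```python
from math import acos, degrees, dist


def is_hbond(donor, hydrogen, acceptor,
             dist_DA_cutoff=3.5, dist_HA_cutoff=2.5, angle_cutoff=90.0):
    """Return True when all three geometric criteria are satisfied."""
    d_da = dist(donor, acceptor)
    d_ha = dist(hydrogen, acceptor)
    d_dh = dist(donor, hydrogen)
    if d_ha == 0.0 or d_dh == 0.0:
        return False  # degenerate geometry, the angle is undefined
    # Angle at the hydrogen between the H->D and H->A vectors.
    dot = sum((d - h) * (a - h) for d, h, a in zip(donor, hydrogen, acceptor))
    cos_angle = max(-1.0, min(1.0, dot / (d_dh * d_ha)))
    angle = degrees(acos(cos_angle))
    return d_da < dist_DA_cutoff and d_ha < dist_HA_cutoff and angle > angle_cutoff


if __name__ == "__main__":
    # Linear D-H...A arrangement with a 2.9 Angstrom donor-acceptor distance.
    print(is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0)))
```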
- pdb_cpp.analysis.hbonds.hbonds(coor, donor_sel='protein', acceptor_sel='protein', dist_DA_cutoff=3.5, dist_HA_cutoff=2.5, angle_cutoff=90.0)[source]
Compute hydrogen bonds between two selections for every model in coor.
- Parameters:
- coor : Coor
Coordinate object (one or more models / frames).
- donor_sel : str, optional
Atom selection string for donor atoms (default: "protein"). Use "protein or nucleic" or "nucleic" to include nucleic acids.
- acceptor_sel : str, optional
Atom selection string for acceptor atoms (default: "protein").
- dist_DA_cutoff : float, optional
Maximum donor-heavy to acceptor distance in Å (default 3.5).
- dist_HA_cutoff : float, optional
Maximum hydrogen to acceptor distance in Å (default 2.5).
- angle_cutoff : float, optional
Minimum D−H···A angle in degrees (default 90).
- Returns:
- list[list[HBond]]
One list of HBond objects per model frame. Each HBond has the following read-only attributes:
- donor_resid: unique residue ID of the donor
- donor_resname: residue name of the donor
- donor_chain: chain ID of the donor
- donor_heavy_name: heavy donor atom name (e.g. "N", "OG")
- donor_h_name: hydrogen atom name (actual or virtual)
- donor_heavy_xyz: (x, y, z) of the donor heavy atom
- donor_h_xyz: (x, y, z) of the H (actual or reconstructed)
- acceptor_resid: unique residue ID of the acceptor
- acceptor_resname: residue name of the acceptor
- acceptor_chain: chain ID of the acceptor
- acceptor_name: acceptor atom name (e.g. "O", "OD1")
- acceptor_xyz: (x, y, z) of the acceptor atom
- dist_DA: D···A distance (Å)
- dist_HA: H···A distance (Å)
- angle_DHA: D−H···A angle (degrees)
pdb_cpp.analysis.salt_bridge module
Salt-bridge detection helpers.
The implementation uses explicit charged-atom tables and a simple distance cutoff so the behavior stays predictable across protein and nucleic selections.
- class pdb_cpp.analysis.salt_bridge.SaltBridge(cation_resid: int, cation_resname: str, cation_chain: str, cation_name: str, cation_xyz: tuple[float, float, float], anion_resid: int, anion_resname: str, anion_chain: str, anion_name: str, anion_xyz: tuple[float, float, float], distance: float)[source]
Bases: object
Describe a single salt bridge between a cationic and anionic atom.
- anion_chain: str
- anion_name: str
- anion_resid: int
- anion_resname: str
- anion_xyz: tuple[float, float, float]
- cation_chain: str
- cation_name: str
- cation_resid: int
- cation_resname: str
- cation_xyz: tuple[float, float, float]
- distance: float
- pdb_cpp.analysis.salt_bridge.salt_bridges(coor, cation_sel: str = 'protein', anion_sel: str = 'protein', cutoff: float = 4.0)[source]
Identify salt bridges between two selections for every model in coor.
Salt bridges are detected between explicitly typed cationic and anionic heavy atoms using a simple distance cutoff.
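A table-plus-cutoff search of this kind can be sketched as follows. The atom tables below are illustrative assumptions (a few common charged side-chain atoms) and are not the tables shipped with pdb_cpp; the helper find_salt_bridges and the dict-based atom records are likewise hypothetical, and the inclusive cutoff is an assumption.

```python
from math import dist

# Illustrative (residue name, atom name) tables, assumed for this sketch only.
CATIONS = {("LYS", "NZ"), ("ARG", "NH1"), ("ARG", "NH2")}
ANIONS = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}


def find_salt_bridges(atoms, cutoff=4.0):
    """Return (cation_resid, cation_name, anion_resid, anion_name, distance)."""
    cations = [a for a in atoms if (a["resname"], a["name"]) in CATIONS]
    anions = [a for a in atoms if (a["resname"], a["name"]) in ANIONS]
    bridges = []
    for cation in cations:
        for anion in anions:
            distance = dist(cation["xyz"], anion["xyz"])
            if distance <= cutoff:
                bridges.append((cation["resid"], cation["name"],
                                anion["resid"], anion["name"], distance))
    return bridges


if __name__ == "__main__":
    atoms = [
        {"resid": 1, "resname": "LYS", "name": "NZ", "xyz": (0.0, 0.0, 0.0)},
        {"resid": 2, "resname": "ASP", "name": "OD1", "xyz": (3.0, 0.0, 0.0)},
        {"resid": 3, "resname": "GLU", "name": "OE1", "xyz": (10.0, 0.0, 0.0)},
    ]
    print(find_salt_bridges(atoms))
```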
pdb_cpp.analysis.sasa module
SASA and interface-SASA helpers.
- pdb_cpp.analysis.sasa.buried_surface_area(coor, receptor_sel, ligand_sel, probe_radius=1.4, n_points=960, include_hydrogen=False, by_residue=False)[source]
Compute buried interface surface for each model in a Coor object.
- pdb_cpp.analysis.sasa.sasa(coor, selection=None, probe_radius=1.4, n_points=960, include_hydrogen=False, by_atom=False, by_residue=False)[source]
Compute SASA for each model in a Coor object.
- pdb_cpp.analysis.sasa.shape_complementarity(coor, receptor_sel, ligand_sel, probe_radius=1.4, dots_per_sq_angstrom=12.0, search_radius=1.5, include_hydrogen=False, reducer='trimmed_mean', trim_fraction=0.1)[source]
Estimate Lawrence-Colman style shape complementarity for an interface.
Surface dots are generated independently for each partner using a rolling-probe surface with outward normals. Each interface dot is matched to its closest dot on the opposite partner within search_radius and scored with the normal complementarity term dot(n_a, -n_b).
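The matching and scoring step can be sketched on pre-computed surface dots, each given as a (position, unit_normal) pair. The helpers dot_scores and trimmed_mean are hypothetical; in particular, dropping trim_fraction of the values from each tail is an assumption about what the 'trimmed_mean' reducer does, and the surface-generation step is not shown.

```python
from math import dist


def dot_scores(dots_a, dots_b, search_radius=1.5):
    """Score each dot of A against its closest dot of B within the radius."""
    scores = []
    for pos_a, normal_a in dots_a:
        if not dots_b:
            break
        pos_b, normal_b = min(dots_b, key=lambda dot_b: dist(pos_a, dot_b[0]))
        if dist(pos_a, pos_b) <= search_radius:
            # Normal complementarity term dot(n_a, -n_b).
            scores.append(sum(x * -y for x, y in zip(normal_a, normal_b)))
    return scores


def trimmed_mean(values, trim_fraction=0.1):
    """Mean after dropping trim_fraction of the values from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept) if kept else None


if __name__ == "__main__":
    side_a = [((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)), ((5.0, 0.0, 0.0), (0.0, 0.0, 1.0))]
    side_b = [((0.0, 0.0, 1.0), (0.0, 0.0, -1.0))]
    print(dot_scores(side_a, side_b))
```

Two normals that point straight at each other give a score of 1, and the trimming limits the influence of a few poorly matched dots.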
Module contents
High-level analysis namespace.
This package groups structure-analysis helpers into topic-oriented modules (dockq, hbonds, salt_bridge, and sasa).
The historical flat API is preserved, so existing code such as pdb_cpp.analysis.rmsd(...) keeps working.
- pdb_cpp.analysis.buried_surface_area(coor, receptor_sel, ligand_sel, probe_radius=1.4, n_points=960, include_hydrogen=False, by_residue=False)[source]
Compute buried interface surface for each model in a Coor object.
- pdb_cpp.analysis.compute_hbonds(coor, donor_sel='protein', acceptor_sel='protein', dist_DA_cutoff=3.5, dist_HA_cutoff=2.5, angle_cutoff=90.0)
Compute hydrogen bonds between two selections for every model in coor.
- Parameters:
- coor : Coor
Coordinate object (one or more models / frames).
- donor_sel : str, optional
Atom selection string for donor atoms (default: "protein"). Use "protein or nucleic" or "nucleic" to include nucleic acids.
- acceptor_sel : str, optional
Atom selection string for acceptor atoms (default: "protein").
- dist_DA_cutoff : float, optional
Maximum donor-heavy to acceptor distance in Å (default 3.5).
- dist_HA_cutoff : float, optional
Maximum hydrogen to acceptor distance in Å (default 2.5).
- angle_cutoff : float, optional
Minimum D−H···A angle in degrees (default 90).
- Returns:
- list[list[HBond]]
One list of HBond objects per model frame. Each HBond has the following read-only attributes:
- donor_resid: unique residue ID of the donor
- donor_resname: residue name of the donor
- donor_chain: chain ID of the donor
- donor_heavy_name: heavy donor atom name (e.g. "N", "OG")
- donor_h_name: hydrogen atom name (actual or virtual)
- donor_heavy_xyz: (x, y, z) of the donor heavy atom
- donor_h_xyz: (x, y, z) of the H (actual or reconstructed)
- acceptor_resid: unique residue ID of the acceptor
- acceptor_resname: residue name of the acceptor
- acceptor_chain: chain ID of the acceptor
- acceptor_name: acceptor atom name (e.g. "O", "OD1")
- acceptor_xyz: (x, y, z) of the acceptor atom
- dist_DA: D···A distance (Å)
- dist_HA: H···A distance (Å)
- angle_DHA: D−H···A angle (degrees)
- pdb_cpp.analysis.dockQ(coor, native_coor, rec_chains=None, lig_chains=None, native_rec_chains=None, native_lig_chains=None, back_atom=None, _search_mode=False)[source]
Compute DockQ scores between a model and a native structure.
DockQ combines interface contacts (Fnat), ligand RMSD (LRMS), and interface RMSD (iRMS) into a single docking quality metric. Chain roles are inferred by selecting the shortest chain as ligand when not provided.
- Parameters:
- coor : Coor
Model coordinates.
- native_coor : Coor
Native coordinates.
- rec_chains : list[str], optional
Model receptor chains. If None, uses all chains except the shortest chain (ligand).
- lig_chains : list[str], optional
Model ligand chains. If None, uses the shortest chain.
- native_rec_chains : list[str], optional
Native receptor chains. If None, uses all chains except the shortest chain (ligand).
- native_lig_chains : list[str], optional
Native ligand chains. If None, uses the shortest chain.
- back_atom : list[str], optional
Backbone atom names used for alignment and RMSD calculations.
- Returns:
- dict
Dictionary with keys Fnat, Fnonnat, rRMS, iRMS, LRMS, and DockQ, each a list with one value per model.
Notes
This implementation mirrors the pdb_numpy DockQ pipeline and relies on sequence-based alignment for receptor superposition before computing the ligand RMSD and interface metrics.
- pdb_cpp.analysis.dockQ_multimer(coor, native_coor, chain_map=None, back_atom=None, n_cpu=1, _search_mode=False, _native_iface_cache=None, _model_backbone_cache=None)[source]
Compute DockQ over all pairwise native chain interfaces (multimer).
Scores every \(\binom{n}{2}\) interface between the n native chains and returns per-interface DockQ metrics as well as GlobalDockQ (the average DockQ over all interfaces), mirroring the DockQ v2 multimer output.
- Parameters:
- coor : Coor
Model coordinates.
- native_coor : Coor
Native coordinates.
- chain_map : dict[str, str], optional
Mapping from native chain IDs to model chain IDs. If None, the function assumes that both structures share the same chain names and builds an identity mapping for chains present in both.
- back_atom : list[str], optional
Backbone atom names used for alignment and RMSD calculations.
- Returns:
- dict
Dictionary with two keys:
- "interfaces" : dict[(native_ch1, native_ch2), result], where result is the dict returned by dockQ() for that pair (or None when the interface could not be scored).
- "GlobalDockQ" : list[float], the average DockQ over all valid interfaces, one value per model frame.
Notes
The larger native chain of each pair is used as receptor to match the DockQ v2 convention. For chains of equal length, the pair keeps the order in which the chains appear when iterating over chain_map.
- pdb_cpp.analysis.interface_rmsd(coor, coor_native, rec_chains_native, lig_chains_native, cutoff=10.0, back_atom=None, index_pair=None)[source]
Compute the interface RMSD between two models.
The interface is defined as atoms within cutoff Angstrom of the opposite partner chain(s) in the native structure. The RMSD is computed on the selected backbone atoms after aligning the model to the native structure using the interface atoms.
- Parameters:
- coor : Coor
Model coordinates.
- coor_native : Coor
Native coordinates.
- rec_chains_native : list[str]
Native receptor chains.
- lig_chains_native : list[str]
Native ligand chains.
- cutoff : float, default=10.0
Interface distance cutoff in Angstrom.
- back_atom : list[str], optional
Backbone atom names used for RMSD (default: ["CA", "N", "C", "O"]).
- index_pair : tuple[list[int], list[int]], optional
Pre-computed (model_indices, native_indices) of the interface backbone atoms. When supplied, the function skips the chain-selection and residue-mapping steps entirely, calling align_index_based() directly. This correctly handles non-sequential or non-contiguous index lists (e.g. when model chains are numbered from 1 on every chain and thus have overlapping resid values).
- Returns:
- list[float]
Interface RMSD values for each model. Returns None entries when no interface residues are found.
- pdb_cpp.analysis.native_contact(coor, native_coor, rec_chains, lig_chains, native_rec_chains, native_lig_chains, cutoff=5.0, residue_id_map=None, native_residue_id_map=None)[source]
Compute native and non-native contact fractions between model and native.
The function builds the set of native receptor-ligand residue contacts within cutoff Angstrom in the native structure, then counts which of those contacts are present in the model (Fnat) and which model contacts are not native (Fnonnat).
- Parameters:
- coor : Coor
Model coordinates.
- native_coor : Coor
Native coordinates.
- rec_chains : list[str]
Model receptor chains.
- lig_chains : list[str]
Model ligand chains.
- native_rec_chains : list[str]
Native receptor chains.
- native_lig_chains : list[str]
Native ligand chains.
- cutoff : float, default=5.0
Contact distance cutoff in Angstrom.
- residue_id_map : dict[int, int], optional
Mapping from model residue IDs to a shared residue ID space.
- native_residue_id_map : dict[int, int], optional
Mapping from native residue IDs to the same shared residue ID space.
- Returns:
- tuple[list[float], list[float]]
(fnat_list, fnonnat_list) for each model in coor.
- pdb_cpp.analysis.rmsd(coor_1, coor_2, selection='name CA', index_list=None, frame_ref=0)[source]
Compute RMSD between two sets of coordinates.
- Parameters:
- coor_1 : Coor
First set of coordinates.
- coor_2 : Coor
Second set of coordinates.
- selection : str, optional
Selection string used when index_list is not provided.
- index_list : list, optional
Pair of index lists [index_1, index_2].
- frame_ref : int, optional
Reference frame index in coor_2.
- Returns:
- list[float]
RMSD values for each model in coor_1.
- pdb_cpp.analysis.salt_bridges(coor, cation_sel: str = 'protein', anion_sel: str = 'protein', cutoff: float = 4.0)[source]
Identify salt bridges between two selections for every model in coor.
Salt bridges are detected between explicitly typed cationic and anionic heavy atoms using a simple distance cutoff.
- pdb_cpp.analysis.shape_complementarity(coor, receptor_sel, ligand_sel, probe_radius=1.4, dots_per_sq_angstrom=12.0, search_radius=1.5, include_hydrogen=False, reducer='trimmed_mean', trim_fraction=0.1)[source]
Estimate Lawrence-Colman style shape complementarity for an interface.
Surface dots are generated independently for each partner using a rolling-probe surface with outward normals. Each interface dot is matched to its closest dot on the opposite partner within search_radius and scored with the normal complementarity term dot(n_a, -n_b).