While the most prominent functionality of the CDK centers on small organic molecules, there is support for protein structures too. Of course, proteins are just large organic molecules, and an IAtomContainer can simply be used. The same holds for DNA strands. However, the CDK has more extensive support for proteins and DNA, and this chapter outlines some of it. The core interface is IBioPolymer, which extends IAtomContainer. Figure proteinClass shows its hierarchy.
![](images/biopolymer.png)
One straightforward way to create protein and DNA structures is to read them from PDB files [Q24650571]. Chapter io explains how files are read in general. For PDB files, the PDBReader should be used. Script script:PDBCoordinateExtraction gives a code example showing how to use this reader.
Of course, we can also read PDB files from a local disk. The results are read into an IChemFile, from which the first IAtomContainer is the IBioPolymer. For example, we can read crambin [Q34206190]:
ProteinFromFile
Which returns:
ProteinFromFile
It is also possible to create a protein data structure starting from a sequence with the ProteinBuilderTool class:
ProteinFromSequence
Because an IBioPolymer extends the IAtomContainer interface, we can simply query for the number of atoms, as done here. The script returns:
ProteinFromSequence
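Building a protein from a sequence means that every one-letter code in the string becomes one amino acid monomer. As a rough illustration of that expansion, the following plain Java sketch maps a one-letter sequence to three-letter residue names. It does not use the CDK and is not how the CDK builds proteins internally; the class `SequenceSketch`, the method `toResidues`, and the example sequence are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SequenceSketch {
    // Standard one-letter to three-letter codes for the 20 common amino acids.
    static final Map<Character, String> THREE_LETTER = Map.ofEntries(
        Map.entry('A', "ALA"), Map.entry('R', "ARG"), Map.entry('N', "ASN"),
        Map.entry('D', "ASP"), Map.entry('C', "CYS"), Map.entry('Q', "GLN"),
        Map.entry('E', "GLU"), Map.entry('G', "GLY"), Map.entry('H', "HIS"),
        Map.entry('I', "ILE"), Map.entry('L', "LEU"), Map.entry('K', "LYS"),
        Map.entry('M', "MET"), Map.entry('F', "PHE"), Map.entry('P', "PRO"),
        Map.entry('S', "SER"), Map.entry('T', "THR"), Map.entry('W', "TRP"),
        Map.entry('Y', "TYR"), Map.entry('V', "VAL"));

    // Expands a one-letter sequence into one residue name per letter.
    static List<String> toResidues(String sequence) {
        List<String> residues = new ArrayList<>();
        for (char code : sequence.toUpperCase().toCharArray()) {
            String name = THREE_LETTER.get(code);
            if (name == null) {
                throw new IllegalArgumentException("Unknown residue code: " + code);
            }
            residues.add(name);
        }
        return residues;
    }

    public static void main(String[] args) {
        // prints [THR, THR, CYS, CYS, PRO, SER]
        System.out.println(toResidues("TTCCPS"));
    }
}
```

The number of residues equals the sequence length, while the atom count reported by the biopolymer is much larger because every residue contributes several atoms.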
The IBioPolymer interface is modeled after PDB files, which are its primary use case. Therefore, the data structure can hold atoms that are part of proteins, hetero atoms, solvents, etc. The atoms of the protein structure itself are also part of a monomer and of a strand, where each strand consists of a sequence of monomers. So, a BioPolymer is not a single polymeric molecule.
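To make this grouping concrete, here is a minimal plain Java sketch of how PDB coordinate records could be sorted into such a hierarchy. It assumes that strands correspond to PDB chain identifiers and monomers to residues, and it keeps HETATM records (ligands, solvent) outside the strands. The class `PdbGroupingSketch` and its helper `record` are hypothetical, the example records are made up, and none of this is the CDK's actual reader code. The column positions follow the fixed-width PDB format (residue name in columns 18-20, chain identifier in column 22, residue number in columns 23-26).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PdbGroupingSketch {
    // chain ID -> (residue label -> atom count), in order of first appearance
    final Map<String, Map<String, Integer>> strands = new LinkedHashMap<>();
    int totalAtoms = 0;   // every ATOM and HETATM record
    int heteroAtoms = 0;  // HETATM records only (ligands, solvent)

    // Hypothetical helper: builds the first 26 columns of a coordinate record.
    static String record(String type, int serial, String atom,
                         String res, String chain, int seq) {
        return String.format("%-6s%5d %-4s %3s %s%4d",
                             type, serial, atom, res, chain, seq);
    }

    // Adds one PDB line; lines that are not coordinate records are ignored.
    void addRecord(String line) {
        boolean atom = line.startsWith("ATOM  ");
        boolean hetero = line.startsWith("HETATM");
        if ((!atom && !hetero) || line.length() < 26) return;
        totalAtoms++;
        if (hetero) {
            heteroAtoms++;  // counted for the whole structure, not for a strand
            return;
        }
        String resName = line.substring(17, 20).trim();  // columns 18-20
        String chain = line.substring(21, 22);           // column 22
        String resSeq = line.substring(22, 26).trim();   // columns 23-26
        strands.computeIfAbsent(chain, k -> new LinkedHashMap<>())
               .merge(resName + resSeq, 1, Integer::sum);
    }

    public static void main(String[] args) {
        PdbGroupingSketch g = new PdbGroupingSketch();
        g.addRecord(record("ATOM", 1, " N  ", "THR", "A", 1));
        g.addRecord(record("ATOM", 2, " CA ", "THR", "A", 1));
        g.addRecord(record("ATOM", 3, " N  ", "CYS", "A", 2));
        g.addRecord(record("ATOM", 4, " N  ", "GLY", "B", 1));
        g.addRecord(record("HETATM", 5, " O  ", "HOH", "A", 101));
        System.out.println(g.strands);  // {A={THR1=2, CYS2=1}, B={GLY1=1}}
        System.out.println(g.totalAtoms + " atoms, " + g.heteroAtoms + " hetero");
    }
}
```

In this sketch the total atom count (5) exceeds the number of atoms reachable through the strands (4), which mirrors the point that the container holds more than the polymer chains alone.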
There are access methods for the strand information we can use to iterate over the sequence of a biopolymer:
ProteinStrands
This returns a list of strands and the number of atoms per strand:
ProteinStrands
Each strand consists of a sequence of monomers, over which we can iterate:
ProteinMonomers
The full script has some hidden code to only list the first few monomers:
ProteinMonomers
The IStrand and IMonomer interfaces provide functionality to access specific properties, but also extend the IAtomContainer interface, as depicted in Figure strandmonomerClass. Both provide access to a name for the entity as well as a type:
BioNameType
Using these methods, we get some additional information about the strands and monomers:
BioNameType
![](images/strandmonomer.png)