Protein and DNA

While the most prominent functionality of the CDK concerns small organic molecules, there is support for protein structures too. Of course, proteins are just large organic molecules, so a plain IAtomContainer can be used. The same holds for DNA strands. However, the CDK has more extensive support for proteins and DNA, and this chapter outlines some of it. The core interface is IBioPolymer, which extends IAtomContainer. Figure proteinClass shows its hierarchy.

![](images/biopolymer.png)

Protein From File

One straightforward way to create protein and DNA structures is to read them from PDB files [Q24650571]. Chapter io explains how files are read in general. For PDB files, the PDBReader should be used. A code example showing how to use this reader is given by Script script:PDBCoordinateExtraction.

Of course, we can also read PDB files from a local disk. The results are read into an IChemFile, from which the first IAtomContainer is the IBioPolymer. For example, we can read crambin [Q34206190]:

ProteinFromFile

Which returns:

ProteinFromFile
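To make concrete what a PDB reader has to pull out of such a file, here is a minimal sketch that parses the fixed-column ATOM and HETATM records by hand. It is not the CDK PDBReader and uses no CDK classes: the class PdbAtomSketch, the AtomRecord type, and the sample coordinates are made up for illustration. Only the column positions follow the PDB format.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a hand-rolled parser for PDB coordinate records, not the CDK PDBReader. */
class PdbAtomSketch {

    /** Made-up sample in the PDB column layout (not real crambin data). */
    static final String SAMPLE = String.join("\n",
        "REMARK made-up example",
        "ATOM      1  N   GLY A   1       1.000   2.000   3.000",
        "ATOM      2  CA  GLY A   1       2.000   3.000   4.000",
        "HETATM    3  O   HOH A  47       5.000   6.000   7.000",
        "END");

    /** One parsed ATOM or HETATM record (hypothetical type for this sketch). */
    static final class AtomRecord {
        final boolean hetero;
        final String atomName;
        final String residueName;
        final char chainId;
        final int residueNumber;
        final double x, y, z;

        AtomRecord(boolean hetero, String atomName, String residueName, char chainId,
                   int residueNumber, double x, double y, double z) {
            this.hetero = hetero;
            this.atomName = atomName;
            this.residueName = residueName;
            this.chainId = chainId;
            this.residueNumber = residueNumber;
            this.x = x;
            this.y = y;
            this.z = z;
        }
    }

    /** Parses the coordinate records in PDB-formatted text and skips every other record type. */
    static List<AtomRecord> parse(String pdbText) {
        List<AtomRecord> atoms = new ArrayList<>();
        for (String line : pdbText.split("\\R")) {
            boolean isAtom = line.startsWith("ATOM  ");
            boolean isHetero = line.startsWith("HETATM");
            // Coordinates end at column 54, so shorter lines cannot be parsed.
            if ((!isAtom && !isHetero) || line.length() < 54) {
                continue;
            }
            atoms.add(new AtomRecord(
                isHetero,
                line.substring(12, 16).trim(),                     // columns 13-16: atom name
                line.substring(17, 20).trim(),                     // columns 18-20: residue name
                line.charAt(21),                                   // column 22: chain identifier
                Integer.parseInt(line.substring(22, 26).trim()),   // columns 23-26: residue number
                Double.parseDouble(line.substring(30, 38).trim()), // columns 31-38: x
                Double.parseDouble(line.substring(38, 46).trim()), // columns 39-46: y
                Double.parseDouble(line.substring(46, 54).trim())  // columns 47-54: z
            ));
        }
        return atoms;
    }

    public static void main(String[] args) {
        for (AtomRecord atom : parse(SAMPLE)) {
            System.out.println(atom.chainId + " " + atom.residueName + atom.residueNumber
                + " " + atom.atomName + (atom.hetero ? " (hetero)" : ""));
        }
    }
}
```

The chain identifier, residue name, and residue number carried by every record are the information a biopolymer data structure can use to group atoms, while HETATM records mark hetero atoms such as solvent.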

Protein From Sequence

It is also possible to create a protein data structure starting from a sequence with the ProteinBuilder class:

ProteinFromSequence

Because an IBioPolymer extends the IAtomContainer interface, we can simply query for the number of atoms, as done here. The script returns:

ProteinFromSequence

Strands and Monomers

The IBioPolymer interface is modeled after PDB files, which are its primary use case. Therefore, the data structure can hold atoms that are part of proteins, hetero atoms, solvents, etc. The atoms of the protein structure itself are also part of a monomer and of a strand, where a strand consists of a sequence of monomers. So, a BioPolymer is not a single polymeric molecule.
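The containment described above can be pictured with a small stand-alone model. This is a sketch under assumptions, not the CDK implementation: Polymer, Strand, and Monomer are hypothetical classes invented here, and atoms are reduced to plain strings. It only shows that every protein atom is reachable through a strand and a monomer, while other atoms sit directly in the top-level container.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical classes mirroring the containment idea, not CDK types. */
class BioPolymerSketch {

    /** A monomer (for a protein, one residue) holding the names of its atoms. */
    static final class Monomer {
        final String name;
        final List<String> atoms = new ArrayList<>();
        Monomer(String name) { this.name = name; }
    }

    /** A strand: an ordered sequence of monomers, keyed by monomer name. */
    static final class Strand {
        final String name;
        final Map<String, Monomer> monomers = new LinkedHashMap<>();
        Strand(String name) { this.name = name; }

        int atomCount() {
            return monomers.values().stream().mapToInt(m -> m.atoms.size()).sum();
        }
    }

    /** The top-level container: strands plus atoms that belong to no strand. */
    static final class Polymer {
        final Map<String, Strand> strands = new LinkedHashMap<>();
        final List<String> looseAtoms = new ArrayList<>(); // e.g. hetero atoms, solvent

        /** Adds an atom to the given strand and monomer, creating both on first use. */
        void addAtom(String atom, String strandName, String monomerName) {
            strands.computeIfAbsent(strandName, Strand::new)
                   .monomers.computeIfAbsent(monomerName, Monomer::new)
                   .atoms.add(atom);
        }

        int atomCount() {
            return looseAtoms.size()
                + strands.values().stream().mapToInt(Strand::atomCount).sum();
        }
    }

    /** Builds a made-up two-strand example with one solvent atom. */
    static Polymer example() {
        Polymer polymer = new Polymer();
        for (String atom : new String[] {"N", "CA", "C", "O"}) {
            polymer.addAtom(atom, "A", "GLY1");
            polymer.addAtom(atom, "B", "GLY1");
        }
        for (String atom : new String[] {"N", "CA", "C", "O", "CB"}) {
            polymer.addAtom(atom, "A", "ALA2");
        }
        polymer.looseAtoms.add("O"); // a water oxygen outside any strand
        return polymer;
    }

    public static void main(String[] args) {
        Polymer polymer = example();
        for (Strand strand : polymer.strands.values()) {
            System.out.println(strand.name + ": " + strand.atomCount() + " atoms in "
                + strand.monomers.size() + " monomers");
        }
        System.out.println("total atoms: " + polymer.atomCount());
    }
}
```

Because the container holds several strands and loose atoms side by side, its total atom count can exceed the sum over any single chain, which is why a biopolymer object should not be read as one polymeric molecule.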

There are access methods for the strand information we can use to iterate over the sequence of a biopolymer:

ProteinStrands

This returns a list of strands and the number of atoms per strand:

ProteinStrands

Each strand consists of a sequence of monomers, over which we can iterate:

ProteinMonomers

The full script has some hidden code to only list the first few monomers:

ProteinMonomers

The IStrand and IMonomer interfaces provide functionality to access specific properties, but also extend the IAtomContainer interface, as depicted in Figure strandmonomerClass. Both provide access to a name for the entity as well as a type:

BioNameType

Using these methods, we get some additional information about the strands and monomers:

BioNameType

![](images/strandmonomer.png)

References