The basic objects in the CDK are the IAtom, IBond and IAtomContainer [Q27061829]. The name of the latter is somewhat misleading, as it contains not just IAtoms but also IBonds. The primary use of the model is the graph-based representation of molecules, in which atoms are the nodes and bonds are the edges between two atoms [Q37988904].
Before we start, note that CDK 2.0 follows an important convention for object properties: when a property is unset, the object’s field is set to null. This introduces a source of NullPointerExceptions, but it also allows us to distinguish between, for example, a zero and an unset formal charge. In the former case, the formal charge is set and has a value of zero; in the latter case, the field is null, indicating that the formal charge is currently unknown.
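The difference between an unset and a zero-valued property can be sketched in plain Java, without the CDK on the classpath. The `SketchAtom` class and the `describeCharge` helper below are hypothetical stand-ins invented for this illustration, not the CDK API; the only thing carried over from the text is the assumption that the charge is held in a nullable field.

```java
public class NullableChargeSketch {

    /** Hypothetical stand-in for an atom; NOT the CDK's IAtom. */
    public static class SketchAtom {
        // A boxed Integer can be null (unset), unlike a primitive int.
        private Integer formalCharge = null;

        public Integer getFormalCharge() { return formalCharge; }
        public void setFormalCharge(Integer charge) { this.formalCharge = charge; }
    }

    /** Describes the charge, checking for null before using the value. */
    public static String describeCharge(SketchAtom atom) {
        Integer charge = atom.getFormalCharge();
        if (charge == null) return "unset";
        return charge == 0 ? "neutral" : "charged (" + charge + ")";
    }

    public static void main(String[] args) {
        SketchAtom atom = new SketchAtom();
        System.out.println(describeCharge(atom)); // field is still null
        atom.setFormalCharge(0);
        System.out.println(describeCharge(atom)); // explicitly set to zero
        atom.setFormalCharge(-1);
        System.out.println(describeCharge(atom)); // explicitly set to -1
    }
}
```

Checking for null before unboxing is what avoids the NullPointerException mentioned above; writing `int c = atom.getFormalCharge();` on an unset atom would throw.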
The CDK interface IAtom is the underlying data model of atoms. Creating a new atom is fairly easy. For example, we can create an atom of element type carbon, as defined by the element’s atomic number that we pass as parameter in the constructor:
CreateAtom3
For this we can also use the atomic number from the IElement interface:
CreateAtom4
An atom can also be constructed by passing in the symbol, but this is marginally less efficient:
CreateAtom1
Alternatively, we can construct a new carbon atom by passing a carbon IElement, conveniently provided by the Elements class:
CreateAtom2
A CDK atom has many properties, many of them inherited from the IElement, IIsotope and IAtomType interfaces. Figure atomInheritance shows the interface inheritance specified by the CDK data model.
These constructors will set the atomic number of the atom:
CreateAtom2
![](images/atomInheritance.png)
The most common properties of IElements are their symbol and atomic number. Because IAtom extends IElement, CDK atoms also have these properties. Therefore, we can set these properties for atoms manually too:
ElementProperties
Of course, we can use the matching get methods to recover the properties:
ElementGetProperties
which outputs:
ElementGetProperties
The IIsotope information consists of the mass number, exact mass and natural abundance:
IsotopeProperties
Here too, the complementary get methods are available:
IsotopeGetProperties
giving:
IsotopeGetProperties
Appendix isotopes lists all isotopes defined in the CDK with a natural abundance of more than 0.1.
Atom types are an important concept in cheminformatics. They describe some basic facts about that particular atom in some particular configuration. These properties are used in many cheminformatics algorithms, including adding hydrogens to hydrogen-depleted chemical graphs (see Section implicithydrogens) and force fields. Chapter atomtype provides much more detail on the atom type infrastructure in the CDK library, and, for example, details how atom types can be perceived, and how atom type information is set for atoms.
The IAtomType interface contains fields that relate to atom types. These properties include formal charge, neighbor count, maximum bond order and atom type name:
AtomTypeProperties
The IAtom interface supports three types of coordinates: 2D coordinates, used for diagrams; 3D coordinates, used for geometries; and crystal unit cell or notional coordinates. These properties are set with the respective methods:
AtomCoordinates
The latter coordinates define the locations of the atoms with respect to (or inside) the crystal structure’s unit cell. Section 5.2 explains the full crystal structure functionality.
The IBond interface of the CDK models an interaction between two or more IAtoms and extends the IElectronContainer interface. While the most common application in the CDK originates from graph theory [Q37988904], it is not restricted to that. That said, many algorithms implemented in the CDK expect a graph-theory-based model, in which each bond connects exactly two atoms.
For example, to create ethanol we write:
Ethanol
The CDK has a few bond orders, which we can list with this Groovy code:
BondOrders
which outputs:
BondOrders
As you might notice, there is no AROMATIC bond order defined. This is deliberate: the CDK allows single-double bond order patterns to be defined at the same time as aromaticity information. For example, a Kekulé structure of benzene with bonds marked as aromatic can be constructed with:
AromaticBond
Bond orders, as we have seen earlier, are commonly used in the CDK to indicate the electronic properties of a bond. At the same time, each bond consists of a number of electrons. For example, in a single (sigma) bond, two electrons are involved. In a double bond (one sigma and one pi bond), four electrons are involved, and in a triple bond, six electrons are involved. We can report on the electron counts for the various orders with this code:
ElectronCounts
showing us the default implementation:
ElectronCounts
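The relation between bond order and electron count stated above (two, four and six electrons for single, double and triple bonds) can be written down as a small, self-contained Java sketch. The `SketchOrder` enum is a hypothetical illustration and not the CDK's own bond order type; it covers only the three orders named in the text.

```java
public class BondElectronSketch {

    /** Hypothetical bond order enum for illustration; NOT the CDK's type. */
    public enum SketchOrder {
        SINGLE(1), DOUBLE(2), TRIPLE(3);

        private final int numeric;

        SketchOrder(int numeric) { this.numeric = numeric; }

        /** Each unit of bond order contributes one electron pair. */
        public int electronCount() { return 2 * numeric; }
    }

    public static void main(String[] args) {
        // Print every order together with its electron count.
        for (SketchOrder order : SketchOrder.values()) {
            System.out.println(order + " " + order.electronCount());
        }
    }
}
```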
The IBond.setStereo() method is discussed in Section stereo:bond.
The previous pieces of code already showed how the CDK can be used to create molecules. Strictly speaking, that is enough to find all atoms in a molecule starting from just one of them, but it is often more convenient to store all atoms and bonds in a container.
The CDK has one container: the IAtomContainer. It is a general container that holds atoms and bonds, and it can contain both unconnected and fully connected structures. The latter has the added implication that it holds a single molecule, of which all atoms are connected to each other via one or more covalent bonds.
Adding atoms and bonds is done with the methods addAtom(IAtom) and addBond(IBond):
AtomContainerAddAtomsAndBonds
The addBond() method has an alternative that takes three parameters: the first atom, the second atom, and the bond order. Note that atom indices follow programmers' habits and start at 0, as you can observe in the previous example too. This shortens the previous version a bit:
AtomContainerAddAtomsAndBonds2
The IAtomContainer comes with convenience methods to iterate over atoms and bonds. Both use the Iterable interface; for atoms we do:
CountHydrogens
which returns
CountHydrogens
And for bonds the equivalent:
CountDoubleBonds
giving
CountDoubleBonds
It is quite common that you would like to see which atoms are connected to one particular atom. For example, you may wish to count how many bonds surround a particular atom. Or, you may want to list all atoms that are bound to this atom. The IAtomContainer class provides methods for these use cases. But it should be stressed that these methods only take explicit hydrogens into account (see the next section).
Let's consider ethanol again, given in Script script:Ethanol, and count the number of neighbors for each atom:
NeighborCount
which lists for the three heavy atoms:
NeighborCount
Similarly, we can also list all connected atoms:
ConnectedAtoms
which outputs:
ConnectedAtoms
We can do the same thing for connected bonds:
ConnectedBonds
which outputs:
ConnectedBonds
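To make the neighbor queries above concrete, here is a minimal sketch in plain Java that works on a bare edge list instead of an IAtomContainer. The class and method names (`NeighborCountSketch`, `neighborCounts`, `connectedAtoms`) are hypothetical and not part of the CDK; the graph is the heavy-atom skeleton of ethanol, so hydrogens are not counted, in line with the remark about explicit hydrogens.

```java
import java.util.ArrayList;
import java.util.List;

public class NeighborCountSketch {

    /** Counts, for each atom index, how many bonds touch that atom. */
    public static int[] neighborCounts(int atomCount, int[][] bonds) {
        int[] counts = new int[atomCount];
        for (int[] bond : bonds) {
            counts[bond[0]]++;
            counts[bond[1]]++;
        }
        return counts;
    }

    /** Lists the indices of all atoms bonded to the given atom. */
    public static List<Integer> connectedAtoms(int atom, int[][] bonds) {
        List<Integer> neighbors = new ArrayList<>();
        for (int[] bond : bonds) {
            if (bond[0] == atom) {
                neighbors.add(bond[1]);
            } else if (bond[1] == atom) {
                neighbors.add(bond[0]);
            }
        }
        return neighbors;
    }

    public static void main(String[] args) {
        // Heavy-atom skeleton of ethanol: C(0)-C(1)-O(2), indices start at 0.
        String[] symbols = {"C", "C", "O"};
        int[][] bonds = {{0, 1}, {1, 2}};

        int[] counts = neighborCounts(symbols.length, bonds);
        for (int i = 0; i < symbols.length; i++) {
            System.out.println(symbols[i] + " has " + counts[i]
                + " neighbor(s): " + connectedAtoms(i, bonds));
        }
    }
}
```

Scanning the whole edge list for every query is fine for a three-atom example; a real container would typically keep an adjacency structure so that such lookups do not depend on the total number of bonds.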
Getting the molecular formula of a molecule and returning it as a String are both done with the MolecularFormulaManipulator class:
MFGeneration
giving:
MFGeneration
The CDK has two concepts for hydrogens: implicit hydrogens and explicit hydrogens. Explicit hydrogens are hydrogens that are separate vertices on the chemical graph. Implicit hydrogens, however, are not, and are attributes of existing vertices.
![](images/generated/MethaneImplicit.png)![](images/generated/MethaneExplicit.png)
For example, if we represent methane as a chemical graph, we can define either a hydrogen-depleted chemical graph with a single carbon atom and zero bonds, or a graph with one carbon and four hydrogen atoms, and four bonds connecting the hydrogens to the central carbon. In the latter case, the hydrogens are explicit, while in the former case we can add those four hydrogens as implicit hydrogens on this carbon.
The first option in CDK code looks like:
HydrogenDepletedGraph
while the alternative looks like:
HydrogenExplicitGraph
Section missinghydrogens describes how hydrogens can be added programmatically.
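The two representations of methane can be compared with a small, CDK-independent Java sketch. The `SketchAtom` class and `totalHydrogens` method are hypothetical names used only here; the sketch assumes that an implicit hydrogen count is simply an integer attribute on a vertex, as described above.

```java
public class HydrogenCountSketch {

    /** Hypothetical vertex: element symbol plus an implicit hydrogen count. */
    public static class SketchAtom {
        public final String symbol;
        public final int implicitHydrogens;

        public SketchAtom(String symbol, int implicitHydrogens) {
            this.symbol = symbol;
            this.implicitHydrogens = implicitHydrogens;
        }
    }

    /** Total hydrogens = explicit H vertices + implicit counts on all vertices. */
    public static int totalHydrogens(SketchAtom[] atoms) {
        int total = 0;
        for (SketchAtom atom : atoms) {
            if (atom.symbol.equals("H")) {
                total++; // an explicit hydrogen is its own vertex
            }
            total += atom.implicitHydrogens;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hydrogen-depleted methane: one vertex, zero bonds, four implicit hydrogens.
        SketchAtom[] depleted = { new SketchAtom("C", 4) };

        // Explicit methane: five vertices (the four C-H bonds are omitted in this sketch).
        SketchAtom[] explicit = {
            new SketchAtom("C", 0),
            new SketchAtom("H", 0), new SketchAtom("H", 0),
            new SketchAtom("H", 0), new SketchAtom("H", 0)
        };

        System.out.println(depleted.length + " vertex, " + totalHydrogens(depleted) + " hydrogens");
        System.out.println(explicit.length + " vertices, " + totalHydrogens(explicit) + " hydrogens");
    }
}
```

Both graphs describe the same molecule with four hydrogens; they differ only in how many vertices are needed to say so.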
Another interface that must be introduced is the IChemObject, as it plays a key role in the CDK data model. Almost all interfaces used in the data model inherit from this interface. The IChemObject interface provides a bit of basic functionality, including support for object identifiers, properties, and flags.
For example, identifiers are set and retrieved with the setID() and getID() methods:
ChemObjectIdentifiers
If you have more than one identifier, or other properties you would like to associate with objects, you can use the setProperty() and getProperty() methods:
ChemObjectProperties
For example, we can use this approach to assign labels to atoms, such as in this example from substructure searching (see Chapter substructure):
AtomLabels
The CDKConstants class provides a few constants for common properties:
CDKConstantsProperties
outputting:
CDKConstantsProperties
A third characteristic of the IChemObject interface is the concept of flags. Flags are used in the CDK to indicate, for example, if an atom or bond is aromatic (see Script script:AromaticBond) or if an atom is part of a ring:
RingBond
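The three facilities described for IChemObject (an identifier, free-form properties, and flags) can be pictured with the following plain Java sketch. It is one possible way to implement such a base class and makes no claim about how the CDK does it: the class `ChemObjectSketch`, the `setFlag`/`getFlag` methods and the two flag constants are hypothetical, and storing flags as bits of an int is an assumption of this sketch. Only the names setID, getID, setProperty and getProperty are taken from the text.

```java
import java.util.HashMap;
import java.util.Map;

public class ChemObjectSketch {

    // Hypothetical flag bits, chosen for this sketch only; not the CDK's constants.
    public static final int FLAG_AROMATIC = 1 << 0;
    public static final int FLAG_IN_RING = 1 << 1;

    private String id;
    private final Map<Object, Object> properties = new HashMap<>();
    private int flags = 0;

    public void setID(String id) { this.id = id; }
    public String getID() { return id; }

    public void setProperty(Object key, Object value) { properties.put(key, value); }

    /** Returns null when the property was never set. */
    public Object getProperty(Object key) { return properties.get(key); }

    /** Switches a single flag bit on or off, leaving the other bits untouched. */
    public void setFlag(int mask, boolean value) {
        flags = value ? (flags | mask) : (flags & ~mask);
    }

    public boolean getFlag(int mask) { return (flags & mask) != 0; }

    public static void main(String[] args) {
        ChemObjectSketch atom = new ChemObjectSketch();
        atom.setID("a1");
        atom.setProperty("label", "R1");
        atom.setFlag(FLAG_IN_RING, true);

        System.out.println(atom.getID());
        System.out.println(atom.getProperty("label"));
        System.out.println(atom.getFlag(FLAG_IN_RING));
        System.out.println(atom.getFlag(FLAG_AROMATIC));
    }
}
```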
The next section talks about the CDK data class for rings.
One important aspect of molecules is rings, partly because rings can show interesting chemical phenomena. For example, if the number of π electrons is right, the ring becomes aromatic, as we commonly observe in phenyl rings, such as in benzene. But cheminformatics has many other areas where one would like to know about rings. For example, 2D coordinate generation (see Section layout) requires algorithms to know what the rings in a molecule are.
![](images/rings.png)
Section spanningtree explains what functionality the CDK has to determine whether a bond takes part in a ring system. Here, we just introduce the IRing interface, which extends the more general IAtomContainer, as shown in Figure ring. Practically, there is not much to say about the IRing interface. One method it adds returns the size of the ring:
RingExample
But this should by definition be the same as the number of atoms and the number of bonds:
RingExample
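Why the ring size equals both the atom count and the bond count follows from the shape of a simple cycle, which a short Java sketch can show without the CDK. The `cycleBonds` helper is a hypothetical name, and the sketch assumes a single, non-fused ring of at least three atoms.

```java
public class RingSizeSketch {

    /**
     * Builds the bonds of a simple cycle over atoms 0..size-1 as index pairs.
     * Assumes size >= 3; smaller values do not describe a ring.
     */
    public static int[][] cycleBonds(int size) {
        int[][] bonds = new int[size][2];
        for (int i = 0; i < size; i++) {
            bonds[i][0] = i;
            bonds[i][1] = (i + 1) % size; // the last bond closes the ring
        }
        return bonds;
    }

    public static void main(String[] args) {
        int atomCount = 6; // a six-membered ring, as in benzene
        int[][] bonds = cycleBonds(atomCount);
        System.out.println("atoms: " + atomCount + ", bonds: " + bonds.length);
    }
}
```

Each atom starts exactly one bond to its successor, and the modulo wraps the last atom back to the first, so a cycle of n atoms always has n bonds.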
An overview of three algorithms to find rings in atom containers is provided in Section ringsearch. Additionally, you may also be interested in ring sets, explained in Section reactionandringsets.