Merge pull request #55 from bobleesj/doc-strings
Add docstrings of best methods and radius value information in Cif
bobleesj authored Oct 28, 2024
2 parents c5b03da + 40891b9 commit dbaf324
Showing 22 changed files with 382 additions and 39 deletions.
22 changes: 22 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,22 @@
## What type of PR is this? (check all applicable)

- [ ] Refactor
- [ ] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update

## Description (Screenshots, files, etc)

## Checklist

- [ ] Are the tests passing?
- [ ] If it's a new feature, have tests been written?
- [ ] Have you added the `.rst` news file?

## Added to documentation?

- [ ] README.md
- [ ] Official documentation
- [ ] Google Codelab
- [ ] No documentation needed
2 changes: 2 additions & 0 deletions CONTRIBUTING.md
@@ -74,6 +74,8 @@ pytest
After completing your changes, stage and commit your work:

```bash
pre-commit run --all-files # pip install pre-commit required

git add .
git commit -m "Describe your changes"
git push origin branch-name
2 changes: 1 addition & 1 deletion README.md
@@ -63,7 +63,7 @@ The following example generates a distribution of structure.
```python
from cifkit import CifEnsemble

ensemble = CifEnsemble("cif_containing_folder_path")
ensemble = CifEnsemble("your_folder_path_containing_cif_files")
ensemble.generate_structure_histogram()
```

27 changes: 15 additions & 12 deletions docs/index.md
@@ -1,6 +1,5 @@
# Getting started


## Statement of need

`cifkit` uses .cif files by offering higher-level
@@ -44,32 +43,36 @@ mixing, among other parameters.
attributes such as coordination numbers, space groups, unit cells, shortest
distances, elements, and more.

## Processing speed expectation

On an iMac with an Apple M1 chip, processing 150 .cif files takes around 50 seconds, although this depends on the size of the unit cell in each file. Processing 10,000 or more .cif files may take about an hour, depending on the machine.
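
The two timings quoted above are roughly consistent under linear scaling. The sketch below only illustrates that arithmetic; it assumes processing time grows linearly with the number of files, which is an approximation, and the function name `estimate_minutes` is hypothetical and not part of `cifkit`.

```python
def estimate_minutes(n_files, ref_files=150, ref_seconds=50.0):
    """Linearly extrapolate processing time from a reference measurement."""
    # Assumption: every file costs about the same time as in the reference run.
    seconds_per_file = ref_seconds / ref_files
    return n_files * seconds_per_file / 60.0


# 10,000 files at about 1/3 second each comes to a little under an hour.
print(round(estimate_minutes(10_000), 1))
```

Actual run times will vary with unit cell size and hardware, so this is a ballpark figure only.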

## Installation

Python 3.10, 3.11, 3.12 are supported.

![Python - Version](https://img.shields.io/pypi/pyversions/quacc)
![Python - Version](https://img.shields.io/pypi/pyversions/cifkit)
[![PyPi version](https://img.shields.io/pypi/v/cifkit.svg)](https://pypi.python.org/pypi/cifkit)
[![Conda version](https://img.shields.io/conda/vn/conda-forge/cifkit)](https://anaconda.org/conda-forge/cifkit)


Option 1. pip install
Option 1. conda install

```bash
pip install cifkit
```

Option 2. conda install
The preferred method is to install `cifkit` using Conda.

```bash
conda install cifkit
conda create -n cifkit_env cifkit
conda activate cifkit_env
```

If you are new to Conda, feel free to read the following blog posts:

- [How to use Python package manager for beginners (Ft. Conda with Cheatsheet)](https://bobleesj.github.io/tutorial/2024/02/26/intro-to-python-package-manager.html)
- [Why there are two Python installation methods (Ft. Conda and pip)](https://bobleesj.github.io/tutorial/2024/08/31/conda-pip-installation.html)
If the above option does not work, please feel free to use pip install.

Option 2. pip install

```bash
pip install cifkit
```

## Overview

8 changes: 5 additions & 3 deletions docs/notebooks/01_cif.ipynb
@@ -33,7 +33,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You can initialize `Cif` object using a file path to the `.cif` file. Or you can simply use the example `.cif` provided in `cifkit` below."
"You can initialize a `Cif` object using a file path to the `.cif` file. Or you can simply use the example `.cif` provided in `cifkit` below.\n",
"\n",
"In `cifkit`, we provide example .cif files that can be accessed through `from cifkit import Example`, as shown below. For advanced users, these example .cif files are located under `src/cifkit/data` in the package."
]
},
{
@@ -348,7 +350,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "cif-test",
"display_name": "cifkit_env",
"language": "python",
"name": "python3"
},
@@ -362,7 +364,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
"version": "3.12.7"
}
},
"nbformat": 4,
11 changes: 10 additions & 1 deletion docs/notebooks/02_cif_ensemble.ipynb
@@ -322,6 +322,7 @@
"metadata": {},
"outputs": [],
"source": [
"ensemble = CifEnsemble(Example.ErCoIn_big_folder_path)\n",
"ensemble.generate_structure_histogram()\n",
"ensemble.generate_formula_histogram()\n",
"ensemble.generate_tag_histogram()\n",
@@ -332,7 +333,15 @@
"ensemble.generate_CN_by_min_dist_method_histogram()\n",
"ensemble.generate_CN_by_best_methods_histogram()\n",
"ensemble.generate_composition_type_histogram()\n",
"ensemble.generate_site_mixing_type_histogram()"
"ensemble.generate_site_mixing_type_histogram()\n",
"\n",
"'''\n",
"# Optional: Specify the output directory where the .png file will be saved.\n",
"ensemble.generate_site_mixing_type_histogram(output_dir=\"path/to/directory\")\n",
"\n",
"# Optional: Call plt.show() to display the histogram on screen.\n",
"ensemble.generate_site_mixing_type_histogram(display=True)\n",
"'''"
]
}
],
2 changes: 1 addition & 1 deletion src/cifkit/coordination/composition.py
@@ -8,7 +8,7 @@ def get_bond_counts(
elements: list[str],
connections: dict[str, list],
sorted_by_mendeleev=False,
) -> dict:
) -> dict[str, dict[tuple[str, str], int]]:
"""Return a dictionary containing bond pairs and counts per label site."""
if sorted_by_mendeleev:
bond_pairs = bond_pair.get_pairs_sorted_by_mendeleev(elements)
2 changes: 1 addition & 1 deletion src/cifkit/coordination/method.py
@@ -6,7 +6,7 @@ def compute_CN_max_gap_per_site(
all_labels_connections,
is_radius_data_available: bool,
site_mixing_type: str,
):
) -> dict[str, dict[str, dict[str, float]]]:
use_all_methods = False

if is_radius_data_available and site_mixing_type == "full_occupancy":
Binary file added src/cifkit/data/ErCoIn_big/histograms/tag.png
1 change: 0 additions & 1 deletion src/cifkit/data/radius_optimization.py
@@ -31,7 +31,6 @@ def constraint(params, index_pair: tuple[int, int], shortest_distance: dict):
"""Enforce that the sum of the radii of the pair does not exceed the shortest
allowed distance between them."""
i, j = index_pair
i, j = index_pair
return shortest_distance - (params[i] + params[j])
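
To illustrate what the return value of `constraint` means, here is a minimal standalone sketch. It assumes `shortest_distance` is passed as a single number for the pair (the arithmetic in the function body suggests this, although the type hint says `dict`), and the radii and distances are made-up illustrative values. A non-negative result means the constraint holds, which matches the convention for `"ineq"` constraints in `scipy.optimize.minimize`; whether the function is actually used that way is an assumption, since the call site is not shown in this diff.

```python
def constraint(params, index_pair, shortest_distance):
    """Return a non-negative value when the two radii fit within the distance."""
    i, j = index_pair
    return shortest_distance - (params[i] + params[j])


radii = [1.40, 1.25, 1.60]  # illustrative radii in Angstroms, not real data

# 1.40 + 1.25 = 2.65 does not exceed 2.70, so the constraint is satisfied.
print(constraint(radii, (0, 1), 2.70) >= 0)

# 1.40 + 1.60 = 3.00 exceeds 2.90, so the constraint is violated.
print(constraint(radii, (0, 2), 2.90) >= 0)
```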


160 changes: 143 additions & 17 deletions src/cifkit/models/cif.py
@@ -143,9 +143,14 @@ def __init__(
Set of site label pairs sorted by Mendeleev Numbers.
site_mixing_type : str
Descriptor of the mixing type, categorized into four types:
deficiency_atomic_mixing,
full_occupancy_atomic_mixing, deficiency_without_atomic_mixing,
full_occupancy.
Full occupancy is assigned when a single atomic site occupies
the fractional coordinate with an occupancy value of 1.
Full occupancy with mixing is assigned when multiple atomic sites
collectively occupy the fractional coordinate to a sum of 1.
Deficiency without mixing is assigned when a single atomic site occupies
the fractional coordinate with an occupancy value less than 1.
Deficiency with atomic mixing is assigned when multiple atomic sites occupy
the fractional coordinate with a sum less than 1.
is_radius_data_available : bool
Indicates whether Pauling and CIF atomic radii are available for
all elements in the .cif file.
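
The four `site_mixing_type` categories described in the docstring can be sketched as a small classifier. This is not the `cifkit` implementation: the function name `classify_site_mixing`, the input format (a list of `(fractional_coordinate, occupancy)` tuples), the tolerance, and the way per-coordinate results are combined into one file-level label are all assumptions made for illustration. Only the four returned strings come from the docstring.

```python
from collections import defaultdict


def classify_site_mixing(sites, tol=1e-6):
    """Classify a list of (fractional_coordinate, occupancy) tuples."""
    # Group occupancies by the fractional coordinate they share.
    occupancy_by_coord = defaultdict(list)
    for coord, occupancy in sites:
        occupancy_by_coord[coord].append(occupancy)

    # Mixing: more than one atomic site sits on the same coordinate.
    has_mixing = any(len(v) > 1 for v in occupancy_by_coord.values())
    # Deficiency: the occupancies on a coordinate sum to less than 1.
    has_deficiency = any(sum(v) < 1.0 - tol for v in occupancy_by_coord.values())

    if has_deficiency and has_mixing:
        return "deficiency_atomic_mixing"
    if has_deficiency:
        return "deficiency_without_atomic_mixing"
    if has_mixing:
        return "full_occupancy_atomic_mixing"
    return "full_occupancy"


print(classify_site_mixing([((0.0, 0.0, 0.0), 0.6), ((0.0, 0.0, 0.0), 0.4)]))
```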
@@ -158,8 +163,9 @@ def __init__(
List of points defining the unit cell; each point contains
fractional coordinates and a site label.
supercell_points : list[list[tuple[float, float, float, str]]]
List of points defining the supercell of the cell, with
translations of ±1, ±1, ±1 from the unit cell.
List of points defining the supercell of the cell. For each .cif file,
a unit cell is generated by applying the symmetry operations.
A supercell is generated by applying ±1 shifts from the unit cell.
unitcell_atom_count : int
Total count of atoms within the unit cell.
supercell_atom_count : int
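
The ±1 shift described for `supercell_points` can be sketched as follows. This is a simplified illustration rather than the package's code: the function name `make_supercell` is hypothetical, and it assumes the supercell contains every combination of -1, 0, and +1 shifts along the three axes (27 copies, including the original unit cell).

```python
from itertools import product


def make_supercell(unitcell_points):
    """Copy each (x, y, z, label) point with shifts of -1, 0, +1 per axis."""
    supercell = []
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        for x, y, z, label in unitcell_points:
            supercell.append((x + dx, y + dy, z + dz, label))
    return supercell


unitcell = [(0.0, 0.0, 0.0, "In1"), (0.5, 0.5, 0.5, "Rh1")]
supercell = make_supercell(unitcell)
print(len(supercell))  # 27 shifts times 2 points
```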
@@ -321,6 +327,7 @@ def compute_connections(self, cutoff_radius=10.0) -> None:

# Find the best methods
self._CN_best_methods = find_best_polyhedron(

self.CN_max_gap_per_site, self.connections
)

@@ -468,7 +475,7 @@ def shortest_bond_pair_distance(self):
Examples
--------
>>> cif.shortest_bond_pair_distance
>>> {
{
("In", "In"): 3.244,
("In", "Rh"): 2.697,
("In", "U"): 3.21,
@@ -496,7 +503,7 @@ def shortest_site_pair_distance(self):
Examples
--------
>>> cif.shortest_site_pair_distance
>>> {
{
"In1": ("Rh2", 2.697),
"Rh1": ("In1", 2.852),
"Rh2": ("In1", 2.697),
@@ -508,23 +515,40 @@ def shortest_site_pair_distance(self):
@property
@ensure_connections
def radius_values(self):
"""Retrieve CIF radius, CIF_refined radius, and Pauling C12 radius.
This property uses lazily loaded connections to compute these distances
if they are not already available because the CIF radius values are
determined using the shortest bonding pair from
shortest_bond_pair_distance.
"""Retrieve CIF radius, CIF_refined radius, and Pauling C12 radius for
each element.
This property uses lazy loading to compute or retrieve radius values only when
needed, optimizing performance. The CIF radius and Pauling C12 radius are standard
values sourced from `data/radius.py` for each element. In contrast, the
CIF_refined radius is calculated based on bonding distances to ensure accuracy
across different environments.
- **CIF_radius**: The standard radius value commonly determined from
elemental .cif files, representing the approximate size of an atom within a crystal structure.
- **CIF_radius_refined**: An optimized radius calculated so that, across
all bonding pairs, the sum of the two radii in a bonded pair attempts to
match the shortest unique observed bond distances. This refinement is designed
to improve packing efficiency within a coordination polyhedron.
- **Pauling_radius_CN12**: The Pauling radius of the element, calculated with a
coordination number (CN) of 12, providing a basis for comparison with other radius
types.
Returns
-------
dict[str : dict[str:float]]
Dictionary where each key is an atomic label and the value is a dictionary
containing the CIF radius, CIF_refined radius, and Pauling C12 radius in
Angstroms.
dict[str, dict[str, float]]
A dictionary where each key is an atomic label (e.g., "In", "Rh", "U"), and
the corresponding value is a dictionary with radius information in Angstroms:
- `CIF_radius` (float): The standard CIF radius.
- `CIF_radius_refined` (float): The optimized radius based on CIF radius.
- `Pauling_radius_CN12` (float): The Pauling radius with a coordination
number of 12, parsed from literature.
Examples
--------
>>> cif.radius_values
>>> {
{
"In": {
"CIF_radius": 1.624,
"CIF_radius_refined": 1.328,
@@ -593,11 +617,113 @@ def radius_sum(self):
@property
@ensure_connections
def CN_max_gap_per_site(self):
"""Determines the maximum gap in coordination number (CN) for each
atomic site.
For each atomic site, considers the first 20 nearest neighbors. The distances
to these neighbors are normalized based on four methods:
- `dist_by_shortest_dist`: Normalization by the shortest distance from the site.
- `dist_by_CIF_radius_sum`: Normalization by the sum of CIF radii.
- `dist_by_CIF_radius_refined_sum`: Normalization by the sum of refined CIF radii.
- `dist_by_Pauling_radius_sum`: Normalization by the sum of Pauling radii.
The radius sums are calculated for each element pair involved. For each
normalization method, the maximum gap is determined as the largest difference
between consecutive normalized distances (i.e., the difference between the nth
and (n-1)th neighbors).
This CN gap provides insight into the bonding relevance for each site.
Returns
-------
dict of dict of dict
A dictionary where each key represents an atomic site, mapping to another
dictionary with normalization methods as keys. Each normalization method
contains a dictionary with:
- `max_gap` (float): The maximum gap in the normalized distances.
- `CN` (int): Coordination number based on the normalization method.
Examples
--------
>>> cif.CN_max_gap_per_site
{
"In1": {
"dist_by_shortest_dist": {"max_gap": 0.306, "CN": 14},
"dist_by_CIF_radius_sum": {"max_gap": 0.39, "CN": 14},
"dist_by_CIF_radius_refined_sum": {"max_gap": 0.341, "CN": 12},
"dist_by_Pauling_radius_sum": {"max_gap": 0.398, "CN": 14},
},
"U1": {
"dist_by_shortest_dist": {"max_gap": 0.197, "CN": 11},
"dist_by_CIF_radius_sum": {"max_gap": 0.312, "CN": 11},
"dist_by_CIF_radius_refined_sum": {"max_gap": 0.27, "CN": 17},
"dist_by_Pauling_radius_sum": {"max_gap": 0.256, "CN": 17},
},
"Rh1": {
"dist_by_shortest_dist": {"max_gap": 0.315, "CN": 9},
"dist_by_CIF_radius_sum": {"max_gap": 0.347, "CN": 9},
"dist_by_CIF_radius_refined_sum": {"max_gap": 0.418, "CN": 9},
"dist_by_Pauling_radius_sum": {"max_gap": 0.402, "CN": 9},
},
"Rh2": {
"dist_by_shortest_dist": {"max_gap": 0.31, "CN": 9},
"dist_by_CIF_radius_sum": {"max_gap": 0.324, "CN": 9},
"dist_by_CIF_radius_refined_sum": {"max_gap": 0.397, "CN": 9},
"dist_by_Pauling_radius_sum": {"max_gap": 0.380, "CN": 9},
},
}
"""
return self._CN_max_gap_per_site
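
The maximum-gap idea in the `CN_max_gap_per_site` docstring can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the package's implementation: the function name `cn_by_max_gap` is hypothetical, a single normalizer is used for all neighbors (as in `dist_by_shortest_dist`; the radius-sum methods would normalize per element pair), and the distances are made-up values.

```python
def cn_by_max_gap(distances, normalizer, max_neighbors=20):
    """Find the largest gap between consecutive normalized distances."""
    # Keep only the nearest neighbors, sorted from closest to farthest.
    normalized = sorted(d / normalizer for d in distances)[:max_neighbors]
    best_gap, cn = 0.0, 0
    for n in range(1, len(normalized)):
        gap = normalized[n] - normalized[n - 1]
        if gap > best_gap:
            # The n neighbors before the gap form the coordination shell.
            best_gap, cn = gap, n
    return {"max_gap": round(best_gap, 3), "CN": cn}


# Four close neighbors, then a clear jump to the next shell.
result = cn_by_max_gap([2.7, 2.7, 2.8, 2.85, 3.9, 4.0], normalizer=2.7)
print(result)
```

Here the largest jump falls between the fourth and fifth neighbors, so the sketch reports a coordination number of 4.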

@property
@ensure_connections
def CN_best_methods(self):
"""Determines the optimal coordination method for each atomic site.
For each atomic site, the coordination polyhedron is generated for each method
in `self.CN_max_gap_per_site`. The method with the smallest value of
`polyhedron_metrics["distance_from_avg_point_to_center"]`, indicating the highest
symmetry of the polyhedron, is selected as the "best method" among the four
methods used to determine the CN gap in `self.CN_max_gap_per_site`.
Returns
-------
dict[str, dict[str, float | int | str]]
Dictionary where each key represents an atomic site, and the corresponding
value is a dictionary containing:
- `volume_of_polyhedron` (float): The volume of the polyhedron surrounding
the atomic site.
- `distance_from_avg_point_to_center` (float): The average distance from
the polyhedron's vertices to its geometric center, used as a measure of
symmetry.
- `number_of_vertices` (int): The number of vertices in the coordination
polyhedron.
- `number_of_edges` (int): The number of edges connecting vertices in the
polyhedron.
- `number_of_faces` (int): The number of faces in the coordination polyhedron.
- `shortest_distance_to_face` (float): The shortest distance between the
atomic site and the nearest face.
- `shortest_distance_to_edge` (float): The shortest distance between the
atomic site and the nearest edge.
- `volume_of_inscribed_sphere` (float): Volume of the largest sphere that can
fit inside the polyhedron.
- `packing_efficiency` (float): A measure of how efficiently the polyhedron
is packed around the atomic site.
- `method_used` (str): The name of the chosen method
(e.g., `dist_by_shortest_dist`) providing the highest symmetry based on
`distance_from_avg_point_to_center`.
Examples
--------
>>> CN_best_methods = cif_URhIn.CN_best_methods
>>> CN_best_methods["In1"]["number_of_vertices"] == 14
>>> CN_best_methods["Rh2"]["number_of_vertices"] == 9
>>> CN_best_methods["In1"]["method_used"] == "dist_by_shortest_dist"
>>> CN_best_methods["Rh2"]["method_used"] == "dist_by_shortest_dist"
"""
return self._CN_best_methods
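
The selection rule described in the `CN_best_methods` docstring (choose the method whose polyhedron has the smallest `distance_from_avg_point_to_center`) can be sketched as below. The function name `pick_best_method` and the metric values are hypothetical and only illustrate the rule; the real polyhedron metrics are computed elsewhere in the package.

```python
def pick_best_method(metrics_by_method):
    """Pick the method with the smallest distance_from_avg_point_to_center."""
    best = min(
        metrics_by_method,
        key=lambda m: metrics_by_method[m]["distance_from_avg_point_to_center"],
    )
    # Return that method's metrics together with the name of the method.
    return {**metrics_by_method[best], "method_used": best}


# Illustrative numbers only, not taken from a real .cif file.
metrics = {
    "dist_by_shortest_dist": {
        "distance_from_avg_point_to_center": 0.02,
        "number_of_vertices": 14,
    },
    "dist_by_CIF_radius_refined_sum": {
        "distance_from_avg_point_to_center": 0.11,
        "number_of_vertices": 12,
    },
}
best = pick_best_method(metrics)
print(best["method_used"], best["number_of_vertices"])
```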

@property