bobleesj · Oct 28, 2024 · Oct 27, 2024
diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
@@ -0,0 +1,22 @@
+## What type of PR is this? (check all applicable)
+
+- [ ] Refactor
+- [ ] Feature
+- [ ] Bug Fix
+- [ ] Optimization
+- [ ] Documentation Update
+
+## Description (Screenshots, files, etc)
+
+## Checklist
+
+- [ ] Are the tests passing?
+- [ ] If it's a new feature, have tests been written?
+- [ ] Have you added the `.rst` news file?
+
+## Added to documentation?
+
+- [ ] README.md
+- [ ] Official documentation
+- [ ] Google Codelab
+- [ ] No documentation needed
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -74,6 +74,8 @@ pytest
 After completing your changes, stage and commit your work:
 
 ```bash
+pre-commit run --all-files # pip install pre-commit required
+
 git add .
 git commit -m "Describe your changes"
 git push origin branch-name

diff --git a/README.md b/README.md
@@ -63,7 +63,7 @@ The following example generates a distribution of structure.
 ```python
 from cifkit import CifEnsemble
 
-ensemble = CifEnsemble("cif_containing_folder_path")
+ensemble = CifEnsemble("your_folder_path_containing_cif_files")
 ensemble.generate_structure_histogram()
 ```
 

diff --git a/docs/index.md b/docs/index.md
@@ -1,6 +1,5 @@
 # Getting started
 
-
 ## Statement of need
 
 `cifkit` uses .cif files by offering higher-level
@@ -44,32 +43,36 @@ mixing, among other parameters.
   attributes such as coordination numbers, space groups, unit cells, shortest
   distances, elements, and more.
 
+## Processing speed expectation
+
+On an iMac with an Apple M1 chip, processing 150 .cif files takes around 50 seconds, depending on the size of the unit cells in the files. Processing 10,000 or more .cif files may take about an hour, depending on the machine.
+
 ## Installation
 
 Python 3.10, 3.11, 3.12 are supported.
 
-![Python - Version](https://img.shields.io/pypi/pyversions/quacc)
+![Python - Version](https://img.shields.io/pypi/pyversions/cifkit)
 [![PyPi version](https://img.shields.io/pypi/v/cifkit.svg)](https://pypi.python.org/pypi/cifkit)
 [![Conda version](https://img.shields.io/conda/vn/conda-forge/cifkit)](https://anaconda.org/conda-forge/cifkit)
 
 
-Option 1. pip install
+Option 1. conda install
 
-```bash
-pip install cifkit
-```
-
-Option 2. conda install
+The preferred method is to install `cifkit` using Conda.
 
 ```bash
-conda install cifkit
+conda create -n cifkit_env cifkit
+conda activate cifkit_env
 ```
 
-If you are new to Conda, feel free to read the following blog posts:
 
-- [How to use Python package manager for beginners (Ft. Conda with Cheatsheet)](https://bobleesj.github.io/tutorial/2024/02/26/intro-to-python-package-manager.html)
-- [Why there are two Python installation methods (Ft. Conda and pip)](https://bobleesj.github.io/tutorial/2024/08/31/conda-pip-installation.html)
+If the above option does not work, please feel free to use pip install.
+
+Option 2. pip install
 
+```bash
+pip install cifkit
+```
 
 ## Overview
 

diff --git a/docs/notebooks/01_cif.ipynb b/docs/notebooks/01_cif.ipynb
@@ -33,7 +33,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "You can initialize `Cif` object using a file path to the `.cif` file. Or you can simply use the example `.cif` provided in `cifkit` below."
+    "You can initialize `Cif` object using a file path to the `.cif` file. Or you can simply use the example `.cif` provided in `cifkit` below.\n",
+    "\n",
+    "`cifkit` provides example .cif files that can be accessed through `from cifkit import Example`, as shown below. For advanced users, these example .cif files are located under `src/cifkit/data` in the package."
    ]
   },
   {
@@ -348,7 +350,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "cif-test",
+   "display_name": "cifkit_env",
    "language": "python",
    "name": "python3"
   },
@@ -362,7 +364,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.12.3"
+   "version": "3.12.7"
   }
  },
  "nbformat": 4,

diff --git a/docs/notebooks/02_cif_ensemble.ipynb b/docs/notebooks/02_cif_ensemble.ipynb
@@ -322,6 +322,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "ensemble = CifEnsemble(Example.ErCoIn_big_folder_path)\n",
     "ensemble.generate_structure_histogram()\n",
     "ensemble.generate_formula_histogram()\n",
     "ensemble.generate_tag_histogram()\n",
@@ -332,7 +333,15 @@
     "ensemble.generate_CN_by_min_dist_method_histogram()\n",
     "ensemble.generate_CN_by_best_methods_histogram()\n",
     "ensemble.generate_composition_type_histogram()\n",
-    "ensemble.generate_site_mixing_type_histogram()"
+    "ensemble.generate_site_mixing_type_histogram()\n",
+    "\n",
+    "'''\n",
+    "# Optional: Specify the output directory where the .png file will be saved.\n",
+    "ensemble.generate_site_mixing_type_histogram(output_dir=\"path/to/directory\")\n",
+    "\n",
+    "# Optional: Call plt.show() to display the histogram on screen.\n",
+    "ensemble.generate_site_mixing_type_histogram(display=True)\n",
+    "'''"
    ]
   }
  ],

diff --git a/src/cifkit/coordination/composition.py b/src/cifkit/coordination/composition.py
@@ -8,7 +8,7 @@ def get_bond_counts(
     elements: list[str],
     connections: dict[str, list],
     sorted_by_mendeleev=False,
-) -> dict:
+) -> dict[str, dict[tuple[str, str], int]]:
     """Return a dictionary containing bond pairs and counts per label site."""
     if sorted_by_mendeleev:
         bond_pairs = bond_pair.get_pairs_sorted_by_mendeleev(elements)

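The new return annotation `dict[str, dict[tuple[str, str], int]]` describes a mapping from each site label to a count per element pair. The sketch below only illustrates that shape. `count_bonds`, `site_elements`, and `neighbors` are hypothetical names, and the input format is a simplified assumption rather than the actual `connections` structure used by `get_bond_counts`.

```python
from collections import Counter


def count_bonds(
    site_elements: dict[str, str],
    neighbors: dict[str, list[str]],
) -> dict[str, dict[tuple[str, str], int]]:
    """Count element pairs per site label (hypothetical helper, not cifkit API)."""
    result: dict[str, dict[tuple[str, str], int]] = {}
    for site, neighbor_sites in neighbors.items():
        center = site_elements[site]
        # Sort each pair alphabetically so ("Rh", "In") and ("In", "Rh") share one key.
        pairs = Counter(
            tuple(sorted((center, site_elements[n]))) for n in neighbor_sites
        )
        result[site] = dict(pairs)
    return result


# Made-up input: site label -> element, and site label -> neighboring site labels.
site_elements = {"In1": "In", "Rh1": "Rh"}
neighbors = {"In1": ["Rh1", "Rh1", "In1"], "Rh1": ["In1", "In1"]}
counts = count_bonds(site_elements, neighbors)
```

Here `counts["In1"]` holds two In-Rh pairs and one In-In pair, which matches the nested shape the annotation declares.
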
diff --git a/src/cifkit/coordination/method.py b/src/cifkit/coordination/method.py
@@ -6,7 +6,7 @@ def compute_CN_max_gap_per_site(
     all_labels_connections,
     is_radius_data_available: bool,
     site_mixing_type: str,
-):
+) -> dict[str, dict[str, dict[str, float]]]:
     use_all_methods = False
 
     if is_radius_data_available and site_mixing_type == "full_occupancy":

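A minimal sketch of the max-gap idea behind `compute_CN_max_gap_per_site`, assuming a single normalization constant (the shortest distance) and assuming the coordination number is the count of neighbors before the largest jump. `max_gap_cn` is a hypothetical helper and the distances are made up; the package's own implementation may differ, for example by normalizing each neighbor with a per-pair radius sum.

```python
def max_gap_cn(distances: list[float], norm: float) -> tuple[float, int]:
    """Return (max_gap, CN) for one site under one normalization.

    Hypothetical helper for illustration; not part of cifkit.
    """
    # Normalize and sort so that neighbors are ordered nearest-first.
    normalized = sorted(d / norm for d in distances)
    # gaps[i] is the jump between neighbor i and neighbor i + 1.
    gaps = [b - a for a, b in zip(normalized, normalized[1:])]
    max_gap = max(gaps)
    # Neighbors before the largest jump are treated as the coordination shell.
    cn = gaps.index(max_gap) + 1
    return round(max_gap, 3), cn


# Made-up distances in Angstroms: four close neighbors, then a clear jump.
distances = [2.70, 2.70, 2.85, 2.85, 3.90, 4.10]
max_gap, cn = max_gap_cn(distances, norm=min(distances))
```

With these values the largest jump falls between the fourth and fifth neighbors, so the sketch reports a coordination number of 4.
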
diff --git a/src/cifkit/data/ErCoIn_big/histograms/CN_by_best_methods.png b/src/cifkit/data/ErCoIn_big/histograms/CN_by_best_methods.png
diff --git a/src/cifkit/data/ErCoIn_big/histograms/CN_by_min_dist_method.png b/src/cifkit/data/ErCoIn_big/histograms/CN_by_min_dist_method.png
diff --git a/src/cifkit/data/ErCoIn_big/histograms/composition_type.png b/src/cifkit/data/ErCoIn_big/histograms/composition_type.png
diff --git a/src/cifkit/data/ErCoIn_big/histograms/elements.png b/src/cifkit/data/ErCoIn_big/histograms/elements.png
diff --git a/src/cifkit/data/ErCoIn_big/histograms/formula.png b/src/cifkit/data/ErCoIn_big/histograms/formula.png
diff --git a/src/cifkit/data/ErCoIn_big/histograms/site_mixing_type.png b/src/cifkit/data/ErCoIn_big/histograms/site_mixing_type.png
diff --git a/src/cifkit/data/ErCoIn_big/histograms/space_group_name.png b/src/cifkit/data/ErCoIn_big/histograms/space_group_name.png
diff --git a/src/cifkit/data/ErCoIn_big/histograms/space_group_number.png b/src/cifkit/data/ErCoIn_big/histograms/space_group_number.png
diff --git a/src/cifkit/data/ErCoIn_big/histograms/structures.png b/src/cifkit/data/ErCoIn_big/histograms/structures.png
diff --git a/src/cifkit/data/ErCoIn_big/histograms/supercell_size.png b/src/cifkit/data/ErCoIn_big/histograms/supercell_size.png
diff --git a/src/cifkit/data/ErCoIn_big/histograms/tag.png b/src/cifkit/data/ErCoIn_big/histograms/tag.png
diff --git a/src/cifkit/data/radius_optimization.py b/src/cifkit/data/radius_optimization.py
@@ -31,7 +31,6 @@ def constraint(params, index_pair: tuple[int, int], shortest_distance: dict):
     """Enforce that the sum of the radii of the pair does not exceed the shortest
     allowed distance between them."""
     i, j = index_pair
-    i, j = index_pair
     return shortest_distance - (params[i] + params[j])
 
 

diff --git a/src/cifkit/models/cif.py b/src/cifkit/models/cif.py
@@ -143,9 +143,14 @@ def __init__(
             Set of site label pairs sorted by Mendeleev Numbers.
         site_mixing_type : str
             Descriptor of the mixing type, categorized into four types:
-            deficiency_atomic_mixing,
-            full_occupancy_atomic_mixing, deficiency_without_atomic_mixing,
-            full_occupancy.
+            Full occupancy is assigned when a single atomic site occupies
+            the fractional coordinate with an occupancy value of 1.
+            Full occupancy with mixing is assigned when multiple atomic sites
+            collectively occupy the fractional coordinate with occupancies summing to 1.
+            Deficiency without mixing is assigned when a single atomic site occupies
+            the fractional coordinate with an occupancy less than 1.
+            Deficiency with atomic mixing is assigned when multiple atomic sites occupy
+            the fractional coordinate with a sum less than 1.
         is_radius_data_available : bool
             Indicates whether Pauling and CIF atomic radii are available for
             all elements in the .cif file.
@@ -158,8 +163,9 @@ def __init__(
             List of points defining the unit cell; each point contains
             fractional coordinates and a site label.
         supercell_points : list[list[tuple[float, float, float, str]]]
-            List of points defining the supercell of the cell, with
-            translations of ±1, ±1, ±1 from the unit cell.
+            List of points defining the supercell. For each .cif file,
+            a unit cell is generated by applying the symmetry operations.
+            A supercell is generated by applying ±1 shifts from the unit cell.
         unitcell_atom_count : int
             Total count of atoms within the unit cell.
         supercell_atom_count : int
@@ -321,6 +327,7 @@ def compute_connections(self, cutoff_radius=10.0) -> None:
 
         # Find the best methods
         self._CN_best_methods = find_best_polyhedron(
+
             self.CN_max_gap_per_site, self.connections
         )
 
@@ -468,7 +475,7 @@ def shortest_bond_pair_distance(self):
         Examples
         --------
         >>> cif.shortest_bond_pair_distance
-        >>> {
+        {
             ("In", "In"): 3.244,
             ("In", "Rh"): 2.697,
             ("In", "U"): 3.21,
@@ -496,7 +503,7 @@ def shortest_site_pair_distance(self):
         Examples
         --------
         >>> cif.shortest_site_pair_distance
-        >>> {
+        {
             "In1": ("Rh2", 2.697),
             "Rh1": ("In1", 2.852),
             "Rh2": ("In1", 2.697),
@@ -508,23 +515,40 @@ def shortest_site_pair_distance(self):
     @property
     @ensure_connections
     def radius_values(self):
-        """Retrieve CIF radius, CIF_refined radius, and Pauling C12 radius.
-        This property uses lazily loaded connections to compute these distances
-        if they are not already available because the CIF radius values are
-        determined using the shortest bonding pair from
-        shortest_bond_pair_distance.
+        """Retrieve CIF radius, CIF_refined radius, and Pauling CN12 radius for
+        each element.
+
+        This property uses lazy loading to compute or retrieve radius values only when
+        needed, optimizing performance. The CIF radius and Pauling CN12 radius are standard
+        values sourced from `data/radius.py` for each element. In contrast, the
+        CIF_refined radius is calculated based on bonding distances to ensure accuracy
+        across different environments.
+
+        - **CIF_radius**: The standard radius value, commonly determined from elemental
+        .cif files, approximating the size of an atom within a crystal structure.
+        - **CIF_radius_refined**: An optimized radius calculated so that, across
+        all bonding pairs, the sum of the two radii in a bonded pair matches the
+        shortest observed bond distance as closely as possible. This refinement is
+        designed to improve packing efficiency within a coordination polyhedron.
+        - **Pauling_radius_CN12**: The Pauling radius of the element, calculated with a
+        coordination number (CN) of 12, providing a basis for comparison with other radius
+        types.
 
         Returns
         -------
-        dict[str : dict[str:float]]
-            Dictionary where each key is an atomic label and the value is a dictionary
-            containing the CIF radius, CIF_refined radius, and Pauling C12 radius in
-            Angstroms.
+        dict[str, dict[str, float]]
+            A dictionary where each key is an atomic label (e.g., "In", "Rh", "U"), and
+            the corresponding value is a dictionary with radius information in Angstroms:
+
+            - `CIF_radius` (float): The standard CIF radius.
+            - `CIF_radius_refined` (float): The optimized radius based on CIF radius.
+            - `Pauling_radius_CN12` (float): The Pauling radius with a coordination
+            number of 12, parsed from literature.
 
         Examples
         --------
         >>> cif.radius_values
-        >>> {
+        {
             "In": {
                 "CIF_radius": 1.624,
                 "CIF_radius_refined": 1.328,
@@ -593,11 +617,113 @@ def radius_sum(self):
     @property
     @ensure_connections
     def CN_max_gap_per_site(self):
+        """Determines the maximum gap in coordination number (CN) for each
+        atomic site.
+
+        For each atomic site, the 20 nearest neighbors are considered. The distances
+        to these neighbors are normalized based on four methods:
+
+        - `dist_by_shortest_dist`: Normalization by the shortest distance from the site.
+        - `dist_by_CIF_radius_sum`: Normalization by the sum of CIF radii.
+        - `dist_by_CIF_radius_refined_sum`: Normalization by the sum of refined CIF radii.
+        - `dist_by_Pauling_radius_sum`: Normalization by the sum of Pauling radii.
+
+        The radius sums are calculated for each element pair involved. For each
+        normalization method, the maximum gap is determined as the largest difference
+        between consecutive normalized distances (i.e., the difference between the nth
+        and (n-1)th neighbors).
+
+        This CN gap provides insight into the bonding relevance for each site.
+
+        Returns
+        -------
+        dict[str, dict[str, dict[str, float | int]]]
+            A dictionary where each key represents an atomic site, mapping to another
+            dictionary with normalization methods as keys. Each normalization method
+            contains a dictionary with:
+
+            - `max_gap` (float): The maximum gap in the normalized distances.
+            - `CN` (int): Coordination number based on the normalization method.
+
+        Examples
+        --------
+        >>> cif.CN_max_gap_per_site
+        {
+            "In1": {
+                "dist_by_shortest_dist": {"max_gap": 0.306, "CN": 14},
+                "dist_by_CIF_radius_sum": {"max_gap": 0.39, "CN": 14},
+                "dist_by_CIF_radius_refined_sum": {"max_gap": 0.341, "CN": 12},
+                "dist_by_Pauling_radius_sum": {"max_gap": 0.398, "CN": 14},
+            },
+            "U1": {
+                "dist_by_shortest_dist": {"max_gap": 0.197, "CN": 11},
+                "dist_by_CIF_radius_sum": {"max_gap": 0.312, "CN": 11},
+                "dist_by_CIF_radius_refined_sum": {"max_gap": 0.27, "CN": 17},
+                "dist_by_Pauling_radius_sum": {"max_gap": 0.256, "CN": 17},
+            },
+            "Rh1": {
+                "dist_by_shortest_dist": {"max_gap": 0.315, "CN": 9},
+                "dist_by_CIF_radius_sum": {"max_gap": 0.347, "CN": 9},
+                "dist_by_CIF_radius_refined_sum": {"max_gap": 0.418, "CN": 9},
+                "dist_by_Pauling_radius_sum": {"max_gap": 0.402, "CN": 9},
+            },
+            "Rh2": {
+                "dist_by_shortest_dist": {"max_gap": 0.31, "CN": 9},
+                "dist_by_CIF_radius_sum": {"max_gap": 0.324, "CN": 9},
+                "dist_by_CIF_radius_refined_sum": {"max_gap": 0.397, "CN": 9},
+                "dist_by_Pauling_radius_sum": {"max_gap": 0.380, "CN": 9},
+            },
+        }
+        """
         return self._CN_max_gap_per_site
 
     @property
     @ensure_connections
     def CN_best_methods(self):
+        """Determines the optimal coordination method for each atomic site.
+
+        For each atomic site, the coordination polyhedron is generated for each method
+        in `self.CN_max_gap_per_site`. The method with the smallest value of
+        `polyhedron_metrics["distance_from_avg_point_to_center"]`, indicating the highest
+        symmetry of the polyhedron, is selected as the "best method" among the four
+        methods used to determine the CN gap in `self.CN_max_gap_per_site`.
+
+        Returns
+        -------
+        dict[str, dict[str, float | int | str]]
+            Dictionary where each key represents an atomic site, and the corresponding
+            value is a dictionary containing:
+
+            - `volume_of_polyhedron` (float): The volume of the polyhedron surrounding
+            the atomic site.
+            - `distance_from_avg_point_to_center` (float): The average distance from
+            the polyhedron's vertices to its geometric center, used as a measure of
+            symmetry.
+            - `number_of_vertices` (int): The number of vertices in the coordination
+            polyhedron.
+            - `number_of_edges` (int): The number of edges connecting vertices in the
+            polyhedron.
+            - `number_of_faces` (int): The number of faces in the coordination polyhedron.
+            - `shortest_distance_to_face` (float): The shortest distance between the
+            atomic site and the nearest face.
+            - `shortest_distance_to_edge` (float): The shortest distance between the
+            atomic site and the nearest edge.
+            - `volume_of_inscribed_sphere` (float): Volume of the largest sphere that can
+            fit inside the polyhedron.
+            - `packing_efficiency` (float): A measure of how efficiently the polyhedron
+            is packed around the atomic site.
+            - `method_used` (str): The name of the chosen method
+            (e.g., `dist_by_shortest_dist`) providing the highest symmetry based on
+            `distance_from_avg_point_to_center`.
+
+        Examples
+        --------
+        >>> CN_best_methods = cif_URhIn.CN_best_methods
+        >>> CN_best_methods["In1"]["number_of_vertices"] == 14
+        >>> CN_best_methods["Rh2"]["number_of_vertices"] == 9
+        >>> CN_best_methods["In1"]["method_used"] == "dist_by_shortest_dist"
+        >>> CN_best_methods["Rh2"]["method_used"] == "dist_by_shortest_dist"
+        """
         return self._CN_best_methods
 
     @property