Commit
Merge pull request #49 from dccuchile/fix/benchmark_doc
Fix/benchmark doc
pbadillatorrealba authored May 5, 2023
2 parents f2955c2 + 5c36d4a commit bebc71b
Showing 4 changed files with 52 additions and 77 deletions.
7 changes: 6 additions & 1 deletion .readthedocs.yaml
@@ -4,11 +4,16 @@ formats:
  - epub
  - pdf

build:
  os: ubuntu-22.04
  tools:
    python: "3.11"


sphinx:
  configuration: docs/conf.py

python:
  version: "3.7"
  install:
    - requirements: requirements.txt
    - requirements: requirements-dev.txt
90 changes: 28 additions & 62 deletions docs/benchmark/benchmark.rst
@@ -519,8 +519,8 @@ supporting the same number of word sets).
2. Fair Embedding Engine
~~~~~~~~~~~~~~~~~~~~~~~~
Fair Embedding Engine
~~~~~~~~~~~~~~~~~~~~~

In the case of Fair Embedding Engine, the WE model is passed when the
metric is instantiated. Then, the output value of the metric is computed
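
This excerpt does not show FEE's import paths or class names, so rather than guess at its
API, the following sketch uses a small stand-in class to illustrate the calling pattern
described above: the embedding model is supplied when the metric object is created, and
the word sets are supplied when the score is computed. It is an illustration of the
pattern only, not FEE code; the ``glove-twitter-25`` gensim model and the example word
lists are assumptions:

    import gensim.downloader as api
    import numpy as np


    class WEATLikeMetric:
        """Toy stand-in for a metric that receives the embedding model up front."""

        def __init__(self, model):
            # The embedding model is passed at instantiation, as described above.
            self.model = model

        def _association(self, word, attribute_a, attribute_b):
            # Mean cosine similarity to attribute set A minus mean similarity to B.
            sim_a = np.mean([self.model.similarity(word, a) for a in attribute_a])
            sim_b = np.mean([self.model.similarity(word, b) for b in attribute_b])
            return sim_a - sim_b

        def compute(self, target_x, target_y, attribute_a, attribute_b):
            # WEAT-style effect size over the two target sets.
            s_x = [self._association(w, attribute_a, attribute_b) for w in target_x]
            s_y = [self._association(w, attribute_a, attribute_b) for w in target_y]
            return (np.mean(s_x) - np.mean(s_y)) / np.std(s_x + s_y)


    word_vectors = api.load("glove-twitter-25")  # small model, downloads on first use
    metric = WEATLikeMetric(word_vectors)        # model passed at instantiation
    print(metric.compute(["he", "him"], ["she", "her"],
                         ["doctor", "engineer"], ["nurse", "teacher"]))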
@@ -833,8 +833,8 @@ family vs. career).
"relatives",
]
1. WEFE
~~~~~~~
WEFE
~~~~

WEFE defines a standardized framework for executing bias mitigation
algorithms based on the scikit-learn ``fit``/``transform`` interface.
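
As a rough sketch of that interface, the example below applies WEFE's ``HardDebias`` to a
small gensim model. The ``glove-twitter-25`` model and the short lists of definitional
pairs and words to ignore are placeholder assumptions, not the benchmark's exact setup:

    import gensim.downloader as api

    from wefe.debias.hard_debias import HardDebias
    from wefe.word_embedding_model import WordEmbeddingModel

    # Wrap a small gensim model in WEFE's model interface.
    model = WordEmbeddingModel(api.load("glove-twitter-25"), "glove-twitter-25")

    # Placeholder gender definitional pairs.
    definitional_pairs = [["she", "he"], ["her", "his"], ["woman", "man"], ["girl", "boy"]]

    # scikit-learn style usage: fit learns the bias direction from the pairs,
    # transform returns a debiased copy of the model.
    hard_debias = HardDebias(verbose=False)
    hard_debias.fit(model, definitional_pairs)
    debiased_model = hard_debias.transform(model, ignore=["she", "he", "her", "his"], copy=True)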
@@ -968,8 +968,8 @@ methods implemented in the library.
Repulsion Attraction Neutralization debiased model WEAT evaluation: 0.26007230998948216
1. Fair Embedding Engine
~~~~~~~~~~~~~~~~~~~~~~~~
Fair Embedding Engine
~~~~~~~~~~~~~~~~~~~~~

The Fair Embedding Engine (FEE) requires the embedding model to be
passed during instantiation of the algorithm. It currently does not
@@ -1042,8 +1042,8 @@ interface
1. Responsibly
~~~~~~~~~~~~~~
Responsibly
~~~~~~~~~~~

In Responsibly, the embedding model is provided during the instantiation
of the ``GenderBiasWE`` class. Definitional pairs cannot be provided by
@@ -1063,8 +1063,8 @@ such as ``twitter-25``.
gender_bias_we = GenderBiasWE(word2vec) # instance the GenderBiasWE
gender_bias_we.debias(neutral_words=targets) # apply the debias
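
For context, a self-contained version of that snippet might look as follows; the
``glove-twitter-25`` gensim download and the short ``targets`` list are stand-ins for the
objects defined earlier in the benchmark:

    import gensim.downloader as api

    from responsibly.we import GenderBiasWE

    word2vec = api.load("glove-twitter-25")       # a small model such as twitter-25
    targets = ["nurse", "doctor", "engineer"]     # placeholder target words

    gender_bias_we = GenderBiasWE(word2vec)       # instantiate GenderBiasWE on the model
    gender_bias_we.debias(neutral_words=targets)  # apply the debias, as in the snippet above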
4. EmbeddingBiasScore
~~~~~~~~~~~~~~~~~~~~~
EmbeddingBiasScore
~~~~~~~~~~~~~~~~~~

The library does not implement mitigation methods, so it is not included
in this comparison.
@@ -1111,15 +1111,13 @@ SAME ✖ ✖ ✖ ✔
Generalized WEAT ✖ ✖ ✖ ✔
================ ==== === =========== ===================

The table exclusively focuses on metrics that directly compute from word
embeddings (WE) using predefined word sets. As a result, it omits
metrics that are not compatible with the wefe interface such as:
The table exclusively focuses on metrics that directly compute from word embeddings
(WE) using predefined word sets. As a result, it omits the following metrics:

- IndirectBias, a metric that accepts as input only two words and the
gender direction, previously calculated in a distinct operation.
- GIPE, PMN, and Proximity Bias, which evaluate WE models before and
after debiasing with auxiliary mitigation methods.
- SemBias, which is an analogy evaluation dataset.
- IndirectBias, a metric that accepts as input only two words and the gender
direction, previously calculated in a distinct operation.
- GIPE, PMN, and Proximity Bias, which evaluate WE models before and after debiasing
with auxiliary mitigation methods.

Mitigation algorithms
~~~~~~~~~~~~~~~~~~~~~
@@ -1140,47 +1138,15 @@ Conclusion
The following table summarizes the main differences between the
libraries analyzed in this benchmark study.

|  | WEFE | FEE | Responsibly | EmbeddingBiasScores |
| --- | --- | --- | --- | --- |
| Implemented Metrics | 7 | 7 | 3 | 6 |
| Implemented Mitigation Algorithms | 5 | 3 | 1 | 0 |
| Extensible | Easy | Easy | Difficult, not very modular. | Easy |
| Well-defined interface for metrics | ✔ | ✖ | ✖ | ✔ |
| Well-defined interface for mitigation algorithms | ✔ | ✖ | ✖ | ✖ |
| Latest update | January 2023 | October 2020 | April 2021 | April 2023 |
| Installation | Easy: pip or conda | No instructions. It can be installed from the repository | Only with pip. Presents problems | Only from the repository |
| Documentation | Extensive documentation with examples | Almost no documentation | Limited documentation with some examples | No documentation, only examples. |
| Item | WEFE | FEE | Responsibly | EmbeddingBiasScores |
| --- | --- | --- | --- | --- |
| Implemented Metrics | 7 | 7 | 3 | 6 |
| Implemented Mitigation Algorithms | 5 | 3 | 1 | 0 |
| Extensible | Easy | Easy | Difficult, not very modular. | Easy |
| Well-defined interface for metrics | ✔ | ✖ | ✖ | ✔ |
| Well-defined interface for mitigation algorithms | ✔ | ✖ | ✖ | ✖ |
| Latest update | January 2023 | October 2020 | April 2021 | April 2023 |
| Installation | Easy: pip or conda | No instructions. It can be installed from the repository | Only with pip. Presents problems | Only from the repository |
| Documentation | Extensive documentation with examples | Almost no documentation | Limited documentation with some examples | No documentation, only examples. |
11 changes: 7 additions & 4 deletions examples/benchmark.ipynb
@@ -1491,6 +1491,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -1513,11 +1514,13 @@
"| SAME | ✖ | ✖ | ✖ | ✔ |\n",
"| Generalized WEAT | ✖ | ✖ | ✖ | ✔ |\n",
"\n",
"The table exclusively focuses on metrics that directly compute from word embeddings (WE) using predefined word sets. As a result, it omits metrics that are not compatible with the wefe interface such as: \n",
"The table exclusively focuses on metrics that directly compute from word embeddings\n",
"(WE) using predefined word sets. As a result, it omits the following metrics:\n",
"\n",
"- IndirectBias, a metric that accepts as input only two words and the gender direction, previously calculated in a distinct operation.\n",
"- GIPE, PMN, and Proximity Bias, which evaluate WE models before and after debiasing with auxiliary mitigation methods.\n",
"- SemBias, which is an analogy evaluation dataset."
"- IndirectBias, a metric that accepts as input only two words and the gender\n",
" direction, previously calculated in a distinct operation.\n",
"- GIPE, PMN, and Proximity Bias, which evaluate WE models before and after debiasing\n",
" with auxiliary mitigation methods."
]
},
{
21 changes: 11 additions & 10 deletions requirements-dev.txt
@@ -1,16 +1,17 @@
pytest>=7.0.0
pytest-cov==3.0.0
coverage==6.4.2
coverage==7.2.5
# flake8==5.0.4
black==22.6.0
isort==5.10.1
mypy==0.812
Sphinx==5.0.2
sphinx-gallery==0.11.1
sphinx-rtd-theme==1.0.0
sphinx-copybutton==0.5.0
urllib3==1.26.15
black==23.3.0
isort==5.11.5
mypy==1.2.0
Sphinx==5.3.0
sphinx-gallery==0.13.0
sphinx-rtd-theme==1.2.0
sphinx-copybutton==0.5.2
numpydoc==1.5.0
docutils==0.16
docutils==0.18
torch==1.13.1
ipython==7.34.0
ruff==0.0.194
ruff==0.0.264
