# Detecting Edit Failures in LLMs: An Improved Specificity Benchmark (website)

This repository contains the code for the paper Detecting Edit Failures in LLMs: An Improved Specificity Benchmark (ACL Findings 2023).

It extends previous work on model editing by Meng et al. [1] with a new benchmark, CounterFact+, for measuring the specificity of model edits.

## Attribution

This repository is a fork of MEMIT, which implements the model editing algorithms MEMIT (Mass-Editing Memory in a Transformer) and ROME (Rank-One Model Editing). Our fork extends this code with additional evaluation scripts implementing the CounterFact+ benchmark. For installation instructions, see the original repository.

## Installation

We recommend conda for managing Python, CUDA, and PyTorch, and pip for everything else. To get started, install conda and run:

CONDA_HOME=$CONDA_HOME ./scripts/setup_conda.sh

$CONDA_HOME should be the path to your conda installation, e.g., ~/miniconda3.
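For example, assuming conda is installed under ~/miniconda3, a minimal session might look like the sketch below. The environment name used in the activation step is an assumption borrowed from the upstream MEMIT repository; check the setup script's output for the actual name it creates.

```bash
# Assumes conda lives at ~/miniconda3; adjust CONDA_HOME to your installation.
CONDA_HOME=~/miniconda3 ./scripts/setup_conda.sh

# Activate the environment created by the script (environment name assumed here;
# the script reports the name it actually creates).
conda activate memit
```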

## Running Experiments

See INSTRUCTIONS.md for instructions on how to run the experiments and evaluations.
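As a rough sketch only: in the upstream MEMIT codebase, evaluations are launched through the experiments/evaluate.py entry point. The invocation below is illustrative, its flags may differ in this fork, so defer to INSTRUCTIONS.md for the exact commands.

```bash
# Illustrative only: entry point and flags follow the upstream MEMIT repository
# and may differ in this fork; see INSTRUCTIONS.md for the authoritative commands.
python3 -m experiments.evaluate \
    --alg_name=ROME \
    --model_name=gpt2-xl \
    --hparams_fname=gpt2-xl.json
```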

## How to Cite

If you find our paper useful, please consider citing it as follows:

@inproceedings{jason2023detecting,
title         = {Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark},
author        = {Hoelscher-Obermaier, Jason and Persson, Julia and Kran, Esben and Konstas, Ioannis and Barez, Fazl},
booktitle     = {Findings of ACL},
year          = {2023},
organization  = {Association for Computational Linguistics}
}