# Evaluating Explanations for Software Patches Generated by Large Language Models

Below, we describe the scripts and datasets used for the experiments in the SSBSE 2023 challenge paper "Evaluating Explanations for Software Patches Generated by Large Language Models":

## ARJAe/

This folder contains the patches generated by ARJA-e, taken from https://github.com/yyxhdy/arja/tree/arja-e/.

## Human/

This folder contains the human-written patches, taken from https://github.com/rjust/defects4j.

## responses/

This folder contains the requests and the LLM's responses for all considered patches and runs.

## results/CompleteEval.csv

A CSV file containing the results for all patches and runs.

## getARJAEDiffs.py and getHumanDiffs.py

Scripts to build long diffs, i.e., diffs that span the complete file (e.g., ARJAe/Chart_5/long_diff.patch).
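
For illustration, here is a minimal sketch of how such a full-file diff could be produced with Python's difflib. The file paths and the Chart_5 layout below are placeholders, not the scripts' actual logic:

```python
# Minimal sketch: build a "long diff" (a unified diff with full-file
# context) between a buggy and a fixed version of a source file.
# The paths below are hypothetical placeholders.
import difflib
from pathlib import Path

buggy = Path("ARJAe/Chart_5/buggy/SomeClass.java")   # placeholder path
fixed = Path("ARJAe/Chart_5/fixed/SomeClass.java")   # placeholder path

buggy_lines = buggy.read_text().splitlines(keepends=True)
fixed_lines = fixed.read_text().splitlines(keepends=True)

# Setting n to the file length keeps every line as context,
# so the diff covers the complete file.
diff = difflib.unified_diff(
    buggy_lines, fixed_lines,
    fromfile=str(buggy), tofile=str(fixed),
    n=max(len(buggy_lines), len(fixed_lines)),
)

Path("ARJAe/Chart_5/long_diff.patch").write_text("".join(diff))
```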

## get_explanations.py

The script used to run the LLM requests via the OpenAI API.

To run this script, you need an additional file named open_api_key.py with the following contents:

```python
api_key = 'Your OpenAI API key'
```
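
For illustration, a minimal sketch of the kind of request the script sends, using the openai Python package's current client. The model name, prompt, and patch path are assumptions for the sketch, not taken from the actual script:

```python
# Minimal sketch: send one patch to the OpenAI chat API and print the
# explanation. Model, prompt, and patch path are illustrative assumptions.
from openai import OpenAI
from open_api_key import api_key

client = OpenAI(api_key=api_key)

with open("ARJAe/Chart_5/long_diff.patch") as f:  # assumed input
    patch = f.read()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumed model; the script may use another
    messages=[
        {"role": "user",
         "content": f"Explain the following software patch:\n\n{patch}"},
    ],
)
print(response.choices[0].message.content)
```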

## helper_functions.py

Contains helper functions needed by the get_explanations.py script.

## evaluation.py

Generates the data for the results table.
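
For illustration, a minimal sketch of how results/CompleteEval.csv could be aggregated with pandas. The column names are hypothetical; the actual schema is defined by the CSV itself:

```python
# Minimal sketch: aggregate the per-run results into per-patch averages.
# The column names ("patch", "score") are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("results/CompleteEval.csv")

# Example aggregation: mean score per patch across all runs.
summary = df.groupby("patch")["score"].mean()
print(summary)
```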