Automate the generation changelogs in text format #480

Open
wants to merge 5 commits into
base: master
Changes from 2 commits
50 changes: 50 additions & 0 deletions .github/workflows/generate_changelog.yml
@@ -0,0 +1,50 @@
name: Generate Changelog

on:
  workflow_dispatch:
    inputs:
      release_tag:
        description: 'Release tag of latest release'
        required: true
      trigger_secret:
        description: 'Common Secret stored in both repos for Authentication'
        required: true

permissions:
  contents: write

jobs:
  process-release:
    runs-on: ubuntu-latest
    steps:
      - name: Verify Trigger Source
        env:
          RECEIVED_SECRET: ${{ inputs.trigger_secret }}
          EXPECTED_SECRET: ${{ secrets.CHANGELOG_TRIGGER_SECRET }}
        run: |
          if [ "$RECEIVED_SECRET" != "$EXPECTED_SECRET" ]; then
            echo "Unauthorized trigger"
            exit 1
          fi

      - name: Checkout Repository
        uses: actions/checkout@v4

      - name: Install Dependencies
        run: pip install -r requirements.txt

      - name: Run Script
        env:
          RELEASE_TAG: ${{ inputs.release_tag }}
          GITHUB_PAT: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
        run: python scripts/create_changelog.py "$RELEASE_TAG"

      - name: Commit and Push Changes
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          # Pass the input through the environment instead of interpolating it into the shell script
          RELEASE_TAG: ${{ inputs.release_tag }}
        run: |
          git config --global user.name "github-actions[bot]"
          git config --global user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git add .
          git commit -m "Added changelog for release: $RELEASE_TAG"
          git push
Contributor

Could the user.name and user.email be simpler?

We may use a specific file name (src/changelogs/sage-......txt?) instead of . for safety?

Author

Could the user.name and user.email be simpler?

Yes, we could do that, but if we change the email, the commits will no longer show the black github-actions icon. It would appear grayed out instead.

We may use a specific file name (src/changelogs/sage-......txt?) instead of . for safety?

Yes, will fix that
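
For context on how the other repository could fire this workflow: the `workflow_dispatch` trigger can be invoked through the GitHub REST API endpoint for workflow dispatch events. The following is a minimal sketch, not the sender actually used with this PR. The function name `build_dispatch_request` and the owner/repository values in the usage note are hypothetical; only the input names `release_tag` and `trigger_secret` come from the workflow file.

```python
def build_dispatch_request(owner, repo, workflow_file, ref,
                           release_tag, trigger_secret, token):
    """Return (url, headers, payload) for a workflow_dispatch API call.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }
    payload = {
        # Branch or tag whose copy of the workflow file should run
        "ref": ref,
        # Keys must match the inputs declared under workflow_dispatch
        "inputs": {
            "release_tag": release_tag,
            "trigger_secret": trigger_secret,
        },
    }
    return url, headers, payload
```

A sender would pass the result to an HTTP client, for example `requests.post(url, headers=headers, json=payload)`, with `owner="example-org"` and `repo="example-repo"` replaced by the real target repository.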

3 changes: 3 additions & 0 deletions requirements.txt
@@ -5,3 +5,6 @@ autopep8 >= 0.5
pyyaml
six
pybtex
requests
python-dotenv
unidecode
246 changes: 246 additions & 0 deletions scripts/create_changelog.py
@@ -0,0 +1,246 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
This script is used in 'Generate Changelog' workflow, defined in the 'generate_changelog.yml' file
to automate the process of creating changelogs after each stable release.

It fetches release data, extracts relevant pull request (PR) information, and generates a detailed changelog
and add its to src/changelogs.
The script uses the GitHub REST API to collect information about contributors, PR authors, and reviewers.
Contributor

add it to


Additional Note:
- The script requires a GitHub personal access token (PAT) stored in an environment variable named `GITHUB_PAT`.
"""

import requests
import re
from dotenv import load_dotenv
import os
import argparse
import xml.etree.ElementTree as ET
from unidecode import unidecode

load_dotenv()
GITHUB_PAT = os.getenv('GITHUB_PAT')
BASE_URL = r"https://api.github.com/repos/sagemath/sage"
HEADERS = {'Authorization': f'token {GITHUB_PAT}', }
AUTOMATED_BOTS = ['dependabot[bot]', 'github-actions', 'renovate[bot]']

# Maps the GitHub username to the contributor's name
git_to_name = {}

# Stores unique names of contributors
all_contribs = set()

# Stores unique names of first-time contributors
first_contribs = set()

# Stores information for all the prs across all pre-releases
# Maps tag of pre-release to its info
all_info = {}


def map_git_to_names():
    tree = ET.parse('conf/contributors.xml')
    root = tree.getroot()
    for c in root.findall('contributor'):
        name = c.get('name')
        git = c.get('github')
        if git and name:
            git_to_name[git] = unidecode(name)


def update_names():
    """
    Replace the github usernames with real names. If name is not found in contributors.xml,
    then github usernames are used in the form @<github-username>
    """
    for tag in all_info:
        for pr in all_info[tag]:
            pr['creator'] = git_to_name.get(pr['creator'], f"@{pr['creator']}")
            pr['authors'] = [git_to_name.get(a, f"@{a}") for a in pr['authors']]
            pr['reviewers'] = [git_to_name.get(r, f"@{r}") for r in pr['reviewers']]
    global all_contribs
    global first_contribs
    all_contribs = set([git_to_name.get(c, f"@{c}") for c in all_contribs])
    first_contribs = set([git_to_name.get(c, f"@{c}") for c in first_contribs])

Contributor

It seems that we can retrieve the github user's real name via github api from his/her github account name. If this is the case, then we may create a new entry in conf/contributors.xml and then use the real name instead of the github account name in the created changelog. For example, we may create this entry

<contributor
 name="Marie Bonboire"
 github="marizee"/>

and use "Marie Bonboire" instead of @marizee. The entries in contributors.xml are listed alphabetically by last name. We may use this fact to find the right place for the new entry.

But this is out of the scope of this PR. You may decide to do this extra work or not. Just let me know.

Author

Yes, this is doable, but some users might not have their real name on their GitHub profile. In that case we seem to have no option other than to use their GitHub username in the format @github-username.
I will work on it.

Author

Additionally, we can retrieve the location of the user (if available on their GitHub profile). Then, for all contributors who have their GitHub username in conf/contributors.xml but don't have location information, we can add it.

I think it would be better to tackle this in a separate script, so I may put together another PR specifically for adding the location data.
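
To make the idea in this thread concrete, here is a rough sketch of turning a GitHub user payload (as returned by the REST API's user endpoint, which carries `login`, `name` and `location` fields) into a contributors.xml-style entry. The helper name `contributor_element`, the sample payload, and the `location` attribute name are assumptions for illustration; the existing schema of conf/contributors.xml is not shown in this PR.

```python
import xml.etree.ElementTree as ET


def contributor_element(profile):
    """Build a <contributor> element from a GitHub user payload.

    Returns None when the profile has no real name, in which case the
    changelog would keep using the @<github-username> form.
    """
    name = (profile.get("name") or "").strip()
    login = profile.get("login")
    if not name or not login:
        return None
    attrs = {"name": name, "github": login}
    # Assumed attribute name; only added when the profile provides a location
    location = (profile.get("location") or "").strip()
    if location:
        attrs["location"] = location
    return ET.Element("contributor", attrs)


# Hypothetical payload shaped like a GitHub user object
sample_profile = {"login": "marizee", "name": "Marie Bonboire", "location": None}
element = contributor_element(sample_profile)
print(ET.tostring(element, encoding="unicode"))
```

Inserting the element at the right alphabetical position and writing the file back are left out here, since they depend on how contributors.xml is formatted.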


def get_release_data(tag):
    url = fr"{BASE_URL}/releases/tags/{tag}"
    res = requests.get(url, headers=HEADERS)
    if res.status_code == 404:
        print(f"{tag} release not found")
        return None
    if res.status_code != 200:
        print(f"Failed to fetch release data: {res.status_code}")
        return None
    return res.json()


def get_release_date(release_data):
    if not release_data:
        return 'N/A'
    date_time = release_data.get('published_at', '')
    if not date_time:
        return 'Unavailable'
    return date_time.split('T')[0]


def extract_pr_info(release_data):
    body = release_data.get('body', '')
    pr_info = []
    pattern = r"\* (.*?) by (@\S+) in https://github.com/sagemath/sage/pull/(\d+)"
    matches = re.findall(pattern, body)
    for match in matches:
        title = match[0]
        creator = match[1][1:]  # Drop the leading '@'
        pr_id = match[2]
        authors = get_authors(pr_id)
        reviewers = get_reviewers(pr_id, authors)
        pr_info.append({
            'title': title,
            'creator': creator,
            'pr_id': pr_id,
            'authors': authors,
            'reviewers': reviewers
        })
    return pr_info


def update_first_contribs(release_data):
    body = release_data.get('body', '')
    pattern = r"\* (@\S+) made their first contribution in"
    matches = re.findall(pattern, body)
    for match in matches:
        username = match[1:]  # Drop the leading '@'
        first_contribs.add(username)
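
As an illustration of what the two regular expressions above pick out, the sketch below runs them on a made-up release body. The sample text, usernames and PR numbers are invented for the example and only mimic the layout the patterns expect; they are not taken from any real release.

```python
import re

PR_PATTERN = r"\* (.*?) by (@\S+) in https://github.com/sagemath/sage/pull/(\d+)"
FIRST_PATTERN = r"\* (@\S+) made their first contribution in"

# Hypothetical release notes in the layout the patterns assume
SAMPLE_BODY = (
    "## What's Changed\n"
    "* Fix typo in docs by @alice in https://github.com/sagemath/sage/pull/12345\n"
    "* Speed up parsing by @bob in https://github.com/sagemath/sage/pull/12346\n"
    "## New Contributors\n"
    "* @bob made their first contribution in https://github.com/sagemath/sage/pull/12346\n"
)

# Each match is a (title, '@username', pr_id) tuple; strip the '@' from the name
prs = [(title, user[1:], pr_id)
       for title, user, pr_id in re.findall(PR_PATTERN, SAMPLE_BODY)]

# With a single capture group, findall returns plain strings
first_timers = [user[1:] for user in re.findall(FIRST_PATTERN, SAMPLE_BODY)]

print(prs)
print(first_timers)
```

Because `.` does not match a newline by default, each match stays within one bullet line of the body.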


def get_authors(pr_id):
    url = f"{BASE_URL}/pulls/{pr_id}/commits"
    authors = []
    try:
        res = requests.get(url, headers=HEADERS)
        res.raise_for_status()
        commits = res.json()
        for commit in commits:
            if commit['commit']['committer']['name'] in AUTOMATED_BOTS:
                continue
            # 'author' is null when the commit is not linked to a GitHub account
            author = commit.get('author')
            if author and 'login' in author:
                username = author['login']
                if username not in AUTOMATED_BOTS:
                    authors.append(username)
                    all_contribs.add(username)
    except Exception as e:
        print(f"Failed to fetch commits for PR {pr_id}: {e}")
    return list(set(authors))


def get_reviewers(pr_id, authors):
    url = f"{BASE_URL}/pulls/{pr_id}/reviews"
    reviewers = []
    try:
        res = requests.get(url, headers=HEADERS)
        res.raise_for_status()
        reviews = res.json()
        for review in reviews:
            # Guard against a missing or null 'user' entry
            user = review.get('user')
            if user and 'login' in user:
                username = user['login']
                if username not in authors and username not in AUTOMATED_BOTS:
                    reviewers.append(username)
                    all_contribs.add(username)
    except Exception as e:
        print(f"Failed to fetch reviews for PR {pr_id}: {e}")
    return list(set(reviewers))


def get_latest_tags():
    url = f"{BASE_URL}/tags?per_page=1000"  # If per_page is not specified then very few tags are fetched
    try:
Contributor

If we get recent tags in the first page, I think 100 would be enough.

        res = requests.get(url, headers=HEADERS)
        res.raise_for_status()
        tags = res.json()
        tags = [tag['name'] for tag in tags]
        return tags
    except Exception as e:
        print(f"Failed to fetch tags: {e}")
        return None
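
On the `per_page` discussion: the GitHub REST API caps `per_page` at 100 for list endpoints, so larger values do not return more items in one response. If more than one page were ever needed (for tags, PR commits or reviews), the pages could be walked with the `page` query parameter. The sketch below shows that loop in isolation; `fetch_all_pages` and the `fetch_page` callback are hypothetical names, and the callback stands in for a `requests.get` call so the example runs without network access.

```python
def fetch_all_pages(fetch_page, per_page=100, max_pages=10):
    """Collect items from a paginated listing.

    fetch_page(page, per_page) must return the list of items on that page.
    Stops at the first short page or after max_pages pages.
    """
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, per_page)
        items.extend(batch)
        if len(batch) < per_page:
            break  # A short (or empty) page means there is nothing left
    return items


# Stand-in data source: 250 fake tag names served in slices
FAKE_TAGS = [f"tag-{i}" for i in range(250)]


def fake_fetch_page(page, per_page):
    start = (page - 1) * per_page
    return FAKE_TAGS[start:start + per_page]


all_tags = fetch_all_pages(fake_fetch_page)
print(len(all_tags))
```

In the script, the callback would wrap a request to the tags URL with `params={"per_page": per_page, "page": page}` and return the decoded JSON list. As the review comment notes, a single page of 100 is likely enough when only recent tags matter.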


def sort_tags(tag):
    name = tag.lower()
    if "beta" in name:
        return (0, name)  # Beta comes first
    elif "rc" in name:
        return (1, name)  # RC comes next
    else:
        return (2, name)  # Stable versions come last
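
One thing to note about `sort_tags`: within a group, tags are compared as plain strings, so a tag such as `10.5.beta10` sorts before `10.5.beta2`. A possible variant that compares the trailing number numerically is sketched below. The name `tag_sort_key` and the sample tags are illustrative assumptions, not part of this PR.

```python
import re


def tag_sort_key(tag):
    """Sort key: betas first, then release candidates, then the stable tag.

    Within betas and release candidates, the trailing number is compared
    as an integer rather than as text.
    """
    name = tag.lower()
    m = re.match(r"^(.*?)\.(beta|rc)(\d+)$", name)
    if m is None:
        return (2, 0, name)  # Stable versions come last
    stage = 0 if m.group(2) == "beta" else 1
    return (stage, int(m.group(3)), name)


sample_tags = ["10.5", "10.5.rc0", "10.5.beta10", "10.5.beta2"]
ordered = sorted(sample_tags, key=tag_sort_key)
print(ordered)
```

With this key, `10.5.beta2` is listed before `10.5.beta10`, followed by `10.5.rc0` and finally `10.5`.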


def save_to_file(filename, ver, date_of_release):
    with open(filename, 'w') as file:
        file.write(f"Sage {ver} was released on {date_of_release}. It is available from:\n\n")
        file.write(f" * https://www.sagemath.org/download-source.html\n\n")
        file.write(f"Sage (http://www.sagemath.org/) is developed by volunteers and\n")
        file.write(f"combines hundreds of open source packages.\n\n")
Contributor

It is a slight deviation from the tradition, but it may be better to use (http://www.sagemath.org) with no trailing slash for a new start.

Author

@soham30rane Nov 24, 2024

Done here!

        file.write(f"The following {len(all_contribs)} people contributed to this release.\n")
        file.write(f"Of those, {len(first_contribs)} made their first contribution to Sage:\n\n")
        for c in all_contribs:
            file.write(f" - {c}{' [First Contribution]' if c in first_contribs else ''}\n")
        pr_count = sum([len(all_info[tag]) for tag in all_info])
Contributor

[First contribution] to align with other bracketed phrases.

Author

Done here!

        file.write(f"\n* We merged {pr_count} pull requests in this release.\n\n")
        sorted_tags = sorted(all_info.keys(), key=sort_tags)
        for tag in sorted_tags:
            file.write(f"Merged in sage-{tag}:\n\n")
            for pr in all_info[tag]:
                file.write(f"#{pr['pr_id']}: {', '.join(pr['authors'])}: {pr['title']}")
                if pr['reviewers']:
                    file.write(f" [Reviewed by {', '.join(pr['reviewers'])}]")
                file.write('\n')
            file.write('\n')

    print(f"Saved changelog to {filename}")


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Fetch release data from GitHub and extract PR info")
    parser.add_argument('version', type=str, help="The release version (e.g., 10.1)")
    args = parser.parse_args()
    ver = args.version
    is_stable = re.match(r'^\d+(\.\d+){0,3}$', ver)
    if not is_stable:
        print(f"{ver} is not a stable release. Terminating.")
        exit()

    filepath = f"src/changelogs/sage-{ver}.txt"
    if os.path.exists(filepath):
        print(f"{filepath} already exists. Exiting without making changes.")
        exit()

    map_git_to_names()
    # Fall back to the stable tag alone if the tag list could not be fetched
    all_tags = get_latest_tags() or []
    # Escape the version so that its dots match literally
    tag_pattern = fr"^{re.escape(ver)}\.(beta|rc)\d*$"
    valid_tags = {ver}
    for tag in all_tags:
        if re.match(tag_pattern, tag):
            valid_tags.add(tag)

    for tag in valid_tags:
        release_data = get_release_data(tag)
        if tag == ver:
            date_of_release = get_release_date(release_data)
        if release_data is None:
            continue
        pr_info = extract_pr_info(release_data)
        all_info[tag] = pr_info
        update_first_contribs(release_data)
        print(f"Fetched data for tag: {tag}")

    update_names()
    first_contribs = first_contribs.intersection(all_contribs)
    # Real names first, then @usernames, each group in alphabetical order
    all_contribs = sorted(all_contribs, key=lambda x: (x.startswith('@'), x))
    if all_info:
        save_to_file(filepath, ver, date_of_release)
    else:
        print("No information found.")
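
The main block selects pre-release tags by building a regular expression from the version string. The sketch below isolates that step, assuming the version is passed through `re.escape` so that its dots are matched literally. The helper name `prerelease_tags` and the sample tag list are made up for the example.

```python
import re


def prerelease_tags(ver, tags):
    """Return the beta/rc tags that belong to the stable version `ver`."""
    # re.escape turns '10.5' into '10\.5', so '.' no longer matches any character
    pattern = re.compile(rf"^{re.escape(ver)}\.(beta|rc)\d*$")
    return [t for t in tags if pattern.match(t)]


# Hypothetical tag names, including two that should not match version 10.5
sample = ["10.5", "10.5.beta0", "10.5.rc1", "10.50.beta1", "1015.beta2", "10.4.rc0"]
selected = prerelease_tags("10.5", sample)
print(selected)
```

Without escaping, the pattern `^10.5.(beta|rc)\d*$` would also accept a string such as `1015.beta2`, because each unescaped `.` matches any single character.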