Added logic to fetch README files, documentation, commit messages, an… #2919

Open
wants to merge 63 commits into
base: main

Conversation

SahilDhillon21
Contributor

…d issue trackers from repository APIs.

Related issue: #2681

  • Updated Project model to include fields for README content, documentation links, commit summaries, and issue tracker counts.
  • Implemented API calls in update_projects.py to gather and save this data to the database.
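
The PR's actual fetching code is not shown in this thread. As a rough sketch of the README step only, assuming the GitHub REST endpoint `GET /repos/{owner}/{repo}/readme` is used (it returns the file body base64-encoded in a `content` field), decoding the response could look like the following. The helper name `decode_readme` is hypothetical.

```python
import base64


def decode_readme(payload: dict) -> str:
    """Decode the JSON body of GET /repos/{owner}/{repo}/readme (assumed endpoint).

    GitHub returns the file base64-encoded in "content", usually wrapped
    with newlines, which b64decode ignores by default.
    """
    if payload.get("encoding") != "base64":
        raise ValueError(f"unexpected encoding: {payload.get('encoding')!r}")
    raw = base64.b64decode(payload.get("content", ""))
    # READMEs are almost always UTF-8; replace bad bytes rather than crash.
    return raw.decode("utf-8", errors="replace")
```

The decoded string is what would then be stored in a field such as `readme_content`.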

readme ss

Fixed the migration file issue that I was facing in the previous PR.

I had to remove the line "Authorization": f"token {settings.GITHUB_TOKEN}" from the request headers because it caused the error 'Unable to fetch repository - 401'. I was unsure how to deal with it, so I have removed it for now.

sentry-io bot commented Nov 13, 2024

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: website/management/commands/update_projects.py

Function: handle
Unhandled Issue: SystemExit: 1 /project/{slug}/ (Event Count: 1)

@SahilDhillon21
Contributor Author

@DonnieBLT I believe that for the CodeQL test the languages should be 'python, javascript' instead of 'python javascript', though I am unsure how to fix this.

@DonnieBLT
Collaborator

The token is there because it helps with rate limiting. You can create a token from your GitHub profile settings page. Also, can you please avoid whitespace changes? I'm not sure what linter settings you're using, but if we can standardize them we won't see the whitespace changes.
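
One possible way to keep the token without hitting the 401 reported earlier in this thread is to send the `Authorization` header only when a token is actually configured. This is a sketch, not the PR's code: the helper name `github_headers` is hypothetical, and it assumes the 401 came from an empty or invalid `GITHUB_TOKEN` setting.

```python
from typing import Optional


def github_headers(token: Optional[str]) -> dict:
    """Build request headers for the GitHub REST API.

    The Authorization header is added only when a non-empty token is given,
    so a missing setting falls back to unauthenticated (lower rate limit)
    requests instead of sending a malformed credential.
    """
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"token {token}"
    return headers
```

In the management command this could be called as `github_headers(getattr(settings, "GITHUB_TOKEN", None))`, assuming the setting name quoted in the PR description.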

@SahilDhillon21
Contributor Author

I've added the GitHub token and fixed the formatting. Please review and let me know.

Collaborator

@DonnieBLT DonnieBLT left a comment

Can you please check the comments?

)

# Set Issue Tracker URL
project.issue_tracker_url = f"https://github.com/{repo_name}/issues"
Collaborator

We can remove this since it's universal

Contributor Author

Understood.

@SahilDhillon21
Contributor Author

I have integrated a basic summarization model, "facebook/bart-large-cnn". However, because README files vary so much between repositories, the summaries aren't very effective. I had thought of pre-processing the content to pass only the relevant sections, but even that seems difficult since READMEs follow no particular structure. @DonnieBLT, which direction should I look into to improve this? Although the OpenAI API is paid, it could do the job much better than the generic Python models.
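
As an illustration of the pre-processing idea mentioned above, the sketch below strips the parts of a README that rarely help a summarizer (fenced code, badges and images, inline HTML) and caps the length before the text is passed to any model. It is a heuristic sketch rather than the PR's approach: the function name `trim_readme` and the 4000-character default are assumptions.

```python
import re


def trim_readme(text: str, max_chars: int = 4000) -> str:
    """Reduce a Markdown README to plain prose suitable for summarization."""
    text = re.sub(r"```.*?```", " ", text, flags=re.DOTALL)  # fenced code blocks
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", " ", text)        # images and badges
    text = re.sub(r"<[^>]+>", " ", text)                      # inline HTML tags
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)      # links: keep label only
    text = re.sub(r"\s+", " ", text).strip()                  # collapse whitespace
    return text[:max_chars]
```

Because it does not rely on any section structure, this works the same on READMEs with very different layouts, at the cost of keeping some irrelevant prose.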

@DonnieBLT
Collaborator

We can use OpenAI; we already have an API key and it is set up in the code.

@@ -53,130 +62,7 @@ <h3>Projects: {{ projects.count }}</h3>
{% endfor %}
</ul>
{% endif %}
<ul class="project-list">
Collaborator

Can we keep this in this file, please?

Contributor Author

So I assume we don't want a separate template and should keep all the code here? Let's finalize what to do with the search function and I'll make the changes accordingly.

Collaborator

Yes, we can combine this into one template and adjust the global search to work as you have it.

openai.api_key = os.getenv("OPENAI_API_KEY")


def generate_labels(readme_content, github_topics):
Collaborator

Can you move this to utils.py too, please?

project.readme_content = readme_content
readme_text = markdown_to_text(readme_content)
project.ai_summary = ai_summary(readme_text, project.topics)
project.ai_labels = json.loads(generate_labels(readme_text, project.topics))
Collaborator

Should we just add the labels verbatim from the topics?

Contributor Author

I found the AI-generated labels to be more accurate and effective, but we can certainly use the topics directly. I'll modify it.
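
A minimal sketch of the topics-as-labels alternative discussed here, assuming `project.topics` is a list of strings (the thread does not show its type). The helper name `labels_from_topics` is hypothetical, and the lowercasing and de-duplication go slightly beyond a strictly verbatim copy.

```python
def labels_from_topics(topics):
    """Derive labels from GitHub topics without calling any model.

    Trims and lowercases each topic and drops duplicates while
    preserving the original order.
    """
    seen = set()
    labels = []
    for topic in topics or []:
        label = topic.strip().lower()
        if label and label not in seen:
            seen.add(label)
            labels.append(label)
    return labels
```

This removes both the API cost and the `json.loads` step on model output, at the price of depending on how well maintainers tag their repositories.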

Collaborator

@DonnieBLT DonnieBLT left a comment

Requesting a few changes; we're almost there.

@SahilDhillon21
Contributor Author

Have made most of the changes, just need a final heads-up on what to do with the search functionality as it's quite buggy and limited. Should I create a new PR to improve its working if we go that route?

Collaborator

@DonnieBLT DonnieBLT left a comment

A few adjustments requested.

)

# Check for Documentation URL (homepage)
project.documentation_url = repo_data.get("homepage")
Collaborator

I believe we have a homepage field

@@ -756,6 +756,10 @@ class Project(models.Model):
closed_issues = models.IntegerField(default=0)
size = models.IntegerField(default=0)
commit_count = models.IntegerField(default=0)
readme_content = models.TextField(null=True, blank=True)
documentation_url = models.URLField(null=True, blank=True)
Collaborator

Use homepage_url

Contributor Author

Done
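
For reference, a sketch of what reusing the existing homepage field might look like. The `homepage` key does exist in GitHub's repository JSON and can be null or an empty string; the helper name `homepage_from_repo` is hypothetical, and `homepage_url` is the field name taken from the review comment above.

```python
from typing import Optional


def homepage_from_repo(repo_data: dict) -> Optional[str]:
    """Return the repository homepage, or None when it is unset.

    GitHub reports an unset homepage as null or "", so both are
    normalized to None before saving.
    """
    homepage = (repo_data.get("homepage") or "").strip()
    return homepage or None
```

The result could then be assigned as `project.homepage_url = homepage_from_repo(repo_data)` in place of a separate documentation URL field.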

@@ -22,6 +22,15 @@ <h3>Projects: {{ projects.count }}</h3>
<i class="fas fa-plus-circle"></i> Add Project
</button>
</form>
<form id="search-form" class="search-form">
Collaborator

Can you combine them into the one top search?

@SahilDhillon21
Contributor Author

I have combined the search bars into one and added all the remaining categories. I'll raise a new PR later to improve the UI of the search results; I have kept it basic for now.
