Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve downloading of attachments #131

Conversation

huyz
Copy link
Contributor

@huyz huyz commented Apr 19, 2024

Fixes:

  • 🐛 get_attachments_on_projects: Overwrite attachment file by default
  • 🐛 json_field_builder: check if sprint_custom_id is None
  • 🐛 path_builder: handle multi-dir base_dir
  • 🐛 download_attachments: avoid conflicts/overwrites by isolating attachments (helps with Issue using download_attachments #112)

Improvements:

  • download_attachments: make defaults and behavior match get_attachments_on_projects

Features:

  • download_attachments: support create_html_directors option (resolves Issue using download_attachments #112)
  • download_attachments: support overwrite=False option (to speed up incremental backups)

Let me know if you want me to split these commits into separate PRs

@huyz huyz force-pushed the feat/improve_attachment_downloads branch from 6be60bd to 0eea789 Compare April 20, 2024 07:52
@huyz
Copy link
Contributor Author

huyz commented Apr 20, 2024

Forced push updates that includes updates to docstring documentation

@huyz huyz force-pushed the feat/improve_attachment_downloads branch 2 times, most recently from 57e20ad to c3cb5a8 Compare April 20, 2024 08:18
@huyz huyz force-pushed the feat/improve_attachment_downloads branch from c3cb5a8 to 1b5e8ea Compare April 20, 2024 08:32
@princenyeche princenyeche self-assigned this Apr 20, 2024
@princenyeche princenyeche added the enhancement New feature or request label Apr 20, 2024
@huyz
Copy link
Contributor Author

huyz commented Apr 20, 2024

Looks like the build errors consist of:

Traceback (most recent call last):
  File "test.py", line 48, in setUp
    os.environ.get("JIRAONEIMAGEURL") + "/image-size/large?v=v2&px=999"
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

I haven't touched anything related to this.

@huyz
Copy link
Contributor Author

huyz commented Apr 20, 2024

Sample script:

"""Exporting the specified projects along with all their attachments.

This code depends on the patches at
    https://github.com/huyz/jiraone/tree/feat/improve_attachment_downloads
"""
import csv
import json
import os
from threading import Thread

from jiraone import LOGIN, PROJECT, issue_export


# 2024-04-20 To avoid the following error when fields have size over `2**17`:
#  _csv.Error: field larger than field limit (131072)
csv.field_size_limit(2 ** 31 - 1)

# `config.json` should consist of:
# {
#     "user": "JIRA_EMAIL_ADDRESS",
#     "password": "JIRA_API_KEY",
#     "url": "https://JIRA_ORG_NAME.atlassian.net"
# }
file = "config.json"
config = json.load(open(file))
LOGIN(**config)

EXPORT_ROOT = 'Export'

for key in [
	'MYPROJ1',
	'MYPROJ2',
]:
    print('\n==============================================================================')
    print(f'𐄫 Processing {key}…')

    jql = f"project in ({key}) order by created ASC"

    folder = os.path.join(EXPORT_ROOT, key)
    filename_base = 'all_issues'
    filename_ext = 'json'

    # To speed things up, don't overwrite
    if not os.path.exists(os.path.join(folder, filename_base + '.' + filename_ext)):
        print(f'\n𐄬 {key}: Fetching issues…\n')
        issue_export(
            jql=jql,
            folder=folder,
            final_file=filename_base,
            extension=filename_ext,
            is_cache_filename='users.json',
        )

    attachment_root = os.path.join(EXPORT_ROOT, key, 'attachment')
    filename = 'attachment_index.csv'
    file_path = os.path.join(attachment_root, filename)

    # To speed things up, don't overwrite
    if not os.path.exists(file_path) or ',,,,Total Size' not in open(file_path).read().splitlines()[-1]:
        print(f'\n𐄬 {key}: building attachment index')

        # By default, this creates an attachment list CSV file at `Attachment/attachment_file.csv`
        Thread(target=PROJECT.get_attachments_on_projects(
            attachment_folder=attachment_root,
            attachment_file_name=filename,
            query=jql)).start()

    print(f'\n𐄬 {key}: downloading attachments…')
    # - file_folder: directory of the attachment list CSV file created by
    #   `get_attachments_on_projects`
    # - file_name: filename of the attachment list CSV file created by
    #   `get_attachments_on_projects`
    # - overwrite=False : speed incremental exports by skipping attachments that have already
    #   been downloaded
    # - create_html_redirectors=True : create index.html files in the attachment directories
    #   that redirect to the actual path of the attachment with the original filename
    PROJECT.download_attachments(
        file_folder=attachment_root,
        file_name=filename,
        download_path=attachment_root,
        overwrite=False,
        create_html_redirectors=True)

@princenyeche princenyeche changed the base branch from main to princenyeche-patch-1 April 22, 2024 11:26
@princenyeche
Copy link
Owner

Hi @huyz thanks for the contribution, give me some time to review the commits.
Regarding the test errors, I believe it's because of this change but I need to look into it properly

if not os.path.exists(base_dir):
        os.makedirs(base_dir)

I would prefer you use a condition here to choose between mkdir and makedirs so it doesn't break other parts that rely on single dirs.
E.g.
param default: Create single dirs, True creates multi-dirs, defaults to False

if not os.path.exists(base_dir):
        os.mkdir(base_dir) if default is False else os.makedirs(base_dir)

@huyz
Copy link
Contributor Author

huyz commented Apr 25, 2024

I would prefer you use a condition here to choose between mkdir and makedirs so it doesn't break other parts that rely on single dirs.

I can't see how makedirs could break things. I imagine it works like a superset of mkdir (in fact, it calls mkdir). So it seems that a condition would just make things redundant.

@princenyeche
Copy link
Owner

Yes, it wouldn't and you're right about it being redundant but existing dirs without the exist_ok parameter being true will cause an exception and what I'm trying to avoid is what is said on this thread about umask is set temporary to 0. I don't know if this was ever fixed but I like the option to allow it by user choice not default.

In addition, I looked at what caused the test errors, the environment variables were not available for the test due to PR restrictions. I'll complete my review and push it here in the allowed branch which should trigger the test again.

@huyz
Copy link
Contributor Author

huyz commented May 1, 2024

Yes, it wouldn't and you're right about it being redundant but existing dirs without the exist_ok parameter being true will cause an exception

But that's the exact same thing for os.mkdir:

If the directory already exists, FileExistsError is raised

mkdir acts like makedirs with the default option of exist_ok of False.

So the behavior would not change and affect anything else in your code.

Also there already is a check for directory:

if not os.path.exists(base_dir):

So barring any race condition from some other concurrent process (which anyway the user shouldn't be running touching the same output directories or thereabouts, as that would be asking for trouble; plus, I'm sure you'd have race problems not just here, but all over the codebase in that case), then there won't be an exception regardless.

what I'm trying to avoid is what is said on this thread about umask is set temporary to 0. I don't know if this was ever fixed

This issue was fixed in python/cpython#65281 (comment)
Notice that _get_masked_mode was removed.
But anyway it would have been irrelevant, I'm fairly sure

@princenyeche princenyeche merged commit f48be52 into princenyeche:princenyeche-patch-1 May 4, 2024
5 of 6 checks passed
@princenyeche
Copy link
Owner

@huyz that's true and thanks for clarifying.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Issue using download_attachments
2 participants