-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve downloading of attachments #131
Improve downloading of attachments #131
Conversation
Patches for process_executor function
* Fix to CSV change file type * Added backup endpoint
6be60bd
to
0eea789
Compare
Forced push updates that includes updates to docstring documentation |
which corresponds to the default output of get_attachments_on_projects
57e20ad
to
c3cb5a8
Compare
c3cb5a8
to
1b5e8ea
Compare
Looks like the build errors consist of:
I haven't touched anything related to this. |
Sample script: """Exporting the specified projects along with all their attachments.
This code depends on the patches at
https://github.com/huyz/jiraone/tree/feat/improve_attachment_downloads
"""
import csv
import json
import os
from threading import Thread
from jiraone import LOGIN, PROJECT, issue_export
# 2024-04-20 To avoid the following error when fields have size over `2**17`:
# _csv.Error: field larger than field limit (131072)
csv.field_size_limit(2 ** 31 - 1)
# `config.json` should consist of:
# {
# "user": "JIRA_EMAIL_ADDRESS",
# "password": "JIRA_API_KEY",
# "url": "https://JIRA_ORG_NAME.atlassian.net"
# }
file = "config.json"
config = json.load(open(file))
LOGIN(**config)
EXPORT_ROOT = 'Export'
for key in [
'MYPROJ1',
'MYPROJ2',
]:
print('\n==============================================================================')
print(f'𐄫 Processing {key}…')
jql = f"project in ({key}) order by created ASC"
folder = os.path.join(EXPORT_ROOT, key)
filename_base = 'all_issues'
filename_ext = 'json'
# To speed things up, don't overwrite
if not os.path.exists(os.path.join(folder, filename_base + '.' + filename_ext)):
print(f'\n𐄬 {key}: Fetching issues…\n')
issue_export(
jql=jql,
folder=folder,
final_file=filename_base,
extension=filename_ext,
is_cache_filename='users.json',
)
attachment_root = os.path.join(EXPORT_ROOT, key, 'attachment')
filename = 'attachment_index.csv'
file_path = os.path.join(attachment_root, filename)
# To speed things up, don't overwrite
if not os.path.exists(file_path) or ',,,,Total Size' not in open(file_path).read().splitlines()[-1]:
print(f'\n𐄬 {key}: building attachment index')
# By default, this creates an attachment list CSV file at `Attachment/attachment_file.csv`
Thread(target=PROJECT.get_attachments_on_projects(
attachment_folder=attachment_root,
attachment_file_name=filename,
query=jql)).start()
print(f'\n𐄬 {key}: downloading attachments…')
# - file_folder: directory of the attachment list CSV file created by
# `get_attachments_on_projects`
# - file_name: filename of the attachment list CSV file created by
# `get_attachments_on_projects`
# - overwrite=False : speed incremental exports by skipping attachments that have already
# been downloaded
# - create_html_redirectors=True : create index.html files in the attachment directories
# that redirect to the actual path of the attachment with the original filename
PROJECT.download_attachments(
file_folder=attachment_root,
file_name=filename,
download_path=attachment_root,
overwrite=False,
create_html_redirectors=True) |
Hi @huyz thanks for the contribution, give me some time to review the commits. if not os.path.exists(base_dir):
os.makedirs(base_dir) I would prefer you use a condition here to choose between if not os.path.exists(base_dir):
os.mkdir(base_dir) if default is False else os.makedirs(base_dir) |
I can't see how |
Yes, it wouldn't and you're right about it being redundant but existing dirs without the exist_ok parameter being true will cause an exception and what I'm trying to avoid is what is said on this thread about umask is set temporary to 0. I don't know if this was ever fixed but I like the option to allow it by user choice not default. In addition, I looked at what caused the test errors, the environment variables were not available for the test due to PR restrictions. I'll complete my review and push it here in the allowed branch which should trigger the test again. |
But that's the exact same thing for
So the behavior would not change and affect anything else in your code. Also there already is a check for directory: if not os.path.exists(base_dir): So barring any race condition from some other concurrent process (which anyway the user shouldn't be running touching the same output directories or thereabouts, as that would be asking for trouble; plus, I'm sure you'd have race problems not just here, but all over the codebase in that case), then there won't be an exception regardless.
This issue was fixed in python/cpython#65281 (comment) |
f48be52
into
princenyeche:princenyeche-patch-1
@huyz that's true and thanks for clarifying. |
Fixes:
get_attachments_on_projects
: Overwrite attachment file by defaultjson_field_builder
: check ifsprint_custom_id
is Nonepath_builder
: handle multi-dir base_dirdownload_attachments
: avoid conflicts/overwrites by isolating attachments (helps with Issue using download_attachments #112)Improvements:
download_attachments
: make defaults and behavior matchget_attachments_on_projects
Features:
download_attachments
: supportcreate_html_directors
option (resolves Issue using download_attachments #112)download_attachments
: supportoverwrite=False
option (to speed up incremental backups)Let me know if you want me to split these commits into separate PRs