Docker.py: Pass the use_ino option to fix hardlinks #455

Open
wants to merge 1 commit into
base: master
22 changes: 21 additions & 1 deletion imagefactory_plugins/Docker/Docker.py
@@ -312,12 +312,32 @@ def builder_should_create_target_image(self, builder, target, image_id, template
         # we call a blocking function to activate the mount, which requires a thread
         # We also need a temp dir to mount it to - do our best to clean up when things
         # go wrong
+        #
+        # A better approach here would be to use:
+        # g.tar_out_opts("/", dest_filename, excludes=[excludes])
+        # Though that would break compatibility with the tar_options parameter.
+        #
         tempdir = None
         fuse_thread = None
         try:
             tempdir = tempfile.mkdtemp(dir=storagedir)
             self.log.debug("Mounting input image locally at (%s)" % (tempdir))
-            guestfs_handle.mount_local(tempdir)
+
+            # The "use_ino" option causes FUSE to pass through the original inode
+            # numbers. Without it tar cannot properly detect hardlinks, possibly greatly
+            # increasing the size of the image. This does create an edge case. If there
+            # are:
+            #
+            # - Two separate groups of > 1 files hardlinked together
+            # - On different partitions
+            # - With the same inode number
+            #
+            # Then the groups will be incorrectly merged in the output image. This

Too bad there isn't a derive_ino option or something that would hash the underlying st_ino and st_dev together. I guess the mounting could also do a 1-to-1 guestfs_handle.mount_local call for every g.mount_options call in launch_inspect_and_mount, so tar would see the separate file systems as separate. Anyway, like you said, tar_out_opts clearly seems like the best "right" answer and use_ino seems like the best "right now" answer.

+            # is unlikely to be encountered with typical container images, where almost
+            # all files are on a single partition. The correct fix is to use
+            # g.tar_out_opts() as described above.
+
+            guestfs_handle.mount_local(tempdir, options="use_ino")
             def _run_guestmount(g):
                 g.mount_local_run()
             self.log.debug("Launching mount_local_run thread")
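The hardlink reasoning in the patch comment can be illustrated with a small sketch. It does not call libguestfs or FUSE; it only models how an archiver identifies hardlinks by device and inode number. The names `group_by_identity`, `tar_member_types`, and the sample `ENTRIES` data are hypothetical and exist only for this illustration. The first helper shows the edge case: once every file appears on one mounted device while the original inode numbers are passed through, two unrelated hardlink groups that share an inode number collapse into one. The second helper uses Python's standard `tarfile` module to show that a second name for an already-archived inode is stored as a link rather than as a second copy of the data.

```python
import os
import tarfile
import tempfile


def group_by_identity(entries, key):
    """Group file names by an identity key computed from (dev, ino)."""
    groups = {}
    for name, dev, ino in entries:
        groups.setdefault(key(dev, ino), []).append(name)
    return sorted(groups.values())


def tar_member_types(workdir):
    """Archive a file and its hardlink; return (name, is_hardlink, linkname) per member."""
    original = os.path.join(workdir, "a")
    link = os.path.join(workdir, "b")
    with open(original, "w") as f:
        f.write("payload")
    os.link(original, link)  # second name for the same inode

    archive = os.path.join(workdir, "out.tar")
    with tarfile.open(archive, "w") as tar:
        tar.add(original, arcname="a")
        tar.add(link, arcname="b")
    with tarfile.open(archive) as tar:
        return [(m.name, m.islnk(), m.linkname) for m in tar.getmembers()]


# Hypothetical (name, st_dev, st_ino) triples: two hardlink groups on different
# partitions that happen to share inode number 12.
ENTRIES = [
    ("/boot/a", 1, 12), ("/boot/b", 1, 12),
    ("/usr/x", 2, 12), ("/usr/y", 2, 12),
]

if __name__ == "__main__":
    # Keyed by (dev, ino): the two groups stay separate.
    print(group_by_identity(ENTRIES, lambda dev, ino: (dev, ino)))
    # Keyed by ino alone, as when all files share one mounted device: merged.
    print(group_by_identity(ENTRIES, lambda dev, ino: ino))
    with tempfile.TemporaryDirectory() as d:
        print(tar_member_types(d))
```

Keying on the inode number alone stands in for the single-device view that a FUSE mount presents; this is a simplification of what tar actually sees, not a trace of its behavior.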