Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RleX packbits #68

Open
wants to merge 3 commits into
base: refactor
Choose a base branch
from

Conversation

andrews05
Copy link
Contributor

@andrews05 andrews05 commented Sep 26, 2022

This uses a modified packbits implementation with extended repeats. My tests have shown notably smaller file sizes than current RleX, and it currently decodes around 10x faster too (on a >10MB sprite which I haven't attached here).
The surface has to be accessed via pointer since we're setting individual bytes, not entire colours. There's currently no bounds checking here, though I don't think there was before either.
The pack data is also accessed via pointer (rather than data reader) for better performance. Not sure if this is appropriate or not. I have at least added bounds checking here.

The attached file includes a sprite with two copies: The current opcode-based version and the packbits version.

RleX Packbits.zip

@andrews05
Copy link
Contributor Author

I've found performance can be improved a little by changing block::set<uint8_t> and block::copy_from to use memset/memcpy respectively, rather than simd. Is this okay to change?

@andrews05 andrews05 force-pushed the rlex_packbits branch 2 times, most recently from 790b74c to 0690966 Compare September 28, 2022 07:58
byte = value;
}
simd::set(get<std::uint32_t *>(start), size() - start, v, bytes);
memset(get<std::uint8_t *>(start), value, bytes);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need to verify that the use of memset and memcpy here is faster. In theory they should be, but earlier versions of these algorithms from the start of the year were using them, and I updated to the current implementations for performance improvements.

I'm currently adding testing into the library and will be adding some metrics to it as well to time measurements, so I'll want to run this through that as well to be sure the performance is acceptable. Due to the size of the assets being used by Kestrel, slight discrepancies can add up quickly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cross platform optimisations are tricky, huh? 😂
All I can say is this is all working much faster for me on macOS/x86_64, but I realise it could be different on other platforms.

@@ -320,16 +307,12 @@ auto graphite::data::block::increase_size_to(std::size_t new_size) -> void

auto graphite::data::block::clear() -> void
{
set((uint32_t)0);
set((uint8_t)0, size());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be needed, as it will ultimately pass through to set(uint32_t) anyway. If a difference in behaviour has been observed, then it should be the set function that is fixed, rather than this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to the change to memset. Surface calls block::clear, which calls set(uint8_t), which calls memset. I observed a large reduction in time taken to initialise the surface for rlex.

}
}
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I'm going to need to verify the performance of this. I moved to a look up table to eliminate the excessive cpu cycles resulting from conditionals. Given that the goal of this PR is to further compress and reduce the size of the rlëX resource, it is invariably going to result in degraded performance, as more work will be required to decompress.

🤔

@andrews05 andrews05 changed the title [WIP] RleX packbits RleX packbits Sep 29, 2022
@andrews05
Copy link
Contributor Author

Encoder added. Ready for full test/review.

@andrews05
Copy link
Contributor Author

I've just pushed an update to add bounds checking of the surface frame on decode, to protect against invalid data (or incompatible variants of rleX). Performance is improved a little too. Compared to the current implementation on the refactor branch, I'm seeing 4x to 20x faster decodes here, depending on the sprite.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants