-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RleX packbits #68
base: refactor
Are you sure you want to change the base?
RleX packbits #68
Conversation
e1a8578
to
bccfd5c
Compare
I've found performance can be improved a little by changing |
790b74c
to
0690966
Compare
libGraphite/data/data.cpp
Outdated
byte = value; | ||
} | ||
simd::set(get<std::uint32_t *>(start), size() - start, v, bytes); | ||
memset(get<std::uint8_t *>(start), value, bytes); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll need to verify that the use of memset and memcpy here is faster. In theory they should be, but earlier versions of these algorithms from the start of the year were using them, and I updated to the current implementations for performance improvements.
I'm currently adding testing into the library and will be adding some metrics to it as well to time measurements, so I'll want to run this through that as well to be sure the performance is acceptable. Due to the size of the assets being used by Kestrel, slight discrepancies can add up quickly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cross platform optimisations are tricky, huh? 😂
All I can say is this is all working much faster for me on macOS/x86_64, but I realise it could be different on other platforms.
libGraphite/data/data.cpp
Outdated
@@ -320,16 +307,12 @@ auto graphite::data::block::increase_size_to(std::size_t new_size) -> void | |||
|
|||
auto graphite::data::block::clear() -> void | |||
{ | |||
set((uint32_t)0); | |||
set((uint8_t)0, size()); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be needed, as it will ultimately pass through to set(uint32_t)
anyway. If a difference in behaviour has been observed, then it should be the set function that is fixed, rather than this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is related to the change to memset. Surface calls block::clear, which calls set(uint8_t), which calls memset. I observed a large reduction in time taken to initialise the surface for rlex.
} | ||
} | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I'm going to need to verify the performance of this. I moved to a look up table to eliminate the excessive cpu cycles resulting from conditionals. Given that the goal of this PR is to further compress and reduce the size of the rlëX
resource, it is invariably going to result in degraded performance, as more work will be required to decompress.
🤔
0690966
to
52f7905
Compare
Encoder added. Ready for full test/review. |
I've just pushed an update to add bounds checking of the surface frame on decode, to protect against invalid data (or incompatible variants of rleX). Performance is improved a little too. Compared to the current implementation on the refactor branch, I'm seeing 4x to 20x faster decodes here, depending on the sprite. |
This uses a modified packbits implementation with extended repeats. My tests have shown notably smaller file sizes than current RleX, and it currently decodes around 10x faster too (on a >10MB sprite which I haven't attached here).
The surface has to be accessed via pointer since we're setting individual bytes, not entire colours. There's currently no bounds checking here, though I don't think there was before either.
The pack data is also accessed via pointer (rather than data reader) for better performance. Not sure if this is appropriate or not. I have at least added bounds checking here.
The attached file includes a sprite with two copies: The current opcode-based version and the packbits version.
RleX Packbits.zip