-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize scalar multiplication with a 4-bit window #61
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -30,7 +30,7 @@ macro_rules! new_curve_impl { | |
(($($privacy:tt)*), $name:ident, $name_affine:ident, $iso:ident, $base:ident, $scalar:ident, | ||
$curve_id:literal, $a_raw:expr, $b_raw:expr, $curve_type:ident) => { | ||
/// Represents a point in the projective coordinate space. | ||
#[derive(Copy, Clone, Debug)] | ||
#[derive(Copy, Clone, Debug, zeroize::Zeroize)] | ||
#[cfg_attr(feature = "repr-c", repr(C))] | ||
$($privacy)* struct $name { | ||
x: $base, | ||
|
@@ -466,32 +466,41 @@ macro_rules! new_curve_impl { | |
type Output = $name; | ||
|
||
fn mul(self, other: &'b $scalar) -> Self::Output { | ||
// TODO: make this faster | ||
|
||
let mut acc = $name::identity(); | ||
|
||
// This is a simple double-and-add implementation of point | ||
// multiplication, moving from most significant to least | ||
// significant bit of the scalar. | ||
// | ||
// We don't use `PrimeFieldBits::.to_le_bits` here, because that would | ||
// force users of this crate to depend on `bitvec` where they otherwise | ||
// might not need to. | ||
// | ||
// NOTE: We skip the leading bit because it's always unset (we are turning | ||
// the 32-byte repr into 256 bits, and $scalar::NUM_BITS = 255). | ||
for bit in other | ||
.to_repr() | ||
.iter() | ||
.rev() | ||
.flat_map(|byte| (0..8).rev().map(move |i| Choice::from((byte >> i) & 1u8))) | ||
.skip(1) | ||
{ | ||
acc = acc.double(); | ||
acc = $name::conditional_select(&acc, &(acc + self), bit); | ||
// Create a table out of the point | ||
// TODO: This can be made more efficient with a 5-bit window | ||
let mut arr = [$name::identity(); 16]; | ||
arr[1] = *self; | ||
for i in 2 .. 16 { | ||
arr[i] = if (i % 2) == 0 { | ||
arr[i / 2].double() | ||
} else { | ||
arr[i - 1] + arr[1] | ||
}; | ||
} | ||
|
||
let mut res = $name::identity(); | ||
let mut first = true; | ||
// Iterate from significant byte to least significant byte | ||
for byte in other.to_repr().iter().rev() { | ||
// Shift the result over 4 bits | ||
if !first { | ||
for _ in 0 .. 4 { | ||
res = res.double(); | ||
} | ||
} | ||
first = false; | ||
|
||
// Add the top-nibble from this byte into the result | ||
res += arr[usize::from(byte >> 4)]; | ||
// Shift the result over | ||
for _ in 0 .. 4 { | ||
res = res.double(); | ||
} | ||
// Add the bottom-nibble from this byte into the result | ||
res += arr[usize::from(byte & 0b1111)]; | ||
} | ||
|
||
acc | ||
res | ||
} | ||
} | ||
|
||
|
@@ -581,32 +590,41 @@ macro_rules! new_curve_impl { | |
type Output = $name; | ||
|
||
fn mul(self, other: &'b $scalar) -> Self::Output { | ||
// TODO: make this faster | ||
|
||
let mut acc = $name::identity(); | ||
|
||
// This is a simple double-and-add implementation of point | ||
// multiplication, moving from most significant to least | ||
// significant bit of the scalar. | ||
// | ||
// We don't use `PrimeFieldBits::.to_le_bits` here, because that would | ||
// force users of this crate to depend on `bitvec` where they otherwise | ||
// might not need to. | ||
// | ||
// NOTE: We skip the leading bit because it's always unset (we are turning | ||
// the 32-byte repr into 256 bits, and $scalar::NUM_BITS = 255). | ||
for bit in other | ||
.to_repr() | ||
.iter() | ||
.rev() | ||
.flat_map(|byte| (0..8).rev().map(move |i| Choice::from((byte >> i) & 1u8))) | ||
.skip(1) | ||
{ | ||
acc = acc.double(); | ||
acc = $name::conditional_select(&acc, &(acc + self), bit); | ||
// Create a table out of the point | ||
// TODO: This can be made more efficient with a 5-bit window | ||
let mut arr = [$name::identity(); 16]; | ||
arr[1] = (*self).into(); | ||
for i in 2 .. 16 { | ||
arr[i] = if (i % 2) == 0 { | ||
arr[i / 2].double() | ||
} else { | ||
arr[i - 1] + arr[1] | ||
}; | ||
} | ||
|
||
let mut res = $name::identity(); | ||
let mut first = true; | ||
// Iterate from significant byte to least significant byte | ||
for byte in other.to_repr().iter().rev() { | ||
// Shift the result over 4 bits | ||
if !first { | ||
for _ in 0 .. 4 { | ||
res = res.double(); | ||
} | ||
} | ||
first = false; | ||
|
||
// Add the top-nibble from this byte into the result | ||
res += arr[usize::from(byte >> 4)]; | ||
// Shift the result over | ||
for _ in 0 .. 4 { | ||
res = res.double(); | ||
} | ||
// Add the bottom-nibble from this byte into the result | ||
res += arr[usize::from(byte & 0b1111)]; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems like previous impl targets constant time multiplication but here there is an addition of infinity point where There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should be constant time if the underlying functions are constant time. While it's a bit bold to assume to_repr is constant time, the prior impl also used it. If adding the identity isn't constant time, that's a fault in the addition operation, not this algorithm. The prior algorithm performed addition by identity for every zero bit, so that specific behavior isn't unique to this algorithm. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It appears addition isn't constant time. I believe that deserves its own issue. Here, I solely note the prior algorithm also performed addition with identity, so that isn't a new issue. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Ah sorry it's my bad to figure the prior impl as constant time |
||
} | ||
|
||
acc | ||
res | ||
} | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a rather large stack allocation of 1.5 kiB (and it would be even larger with a 5-bit window). This could affect the ability for this library to be used effectively (or at all) in
no-std
environments. I suspect at a minimum we would need a feature flag to switch back to the previous behaviour.What we currently do in downstream crates (e.g. here) is use the
group::WnafBase
andgroup::WnafScalar
types to implement this on the heap (and allow precomputing this table where it makes sense). Maybe we could instead document this approach?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason not to simply document this is because every single
*
call is ~2x as slow as it needs to be. While I do ack how the stack allocation may be too large for nostd, I'd argue the solution is to make this function body std only, not to remove it.WnafBase/WnafScalar, from what I understand, is intended for/makes sense for points frequently used in multiplication operatons. This isn't for the reuse case and I'd argue against using WnafBase here because it uses the heap.
I'd be happy to update the PR with a feature flag between this and the old body.