Restriction to the last dimension #3

Open
ChrisRackauckas opened this issue Oct 10, 2017 · 27 comments
@ChrisRackauckas (Member) commented Oct 10, 2017

Why the restriction to the last dimension? Is this due to the design or could that be changed?

@oschulz (Collaborator) commented Oct 10, 2017

It's currently due to the design: ElasticArray should be contiguous and fast (basically native array speed, except for a slightly more expensive size function). The internal vector reference, kernel size, etc. are immutable.

ElasticArray is mainly intended for storing/streaming incoming chunks of multi-dimensional data, etc. I also want to use it as a back-end for a vector-of-arrays wrapper that I'm writing (see branch "arrayvector") and a few other things. So usually, limiting resize to the last dimension is not a problem. But if it can be done without loss of performance (and I have to admit that I haven't done extensive performance tests yet), multi-dimensional resize would be even better, of course.
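
For illustration, growth along the last dimension looks roughly like this (a sketch based on the package README; treat the exact constructor and method signatures as assumptions):

using ElasticArrays

A = ElasticArray{Float64}(undef, 2, 3, 0)  # last dimension starts empty
for _ in 1:4
    append!(A, rand(2, 3))                 # append one 2×3 slice along the last dimension
end
size(A)              # (2, 3, 4)
resize!(A, 2, 3, 2)  # shrinking keeps the reserved capacity of the backing Vector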

@oschulz (Collaborator) commented Oct 10, 2017

Oh, I forgot: one important design criterion is that an ElasticArray should be able to "breathe" (grow, shrink, grow, ...) without causing high memory-allocation and GC load, to support multi-threaded high-throughput applications. That's why I'm using a Vector and its native GC-friendly resize capabilities (space stays reserved after a shrink) internally.
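
A minimal sketch of that behavior with a plain Vector (the reserved capacity surviving a shrink is an implementation detail of Julia's Array, but it holds in practice):

buf = Float64[]
resize!(buf, 10^6)   # allocates backing storage once
resize!(buf, 0)      # shrinks the array, but the reserved space stays with it
resize!(buf, 10^6)   # grows again, typically without a fresh allocation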

@oschulz (Collaborator) commented Oct 11, 2017

I did a few tests, and it looks like ElasticArray might actually be faster (at least sometimes) if it's a mutable struct (expert advice would be very welcome). If ElasticArray were mutable, then it could, in principle, be resizable in all dimensions.

But changing the size of any dimension but the last would be very expensive, as it would require memory allocation and data movement. @ChrisRackauckas, do you have any specific use case(s) in mind?

@ChrisRackauckas (Member, Author) commented:

@ChrisRackauckas, do you have any specific use case(s) in mind?

I want to change the size of Jacobian matrices but attempt to re-use some of the same memory / not re-allocate the whole thing. I am not sure if it's possible.

@timholy (Member) commented Oct 11, 2017

You could try manually with

julia> buf = zeros(Int, 10)   # was Vector{Int}(10) in the original Julia 0.6 transcript
10-element Array{Int64,1}:
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0

julia> a = Base.ReshapedArray(buf, (5, 2), ())
5×2 reshape(::Array{Int64,1}, 5, 2) with eltype Int64:
 0  0
 0  0
 0  0
 0  0
 0  0

julia> push!(buf, 3)
11-element Array{Int64,1}:
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 3

julia> a
5×2 reshape(::Array{Int64,1}, 5, 2) with eltype Int64:
 0  0
 0  0
 0  0
 0  0
 0  0

julia> buf[2] = 7
7

julia> a
5×2 reshape(::Array{Int64,1}, 5, 2) with eltype Int64:
 0  0
 7  0
 0  0
 0  0
 0  0

You have to create a new wrapper every time you want to resize. But this approach lets you live dangerously and circumvent the "cannot resize array with shared data" error.

The best reason to try it this way and see if it makes a difference is that you might save yourself some time: I'd be a little surprised if this helps much, unless you're talking about very small arrays. These days Julia's GC is pretty impressive.

@timholy (Member) commented Oct 11, 2017

Although I just realized I pulled such tricks in TiledIteration, so maybe...

@oschulz (Collaborator) commented Oct 11, 2017

I want to change the size of Jacobian matrices but attempt to re-use some of the same memory / not re-allocate the whole thing. I am not sure if it's possible.

It might be - if it turns out that ElasticArray does no worse as a mutable struct, it would be possible to implement this.

Implementing the necessary data movement by hand would be a bit of work, though we could just do it via temporary views for now and live with the allocation/GC cost (I wish we had stack-allocated views already ...). And it might not cost anything in certain use cases where the array contents don't need to be preserved during resize - I guess that's the case for you, Chris? In such cases, one could resize to size (0,0,...) and then blow the array up again to the desired new size - that would result in no data movement, and memory allocation would only occur occasionally, when the new total length is greater than ever before in the array's history. Would that fit your use case, Chris?
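
A rough sketch of that resize-through-zero idea on the flat backing vector (hypothetical code, not the current API):

data = ...                      # the flat backing Vector of the elastic array
resize!(data, 0)                # discard contents; capacity stays reserved
resize!(data, prod(new_size))   # allocates only if this exceeds the historical maximum length
# then record `new_size` as the array's new dimensions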

I'm still not sure about the mutable/immutable struct question, though.

@oschulz (Collaborator) commented Oct 11, 2017

@timholy: These days Julia's GC is pretty impressive.

GC can still quickly become the limiting factor when running on many threads, though, in my experience.

@stevengj commented Jan 21, 2020

One way to make any dimension elastic, without sacrificing much efficiency, would simply be to allocate padding in each dimension (growing geometrically as needed whenever data is appended). This would mean that the data is no longer contiguous in memory, but:

  • Access is still the same fast column-major operation, just with different strides.
  • Resulting matrices can still be used with LAPACK, because LAPACK supports padding in the matrix columns.
  • Extending any given dimension is a fast operation because of the geometric padding growth — just like how resizing a 1d array is implemented in Julia, growing by N in any dimension would require only O(N) work. Most of the time, extending a given dimension would require no reallocation or data movement.

(Access is exactly equivalent to a SubArray view of a larger allocated array.)
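
This is easy to check in today's Julia: a view whose columns are contiguous is a StridedMatrix, so BLAS accepts it directly, just with a larger leading dimension (a small demonstration, not ElasticArrays code):

using LinearAlgebra

buf = zeros(8, 8)          # padded allocation
A = view(buf, 1:5, 1:3)    # 5×3 "public" region; columns contiguous, leading dimension 8
A .= randn.()              # fill the public region in place
B = randn(3, 4)
C = A * B                  # dispatches to BLAS gemm through the strided interface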

That's why I'm using a Vector and its native GC-friendly resize capabilities (space stays reserved after a shrink) internally.

There's nothing about this growth/shrink algorithm that you can't implement natively in Julia for multiple dimensions — it is just allocating padding and accessing a view.

@oschulz (Collaborator) commented Jan 22, 2020

Hmhm - that would require up-front knowledge of the maximum size, right?

@stevengj commented Jan 22, 2020

Hmhm - that would require up-front knowledge of the maximum size, right?

No. You do exactly the same thing as push!(vector, x) does — when you run out of space, allocate a new array with double the space in that dimension and copy the old data over.

(Because there are only a logarithmic number of re-allocations, you get O(1) amortized time to increase any dimension by 1.)
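
A minimal sketch of that reallocation step for a single growable dimension (a hypothetical helper, not part of ElasticArrays):

# Ensure `buf` can hold at least `need` entries along dimension `d`,
# doubling the padded extent so reallocations stay logarithmic in number.
function grow_dim(buf::Array{T,N}, d::Int, need::Int) where {T,N}
    size(buf, d) >= need && return buf        # still fits: no copy at all
    newdims = ntuple(i -> i == d ? max(2 * size(buf, d), need) : size(buf, i), N)
    newbuf = Array{T,N}(undef, newdims)
    copyto!(view(newbuf, axes(buf)...), buf)  # one O(length) move per doubling
    return newbuf
end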

@oschulz (Collaborator) commented Jan 22, 2020

Ah, yes - but that would indeed require kernel_size to be mutable (currently, ElasticArray is immutable). I'm not sure whether making ElasticArray mutable would be good or bad for performance.

There's one other problem, though: resizing can then invalidate existing views into the array - from what I understand, that's one of the reasons why standard multi-dimensional Arrays are not resizable. @ChrisRackauckas, do you know any details about that? Could Array become resizable in its last dimension, in principle?

Maybe we should add a separate type, like MultiElasticArray or so, which does what @stevengj suggests and gives fewer guarantees on views in return?

@alejandroschuler commented:

A project I'm working on requires me to iteratively update a matrix X'X as I tack columns onto X in batches. I'm therefore expanding both rows and columns of X'X, so this kind of functionality is necessary for me. I'm using the following barebones implementation, which is along the lines of what has been suggested here.

import Base: *

# Copy `public` into a freshly allocated buffer of size `private_size` and
# return a view of the copied region together with the buffer itself.
function _elastic_array(
	public::V, 
	private_size::NTuple{N,Int}, 
	W::Type=typeof(public),
) where V<:AbstractArray{T,N} where {T,N}
	private = W(undef, private_size)
	public_idx = [1:d for d in size(public)]
	private[public_idx...] = public
	public = @view private[public_idx...]
	return public::SubArray{T,N,W}, private::W
end

# `public` is a view of the currently used region of the padded `private` buffer.
mutable struct ElasticArray{T, N, V<:DenseArray{T,N}} <: DenseArray{T,N}
	public::SubArray{T,N,V}
	private::V

	function ElasticArray{T,N,V}(public) where {T,N,V}
		new(_elastic_array(public, 2 .* size(public))...)
	end
end

ElasticArray(public) = ElasticArray{eltype(public), ndims(public), typeof(public)}(public)

@inline Base.size(A::ElasticArray) = size(A.public)
@inline Base.IndexStyle(::Type{<:ElasticArray}) = IndexLinear()
@inline Base.getindex(A::ElasticArray, i::Int) = getindex(A.public, i)
# Write through the view, not the buffer: once there is padding, linear
# indices into `private` no longer match linear indices into `public`.
@inline Base.setindex!(A::ElasticArray, v, i::Int) = setindex!(A.public, v, i)
@inline Base.pointer(A::ElasticArray, i::Integer) = pointer(A.public, i)
@inline Base.unsafe_convert(::Type{Ptr{T}}, A::ElasticArray{T}) where T = Base.unsafe_convert(Ptr{T}, A.public)
Base.convert(::Type{T}, A::AbstractArray) where {T<:ElasticArray} = A isa T ? A : T(A)
*(A::Array{T}, B::ElasticArray) where T = A*B.public
*(B::ElasticArray, A::Array{T}) where T = B.public*A
Base.show(io::IO, A::ElasticArray) = Base.show(io, A.public)

function size_or_0(A::V, dims)::NTuple{N,Int} where V<:AbstractArray{T,N} where {T,N}
	Tuple((i in dims) ? d : 0 for (i,d) in enumerate(size(A)))
end

# Concatenate `a` onto `A` along `dims`, growing the private buffer geometrically when needed.
function cat!(A::ElasticArray{T,N,V}, a::V, dims) where {T,N,V}
	idxs_start = 1 .+ size_or_0(A, dims)
	idxs_end = size(A) .+ size_or_0(a, dims)

	if any(idxs_end .≥ size(A.private))
		expand_private!(A, 2 .* idxs_end)
	end

	A.private[(i:j for (i,j) in zip(idxs_start, idxs_end))...] = a
	A.public = @view A.private[(1:j for j in idxs_end)...]
	return A
end

function expand_private!(A::ElasticArray, private_size)
	A.public, A.private = _elastic_array(A.public, private_size, typeof(A.private))
	return A
end
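
A quick usage sketch of the snippet above (assuming the definitions are loaded as-is):

E = ElasticArray(zeros(2, 2))   # 2×2 public region inside a 4×4 private buffer
cat!(E, ones(2, 3), 2)          # grow along dimension 2
size(E)                         # (2, 5)
E[1, 5]                         # 1.0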

Hopefully it's useful to you all. Naturally, any other views into A.private that exist when the array gets resized end up pointing to old data, I suppose. The only way around that would be for ElasticArray to keep track of its views and repoint them to the resized A.private whenever that array changes. Probably more work than it's worth, unless there's a compelling use case for having additional views be compatible with ElasticArray.

@oschulz (Collaborator) commented May 12, 2022

Thanks @alejandroschuler - would you like to try your hand at a PR?

It would be nice if we could share as much code as possible between ElasticArray and ExtraFlexArray (or whatever we want to name it). We could add an AbstractElasticArray supertype to achieve that.
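
A rough sketch of what that shared layer could look like (names here are placeholders, not an actual API):

abstract type AbstractElasticArray{T,N} <: DenseArray{T,N} end

# Shared methods would dispatch on the supertype, while each concrete type
# implements its own growth primitive, e.g. (hypothetical names):
#   growlastdim!(A::ElasticArray, n)     # last dimension only, always cheap
#   growdim!(A::ExtraFlexArray, d, n)    # any dimension, may copy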

@alejandroschuler commented:

I'd like to, but I actually only barely know what I'm doing in Julia. Re: a supertype: all the shared methods would then be dispatched on that supertype, yeah? In terms of naming, perhaps AbstractElasticArray for the supertype, ElasticArray for my generic implementation, and LastDimElasticArray for the original? Or LastDimElasticArrayViewable, to make the possible advantage of the latter clear?

Related question: I noticed that in my implementation, matrix multiplications were painfully slow if I didn't manually define *. Presumably this is also the case for other operations. Is there a better way than going through tons and tons of functions and repeatedly defining f(A::ElasticArray, ...) = f(A.public, ...) for each one of them? Any function f that works for a subarray (A.public) should ideally be defined this way so that it's optimized when operating on my ElasticArray, unless otherwise defined (though I think the only place we need to operate on A.private is in setindex!). In Python I'd do some complicated stuff with __getattr__ so that methods that are not defined for A would instead get called on A.public. It'd be disappointing if there weren't a less clunky way to achieve the same effect in Julia.

@oschulz (Collaborator) commented May 12, 2022

I'd like to but I actually only barely know what I'm doing in julia.

Well, have a try if you like. :-) I can put this on my to-do list, but I'm afraid quite a few high-priority things are already overdue, so I probably won't be able to get to this myself soon.

Re: a supertype: all the shared methods would then be dispatched on that supertype, yeah?

Yes

In terms of naming, perhaps AbstractElasticArray for the supertype, ElasticArray for my generic implementation, and LastDimElasticArray for the original?

That would force us to go to v2.0.0, because it would break backward compatibility. We shouldn't change the name of the existing type. I'd be fine with ExtraFlexArray or SuperElasticArray or so for the new type.

I noticed that in my implementation, matrix multiplications were painfully slow if I didn't manually define *

Linear algebra is fast for the current ElasticArray because it's dense in memory and can therefore specialize Base.pointer(A). An approach like the one suggested by @stevengj - reserving space in any growing dimension in larger chunks - would be efficient regarding resize (O(1) amortized time) but couldn't use optimized BLAS-based linear algebra. Still, the generic linear algebra operations shouldn't be horridly slow (just not BLAS speed).

Is there a better way than going through tons and tons of functions and repeatedly defining [...]

Take a look at https://discourse.julialang.org/t/method-forwarding-macro/23355 , but it's not a magic pill.
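
For a handful of functions, a plain eval loop already goes a long way (a sketch, forwarding to the A.public field from the snippet above):

for f in (:sum, :maximum, :minimum, :vec)
    @eval Base.$f(A::ElasticArray) = Base.$f(A.public)
end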

In the end, if storage can't be compact, I don't think you can reach optimal linear algebra speed. But if you want to extend the first dimension and still keep things compact in memory, vcat on a standard Matrix may be the best choice (and not necessarily inefficient if the chunks are comparatively large).

@alejandroschuler commented:

Sounds good. We'll see who gets to it first!

Re: naming: can you tell I'm not a software engineer? haha. Yes, preserving backwards compatibility makes sense.

Thanks a bunch for that link, seems like TypedDelegation is exactly what I need here, if I can figure out how to exclude setindex!.

@alejandroschuler commented:

vcat on a standard Matrix may be the best choice (and not necessarily inefficient if the chunks are comparatively large)

In my application it's lots of little chunks, so this ended up being a huge bottleneck, and that's why I'm here :)

@oschulz (Collaborator) commented May 12, 2022

Re: naming: can you tell I'm not a software engineer? haha. Yes, preserving backwards compatibility makes sense.

No worries! :-) If you're interested, here's some more info on Julia and semantic versioning: https://pkgdocs.julialang.org/v1/compatibility/#Version-specifier-format

in my application it's lots of little chunks so this ended up being a huge bottleneck and that's why I'm here :)

Ah, yep, then @stevengj's scheme (see above) will be the way to go. And you'll have to do linear algebra operations with it while it grows, not just at the end, right? In that case, there's no way around generic, pure-Julia linear algebra. However, there are quite a few fast packages in that direction now that you can try out (list likely incomplete): https://github.com/mcabbott/Tullio.jl, https://github.com/MasonProtter/Gaius.jl, https://github.com/chriselrod/PaddedMatrices.jl

We can't have ElasticArrays depend on any of them (they're far too "heavy"), but your application could use them directly with the new "super-elastic" arrays.

@oschulz (Collaborator) commented May 12, 2022

A project I'm working on requires me to iteratively update a matrix X'X as I tack on columns to X in batches.

You mean X' * X? Maybe there's also another way to approach this - what shape does X have, and in which direction(s) does it grow in your application?

@alejandroschuler commented:

You mean X' * X? Maybe there's also another way to approach this - what shape does X have, and in which direction(s) does it grow in your application?

The naive implementation looks like this:

# n ≈ 1000, k ≈ 10; element type assumed Float64 here
X = Matrix{Float64}(undef, n, 0)
for iter in 1:total_iterations
    X_new_batch = generate_new_batch(result) # size (n,k)
    X = hcat(X, X_new_batch)
    P = X'X
    result = heavy_computation(X, P)
end

@oschulz (Collaborator) commented May 12, 2022

You mean X' * X? Maybe there's also another way to approach this - what shape does X have, and in which direction(s) does it grow in your application?

The naive implementation looks like this:

    X_new_batch = generate_new_batch(result) # size (n,k)
    X = hcat(X, X_new_batch)
    P = X'X

Would this work for you, performance-wise? Store X as an ElasticArray; then you can replace the hcat with an efficient append! - in that direction an ElasticArray can already grow, after all. But don't compute X' * X explicitly - instead, whenever you need P * v, run X' * (X * v). You can use LinearMaps.jl to do that for you transparently:

using LinearMaps
...
X_ea = ElasticArray(...) # Initial X
Xm = LinearMap(X_ea)  # Xm.lmap === X_ea
P = Xm' * Xm  # O(1) operation: doesn't run any calculation, just creates a new LinearMap
...
append!(Xm.lmap, ...) # As Xm.lmap === X_ea grows, P grows as well
result = heavy_computation(Xm, P)

P * v will now run X_ea' * (X_ea * v) under the hood. As long as heavy_computation only uses operations like multiplications and so on, this scheme can be very efficient.

@alejandroschuler commented:

Thanks @oschulz, but no dice :( heavy_computation asks for P::AbstractMatrix. It's a complex optimization that involves factorizations of P, so I don't think the LinearMaps approach works here. Moreover, even if it did, doing it this way would effectively repeat a lot of the computation of the upper-left blocks of X'X, which is what I'd like to avoid in the first place. It's quite elegant though, so I'm sure this will come in handy for me at some point - thanks anyway!

@oschulz (Collaborator) commented May 14, 2022

heavy_computation asks for P::AbstractMatrix

Ah, yes, that's sometimes a problem when using LinearMaps. I suggested making LinearMap a subtype of AbstractMatrix a few days ago: JuliaLinearAlgebra/LinearMaps.jl#180

@oschulz (Collaborator) commented May 14, 2022

It's a complex optimization that involves factorizations of P so I don't think the linear maps approach works here

Yes, if there's explicit factorization and so on, LinearMaps won't help. But in that case, will an elastic array really save much memory allocation? After all, these factorizations will then be allocated very frequently anyhow and be of similar size, right? Or is all of it in-place?

@alejandroschuler commented:

Yes, if there's explicit factorization and so on, LinearMaps won't help. But in that case, will an elastic array really save much memory allocation? After all, these factorizations will then be allocated very frequently anyhow and be of similar size, right? Or is all of it in-place?

Yeah, it all has to be reallocated to make this happen, but I think the speedup comes from having to do it once instead of twice. Empirically it's much faster than the hcat implementation.

@oschulz (Collaborator) commented May 14, 2022

My, you do have a tricky use case there! :-)
