Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add method for getindex(::ProductIterator, inds...) #49965

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

MasonProtter
Copy link
Contributor

@MasonProtter MasonProtter commented May 26, 2023

Currently, ProductIterator has a rather artificial limitation of not supporting getindex, even though there's nothing about ProductIterator itself that should preclude it from supporting this function.

It's true that ProductIterator may contain iterators that don't support indexing, but if that's the case, then it should be the job of the enclosed iterators to throw an error saying they don't support getindex, not ProductIterator.

Here's some examples of things this enables:

julia> let itr = Iterators.product(rand(3), rand(3))
           itr[2, 1]
       end
(0.17086814961424035, 0.8039107344409738)

julia> let itr = Iterators.product(Dict(:a => 1, :b => 2), [1, 2, 3])
           itr[:b, 3]
       end
(2, 3)

julia> let itr = Iterators.product()
           itr[]
       end
()

and something that rightfully errors:

julia> let itr = Iterators.product(rand(3), (x for x in 1:10 if rand(Bool)))
           itr[1, 2]
       end
ERROR: MethodError: no method matching getindex(::Base.Generator{Base.Iterators.Filter{var"#10#11", UnitRange{Int64}}, typeof(identity)}, ::Int64)

Here, I've implemented the most straightforward case of indexing hoping to solicit comments on what this functionality should offer. For instance, this does not currently support things like linear indexing:

julia> Iterators.product([:a, :b, :c], ['i', 'j', 'k'])[2, 3]
(:b, 'k')

julia> Iterators.product([:a, :b, :c], ['i', 'j', 'k'])[2]
ERROR: BoundsError: attempt to access Tuple{} at index [1]

which I think is okay because this object isn't an abstract array. Thoughts on this direction?

@FelixBenning
Copy link

I wonder wether you can ensure better error messages than hoping that the errors from the individual iterators is going to be fine. Base.HasLength() is the wrong check though...

base/iterators.jl Outdated Show resolved Hide resolved
@FelixBenning
Copy link

What I would also really like is to be able to do

julia> vec(PermutedDimsArray(Iterators.product(rand(3), 1:4), (2,1)))
12-element reshape(PermutedDimsArray(::Matrix{Tuple{Float64, Int64}}, (2, 1)), 12) with eltype Tuple{Float64, Int64}:
 (0.39059299961662275, 1)
 (0.39059299961662275, 2)
 (0.39059299961662275, 3)
 (0.39059299961662275, 4)
 (0.1864604429198029, 1)
 (0.1864604429198029, 2)
 (0.1864604429198029, 3)
 (0.1864604429198029, 4)
 (0.3248860864098778, 1)
 (0.3248860864098778, 2)
 (0.3248860864098778, 3)
 (0.3248860864098778, 4)

to get this output at the moment it is necessary to collect the ProductIterator. This is because PermutedDimsArray only accepts AbstractArrays and ProductIterator is not an AbstractArray. For this reason it might make sense to got the route

struct VectorProduct{T,N} <: AbstractArray{NTuple{N,T},N}
    vectors::NTuple{N,AbstractVector{T}}
end
product(vectors::Vector...) = VectorProduct(vectors)

unfortunately this can not also be a subtype of an AbstractProductIterator because julia forbids multiple inheritance. So this would have to be somehow implemented with traits I think

@FelixBenning
Copy link

FelixBenning commented May 29, 2023

maybe something like this to get the subtype of AbstractArray?

function _eltype_iterator_tuple(::Type{T}) where {T<:Tuple}
    return Tuple{ntuple(n -> eltype(fieldtype(T, n)), Base._counttuple(T))...}
end

struct ProductArray{T<:Tuple,Eltype,N} <: AbstractArray{Eltype,N}
    vectors::T
    ProductArray(vectors::T) where {T} = new{T,_eltype_iterator_tuple(T),Base._counttuple(T)}(vectors)
end

struct AllLinearIndexed end
struct GeneralIterators end

function _all_linear_indexed(::T) where {T<:Tuple}
    all(ntuple(
        n -> hasmethod(Base.getindex, (fieldtype(T, n), Int)),
        Base._counttuple(T)
    )) && return AllLinearIndexed()
    return GeneralIterators()
end
product(iterators...) = _product(_all_linear_indexed(iterators), iterators)
_product(::AllLinearIndexed, vectors) = ProductArray(vectors)
_product(::GeneralIterators, vectors) = Base.ProductIterator(vectors)

with a const Product = Union{ProductArray,ProductIterator} it might be possible to implement all methods at once.

I am still not sure if hasmethod(Base.getindex, ( ..., Int)) is the right question to ask.

@MasonProtter
Copy link
Contributor Author

I think your ProductArray could be a good idea, but I also think it's outside of the scope of this PR.

@FelixBenning
Copy link

I can see why you think this is out of scope - but this idea also requires getindex so it does not make much sense as a stand-alone PR. Maybe I should merge this branch into the other feature branch 🤔

@FelixBenning FelixBenning mentioned this pull request May 29, 2023
@@ -1128,6 +1128,7 @@ end
reverse(p::ProductIterator) = ProductIterator(Base.map(reverse, p.iterators))
last(p::ProductIterator) = Base.map(last, p.iterators)
intersect(a::ProductIterator, b::ProductIterator) = ProductIterator(intersect.(a.iterators, b.iterators))
getindex(p::ProductIterator, inds...) = map(getindex, p.iterators, inds)
Copy link

@FelixBenning FelixBenning Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here is an actual problem with this implementation:

julia> a = Iterators.product([1 2; 3 4], 1:2);

julia> b = collect(a)
2×2×2 Array{Tuple{Int64, Int64}, 3}:
[:, :, 1] =
 (1, 1)  (2, 1)
 (3, 1)  (4, 1)

[:, :, 2] =
 (1, 2)  (2, 2)
 (3, 2)  (4, 2)

julia> b[1,1,1]
(1, 1)

julia> map(getindex, a.iterators, (1,1,1))
ERROR: BoundsError: attempt to access Tuple{} at index [1]
Stacktrace:
 [1] getindex(t::Tuple, i::Int64)
   @ Base ./tuple.jl:29
 [2] map (repeats 3 times)
   @ ./tuple.jl:250 [inlined]
 [3] top-level scope
   @ REPL[24]:1

i.e. map assumes every iterator is one dimensional. The following should be a fix I think

function getindex(prod::ProductIterator, indices...)
    return _prod_getindex(prod.iterators, indices...)
end
_prod_getindex(::Tuple{}) = ()
function _prod_getindex(p_vecs::Tuple, indices...)
    v = first(p_vecs)
    n = ndims(v)
    return (
        v[indices[1:n]...],
        _prod_getindex(Base.tail(p_vecs), indices[n+1:end]...)...
    )
end

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you also need to manage all types of access variants (colons, ranges, ...) if you don't limit indices to Vararg{Int,N}. But for this you need to determine N (i.e. ndims) statically.

@aplavin
Copy link
Contributor

aplavin commented Jun 1, 2023

Not to discourage extending Iterators functionality, just for information: product-as-array is available in packages, eg see RectiGrids.jl.

julia> using RectiGrids

julia> grid([:a,:b], 1:3)
2-dimensional KeyedArray(...) with keys:
   2-element Vector{Symbol}
   3-element UnitRange{Int64}
And data, 2×3 RectiGrids.RectiGridArr{Base.OneTo(2), Tuple{Symbol, Int64}, 2, Tuple{Nothing, Nothing}, Tuple{Vector{Symbol}, UnitRange{Int64}}}:
        (1)         (2)         (3)
  (:a)     (:a, 1)     (:a, 2)     (:a, 3)
  (:b)     (:b, 1)     (:b, 2)     (:b, 3)

It returns a keyed array to keep track of individual axes and allow indexing by values and not just ordinal numbers. Overwise, it's just a lazy nd array.

@FelixBenning
Copy link

FelixBenning commented Jun 2, 2023

"product-as-array" is available in packages, eg see RectiGrids.jl. @aplavin

While really cool, this is not equivalent to Iterators.product. Here are a few dumb edge cases:

julia> grid(1:2, (:a, :b))
ERROR: MethodError: no method matching grid(::UnitRange{Int64}, ::Tuple{Symbol, Symbol})
Closest candidates are:
  grid(::AbstractVector...) at ~/.julia/packages/RectiGrids/K7F4a/src/RectiGrids.jl:88
Stacktrace:
 [1] top-level scope
   @ REPL[24]:1

julia> grid([1 2; 3 4], (:a, :b))
ERROR: MethodError: no method matching grid(::Matrix{Int64}, ::Tuple{Symbol, Symbol})
Stacktrace:
 [1] top-level scope
   @ REPL[25]:1

julia> Iterators.product(1:2, (:a, :b)) |> collect
2×2 Matrix{Tuple{Int64, Symbol}}:
 (1, :a)  (1, :b)
 (2, :a)  (2, :b)

julia> Iterators.product([1 2; 3 4], (:a, :b)) |> collect
2×2×2 Array{Tuple{Int64, Symbol}, 3}:
[:, :, 1] =
 (1, :a)  (2, :a)
 (3, :a)  (4, :a)

[:, :, 2] =
 (1, :b)  (2, :b)
 (3, :b)  (4, :b)

I actually ended up writing a small package which attempts to behave exactly like a lazy Iterators.product. If it ever doesn't, that counts as a bug - please report 😅 : https://github.com/lazyLibraries/ProductArrays.jl should be available in the official repository soon

@aplavin
Copy link
Contributor

aplavin commented Jun 2, 2023

Trying to make a comprehensive list of this kind of functionality in packages.

@brenhinkeller brenhinkeller added the iteration Involves iteration or the iteration protocol label Aug 6, 2023
@mcabbott
Copy link
Contributor

mcabbott commented Jan 15, 2024

Should eachindex(it) do something? It could return CartesianIndices(axes(it)) provided getindex accepted those... which pushes you closer to allowing full array-style indexing.

it = Iterators.product(1:2, 3:5);
Base.IteratorSize(it)  # Base.HasShape{2}()
axes(it)  # (Base.OneTo(2), Base.OneTo(3))
it[2, 3]  # this PR

length(it)  # 6
it[end]  # should this work?

it2 = Iterators.product(1:2, (x for x in 3:5 if true))
Base.IteratorSize(it2)  # Base.SizeUnknown()
axes(it2, 1)  # should this work?
it2[2, :]  # could return Iterators.product(2, it2.iterators[2])

Edit, after reading the first message :) ... the Dict example perhaps pushes you away from CartesianIndices:

it3 = Iterators.product(Dict(:a => 1, :b => 2), [1, 2, 3])  # from above
axes(it3)  # (Base.OneTo(2), Base.OneTo(3))
it3[2, 3]  # does not work with this PR
collect(it3)[2, 3]  # this is what `axes` refers to?

See also #52343, where Threads.@threads wants firstindex etc.

@LilithHafner
Copy link
Member

I don't love that Iterators.product(reshape(1:9, 3,3), 1:2))[3,2] == (3, 2) and Iterators.product(reshape(1:9, 3,3), 1:2))[3,1,2] errors while the reverse is the case after collecting.

If iterators with HashShape consume a number of indices corresponding to their shape that would fix this example by making non-vector arrays never index with linear indexing instead of always indexing with linear indexing.

OTOH the current implementation works with eachindex(prod::ProductIterator) = product(map(keys, prod.iterators)...).

Tagging for triage as @MasonProtter asked for feedback in the #triage slack channel

@LilithHafner LilithHafner added the triage This should be discussed on a triage call label Jan 18, 2024
@thchr
Copy link
Contributor

thchr commented Jan 23, 2024

I think that if Cartesian indexing is implemented for ProductIterator, then linear indexing should also be implemented. I.e., it would be nice if the following was possible:

julia> P = Iterators.product(1:3, 4:5, 9:10)
julia> A = collect(P)
3×2×2 Array{Tuple{Int64, Int64, Int64}, 3}:
[:, :, 1] =
 (1, 4, 9)  (1, 5, 9)
 (2, 4, 9)  (2, 5, 9)
 (3, 4, 9)  (3, 5, 9)
 
[:, :, 2] =
 (1, 4, 10)  (1, 5, 10)
 (2, 4, 10)  (2, 5, 10)
 (3, 4, 10)  (3, 5, 10)

julia> ijks = (1,2,2)
julia> n = LinearIndices(A)[ijks...] # = 10

julia> A[ijks...] == A[n] # this already holds
julia> P[ijks...] == P[n] = A[ijks...] # this should also hold and be possible

An implementation could be borrowed from Base._ind2sub:

getindex(p::ProductIterator, ind::Integer) = getindex(p, CartesianIndex(_ind2sub(size(p), ind)))
getindex(p::ProductIterator, inds::Integer...) = getindex(p, CartesianIndex(inds))
function Base.getindex(p::Iterators.ProductIterator, inds::CartesianIndex)
    length(inds) == length(p.iterators) || prod_indexing_error(p, inds)
    Base.map(getindex, p.iterators, Tuple(inds))
end

(although I don't know if the use of Tuple(::CartesianIndex) is an antipattern?)

Somewhat relatedly, is it intended that P[1,2,2,1,1,1,1] throws while A[1,2,2,1,1,1,1] does not? I.e., redundant unit-indices are allowed on indexing of Arrays but not of ProductIterators.

@MasonProtter MasonProtter added triage This should be discussed on a triage call and removed triage This should be discussed on a triage call labels Nov 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
iteration Involves iteration or the iteration protocol triage This should be discussed on a triage call
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants