You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When I was doing some testing, there seems to be some type instability related to the Schema Struct. Just as a MWE:
f =@formula(a ~ b + c)
data = (
a =randn(1000),
b =randn(1000),
c =randn(1000)
)
functionsimple_function(f, data)
sch =apply_schema(f, schema(f, data))
sch.lhs
end@code_warntypesimple_function(f, data)
# MethodInstance for simple_function(::FormulaTerm{Term, Tuple{Term, Term}}, ::NamedTuple{(:a, :b, :c), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float64}}})# from simple_function(f, data) in Main at c:\Users\underwjo\Documents\GitHub\AbnormalReturns.jl\benchmark\run_benchmark.jl:203# Arguments# #self#::Core.Const(simple_function)# f::FormulaTerm{Term, Tuple{Term, Term}}# data::NamedTuple{(:a, :b, :c), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float64}}}# Locals# sch::FormulaTerm# Body::Any# 1 ─ %1 = Main.schema(f, data)::StatsModels.Schema# │ (sch = Main.apply_schema(f, %1))# │ %3 = Base.getproperty(sch, :lhs)::Any# └── return %3
Similarly, if the last line of the function was modelcols(sch, data), then the compiler interprets the return type as Tuple{Any, Any}.
Maybe this isn't really a problem since Julia can optimize it later. For example, just doing each step in the REPL and then calling modelcols(sch, data), the compiler correctly interprets the type as Tuple{Vector{Float64}, Matrix{Float64}}. Therefore, if this isn't really a problem, then please go ahead and close this issue.
The text was updated successfully, but these errors were encountered:
That's a really interesting observation and one that I hadn't thought of (the fact that inferrability of types is blocked by the dict in the schema). I think it's not something we can/should/want to optimize for at this point, since the performance of the modelcols bit is actually quite good; the main bottleneck for most users (and in my own experience running lots and lots of regressions) is that we're TOO aggressive about specializing things on the types of terms (e.g., use tuples to represent collections of terms; type parameters encode the types of child terms, etc.). So we'll probably want to go in the opposite direction of LESS specialization. But I honestly haven't had the time to really sit down and play around with these choices so I'm certainly open to suggestions!
When I was doing some testing, there seems to be some type instability related to the Schema Struct. Just as a MWE:
Similarly, if the last line of the function was
modelcols(sch, data)
, then the compiler interprets the return type asTuple{Any, Any}
.Maybe this isn't really a problem since Julia can optimize it later. For example, just doing each step in the REPL and then calling
modelcols(sch, data)
, the compiler correctly interprets the type asTuple{Vector{Float64}, Matrix{Float64}}
. Therefore, if this isn't really a problem, then please go ahead and close this issue.The text was updated successfully, but these errors were encountered: