Support for Querying using UDF Type #440
Replies: 7 comments 10 replies
-
@jarulraj @xzdandy @pchunduri6 @kaushikravichandran @LordDarkula For visibility
-
Isn't this a bit confusing? Should we make the distinction between ObjectRecognition being a UDF and being a TYPE more explicit?
-
We wouldn't need to keep track of enums. Instead, the
-
Could you put some more detail into the actual design? What is the motivation for this? Why is it beneficial for the user to specify a UDF type rather than a specific UDF? Perhaps a concrete example could clear this up.
-
I think choosing the physical UDF for a logical UDF should be a rule in the optimizer rather than something done during query parsing or query binding.
-
I have a rough draft PR that works with a LATERAL JOIN query using the Logical Type. As mentioned above, we might need to decide on:
Will add this to the Discussion shortly as well.
-
Updated Discussion with the new changes in the PR
-
Goal and Motivation
To enable the following functionality:
SELECT id, ObjectRecognition(data) FROM MyMovies;
This will automatically select, from the list of created UDFs, a UDF that can perform Object Recognition. The motivation is to let users specify just the task instead of choosing a specific UDF to run. Additionally, by controlling the UDF selection ourselves, we can pick the best UDF for the available resources, input and task requirements.
Assumption: All UDFs of the same type should have the same set of inputs and outputs.
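The assumption above could be checked when UDFs are registered. The following is a minimal sketch, not the project's actual code: `UdfSignature` and `find_signature_conflicts` are hypothetical names, and inputs and outputs are modelled as plain tuples of column names for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class UdfSignature:
    """Hypothetical description of a created UDF (not the real catalog schema)."""
    name: str
    type: str                  # task type, e.g. "ObjectRecognition"
    inputs: Tuple[str, ...]    # input column names (assumed representation)
    outputs: Tuple[str, ...]   # output column names (assumed representation)


def find_signature_conflicts(udfs: List[UdfSignature]) -> List[str]:
    """Return names of UDFs whose inputs/outputs differ from the
    first UDF seen for the same type."""
    reference: Dict[str, UdfSignature] = {}
    conflicts: List[str] = []
    for udf in udfs:
        # The first UDF registered for a type acts as the reference signature.
        ref = reference.setdefault(udf.type, udf)
        if (udf.inputs, udf.outputs) != (ref.inputs, ref.outputs):
            conflicts.append(udf.name)
    return conflicts
```

If such a check reports a conflict at creation time, the new UDF could be rejected or registered without a type, so that type-based lookup never returns UDFs with incompatible outputs.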
Implementation
We are already storing the `TYPE` attribute when a UDF is created, and this attribute is currently not used anywhere. We can reuse it to store the type of task the UDF can be used for (e.g. FaceRecognition, ObjectDetection, EmotionClassification).
Sample Query:
When a query such as the following:
SELECT id, ObjectRecognition(data) FROM MyMovies;
is triggered, we will first try to match the name of the UDF (`ObjectRecognition`) to (in order):
Steps:
2.1. In `statement_binder.py`, we would set `expr.function` to `None` and `expr.function_type` to the passed name.
2.2. During Logical Plan to Physical Plan conversion, for each FunctionExpression encountered, we will check whether `expr.function` is `None` and, if so, load any UDF that matches the type set in `expr.function_type`.
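Steps 2.1 and 2.2 can be sketched roughly as follows. This is a simplified illustration, not the code in the PR: `UdfEntry`, `Catalog`, `bind` and `resolve` are hypothetical stand-ins, `FunctionExpression` is reduced to the two attributes discussed here, and the sketch assumes that an exact UDF name match is tried before falling back to the type.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class UdfEntry:
    """Hypothetical stand-in for a catalog row describing a created UDF."""
    name: str
    type: str        # value of the TYPE attribute given at creation time
    impl: Callable


@dataclass
class FunctionExpression:
    """Simplified expression node with only the fields discussed above."""
    name: str
    function: Optional[Callable] = None
    function_type: Optional[str] = None


class Catalog:
    """Hypothetical in-memory catalog used only for this sketch."""

    def __init__(self, entries: List[UdfEntry]):
        self._entries = entries

    def get_by_name(self, name: str) -> Optional[UdfEntry]:
        return next((e for e in self._entries if e.name == name), None)

    def get_by_type(self, udf_type: str) -> List[UdfEntry]:
        return [e for e in self._entries if e.type == udf_type]


def bind(expr: FunctionExpression, catalog: Catalog) -> FunctionExpression:
    """Step 2.1: if no UDF has this name (assumed to be checked first),
    leave the function unset and record the passed name as the type."""
    entry = catalog.get_by_name(expr.name)
    if entry is not None:
        expr.function = entry.impl
    else:
        expr.function = None
        expr.function_type = expr.name
    return expr


def resolve(expr: FunctionExpression, catalog: Catalog) -> FunctionExpression:
    """Step 2.2: during logical-to-physical conversion, load any UDF
    whose type matches expr.function_type."""
    if expr.function is None:
        candidates = catalog.get_by_type(expr.function_type)
        if not candidates:
            raise LookupError(f"No UDF of type {expr.function_type!r} found")
        # "Any" match is taken here; a smarter selection policy would go here.
        expr.function = candidates[0].impl
    return expr
```

Keeping the choice inside `resolve` means the selection policy (resources, input, task requirements) can later change without touching the binder.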
For Reference: Check the changes in `statement_binder.py`, `rules.py` and `function_expression.py` in PR #441.
Pending Tasks