Replies: 13 comments
-
Hi @mle-els, thanks for your suggestion. It's cool to see that it only took you ~60 lines of code to customize Kedro to your needs! However, I don't see how your I noticed this in your README:
Could you describe this problem in a bit more detail?
-
Hi @astrojuanlu, thanks for your response. The bit about typos: when dataset names are strings, IDEs can't check that they actually match. I ran into this problem before, when some code changes led to the wrong dataset being used. When dataset names are variable names, this kind of issue is caught sooner.
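A minimal sketch of the typo problem, in plain Python with no Kedro import. The `run` helper, the two functions, and the dataset names are all hypothetical; they only stand in for any pipeline whose nodes are stitched together by strings.

```python
# Hypothetical stand-in for a string-keyed pipeline: each node is
# (function, input names, output name), resolved against a dict at run time.
def run(nodes, data):
    data = dict(data)
    for func, inputs, output in nodes:
        data[output] = func(*(data[name] for name in inputs))
    return data


def double(x):
    return 2 * x


def add_one(x):
    return x + 1


# "doubled" is misspelled as "doubld" in the second node. This is valid
# Python, so an IDE has nothing to flag; it only fails once the node runs.
nodes_with_typo = [
    (double, ["raw"], "doubled"),
    (add_one, ["doubld"], "result"),
]

try:
    run(nodes_with_typo, {"raw": 3})
    typo_error = None
except KeyError as exc:
    typo_error = exc  # the misspelled name surfaces only at run time

# With variables, the same slip (`add_one(doubld)`) would be an undefined
# name, which static checkers can report before anything runs.
doubled = double(3)
result = add_one(doubled)
```

A worse variant is a misspelling that happens to match another existing dataset: nothing fails, and the wrong data is used silently.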
-
Thanks @mle-els. If I understand correctly there are two issues at hand:
What do you think of this assessment?
-
Linking this
-
Hi @astrojuanlu, I thought of the decorator syntax but didn't find it satisfactory because of 2 reasons:
About the VSCode extension: not everyone uses VSCode, so I think an alternative API will apply more broadly. Plus, installing an extension is an extra step added to the learning curve. I think the most important feature of the proposed builder is that it looks just like normal Python code. Everyone is familiar with how to write and comprehend a Python function. By being friendly and intuitive, you'll make Kedro more attractive to new users.
-
Clarification on "VSCode extension": the Language Server Protocol (LSP) is now understood by VSCode, vim, JupyterLab, and others, so kedro-lsp is not exclusive to VSCode. It looks like the only major editor family missing LSP support is IntelliJ. Having said that, I sympathize with not tying the Kedro DX to a specific family of editors, and with exploring alternative APIs that make inspection easy. However, the
Just to set expectations: although this touches on adjacent issues that are already on our radar, we have limited resources and I don't think we're going to look into this particular API any time soon.
-
Hi @astrojuanlu, here is an alternative design that I thought of, but it's harder to pull off so I went for

@kedro_function
def func1(x) -> Any:
    ...

@kedro_function
def func2(y) -> Any:
    ...

def create_pipeline():
    with kedro_pipeline_builder() as pb:
        y = func1('input', output_names='dataset_y')
        out = func2(y, output_names='dataset_z')
    return pb.build()
-
We could also wrap datasets and params into variables like
-
variance_pipeline = pipeline(
    [  # flat list makes it hard to "see" the DAG
        node(len, "xs", "n"),  # stitched together by strings encoding names of input/output variables
        node(mean, ["xs", "n"], "m", name="mean_node"),
        node(mean_sos, ["xs", "n"], "m2", name="mean_sos"),
        node(variance, ["m", "m2"], "v", name="variance_node"),
    ]
)

graph TD
    a[x]-->|*arg|len[len]
    len-->|n|mean[mean]
    a-->|xs|mean[mean]
    len-->|n|sos[mean_sos]
    a-->|xs|sos[mean_sos]
    sos-->|m2|var[variance]
    mean-->|m|var[variance]
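The DAG in the diagram is fully determined by the string names, even though the flat list does not show it. A rough sketch of how the edges can be recovered, in plain Python without Kedro: `specs` copies the names from the pipeline above (the first node has no `name=`, so labelling it "len" is an assumption), and `dag_edges` is a hypothetical helper.

```python
from typing import List, Tuple

# (node name, input dataset names, output dataset name). The functions
# themselves are not needed to recover the shape of the graph.
specs = [
    ("len", ["xs"], "n"),
    ("mean_node", ["xs", "n"], "m"),
    ("mean_sos", ["xs", "n"], "m2"),
    ("variance_node", ["m", "m2"], "v"),
]


def dag_edges(specs) -> List[Tuple[str, str, str]]:
    """Return (source, dataset, target) edges implied by the dataset names."""
    producer = {output: name for name, _, output in specs}
    edges = []
    for name, inputs, _ in specs:
        for dataset in inputs:
            # A dataset that no node produces is a free input and acts
            # as its own source.
            source = producer.get(dataset, dataset)
            edges.append((source, dataset, name))
    return edges


edges = dag_edges(specs)
for source, dataset, target in edges:
    print(f"{source} -->|{dataset}| {target}")
```

This produces one edge per arrow in the diagram, which is the information a visualisation tool needs to draw the graph from the flat list.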
-
Hi @dertilo, thanks for chiming in. Your proposal suffers from the same problem as the original one:
where does
-
from dataclasses import dataclass
from typing import Iterable, Iterator

# WhateverData and FooBar are placeholder types, assumed to be defined elsewhere.

@dataclass
class SomeDatum:
    some_id: str
    data: WhateverData  # text, image, video, array, tensor, who cares
    more_data: FooBar

@dataclass
class SomeInputNode(Iterable[SomeDatum]):
    """
    This could be an input node to some data-processing DAG.
    """
    name: str

    def __iter__(self) -> Iterator[SomeDatum]:
        # some data-loading code here
        ...

graph TD
    a[SomeInputNode]-->|x|b[some DAG here]
-
Thanks @dertilo - then for the "free inputs" one would still need to use Python strings. One of the nice things about the current design is that free inputs and intermediate results are all treated the same, which gives the pipeline definition a uniform look. Another useful thing is that the "wrapping" of the nodes happens outside of the node definition. This was discussed in #2471 - with decorators, as in other frameworks and SDKs, the nodes become coupled to the framework in question. In Kedro, nodes can be reused because they are regular, pure functions that do no I/O (see https://sans-io.readthedocs.io/, https://rhodesmill.org/brandon/slides/2015-05-pywaw/hoist/).
Mathematically, a graph is a set of nodes together with a set of edges between them.
At some point the user needs to specify the names of the datasets, either for everything (current design) or for the free inputs (proposed designs). Dataset names are no different from dictionary keys that represent something else - with the caveat that, yes, such keys are not immediately available in the IDE.
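To make "free inputs" and "wrapping outside the node definition" concrete, here is a small sketch in plain Python. It does not use Kedro: the function bodies, `wiring`, `free_inputs`, and `run` are assumptions for illustration, reusing the dataset names from the variance example earlier in the thread.

```python
# Plain, pure functions: no decorators, no I/O, no framework import.
def mean(xs, n):
    return sum(xs) / n


def mean_sos(xs, n):
    return sum(x * x for x in xs) / n


def variance(m, m2):
    return m2 - m * m


# The wiring lives outside the functions: (function, input names, output name).
wiring = [
    (len, ["xs"], "n"),
    (mean, ["xs", "n"], "m"),
    (mean_sos, ["xs", "n"], "m2"),
    (variance, ["m", "m2"], "v"),
]


def free_inputs(wiring):
    """Dataset names that are consumed but never produced by any node."""
    produced = {output for _, _, output in wiring}
    consumed = {name for _, inputs, _ in wiring for name in inputs}
    return consumed - produced


def run(wiring, data):
    """Run the nodes in list order against a dict of named datasets."""
    data = dict(data)
    for func, inputs, output in wiring:
        data[output] = func(*(data[name] for name in inputs))
    return data


xs = [1.0, 2.0, 3.0, 4.0]
free = free_inputs(wiring)        # only "xs" has to come from outside
results = run(wiring, {"xs": xs})

# The same functions remain usable without any pipeline at all.
direct = variance(mean(xs, len(xs)), mean_sos(xs, len(xs)))
```

In this sketch the free input and the intermediate results are named the same way, as strings, which is the uniformity mentioned above. The price is that those names are opaque to the IDE.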
-
Hi! @mle-els in case it helps, we released a VSCode extension that makes navigating between the Python and YAML files from a Kedro project a breeze: https://marketplace.visualstudio.com/items?itemName=kedro.Kedro
Turning this issue into a Discussion to continue the conversation there.
-
Description
I don't like the way pipelines are specified. It's verbose and hard to read. When looking for workflow software, I almost skipped Kedro because the pipeline code is so ugly.
Context
A better way of building pipelines would make my code more pleasant to write and read. It should help other people too and help Kedro recruit more users.
Possible Implementation
UPDATE: THIS IS A NEW PROPOSAL:
END UPDATE
I have made a simple implementation of a pipeline builder, modeled after the spaceflights example. The pipeline can be built like this, which works exactly the same way as the original pipeline:
Executable code can be found in https://github.com/mle-els/kedro-pipeline-builder