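@RHobart A simple data flow consists of one Input, one or more Filters, and one or more Outputs. Because the DataSets produced by different Inputs may be heterogeneous, a chain of Filters cannot operate on several Inputs at once. There are real use cases for multiple Inputs, though, such as a Join across data sources or more complex processing logic. We handle these by registering each Input as a temporary table; setting source_table in a Filter then determines which data source that Filter processes. For an example of this kind of configuration, see: https://github.com/InterestingLab/waterdrop/blob/master/config/complex.conf.template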
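To make the temporary-table idea concrete, here is a minimal sketch in plain Spark. It is not Waterdrop's actual implementation: `MultiInputSketch`, `FilterStep`, `registerInputs`, `runFilter`, and the sample tables are hypothetical names made up for illustration. Only `createOrReplaceTempView`, `spark.table`, and `spark.sql` are real Spark APIs. It assumes Spark SQL is on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object MultiInputSketch {
  // Hypothetical stand-in for a filter entry: the temp table it reads
  // (the role played by source_table) and the transformation it applies.
  final case class FilterStep(sourceTable: String, transform: DataFrame => DataFrame)

  // Register every input under a name so later steps can look it up.
  def registerInputs(inputs: Map[String, DataFrame]): Unit =
    inputs.foreach { case (name, df) => df.createOrReplaceTempView(name) }

  // Resolve the step's source table by name, then apply its transformation.
  def runFilter(spark: SparkSession, step: FilterStep): DataFrame =
    step.transform(spark.table(step.sourceTable))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("multi-input-sketch")
      .getOrCreate()
    import spark.implicits._

    // Two inputs with different schemas (made-up sample data).
    val users  = Seq((1, "alice"), (2, "bob")).toDF("user_id", "name")
    val orders = Seq((10, 1, 9.5), (11, 1, 20.0), (12, 2, 3.25))
      .toDF("order_id", "user_id", "amount")
    registerInputs(Map("users" -> users, "orders" -> orders))

    // A filter bound to exactly one source table.
    val bigOrders = runFilter(spark, FilterStep("orders", _.filter($"amount" > 5.0)))
    bigOrders.createOrReplaceTempView("big_orders")

    // A join across inputs, expressed as SQL over the registered tables.
    val joined = spark.sql(
      """SELECT u.name, o.amount
        |FROM users u JOIN big_orders o ON u.user_id = o.user_id""".stripMargin)
    joined.show()

    spark.stop()
  }
}
```

Each step names the table it reads, so inputs with different schemas never have to pass through the same filter chain. Combining them becomes an explicit SQL step.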
-
`private def batchProcessing(
    sparkSession: SparkSession,
    configBuilder: ConfigBuilder,
    staticInputs: List[BaseStaticInput],
    filters: List[BaseFilter],
    outputs: List[BaseOutput]): Unit = {
}`