
feat: Multi agent #5844

Closed. exu wants to merge 4 commits into main from sandbox/multiagent.

Conversation

@exu (Member) commented Sep 11, 2024

Pull request description

Checklist (choose what's happened)

  • breaking change! (describe)
  • tested locally
  • tested on cluster
  • added new dependencies
  • updated the docs
  • added a test

Breaking changes

Changes

Fixes

@exu force-pushed the sandbox/multiagent branch 3 times, most recently from 0d0d34c to fc3c3af on September 18, 2024 08:10
@exu changed the base branch from develop to main on September 20, 2024 05:53
@exu marked this pull request as ready for review on September 26, 2024 13:53
@exu requested a review from a team as a code owner on September 26, 2024 13:53

// Check Pro/Enterprise subscription
var subscriptionChecker checktcl.SubscriptionChecker
if mode == common.ModeAgent {
	subscriptionChecker, err = checktcl.NewSubscriptionChecker(ctx, proContext, grpcClient, grpcConn)
	exitOnError("Failed creating subscription checker", err)

	// Load environment/org details based on token grpc call
	environment, err := controlplanetcl.GetEnvironment(ctx, proContext, grpcClient, grpcConn)
exu (Member Author) commented:

The executor needs to load the environment from this gRPC call, not from inlined env variables.
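As a minimal sketch of that direction (not the PR's actual code; the Environment fields and the TESTKUBE_PRO_* variable names here are assumptions), the executor's settings would be derived from the response above instead of being read from the process environment:

package agent

// Environment is a hypothetical stand-in for the struct returned by
// controlplanetcl.GetEnvironment; the field names are assumptions.
type Environment struct {
	Id             string
	OrganizationId string
}

// executorEnv derives the executor's configuration from the environment
// loaded over gRPC, instead of relying on env variables inlined elsewhere.
func executorEnv(environment Environment) map[string]string {
	return map[string]string{
		"TESTKUBE_PRO_ENV_ID": environment.Id,
		"TESTKUBE_PRO_ORG_ID": environment.OrganizationId,
	}
}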

func (ag *Agent) executeCommand(ctx context.Context, cmd *cloud.ExecuteRequest) *cloud.ExecuteResponse {
	switch {
-	case cmd.Url == healthcheckCommand:
+	case cmd.Url == HealthcheckCommand || cmd.Command == string(cloud.HealthcheckCommand):
exu (Member Author) commented:

This is not needed anymore.

@@ -0,0 +1,47 @@
package handlers
exu (Member Author) commented:
Would it be better to have some errors wrapped, e.g.

errors.Wrap(err, errors.NotFound)

and then a mapper which checks what kind of error this is and sets a valid response?
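A minimal sketch of what such a mapper could look like, assuming sentinel errors from the standard errors package plus gRPC status codes; ErrNotFound, ErrInvalidInput, and MapError are hypothetical names, not anything already in the PR:

package handlers

import (
	"errors"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// Hypothetical sentinel errors; handler code would wrap them,
// e.g. fmt.Errorf("loading execution: %w", ErrNotFound).
var (
	ErrNotFound     = errors.New("not found")
	ErrInvalidInput = errors.New("invalid input")
)

// MapError is the suggested mapper: it checks what kind of error
// occurred and sets a valid gRPC response code.
func MapError(err error) error {
	switch {
	case err == nil:
		return nil
	case errors.Is(err, ErrNotFound):
		return status.Error(codes.NotFound, err.Error())
	case errors.Is(err, ErrInvalidInput):
		return status.Error(codes.InvalidArgument, err.Error())
	default:
		return status.Error(codes.Internal, err.Error())
	}
}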

@@ -542,11 +532,55 @@ func (e *executor) Execute(ctx context.Context, workflow testworkflowsv1.TestWor
	log.DefaultLogger.Errorw("failed to encode tags", "id", id, "error", err)
}

// Get (for centralized mode) TW execution or create it
if request.Id != "" {
exu (Member Author) commented:

We need to decouple the concrete executions logic from here, to fully split the executor engine from the data implementation.
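One possible shape for that split (a sketch only; Execution and ExecutionRepository are hypothetical names): the engine depends on a small interface, and whether executions are pre-created by the control plane or created locally becomes an implementation detail behind it.

package executor

import "context"

// Execution is a hypothetical stand-in for the stored test workflow execution.
type Execution struct {
	Id   string
	Tags map[string]string
}

// ExecutionRepository is the data-side interface the engine could depend on,
// instead of inlining concrete storage or control-plane calls.
type ExecutionRepository interface {
	// Get fetches an execution pre-created elsewhere (centralized mode).
	Get(ctx context.Context, id string) (Execution, error)
	// Create registers a brand-new execution (standalone mode).
	Create(ctx context.Context, execution Execution) (Execution, error)
}

Execute would then only branch on request.Id to choose Get or Create, with no knowledge of the backing store.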

execution.Tags = tags

// Insert or save execution
if request.Id != "" {
exu (Member Author) commented:

Extract it outside? The executor could take the form:

func(ExecutionRequest, Workflow) Result
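A sketch of that extraction, with hypothetical placeholder types standing in for the real testworkflows ones: the engine becomes a pure function of its inputs, and the insert-or-save branch moves into a wrapper supplied by the caller.

package executor

import "context"

// Hypothetical placeholders for the real request/workflow/result types.
type ExecutionRequest struct{ Id string }
type Workflow struct{}
type Result struct{}

// ExecuteFn is the proposed engine shape: func(ExecutionRequest, Workflow) Result,
// with ctx and error added for idiomatic Go.
type ExecuteFn func(ctx context.Context, req ExecutionRequest, w Workflow) (Result, error)

// SaveFn persists a result; it is injected from outside the engine.
type SaveFn func(ctx context.Context, req ExecutionRequest, r Result) error

// WithPersistence extracts the insert-or-save step out of the engine,
// wrapping a pure executor with storage concerns.
func WithPersistence(save SaveFn, run ExecuteFn) ExecuteFn {
	return func(ctx context.Context, req ExecutionRequest, w Workflow) (Result, error) {
		result, err := run(ctx, req, w)
		if err != nil {
			return result, err
		}
		return result, save(ctx, req, result)
	}
}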


// TODO - valid error handling

func NewExecuteTestWorkflowHandler(
rangoo94 (Member) commented:

This is for sure not needed:

  • There is already an option to execute a Test Workflow via the generic gRPC command we have (which calls the API)
  • For the future Execution Worker, the command will look different, as the Execution Worker needs to be stateless

@exu (Member Author) replied Sep 27, 2024:

I've planned here to fully decouple from the API server, as it's an additional unneeded thing. So it's quite needed, to avoid spawning this APIServer.

exu (Member Author) commented:

This was an attempt to decouple it from the API server totally; it can be refactored/reordered if we decide about the API later.

@rangoo94 (Member) replied Sep 27, 2024:

> I've planned here to fully decouple from the API server, as it's an additional unneeded thing. So it's quite needed, to avoid spawning this APIServer.

But when it is decoupled from the API Server, the signature will also differ, so (A) this function is for decoupling, yet (B) it needs to be deleted (and replaced with a new handler) after decoupling, as it will look different 🙂 So it's probably better not to pollute the new gRPC schema with obsolete functions.

@rangoo94 (Member) commented Sep 26, 2024

Considering that in this PR the runner IDs are not dynamic (but need to be pre-created), either:

  • There should be no Runner ID thing, but rather separate Agent Keys for each of them
  • The Runner ID should be completely dynamic (and informal + not unique)

On the other hand, the best solution is to avoid runner IDs altogether and have runner tags instead (like K8S nodeAffinity -> Testkube runnerAffinity); see the sketch after this list. It would work great:

  • simple,
  • Kubernetes-like,
  • fully customizable,
  • scalable (as runners could be created on demand too),
  • runnerAffinity could be defined on any level (globally, per workflow, per execution)
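A minimal sketch of such tag matching, assuming hypothetical RunnerAffinity and Matches names; the scheduler would target tags instead of pre-registered runner IDs:

package scheduler

// RunnerAffinity is a hypothetical tag selector in the spirit of Kubernetes
// nodeAffinity: required tag name -> required value. It could be attached
// globally, per workflow, or per execution.
type RunnerAffinity map[string]string

// Matches reports whether a runner's tags satisfy the affinity; any runner
// that matches may pick up the execution, so runners stay informal and
// creatable on demand.
func Matches(affinity RunnerAffinity, runnerTags map[string]string) bool {
	for tag, want := range affinity {
		if got, ok := runnerTags[tag]; !ok || got != want {
			return false
		}
	}
	return true
}

For example, an execution requesting RunnerAffinity{"region": "eu", "gpu": "true"} would match any runner tagged with at least those two values.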

@exu (Member Author) commented Sep 27, 2024

> Considering that in this PR the runner IDs are not dynamic (but need to be pre-created), either:
>
>   • There should be no Runner ID thing, but rather separate Agent Keys for each of them
>   • The Runner ID should be completely dynamic (and informal + not unique)
>
> On the other hand, the best solution is to avoid runner IDs altogether and have runner tags instead (like K8S nodeAffinity -> Testkube runnerAffinity). It would work great:
>
>   • simple,
>   • Kubernetes-like,
>   • fully customizable,
>   • scalable (as runners could be created on demand too),
>   • runnerAffinity could be defined on any level (globally, per workflow, per execution)

But you need some kind of ID here (whatever we name it) to, e.g., schedule against it. I'm not sure we should really follow Kubernetes naming here at all; these points are valid both for IDs and for tags as in affinity.

I agree about separate keys; if we want to decouple runners from environments, it will be the next thing to do for sure.

@rangoo94 (Member) commented Sep 27, 2024

> But you need some kind of ID here (whatever we name it) to, e.g., schedule against it

  • You only need to target a tag, not an ID; if someone wants an actually unique way to identify runners, one of the tags may simply be unique 🙂
  • IDs may be required, but not in a way that requires the user to pass them and register them in the Cloud. They may only be useful to pull information about an already started execution, or to control it (pause, resume, abort). It would be enough to provide such an ID on the Agent (or rather Worker/Runner) only then (and associate it with, e.g., an execution)

@exu closed this Nov 28, 2024