Add leader election#6683
Conversation
e0ad305 to
7b5ee47
Compare
7b5ee47 to
26fb8d9
Compare
26fb8d9 to
210423a
Compare
210423a to
8eae52e
Compare
mattmoor
left a comment
There was a problem hiding this comment.
I'll TAL at the pkg PR next.
Gopkg.toml
Outdated
| [[override]] | ||
| name = "knative.dev/pkg" | ||
| branch = "master" | ||
| source = "github.com/pmorie/pkg" |
There was a problem hiding this comment.
Pro tip: I usually throw a // TODO(mattmoor): DO NOT SUBMIT around this, so that the bot catches it and leaves a comment.
cmd/autoscaler/main.go
Outdated
| if !leConfig.LeaderElect { | ||
| log.Printf("autoscaler will not run in leader-elected mode") | ||
| run(ctx) | ||
| panic("unreachable") |
There was a problem hiding this comment.
is this not hit when the context is cancelled?
There was a problem hiding this comment.
(which happens when K8s sends SIGTERM)
| return err == nil, nil | ||
| }); perr != nil { | ||
| log.Fatal("Timed out attempting to get k8s version: ", err) | ||
| } |
There was a problem hiding this comment.
If you'd left this as is, then you could save some kubeclient.Get calls below.
cmd/autoscaler/main.go
Outdated
| if err != nil { | ||
| logger.Fatalw("Failed to get hostname for leader election", zap.Error(err)) | ||
| } | ||
| id = id + "_" + string(uuid.NewUUID()) |
There was a problem hiding this comment.
I think hoisting this little bit into a helper would document it in the most straightforward way. At first I read the comment above and was like: "That's not what os.Hostname() does...?" :)
| # giving up; 10 seconds is the value used by core kubernetes controllers. | ||
| renewDeadline: "10s" | ||
| # retryPeriod is how long the leader election client waits between tries of | ||
| # actions; 2 seconds is the value used by core kuberntes controllers. |
There was a problem hiding this comment.
| # actions; 2 seconds is the value used by core kuberntes controllers. | |
| # actions; 2 seconds is the value used by core kubernetes controllers. |
There was a problem hiding this comment.
This still seems to be a problem
cmd/autoscaler/main.go
Outdated
| logger.Fatalw("leaderelection lost") | ||
| }, | ||
| }, | ||
| // WatchDog: electionChecker, |
cmd/autoscaler/main.go
Outdated
| if err := eg.Wait(); err != nil && err != http.ErrServerClosed { | ||
| logger.Errorw("Error while running server", zap.Error(err)) | ||
| } | ||
| panic("unreachable") |
There was a problem hiding this comment.
I'd use logger.Fatal in place of panic
cmd/autoscaler/main.go
Outdated
| eg, egCtx := errgroup.WithContext(ctx) | ||
| eg.Go(func() error { | ||
| return customMetricsAdapter.Run(ctx.Done()) | ||
| leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{ |
There was a problem hiding this comment.
IIUC we get all of this for free in the other binaries due to their use of sharedmain?
| }) | ||
| if err != nil { | ||
| logger.Fatalw("Failed to create admission controller", zap.Error(err)) | ||
| logger.Fatalw("error creating lock: %v", err) |
There was a problem hiding this comment.
Are we leaking resources on uninstall?
8eae52e to
9799bb1
Compare
9799bb1 to
a303651
Compare
|
/retest |
a303651 to
58a39e5
Compare
|
/retest |
9aef37a to
1e9ef61
Compare
knative-prow-robot
left a comment
There was a problem hiding this comment.
@vagababov: 1 warning.
Details
In response to this:
/lint
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
mattmoor
left a comment
There was a problem hiding this comment.
/lgtm
/approve
/hold
Mostly nits. If we want to get this baking, feel free to /unhold and follow-up, or I can restamp if you want to knock these off now.
cmd/webhook/main.go
Outdated
| // config validation constructors | ||
| tracingconfig "knative.dev/pkg/tracing/config" | ||
| defaultconfig "knative.dev/serving/pkg/apis/config" | ||
| configvalidation "knative.dev/serving/pkg/apis/config/validation" |
There was a problem hiding this comment.
This is sort of an odd place to put this. Can we make it knative.dev/serving/pkg/leaderelection?
There was a problem hiding this comment.
Note: the defaulting config is here because defaulting happens as part of the pkg/apis stuff
There was a problem hiding this comment.
That's fair. I had something similar in eventing that already went in, mind if we do this in a follow-up?
| @@ -0,0 +1,69 @@ | |||
| # Copyright 2018 The Knative Authors | |||
There was a problem hiding this comment.
| # Copyright 2018 The Knative Authors | |
| # Copyright 2020 The Knative Authors |
| resourceLock: "leases" | ||
| leaseDuration: "15s" | ||
| renewDeadline: "10s" | ||
| retryPeriod: "2s" |
There was a problem hiding this comment.
All of the above should come from defaults, right? If so we generally let the defaulting provide them so that folks can safely tweak the knobs via kubectl edit and not upset the three-way merge.
There was a problem hiding this comment.
Ack, I will do that in a follow-up.
| # giving up; 10 seconds is the value used by core kubernetes controllers. | ||
| renewDeadline: "10s" | ||
| # retryPeriod is how long the leader election client waits between tries of | ||
| # actions; 2 seconds is the value used by core kuberntes controllers. |
There was a problem hiding this comment.
This still seems to be a problem
|
|
||
| // ValidateLeaderElectionConfig enriches the leader election config validation | ||
| // with extra validations specific to serving. | ||
| func ValidateLeaderElectionConfig(configMap *corev1.ConfigMap) (*kle.Config, error) { |
There was a problem hiding this comment.
If we go with pkg/leaderelection (comment above) then we should rename ValidateConfig to avoid stutter.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mattmoor, pmorie The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/unhold |
|
/retest Looks like the failure in auto tls tests may be related to #7018 |
|
@pmorie: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
The following is the coverage report on the affected files.
|
|
/retest |
Fixes knative/pkg#1007
Proposed Changes
sharedmain, which is used by:/configis changed so that it is simple to enable leader election, but it is not enabled by defaultRelease Note