Skip to content

Disable RBAC risk due to performance instability#1609

Merged
viswajithiii merged 7 commits intomasterfrom
cgorman-disable-rbac-risk
May 10, 2022
Merged

Disable RBAC risk due to performance instability#1609
viswajithiii merged 7 commits intomasterfrom
cgorman-disable-rbac-risk

Conversation

@connorgorman
Copy link
Copy Markdown
Contributor

Description

12k roles, rolebindings, service accounts
11k deployments

Running for 13 minutes

goroutine 1 [chan receive, 13 minutes]:
process_cpu_throttled_time                                                       3448381546095

which is 57 minutes of throttling which explains some of the slowness.

Screen Shot 2022-05-06 at 10 38 30 AM

Screen Shot 2022-05-06 at 10 38 07 AM

Screen Shot 2022-05-06 at 10 38 19 AM

Checklist

  • Investigated and inspected CI test results
  • Unit test and regression tests added
  • Evaluated and added CHANGELOG entry if required
  • Determined and documented upgrade steps
  • Documented user facing changes (create PR based on stackrox/openshift-docs and merge into rhacs-docs)

If any of these don't apply, please comment below.

Testing Performed

CI should pass

Copy link
Copy Markdown
Contributor

@clickboo clickboo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a customer facing change - in the sense risk scoring and risk priority will change for a customer if they upgrade to a release with this PR right? We should add a changelog and some doc?

I believe this is a purely perf issue with our risk calc algorithm and how we process risk?

@connorgorman
Copy link
Copy Markdown
Contributor Author

This is a customer facing change - in the sense risk scoring and risk priority will change for a customer if they upgrade to a release with this PR right? We should add a changelog and some doc?

Yeah I will add a changelog if we think we want to make this change

@ghost
Copy link
Copy Markdown

ghost commented May 6, 2022

Tag for build #538637 is 3.70.x-66-g7b06944377.

💻 For deploying this image using the dev scripts, run the following first:

export MAIN_IMAGE_TAG='3.70.x-66-g7b06944377'

🕹️ A roxctl binary can be downloaded from the CircleCI artifacts.

@viswajithiii
Copy link
Copy Markdown
Contributor

@connorgorman Are you planning to merge this one or just testing?

@md2119
Copy link
Copy Markdown
Contributor

md2119 commented May 6, 2022

Can we release a patch only for the badly affected customers instead of dropping it entirely?

@connorgorman connorgorman changed the title Disable RBAC due to performance instability Disable RBAC risk due to performance instability May 9, 2022
@keyallis
Copy link
Copy Markdown
Contributor

keyallis commented May 9, 2022

From my prospective this seems like a change we should be making, and one of the bigger reasons it's been hard to justify is that we didn't have a concrete example of a customer being impacted negatively enough by rbac in risk calculations to the point where it should be removed.
I personally can't see any reasons why removing this would significantly impact customers and lead to the ask of having it added back, but if it seems appropriate to add a feature flag to toggle it's inclusion I would say the default state should be disabled.

@viswajithiii viswajithiii force-pushed the cgorman-disable-rbac-risk branch from ead4422 to 0b5139e Compare May 10, 2022 16:50
CHANGELOG.md Outdated
- Violation tags and process tags are deprecated, and will be removed in version 3.72.0.

- RBAC calculation is no longer included in Risk by default, since it was not performant. Users who want it included can set
the "INCLUDE_RBAC_IN_RISK" to "true" in the Central deployment spec.
Copy link
Copy Markdown
Contributor

@clickboo clickboo May 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think by default risk compute should be included, and the env var can be set to false if not required. I believe Kirsten had the same feedback. In her exact words:
"Also, it should not be disabled by default so as not to take aways current functionality"

@viswajithiii viswajithiii added this to the 3.70.0-rc.3 milestone May 10, 2022
- ROX-10018: The policy `OpenShift: Kubeadmin Secret Accessed` will no longer trigger if the request was from the default OpenShift `oauth-apiserver-sa` service account, because this is an expected access pattern for the OpenShift apiserver.
- Violation tags and process tags are deprecated, and will be removed in version 3.72.0.

- Users who do not want to include the RBAC factor in risk calculation can set
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want it to be documented at all?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it's fine to.

@viswajithiii viswajithiii force-pushed the cgorman-disable-rbac-risk branch from 54b443b to f7460bb Compare May 10, 2022 22:10
@viswajithiii viswajithiii enabled auto-merge (squash) May 10, 2022 22:36
@viswajithiii viswajithiii merged commit 12266f6 into master May 10, 2022
@viswajithiii viswajithiii deleted the cgorman-disable-rbac-risk branch May 10, 2022 22:58
parametalol pushed a commit that referenced this pull request May 11, 2022
Co-authored-by: Viswajith Venugopal <[email protected]>
(cherry picked from commit 12266f6)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants