
Fix issue preventing app from being restarted if /proc/[pid]/task/[tid]/children is missing in container (case of Minikube with VirtualBox or KVM drivers) #6690

Merged
openshift-merge-robot merged 10 commits into redhat-developer:main from rm3l:6263-odo-dev-application-not-restarted-if-proc-pid-task-tid-children-file-is-missing-in-container on Apr 3, 2023

Conversation

Member

@rm3l rm3l commented Mar 27, 2023

What type of PR is this:
/kind bug
/area dev

What does this PR do / why we need it:

Which issue(s) this PR fixes:
As described in #6263, this issue can occur when the host kernel is built without the CONFIG_PROC_CHILDREN option, which is currently the case on some platforms (such as Minikube with its VirtualBox or KVM2 drivers). Without that option, /proc/[pid]/task/[tid]/children is missing in the container.
This PR therefore changes how we determine a process's children: we now read all /proc/*/stat files instead, so that the child processes can be reliably stopped.
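
For illustration only (this is not the code from this PR): a minimal Go sketch of deriving a process's children from /proc/[pid]/stat. The fourth field of each stat file is the parent PID, so scanning every stat file yields a parent-to-children map without needing the children file. The function names parseStat, childrenByParent and descendants are hypothetical, and the sketch assumes /proc is read directly from the local filesystem, which may differ from how odo accesses the container.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseStat extracts the PID and parent PID from the contents of a
// /proc/[pid]/stat file, whose layout is "pid (comm) state ppid ...".
// comm may itself contain spaces and parentheses, so the fields after it
// are located relative to the LAST ')' in the line.
func parseStat(content string) (int, int, error) {
	open := strings.IndexByte(content, '(')
	closeIdx := strings.LastIndexByte(content, ')')
	if open < 0 || closeIdx < open {
		return 0, 0, fmt.Errorf("malformed stat content: %q", content)
	}
	pid, err := strconv.Atoi(strings.TrimSpace(content[:open]))
	if err != nil {
		return 0, 0, err
	}
	// After comm: fields[0] is the state, fields[1] is the parent PID.
	fields := strings.Fields(content[closeIdx+1:])
	if len(fields) < 2 {
		return 0, 0, fmt.Errorf("missing ppid in stat content: %q", content)
	}
	ppid, err := strconv.Atoi(fields[1])
	if err != nil {
		return 0, 0, err
	}
	return pid, ppid, nil
}

// childrenByParent scans procRoot/[pid]/stat (hypothetical helper) and
// returns a map from each parent PID to its direct children.
func childrenByParent(procRoot string) (map[int][]int, error) {
	paths, err := filepath.Glob(filepath.Join(procRoot, "[0-9]*", "stat"))
	if err != nil {
		return nil, err
	}
	children := make(map[int][]int)
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			continue // the process may have exited since the directory was listed
		}
		pid, ppid, err := parseStat(string(data))
		if err != nil {
			continue // skip entries that cannot be parsed
		}
		children[ppid] = append(children[ppid], pid)
	}
	return children, nil
}

// descendants walks the map breadth-first and returns every PID below root.
func descendants(children map[int][]int, root int) []int {
	var out []int
	queue := []int{root}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, c := range children[cur] {
			out = append(out, c)
			queue = append(queue, c)
		}
	}
	return out
}
```

Calling childrenByParent("/proc") and then descendants(tree, pid) would list the processes to stop along with the given one. Since the process table can change between the scan and the stop, unreadable or vanished entries are skipped rather than treated as errors.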

Fixes #6263

Co-authored-by: @valaparthvi
Co-authored-by: @feloy

PR acceptance criteria:

  • Unit test

  • Integration test

  • Documentation

How to test changes / Special notes to the reviewer:

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label Mar 27, 2023

openshift-ci bot commented Mar 27, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the kind/bug and area/dev labels Mar 27, 2023

netlify bot commented Mar 27, 2023

Deploy Preview for odo-docusaurus-preview ready!

Name Link
🔨 Latest commit 4e133b1
🔍 Latest deploy log https://app.netlify.com/sites/odo-docusaurus-preview/deploys/642708b1b94b1f0008fe33d4
😎 Deploy Preview https://deploy-preview-6690--odo-docusaurus-preview.netlify.app

@rm3l rm3l changed the title [WIP] Get the process children by reading the "/proc/*/stat" files instead of relying on "/proc/*/task/*/children" → [WIP] Get the process children by reading /proc/*/stat files instead of /proc/*/task/*/children Mar 27, 2023
@rm3l rm3l changed the title [WIP] Get the process children by reading /proc/*/stat files instead of /proc/*/task/*/children → [WIP] Get the process children by reading /proc/*/stat instead of /proc/*/task/*/children Mar 27, 2023
@rm3l rm3l changed the title [WIP] Get the process children by reading /proc/*/stat instead of /proc/*/task/*/children → [WIP] Get the process children by reading /proc/[pid]/stat instead of /proc/[pid]/task/[tid]/children Mar 27, 2023

odo-robot bot commented Mar 27, 2023

OpenShift Unauthenticated Tests on commit 8dcf6fd finished successfully.
View logs: TXT HTML


odo-robot bot commented Mar 27, 2023

NoCluster Tests on commit 8dcf6fd finished successfully.
View logs: TXT HTML


odo-robot bot commented Mar 27, 2023

Validate Tests on commit 8dcf6fd finished successfully.
View logs: TXT HTML


odo-robot bot commented Mar 27, 2023

Unit Tests on commit 8dcf6fd finished successfully.
View logs: TXT HTML


odo-robot bot commented Mar 27, 2023

Kubernetes Tests on commit 8dcf6fd finished successfully.
View logs: TXT HTML


odo-robot bot commented Mar 27, 2023

Windows Tests (OCP) on commit 8dcf6fd finished with errors.
View logs: TXT HTML


odo-robot bot commented Mar 27, 2023

OpenShift Tests on commit 8dcf6fd finished with errors.
View logs: TXT HTML


odo-robot bot commented Mar 27, 2023

Kubernetes Docs Tests on commit 235539d finished successfully.
View logs: TXT HTML

@rm3l rm3l changed the title [WIP] Get the process children by reading /proc/[pid]/stat instead of /proc/[pid]/task/[tid]/children → [WIP] Fix issue preventing app from being restarted if /proc/[pid]/task/[tid]/children is missing Mar 27, 2023
@rm3l rm3l changed the title [WIP] Fix issue preventing app from being restarted if /proc/[pid]/task/[tid]/children is missing → Fix issue preventing app from being restarted if /proc/[pid]/task/[tid]/children is missing Mar 28, 2023
@rm3l rm3l marked this pull request as ready for review March 28, 2023 09:54
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label Mar 28, 2023
@openshift-ci openshift-ci bot requested review from anandrkskd and feloy March 28, 2023 09:54
@rm3l rm3l requested review from valaparthvi and removed request for anandrkskd March 28, 2023 09:55
@rm3l rm3l requested a review from valaparthvi March 28, 2023 15:47
@rm3l rm3l force-pushed the 6263-odo-dev-application-not-restarted-if-proc-pid-task-tid-children-file-is-missing-in-container branch from 549905b to 4e133b1 March 31, 2023 16:22
@valaparthvi
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Apr 3, 2023
@valaparthvi valaparthvi closed this Apr 3, 2023
@valaparthvi valaparthvi reopened this Apr 3, 2023
Member Author

rm3l commented Apr 3, 2023

  [FAILED] Expected
      <*url.Error | 0xc00044aea0>: {
          Op: "Post",
          URL: "http://127.0.0.1:54165/api/newuser",
          Err: <*errors.errorString | 0xc00008a130>{s: "EOF"},
      }
  to be nil
  In [It] at: C:/Users/Administrator.ANSIBLE-TEST-VS/3579/tests/e2escenarios/e2e_test.go:306 @ 04/03/23 02:14:35.765

  There were additional failures detected.  To view them in detail run ginkgo -vv
------------------------------

Summarizing 1 Failure:
  [FAIL] E2E Test starting with non-empty Directory add Binding [It] should verify developer workflow of using binding as env in innerloop
  C:/Users/Administrator.ANSIBLE-TEST-VS/3579/tests/e2escenarios/e2e_test.go:306

Ran 7 of 7 Specs in 208.074 seconds
FAIL! -- 6 Passed | 1 Failed | 0 Pending | 0 Skipped

Flaky E2E test (#6582)

/override windows-integration-test/Windows-test


openshift-ci bot commented Apr 3, 2023

@rm3l: Overrode contexts on behalf of rm3l: windows-integration-test/Windows-test


@rm3l rm3l changed the title Fix issue preventing app from being restarted if /proc/[pid]/task/[tid]/children is missing → Fix issue preventing app from being restarted on Minikube Apr 3, 2023
@rm3l rm3l changed the title Fix issue preventing app from being restarted on Minikube → Fix issue preventing app from being restarted if /proc/[pid]/task/[tid]/children is missing in container (case of Minikube with VirtualBox or KVM drivers) Apr 3, 2023
@rm3l rm3l closed this Apr 3, 2023
@rm3l rm3l reopened this Apr 3, 2023

sonarqubecloud bot commented Apr 3, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No Coverage information
0.0% Duplication

Member Author

rm3l commented Apr 3, 2023


  [FAILED] user-guides/quickstart/docs-mdx/java/java_odo_dev_output.mdx
  Expected
      <string>:   (
        	"""
        	... // 10 identical lines
        	✓  Added storage m2 to component
        	⚠  Pod is Pending
      - 	✓  Pod is Running
        	✓  Syncing files into the container [1s]
        	✓  Building your application in container (command: build) [1s]
        	... // 9 identical lines
        	[Ctrl+c] - Exit and delete resources from the cluster
        	[p] - Manually apply local changes to the application on the cluster
      + 	✓  Pod is Running
        	```
        	"""
        )
      
  to be empty
  In [It] at: /go/odo_1/tests/documentation/user-guides/doc_user_guides_quickstart_test.go:256 @ 04/03/23 08:12:14.443

Previous run passed - flaky test (to be addressed by #6545)

/override Kubernetes-Integration-Tests/Kubernetes-Docs-Integration-Tests


openshift-ci bot commented Apr 3, 2023

@rm3l: Overrode contexts on behalf of rm3l: Kubernetes-Integration-Tests/Kubernetes-Docs-Integration-Tests


Member Author

rm3l commented Apr 3, 2023

  [odo] I0403 08:34:07.924897    4424 delete.go:94] failed to delete resource "my-nodejs-app-cluster-sample-ocp" (binding.operators.coreos.com.v1alpha1.servicebindings): timeout while waiting for "my-nodejs-app-cluster-sample-ocp" resource to be deleted
  << Timeline

  [FAILED] Timed out after 180.000s.
  Expected process to exit.  It did not.
  In [AfterEach] at: /go/odo_1/tests/helper/helper_dev.go:200 @ 04/03/23 08:34:32.161

Summarizing 1 Failure:
  [FAIL] odo dev command tests when Starting a PostgreSQL service when creating local files and dir and running odo dev - with metadata.name [AfterEach] when deleting local files and dir and waiting for sync should not list deleted dir and file in container
  /go/odo_1/tests/helper/helper_dev.go:200

Ran 431 of 788 Specs in 1284.598 seconds
FAIL! -- 430 Passed | 1 Failed | 0 Pending | 357 Skipped

Previous run of the same test passed.

/override OpenShift-Integration-tests/OpenShift-Integration-tests


openshift-ci bot commented Apr 3, 2023

@rm3l: Overrode contexts on behalf of rm3l: OpenShift-Integration-tests/OpenShift-Integration-tests


Member Author

rm3l commented Apr 3, 2023

  [odo] I0403 08:34:07.924897    4424 delete.go:94] failed to delete resource "my-nodejs-app-cluster-sample-ocp" (binding.operators.coreos.com.v1alpha1.servicebindings): timeout while waiting for "my-nodejs-app-cluster-sample-ocp" resource to be deleted
...
Summarizing 1 Failure:
  [FAIL] odo dev command tests when Starting a PostgreSQL service [BeforeEach] when creating local files and dir and running odo dev - without metadata.name when deleting local files and dir and waiting for sync should not list deleted dir and file in container
  C:/Users/Administrator.ANSIBLE-TEST-VS/3580/tests/helper/helper_generic.go:58

Ran 424 of 788 Specs in 1064.047 seconds
FAIL! -- 423 Passed | 1 Failed | 0 Pending | 364 Skipped

Flaky test - previous run of the same test passed.

/override windows-integration-test/Windows-test


openshift-ci bot commented Apr 3, 2023

@rm3l: Overrode contexts on behalf of rm3l: windows-integration-test/Windows-test


@openshift-merge-robot openshift-merge-robot merged commit 25bb53f into redhat-developer:main Apr 3, 2023
@rm3l rm3l deleted the 6263-odo-dev-application-not-restarted-if-proc-pid-task-tid-children-file-is-missing-in-container branch April 3, 2023 11:10

Labels

area/dev: Issues or PRs related to `odo dev`
kind/bug: Categorizes issue or PR as related to a bug.
lgtm: Indicates that a PR is ready to be merged. Required by Prow.

Development

Successfully merging this pull request may close these issues.

odo dev: Application not restarted on Minikube (with VirtualBox or KVM drivers)

4 participants