Skip to content

Install python package dependencies through setup.py.#39

Merged
pcmoritz merged 4 commits intomasterfrom
setup
Nov 13, 2016
Merged

Install python package dependencies through setup.py.#39
pcmoritz merged 4 commits intomasterfrom
setup

Conversation

@robertnishihara
Copy link
Collaborator

@robertnishihara robertnishihara commented Nov 13, 2016

Ideally, we would only need to pip install ray, and we wouldn't need to pip install the other Python packages. However, the following problems were encountered.

  1. It is difficult to tell setup.py to install a specific commit of cloudpickle. This seems to be against the spirit of setuptools. This is why we have people install cloudpickle by hand. Hopefully the cloudpickle package will eventually be updated on PyPI and we can just require a sufficiently recent version.
  2. giving setup.py the URL of the numbuf repository with something like
    setup(
      dependency_links=["https://github.com/ray-project/numbuf/tarball/d1974afbab9f0f1bcf8af15a8c476d868ad31aff#egg=numbuf"]
    )
    doesn't seem to work reliably.
  3. In numbuf's setup.py, we can set setup_requires=["numpy"], and this attempts to install numpy, but it doesn't appear to succeed.
  4. Including funcsigs in install_requires in setup.py fails on Travis with the error
    Processing funcsigs-1.0.2.tar.gz
    Writing /tmp/easy_install-7EOPah/funcsigs-1.0.2/setup.cfg
    Running funcsigs-1.0.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-7EOPah/funcsigs-1.0.2/egg-dist-tmp-q5IF8e
    error: Installed distribution setuptools 1.1.6 conflicts with requirement setuptools>=17.1
    

@pcmoritz pcmoritz merged commit 43aba4f into master Nov 13, 2016
@pcmoritz pcmoritz deleted the setup branch November 13, 2016 03:37
edoakes pushed a commit that referenced this pull request Feb 18, 2026
#61034)

Currently, there is a chance that a worker can crash on the `getenv`
syscall from the otel lazy initialization. We found the race is between
`setenv` on the user thread (`setenv(RBLN_DEVICES)`) and `getenv` on the
worker internal thread. However, we can't forbid `setenv` on a user's
thread; the only thing we can do is not call `getenv` once the user's
thread starts.

Here is the backtrace of the crash we found by intercepting the
`getenv`:

```
[getenv_preload] setenv name=RBLN_DEVICES value= overwrite=1
[getenv_preload] setenv backtrace:
  #0 /home/ray/getenv_trace_preload.so(setenv+0x73) [0x748a77ea870b]
  #1 ray::IDLE(+0x224d5b) [0x59f10aeead5b]
  #2 ray::IDLE(+0x13dfc3) [0x59f10ae03fc3]
  #3 ray::IDLE(_PyEval_EvalFrameDefault+0x313) [0x59f10adf3703]
  #4 ray::IDLE(+0x184bfd) [0x59f10ae4abfd]
  #5 ray::IDLE(+0x19da04) [0x59f10ae63a04]
  #6 ray::IDLE(_PyEval_EvalFrameDefault+0x115a) [0x59f10adf454a]
  #7 ray::IDLE(_PyFunction_Vectorcall+0x6c) [0x59f10ae03dfc]
  #8 ray::IDLE(_PyEval_EvalFrameDefault+0x49ae) [0x59f10adf7d9e]
  #9 ray::IDLE(_PyFunction_Vectorcall+0x6c) [0x59f10ae03dfc]
  #10 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x9a9333) [0x748a76270333]
  #11 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZNSt17_Function_handlerIFN3ray6StatusERKNS0_3rpc7AddressENS2_8TaskTypeESsRKNS0_4core11RayFunctionERKSt13unordered_mapISsdSt4hashISsESt8equal_toISsESaISt4pairIKSsdEEERKSt6vectorISt10shared_ptrINS0_9RayObjectEESaISQ_EERKSN_INS2_15ObjectReferenceESaISV_EERSH_S10_PSN_ISG_INS0_8ObjectIDESQ_ESaIS12_EES15_PSN_ISG_IS11_bESaIS16_EERSO_INS0_17LocalMemoryBufferEEPbPSsS1E_RKSN_INS0_16ConcurrencyGroupESaIS1F_EESsbbblRKSt8optionalISsEEPFS1_S5_S6_SsSA_SM_SU_SZ_SsSsS15_S15_S19_S1C_S1D_S1E_S1E_S1J_SsbbblS1L_EE9_M_invokeERKSt9_Any_dataS5_OS6_OSsSA_SM_SU_SZ_S10_S10_OS15_S1X_OS19_S1C_OS1D_OS1E_S20_S1J_S1W_ObS21_S21_OlS1N_+0x1ab) [0x748a761786ab]
  #12 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorker11ExecuteTaskERKNS_17TaskSpecificationESt8optionalISt13unordered_mapISsSt6vectorISt4pairIldESaIS9_EESt4hashISsESt8equal_toISsESaIS8_IKSsSB_EEEEPS7_IS8_INS_8ObjectIDESt10shared_ptrINS_9RayObjectEEESaISP_EESS_PS7_IS8_ISL_bESaIST_EEPN6google8protobuf16RepeatedPtrFieldINS_3rpc20ObjectReferenceCountEEEPbPSsS15_+0x1166) [0x748a76320a96]
  #13 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZNSt17_Function_handlerIFN3ray6StatusERKNS0_17TaskSpecificationESt8optionalISt13unordered_mapISsSt6vectorISt4pairIldESaIS9_EESt4hashISsESt8equal_toISsESaIS8_IKSsSB_EEEEPS7_IS8_INS0_8ObjectIDESt10shared_ptrINS0_9RayObjectEEESaISP_EESS_PS7_IS8_ISL_bESaIST_EEPN6google8protobuf16RepeatedPtrFieldINS0_3rpc20ObjectReferenceCountEEEPbPSsS15_ESt5_BindIFMNS0_4core10CoreWorkerEFS1_S4_SK_SS_SS_SW_S13_S14_S15_S15_EPS19_St12_PlaceholderILi1EES1D_ILi2EES1D_ILi3EES1D_ILi4EES1D_ILi5EES1D_ILi6EES1D_ILi7EES1D_ILi8EES1D_ILi9EEEEE9_M_invokeERKSt9_Any_dataS4_OSK_OSS_S1U_OSW_OS13_OS14_OS15_S1Y_+0x87) [0x748a762e8647]
  #14 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb5186d) [0x748a7641886d]
  #15 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb557c5) [0x748a7641c7c5]
  #16 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x103e3eb) [0x748a769053eb]
  #17 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1034f0b) [0x748a768fbf0b]
  #18 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb6f21b) [0x748a7643621b]
  #19 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x15893cb) [0x748a76e503cb]
  #20 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158ad69) [0x748a76e51d69]
  #21 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158b472) [0x748a76e52472]
  #22 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorker20RunTaskExecutionLoopEv+0x132) [0x748a762e4252]
  #23 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core21CoreWorkerProcessImpl26RunWorkerTaskExecutionLoopEv+0x41) [0x748a76336bd1]
  #24 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x8a45c1) [0x748a7616b5c1]
  #25 ray::IDLE(_PyEval_EvalFrameDefault+0x6fb) [0x59f10adf3aeb]
  #26 ray::IDLE(_PyFunction_Vectorcall+0x6c) [0x59f10ae03dfc]
  #27 ray::IDLE(_PyEval_EvalFrameDefault+0x6fb) [0x59f10adf3aeb]
  #28 ray::IDLE(+0x1d5cac) [0x59f10ae9bcac]
  #29 ray::IDLE(PyEval_EvalCode+0x85) [0x59f10ae9bbf5]
  #30 ray::IDLE(+0x20732a) [0x59f10aecd32a]
  #31 ray::IDLE(+0x201d13) [0x59f10aec7d13]
  #32 ray::IDLE(+0x976be) [0x59f10ad5d6be]
  #33 ray::IDLE(_PyRun_SimpleFileObject+0x1bb) [0x59f10aec23db]
  #34 ray::IDLE(_PyRun_AnyFileObject+0x44) [0x59f10aec1f74]
  #35 ray::IDLE(Py_RunMain+0x371) [0x59f10aebf3e1]
  #36 ray::IDLE(Py_BytesMain+0x37) [0x59f10ae8f447]
  #37 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x748a77baad90]
  #38 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x748a77baae40]
  #39 ray::IDLE(+0x1c930e) [0x59f10ae8f30e]

[getenv_preload] getenv name=OTEL_CPP_EXPORTER_OTLP_METRICS_RETRY_BACKOFF_MULTIPLIER
[getenv_preload] backtrace:
  #0 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x10a9d17) [0x7321ce3c9d17]
  #1 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x10abe2b) [0x7321ce3cbe2b]
  #2 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1050ffc) [0x7321ce370ffc]
  #3 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x104f4d7) [0x7321ce36f4d7]
  #4 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1045833) [0x7321ce365833]
  #5 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xa6c760) [0x7321cdd8c760]
  #6 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xe69d9a) [0x7321ce189d9a]
  #7 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray3rpc14ClientCallImplINS0_16HealthCheckReplyEE15OnReplyReceivedEv+0x165) [0x7321ce18c005]
  #8 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZNSt17_Function_handlerIFvvEZN3ray3rpc17ClientCallManager29PollEventsFromCompletionQueueEiEUlvE_E9_M_invokeERKSt9_Any_data+0x15) [0x7321cdd8e475]
  #9 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x103e3eb) [0x7321ce35e3eb]
  #10 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1034f0b) [0x7321ce354f0b]
  #11 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb6f21b) [0x7321cde8f21b]
  #12 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x15893cb) [0x7321ce8a93cb]
  #13 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158ad69) [0x7321ce8aad69]
  #14 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158b472) [0x7321ce8ab472]
  #15 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xa6bb54) [0x7321cdd8bb54]
  #16 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xba2250) [0x7321cdec2250]
  #17 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7321cf66eac3]
  #18 /lib/x86_64-linux-gnu/libc.so.6(+0x1268d0) [0x7321cf7008d0]

*** SIGSEGV received at time=1770862205 on cpu 1 ***
PC: @     0x748a77bc5c1d  (unknown)  getenv
    @     0x748a77bc3520  (unknown)  (unknown)
{"asctime":"2026-02-11 18:10:05,910","levelname":"E","message":"*** SIGSEGV received at time=1770862205 on cpu 1 ***","filename":"logging.cc","lineno":474}
{"asctime":"2026-02-11 18:10:05,910","levelname":"E","message":"PC: @     0x748a77bc5c1d  (unknown)  getenv","filename":"logging.cc","lineno":474}
{"asctime":"2026-02-11 18:10:05,910","levelname":"E","message":"    @     0x748a77bc3520  (unknown)  (unknown)","filename":"logging.cc","lineno":474}
Fatal Python error: Segmentation fault
```

According to the backtrace, we can identify that it is the
`OtlpGrpcMetricExporterOptions`, [which called
`getenv(OTEL_CPP_EXPORTER_OTLP_METRICS_RETRY_BACKOFF_MULTIPLIER)`](https://github.com/open-telemetry/opentelemetry-cpp/blob/13ad05a6f431efb76995cffb1225d26b45374749/exporters/otlp/src/otlp_grpc_metric_exporter_options.cc#L47),
getting initialized by calling `InitOpenTelemetryExporter` in the
`metrics_agent_client_->WaitForServerReady()` callback, that causes the
issue.

This PR moves `OtlpGrpcMetricExporterOptions` into
`OpenTelemetryMetricRecorder` (so that we keep otel details
encapsulated) and moves its initialization early to `stats::Init()`, to
force the `OtlpGrpcMetricExporterOptions` to be initialized early, so
that we don't call `getenv` afterward.

---------

Signed-off-by: Rueian Huang <[email protected]>
ans9868 pushed a commit to ans9868/ray that referenced this pull request Feb 18, 2026
ray-project#61034)

Currently, there is a chance that a worker can crash on the `getenv`
syscall from the otel lazy initialization. We found the race is between
`setenv` on the user thread (`setenv(RBLN_DEVICES)`) and `getenv` on the
worker internal thread. However, we can't forbid `setenv` on a user's
thread; the only thing we can do is not call `getenv` once the user's
thread starts.

Here is the backtrace of the crash we found by intercepting the
`getenv`:

```
[getenv_preload] setenv name=RBLN_DEVICES value= overwrite=1
[getenv_preload] setenv backtrace:
  #0 /home/ray/getenv_trace_preload.so(setenv+0x73) [0x748a77ea870b]
  ray-project#1 ray::IDLE(+0x224d5b) [0x59f10aeead5b]
  ray-project#2 ray::IDLE(+0x13dfc3) [0x59f10ae03fc3]
  ray-project#3 ray::IDLE(_PyEval_EvalFrameDefault+0x313) [0x59f10adf3703]
  ray-project#4 ray::IDLE(+0x184bfd) [0x59f10ae4abfd]
  ray-project#5 ray::IDLE(+0x19da04) [0x59f10ae63a04]
  ray-project#6 ray::IDLE(_PyEval_EvalFrameDefault+0x115a) [0x59f10adf454a]
  ray-project#7 ray::IDLE(_PyFunction_Vectorcall+0x6c) [0x59f10ae03dfc]
  ray-project#8 ray::IDLE(_PyEval_EvalFrameDefault+0x49ae) [0x59f10adf7d9e]
  ray-project#9 ray::IDLE(_PyFunction_Vectorcall+0x6c) [0x59f10ae03dfc]
  ray-project#10 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x9a9333) [0x748a76270333]
  ray-project#11 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZNSt17_Function_handlerIFN3ray6StatusERKNS0_3rpc7AddressENS2_8TaskTypeESsRKNS0_4core11RayFunctionERKSt13unordered_mapISsdSt4hashISsESt8equal_toISsESaISt4pairIKSsdEEERKSt6vectorISt10shared_ptrINS0_9RayObjectEESaISQ_EERKSN_INS2_15ObjectReferenceESaISV_EERSH_S10_PSN_ISG_INS0_8ObjectIDESQ_ESaIS12_EES15_PSN_ISG_IS11_bESaIS16_EERSO_INS0_17LocalMemoryBufferEEPbPSsS1E_RKSN_INS0_16ConcurrencyGroupESaIS1F_EESsbbblRKSt8optionalISsEEPFS1_S5_S6_SsSA_SM_SU_SZ_SsSsS15_S15_S19_S1C_S1D_S1E_S1E_S1J_SsbbblS1L_EE9_M_invokeERKSt9_Any_dataS5_OS6_OSsSA_SM_SU_SZ_S10_S10_OS15_S1X_OS19_S1C_OS1D_OS1E_S20_S1J_S1W_ObS21_S21_OlS1N_+0x1ab) [0x748a761786ab]
  ray-project#12 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorker11ExecuteTaskERKNS_17TaskSpecificationESt8optionalISt13unordered_mapISsSt6vectorISt4pairIldESaIS9_EESt4hashISsESt8equal_toISsESaIS8_IKSsSB_EEEEPS7_IS8_INS_8ObjectIDESt10shared_ptrINS_9RayObjectEEESaISP_EESS_PS7_IS8_ISL_bESaIST_EEPN6google8protobuf16RepeatedPtrFieldINS_3rpc20ObjectReferenceCountEEEPbPSsS15_+0x1166) [0x748a76320a96]
  ray-project#13 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZNSt17_Function_handlerIFN3ray6StatusERKNS0_17TaskSpecificationESt8optionalISt13unordered_mapISsSt6vectorISt4pairIldESaIS9_EESt4hashISsESt8equal_toISsESaIS8_IKSsSB_EEEEPS7_IS8_INS0_8ObjectIDESt10shared_ptrINS0_9RayObjectEEESaISP_EESS_PS7_IS8_ISL_bESaIST_EEPN6google8protobuf16RepeatedPtrFieldINS0_3rpc20ObjectReferenceCountEEEPbPSsS15_ESt5_BindIFMNS0_4core10CoreWorkerEFS1_S4_SK_SS_SS_SW_S13_S14_S15_S15_EPS19_St12_PlaceholderILi1EES1D_ILi2EES1D_ILi3EES1D_ILi4EES1D_ILi5EES1D_ILi6EES1D_ILi7EES1D_ILi8EES1D_ILi9EEEEE9_M_invokeERKSt9_Any_dataS4_OSK_OSS_S1U_OSW_OS13_OS14_OS15_S1Y_+0x87) [0x748a762e8647]
  ray-project#14 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb5186d) [0x748a7641886d]
  ray-project#15 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb557c5) [0x748a7641c7c5]
  ray-project#16 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x103e3eb) [0x748a769053eb]
  ray-project#17 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1034f0b) [0x748a768fbf0b]
  ray-project#18 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb6f21b) [0x748a7643621b]
  ray-project#19 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x15893cb) [0x748a76e503cb]
  ray-project#20 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158ad69) [0x748a76e51d69]
  ray-project#21 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158b472) [0x748a76e52472]
  ray-project#22 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorker20RunTaskExecutionLoopEv+0x132) [0x748a762e4252]
  ray-project#23 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core21CoreWorkerProcessImpl26RunWorkerTaskExecutionLoopEv+0x41) [0x748a76336bd1]
  ray-project#24 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x8a45c1) [0x748a7616b5c1]
  ray-project#25 ray::IDLE(_PyEval_EvalFrameDefault+0x6fb) [0x59f10adf3aeb]
  ray-project#26 ray::IDLE(_PyFunction_Vectorcall+0x6c) [0x59f10ae03dfc]
  ray-project#27 ray::IDLE(_PyEval_EvalFrameDefault+0x6fb) [0x59f10adf3aeb]
  ray-project#28 ray::IDLE(+0x1d5cac) [0x59f10ae9bcac]
  ray-project#29 ray::IDLE(PyEval_EvalCode+0x85) [0x59f10ae9bbf5]
  ray-project#30 ray::IDLE(+0x20732a) [0x59f10aecd32a]
  ray-project#31 ray::IDLE(+0x201d13) [0x59f10aec7d13]
  ray-project#32 ray::IDLE(+0x976be) [0x59f10ad5d6be]
  ray-project#33 ray::IDLE(_PyRun_SimpleFileObject+0x1bb) [0x59f10aec23db]
  ray-project#34 ray::IDLE(_PyRun_AnyFileObject+0x44) [0x59f10aec1f74]
  ray-project#35 ray::IDLE(Py_RunMain+0x371) [0x59f10aebf3e1]
  ray-project#36 ray::IDLE(Py_BytesMain+0x37) [0x59f10ae8f447]
  ray-project#37 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x748a77baad90]
  ray-project#38 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x748a77baae40]
  ray-project#39 ray::IDLE(+0x1c930e) [0x59f10ae8f30e]

[getenv_preload] getenv name=OTEL_CPP_EXPORTER_OTLP_METRICS_RETRY_BACKOFF_MULTIPLIER
[getenv_preload] backtrace:
  #0 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x10a9d17) [0x7321ce3c9d17]
  ray-project#1 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x10abe2b) [0x7321ce3cbe2b]
  ray-project#2 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1050ffc) [0x7321ce370ffc]
  ray-project#3 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x104f4d7) [0x7321ce36f4d7]
  ray-project#4 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1045833) [0x7321ce365833]
  ray-project#5 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xa6c760) [0x7321cdd8c760]
  ray-project#6 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xe69d9a) [0x7321ce189d9a]
  ray-project#7 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray3rpc14ClientCallImplINS0_16HealthCheckReplyEE15OnReplyReceivedEv+0x165) [0x7321ce18c005]
  ray-project#8 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZNSt17_Function_handlerIFvvEZN3ray3rpc17ClientCallManager29PollEventsFromCompletionQueueEiEUlvE_E9_M_invokeERKSt9_Any_data+0x15) [0x7321cdd8e475]
  ray-project#9 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x103e3eb) [0x7321ce35e3eb]
  ray-project#10 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1034f0b) [0x7321ce354f0b]
  ray-project#11 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb6f21b) [0x7321cde8f21b]
  ray-project#12 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x15893cb) [0x7321ce8a93cb]
  ray-project#13 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158ad69) [0x7321ce8aad69]
  ray-project#14 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158b472) [0x7321ce8ab472]
  ray-project#15 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xa6bb54) [0x7321cdd8bb54]
  ray-project#16 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xba2250) [0x7321cdec2250]
  ray-project#17 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7321cf66eac3]
  ray-project#18 /lib/x86_64-linux-gnu/libc.so.6(+0x1268d0) [0x7321cf7008d0]

*** SIGSEGV received at time=1770862205 on cpu 1 ***
PC: @     0x748a77bc5c1d  (unknown)  getenv
    @     0x748a77bc3520  (unknown)  (unknown)
{"asctime":"2026-02-11 18:10:05,910","levelname":"E","message":"*** SIGSEGV received at time=1770862205 on cpu 1 ***","filename":"logging.cc","lineno":474}
{"asctime":"2026-02-11 18:10:05,910","levelname":"E","message":"PC: @     0x748a77bc5c1d  (unknown)  getenv","filename":"logging.cc","lineno":474}
{"asctime":"2026-02-11 18:10:05,910","levelname":"E","message":"    @     0x748a77bc3520  (unknown)  (unknown)","filename":"logging.cc","lineno":474}
Fatal Python error: Segmentation fault
```

According to the backtrace, we can identify that it is the
`OtlpGrpcMetricExporterOptions`, [which called
`getenv(OTEL_CPP_EXPORTER_OTLP_METRICS_RETRY_BACKOFF_MULTIPLIER)`](https://github.com/open-telemetry/opentelemetry-cpp/blob/13ad05a6f431efb76995cffb1225d26b45374749/exporters/otlp/src/otlp_grpc_metric_exporter_options.cc#L47),
getting initialized by calling `InitOpenTelemetryExporter` in the
`metrics_agent_client_->WaitForServerReady()` callback, that causes the
issue.

This PR moves `OtlpGrpcMetricExporterOptions` into
`OpenTelemetryMetricRecorder` (so that we keep otel details
encapsulated) and moves its initialization early to `stats::Init()`, to
force the `OtlpGrpcMetricExporterOptions` to be initialized early, so
that we don't call `getenv` afterward.

---------

Signed-off-by: Rueian Huang <[email protected]>
Signed-off-by: Adel Nour <[email protected]>
Aydin-ab pushed a commit to kunling-anyscale/ray that referenced this pull request Feb 20, 2026
ray-project#61034)

Currently, there is a chance that a worker can crash on the `getenv`
syscall from the otel lazy initialization. We found the race is between
`setenv` on the user thread (`setenv(RBLN_DEVICES)`) and `getenv` on the
worker internal thread. However, we can't forbid `setenv` on a user's
thread; the only thing we can do is not call `getenv` once the user's
thread starts.

Here is the backtrace of the crash we found by intercepting the
`getenv`:

```
[getenv_preload] setenv name=RBLN_DEVICES value= overwrite=1
[getenv_preload] setenv backtrace:
  #0 /home/ray/getenv_trace_preload.so(setenv+0x73) [0x748a77ea870b]
  ray-project#1 ray::IDLE(+0x224d5b) [0x59f10aeead5b]
  ray-project#2 ray::IDLE(+0x13dfc3) [0x59f10ae03fc3]
  ray-project#3 ray::IDLE(_PyEval_EvalFrameDefault+0x313) [0x59f10adf3703]
  ray-project#4 ray::IDLE(+0x184bfd) [0x59f10ae4abfd]
  ray-project#5 ray::IDLE(+0x19da04) [0x59f10ae63a04]
  ray-project#6 ray::IDLE(_PyEval_EvalFrameDefault+0x115a) [0x59f10adf454a]
  ray-project#7 ray::IDLE(_PyFunction_Vectorcall+0x6c) [0x59f10ae03dfc]
  ray-project#8 ray::IDLE(_PyEval_EvalFrameDefault+0x49ae) [0x59f10adf7d9e]
  ray-project#9 ray::IDLE(_PyFunction_Vectorcall+0x6c) [0x59f10ae03dfc]
  ray-project#10 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x9a9333) [0x748a76270333]
  ray-project#11 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZNSt17_Function_handlerIFN3ray6StatusERKNS0_3rpc7AddressENS2_8TaskTypeESsRKNS0_4core11RayFunctionERKSt13unordered_mapISsdSt4hashISsESt8equal_toISsESaISt4pairIKSsdEEERKSt6vectorISt10shared_ptrINS0_9RayObjectEESaISQ_EERKSN_INS2_15ObjectReferenceESaISV_EERSH_S10_PSN_ISG_INS0_8ObjectIDESQ_ESaIS12_EES15_PSN_ISG_IS11_bESaIS16_EERSO_INS0_17LocalMemoryBufferEEPbPSsS1E_RKSN_INS0_16ConcurrencyGroupESaIS1F_EESsbbblRKSt8optionalISsEEPFS1_S5_S6_SsSA_SM_SU_SZ_SsSsS15_S15_S19_S1C_S1D_S1E_S1E_S1J_SsbbblS1L_EE9_M_invokeERKSt9_Any_dataS5_OS6_OSsSA_SM_SU_SZ_S10_S10_OS15_S1X_OS19_S1C_OS1D_OS1E_S20_S1J_S1W_ObS21_S21_OlS1N_+0x1ab) [0x748a761786ab]
  ray-project#12 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorker11ExecuteTaskERKNS_17TaskSpecificationESt8optionalISt13unordered_mapISsSt6vectorISt4pairIldESaIS9_EESt4hashISsESt8equal_toISsESaIS8_IKSsSB_EEEEPS7_IS8_INS_8ObjectIDESt10shared_ptrINS_9RayObjectEEESaISP_EESS_PS7_IS8_ISL_bESaIST_EEPN6google8protobuf16RepeatedPtrFieldINS_3rpc20ObjectReferenceCountEEEPbPSsS15_+0x1166) [0x748a76320a96]
  ray-project#13 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZNSt17_Function_handlerIFN3ray6StatusERKNS0_17TaskSpecificationESt8optionalISt13unordered_mapISsSt6vectorISt4pairIldESaIS9_EESt4hashISsESt8equal_toISsESaIS8_IKSsSB_EEEEPS7_IS8_INS0_8ObjectIDESt10shared_ptrINS0_9RayObjectEEESaISP_EESS_PS7_IS8_ISL_bESaIST_EEPN6google8protobuf16RepeatedPtrFieldINS0_3rpc20ObjectReferenceCountEEEPbPSsS15_ESt5_BindIFMNS0_4core10CoreWorkerEFS1_S4_SK_SS_SS_SW_S13_S14_S15_S15_EPS19_St12_PlaceholderILi1EES1D_ILi2EES1D_ILi3EES1D_ILi4EES1D_ILi5EES1D_ILi6EES1D_ILi7EES1D_ILi8EES1D_ILi9EEEEE9_M_invokeERKSt9_Any_dataS4_OSK_OSS_S1U_OSW_OS13_OS14_OS15_S1Y_+0x87) [0x748a762e8647]
  ray-project#14 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb5186d) [0x748a7641886d]
  ray-project#15 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb557c5) [0x748a7641c7c5]
  ray-project#16 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x103e3eb) [0x748a769053eb]
  ray-project#17 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1034f0b) [0x748a768fbf0b]
  ray-project#18 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb6f21b) [0x748a7643621b]
  ray-project#19 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x15893cb) [0x748a76e503cb]
  ray-project#20 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158ad69) [0x748a76e51d69]
  ray-project#21 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158b472) [0x748a76e52472]
  ray-project#22 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorker20RunTaskExecutionLoopEv+0x132) [0x748a762e4252]
  ray-project#23 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core21CoreWorkerProcessImpl26RunWorkerTaskExecutionLoopEv+0x41) [0x748a76336bd1]
  ray-project#24 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x8a45c1) [0x748a7616b5c1]
  ray-project#25 ray::IDLE(_PyEval_EvalFrameDefault+0x6fb) [0x59f10adf3aeb]
  ray-project#26 ray::IDLE(_PyFunction_Vectorcall+0x6c) [0x59f10ae03dfc]
  ray-project#27 ray::IDLE(_PyEval_EvalFrameDefault+0x6fb) [0x59f10adf3aeb]
  ray-project#28 ray::IDLE(+0x1d5cac) [0x59f10ae9bcac]
  ray-project#29 ray::IDLE(PyEval_EvalCode+0x85) [0x59f10ae9bbf5]
  ray-project#30 ray::IDLE(+0x20732a) [0x59f10aecd32a]
  ray-project#31 ray::IDLE(+0x201d13) [0x59f10aec7d13]
  ray-project#32 ray::IDLE(+0x976be) [0x59f10ad5d6be]
  ray-project#33 ray::IDLE(_PyRun_SimpleFileObject+0x1bb) [0x59f10aec23db]
  ray-project#34 ray::IDLE(_PyRun_AnyFileObject+0x44) [0x59f10aec1f74]
  ray-project#35 ray::IDLE(Py_RunMain+0x371) [0x59f10aebf3e1]
  ray-project#36 ray::IDLE(Py_BytesMain+0x37) [0x59f10ae8f447]
  ray-project#37 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x748a77baad90]
  ray-project#38 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x748a77baae40]
  ray-project#39 ray::IDLE(+0x1c930e) [0x59f10ae8f30e]

[getenv_preload] getenv name=OTEL_CPP_EXPORTER_OTLP_METRICS_RETRY_BACKOFF_MULTIPLIER
[getenv_preload] backtrace:
  #0 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x10a9d17) [0x7321ce3c9d17]
  ray-project#1 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x10abe2b) [0x7321ce3cbe2b]
  ray-project#2 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1050ffc) [0x7321ce370ffc]
  ray-project#3 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x104f4d7) [0x7321ce36f4d7]
  ray-project#4 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1045833) [0x7321ce365833]
  ray-project#5 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xa6c760) [0x7321cdd8c760]
  ray-project#6 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xe69d9a) [0x7321ce189d9a]
  ray-project#7 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray3rpc14ClientCallImplINS0_16HealthCheckReplyEE15OnReplyReceivedEv+0x165) [0x7321ce18c005]
  ray-project#8 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZNSt17_Function_handlerIFvvEZN3ray3rpc17ClientCallManager29PollEventsFromCompletionQueueEiEUlvE_E9_M_invokeERKSt9_Any_data+0x15) [0x7321cdd8e475]
  ray-project#9 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x103e3eb) [0x7321ce35e3eb]
  ray-project#10 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1034f0b) [0x7321ce354f0b]
  ray-project#11 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb6f21b) [0x7321cde8f21b]
  ray-project#12 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x15893cb) [0x7321ce8a93cb]
  ray-project#13 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158ad69) [0x7321ce8aad69]
  ray-project#14 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158b472) [0x7321ce8ab472]
  ray-project#15 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xa6bb54) [0x7321cdd8bb54]
  ray-project#16 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xba2250) [0x7321cdec2250]
  ray-project#17 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7321cf66eac3]
  ray-project#18 /lib/x86_64-linux-gnu/libc.so.6(+0x1268d0) [0x7321cf7008d0]

*** SIGSEGV received at time=1770862205 on cpu 1 ***
PC: @     0x748a77bc5c1d  (unknown)  getenv
    @     0x748a77bc3520  (unknown)  (unknown)
{"asctime":"2026-02-11 18:10:05,910","levelname":"E","message":"*** SIGSEGV received at time=1770862205 on cpu 1 ***","filename":"logging.cc","lineno":474}
{"asctime":"2026-02-11 18:10:05,910","levelname":"E","message":"PC: @     0x748a77bc5c1d  (unknown)  getenv","filename":"logging.cc","lineno":474}
{"asctime":"2026-02-11 18:10:05,910","levelname":"E","message":"    @     0x748a77bc3520  (unknown)  (unknown)","filename":"logging.cc","lineno":474}
Fatal Python error: Segmentation fault
```

According to the backtrace, we can identify that it is the
`OtlpGrpcMetricExporterOptions`, [which called
`getenv(OTEL_CPP_EXPORTER_OTLP_METRICS_RETRY_BACKOFF_MULTIPLIER)`](https://github.com/open-telemetry/opentelemetry-cpp/blob/13ad05a6f431efb76995cffb1225d26b45374749/exporters/otlp/src/otlp_grpc_metric_exporter_options.cc#L47),
getting initialized by calling `InitOpenTelemetryExporter` in the
`metrics_agent_client_->WaitForServerReady()` callback, that causes the
issue.

This PR moves `OtlpGrpcMetricExporterOptions` into
`OpenTelemetryMetricRecorder` (so that we keep otel details
encapsulated) and moves its initialization early to `stats::Init()`, to
force the `OtlpGrpcMetricExporterOptions` to be initialized early, so
that we don't call `getenv` afterward.

---------

Signed-off-by: Rueian Huang <[email protected]>
limarkdcunha pushed a commit to limarkdcunha/ray that referenced this pull request Feb 20, 2026
ray-project#61034)

Currently, there is a chance that a worker can crash on the `getenv`
syscall from the otel lazy initialization. We found the race is between
`setenv` on the user thread (`setenv(RBLN_DEVICES)`) and `getenv` on the
worker internal thread. However, we can't forbid `setenv` on a user's
thread; the only thing we can do is not call `getenv` once the user's
thread starts.

Here is the backtrace of the crash we found by intercepting the
`getenv`:

```
[getenv_preload] setenv name=RBLN_DEVICES value= overwrite=1
[getenv_preload] setenv backtrace:
  #0 /home/ray/getenv_trace_preload.so(setenv+0x73) [0x748a77ea870b]
  ray-project#1 ray::IDLE(+0x224d5b) [0x59f10aeead5b]
  ray-project#2 ray::IDLE(+0x13dfc3) [0x59f10ae03fc3]
  ray-project#3 ray::IDLE(_PyEval_EvalFrameDefault+0x313) [0x59f10adf3703]
  ray-project#4 ray::IDLE(+0x184bfd) [0x59f10ae4abfd]
  ray-project#5 ray::IDLE(+0x19da04) [0x59f10ae63a04]
  ray-project#6 ray::IDLE(_PyEval_EvalFrameDefault+0x115a) [0x59f10adf454a]
  ray-project#7 ray::IDLE(_PyFunction_Vectorcall+0x6c) [0x59f10ae03dfc]
  ray-project#8 ray::IDLE(_PyEval_EvalFrameDefault+0x49ae) [0x59f10adf7d9e]
  ray-project#9 ray::IDLE(_PyFunction_Vectorcall+0x6c) [0x59f10ae03dfc]
  ray-project#10 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x9a9333) [0x748a76270333]
  ray-project#11 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZNSt17_Function_handlerIFN3ray6StatusERKNS0_3rpc7AddressENS2_8TaskTypeESsRKNS0_4core11RayFunctionERKSt13unordered_mapISsdSt4hashISsESt8equal_toISsESaISt4pairIKSsdEEERKSt6vectorISt10shared_ptrINS0_9RayObjectEESaISQ_EERKSN_INS2_15ObjectReferenceESaISV_EERSH_S10_PSN_ISG_INS0_8ObjectIDESQ_ESaIS12_EES15_PSN_ISG_IS11_bESaIS16_EERSO_INS0_17LocalMemoryBufferEEPbPSsS1E_RKSN_INS0_16ConcurrencyGroupESaIS1F_EESsbbblRKSt8optionalISsEEPFS1_S5_S6_SsSA_SM_SU_SZ_SsSsS15_S15_S19_S1C_S1D_S1E_S1E_S1J_SsbbblS1L_EE9_M_invokeERKSt9_Any_dataS5_OS6_OSsSA_SM_SU_SZ_S10_S10_OS15_S1X_OS19_S1C_OS1D_OS1E_S20_S1J_S1W_ObS21_S21_OlS1N_+0x1ab) [0x748a761786ab]
  ray-project#12 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorker11ExecuteTaskERKNS_17TaskSpecificationESt8optionalISt13unordered_mapISsSt6vectorISt4pairIldESaIS9_EESt4hashISsESt8equal_toISsESaIS8_IKSsSB_EEEEPS7_IS8_INS_8ObjectIDESt10shared_ptrINS_9RayObjectEEESaISP_EESS_PS7_IS8_ISL_bESaIST_EEPN6google8protobuf16RepeatedPtrFieldINS_3rpc20ObjectReferenceCountEEEPbPSsS15_+0x1166) [0x748a76320a96]
  ray-project#13 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZNSt17_Function_handlerIFN3ray6StatusERKNS0_17TaskSpecificationESt8optionalISt13unordered_mapISsSt6vectorISt4pairIldESaIS9_EESt4hashISsESt8equal_toISsESaIS8_IKSsSB_EEEEPS7_IS8_INS0_8ObjectIDESt10shared_ptrINS0_9RayObjectEEESaISP_EESS_PS7_IS8_ISL_bESaIST_EEPN6google8protobuf16RepeatedPtrFieldINS0_3rpc20ObjectReferenceCountEEEPbPSsS15_ESt5_BindIFMNS0_4core10CoreWorkerEFS1_S4_SK_SS_SS_SW_S13_S14_S15_S15_EPS19_St12_PlaceholderILi1EES1D_ILi2EES1D_ILi3EES1D_ILi4EES1D_ILi5EES1D_ILi6EES1D_ILi7EES1D_ILi8EES1D_ILi9EEEEE9_M_invokeERKSt9_Any_dataS4_OSK_OSS_S1U_OSW_OS13_OS14_OS15_S1Y_+0x87) [0x748a762e8647]
  ray-project#14 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb5186d) [0x748a7641886d]
  ray-project#15 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb557c5) [0x748a7641c7c5]
  ray-project#16 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x103e3eb) [0x748a769053eb]
  ray-project#17 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1034f0b) [0x748a768fbf0b]
  ray-project#18 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb6f21b) [0x748a7643621b]
  ray-project#19 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x15893cb) [0x748a76e503cb]
  ray-project#20 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158ad69) [0x748a76e51d69]
  ray-project#21 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158b472) [0x748a76e52472]
  ray-project#22 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorker20RunTaskExecutionLoopEv+0x132) [0x748a762e4252]
  ray-project#23 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core21CoreWorkerProcessImpl26RunWorkerTaskExecutionLoopEv+0x41) [0x748a76336bd1]
  ray-project#24 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x8a45c1) [0x748a7616b5c1]
  ray-project#25 ray::IDLE(_PyEval_EvalFrameDefault+0x6fb) [0x59f10adf3aeb]
  ray-project#26 ray::IDLE(_PyFunction_Vectorcall+0x6c) [0x59f10ae03dfc]
  ray-project#27 ray::IDLE(_PyEval_EvalFrameDefault+0x6fb) [0x59f10adf3aeb]
  ray-project#28 ray::IDLE(+0x1d5cac) [0x59f10ae9bcac]
  ray-project#29 ray::IDLE(PyEval_EvalCode+0x85) [0x59f10ae9bbf5]
  ray-project#30 ray::IDLE(+0x20732a) [0x59f10aecd32a]
  ray-project#31 ray::IDLE(+0x201d13) [0x59f10aec7d13]
  ray-project#32 ray::IDLE(+0x976be) [0x59f10ad5d6be]
  ray-project#33 ray::IDLE(_PyRun_SimpleFileObject+0x1bb) [0x59f10aec23db]
  ray-project#34 ray::IDLE(_PyRun_AnyFileObject+0x44) [0x59f10aec1f74]
  ray-project#35 ray::IDLE(Py_RunMain+0x371) [0x59f10aebf3e1]
  ray-project#36 ray::IDLE(Py_BytesMain+0x37) [0x59f10ae8f447]
  ray-project#37 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x748a77baad90]
  ray-project#38 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x748a77baae40]
  ray-project#39 ray::IDLE(+0x1c930e) [0x59f10ae8f30e]

[getenv_preload] getenv name=OTEL_CPP_EXPORTER_OTLP_METRICS_RETRY_BACKOFF_MULTIPLIER
[getenv_preload] backtrace:
  #0 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x10a9d17) [0x7321ce3c9d17]
  ray-project#1 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x10abe2b) [0x7321ce3cbe2b]
  ray-project#2 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1050ffc) [0x7321ce370ffc]
  ray-project#3 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x104f4d7) [0x7321ce36f4d7]
  ray-project#4 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1045833) [0x7321ce365833]
  ray-project#5 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xa6c760) [0x7321cdd8c760]
  ray-project#6 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xe69d9a) [0x7321ce189d9a]
  ray-project#7 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray3rpc14ClientCallImplINS0_16HealthCheckReplyEE15OnReplyReceivedEv+0x165) [0x7321ce18c005]
  ray-project#8 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZNSt17_Function_handlerIFvvEZN3ray3rpc17ClientCallManager29PollEventsFromCompletionQueueEiEUlvE_E9_M_invokeERKSt9_Any_data+0x15) [0x7321cdd8e475]
  ray-project#9 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x103e3eb) [0x7321ce35e3eb]
  ray-project#10 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1034f0b) [0x7321ce354f0b]
  ray-project#11 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb6f21b) [0x7321cde8f21b]
  ray-project#12 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x15893cb) [0x7321ce8a93cb]
  ray-project#13 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158ad69) [0x7321ce8aad69]
  ray-project#14 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158b472) [0x7321ce8ab472]
  ray-project#15 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xa6bb54) [0x7321cdd8bb54]
  ray-project#16 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xba2250) [0x7321cdec2250]
  ray-project#17 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7321cf66eac3]
  ray-project#18 /lib/x86_64-linux-gnu/libc.so.6(+0x1268d0) [0x7321cf7008d0]

*** SIGSEGV received at time=1770862205 on cpu 1 ***
PC: @     0x748a77bc5c1d  (unknown)  getenv
    @     0x748a77bc3520  (unknown)  (unknown)
{"asctime":"2026-02-11 18:10:05,910","levelname":"E","message":"*** SIGSEGV received at time=1770862205 on cpu 1 ***","filename":"logging.cc","lineno":474}
{"asctime":"2026-02-11 18:10:05,910","levelname":"E","message":"PC: @     0x748a77bc5c1d  (unknown)  getenv","filename":"logging.cc","lineno":474}
{"asctime":"2026-02-11 18:10:05,910","levelname":"E","message":"    @     0x748a77bc3520  (unknown)  (unknown)","filename":"logging.cc","lineno":474}
Fatal Python error: Segmentation fault
```

According to the backtrace, we can identify that it is the
`OtlpGrpcMetricExporterOptions`, [which called
`getenv(OTEL_CPP_EXPORTER_OTLP_METRICS_RETRY_BACKOFF_MULTIPLIER)`](https://github.com/open-telemetry/opentelemetry-cpp/blob/13ad05a6f431efb76995cffb1225d26b45374749/exporters/otlp/src/otlp_grpc_metric_exporter_options.cc#L47),
getting initialized by calling `InitOpenTelemetryExporter` in the
`metrics_agent_client_->WaitForServerReady()` callback, that causes the
issue.

This PR moves `OtlpGrpcMetricExporterOptions` into
`OpenTelemetryMetricRecorder` (so that we keep otel details
encapsulated) and moves its initialization early to `stats::Init()`, to
force the `OtlpGrpcMetricExporterOptions` to be initialized early, so
that we don't call `getenv` afterward.

---------

Signed-off-by: Rueian Huang <[email protected]>
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
ray-project#61034)

Currently, there is a chance that a worker can crash on the `getenv`
syscall from the otel lazy initialization. We found the race is between
`setenv` on the user thread (`setenv(RBLN_DEVICES)`) and `getenv` on the
worker internal thread. However, we can't forbid `setenv` on a user's
thread; the only thing we can do is not call `getenv` once the user's
thread starts.

Here is the backtrace of the crash we found by intercepting the
`getenv`:

```
[getenv_preload] setenv name=RBLN_DEVICES value= overwrite=1
[getenv_preload] setenv backtrace:
  #0 /home/ray/getenv_trace_preload.so(setenv+0x73) [0x748a77ea870b]
  ray-project#1 ray::IDLE(+0x224d5b) [0x59f10aeead5b]
  ray-project#2 ray::IDLE(+0x13dfc3) [0x59f10ae03fc3]
  ray-project#3 ray::IDLE(_PyEval_EvalFrameDefault+0x313) [0x59f10adf3703]
  ray-project#4 ray::IDLE(+0x184bfd) [0x59f10ae4abfd]
  ray-project#5 ray::IDLE(+0x19da04) [0x59f10ae63a04]
  ray-project#6 ray::IDLE(_PyEval_EvalFrameDefault+0x115a) [0x59f10adf454a]
  ray-project#7 ray::IDLE(_PyFunction_Vectorcall+0x6c) [0x59f10ae03dfc]
  ray-project#8 ray::IDLE(_PyEval_EvalFrameDefault+0x49ae) [0x59f10adf7d9e]
  ray-project#9 ray::IDLE(_PyFunction_Vectorcall+0x6c) [0x59f10ae03dfc]
  ray-project#10 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x9a9333) [0x748a76270333]
  ray-project#11 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZNSt17_Function_handlerIFN3ray6StatusERKNS0_3rpc7AddressENS2_8TaskTypeESsRKNS0_4core11RayFunctionERKSt13unordered_mapISsdSt4hashISsESt8equal_toISsESaISt4pairIKSsdEEERKSt6vectorISt10shared_ptrINS0_9RayObjectEESaISQ_EERKSN_INS2_15ObjectReferenceESaISV_EERSH_S10_PSN_ISG_INS0_8ObjectIDESQ_ESaIS12_EES15_PSN_ISG_IS11_bESaIS16_EERSO_INS0_17LocalMemoryBufferEEPbPSsS1E_RKSN_INS0_16ConcurrencyGroupESaIS1F_EESsbbblRKSt8optionalISsEEPFS1_S5_S6_SsSA_SM_SU_SZ_SsSsS15_S15_S19_S1C_S1D_S1E_S1E_S1J_SsbbblS1L_EE9_M_invokeERKSt9_Any_dataS5_OS6_OSsSA_SM_SU_SZ_S10_S10_OS15_S1X_OS19_S1C_OS1D_OS1E_S20_S1J_S1W_ObS21_S21_OlS1N_+0x1ab) [0x748a761786ab]
  ray-project#12 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorker11ExecuteTaskERKNS_17TaskSpecificationESt8optionalISt13unordered_mapISsSt6vectorISt4pairIldESaIS9_EESt4hashISsESt8equal_toISsESaIS8_IKSsSB_EEEEPS7_IS8_INS_8ObjectIDESt10shared_ptrINS_9RayObjectEEESaISP_EESS_PS7_IS8_ISL_bESaIST_EEPN6google8protobuf16RepeatedPtrFieldINS_3rpc20ObjectReferenceCountEEEPbPSsS15_+0x1166) [0x748a76320a96]
  ray-project#13 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZNSt17_Function_handlerIFN3ray6StatusERKNS0_17TaskSpecificationESt8optionalISt13unordered_mapISsSt6vectorISt4pairIldESaIS9_EESt4hashISsESt8equal_toISsESaIS8_IKSsSB_EEEEPS7_IS8_INS0_8ObjectIDESt10shared_ptrINS0_9RayObjectEEESaISP_EESS_PS7_IS8_ISL_bESaIST_EEPN6google8protobuf16RepeatedPtrFieldINS0_3rpc20ObjectReferenceCountEEEPbPSsS15_ESt5_BindIFMNS0_4core10CoreWorkerEFS1_S4_SK_SS_SS_SW_S13_S14_S15_S15_EPS19_St12_PlaceholderILi1EES1D_ILi2EES1D_ILi3EES1D_ILi4EES1D_ILi5EES1D_ILi6EES1D_ILi7EES1D_ILi8EES1D_ILi9EEEEE9_M_invokeERKSt9_Any_dataS4_OSK_OSS_S1U_OSW_OS13_OS14_OS15_S1Y_+0x87) [0x748a762e8647]
  ray-project#14 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb5186d) [0x748a7641886d]
  ray-project#15 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb557c5) [0x748a7641c7c5]
  ray-project#16 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x103e3eb) [0x748a769053eb]
  ray-project#17 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1034f0b) [0x748a768fbf0b]
  ray-project#18 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb6f21b) [0x748a7643621b]
  ray-project#19 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x15893cb) [0x748a76e503cb]
  ray-project#20 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158ad69) [0x748a76e51d69]
  ray-project#21 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158b472) [0x748a76e52472]
  ray-project#22 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorker20RunTaskExecutionLoopEv+0x132) [0x748a762e4252]
  ray-project#23 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core21CoreWorkerProcessImpl26RunWorkerTaskExecutionLoopEv+0x41) [0x748a76336bd1]
  ray-project#24 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x8a45c1) [0x748a7616b5c1]
  ray-project#25 ray::IDLE(_PyEval_EvalFrameDefault+0x6fb) [0x59f10adf3aeb]
  ray-project#26 ray::IDLE(_PyFunction_Vectorcall+0x6c) [0x59f10ae03dfc]
  ray-project#27 ray::IDLE(_PyEval_EvalFrameDefault+0x6fb) [0x59f10adf3aeb]
  ray-project#28 ray::IDLE(+0x1d5cac) [0x59f10ae9bcac]
  ray-project#29 ray::IDLE(PyEval_EvalCode+0x85) [0x59f10ae9bbf5]
  ray-project#30 ray::IDLE(+0x20732a) [0x59f10aecd32a]
  ray-project#31 ray::IDLE(+0x201d13) [0x59f10aec7d13]
  ray-project#32 ray::IDLE(+0x976be) [0x59f10ad5d6be]
  ray-project#33 ray::IDLE(_PyRun_SimpleFileObject+0x1bb) [0x59f10aec23db]
  ray-project#34 ray::IDLE(_PyRun_AnyFileObject+0x44) [0x59f10aec1f74]
  ray-project#35 ray::IDLE(Py_RunMain+0x371) [0x59f10aebf3e1]
  ray-project#36 ray::IDLE(Py_BytesMain+0x37) [0x59f10ae8f447]
  ray-project#37 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x748a77baad90]
  ray-project#38 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x748a77baae40]
  ray-project#39 ray::IDLE(+0x1c930e) [0x59f10ae8f30e]

[getenv_preload] getenv name=OTEL_CPP_EXPORTER_OTLP_METRICS_RETRY_BACKOFF_MULTIPLIER
[getenv_preload] backtrace:
  #0 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x10a9d17) [0x7321ce3c9d17]
  ray-project#1 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x10abe2b) [0x7321ce3cbe2b]
  ray-project#2 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1050ffc) [0x7321ce370ffc]
  ray-project#3 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x104f4d7) [0x7321ce36f4d7]
  ray-project#4 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1045833) [0x7321ce365833]
  ray-project#5 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xa6c760) [0x7321cdd8c760]
  ray-project#6 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xe69d9a) [0x7321ce189d9a]
  ray-project#7 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray3rpc14ClientCallImplINS0_16HealthCheckReplyEE15OnReplyReceivedEv+0x165) [0x7321ce18c005]
  ray-project#8 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZNSt17_Function_handlerIFvvEZN3ray3rpc17ClientCallManager29PollEventsFromCompletionQueueEiEUlvE_E9_M_invokeERKSt9_Any_data+0x15) [0x7321cdd8e475]
  ray-project#9 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x103e3eb) [0x7321ce35e3eb]
  ray-project#10 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1034f0b) [0x7321ce354f0b]
  ray-project#11 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb6f21b) [0x7321cde8f21b]
  ray-project#12 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x15893cb) [0x7321ce8a93cb]
  ray-project#13 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158ad69) [0x7321ce8aad69]
  ray-project#14 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158b472) [0x7321ce8ab472]
  ray-project#15 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xa6bb54) [0x7321cdd8bb54]
  ray-project#16 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xba2250) [0x7321cdec2250]
  ray-project#17 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7321cf66eac3]
  ray-project#18 /lib/x86_64-linux-gnu/libc.so.6(+0x1268d0) [0x7321cf7008d0]

*** SIGSEGV received at time=1770862205 on cpu 1 ***
PC: @     0x748a77bc5c1d  (unknown)  getenv
    @     0x748a77bc3520  (unknown)  (unknown)
{"asctime":"2026-02-11 18:10:05,910","levelname":"E","message":"*** SIGSEGV received at time=1770862205 on cpu 1 ***","filename":"logging.cc","lineno":474}
{"asctime":"2026-02-11 18:10:05,910","levelname":"E","message":"PC: @     0x748a77bc5c1d  (unknown)  getenv","filename":"logging.cc","lineno":474}
{"asctime":"2026-02-11 18:10:05,910","levelname":"E","message":"    @     0x748a77bc3520  (unknown)  (unknown)","filename":"logging.cc","lineno":474}
Fatal Python error: Segmentation fault
```

According to the backtrace, we can identify that it is the
`OtlpGrpcMetricExporterOptions`, [which called
`getenv(OTEL_CPP_EXPORTER_OTLP_METRICS_RETRY_BACKOFF_MULTIPLIER)`](https://github.com/open-telemetry/opentelemetry-cpp/blob/13ad05a6f431efb76995cffb1225d26b45374749/exporters/otlp/src/otlp_grpc_metric_exporter_options.cc#L47),
getting initialized by calling `InitOpenTelemetryExporter` in the
`metrics_agent_client_->WaitForServerReady()` callback, that causes the
issue.

This PR moves `OtlpGrpcMetricExporterOptions` into
`OpenTelemetryMetricRecorder` (so that we keep otel details
encapsulated) and moves its initialization early to `stats::Init()`, to
force the `OtlpGrpcMetricExporterOptions` to be initialized early, so
that we don't call `getenv` afterward.

---------

Signed-off-by: Rueian Huang <[email protected]>
Signed-off-by: peterxcli <[email protected]>
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
ray-project#61034)

Currently, there is a chance that a worker can crash on the `getenv`
syscall from the otel lazy initialization. We found the race is between
`setenv` on the user thread (`setenv(RBLN_DEVICES)`) and `getenv` on the
worker internal thread. However, we can't forbid `setenv` on a user's
thread; the only thing we can do is not call `getenv` once the user's
thread starts.

Here is the backtrace of the crash we found by intercepting the
`getenv`:

```
[getenv_preload] setenv name=RBLN_DEVICES value= overwrite=1
[getenv_preload] setenv backtrace:
  #0 /home/ray/getenv_trace_preload.so(setenv+0x73) [0x748a77ea870b]
  ray-project#1 ray::IDLE(+0x224d5b) [0x59f10aeead5b]
  ray-project#2 ray::IDLE(+0x13dfc3) [0x59f10ae03fc3]
  ray-project#3 ray::IDLE(_PyEval_EvalFrameDefault+0x313) [0x59f10adf3703]
  ray-project#4 ray::IDLE(+0x184bfd) [0x59f10ae4abfd]
  ray-project#5 ray::IDLE(+0x19da04) [0x59f10ae63a04]
  ray-project#6 ray::IDLE(_PyEval_EvalFrameDefault+0x115a) [0x59f10adf454a]
  ray-project#7 ray::IDLE(_PyFunction_Vectorcall+0x6c) [0x59f10ae03dfc]
  ray-project#8 ray::IDLE(_PyEval_EvalFrameDefault+0x49ae) [0x59f10adf7d9e]
  ray-project#9 ray::IDLE(_PyFunction_Vectorcall+0x6c) [0x59f10ae03dfc]
  ray-project#10 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x9a9333) [0x748a76270333]
  ray-project#11 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZNSt17_Function_handlerIFN3ray6StatusERKNS0_3rpc7AddressENS2_8TaskTypeESsRKNS0_4core11RayFunctionERKSt13unordered_mapISsdSt4hashISsESt8equal_toISsESaISt4pairIKSsdEEERKSt6vectorISt10shared_ptrINS0_9RayObjectEESaISQ_EERKSN_INS2_15ObjectReferenceESaISV_EERSH_S10_PSN_ISG_INS0_8ObjectIDESQ_ESaIS12_EES15_PSN_ISG_IS11_bESaIS16_EERSO_INS0_17LocalMemoryBufferEEPbPSsS1E_RKSN_INS0_16ConcurrencyGroupESaIS1F_EESsbbblRKSt8optionalISsEEPFS1_S5_S6_SsSA_SM_SU_SZ_SsSsS15_S15_S19_S1C_S1D_S1E_S1E_S1J_SsbbblS1L_EE9_M_invokeERKSt9_Any_dataS5_OS6_OSsSA_SM_SU_SZ_S10_S10_OS15_S1X_OS19_S1C_OS1D_OS1E_S20_S1J_S1W_ObS21_S21_OlS1N_+0x1ab) [0x748a761786ab]
  ray-project#12 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorker11ExecuteTaskERKNS_17TaskSpecificationESt8optionalISt13unordered_mapISsSt6vectorISt4pairIldESaIS9_EESt4hashISsESt8equal_toISsESaIS8_IKSsSB_EEEEPS7_IS8_INS_8ObjectIDESt10shared_ptrINS_9RayObjectEEESaISP_EESS_PS7_IS8_ISL_bESaIST_EEPN6google8protobuf16RepeatedPtrFieldINS_3rpc20ObjectReferenceCountEEEPbPSsS15_+0x1166) [0x748a76320a96]
  ray-project#13 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZNSt17_Function_handlerIFN3ray6StatusERKNS0_17TaskSpecificationESt8optionalISt13unordered_mapISsSt6vectorISt4pairIldESaIS9_EESt4hashISsESt8equal_toISsESaIS8_IKSsSB_EEEEPS7_IS8_INS0_8ObjectIDESt10shared_ptrINS0_9RayObjectEEESaISP_EESS_PS7_IS8_ISL_bESaIST_EEPN6google8protobuf16RepeatedPtrFieldINS0_3rpc20ObjectReferenceCountEEEPbPSsS15_ESt5_BindIFMNS0_4core10CoreWorkerEFS1_S4_SK_SS_SS_SW_S13_S14_S15_S15_EPS19_St12_PlaceholderILi1EES1D_ILi2EES1D_ILi3EES1D_ILi4EES1D_ILi5EES1D_ILi6EES1D_ILi7EES1D_ILi8EES1D_ILi9EEEEE9_M_invokeERKSt9_Any_dataS4_OSK_OSS_S1U_OSW_OS13_OS14_OS15_S1Y_+0x87) [0x748a762e8647]
  ray-project#14 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb5186d) [0x748a7641886d]
  ray-project#15 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb557c5) [0x748a7641c7c5]
  ray-project#16 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x103e3eb) [0x748a769053eb]
  ray-project#17 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1034f0b) [0x748a768fbf0b]
  ray-project#18 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb6f21b) [0x748a7643621b]
  ray-project#19 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x15893cb) [0x748a76e503cb]
  ray-project#20 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158ad69) [0x748a76e51d69]
  ray-project#21 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158b472) [0x748a76e52472]
  ray-project#22 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorker20RunTaskExecutionLoopEv+0x132) [0x748a762e4252]
  ray-project#23 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core21CoreWorkerProcessImpl26RunWorkerTaskExecutionLoopEv+0x41) [0x748a76336bd1]
  ray-project#24 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x8a45c1) [0x748a7616b5c1]
  ray-project#25 ray::IDLE(_PyEval_EvalFrameDefault+0x6fb) [0x59f10adf3aeb]
  ray-project#26 ray::IDLE(_PyFunction_Vectorcall+0x6c) [0x59f10ae03dfc]
  ray-project#27 ray::IDLE(_PyEval_EvalFrameDefault+0x6fb) [0x59f10adf3aeb]
  ray-project#28 ray::IDLE(+0x1d5cac) [0x59f10ae9bcac]
  ray-project#29 ray::IDLE(PyEval_EvalCode+0x85) [0x59f10ae9bbf5]
  ray-project#30 ray::IDLE(+0x20732a) [0x59f10aecd32a]
  ray-project#31 ray::IDLE(+0x201d13) [0x59f10aec7d13]
  ray-project#32 ray::IDLE(+0x976be) [0x59f10ad5d6be]
  ray-project#33 ray::IDLE(_PyRun_SimpleFileObject+0x1bb) [0x59f10aec23db]
  ray-project#34 ray::IDLE(_PyRun_AnyFileObject+0x44) [0x59f10aec1f74]
  ray-project#35 ray::IDLE(Py_RunMain+0x371) [0x59f10aebf3e1]
  ray-project#36 ray::IDLE(Py_BytesMain+0x37) [0x59f10ae8f447]
  ray-project#37 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x748a77baad90]
  ray-project#38 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x748a77baae40]
  ray-project#39 ray::IDLE(+0x1c930e) [0x59f10ae8f30e]

[getenv_preload] getenv name=OTEL_CPP_EXPORTER_OTLP_METRICS_RETRY_BACKOFF_MULTIPLIER
[getenv_preload] backtrace:
  #0 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x10a9d17) [0x7321ce3c9d17]
  ray-project#1 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x10abe2b) [0x7321ce3cbe2b]
  ray-project#2 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1050ffc) [0x7321ce370ffc]
  ray-project#3 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x104f4d7) [0x7321ce36f4d7]
  ray-project#4 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1045833) [0x7321ce365833]
  ray-project#5 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xa6c760) [0x7321cdd8c760]
  ray-project#6 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xe69d9a) [0x7321ce189d9a]
  ray-project#7 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray3rpc14ClientCallImplINS0_16HealthCheckReplyEE15OnReplyReceivedEv+0x165) [0x7321ce18c005]
  ray-project#8 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZNSt17_Function_handlerIFvvEZN3ray3rpc17ClientCallManager29PollEventsFromCompletionQueueEiEUlvE_E9_M_invokeERKSt9_Any_data+0x15) [0x7321cdd8e475]
  ray-project#9 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x103e3eb) [0x7321ce35e3eb]
  ray-project#10 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x1034f0b) [0x7321ce354f0b]
  ray-project#11 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xb6f21b) [0x7321cde8f21b]
  ray-project#12 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x15893cb) [0x7321ce8a93cb]
  ray-project#13 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158ad69) [0x7321ce8aad69]
  ray-project#14 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x158b472) [0x7321ce8ab472]
  ray-project#15 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xa6bb54) [0x7321cdd8bb54]
  ray-project#16 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xba2250) [0x7321cdec2250]
  ray-project#17 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7321cf66eac3]
  ray-project#18 /lib/x86_64-linux-gnu/libc.so.6(+0x1268d0) [0x7321cf7008d0]

*** SIGSEGV received at time=1770862205 on cpu 1 ***
PC: @     0x748a77bc5c1d  (unknown)  getenv
    @     0x748a77bc3520  (unknown)  (unknown)
{"asctime":"2026-02-11 18:10:05,910","levelname":"E","message":"*** SIGSEGV received at time=1770862205 on cpu 1 ***","filename":"logging.cc","lineno":474}
{"asctime":"2026-02-11 18:10:05,910","levelname":"E","message":"PC: @     0x748a77bc5c1d  (unknown)  getenv","filename":"logging.cc","lineno":474}
{"asctime":"2026-02-11 18:10:05,910","levelname":"E","message":"    @     0x748a77bc3520  (unknown)  (unknown)","filename":"logging.cc","lineno":474}
Fatal Python error: Segmentation fault
```

According to the backtrace, we can identify that it is the
`OtlpGrpcMetricExporterOptions`, [which called
`getenv(OTEL_CPP_EXPORTER_OTLP_METRICS_RETRY_BACKOFF_MULTIPLIER)`](https://github.com/open-telemetry/opentelemetry-cpp/blob/13ad05a6f431efb76995cffb1225d26b45374749/exporters/otlp/src/otlp_grpc_metric_exporter_options.cc#L47),
getting initialized by calling `InitOpenTelemetryExporter` in the
`metrics_agent_client_->WaitForServerReady()` callback, that causes the
issue.

This PR moves `OtlpGrpcMetricExporterOptions` into
`OpenTelemetryMetricRecorder` (so that we keep otel details
encapsulated) and moves its initialization early to `stats::Init()`, to
force the `OtlpGrpcMetricExporterOptions` to be initialized early, so
that we don't call `getenv` afterward.

---------

Signed-off-by: Rueian Huang <[email protected]>
Signed-off-by: peterxcli <[email protected]>
MengjinYan pushed a commit that referenced this pull request Feb 28, 2026
## Description
grpc 1.57.1 will call `GetEnv("GRPC_EXPERIMENTAL_PICKFIRST_LB_CONFIG")`
on every grpc channel establishment for parsing load-balancing policy.
This causes race conditions between user tasks as they are allowed to do
setenv at anytime. This PR upgrades the grpc lib to 1.58.0 to get rid of
the `GetEnv("GRPC_EXPERIMENTAL_PICKFIRST_LB_CONFIG")`.

```
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=11, threadid=129183804413504) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=11, threadid=129183804413504) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=129183804413504, signo=signo@entry=11) at ./nptl/pthread_kill.c:89
#3  0x00007580a7545476 in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
#4  <signal handler called>
#5  __pthread_kill_implementation (no_tid=0, signo=11, threadid=129183804413504) at ./nptl/pthread_kill.c:44
#6  __pthread_kill_internal (signo=11, threadid=129183804413504) at ./nptl/pthread_kill.c:78
#7  __GI___pthread_kill (threadid=129183804413504, signo=signo@entry=11) at ./nptl/pthread_kill.c:89
#8  0x00007580a7545476 in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
#9  <signal handler called>
#10 __GI_getenv (name=0x7580a6a078c2 "PC_EXPERIMENTAL_PICKFIRST_LB_CONFIG") at ./stdlib/getenv.c:84
#11 0x00007580a67e8b8a in grpc_core::GetEnv(char const*) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#12 0x00007580a649601f in grpc_core::ShufflePickFirstEnabled() () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#13 0x00007580a64960ed in grpc_core::json_detail::FinishedJsonObjectLoader<grpc_core::(anonymous namespace)::PickFirstConfig, 1ul, void>::LoadInto(grpc_core::experimental::Json const&, grpc_core::JsonArgs const&, void*, grpc_core::ValidationErrors*) const () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#14 0x00007580a6787384 in grpc_core::json_detail::LoadWrapped::LoadInto(grpc_core::experimental::Json const&, grpc_core::JsonArgs const&, void*, grpc_core::ValidationErrors*) const ()
   from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#15 0x00007580a6497b07 in grpc_core::(anonymous namespace)::PickFirstFactory::ParseLoadBalancingConfig(grpc_core::experimental::Json const&) const ()
   from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#16 0x00007580a67c18a7 in grpc_core::LoadBalancingPolicyRegistry::ParseLoadBalancingConfig(grpc_core::experimental::Json const&) const ()
   from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#17 0x00007580a66ad9b8 in grpc_core::ClientChannel::OnResolverResultChangedLocked(grpc_core::Resolver::Result) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#18 0x00007580a66ae452 in grpc_core::ClientChannel::ResolverResultHandler::ReportResult(grpc_core::Resolver::Result) ()
   from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#19 0x00007580a63bc603 in grpc_core::PollingResolver::OnRequestCompleteLocked(grpc_core::Resolver::Result) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#20 0x00007580a63bcb2d in std::_Function_handler<void (), grpc_core::PollingResolver::OnRequestComplete(grpc_core::Resolver::Result)::{lambda()#1}>::_M_invoke(std::_Any_data const&)
    () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#21 0x00007580a67cbf46 in grpc_core::WorkSerializer::WorkSerializerImpl::Run(std::function<void ()>, grpc_core::DebugLocation const&) ()
   from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#22 0x00007580a67cc0ea in grpc_core::WorkSerializer::Run(std::function<void ()>, grpc_core::DebugLocation const&) ()
   from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#23 0x00007580a63bd117 in grpc_core::PollingResolver::OnRequestComplete(grpc_core::Resolver::Result) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#24 0x00007580a63b3f86 in grpc_core::(anonymous namespace)::AresClientChannelDNSResolver::AresRequestWrapper::OnHostnameResolved(void*, absl::lts_20230802::Status) ()
   from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#25 0x00007580a67c44c4 in grpc_core::ExecCtx::Flush() () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#26 0x00007580a63408a2 in grpc_core::ExecCtx::~ExecCtx() () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#27 0x00007580a6740343 in grpc_call_start_batch () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#28 0x00007580a5e281e9 in grpc::internal::CallOpSet<grpc::internal::CallOpSendInitialMetadata, grpc::internal::CallOpSendMessage, grpc::internal::CallOpRecvInitialMetadata, grpc::internal::CallOpRecvMessage<google::protobuf::MessageLite>, grpc::internal::CallOpClientSendClose, grpc::internal::CallOpClientRecvStatus>::ContinueFillOpsAfterInterception() ()
   from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#29 0x00007580a5e2d809 in grpc::internal::BlockingUnaryCallImpl<google::protobuf::MessageLite, google::protobuf::MessageLite>::BlockingUnaryCallImpl(grpc::ChannelInterface*, grpc::inte--Type <RET> for more, q to quit, c to continue without paging--c
rnal::RpcMethod const&, grpc::ClientContext*, google::protobuf::MessageLite const&, google::protobuf::MessageLite*) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#30 0x00007580a62d76ea in opentelemetry::proto::collector::metrics::v1::MetricsService::Stub::Export(grpc::ClientContext*, opentelemetry::proto::collector::metrics::v1::ExportMetricsServiceRequest const&, opentelemetry::proto::collector::metrics::v1::ExportMetricsServiceResponse*) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#31 0x00007580a62ca40c in opentelemetry::v1::exporter::otlp::OtlpGrpcClient::DelegateExport(opentelemetry::proto::collector::metrics::v1::MetricsService::StubInterface*, std::unique_ptr<grpc::ClientContext, std::default_delete<grpc::ClientContext> >&&, std::unique_ptr<google::protobuf::Arena, std::default_delete<google::protobuf::Arena> >&&, opentelemetry::proto::collector::metrics::v1::ExportMetricsServiceRequest&&, opentelemetry::proto::collector::metrics::v1::ExportMetricsServiceResponse*) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#32 0x00007580a62c23ed in opentelemetry::v1::exporter::otlp::OtlpGrpcMetricExporter::Export(opentelemetry::v1::sdk::metrics::ResourceMetrics const&) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#33 0x00007580a62c0334 in (anonymous namespace)::OpenTelemetryMetricExporter::Export(opentelemetry::v1::sdk::metrics::ResourceMetrics const&) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#34 0x00007580a62e5fdf in opentelemetry::v1::sdk::metrics::PeriodicExportingMetricReader::CollectAndExportOnce()::{lambda()#1}::operator()() const::{lambda(opentelemetry::v1::sdk::metrics::ResourceMetrics&)#1}::operator()(opentelemetry::v1::sdk::metrics::ResourceMetrics&) const () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#35 0x00007580a62ee7a6 in opentelemetry::v1::sdk::metrics::MetricReader::Collect(opentelemetry::v1::nostd::function_ref<bool (opentelemetry::v1::sdk::metrics::ResourceMetrics&)>) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#36 0x00007580a62e5085 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<opentelemetry::v1::sdk::metrics::PeriodicExportingMetricReader::CollectAndExportOnce()::{lambda()#1}> > >::_M_run() () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#37 0x00007580a6997be0 in execute_native_thread_routine () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#38 0x00007580a7597ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#39 0x00007580a76298d0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
```

Signed-off-by: Rueian Huang <[email protected]>
kamil-kaczmarek pushed a commit that referenced this pull request Mar 3, 2026
## Description
grpc 1.57.1 will call `GetEnv("GRPC_EXPERIMENTAL_PICKFIRST_LB_CONFIG")`
on every grpc channel establishment for parsing load-balancing policy.
This causes race conditions between user tasks as they are allowed to do
setenv at anytime. This PR upgrades the grpc lib to 1.58.0 to get rid of
the `GetEnv("GRPC_EXPERIMENTAL_PICKFIRST_LB_CONFIG")`.

```
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=11, threadid=129183804413504) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=11, threadid=129183804413504) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=129183804413504, signo=signo@entry=11) at ./nptl/pthread_kill.c:89
#3  0x00007580a7545476 in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
#4  <signal handler called>
#5  __pthread_kill_implementation (no_tid=0, signo=11, threadid=129183804413504) at ./nptl/pthread_kill.c:44
#6  __pthread_kill_internal (signo=11, threadid=129183804413504) at ./nptl/pthread_kill.c:78
#7  __GI___pthread_kill (threadid=129183804413504, signo=signo@entry=11) at ./nptl/pthread_kill.c:89
#8  0x00007580a7545476 in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
#9  <signal handler called>
#10 __GI_getenv (name=0x7580a6a078c2 "PC_EXPERIMENTAL_PICKFIRST_LB_CONFIG") at ./stdlib/getenv.c:84
#11 0x00007580a67e8b8a in grpc_core::GetEnv(char const*) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#12 0x00007580a649601f in grpc_core::ShufflePickFirstEnabled() () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#13 0x00007580a64960ed in grpc_core::json_detail::FinishedJsonObjectLoader<grpc_core::(anonymous namespace)::PickFirstConfig, 1ul, void>::LoadInto(grpc_core::experimental::Json const&, grpc_core::JsonArgs const&, void*, grpc_core::ValidationErrors*) const () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#14 0x00007580a6787384 in grpc_core::json_detail::LoadWrapped::LoadInto(grpc_core::experimental::Json const&, grpc_core::JsonArgs const&, void*, grpc_core::ValidationErrors*) const ()
   from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#15 0x00007580a6497b07 in grpc_core::(anonymous namespace)::PickFirstFactory::ParseLoadBalancingConfig(grpc_core::experimental::Json const&) const ()
   from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#16 0x00007580a67c18a7 in grpc_core::LoadBalancingPolicyRegistry::ParseLoadBalancingConfig(grpc_core::experimental::Json const&) const ()
   from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#17 0x00007580a66ad9b8 in grpc_core::ClientChannel::OnResolverResultChangedLocked(grpc_core::Resolver::Result) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#18 0x00007580a66ae452 in grpc_core::ClientChannel::ResolverResultHandler::ReportResult(grpc_core::Resolver::Result) ()
   from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#19 0x00007580a63bc603 in grpc_core::PollingResolver::OnRequestCompleteLocked(grpc_core::Resolver::Result) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#20 0x00007580a63bcb2d in std::_Function_handler<void (), grpc_core::PollingResolver::OnRequestComplete(grpc_core::Resolver::Result)::{lambda()#1}>::_M_invoke(std::_Any_data const&)
    () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#21 0x00007580a67cbf46 in grpc_core::WorkSerializer::WorkSerializerImpl::Run(std::function<void ()>, grpc_core::DebugLocation const&) ()
   from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#22 0x00007580a67cc0ea in grpc_core::WorkSerializer::Run(std::function<void ()>, grpc_core::DebugLocation const&) ()
   from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#23 0x00007580a63bd117 in grpc_core::PollingResolver::OnRequestComplete(grpc_core::Resolver::Result) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#24 0x00007580a63b3f86 in grpc_core::(anonymous namespace)::AresClientChannelDNSResolver::AresRequestWrapper::OnHostnameResolved(void*, absl::lts_20230802::Status) ()
   from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#25 0x00007580a67c44c4 in grpc_core::ExecCtx::Flush() () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#26 0x00007580a63408a2 in grpc_core::ExecCtx::~ExecCtx() () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#27 0x00007580a6740343 in grpc_call_start_batch () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#28 0x00007580a5e281e9 in grpc::internal::CallOpSet<grpc::internal::CallOpSendInitialMetadata, grpc::internal::CallOpSendMessage, grpc::internal::CallOpRecvInitialMetadata, grpc::internal::CallOpRecvMessage<google::protobuf::MessageLite>, grpc::internal::CallOpClientSendClose, grpc::internal::CallOpClientRecvStatus>::ContinueFillOpsAfterInterception() ()
   from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#29 0x00007580a5e2d809 in grpc::internal::BlockingUnaryCallImpl<google::protobuf::MessageLite, google::protobuf::MessageLite>::BlockingUnaryCallImpl(grpc::ChannelInterface*, grpc::inte--Type <RET> for more, q to quit, c to continue without paging--c
rnal::RpcMethod const&, grpc::ClientContext*, google::protobuf::MessageLite const&, google::protobuf::MessageLite*) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#30 0x00007580a62d76ea in opentelemetry::proto::collector::metrics::v1::MetricsService::Stub::Export(grpc::ClientContext*, opentelemetry::proto::collector::metrics::v1::ExportMetricsServiceRequest const&, opentelemetry::proto::collector::metrics::v1::ExportMetricsServiceResponse*) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#31 0x00007580a62ca40c in opentelemetry::v1::exporter::otlp::OtlpGrpcClient::DelegateExport(opentelemetry::proto::collector::metrics::v1::MetricsService::StubInterface*, std::unique_ptr<grpc::ClientContext, std::default_delete<grpc::ClientContext> >&&, std::unique_ptr<google::protobuf::Arena, std::default_delete<google::protobuf::Arena> >&&, opentelemetry::proto::collector::metrics::v1::ExportMetricsServiceRequest&&, opentelemetry::proto::collector::metrics::v1::ExportMetricsServiceResponse*) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#32 0x00007580a62c23ed in opentelemetry::v1::exporter::otlp::OtlpGrpcMetricExporter::Export(opentelemetry::v1::sdk::metrics::ResourceMetrics const&) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#33 0x00007580a62c0334 in (anonymous namespace)::OpenTelemetryMetricExporter::Export(opentelemetry::v1::sdk::metrics::ResourceMetrics const&) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#34 0x00007580a62e5fdf in opentelemetry::v1::sdk::metrics::PeriodicExportingMetricReader::CollectAndExportOnce()::{lambda()#1}::operator()() const::{lambda(opentelemetry::v1::sdk::metrics::ResourceMetrics&)#1}::operator()(opentelemetry::v1::sdk::metrics::ResourceMetrics&) const () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#35 0x00007580a62ee7a6 in opentelemetry::v1::sdk::metrics::MetricReader::Collect(opentelemetry::v1::nostd::function_ref<bool (opentelemetry::v1::sdk::metrics::ResourceMetrics&)>) () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#36 0x00007580a62e5085 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<opentelemetry::v1::sdk::metrics::PeriodicExportingMetricReader::CollectAndExportOnce()::{lambda()#1}> > >::_M_run() () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#37 0x00007580a6997be0 in execute_native_thread_routine () from /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so
#38 0x00007580a7597ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#39 0x00007580a76298d0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
```

Signed-off-by: Rueian Huang <[email protected]>
Signed-off-by: Kamil Kaczmarek <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants