Berwyn Hoyt (0bdc35ae) at 17 Mar 07:11
Make test go/fatal_signal also cover YDBGo v2; update ASAN detectio...
Ok, it's just as well I did more testing. It turns out that ydb3n1 does still frequenly cause failures, though not all the time. This means YDBGo v2 is not immune. The reason this was not observed previously is that this test is still intermittent.
The failure is due to a random setting. I know this for certain because I can repeat the test any number of times with -num_runs n and if the first one passes, the subsequent ones always pass.
I have spent a great deal of time trying to isolate the random setting that makes the difference between pass and failure. To do this I have done a diff of settings.csh between passing and failing tests, and then I have used gtmtest -env to explicitly set every single variable that differed, to match the passing version of settings.csh. But it didn't work: the test still sometimes fails. This suggests that the critical setting is random but is not set by settings.csh.
After searching, there is such setting in the script set_asan_options_env_var.csh which sets ASAN_OPTION detect_stack_use_after_return=0 exactly 75% of the time.
To generate the failure, you must set ASAN_OPTIONS detect_stack_use_after_return=0. This always produces the ASAN error (though it does not quite always produce a coredump: maybe there's some other random setting that causes a coredump).
This is rather obscure because you'd think that the ASAN error would be produced when detect-stack-use-after-return is ON. But no: just to turn your world upside down, ASAN produces stack-use-after-scope errors only when detect-use-after-return is is OFF! (Although occasionally I have noticed other errors such as segfaults or CALLINAFTERXIT errors when detect-use-after-return is OFF). I will document this weird behaviour in the test script fatal_signal.csh.
You could not previously force this use-after-scope failure on the gtmtest command line because set_asan_options_env_var.csh trumps this variable. I will change that to let the user set gtmtest -env enable_detect_stack_use_after_return=0, and thus force this failure to occur.
The v1 issue where signal panics can illegally unstack past a CGo boundary will not be easy to fix because not only is it a tricky area of the code, it would require changes to v1 user code which would need to defer a signal handler in every goroutine, a necessity that v2 provides for and mandates.
I am by no means certain that this is a YDBGo v1 problem. However, since it has disappeared with YDBGo v2, I will make that assumption and disable threeenp1C2 ASAN testing of stack_use_after_scope, as @nars1 seems to suggest in the previous message.
@nars1 replied as follows:
Thanks for the better looking logs. Refer to sections of the log below referencing gtm_exit_handler.c:181 and ydb_set_st.c:28:
==3825155==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7845d4dff8f0 at pc 0x0000004c4599 bp 0x7845d4dff6c0 sp 0x7845d4dfee90
WRITE of size 152 at 0x7845d4dff8f0 thread T10
#0 0x4c4598 in __asan_memset (/tmp/tst/tst_V996_R203_dbg_00_260313_201116_Fbg/go_0/fatal_signal/threeenp1C2+0x4c4598) (BuildId: bf6f688f9e771e603d0e3fa2268ad9cf4a6fa2cb)
#1 0x784632176f9d in gtm_exit_handler /Distrib/YottaDB/V996_R203/sr_unix/gtm_exit_handler.c:181:20
#2 0x78463156ff15 in ydb_exit /Distrib/YottaDB/V996_R203/sr_unix/ydb_exit.c:150:3
#3 0x707874 in _cgo_3f92ad9ad48b_Cfunc_ydb_exit /tmp/go-build/cgo-gcc-prolog:106:11
#4 0x5820a3 in runtime.asmcgocall.abi0 /usr/local/go/src/runtime/asm_amd64.s:921
Address 0x7845d4dff8f0 is located in stack of thread T10 at offset 3216 in frame
#0 0x784631628b7f in ydb_set_st /Distrib/YottaDB/V996_R203/sr_unix/ydb_set_st.c:28
Where gtm_exit_handler.c contains:
179 void gtm_exit_handler(void)
180 {
181 struct sigaction act;
182 struct extcall_package_list *package_ptr;
183 boolean_t error_seen;
184 int4 actual_exi_condition;
The above stack use shows up right at function entry in gtm_exit_handler. So I suspect this is the YDBGo v1 issue that you fixed in v2.
As for why this started failing only 1 year back, it is most likely newer clang/asan packages started detecting this condition better and so this longstanding issue started getting detected only in the past year. So I would not worry about that.
Is this something that is also easily fixed in v1? If so, I would suggest please do that. If not, that is fine too. Whatever you have so far in terms of better looking asan logs is good enough.
In an email exchange I said:
I have made some progress on finding the go/fatal_signal bug with threeenp1C2. This MR does two things:
I cannot yet tell whether the problem is YDBGo v1 or YottaDB itself. I do know that in v2 I fixed a v1 issue with Go panic illegally traversing a CGo boundary, and since Go signals handers sometimes exit via panics the issue might be YDBGo v1. But that doesn't explain why this started to occur about a year ago which suggests it might be a YottaDB issue.
Unless looking at the attached asan.log gives you a clue, my next tactic would be to do a git history bisection until I found the problem.
@nars1 As for logs of both scenarios, when ASAN_OPTIONS log_path=asan.log is not set, the test log contains a core dump and just the following two lines per offending process (extracted from test log fatal_signal.log):
/tmp/tst/tst_V996_R203_dbg_00_260317_104241_rvL/go_0/fatal_signal/threeenp1C21-12.out
==4178066==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7d75531ff8f0 at pc 0x0000004c4599 bp 0x7d75531ff6c0 sp 0x7d75531fee90
However, when log_path=asan.log is set, the test log contains a core dump but no ASAN error (per fatal_signal.log). Instead of an ASAN error, a file asan.log.<pid> exists for each offending process. This contains much better information, e.g. asan.log.130974.
However, you make a good point that since these asan files are not named for each invokation of yottadb, we would lose context: especially since there is no error at all logged in the test. I can't find an ASAN option that keeps an error in the test log, nor an option that makes the test log contain the verbose information. Pity). So at this point I'll limit this change to go/fatal_signal. But when you get future ASAN issues, keep in mind the asan.log option to give you better information.
Berwyn Hoyt (58b0c80c) at 16 Mar 23:54
[#720] Fix developer friendly test environment to ignore memory lea...
... and 25 more commits
Berwyn Hoyt (4d70056b) at 16 Mar 23:54
[YDB#974] NEW r204/zroutines_wildcard-ydb974 to test that setting $...
... and 24 more commits
Ah, yes. It is indeed substring matched. That explains why yottadb worked. I have changed it to libyottadb.so alone since this does what we really want.
Ok, I have found docs that say LSAN_OPTIONS and ASAN_OPTIONS are frequently used together. The ASAN_OPTIONS can turn leak detection on and off, but LSAN_OPTIONS specifies details about how ASan does leak detection. This also matches what happens in my own testing.
Having concluded that LSAN_OPTIONS is the best way to go, I have changed it. Now the leak suppression is specific to libyottadb.so. This does mean that there is a slim chance that something else built with ASan will suddenly start failing. If so, that's a good thing as it will make us aware of other leak issues. In fact, @nars1, you may wish to run a DALL on this MR to find any such issues prior to releasing it. Please go ahead and merge this MR if you do not think a DALL is necessary.
Done.
Berwyn Hoyt (3b1b99e8) at 16 Mar 23:48
[#720] Fix developer friendly test environment to ignore memory lea...
Berwyn Hoyt (f46ea921) at 16 Mar 22:52
[#720] Fix developer friendly test environment to ignore memory lea...
Jon Badiali (23e927f8) at 16 Mar 22:00
[#787] Removed V4/V5 and AIX usages from test framework (V6 is curr...
... and 1 more commit
Jon Badiali (4d70056b) at 16 Mar 21:59
[YDB#974] NEW r204/zroutines_wildcard-ydb974 to test that setting $...
Jon Badiali (b8a23ff6) at 16 Mar 21:58
[#906] Test revisions
Narayanan Iyer (4082b816) at 16 Mar 20:44
[#435] [YDB#974] Document $ZROUTINES accepting wildcards for .so files
... and 1 more commit
Narayanan Iyer (4d70056b) at 16 Mar 20:42
[YDB#974] NEW r204/zroutines_wildcard-ydb974 to test that setting $...
Narayanan Iyer (b6e7cf62) at 16 Mar 20:42
[#974] Enhance SET $ZROUTINES to allow glob patterns for .so files
You can copy $ydb_dist to another directory and set that as your ydb_dist env var. Check com/copy_ydb_dist_dir.csh usages in YDBTest.