-
Notifications
You must be signed in to change notification settings - Fork 5.3k
[CoreCLR][Signal] Bump shutdown notif and crashdump before prev handler #123735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Adjusts CoreCLR’s signal chaining so shutdown notification and crash dump creation happen before invoking a previously-registered signal handler, ensuring these steps still run when the prior handler doesn’t return (notably on Android).
Changes:
- Reorders
PROCNotifyProcessShutdownandPROCCreateCrashDumpIfEnabledto run before chaining to the prior sigaction handler. - Adds an assertion to document/enforce that the “custom handler” path isn’t reached for
SIG_DFL/SIG_IGN.
|
|
||
| _ASSERTE(!IsSigDfl(action) && !IsSigIgn(action)); | ||
|
|
||
| PROCNotifyProcessShutdown(IsRunningOnAlternateStack(context)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think this works well when there are multiple runtimes loaded in the process that all want to handle segfaults. It can be multiple .NET runtimes (e.g. CoreCLR + NAOT, or multiple NAOT), or it can be .NET and some other runtime (e.g. Java runtime).
The expected behavior in these situations is that the given runtime will check whether the signal happened in the code that it cares about. If yes, it will handle the signal. If no, it will forward the signal to the next runtime, and so on.
With this change, I think we will shutdown our runtime instance and generate crashdump if there is segfault gracefully handled by some other runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gracefully handled by another runtime, as in sa_sigaction/sa_handler? Wouldn't those still hit PROCNotifyProcessShutdown/PROCCreateCrashDumpIfEnabled in the original implementation? Or do they somehow return from invoke_previous_action
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the signal is handled in some other runtime, the handler registered by that runtime would not return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so given that we cannot tell how the other runtime will handle the signal, would it make sense then to pivot to an opt-in config switch that allows triggering a shutdown/create dump, even if the other runtime handled it gracefully?
I am not sure yet if there is a way for Android CoreCLR to not have a previously registered signal handler, so in those cases we wouldn't ever create crash dumps for signals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know what the handlers that are registered on Android before us do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have the expertise to discover exactly what handlers are reigstered, but with Copilot CLI's help I think I was able to connect them to android runtime's stuff
01-27 18:34:20.359 30541 30565 F DOTNET : CoreCLR: previous handler for
01-27 18:34:20.359 30541 30565 F DOTNET : SIGILL
01-27 18:34:20.359 30541 30565 F DOTNET : :
01-27 18:34:20.359 30541 30565 F DOTNET : handler=
01-27 18:34:20.359 30541 30565 F DOTNET : 0x0000746225158743
01-27 18:34:20.365 30541 30565 F DOTNET : dladdr failed
01-27 18:34:20.371 30541 30565 F DOTNET : /proc/self/maps:
01-27 18:34:20.371 30541 30565 F DOTNET : 7462250ab000-7462251e4000 r-xp 00044000 07:c0 14 /apex/com.android.runtime/bin/linker64
01-27 18:34:20.372 30541 30565 F DOTNET : CoreCLR: previous handler for
01-27 18:34:20.375 30541 30565 F DOTNET : SIGFPE
01-27 18:34:20.375 30541 30565 F DOTNET : :
01-27 18:34:20.375 30541 30565 F DOTNET : handler=
01-27 18:34:20.375 30541 30565 F DOTNET : 0x0000746225158743
01-27 18:34:20.375 30541 30565 F DOTNET : dladdr failed
01-27 18:34:20.378 30541 30565 F DOTNET : /proc/self/maps:
01-27 18:34:20.378 30541 30565 F DOTNET : 7462250ab000-7462251e4000 r-xp 00044000 07:c0 14 /apex/com.android.runtime/bin/linker64
01-27 18:34:20.379 30541 30565 F DOTNET : CoreCLR: previous handler for
01-27 18:34:20.379 30541 30565 F DOTNET : SIGBUS
01-27 18:34:20.379 30541 30565 F DOTNET : :
01-27 18:34:20.379 30541 30565 F DOTNET : handler=
01-27 18:34:20.379 30541 30565 F DOTNET : 0x0000746204cf1c90
01-27 18:34:20.379 30541 30565 F DOTNET : module_base=
01-27 18:34:20.379 30541 30565 F DOTNET : 0x0000746204a81000
01-27 18:34:20.379 30541 30565 F DOTNET : module=
01-27 18:34:20.380 30541 30565 F DOTNET : /system/lib64/libandroid_runtime.so
01-27 18:34:20.380 30541 30565 F DOTNET : symbol=
01-27 18:34:20.380 30541 30565 F DOTNET : _ZN5incfs13SignalHandler7handlerEiP7siginfoPv
01-27 18:34:20.380 30541 30565 F DOTNET : symbol_addr=
01-27 18:34:20.380 30541 30565 F DOTNET : 0x0000746204cf1c90
01-27 18:34:20.381 30541 30565 F DOTNET : /proc/self/maps:
01-27 18:34:20.381 30541 30565 F DOTNET : 746204b5d000-746204d14000 r-xp 000dc000 fe:05 16575744 /system/lib64/libandroid_runtime.so
01-27 18:34:20.381 30541 30565 F DOTNET : CoreCLR: previous handler for
01-27 18:34:20.381 30541 30565 F DOTNET : SIGABRT
01-27 18:34:20.381 30541 30565 F DOTNET : :
01-27 18:34:20.381 30541 30565 F DOTNET : handler=
01-27 18:34:20.381 30541 30565 F DOTNET : 0x0000746225158743
01-27 18:34:20.382 30541 30565 F DOTNET : dladdr failed
01-27 18:34:20.385 30541 30565 F DOTNET : /proc/self/maps:
01-27 18:34:20.385 30541 30565 F DOTNET : 7462250ab000-7462251e4000 r-xp 00044000 07:c0 14 /apex/com.android.runtime/bin/linker64
01-27 18:34:20.385 30541 30565 F DOTNET : CoreCLR: previous handler for
01-27 18:34:20.385 30541 30565 F DOTNET : SIGINT
01-27 18:34:20.385 30541 30565 F DOTNET : :
01-27 18:34:20.385 30541 30565 F DOTNET : SIG_DFL
01-27 18:34:20.385 30541 30565 F DOTNET : CoreCLR: previous handler for
01-27 18:34:20.385 30541 30565 F DOTNET : SIGQUIT
01-27 18:34:20.385 30541 30565 F DOTNET : :
01-27 18:34:20.386 30541 30565 F DOTNET : SIG_DFL
01-27 18:34:20.386 30541 30565 F DOTNET : CoreCLR: previous handler for
01-27 18:34:20.386 30541 30565 F DOTNET : SIGTRAP
01-27 18:34:20.386 30541 30565 F DOTNET : :
01-27 18:34:20.386 30541 30565 F DOTNET : handler=
01-27 18:34:20.386 30541 30565 F DOTNET : 0x0000746225158743
01-27 18:34:20.386 30541 30565 F DOTNET : dladdr failed
01-27 18:34:20.393 30541 30565 F DOTNET : /proc/self/maps:
01-27 18:34:20.393 30541 30565 F DOTNET : 7462250ab000-7462251e4000 r-xp 00044000 07:c0 14 /apex/com.android.runtime/bin/linker64
01-27 18:34:20.393 30541 30565 F DOTNET : CoreCLR: previous handler for
01-27 18:34:20.393 30541 30565 F DOTNET : SIGSEGV
01-27 18:34:20.393 30541 30565 F DOTNET : :
01-27 18:34:20.393 30541 30565 F DOTNET : handler=
01-27 18:34:20.393 30541 30565 F DOTNET : 0x0000746225158743
01-27 18:34:20.393 30541 30565 F DOTNET : dladdr failed
01-27 18:34:20.395 30541 30565 F DOTNET : /proc/self/maps:
01-27 18:34:20.395 30541 30565 F DOTNET : 7462250ab000-7462251e4000 r-xp 00044000 07:c0 14 /apex/com.android.runtime/bin/linker64
01-27 18:34:20.395 30541 30565 F DOTNET : CoreCLR: previous handler for
01-27 18:34:20.395 30541 30565 F DOTNET : SIGTERM
01-27 18:34:20.395 30541 30565 F DOTNET : :
01-27 18:34:20.395 30541 30565 F DOTNET : SIG_DFL
01-27 18:34:20.395 30541 30565 F DOTNET : CoreCLR: previous handler for
01-27 18:34:20.395 30541 30565 F DOTNET : INJECT_ACTIVATION_SIGNAL
01-27 18:34:20.395 30541 30565 F DOTNET : :
01-27 18:34:20.395 30541 30565 F DOTNET : SIG_DFL
@grendello, would you happen to know how to tell what signal handlers might be registered by default on Android?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make sense then to pivot to an opt-in config switch that allows triggering a shutdown/create dump, even if the other runtime handled it gracefully?
I wonder if for such opt-in, we should just create a dump and instead of shutting down proceed to the other registered handler to be a good citizen of the Android ecosystem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if for such opt-in, we should just create a dump and instead of shutting down proceed to the other registered handler to be a good citizen of the Android ecosystem?
That can lead to things slowing down to crawl if the other runtime sends and handles multiple instances of the signal over a short period of time. There is no clear winner here.
I agree that this should be an opt-in.
That way AndroidSDK/AndroidAppBuilder may opt-in at build-time.
We may want to enumerate the scenarios for collecting a crash dump. For example, is collecting a crash dump of a appstore-installed app on retail device something we want to enable?
The previously registered signal action/handler aren't guaranteed to return, so we lose out on notifying shutdown and creating a dump in those cases. Specifically, PROCCreateCrashDumpIfEnabled would be the last chance to provide the managed context for the thread that crashed.
e.g. On Android CoreCLR, it seems that, by default, signal handlers are already registered by Android's runtime (/apex/com.android.runtime/bin/linker64 + /system/lib64/libandroid_runtime.so). Whenever an unhandled synchronous fault occurs, the previously registered handler will not return back to invoke_previous_action and aborts the thread itself, so PROCCreateCrashDumpIfEnabled will not be hit.
Sigsegv behavior Android CoreCLR vs other platforms
Android CoreCLR
When intentionally writing to NULL (sigsegv) on Android CoreCLR, the previously registered signal handler goes down this path
runtime/src/coreclr/pal/src/exception/signal.cpp
Line 454 in 40e8c73
MacOS/Linux/NativeAOT(linux)
On MacOS, Linux, NativeAOT (Only checked linux at time of writing), the same intentional SIGSEGV will hit
runtime/src/coreclr/pal/src/exception/signal.cpp
Lines 431 to 448 in 40e8c73
History investigation
From a github history dive, I didn't spot anything in particular requiring the previous signal handler to be invoked before PROCNotifyProcessShutdown + PROCCreateCrashDumpIfEnabled.
PROCNotifyProcessShutdown was first introduced in 1433c3f. It doesn't seem to state a particular reason for invoking it after the previous signal handler.
PROCCreateCrashDumpIfEnabled was added to signal.cpp in 7f9bd2c because the PROCNotifyProcessShutdown didn't create a crash dump. It doesn't state any particular reason for being invoked after the previously registered signal handler, and was probably just placed next to PROCNotifyProcessShutdown.
invoke_previous_actionwas introduced in a740f65 and was refactoring while maintaining the order.Android CoreCLR behavior after swapping order
Locally, I have POC changes to emit managed callstacks in Android's PROCCreateCrashDumpIfEnabled.
Now theres a window to log managed callstacks before the original signal handler aborts and triggers a tombstone.
Android Mono behavior
Mono provides two embeddings APIs to configure signal and crash chaining
runtime/src/mono/mono/mini/driver.c
Lines 2864 to 2894 in 61d3943
runtime/src/mono/mono/mini/mini-runtime.c
Lines 3892 to 3903 in 61d3943
runtime/src/mono/mono/mini/mini-posix.c
Lines 193 to 210 in 61d3943
runtime/src/mono/mono/mini/mini-exceptions.c
Lines 2992 to 3012 in 61d3943
Alternatives
If there is any particular reason to preserve the order of sa_sigaction/sa_handler with respect to PROCNotifyProcessShutdown and PROCCreateCrashDumpIfEnabled for CoreCLR, a config knob can be added to allow Android CoreCLR to opt into the swapped ordering behavior. This may be in the form of config property key/values
runtime/src/coreclr/dlls/mscoree/exports.cpp
Lines 237 to 238 in 54ca569
clrconfigvalues. That way AndroidSDK/AndroidAppBuilder may opt-in at build-time.Given that the history of the ordering didn't reveal any problems with swapping the order, we can fallback to this behavior if the order swap causes problems down the line.
The other way around is more restrictive. Should we first introduce all the overhead to enable an opt-in/opt-out config knob, and later discover that no platforms need to invoke their previous handlers before PROCNotifyProcessShutdown/PROCCreateCrashDumpIfEnabled, it seems harder to justify removing the knob.