A Brief Analysis of the App Process Crash Mechanism
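Author: 杰杰_88
Link: https://www.jianshu.com/p/ecd52cd90a4b
Notice: This article is published with authorization from 杰杰_88. Please contact the original author for permission to repost.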
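Conclusion: an app process "crash" is not a process crash in the true sense (compare a crash in native code). The Java code throws an exception that nobody handles, and the app then kills itself.
At work I ran into a background Service that died (the "has stopped" dialog appeared) and then was not restarted for a long time. The log showed that the process was not killed after throwing FATAL EXCEPTION; it was killed and restarted only much later. Puzzled, I decided to look at what the app death flow actually is.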
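What it looks like
When an Android app process throws an exception for whatever reason and nothing catches it, the user sees a dialog saying "XX has stopped". I used to assume that the app process was already dead at that point.
What actually happens
Whenever I saw "XX has stopped", I assumed the process had ended at the same moment, and I never looked closely at the mechanism by which an app stops running. In fact, when the dialog appears the process has not fully exited yet; it really exits only when it kills itself. What follows records the whole path from an app throwing an uncaught exception to the process actually disappearing.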
App process creation
To analyze how an app process dies, first look at how it comes into being.
Key code
App process creation flow:

frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java
startResult = Process.start(entryPoint,
        app.processName, uid, uid, gids, debugFlags, mountExternal,
        app.info.targetSdkVersion, seInfo, requiredAbi, instructionSet,
        app.info.dataDir, invokeWith, entryPointArgs);
frameworks/base/core/java/android/os/ZygoteProcess.java
// ZygoteState maintains the socket connection to the Zygote process
private ZygoteState openZygoteSocketIfNeeded(String abi) throws ZygoteStartFailedEx {
    Preconditions.checkState(Thread.holdsLock(mLock), "ZygoteProcess lock not held");

    if (primaryZygoteState == null || primaryZygoteState.isClosed()) {
        try {
            primaryZygoteState = ZygoteState.connect(mSocket);
        } catch (IOException ioe) {
            throw new ZygoteStartFailedEx("Error connecting to primary zygote", ioe);
        }
    }

    if (primaryZygoteState.matches(abi)) {
        return primaryZygoteState;
    }

    // The primary zygote didn't match. Try the secondary.
    if (secondaryZygoteState == null || secondaryZygoteState.isClosed()) {
        try {
            secondaryZygoteState = ZygoteState.connect(mSecondarySocket);
        } catch (IOException ioe) {
            throw new ZygoteStartFailedEx("Error connecting to secondary zygote", ioe);
        }
    }

    if (secondaryZygoteState.matches(abi)) {
        return secondaryZygoteState;
    }

    throw new ZygoteStartFailedEx("Unsupported zygote ABI: " + abi);
}

private static Process.ProcessStartResult zygoteSendArgsAndGetResult(
        ZygoteState zygoteState, ArrayList<String> args)
        throws ZygoteStartFailedEx {
    try {
        // Throw early if any of the arguments are malformed. This means we can
        // avoid writing a partial response to the zygote.
        int sz = args.size();
        for (int i = 0; i < sz; i++) {
            if (args.get(i).indexOf('\n') >= 0) {
                throw new ZygoteStartFailedEx("embedded newlines not allowed");
            }
        }

        /**
         * See com.android.internal.os.SystemZygoteInit.readArgumentList()
         * Presently the wire format to the zygote process is:
         * a) a count of arguments (argc, in essence)
         * b) a number of newline-separated argument strings equal to count
         *
         * After the zygote process reads these it will write the pid of
         * the child or -1 on failure, followed by boolean to
         * indicate whether a wrapper process was used.
         */
        final BufferedWriter writer = zygoteState.writer;
        final DataInputStream inputStream = zygoteState.inputStream;

        writer.write(Integer.toString(args.size()));
        writer.newLine();

        for (int i = 0; i < sz; i++) {
            String arg = args.get(i);
            writer.write(arg);
            writer.newLine();
        }

        writer.flush();

        // Should there be a timeout on this?
        Process.ProcessStartResult result = new Process.ProcessStartResult();

        // Always read the entire result from the input stream to avoid leaving
        // bytes in the stream for future process starts to accidentally stumble
        // upon.
        result.pid = inputStream.readInt();
        result.usingWrapper = inputStream.readBoolean();

        if (result.pid < 0) {
            throw new ZygoteStartFailedEx("fork() failed");
        }
        return result;
    } catch (IOException ex) {
        zygoteState.close();
        throw new ZygoteStartFailedEx(ex);
    }
}
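The wire format described in the comment above (an argument count, then one argument per line, with embedded newlines rejected up front) can be sketched in plain Java. This is only an illustration under stated assumptions: the class name ZygoteWireFormat and the method encode are hypothetical and not part of AOSP, the sample arguments in main are illustrative values, and '\n' is used in place of BufferedWriter.newLine().

```java
import java.util.List;

// Hypothetical helper (not an AOSP class): builds the text that a caller
// would write to the zygote socket, following the format in the comment above.
final class ZygoteWireFormat {
    static String encode(List<String> args) {
        // Validate every argument first so that a partial request is never produced.
        for (String arg : args) {
            if (arg.indexOf('\n') >= 0) {
                throw new IllegalArgumentException("embedded newlines not allowed");
            }
        }
        StringBuilder out = new StringBuilder();
        out.append(args.size()).append('\n');   // a) the argument count
        for (String arg : args) {
            out.append(arg).append('\n');       // b) one argument per line
        }
        return out.toString();
    }

    public static void main(String[] argv) {
        // Illustrative values only; the real list is assembled by the framework.
        System.out.print(encode(List.of(
                "--nice-name=com.example.app", "android.app.ActivityThread")));
    }
}
```

Validating before writing matters because the receiver counts lines: an argument containing a newline would shift every later argument by one position.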
The command that zygoteSendArgsAndGetResult sends over the LocalSocket is received by Zygote:
frameworks/base/core/java/com/android/internal/os/ZygoteConnection.java
pid = Zygote.forkAndSpecialize(parsedArgs.uid, parsedArgs.gid, parsedArgs.gids,
        parsedArgs.debugFlags, rlimits, parsedArgs.mountExternal, parsedArgs.seInfo,
        parsedArgs.niceName, fdsToClose, fdsToIgnore, parsedArgs.instructionSet,
        parsedArgs.appDataDir);
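This is where the real app process is forked. The forked child process then executes the command:
ZygoteInit.zygoteInit(parsedArgs.targetSdkVersion, parsedArgs.remainingArgs, null /* classLoader */);
The command executed:
Execution eventually enters the main function of ActivityThread.java, and the app's lifecycle begins.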
RuntimeInit.commonInit()
In the flow above, after the app process is forked, it executes this function:
RuntimeInit.commonInit()
Inside it:
Thread.setUncaughtExceptionPreHandler(new LoggingHandler());
Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler());

/**
 * Dispatch an uncaught exception to the handler. This method is
 * intended to be called only by the runtime and by tests.
 *
 * @hide
 */
// @VisibleForTesting (would be private if not for tests)
public final void dispatchUncaughtException(Throwable e) {
    Thread.UncaughtExceptionHandler initialUeh =
            Thread.getUncaughtExceptionPreHandler();
    if (initialUeh != null) {
        try {
            initialUeh.uncaughtException(this, e);
        } catch (RuntimeException | Error ignored) {
            // Throwables thrown by the initial handler are ignored
        }
    }
    getUncaughtExceptionHandler().uncaughtException(this, e);
}
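setUncaughtExceptionPreHandler sets the "uncaught exception pre-handler" to LoggingHandler, and setDefaultUncaughtExceptionHandler sets the real "default uncaught exception handler" to KillApplicationHandler. Going by the names and by the function dispatchUncaughtException, when an exception occurs LoggingHandler handles it first and KillApplicationHandler handles it afterwards. LoggingHandler is what prints FATAL EXCEPTION and the trace: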
E AndroidRuntime: FATAL EXCEPTION: main
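The two-stage dispatch can be imitated in plain Java to make the ordering concrete. This is a minimal sketch, not framework code: the pre-handler slot is a hidden Android API, so it is modeled here as an ordinary static field, and DispatchOrderSketch with its members is a hypothetical name. The pre-handler deliberately throws to show that its failure does not stop the second handler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the pre-handler / default-handler pair.
final class DispatchOrderSketch {
    static Thread.UncaughtExceptionHandler preHandler;     // plays LoggingHandler
    static Thread.UncaughtExceptionHandler defaultHandler; // plays KillApplicationHandler

    // Same control flow as dispatchUncaughtException: pre-handler first,
    // its own failures swallowed, then the real handler.
    static void dispatch(Thread t, Throwable e) {
        if (preHandler != null) {
            try {
                preHandler.uncaughtException(t, e);
            } catch (RuntimeException | Error ignored) {
                // a failing pre-handler must not prevent the second stage
            }
        }
        defaultHandler.uncaughtException(t, e);
    }

    static List<String> run() {
        List<String> calls = new ArrayList<>();
        preHandler = (t, e) -> {
            calls.add("log: FATAL EXCEPTION: " + t.getName());
            throw new IllegalStateException("logging failed");
        };
        defaultHandler = (t, e) -> calls.add("kill after: " + e.getMessage());
        // The Thread object is only used for its name; it is never started.
        dispatch(new Thread("worker-1"), new RuntimeException("boom"));
        return calls;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```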
KillApplicationHandler:
/**
 * Handle application death from an uncaught exception. The framework
 * catches these for the main threads, so this should only matter for
 * threads created by applications. Before this method runs,
 * {@link LoggingHandler} will already have logged details.
 */
private static class KillApplicationHandler implements Thread.UncaughtExceptionHandler {
    public void uncaughtException(Thread t, Throwable e) {
        try {
            // Don't re-enter -- avoid infinite loops if crash-reporting crashes.
            if (mCrashing) return;
            mCrashing = true;

            // Try to end profiling. If a profiler is running at this point, and we kill the
            // process (below), the in-memory buffer will be lost. So try to stop, which will
            // flush the buffer. (This makes method trace profiling useful to debug crashes.)
            if (ActivityThread.currentActivityThread() != null) {
                ActivityThread.currentActivityThread().stopProfiling();
            }

            final String processName = ActivityThread.currentProcessName();
            if (processName != null) {
                if (Build.IS_USERDEBUG
                        && processName.equals(SystemProperties.get("persist.debug.process"))) {
                    Log.w(TAG, "process: " + processName + " crash message is skip");
                    return;
                }
            }

            // Bring up crash dialog, wait for it to be dismissed
            ActivityManager.getService().handleApplicationCrash(
                    mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
        } catch (Throwable t2) {
            if (t2 instanceof DeadObjectException) {
                // System process is dead; ignore
            } else {
                try {
                    Clog_e(TAG, "Error reporting crash", t2);
                } catch (Throwable t3) {
                    // Even Clog_e() fails! Oh well.
                }
            }
        } finally {
            // Try everything to make sure this process goes away.
            Process.killProcess(Process.myPid());
            System.exit(10);
        }
    }
}
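Here the following code talks to ActivityManagerService to bring up the "has stopped" dialog. Note the comment: execution continues only after the dialog is dismissed.
// Bring up crash dialog, wait for it to be dismissed
ActivityManager.getService().handleApplicationCrash(
        mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
Inside ActivityManagerService, the call eventually stops at the following code: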
AppErrors.java crashApplicationInner():
synchronized (mService) {
    /**
     * If crash is handled by instance of {@link android.app.IActivityController},
     * finish now and don't show the app error dialog.
     */
    if (handleAppCrashInActivityController(r, crashInfo, shortMsg, longMsg, stackTrace,
            timeMillis, callingPid, callingUid)) {
        return;
    }

    /**
     * If this process was running instrumentation, finish now - it will be handled in
     * {@link ActivityManagerService#handleAppDiedLocked}.
     */
    if (r != null && r.instr != null) {
        return;
    }

    // Log crash in battery stats.
    if (r != null) {
        mService.mBatteryStatsService.noteProcessCrash(r.processName, r.uid);
    }

    AppErrorDialog.Data data = new AppErrorDialog.Data();
    data.result = result;
    data.proc = r;

    // If we can't identify the process or it's already exceeded its crash quota,
    // quit right away without showing a crash dialog.
    if (r == null || !makeAppCrashingLocked(r, shortMsg, longMsg, stackTrace, data)) {
        return;
    }

    final Message msg = Message.obtain();
    msg.what = ActivityManagerService.SHOW_ERROR_UI_MSG;

    task = data.task;
    msg.obj = data;
    mService.mUiHandler.sendMessage(msg);
}

int res = result.get();
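result is of type AppErrorResult. result.get() calls wait(), which blocks the current Binder call until the matching notify arrives. The code before it shows the "has stopped" dialog, AppErrorDialog. result is passed into AppErrorDialog along with data, and when the dialog is dismissed it calls result.set(), which wakes the waiting Binder thread: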
AppErrorResult
final class AppErrorResult {
    public void set(int res) {
        synchronized (this) {
            mHasResult = true;
            mResult = res;
            notifyAll();
        }
    }

    public int get() {
        synchronized (this) {
            while (!mHasResult) {
                try {
                    wait();
                } catch (InterruptedException e) {
                }
            }
        }
        return mResult;
    }

    boolean mHasResult = false;
    int mResult;
}
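To see the blocking behavior in isolation, the same wait()/notifyAll() pattern can be run outside Android. The sketch below is an assumption-laden illustration: BlockingResultSketch, Result and demo are hypothetical names, an ordinary thread plays the Binder thread, and the caller of set() plays the dialog being dismissed.

```java
// Hypothetical demo of the AppErrorResult pattern using only the JDK.
final class BlockingResultSketch {
    /** Same wait()/notifyAll() structure as AppErrorResult, under a different name. */
    static final class Result {
        private boolean hasResult;
        private int value;

        synchronized void set(int res) {
            hasResult = true;
            value = res;
            notifyAll();            // wake every thread parked in get()
        }

        synchronized int get() {
            while (!hasResult) {    // loop guards against spurious wakeups
                try {
                    wait();
                } catch (InterruptedException ignored) {
                    // keep waiting, as the original does
                }
            }
            return value;
        }
    }

    /** Returns { 1 if the waiter was still blocked before set(), value the waiter received }. */
    static int[] demo(int dialogResult) {
        Result result = new Result();
        int[] received = { -1 };
        Thread binderThread = new Thread(() -> { received[0] = result.get(); }, "binder-thread");
        binderThread.start();
        pause(200);                                   // nobody has dismissed the "dialog" yet
        int stillBlocked = binderThread.isAlive() ? 1 : 0;
        result.set(dialogResult);                     // the "dialog" is dismissed
        try {
            binderThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { stillBlocked, received[0] };
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int[] r = demo(1);
        System.out.println("blocked before set(): " + (r[0] == 1) + ", result: " + r[1]);
    }
}
```

The waiting thread cannot finish until set() is called, which is why the crashing app's Binder call stays pending for as long as the dialog is on screen.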
Processing then continues, and only after the Binder call returns does the app process finally kill itself:
finally {
    // Try everything to make sure this process goes away.
    Process.killProcess(Process.myPid());
    System.exit(10);
}
Note that in the AppErrorDialog constructor:
// After the timeout, pretend the user clicked the quit button
mHandler.sendMessageDelayed(
        mHandler.obtainMessage(TIMEOUT),
        DISMISS_TIMEOUT);
If the user never responds, the call returns after 5 minutes. Watch for the following log:
Slog.w(TAG, "handleApplicationStrictModeViolation; res=" + res);
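The "pretend the user clicked quit after a timeout" idea can be approximated with the JDK alone. This is a plain-Java analogue, not the Handler-based code in AppErrorDialog: DialogTimeoutSketch, showDialog, waitForDismiss and the two result constants are hypothetical, and CompletableFuture.completeOnTimeout (Java 9 and later) stands in for sendMessageDelayed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical analogue of a dialog that dismisses itself after a timeout.
final class DialogTimeoutSketch {
    static final int FORCE_QUIT = 1;  // illustrative result codes, not the
    static final int RESTART = 2;     // framework's actual constants

    // "Shows" the dialog: the future completes with FORCE_QUIT on its own
    // once the timeout elapses, unless something completes it earlier.
    static CompletableFuture<Integer> showDialog(long timeoutMillis) {
        return new CompletableFuture<Integer>()
                .completeOnTimeout(FORCE_QUIT, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Blocks the caller until the dialog has a result, like result.get().
    static int waitForDismiss(CompletableFuture<Integer> dialog) {
        return dialog.join();
    }

    public static void main(String[] args) {
        // Nobody clicks: the caller is released only when the timeout fires.
        System.out.println("no click -> " + waitForDismiss(showDialog(300)));

        // The user clicks first: the earlier completion wins.
        CompletableFuture<Integer> dialog = showDialog(300);
        dialog.complete(RESTART);
        System.out.println("click    -> " + waitForDismiss(dialog));
    }
}
```

With a five-minute timeout in place of the 300 ms used here, the first case is the situation described in this article: the blocked caller waits out the full timeout.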
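If the call returns only after the timeout, the app process lives on in the crashed state for as long as 5 minutes. Apart from the thread that threw, the other threads keep working, so strange things may happen. The process should have died and been restarted, but because it has not been killed, ActivityManagerService does not receive the binderDied notification, so the restart does not happen until the timeout expires.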
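The lingering state is easy to reproduce in a plain JVM. In this sketch, which rests on several assumptions, a per-thread uncaught exception handler blocks on a latch in place of the blocking handleApplicationCrash() Binder call, while a second thread keeps counting. All names are hypothetical, and a per-thread handler is used rather than the default one so that the demo does not change global state.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: one thread has "crashed" and sits in its handler,
// yet another thread in the same process keeps doing work.
final class LingeringProcessSketch {
    static int ticksWhileHandlerBlocked(long blockMillis) {
        CountDownLatch handlerEntered = new CountDownLatch(1);
        CountDownLatch dialogDismissed = new CountDownLatch(1);
        AtomicInteger ticks = new AtomicInteger();

        Thread crasher = new Thread(() -> {
            throw new IllegalStateException("boom");
        }, "crasher");
        crasher.setUncaughtExceptionHandler((t, e) -> {
            handlerEntered.countDown();
            await(dialogDismissed);  // stands in for the blocked Binder call
            // a real handler would kill the process at this point
        });

        Thread worker = new Thread(() -> {
            await(handlerEntered);   // count only while the handler is blocked
            while (dialogDismissed.getCount() > 0) {
                ticks.incrementAndGet();
                pause(10);
            }
        }, "worker");

        crasher.start();
        worker.start();
        await(handlerEntered);
        pause(blockMillis);          // the "dialog" stays on screen
        int observed = ticks.get();
        dialogDismissed.countDown(); // the "dialog" is dismissed
        join(crasher);
        join(worker);
        return observed;
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("worker ticks while the crash handler was blocked: "
                + ticksWhileHandlerBlocked(200));
    }
}
```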