Input系统—ANR原理分析

基于Android 6.0源码，分析Input事件发生ANR的原理

一. 概述

当input事件处理得慢就会触发ANR，那ANR内部原理是什么，哪些场景会产生ANR呢。 “工欲善其事必先利其器”，为了理解input ANR原理，前面几篇文章疏通了整个input框架的处理流程，都是为了这篇文章而做铺垫。在正式开始分析ANR触发原理以及触发场景之前，先来回顾一下input流程。

1.1 InputReader

点击查看大图:

input_reader_seq

InputReader的主要工作分两部分：

调用EventHub的getEvents()读取节点/dev/input的input_event结构体转换成RawEvent结构体，RawEvent根据不同InputMapper来转换成相应的EventEntry,比如按键事件则对应KeyEntry,触摸事件则对应MotionEntry。
- 转换结果：input_event -> EventEntry;
将事件添加到mInboundQueue队列尾部，加入该队列前有以下两个过滤：
- IMS.interceptKeyBeforeQueueing：事件分发前可增加业务逻辑；
- IMS.filterInputEvent：可拦截事件，当返回值为false的事件都直接拦截，没有机会加入mInboundQueue队列，不会再往下分发；否则进入下一步;
- enqueueInboundEventLocked：该事件放入mInboundQueue队列尾部;
- mLooper->wake:并根据情况来唤醒InputDispatcher线程.
KeyboardInputMapper.processKey()的过程, 记录下按下down事件的时间点.

1.2 InputDispatcher

点击查看大图:

input_dispatcher_seq

dispatchOnceInnerLocked(): 从InputDispatcher的mInboundQueue队列，取出事件EventEntry。另外该方法开始执行的时间点(currentTime)便是后续事件dispatchEntry的分发时间(deliveryTime）
dispatchKeyLocked()：满足一定条件时会添加命令doInterceptKeyBeforeDispatchingLockedInterruptible；
enqueueDispatchEntryLocked()：生成事件DispatchEntry并加入connection的outbound队列
startDispatchCycleLocked()：从outboundQueue中取出事件DispatchEntry, 重新放入connection的waitQueue队列；
runCommandsLockedInterruptible()：通过循环遍历地方式，依次处理mCommandQueue队列中的所有命令。而mCommandQueue队列中的命令是通过postCommandLocked()方式向该队列添加的。ANR回调命令便是在这个时机执行。
handleTargetsNotReadyLocked(): 该过程会判断是否等待超过5s来决定是否调用onANRLocked().

流程15中sendMessage是将input事件分发到app端,当app处理完该事件后会发送finishInputEvent()事件. 接下来又回到pollOnce()方法.

1.3 UI Thread

input_ui

“InputDispatcher”线程监听socket服务端，收到消息后回调InputDispatcher.handleReceiveCallback()；
UI主线程监听socket客户端，收到消息后回调NativeInputEventReceiver.handleEvent().

对于ANR的触发主要是在InputDispatcher过程，下面再从ANR的角度来说一说ANR触发过程。

二. ANR处理流程

ANR时间区间便是指当前这次的事件dispatch过程中执行findFocusedWindowTargetsLocked()方法到下一次执行resetANRTimeoutsLocked()的时间区间. 以下5个时机会reset. 都位于InputDispatcher.cpp文件:

resetAndDropEverythingLocked
releasePendingEventLocked
setFocusedApplication
dispatchOnceInnerLocked
setInputDispatchMode

简单来说, 主要是以下4个场景,会有机会执行resetANRTimeoutsLocked:

解冻屏幕, 系统开/关机的时刻点 (thawInputDispatchingLw, setEventDispatchingLw)
wms聚焦app的改变 (WMS.setFocusedApp, WMS.removeAppToken)
设置input filter的过程 (IMS.setInputFilter)
再次分发事件的过程(dispatchOnceInnerLocked)

当InputDispatcher线程 findFocusedWindowTargetsLocked()过程调用到handleTargetsNotReadyLocked，且满足超时5s的情况则会调用onANRLocked().

2.1 onANRLocked

[-> InputDispatcher.cpp]

void InputDispatcher::onANRLocked(
        nsecs_t currentTime, const sp<InputApplicationHandle>& applicationHandle,
        const sp<InputWindowHandle>& windowHandle,
        nsecs_t eventTime, nsecs_t waitStartTime, const char* reason) {
    float dispatchLatency = (currentTime - eventTime) * 0.000001f;
    float waitDuration = (currentTime - waitStartTime) * 0.000001f;

    ALOGI("Application is not responding: %s.  "
            "It has been %0.1fms since event, %0.1fms since wait started.  Reason: %s",
            getApplicationWindowLabelLocked(applicationHandle, windowHandle).string(),
            dispatchLatency, waitDuration, reason);

    //捕获ANR的现场信息
    time_t t = time(NULL);
    struct tm tm;
    localtime_r(&t, &tm);
    char timestr[64];
    strftime(timestr, sizeof(timestr), "%F %T", &tm);
    mLastANRState.clear();
    mLastANRState.append(INDENT "ANR:\n");
    mLastANRState.appendFormat(INDENT2 "Time: %s\n", timestr);
    mLastANRState.appendFormat(INDENT2 "Window: %s\n",
            getApplicationWindowLabelLocked(applicationHandle, windowHandle).string());
    mLastANRState.appendFormat(INDENT2 "DispatchLatency: %0.1fms\n", dispatchLatency);
    mLastANRState.appendFormat(INDENT2 "WaitDuration: %0.1fms\n", waitDuration);
    mLastANRState.appendFormat(INDENT2 "Reason: %s\n", reason);
    dumpDispatchStateLocked(mLastANRState);

    //将ANR命令加入mCommandQueue
    CommandEntry* commandEntry = postCommandLocked(
            & InputDispatcher::doNotifyANRLockedInterruptible);
    commandEntry->inputApplicationHandle = applicationHandle;
    commandEntry->inputWindowHandle = windowHandle;
    commandEntry->reason = reason;
}

发生ANR调用onANRLocked()的过程会将doNotifyANRLockedInterruptible加入mCommandQueue。在下一轮InputDispatcher.dispatchOnce的过程中会先执行runCommandsLockedInterruptible()方法，取出 mCommandQueue队列的所有命令逐一执行。那么ANR所对应的命令doNotifyANRLockedInterruptible，接下来看该方法。

3.2 doNotifyANRLockedInterruptible

[-> InputDispatcher.cpp]

void InputDispatcher::doNotifyANRLockedInterruptible(
        CommandEntry* commandEntry) {
    mLock.unlock();

    //[见小节3.3]
    nsecs_t newTimeout = mPolicy->notifyANR(
            commandEntry->inputApplicationHandle, commandEntry->inputWindowHandle,
            commandEntry->reason);

    mLock.lock();
    //newTimeout =5s [见小节3.8]
    resumeAfterTargetsNotReadyTimeoutLocked(newTimeout,
            commandEntry->inputWindowHandle != NULL
                    ? commandEntry->inputWindowHandle->getInputChannel() : NULL);
}

mPolicy是指NativeInputManager

3.3 NativeInputManager.notifyANR

[-> com_android_server_input_InputManagerService.cpp]

nsecs_t NativeInputManager::notifyANR(const sp<InputApplicationHandle>& inputApplicationHandle,
        const sp<InputWindowHandle>& inputWindowHandle, const String8& reason) {
    JNIEnv* env = jniEnv();

    jobject inputApplicationHandleObj =
            getInputApplicationHandleObjLocalRef(env, inputApplicationHandle);
    jobject inputWindowHandleObj =
            getInputWindowHandleObjLocalRef(env, inputWindowHandle);
    jstring reasonObj = env->NewStringUTF(reason.string());

    //调用Java方法[见小节3.4]
    jlong newTimeout = env->CallLongMethod(mServiceObj,
                gServiceClassInfo.notifyANR, inputApplicationHandleObj, inputWindowHandleObj,
                reasonObj);
    if (checkAndClearExceptionFromCallback(env, "notifyANR")) {
        newTimeout = 0; //抛出异常,则清理并重置timeout
    }
    ...
    return newTimeout;
}

先看看register_android_server_InputManager过程：

int register_android_server_InputManager(JNIEnv* env) {
    int res = jniRegisterNativeMethods(env, "com/android/server/input/InputManagerService",
            gInputManagerMethods, NELEM(gInputManagerMethods));

    jclass clazz;
    FIND_CLASS(clazz, "com/android/server/input/InputManagerService");
    ...
    GET_METHOD_ID(gServiceClassInfo.notifyANR, clazz,
            "notifyANR",
            "(Lcom/android/server/input/InputApplicationHandle;Lcom/android/server/input/InputWindowHandle;Ljava/lang/String;)J");
    ...
}

可知gServiceClassInfo.notifyANR是指IMS.notifyANR

3.4 IMS.notifyANR

[-> InputManagerService.java]

private long notifyANR(InputApplicationHandle inputApplicationHandle,
        InputWindowHandle inputWindowHandle, String reason) {
    //[见小节3.5]
    return mWindowManagerCallbacks.notifyANR(
            inputApplicationHandle, inputWindowHandle, reason);
}

此处mWindowManagerCallbacks是指InputMonitor对象。

3.5 InputMonitor.notifyANR

[-> InputMonitor.java]

public long notifyANR(InputApplicationHandle inputApplicationHandle,
        InputWindowHandle inputWindowHandle, String reason) {
    AppWindowToken appWindowToken = null;
    WindowState windowState = null;
    boolean aboveSystem = false;
    synchronized (mService.mWindowMap) {
        if (inputWindowHandle != null) {
            windowState = (WindowState) inputWindowHandle.windowState;
            if (windowState != null) {
                appWindowToken = windowState.mAppToken;
            }
        }
        if (appWindowToken == null && inputApplicationHandle != null) {
            appWindowToken = (AppWindowToken)inputApplicationHandle.appWindowToken;
        }
        //输出input事件分发超时log
        if (windowState != null) {
            Slog.i(WindowManagerService.TAG, "Input event dispatching timed out "
                    + "sending to " + windowState.mAttrs.getTitle()
                    + ".  Reason: " + reason);
            int systemAlertLayer = mService.mPolicy.windowTypeToLayerLw(
                    WindowManager.LayoutParams.TYPE_SYSTEM_ALERT);
            aboveSystem = windowState.mBaseLayer > systemAlertLayer;
        } else if (appWindowToken != null) {
            Slog.i(WindowManagerService.TAG, "Input event dispatching timed out "
                    + "sending to application " + appWindowToken.stringName
                    + ".  Reason: " + reason);
        } else {
            Slog.i(WindowManagerService.TAG, "Input event dispatching timed out "
                    + ".  Reason: " + reason);
        }
        mService.saveANRStateLocked(appWindowToken, windowState, reason);
    }

    if (appWindowToken != null && appWindowToken.appToken != null) {
        //【见小节3.6.1】
        boolean abort = appWindowToken.appToken.keyDispatchingTimedOut(reason);
        if (! abort) {
            return appWindowToken.inputDispatchingTimeoutNanos; //5s
        }
    } else if (windowState != null) {
        //【见小节3.6.2】
        long timeout = ActivityManagerNative.getDefault().inputDispatchingTimedOut(
                windowState.mSession.mPid, aboveSystem, reason);
        if (timeout >= 0) {
            return timeout * 1000000L; //5s
        }
    }
    return 0;
}

发生input相关的ANR时在system log输出ANR信息,并且tag为WindowManager. 主要有3类log:

Input event dispatching timed out sending to [windowState.mAttrs.getTitle()]
Input event dispatching timed out sending to application [appWindowToken.stringName)]
Input event dispatching timed out sending.

3.6 DispatchingTimedOut

3.6.1 Token.keyDispatchingTimedOut

[-> ActivityRecord.java :: Token]

final class ActivityRecord {

    static class Token extends IApplicationToken.Stub {

        public boolean keyDispatchingTimedOut(String reason) {
            ActivityRecord r;
            ActivityRecord anrActivity;
            ProcessRecord anrApp;
            synchronized (mService) {
                r = tokenToActivityRecordLocked(this);
                if (r == null) {
                    return false;
                }
                anrActivity = r.getWaitingHistoryRecordLocked();
                anrApp = r != null ? r.app : null;
            }
            //[见小节3.7]
            return mService.inputDispatchingTimedOut(anrApp, anrActivity, r, false, reason);
        }
        ...
    }
}

3.6.2 AMS.inputDispatchingTimedOut

public long inputDispatchingTimedOut(int pid, final boolean aboveSystem, String reason) {
    ...
    ProcessRecord proc;
    long timeout;
    synchronized (this) {
        synchronized (mPidsSelfLocked) {
            proc = mPidsSelfLocked.get(pid); //根据pid查看进程record
        }
        timeout = getInputDispatchingTimeoutLocked(proc);
    }
    //【见小节3.7】
    if (!inputDispatchingTimedOut(proc, null, null, aboveSystem, reason)) {
        return -1;
    }

    return timeout;
}

inputDispatching的超时为KEY_DISPATCHING_TIMEOUT，即timeout = 5s。

3.7 AMS.inputDispatchingTimedOut

public boolean inputDispatchingTimedOut(final ProcessRecord proc,
        final ActivityRecord activity, final ActivityRecord parent,
        final boolean aboveSystem, String reason) {
    ...
    final String annotation;
    if (reason == null) {
        annotation = "Input dispatching timed out";
    } else {
        annotation = "Input dispatching timed out (" + reason + ")";
    }

    if (proc != null) {
        ...
        //通过handler机制，交由“ActivityManager”线程执行ANR处理过程。
        mHandler.post(new Runnable() {
            public void run() {
                appNotResponding(proc, activity, parent, aboveSystem, annotation);
            }
        });
    }
    return true;
}

appNotResponding会输出现场的重要进程的trace等信息。再回到【小节3.2】处理完ANR后再调用resumeAfterTargetsNotReadyTimeoutLocked。

3.8 resumeAfterTargetsNotReadyTimeoutLocked

[-> InputDispatcher.cpp]

void InputDispatcher::resumeAfterTargetsNotReadyTimeoutLocked(nsecs_t newTimeout,
        const sp<InputChannel>& inputChannel) {
    if (newTimeout > 0) {
        //超时时间增加5s
        mInputTargetWaitTimeoutTime = now() + newTimeout;
    } else {
        // Give up.
        mInputTargetWaitTimeoutExpired = true;

        // Input state will not be realistic.  Mark it out of sync.
        if (inputChannel.get()) {
            ssize_t connectionIndex = getConnectionIndexLocked(inputChannel);
            if (connectionIndex >= 0) {
                sp<Connection> connection = mConnectionsByFd.valueAt(connectionIndex);
                sp<InputWindowHandle> windowHandle = connection->inputWindowHandle;

                if (windowHandle != NULL) {
                    const InputWindowInfo* info = windowHandle->getInfo();
                    if (info) {
                        ssize_t stateIndex = mTouchStatesByDisplay.indexOfKey(info->displayId);
                        if (stateIndex >= 0) {
                            mTouchStatesByDisplay.editValueAt(stateIndex).removeWindow(
                                    windowHandle);
                        }
                    }
                }

                if (connection->status == Connection::STATUS_NORMAL) {
                    CancelationOptions options(CancelationOptions::CANCEL_ALL_EVENTS,
                            "application not responding");
                    synthesizeCancelationEventsForConnectionLocked(connection, options);
                }
            }
        }
    }
}

四. input死锁监测机制

4.1 IMS.start

[-> InputManagerService.java]

public void start() {
    ...
    Watchdog.getInstance().addMonitor(this);
    ...
}

InputManagerService实现了Watchdog.Monitor接口, 并且在启动过程将自己加入到了Watchdog线程的monitor队列.

4.2 IMS.monitor

Watchdog便会定时调用IMS.monitor()方法.

public void monitor() {
    synchronized (mInputFilterLock) { }
    nativeMonitor(mPtr);
}

nativeMonitor经过JNI调用,进如如下方法:

static void nativeMonitor(JNIEnv*, jclass, jlong ptr) {
    NativeInputManager* im = reinterpret_cast<NativeInputManager*>(ptr);
    im->getInputManager()->getReader()->monitor(); //见小节4.3
    im->getInputManager()->getDispatcher()->monitor(); //见小节4.4
}

4.3 InputReader.monitor

[-> InputReader.cpp]

void InputReader::monitor() {
    //请求和释放一次mLock,来确保reader没有发生死锁的问题
    mLock.lock();
    mEventHub->wake();
    mReaderIsAliveCondition.wait(mLock);
    mLock.unlock();

    //监测EventHub[见小节4.3.1]
    mEventHub->monitor();
}

获取mLock之后进入Condition类型的wait()方法,等待InputReader线程的loopOnce()中的broadcast()来唤醒.

void InputReader::loopOnce() {
    size_t count = mEventHub->getEvents(timeoutMillis, mEventBuffer, EVENT_BUFFER_SIZE);
    ...
    {
        AutoMutex _l(mLock);
        mReaderIsAliveCondition.broadcast();
        if (count) {
            processEventsLocked(mEventBuffer, count);
        }
    }
    ...
    mQueuedListener->flush();
}

4.3.1 EventHub.monitor

[-> EventHub.cpp]

void EventHub::monitor() {
    //请求和释放一次mLock,来确保reader没有发生死锁的问题
    mLock.lock();
    mLock.unlock();
}

4.4 InputDispatcher

[-> InputDispatcher.cpp]

void InputDispatcher::monitor() {
    mLock.lock();
    mLooper->wake();
    mDispatcherIsAliveCondition.wait(mLock);
    mLock.unlock();
}

获取mLock之后进入Condition类型的wait()方法,等待IInputDispatcher线程的loopOnce()中的broadcast()来唤醒.

void InputDispatcher::dispatchOnce() {
    nsecs_t nextWakeupTime = LONG_LONG_MAX;
    {
        AutoMutex _l(mLock);
        mDispatcherIsAliveCondition.broadcast();
        if (!haveCommandsLocked()) {
            dispatchOnceInnerLocked(&nextWakeupTime);
        }
        if (runCommandsLockedInterruptible()) {
            nextWakeupTime = LONG_LONG_MIN;
        }
    }

    nsecs_t currentTime = now();
    int timeoutMillis = toMillisecondTimeoutDelay(currentTime, nextWakeupTime);
    mLooper->pollOnce(timeoutMillis); //进入epoll_wait
}

4.5 小节

通过将InputManagerService加入到Watchdog的monitor队列,定时监测是否发生死锁. 整个监测过涉及EventHub, InputReader, InputDispatcher, InputManagerService的死锁监测. 监测的原理很简单,通过尝试获取锁并释放锁的方式.

最后, 可通过adb shell dumpsys input来查看手机当前的input状态, 输出内容分别为EventHub.dump(), InputReader.dump(),InputDispatcher.dump()这3类,另外如果发生过input ANR,那么也会输出上一个ANR的状态.

其中mPendingEvent代表的当下正在处理的事件.

五. 总结

5.1 ANR分类

由小节[3.5] InputMonitor.notifyANR完成, 当发生ANR时system log中会出现以下信息, 并且TAG=WindowManager:

Input event dispatching timed out xxx. Reason: + reason, 其中xxx取值:

窗口类型: sending to windowState.mAttrs.getTitle()
应用类型: sending to application appWindowToken.stringName
其他类型: 则为空.

至于Reason主要有以下类型:

5.1.1 reason类型

由小节[2.3.1]checkWindowReadyForMoreInputLocked完成， ANR reason主要有以下几类：

无窗口, 有应用：Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.
窗口暂停: Waiting because the [targetType] window is paused.
窗口未连接: Waiting because the [targetType] window’s input channel is not registered with the input dispatcher. The window may be in the process of being removed.
窗口连接已死亡：Waiting because the [targetType] window’s input connection is [Connection.Status]. The window may be in the process of being removed.
窗口连接已满：Waiting because the [targetType] window’s input channel is full. Outbound queue length: [outboundQueue长度]. Wait queue length: [waitQueue长度].
按键事件，输出队列或事件等待队列不为空：Waiting to send key event because the [targetType] window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: [outboundQueue长度]. Wait queue length: [waitQueue长度].
非按键事件，事件等待队列不为空且头事件分发超时500ms：Waiting to send non-key event because the [targetType] window has not finished processing certain input events that were delivered to it over 500ms ago. Wait queue length: [waitQueue长度]. Wait queue head age: [等待时长].

其中

targetType: 取值为”focused”或者”touched”
Connection.Status: 取值为”NORMAL”，”BROKEN”，”ZOMBIE”

另外, findFocusedWindowTargetsLocked, findTouchedWindowTargetsLocked这两个方法中可以通过实现 updateDispatchStatisticsLocked()来分析anr问题.

5.2 drop事件分类

由小节[2.1.2] dropInboundEventLocked完成，输出事件丢弃的原因：

DROP_REASON_POLICY: “inbound event was dropped because the policy consumed it”;
DROP_REASON_DISABLED: “inbound event was dropped because input dispatch is disabled”;
DROP_REASON_APP_SWITCH: “inbound event was dropped because of pending overdue app switch”;
DROP_REASON_BLOCKED: “inbound event was dropped because the current application is not responding and the user has started interacting with a different application”;
DROP_REASON_STALE: “inbound event was dropped because it is stale”;

其他:

doDispatchCycleFinishedLockedInterruptible的过程, 会记录分发时间超过2s的事件,
findFocusedWindowTargetsLocked的过程, 可以统计等待时长信息.

微信公众号 Gityuan | 微博 weibo.com/gityuan | 博客 留言区交流

一. 概述

1.1 InputReader

1.2 InputDispatcher

1.3 UI Thread

二. ANR处理流程

2.1 onANRLocked

3.2 doNotifyANRLockedInterruptible

3.3 NativeInputManager.notifyANR

3.4 IMS.notifyANR

3.5 InputMonitor.notifyANR

3.6 DispatchingTimedOut

3.6.1 Token.keyDispatchingTimedOut

3.6.2 AMS.inputDispatchingTimedOut

3.7 AMS.inputDispatchingTimedOut

3.8 resumeAfterTargetsNotReadyTimeoutLocked

四. input死锁监测机制

4.1 IMS.start

4.2 IMS.monitor

4.3 InputReader.monitor

4.3.1 EventHub.monitor

4.4 InputDispatcher

4.5 小节

五. 总结

5.1 ANR分类

5.1.1 reason类型

5.2 drop事件分类

CATALOG

标签