EMMC用來做數據記錄儀壽命有問題嗎詳情 - EMMC用來做數據記錄儀壽命有問題嗎,define,低功耗,ci,機器學習,人工智能浪人小風光博客

測試環境：AOSP 7.1.1+Kernel 4.4.17 HW：HiKey

Ubuntu 14.04+Kernel 4.4.0-31

1. Linux內核suspend狀態

Linux內核支持多種類型的睡眠狀態，通過設置不同的模塊進入低功耗模式來達到省電功能。

目前存在四種模式：suspend to idle、power-on standby（Standby）、suspend to ram（STR）和sudpend to disk（Hibernate），分別對應ACPI狀態的S0、S1、S3和S4。

State in Linux	Label	state	ACPI state	註釋
#define PM_SUSPEND_ON ((__force suspend_state_t) 0)				一切正常
#define PM_SUSPEND_FREEZE ((__force suspend_state_t) 1)	freeze	Suspend-to-Idle	S0	凍結進程+掛起設備+CPU空閒
#define PM_SUSPEND_STANDBY ((__force suspend_state_t) 2)	standby	Standby/Power-on Suspend	S1	凍結進程+掛起設備+關閉nonbootCPU
#define PM_SUSPEND_MEM ((__force suspend_state_t) 3)	mem	Suspend-to-RAM	S3	僅保留RAM自刷新
#define PM_SUSPEND_MAX ((__force suspend_state_t) 4)	disk	Suspend-to-disk	S4	關閉所有設備包括RAM，也被稱為Hibernate

從freeze-->standby-->mem睡眠程度越來越深，喚醒花費的時間也越來越多。

Suspend-To-Idle

此狀態包括frozen processes+suspended devices+idle processors，具有輕量化的特點；

並且相對於相對於Idle狀態能節省更多的功耗，因為此時的用户空間被凍結且I/O設備進入了低功耗狀態。

相對於Suspend-To-RAM它具有低延時的優勢。

Standby/Power-On Suspend

此狀態包括frozen processes+suspended devices+offline nonboot CPUs+suspend low-level system，對CPU的處理更近一步。

所以相對於Suspend-To-Idle節省了更多的功耗，但是由於需要恢復CPU和一些底層功能也花費了更多的時間。

Suspend-to-RAM

此狀態使所有的設備進入低功耗狀態，僅保留RAM自刷新。

所有的設備和系統狀態都保存在RAM中，所有外設被掛起。

（在HiKey的實際測試中，boot CPU是沒有關閉的！實際上這裏也沒有standby，mem和standby基本上沒有區別。）

Suspend-to-disk

此狀態是最省功耗的模式。

相對Suspend-to-RAMRAM能節省更多功耗的原因是數據會被寫入磁盤中，RAM也可以被關閉。

但是這也導致了，更多的恢復延時，在resume的時候讀回到RAM，然後在進行系統和設備狀態恢復工作。

但是在一般的嵌入式設備上，此種狀態不支持。

下面用STR表示Suspend to RAM，STI表示Suspend to Idle。

2. Suspend狀態，以及STR 和STI區別

寫入/sys/power/state不同字符串，可以讓系統進入不同睡眠狀態。

針對state sysfs節點的寫入，最終會進入到state_store這個函數，將字符串轉換成上表中不同狀態。

state_store(kernel/power/main.c)
    -->pm_suspend (kernel/power/suspend.c)-------------處理除freeze、standby、mem三種類型suspend
        -->enter_state---------------------------------在進入睡眠之前，做一些準備工作
            -->suspend_devices_and_enter
                -->suspend_enter-----------------------這裏才是freeze與standby/mem區別所在。
    -->hibernate---------------------------------------進入suspend to disk流程

STR和STI的最主要區別就是下面一段代碼：

static int suspend_enter(suspend_state_t state, bool *wakeup)
{

…

    /*
     * PM_SUSPEND_FREEZE equals
     * frozen processes + suspended devices + idle processors.
     * Thus we should invoke freeze_enter() soon after
     * all the devices are suspended.
     */
//====================================FREEZE===============================================================
    if (state == PM_SUSPEND_FREEZE) {------------------------------------如果要進入freeze狀態，就會執行此段代碼。
        trace_suspend_resume(TPS("machine_suspend"), state, true);
        freeze_enter();
        trace_suspend_resume(TPS("machine_suspend"), state, false);
        goto Platform_wake;----------------------------------------------在執行結束跳轉到Platform_wake，中間一段綠色代碼將會被跳過。所以説freeze和standby、mem相比，多了freeze_enter，少了對non-boot CPUs、arch、syscore的操作。
    }
//=====================================MEM===============================================================
    error = disable_nonboot_cpus();
    if (error || suspend_test(TEST_CPUS)) {
        log_suspend_abort_reason("Disabling non-boot cpus failed");
        goto Enable_cpus;
    }

    arch_suspend_disable_irqs();
    BUG_ON(!irqs_disabled());

    error = syscore_suspend();
    if (!error) {
        *wakeup = pm_wakeup_pending();
        if (!(suspend_test(TEST_CORE) || *wakeup)) {
            trace_suspend_resume(TPS("machine_suspend"),
                state, true);
            error = suspend_ops->enter(state);
            trace_suspend_resume(TPS("machine_suspend"),
                state, false);
            events_check_enabled = false;
        } else if (*wakeup) {
            pm_get_active_wakeup_sources(suspend_abort,
                MAX_SUSPEND_ABORT_LEN);
            log_suspend_abort_reason(suspend_abort);
            error = -EBUSY;
        }
        syscore_resume();
    }

    arch_suspend_enable_irqs();
    BUG_ON(irqs_disabled());

Enable_cpus:
    enable_nonboot_cpus();

Platform_wake:
    platform_resume_noirq(state);
    dpm_resume_noirq(PMSG_RESUME);

…

}

3 suspend/resume流程梳理

下面分析一下suspend/resume每個細分階段。

整個suspend可以分為若干階段，每個階段函數—>關鍵節點Trace—>analyze_suspend.py解析Trace—>根據Trace時間畫出Timeline圖表

這樣就可以分析出總的時間差異，每個階段差異，甚至一個設備suspend/resume、一個子系統suspend/resume的時間差異。

analyze_suspend.py 基於默認基於ftrace進行分析（在指定dmesg的時候，會發現缺失了很多log信息，無法生成timeline類型的html文件），將suspend/resume分為若干階段。

下面簡要介紹一下各個階段，然後基於此進行代碼分析。

在kernel版本大於等與3.15之後，解析需要的所有log信息都可以從ftrace中獲取。之前的內核版本還需要藉助於dmesg。

由於使用的kernel版本是4.4.17，sysvals.usetraceeventsonly被置位，所以只會parseTraceLog()。

下表中的各個階段通過解析suspend_resume: XXXXXXX類型的ftrace來獲取。

各子模塊、子系統的解析通過device_pm_callback_start和device_pm_callback_end來截取時間段，以及這時間段內的callgraph。

Phase名稱	ftrace關鍵詞
suspend_prepare	dpm_prepare
suspend	dpm_suspend
suspend_late	dpm_suspend_late
suspend_noirq	dpm_suspend_noirq
suspend_machine	machine_suspend start
resume_machine	machine_suspend end
resume_noirq	dpm_resume_noirq
resume_early	dpm_resume_early
resume	dpm_resume
resume_complete	dpm_complete

下面是一組suspend/resume執行ftrace log，我們將據此進行各階段代碼分析，包括suspend_enter、suspend_prepare、suspend、suspend_late、suspend_noirq、suspend_machine、resume_machine、resume_noirq、resume_early、resume、resume_complete。

從這裏也可以看出freeze和mem/standby除了machine部分不同之外，還少了CPU開關和syscore suspend/resume操作。

suspend_resume: suspend_enter[1] begin suspend_resume: sync_filesystems[0] begin suspend_resume: sync_filesystems[0] end suspend_resume: freeze_processes[0] begin suspend_resume: freeze_processes[0] end suspend_resume: suspend_enter[1] end suspend_resume: dpm_prepare[2] begin suspend_resume: dpm_prepare[2] end suspend_resume: dpm_suspend[2] begin suspend_resume: dpm_suspend[2] end suspend_resume: dpm_suspend_late[2] begin suspend_resume: dpm_suspend_late[2] end suspend_resume: dpm_suspend_noirq[2] begin suspend_resume: dpm_suspend_noirq[2] end No CPU_OFF… No syscore_suspend… suspend_resume: machine_suspend[1] begin suspend_resume: machine_suspend[1] end No suscore_resume… No CPU_ON… suspend_resume: dpm_resume_noirq[16] begin suspend_resume: dpm_resume_noirq[16] end suspend_resume: dpm_resume_early[16] begin suspend_resume: dpm_resume_early[16] end suspend_resume: dpm_resume[16] begin suspend_resume: dpm_resume[16] end suspend_resume: dpm_complete[16] begin suspend_resume: dpm_complete[16] end suspend_resume: resume_console[1] begin suspend_resume: resume_console[1] end suspend_resume: thaw_processes[0] begin suspend_resume: thaw_processes[0] end	suspend_resume: suspend_enter[3] begin suspend_resume: sync_filesystems[0] begin suspend_resume: sync_filesystems[0] end suspend_resume: freeze_processes[0] begin suspend_resume: freeze_processes[0] end suspend_resume: suspend_enter[3] end suspend_resume: dpm_prepare[2] begin suspend_resume: dpm_prepare[2] end suspend_resume: dpm_suspend[2] begin suspend_resume: dpm_suspend[2] end suspend_resume: dpm_suspend_late[2] begin suspend_resume: dpm_suspend_late[2] end suspend_resume: dpm_suspend_noirq[2] begin suspend_resume: dpm_suspend_noirq[2] end suspend_resume: CPU_OFF[1-7] begin/end suspend_resume: syscore_suspend[0] begin/end suspend_resume: machine_suspend[3] begin suspend_resume: machine_suspend[3] end suspend_resume: syscore_resume[0] begin/end suspend_resume: CPU_ON[1-7] begin/end suspend_resume: dpm_resume_noirq[16] begin suspend_resume: dpm_resume_noirq[16] end suspend_resume: dpm_resume_early[16] begin suspend_resume: dpm_resume_early[16] end suspend_resume: dpm_resume[16] begin suspend_resume: dpm_resume[16] end suspend_resume: dpm_complete[16] begin suspend_resume: dpm_complete[16] end suspend_resume: resume_console[3] begin suspend_resume: resume_console[3] end suspend_resume: thaw_processes[0] begin suspend_resume: thaw_processes[0] end

在介紹相關代碼之前，先介紹一下HiKey使用的platform_suspend_ops：

static const struct platform_suspend_ops psci_suspend_ops = { .valid = suspend_valid_only_mem, 僅支持mem類型的suspend .enter = psci_system_suspend_enter, 睡眠的CPU底層支持 };

freeze的platform_freeze_ops如下：

static const struct platform_freeze_ops acpi_freeze_ops = { .begin = acpi_freeze_begin, .prepare = acpi_freeze_prepare, .restore = acpi_freeze_restore, .end = acpi_freeze_end, };

3.1 suspend_enter

enter_state作為suspend/resume的入口點，完成了絕大部分工作。首先確保系統沒有正在進入睡眠狀態；然後為suspend做一些準備，使系統進入睡眠並在喚醒後進行必要清理恢復工作。

下面分析一下suspend之前的準備工作，即suspend_enter階段：

static int enter_state(suspend_state_t state)
{
    int error;

    trace_suspend_resume(TPS("suspend_enter"), state, true);
    if (state == PM_SUSPEND_FREEZE) {--------------------------------------是否是freeze類型suspend
#ifdef CONFIG_PM_DEBUG
        if (pm_test_level != TEST_NONE && pm_test_level <= TEST_CPUS) {
            pr_warning("PM: Unsupported test mode for suspend to idle,"
                   "please choose none/freezer/devices/platform.\n");
            return -EAGAIN;
        }
#endif
    } else if (!valid_state(state)) {-------------------------------------目前只支持mem類型suspend
        return -EINVAL;
    }
    if (!mutex_trylock(&pm_mutex))
        return -EBUSY;

    if (state == PM_SUSPEND_FREEZE)
        freeze_begin();--------------------------------------------------初始化suspend_freeze_state為FREEZE_STATE_NONE

#ifndef CONFIG_SUSPEND_SKIP_SYNC
    trace_suspend_resume(TPS("sync_filesystems"), 0, true);
    printk(KERN_INFO "PM: Syncing filesystems ... ");
    sys_sync();----------------------------------------------------------sync文件系統緩存文件，確保數據sync到硬盤。
    printk("done.\n");
    trace_suspend_resume(TPS("sync_filesystems"), 0, false);
#endif

    pr_debug("PM: Preparing system for sleep (%s)\n", pm_states[state]);
    pm_suspend_clear_flags();
    error = suspend_prepare(state);--------------------------------------注意這裏面的suspend_prepare和下面的suspend_prepare階段容易搞混。
    if (error)
        goto Unlock;

    if (suspend_test(TEST_FREEZER))
        goto Finish;

    trace_suspend_resume(TPS("suspend_enter"), state, false);
    pr_debug("PM: Suspending system (%s)\n", pm_states[state]);
    pm_restrict_gfp_mask();
    error = suspend_devices_and_enter(state);
    pm_restore_gfp_mask();

Finish:
    pr_debug("PM: Finishing wakeup.\n");
    suspend_finish();---------------------------------------------------解凍，重啓進程；發送PM_POST_SUSPEND通知；釋放之前分配的console。
Unlock:
    mutex_unlock(&pm_mutex);
    return error;
}

接着分析一下suspend_prepare函數：

static int suspend_prepare(suspend_state_t state) { int error; 驗證suspend狀態 return -EPERM; 分配一個suspend console 發送PM_SUSPEND_PREPARE通知消息 if (error) goto Finish; trace_suspend_resume(TPS("freeze_processes"), 0, true); error = suspend_freeze_processes(); 凍結進程 trace_suspend_resume(TPS("freeze_processes"), 0, false); if (!error) return 0; suspend_stats.failed_freeze++; dpm_save_failed_step(SUSPEND_FREEZE); Finish: pm_notifier_call_chain(PM_POST_SUSPEND); pm_restore_console(); return error; }

suspend_freeze_process先處理用户空間進程，然後處理內核進程：

static inline int suspend_freeze_processes(void) { int error; 觸發用户空間進程進入freeze狀態。當前進程不會被凍結。因為凍結失敗的進程會自動被解凍，所以不需要進行錯誤處理。 /* * freeze_processes() automatically thaws every task if freezing * fails. So we need not do anything extra upon error. / if (error) return error; 凍結內核線程 / * freeze_kernel_threads() thaws only kernel threads upon freezing * failure. So we have to thaw the userspace tasks ourselves. */ if (error) 由於freeze_kernel_threads凍結失敗，只會解凍內核線程。所以還需要對用户空間進程進行解凍。 thaw_processes(); return error; }

下面的階段都在suspend_devices_and_enter中，可以看出這是一個對稱的流程，每一階段的suspend，都有對應的resume。

int suspend_devices_and_enter(suspend_state_t state) { int error; bool wakeup = false; if (!sleep_state_supported(state)) return -ENOSYS; error = platform_suspend_begin(state); if (error) goto Close; 關閉console子系統，暫停printk打印 suspend_test_start(); error = dpm_suspend_start(PMSG_SUSPEND); suspend_prepare(dpm_prepare)、suspend(dpm_suspend)兩階段 if (error) { pr_err("PM: Some devices failed to suspend, or early wake event detected\n"); log_suspend_abort_reason("Some devices failed to suspend, or early wake event detected"); goto Recover_platform; } suspend_test_finish("suspend devices"); if (suspend_test(TEST_DEVICES)) goto Recover_platform; do { error = suspend_enter(state, &wakeup); suspend_late(dpm_suspend_late)、suspend_noirq(dpm_suspend_noirq)、suspend_machine、resume_machine、resume_noirq(dpm_resume_noirq)、resume_early(dpm_resume_early) Resume_devices: suspend_test_start(); dpm_resume_end(PMSG_RESUME); resume(dpm_resume)、resume_complete(dpm_complete) suspend_test_finish("resume devices"); trace_suspend_resume(TPS("resume_console"), state, true); resume_console(); 打開console子系統，恢復printk打印。 trace_suspend_resume(TPS("resume_console"), state, false); Close: platform_resume_end(state); return error; Recover_platform: platform_recover(state); goto Resume_devices; }

還有必要過一下suspend_enter：

static int suspend_enter(suspend_state_t state, bool wakeup) { char suspend_abort[MAX_SUSPEND_ABORT_LEN]; int error, last_dev; 因為suspend_ops的prepare為空，所以返回0 if (error) goto Platform_finish; suspend_late if (error) { last_dev = suspend_stats.last_failed_dev + REC_FAILED_NUM - 1; last_dev %= REC_FAILED_NUM; printk(KERN_ERR "PM: late suspend of devices failed\n"); log_suspend_abort_reason("%s device failed to power down", suspend_stats.failed_devs[last_dev]); goto Platform_finish; } error = platform_suspend_prepare_late(state); 執行freeze_ops->prepare() if (error) goto Devices_early_resume; suspend_noirq if (error) { last_dev = suspend_stats.last_failed_dev + REC_FAILED_NUM - 1; last_dev %= REC_FAILED_NUM; printk(KERN_ERR "PM: noirq suspend of devices failed\n"); log_suspend_abort_reason("noirq suspend of %s device failed", suspend_stats.failed_devs[last_dev]); goto Platform_early_resume; } error = platform_suspend_prepare_noirq(state); if (error) goto Platform_wake; if (suspend_test(TEST_PLATFORM)) goto Platform_wake; / * PM_SUSPEND_FREEZE equals * frozen processes + suspended devices + idle processors. * Thus we should invoke freeze_enter() soon after * all the devices are suspended. / if (state == PM_SUSPEND_FREEZE) { 這裏是freeze和mem/standy差別所在 trace_suspend_resume(TPS("machine_suspend"), state, true); freeze_enter(); trace_suspend_resume(TPS("machine_suspend"), state, false); goto Platform_wake; } 關閉所有boot-CPU之外的CPU if (error \|\| suspend_test(TEST_CPUS)) { log_suspend_abort_reason("Disabling non-boot cpus failed"); goto Enable_cpus; } arch_suspend_disable_irqs(); BUG_ON(!irqs_disabled()); 執行syscore_ops_list上所有syscore_ops的suspend回調函數 if (!error) { wakeup = pm_wakeup_pending(); 檢查是否需要終止suspend流程？ if (!(suspend_test(TEST_CORE) \|\| wakeup)) { trace_suspend_resume(TPS("machine_suspend"), state, true); error = suspend_ops->enter(state); 調用psci_suspend_ops的enter回調函數，關閉machine trace_suspend_resume(TPS("machine_suspend"), state, false); !!!!!!!!!!!!!!!!這裏即為喚醒之後的執行路徑了!!!!!!!!!!!!!!!!* events_check_enabled = false; } else if (*wakeup) { pm_get_active_wakeup_sources(suspend_abort, MAX_SUSPEND_ABORT_LEN); log_suspend_abort_reason(suspend_abort); error = -EBUSY; } syscore_resume(); 執行所有syscore_ops_list的resume回調函數 } arch_suspend_enable_irqs(); BUG_ON(irqs_disabled()); Enable_cpus: enable_nonboot_cpus(); 打開所有non-boot CPU Platform_wake: platform_resume_noirq(state); dpm_resume_noirq(PMSG_RESUME); resume_noirq Platform_early_resume: platform_resume_early(state); Devices_early_resume: dpm_resume_early(PMSG_RESUME); resume_early Platform_finish: platform_resume_finish(state); return error; }

3.2 suspend_prepare和suspend

DPM是Device Power Management的意思，這些操作都是針對非系統設備（non-sysdev）進行的。那什麼是系統設備呢？下面的machine應該就是所謂的sysdev了。

dpm_prepare實際上就是遍歷dpm_list上的所有設備，執行->prepare回調函數。如果設備存在->prepare回電函數，會將設備的prepare階段打印到ftrace。

int dpm_prepare(pm_message_t state) { int error = 0; trace_suspend_resume(TPS("dpm_prepare"), state.event, true); mutex_lock(&dpm_list_mtx); while (!list_empty(&dpm_list)) { 遍歷dpm_list struct device *dev = to_device(dpm_list.next); get_device(dev); mutex_unlock(&dpm_list_mtx); trace_device_pm_callback_start(dev, "", state.event); error = device_prepare(dev, state); 執行->prepare回調函數 trace_device_pm_callback_end(dev, error); mutex_lock(&dpm_list_mtx); if (error) { if (error == -EAGAIN) { put_device(dev); error = 0; continue; } printk(KERN_INFO "PM: Device %s not prepared " "for power transition: code %d\n", dev_name(dev), error); put_device(dev); break; } dev->power.is_prepared = true; if (!list_empty(&dev->power.entry)) list_move_tail(&dev->power.entry, &dpm_prepared_list); 移動設備到dpm_prepared_list put_device(dev); } mutex_unlock(&dpm_list_mtx); trace_suspend_resume(TPS("dpm_prepare"), state.event, false); return error; }

dpm_suspend遍歷dpm_prepared_list，這點和dpm_prepare有區別。然後執行設備的->suspend回調函數。

int dpm_suspend(pm_message_t state) { ktime_t starttime = ktime_get(); int error = 0; trace_suspend_resume(TPS("dpm_suspend"), state.event, true); cpufreq_suspend(); mutex_lock(&dpm_list_mtx); pm_transition = state; async_error = 0; while (!list_empty(&dpm_prepared_list)) { 基於dpm_prepared_list遍歷設備 struct device *dev = to_device(dpm_prepared_list.prev); get_device(dev); mutex_unlock(&dpm_list_mtx); 執行設備->suspend回調函數 mutex_lock(&dpm_list_mtx); if (error) { pm_dev_err(dev, state, "", error); dpm_save_failed_dev(dev_name(dev)); put_device(dev); break; } if (!list_empty(&dev->power.entry)) list_move(&dev->power.entry, &dpm_suspended_list); 移動設備到dpm_suspended_list put_device(dev); if (async_error) break; } mutex_unlock(&dpm_list_mtx); async_synchronize_full(); if (!error) error = async_error; if (error) { suspend_stats.failed_suspend++; dpm_save_failed_step(SUSPEND_SUSPEND); } else dpm_show_time(starttime, state, NULL); trace_suspend_resume(TPS("dpm_suspend"), state.event, false); return error; }

3.3 suspend_late和suspend_noirq

dpm_suspend_late基於dpm_suspended_list操作設備，所以這也需要函數之間順序執行。

int dpm_suspend_late(pm_message_t state) { ktime_t starttime = ktime_get(); int error = 0; trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true); mutex_lock(&dpm_list_mtx); pm_transition = state; async_error = 0; 遍歷dpm_suspended_list列表 struct device *dev = to_device(dpm_suspended_list.prev); get_device(dev); mutex_unlock(&dpm_list_mtx); 執行->suspend_late回調函數 mutex_lock(&dpm_list_mtx); if (!list_empty(&dev->power.entry)) list_move(&dev->power.entry, &dpm_late_early_list); 移動設備到dpm_late_early_list if (error) { pm_dev_err(dev, state, " late", error); dpm_save_failed_dev(dev_name(dev)); put_device(dev); break; } put_device(dev); if (async_error) break; } mutex_unlock(&dpm_list_mtx); async_synchronize_full(); if (!error) error = async_error; if (error) { suspend_stats.failed_suspend_late++; dpm_save_failed_step(SUSPEND_SUSPEND_LATE); dpm_resume_early(resume_event(state)); } else { dpm_show_time(starttime, state, "late"); } trace_suspend_resume(TPS("dpm_suspend_late"), state.event, false); return error; }

dpm_suspend_noirq基於dpm_late_early_list遍歷所有設備。首先阻止設備驅動接收中斷信息，然後執行->suspend_noirq回調函數。

int dpm_suspend_noirq(pm_message_t state) { ktime_t starttime = ktime_get(); int error = 0; trace_suspend_resume(TPS("dpm_suspend_noirq"), state.event, true); cpuidle_pause(); 暫停cpuidle功能，退出idle的CPU device_wakeup_arm_wake_irqs(); 將具有wakeirq的設備設置成wakeup resource suspend_device_irqs(); 關閉當前所有能夠關閉的irq，置成IRQS_SUSPENDED。IRQF_NO_SUSPEND類型的wakeup中斷不能被關閉，並且作為wakeup喚醒源的中斷不能被關閉。 mutex_lock(&dpm_list_mtx); pm_transition = state; async_error = 0; while (!list_empty(&dpm_late_early_list)) { struct device *dev = to_device(dpm_late_early_list.prev); get_device(dev); mutex_unlock(&dpm_list_mtx); 調用->suspend_noirq回調函數 mutex_lock(&dpm_list_mtx); if (error) { pm_dev_err(dev, state, " noirq", error); dpm_save_failed_dev(dev_name(dev)); put_device(dev); break; } if (!list_empty(&dev->power.entry)) list_move(&dev->power.entry, &dpm_noirq_list); 移動設備到dpm_noirq_list put_device(dev); if (async_error) break; } mutex_unlock(&dpm_list_mtx); async_synchronize_full(); if (!error) error = async_error; if (error) { suspend_stats.failed_suspend_noirq++; dpm_save_failed_step(SUSPEND_SUSPEND_NOIRQ); dpm_resume_noirq(resume_event(state)); } else { dpm_show_time(starttime, state, "noirq"); } trace_suspend_resume(TPS("dpm_suspend_noirq"), state.event, false); return error; }

3.4 suspend_machine和resume_machine

freeze和mem/standby在這部分是不同的。

mem/standby直接調用suspend_ops->enter進入對應的睡眠模式。

而freeze就要稍微複雜了：

static void freeze_enter(void) { spin_lock_irq(&suspend_freeze_lock); if (pm_wakeup_pending()) 檢查是否有wakeup信號在處理，如果有則退出當前流程。 goto out; suspend_freeze_state = FREEZE_STATE_ENTER; spin_unlock_irq(&suspend_freeze_lock); get_online_cpus(); cpuidle_resume(); 允許使用cpuidle /* Push all the CPUs into the idle loop. / wake_up_all_idle_cpus(); 強制所有CPU退出idle狀態 pr_debug("PM: suspend-to-idle\n"); / Make the current CPU wait so it can enter the idle loop too. / wait_event(suspend_freeze_wait_head, suspend_freeze_state == FREEZE_STATE_WAKE); 等待FREEZE_STATE_WAKE事件，進入idle loop pr_debug("PM: resume from suspend-to-idle\n"); !!!!!!!!!!!!!!!!這裏即為喚醒之後的執行路徑了!!!!!!!!!!!!!!!!* cpuidle_pause(); 暫停使用cpuidle put_online_cpus(); spin_lock_irq(&suspend_freeze_lock); out: suspend_freeze_state = FREEZE_STATE_NONE; spin_unlock_irq(&suspend_freeze_lock); }

3.5 resume_noirq

執行dpm_noirq_list上設備的resume_noirq回調函數。

void dpm_resume_noirq(pm_message_t state) { struct device dev; ktime_t starttime = ktime_get(); trace_suspend_resume(TPS("dpm_resume_noirq"), state.event, true); mutex_lock(&dpm_list_mtx); pm_transition = state; / * Advanced the async threads upfront, * in case the starting of async threads is * delayed by non-async resuming devices. */ list_for_each_entry(dev, &dpm_noirq_list, power.entry) { reinit_completion(&dev->power.completion); if (is_async(dev)) { get_device(dev); async_schedule(async_resume_noirq, dev); } } 遍歷dpm_noirq_list dev = to_device(dpm_noirq_list.next); get_device(dev); list_move_tail(&dev->power.entry, &dpm_late_early_list); 移動設備到下一級dpm_late_early_list if (!is_async(dev)) { int error; error = device_resume_noirq(dev, state, false); if (error) { suspend_stats.failed_resume_noirq++; dpm_save_failed_step(SUSPEND_RESUME_NOIRQ); dpm_save_failed_dev(dev_name(dev)); pm_dev_err(dev, state, " noirq", error); } } mutex_lock(&dpm_list_mtx); put_device(dev); } mutex_unlock(&dpm_list_mtx); async_synchronize_full(); dpm_show_time(starttime, state, "noirq"); resume_device_irqs(); device_wakeup_disarm_wake_irqs(); cpuidle_resume(); trace_suspend_resume(TPS("dpm_resume_noirq"), state.event, false); }

3.6 resume_early

執行前述dpm_late_early_list設備的resume_early回調函數，移動設備到dpm_suspended_list列表。

void dpm_resume_early(pm_message_t state) { struct device dev; ktime_t starttime = ktime_get(); trace_suspend_resume(TPS("dpm_resume_early"), state.event, true); mutex_lock(&dpm_list_mtx); pm_transition = state; / * Advanced the async threads upfront, * in case the starting of async threads is * delayed by non-async resuming devices. */ list_for_each_entry(dev, &dpm_late_early_list, power.entry) { reinit_completion(&dev->power.completion); if (is_async(dev)) { get_device(dev); async_schedule(async_resume_early, dev); } } while (!list_empty(&dpm_late_early_list)) { dev = to_device(dpm_late_early_list.next); get_device(dev); list_move_tail(&dev->power.entry, &dpm_suspended_list); mutex_unlock(&dpm_list_mtx); if (!is_async(dev)) { int error; error = device_resume_early(dev, state, false); if (error) { suspend_stats.failed_resume_early++; dpm_save_failed_step(SUSPEND_RESUME_EARLY); dpm_save_failed_dev(dev_name(dev)); pm_dev_err(dev, state, " early", error); } } mutex_lock(&dpm_list_mtx); put_device(dev); } mutex_unlock(&dpm_list_mtx); async_synchronize_full(); dpm_show_time(starttime, state, "early"); trace_suspend_resume(TPS("dpm_resume_early"), state.event, false); }

3.7 resume

執行所有dpm_suspended_list上設備的resume回調函數。

void dpm_resume(pm_message_t state) { struct device *dev; ktime_t starttime = ktime_get(); trace_suspend_resume(TPS("dpm_resume"), state.event, true); mutex_lock(&dpm_list_mtx); pm_transition = state; async_error = 0; list_for_each_entry(dev, &dpm_suspended_list, power.entry) { reinit_completion(&dev->power.completion); if (is_async(dev)) { get_device(dev); async_schedule(async_resume, dev); } } while (!list_empty(&dpm_suspended_list)) { dev = to_device(dpm_suspended_list.next); get_device(dev); if (!is_async(dev)) { int error; mutex_unlock(&dpm_list_mtx); error = device_resume(dev, state, false); if (error) { suspend_stats.failed_resume++; dpm_save_failed_step(SUSPEND_RESUME); dpm_save_failed_dev(dev_name(dev)); pm_dev_err(dev, state, "", error); } mutex_lock(&dpm_list_mtx); } if (!list_empty(&dev->power.entry)) list_move_tail(&dev->power.entry, &dpm_prepared_list); put_device(dev); } mutex_unlock(&dpm_list_mtx); async_synchronize_full(); dpm_show_time(starttime, state, NULL); cpufreq_resume(); trace_suspend_resume(TPS("dpm_resume"), state.event, false); }

3.8 resume_complete

執行所有dpm_prepared_list上設備的complete回調函數。至此dpm_complete結束所有非系統設備的睡眠。

void dpm_complete(pm_message_t state) { struct list_head list; trace_suspend_resume(TPS("dpm_complete"), state.event, true); might_sleep(); INIT_LIST_HEAD(&list); mutex_lock(&dpm_list_mtx); while (!list_empty(&dpm_prepared_list)) { struct device *dev = to_device(dpm_prepared_list.prev); get_device(dev); dev->power.is_prepared = false; list_move(&dev->power.entry, &list); mutex_unlock(&dpm_list_mtx); trace_device_pm_callback_start(dev, "", state.event); device_complete(dev, state); trace_device_pm_callback_end(dev, 0); mutex_lock(&dpm_list_mtx); put_device(dev); } list_splice(&list, &dpm_list); mutex_unlock(&dpm_list_mtx); trace_suspend_resume(TPS("dpm_complete"), state.event, false); }

4 如何讓HiKey進入STR/STI並喚醒？

可以通過配置GPIO作為喚醒源，或者通過RTC作為喚醒源，延時一定時間來喚醒。

檢查是否存在/sys/class/rtc/rtc0/wakealarm，入不存在則需要打開CONFIG_RTC_DRV_PL031。

寫入wakealarm的參數，表示在多少秒之後resume喚醒，退出suspend。

寫mem進入state，是系統進入suspend流程。

adb root && adb remount adb shell "echo +10 > /sys/class/rtc/rtc0/wakealarm && echo mem > /sys/power/state"

5. suspend/resume的latency分析手段

5.1 analyze_suspend.py v3.0

在kernel的scripts中，這個工具可以幫助內核和OS開發者優化suspend/resume時間。

在打開一系列內核選項之後，此工具就可以執行suspend操作，然後抓取dmesg和ftrace數據知道resume結束。

這些數據會按照時間線顯示每個設備，並且顯示佔用最多suspend/resume時間的設備或者子系統的調用關係詳圖。

執行工具後，會根據時間生成一個子目錄，裏面包含：html、dmesg和原始ftrace文件。

下面簡單看一下工具選項：

Options: [general] -h Print this help text -v Print the current tool version -verbose Print extra information during execution and analysis -status Test to see if the system is enabled to run this tool -modes List available suspend modes 顯示當前支持的suspend模式 -m mode Mode to initiate for suspend ['freeze', 'mem', 'disk'] (default: mem) 設置進入何種模式的suspend -rtcwake t Use rtcwake to autoresume after <t> seconds (default: disabled) 使用rtc來喚醒，參數是間隔時間 [advanced] -f Use ftrace to create device callgraphs (default: disabled) 基於ftrace生成調用關係圖 -filter "d1 d2 ..." Filter out all but this list of dev names -x2 Run two suspend/resumes back to back (default: disabled) -x2delay t Minimum millisecond delay <t> between the two test runs (default: 0 ms) -postres t Time after resume completion to wait for post-resume events (default: 0 S) -multi n d Execute <n> consecutive tests at <d> seconds intervals. The outputs will be created in a new subdirectory with a summary page. [utilities] -fpdt Print out the contents of the ACPI Firmware Performance Data Table -usbtopo Print out the current USB topology with power info -usbauto Enable autosuspend for all connected USB devices [android testing] -adb binary Use the given adb binary to run the test on an android device. 參數需要給出adb路徑，工具就會對Android設備進行測試，並將結果pull出來。有一點需要注意，在此之前確保adb具有root權限。 The device should already be connected and with root access. Commands will be executed on the device using "adb shell" [re-analyze data from previous runs] 針對之前測試數據重新分析 -ftrace ftracefile Create HTML output using ftrace input -dmesg dmesgfile Create HTML output using dmesg (not needed for kernel >= 3.15) -summary directory Create a summary of all test in this dir

在瞭解了工具使用方法之後，就可以進行相關測試了。

5.1.1 Android

./analysze_suspend.py –modes –adb /usr/bin/adb獲取當前系統支持的suspend狀態。

['freeze', 'mem']

1.Android上測試STR，suspend/resume共5次，每次間隔20秒。

./analyze_suspend.py -adb /usr/bin/adb -rtcwake 10 -multi 5 20 -f -m mem

2.Android上測試STI，suspend/resume共10次，每次間隔5秒。

./analyze_suspend.py -adb /usr/bin/adb -rtcwake 10 -multi 5 20 -f -m freeze

測試結果可以在如下獲得：

https://github.com/arnoldlu/common-use/tree/master/tools/analyze_suspend/hikey_test

存在的問題：analyze_suspend.py不支持Android的rtcwakeup和callgraph。已經在如下fix：

https://github.com/arnoldlu/common-use/blob/master/tools/analyze_suspend/analyze_suspend.py

5.1.1.1 總體對比

下面是HiKey上測試結果，可以看出兩個數據都不夠穩定。mem的suspend和resume平均值都比較高。

freeze相比mem的suspend/resume平均值提高了304.3ms/613.5ms。

EMMC用來做數據記錄儀壽命有問題嗎_EMMC用來做數據記錄儀壽命有問題嗎

EMMC用來做數據記錄儀壽命有問題嗎_#define_02

5.1.1.2 是否suspend CPU

對比如下兩幅圖，明顯看出mem類型的suspend關閉了除CPU0之外的所有CPU；而freeze則沒有關閉任何CPU。

non-boot CPUs的suspend/resume時間就達到300ms/200ms。

EMMC用來做數據記錄儀壽命有問題嗎_ci_03

EMMC用來做數據記錄儀壽命有問題嗎_ci_04

同時從log中也可以看出mem和freeze的主要區別就在於是否disabling/enabling non-boot CPU。其他設備和子系統的suspend/resume時間基本一致。

同時還可以看出mem的suspend後，系統的timestamp是停止的；而freeze的timestamp還是一直在運行的。可以得出freeze狀態持續的時間。

因為先寫rtcwake為10s，然後進入睡眠，再喚醒，所以freeze時間是小於10s的。

[ 3385.642962] PM: suspend entry 1970-01-01 00:57:30.580909763 UTC [ 3385.649165] PM: Syncing filesystems ... done. [ 3385.661349] Freezing user space processes ... [ 3385.671207] dwc2 f72c0000.usb: dwc2_hsotg_ep_stop_xfr: timeout DOEPCTL.EPDisable [ 3385.678933] dwc2 f72c0000.usb: GINNakEff triggered [ 3385.685718] (elapsed 0.019 seconds) done. [ 3385.689860] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done. [ 3385.700092] Suspending console(s) (use no_console_suspend to debug) [ 3385.736020] PM: suspend of devices complete after 27.195 msecs [ 3385.740811] PM: late suspend of devices complete after 4.765 msecs [ 3385.743919] PM: noirq suspend of devices complete after 3.090 msecs Disabling and Enabling non-boot CPUs [ 3386.209126] PM: noirq resume of devices complete after 1.865 msecs [ 3386.212066] PM: early resume of devices complete after 2.460 msecs [ 3386.234729] mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31) [ 3386.311480] mmc_host mmc0: Bus speed (slot 0) = 51756522Hz (slot req 52000000Hz, actual 51756522HZ div = 0) [ 3386.410411] mmc_host mmc2: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31) [ 3386.458232] mmc_host mmc2: Bus speed (slot 0) = 24800000Hz (slot req 25000000Hz, actual 24800000HZ div = 0) [ 3386.458729] PM: resume of devices complete after 246.646 msecs [ 3386.818770] Restarting tasks ... [ 3386.827026] done. [ 3386.844139] PM: suspend exit 1970-01-01 00:57:40.624589167 UTC	[ 3471.760265] PM: Syncing filesystems ... done. [ 3471.771897] Freezing user space processes ... [ 3471.780407] dwc2 f72c0000.usb: dwc2_hsotg_ep_stop_xfr: timeout DOEPCTL.EPDisable [ 3471.788105] dwc2 f72c0000.usb: GINNakEff triggered [ 3471.794916] (elapsed 0.018 seconds) done. [ 3471.799078] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done. [ 3471.809320] Suspending console(s) (use no_console_suspend to debug) [ 3471.847947] PM: suspend of devices complete after 29.905 msecs [ 3471.852473] PM: late suspend of devices complete after 4.497 msecs [ 3471.855611] PM: noirq suspend of devices complete after 3.120 msecs [ 3481.034722] PM: noirq resume of devices complete after 1.945 msecs [ 3481.037992] PM: early resume of devices complete after 2.694 msecs [ 3481.062803] mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31) [ 3481.137795] mmc_host mmc0: Bus speed (slot 0) = 51756522Hz (slot req 52000000Hz, actual 51756522HZ div = 0) [ 3481.234796] mmc_host mmc2: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31) [ 3481.278601] mmc_host mmc2: Bus speed (slot 0) = 24800000Hz (slot req 25000000Hz, actual 24800000HZ div = 0) [ 3481.279396] PM: resume of devices complete after 241.388 msecs [ 3481.358513] Restarting tasks ... done. [ 3481.377766] PM: suspend exit 1970-01-01 00:59:15.332218333 UTC

5.1.1.3 resume_console節省時間

對比resume_console可以發現，mem要比freeze多210ms。

5.1.2 Ubuntu

此工具在Ubuntu上顯示了更強大的功能。

支持了callgraph功能之後，更能清晰地分析每個設備或者子系統的suspend/resume佔用的時間。

sudo ./analyze_suspend.py -rtcwake 10 -multi 5 20 -f -m mem sudo ./analyze_suspend.py -rtcwake 10 -multi 5 20 -f -m freeze

在對比兩種不同suspend模式後，發現freeze花費的時間要比mem少。這也符合預期，但是沒有功耗數據?_?。

EMMC用來做數據記錄儀壽命有問題嗎_#define_05

EMMC用來做數據記錄儀壽命有問題嗎_EMMC用來做數據記錄儀壽命有問題嗎_06

下面着重分析一下如何基於此工具分析。

5.1.3 工具界面總體分析

最上面顯示Kernel Suspend Time和Kernel Resume Time，可以從總體上查看是否有回退或者進步。

再下面是一些縮放按鈕。

然後就是基於timeline的圖表，比對顏色示意圖，可以清晰看出suspend prepare、suspend、suspend late、suspend irq、suspend machine、resume machine、resume irq、resume early、resume和resume complete的分佈。

最下面是每個模塊、子系統的詳細函數調用圖以及開始時間、消耗時間。

EMMC用來做數據記錄儀壽命有問題嗎_低功耗_07

5.1.4 子系統、模塊詳細分析

選中一個模塊，會在最下面顯示詳細的模塊在suspend/resume各個階段消費的時間，以及函數調用關係圖。

EMMC用來做數據記錄儀壽命有問題嗎_ci_08

5.1.5 縮放查看細節

ZOOM IN放大，ZOOMOUT縮小，ZOOM 1:1恢復原始尺寸。

通過在timeline圖表，放大可以查看到更小的模塊消耗的時間。從宏觀到模塊，再到函數消耗時間，逐步細化，很有利於分析。

EMMC用來做數據記錄儀壽命有問題嗎_#define_09

如果發現某個函數佔用時間較大，可以逐級展開。知道發現最終佔用較大的函數，發現問題所在。

EMMC用來做數據記錄儀壽命有問題嗎_低功耗_10

5.1.6 工具代碼分析

首先從入口main開始，和大多數工具一樣開始都是解析命令選項，打印help信息；將所有的待測對象參數和測試參數保存在sysvals；

# ----------------- MAIN -------------------- # exec start (skipped if script is loaded as library) if __name__ == '__main__': cmd = '' cmdarg = '' multitest = {'run': False, 'count': 0, 'delay': 0} # loop through the command line arguments args = iter(sys.argv[1:]) for arg in args: … # just run a utility command and exit if(cmd != ''): if(cmd == 'status'): statusCheck() elif(cmd == 'fpdt'): if(sysvals.android): doError('cannot read FPDT on android device', False) getFPDT(True) elif(cmd == 'usbtopo'): if(sysvals.android): doError('cannot read USB topology '+\ 'on an android device', False) detectUSB(True) elif(cmd == 'modes'): modes = getModes() print modes elif(cmd == 'usbauto'): setUSBDevicesAuto() elif(cmd == 'summary'): print("Generating a summary of folder \"%s\"" % cmdarg) runSummary(cmdarg, True) sys.exit() # run test on android device if(sysvals.android): 註釋此段代碼可以在Android上支持callgraph #if(sysvals.usecallgraph): # doError('ftrace (-f) is not yet supported '+\ # 'in the android kernel', False) if(sysvals.notestrun): doError('cannot analyze test files on the '+\ 'android device', False) # if instructed, re-analyze existing data files if(sysvals.notestrun): 分析已有數據文件，不需要重新測試 rerunTest() sys.exit() # verify that we can run a test if(not statusCheck()): 檢查測試條件是否滿足 print('Check FAILED, aborting the test run!') sys.exit() 連續多次測試 # run multiple tests in a separte subdirectory s = 'x%d' % multitest['count'] subdir = datetime.now().strftime('suspend-'+s+'-%m%d%y-%H%M%S') os.mkdir(subdir) for i in range(multitest['count']): if(i != 0): print('Waiting %d seconds...' % (multitest['delay'])) time.sleep(multitest['delay']) print('TEST (%d/%d) START' % (i+1, multitest['count'])) runTest(subdir) 進行單次測試 print('TEST (%d/%d) COMPLETE' % (i+1, multitest['count'])) runSummary(subdir, False) 生成summary.html else: # run the test in the current directory runTest(".")

sysvals.android表示是否在Android設備進行測試。

sysvals.usecallgraph表示是否生成函數調用關係圖。

sysvals.rtcwake表示是否使用rtc進行喚醒。

針對Ubuntu之類的host設備，測試進行的很順利。但是針對Android設備，在callgraph還存在一點問題。

run_Test無疑作為核心，收集log信息（ftrace、dmesg），執行suspend/resume，生成輸出文件（txt、html）。

def runTest(subdir): global sysvals # prepare for the test if(not sysvals.android): 針對不同的待測設備，初始化ftrace initFtrace() else: initFtraceAndroid() sysvals.initTestOutput(subdir) 生成輸出目錄，輸出文件名等。 vprint('Output files:\n %s' % sysvals.dmesgfile) if(sysvals.usecallgraph or sysvals.usetraceevents or sysvals.usetraceeventsonly): vprint(' %s' % sysvals.ftracefile) vprint(' %s' % sysvals.htmlfile) 執行測試，實際上命令內容基本一致。只是針對Android設備，增加了adb shell '…'。 if(not sysvals.android): executeSuspend() else: executeAndroidSuspend() # analyze the data and create the html output print('PROCESSING DATA') if(sysvals.usetraceeventsonly): 3.15之後的版本，只需要通過ftrace即可獲取足夠信息。之前的版本的數據都存在dmesg中。 # data for kernels 3.15 or newer is entirely in ftrace testruns = parseTraceLog() else: # data for kernels older than 3.15 is primarily in dmesg testruns = loadKernelLog() for data in testruns: parseKernelLog(data) if(sysvals.usecallgraph or sysvals.usetraceevents): appendIncompleteTraceLog(testruns) createHTML(testruns) 根據解析的數據生成html矢量圖表

executeAndroidSuspend在Android設備上操作sysfs節點來配置ftrace，抓取log，suspend/resume，然後將log拉到主機。

def executeAndroidSuspend(): global sysvals # check to see if the display is currently off tp = sysvals.tpath out = os.popen(sysvals.adb+\ ' shell dumpsys power \| grep mScreenOn').read().strip() # if so we need to turn it on so we can issue a new suspend if(out.endswith('false')): print('Waking the device up for the test...') # send the KEYPAD_POWER keyevent to wake it up os.system(sysvals.adb+' shell input keyevent 26') # wait a few seconds so the user can see the device wake up time.sleep(3) # execute however many s/r runs requested for count in range(1,sysvals.execcount+1): # clear the kernel ring buffer just as we start os.system(sysvals.adb+' shell dmesg -c > /dev/null 2>&1') 清空dmesg # start ftrace if(sysvals.usetraceevents): print('START TRACING') os.system(sysvals.adb+" shell 'echo 1 > "+tp+"tracing_on'") 開始ftrace抓取 # initiate suspend for count in range(1,sysvals.execcount+1): if(sysvals.usetraceevents): os.system(sysvals.adb+\ " shell 'echo SUSPEND START > "+tp+"trace_marker'") 寫SUSPEND START到ftrace，作為開始標記。後面解析log，會以此為標記。 if(sysvals.rtcwake): print('SUSPEND START') print('will autoresume in %d seconds' % sysvals.rtcwaketime) os.system(sysvals.adb+" shell 'echo +%d > /sys/class/rtc/rtc0/wakealarm'"%(sysvals.rtcwaketime)) 設置wakeup resource else: print('SUSPEND START (press a key to resume)') os.system(sysvals.adb+" shell 'echo "+sysvals.suspendmode+\ " > "+sysvals.powerfile+"'") 進入suspend，之後就是resume # execution will pause here, then adb will exit while(True): 輪詢adb shell pwd判斷設備是否被喚醒 check = os.popen(sysvals.adb+\ ' shell pwd 2>/dev/null').read().strip() if(len(check) > 0): break time.sleep(1) if(sysvals.usetraceevents): os.system(sysvals.adb+" shell 'echo RESUME COMPLETE > "+tp+\ "trace_marker'") 寫RESUME COMPLETE到ftrace，作為結束標記 # return from suspend print('RESUME COMPLETE') # stop ftrace if(sysvals.usetraceevents): os.system(sysvals.adb+" shell 'echo 0 > "+tp+"tracing_on'") 關閉ftrace功能 print('CAPTURING TRACE') os.system('echo "'+sysvals.teststamp+'" > '+sysvals.ftracefile) os.system(sysvals.adb+' shell cat '+tp+\ 'trace >> '+sysvals.ftracefile) 將/sys/kernel/debug/tracing/trace內容保存到本地log # grab a copy of the dmesg output print('CAPTURING DMESG') os.system('echo "'+sysvals.teststamp+'" > '+sysvals.dmesgfile) os.system(sysvals.adb+' shell dmesg >> '+sysvals.dmesgfile) 將dmesg保存到本地

parseTraceLog用於解析ftrace log，phase的判斷是依據suspend_resume關鍵詞；每個模塊的開始結束是以device_pm_callback_start/device_pm_callback_end作為判斷；還調用FTraceCallGraph進行函數調用關係的解析。

createHTML是這個工具真正NB的地方，對parseTraceLog結果進行了可視化，生成可縮放、查看細節的html文件。

6 對工具的改進

雖然工具非常強大，但是在使用中還是有一些視角沒有覆蓋到。所以做了一些改進。

在Android上使能rtcwake；在Android上使能callgraph；針對多次測試生成csv比較不同phase消耗時間，比summary.html更細化；這對每次測試給出Phase時間和每個Phase內Device消耗時間。

6.1 Android上使能rtcwake

https://github.com/arnoldlu/common-use/commit/a862d8c2a4f9bd005c516c6b61b394386b882217

可以在Android上使用rtc作為喚醒源，可以在沒有實體按鍵的設備上進行測試。

6.2 Android上使能callgraph

https://github.com/arnoldlu/common-use/commit/f8e288753a472cf48ccc0e9d7ffc67978c7d165e

如果沒有callgraph只能顯示Phase級別的信息，不能顯示每個device的信息以及內部函數耗費的時間。

6.3 單次測試summary結果

https://github.com/arnoldlu/common-use/commit/53c270669bb0dfaada53e29852999d5367ec65da

在每次測試目錄下，生成一個summary_phase_dev.csv文件。可以直觀的看到不同Phase、不同device消耗的時間。

EMMC用來做數據記錄儀壽命有問題嗎_低功耗_11

如果想要發現那個模塊消耗最大時間，可以使用Excel的Filter功能。比如想看suspend_prepare下Device消耗時間有大到小排列。

這樣就可以找出每個Phase中消耗資源大户。

EMMC用來做數據記錄儀壽命有問題嗎_ci_12

6.4 多次測試summary結果

https://github.com/arnoldlu/common-use/commit/d162c4827a0cdc50fe94d3f1303af682b387dc3d

生成summary_phase.csv文件，按每次測試的不同phase顯示耗費時間。

可以比較不同測試phase的時間耗費，看出哪一個phase存在回退現象。

EMMC用來做數據記錄儀壽命有問題嗎_低功耗_13

6.5 suspend/resume起止時間點判斷

analyze_suspend.py在解析log的時候，以SUSPEND START作為起點，以RESUME COMPLETE為終點。

在發送SUSPEND START之後，觸發suspend動作。在這期間，如果host存在一定搶佔，會增加suspend時間。

然後通poll設備的adb狀態，來判斷是否resume。一方面，adb可用狀態要在resume結束之後，另一方面，在最壞的情況下，可能存在1s的誤差，這對於毫秒級的resume來説是非常嚴重的一個結果。

最後發送RESUME COMPLETE作為結束。

if(sysvals.usetraceevents): os.system(sysvals.adb+\ " shell 'echo SUSPEND START > "+tp+"trace_marker'") print('SUSPEND START (press a key on the device to resume)') os.system(sysvals.adb+" shell 'echo "+sysvals.suspendmode+\ " > "+sysvals.powerfile+"'") # execution will pause here, then adb will exit while(True): check = os.popen(sysvals.adb+\ ' shell pwd 2>/dev/null').read().strip() if(len(check) > 0): break time.sleep(1) if(sysvals.usetraceevents): os.system(sysvals.adb+" shell 'echo RESUME COMPLETE > "+tp+\ "trace_marker'")

更好的方式是在enter_state的開頭結尾加ftrace，然後解析的時候以此為標記。

@@ -486,6 +496,7 @@ static int enter_state(suspend_state_t state) { int error; + trace_suspend_resume(TPS("enter_state"), state, true); trace_suspend_resume(TPS("suspend_enter"), state, true); if (state == PM_SUSPEND_FREEZE) { #ifdef CONFIG_PM_DEBUG @@ -532,6 +543,7 @@ static int enter_state(suspend_state_t state) suspend_finish(); Unlock: mutex_unlock(&pm_mutex); + trace_suspend_resume(TPS("enter_state"), state, false); return error; }

7 分析步驟

本着從宏觀到微觀的進階，一步步分找出可以優化的點。

下面是從開始一次測試到每次測試到suspend/resume不同phase，再到每個phase裏面device callback的關係。

EMMC用來做數據記錄儀壽命有問題嗎_#define_14

下面是每一次正常suspend/resume的流程，之前每個階段函數分析也可以看出他們的對稱關係。

EMMC用來做數據記錄儀壽命有問題嗎_低功耗_15

在修改了工具對於suspend和resume時間判斷的bug過後，得到了一組的數據。

EMMC用來做數據記錄儀壽命有問題嗎_低功耗_16

分析一下穩定性，均方差比較小，還算比較穩定。數據穩定之後，就可以進行詳細分析了。

EMMC用來做數據記錄儀壽命有問題嗎_低功耗_17

下面查看每次測試的每個phase數據，可以看出每個phase數據的穩定性，以及每個phase費時佔比。找出費時大户，suspend_prepare、suspend、suspend_machine、resume_machine、resume、resume_complete。

EMMC用來做數據記錄儀壽命有問題嗎_ci_18

針對上述六個phase，列出Top 10設備或者子系統。

從下圖可以看出，freeze_processes、sync_filesystems、mmc0、mmc2、CUP0~7、resume_console、tsensor是需要重點分析的設備。

EMMC用來做數據記錄儀壽命有問題嗎_EMMC用來做數據記錄儀壽命有問題嗎_19

EMMC用來做數據記錄儀壽命有問題嗎_#define_20

EMMC用來做數據記錄儀壽命有問題嗎_#define_21

EMMC用來做數據記錄儀壽命有問題嗎_#define_22

EMMC用來做數據記錄儀壽命有問題嗎_ci_23

EMMC用來做數據記錄儀壽命有問題嗎_ci_24

不區分phase列出Top 30如下，下面逐一分析可優化的空間。

EMMC用來做數據記錄儀壽命有問題嗎_低功耗_25

7.1 resume_console

adb shell 'echo N > /sys/module/printk/parameters/console_suspend' adb shell 'cat /sys/module/printk/parameters/console_suspend'

先看一下resume_console流程函數：

void resume_console(void) { if (!console_suspend_enabled) return; down_console_sem(); 獲取console_sem和console_lock_dep_map console_suspended = 0; console_unlock(); }

通過分析ftrace發現，主要時間消耗在console_unlock中。因為在console_lock被佔用期間，有相當一部分由printk緩存的log。所以在釋放鎖之前需要將其處理掉。

void console_unlock(void) { static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; static u64 seen_seq; unsigned long flags; bool wake_klogd = false; bool do_cond_resched, retry; trace_console_lock("console_unlock start", strlen("console_unlock start"));\ if (console_suspended) { up_console_sem(); return; } /* * Console drivers are called under logbuf_lock, so * @console_may_schedule should be cleared before; however, we may * end up dumping a lot of lines, for example, if called from * console registration path, and should invoke cond_resched() * between lines if allowable. Not doing so can cause a very long * scheduling stall on a slow console leading to RCU stall and * softlockup warnings which exacerbate the issue with more * messages practically incapacitating the system. / do_cond_resched = console_may_schedule; console_may_schedule = 0; / flush buffered message fragment immediately to console / console_cont_flush(text, sizeof(text)); again: for (;;) { 如果默認的LOGLEVEL定的比較高，即優先級低，則會有相當多的log需要打印。佔用很多時間。 … } console_locked = 0; / Release the exclusive_console once it is used / if (unlikely(exclusive_console)) exclusive_console = NULL; raw_spin_unlock(&logbuf_lock); 釋放console_sem和console_lock_dep_map / * Someone could have filled up the buffer again, so re-check if there's * something to flush. In case we cannot trylock the console_sem again, * there's a new owner and the console_unlock() from them will do the * flush, no worries. */ raw_spin_lock(&logbuf_lock); retry = console_seq != log_next_seq; raw_spin_unlock_irqrestore(&logbuf_lock, flags); if (retry && console_trylock()) goto again; if (wake_klogd) wake_up_klogd(); trace_console_lock("console_unlock end", strlen("console_unlock end"));\ }

那麼問題就變得簡單了，減少printk量就可以了。

通過cat /proc/sys/kernel/printk可以得到。在kernel/sysctl.c中有其實現。

7 4 1 7

這四個值分別對應：

#define console_loglevel (console_printk[0]) #define default_message_loglevel (console_printk[1]) #define minimum_console_loglevel (console_printk[2]) #define default_console_loglevel (console_printk[3])

又對應到：

int console_printk[4] = { CONSOLE_LOGLEVEL_DEFAULT, /* console_loglevel / MESSAGE_LOGLEVEL_DEFAULT, / default_message_loglevel / CONSOLE_LOGLEVEL_MIN, / minimum_console_loglevel / CONSOLE_LOGLEVEL_DEFAULT, / default_console_loglevel / }; / We show everything that is MORE important than this.. / #define CONSOLE_LOGLEVEL_SILENT 0 / Mum's the word / #define CONSOLE_LOGLEVEL_MIN 1 / Minimum loglevel we let people use / #define CONSOLE_LOGLEVEL_QUIET 4 / Shhh ..., when booted with "quiet" / #define CONSOLE_LOGLEVEL_DEFAULT 7 / anything MORE serious than KERN_DEBUG / #define CONSOLE_LOGLEVEL_DEBUG 10 / issue debug messages / #define CONSOLE_LOGLEVEL_MOTORMOUTH 15 / You can't shut this one up */

可知只要內核log優先級高於KERN_DEBUG都會被打印。由下表可知幾乎所有的log都會被打印。這就會造成printk相當繁忙，console_unlock會處理相當多信息。

#define KERN_EMERG KERN_SOH "0" /* system is unusable / #define KERN_ALERT KERN_SOH "1" / action must be taken immediately / #define KERN_CRIT KERN_SOH "2" / critical conditions / #define KERN_ERR KERN_SOH "3" / error conditions / #define KERN_WARNING KERN_SOH "4" / warning conditions / #define KERN_NOTICE KERN_SOH "5" / normal but significant condition / #define KERN_INFO KERN_SOH "6" / informational / #define KERN_DEBUG KERN_SOH "7" / debug-level messages / #define KERN_DEFAULT KERN_SOH "d" / the default kernel loglevel */

想解決也很簡單，提高console_loglevel的優先級。

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c old mode 100644 new mode 100755 index e7e586b..b927d67 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -60,7 +60,7 @@ extern void printascii(char ); #endif int console_printk[4] = { - CONSOLE_LOGLEVEL_DEFAULT, / console_loglevel / + CONSOLE_LOGLEVEL_QUIET, / console_loglevel / MESSAGE_LOGLEVEL_DEFAULT, / default_message_loglevel / CONSOLE_LOGLEVEL_MIN, / minimum_console_loglevel / CONSOLE_LOGLEVEL_DEFAULT, / default_console_loglevel */

在進行修改後，再來進行對比測試。可以看出消耗時間得到顯著提升，優化後的resume_complete時間基本上可以忽略不計。

7, mem Line 748: resume_complete,resume_console[3],248.54900000002544 Line 748: resume_complete,resume_console[3],248.6340000000382 Line 748: resume_complete,resume_console[3],248.26499999994667 Line 748: resume_complete,resume_console[3],248.3510000000706 Line 748: resume_complete,resume_console[3],248.42499999999745 7, freeze Line 996: resume_complete,resume_console[1],76.18400000001202 Line 996: resume_complete,resume_console[1],76.19500000009793 Line 996: resume_complete,resume_console[1],76.3280000001032 Line 996: resume_complete,resume_console[1],76.1689999999362 Line 996: resume_complete,resume_console[1],76.19999999997162 4, freeze Line 996: resume_complete,resume_console[1],0.1010000000007949 Line 996: resume_complete,resume_console[1],0.10499999999069587 Line 996: resume_complete,resume_console[1],0.09799999997994746 Line 996: resume_complete,resume_console[1],0.1010000000007949 Line 996: resume_complete,resume_console[1],0.10000000003174137 4, mem Line 749: resume_complete,resume_console[3],0.3370000000586515 Line 749: resume_complete,resume_console[3],0.33800000005612674 Line 749: resume_complete,resume_console[3],0.37700000007134804 Line 749: resume_complete,resume_console[3],0.3359999999474894 Line 749: resume_complete,resume_console[3],0.3429999999298161

7.2 mmc suspend/resuem分析

從下圖可知，mmc相關suspend/resume主要在mmc0:0001和mmc2:0001兩個設備的suspend/resume。下面重點分析這兩個設備的suspend/resume回調函數。

EMMC用來做數據記錄儀壽命有問題嗎_EMMC用來做數據記錄儀壽命有問題嗎_26

在執行suspend過程中，先將bus上面的設備driver先suspend，然後在suspend bus。

在resume時，過程相反，先bus resume，然後再逐個設備driver resume。

mmc0:0001

那就來看看bus和各個設備耗費的時間：

4013.868837 \| 4) sh-4511 \| \| /* device_pm_callback_start: mmcblk mmc0:0001, parent: mmc0, [suspend] / 4013.868893 \| 4) sh-4511 \| \| / device_pm_callback_end: mmcblk mmc0:0001, err=0 / 4013.869000 \| 4) sh-4511 \| \| / device_pm_callback_start: block mmcblk0, parent: mmc0:0001, [suspend] / 4013.869053 \| 4) sh-4511 \| \| / device_pm_callback_end: block mmcblk0, err=0 / 4013.889229 \| 5) sh-4511 \| \| / device_pm_callback_start: mmcblk mmc0:0001, parent: mmc0, bus [suspend] / 4013.914631 \| 0) sh-4511 \| \| / device_pm_callback_end: mmcblk mmc0:0001, err=0 / 4022.787571 \| 0) sh-4511 \| \| / device_pm_callback_start: mmcblk mmc0:0001, parent: mmc0, bus [resume] / 4022.886749 \| 0) sh-4511 \| \| / device_pm_callback_end: mmcblk mmc0:0001, err=0 / 4023.059198 \| 0) sh-4511 \| \| / device_pm_callback_start: block mmcblk0, parent: mmc0:0001, [resume] / 4023.059270 \| 0) sh-4511 \| \| / device_pm_callback_end: block mmcblk0, err=0 / 4023.059398 \| 0) sh-4511 \| \| / device_pm_callback_start: mmcblk mmc0:0001, parent: mmc0, [resume] / 4023.059830 \| 0) sh-4511 \| \| / device_pm_callback_end: mmcblk mmc0:0001, err=0 */

可以看出driver的suspend/resume並沒有耗費太多時間，主要在mmc bus的suspend/resume耗費了太多時間。

在drivers/mmc/core/bus.c中

static struct bus_type mmc_bus_type = { .name = "mmc", .dev_groups = mmc_dev_groups, .match = mmc_bus_match, .uevent = mmc_bus_uevent, .probe = mmc_bus_probe, .remove = mmc_bus_remove, .shutdown = mmc_bus_shutdown, .pm = &mmc_bus_pm_ops, };

mmc_bus_pm_bus對應的suspend/resume函數如下：

static const struct dev_pm_ops mmc_bus_pm_ops = { SET_RUNTIME_PM_OPS(mmc_runtime_suspend, mmc_runtime_resume, NULL) SET_SYSTEM_SLEEP_PM_OPS(mmc_bus_suspend, mmc_bus_resume) };

mmc bus的suspend/resume如下：

static int mmc_bus_suspend(struct device dev) { struct mmc_card card = mmc_dev_to_card(dev); struct mmc_host host = card->host; int ret; 對應設備驅動的suspend回調函數。 if (ret) return ret; 這裏的host指的是mmc_host，bus_ops指的是mmc_ops。 return ret; } static int mmc_bus_resume(struct device dev) { struct mmc_card card = mmc_dev_to_card(dev); struct mmc_host host = card->host; int ret; 這裏的host指的是mmc_host，bus_ops指的是mmc_ops。 if (ret) pr_warn("%s: error %d during resume (card was removed?)\n", mmc_hostname(host), ret); 對應設備的驅動的resume回調函數。 return ret; }

pm_generic_suspend和pm_generic_resume是對子系統設備的通用回調函數。

int pm_generic_suspend(struct device dev) { const struct dev_pm_ops pm = dev->driver ? dev->driver->pm : NULL; return pm && pm->suspend ? pm->suspend(dev) : 0; } int pm_generic_resume(struct device dev) { const struct dev_pm_ops pm = dev->driver ? dev->driver->pm : NULL; return pm && pm->resume ? pm->resume(dev) : 0; }

從中可以看出，如果對應設備存在dev->driver->pm->suspend和dev->driver->pm->resume則，調用回調函數。

mmc_bus_suspend

mmc_bus_suspend花費了大概25.4ms。下面是ftrace中截取的一段，從中可以看出時間主要消耗在struct mmc_bus_ops mmc_ops的.suspend回調函數。

/* / mmc_bus_suspend() { / / pm_generic_suspend() { /! 307.552 us/ } / / mmc_suspend() { /* 25060.78 us/ } / mmc_suspend / /* 25378.28 us/ } / mmc_bus_suspend */

mmc_bus_resume

mmc_resume消耗了大部分時間，整個流程才99.158ms。

/* / mmc_bus_resume() { / / mmc_resume() { / * 97167.39 us/ } / / pm_generic_resume() { / # 1980.104 us/ } / * 99158.12 us*/ }

mmc_suspend

對應的host->bus_ops，即mmc_ops。在host下的設備都suspend之後，suspend mmc_host。

在mmc_host resume之後，才能進行設備的resume。

static const struct mmc_bus_ops mmc_ops = { .remove = mmc_remove, .detect = mmc_detect, .suspend = mmc_suspend, .resume = mmc_resume, .runtime_suspend = mmc_runtime_suspend, .runtime_resume = mmc_runtime_resume, .alive = mmc_alive, .shutdown = mmc_shutdown, .reset = mmc_reset, };

通過分析ftrace.txt文件，發現其中msleep花費了17.1ms，這裏是存在問題的。

int __mmc_switch(struct mmc_card card, u8 set, u8 index, u8 value, unsigned int timeout_ms, bool use_busy_signal, bool send_status, bool ignore_crc) { struct mmc_host host = card->host; int err; struct mmc_command cmd = {0}; unsigned long timeout; u32 status = 0; bool use_r1b_resp = use_busy_signal; mmc_retune_hold(host); … /* * We are not allowed to issue a status command and the host * does'nt support MMC_CAP_WAIT_WHILE_BUSY, then we can only * rely on waiting for the stated timeout to be sufficient. */ if (!send_status) { mmc_delay(timeout_ms); goto out; } … }

mmc_resume

通過分析ftrece.txt，可以發現mmc_resume存在4個msleep，共消耗了12646.35 +14260.78 +13881.66 +15093.22 =55.882 ms。

關於mmc_ops的suspend/resume/runtime_suspend/runtime_resume的探討

先來看看這四個函數的，其流程受到MMC_CAP_AGGRESSIVE_PM和MMC_CAP_RUNTIME_RESUME兩個flag的控制。執行的實體都是_mmc_suspend、_mmc_resume。

static int mmc_suspend(struct mmc_host host) { int err; err = _mmc_suspend(host, true); if (!err) { pm_runtime_disable(&host->card->dev); pm_runtime_set_suspended(&host->card->dev); } return err; } static int mmc_resume(struct mmc_host host) { int err = 0; MMC_CAP_RUNTIME_RESUME)) { err = _mmc_resume(host); pm_runtime_set_active(&host->card->dev); pm_runtime_mark_last_busy(&host->card->dev); } pm_runtime_enable(&host->card->dev); return err; } static int mmc_runtime_suspend(struct mmc_host host) { int err; MMC_CAP_AGGRESSIVE_PM)) return 0; err = _mmc_suspend(host, true); if (err) pr_err("%s: error %d doing aggressive suspend\n", mmc_hostname(host), err); return err; } static int mmc_runtime_resume(struct mmc_host host) { int err; MMC_CAP_AGGRESSIVE_PM \| MMC_CAP_RUNTIME_RESUME))) return 0; err = _mmc_resume(host); if (err) pr_err("%s: error %d doing aggressive resume\n", mmc_hostname(host), err); return 0; }

1.如果兩flag都沒有定義，則runtime_suspend和runtim_resume都是空函數。起作用的就是跟隨系統的suspend/resume流程。

2.如果只定義了MMC_CAP_RUNTIME_RESUME，則不會runtime_suspend。並且在系統resume的時候，不會執行resume回調函數。只會在根據需要執行runtime_resume。使用runtime_resume代替了resume。

3.如果只定義了MMC_CAP_AGGRESSIVE_PM ，則suspend/resume跟隨系統suspend/resume流程。並且runtime_suspend/resume_resume也根據實際情況執行。一切正常。

4.如果兩者都定義了，既可以suspend也可以runtime_suspend，但是隻能runtime_resume，不能跟隨系統resume流程執行resume回調函數。

也就是説MMC_CAP_AGGRESSIVE_PM 則runtime_suspend/runtime_resume都可用，MMC_CAP_RUNTIME_RESUME則只能使用runtime_resume執行resume功能。

那麼就來看一下，在應用了MMC_CAP_RUNTIME_RESUME之後效果如何。

mmc0:0001增加runtime-suspend屬性：

diff --git a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi old mode 100644 new mode 100755 index 09e2c71..2cec392 --- a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi +++ b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi @@ -801,6 +801,7 @@ clock-names = "ciu", "biu"; resets = <&sys_ctrl PERIPH_RSTDIS0_MMC0>; bus-width = <0x8>; + runtime-suspend; vmmc-supply = <&ldo19>; pinctrl-names = "default"; pinctrl-0 = <&emmc_pmx_func &emmc_clk_cfg_func

修改DeviceTree解析文件，增加MMC_CAP_RUNTIME_RESUME。

index 094202c..35fd7b5 --- a/drivers/mmc/host/dw_mmc.c +++ b/drivers/mmc/host/dw_mmc.c @@ -2922,6 +2922,10 @@ static struct dw_mci_board dw_mci_parse_dt(struct dw_mci host) dev_info(dev, "supports-highspeed property is deprecated.\n"); pdata->caps \|= MMC_CAP_SD_HIGHSPEED \| MMC_CAP_MMC_HIGHSPEED; } + if (of_find_property(np, "runtime-suspend", NULL)) { + dev_info(dev, "supports-highspeed property is deprecated.\n"); + pdata->caps \|= MMC_CAP_RUNTIME_RESUME; + } return pdata; }

修改後mmc0:0001的resume達到了預期，mmc_resume沒有被執行。

針對統計結果，效果明顯。

EMMC用來做數據記錄儀壽命有問題嗎_低功耗_27

雖然沒有在系統resume過程中執行，但是mmc0:0001總要resume。只不過稍微延遲了，不再這個工具統計之中。

延後執行的mmc0:0001的resume耗費了72.317ms，也和之前的差不多。實際上沒有對整個流程作出實質貢獻，只是不在統計數據之內。

[32m[ 32.486851] [0m[33mmmc_host mmc0[0m: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31 caps=40138143 caps2=0) [32m[ 32.500871] [0m[33mueventd[0m: fixup /sys/devices/system/cpu/cpu4/cpufreq/scaling_max_freq 1000 1000 664 [32m[ 32.501305] [0m[33mueventd[0m: fixup /sys/devices/system/cpu/cpu4/cpufreq/scaling_min_freq 1000 1000 664 [32m[ 32.540313] [0m[33mueventd[0m: fixup /sys/devices/system/cpu/cpu5/cpufreq/scaling_max_freq 1000 1000 664 [32m[ 32.540747] [0m[33mueventd[0m: fixup /sys/devices/system/cpu/cpu5/cpufreq/scaling_min_freq 1000 1000 664 [32m[ 32.559168] [0m[33mmmc_host mmc0[0m: Bus speed (slot 0) = 51756522Hz (slot req 52000000Hz, actual 51756522HZ div = 0 caps=40138143 caps2=0)

mmc2:0001

mmc2:0001和mmc0:0001的區別在於不同的mmc_bus_ops，mmc2:0001是SDIO接口，對應的應該是mmc_sdio_ops。

4013.876306 \| 4) sh-4511 \| \| /* device_pm_callback_start: mmc mmc2:0001, parent: mmc2, [suspend] / 4013.876360 \| 4) sh-4511 \| \| / device_pm_callback_end: mmc mmc2:0001, err=0 / 4013.876397 \| 4) sh-4511 \| \| / device_pm_callback_start: sdio mmc2:0001:1, parent: mmc2:0001, [suspend] / 4013.876437 \| 4) sh-4511 \| \| / device_pm_callback_end: sdio mmc2:0001:1, err=0 / 4013.876470 \| 4) sh-4511 \| \| / device_pm_callback_start: wl1271_sdio mmc2:0001:2, parent: mmc2:0001, [suspend] / 4013.876525 \| 4) sh-4511 \| \| / device_pm_callback_end: wl1271_sdio mmc2:0001:2, err=0 / 4013.876556 \| 4) sh-4511 \| \| / device_pm_callback_start: wl18xx_driver wl18xx.1.auto, parent: mmc2:0001:2, [suspend] / 4013.876596 \| 4) sh-4511 \| \| / device_pm_callback_end: wl18xx_driver wl18xx.1.auto, err=0 / 4013.881676 \| 4) sh-4511 \| \| / device_pm_callback_start: wl18xx_driver wl18xx.1.auto, parent: mmc2:0001:2, bus [suspend] / 4013.881698 \| 4) sh-4511 \| \| / device_pm_callback_end: wl18xx_driver wl18xx.1.auto, err=0 / 4013.881740 \| 4) sh-4511 \| \| / device_pm_callback_start: wl1271_sdio mmc2:0001:2, parent: mmc2:0001, bus [suspend] / 4013.881765 \| 4) sh-4511 \| \| / device_pm_callback_end: wl1271_sdio mmc2:0001:2, err=0 / 4013.882582 \| 4) sh-4511 \| \| / device_pm_callback_start: sdio mmc2:0001:1, parent: mmc2:0001, bus [suspend] / 4013.882603 \| 4) sh-4511 \| \| / device_pm_callback_end: sdio mmc2:0001:1, err=0 / 4013.882645 \| 4) sh-4511 \| \| / device_pm_callback_start: mmc mmc2:0001, parent: mmc2, bus [suspend] / 4013.885524 \| 4) sh-4511 \| \| / device_pm_callback_end: mmc mmc2:0001, err=0 / 4022.888667 \| 0) sh-4511 \| \| / device_pm_callback_start: mmc mmc2:0001, parent: mmc2, bus [resume] / 4023.042980 \| 0) sh-4511 \| \| / device_pm_callback_end: mmc mmc2:0001, err=0 / 4023.043021 \| 0) sh-4511 \| \| / device_pm_callback_start: sdio mmc2:0001:1, parent: mmc2:0001, bus [resume] / 4023.043037 \| 0) sh-4511 \| \| / device_pm_callback_end: sdio mmc2:0001:1, err=0 / 4023.043067 \| 0) sh-4511 \| \| / device_pm_callback_start: wl1271_sdio mmc2:0001:2, parent: mmc2:0001, bus [resume] / 4023.043089 \| 0) sh-4511 \| \| / device_pm_callback_end: wl1271_sdio mmc2:0001:2, err=0 / 4023.043128 \| 0) sh-4511 \| \| / device_pm_callback_start: wl18xx_driver wl18xx.1.auto, parent: mmc2:0001:2, bus [resume] / 4023.043151 \| 0) sh-4511 \| \| / device_pm_callback_end: wl18xx_driver wl18xx.1.auto, err=0 / 4023.048824 \| 0) sh-4511 \| \| / device_pm_callback_start: wl18xx_driver wl18xx.1.auto, parent: mmc2:0001:2, [resume] / 4023.048877 \| 0) sh-4511 \| \| / device_pm_callback_end: wl18xx_driver wl18xx.1.auto, err=0 / 4023.048916 \| 0) sh-4511 \| \| / device_pm_callback_start: wl1271_sdio mmc2:0001:2, parent: mmc2:0001, [resume] / 4023.048979 \| 0) sh-4511 \| \| / device_pm_callback_end: wl1271_sdio mmc2:0001:2, err=0 / 4023.049011 \| 0) sh-4511 \| \| / device_pm_callback_start: sdio mmc2:0001:1, parent: mmc2:0001, [resume] / 4023.049074 \| 0) sh-4511 \| \| / device_pm_callback_end: sdio mmc2:0001:1, err=0 / 4023.049113 \| 0) sh-4511 \| \| / device_pm_callback_start: mmc mmc2:0001, parent: mmc2, [resume] / 4023.049165 \| 0) sh-4511 \| \| / device_pm_callback_end: mmc mmc2:0001, err=0 */

由下可知不同部分在於mmc_host的suspend/resume，pm_generic_suspend/pm_generic_resume基本上耗費的時間都很少。

所以重點看看mmc_sdio_suspend和mmc_sdio_resume兩個函數。

/* / mmc_bus_suspend() { / 0.833 us / pm_generic_suspend(); / / mmc_sdio_suspend() { / # 2854.687 us/ } / # 2864.115 us*/ }

/* / mmc_bus_resume() { / / mmc_sdio_resume() { / @ 154277.3 us/ } / 1.563 us / pm_generic_resume(); / @ 154290.3 us*/ }

mmc_sdio_suspend

時間很短，不關注。

mmc_sdio_resume

mmc2:001的mmc_bus_resume時間達到154.313，mmc_sdio_resume包含三個msleep共75331.82+15953.43+14369.58=105654.83us=105.654ms。

考慮：是否可以將SDIO的resume也像MMC那樣延後執行呢？

7.3 CPU_OFF/CPU_ON

在分析了resome_console和mmc之後，再來看一下CPU_OFF/CPU_ON的執行過程。

在disable_nonboot_cpus中選取first_cpu，除此之外的所有for_each_online_cpu都會被_cpu_down，並且將其放到frozen_cpus上。

在enable_nonboot_cpus中，遍歷frozen_cpus，將其_cpu_up。

針對HiKey，真個流程就是對CPU 1-7進行關閉、打開的操作，所以重點分析一下_cpu_down和_cpu_up。

耗時最大的三個地方都用粗體下劃線標出，除了發送狀態通知之外，還有rcu sync處理。

對cpu_chain上所有註冊notifier，逐個執行回調函數notifier_call，根據action進行處理，這是一個很耗時的過程。

/* Requires cpu_add_remove_lock to be held / static int _cpu_down(unsigned int cpu, int tasks_frozen) { int err, nr_calls = 0; void hcpu = (void )(long)cpu; unsigned long mod = tasks_frozen ? CPU_TASKS_FROZEN : 0; struct take_cpu_down_param tcd_param = { .mod = mod, .hcpu = hcpu, }; 如果online只有一個CPU，則無法再進行down操作。 return -EBUSY; 如果當前CPU沒有online，則無需進行down。 return -EINVAL; 取得cpu_hotplug.lock鎖在cpu_chain上發從CPU_DOWN_PREPARE狀態。 if (err) { nr_calls--; __cpu_notify(CPU_DOWN_FAILED \| mod, hcpu, nr_calls, NULL); pr_warn("%s: attempt to take down CPU %u failed\n", __func__, cpu); goto out_release; } / * By now we've cleared cpu_active_mask, wait for all preempt-disabled * and RCU users of this state to go away such that all new such users * will observe it. * * For CONFIG_PREEMPT we have preemptible RCU and its sync_rcu() might * not imply sync_sched(), so wait for both. * * Do sync before park smpboot threads to take care the rcu boost case. / if (IS_ENABLED(CONFIG_PREEMPT)) synchronize_rcu_mult(call_rcu, call_rcu_sched);* else synchronize_rcu(); 將此CPU的由kthread_create創建的線程設置為PARKED。 /* * Prevent irq alloc/free while the dying cpu reorganizes the * interrupt affinities. / irq_lock_sparse(); / * So now all preempt/rcu users must observe !cpu_active(). / err = stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu)); if (err) { / CPU didn't die: tell everyone. Can't complain. / cpu_notify_nofail(CPU_DOWN_FAILED \| mod, hcpu); irq_unlock_sparse(); goto out_release; } BUG_ON(cpu_online(cpu)); 如果指定的CPU還處於online狀態，則觸發kernel panic。 / * The migration_call() CPU_DYING callback will have removed all * runnable tasks from the cpu, there's only the idle task left now * that the migration thread is done doing the stop_machine thing. * * Wait for the stop thread to go away. / while (!per_cpu(cpu_dead_idle, cpu)) cpu_relax(); smp_mb(); / Read from cpu_dead_idle before __cpu_die(). / per_cpu(cpu_dead_idle, cpu) = false; / Interrupts are moved away from the dying cpu, reenable alloc/free / irq_unlock_sparse(); hotplug_cpu__broadcast_tick_pull(cpu); / This actually kills the CPU. / __cpu_die(cpu); 調用底層架構相關的cpu_kill回調函數。 / CPU is completely dead: tell everyone. Too late to complain. / tick_cleanup_dead_cpu(cpu); cpu_notify_nofail(CPU_DEAD \| mod, hcpu);* 通知完成offline動作的處理器狀態為CPU_DEAD。 check_for_tasks(cpu); out_release: cpu_hotplug_done(); 釋放cpu_hotplug.lock鎖。 trace_sched_cpu_hotplug(cpu, err, 0); if (!err) cpu_notify_nofail(CPU_POST_DEAD \| mod, hcpu); return err; }

/* Requires cpu_add_remove_lock to be held / static int _cpu_up(unsigned int cpu, int tasks_frozen) { int ret, nr_calls = 0; void hcpu = (void )(long)cpu; unsigned long mod = tasks_frozen ? CPU_TASKS_FROZEN : 0; struct task_struct idle; 如果沒有其他進程佔有，則退出，執行後面的工作；如果被佔用，則設置這個進程為TASK_INTERRUPTIBLE，等待結束。如果該CPU已經online，則沒有必要執行up；或者非present，則無法up。 ret = -EINVAL; goto out; } 給指定CPU生成一個idle線程 if (IS_ERR(idle)) { ret = PTR_ERR(idle); goto out; } 創建一個用於管理CPU hotplug動作的線程 if (ret) goto out; ret = __cpu_notify(CPU_UP_PREPARE \| mod, hcpu, -1, &nr_calls); 通知cpu_chain中的處理器，當前正在online的CPU狀態為CPU_UP_PREPARE。 if (ret) { nr_calls--; pr_warn("%s: attempt to bring up CPU %u failed\n", __func__, cpu); goto out_notify; } /* Arch-specific enabling code. / ret = __cpu_up(cpu, idle); 調用更底層的使能CPU操作。 if (ret != 0) goto out_notify; BUG_ON(!cpu_online(cpu)); / Now call notifier in preparation. */ cpu_notify(CPU_ONLINE \| mod, hcpu); 通知cpu_chanin中的處理器，目前online動作的處理器的狀態為CPU_ONLINE。 out_notify: if (ret != 0) __cpu_notify(CPU_UP_CANCELED \| mod, hcpu, nr_calls, NULL); out: cpu_hotplug_done(); 釋放cpu_hotplug.lock鎖。 trace_sched_cpu_hotplug(cpu, ret, 1); return ret; }

RCU synchronize

RCU即Read-Copy Update，是Linux內核比較成熟的新型讀寫鎖，具有較高的讀寫併發性能，常常用在需要互斥的關鍵性能路徑。

在Kernel中，有兩種類型實現tiny和tree，tiny rcu更簡潔，常用在小型嵌入式系統中；tree rcu被廣泛用在了server、desktop、android中。

RCU的和心理鏈式讀者訪問的同時，寫者可以更新訪問對象的副本，但寫者需要等待所有讀者完成訪問之後，才能刪除老對象。這個過程實現的關鍵和難點在於如何判斷所有的讀者已經完成訪問。通常把寫者開始更新，到所有讀者完成訪問這段時間叫做寬限期（Grace Period）。內核中實現寬限期等待的函數是synchronize_rcu。

synchronize_rcu_mult同時在call_rcu()函數列表的寬限期上等待，知道所有的都結束。

總結：cpu_chain和rcu sync耗時大部是由外界因素決定的，如果cpu_chain或者call_rcu()列表很多，或者裏面回調函數特別耗時，都會拉長CPU_OFF/CPU_ON時間。這部分的優化特別離散。

參考文檔：

RCU synchronize原理分析 http://www.wowotech.net/kernel_synchronization/223.html

synchronize_rcu()函數詳解 http://blog.chinaunix.net/uid-20648784-id-1592811.html

如何確定一個函數耗費時間？

在函數中添加以下ftrace，可以得到執行時的timestamp，進程名稱，函數名和對應的行數。

trace_suspend_resume(TPS(__func__), __LINE__, true);

結果如下：

223.502950 \| 1) sh-2832 \| \| /* suspend_resume: CPU_ON[4] begin / 223.502953 \| 1) sh-2832 \| \| / suspend_resume: _cpu_up[513] begin / 223.502957 \| 1) sh-2832 \| \| / suspend_resume: _cpu_up[516] begin / 223.502959 \| 1) sh-2832 \| \| / suspend_resume: _cpu_up[522] begin / 223.502969 \| 1) sh-2832 \| \| / suspend_resume: _cpu_up[529] begin / 223.502973 \| 1) sh-2832 \| \| / suspend_resume: _cpu_up[534] begin / 223.529988 \| 1) sh-2832 \| \| / suspend_resume: _cpu_up[544] begin / 223.530382 \| 1) sh-2832 \| \| / suspend_resume: _cpu_up[552] begin / 223.531451 \| 1) sh-2832 \| \| / suspend_resume: _cpu_up[559] begin / 223.531454 \| 1) sh-2832 \| \| / suspend_resume: _cpu_up[563] begin / 223.531456 \| 1) sh-2832 \| \| / suspend_resume: CPU_ON[4] end */

在Excel中打開，可以輕鬆算出時間差。可知Line 534到Line 544之前耗費了最多時間。

EMMC用來做數據記錄儀壽命有問題嗎_#define_28

本文章為轉載內容，我們尊重原作者對文章享有的著作權。如有內容錯誤或侵權問題，歡迎原作者聯繫我們進行內容更正或刪除文章。

博客 / 詳情