Changes to defer in Go 1.13

Go 1.13 has been officially released, and the release notes say that most uses of defer are now about 30% faster. Where does that 30% come from?

As we know, a defer statement used to be compiled into two runtime calls, deferproc and deferreturn; details here.
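To make the two phases concrete, here is a small sketch of what they look like from the outside. It does not touch the runtime at all; it only shows behaviour that matches "register at the defer statement, run on return": arguments are evaluated at registration time, and the deferred calls run in reverse order once the function body has finished. The helper name trace is made up for this example.

```go
package main

import "fmt"

// trace returns the order in which the function body and its deferred
// calls run. It is a hypothetical helper, not part of any library.
func trace() []string {
	var log []string
	func() {
		x := 1
		// The argument x is evaluated here, when the defer is registered.
		defer func(v int) { log = append(log, fmt.Sprintf("registered first, v=%d", v)) }(x)
		x = 2
		defer func(v int) { log = append(log, fmt.Sprintf("registered second, v=%d", v)) }(x)
		log = append(log, "body done")
	}() // the deferred calls run when this inner function returns
	return log
}

func main() {
	for _, line := range trace() {
		fmt.Println(line)
	}
}
```

The body finishes first, then the second registration runs with v=2, then the first with v=1.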

The deferproc step now has a new sibling, deferprocStack, and the compiler chooses which of the two to call. Since the release notes say that most uses were optimized, most defer statements presumably compile to deferprocStack.

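// deferprocStack queues a new deferred function with a defer record on the stack.
// The defer record must have its siz and fn fields initialized.
// All other fields can contain junk.
// The defer record must be immediately followed in memory by
// the arguments of the defer.
// Nosplit because the arguments on the stack won't be scanned
// until the defer record is spliced into the gp._defer list.
//go:nosplit
func deferprocStack(d *_defer) {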
    gp := getg()
    if gp.m.curg != gp {
        // go code on the system stack can't defer
        throw("defer on system stack")
    }
    // siz and fn are already set.
    // The other fields are junk on entry to deferprocStack and
    // are initialized here.
    d.started = false
    d.heap = false
    d.sp = getcallersp()
    d.pc = getcallerpc()
    // The lines below implement:
    //   d.panic = nil
    //   d.link = gp._defer
    //   gp._defer = d
    // But without write barriers. The first two are writes to
    // the stack so they don't need a write barrier, and furthermore
    // are to uninitialized memory, so they must not use a write barrier.
    // The third write does not require a write barrier because we
    // explicitly mark all the defer structures, so we don't need to
    // keep track of pointers to them with a write barrier.
    *(*uintptr)(unsafe.Pointer(&d._panic)) = 0
    *(*uintptr)(unsafe.Pointer(&d.link)) = uintptr(unsafe.Pointer(gp._defer))
    *(*uintptr)(unsafe.Pointer(&gp._defer)) = uintptr(unsafe.Pointer(d))

    return0()
    // No code can go here - the C return register has
    // been set and must not be clobbered.
}
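The three unsafe writes at the end are just a push onto the head of a singly linked list. Here is a minimal user-space sketch of that list, assuming nothing beyond what the comments above say. The types deferRecord and goroutine and the methods push and runAll are hypothetical stand-ins for _defer, g and the runtime's own code; write barriers, argument frames and panics are left out.

```go
package main

import "fmt"

// deferRecord is a hypothetical stand-in for the runtime's _defer.
type deferRecord struct {
	fn   func()
	link *deferRecord // next older record, like _defer.link
}

// goroutine is a hypothetical stand-in for the runtime's g.
type goroutine struct {
	deferHead *deferRecord // plays the role of gp._defer
}

// push splices d in at the head of the list:
//   d.link = gp._defer
//   gp._defer = d
// The record is supplied by the caller, so push itself allocates nothing.
func (gp *goroutine) push(d *deferRecord) {
	d.link = gp.deferHead
	gp.deferHead = d
}

// runAll pops records from the head and runs them, newest first.
func (gp *goroutine) runAll() {
	for gp.deferHead != nil {
		d := gp.deferHead
		gp.deferHead = d.link
		d.fn()
	}
}

// collect registers three records and reports the order in which they ran.
func collect() []int {
	var order []int
	gp := &goroutine{}
	var a, b, c deferRecord // caller-owned records
	a.fn = func() { order = append(order, 1) }
	b.fn = func() { order = append(order, 2) }
	c.fn = func() { order = append(order, 3) }
	gp.push(&a)
	gp.push(&b)
	gp.push(&c)
	gp.runAll()
	return order
}

func main() {
	fmt.Println(collect()) // [3 2 1]
}
```

Pushing at the head is what gives defer its last-in, first-out order.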


package main

func main() {
    defer println(1)
}

Compiling it with go tool compile -S shows a call to runtime.deferprocStack:

    0x003a 00058 (deferstack.go:4)  LEAQ    ""..autotmp_1+8(SP), AX
    0x003f 00063 (deferstack.go:4)  PCDATA  $0, $0
    0x003f 00063 (deferstack.go:4)  MOVQ    AX, (SP)
    0x0043 00067 (deferstack.go:4)  CALL    runtime.deferprocStack(SB)
    0x0048 00072 (deferstack.go:4)  TESTL   AX, AX
    0x004a 00074 (deferstack.go:4)  JNE 92
    0x004c 00076 (deferstack.go:5)  XCHGL   AX, AX

The original deferproc still exists, so the _defer struct has to record whether a given defer record was allocated on the stack or on the heap:

type _defer struct {
    siz     int32 // includes both arguments and results
    started bool
    heap    bool // newly added field
    sp      uintptr // sp at time of defer
    pc      uintptr
    fn      *funcval
    _panic  *_panic // panic that is running defer
    link    *_defer
}

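Before deferprocStack existed, every defer went through deferproc. There is a deferpool, but when the pool runs dry the allocation inevitably falls back to something like this: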

d = (*_defer)(mallocgc(total, deferType, true))
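The "pool first, allocate when the pool is empty" pattern can be sketched in a few lines. This is only an illustration of the idea, not the runtime's code: the names record, pool, get, put and simulate are invented here, and the real deferpool is more involved than a single slice.

```go
package main

import "fmt"

// record is a hypothetical stand-in for a defer record.
type record struct{ id int }

// pool is a hypothetical free list with a fallback allocation path.
type pool struct {
	free       []*record
	heapAllocs int // how many times the fallback path was taken
}

// get reuses a cached record if one is available and allocates otherwise.
func (p *pool) get() *record {
	if n := len(p.free); n > 0 {
		r := p.free[n-1]
		p.free = p.free[:n-1]
		return r
	}
	p.heapAllocs++
	return new(record) // fallback: a fresh allocation
}

// put clears a record and returns it to the free list.
func (p *pool) put(r *record) {
	*r = record{}
	p.free = append(p.free, r)
}

// simulate pre-fills the pool with cached records, requests want records
// without returning any, and reports how many requests had to allocate.
func simulate(cached, want int) int {
	p := &pool{}
	for i := 0; i < cached; i++ {
		p.put(new(record))
	}
	for i := 0; i < want; i++ {
		p.get()
	}
	return p.heapAllocs
}

func main() {
	fmt.Println(simulate(2, 5)) // 3
}
```

With two cached records and five requests, three requests take the allocation path. A record that lives on the stack avoids that path entirely.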

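People in the community have complained for a long time that defer is slow, slow, slow. So this time the Go team is effectively answering popular demand.

So why aren't all defer calls optimized into stack allocations?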

case ODEFER:
        d := callDefer
        if n.Esc == EscNever {
            d = callDeferStack
        }
        s.call(n.Left, d)

n.Esc is the escape-analysis result recorded on the compiler's AST node. It is set to EscNever mainly by this piece of code:

case ODEFER:
        if e.loopdepth == 1 { // top level
            n.Esc = EscNever // force stack allocation of defer record (see ssa.go)
        }

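How should we read loopdepth? Roughly speaking, it goes up by one for every enclosing for loop. Following that idea, let's construct an example in which the defer record is still allocated on the heap: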

package main

import "fmt"

func main() {
    for i := 0; i < 10; i++ {
        defer func() {
            for {
                var a = make([]int, 128)

The output of go tool compile -S now contains a call to runtime.deferproc:

    0x0043 00067 (deferproc.go:7)   PCDATA  $0, $0
    0x0043 00067 (deferproc.go:7)   MOVQ    AX, 8(SP)
    0x0048 00072 (deferproc.go:7)   CALL    runtime.deferproc(SB)
    0x004d 00077 (deferproc.go:7)   TESTL   AX, AX
    0x004f 00079 (deferproc.go:7)   JNE 83
    0x0051 00081 (deferproc.go:7)   JMP 33
    0x0053 00083 (deferproc.go:7)   XCHGL   AX, AX
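The "top level or inside a loop" rule can be approximated outside the compiler with go/ast. The sketch below is my own rough model, not the compiler's escape analysis: it assumes the depth starts at 1 in every function body and grows by one per enclosing for or range statement, and it ignores things such as loops built from labels and goto. The names depthVisitor, deferDepths and sample are made up for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// depthVisitor tracks a modelled loop depth that starts at 1 in every
// function body and grows by one per enclosing for/range statement.
type depthVisitor struct {
	fset  *token.FileSet
	depth int
	out   map[int]int // source line of each defer -> modelled loop depth
}

// Visit uses a value receiver, so a changed depth is seen only by the
// children of the current node.
func (v depthVisitor) Visit(n ast.Node) ast.Visitor {
	switch n.(type) {
	case *ast.FuncDecl, *ast.FuncLit:
		v.depth = 1
	case *ast.ForStmt, *ast.RangeStmt:
		v.depth++
	case *ast.DeferStmt:
		v.out[v.fset.Position(n.Pos()).Line] = v.depth
	}
	return v
}

// deferDepths parses src and returns the modelled depth of every defer.
func deferDepths(src string) (map[int]int, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "sample.go", src, 0)
	if err != nil {
		return nil, err
	}
	v := depthVisitor{fset: fset, depth: 1, out: map[int]int{}}
	ast.Walk(v, f)
	return v.out, nil
}

// sample has one defer at the top level (line 4) and one in a loop (line 6).
const sample = `package p

func f() {
	defer g()
	for i := 0; i < 3; i++ {
		defer g()
	}
}

func g() {}
`

func main() {
	depths, err := deferDepths(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(depths[4], depths[6]) // 1 2
}
```

Under this model, a depth of 1 corresponds to the e.loopdepth == 1 branch quoted above, and anything deeper corresponds to the deferproc path. For a real program, go tool compile -S remains the way to check.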


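Only after digging through all of this did I realize there was no need to go to so much trouble: you can just read the official tests, haha: here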