Changes to defer in Go 1.13
October 20, 2012
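Go 1.13 has been officially released, and the release notes say that defer is now about 30% faster in most cases. Where does that 30% come from?
As we know, a defer func used to be translated into two steps, deferproc and deferreturn (described here).
The deferproc step now has a new sibling, deferprocStack, and the compiler chooses whether to emit deferproc or deferprocStack. Since the release notes say the optimization covers most use cases, most defers should end up compiled to deferprocStack.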
// All other fields can contain junk.
// The defer record must be immediately followed in memory by
// the arguments of the defer.
// Nosplit because the arguments on the stack won't be scanned
// until the defer record is spliced into the gp._defer list.
//go:nosplit
func deferprocStack(d *_defer) {
	gp := getg()
	if gp.m.curg != gp {
		// go code on the system stack can't defer
		throw("defer on system stack")
	}
	// siz and fn are already set.
	// The other fields are junk on entry to deferprocStack and
	// are initialized here.
	d.started = false
	d.heap = false
	d.sp = getcallersp()
	d.pc = getcallerpc()
	// The lines below implement:
	//   d.panic = nil
	//   d.link = gp._defer
	//   gp._defer = d
	// But without write barriers. The first two are writes to
	// the stack so they don't need a write barrier, and furthermore
	// are to uninitialized memory, so they must not use a write barrier.
	// The third write does not require a write barrier because we
	// explicitly mark all the defer structures, so we don't need to
	// keep track of pointers to them with a write barrier.
	*(*uintptr)(unsafe.Pointer(&d._panic)) = 0
	*(*uintptr)(unsafe.Pointer(&d.link)) = uintptr(unsafe.Pointer(gp._defer))
	*(*uintptr)(unsafe.Pointer(&gp._defer)) = uintptr(unsafe.Pointer(d))

	return0()
	// No code can go here - the C return register has
	// been set and must not be clobbered.
}
A quick check:
package main

func main() {
	defer println(1)
}
0x003a 00058 (deferstack.go:4)	LEAQ	""..autotmp_1+8(SP), AX
0x003f 00063 (deferstack.go:4)	PCDATA	$0, $0
0x003f 00063 (deferstack.go:4)	MOVQ	AX, (SP)
0x0043 00067 (deferstack.go:4)	CALL	runtime.deferprocStack(SB)
0x0048 00072 (deferstack.go:4)	TESTL	AX, AX
0x004a 00074 (deferstack.go:4)	JNE	92
0x004c 00076 (deferstack.go:5)	XCHGL	AX, AX
The original deferproc still exists, so the corresponding _defer struct has to record whether a given defer record was allocated on the stack or on the heap:
type _defer struct {
	siz     int32 // includes both arguments and results
	started bool
	heap    bool // newly added field
	sp      uintptr // sp at time of defer
	pc      uintptr
	fn      *funcval
	_panic  *_panic // panic that is running defer
	link    *_defer
}
Before deferprocStack existed, every defer went through deferproc. There is a deferpool, but once it runs dry you inevitably end up with an allocation like this:
d = (*_defer)(mallocgc(total, deferType, true))
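People in the community have been complaining for ages that defer is slow, slow, slow. So this time the Go team is, in effect, responding to popular demand.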
Why weren't all defer calls optimized into stack allocation? The choice is made in the compiler's SSA generation:
case ODEFER:
	d := callDefer
	if n.Esc == EscNever {
		d = callDeferStack
	}
	s.call(n.Left, d)
n.Esc holds the escape analysis result for the AST node. It gets set to EscNever mainly by the following code:
case ODEFER:
	if e.loopdepth == 1 { // top level
		n.Esc = EscNever // force stack allocation of defer record (see ssa.go)
		break
	}
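How should we read loopdepth? Roughly speaking, it goes up by one for every enclosing for loop. Following that idea, let's construct an example where the defer record is still allocated on the heap: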
package main

import "fmt"

func main() {
	for i := 0; i < 10; i++ {
		defer func() {
			for {
				var a = make([]int, 128)
				fmt.Println(a)
			}
		}()
	}
}
go tool compile -S
0x0043 00067 (deferproc.go:7)	PCDATA	$0, $0
0x0043 00067 (deferproc.go:7)	MOVQ	AX, 8(SP)
0x0048 00072 (deferproc.go:7)	CALL	runtime.deferproc(SB)
0x004d 00077 (deferproc.go:7)	TESTL	AX, AX
0x004f 00079 (deferproc.go:7)	JNE	83
0x0051 00081 (deferproc.go:7)	JMP	33
0x0053 00083 (deferproc.go:7)	XCHGL	AX, AX
Yep, the same familiar flavor.
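After digging through all of this, though, I realized there was no need to go to so much trouble: you can just read the official tests, haha: here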