鹅厂后台大佬教你Go内存管理！

来源：恒创科技编辑：恒创科技编辑部

2022-09-16 01:06:00

导语 | 本文推选自腾讯云开发者社区-【技思广益 · 腾讯技术人原创集】专栏。该专栏是腾讯云开发者社区为腾讯技术人与广泛开发者打造的分享交流窗口。栏目邀约腾讯技术人分享原创的技术积淀，与广泛开发者互启迪共成长。本文作者是腾讯后台开发工程师罗元国。

栈内存栈区的内存由编译器自动进行分配和释放，栈区中存储着函数的参数以及局部变量，它们会随着函数的创建而创建，函数的返回而销毁。

每个goroutine都维护着一个自己的栈区，这个栈区只能自己使用不能被其他goroutine使用。

栈区的初始大小是2KB。栈内存空间、结构和初始大小经过了几个版本的更迭：

v1.0~v1.1：最小栈内存空间为4KB。

v1.2：将最小栈内存提升到了8KB。

v1.3: 使用连续栈替换之前版本的分段栈。

v1.4~v1.19：将最小栈内存降低到了2KB。

栈结构经过了分段栈到连续栈的发展过程，介绍如下。

分段栈随着goroutine调用的函数层级的深入或者局部变量需要的越来越多时，运行时会调用runtime.morestack和runtime.newstack创建一个新的栈空间，这些栈空间是不连续的，但是当前goroutine的多个栈空间会以双向链表的形式串联起来，运行时会通过指针找到连续的栈片段。如下图所示。

优点：按需为当前goroutine分配内存并且及时减少内存的占用。

缺点：如果当前goroutine的栈几乎充满，那么任意的函数调用都会触发栈的扩容，当函数返回后又会触发栈的收缩，如果在一个循环中调用函数，栈的分配和释放就会造成巨大的额外开销，这被称为热分裂问题（Hot split）。

为了解决这个问题，Go在1.2版本的时候不得不将栈的初始化内存从4KB增大到了8KB。

连续栈连续栈可以解决分段栈中存在的两个问题，其核心原理就是每当程序的栈空间不足时，初始化一片比旧栈大两倍的新栈并将原栈中的所有值都迁移到新的栈中，新的局部变量或者函数调用就有了充足的内存空间。

栈空间不足导致的扩容会经历以下几个步骤：

调用runtime.newstack用在内存空间中分配更大的栈内存空间。

使用runtime.copystack将旧栈中的所有内容复制到新的栈中。

将指向旧栈对应变量的指针重新指向新栈。

调用runtime.stackfree销毁并回收旧栈的内存空间。

栈管理Span除了用作堆内存分配外，也用于栈内存分配，只是用途不同的Span对应的mSpan状态不同。用做堆内存的mSpan状态为mSpanInUse，而用做栈内存的状态为mSpanManual。

栈空间在运行时中包含两个重要的全局变量，分别是runtime.stackpool和runtime.stackLarge，这两个变量分别表示全局的栈缓存和大栈缓存，前者可以分配小于32KB的内存，后者用来分配大于32KB的栈空间。

为提高栈内存分配效率，调度器初始化时会初始化两个用于栈分配的全局对象：stackpool和stackLarge。

介绍如下：

（一）StackPoolstackpool面向32KB以下的栈分配，栈大小必须是2的幂，最小2KB，在Linux环境下，stackpool提供了2kB、4KB、8KB、16KB四种规格的mSpan链表。

stackpool结构定义如下

// Global pool of spans that have free stacks.
// Stacks are assigned an order according to size.
//
//  order = log_2(size/FixedStack)
//
// There is a free list for each order.
var stackpool [_NumStackOrders]struct {
  item stackpoolItem
  _    [cpu.CacheLinePadSize - unsafe.Sizeof(stackpoolItem{})%cpu.CacheLinePadSize]byte
}

//go:notinheap
type stackpoolItem struct {
  mu   mutex
  span mSpanList
}

// mSpanList heads a linked list of spans.
//
//go:notinheap
type mSpanList struct {
  first *mspan // first span in list, or nil if none
  last  *mspan // last span in list, or nil if none
}

（二）StackLarge大于等于32KB的栈，由stackLarge来分配，这也是个mSpan链表的数组，长度为25。mSpan规格从8KB开始，之后每个链表的mSpan规格都是前一个的两倍。

8KB和16KB这两个链表，实际上会一直是空的，留着它们是为了方便使用mSpan包含页面数的（以2为底）对数作为数组下标。stackLarge结构定义如下：


// Global pool of large stack spans.
var stackLarge struct {
  lock mutex
  free [heapAddrBits - pageShift]mSpanList // free lists by log_2(s.npages)
}

// mSpanList heads a linked list of spans.
//
//go:notinheap
type mSpanList struct {
  first *mspan // first span in list, or nil if none
  last  *mspan // last span in list, or nil if none
}

（三）内存分配如果运行时只使用全局变量来分配内存的话，势必会造成线程之间的锁竞争进而影响程序的执行效率，栈内存由于与线程关系比较密切，所以在每一个线程缓存runtime.mcache中都加入了栈缓存减少锁竞争影响。

同堆内存分配一样，每个P也有用于栈分配的本地缓存(mcache.stackcache)，这相当于是stackpool的本地缓存，在mcache中的定义如下：

//go:notinheap
type mcache struct {
  // The following members are accessed on every malloc,
  // so they are grouped here for better caching.
  nextSample uintptr // trigger heap sample after allocating this many bytes
  scanAlloc  uintptr // bytes of scannable heap allocated

  // Allocator cache for tiny objects w/o pointers.
  // See "Tiny allocator" comment in malloc.go.

  // tiny points to the beginning of the current tiny block, or
  // nil if there is no current tiny block.
  //
  // tiny is a heap pointer. Since mcache is in non-GC'd memory,
  // we handle it by clearing it in releaseAll during mark
  // termination.
  //
  // tinyAllocs is the number of tiny allocations performed
  // by the P that owns this mcache.
  tiny       uintptr
  tinyoffset uintptr
  tinyAllocs uintptr

  // The rest is not accessed on every malloc.

  alloc [numSpanClasses]*mspan // spans to allocate from, indexed by spanClass

  stackcache [_NumStackOrders]stackfreelist

  // flushGen indicates the sweepgen during which this mcache
  // was last flushed. If flushGen != mheap_.sweepgen, the spans
  // in this mcache are stale and need to the flushed so they
  // can be swept. This is done in acquirep.
  flushGen uint32
}

stackcache [_NumStackOrders]stackfreelist即为栈的本地缓存，在Linux环境下，每个P本地缓存有4（_NumStackOrders）种规格的空闲内存块链表：2KB，4KB，8KB，16KB，定义如下所示：


// Number of orders that get caching. Order 0 is FixedStack
// and each successive order is twice as large.
// We want to cache 2KB, 4KB, 8KB, and 16KB stacks. Larger stacks
// will be allocated directly.
// Since FixedStack is different on different systems, we
// must vary NumStackOrders to keep the same maximum cached size.
//   OS               | FixedStack | NumStackOrders
//   -----------------+------------+---------------
//   linux/darwin/bsd | 2KB        | 4
//   windows/32       | 4KB        | 3
//   windows/64       | 8KB        | 2
//   plan9            | 4KB        | 3
_NumStackOrders = 4 - goarch.PtrSize/4*goos.IsWindows - 1*goos.IsPlan9

小于32KB的栈分配：

对于小于32KB的栈空间，会优先使用当前P的本地缓存。

如果本地缓存中，对应规格的内存块链表为空，就从stackpool这里分配16KB的内存放到本地缓存（stackcache）中，然后继续从本地缓存分配。

若是stackpool中对应链表也为空，就从堆内存直接分配一个32KB的span划分成对应的内存块大小放到stackpool中。

不过有些情况下，是无法使用本地缓存的，在不能使用本地缓存的情况下，就直接从stackpool分配。

大于等于32KB的栈分配：

计算需要的page数目，并以2为底求对数（log2page），将得到的结果作为stackLarge数组的下标，找到对应的空闲mSpan链表。

若链表不为空，就拿一个过来用。如果链表为空，就直接从堆内存分配一个拥有这么多个页面的span，并把它整个用于分配栈内存。

例如想要分配64KB的栈，68/8是8个page，log2page=log2(8)=3。

（四）内存释放什么时候释放栈？
如果协程栈没有增长过（还是2KB），就把这个协程放到有栈的空闲G队列中。

如果协程栈增长过，就把协程栈释放掉，再把协程放入到没有栈的空闲G队列中。

而这些空闲协程的栈，也会在GC执行markroot时被释放掉，到时候这些协程也会加入到没有栈的空闲协程队列中。