"Go Language Memory Model" Reading and Summary

Background Introduction: (can be skipped directly)


Recently I have been relearning the operating systems course and have gained some understanding of concurrent programs in operating systems. The code in that course, however, is all C. The course points out that an operating system is just a concurrent program 👍 (for example, it provides condition variables and semaphores to ensure the correctness of concurrent code), and that concurrent program is written almost entirely in C. So I became curious about which concurrent programming patterns Go provides, and which concurrency primitives it offers to guarantee correctness. Let's take a look at Go's memory model.

Before writing this article, I kept chanting in my mind, jyy yyds!!!


This article is largely a translation of The Go Memory Model. If your English is good, it is recommended to read the original document directly; it is definitely more interesting than a translation~

Official documentation link: go memory model

Data Race#

A data race is a write to a memory location happening concurrently with another read or write to that same location, unless all the accesses involved are atomic data accesses provided by the sync/atomic package.

Data races mainly arise from the inability to guarantee the order of operations on data.

For example, goroutine1 writes to variable a, while goroutine2 reads from variable a.

package main

import (
  "fmt"
  "time"
)

var a = 3

func g1() { // goroutine 1 executes
  a *= 2
  fmt.Println(a)
}

func g2() { // goroutine2 executes
  fmt.Println(a)
}

func main() {
  fmt.Println("init value: ", a)
  go g1()
  go g2()

  time.Sleep(time.Second)
}

If you build with go build -race hello.go, running the resulting binary prints a warning:

WARNING: DATA RACE
Write at 0x000104d946d0 by goroutine 7:
  main.g1()
      hello.go:11 +0x3c

Previous read at 0x000104d946d0 by goroutine 8:
  main.g2()
      hello.go:16 +0x2c

Goroutine 7 (running) created at:
  main.main()
      hello.go:21 +0xa4

Goroutine 8 (finished) created at:
  main.main()
      hello.go:22 +0xb0
==================
6
Found 1 data race(s)

It is possible that g1 executes first or g2 executes first, making the result of variable a uncertain.

Go officially recommends that programmers use appropriate synchronization to avoid data races. In the absence of data races, Go programs behave as if all goroutines were multiplexed onto a single processor. This property is sometimes called DRF-SC: a data-race-free program executes in a sequentially consistent manner.

Although programmers should write data-race-free Go programs, Go does place limits on what an implementation may do in the presence of a data race. An implementation may report the race and terminate the program. Otherwise, each read of a single-word-sized memory location must observe a value that was actually written to that location (possibly by a concurrently executing goroutine) and not yet overwritten. In this respect Go constrains data races the way Java and JavaScript do, where the effects of a race are always limited, unlike C and C++, where a program with a data race has undefined behavior and the compiler may do anything.

Go can diagnose and report data races through some tools and considers data races to be errors.

Memory Model#

The definition of Go's memory model closely follows Hans-J. Boehm and Sarita V. Adve in “Foundations of the C++ Concurrency Memory Model”, published in PLDI 2008.

The memory model describes the requirements on program executions, which are made up of goroutine executions, which in turn are made up of memory operations.
A memory operation is modeled by four details:

  • Its type: indicating whether it is a normal data read, a normal data write, or a synchronization operation, such as an atomic data access, a mutex operation, or a channel operation.
  • Its position in the program.
  • The location of the accessed memory or variable.
  • The value read or written by the operation.

Some memory operations are read-like, including read, atomic read, mutex lock, and channel receive; others are write-like, including write, atomic write, mutex unlock, channel send, and channel close. Some operations, such as atomic compare-and-swap (atomic CAS), are both read-like and write-like.

The execution of a goroutine is modeled as a set of memory operations executed by that goroutine.

Requirement One
The memory operations in each goroutine must correspond to a correct sequential execution of that goroutine. That execution must be consistent with the partial order requirements of Go's control flow constructs and the order of evaluation of expressions; this order is called sequenced before.

Requirement Two
For a given program execution, the mapping W (which write each read observes), when limited to synchronizing operations, must be explainable by some implicit total order of those synchronizing operations that is consistent both with sequencing and with the values read and written by those operations.

When writing concurrent programs, shared memory read and write operations are involved, in which case it is necessary to ensure that there is a correct order between multiple accesses to the same variable. If two goroutines do not establish a happens-before relationship through synchronization operations, the execution order is undefined.


Therefore, when writing concurrent programs, synchronization primitives must be used to establish the order between reads and writes, so that the mapping W accurately reflects which writes each read can observe in memory.

synchronized before is a partial order on synchronizing memory operations, derived from W. If a synchronizing read operation r observes a synchronizing write operation w (that is, if W(r) = w), then w is synchronized before r. In simple terms, synchronized before is a subset of the implicit total order of synchronizing operations, limited to the information that W directly observes.

The happens before relationship is defined as the union of sequenced before and synchronized before.

If an operation A happens before another operation B, this relationship can be represented by the -> symbol.

For example, in the following code:

package main

import (
  "fmt"
)

var x, y int

func main() {
  go func() {
      y = 1
      fmt.Println(x)
  }()
  x = 1
  fmt.Println(y)
}

Within each goroutine, sequenced before gives these happens-before chains:

new goroutine: y = 1 -> fmt.Println(x)
main goroutine: x = 1 -> fmt.Println(y)

Note that no synchronization connects the two goroutines, so there is no happens-before edge between them: the accesses to x and y race, and each Println may print 0 or 1.

For example, in the following code:

package main

import (
  "fmt"
  "os"
  "sync"
  "time"

)

var x, y int
var mu sync.Mutex

func write_y() {
  mu.Lock()
  defer mu.Unlock()
  fmt.Fprintln(os.Stdout, "call func in main goroutine get lock")
  y = 1
  fmt.Fprintf(os.Stdout, "write y = %d, and x in func call: %d\n", y, x)
}

func main() {
  fmt.Fprintf(os.Stdout, "init ==> x: %d and y: %d\n", x, y)
  go func() {
      mu.Lock()
      defer mu.Unlock()
      x = 1
      fmt.Fprintln(os.Stdout, "g1 get lock")
      fmt.Fprintf(os.Stdout, "write x = %d and y in g1: %d\n", x, y)
  }()

  write_y()
  time.Sleep(time.Second)
  fmt.Println("x in main goroutine: ", x)
}

Executing the above code yields the following result:

$go run main.go

init ==> x: 0 and y: 0
call func in main goroutine get lock
write y = 1, and x in func call: 0
g1 get lock
write x = 1 and y in g1: 1
x in main goroutine:  1

In this run, the call in the main goroutine acquired the lock first and wrote y, at which point x = 0 and y = 1. After that, g1 acquired the lock and wrote x = 1. The mutex establishes a happens-before order between the two critical sections, but which goroutine acquires the lock first may differ between runs.

Requirement Three
For an ordinary (non-synchronizing) data read r of a memory location x, W(r) must be a write w that is visible to r, where visible means both of the following hold:

  • w happens before r.

  • w does not happen before any other write w' to x that happens before r.

If the reads and writes of x race (at least one of the conflicting accesses is unsynchronized), no happens-before relation is established between them, so a read may have more than one candidate write it can observe.

If there are no read-write data races or write-write data races on the variable x, any read of x has only one possible W(r), the single w that is immediately before it in the happens-before order.

More generally, in a data-race-free Go program, one in which no execution has a read-write or write-write data race, the result can only be one that is explainable by some sequentially consistent interleaving of the goroutines: the program's behavior is predictable and satisfies the sequential consistency model.

Implementation Limitations of Programs with Data Races#

First, an implementation can report a data race and halt the program when one is detected. This is what the race detector (go build -race), built on ThreadSanitizer, does.

Otherwise, a read r of a memory location x that is no larger than a machine word must observe some write w such that r does not happen before w, and there is no write w' such that w happens before w' and w' happens before r. That is, r must observe the value of a preceding or concurrent write; it cannot observe a value from the future or one that has already been overwritten.

The following situations cannot occur:

case 1 (reading a write from the future):
  r --> w
case 2 (reading an already overwritten value):
  w --> w' --> r

Observing acausal or "out of thin air" writes is disallowed.

For multi-byte memory read and write operations, Go language may not guarantee the same semantics as single-byte memory location operations, such as whether the operation is atomic or whether it has order, etc.

To improve performance, Go language implementations may treat multiple byte operations as a set of individual machine word operations, but the order is unspecified.

This means that a race on a multi-word data structure can produce a value that does not correspond to any single write: a reader may observe a mixture of pieces from different writes.

For certain types in particular (interface values, maps, slices, and strings), such a race can corrupt memory.

Synchronization#

Initialization (init)#

All initialization operations run in a single goroutine, but this goroutine may create other concurrently running goroutines.

If a package p imports a package q, the init function of q will complete before any operations of p begin.

The start of main.main is synchronized after the completion of all init functions.

Creation of Goroutines#

The go statement that starts a new goroutine is synchronized before the start of that goroutine's execution.

In other words, everything sequenced before the go statement in the creating goroutine happens before the new goroutine starts. This guarantees that variables and state set up before the go statement are visible inside the new goroutine, avoiding races on initialization.

var a string

func f() {
  print(a)
}

func hello() {
  a = "hello, world"
  go f()
}

In the code above, calling hello assigns "hello, world" to a before go f(), so f will print "hello, world". However, f runs in a new goroutine that executes concurrently with the rest of the program, so the print may happen at some point in the future, possibly after hello has already returned.

Destruction of Goroutines#

The exit of a goroutine does not guarantee synchronization with any events in the program, as shown in the following program:

var a string

func hello() {
  go func() { a = "hello" }()
  print(a)
}

The assignment to a has no synchronization events following it, so it cannot be guaranteed to be observed by other goroutines. In fact, the compiler may eliminate the entire go statement. If the effect of a goroutine must be observed by another goroutine, please use synchronization mechanisms such as locks or channels to establish a relative order.

Channel Communication#

Channel communication is the primary method of synchronization between goroutines. Each send on a specific channel matches the corresponding receive on that channel, usually in different goroutines.

👍 A send on a channel is synchronized before the corresponding receive on that channel is completed (synchronization does not mean immediate execution).

When a goroutine sends on a channel that is unbuffered, or whose buffer is full, the send blocks until another goroutine receives from the channel; only then does the send complete and the value become available to the receiver.

Similarly, receiving from an empty channel does not complete immediately: the receive blocks until another goroutine sends on that channel.

That is, sends and receives are synchronized with each other; they form a mutual waiting relationship.

This synchronization mechanism is one of the characteristics of channels, ensuring the safety of channels and avoiding data race conditions and other concurrency issues. For example:

var c = make(chan int, 10)
var a string

func f() {
  a = "hello, world"
  c <- 0 // close(c)
}

func main() {
  go f()
  <-c
  print(a)
}

The above code guarantees that it prints "hello, world": the write to a is sequenced before the send on channel c, the send is synchronized before the corresponding receive on c completes, and the receive is sequenced before the print.

a="hello, world" ==> c<-0 ==> <-c ==> print(a)

Closing a channel is synchronized before a receive operation on that channel, which returns a zero value due to the channel being closed.

In Go language, closing a channel and receiving data from a closed channel are both synchronized operations. When a channel is closed, sending operations on that channel will cause a panic, but receiving operations can still continue until all values in the channel have been received.

When receiving data from a closed channel, the receive operation will immediately return a zero value, which represents the default value of the channel's element type.

The operation of closing a channel and the operation of returning a zero value are synchronized.

That is, if a goroutine is blocked receiving from a channel and the channel is then closed, the receive unblocks immediately and returns a zero value.

In the above example, changing c <- 0 to close(c) has the same behavior.

👍 For unbuffered channels, a receive from the channel is synchronized before the completion of the corresponding send on that channel.

      That is, the send cannot complete until the matching receive has taken place: the receiver must be ready and waiting (ready for synchronization).

Unbuffered channels are a mechanism for synchronizing and passing data between goroutines. Due to a capacity of 0, both send and receive operations are synchronized.

When a goroutine sends data to an unbuffered channel, it will be blocked until another goroutine receives data from that unbuffered channel.

Similarly, when a goroutine receives data from an unbuffered channel, it will be blocked until another goroutine sends data to that channel.

In this case, the receive and send operations are synchronized.

So rewriting the above code to use an unbuffered channel would be:

var c = make(chan int)
var a string

func f() {
  a = "hello, world"
  <-c
}

func main() {
  go f()
  c <- 0
  print(a)
}

This code also guarantees printing "hello, world": the write to a is sequenced before the receive on c (in f), the receive on c is synchronized before the corresponding send on c completes (in main), and the completion of the send is sequenced before the print.

a = "hello, world" ==> <-c ==> c <- 0 (completes) ==> print(a)

However, if the channel is buffered, for example, c = make(chan int, 1), then the program cannot guarantee printing "hello, world".

👍 For a buffered channel with capacity C, the k-th receive on that channel is synchronized before the (k+C)-th send on that channel completes.

      In other words, the (k+C)-th send cannot complete until the k-th receive has happened; once the buffer holds C unreceived values, further sends block.

The characteristics of buffered channels can implement a counting semaphore (counter semaphore).

🎈 The number of elements in the channel corresponds to the number of active operations, and the capacity of the channel corresponds to the maximum number of concurrent operations. Sending an element to the channel is equivalent to acquiring a signal (P operation, waiting to acquire the semaphore), while receiving an element from the channel releases a signal (V operation, releasing the semaphore). When the number of elements in the channel reaches capacity, new send operations will be blocked until other receive operations release some space. This ensures that the number of concurrently executing operations does not exceed the channel's capacity, thus limiting concurrency, which is a common concurrent programming pattern.

This program starts a goroutine for each entry in the work list, but these goroutines coordinate using a channel named limit to ensure that at most 3 goroutines run simultaneously.

var limit = make(chan int, 3)

func main() {
  for _, w := range work {
    go func(w func()) {
      limit <- 1 // take one of the 3 slots
      w()
      <-limit // release the slot
    }(w)
  }

  select {}
}

Locks#

The sync package implements two types of locks, sync.Mutex and sync.RWMutex.

For any variable l of type sync.Mutex or sync.RWMutex and any n < m, call n of l.Unlock() is synchronized before call m of l.Lock() returns.

var l sync.Mutex
var a string

func f() {
  a = "hello, world"
  l.Unlock()
}

func main() {
  l.Lock()
  go f()
  l.Lock()
  print(a)  
}

The above program guarantees printing "hello, world". The first call to l.Unlock() (in f) is synchronized before the second call to l.Lock() (in main) returns, which is sequenced before the print.

For any call to l.RLock on a sync.RWMutex variable l, there is an n such that the n-th call to l.Unlock is synchronized before the return of that l.RLock, and the matching call to l.RUnlock is synchronized before call n+1 to l.Lock.

A successful call to l.TryLock (or l.TryRLock) is equivalent to a call to l.Lock (or l.RLock). An unsuccessful call has no synchronizing effect at all. As far as the memory model is concerned, l.TryLock (or l.TryRLock) is permitted to return false even when the mutex is not held; this freedom lets implementations keep the operation lightweight and fast.

Once#

The sync package provides a safe mechanism for one-time initialization in the presence of multiple goroutines via the Once type. Multiple goroutines can execute once.Do(f) for a particular f, but only one of them will run f(); the other calls block until f() has returned.

More precisely: the completion of the single call to f() made by once.Do(f) is synchronized before the return of every call to once.Do(f) on that sync.Once instance.

var a string
var once sync.Once

func setup() {
  a = "hello, world"
}

func doprint() {
  once.Do(setup)
  print(a)
}

func kprint(k int) {
  for i := 0; i < k; i++ {
    go doprint()
  }
}

Calling kprint(k) invokes setup exactly once, and each print happens only after setup has completed, so "hello, world" is printed k times.

Atomic Values#

The APIs in the sync/atomic package are collectively referred to as "atomic operations" and can be used to synchronize the execution of different goroutines. If the effect of an atomic operation A is observed by atomic operation B, then A is synchronized before B.

This definition has the same semantics as C++'s sequentially consistent atomics and Java's volatile variables.

Finalizers#

The runtime package provides a SetFinalizer function that adds a finalizer, which will be called when an object is no longer reachable by the program. The call to SetFinalizer(x, f) is synchronized before calling f(x) for finalization.

In Go's memory model, finalizers refer to a mechanism for executing certain cleanup operations when an object is garbage collected.

Specifically, in Go, when an object becomes unreachable, it will be collected by the garbage collector. If this object defines a finalizer function, the garbage collector will call this finalizer function to perform cleanup operations on the object before it is collected.

Finalizer functions can be defined as methods that take an object as a parameter and do not return any values. This function is typically used to release some resources, such as closing file descriptors or freeing network connections.

It is important to note that the execution time of finalizer functions is uncertain because it depends on the workings of the garbage collector. Additionally, finalizer functions may affect the performance of the garbage collector because they need to execute before the garbage collector collects the object.

Therefore, caution should be exercised when using finalizer functions to avoid impacting performance and ensure that finalizer functions do not cause other memory management issues, such as memory leaks.

Other Mechanisms#

The sync package provides additional synchronization abstractions, including condition variables, lock-free maps, allocation pools, and wait groups. The documentation for each of them describes the guarantees it makes regarding synchronization.

Incorrect Synchronization#

Programs with data races are incorrect and can exhibit non-sequentially-consistent executions. In particular, note that a read r may observe the value written by any write w that executes concurrently with r, and even if it does, there is no guarantee that r will also observe writes that happened before w.

Race conditions make program results unpredictable: a racing read may observe the value of any concurrently executing write, whether that write occurs before or after the read in real time. It is therefore essential to avoid race conditions and preserve the program's correctness and sequential consistency.

var a, b int

func f() {
  a = 1
  b = 2
}

func g() {
  print(b)
  print(a)
}

func main() {
  go f()
  g()
}

The result of the above code is not determined: it may print any of "00", "01", "21", or even "20", in which case g observes the write to b but not the earlier write to a.

Double-checked locking is an attempt to avoid the overhead of synchronization. For example, the kprint program might be incorrectly written as:

var a string
var done bool

func setup() {
  a = "hello, world"
  done = true
}

func doprint() {
  if !done {
    once.Do(setup)
  }
  print(a)
}

func kprint(k int) {
  for i := 0; i < k; i++ {
    go doprint()
  }
}

Running doprint still calls setup at most once, but there is no guarantee that observing the write to done implies observing the write to a. This version can incorrectly print an empty string instead of "hello, world".

Generally speaking, the compiler and the CPU may reorder operations, and in a multithreaded program the interleaving of threads is not determined. A single source-level operation may also be compiled into multiple machine instructions, so what looks like one atomic step at the language level may be split by the compiler or the CPU. When writing concurrent programs, use proper synchronization mechanisms to guarantee correctness.

Do not try to be clever by rewriting the code like this:

var a string
var done bool

func setup() {
  a = "hello, world"
  done = true
}

func main() {
  go setup()
  for !done {
  }
  print(a)
}

As noted earlier, in main there is no guarantee that observing the write to done implies observing the write to a, so this program may also print an empty string. Worse, since there are no synchronization events between the two goroutines, there is no guarantee that the write to done will ever be observed by main, so the loop in main is not even guaranteed to finish.

Another variant:

type T struct {
  msg string
}

var g *T

func setup() {
  t := new(T)
  t.msg = "hello, world"
  g = t
}

func main() {
  go setup()
  for g == nil {
  }
  print(g.msg)
}

Even if main observes g != nil and exits its loop, it cannot guarantee that it will observe the initialized value of g.msg.

👍 In all these examples, the solution is the same: use explicit synchronization.

Incorrect Compilation#

The Go memory model places as many restrictions on compiler optimizations as it does on Go programs. Some compiler optimizations that are valid in single-threaded programs are invalid in all Go programs. In particular, the compiler cannot introduce writes that do not exist in the original program, cannot allow a single read to observe multiple values, and cannot allow a single write to write multiple values.

That is, in the presence of multiple goroutines, the compiler must not apply optimizations that are valid only for single-threaded programs.

The purpose of this rule is to keep shared variables correct and predictable across goroutines: if the compiler applied such optimizations freely, racing reads and writes could observe values that no execution of the original program could produce.

All the following examples assume that *p and *q point to memory locations accessible by multiple goroutines.

🎈 When writing data-race-free programs, to avoid data races, write operations cannot be moved out of conditional statements.

For example, the compiler cannot reverse the condition in this program:

*p = 1
if cond {
  *p = 2
}

That is, the compiler cannot rewrite the program as follows:

*p = 2
if !cond {
  *p = 1
}

If the condition is false and another goroutine is reading *p, then in the original program the other goroutine can observe only a prior value of *p or 1. In the rewritten program it could observe 2, which is impossible in the original.

🎈 Not introducing data races also means not assuming that loops will terminate.

For example, in general, the compiler cannot move accesses to *p or *q before the loop in this program:

n := 0
for e := list; e != nil; e = e.next {
  n++
}
i := *p
*q = 1

If list points to a circular linked list, then the original program will never access *p or *q, but the rewritten program will access them. (If the compiler can prove that *p will not cause a panic, then moving *p to the front is safe; moving *q to the front also requires the compiler to prove that no other goroutine will access *q.)

🎈 Ensuring that no data races are introduced also means that the compiler cannot assume that function calls always return or do not contain synchronization operations.

For example, in the following program, the compiler cannot move accesses to *p or *q before the function call (at least not without directly understanding the precise behavior of function f):

f()
i := *p
*q = 1

If the function call never returns, then the original program will again never access *p or *q, but the rewritten program will access them. If the function call contains synchronization operations, then the original program can establish a happens-before relationship before accessing *p and *q, but the rewritten program will not.

🎈 Not allowing a single read operation to observe multiple values means that local variables cannot be reloaded from shared memory.

For example, in the following program, the compiler cannot discard i and reload it from *p a second time:

i := *p
if i < 0 || i >= len(funcs) {
  panic("invalid function index")
}

... complex code ...

// compiler must NOT reload i = *p here
funcs[i]()

If the complex code requires many registers, the compiler for a single-threaded program can discard i without saving a copy and then reload i = *p just before funcs[i](). However, the Go compiler cannot do this because the value of *p may have changed. (In contrast, the compiler can spill i to the stack.)

🎈 Not allowing a single write operation to write multiple values also means that memory cannot be used as temporary storage before writing to local variables.

For example, in the following program, the compiler cannot use *p as temporary storage:

*p = i + *p/2

That is, it cannot rewrite the program as follows:

*p /= 2
*p += i

If the initial values of i and *p are equal and both are 2, then the original code will execute *p = 3, so the competing thread can only read 2 or 3 from *p. However, the rewritten code first executes *p = 1 and then executes *p = 3, allowing the competing thread to read 1.

🎈 Note that all these optimizations are allowed in C/C++ compilers: Go compilers sharing a backend with C/C++ compilers must be careful to disable optimizations that are invalid for Go.

When the compiler can prove that data races will not affect correct execution on the target platform, the rules against introducing data races no longer apply. For example, on almost all CPUs, the following rewriting is valid:

n := 0
for i := 0; i < m; i++ {
  n += *shared
}

Rewritten as:

n := 0
local := *shared
for i := 0; i < m; i++ {
  n += local
}

Provided that it can be proven that accessing *shared will not cause a fault, because potential additional reads will not affect any existing concurrent reads or writes. On the other hand, this rewriting is invalid in source-to-source converters.

Summary#

Go programmers writing data-race-free programs can rely on the sequential consistency of these programs, just like almost all other modern programming languages.

When dealing with programs with data races, both programmers and compilers should keep this advice in mind: Do not be clever.
