Understanding Concurrency in Go - Part 1

What are goroutines

An application is a process running on a machine; a process is an independently executing entity that runs in its own address space in memory. A process is composed of one or more operating system threads, which are simultaneously executing entities that share that address space. Almost all real programs are multithreaded, so as not to introduce wait times for the user or the computer, to be able to service many requests simultaneously (like web servers), or to increase performance and throughput. Such an application can execute on a single core using multiple threads, or the same process can execute on multiple cores at the same time, which is then called parallel execution.

Parallelism is the ability to make things run more quickly by executing on multiple processors at once. A concurrent program may or may not be parallel.

An excellent talk on this distinction by Rob Pike can be found here: Concurrency is not Parallelism

Multithreaded applications are notoriously difficult to get right. The main problem is the shared data in memory, which can be manipulated by the different threads in an unpredictable manner, sometimes delivering irreproducible and random results. The classic solution is to synchronize the different threads and lock the data, so that only one thread at a time can change it. Go has facilities for locking in its standard library, in the sync package, for when they are needed in lower-level code. However, this approach tends to lead to complex, error-prone code and diminishing performance.
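
As a minimal, runnable sketch of that locking approach (illustrative, not from the original article), a sync.Mutex can serialize access to a shared counter:

package main

import (
    "fmt"
    "sync"
)

func main() {
    var mu sync.Mutex // guards counter
    counter := 0
    var wg sync.WaitGroup

    for i := 0; i < 5; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            mu.Lock()   // only one goroutine may enter at a time
            counter++   // safe: access is serialized by the mutex
            mu.Unlock()
        }()
    }
    wg.Wait()
    fmt.Println("counter:", counter) // always prints 5
}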

Go adheres to another concept known as Communicating Sequential Processes (CSP), invented by C.A.R. Hoare; the original paper can be found here: Communicating Sequential Processes

The parts of an application that run concurrently are called goroutines in Go; they are, in effect, concurrently executing computations. There is no one-to-one correspondence between a goroutine and an operating system thread: a goroutine is mapped onto (multiplexed onto, executed by) one or more threads, according to their availability; this is accomplished by the goroutine scheduler in the Go runtime.

So then, what are goroutines?

  1. Goroutines run in the same address space, so access to shared memory must be synchronized; this could be done via the sync package, but this is highly discouraged: Go uses channels to synchronize goroutines.
  2. Goroutines are much lighter than threads and have a very small footprint (see the sketch after this list).
  3. They are created with a 4K memory stack space on the heap.
  4. They use a segmented stack for dynamically growing or shrinking their memory usage; stack management is automatic. The stacks are not managed by the garbage collector; instead, they are freed directly when the goroutine exits.
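
The small footprint mentioned in point 2 can be estimated empirically. The following is a rough, illustrative sketch (not from the original article): it parks a large number of idle goroutines on a nil channel and compares the runtime memory statistics before and after. The exact number printed varies with the Go version and the platform.

package main

import (
    "fmt"
    "runtime"
    "sync"
)

func main() {
    memConsumed := func() uint64 {
        runtime.GC()
        var s runtime.MemStats
        runtime.ReadMemStats(&s)
        return s.Sys
    }

    var c <-chan interface{} // nil channel: receiving from it blocks forever
    var wg sync.WaitGroup
    noop := func() { wg.Done(); <-c } // keeps each goroutine alive but idle

    const numGoroutines = 10000
    wg.Add(numGoroutines)
    before := memConsumed()
    for i := 0; i < numGoroutines; i++ {
        go noop()
    }
    wg.Wait()
    after := memConsumed()
    fmt.Printf("%.3f KB per goroutine (approximate)\n", float64(after-before)/numGoroutines/1000)
}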

Two styles of concurrency exist: deterministic (well-defined ordering) and non-deterministic (locking/mutual exclusion but order undefined). Go’s goroutines and channels promote deterministic concurrency (e.g. channels with one sender, one receiver), which is easier to reason about.

A goroutine is implemented as a function or method (this can also be an anonymous or lambda function) and is called (invoked) with the keyword go. This starts the function running concurrently with the current computation, but in the same address space and with its own stack.

Example

func main() {
    go sayHello()
    // continue with other things
}

func sayHello() {
    fmt.Println("Hello There")
}

Anonymous functions work here too:

func main() {
    go func() {
        fmt.Println("Hello There")
    }() // the anonymous function is invoked immediately as a goroutine
}
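
Note that main does not wait for the goroutines it starts: if main returns before sayHello or the anonymous function has had a chance to run, the program exits and nothing is printed. A minimal, runnable sketch of one common fix, using sync.WaitGroup (channels, introduced below, are another option):

package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    wg.Add(1)
    go func() {
        defer wg.Done()
        fmt.Println("Hello There")
    }()
    wg.Wait() // block until the goroutine has finished
}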

Goroutines and Coroutines

Goroutines are unique to Go. They are not OS threads, and they are not exactly green threads (threads that are managed by a language runtime). They are a higher level of abstraction known as coroutines. Coroutines are simply concurrent subroutines (functions, closures, or methods in Go) that are nonpreemptive; that is, they cannot be interrupted. Instead, coroutines have multiple points throughout that allow for suspension or reentry.

What makes goroutines unique to Go is their deep integration with Go’s runtime. Goroutines don’t define their own suspension or reentry points; Go’s runtime observes the runtime behavior of goroutines and automatically suspends them when they block and then resumes them when they become unblocked. In a way this makes them preemptable, but only at points where the goroutine has become blocked.

Go’s mechanism for hosting goroutines is an implementation of what’s called an M:N scheduler, which means it maps M green threads to N OS threads. Goroutines are then scheduled onto the green threads. When we have more goroutines than green threads available, the scheduler handles the distribution of the goroutines across the available threads and ensures that when these goroutines become blocked, other goroutines can be run.
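
For illustration (not part of the original text), the runtime package exposes a few counters related to this M:N mapping; a small sketch, whose output naturally differs from machine to machine:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    // GOMAXPROCS(0) queries the current setting without changing it.
    fmt.Println("Logical CPUs:", runtime.NumCPU())
    fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
    fmt.Println("Goroutines:", runtime.NumGoroutine()) // main itself counts as one
}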

Communicating Between Goroutines Using Channels

Channels are one of the synchronization primitives in Go derived from Hoare’s CSP. While they can be used to synchronize memory access, they are best used to communicate information between goroutines.

Creating Channels

Channels can be created as:

var stream chan interface{}     // declare a channel
stream = make(chan interface{}) // instantiate it

Here we have defined a channel that can be both written to and read from. However, it is also possible to declare channels that support only a unidirectional flow of data.

To declare a channel that can only be read from (a receive-only channel), place the <- operator on the left-hand side of the chan keyword, like this:

var stream <-chan interface{}
stream = make(<-chan interface{})

And to declare a channel that can only be written to (a send-only channel), place the <- operator on the right-hand side of chan:

var stream chan<- interface{}
stream = make(chan<- interface{})
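
You will rarely instantiate unidirectional channels directly; more often a bidirectional channel is passed to a function whose parameter is send-only or receive-only, and Go converts it implicitly. A minimal sketch (the producer and consumer names are illustrative, not from the original article):

package main

import "fmt"

// producer may only send on out; reading from it here would be a compilation error.
func producer(out chan<- int) {
    for i := 1; i <= 3; i++ {
        out <- i
    }
    close(out)
}

// consumer may only receive from in.
func consumer(in <-chan int) {
    for v := range in {
        fmt.Println(v)
    }
}

func main() {
    ch := make(chan int) // bidirectional; converted implicitly at the call sites
    go producer(ch)
    consumer(ch)
}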

Channels are typed; in the above examples we created channels of the empty interface type, which means we can place any kind of data on the channel. We can also give a channel a stricter type:

func main() {
    stringStream := make(chan string)
    go func() {
        stringStream <- "Hello,There!"
    }()
    fmt.Println(<-stringStream)
}

This example works because channels in Go are said to be blocking. This means that any goroutine that attempts to write to a channel that is full will wait until the channel has been emptied, and any goroutine that attempts to read from a channel that is empty will wait until at least one item is placed on it. In this example, our fmt.Println contains a pull from the channel stringStream and will sit there until a value is placed on the channel. Likewise, the anonymous goroutine is attempting to place a string literal on the stringStream, and so the goroutine will not exit until the write is successful. Thus, the main goroutine and the anonymous goroutine block deterministically.

If we do not structure our program correctly, it’s possible to have a deadlock situation:

func main() {
    stringStream := make(chan string)
    go func() {
        if 0 != 1 {
            return
        }
        stringStream <- "Hello There"
    }()
    fmt.Println(<-stringStream)
}

The if block ensures that nothing gets placed on the channel, while the main goroutine waits for a value to be placed on the channel. The program will panic with a deadlock.
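
One common way to structure this correctly is to guarantee that the channel is closed on every return path of the producing goroutine, for example with defer. A minimal sketch, keeping the same artificial if block:

package main

import "fmt"

func main() {
    stringStream := make(chan string)
    go func() {
        defer close(stringStream) // runs on every return path
        if 0 != 1 {
            return // nothing is sent, but the channel is still closed
        }
        stringStream <- "Hello There"
    }()
    fmt.Println(<-stringStream) // receives the zero value ("") instead of deadlocking
}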

The receiving form of the <- operator can also return two values:

func main() {
    stringStream := make(chan string)
    go func() {
        stringStream <- "Hello there"
    }()
    data, ok := <-stringStream
    fmt.Printf("(%v): %v", ok, data)
}

Here the second, boolean value signifies whether the value read off the channel was generated by a write elsewhere in the process, or is a default value produced because the channel is closed. It is possible to close a channel to help downstream processes know when to move on, exit, or re-open communication on a new or different channel. To close a channel we use the built-in close function.

valueStream := make(chan int)
close(valueStream)

Here, if we read from the closed channel, the second value will be false.
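
A short, runnable sketch of that behavior:

package main

import "fmt"

func main() {
    valueStream := make(chan int)
    close(valueStream)

    // Reading from a closed, empty channel returns the zero value immediately.
    v, ok := <-valueStream
    fmt.Printf("value: %v, ok: %v\n", v, ok) // value: 0, ok: false
}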

Ranging over a channel

It is possible to iterate over the values on a channel:

func main() {
    intStream := make(chan int)
    go func() {
        // ensure the channel is closed before the goroutine exits
        defer close(intStream)
        for i := 1; i <= 5; i++ {
            intStream <- i
        }
    }()
    // ranging over intStream; the loop ends when the channel is closed
    for data := range intStream {
        fmt.Printf("%v", data)
    }
}

Multiple goroutines can be unblocked by closing a single channel. If you have n goroutines waiting on a channel, instead of writing n times to the channel to unblock each goroutine, you can simply close the channel:

func main() {
    begin := make(chan interface{})
    var wg sync.WaitGroup
    for i := 0; i < 5; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            <-begin // the goroutine waits here until it is told it can continue
            fmt.Printf("%v.. has begun\n", i)
        }(i)
    }
    fmt.Println("Unblocking goroutines")
    close(begin) // closing the channel unblocks all waiting goroutines simultaneously
    wg.Wait()
}

Buffered Channels

Buffered channels are given a capacity when they are instantiated. This means that even if no reads are performed on the channel, a goroutine can still perform n writes, where n is the capacity of the buffered channel. They can be declared as:

var dataStream chan interface{}
dataStream = make(chan interface{}, 4) // buffered channel with a capacity of four

An unbuffered channel has a capacity of zero and so it’s already full before any writes. A buffered channel with no receivers and a capacity of four would be full after four writes, and block on the fifth write since it has nowhere else to place the fifth element. Like unbuffered channels, buffered channels are still blocking; the preconditions that the channel be empty or full are just different. In this way, buffered channels are an in-memory FIFO queue for concurrent processes to communicate over.

It also bears mentioning that if a buffered channel is empty and has a receiver, the buffer will be bypassed and the value will be passed directly from the sender to the receiver. In practice, this happens transparently, but it’s worth knowing for understanding the performance profile of buffered channels.

Example:

func main() {
    // create an in-memory buffer to collect output
    var stdoutBuff bytes.Buffer

    // ensure the buffer is written to stdout before the process exits
    defer stdoutBuff.WriteTo(os.Stdout)

    // create a buffered channel with a capacity of four
    intStream := make(chan int, 4)

    go func() {
        defer close(intStream)
        defer fmt.Fprintln(&stdoutBuff, "Producer Done.")
        for i := 0; i < 5; i++ {
            fmt.Fprintf(&stdoutBuff, "Sending: %d\n", i)
            intStream <- i
        }
    }()

    for data := range intStream {
        fmt.Fprintf(&stdoutBuff, "Received %v.\n", data)
    }
}

The anonymous goroutine is able to place all five of its results on the intStream and exit before the main goroutine pulls even one result off:

Sending: 0
Sending: 1
Sending: 2
Sending: 3
Sending: 4
Producer Done.
Received 0.
Received 1.
Received 2.
Received 3.
Received 4.

Summary of Channel Operations

Operation   Channel State        Result
Read        nil                  Block
Read        Open and Not Empty   Value
Read        Open and Empty       Block
Read        Closed               Default Value, false
Read        Write Only           Compilation Error
Write       nil                  Block
Write       Open and Full        Block
Write       Open and Not Full    Write Value
Write       Closed               Panic
Write       Receive Only         Compilation Error
close       nil                  Panic
close       Open and Not Empty   Closes Channel; reads succeed until the channel is drained, then reads produce the default value
close       Open and Empty       Closes Channel; reads produce the default value
close       Closed               Panic
close       Receive Only         Compilation Error
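
A few of these rows can be demonstrated directly; the following sketch (illustrative only) shows that reads from a closed buffered channel succeed until the buffer is drained, and that a write to a closed channel panics (the recover is only there to keep the demo from crashing):

package main

import "fmt"

func main() {
    ch := make(chan int, 2)
    ch <- 1
    ch <- 2
    close(ch)

    // Reads succeed until the buffer is drained, then produce the default value.
    fmt.Println(<-ch) // 1
    fmt.Println(<-ch) // 2
    v, ok := <-ch
    fmt.Println(v, ok) // 0 false

    // Writing to a closed channel panics.
    defer func() {
        if r := recover(); r != nil {
            fmt.Println("recovered:", r) // send on closed channel
        }
    }()
    ch <- 3
}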