Sysadmin Still Surviving: Channels, WaitGroups and Goroutines: Simply Complex

This post is less about coding and more about creating a mental model for your application with some simple concurrency baked in. With a little code thrown in at the end.

Bob is an amateur programmer turned fast food restaurant entrepreneur. He franchised a Burrito Barn and a Pizza Boot restaurant (I made those up, but don't know if such places exist. If there is such a thing as a Burrito Barn, it sounds good, mainly because burritos are great. And now I wonder if there's such a thing as a pizza burrito...)

Bob has only these two restaurants, but he has eyes on the future, and in that future he's going to own more Burrito Barns and Pizza Boots. But Bob wants statistics on his franchises. He wants to see how busy they are during the day and what kinds of orders are going through the kitchen so he can find trends over the course of the day and optimize service options and kitchen operations. This will help ease him into a growing empire of burrito and pizza outlets.

Fortunately for Bob his two restaurants have electronic service systems that display fast food orders to the kitchen; unfortunately, they're missing the type of options he's looking for. He decides to write a Go program that will scrape the data from the restaurants and display the results on a web page.

He starts coding. On the surface this isn't too hard; it seems the task is ideally suited for two of Go's strongest features: goroutines and channels. The main() function will spawn a goroutine to scrape the pizza restaurant's order system at periodic intervals and a second goroutine to pull data from the burrito restaurant's order system. Channels can be used to pass data back to a logging goroutine.

That seems pretty straightforward. A timer in Main() launches the scraping goroutines periodically; they launch, drop messages into the channel, and the logger reads messages. The goroutines then exit, leaving the Logger() to run and listen for future launches of the scrapers.

For testing purposes, Bob wrote a small goroutine whose only job is to use time.NewTimer() to send a "quit" signal to his application (otherwise, the way it's written as he's working on it, the application just runs in a loop and killing it means no cleanup/housekeeping is done to close out his scraping processes.)

Now the scrapers run periodically with time.NewTicker(), and there's a StopTime that sends a quit message to Main() using a separate channel myStopChannel. When that quit message is sent to Main(), Main() stops launching the scrapers and then sends a "quit" message into myChannel to tell Logger() it needs to close up file handles and whatever other housekeeping chores it must perform before exiting. But how does Main() know when Logger() is done?

Bob thinks about it for a moment and decides to dump a message from Logger() into myStopChannel, and Main() will just keep reading myStopChannel until the message comes in from Logger().

There! That looks pretty simple. The scrapers run and exit until a new timer.NewTicker() ticks to kick off another scraper, Logger() is listening for information, and StopTime() runs until the timer goes off and signals a quitting time while Bob is testing everything, and Main() coordinates the kickoff and graceful shutdown of goroutines.

Bob's tests are working well, too. But then he realizes another problem; what happens when he opens another restaurant?

There are two problems here. One is that there are two PizzaScrape() goroutines running, one for each restaurant. How do we know which is running for which restaurant? They'll have to identify which restaurant they represent, and send that information with any logging information sent to Logger(). That's actually not a difficult problem to solve.

The second is how Main() knows when all the processes are done running. It's not hard to know how many goroutines are spawned. Bob wants a neat and orderly shutdown when the quit signal comes in; that means not closing until it knows for sure that all the scrape processes are done, then closing out the Logger() routine knowing that nothing will be trying to send data to it in the middle of shutting it down. There's no direct communication communication method back to Main().

Main() already sends a "quit" signal into myChannel for Logger() to process. Maybe Main() could have the scrapers listen to myChannel as well as send on that channel, and quit if they find a quit signal? Channels are the idiomatic way to communicate among goroutines, after all.

But turning the scrapers into readers and writers with myChannel is a bad idea. Why? Because channels aren't broadcasting information; they are water troughs where someone, or many someones, places a paper boat upon the water, and someone picks up the paper boat to read the message they carry. Or you can have many someones pick up the boats. But once they're removed from the water, they're removed. Other goroutines can't read them. For example...

// Chantest - a way of testing the behavior of channels and routines that receive messages

package main

import (
    "fmt"
    "strconv"
    "sync"
)

func main() {
    
    // Create a channel for communicating and a channel for quitting
    chanMyChannel := make(chan string)
    chanQuit := make(chan string)
    
    // We need to make sure the processes all finish
    var wg sync.WaitGroup
    
    // Launch 5 listening processes
    for a := 0; a < 5; a++ {
        wg.Add(1)
        go Listen(chanMyChannel, chanQuit, a, &wg)
    }
    
    // Say hello 500 times to the channel
    for a := 0; a < 500; a++ {
        chanMyChannel <- "Hello"
    }
    
    // Now say "quit" 5 times.
    for a:= 0; a < 5; a++ {
        chanQuit <- "Quit"
    }

    wg.Wait()
}

// Listen() takes the communications channel, quit channel, an identifier, and a waitgroup
func Listen(chanMyChannel chan string, chanQuit chan string, intRoutineNum int, wg *sync.WaitGroup) {
    
    // Simple counter to keep track of our hello's
    var counter int
    
    // Tell the waitgroup this process is done when we return
    defer wg.Done()
    
    for{
        select {
            // If we get anything on the channel, increment the counter
            case <- chanMyChannel:
            counter = counter + 1
            
            // If we get a quit, print out the state of the counter and return
            case <- chanQuit:
            fmt.Println("Goroutine " + strconv.Itoa(intRoutineNum)+" received " + strconv.Itoa(counter) +" messages")
            return
            
        }
    }
}

Sample chantest output:

Goroutine 4 received 17 messages
Goroutine 1 received 173 messages
Goroutine 0 received 14 messages
Goroutine 2 received 209 messages
Goroutine 3 received 87 messages

Goroutine 1 received 190 messages
Goroutine 2 received 33 messages
Goroutine 4 received 225 messages
Goroutine 3 received 26 messages
Goroutine 0 received 26 messages

See how they end up randomly distributed? They add up to 500. But the distribution across the channels is seemingly random...whenever a goroutine happened to check the channel to see if a message was waiting, it took it and processed it (by incrementing the counter). They didn't even quit in a predictable manner because there was no way to control which one got the "quit" message in what order.

The answer to Main() knowing when all the scrapers are finished, as well as confirmation for Logger() being closed, is in the chantest program. They're called WaitGroups, part of the sync package.

sync.WaitGroups "group" together processes; each process signals, using Done(), when they're exiting and are then removed from the WaitGroup pool. Ordinarily if Main() spins off goroutines and there's nothing to process or make it wait, the moment Main() hits the end of the function the application exits. WaitGroups, using sync.Wait(), block until all the processes have called Done().

Using a WaitGroup on Logger() is a simple way to know that it has exited without needing to add complication through communication channel processing.

The example here is admittedly contrived, but nonetheless demonstrates that there are added complications when dealing with concurrently-running processes (or goroutines, in this case.) Go gives you great tools for communicating among processes, but they have limitations that can trip up beginners, such as thinking that sending a message from one processes into the channel will be picked up by all the other "listeners" instead of just whichever process reads the channel first. If you have processes doing something...such as logging to a file, or processing data, or anything that should be "cleaned up" rather than abruptly cut off with a call to Exit(), it can be easy to forget that some mechanism needs to be employed to make sure they close down cleanly using something like a WaitGroup.

This also illustrates what can happen when you have a test case...in this example, two restaurants...that doesn't test for conditions you want to expand to down the road (in this case, the multiple restaurants of the same kind.) There are items here that would have worked just fine in single-checking scenarios; Main() could keep track of launching one process and used a channel to tell when that process was returning, but weird things could start happening when dealing with multiple processes (or the logic behind channels to track goroutine starts and exits could get complicated with multiple processes in comparison to the use of WaitGroups.)

Thinking of the implications of applications running with concurrent processes can be difficult to wrap your head around if you're not accustomed to the tools Go provides specifically for managing concurrency.

Sysadmin Still Surviving

Thursday, March 24, 2016

Channels, WaitGroups and Goroutines: Simply Complex

No comments:

Post a Comment