Thursday, July 20, 2017

Golang: HTTP Client Opens Too Many Sockets ("Too many open files")

This relates to a project that is work related, so I have to fuzz some of the details. But on the other hand, some details are naturally fuzzed because I have to remember some of the details and my memory is naturally fuzzy...

I'm working on a utility that is, on the surface, simple. It makes a call to an API endpoint using the http.Client, compares some quick results, and if certain conditions are met it makes a series of API calls to save the JSON responses.

The processing/checking process is carried out in a set of goroutines because the comparisons are easy to do in parallel. If the routine needs to pull a JSON reply, it calls a function that is laid out pretty much like the standard examples from the "here's how you get a web page in Go!" sites.


func GetReply(strAPI string, strServer string) string {

    // The URL to request
    strURL := strServer + /service/" + strAPI

        // Also add timeouts for connections
        tr := &http.Transport{
            Dial: (&net.Dialer{
               Timeout: 5 * time.Second,
            }).Dial,
            TLSHandshakeTimeout: 5 * time.Second,
        }
    client := &http.Client{
        Transport: tr,
        Timeout:   time.Second * 10,
    }

    // Turn it into a request
    req, err := http.NewRequest("GET", strURL, nil)
    if err != nil {
        fmt.Println("\nError forming request: " + err.Error())
        return ""
    }
    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("Accept", "application/json")

    // Get the URL
    res, err := client.Do(req)
    if err != nil {
        fmt.Println("\nError reading response body: " + err.Error())
        if res != nil {
            res.Body.Close()
        }
        return ""
    }

    // What was the response status from the server?
    var strResult string
    if res.StatusCode != 200 {
        fmt.Println("\nError reading response body, status code: " + res.Status)
        if res != nil {
            res.Body.Close()
        }
        return ""
    }

    // Read the reply
    body, err := ioutil.ReadAll(res.Body)
    if err != nil {
        fmt.Println("\nError reading response body: " + err.Error())
        if res != nil {
            res.Body.Close()
        }
        return ""  
   }
    res.Body.Close()

    // Cut down on calls to convert this
    strResult = string(body)

    // Done
    return strResult

}

This is actually a modified version of what I've pulled from various tutorials and examples, adding more calls to Close() and doing a check for whether res is nil before performing that call in an error.

I also added a timeout to the client because by default it is set to 0; no timeout. As you can probably guess this version was modified while troubleshooting.

After a few hours of the application running we had alerts come in about failing functions on the production servers. When I opened logs I discovered a number of "too many files open" errors, and a developer on the call said there were over 18,000 socket connections on each of the balanced servers.

The only difference was my use of this test program, so I killed it. The socket count fell.

Welp...guess we found the cause. But why?

There are a couple basics for beginners when using http.Client requests.
1) Close the response body after reading.
2) Clients are reused.
3) If you defer a call to Close() (as this one originally did, and most tutorials show) the function should call Close() when the function returns. The modified sample I posted simply closes it after reading the Body and checking for errors.

At first I thought it was due to clients not closing; they must close in order to be re-used. I traced the execution path a dozen ways and added more explicit Close()'s in error checks...but those errors were never printing anything during the run, so errors shouldn't be causing spill of sockets.

I added timeouts to the client and dialer. While that didn't hurt and probably made things a little cleaner, it still didn't help the too many open files/sockets error.

Another lead came from a close reading of a Stack Overflow answer. The function is creating a new Transport, tr, with each call. That Transport is what holds the Clients pool for reuse. See where I'm going with that?

Another answer on that page talked about creating a global client for his functions to reuse.

The theme was scope of variables matters when dealing with what allows re-use. Because I'm hitting the same server repeatedly and the function kept re-instantiating the mechanism that was used to govern client re-use,  the number of new connections and left-open sockets ballooned.

My next move was to go to the goroutines that were in charge of processing the replies from the API endpoints and have them create Transport instances, then when they call the function they passed the Transport as a parameter.

I uploaded the program to a remote system instance and re-ran it while watching netstat on the server and the client systems. After initially ballooning to about 4,000 connections it soon settled down to well under 100 connections (using netstat |wc -l).

Takeaways:
1) Modify the default client, and maybe the transport dialer, to add sane timeouts.
2) If you're hitting the same server repeatedly, do it all within the same scope as your transport instantiation or create a transport and pass it as a parameter to functions so you optimize the re-use of the client pool
3) Check that you properly close the response body so the client can be re-used. Check in error paths that it can be properly closed without panicking.

What about separating not just the Transport, but also the Client, then passing the Client around as a parameter? I didn't test that because I wasn't sure how "goroutine-safe" that would be against race conditions, despite the one answer on that Stack Overflow that demonstrated using a global Client instance for use.  It's possible it works fine. At this point it looks like passing the Transport worked fine, though.

I'll also note that my usual self-loathing and insecurity isn't getting the better of me this time because the top answer on that question that inspired me to try this solution was the usual advice I found repeatedly in other sites and blogs (and SO answers); check that you close your response properly. It's the top answer by a significant margin. It was almost an afterthought to realize that maybe what I was doing was pummeling one particular website with multiple instantiations of Client pools so Client reuse was a minimum.

Happy HTTP Client-ing!

Thursday, July 13, 2017

Your Experiences Create Your Methods

Sounds obvious, doesn't it?

But at the same time, I feel like it's one of those things that shapes our worldview to the point where you lose sight of the fact that it's obvious; we end up taking our views for granted and ignoring why you approach problems the way you do.

(Or, perhaps worse, we ignore why other people approach problems the way they do, which in turn you react towards them in a possibly negative fashion.)

I'm thinking of this because the other day I was working with a coworker on a problem assigned to us by a manager. Without getting into too many details, one step involved a program reading a list from a text file.

The file was tens of thousands of lines; the program expected the format:
12345,string of text,state_name

The file we got was formatted:
12345,"string of text",state_abbreviation

We were coming up with a game plan and reviewing steps when the file came in, and were divvying up the work needed to get the ball rolling.

My very first thought was to write a Go program that read the file contents into a slice, range over the slice to replace the comma-quote and quote-comma with just commas, then split by comma and replace the item[2] with the full state name using a map I could copy and paste from a previous program I had worked on. Give the size of the file to work on, it shouldn't have taken too long, from my estimation.

My coworker volunteered to reformat the file. After he completed it, I asked him how he cleaned the file. His background is ostensibly in sales, although he also does programming in PHP and can create mockups and web utilities for other employees to use in pulling reports and demos, so I expected he ran it through a one- or two-line PHP filter or something similar.

He pulled up Excel, imported the file as a comma-delimited file, then showed me a formula that pulled state abbreviations-to-full-names from another spreadsheet he already had set up.

In a way the approach wasn't too different. We split the lines into fields and used what amounted to a map to do a value replace, then export the results to a new text file. But the execution was very different. His sales experiences, and having to deal with formatting reports from our system, meant the first tool he used to solve the problem was a spreadsheet (which was a faster and more efficient solution than I was going to use for a one-off reformatting job like this.)

I've been working heavily on Go-based utilities; manipulating log files, manipulating APIs, making text dance as it was processed through pipelines and sending results through databases and monitoring systems. When I saw this text file I immediately saw strings.Split and map[string]string solutions running through my head.

What other solutions are there? Tons, no doubt. Filtering through a series of AWKs and pipes and redirects...maybe PERL...maybe PHP...I know there's plenty of people who would have used Excel to import it and alter it by hand. While I'd probably argue that the manual method could be considered "wrong", I'm equally sure there are people who would have arguments why every approach considered (or used) would have been "wrong."

In the end it was the (timely) results that mattered.

So next time you see someone with a different approach to doing something, don't be quick to criticize. Think about why that person has that approach. Maybe they do know something that is more efficient. Maybe not. Sometimes it's interesting to learn how someone came to use the methods they use and you'll learn something about what it's like for people who aren't you.