Friday, December 22, 2017

Golang Web Server: Don't Do This

I still consider myself new to programming. The new job allows me to create a lot of small system tools in Go, mostly for augmenting monitoring and creating utilities that replace manual API calls made with JQ and CURL with single executables. It's been a wonderful learning experience.

Sometimes I try to add some new features to utilities that are snazzy but also a bit of an experiment.

This is a bit of reflection on the design I originally used and I am not in a mood to pull out layers of source code to show what I had done, especially if no one is asking for it. But I will describe the basic design in an effort to not only avoid implementing it that way again but to warn others not to make the same design pattern mistake.

The utility is mainly a long-running process that is interrogating one of our services for database information. It gets raw data from the database, pulls some stats like record size and type, and tallies the information. Millions and millions of records.

What if, I thought, I provided a peek into what the state of the tallying is beyond what I already had showing? It would output a count of some basic information as a one-liner every thirty seconds to the console, but that wasn't good enough. I thought, why not create a web interface that would output a simple text page of information?

Go loves channels. And I had several "worker goroutines" that handled specific tasks in the tally program, passing messages to a coordination goroutine that handled scheduling record analysis, directing results, and monitoring the state of the various workers. Breaking the work up made things pretty fast once I stuck in a few tweaks here and there.

Adding a web server routine wasn't hard. Then I thought, I could just add a couple of channels to plug them into routines that held statistics.

Here's where I made what later turned into a mistake.

Instead of individual handlers, I created a single handler that took message strings via channels. The messages consisted of a random ID and a type, where the type was the page request.

The reader on the other side of the channel split the message, used a select{} to determine which page it should construct, and returned the page through another channel with that ID string prepended to each line. The receiver on the other side checked whether the ID belonged to its request; if it wasn't the proper ID, it re-fed the message to the channel, hoping the right recipient would pick it up later and that the next message in the channel was the one meant for it. Line by line the page was fed back down the channel, the ID attached to each message, until a message arrived reading "END OF PAGE", at which point the page was done and the connection closed.

Don't do that.

The thing is, this seemed to work. I opened a web browser, opened the page, and it worked. I could request the different pages and it worked just fine.

It worked until one page got kind of big and I opened two web pages to the server. Something seemed to get "stuck." One of my statuses gave a snapshot of the fill state of some channels and I noticed some of the web-related channels were...throbbing? Growing huge and slipping down, as if revving up with more lines of messages than should possibly be needed. Something was getting misdirected and the lightweight speed of goroutines meant it was flooding channels with useless information.

No problem, I thought. I'll add a third field, a counter, which once it reached a certain level would simply discard the message. The web page was meant to be read by a person who was trying to get some stats on the status of this utility while it was running, not the general public...refresh the page, hopefully you'll get a working reply that time. Sloppy, but might work.

Tested again. It seemed to keep the channels from getting as clogged up, but I still had some kind of crosstalk when pages grew larger, and it wasn't hard to create a denial of service on the web server just by opening two different pages. Sometimes the two tabs seemed to get completely confused about which one was supposed to get which page.

Maybe it was too easy to get messages mixed up because pages were being fed line by line. I went through the page composition and, instead of feeding each line through, had the process build one big string and feed the whole result.

This cut down on responsiveness but increased reliability. Kind of. The improvement was significant, but not enough to be proud of. If anyone pulled a web page from the utility while someone else was using it, there was a non-zero chance they'd get a weirdly formatted page, if not a timeout.

After finishing some work on other utilities, I decided to refactor the 4 web pages into their own handlers with separate functions and move some of the information being read into global structs with mutexes for protection. Before making the change I ran a test with Bombardier, a handy web server throughput tester. The channel-handler architecture totally choked under the test.

I refactored, separated out the page composition into individual handlers, and eliminated channels for web page feeding. No more IDs. No more parsing out replies. No more tracking how many times this particular message is making rounds before "expiring" it.

Bombardier hammered away on the server with no issues. Multiple tabs reading different web pages? No problem. The biggest trigger for trouble, clicking back or following a link to one of the other pages while a large page hadn't finished rendering, no longer caused any problems.

What I wanted to do was find a way to read a URL request and use one handler to interpret what the client wanted, so I didn't need a number of individual handlers defined. I'm pretty sure I still could do that, but I think the weakness was in using channels with an associated ID to parse replies back to the client from a dedicated goroutine holding stats.

The solution I ended up using was individual functions that read from a global struct holding the current state of statistics, and this was protected with a lot of locking.

I suppose another way to do it, with channels, would be finding a way to spawn dedicated channels with each request so the replies didn't need parsing or redirecting; a channel with multiple readers has no guarantee of who is going to get the message at what point. This kind of fix seemed needlessly complicated, though.

I suppose I could also have enhanced the global statistics struct to have functions associated with it, so calls could be made that would automatically lock and reply with information requested by callers. The utility is relatively small, though, and I thought that implementing that would have been more complicated than necessary. I'm not sure if this would enhance the speed of the program, though, and may be worth trying for the learning benefit.

But what I definitely now know is not to pass web pages as composed lines with an ID tagged down a shared channel for a reader to parse and decide, "Is this line meant for me? No? Here, back into the channel you go, floating rubber ducky of information, while I read the next ducky...float away!"

Don't do that.

Sunday, November 26, 2017

StackOverflow and Newcomers

Stackoverflow (SO) is the premier question and answer site for programmers. It's a joke now that when SO goes down, programmers go home because no work can get done. Its mission is to make life better for programmers, and the men and women working behind the scenes at SO have poured much sweat and tears into growing a useful community for programmers to share solutions to the various problems encountered in their algorithm-laden lives.

That is not to say there aren't issues, though. As the site has grown (and it is a bit on the huge side now), SO has had to make decisions that define (and refine) the site's character, and not all of these decisions have passed without detractors. They have also had to try addressing criticism of the site, and one of the most common criticisms seems to be how (un)welcoming the site can be for newcomers.

I think I can relate to this. I am not a programmer by trade, but I do try to create useful utilities for use in my day job and enjoy programming in at least a hobbyist capacity. I am not very confident in my abilities, though, and definitely do not need someone to remind me of an obvious skill gap (why do you think I'm asking the question in the first place?)

I do not have the answers regarding how to make SO more welcoming to beginners. Perhaps once a community grows to a certain point it naturally fractures into strata of people skilled to a point where they aren't aware of their own bias against less-experienced individuals. Or maybe there are rules in the system that encourage what one person interprets as a "man up, you snowflake!" mentality while an insecure individual interprets the same feedback system as validation that they don't have what it takes to join their programming peers.

I suppose that when so much of the technology culture centers on a "Brogrammer" mentality rife with competition using knowledge and perceived cleverness as a ranking system, it's natural for some snark to become ingrained in interactions among programmer peers. It's not hard when reading some comments and answers to a SO question to sense a tone of judgement, that the questioner must pass some bar of having earned an answer before they may have one, something beyond the basic search of the site for the same problem before duplicating it.

There have been cases where people will take more time to criticize the questioner than it would have taken to edit or refine the question into something useful and post an answer.

Sometimes you can do everything seemingly right but still fall short in someone's judgement; the ability to down vote a question while leaving no constructive feedback and incurring no penalty in the process (except to the question-asker) seems like a pretty obvious way to discourage interacting with the community for help.

Note that I'm not saying down votes are necessarily bad, although I do wonder if alternative feedback methods could be useful. I'm saying that one of the more frustrating interactions on the site, in my experience, stems from being penalized and not knowing why; if you down vote, maybe you should have to leave some constructive feedback or enhancement to fix the problem or take some penalty to your own Internet-points reputation score.

For example, I recently had trouble with an intermittent panic when exiting a Go utility and posted to StackOverflow for help. I posted a title that succinctly summarized the issue. I posted the panic message. I posted the function definition. The panic had a line number from the definition that seemed to trigger the intermittent error; I posted the specific "line X is..." followed by the line of code so there was no question what snippet triggered the panic. I tagged it with appropriate tags. There were a couple of comments, and I posted a link to another question citing some code to explain (justify?) why I implemented the function call the way I did. What happened?
I took two down votes of penalty to my reputation.

In the comments I asked if the down voters could explain what I could do to improve the question for future reference. After all, SO may be for answering questions related to your immediate problems, but it's also supposed to be of use to future visitors looking to solve similar problems. Last time I checked, no one had explained why they did it.

The nearest I got to helpful feedback on the down votes came from one of the helpful people who submitted an answer to my question; that person speculated it was because I had not RTFM'd to the satisfaction of some of the other users, since the problematic line appeared in the panic and the source code for a function call used in my definition showed it probably didn't like a nil context parameter.

So as a relatively insecure beginner, I crafted a question with lots of context, source code, and clarification, only to get dinged with damage (negative reputation) by anonymous clicks from people who couldn't leave a reason why or offer feedback on improving the reference value of the question.

It shouldn't be difficult to understand why this would be discouraging to some people, especially when the goal (I thought) was to build a useful reference for many people, not (possibly) penalize someone for not meeting some arbitrary criteria for having passed a bar of RTFM to be blessed with community membership in order to be assisted without a passive aggressive backhand.

I don't count myself as a detractor of StackOverflow. I have found help from members of their community to be invaluable. I do wonder if some of the feedback mechanisms sometimes encourage certain behaviors that deter less experienced and less thick-skinned programmers from interacting, while enabling programmers with the "rock star" or "ninja brogrammer" mindset to set a less friendly tone. There comes a point where it's less commiserating and sharing with a community and more a necessary chore to solve a problem, and I suspect the gray area of that transition is where new users begin complaining about the tone of the site.

Friday, November 3, 2017

Turning 40

I turned 40 this week.

Four decades. I remember there was a time I thought I'd grow up to "die alone as a hermit in the woods." I remember thinking maybe working as a programmer for Microsoft would be interesting. There was a time I thought I might become a marine biologist, specifically an ichthyologist, and study sharks. Later on I even flirted with the notion of working to become a successful author.

Today I'm not working for Microsoft. I don't live in the woods, although the town I reside in is rapidly withering economically and some might argue our tiny dot on the map is not far removed from being woodland. I don't even own diving equipment and am nowhere near the ocean (although we do live on a river that ends in the ocean, if you want to travel a few hundred miles.) The closest I've come to becoming an author was finishing and editing exactly one manuscript.

I'm pretty sure, at this point, that I have depression issues. I know it's more common today for people to talk about depression. Some people dismiss it as the excuse of the week, or brush it off as a "feeling blue" thing that you can exercise away or "just cheer up" to move past; "Just cheer up!" they say, totally ignoring that clinical depression is a thing.

While this little shadow has always been lingering to some degree in the back of my mind, I've had some things really raise that shadow higher in prominence in the past few years. It would take chapters of a book to cover the details, but the highlight reel would include attempts by my wife's employer to eliminate her from her job using what could (in my view, as this is my opinion) charitably be labeled slanderous accusations. That was a year-long ordeal that took a huge emotional and financial toll on the family.

After that drawn out mess, things finally felt like they were turning around. There was a light at the end of the tunnel! Unfortunately, it was a train's headlamp.

The employer I had come to rely on for emotional and financial support decided to terminate my contract, which is a nice way of saying I was sent home with a box of my belongings. Now it was my turn to plunge into a world of uncertainty, doubt, and the five stages of grief. I was blindsided and even the act of getting out of bed felt like fighting a dark shroud squeezing the life out of me.

Worse yet: if you feel like taking a moral stance and voicing support for teachers in the never-ending fight over contracts, even if your family has worked in public education for decades, even if you do it by pointing out actual evidence straight from the mouths of the people you feel are in the wrong, you might want to think twice if this takes place in a town turning into the economic equivalent of a mummy, a town you might have to return to and look for a job in. I made some statements that gained traction among certain circles here; at the time I felt secure in the idea that my employment was safe in the land of gummy bears and unicorns. The reversal of fortune played right into the hands of depression's self-doubt and uncertainty, whispering that "they" were laughing at my incompetence as I searched for job openings in a town propped up by Wal-Mart, McDonalds, a hospital system, and a public education system whose administration and board were not pleased with me for writing something that was popular among their staff for a couple of days.

I also experienced firsthand the silence from most of the people I had taken for granted as friends and associates from what I eventually came to regard as my "previous life."

These were two major events. I was already dealing with issues and stresses that many others have to deal with in life. These two major events just fanned the depression flames.

Now we have a national problem; we became a Trumpster fire nation. Every day came a new display of ignorance and people taking pride in how terrible they can be. I don't feel that there's much to act as a counterbalance against the papercuts of negativity he and his followers display.

It's been a long, stressful, painful period of time.

It's also been nearly a year since I started my new job, which gave me some sense of self worth again. Slowly it helped build up some sense of validation that I'm not worthless. I'm not sure if that makes sense or if I'm laying another misplaced sense of power into the hands of something in which I shouldn't emotionally invest. But for now it's there and helping me.

My family has been supportive during this emotional roller coaster, or tried to be. I don't think I quite acknowledge the good they do as much as I focus on negative things that families deal with. That's a side effect of both depression and Aspergian brain wiring, I think. Given the reflection hitting four decades of sentience has triggered, I think I need to continue trying to improve on that behavior.

All of these things have combined into a hazy mire that congealed into a cloud around me, affecting my worldview and keeping me in a perpetual weariness. I thought my birthday, despite being a magic number (I love the number 4, and 10 is a binary number as well as the number of digits on my hands and the number of digits on my feet, and is even, and possesses several other attributes that lend an irrational appreciation in my mind), would be yet another quiet passage marked by some cards and well wishes and soon forgotten. It was even on a Wednesday, my least favorite day for events to occur.

Usually the big booster in looking forward to my birthday is that it is preceded by Halloween. I love the idea of Halloween; the image of trick or treat, costume parties, awesome DIY costumes, parades, and horror movies are so much fun for me. But this year was different; the Friday before my birthday brought an announcement that indictments were coming against Trumpster acquaintances! After an anticipation-filled weekend, Monday had people brought in to testify, and we discovered one of his campaign associates had already pled guilty to lying to police and was cooperating with investigators!

We went out for dinner on my birthday with my in-laws and parents. One of the TVs played MSNBC's coverage of Trump's Russian connections and the mounting investigations. I was giddy.

My birthday was also marked by the Daily Show having an interview with Hillary Clinton. I don't know why that made me happy...I guess because she's the symbol of everything "I told you so" during the Presidential election.

These were things that worked to fight the shroud of depression whispering in my ear, and were totally counter to the idea that my 40th birthday would be quiet. These were things that were happy events for me.

There were other, not so happy events that marked the birthday-time. Unexpected shocks like the guy who rented a truck and ran over bike riders in downtown Manhattan. Because he wasn't white, it was labeled as an act of terrorism, unlike the recent Vegas shooting of around 600 people by a white guy where the fallout is basically several people going bankrupt from medical bills and modifications the shooter made to his guns staying perfectly legal and Congress clutching pearls at the idea that nothing can prevent these things from happening.

Yet another shocking event involved layoffs at a previous employer. I discovered it as oddly worded and vague tweets began floating along my Twitter timeline; today there was a Techcrunch article giving conflicting details of what had happened. In the end I could only confirm that a relatively large number of people were let go, some of whom I knew and had worked with so it wasn't just trimming the newest of hires. In keeping with the "Me me me!" theme, this news caused me to revisit all the thoughts of despair and hopelessness that I felt as my wife drove me home from the apartment after I was told my time there had ended. I empathized with what must be a swirl of confusion and fear that these people now feel. I also watched as people who escaped the cutting block echoed their support for one another and words of sadness to their departed colleagues. Selfishly I felt like the bandage was ripped off an old wound.

I turned 40 this week.

Nothing I thought was going to happen as a teen happened. Getting older shifted into a pattern where almost every day blended into the next; mostly unremarkable, smeared with a veneer of depression and frustration, life is mostly a comfortable pattern of routine. I expected it to be yet another average day, but this birthday was marked with some surprises. Some good. Some bad. But one thing this birthday wasn't was uneventful.

Wednesday, October 18, 2017

Reflection on Coding

There's a subject I've been thinking about lately. I suppose it's more of a feeling than a topic; I'm not even sure how to put it into words.

I have a vague feeling that I've discussed it before, too. In some form. On the other hand, maybe writing about it will help get it out of my head.

The best I've managed to do to express this feeling is to frame it as "elegant beauty," or a kind of beauty that comes from expression through the logic of programming.

It's not that this is an entirely new concept. I've often read descriptions of Ruby as poetic, and there are other works that try examining questions like whether programming is more art than science, or whether programming is poetry.

Perhaps part of this is my own brain's weird wiring. I sometimes have trouble understanding poetry; good poetry can "work" on so many levels. Clever word use, double entendre, use of linguistic beats to emphasize points, references to other events and works, parallels to other art forms...I'm sure my wife, an English major, is able to expound on (and expand) the topic far more than I.

Programming adds yet another dimension: it is functional. It takes a language, with its own unique grammar and syntax, and processes input into something else. It's an expression of formulas through rules. If you get the syntax wrong, your work won't compile into a finished product. Programming is notoriously unforgiving when straying from the language rules.

And yet programs that take a set of input and produce the same output can still have so much variety!

I suppose a simple example can use the infamous FizzBuzz program. It's a staple of many a coding interview; relatively simple, it has, over time, become almost cliche (and in some circles, despised, depending on the blogs you read and the type of programmer bemoaning how demeaning it is to be asked to demonstrate it...)

The rules are simple; usually some variant of: count from 1 to 100, and if a number is divisible by 3, print "Fizz"; if it is divisible by 5, print "Buzz"; if it is divisible by both 3 and 5, print "FizzBuzz"; otherwise, print the number.

The simplest and most crude way to program this is to literally lay out a program that counts from 1 to 100 and uses if statements to output Fizz, Buzz, and FizzBuzz in the appropriate places. It would achieve the goal of the rules, but be highly inefficient and inflexible.

The next step up might be something like this:

// FizzBuzz
package main

import (
	"fmt"
	"strconv"
)

func main() {

	// Create a loop to count 1 to 100
	for i := 1; i <= 100; i++ {

		// Create a string variable that gets reinitialized each iteration
		strOutput := ""

		// Fizz on 3
		if i%3 == 0 {
			strOutput = strOutput + "Fizz"
		}
		// Buzz on 5
		if i%5 == 0 {
			strOutput = strOutput + "Buzz"
		}
		// Otherwise, output the number
		if strOutput == "" {
			strOutput = strconv.Itoa(i)
		}
		// Print the result
		fmt.Println(strOutput)
	}

}

If you know modulo, FizzBuzz is a pretty straightforward logic problem. But what if you didn't know about that piece of math?

// fizzbuzz-simple.go
package main

import (
	"fmt"
	"strconv"
)

func main() {

	for a := 1; a <= 100; a++ {

		strOutput := ""

		intTmp := a / 3
		if intTmp*3 == a {
			strOutput = "Fizz"
		}

		intTmp = a / 5
		if intTmp*5 == a {
			strOutput = strOutput + "Buzz"
		}

		if strOutput == "" {
			strOutput = strconv.Itoa(a)
		}

		fmt.Println(strOutput)
	}

}

This is probably a little slower...to be honest, I'm not sure whether the compiler would optimize both into similar machine code. But the end result is still the same.

The first issue I'd have with the basic implementation is that it's not very modular. It might be better to use a function to determine the fizzing and the buzzing.


// fizzbuzz-func.go
package main

import (
	"fmt"
	"strconv"
)

func main() {

	// Create a loop to count 1 to 100
	for i := 1; i <= 100; i++ {

		// Fizz on 3
		strOutput := CheckMod(i, 3, "Fizz")

		// Buzz on 5
		strOutput = strOutput + CheckMod(i, 5, "Buzz")

		// Otherwise, output the number
		if strOutput == "" {
			strOutput = strconv.Itoa(i)
		}

		// Print the result
		fmt.Println(strOutput)
	}

}

// CheckMod returns strLabel when intCount is evenly divisible by intCheck,
// otherwise an empty string.
func CheckMod(intCount int, intCheck int, strLabel string) string {
	if intCount%intCheck == 0 {
		return strLabel
	}
	return ""
}

This version includes a simple CheckMod() function that can be called to see if the remainder when divided by a supplied integer should get a label; now it takes minimal editing to change the numbers for which Fizz, Buzz, or FizzBuzz are used as output!

And, of course, this still has the same output as the previous versions.

But what if we don't want to keep modifying the source code to alter the Fizz and Buzz triggers? That's simple too.

// fizzbuzz-func-flags.go
package main

import (
	"flag"
	"fmt"
	"strconv"
)

func main() {

	intCountTo := flag.Int("countto", 100, "Count from 1 to this number")
	intFirstNum := flag.Int("firstnum", 3, "First number to label")
	strFirstLabel := flag.String("firstlabel", "Fizz", "First label to substitute")
	intSecondNum := flag.Int("secondnum", 5, "Second number to label")
	strSecondLabel := flag.String("secondlabel", "Buzz", "Second label to substitute")
	flag.Parse()

	// Create a loop to count 1 to x
	for i := 1; i <= *intCountTo; i++ {

		// Fizz on y
		strOutput := CheckMod(i, *intFirstNum, *strFirstLabel)

		// Buzz on z
		strOutput = strOutput + CheckMod(i, *intSecondNum, *strSecondLabel)

		// Otherwise, output the number
		if strOutput == "" {
			strOutput = strconv.Itoa(i)
		}

		// Print the result
		fmt.Println(strOutput)
	}

}

func CheckMod(intCount int, intCheck int, strLabel string) string {
	if intCount%intCheck == 0 {
		return strLabel
	}
	return ""
}

Now there are command line flags that designate the Fizz and the Buzz (as well as possible new labels for Fizz and Buzz) and the number to count to!

Because defaults are provided for the flag variables, the default version of this...with no flags set at the command line...will have identical output to the previous programs.

This version added quite a bit of flexibility to the program, and that flexibility is accessible from the command line by the end user. There is another problem, though; if you intend for an end user to use this application, there should be some sanity checking for the things they can change.

// fizzbuzz-func-flags-errcheck.go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
)

// A struct of flags
type stctFlags struct {
	intCountTo     *int
	intFirstNum    *int
	strFirstLabel  *string
	intSecondNum   *int
	strSecondLabel *string
}

func main() {

	var strctFlags stctFlags

	strctFlags.intCountTo = flag.Int("countto", 100, "Count from 1 to this number")
	strctFlags.intFirstNum = flag.Int("firstnum", 3, "First number to label")
	strctFlags.strFirstLabel = flag.String("firstlabel", "Fizz", "First label to substitute")
	strctFlags.intSecondNum = flag.Int("secondnum", 5, "Second number to label")
	strctFlags.strSecondLabel = flag.String("secondlabel", "Buzz", "Second label to substitute")
	flag.Parse()

	EvalFlags(&strctFlags)

	// Create a loop to count 1 to the requested limit
	for i := 1; i <= *strctFlags.intCountTo; i++ {

		// First label check
		strOutput := CheckMod(i, *strctFlags.intFirstNum, *strctFlags.strFirstLabel)

		// Second label check
		strOutput = strOutput + CheckMod(i, *strctFlags.intSecondNum, *strctFlags.strSecondLabel)

		// Otherwise, output the number
		if strOutput == "" {
			strOutput = strconv.Itoa(i)
		}

		// Print the result
		fmt.Println(strOutput)
	}

}

// EvalFlags sanity-checks the flag values and exits on invalid input.
func EvalFlags(strctFlags *stctFlags) {

	if *strctFlags.intCountTo <= 0 {
		fmt.Println("-countto must be greater than 0")
		os.Exit(1)
	}

	if *strctFlags.intFirstNum <= 0 {
		fmt.Println("-firstnum must be greater than 0")
		os.Exit(1)
	}

	if *strctFlags.strFirstLabel == "" {
		fmt.Println("-firstlabel must have a text label")
		os.Exit(1)
	}

	if *strctFlags.intSecondNum <= 0 {
		fmt.Println("-secondnum must be greater than 0")
		os.Exit(1)
	}

	if *strctFlags.strSecondLabel == "" {
		fmt.Println("-secondlabel must have a text label")
		os.Exit(1)
	}
}

// CheckMod returns strLabel when intCount is evenly divisible by intCheck,
// otherwise an empty string.
func CheckMod(intCount int, intCheck int, strLabel string) string {
	if intCount%intCheck == 0 {
		return strLabel
	}
	return ""
}

Now the application checks that the labels are set to non-empty strings and that all the numbers are greater than 0. Basic error checking.

And once again...the output, by default, will match the output of the previous programs!

These are all rather straightforward. None of them really takes advantage of features specific to Go, like channels. Here is the Go Playground implementation from Russ Cox, reproduced below:


package main

import "fmt"

func main() {
	c := generate()
	c = filter(c, 3, "Fizz")
	c = filter(c, 5, "Buzz")
	for i := 1; i <= 100; i++ {
		if s := <-c; s != "" {
			fmt.Println(s)
		} else {
			fmt.Println(i)
		}
	}
}

func generate() <-chan string {
	c := make(chan string)
	go func() {
		for {
			c <- ""
		}
	}()
	return c
}

func filter(c <-chan string, n int, label string) <-chan string {
	out := make(chan string)
	go func() {
		for {
			for i := 0; i < n-1; i++ {
				out <- <-c
			}
			out <- <-c + label
		}
	}()
	return out
}

I should note that I wrote a past blog post exploring the channels implementation above...

The simple FizzBuzz programs above all have the same output, but they accomplish it in different ways. I'm sure there are people who could send variations arriving at the same end result using different algorithmic logic: logical, conforming to a strict set of rules imposed by the compiler, but still arriving at the same destination through different means.

To understand the source code means twisting your brain into understanding how the programmer responsible for the source code thinks and expresses his or her way of thinking against those rules of the programming language's grammar and syntax.

The examples above are a peek into some of the evolution in my own thinking about how to program a task, how my own thinking in Go has gradually focused on increasing maintainability and flexibility while accomplishing a goal. I wonder if this is the kind of evolution that interviewers look for when hiring programmers...although that's a dangerous thought, considering that the criteria defining the rungs on that ladder of skill can be dangerously arbitrary.

I'm still refining my methods of modeling tasks when programming. I'm changing workflows, how I comment, and what I comment. I still occasionally reel back, perplexed, when seeing some samples of other people's code and have no idea why...or how...they thought the problem through the way they did.

Each sample I write or read is a reflection of the person who wrote it.

Sometimes I wonder what my own reflects about me.

Sunday, September 24, 2017

One Example of How To Tell When Obligations are Wastes of Time

(Disclaimer: everything here is my own opinion. I ordinarily shouldn't have to mention this, but there are times where mentioning certain catalysts for thoughts tends to make those catalysts angry and definitely not do things that resemble acts of retaliation against people who aren't me but are related in some way to me. This isn't even about them. But I feel I have to explicitly state that because sometimes the catalysts may not be very good at reading comprehension.)

The local paper recently had an article announcing that a local school superintendent was awarded the highest rating for his performance. I personally wasn't too surprised, given that during negotiations over teacher contracts, his answers to questions about the matter were something to the effect of, "I serve at the will of the board."

But I wouldn't speak ill of the school board. When criticized, things happen that are definitely not retaliation against people who are related to me in their district. And this isn't about the board. It's about the obligations that make boards...or any regulated body...look like they're doing work when really it's a waste of time and opportunity to rubber stamp their own work (or use it as an excuse to get rid of someone that displeases the regulated body in some way).

The news report said there were four performance ranks that could be given: distinguished, proficient, needs improvement, or unsatisfactory. Having a set of scores aggregated on a scale of one to four isn't necessarily bad...even Netflix now has a rating system based on a 1 or 2, which of course eliminates all the nuance of "The movie didn't make me want to throw up, but I definitely wouldn't want to watch it again" and instead reduces the viewing experience to "I LOVED THIS FILM" or "This film is so terrible that it will become a niche cult classic in 10 years when the latest group of self-appointed film buffs rediscovers it and nitpicks the flaws into virtues."

What areas were evaluated? "Professionalism, human resource management, district operations and financial management, student growth and achievement, organizational leadership, and communication and community relations."

A key question to ask is, how are these evaluated? These are standards. It's spelled out in school laws established in "1949 Act 14": "...the employment contract for a district superintendent or assistant district superintendent shall include objective performance standards..."

To figure out whether an obligatory standard is actually useful or a waste of time, you need to ask yourself against what ruler the standards are measured, and how the areas being measured are established.

The article said nothing about the scores other than the board members sat down and filled in sheets that were aggregated and found to be wonderful. Some objectives seemed like they'd be easy to measure, such as "student growth and achievement," something that has plenty of semi-effective rules regulating measures of student test results. Other things are blatantly arbitrary. How do you measure professionalism? You get a minus one each time you show up wearing a clown outfit? Or do you get a minus one for not wearing a tie, a minus two for wearing a "fun" tie, and a minus five for dressing as Pennywise? 

Not having standards that can be objectively measured is a strong indication that you're dealing with a feel-good waste of time.

What about what or who establishes the items to be measured? This time around the superintendent had to post a list of what was to be measured on the school website. After some digging around, I found the list. Apparently the list is determined by the person being evaluated, then the board okays it (which again is allowed by the school code...it turns out the "standards" for evaluation are "mutually agreed to").

I won't comment on how weird it is that the first half of the letter to the board is a word for word match to another district's older set of "standards" (although it does make me wonder if those items are actually, as it states, "set forth in the Superintendent's Contract are as follows:"...)

Instead I'll point out statements such as, under "School District Operations and Financial Management", that the "Superintendent shall manage effectively, ensuring completion of activities associated with the annual budget, oversee distribution of resources in support of School District priorities, and direct overall operational activities within the School District."

What does that even mean? Manage effectively meaning, this job is completed? And what is the job? Ensuring the completion of activities related to the budget would basically mean you check in on the person or people in charge of actually creating the budget. Oversee resources being allocated to District priorities means what, if not making sure money goes into proper budgets and books go to the right classes? And directing overall operational activities means he's in charge of the district which, oddly enough, is what a superintendent DOES.

This whole paragraph sounds like he's being evaluated on whether he actually does his job. And I also noticed there's no actual gauge by which to measure it. The measure is arbitrary.

There's a section called Organizational Leadership, under which it states, "Superintendent shall work collaboratively with the Board to develop a vision for the School District, display an ability to identify and rectify problems affecting the School District, work collaboratively with School District administration to ensure best practices for instruction, supervision, curriculum development, and management are being utilized, and work to influence the climate and culture of the School District."

What does that mean? The superintendent will work with the board to establish a vision for the district, which under ordinary conditions would make sense, except when he clearly said during contract negotiations that he serves at the will of the board. The translation would therefore imply that either the board is coming up with the vision, or he's going to propose something that the board will vote to pass if they don't want to come up with one.

And "display an ability to identify and rectify problems affecting the School District"? I'd be interested in hearing about a time when a superintendent talked openly about the problems of their district. I don't recall hearing anything like that from the superintendents of our local districts.

The last part is also vague--influence the climate and culture of the district? I'm not sure there is an objective measure for cultural influence. Most "culture and climate" I've heard regarding the school comes from the community, and much of that is influenced by the public and opinions spread by the school board during contract negotiations...and it's rarely positive. The statement itself doesn't even say he's going to positively or negatively influence the climate and culture. As the "head" of the district serving at the pleasure of the board, he could achieve this objective just by establishing a baseline expectation that when an issue is brought to his attention, the staff knows what they'll expect will probably happen, for better or worse.

And again, this has no objective measure against which to base a standard to score.

That brings me to the next sign you're dealing with a waste of time. The language is flowery, but vague. Stopping to translate paragraphs into actual meaning shows they aren't really meaning much at all once boiled down.

The last part of the letter is supposed to spell out how he is going to meet his objectives. It has items like, "Increase interventions and remediation's (sic) for students who need it most before, during, and after school", and, "Create a long range and comprehensive strategic plan - WILDCAT 2025".

If you thought I'd call this a waste of time, you'd be wrong. The list reads like a checklist, and having a checklist isn't a bad thing. If your goal is to get these things accomplished in the course of the upcoming year, that's great.

If anything were wrong with it, it's that this is a checklist in the context of a subjective set of standards by which to measure the performance of the person in charge of the district. If you judge a sports player and his or her checklist includes an item to improve the distance he or she throws the ball, that's great. But how much? 10% farther? 10 feet farther? Does he or she get points based on how many feet they improve the throw, or the quality of the throw by combining the distance with accuracy?

So why go through the effort of publishing a story about a superintendent being rated insanely great by the board that hired him in the first place and spent the past year "serving at the will of the board?" It's entirely a matter of speculation, and I can't engage in speculation because that could lead to definitely not retaliation. And this evaluation is just one more example of something mandated by the state that probably started with good intentions and mutated into a pathetic waste of time as it bounced around various fingers before becoming part of the law. But it's important to be able to apply critical thinking and differentiate when something reaching the public is worthwhile news and when something is little more than a waste of time.

Monday, September 4, 2017

Formatting Woes

Am I the only one that uses Blogger and keeps discovering formatting issues?

I take time to review my posts. I preview them. I lay out the fonts and paragraphs to include spacing that breaks up the sections for increased readability. I wrap text around graphics and use captions for text specific to that image.

It seems like no matter how much time I put into carefully laying out the format of the page, at some point I view the post as a regular user and...WHY IS THE SPACING GOOFED UP?

Is it Blogger? Is it a side effect of the template used for the layout? Certain fonts used?

I don't know.

I just know that it's incredibly frustrating.

It seems odd that even when I preview a post, adjust spacing, and finally post, the end result is still..."off".

I have several subjects to write about. Periodically noticing screwed up posts led me to write this up first.

As I type this I'm still using Blogger...but I'm tempted to try another platform. Maybe someday I'll shift everything to another site, and if I do, it may simply be because the formatting there actually looks sane.

On the other hand, moving to a sane site...different template?...may cause the adjustments I tried using to "fix" errors to actually make things weird in some other way.

Maybe time will tell.

Tuesday, August 1, 2017

More Tuning Golang Apps for High Concurrency Tasks on Linux

I have a project that is fairly straightforward. Again it's work related, so I have to fuzz some details, and again my memory is naturally fuzzy so I doubt it's an issue.

Background


This program I've been working on makes calls to a service (a REST endpoint) that in turn pulls data from a database, then my application parses that information into components and checks the disk to see if the file already exists. If it doesn't exist on disk already, the program makes a call to the API endpoint again asking for specific record information and writes it to the disk. In the end I get a huge set of files sorted in a structure resembling ./files/year/month/day/datatype/subtype/filename.txt.

There are literally millions of records to sort through. A single thread handling this would probably take weeks. Therefore, the program uses several (configurable!) goroutines to pull records simultaneously.
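The fan-out itself is the standard Go worker-pool pattern. Here is a stripped-down sketch of the shape of it (the worker count, channel, and processRecord stand-in are my own placeholders, not the real utility's code):

```go
package main

import (
	"fmt"
	"sync"
)

// processRecord stands in for the real fetch/parse/write work
// done for each record.
func processRecord(id int) string {
	return fmt.Sprintf("record %d done", id)
}

func main() {
	const numWorkers = 4 // configurable in the real utility
	ids := make(chan int)
	var wg sync.WaitGroup

	// Start a fixed pool of workers that drain the channel.
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range ids {
				processRecord(id)
			}
		}()
	}

	// Feed record IDs to the pool; close signals no more work.
	for i := 0; i < 100; i++ {
		ids <- i
	}
	close(ids)
	wg.Wait()
	fmt.Println("all records processed")
}
```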

First Problem: Too Many Open Files


I wrote about this fix earlier in the blog, but I'll give a quick recap.

At first everything seemed fine. I have a simple bit of math being output periodically to the console, and it was chugging along at around 20,000 records/minute. The system was functioning fine, no errors were showing up. All was right with the world.

Then a few hours later a few alerts arrived in the email. At this point the utility was running on its own instance with its own storage, making calls to a load balancer with a few endpoint servers behind it. The only recent change that could have caused the connection failures was the utility I was running, so I killed it; when the API servers were checked, there were still 18,000+ network connections in TIME_WAIT on each system.

Linux treats both files on disk and network sockets as "open files," because both are represented by file descriptors. "Too many open files" can mean literally too many files are open, or too many network connections are open, but it is usually a combination of the two.

Research time. The problem here is usually related to "you didn't close the connections." That wasn't the cause here. The calls were straightforward: I had a function that created a transport, created a client, made the connection, called GET, then read the data to return to the caller. It was a textbook fragment of Go adapted to my purposes, including a deferred Close() call so that everything would be closed when the function exited.

Check the "did you close the connection" off the list. And I also read the data from the socket before closing it, so that can be checked off the list. I had a hacked together bit of logic to retry connections if there was an error, but it also printed that status to the console when that happened. Nothing appeared as the too many open files errors popped up, so even if that caused a socket leak, it wasn't the likely cause.

The issue was the call to instantiate a transport each time the function was called. Transports hold the pool of client connections; the system should be re-using connections through them. Because the transport was destroyed each time the function returned, each call created a new pool, which meant new sets of client connections to the server instead of recycling previous ones, and that led to thousands of "open files".

The solution was to create the transport and pass it as a parameter to the call to GET the web endpoint. This allowed the transport to continue to manage the client pool outside the scope of the function call, and that allowed the system to keep a managed pool of connections for re-use.

This wouldn't have shown up if I were making periodic, occasional calls to different websites every few minutes. The problem would still be there, but chances are the connections would eventually close and time out before piling up and becoming a problem.

Too Many Files Leads to Terrible Times



There are a few things that are obvious in affecting the speed information is being processed, and at the risk of sounding immodest, I've been told I'm pretty good at spotting the obvious.

Warning: I'm not a Go expert. I'm citing information here that is just my current understanding, so if I'm wrong, please correct me in the comments.

Because I'm writing files, the drive can definitely affect performance. I have multiple processes that could be trying multiple disk operations in parallel at a given time. To that end, disk seek times, write times, and cache can directly impact the utility's speed.

I'm dealing with millions of files. During the initial testing and design of the utility, I had to deal with a file that would unzip into a directory holding around 100,000 files; then I had to deal with several of those 100K-file containing directories for processing. If you haven't tried that on a Macintosh using the HFS+ filesystem, it's not fun. EXT4 doesn't really handle it well either. Even on an SSD, getting a directory listing is downright painful. Too many files in one directory is difficult for some filesystems to handle.

One solution is to split the directory into more subdirectories, reducing the number of entries the system has to track per directory. This is in fact the solution I used, splitting information into logical subsets.

Timing Out Connections


Another fun fact I learned during this project: by default, the Go http.Client has no timeouts set. This leads to some fun havoc with stray connections left in weird states. If you're using the client to hit random sites in a semi-random fashion, you'd probably never notice; hammer the same site with hundreds of requests per second, and you can bet this will have some ramifications.

I read about this in a blog post warning against using the default settings in http.Client. After reviewing that information, I went back to the source code and added some timeouts, like so:


 tr := &http.Transport{
  Dial: (&net.Dialer{
   Timeout: 30 * time.Second,
  }).Dial,
  TLSHandshakeTimeout: 30 * time.Second,
 }

 client := &http.Client{
  Transport: tr,
  Timeout:   time.Second * 10,
 }

This is a modification I made to the most intensively-used connection set, the one that hammers the server with thousands of connections in parallel. For a far less-used connection in another function, I didn't bother moving the transport's scope, figuring that yes, those connections would pile up to a degree, but they should properly close and age out on their own.

This basically added sane timeouts to functions that previously had none, and it noticeably reduced my ghost connections.

Remove a Hindrance, Create a New One


The initial run finished a few days later. I realized that there was a bug in my loop logic. There were some bad words uttered and an updated version compiled.

At this point we also moved the utility, and the volume to which data was being saved, to the same system that held the API endpoint server. Basically the server being queried for information was now also hosting the client requesting and processing results from the API queries.

This eliminated what had been a kind of natural bottleneck throttling performance: hundreds of simultaneous connections per second hitting the server, but separated by network transit time. Sure, that was on the scale of tens of milliseconds (when things were working well), but it really added up.

Now the client was requesting it from the localhost. *Bam*. Within a few moments, the number of open connections (using netstat |wc -l, since I only needed a rough estimate) ballooned to 40,000 connections before this appeared on the console:

dial tcp <ip address redacted>: can't assign requested address

Because dial was in the error, it was most likely the client causing the issue. After some poking around, I ended up making two more changes.

First, I changed the number of idle connections the client keeps open. The default is two; beyond that, the client was closing connections instead of returning them to the idle pool for more efficient re-use. Again, occasional random connections aren't so bad, but hammering the same IP highlights the need to alter this (and you probably don't want to change it if you're not making a large number of frequent calls to the same host):


 tr := &http.Transport{
  Dial: (&net.Dialer{
   Timeout: 30 * time.Second,
  }).Dial,
  TLSHandshakeTimeout: 30 * time.Second,
  MaxIdleConnsPerHost: intIdleConns,
 }

The changed setting is MaxIdleConnsPerHost in the transport. Here I set it to a variable that in turn is set from the command line so I could tune it at runtime, but instead of the default 2 I set it closer to 400.

The next change was an alteration on the host server. There is some guidance on a SO question explaining some tuning tweaks, but the gist of the change I made is this...

When a TCP connection is made, the client side binds to an ephemeral port. With a ton of TCP connections hitting the server, the supply of available ephemeral ports was being starved. The next step was to try increasing the number of ports available, so the system could support more connections per second, hopefully at a level where connections would close and age out properly before overloading the system.

In this case, I changed net.ipv4.ip_local_port_range from "32768 61000" to "9000 64500".  From the SO question, this means I changed the connectivity from (61000-32768)/60 = 470 sockets/second to (64500-9000)/60 = 925 sockets/second.

There was another change I could make from the page that involved changing the net.ipv4.tcp_fin_timeout setting, along with a couple of others. I avoided that, opting instead to test these changes because the tuning advice was more like "change this on the client" or "change this on the server", not really geared to a situation where the server and client were eating resources on the same host. Making minimal changes to keep it working, for this project, would be fine.

I ran netstat in a loop while the application ran again. This time the open connections quickly climbed to 70,000 connections before leveling out, and it held steady. After 15 hours of elapsed runtime, it had 3 connection errors show up. Otherwise it kept up with the load just fine.

I should also mention that I ran 4 parallel processing tasks, one for each core. When I boosted that number it seemed to be a hindrance to the processing speed; keeping it at 4, the estimated processing speed was over 100K records/minute, easily holding sustained bursts 5 or 6 times the processing speed when the client was on a separate machine.

This Was a Minimal Set of Changes


There were a number of lessons learned so far. Beyond the basic novice check that a response body is fully read and properly closed, be aware that the transport is what controls the pool of connections for efficient re-use.

Next, be aware that by default timeouts are missing from the transport and client. Add them. 

Also if you're hitting a particular server or set of servers with requests, change your MaxIdleConnsPerHost. Otherwise you're wasting connection use.

Last, an easy way to boost connection rates is to increase the number of ephemeral ports available. There are limits to this...and you don't want to starve other resources by taking away those ports from other clients or servers on the host.

There are plenty of other changes that can be made to increase horsepower of your servers. Some additional changes are in the SO question I linked to; another good blog post discusses how MigratoryData scaled servers to 12 million concurrent connections. I'd only caution that not every task requires this kind of engineering and you might want to exercise restraint in changing things when a few tweaks can accomplish decent performance for your use case. 

Performance is a scale. Some things can be overcome with throwing lots of hardware at it. Sometimes a few tweaks will make your app run 5 or 6 times faster. 

Happy tuning!

Thursday, July 20, 2017

Golang: HTTP Client Opens Too Many Sockets ("Too many open files")

This relates to a project that is work related, so I have to fuzz some of the details. But on the other hand, some details are naturally fuzzed because I have to remember some of the details and my memory is naturally fuzzy...

I'm working on a utility that is, on the surface, simple. It makes a call to an API endpoint using the http.Client, compares some quick results, and if certain conditions are met it makes a series of API calls to save the JSON responses.

The processing/checking process is carried out in a set of goroutines because the comparisons are easy to do in parallel. If the routine needs to pull a JSON reply, it calls a function that is laid out pretty much like the standard examples from the "here's how you get a web page in Go!" sites.


func GetReply(strAPI string, strServer string) string {

    // The URL to request
    strURL := strServer + "/service/" + strAPI

        // Also add timeouts for connections
        tr := &http.Transport{
            Dial: (&net.Dialer{
               Timeout: 5 * time.Second,
            }).Dial,
            TLSHandshakeTimeout: 5 * time.Second,
        }
    client := &http.Client{
        Transport: tr,
        Timeout:   time.Second * 10,
    }

    // Turn it into a request
    req, err := http.NewRequest("GET", strURL, nil)
    if err != nil {
        fmt.Println("\nError forming request: " + err.Error())
        return ""
    }
    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("Accept", "application/json")

    // Get the URL
    res, err := client.Do(req)
    if err != nil {
        fmt.Println("\nError making request: " + err.Error())
        if res != nil {
            res.Body.Close()
        }
        return ""
    }

    // What was the response status from the server?
    // (res is guaranteed non-nil after a successful Do, so no nil check needed.)
    var strResult string
    if res.StatusCode != 200 {
        fmt.Println("\nError, unexpected response status: " + res.Status)
        res.Body.Close()
        return ""
    }

    // Read the reply
    body, err := ioutil.ReadAll(res.Body)
    if err != nil {
        fmt.Println("\nError reading response body: " + err.Error())
        res.Body.Close()
        return ""
    }
    res.Body.Close()

    // Cut down on calls to convert this
    strResult = string(body)

    // Done
    return strResult

}

This is actually a modified version of what I've pulled from various tutorials and examples, adding more calls to Close() and doing a check for whether res is nil before performing that call in an error.

I also added a timeout to the client because by default it is set to 0; no timeout. As you can probably guess this version was modified while troubleshooting.

After a few hours of the application running we had alerts come in about failing functions on the production servers. When I opened logs I discovered a number of "too many files open" errors, and a developer on the call said there were over 18,000 socket connections on each of the balanced servers.

The only difference was my use of this test program, so I killed it. The socket count fell.

Welp...guess we found the cause. But why?

There are a couple basics for beginners when using http.Client requests.
1) Close the response body after reading.
2) Clients are reused.
3) If you defer a call to Close() (as this one originally did, and most tutorials show) the function should call Close() when the function returns. The modified sample I posted simply closes it after reading the Body and checking for errors.

At first I thought it was due to response bodies not being closed; they must be closed for the underlying connection to be re-used. I traced the execution path a dozen ways and added more explicit Close() calls in error checks...but those error branches never printed anything during the run, so errors weren't the likely source of a socket leak.

I added timeouts to the client and dialer. While that didn't hurt and probably made things a little cleaner, it still didn't help the too many open files/sockets error.

Another lead came from a close reading of a Stack Overflow answer. The function was creating a new Transport, tr, with each call, and the Transport is what holds the connection pool for re-use. See where I'm going with this?

Another answer on that page talked about creating a global client for his functions to reuse.

The theme was that the scope of variables matters when dealing with what enables re-use. Because I was hitting the same server repeatedly while the function kept re-instantiating the very mechanism that governs client re-use, the number of new connections and left-open sockets ballooned.

My next move was to go to the goroutines that were in charge of processing the replies from the API endpoints and have them create Transport instances, then when they call the function they passed the Transport as a parameter.

I uploaded the program to a remote system instance and re-ran it while watching netstat on the server and the client systems. After initially ballooning to about 4,000 connections it soon settled down to well under 100 connections (using netstat |wc -l).

Takeaways:
1) Modify the default client, and maybe the transport dialer, to add sane timeouts.
2) If you're hitting the same server repeatedly, do it all within the same scope as your transport instantiation, or create a transport and pass it as a parameter to functions, so you optimize re-use of the connection pool.
3) Check that you properly close the response body so the client can be re-used. Check in error paths that it can be properly closed without panicking.

What about separating not just the Transport, but also the Client, then passing the Client around as a parameter? I didn't test that because I wasn't sure how "goroutine-safe" that would be against race conditions, despite the one answer on that Stack Overflow that demonstrated using a global Client instance for use.  It's possible it works fine. At this point it looks like passing the Transport worked fine, though.

I'll also note that my usual self-loathing and insecurity isn't getting the better of me this time, because the top answer on the question that inspired this solution was the usual advice I found repeatedly on other sites and blogs (and SO answers): check that you close your response properly. It's the top answer by a significant margin. It was almost an afterthought to realize that maybe I was pummeling one particular site with multiple instantiations of client pools, so client re-use was at a minimum.

Happy HTTP Client-ing!