Friday, November 3, 2017

I turned 40 this week.
Four decades. There was a time I thought I'd grow up to "die alone as a hermit in the woods." There was a time I thought working as a programmer for Microsoft might be interesting. There was a time I thought I might become a marine biologist, specifically an ichthyologist, and study sharks. Later on I even flirted with the notion of becoming a successful author.
Today I'm not working for Microsoft. I don't live in the woods, although the town I reside in is rapidly withering economically, and some might argue our tiny dot on the map is not far removed from being woodland. I don't even own diving equipment and am nowhere near the ocean (although we do live on a river that ends in the ocean, if you want to travel a few hundred miles). The closest I've come to becoming an author was finishing and editing exactly one manuscript.
I'm pretty sure, at this point, that I have depression issues. I know it's more common today for people to talk about depression, but some still dismiss it as the excuse of the week, or brush it off as a "feeling blue" thing you can exercise away. "Just cheer up!" they say, totally ignoring that clinical depression is a real illness.
While this little shadow has always lingered to some degree in the back of my mind, some things in the past few years have really raised it in prominence. It would take chapters of a book to cover the details, but the highlight reel would include attempts by my wife's employer to eliminate her from her job using what could (in my view, as this is my opinion) charitably be labeled slanderous accusations. That was a year-long ordeal that took a huge emotional and financial toll on the family.
After that drawn out mess, things finally felt like they were turning around. There was a light at the end of the tunnel! Unfortunately, it was a train's headlamp.
The employer I had come to rely on for emotional and financial support decided to terminate my contract, which is a nice way of saying I was sent home with a box of my belongings. Now it was my turn to plunge into a world of uncertainty, doubt, and the five stages of grief. I was blindsided and even the act of getting out of bed felt like fighting a dark shroud squeezing the life out of me.
Worse yet, if you feel like taking a moral stance and voicing support for teachers in the never-ending fight over contracts, even if your family has been working in public education for decades, even if you do this by pointing out actual evidence straight from the faces of the people you feel are in the wrong, you might want to think twice if this takes place in a town that is turning into the economic equivalent of a mummy and you might have to return and look for a job. I made some statements that gained traction among certain circles here; at the time I felt safe in the idea that my employment was secure in the land of gummy bears and unicorns. The reversal of fortune played right into the hands of depression's self-doubt and uncertainty, whispering that "they" were laughing at my incompetence as I searched for job openings in a town propped up by Wal-Mart, McDonalds, a hospital system, and a public education system whose administration and board were not pleased with me for writing something that was popular for a couple of days among their staff.
I also experienced firsthand the silence from most of the people I had taken for granted as friends and associates from what I eventually came to regard as my "previous life."
Those were the two major events, layered on top of the issues and stresses that many others have to deal with in life, and they fanned the depression flames.
Now we have a national problem; we've become a Trumpster fire nation. Every day brings a new display of ignorance and people taking pride in how terrible they can be, and I don't feel there's much to counterbalance the papercuts of negativity he and his followers inflict.
It's been a long, stressful, painful period of time.
It's also been nearly a year since I started my new job, which gave me back some sense of self-worth. Slowly it helped build up some validation that I'm not worthless. I'm not sure if that makes sense, or if I'm placing another misplaced sense of power in the hands of something I shouldn't emotionally invest in. But for now it's there and helping me.
My family has been supportive during this emotional roller coaster, or tried to be. I don't think I acknowledge the good they do as much as I focus on the negative things families deal with; that's a side effect of both depression and Aspergian brain wiring, I think. Given the reflection that hitting four decades of sentience has triggered, I need to keep trying to improve on that behavior.
All of these things combined into a hazy mire that congealed into a cloud around me, affecting my worldview and keeping me in a perpetual weariness. I thought my birthday, despite being a magic number (I love the number 4, and 10 is a binary number as well as the number of digits on my hands and on my feet, and is even, and possesses several other attributes that lend it an irrational appreciation in my mind), would be yet another quiet passage marked by some cards and well wishes and soon forgotten. It even fell on a Wednesday, my least favorite day for events to occur.
Usually the big booster in looking forward to my birthday is that it is preceded by Halloween. I love the idea of Halloween; trick-or-treating, costume parties, awesome DIY costumes, parades, and horror movies are so much fun for me. But this year was different; the Friday before my birthday brought an announcement that indictments were coming against Trumpster acquaintances! After an anticipation-filled weekend, Monday had people brought in to testify, and we discovered one of his campaign associates had already pled guilty to lying to the FBI and was cooperating with investigators!
We went out for dinner on my birthday with my in-laws and parents. One of the TVs played MSNBC's coverage of Trump's Russian connections and the mounting investigations. I was giddy.
My birthday was also marked by the Daily Show having an interview with Hillary Clinton. I don't know why that made me happy...I guess because she's the symbol of everything "I told you so" during the Presidential election.
These were happy events for me, things that fought the shroud of depression whispering in my ear and ran totally counter to the idea that my 40th birthday would be quiet.
There were other, not-so-happy events that marked the birthday-time. Unexpected shocks like the guy who rented a truck and ran over bike riders in downtown Manhattan. Because he wasn't white, it was labeled an act of terrorism, unlike the recent Vegas shooting that killed or wounded around 600 people; because that shooter was white, the fallout is basically several people going bankrupt from medical bills, the modifications the shooter made to his guns staying perfectly legal, and Congress clutching pearls while insisting that nothing can prevent these things from happening.
Yet another shocking event involved layoffs at a previous employer. I discovered it as oddly worded and vague tweets began floating along my Twitter timeline; today there was a TechCrunch article giving conflicting details of what had happened. In the end I could only confirm that a relatively large number of people were let go, some of whom I knew and had worked with, so it wasn't just trimming the newest of hires. In keeping with the "Me me me!" theme, this news caused me to revisit all the thoughts of despair and hopelessness I felt as my wife drove me home from the apartment after I was told my time there had ended. I empathized with what must be a swirl of confusion and fear that these people now feel. I also watched as people who escaped the cutting block echoed their support for one another and words of sadness to their departed colleagues. Selfishly, I felt like the bandage had been ripped off an old wound.
I turned 40 this week.
Nothing I thought was going to happen as a teen happened. Getting older settled into a pattern where almost every day blended into the next; mostly unremarkable, smeared with a veneer of depression and frustration, life became a comfortable routine. I expected this to be yet another average day, but this birthday was marked with some surprises. Some good. Some bad. But one thing this birthday wasn't is uneventful.
Wednesday, October 18, 2017
Reflection on Coding
There's a subject I've been thinking about lately. I suppose it's more of a feeling than a topic; I'm not even sure how to put it into words.
I have a vague feeling that I've discussed it before, too. In some form. Either way, maybe writing about it will help get it out of my head.
The best I've managed to do to express this feeling is to frame it as "elegant beauty," or a kind of beauty that comes from expression through the logic of programming.
It's not that this is an entirely new concept. I've often read descriptions of Ruby as poetic, and there are other works that try examining questions like whether programming is more art than science, or whether programming is poetry.
Perhaps part of this is my own brain's weird wiring. I sometimes have trouble understanding poetry; good poetry can "work" on so many levels. Clever word use, double entendre, linguistic beats that emphasize points, references to other events and works, parallels to other art forms...I'm sure my wife, an English major, could expound on (and expand) the topic far more than I.
Programming adds yet another dimension: it is functional. It takes a language, with its own unique grammar and syntax, and processes input into something else. It's an expression of formulas through rules. If you get the syntax wrong, your work won't compile into a finished product. Programming is notoriously unforgiving when straying from the language rules.
And yet programs that take a set of input and produce the same output can still have so much variety!
I suppose a simple example can use the infamous FizzBuzz program. It's a staple of many a coding interview; relatively simple, it has, over time, become almost cliche (and in some circles, despised, depending on the blogs you read and the type of programmer bemoaning how demeaning it is to be asked to demonstrate it...)
The rules are simple; usually some variant of: "Count from 1 to 100. If a number is divisible by 3, print 'Fizz.' If it is divisible by 5, print 'Buzz.' If it is divisible by both 3 and 5, print 'FizzBuzz.' Otherwise, print the number."
The simplest and crudest way to program this is to literally lay out a program that counts from 1 to 100 and uses if statements to output Fizz, Buzz, and FizzBuzz in the appropriate places, something like the sketch below. It would achieve the goal of the rules, but be highly inefficient and inflexible.
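To be clear about what I mean, here's a sketch (abbreviated; the real thing would hardcode all 47 special-case numbers by hand):

package main

import "fmt"

// The "crude" version: every Fizz/Buzz number is spelled out by hand.
// Abbreviated here for illustration; a complete version would list every
// multiple of 3 and 5 up to 100.
func main() {
    for i := 1; i <= 100; i++ {
        if i == 15 || i == 30 || i == 45 || i == 60 || i == 75 || i == 90 {
            fmt.Println("FizzBuzz")
        } else if i == 3 || i == 6 || i == 9 || i == 12 /* ...every remaining multiple of 3... */ {
            fmt.Println("Fizz")
        } else if i == 5 || i == 10 || i == 20 /* ...every remaining multiple of 5... */ {
            fmt.Println("Buzz")
        } else {
            fmt.Println(i)
        }
    }
}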
The next step up might be something like this:
// FizzBuzz
package main

import (
    "fmt"
    "strconv"
)

func main() {
    // Create a loop to count 1 to 100
    for i := 1; i <= 100; i++ {
        // Create a string variable that gets reinitialized each iteration
        var strOutput string
        strOutput = ""
        // Fizz on 3
        if i%3 == 0 {
            strOutput = strOutput + "Fizz"
        }
        // Buzz on 5
        if i%5 == 0 {
            strOutput = strOutput + "Buzz"
        }
        // Otherwise, output the number
        if strOutput == "" {
            strOutput = strconv.Itoa(i)
        }
        // Print the result
        fmt.Println(strOutput)
    }
}
If you know modulo, FizzBuzz is a pretty straightforward logic problem. But what if you didn't know about that piece of math?
// fizzbuzz-simple.go
package main

import (
    "fmt"
    "strconv"
)

func main() {
    for a := 1; a <= 100; a++ {
        var strOutput string = ""
        intTmp := a / 3
        if intTmp*3 == a {
            strOutput = "Fizz"
        }
        intTmp = a / 5
        if intTmp*5 == a {
            strOutput = strOutput + "Buzz"
        }
        if strOutput == "" {
            strOutput = strconv.Itoa(a)
        }
        fmt.Println(strOutput)
    }
}
This is probably a little slower...to be honest, I'm not sure whether the compiler would optimize this into similar machine code. But the end result is still the same.
The first issue I'd have with the basic implementation is that it's not very modular. It might be better to use a function to determine the fizzing and the buzzing.
// fizzbuzz-func.go
package main

import (
    "fmt"
    "strconv"
)

func main() {
    // Create a loop to count 1 to 100
    for i := 1; i <= 100; i++ {
        // Fizz on 3
        strOutput := CheckMod(i, 3, "Fizz")
        // Buzz on 5
        strOutput = strOutput + CheckMod(i, 5, "Buzz")
        // Otherwise, output the number
        if strOutput == "" {
            strOutput = strconv.Itoa(i)
        }
        // Print the result
        fmt.Println(strOutput)
    }
}

func CheckMod(intCount int, intCheck int, strLabel string) string {
    if intCount%intCheck == 0 {
        return strLabel
    } else {
        return ""
    }
}
This version includes a simple CheckMod() function that checks whether a number divides evenly by a supplied integer and, if so, returns a label; now it takes minimal editing to change the numbers for which Fizz, Buzz, or FizzBuzz are used as output!
And, of course, this still has the same output as the previous versions.
But what if we don't want to keep modifying the source code to alter the Fizz and Buzz triggers? That's simple too.
// fizzbuzz-func-flags.go
package main

import (
    "flag"
    "fmt"
    "strconv"
)

func main() {
    intCountTo := flag.Int("countto", 100, "Count from 1 to this number")
    intFirstNum := flag.Int("firstnum", 3, "First number to label")
    strFirstLabel := flag.String("firstlabel", "Fizz", "First label to substitute")
    intSecondNum := flag.Int("secondnum", 5, "Second number to label")
    strSecondLabel := flag.String("secondlabel", "Buzz", "Second label to substitute")
    flag.Parse()
    // Create a loop to count 1 to x
    for i := 1; i <= *intCountTo; i++ {
        // Fizz on y
        strOutput := CheckMod(i, *intFirstNum, *strFirstLabel)
        // Buzz on z
        strOutput = strOutput + CheckMod(i, *intSecondNum, *strSecondLabel)
        // Otherwise, output the number
        if strOutput == "" {
            strOutput = strconv.Itoa(i)
        }
        // Print the result
        fmt.Println(strOutput)
    }
}

func CheckMod(intCount int, intCheck int, strLabel string) string {
    if intCount%intCheck == 0 {
        return strLabel
    } else {
        return ""
    }
}
Now there are command line flags that designate the Fizz and the Buzz (as well as possible new labels for Fizz and Buzz) and the number to count to!
Because defaults are built into the flag variables, the default version of this...with no flags set at the command line...will have identical output to the previous applications.
This version added quite a bit of flexibility to the program, and that flexibility is accessible from the command line by the end user. There is another problem, though; if you intend for an end user to use this application, there should be some sanity checking for the things they can change.
// fizzbuzz-func-flags-errcheck.go
package main

import (
    "flag"
    "fmt"
    "os"
    "strconv"
)

// A struct of flags
type stctFlags struct {
    intCountTo     *int
    intFirstNum    *int
    strFirstLabel  *string
    intSecondNum   *int
    strSecondLabel *string
}

func main() {
    var strctFlags stctFlags
    strctFlags.intCountTo = flag.Int("countto", 100, "Count from 1 to this number")
    strctFlags.intFirstNum = flag.Int("firstnum", 3, "First number to label")
    strctFlags.strFirstLabel = flag.String("firstlabel", "Fizz", "First label to substitute")
    strctFlags.intSecondNum = flag.Int("secondnum", 5, "Second number to label")
    strctFlags.strSecondLabel = flag.String("secondlabel", "Buzz", "Second label to substitute")
    flag.Parse()
    EvalFlags(&strctFlags)
    // Create a loop to count 1 to 100
    for i := 1; i <= *strctFlags.intCountTo; i++ {
        // Fizz on 3
        strOutput := CheckMod(i, *strctFlags.intFirstNum, *strctFlags.strFirstLabel)
        // Buzz on 5
        strOutput = strOutput + CheckMod(i, *strctFlags.intSecondNum, *strctFlags.strSecondLabel)
        // Otherwise, output the number
        if strOutput == "" {
            strOutput = strconv.Itoa(i)
        }
        // Print the result
        fmt.Println(strOutput)
    }
}

func EvalFlags(strctFlags *stctFlags) {
    if *strctFlags.intCountTo <= 0 {
        fmt.Println("-countto must be greater than 0")
        os.Exit(1)
    }
    if *strctFlags.intFirstNum <= 0 {
        fmt.Println("-firstnum must be greater than 0")
        os.Exit(1)
    }
    if *strctFlags.strFirstLabel == "" {
        fmt.Println("-firstlabel must have a text label")
        os.Exit(1)
    }
    if *strctFlags.intSecondNum <= 0 {
        fmt.Println("-secondnum must be greater than 0")
        os.Exit(1)
    }
    if *strctFlags.strSecondLabel == "" {
        fmt.Println("-secondlabel must have a text label")
        os.Exit(1)
    }
    // Done
    return
}

func CheckMod(intCount int, intCheck int, strLabel string) string {
    if intCount%intCheck == 0 {
        return strLabel
    } else {
        return ""
    }
}
Now the application checks that the labels are set to actual, non-empty strings and that all the numbers are greater than 0. Basic error checking.
And once again...the output, by default, will match the output of the previous programs!
These are all rather straightforward, and don't really take advantage of features specific to Go, like channels. Here is the Go Playground implementation from Russ Cox, reproduced below:
package main

import "fmt"

func main() {
    c := generate()
    c = filter(c, 3, "Fizz")
    c = filter(c, 5, "Buzz")
    for i := 1; i <= 100; i++ {
        if s := <-c; s != "" {
            fmt.Println(s)
        } else {
            fmt.Println(i)
        }
    }
}

func generate() <-chan string {
    c := make(chan string)
    go func() {
        for {
            c <- ""
        }
    }()
    return c
}

func filter(c <-chan string, n int, label string) <-chan string {
    out := make(chan string)
    go func() {
        for {
            for i := 0; i < n-1; i++ {
                out <- <-c
            }
            out <- <-c + label
        }
    }()
    return out
}
I should note that I wrote a past blog post that explored the channels implementation above...
The simple FizzBuzz test in the forms above has the same output, but it's accomplished in many different ways. I'm sure there are people who could send variations that reach the same end result using different algorithmic logic; still logical, still conforming to the strict rules the compiler expects, but arriving at the same destination through different means.
To understand source code means twisting your brain into understanding how the programmer responsible for it thinks, and how he or she expresses that thinking within the rules of the programming language's grammar and syntax.
The examples above are a peek into the evolution of my own thinking about how to program a task, how my thinking in Go gradually focused on increasing maintainability and flexibility while accomplishing a goal. I wonder if this is the kind of evolution interviewers look for when hiring programmers...although that's a dangerous thought, considering that the rungs on that ladder of skill could be defined in a dangerously arbitrary way.
I'm still refining my methods of modeling tasks when programming. I'm changing workflows, how I comment, and what I comment. I still occasionally reel back, perplexed, when seeing some samples of other people's code and have no idea why...or how...they thought the problem through the way they did.
Each sample I write or read is a reflection of the person who wrote it.
Sometimes I wonder what my own reflects about me.
Sunday, September 24, 2017
One Example of How To Tell When Obligations are Wastes of Time
(Disclaimer: everything here is my own opinion. I ordinarily shouldn't have to mention this, but there are times when mentioning certain catalysts for thoughts tends to make those catalysts angry and definitely not do things that resemble acts of retaliation against people who aren't me but are related in some way to me. This isn't even about them. But I feel I have to explicitly state that because sometimes the catalysts may not be very good at reading comprehension.)
The local paper recently had an article announcing that a local school superintendent was awarded the highest rating for his performance. I personally wasn't too surprised, given that during negotiations over teacher contracts, his answers to questions about his opinion on the matter were something to the effect of, "I serve at the will of the board."
But I wouldn't speak ill of the school board. When criticized, things happen that are definitely not retaliation against people who are related to me in their district. And this isn't about the board. It's about the obligations that make boards...or any regulated body...look like they're doing work when really it's a waste of time and an opportunity to rubber-stamp their own work (or an excuse to get rid of someone who displeases the regulated body in some way).
The news report said there were four performance ranks that could be given: distinguished, proficient, needs improvement, or unsatisfactory. Having a set of scores aggregated in sets of one to four isn't necessarily bad...even Netflix now has a rating system based on a 1 or 2, which of course eliminates all the nuance of "The movie didn't make me want to throw up, but I definitely wouldn't want to watch it again" and instead reduces the viewing experience to "I LOVED THIS FILM" or "This film is so terrible that it will become a niche cult classic in 10 years when the latest group of self-appointed film buffs rediscovers it and nitpicks the flaws into virtues."
What areas were evaluated? "Professionalism, human resource management, district operations and financial management, student growth and achievement, organizational leadership, and communication and community relations."
A key question to ask is: how are these evaluated? They are standards, after all, and the requirement is spelled out in school law, "1949 Act 14": "...the employment contract for a district superintendent or assistant district superintendent shall include objective performance standards..."
To figure out whether an obligatory standard is actually useful or a waste of time, you need to ask against what ruler the standards are measured, and how the areas being measured were established.
The article said nothing about the scores other than the board members sat down and filled in sheets that were aggregated and found to be wonderful. Some objectives seemed like they'd be easy to measure, such as "student growth and achievement," something that has plenty of semi-effective rules regulating measures of student test results. Other things are blatantly arbitrary. How do you measure professionalism? You get a minus one each time you show up wearing a clown outfit? Or do you get a minus one for not wearing a tie, a minus two for wearing a "fun" tie, and a minus five for dressing as Pennywise?
Not having standards that can be objectively measured is a strong indication that you're dealing with a feel-good waste of time.
What about what or who establishes the items to be measured? This time around the superintendent had to post a list of what was to be measured on the school website. After some digging around, I found the list. Apparently the list is determined by the person being evaluated, then the board okays it (which again is allowed by the school code...it turns out the "standards" for evaluation are "mutually agreed to").
I won't comment on how weird it is that the first half of the letter to the board is a word for word match to another district's older set of "standards" (although it does make me wonder if those items are actually, as it states, "set forth in the Superintendent's Contract are as follows:"...)
Instead I'll point out statements such as, under "School District Operations and Financial Management", that the "Superintendent shall manage effectively, ensuring completion of activities associated with the annual budget, oversee distribution of resources in support of School District priorities, and direct overall operational activities within the School District."
What does that even mean? "Manage effectively," meaning the job gets done? And what is the job? Ensuring the completion of activities related to the budget basically means checking in on the person or people actually creating the budget. Overseeing resources being distributed to District priorities means what, if not making sure money goes into the proper budgets and books go to the right classes? And directing overall operational activities means he's in charge of the district, which, oddly enough, is what a superintendent DOES.
This whole paragraph sounds like he's being evaluated on whether he actually does his job. And I also noticed there's no actual gauge by which to measure it. The measure is arbitrary.
There's a section called Organizational Leadership, under which it states, "Superintendent shall work collaboratively with the Board to develop a vision for the School District, display an ability to identify and rectify problems affecting the School District, work collaboratively with School District administration to ensure best practices for instruction, supervision, curriculum development, and management are being utilized, and work to influence the climate and culture of the School District."
What does that mean? The superintendent will work with the board to establish a vision for the district, which under ordinary conditions would make sense, except that he clearly said during contract negotiations that he serves at the will of the board. The translation would therefore imply that either the board is coming up with the vision, or he's going to propose something the board will vote to pass if they don't want to come up with one.
And "display an ability to identify and rectify problems affecting the School District"? I'd be interested in hearing someone talk about a time when a superintendent talks about the problems of their district. I don't recall hearing something like that from the superintendents of our local districts.
The last part is also vague--influence the climate and culture of the district? I'm not sure there is an objective measure for cultural influence. Most "culture and climate" I've heard regarding the school comes from the community, and much of that is shaped by the public and by opinions spread by the school board during contract negotiations...and it's rarely positive. The statement doesn't even say whether he's going to positively or negatively influence the climate and culture. As the "head" of the district serving at the pleasure of the board, he could achieve this objective just by establishing a baseline expectation: when an issue is brought to his attention, the staff knows what will probably happen, for better or worse.
And again, this has no objective measure against which to base a standard to score.
That brings me to the next sign you're dealing with a waste of time: the language is flowery, but vague. Stopping to translate the paragraphs into actual meaning shows they don't really mean much at all once boiled down.
The last part of the letter is supposed to spell out how he is going to meet his objectives. It has items like, "Increase interventions and remediation's (sic) for students who need it most before, during, and after school", and, "Create a long range and comprehensive strategic plan - WILDCAT 2025".
If you thought I'd call this a waste of time, you'd be wrong. The list reads like a checklist, and having a checklist isn't a bad thing. If your goal is to get these things accomplished in the course of the upcoming year, that's great.
If anything is wrong with it, it's that this is a checklist in the context of a subjective set of standards for measuring the performance of the person in charge of the district. If you judge a sports player and his or her checklist includes an item to improve the distance he or she throws the ball, that's great. But by how much? 10% farther? 10 feet farther? Does he or she get points based on how many feet the throw improves, or on the quality of the throw, combining distance with accuracy?
So why go through the effort of publishing a story about a superintendent being rated insanely great by the board that hired him in the first place and that spent the past year being served "at the will of the board?" It's entirely a matter of speculation, and I can't engage in speculation because that could lead to definitely not retaliation. This evaluation is just one more example of something mandated by the state that probably started with good intentions and mutated into a pathetic waste of time as it passed through various hands on its way into law. But it's important to be able to apply critical thinking and differentiate when something reaching the public is worthwhile news and when it is little more than a waste of time.
Monday, September 4, 2017
Formatting Woes
Am I the only one who uses Blogger and keeps discovering formatting issues?
I take time to review my posts. I preview them. I lay out the fonts and paragraphs to include spacing that breaks up the sections for increased readability. I wrap text around graphics and use captions for text specific to that image.
It seems like no matter how much time I put into carefully laying out the format of the page, at some point I view the post as a regular user and...WHY IS THE SPACING GOOFED UP?
Is it Blogger? Is it a side effect of the template used for the layout? Certain fonts used?
I don't know.
I just know that it's incredibly frustrating.
It seems odd that even when I preview a post, adjust spacing, and finally post, the end result is still..."off".
I have several subjects to write about. Periodically noticing screwed up posts led me to write this up first.
As I type this I'm still using Blogger...but I'm tempted to try another platform. Maybe someday I'll shift everything to another site, and if I do, this post simply won't make sense because the formatting will actually look sane.
On the other hand, moving to a sane site...different template?...may cause the adjustments I tried using to "fix" errors to actually make things weird in some other way.
Maybe time will tell.
Tuesday, August 1, 2017
More Tuning Golang Apps for High Concurrency Tasks on Linux
I have a project that is fairly straightforward. Again it's work related, so I have to fuzz some details, and again my memory is naturally fuzzy so I doubt it's an issue.
Background

This program I've been working on makes calls to a service (a REST endpoint) that in turn pulls data from a database; my application then parses that information into components and checks the disk to see if the file already exists. If it doesn't already exist on disk, the program makes a call to the API endpoint again asking for the specific record's information and writes it to the disk. In the end I get a huge set of files sorted in a structure resembling ./files/year/month/day/datatype/subtype/filename.txt.
There are literally millions of records to sort through. A single thread handling this would probably take weeks, so the program uses several (configurable!) goroutines to pull records simultaneously, roughly in the shape sketched below.
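The shape of it is a pretty standard worker pool. A minimal sketch; the names and the string "records" here are made up for illustration, not pulled from the real utility:

package main

import (
    "fmt"
    "sync"
)

func main() {
    const intWorkers = 4 // configurable in the real utility
    chRecords := make(chan string)

    var wg sync.WaitGroup
    for w := 0; w < intWorkers; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for strRecord := range chRecords {
                // In the real utility: check the disk, call the API,
                // and write the file if it doesn't already exist.
                fmt.Println("processing", strRecord)
            }
        }()
    }

    for i := 1; i <= 10; i++ {
        chRecords <- fmt.Sprintf("record-%d", i)
    }
    close(chRecords)
    wg.Wait()
}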
First Problem: Too Many Open Files

I wrote about this fix earlier in the blog, but I'll give a quick recap.
At first everything seemed fine. I had a simple bit of math being output periodically to the console, and it was chugging along at around 20,000 records/minute. The system was functioning fine, and no errors were showing up. All was right with the world.
Then, a few hours later, a few alerts arrived in the email. At this point the utility was running on its own instance with its own storage, making calls to a load balancer with a few endpoint servers behind it. The only recent change that could possibly be preventing new connections was the utility I was running, so I killed it; when the API servers were checked, there were still 18,000+ network connections in TIME_WAIT on each system.
Linux treats sockets, like files on disk, as "open files" because of the way it handles file handles. "Too many open files" can mean literally too many files are open, or too many network connections are open, but it's usually a combination of the two.
Research time. This problem is usually related to "you didn't close the connections." That wasn't the cause here. The calls were straightforward; I had a function that created a transport, created a client, made the connection, performed the GET, then read the data to return to the caller. It was a textbook fragment of Go adapted to my purposes, and it included a defer Close() call so that when the function exited, everything should have been closed properly.
Check the "did you close the connection" off the list. And I also read the data from the socket before closing it, so that can be checked off the list. I had a hacked together bit of logic to retry connections if there was an error, but it also printed that status to the console when that happened. Nothing appeared as the too many open files errors popped up, so even if that caused a socket leak, it wasn't the likely cause.
The issue was the call to instantiate a transport each time the function was called. Transports hold the pool of client connections, so the system should be re-using connections; because the transport was destroyed each time the function returned, each call created a new pool, which meant new sets of client connections to the server instead of recycled ones. That led to thousands of "open files".
The solution was to create the transport once and pass it as a parameter to the function making the GET calls, as sketched below. This let the transport manage the client pool outside the scope of the function call, keeping a managed pool of connections available for re-use.
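In sketch form it looks something like this; the function name and URL are stand-ins, not the real code:

package main

import (
    "io"
    "net/http"
)

// fetch sketches the fixed version: the transport (and its connection pool)
// lives outside the function and is passed in, so connections get re-used.
func fetch(tr *http.Transport, strURL string) ([]byte, error) {
    client := &http.Client{Transport: tr}
    resp, err := client.Get(strURL)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    // Read the body fully so the connection can go back into the pool.
    return io.ReadAll(resp.Body)
}

func main() {
    tr := &http.Transport{} // created once, shared by every call
    for i := 0; i < 5; i++ {
        if _, err := fetch(tr, "http://127.0.0.1:8080/api"); err != nil {
            // The retry logic lived here in the real utility.
            continue
        }
    }
}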
This wouldn't have shown up if I were making periodic, occasional calls to different websites every few minutes. The problem would still be there, but chances are the connections would eventually close and time out before piling up and becoming a problem.
Too Many Files Leads to Terrible Times

There are a few things that obviously affect the speed at which information is processed, and at the risk of sounding immodest, I've been told I'm pretty good at spotting the obvious.
Warning: I'm not a Go expert. I'm citing information here that is just my current understanding, so if I'm wrong, please correct me in the comments.
Because I'm writing files, the drive can definitely affect performance. I have multiple processes that could be trying multiple disk operations in parallel at a given time. To that end, disk seek times, write times, and cache can directly impact the utility's speed.
I'm dealing with millions of files. During the initial testing and design of the utility, I had to deal with a file that would unzip into a directory holding around 100,000 files; then I had to deal with several of those 100K-file directories for processing. If you've never tried that on a Macintosh using the HFS+ filesystem, it's not fun. EXT4 doesn't really handle it well either. Even on an SSD, getting a directory listing is downright painful. Too many files in one directory is difficult for some filesystems to handle.
One solution is to split the directory into more subdirectories, reducing the number of entries the system has to track per directory. This is in fact the solution I used, splitting information into logical subsets along the lines of the sketch below.
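A tiny sketch of what I mean by logical subsets, using made-up field values that mirror the path layout mentioned earlier:

package main

import (
    "fmt"
    "os"
    "path/filepath"
)

func main() {
    // Hypothetical record fields; the real values come out of the parsed API data.
    strYear, strMonth, strDay := "2017", "08", "01"
    strType, strSubtype := "datatype", "subtype"

    // Each record lands in its own small subdirectory instead of one
    // directory holding millions of entries.
    strDir := filepath.Join("files", strYear, strMonth, strDay, strType, strSubtype)
    if err := os.MkdirAll(strDir, 0755); err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println("writing to", filepath.Join(strDir, "filename.txt"))
}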
Another fun fact I learned during this project: by default, the Golang client doesn't set any timeouts. This leads to some fun havoc with stray connections left in weird states. If you're hitting random sites in a semi-random fashion, you'd probably never notice; hammer the same site with hundreds of requests per second, and you can bet there will be ramifications.
I read about this in a blog post warning against using the default settings in http.Client. After reviewing that information, I went back to the source code and added some timeouts, like so:
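(Since the work code gets fuzzed, this is a representative sketch of the kind of settings involved rather than the exact values:)

package main

import (
    "net"
    "net/http"
    "time"
)

// NewClient builds a client with explicit timeouts instead of the zero-value
// defaults. The durations here are illustrative, not the values from the
// actual work code.
func NewClient() *http.Client {
    tr := &http.Transport{
        DialContext: (&net.Dialer{
            Timeout:   5 * time.Second,  // TCP connect timeout
            KeepAlive: 30 * time.Second, // keep-alive probe interval
        }).DialContext,
        TLSHandshakeTimeout:   5 * time.Second,
        ResponseHeaderTimeout: 10 * time.Second,
        ExpectContinueTimeout: 1 * time.Second,
    }
    return &http.Client{
        Transport: tr,
        Timeout:   30 * time.Second, // hard cap on the entire request
    }
}

func main() {
    client := NewClient()
    _ = client // the client then gets used for all the API calls
}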
This is a modification I made to the most intensively used connection set; I didn't move the transport's scope for a far less-used connection in another function, figuring that yes, those would pile up to a degree, but they should properly close and age out. This set, though, would hammer the server with thousands of connections in parallel.
This basically added some sane timeouts to functions that previously had none, and it noticeably reduced my ghost connections.
The initial run finished a few days later, at which point I realized there was a bug in my loop logic. Some bad words were uttered and an updated version was compiled.
At this point we also moved the utility, and the volume to which data was being saved, to the same system that held the API endpoint server. Basically the server being queried for information was now also hosting the client requesting and processing results from the API queries.
This eliminated what had been a kind of natural bottleneck throttling performance: hundreds of simultaneous connections per second hitting the server, but spaced out by network transit time. Sure, that was on the scale of tens of milliseconds (when things were working well), but it really added up.
Now the client was requesting everything from localhost. *Bam*. Within a few moments, the number of open connections (measured with netstat | wc -l, since I only needed a rough estimate) ballooned to 40,000 before this appeared on the console:
dial tcp <ip address redacted>: can't assign requested address
Because dial was in the error, it was most likely the client causing the issue. After some poking around, I ended up making two more changes.
First, I changed the number of idle connections the client keeps open per host. The default is two; beyond that, the client was closing connections instead of returning them to the idle pool for re-use. Again, occasional connections to random hosts aren't so bad, but hammering the same IP highlights the need to alter this (and you probably don't want to change it if you're not making a large number of frequent calls to the same host):
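(Again a sketch; the flag name here is a stand-in for the real one:)

package main

import (
    "flag"
    "net/http"
)

func main() {
    // "maxidle" is a hypothetical flag name for illustration.
    intMaxIdle := flag.Int("maxidle", 400, "Max idle connections kept per host")
    flag.Parse()

    tr := &http.Transport{
        // The default is 2; when hammering a single host, idle connections
        // beyond that get closed instead of re-used.
        MaxIdleConnsPerHost: *intMaxIdle,
    }
    client := &http.Client{Transport: tr}
    _ = client // used for the API calls, as before
}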
The changed setting is MaxIdleConnsPerHost in the transport. Here I set it to a variable that in turn is set from the command line so I could tune it at runtime, but instead of the default 2 I set it closer to 400.
The next change was an alteration on the host server. There is some guidance on a SO question explaining some tuning tweaks, but the gist of the change I made is this...
When a TCP connection is made, the client's end binds to an ephemeral port. With a ton of TCP connections hitting the server, the supply of available ephemeral ports was being starved. The next step was to increase the number of ports available so the system could support more connections per second, hopefully at a level where connections would close and age out properly before overloading anything.
In this case, I changed net.ipv4.ip_local_port_range from "32768 61000" to "9000 64500". As the SO question explains, dividing by the 60-second window in which a closed connection lingers, this changed the connectivity from (61000-32768)/60 = 470 sockets/second to (64500-9000)/60 = 925 sockets/second.
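For reference, the change itself is a one-liner at runtime (adding it to /etc/sysctl.conf makes it survive a reboot):

# check the current range
sysctl net.ipv4.ip_local_port_range
# widen it
sudo sysctl -w net.ipv4.ip_local_port_range="9000 64500"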
There was another change I could make from the page that involved changing the net.ipv4.tcp_fin_timeout setting, along with a couple of others. I avoided that, opting instead to test these changes because the tuning advice was more like "change this on the client" or "change this on the server", not really geared to a situation where the server and client were eating resources on the same host. Making minimal changes to keep it working, for this project, would be fine.
I ran netstat in a loop while the application ran again. This time the open connections quickly climbed to 70,000 before leveling out, and it held steady. After 15 hours of elapsed runtime, only 3 connection errors had shown up. Otherwise it kept up with the load just fine.
I should also mention that I ran 4 parallel processing tasks, one for each core; boosting that number actually seemed to hinder processing speed. Keeping it at 4, the estimated rate was over 100K records/minute, easily sustaining 5 or 6 times the speed seen when the client was on a separate machine.
Background
There are literally millions of records to sort through. A single thread handling this would probably take weeks. Therefore, the program uses several (configurable!) goroutines to pull records simultaneously.
First Problem: Too Many Open Files
At first everything seemed fine. I have a simple bit of math being output periodically to the console, and it was chugging along at around 20,000 records/minute. The system was functioning fine, no errors were showing up. All was right with the world.
Then, a few hours later, alerts arrived in the email. At this point the utility was running on its own instance with its own storage, making calls to a load balancer that held a few endpoint servers behind it. The only recent change that could possibly be preventing the system from making connections was the utility I was running, so I killed it; when the API servers were checked, there were still 18,000+ network connections in TIME_WAIT on each system.
Linux treats both files on disk and network sockets as "open files," because both are backed by file handles. "Too many open files" can mean literally too many files are open, or too many network connections are open, but it's usually a combination of the two.
Research time. This problem is usually related to "you didn't close the connections." That wasn't the cause here. The calls were straightforward; I had a function that created a transport, created a client, made the connection, called GET, then read the data to return to the caller. It was a textbook example fragment of Go adapted to my purposes, and it included a defer Close() call, so when the function exited it should have made really sure everything was closed properly.
Check the "did you close the connection" off the list. And I also read the data from the socket before closing it, so that can be checked off the list. I had a hacked together bit of logic to retry connections if there was an error, but it also printed that status to the console when that happened. Nothing appeared as the too many open files errors popped up, so even if that caused a socket leak, it wasn't the likely cause.
The issue was the call to instantiate a transport each time the function was called. Transports hold the pool of client connections; the system should be re-using those connections. Because the transport was destroyed each time the function returned, each call created a new pool of connections, which meant new sets of client connections to the server instead of recycling previous ones, and that led to thousands of "open files."
The solution was to create the transport and pass it as a parameter to the call that GETs the web endpoint. This allowed the transport to continue managing the client pool outside the scope of the function call, so the system could keep a managed pool of connections for re-use.
This wouldn't have shown up if I were making periodic, occasional calls to different websites every few minutes. The problem would still be there, but chances are the connections would eventually close and time out before piling up and becoming a problem.
Too Many Files Leads to Terrible Times
Warning: I'm not a Go expert. I'm citing information here that is just my current understanding, so if I'm wrong, please correct me in the comments.
Because I'm writing files, the drive can definitely affect performance. I have multiple processes that could be attempting disk operations in parallel at any given time. As a result, disk seek times, write times, and cache can directly impact the utility's speed.
I'm dealing with millions of files. During the initial testing and design of the utility, I had to deal with a file that would unzip into a directory holding around 100,000 files; then I had to deal with several of those 100K-file directories for processing. If you haven't tried that on a Macintosh using the HFS+ filesystem, take my word for it: it's not fun. EXT4 doesn't really handle it well either. Even on an SSD, getting a directory listing is downright painful. Too many files in one directory is difficult for some filesystems to handle.
One solution is to split the directory into more subdirectories, reducing the number of entries the system has to track per directory. This is in fact the solution I used, splitting information into logical subsets.
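As a sketch of the idea (the two-character prefix here is an assumption for illustration; my actual split was along logical subsets of the data, not filename prefixes):

package main

import (
    "fmt"
    "path/filepath"
)

// shardPath buckets a file into a subdirectory based on the first two
// characters of its name, so no single directory accumulates millions
// of entries.
func shardPath(baseDir, fileName string) string {
    prefix := "misc"
    if len(fileName) >= 2 {
        prefix = fileName[:2]
    }
    return filepath.Join(baseDir, prefix, fileName)
}

func main() {
    fmt.Println(shardPath("/data/records", "ab12345.json"))
    // Prints: /data/records/ab/ab12345.json
}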
Timing Out Connections
I read about this in a blog post warning against using the default settings in http.Client. After reviewing that information, I went back to the source code and added some timeouts, like so:
tr := &http.Transport{
    Dial: (&net.Dialer{
        Timeout: 30 * time.Second,
    }).Dial,
    TLSHandshakeTimeout: 30 * time.Second,
}

client := &http.Client{
    Transport: tr,
    Timeout:   time.Second * 10,
}
This is a modification I made to the most intensively used connection set, the one that hammers the server with thousands of connections in parallel. I didn't move the transport's scope for a far less-used connection in another function, figuring that yes, those would pile up to a degree, but they should properly close and age out as closed connections.
This basically added some sane timeouts to functions that previously did not have any. It noticeably helped the ghost connections disappear.
Remove a Hindrance, Create a New One
At this point we also moved the utility, and the volume to which data was being saved, to the same system that held the API endpoint server. Basically the server being queried for information was now also hosting the client requesting and processing results from the API queries.
This eliminated what had been a kind of natural bottleneck throttling performance: hundreds of connections per second were hitting the server simultaneously, but each was separated by network transit time. Sure, that was on the scale of tens of milliseconds (if things were working well), but it really added up.
Now the client was making its requests to localhost. *Bam*. Within a few moments, the number of open connections (counted with netstat | wc -l, since I only needed a rough estimate) ballooned to 40,000 before this appeared on the console:
dial tcp <ip address redacted>: can't assign requested address
Because dial was in the error, it was most likely the client causing the issue. After some poking around, I ended up making two more changes.
First, I changed the number of idle connections the client keeps open per host. The default is two; beyond that, the client closes connections in the idle pool instead of re-using them efficiently. Again, occasional connections to random hosts aren't so bad, but hammering the same IP highlights the need to alter this (and you probably don't want to change it if you're not making a large number of frequent calls to the same host):
tr := &http.Transport{
    Dial: (&net.Dialer{
        Timeout: 30 * time.Second,
    }).Dial,
    TLSHandshakeTimeout: 30 * time.Second,
    MaxIdleConnsPerHost: intIdleConns,
}
The changed setting is MaxIdleConnsPerHost in the transport. Here I set it to a variable that is in turn set from the command line so I could tune it at runtime; instead of the default 2, I set it closer to 400.
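For illustration, wiring that variable to a command-line flag might look like this (intIdleConns and the flag name are my stand-ins here, not necessarily the utility's actual code):

package main

import (
    "flag"
    "fmt"
    "net"
    "net/http"
    "time"
)

func main() {
    // Tunable at runtime, e.g.: ./utility -idleconns=400
    intIdleConns := flag.Int("idleconns", 400, "max idle connections kept per host")
    flag.Parse()

    tr := &http.Transport{
        Dial: (&net.Dialer{
            Timeout: 30 * time.Second,
        }).Dial,
        TLSHandshakeTimeout: 30 * time.Second,
        MaxIdleConnsPerHost: *intIdleConns,
    }
    fmt.Println("idle connections per host:", tr.MaxIdleConnsPerHost)
}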
The next change was an alteration on the host server. There is some guidance in an SO question explaining tuning tweaks, but the gist of the change I made is this...
When a TCP connection is made, the client side binds to an ephemeral local port. With a ton of TCP connections hitting the server from the same host, the supply of ephemeral ports was being starved. The next step was to try increasing the number of ports available so the server could support more connections per second, hopefully at a level where connections would close and age out properly before overloading the system.
In this case, I changed net.ipv4.ip_local_port_range from "32768 61000" to "9000 64500". Following the math in the SO question (the divide-by-60 reflects connections lingering in TIME_WAIT for roughly 60 seconds by default), this raised the sustainable rate from (61000-32768)/60 = 470 sockets/second to (64500-9000)/60 = 925 sockets/second.
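For reference, that kind of change can be applied at runtime with sysctl (and added to /etc/sysctl.conf to survive reboots); the range shown is the one I used, but pick values appropriate for your own host:

sysctl -w net.ipv4.ip_local_port_range="9000 64500"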
There was another change I could have made from that page involving the net.ipv4.tcp_fin_timeout setting, along with a couple of others. I avoided those, opting to test these changes first, because the tuning advice was framed as "change this on the client" or "change this on the server," not really geared to a situation where the server and client were eating resources on the same host. Making minimal changes to keep things working was fine for this project.
I ran netstat in a loop while the application ran again. This time the open connections quickly climbed to 70,000 before leveling out, and it held steady. After 15 hours of elapsed runtime, only 3 connection errors had shown up. Otherwise it kept up with the load just fine.
I should also mention that I ran 4 parallel processing tasks, one for each core. Boosting that number actually hindered processing speed; keeping it at 4, the estimated processing speed was over 100K records/minute, easily sustaining 5 or 6 times the throughput I saw when the client was on a separate machine.
This Was a Minimal Set of Changes
There were a number of lessons learned so far. Beyond the basic novice check that a network client's response is properly read before calling Close(), be aware that the transport is what controls the pool of connections for efficient re-use.
Next, be aware that by default timeouts are missing from the transport and client. Add them.
Also, if you're hitting a particular server or set of servers with lots of requests, change your MaxIdleConnsPerHost. Otherwise you're wasting connection re-use.
Last, an easy way to boost connection rates is to increase the number of ephemeral ports available. There are limits to this...and you don't want to starve other resources by taking away those ports from other clients or servers on the host.
There are plenty of other changes that can be made to increase the horsepower of your servers. Some are in the SO question I linked to; another good blog post discusses how MigratoryData scaled servers to 12 million concurrent connections. I'd only caution that not every task requires that kind of engineering; exercise restraint in changing things when a few tweaks can accomplish decent performance for your use case.
Performance is a sliding scale. Some things can be overcome by throwing lots of hardware at the problem. Sometimes a few tweaks will make your app run 5 or 6 times faster.
Happy tuning!
Thursday, July 20, 2017
Golang: HTTP Client Opens Too Many Sockets ("Too many open files")
This relates to a project that is work related, so I have to fuzz some of the details. On the other hand, some details are naturally fuzzed because I'm working from memory, and my memory is naturally fuzzy...
I'm working on a utility that is, on the surface, simple. It makes a call to an API endpoint using the http.Client, compares some quick results, and if certain conditions are met it makes a series of API calls to save the JSON responses.
The processing/checking process is carried out in a set of goroutines because the comparisons are easy to do in parallel. If the routine needs to pull a JSON reply, it calls a function that is laid out pretty much like the standard examples from the "here's how you get a web page in Go!" sites.
func GetReply(strAPI string, strServer string) string {
    // The URL to request
    strURL := strServer + "/service/" + strAPI

    // Also add timeouts for connections
    tr := &http.Transport{
        Dial: (&net.Dialer{
            Timeout: 5 * time.Second,
        }).Dial,
        TLSHandshakeTimeout: 5 * time.Second,
    }
    client := &http.Client{
        Transport: tr,
        Timeout:   time.Second * 10,
    }

    // Turn it into a request
    req, err := http.NewRequest("GET", strURL, nil)
    if err != nil {
        fmt.Println("\nError forming request: " + err.Error())
        return ""
    }
    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("Accept", "application/json")

    // Get the URL
    res, err := client.Do(req)
    if err != nil {
        fmt.Println("\nError making request: " + err.Error())
        if res != nil {
            res.Body.Close()
        }
        return ""
    }

    // What was the response status from the server?
    var strResult string
    if res.StatusCode != 200 {
        fmt.Println("\nError reading response body, status code: " + res.Status)
        res.Body.Close()
        return ""
    }

    // Read the reply
    body, err := ioutil.ReadAll(res.Body)
    if err != nil {
        fmt.Println("\nError reading response body: " + err.Error())
        res.Body.Close()
        return ""
    }
    res.Body.Close()

    // Cut down on calls to convert this
    strResult = string(body)

    // Done
    return strResult
}
This is actually a modified version of what I pulled from various tutorials and examples, adding more calls to Close() and a check for whether res is nil before calling Close() in an error path.
I also added a timeout to the client because by default it is set to 0; no timeout. As you can probably guess this version was modified while troubleshooting.
After a few hours of the application running, we had alerts come in about failing functions on the production servers. When I opened the logs I discovered a number of "too many open files" errors, and a developer on the call said there were over 18,000 socket connections on each of the balanced servers.
The only difference was my use of this test program, so I killed it. The socket count fell.
Welp...guess we found the cause. But why?
There are a couple of basics for beginners using http.Client requests.
1) Close the response body after reading.
2) Clients are reused.
3) If you defer a call to Close() (as this one originally did, and as most tutorials show), the function will call Close() when it returns. The modified sample I posted simply closes the Body after reading it and checking for errors.
At first I thought it was due to response bodies not being closed; they must be closed for the connections to be re-used. I traced the execution path a dozen ways and added more explicit Close() calls in error checks...but those error branches never printed anything during the run, so errors shouldn't have been spilling sockets.
I added timeouts to the client and dialer. While that didn't hurt and probably made things a little cleaner, it still didn't help the too many open files/sockets error.
Another lead came from a close reading of a Stack Overflow answer. The function was creating a new Transport, tr, with each call. The Transport is what holds the pool of connections that Clients re-use. See where I'm going with this?
Another answer on that page talked about creating a global client for his functions to reuse.
The theme: the scope of these variables matters when dealing with re-use. Because I was hitting the same server repeatedly and the function kept re-instantiating the very mechanism that governs client re-use, the number of new connections and left-open sockets ballooned.
My next move was to have the goroutines in charge of processing the replies from the API endpoints create the Transport instances, then pass the Transport as a parameter when calling the function.
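In rough outline (names and the URL are illustrative stand-ins, not the actual utility's code), the shape of that change looks like this:

package main

import (
    "fmt"
    "io/ioutil"
    "net"
    "net/http"
    "sync"
    "time"
)

// getReply now receives the Transport instead of creating one, so every
// call shares the same underlying connection pool.
func getReply(tr *http.Transport, strURL string) string {
    client := &http.Client{
        Transport: tr,
        Timeout:   10 * time.Second,
    }
    res, err := client.Get(strURL)
    if err != nil {
        fmt.Println("\nError making request: " + err.Error())
        return ""
    }
    defer res.Body.Close()

    body, err := ioutil.ReadAll(res.Body)
    if err != nil {
        fmt.Println("\nError reading response body: " + err.Error())
        return ""
    }
    return string(body)
}

func main() {
    // One Transport, created once; its idle-connection pool is what the
    // goroutines below end up re-using.
    tr := &http.Transport{
        Dial: (&net.Dialer{
            Timeout: 5 * time.Second,
        }).Dial,
        TLSHandshakeTimeout: 5 * time.Second,
    }

    var wg sync.WaitGroup
    for i := 0; i < 4; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            getReply(tr, "http://localhost/service/example")
        }()
    }
    wg.Wait()
}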
I uploaded the program to a remote system instance and re-ran it while watching netstat on the server and the client systems. After initially ballooning to about 4,000 connections, it soon settled down to well under 100 (again using netstat | wc -l).
Takeaways:
1) Modify the default client, and maybe the transport dialer, to add sane timeouts.
2) If you're hitting the same server repeatedly, do your requests within the same scope as your transport instantiation, or create a transport and pass it as a parameter to your functions, so you get the most re-use out of the client pool.
3) Check that you properly close the response body so the client can be re-used. Check in error paths that it can be properly closed without panicking.
What about separating out not just the Transport but also the Client, then passing the Client around as a parameter? I didn't test that, because I wasn't sure how "goroutine-safe" that would be against race conditions, even though one answer on that Stack Overflow question demonstrated using a global Client instance. It's possible that works fine. At this point, passing the Transport worked fine, though.
I'll also note that my usual self-loathing and insecurity isn't getting the better of me this time, because the top answer on the question that inspired this solution was the same advice I found repeatedly on other sites and blogs (and in other SO answers): check that you close your response properly. It's the top answer by a significant margin. The realization that I was pummeling one particular server with multiple instantiations of Client pools, keeping Client re-use to a minimum, was almost an afterthought.
Happy HTTP Client-ing!
Thursday, July 13, 2017
Your Experiences Create Your Methods
Sounds obvious, doesn't it?
But at the same time, I feel like it's one of those things that shapes our worldview to the point where we lose sight of the fact that it's obvious; we end up taking our views for granted and ignoring why we approach problems the way we do.
(Or, perhaps worse, we ignore why other people approach problems the way they do, and in turn react to them in a possibly negative fashion.)
I'm thinking of this because the other day I was working with a coworker on a problem assigned to us by a manager. Without getting into too many details, one step involved a program reading a list from a text file.
The file was tens of thousands of lines; the program expected the format:
12345,string of text,state_name
The file we got was formatted:
12345,"string of text",state_abbreviation
We were coming up with a game plan and reviewing steps when the file came in, and were divvying up the work needed to get the ball rolling.
My very first thought was to write a Go program that read the file contents into a slice, ranged over the slice to replace the comma-quote and quote-comma pairs with plain commas, then split each line by comma and replaced item[2] with the full state name using a map I could copy and paste from a previous program I had worked on. Given the size of the file, I estimated it wouldn't have taken too long.
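Something along these lines is what I was picturing (a sketch only; the state map is truncated to two entries for illustration, where the real one would cover all of them):

package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
)

// Truncated for illustration; the real map covered every state.
var stateNames = map[string]string{
    "NY": "New York",
    "PA": "Pennsylvania",
}

func main() {
    scanner := bufio.NewScanner(os.Stdin)
    for scanner.Scan() {
        // 12345,"string of text",NY  ->  12345,string of text,New York
        line := strings.Replace(scanner.Text(), ",\"", ",", -1)
        line = strings.Replace(line, "\",", ",", -1)
        fields := strings.Split(line, ",")
        if len(fields) == 3 {
            if full, ok := stateNames[fields[2]]; ok {
                fields[2] = full
            }
        }
        fmt.Println(strings.Join(fields, ","))
    }
}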
My coworker volunteered to reformat the file. After he completed it, I asked him how he cleaned the file. His background is ostensibly in sales, although he also does programming in PHP and can create mockups and web utilities for other employees to use in pulling reports and demos, so I expected he ran it through a one- or two-line PHP filter or something similar.
He pulled up Excel, imported the file as a comma-delimited file, then showed me a formula that pulled state abbreviations-to-full-names from another spreadsheet he already had set up.
In a way the approach wasn't too different. We each split the lines into fields, used what amounted to a map to do a value replacement, then exported the results to a new text file. But the execution was very different. His sales experience, and having to deal with formatting reports from our system, meant the first tool he reached for was a spreadsheet (which was a faster and more efficient solution than the one I was going to use for a one-off reformatting job like this).
I've been working heavily on Go-based utilities; manipulating log files, manipulating APIs, making text dance as it was processed through pipelines and sending results through databases and monitoring systems. When I saw this text file I immediately saw strings.Split and map[string]string solutions running through my head.
What other solutions are there? Tons, no doubt. Filtering through a series of awk commands and pipes and redirects...maybe Perl...maybe PHP...I know there are plenty of people who would have used Excel to import it and alter it by hand. While I'd probably argue that the manual method could be considered "wrong," I'm equally sure there are people who could argue that every approach considered (or used) here was "wrong."
In the end it was the (timely) results that mattered.
So next time you see someone with a different approach to doing something, don't be quick to criticize. Think about why that person has that approach. Maybe they do know something that is more efficient. Maybe not. Sometimes it's interesting to learn how someone came to use the methods they use and you'll learn something about what it's like for people who aren't you.
Wednesday, May 31, 2017
Programming a Stargate
I've really loved using the Go language. Part of my exploration and tinkering has involved side projects where I'd pull information from outside sources, usually websites, and parse the response for the information I'm looking for.
I always try to be a good citizen for web scraping; I pull the minimum information I need, close connections once I get the response, insert delays between multiple page views, etc. I always try to put only as much load on a service as a regular user would when web browsing.
"What does that have to do with Stargates?"
I really like Stargate. SG-1, Atlantis, or Universe, it doesn't matter (except the animated series...I pretend that doesn't exist.)
Some people hate it when geeks watch movies and get nitpicky about details. "CAN'T YOU JUST ENJOY THE MOVIE?!"
Not always, no. When I enjoy something, I'm the type of person who enjoys not just the story, but the universe in which it is set; this means learning about the feasibility of that story universe. Oh, sure, there are some rules you have to accept in order for that story to work (such as faster than light travel magic handwaving or using lightsabers and not having them vaporize anything too close to the wielder since, you know, REALLY HOT PLASMA...)
One of the key bits of Stargate involves using the Stargate itself; the Dial-Home Device for Earth's gate was not found with the gate. The gate can, however, be manually "dialed," which is what Stargate Command does...they have a computer controlling massive motors that set each of the chevrons into a locked position, as well as reading diagnostic signals from the gate.
The show handwaves a lot of this process away, but I think it's implied that someone had to program the computer to attempt dialing control and to read (and send) signals to control the gate. It's a black box; they needed to figure out "If I do X, do I get Y?" and, more importantly, "Do I get Y consistently?" (Then maybe figure out what Y means. I mean, you're screwing around with an alien device that connects to other worlds, after all...) I like to think about what it took for that person to approach that black box and coax information out of it in a way that was useful.
Getting information from these websites, designed for human interaction through a web client, is like trying to programmatically poke a stargate. In the process I've discovered that many websites are frustrating and inconsistent (I sometimes wonder, when I just want to get a list of text to parse, how many common websites are accessibility-compliant for people with poor eyesight or users of braille devices.)
For example, I tried looking at a way to query the status of my orders from a frequently used store site. I thought it would be simple...log in and pull the orders page. Nope. If you order too many items, you might have to query another page for more order details. Sometimes order statuses change in unexpected ways. The sort order of your items isn't always consistent, either. And those were the simpler problems I encountered...figuring out consistency in delivery estimates was a whole other adventure.
I tried a similar quick command line checker for a computer parts company. Turned out they had far more order statuses than I thought they did, and alerting me to changes in that order status was an interesting exercise in false alarms when they'd abruptly change from shipped to unknown and back again.
Another mini-utility I worked on was checking validity of town locations. Pray you never have to work with FIPS...
The website I chose seemed fairly consistent in how it formatted information. It turns out I was naive about how various towns are designated, and the website was not internally consistent in showing information in a particular order. I got all sorts of interesting but very weird results for different areas around the country.
I'm sure that if I had a dial-home device (in this case, a clear API to the websites or access to an internal database) these lookups would be more straightforward. As it stands, the closest API I can use is the same as anyone with a mouse and keyboard...parsing the web page.
While they were frustrating at times, I am thankful that these mini-projects taught me a few things:
- Websites, some of which I've routinely used, are not as standardized as I thought, even within a single site. I just hadn't noticed, because when searching for particular information I only pay attention to the items I click to get what I'm after.
- I end up rethinking a lot of parsing logic when digging and sorting through human language.
- Websites implement some seemingly convoluted logic for interacting with clients, and I now have a new appreciation for web browsers.
- I also have a new appreciation for the usefulness of a good API. If I start a business and there's anything that can be exposed through API, I'm making it available through an API.
Saturday, April 29, 2017
Learning By Creating Support Applications
Not long ago I started a job with a company whose primary product is a very custom application composed of many smaller interoperating applications. Without getting into too much detail, the applications communicate through various APIs, many of which are not well documented.
(What follows are thoughts that are not focused solely on the new employer, but rather a set of experiences I've gathered over the years from several jobs and interactions with others in the technology field. In other words, this isn't about the current employer. It's a conglomeration of experiences, and it's my own opinion. Just figured I'd have to clarify that...)
As a company focuses on growth, there comes a time when maintenance and monitoring are moved to staff dedicated to those tasks so the developers no longer have to do triple duty. For the new hire tasked with pioneering that position, gathering statistics to get a feel for the behavior of the systems over time, taking care of regular maintenance, and handling basic troubleshooting is very daunting when there is little (or no) documentation outlining how to get the metrics needed to gauge the health of the system.
And it isn't just a lack of documentation that acts as an obstacle. When a software-based company is first conceived and growing, it's natural for the programmers to focus on getting the product into a usable, testable state. This means overcoming problems as they arise and focusing on results, not laying the framework for delegating future operations.
That fosters institutional knowledge. The more of your system that is developed in-house, the more information future maintainers must glean about it without the help of outside references. Sites like Serverfault can help when you're trying to figure out why a new deployment of Nginx won't work, but they won't be useful when a log contains output from a Java application that Bob, three desks away, wrote while debugging a particular reply from another subsystem's API.
Small companies with a small number of developers may find it inconvenient to be interrupted by the new person's constant questions about why application A is dependent on application B, or how application C discovers a service status on server 3. As a new hire, I feel a little hesitant to approach others with these kinds of questions, preferring to look for answers through other means before taking someone else's time.
(In my opinion, if the answer is to check the source code from the repo and read that to get the answers, you may as well have hired a new programmer; recognizing a need for someone dedicated to operating and maintaining your system outside the coterie of coders is a sign that there may be a need to dedicate time to documenting and tooling the application for non-programmer use.)
How can a new hire get a grasp on this situation?
In this case, I've been writing a series of Nagios plugins specifically configured to pull metrics from the various subsystems of the company application. There were cases where a task I thought was simple turned out to be more nuanced than it first appeared; each time, I ended up discovering something more about the operation of the system, and I made sure it was documented for later reference.
Each time there was a failure case, I made a note and started work on a new monitor so we'd know about it in the future. These monitors didn't just collect a snapshot of the current state of a service; they gathered some metric that was then sent to a database and from there plotted in a graphing application for performance monitoring.
The current product relies on database performance; some queries behave differently from others, where some are straightforward and others require processing of filters. Some of my checks measure response times.
Others are querying API endpoints for replies of what the services believe are their current health states.
Some queries are pulling the status of database indexing.
In cases where the application is exposing information through Java beans, my plugins are pulling numbers from JMX and checking for values within established expectations.
In other cases, plugins are checking for the existence of files that are supposed to be regularly updated and when certain records are updated in the database.
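To give a flavor of the pattern (a hypothetical check with a made-up endpoint and thresholds, not one of the actual plugins), a minimal Nagios-style plugin prints one status line, appends performance data after a pipe for the graphing side to pick up, and exits 0, 1, or 2 for OK, WARNING, or CRITICAL:

package main

import (
    "fmt"
    "net/http"
    "os"
    "time"
)

// Standard Nagios plugin exit codes.
const (
    exitOK       = 0
    exitWarning  = 1
    exitCritical = 2
)

func main() {
    // Hypothetical health endpoint; the real plugins hit in-house services.
    const url = "http://localhost:8080/health"

    client := &http.Client{Timeout: 10 * time.Second}
    start := time.Now()
    res, err := client.Get(url)
    elapsed := time.Since(start)
    if err != nil {
        fmt.Println("CRITICAL - " + err.Error())
        os.Exit(exitCritical)
    }
    res.Body.Close()

    // Performance data after the pipe gets collected for graphing.
    ms := float64(elapsed) / float64(time.Millisecond)
    switch {
    case res.StatusCode != 200:
        fmt.Printf("CRITICAL - status %d|time=%.0fms\n", res.StatusCode, ms)
        os.Exit(exitCritical)
    case ms > 500: // warning threshold chosen purely for illustration
        fmt.Printf("WARNING - slow response|time=%.0fms\n", ms)
        os.Exit(exitWarning)
    default:
        fmt.Printf("OK - healthy in %.0fms|time=%.0fms\n", ms, ms)
        os.Exit(exitOK)
    }
}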
Each of these plugins, once finished and deployed, is documented so that when new people are hired, they can easily find a list of how the checks work and gather indirect information about some aspects of the in-house application's operation without programmer-level institutional knowledge.
In the case of my new position, I've gained a higher respect for the value of meta-applications in gaining insight on how a complicated system works. Having information written out or explained to you is enlightening, and I never feel that documenting how something works is a waste of time. But until you find yourself executing on that knowledge, I'm not sure you really understand the subject. Creating support applications that meaningfully interact with the system pushes knowledge into the realm of wisdom the way reading about the science of flight comes alive after building your first remote control plane.
When confronted with the task of comprehending the colossal, try learning the limited first, with applications that monitor and interact with small aspects of the system. Not only will others benefit from the support applications, but you'll benefit from the mental exercise and end up with a better model of how everything works!