I'm not a professional programmer.
I'm not sure I could even qualify as a junior programmer.
What I have been doing is programming at a level that is above basic scripting, but below creating full applications. I've been churning out command line utilities for system activities (status checking and manipulating my employer's proprietary system, mostly, along with a bevy of Nagios plugins) with the occasional dabbling into more advanced capabilities to slowly stretch what I can accomplish with my utilities.
That said, I've been trying to reflect on my applications after they've been deemed "good enough" to be useful. In a way, I try running a self-post-mortem in hopes of figuring out what I think works well and what can be improved.
I was recently in a position where I had to create a utility and then, months later, got permission to rewrite it. That gave me a unique opportunity: take an application with a fixed set of expectations for its output and refactor its workflow in hopes of improving both its performance and the information it gathered in the process.
For reference, the 10,000 foot view is that I have a large set of data from a large database, and we wanted to dump the contents of that database, using an intermediate service providing REST endpoint API calls, to save each record as a text file capable of being stored and uploaded in another database. A vendor-neutral backup, if you will...all you need is an interpreter that is familiar with the text file format and you could feed the contents back into another service or archive the files offsite.
It seems like this would be a small order. You have a database. You have an API. The utility would get a set of records, then iterate over them and pull records to save to disk.
Only...things are never that simple.
First, there's a lot of records. I realize "a lot" is relative, so I'll just say it's in the 9 digits range. If that's not a lot of records to you, then...good on you. But when you reach that many files, most filesystems will begin to choke, so I think that qualifies as "a lot."
That means I have to break up the files into subdirectories, especially if the utility gets interrupted and needs to restart. Otherwise filesystem lookups would kill performance. Fortunately there's a kind of built-in encoding to the record name that can be translated so I can break it down into a sane system of self-organizing subdirectories.
Great! Straightforward workflow. Get the record names. Iterate to get the record contents. Decode the record name to get a proper subdirectory. Check if it exists. If not, save it.
Oh, there are some records that are a kind of cache...they are referred to for a few days, then drop out of the database. No need to save them.
Not a problem, just add a small step. Get the record names. Iterate to get the record contents. Check whether it's a record we're supposed to archive. If it is, decode the record name to get a proper subdirectory. Check if it exists. If not, save it.
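The loop above can be sketched like so. The `Record` type and its fields are hypothetical, and the real name-decoding step is reduced to a placeholder:

```go
package main

import "fmt"

// Record is a stand-in for what the API returns; the field names here
// are invented for illustration.
type Record struct {
	Name    string
	IsCache bool // cache records drop out of the database and are not archived
}

// processAll walks the steps described above: skip cache records, decode
// the name into a subdirectory, and save only records not already on disk.
func processAll(records []Record, exists func(string) bool, save func(Record, string)) {
	for _, r := range records {
		if r.IsCache {
			continue // no need to archive transient cache records
		}
		dir := r.Name[:2] // placeholder for the real name-decoding step
		if !exists(dir + "/" + r.Name) {
			save(r, dir)
		}
	}
}

func main() {
	saved := 0
	processAll(
		[]Record{{Name: "AB123"}, {Name: "CD456", IsCache: true}},
		func(string) bool { return false },
		func(Record, string) { saved++ },
	)
	fmt.Println(saved) // 1
}
```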
During testing, I discover there are records whose contents cannot be pulled. The database will give me a record name, but when I try to pull the record, nothing comes back. That's odd, but I add a tally of these odd names and insert a check for non-200 responses from the API calls.
Then there are records that I can't readily decode. They're too short and end up missing parts needed for the decoding process. At first I write them off as something to tally as an odd record in the logs, but I discover that when I try pulling them, the API call returns an actual record. I take this to the person with institutional knowledge of the database contents, who, after examining a sample of the records, says it looks like they come from an early time in the company's history.
Basically, there's a set of specs that current records should follow, but there are records from days of yore that are valid but don't follow the current specs.
So there are records that should be backed up but don't follow the workflow, where my functions check record validity through a few tests before going through the steps of making network calls and adding to the load on the servers acting as intermediaries for the transfer. To fix this, I insert a new pathway for processing those "odd" records when they're encountered; they end up being queried and translated and, if they turn out to be full records, saved to an alternative location. The backups are now separated into the set of "spec" records and another "alternative" path.
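The branch amounts to something like this sketch, where the minimum-length rule is a hypothetical stand-in for the real spec checks:

```go
package main

import "fmt"

// route decides which backup path a record name takes. Names that fail the
// current spec checks are not discarded: they are queried anyway and, if a
// full record comes back, saved under an alternative location. The length
// check below is a stand-in for the real validity tests.
func route(name string) string {
	if len(name) >= 8 { // hypothetical spec check
		return "spec/" + name[:2]
	}
	return "alt/" + name // legacy records from the early days of the company
}

func main() {
	fmt.Println(route("AB12CD34")) // spec/AB
	fmt.Println(route("OLD1"))     // alt/OLD1
}
```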
The problem is that this organic change cascades into a number of other parts of the utility. My tally counts for statistics are thrown off. The running list of queued records has to take into account records flowing into this alternative path. And error logging, which also handled some tallying duties since it was an end-of-life point for some records, becomes a problem: some of those "errors" weren't actually errors but notifications that something had happened during processing, helpful for tracing and debugging, yet they would mark certain stats off before the alternative record was processed.
That one organic change in the database contents during the history of the company had implications that totally derailed some of the design of my utility that took into account only the current expected behavior.
In the end, I lost several days of debugging and testing when I introduced fixes that took into account these one-offs and variations. What were my takeaways?
It would be simple to say that I should have spent some days just sketching out workflows and creating a full spec before trying to write the software. The trouble is that I didn't know the full extent of the hidden variations in the database; institutional knowledge isn't readily available for perusal when it resides in other people's heads, and those people are often too busy to come up with a list of gotchas I could watch out for while making this utility.
What I really needed to do was create a workflow that anticipated nothing going quite right, and made it easy to break down the steps for processing in a way that could elegantly handle unexpected changes in that workflow.
After thinking about this some more, I realized that it was just experience applied to actively trying to modularize the application. The new version did have some noticeable improvements. The biggest involved changing how channels and goroutines were used to process records, which cut the number of open network sockets dramatically and thus reduced the load on the load balancers and servers. Another was changing the way the queue of tasks was handled; as far as the program was concerned, it was far simpler to add or subtract worker routines in this version than in the previous iteration.
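The worker-pool shape I mean is roughly the following sketch: one queue channel feeds N goroutines, so scaling workers up or down is a one-number change and the count of simultaneously open network calls is capped at the worker count. The function names are mine, not from the real utility.

```go
package main

import (
	"fmt"
	"sync"
)

// tally fans a list of record names out to a fixed number of workers and
// sums whatever each worker's process function returns.
func tally(names []string, workers int, process func(string) int) int {
	queue := make(chan string)
	results := make(chan int)
	var wg sync.WaitGroup

	// Adding or subtracting workers is just a change to this loop bound.
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for name := range queue {
				results <- process(name)
			}
		}()
	}
	go func() { // feed the queue, then signal no more work
		for _, n := range names {
			queue <- n
		}
		close(queue)
	}()
	go func() { // close results once every worker has drained the queue
		wg.Wait()
		close(results)
	}()

	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	total := tally([]string{"a", "bb", "ccc"}, 2, func(s string) int { return len(s) })
	fmt.Println(total) // 6
}
```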
I'd also learned more about how to break down tasks into functions and disentangle what each did, which simplified tracing and debugging. Granted, there are places where this could still have been improved. But the curveballs introduced as I found exceptions to the expected output from the system, for the most part, just ate time as I reworked the workflow and weren't showstoppers.
I think I could have definitely benefited from creating a spec that broke tasks down and figured out the workflow a bit better, along with considering "what-ifs" when things would go off-spec. But the experience I've been growing in my time making other utilities and mini-applications still imparted improvements. Maybe they're small steps forward, but steps forward are steps forward.
Friday, March 2, 2018
Saturday, January 13, 2018
Regulations and Dieting (and Surgery)
This is a few thoughts that involve something common in the new year; dieting. Well, tangentially diet related.
Part of the issues I've had cascade down in the past few months...thanks, life!...has led to appointments with the rather new bariatric unit at the local hospital. They take a holistic approach, using a team of nutritionists, fitness experts, gastric surgeons, psychologists...the whole nine yards...to create a program with a support system for patients.
Part of the intake process meant reviewing my history. This is where I learned something nifty (beyond the machine that weighs you while zapping you with a current that measures all sorts of information about the different kinds of body fat and tissue density, coming up with a profile of the good and bad stuff in your body).
They asked about my past history and I told them about the gastric bypass procedure I underwent many years ago...I believe it was around 2009. April. Somewhere in there. My memory is fuzzy.
At the time, the local hospital system didn't really have a bariatric unit. While they very much seemed to support the idea that if you're fat, most of your illnesses and afflictions are weight-based and you need to lose weight to deserve to get better, they were not well known for the "let's cut parts of the digestive system apart to help lose weight" approach.
There was another hospital, about an hour away from us, that did have a small bariatric surgery unit. They took me into the program, agreed to do the surgery if I lost X amount of weight first, and after reaching that milestone I had the surgery.
Not long after, during the latter phases of physical recovery, I unceremoniously discovered that not only did my surgeon retire, but the hospital killed their bariatric surgery program. There was no notice. There was no letter, no email, no announcement ever reached us. Just...nothing. No more appointments kept.
I soured on the medical system a little more at that point. There was emphasis on how important a support system was...and there is certainly no shortage of the continued feeling that when a doctor looks at you, your weight is first and foremost on their mind when figuring out how much a person is worth.
One day I had a consult about something at the local hospital and they mentioned the bariatric surgery, and how I could get followup at the other hospital.
"We can't," I said. "They shut down their bariatric unit."
"They restarted it a little while ago," they said.
Turns out, with little (read: no) fanfare or notification, they revived their bariatric unit. I have no doubt the doctors I worked with are gone; my surgeon had retired, and I can't imagine the younger doctors stuck around once their specialty had been shut down.
This came at a time when fat people were becoming (medically) profitable. Oh, sure, we're still a huge expense in cardiac care (and in this time the local hospital became a leader in cardiac care), but now some of those costs are being recouped through insurance companies via growing sleep apnea care, diabetes drugs, and bariatric surgery. What had been justification for treating people as sub-human was becoming a PR race to open the best fat-care centers, a market that previously belonged to hucksters and easy-diet schemers on television ads.
In other words, upon hearing that the other hospital had re-opened their bariatric unit without any announcement to former patients, I figured it was because it was becoming fashionable and probably profitable to do so. I certainly didn't trust them to give a damn, though. They didn't notify their old patients about it. They expressed no damns about my status. So...screw them.
The annoying thing is that the local hospital decided to focus more and more money into developing a local bariatric/weight loss program. As time went on they moved more staff into specializing on weight care. They repurposed a building just for weight loss. They focused resources on their weight loss center.
But when the topic of weight loss came up with my appointments, the moment my surgery history came up it was suggested I drive another half hour to the other hospital and continue care there.
It was during intake that I finally found out why. During the consult they mentioned something about checking the size of the stomach pouch, as it was obvious I could eat more than I was supposed to be able to. My history came up, and she said something about going to the other hospital.
I recounted my history and my distaste for dealing with a hospital that made it so blatant they didn't give a damn about their patients. She said that she could talk to the surgeon in the local hospital's weight clinic, but she knew what he'd say...no, he wouldn't work with me on it. That was when I learned why.
The government made rules.
See, to make hospitals "accountable" (that's a big buzzword for hospitals now, not just schools!) they were getting evaluated based on patient followup. In this example, I was operated on by hospital A. They had a program they wanted to end, and they did...essentially dumping their patients.
I ended up going to hospital B, my preferred hospital for most medical issues, since I only went to A for a procedure B refused to do at the time. But this means that if anything was bariatric-related, B was getting (federally) evaluated for my poor outcome. At some point it seems A was pressured to re-open their bariatric program and make its resources available to old and new patients (although they didn't advertise it...take that as you will).
That was why I was repeatedly "encouraged" to go to another hospital for some weight treatment followups. It's also why I'm not able to access certain resources at a hospital that in the years following my surgery dumped not insignificant resources into developing a "cutting edge" bariatric unit.
Once again the government is interfering in efforts they don't understand. Or at a minimum lots of hands in the pot have created a system that benefits not the patient, but some other interests, with the net effect of screwing the patient.
In the end I still have to go through their weight clinic, just with some options limited. I get to begin the new year miserably tracking calorie counts and using words like "carbs" and "abs" and "veggies," and dealing with the neuroses that I know will flare up while pursuing the accurate tracking of goals.
Will I be successful? Will I find more reason to distrust and/or outrightly dislike the hospital? Or will I fail miserably? Time will tell. But if you'll excuse me, I have to go prepare a big old egg patty with...egg. Lots of protein. Minimal carbs. Low calorie!
I really miss food.
Friday, December 22, 2017
Golang Web Server: Don't Do This
I still consider myself new to programming. The new job allows me to create a lot of small system tools in Go, mostly for augmenting monitoring and for replacing manual API calls made with jq and curl with single executables. It's been a wonderful learning experience.
Sometimes I try to add some new features to utilities that are snazzy but also a bit of an experiment.
This is a bit of reflection on the design I originally used, and I'm not in the mood to pull out layers of source code to show what I had done, especially if no one is asking for it. But I will describe the basic design in an effort not only to avoid implementing it that way again but to warn others away from the same design mistake.
The utility is mainly a long-running process that is interrogating one of our services for database information. It gets raw data from the database, pulls some stats like record size and type, and tallies the information. Millions and millions of records.
What if, I thought, I provided a peek into what the state of the tallying is beyond what I already had showing? It would output a count of some basic information as a one-liner every thirty seconds to the console, but that wasn't good enough. I thought, why not create a web interface that would output a simple text page of information?
Go loves channels. And I had several "worker goroutines" that handled specific tasks in the tally program, passing messages to a coordination process that serialized scheduling record analysis, directing results, and monitoring the state of various workers. Breaking them up made things pretty fast once I stuck in a few tweaks here and there.
Adding a web server routine wasn't hard. Then I thought, I could just add a couple of channels to plug them into routines that held statistics.
Here's where I made what later turned into a mistake.
Instead of individual handlers, I created a single handler that took message strings via channels. The messages consisted of a random ID and a type, where the type was the page request.
The reader on the other side of the channel split the message and used a select{} to determine which page it should construct, then returned the page through another channel with that ID string prepended. The receiver on the other side would look at each message and check whether the ID belonged to its request. If it wasn't the proper ID, it re-fed the message into the channel, hoping that the right recipient would pick it up later and that the next message in the channel was intended for that particular reader. Line by line, the page was fed back down the channel with the ID attached to each message, until the ID arrived attached to a message reading "END OF PAGE", at which point the page was done and the connection closed.
Don't do that.
The thing is, this seemed to work. I opened a web browser, opened the page, and it worked. I could request the different pages and it worked just fine.
It worked until one page got kind of big and I opened two web pages to the server. Something seemed to get "stuck." One of my statuses gave a snapshot of the fill state of some channels and I noticed some of the web-related channels were...throbbing? Growing huge and slipping down, as if revving up with more lines of messages than should possibly be needed. Something was getting misdirected and the lightweight speed of goroutines meant it was flooding channels with useless information.
No problem, I thought. I'll add a third field, a counter, which once it reached a certain level would simply discard the message. The web page was meant to be read by a person who was trying to get some stats on the status of this utility while it was running, not the general public...refresh the page, hopefully you'll get a working reply that time. Sloppy, but might work.
Tested again. It seemed to keep the channels from getting as clogged up, but I still had some kind of crosstalk when pages grew larger, and it wasn't hard to create a kind of denial of service against the web server when two different pages were opened. It almost seemed as if the server got completely confused about which tab was supposed to get which page.
Maybe it was too easy to get messages mixed up because pages were feeding line by line. I went through the page composition and instead of feeding each line through, I had the process create one big string and feed the result.
This cut down on responsiveness but increased reliability. Kind of. It was significant, but not enough to be proud of. If anyone tried pulling a web page from the utility while someone else used it there was a non-zero chance it would get a weirdly formatted page, if not a timeout.
After finishing some work on other utilities, I decided to refactor the 4 web pages into their own handlers with separate functions and move some of the information being read into global structs with mutexes for protection. Before making the change I ran a test with Bombardier, a handy web server throughput tester. The channel-handler architecture totally choked under the test.
I refactored, separated out the page composition into individual handlers, and eliminated channels for web page feeding. No more IDs. No more parsing out replies. No more tracking how many times this particular message is making rounds before "expiring" it.
Bombardier hammered away on the server with no issues. Multiple tabs reading different web pages? No problem. The biggest trigger for problems, clicking back or a link to one of the other pages while a large page hadn't finished rendering, was no longer a problem.
What I wanted to do was find a way to read a URL request and use one handler to interpret what the client wanted, so I didn't need a number of individual handlers defined. I'm pretty sure I still could do that, but I think the weakness was in using channels with an associated ID to parse replies back to the client from a dedicated goroutine holding stats.
The solution I ended up using was individual functions that read from a global struct holding the current state of statistics, and this was protected with a lot of locking.
I suppose another way to do it, with channels, would be finding a way to spawn dedicated channels with each request so the replies didn't need parsing or redirecting; a channel with multiple readers has no guarantee of who is going to get the message at what point. This kind of fix seemed needlessly complicated, though.
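For what it's worth, the "dedicated channel per request" idea is usually done by putting the reply channel inside the request itself, so the stats goroutine answers exactly one caller and no reader can steal another reader's message. A hedged sketch, with invented names:

```go
package main

import "fmt"

// request carries its own reply channel, so the answering goroutine sends
// each page to exactly one recipient — no IDs, no parsing, no re-feeding.
type request struct {
	page  string
	reply chan string
}

// statsServer owns the statistics and serves one request at a time.
func statsServer(requests <-chan request) {
	for req := range requests {
		// Compose the whole page and hand it straight back to the caller.
		req.reply <- "contents of " + req.page
		close(req.reply)
	}
}

func main() {
	requests := make(chan request)
	go statsServer(requests)

	req := request{page: "summary", reply: make(chan string, 1)}
	requests <- req
	fmt.Println(<-req.reply) // contents of summary
	close(requests)
}
```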
I suppose I could also have enhanced the global statistics struct to have functions associated with it, so calls could be made that would automatically lock and reply with information requested by callers. The utility is relatively small, though, and I thought that implementing that would have been more complicated than necessary. I'm not sure if this would enhance the speed of the program, though, and may be worth trying for the learning benefit.
But what I definitely now know is not to pass web pages as composed lines with an ID tagged on, down a shared channel, for a reader to parse and decide, "Is this line meant for me? No? Here, back into the channel you go, floating rubber ducky of information, while I read the next ducky...float away!"
Don't do that.
Sometimes I try to add some new features to utilities that are snazzy but also a bit of an experiment.
This is a bit of reflection on the design I originally used and I am not in a mood to pull out layers of source code to show what I had done, especially if no one is asking for it. But I will describe the basic design in an effort to not only avoid implementing it that way again but to warn others not to make the same design pattern mistake.
The utility is mainly a long-running process that is interrogating one of our services for database information. It gets raw data from the database, pulls some stats like record size and type, and tallies the information. Millions and millions of records.
What if, I thought, I provided a peek into what the state of the tallying is beyond what I already had showing? It would output a count of some basic information as a one-liner every thirty seconds to the console, but that wasn't good enough. I thought, why not create a web interface that would output a simple text page of information?
Go loves channels. And I had several "worker goroutines" that handled specific tasks in the tally program, passing messages to a coordination process that serialized scheduling record analysis, directing results, and monitoring the state of various workers. Breaking them up made things pretty fast once I stuck in a few tweaks here and there.
Adding a web server routine wasn't hard. Then I thought, I could just add a couple of channels to plug them into routines that held statistics.
Here's where I made what later turned into a mistake.
Instead of individual handlers, I created a single handler that took message strings via channels. The messages consisted of a random ID and a type, where the type was the page request.
The reader on the other side of the channel split the message, used a select{} to determine which page it should construct, and returned through another channel the page with that ID string prepended. The receiver on the other side would look for the message and see if the ID belonged to its request. If it wasn't the proper ID, it just re-fed it to the channel, hoping that the right recipient would pick it up later, and the next message in the channel was intended for that particular reader. Line by line the page was fed back down the channel, with the ID attached to each message, until the ID was attached to a message: "END OF PAGE", at which point the page was done and connection closed.
Don't do that.
The thing is, this seemed to work. I opened a web browser, opened the page, and it worked. I could request the different pages and it worked just fine.
It worked until one page got kind of big and I opened two web pages to the server. Something seemed to get "stuck." One of my statuses gave a snapshot of the fill state of some channels and I noticed some of the web-related channels were...throbbing? Growing huge and slipping down, as if revving up with more lines of messages than should possibly be needed. Something was getting misdirected and the lightweight speed of goroutines meant it was flooding channels with useless information.
No problem, I thought. I'll add a third field, a counter, which once it reached a certain level would simply discard the message. The web page was meant to be read by a person who was trying to get some stats on the status of this utility while it was running, not the general public...refresh the page, hopefully you'll get a working reply that time. Sloppy, but might work.
Tested again. It seemed to keep the channels from getting as clogged up, but I still had some kind of crosstalk that when pages grew larger, and it wasn't hard to create some kind of denial of service from the web server when two different pages were opened. It almost seemed as if sometimes the two pages got completely confused which tab was supposed to get what page.
Maybe it was too easy to get messages mixed up because pages were feeding line by line. I went through the page composition and instead of feeding each line through, I had the process create one big string and feed the result.
This cut down on responsiveness but increased reliability. Kind of. The improvement was significant, but not enough to be proud of. If anyone tried pulling a web page from the utility while someone else used it, there was a non-zero chance they would get a weirdly formatted page, if not a timeout.
After finishing some work on other utilities, I decided to refactor the 4 web pages into their own handlers with separate functions and move some of the information being read into global structs with mutexes for protection. Before making the change I ran a test with Bombardier, a handy web server throughput tester. The test totally choked on the channel handler architecture.
I refactored, separated out the page composition into individual handlers, and eliminated channels for web page feeding. No more IDs. No more parsing out replies. No more tracking how many times this particular message is making rounds before "expiring" it.
Bombardier hammered away on the server with no issues. Multiple tabs reading different web pages? No problem. The biggest trigger for problems, clicking back or a link to one of the other pages while a large page hadn't finished rendering, was no longer a problem.
What I wanted to do was find a way to read a URL request and use one handler to interpret what the client wanted, so I didn't need a number of individual handlers defined. I'm pretty sure I still could do that, but I think the weakness was in using channels with an associated ID to parse replies back to the client from a dedicated goroutine holding stats.
The solution I ended up using was individual functions that read from a global struct holding the current state of statistics, and this was protected with a lot of locking.
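Roughly, the shape of that solution looks like the sketch below. The struct, field, and handler names here are my own invention for illustration, since the actual code isn't shown; the point is that writers take the write lock, page handlers take only a read lock, and no messages ever transit a shared channel:

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// runStats is a hypothetical stand-in for the utility's statistics;
// the real struct and its fields aren't shown in this post.
type runStats struct {
	mu        sync.RWMutex
	processed int
	errors    int
}

var current runStats

// record updates the counters under the write lock.
func record(ok bool) {
	current.mu.Lock()
	defer current.mu.Unlock()
	current.processed++
	if !ok {
		current.errors++
	}
}

// snapshot reads the counters under the read lock, so any number of
// handlers can render pages concurrently without crosstalk.
func snapshot() string {
	current.mu.RLock()
	defer current.mu.RUnlock()
	return fmt.Sprintf("processed: %d, errors: %d", current.processed, current.errors)
}

// statusHandler is one of the dedicated per-page handlers; it reads
// the shared state directly instead of round-tripping through channels.
func statusHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, snapshot())
}

func main() {
	http.HandleFunc("/status", statusHandler)
	record(true)
	record(false)
	fmt.Println(snapshot())
	// In the real utility this would block and serve requests:
	// log.Fatal(http.ListenAndServe(":8080", nil))
}
```

With this arrangement a benchmark like Bombardier only contends on the RWMutex, and concurrent readers never block each other.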
I suppose another way to do it, with channels, would be finding a way to spawn dedicated channels with each request so the replies didn't need parsing or redirecting; a channel with multiple readers has no guarantee of who is going to get the message at what point. This kind of fix seemed needlessly complicated, though.
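For what it's worth, that alternative is a known Go idiom: each request carries its own reply channel, so the responding goroutine writes straight back to the asker and nothing needs an ID or re-queuing. A minimal sketch, with illustrative names that aren't from the original utility:

```go
package main

import "fmt"

// request carries its own reply channel, so the stats goroutine answers
// the asker directly: no shared reply stream, no ID parsing, no re-queuing.
// Field names here are illustrative, not from the original utility.
type request struct {
	page  string
	reply chan string
}

// serveStats is the single goroutine that owns the data and answers
// each request on the channel that the request brought with it.
func serveStats(requests <-chan request) {
	for req := range requests {
		// Render the requested page; a real version would compose it
		// from the statistics this goroutine owns.
		req.reply <- "rendered: " + req.page
		close(req.reply)
	}
}

// fetchPage sends a request and waits on its private reply channel.
func fetchPage(requests chan<- request, page string) string {
	req := request{page: page, reply: make(chan string, 1)}
	requests <- req
	return <-req.reply
}

func main() {
	requests := make(chan request)
	go serveStats(requests)
	fmt.Println(fetchPage(requests, "summary"))
	fmt.Println(fetchPage(requests, "errors"))
}
```

Because each reply channel has exactly one reader, the "wrong recipient picks up the message" problem can't occur, though it does mean allocating a channel per request.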
I suppose I could also have enhanced the global statistics struct to have functions associated with it, so calls could be made that would automatically lock and reply with information requested by callers. The utility is relatively small, though, and I thought that implementing that would have been more complicated than necessary. I'm not sure if this would enhance the speed of the program, though, and may be worth trying for the learning benefit.
But what I definitely now know is not to pass web pages as composed lines with an ID tagged down a shared channel for a reader to parse and decide, "Is this line meant for me? No? Here, back into the channel you go, floating rubber ducky of information, while I read the next ducky...float away!"
Don't do that.
Sunday, November 26, 2017
StackOverflow and Newcomers
Stackoverflow (SO) is the premier question and answer site for programmers. It's a joke now that when SO goes down, programmers go home because no work can get done. It is their mission to make life better for programmers, and the men and women working behind the scenes at SO have poured much sweat and tears into growing a useful community for programmers to share solutions to various problems encountered in their algorithm-laden lives.
That is not to say there aren't issues, though. As the site has grown (and it is a bit on the huge side now) SO has had to make decisions that define (and refine) the site's character, and not all of these decisions have passed without detractors. They have also had to try addressing criticism of the site, and one of the most common criticisms seems to be related to how (un)welcoming the site can be for newcomers.
I think I can relate to this. I am not a programmer by trade, but I do try to create useful utilities for use in my day job and enjoy programming in at least a hobbyist capacity. I am not very confident in my abilities, though, and definitely do not need someone to remind me of an obvious skill gap (why do you think I'm asking the question in the first place?)
I do not have the answers regarding how to make SO more welcoming to beginners. Perhaps once a community grows to a certain point it naturally fractures into strata of people who are skilled to a point where they aren't aware of their own bias against less experienced individuals. Or maybe there are rules in the system that encourage what one person interprets to be a "man up, you snowflake!" mentality while an insecure individual interprets the same feedback system to be validation that they don't have what it takes to join their programming peers.
I suppose that when so much of the technology culture centers on a "Brogrammer" mentality rife with competition using knowledge and perceived cleverness as a ranking system, it's natural for some snark to become ingrained in interactions among programmer peers. It's not hard when reading some comments and answers to a SO question to sense a tone of judgement, that the questioner must pass some bar of having earned an answer before they may have one, something beyond the basic search of the site for the same problem before duplicating it.
There have been cases where people will take more time to criticize the questioner than it would have taken to edit or refine the question into something useful and post an answer.
Sometimes it seems you can do everything seemingly right but still fall short in someone's judgement; the ability to down vote a question while leaving no constructive feedback and incurring no penalty in the process (except to the question-asker) seems like a pretty obvious way to discourage interacting with the community for help.
Note that I'm not saying down votes are necessarily bad, although I do wonder if alternative feedback methods could be useful. I'm saying that one of the more frustrating interactions on the site, in my experience, stems from being penalized and not knowing why; if you down vote, maybe you should have to leave some constructive feedback or enhancement to fix the problem or take some penalty to your own Internet-points reputation score.
For example, I recently had trouble with an intermittent panic when exiting a Go utility and posted to StackOverflow for help. I posted a title that succinctly summarized the issue. I posted the panic message. I posted the function definition. The panic had a line number from the definition that seemed to trigger the intermittent error; I posted the specific "line X is..." followed by the line of code so there was no question what snippet triggered the panic. I tagged it with appropriate tags. There were a couple of comments, and I posted a link to another question citing some code to explain (justify?) why I implemented the function call the way I did. What happened?
I took two down votes of penalty to my reputation.
In the comments I asked if the down voters could explain what I could do to improve the question for future reference. After all, SO may be for answering questions related to your immediate problems, but it's also supposed to be of use to future questioners looking to solve similar problems. Last time I checked no one explained why they did it.
The nearest I got to helpful feedback on the down votes came from one of the people who submitted an answer to my question; that person speculated it was because I had not RTFM'd to the satisfaction of some of the other users, since the problematic line was in the panic and the source code for a function call used in my definition shows it probably didn't like a nil context parameter.
So as a relatively insecure beginner, I crafted a question with lots of context, source code, and clarification, only to get dinged with damage (negative reputation) by anonymous clicks from people who couldn't leave a reason why or offer feedback on improving the reference value of the question.
It shouldn't be difficult to understand why this would be discouraging to some people, especially when the goal (I thought) was to build a useful reference for many people, not (possibly) penalize someone for not meeting some arbitrary criteria for having passed a bar of RTFM to be blessed with community membership in order to be assisted without a passive aggressive backhand.
I don't count myself as a detractor of StackOverflow. I have found help from members of their community to be invaluable. I do wonder if some of the feedback mechanisms sometimes encourage certain behaviors that deter less experienced and less thick-skinned programmers from interacting while enabling programmers with the "rock star" or "ninja brogrammer" mindset to set a less friendly tone. There comes a point where it's less commiserating and sharing with a community and more a necessary chore to solve a problem, and I suspect the gray area of that transition is where new users begin complaining about the tone of the site.
Friday, November 3, 2017
Turning 40
I turned 40 this week.
Four decades. I remember there was a time I thought I'd grow up to "die alone as a hermit in the woods." I remember thinking maybe working as a programmer for Microsoft would be interesting. There was a time I thought I might become a marine biologist, specifically an ichthyologist, and study sharks. Later on I even flirted with the notion of working to become a successful author.
Today I'm not working for Microsoft. I don't live in the woods, although the town I reside in is rapidly withering economically and some might argue our tiny dot on the map is not far removed from being woodland. I don't even own diving equipment and am nowhere near the ocean (although we do live on a river that ends in the ocean, if you want to travel a few hundred miles.) The closest I've come to becoming an author was finishing and editing exactly one manuscript.
I'm pretty sure, at this point, that I have depression issues. I know it's more common today for people to talk about depression. For some people it is dismissed as an excuse of the week, or they brush it off as a "feeling blue" thing that you can exercise away or "just cheer up" to move past; "Just cheer up!" they say, totally ignoring that clinical depression is a thing.
While this little shadow has always been lingering to some degree in the back of my mind, some things have really raised that shadow to prominence in the past few years. It would take chapters of a book to cover the details, but the highlight reel would include attempts by my wife's employer to eliminate her from her job using what could (in my view, as this is my opinion) charitably be labeled slanderous accusations. That was a year-long ordeal that took a huge emotional and financial toll on the family.
After that drawn out mess, things finally felt like they were turning around. There was a light at the end of the tunnel! Unfortunately, it was a train's headlamp.
The employer I had come to rely on for emotional and financial support decided to terminate my contract, which is a nice way of saying I was sent home with a box of my belongings. Now it was my turn to plunge into a world of uncertainty, doubt, and the five stages of grief. I was blindsided and even the act of getting out of bed felt like fighting a dark shroud squeezing the life out of me.
Worse yet: if you feel like taking a moral stance and voicing support for teachers in the never ending fight over contracts, even if your family has been working in public education for decades, even if you do this by pointing out actual evidence straight from the faces of the people you feel are in the wrong, you might want to think twice if this takes place in a town that is turning into the economic equivalent of a mummy, one you might have to return to and look for a job in. I made some statements that gained some traction among certain circles here; at the time I felt secure in the idea that my employment was safe in the land of gummy bears and unicorns. The reversal of fortune played right into the hands of depression's self doubt and uncertainty, whispering that "they" were laughing at my incompetence as I searched for job openings in a town propped up by Wal-Mart, McDonald's, a hospital system, and a public education system whose administration and board were not pleased with me for writing something that was popular for a couple days among their staff.
I also experienced firsthand the silence from most of the people I had taken for granted as friends and associates from what I eventually came to regard as my "previous life."
These were two major events. I was already dealing with issues and stresses that many others have to deal with in life. These two major events just fanned the depression flames.
Now we have a national problem; we became a Trumpster fire nation. Every day came a new display of ignorance and people taking pride in how terrible they can be. I don't feel that there's much to act as a counterbalance against the papercuts of negativity he and his followers display.
It's been a long, stressful, painful period of time.
It's also been nearly a year since I started my new job, which gave me some sense of self worth again. Slowly it helped build up some sense of validation that I'm not worthless. I'm not sure if that makes sense or if I'm laying another misplaced sense of power into the hands of something in which I shouldn't emotionally invest. But for now it's there and helping me.
My family has been supportive during this emotional roller coaster, or tried to be. I don't think I quite acknowledge the good they do as much as I focus on negative things that families deal with. That's a side effect of both depression and Aspergian brain wiring, I think. Given the reflection hitting four decades of sentience has triggered, I think I need to continue trying to improve on that behavior.
All of these things have combined into a hazy mire that congealed into a cloud around me, affecting my worldview and keeping me in a perpetual weariness. I thought my birthday, despite being a magic number (I love the number 4, and 10 is a binary number as well as the number of digits on my hands and the number of digits on my feet, and is even, and possesses several other attributes that lend an irrational appreciation in my mind), would be yet another quiet passage marked by some cards and well wishes and soon forgotten. It was even on a Wednesday, my least favorite day for events to occur.
Usually the big booster in looking forward to my birthday is that it is preceded by Halloween. I love the idea of Halloween; the image of trick or treat, costume parties, awesome DIY costumes, parades, and horror movies are so much fun for me. But this year was different; the Friday before my birthday brought an announcement that indictments were coming against Trumpster acquaintances! After an anticipation-filled weekend, Monday had people brought in to testify, and we discovered one of his campaign associates had already pled guilty to lying to police and was cooperating with investigators!
We went out for dinner on my birthday with my in-laws and parents. One of the TV's played MSNBC's coverage of Trump's Russian connections and the mounting investigations. I was giddy.
My birthday was also marked by the Daily Show having an interview with Hillary Clinton. I don't know why that made me happy...I guess because she's the symbol of everything "I told you so" during the Presidential election.
These were things that worked to fight the shroud of depression whispering in my ear, and were totally counter to the idea that my 40th birthday would be quiet. These were things that were happy events for me.
There were other, not so happy events that marked the birthday-time. Unexpected shocks like the guy who rented a truck and ran over bike riders in downtown Manhattan. Because he wasn't white, it was labeled as an act of terrorism, unlike the recent Vegas shooting of around 600 people by a white guy where the fallout is basically several people going bankrupt from medical bills and modifications the shooter made to his guns staying perfectly legal and Congress clutching pearls at the idea that nothing can prevent these things from happening.
Yet another shocking event involved layoffs at a previous employer. I discovered it as oddly worded and vague tweets began floating along my Twitter timeline; today there was a TechCrunch article giving conflicting details of what had happened. In the end I could only confirm that a relatively large number of people were let go, some of whom I knew and had worked with, so it wasn't just trimming the newest of hires. In keeping with the "Me me me!" theme, this news caused me to revisit all the thoughts of despair and hopelessness that I felt as my wife drove me home from the apartment after I was told my time there had ended. I empathized with what must be a swirl of confusion and fear that these people now feel. I also watched as people who escaped the chopping block echoed their support for one another and words of sadness to their departed colleagues. Selfishly I felt like the bandage was ripped off an old wound.
I turned 40 this week.
Nothing I thought was going to happen as a teen happened. Getting older shifted into a pattern where almost every day blended into the next; mostly unremarkable, smeared with a veneer of depression and frustration, life is mostly a comfortable pattern of routine. I expected it to be yet another average day, but this birthday was marked with some surprises. Some good. Some bad. But one thing this birthday wasn't is uneventful.
Wednesday, October 18, 2017
Reflection on Coding
There's a subject I've been thinking about lately. I suppose it's more of a feeling than a topic; I'm not even sure how to put it into words.
I have a vague feeling that I've discussed it before, too. In some form. On the other hand, maybe writing about it will help get it out of my head.
The best I've managed to do to express this feeling is to frame it as "elegant beauty," or a kind of beauty that comes from expression through the logic of programming.
It's not that this is an entirely new concept. I've often read descriptions of Ruby as poetic, and there are other works that try examining questions like whether programming is more art than science, or whether programming is poetry.
Perhaps part of this is my own brain's weird wiring. I sometimes have trouble understanding poetry; good poetry can "work" on so many levels. Clever word use, double entendre, use of linguistic beats to emphasize points, references to other events and works, parallels to other art forms...I'm sure my wife, an English major, is able to expound on (and expand) the topic far more than I.
Programming adds yet another dimension: it is functional. It takes a language, with its own unique grammar and syntax, and processes input into something else. It's an expression of formulas through rules. If you get the syntax wrong, your work won't compile into a finished product. Programming is notoriously unforgiving when straying from the language rules.
And yet programs that take a set of input and produce the same output can still have so much variety!
I suppose a simple example can use the infamous FizzBuzz program. A staple of many a coding interview, it's relatively simple and has, over time, become almost cliché (and in some circles despised, depending on the blogs you read and the type of programmer bemoaning how demeaning it is to be asked to demonstrate it...)
The rules are simple; usually some variant of, "Count from 1 to 100. If a number is divisible by 3, print 'Fizz.' If it is divisible by 5, print 'Buzz.' If it is divisible by both 3 and 5, print 'FizzBuzz.' Otherwise, print the number."
The simplest and most crude way to program this is to literally lay out a program that counts from 1 to 100 and uses if statements to output Fizz, Buzz, and FizzBuzz in the appropriate places. It would achieve the goal of the rules, but be highly inefficient and inflexible.
The next step up might be something like this:
// FizzBuzz
package main

import (
    "fmt"
    "strconv"
)

func main() {
    // Create a loop to count 1 to 100
    for i := 1; i <= 100; i++ {
        // Create a string variable that gets reinitialized each iteration
        var strOutput string
        strOutput = ""
        // Fizz on 3
        if i%3 == 0 {
            strOutput = strOutput + "Fizz"
        }
        // Buzz on 5
        if i%5 == 0 {
            strOutput = strOutput + "Buzz"
        }
        // Otherwise, output the number
        if strOutput == "" {
            strOutput = strconv.Itoa(i)
        }
        // Print the result
        fmt.Println(strOutput)
    }
}
If you know modulo, FizzBuzz is a pretty straightforward logic problem. But what if you didn't know about that piece of math?
// fizzbuzz-simple.go
package main

import (
    "fmt"
    "strconv"
)

func main() {
    for a := 1; a <= 100; a++ {
        var strOutput string = ""
        intTmp := a / 3
        if intTmp*3 == a {
            strOutput = "Fizz"
        }
        intTmp = a / 5
        if intTmp*5 == a {
            strOutput = strOutput + "Buzz"
        }
        if strOutput == "" {
            strOutput = strconv.Itoa(a)
        }
        fmt.Println(strOutput)
    }
}
This is probably a little slower...to be honest, I'm not sure whether the compiler would optimize both approaches into similar machine code. But the end result is still the same.
The first issue I'd have with the basic implementation is that it's not very modular. It might be better to use a function to determine the fizzing and the buzzing.
// fizzbuzz-func.go
package main

import (
    "fmt"
    "strconv"
)

func main() {
    // Create a loop to count 1 to 100
    for i := 1; i <= 100; i++ {
        // Fizz on 3
        strOutput := CheckMod(i, 3, "Fizz")
        // Buzz on 5
        strOutput = strOutput + CheckMod(i, 5, "Buzz")
        // Otherwise, output the number
        if strOutput == "" {
            strOutput = strconv.Itoa(i)
        }
        // Print the result
        fmt.Println(strOutput)
    }
}

func CheckMod(intCount int, intCheck int, strLabel string) string {
    if intCount%intCheck == 0 {
        return strLabel
    } else {
        return ""
    }
}
This version includes a simple CheckMod() function that checks whether a number is evenly divisible by a supplied integer and returns a label if it is; now it takes minimal editing to change the numbers for which Fizz, Buzz, or FizzBuzz are used as output!
And, of course, this still has the same output as the previous versions.
But what if we don't want to keep modifying the source code to alter the Fizz and Buzz triggers? That's simple too.
// fizzbuzz-func-flags.go
package main

import (
    "flag"
    "fmt"
    "strconv"
)

func main() {
    intCountTo := flag.Int("countto", 100, "Count from 1 to this number")
    intFirstNum := flag.Int("firstnum", 3, "First number to label")
    strFirstLabel := flag.String("firstlabel", "Fizz", "First label to substitute")
    intSecondNum := flag.Int("secondnum", 5, "Second number to label")
    strSecondLabel := flag.String("secondlabel", "Buzz", "Second label to substitute")
    flag.Parse()

    // Create a loop to count 1 to x
    for i := 1; i <= *intCountTo; i++ {
        // Fizz on y
        strOutput := CheckMod(i, *intFirstNum, *strFirstLabel)
        // Buzz on z
        strOutput = strOutput + CheckMod(i, *intSecondNum, *strSecondLabel)
        // Otherwise, output the number
        if strOutput == "" {
            strOutput = strconv.Itoa(i)
        }
        // Print the result
        fmt.Println(strOutput)
    }
}

func CheckMod(intCount int, intCheck int, strLabel string) string {
    if intCount%intCheck == 0 {
        return strLabel
    } else {
        return ""
    }
}
Now there are command line flags that designate the Fizz and the Buzz (as well as possible new labels for Fizz and Buzz) and the number to count to!
Because defaults are built into the flag variables, running this with no flags set at the command line will produce output identical to the previous applications.
This version added quite a bit of flexibility to the program, and that flexibility is accessible from the command line by the end user. There is another problem, though; if you intend for an end user to use this application, there should be some sanity checking for the things they can change.
// fizzbuzz-func-flags-errcheck.go
package main

import (
    "flag"
    "fmt"
    "os"
    "strconv"
)

// A struct of flags
type stctFlags struct {
    intCountTo     *int
    intFirstNum    *int
    strFirstLabel  *string
    intSecondNum   *int
    strSecondLabel *string
}

func main() {
    var strctFlags stctFlags
    strctFlags.intCountTo = flag.Int("countto", 100, "Count from 1 to this number")
    strctFlags.intFirstNum = flag.Int("firstnum", 3, "First number to label")
    strctFlags.strFirstLabel = flag.String("firstlabel", "Fizz", "First label to substitute")
    strctFlags.intSecondNum = flag.Int("secondnum", 5, "Second number to label")
    strctFlags.strSecondLabel = flag.String("secondlabel", "Buzz", "Second label to substitute")
    flag.Parse()

    EvalFlags(&strctFlags)

    // Create a loop to count 1 to 100
    for i := 1; i <= *strctFlags.intCountTo; i++ {
        // Fizz on 3
        strOutput := CheckMod(i, *strctFlags.intFirstNum, *strctFlags.strFirstLabel)
        // Buzz on 5
        strOutput = strOutput + CheckMod(i, *strctFlags.intSecondNum, *strctFlags.strSecondLabel)
        // Otherwise, output the number
        if strOutput == "" {
            strOutput = strconv.Itoa(i)
        }
        // Print the result
        fmt.Println(strOutput)
    }
}

func EvalFlags(strctFlags *stctFlags) {
    if *strctFlags.intCountTo <= 0 {
        fmt.Println("-countto must be greater than 0")
        os.Exit(1)
    }
    if *strctFlags.intFirstNum <= 0 {
        fmt.Println("-firstnum must be greater than 0")
        os.Exit(1)
    }
    if *strctFlags.strFirstLabel == "" {
        fmt.Println("-firstlabel must have a text label")
        os.Exit(1)
    }
    if *strctFlags.intSecondNum <= 0 {
        fmt.Println("-secondnum must be greater than 0")
        os.Exit(1)
    }
    if *strctFlags.strSecondLabel == "" {
        fmt.Println("-secondlabel must have a text label")
        os.Exit(1)
    }
    // Done
    return
}

func CheckMod(intCount int, intCheck int, strLabel string) string {
    if intCount%intCheck == 0 {
        return strLabel
    } else {
        return ""
    }
}
Now the application checks that the labels are non-empty strings and that all the numbers are set to something greater than 0. Basic error checking.
And once again...the output, by default, will match the output of the previous programs!
These are all rather straightforward; they don't really take advantage of features specific to Go, like channels. Here is the Go Playground implementation from Russ Cox, reproduced below:
package main

import "fmt"

func main() {
    c := generate()
    c = filter(c, 3, "Fizz")
    c = filter(c, 5, "Buzz")
    for i := 1; i <= 100; i++ {
        if s := <-c; s != "" {
            fmt.Println(s)
        } else {
            fmt.Println(i)
        }
    }
}

func generate() <-chan string {
    c := make(chan string)
    go func() {
        for {
            c <- ""
        }
    }()
    return c
}

func filter(c <-chan string, n int, label string) <-chan string {
    out := make(chan string)
    go func() {
        for {
            for i := 0; i < n-1; i++ {
                out <- <-c
            }
            out <- <-c + label
        }
    }()
    return out
}
I should note that I wrote a past blog post that explored the channels implementation above...
The simple FizzBuzz test in the forms above has the same output, but it's accomplished in many ways. I'm sure there are people who could send in variations that reach the same end result using different algorithmic logic; each logical, each conforming to the strict rules the compiler expects, but each arriving at the same destination through different means.
To understand source code means twisting your brain into understanding how the programmer responsible for it thinks, and how he or she expresses that thinking within the rules of the programming language's grammar and syntax.
The examples above are a peek into the evolution of my own thinking about how to program a task, how my thinking in Go gradually focused on increasing maintainability and flexibility while still accomplishing the goal. I wonder if this is the kind of evolution interviewers look for when hiring for programming jobs...although that's a dangerous thought, considering that the expectations defining the rungs on that ladder of skill could be dangerously arbitrary.
I'm still refining my methods of modeling tasks when programming. I'm changing workflows, how I comment, and what I comment. I still occasionally reel back, perplexed, when seeing some samples of other people's code and have no idea why...or how...they thought the problem through the way they did.
Each sample I write or read is a reflection of the person who wrote it.
Sometimes I wonder what my own reflects about me.
Sunday, September 24, 2017
One Example of How To Tell When Obligations are Wastes of Time
(Disclaimer: everything here is my own opinion. I ordinarily shouldn't have to mention this, but there are times where mentioning certain catalysts for thoughts tends to make those catalysts angry and definitely not do things that resemble acts of retaliation against people who aren't me but are related in some way to me. This isn't even about them. But I feel I have to explicitly state that because sometimes the catalysts may not be very good at reading comprehension.)
The local paper recently had an article announcing that a local school superintendent was awarded the highest rating for his performance. I personally wasn't too surprised given that during negotiations over teacher contracts, questions for his opinion on the matter were something to the effect of, "I serve at the will of the board."
But I wouldn't speak ill of the school board. When criticized, things happen that are definitely not retaliation against people who are related to me in their district. And this isn't about the board. It's about the obligations that make boards...or any regulated body...look like they're doing work when really it's a waste of time and opportunity to rubber stamp their own work (or use it as an excuse to get rid of someone that displeases the regulated body in some way).
The news report had said that there were four performance ranks that could be given: distinguished, proficient, needs improvement, or unsatisfactory. Having a set of scores aggregated in sets of one to four isn't necessarily bad...even Netflix now has a rating system based on a 1 or 2, which of course eliminates all the nuance of "The movie didn't make me want to throw up, but I definitely wouldn't want to watch it again" and instead reduces the viewing experience to "I LOVED THIS FILM" or "This film is so terrible that it will become a niche cult classic in 10 years when the latest group of self-appointed film buffs rediscovers it and nitpicks the flaws into virtues."
What areas were evaluated? "Professionalism, human resource management, district operations and financial management, student growth and achievement, organizational leadership, and communication and community relations."
A key question to ask is, how are these evaluated? These are standards. It's spelled out in school laws established in "1949 Act 14": "...the employment contract for a district superintendent or assistant district superintendent shall include objective performance standards..."
To figure out if this is actually useful or a waste of time when referring to an obligatory standard, you need to ask yourself against what ruler the standards are measured, and ask how the areas being measured are established.
The article said nothing about the scores other than the board members sat down and filled in sheets that were aggregated and found to be wonderful. Some objectives seemed like they'd be easy to measure, such as "student growth and achievement," something that has plenty of semi-effective rules regulating measures of student test results. Other things are blatantly arbitrary. How do you measure professionalism? You get a minus one each time you show up wearing a clown outfit? Or do you get a minus one for not wearing a tie, a minus two for wearing a "fun" tie, and a minus five for dressing as Pennywise?
Not having standards that can be objectively measured is a strong indication that you're dealing with a feel-good waste of time.
What about what or who establishes the items to be measured? This time around the superintendent had to post a list of what was to be measured on the school website. After some digging around, I found the list. Apparently the list is determined by the person being evaluated, then the board okays it (which again is allowed by the school code...it turns out the "standards" for evaluation are "mutually agreed to").
I won't comment on how weird it is that the first half of the letter to the board is a word for word match to another district's older set of "standards" (although it does make me wonder if those items are actually, as it states, "set forth in the Superintendent's Contract are as follows:"...)
Instead I'll point out statements such as, under "School District Operations and Financial Management", that the "Superintendent shall manage effectively, ensuring completion of activities associated with the annual budget, oversee distribution of resources in support of School District priorities, and direct overall operational activities within the School District."
What does that even mean? Manage effectively meaning, this job is completed? And what is the job? Ensuring the completion of activities related to the budget would basically mean you check in on the person or people in charge of actually creating the budget. Oversee resources being allocated to District priorities means what, if not making sure money goes into proper budgets and books go to the right classes? And directing overall operational activities means he's in charge of the district which, oddly enough, is what a superintendent DOES.
This whole paragraph sounds like he's being evaluated on whether he actually does his job. And I also noticed there's no actual gauge by which to measure it. The measure is arbitrary.
There's a section called Organizational Leadership, under which it states, "Superintendent shall work collaboratively with the Board to develop a vision for the School District, display an ability to identify and rectify problems affecting the School District, work collaboratively with School District administration to ensure best practices for instruction, supervision, curriculum development, and management are being utilized, and work to influence the climate and culture of the School District."
What does that mean? The superintendent will work with the board to establish a vision for the district, which under ordinary conditions would make sense, except that he clearly said during contract negotiations that he serves at the will of the board. The translation would therefore imply that either the board is coming up with the vision, or he's going to propose something that the board will vote to pass if they don't want to come up with one.
And "display an ability to identify and rectify problems affecting the School District"? I'd be interested in hearing about a time when a superintendent talked about the problems of his or her district. I don't recall hearing anything like that from the superintendents of our local districts.
The last part is also vague--influence the climate and culture of the district? I'm not sure there is an objective measure for cultural influence. Most "culture and climate" I've heard regarding the school comes from the community, and much of that is influenced by the public and opinions spread by the school board during contract negotiations...and it's rarely positive. The statement itself doesn't even say whether he's going to positively or negatively influence the climate and culture. As the "head" of the district serving at the pleasure of the board, he could achieve this objective just by establishing a baseline expectation so that when an issue is brought to his attention, the staff knows what will probably happen, for better or worse.
And again, this has no objective measure against which to base a standard to score.
That brings me to the next sign you're dealing with a waste of time: the language is flowery, but vague. Stopping to translate the paragraphs into actual meaning shows they don't mean much at all once boiled down.
The last part of the letter is supposed to spell out how he is going to meet his objectives. It has items like, "Increase interventions and remediation's (sic) for students who need it most before, during, and after school", and, "Create a long range and comprehensive strategic plan - WILDCAT 2025".
If you thought I'd call this a waste of time, you'd be wrong. The list reads like a checklist, and having a checklist isn't a bad thing. If your goal is to get these things accomplished in the course of the upcoming year, that's great.
If anything is wrong with it, it's that this is a checklist in the context of a subjective set of standards by which to measure the performance of the person in charge of the district. If you judge a sports player and his or her checklist includes an item to improve the distance he or she throws the ball, that's great. But how much? 10% farther? 10 feet farther? Does he or she get points based on how many feet the throw improves, or on the quality of the throw, combining distance with accuracy?
So why go through the effort of publishing a story about a superintendent being rated insanely great by the board that hired him in the first place and spent the past year "serving at the will of the board?" It's entirely a matter of speculation, and I can't engage in speculation because that could lead to definitely not retaliation. And this evaluation is just one more example of something mandated by the state that probably started with good intentions and mutated into a pathetic waste of time as it bounced around various fingers before becoming part of the law. But it's important to be able to apply critical thinking and differentiate when something reaching the public is worthwhile news and when something is little more than a waste of time.
The local paper recently had an article announcing that a local school superintendent was awarded the highest rating for his performance. I personally wasn't too surprised given that during negotiations over teacher contracts, questions for his opinion on the matter were something to the effect of, "I serve at the will of the board."
But I wouldn't speak ill of the school board. When criticized, things happen that are definitely not retaliation against people who are related to me in their district. And this isn't about the board. It's about the obligations that make boards...or any regulated body...look like they're doing work when really it's a waste of time and opportunity to rubber stamp their own work (or use it as an excuse to get rid of someone that displeases the regulated body in some way).
The news report had said that there were four performance ranks that could be given: distinguished, proficient, needs improvement or unsatisfactory. Having a set of scores aggregated in sets of one to four isn't necessarily bad...even Netflix now has a rating system based on a 1 or 2, which of course eliminates all the nuance of "The movie didn't make me want to throw up, but I definitely wouldn't want to watch it again" and instead reduces the viewing experience to "I LOVED THIS FILM" or "This film is so terrible that it will become a niche cult classic in 10 years when the latest group of self-appointed film buffs rediscovers it and nit picks the flaws into virtues."
What areas were evaluated? "Professionalism, human resource management, district operations and financial management, student growth and achievement, organizational leadership, and communication and community relations."
A key question to ask is, how are these evaluated? These are standards. It's spelled out in school laws established in "1949 Act 14": "...the employment contract for a district superintendent or assistant district superintendent shall include objective performance standards..."
To figure out if this is actually useful or a waste of time when referring to an obligatory standard, you need to ask yourself against what ruler the standards are measured, and ask how the areas being measured are established.
The article said nothing about the scores other than the board members sat down and filled in sheets that were aggregated and found to be wonderful. Some objectives seemed like they'd be easy to measure, such as "student growth and achievement," something that has plenty of semi-effective rules regulating measures of student test results. Other things are blatantly arbitrary. How do you measure professionalism? You get a minus one each time you show up wearing a clown outfit? Or do you get a minus one for not wearing a tie, a minus two for wearing a "fun" tie, and a minus five for dressing as Pennywise?
Not having standards that can be objectively measured is a strong indication that you're dealing with a feel-good waste of time.
What about what or who establishes the items to be measured? This time around the superintendent had to post a list of what was to be measured on the school website. After some digging around, I found the list. Apparently the list is determined by the person being evaluated, then the board okays it (which again is allowed by the school code...it turns out the "standards" for evaluation are "mutually agreed to").
I won't comment on how weird it is that the first half of the letter to the board is a word for word match to another district's older set of "standards" (although it does make me wonder if those items are actually, as it states, "set forth in the Superintendent's Contract are as follows:"...)
Instead I'll point out statements such as, under "School District Operations and Financial Management", that the "Superintendent shall manage effectively, ensuring completion of activities associated with the annual budget, oversee distribution of resources in support of School District priorities, and direct overall operational activities within the School District."
What does that even mean? Manage effectively meaning, this job is completed? And what is the job? Ensuring the completion of activities related to the budget would basically mean you check in on the person or people in charge of actually creating the budget. Oversee resources being allocated to District priorities means what, if not making sure money goes into proper budgets and books go to the right classes? And directing overall operational activities means he's in charge of the district which, oddly enough, is what a superintendent DOES.
This whole paragraph sounds like he's being evaluated on whether he actually does his job. And I also noticed there's no actual gauge by which to measure it. The measure is arbitrary.
There's a section called Organizational Leadership, under which it states, "Superintendent shall work collaboratively with the Board to develop a vision for the School District, display an ability to identify and rectify problems affecting the School District, work collaboratively with School District administration to ensure best practices for instruction, supervision, curriculum development, and management are being utilized, and work to influence the climate and culture of the School District."
What does that mean? The superintendent will work with the board to establish a vision for the district, which under ordinary conditions would make sense, except when he clearly said during contract negotiations that he serves at the will of the board. The translation would therefore imply that either the board is coming up with the vision, or he's going to propose something that the board will vote to pass if they don't want to come up with one.
And "display an ability to identify and rectify problems affecting the School District"? I'd be interested in hearing someone talk about a time when a superintendent talks about the problems of their district. I don't recall hearing something like that from the superintendents of our local districts.
The last part is also vague--influence the climate and culture of the district? I'm not sure there is an objective measure for cultural influence. Most of the "culture and climate" I've heard regarding the school comes from the community, and much of that is shaped by the public and by opinions spread by the school board during contract negotiations...and it's rarely positive. The statement doesn't even say whether he's going to influence the climate and culture positively or negatively. As the "head" of the district serving at the pleasure of the board, he could achieve this objective just by establishing a baseline expectation: when an issue is brought to his attention, the staff knows what will probably happen, for better or worse.
And again, this has no objective measure against which to base a standard to score.
That brings me to the next sign you're dealing with a waste of time: the language is flowery, but vague. Stop to translate the paragraphs into actual meaning, and you find they don't mean much at all once boiled down.
The last part of the letter is supposed to spell out how he is going to meet his objectives. It has items like, "Increase interventions and remediation's (sic) for students who need it most before, during, and after school", and, "Create a long range and comprehensive strategic plan - WILDCAT 2025".
If you thought I'd call this a waste of time, you'd be wrong. The list reads like a checklist, and having a checklist isn't a bad thing. If your goal is to get these things accomplished in the course of the upcoming year, that's great.
If anything is wrong with it, it's that this is a checklist in the context of a subjective set of standards for measuring the performance of the person in charge of the district. If you judge a sports player and his or her checklist includes an item to improve the distance he or she throws the ball, that's great. But by how much? 10% farther? 10 feet farther? Does he or she get points based on how many feet the throw improves, or on the quality of the throw, combining distance with accuracy?
So why go through the effort of publishing a story about a superintendent being rated insanely great by the board that hired him in the first place, a board he spent the past year "serving at the will of"? It's entirely a matter of speculation, and I can't engage in speculation because that could lead to definitely-not-retaliation. This evaluation is just one more example of something mandated by the state that probably started with good intentions and mutated into a pathetic waste of time as it passed through various hands on its way into law. But it's important to be able to apply critical thinking and differentiate between something reaching the public that is worthwhile news and something that is little more than a waste of time.
Monday, September 4, 2017
Formatting Woes
Am I the only one that uses Blogger and keeps discovering formatting issues?
I take time to review my posts. I preview them. I lay out the fonts and paragraphs to include spacing that breaks up the sections for increased readability. I wrap text around graphics and use captions for text specific to that image.
It seems like no matter how much time I put into carefully laying out the format of the page, at some point I view the post as a regular user and...WHY IS THE SPACING GOOFED UP?
Is it Blogger? Is it a side effect of the template used for the layout? Certain fonts used?
I don't know.
I just know that it's incredibly frustrating.
It seems odd that even when I preview a post, adjust spacing, and finally post, the end result is still..."off".
I have several subjects to write about. Periodically noticing screwed up posts led me to write this up first.
As I type this I'm still using Blogger...but I'm tempted to try another platform. Maybe someday I'll shift everything to another site, and if I do, it probably won't feel right, simply because the formatting actually looks sane.
On the other hand, moving to a sane site...different template?...may cause the adjustments I tried using to "fix" errors to actually make things weird in some other way.
Maybe time will tell.
Tuesday, August 1, 2017
More Tuning Golang Apps for High Concurrency Tasks on Linux
I have a project that is fairly straightforward. Again it's work related, so I have to fuzz some details, and again my memory is naturally fuzzy so I doubt it's an issue.
This program I've been working on makes calls to a service (a REST endpoint) that in turn pulls data from a database, then my application parses that information into components and checks the disk to see if the file already exists. If it doesn't exist on disk already, the program makes a call to the API endpoint again asking for specific record information and writes it to the disk. In the end I get a huge set of files sorted in a structure resembling ./files/year/month/day/datatype/subtype/filename.txt.
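The per-record decision described above (build the target path, check the disk, fetch only if the file is missing) can be sketched roughly like this; the function names are my own illustration, not the actual utility's:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// buildPath maps a record's metadata onto the on-disk layout
// ./files/year/month/day/datatype/subtype/filename.txt.
func buildPath(root, year, month, day, datatype, subtype, name string) string {
	return filepath.Join(root, year, month, day, datatype, subtype, name+".txt")
}

// needsFetch reports whether the record still has to be pulled from
// the API: true only when no file exists at the computed path.
func needsFetch(path string) bool {
	_, err := os.Stat(path)
	return os.IsNotExist(err)
}

func main() {
	p := buildPath("files", "2017", "08", "01", "typeA", "sub1", "record42")
	fmt.Println(p, needsFetch(p))
}
```

The existence check is what makes re-runs cheap: records already on disk are skipped without another API call.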
I wrote about this fix earlier in the blog, but I'll give a quick recap.
Background
There are literally millions of records to sort through. A single thread handling this would probably take weeks. Therefore, the program uses several (configurable!) goroutines to pull records simultaneously.
First Problem: Too Many Open Files
At first everything seemed fine. A simple bit of math output periodically to the console showed the utility chugging along at around 20,000 records/minute. No errors were showing up. All was right with the world.
Then, a few hours later, alerts arrived in my email. At this point the utility was running on its own instance with its own storage, making calls to a load balancer with a few endpoint servers behind it. The only recent change that could explain the systems failing to make connections was my utility, so I killed it; even then, when the API servers were checked, each still had 18,000+ network connections sitting in TIME_WAIT.
Linux treats both files on disk and network sockets as "open files", since both consume file handles. "Too many open files" can mean literally too many files are open, or too many network connections are open, but it's usually a combination of the two.
Research time. This problem is usually a case of "you didn't close the connections." That wasn't the cause here. The calls were straightforward: a function created a transport, created a client, made the connection, called GET, and read the data to return to the caller. It was a textbook fragment of Go adapted to my purposes, and it included a defer Close() call, so everything should have been closed properly when the function exited.
So "did you close the connection" could be checked off the list, and since I also read the data from the socket before closing it, that could be checked off too. I did have a hacked-together bit of logic to retry failed connections, but it printed a status line to the console whenever it fired, and nothing appeared as the too-many-open-files errors popped up. Even if that logic could leak a socket, it wasn't the likely cause.
The issue was instantiating a new transport on every call to the function. The transport holds the pool of client connections; it's what lets the system re-use them. Because my transport was thrown away each time the function returned, every call created a brand-new pool, which meant new sets of connections to the server instead of recycled ones, and that led to thousands of "open files".
The solution was to create the transport once and pass it as a parameter to the function making the GET call. That lets the transport manage the client pool outside the scope of the function call, so the system keeps one managed pool of connections for re-use.
This wouldn't have shown up if I were making periodic, occasional calls to different websites every few minutes. The problem would still be there, but chances are the connections would eventually close and time out before piling up and becoming a problem.
Too Many Files Leads to Terrible Times
There are a few obvious things that affect the speed at which information is processed, and at the risk of sounding immodest, I've been told I'm pretty good at spotting the obvious. (Warning: I'm not a Go expert. This is just my current understanding, so if I'm wrong, please correct me in the comments.)
Because I'm writing files, the drive can definitely affect performance. Multiple goroutines may be attempting disk operations in parallel at any given time, so disk seek times, write times, and cache all directly impact the utility's speed.
I'm dealing with millions of files. During the initial testing and design of the utility, I had to deal with a file that would unzip into a directory holding around 100,000 files; then I had to deal with several of those 100K-file containing directories for processing. If you haven't tried that on a Macintosh using the HFS+ filesystem, it's not fun. EXT4 doesn't really handle it well either. Even on an SSD, getting a directory listing is downright painful. Too many files in one directory is difficult for some filesystems to handle.
One solution is to split the directory into more subdirectories, reducing the number of entries the system has to track per directory. This is in fact the solution I used, splitting information into logical subsets.
Timing Out Connections
Another fun fact I learned during this project: by default, the Go HTTP client has no timeouts set at all. That can wreak havoc with stray connections left in weird states. If you're using the client to hit random sites in a semi-random fashion, you'd probably never notice; hammer the same site with hundreds of requests per second, and you can bet there will be ramifications. I read about this in a blog post warning against using the default settings in http.Client, went back to my source code, and added some timeouts, like so:
    tr := &http.Transport{
        Dial: (&net.Dialer{
            Timeout: 30 * time.Second,
        }).Dial,
        TLSHandshakeTimeout: 30 * time.Second,
    }
    client := &http.Client{
        Transport: tr,
        Timeout:   time.Second * 10,
    }
This is the modification I made to the most intensively used connection set, the one that hammers the server with thousands of parallel connections. I didn't move the transport's scope for a far less-used connection in another function, figuring that yes, those connections would pile up to a degree, but they should close properly and age out on their own.
This added some sane timeouts to functions that previously had none, and it noticeably reduced my ghost connections.
Remove a Hindrance, Create a New One
The initial run finished a few days later, at which point I realized there was a bug in my loop logic. Some bad words were uttered and an updated version was compiled. At this point we also moved the utility, and the volume the data was being saved to, onto the same system that held the API endpoint server. In other words, the server being queried for information was now also hosting the client requesting and processing the results of those API queries.
This eliminated what had been a kind of natural bottleneck throttling performance: hundreds of simultaneous connections per second hitting the server, each separated by network transit time. Sure, that latency was on the scale of tens of milliseconds (when things were working well), but it really added up.
Now the client was requesting data from localhost. *Bam*. Within a few moments the number of open connections (measured with netstat | wc -l, since I only needed a rough estimate) ballooned to 40,000 before this appeared on the console:
dial tcp <ip address redacted>: can't assign requested address
Because "dial" appeared in the error, the client was most likely the source of the problem. After some poking around, I ended up making two more changes.
First, I changed the number of idle connections the client keeps open per host. The default is two; beyond that, the client closes connections in the idle pool rather than re-using them. Again, scattered connections to random hosts aren't so bad, but hammering the same IP highlights the need to alter this (and you probably don't want to change it unless you're making a large number of frequent calls to the same host):
    tr := &http.Transport{
        Dial: (&net.Dialer{
            Timeout: 30 * time.Second,
        }).Dial,
        TLSHandshakeTimeout: 30 * time.Second,
        MaxIdleConnsPerHost: intIdleConns,
    }
The changed setting is MaxIdleConnsPerHost in the transport. Here I set it to a variable that is in turn set from the command line, so I could tune it at runtime; instead of the default 2, I set it closer to 400.
The next change was an alteration on the host server. There is some guidance on a SO question explaining some tuning tweaks, but the gist of the change I made is this...
When a TCP connection is made, the client side is bound to an ephemeral port. With a ton of TCP connections hitting the server, the utility was starving the supply of available ephemeral ports. The next step was to increase the number of ports available; then the server could support more connections per second, hopefully at a level where connections close and age out properly before overloading the system.
In this case, I changed net.ipv4.ip_local_port_range from "32768 61000" to "9000 64500". Per the SO question, that takes the sustainable rate from (61000-32768)/60 = 470 sockets/second to (64500-9000)/60 = 925 sockets/second (the division by 60 reflects the roughly 60 seconds a closed connection spends timing out before its port is freed).
There was another change I could have made from that page involving the net.ipv4.tcp_fin_timeout setting, along with a couple of others. I held off, opting to test these changes first, because the tuning advice was framed as "change this on the client" or "change this on the server", not for a situation where the client and server share resources on the same host. For this project, the minimal set of changes that kept things working was fine.
I ran netstat in a loop while the application ran again. This time the open connections quickly climbed to 70,000 before leveling off and holding steady. After 15 hours of runtime, only 3 connection errors had shown up; otherwise it kept up with the load just fine.
I should also mention that I ran 4 parallel processing tasks, one per core. Boosting that number actually seemed to hinder processing speed. Keeping it at 4, the estimated processing speed was over 100K records/minute, easily sustaining 5 or 6 times the throughput achieved when the client was on a separate machine.
This Was a Minimal Set of Changes
There were a number of lessons learned. Beyond the novice-level check that a response is fully read before calling Close(), be aware that the transport is what controls the pool of client connections for efficient re-use.
Next, be aware that by default timeouts are missing from the transport and client. Add them.
Also, if you're hitting a particular server or set of servers with lots of requests, raise MaxIdleConnsPerHost. Otherwise you're closing and re-opening connections instead of re-using them.
Last, an easy way to boost connection rates is to increase the number of ephemeral ports available. There are limits to this, and you don't want to starve other clients or servers on the host by taking those ports away from them.
There are plenty of other changes that can increase the horsepower of your servers. Some are in the SO question I linked to; another good blog post discusses how MigratoryData scaled servers to 12 million concurrent connections. I'd only caution that not every task requires that kind of engineering; exercise restraint when a few tweaks can deliver decent performance for your use case.
Performance is a scale. Some things can be overcome with throwing lots of hardware at it. Sometimes a few tweaks will make your app run 5 or 6 times faster.
Happy tuning!