Friday, December 22, 2017

Golang Web Server: Don't Do This

I still consider myself new to programming. My new job has me writing a lot of small system tools in Go, mostly to augment monitoring and to replace manual API calls made with jq and curl with single executables. It's been a wonderful learning experience.

Sometimes I try to add new features to these utilities, features that are snazzy but also a bit of an experiment.

This is a bit of reflection on the design I originally used. I'm not in the mood to pull out layers of source code to show what I had done, especially since no one is asking for it, but I'll describe the basic design, both to avoid implementing it that way again and to warn others away from the same design-pattern mistake.

The utility is mainly a long-running process that interrogates one of our services for database information. It gets raw data from the database, pulls some stats like record size and type, and tallies the information. Millions and millions of records.

What if, I thought, I provided a peek into the state of the tallying beyond what I already had showing? The utility already printed a one-line count of basic information to the console every thirty seconds, but that wasn't good enough. Why not create a web interface that served a simple text page of information?

Go loves channels. And I had several "worker goroutines" that handled specific tasks in the tally program, passing messages to a coordination goroutine that serialized the scheduling of record analysis, directed results, and monitored the state of the various workers. Breaking the work up made things pretty fast once I stuck in a few tweaks here and there.

Adding a web server routine wasn't hard. Then I thought I could just add a couple of channels to plug it into the goroutines that held the statistics.

Here's where I made what later turned into a mistake.

Instead of individual handlers, I created a single handler that passed message strings over channels. Each message consisted of a random ID and a type, where the type named the page being requested.

The reader on the other side of the channel split the message apart, used a select{} to determine which page it should construct, and returned the page through another channel with that ID string prepended. The receiver on the other side checked each message to see if the ID belonged to its request. If it wasn't the proper ID, it re-fed the message into the channel, hoping the right recipient would pick it up later and that the next message in the channel was the one meant for it. Line by line the page was fed back down the channel, with the ID attached to each message, until the ID arrived attached to a message reading "END OF PAGE", at which point the page was done and the connection closed.
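For illustration, here's a minimal sketch of the shape this took. Every name, buffer size, and bit of page content below is invented; what matters is the pair of shared channels and the ID-tagged string messages:

```go
package main

import (
	"fmt"
	"math/rand"
	"net/http"
	"strings"
)

// Two shared channels carry every request and every reply,
// each encoded as an "ID|payload" string.
var (
	requests = make(chan string, 64)
	replies  = make(chan string, 64)
)

// statsLoop stands in for the goroutine holding the tallies: it builds
// each page line by line, tagging every line with the requester's ID.
func statsLoop() {
	for msg := range requests {
		parts := strings.SplitN(msg, "|", 2)
		id, page := parts[0], parts[1]
		for _, line := range []string{"stats for " + page, "records: 12345"} {
			replies <- id + "|" + line
		}
		replies <- id + "|END OF PAGE"
	}
}

func handler(w http.ResponseWriter, r *http.Request) {
	id := fmt.Sprint(rand.Int63())
	requests <- id + "|" + r.URL.Path

	for msg := range replies {
		parts := strings.SplitN(msg, "|", 2)
		if parts[0] != id {
			replies <- msg // not mine: toss it back and hope its owner reads it
			continue
		}
		if parts[1] == "END OF PAGE" {
			return // page complete; the connection closes
		}
		fmt.Fprintln(w, parts[1])
	}
}

func main() {
	go statsLoop()
	http.HandleFunc("/", handler)
	http.ListenAndServe(":8080", nil)
}
```

With one request in flight this mostly behaves, because the handler is the only reader on the replies channel. The moment two handlers run at once, each can steal and re-feed the other's lines, which is exactly the trouble that showed up next.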

Don't do that.

The thing is, this seemed to work. I opened a web browser, loaded the page, and it worked. I could request the different pages and they came back just fine.

It worked until one page got kind of big and I opened two browser tabs to the server. Something seemed to get "stuck." One of my statuses gave a snapshot of the fill state of some channels, and I noticed some of the web-related channels were...throbbing? Growing huge and slipping down, as if revving up with far more lines of messages than should possibly be needed. Messages were getting misdirected, and the lightweight speed of goroutines meant the channels were flooding with useless traffic.

No problem, I thought. I'll add a third field, a counter; once it reached a certain level, the reader would simply discard the message. The web page was meant to be read by a person trying to get stats on the status of this utility while it was running, not the general public...refresh the page and hopefully you'll get a working reply that time. Sloppy, but it might work.
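The tweak looked something like this, shape hypothetical: messages become "ID|bounces|payload", and the reply loop from the sketch above drops anything that has bounced too many times (this fragment needs fmt, io, strconv, and strings):

```go
// readReply replaces the reply loop in the handler. A reader that picks up
// someone else's message increments its bounce counter and re-feeds it,
// unless the counter has hit a made-up limit, in which case it's dropped.
func readReply(id string, replies chan string, w io.Writer) {
	const maxBounces = 8
	for msg := range replies {
		parts := strings.SplitN(msg, "|", 3)
		if parts[0] != id {
			n, _ := strconv.Atoi(parts[1])
			if n >= maxBounces {
				continue // expired: discard instead of recirculating forever
			}
			replies <- fmt.Sprintf("%s|%d|%s", parts[0], n+1, parts[2])
			continue
		}
		if parts[2] == "END OF PAGE" {
			return
		}
		fmt.Fprintln(w, parts[2])
	}
}
```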

Tested again. It seemed to keep the channels from getting as clogged up, but I still had some kind of crosstalk when pages grew larger, and it wasn't hard to create a denial of service against the web server just by opening two different pages. Sometimes the server seemed completely confused about which tab was supposed to get which page.

Maybe it was too easy to get messages mixed up because pages were being fed line by line. I went through the page composition and, instead of feeding each line through, had the process build one big string and send the result as a single message.
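The change amounted to something like this hypothetical helper, composing the whole page with a strings.Builder so it travels down the channel as one message instead of one send per line:

```go
// pageMessage builds the entire page as one "ID|page" string,
// so the stats goroutine performs a single channel send per request.
func pageMessage(id string, lines []string) string {
	var b strings.Builder
	b.WriteString(id)
	b.WriteByte('|')
	for _, line := range lines {
		b.WriteString(line)
		b.WriteByte('\n')
	}
	return b.String()
}
```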

This cut down on responsiveness but increased reliability. Kind of. The improvement was significant, but not enough to be proud of. If anyone pulled a web page from the utility while someone else was using it, there was a non-zero chance of getting a weirdly formatted page, if not a timeout.

After finishing some work on other utilities, I decided to refactor the four web pages into their own handlers with separate functions and move the information being read into global structs with mutexes for protection. Before making the change I ran a test with Bombardier, a handy web server throughput tester. The channel-handler architecture totally choked on the test.

I refactored, separated the page composition into individual handlers, and eliminated channels for web page feeding. No more IDs. No more parsing out replies. No more tracking how many rounds a particular message had made before "expiring" it.

Bombardier hammered away at the server with no issues. Multiple tabs reading different web pages? No problem. The biggest trigger, clicking back or following a link to another page while a large page hadn't finished rendering, no longer caused any trouble.

What I had wanted was a way to read a URL request and use one handler to interpret what the client wanted, so I didn't need a number of individual handlers defined. I'm pretty sure I still could do that; the weakness wasn't the single handler so much as using channels with an associated ID to route replies back to the client from a dedicated goroutine holding the stats.
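A single dispatching handler is ordinary net/http; switching on the path works fine. The page names and writer functions here are invented:

```go
package main

import (
	"fmt"
	"net/http"
)

// Hypothetical page writers; each would read the shared stats under a lock.
func writeSummary(w http.ResponseWriter) { fmt.Fprintln(w, "summary page") }
func writeWorkers(w http.ResponseWriter) { fmt.Fprintln(w, "workers page") }

func main() {
	// One registered handler inspects the URL and picks the page itself.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		switch r.URL.Path {
		case "/summary":
			writeSummary(w)
		case "/workers":
			writeWorkers(w)
		default:
			http.NotFound(w, r)
		}
	})
	http.ListenAndServe(":8080", nil)
}
```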

The solution I ended up using was individual functions that read from a global struct holding the current state of statistics, and this was protected with a lot of locking.
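Roughly, it came out like this sketch. The field names are invented, but the pattern is the usual one: workers update the struct under the write lock, and each page handler copies what it needs under a read lock before formatting:

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// tally is a hypothetical stand-in for the real statistics state.
type tally struct {
	mu      sync.RWMutex
	records int64
	bytes   int64
}

var current tally

// record is what the worker goroutines call as records are analyzed.
func record(n, size int64) {
	current.mu.Lock()
	current.records += n
	current.bytes += size
	current.mu.Unlock()
}

// Each page gets its own plain handler reading a snapshot of the struct.
func summaryHandler(w http.ResponseWriter, r *http.Request) {
	current.mu.RLock()
	records, bytes := current.records, current.bytes
	current.mu.RUnlock()
	fmt.Fprintf(w, "records: %d\nbytes: %d\n", records, bytes)
}

func main() {
	http.HandleFunc("/summary", summaryHandler)
	http.ListenAndServe(":8080", nil)
}
```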

I suppose another way to do it, with channels, would be to spawn a dedicated channel with each request so the replies didn't need parsing or redirecting; a channel with multiple readers has no guarantee of who is going to get a message at what point. That kind of fix seemed needlessly complicated, though.
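For what it's worth, that version would look something like this hypothetical sketch: each request carries its own reply channel, so nothing needs an ID and nothing ever gets re-fed:

```go
package main

import (
	"fmt"
	"net/http"
)

// pageReq carries a dedicated reply channel, so the stats goroutine
// answers exactly one requester and no one else can read the reply.
type pageReq struct {
	page  string
	reply chan string
}

var pageReqs = make(chan pageReq)

// statsLoop stands in for the goroutine holding the tallies.
func statsLoop() {
	for req := range pageReqs {
		req.reply <- "stats for " + req.page
		close(req.reply) // closing replaces the "END OF PAGE" sentinel
	}
}

func handler(w http.ResponseWriter, r *http.Request) {
	req := pageReq{page: r.URL.Path, reply: make(chan string, 1)}
	pageReqs <- req
	for line := range req.reply {
		fmt.Fprintln(w, line)
	}
}

func main() {
	go statsLoop()
	http.HandleFunc("/", handler)
	http.ListenAndServe(":8080", nil)
}
```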

I suppose I could also have given the global statistics struct its own methods, so calls would automatically lock and reply with the information requested. The utility is relatively small, though, and implementing that seemed more complicated than necessary. I'm not sure it would speed the program up, but it may be worth trying for the learning benefit.
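A sketch of what that might look like, with invented names; this fragment needs only sync:

```go
// Stats hides its mutex behind methods, so callers can't forget to lock.
type Stats struct {
	mu      sync.RWMutex
	records int64
}

// Add is what workers call; it takes the write lock internally.
func (s *Stats) Add(n int64) {
	s.mu.Lock()
	s.records += n
	s.mu.Unlock()
}

// Records is what handlers call; it takes the read lock internally.
func (s *Stats) Records() int64 {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.records
}
```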

But what I definitely now know is not to pass web pages as composed lines with an ID tagged on down a shared channel for a reader to parse and decide, "Is this line meant for me? No? Here, back into the channel you go, floating rubber ducky of information, while I read the next ducky...float away!"

Don't do that.