Saturday, January 30, 2016

Golang, FizzBuzz, and Channels (Analyzing an Interesting Implementation)

One: The Setting

I recently ran across an interesting implementation, written by Russ Cox, of FizzBuzz. He said:

Every time I see FizzBuzz mentioned somewhere on the internet I think two things.

1. That's a dumb problem.
2. There's actually a nice Go solution.

I've seen lots of people make the first observation but have never seen anyone make the second. It's basically a tiny prime sieve.

That sounded interesting, so I took a look at it. The code was short and at a glance looked simple, but once I tried tracing what was happening I was completely lost. But this should be simple. I must have been missing something obvious. So I thought I'd try to unravel how it works.

Two: Let's Review, What is FizzBuzz?

FizzBuzz is a relatively simple programming exercise. It comes from a game meant to help kids learn division; it's played by counting from one to some number, and when a number is divisible by three, the player says "Fizz" instead of the number. If the number is divisible by five, the player says "Buzz," and if the number is divisible by both three and five, the player says "FizzBuzz."

Fizz: What's the Big Deal?

I said that this is a relatively simple programming exercise, but I'm surprised to read that there are a large number of professional programmers who apparently can't implement FizzBuzz. Jeff Atwood even wrote a whole blog post about it, and his post links to several other posts about interview candidates who couldn't perform simple programming tasks. I still have trouble believing that, given that, in my experience, FizzBuzz is nearly as "standard" a programming exercise as "Hello, World!"

You can, with very little effort, find implementations of FizzBuzz in...I'd not say every language, but just about every language in practical use today. Even if someone who claims to be an experienced programmer hasn't heard of FizzBuzz by name, they should be able to implement some version of it if they know basic math, looping, and string manipulation.

Four: How Do You FizzBuzz?

Problem statement: count from 1 to 100 and if the number is divisible by 3, output "Fizz". If the number is divisible by 5, output "Buzz". If it is divisible by 3 and 5, output "FizzBuzz". Otherwise, output just the number.

A simple implementation in Go looks something like this:

// FizzBuzz
package main

import "fmt"
import "strconv"

func main() {

 // Create a loop to count 1 to 100
 for i := 1; i <= 100; i++ {

  // Create a string variable that gets reinitialized each iteration
  var strOutput string
  strOutput = ""

  // Fizz on 3
  if i%3 == 0 {
   strOutput = strOutput + "Fizz"
  }
  // Buzz on 5
  if i%5 == 0 {
   strOutput = strOutput + "Buzz"
  }
  // Otherwise, output the number
  if strOutput == "" {
   strOutput = strconv.Itoa(i)
  }
  // Print the result
  fmt.Println(strOutput)
 }

}

Of course there are other variations that can achieve the same output, such as using a switch/case block. I did it this way because my brain likes if statements and there were relatively few things to compare, and appending to a string is a simple way to avoid checking for divisible by 3, divisible by 5, and divisible by both 3 and 5 as separate cases.

Most solutions will have this basic pattern. A for loop to iterate 1 through 100. A string to hold what will be printed to the console. And some modulo math, using the % operator or, in some languages, the "mod" keyword, to divide the current counter value by 3 and 5; if the remainder is 0, the value is evenly divisible by 3 or 5. Because that's what mod is. The remainder. In case you forgot.

Buzz: What Are Channels?


For me, the purpose of digging into this novel FizzBuzz implementation was to better understand how channels work, so I won't really get into the nitty-gritty details and instead give an overview of what a channel is for.

The Go language has concurrency built into its DNA. Without much effort from the developer, Go will try to take advantage of multiple logical processors; the keyword "go" sends a function off to do its thing as a goroutine, scheduled independently of the other goroutines.

Sometimes you send a process off to do something that doesn't affect other running processes. When that isn't the case, you need a way to synchronize data or have the processes communicate with each other. (Keep in mind that "process" has a specific technical definition that may not fit the specifics of Go's implementation; pedants will point out that threads, processes, tasks, etc. vary in definition depending on the operating system, memory sharing, and how and by what mechanism they are scheduled. I'm being generic when I say processes are off doing something. You have a function working on something that may happen asynchronously from your main() function and result in weird stuff happening if you assume order of execution timing...I'll leave it at that.)

In many languages you would have to set up some kind of mutex, or pipe, or socket to get independent processes to talk to each other and pass data or signal each other. Go uses channels. When you see a "<-" in Go source code, that's a channel passing data. That's really the simplest definition...it's a mechanism for passing data, whether it's ints, strings, or booleans, among different running goroutines. Google can provide plenty of samples of channel use in Go. The problem is that it's a simple concept, and simple examples seem easy enough to grasp, but when I looked at the Russ Cox example, it left me scratching my head about what exactly was happening.

The only other thing I can offer about channels is that the arrow always points to the left.
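
To make that concrete, here's a minimal sketch (my own toy example, not from the Russ Cox code) of one goroutine handing a string to main() over a channel:

package main

import "fmt"

func main() {
    // Make an unbuffered channel that carries strings.
    messages := make(chan string)

    // Spawn a goroutine that sends one value into the channel.
    go func() {
        messages <- "ping" // the arrow points left: the value goes into the channel
    }()

    // Receiving blocks until the goroutine has sent something.
    msg := <-messages // the arrow points left: the value comes out of the channel
    fmt.Println(msg)  // prints "ping"
}

The receive in main() blocks until the sender is ready (and vice versa), which is worth keeping in mind for what follows: unbuffered channels force the goroutines on each end to move in lockstep.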

Fizz: What is the Novel FizzBuzz Implementation?

Here it is (from the Russ Cox link to Go playground code):

 1  package main
 2
 3  import "fmt"
 4
 5  func main() {
 6      c := generate()
 7      c = filter(c, 3, "Fizz")
 8      c = filter(c, 5, "Buzz")
 9      for i := 1; i <= 100; i++ {
10          if s := <-c; s != "" {
11              fmt.Println(s)
12          } else {
13              fmt.Println(i)
14          }
15      }
16  }
17
18  func generate() <-chan string {
19      c := make(chan string)
20      go func() {
21          for {
22              c <- ""
23          }
24      }()
25      return c
26  }
27
28  func filter(c <-chan string, n int, label string) <-chan string {
29      out := make(chan string)
30      go func() {
31          for {
32              for i := 0; i < n-1; i++ {
33                  out <- <-c
34              }
35              out <- <-c + label
36          }
37      }()
38      return out
39  }

Short. Kind of...simple? Maybe?

Let's see if I can trace what's happening; lines 1 and 3 are standard stuff...this is the main program (package main) and it's importing the fmt library. Line 5 is defining function main().

Line 6 runs generate() and the return value is placed into "c".

Generate() is on line 18; takes no arguments, and returns a channel that passes string values.

Line 19 makes a string channel and assigns it to a locally scoped "c." It then spawns a goroutine on line 20; that goroutine is an anonymous func() containing an infinite loop that just sends an empty string (line 22) down the channel. Then generate returns the channel (line 25.)

Back to line 7. Here channel c is given the return value from filter(c, 3, "Fizz"). Filter is defined on line 28; it takes a string-type channel, integer, and a string for arguments and returns another string channel.

Line 29 creates a string channel called out. Then lines 30 through 37 define another anonymous goroutine, which runs an infinite loop (lines 31 through 36) containing an inner for loop that does some counting and channel passing (lines 32 through 34.)

The initial call to filter() passes the channel (c) created by generate(), which is pumping a stream of "" strings because of the infinite loop on lines 21 through 23, along with the integer 3. So the for loop in lines 32 through 34 would start at 0, then ... I wasn't entirely sure at first what "<- <-" does. A single arrow passes channel data from whatever is on the right to the left side, so "out <- <-c" receives a value from c and immediately sends it into out. The first iteration (i = 0) pulls "" from c, sends it into out, and increments i to 1. The second iteration does the same thing, and then the loop ends when i reaches 2 (because 2 < (3 - 1) is false). That passes two values through the loop untouched; then line 35 pulls one more value from c, tacks on the string label (which was, in that call on line 7, "Fizz"), and sends the result into out.

Only it's not passing the loop counter values. Those are just the iterations of the loop; the channel (<-c, not c, technically) is handling a string value, not an integer. There are no conversions here. And the initial channel is passing "" through it. So each time through that loop, for i = 0 and i = 1, it's just passing "" from c to out.

In other words, for that "filter" call, the recipient will get "", "", "Fizz", and then the pattern repeats.

That's the goroutine on lines 30 through 37; once it's spawned, filter() returns the channel "out" to the caller (c on line 7.)

Then it returns to line 8, which is another filter call assigned to channel c, this time with the value 5 and the label "Buzz". This filter's inner loop runs for i = 0, 1, 2, and 3 (the check fails once i reaches 4, because 4 < (5 - 1) is false), passing four values through untouched before tacking "Buzz" onto the fifth. And since its input is the output of the Fizz filter, a value that already has "Fizz" in it can pick up "Buzz" as well, which is where "FizzBuzz" comes from.

Then execution would return to line 9, which is a for loop encompassing lines 9 through 15. It creates a counter i, assigns it the value of 1, and loops while i is less than or equal to 100, incrementing by one each time.

Line 10 creates variable s and assigns it whatever is pulled next from channel c (so it would be a string, as the channels are all string typed.) If the string is not empty (""), line 11 prints the string sent by the channel. Otherwise, lines 12 and 13 say to print the value of i.

(Quick note - yes, fmt.Println will convert the integer to a string. The argument it takes is an interface, not a string or int or other specific type. But if line 13 were something like fmt.Println("The count is " + i), that would fail because of the mixing of types.)
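
To make the type-mixing point concrete, here's a quick sketch of my own (not part of the original program) showing what does and doesn't compile:

package main

import (
    "fmt"
    "strconv"
)

func main() {
    i := 42

    fmt.Println(i)                 // fine: Println takes any values and formats the int for you
    fmt.Println("The count is", i) // fine: pass the string and the int as separate arguments
    fmt.Println("The count is " + strconv.Itoa(i)) // fine: convert the int to a string first

    // fmt.Println("The count is " + i) // compile error: mismatched types string and int
}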

So somehow the channels are acting as a kind of...chain...? Can you chain channels together?
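
It turns out you can. Here's a toy sketch of the same pattern (my own example, not from the post): each stage takes a channel in and hands a new channel back, so reassigning the same variable chains the stages together.

package main

import "fmt"

// numbers returns a channel that yields 1, 2, 3, ...
func numbers() <-chan int {
    out := make(chan int)
    go func() {
        for i := 1; ; i++ {
            out <- i
        }
    }()
    return out
}

// double reads from one channel and writes doubled values to a new one.
func double(in <-chan int) <-chan int {
    out := make(chan int)
    go func() {
        for {
            out <- 2 * <-in
        }
    }()
    return out
}

func main() {
    c := numbers()
    c = double(c) // same shape as c = filter(c, 3, "Fizz")
    c = double(c)
    for i := 0; i < 5; i++ {
        fmt.Println(<-c) // prints 4, 8, 12, 16, 20
    }
}

Each call to double() wraps the previous channel, so pulling one value out of the last channel drags one value through every stage of the chain.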

Seven: Can I Trace It?

Go includes some tools that let you enable profiling, which is really kind of neat. When enabled, the pprof tools sample your application periodically and save the output to a special file that, when interpreted by the profiler, can create bubble graphs showing what percent of time and memory are being used by which functions. While helpful in tracking performance issues, this doesn't really tell me the execution path of the application.

I next tried to create a call graph using an included Golang tool. That was a bit of a mistake...it didn't just trace the functions of my immediate source code but of all the libraries the application touched, resulting in what seemed like thousands of bubbles of data points graphed out in a dense spiderweb of interconnections. It...wasn't very useful.

I then tried a tool called godebug. It's really kind of awesome; once downloaded, I just added a breakpoint in my source by inserting the line

_ = "breakpoint"

...into my source code and then running

godebug run ./fizzbuzz-channels.go

...and it would enter a step-through of the source at the breakpoint. The problem was that no matter where I put a breakpoint, execution would end up stepping through lines 6 through 15. If I placed a breakpoint in just the generate or filter function, it would stop and step through that function once, but then keep looping through the for loop in main() and never revisit the other functions, even though they were evidently doing something.

I wasn't having much luck finding an automatic way to trace the execution path of the program.

Eight: Tinker Time?

I had a hypothesis of how to model the application's execution in my head. There were four main goroutines: main(), the function from generate() pumping "" strings through the channel each time something reads from it, and two filter() goroutines, one after another, either passing the string along unchanged or tacking on a Fizz or Buzz before passing on the message.

One way to test that would be to add a little extra logic to the code.


package main

import "fmt"

func main() {
 c := generate()
 c = filter(c, 3, "Fizz")
 c = filter(c, 5, "Buzz")
 for i := 1; i <= 100; i++ {
  if s := <-c; s != "" {
   fmt.Println(s)
  } else {
   fmt.Println(i)
  }
 }
}

func generate() <-chan string {
 c := make(chan string)
 go func() {
  for {
   c <- ""
  }
 }()
 return c
}

func filter(c <-chan string, n int, label string) <-chan string {
 out := make(chan string)
 go func() {
  for {
   for i := 0; i < n-1; i++ {
    if n == 3 {
     out <- <-c + "3check"
    }
    if n == 5 {
     out <- <-c + "5check"
    }
   }
   out <- <-c + label
  }
 }()
 return out
}

The filter function now adds a check for the value of the integer passed to it and tacks on a string of 3check or 5check, depending on which filter() is run. When I first ran FizzBuzz, the initial output looked like this:
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz

The modified version looked like this:
3check5check
3check5check
Fizz5check
3check5check
3checkBuzz
Fizz5check
3check5check
3check5check
Fizz5check
3checkBuzz

This means that in each iteration, both of those "filters" run, back to back, every time. Generate() sends a "", the first filter handles the count of 3, the second filter handles the count of 5, and together they assemble a "" + filter()1 + filter()2 string that slides down the channels to be evaluated by the 1 to 100 for loop in main().

Fizz: Why Aren't Fizz and Buzz, or 3check and 5check, Interleaved at Random?

Goroutines are running independent of each other. Without some sort of queuing or syncing mechanism, output from the goroutines would occur whenever a task is completed. It's very possible to have an application where the output is not the same each run.

But in my test, the output is pretty consistent; 3check always precedes 5check (not to mention the output for the 3- and 5-factor numbers is in the right spot.) That means the two goroutines are chained together, not totally separate, so they're always running in the same order. I suspect that comes from the order of the three assignments to "c" in main(). But how to test that?

Let's reverse the order of the assignment. That should change the output to 5check3check each time. Here's the new main():


func main() {
 c := generate()
 c = filter(c, 5, "Buzz")
 c = filter(c, 3, "Fizz")
// c = filter(c, 5, "Buzz")
 for i := 1; i <= 100; i++ {
  if s := <-c; s != "" {
   fmt.Println(s)
  } else {
   fmt.Println(i)
  }
 }
}


The rest of the application is unchanged, and the only thing altered in main() was copying the Buzz/5 assignment to c so it precedes the Fizz/3 check (I commented out the existing line out of habit, for easier reversion.) If my theory is correct that the three assignments to c act as a way to create an ordered chain of channels, 3check5check should become 5check3check. Here's the output:
5check3check
5check3check
5checkFizz
5check3check
Buzz3check
5checkFizz
5check3check
5check3check
5checkFizz
Buzz3check

Bingo!

Buzz: But How Does It Know i Each Time?

I feel like this is a "newbie" issue in trying to trace out what the program is doing. I was staring at the source code, thinking I have main() with a for loop counting i and I have two filter() goroutines that have loops incrementing i to determine if the current counter is at 3 or 5. But how do the goroutines know what the count is to determine if they should use a Fizz or Buzz?

Somewhere in the cobwebbed corners of my brain I recalled reading that goroutines shared memory; could the goroutines see the same variable? Nope. Shared memory or not, they are scoped differently and are invisible to each other. Plus the i's are independently declared (with := ) and that means that they are different variables.

It wasn't until I was rubber-ducking the problem that I realized my mental model for this FizzBuzz solution was wrong. 

Here's where each stage is on every pull from the channel (generate's output, then the Fizz filter's count, then the Buzz filter's count, with the label shown on the pull where it gets tacked on):
"" "1" "1"
"" "2" "2"
"" "Fizz" "3"
"" "1" "4"
"" "2" "Buzz"
"" "Fizz" "1"
"" "1" "2"
"" "2" "3"
"" "Fizz" "4"
"" "1" "Buzz"

In other words, I was confused because I had the original basic solution in my head, where you increment the counter by one then figure out if the result is divisible by 3 or 5. This solution just counts to 3 (or 5, via the second filter() goroutine) then starts over again. It's not evaluating numbers 1 through 100.

The output of the numbers 1 through 100 comes from the main() loop. Every time the loop in main() gets an empty string from the channel (no Fizz or Buzz), it prints the number of the current iteration instead.

End result...a FizzBuzz program!

Monday, January 18, 2016

Where Am I Rsyncing? (OS X Edition)

This is not about rsync options and using rsync. Rsync is so well covered that there are tons of Google pixels dedicated to tutorials and examples of how to use it on the Internet. A search for the word "rsync" on Google gave me over a million results in .43 seconds. A search for "rsync examples" yielded 473,000 results in .39 seconds.

There's likely little I can add to that unless I want to create a cheatsheet for myself on my own blog.

What I do want to talk about...since it was surprising enough to me that I'm still thinking about it a day later...is an incident where rsync seemed to be running fine, but it was totally $%^&ing with me to the point where I needed to call in a coworker to rubber duck the issue.

I had two Macintoshes. On MacSource, I had a folder I'll say lived at "/Users/myname/files/Town Problems". On MacDestination, I also had a "/Users/myname/files/Town Problems" folder.

I used my typical rsync command line that...to my knowledge...had not given me issues. The whole archive and recursion and delete settings with a sprinkle of verbose and progress. It was pretty simple, and since it was a modified version of lines pulled from my quick and dirty sync scripts, it was nothing that should have had trouble.

Only I knew it had trouble when the first few lines scrolling rapidly through my terminal included, I could have sworn, the word "delete."

Huh? Not that many files should be getting deleted on that remote machine, I thought. There should have been two files deleted. Not enough that I should notice it in the rapidly-filled scroll buffer jittering past my eyes.

I secure-shelled into the remote machine and scanned the directory. The files that should have been deleted were still there. And a directory that should have been added, wasn't.

I scanned the command line I used. No typos.

Re-ran rsync with more verbose options. Everything, it swore, checked out and was a match. "Synced!," it seemed to say.

I fiddled with quotes, since there was a space in the name and rsync can be a pain in the arse with properly handling spaces, depending on how (and what) is doing the interpretation of the command on the machines involved. I tried various combinations of escape characters mixed with " and '. Each attempt gave no feedback indicating anything other than a successful sync. I even triple checked the use of a trailing slash on the specification, as common with rsync issues as off-by-one errors in loops for programmers.

"Dammit!"

I called in a coworker to glance at the issue and see if he could spot what I was missing. At first, he couldn't. We talked it through as we navigated up and down the directory tree to see if the command line had failed to expand correctly, dumping everything into "/Users/myname/files/Town" instead of "Town Problems". "No, that's not it. There's a /Users/myname/files/town directory, but not Town..."

Insert the screeching halt of realization there.

OS X does some really fun things to hide the fact that it's a POSIX UNIX engine hiding behind a pretty interface. And most of the time, it's really good at hiding the UNIX bits to the point of being usable by home users. The rest of the time those tricks create...quirks. And I realized we were running into one of those quirks.

"Dammit!"

diskutil info /

That's a command that gives information about your drives. Here's a couple lines from an example output:

File System Personality:  Case-sensitive Journaled HFS+
Name (User Visible):      Mac OS Extended (Case-sensitive, Journaled)

...See that "Case-sensitive" bit buried in there? Yeah, that's missing from the drives I was using in the sync. Apple recommends, probably due to compatibility issues with certain software, not using case-sensitive HFS+ on root partitions. The thing is, this is easy to forget because bash autocompletion behaves as if the filesystem were case-sensitive. For example...

cd temp
mkdir Test
mkdir test
->mkdir: test: File exists
ls
->Test
cd t<tab>
->autocomplete doesn't do anything
cd T<tab>
->autocomplete changes it to "cd Test/"
<backspace the line>
rmdir test
ls
->
mkdir Test
ls
->Test
cd test
pwd
->/Users/myname/temp/test
cd ..
ls
->Test

Autocomplete treats the folders as case-sensitive. Other commands do not, because bash's completion enforces case sensitivity while the filesystem doesn't.
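
If you want to double-check a volume programmatically instead of squinting at diskutil output, here's a little Go sketch (mine, written after the fact) that asks whether two differently-capitalized names resolve to the same directory:

package main

import (
    "fmt"
    "os"
)

func main() {
    // Assumes a directory named "Test" exists in the current directory.
    a, errA := os.Stat("Test")
    b, errB := os.Stat("test")
    if errA != nil || errB != nil {
        fmt.Println("one of the names doesn't resolve; the filesystem is probably case-sensitive")
        return
    }
    // On a case-insensitive volume, both names point at the same directory.
    fmt.Println("same directory?", os.SameFile(a, b))
}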

HAHAHA...the whole time, rsync had been syncing into (and had wiped) my "/Users/myname/files/town" folder. I hadn't had issues with my previous syncs because I wasn't using folders whose names partially matched other folders differing only in capitalization.

How to solve the problem at hand? According to the history file, I used this:

rsync --progress -av --delete '/Users/myname/files/Town Problems/' myname@MacDestination:"/Users/myname/files/Town\ Problems"

...and that interpreted the file paths correctly. With a slight modification, I restored my "town" folder from my backup copy, ending my hour of expletive-filled head scratching.

Once again, bitten by the attempt to make things more user-friendly. Dammit!

Wednesday, January 13, 2016

Raspbian (Or Most Linux): Always Mount My External Hard Drive

This is mainly documenting how I set up my Raspberry Pi 2 to always mount an external hard disk connected via USB at a specific mount point. It should also work on most recent Linux distros, but I make no promises.

WARNING: playing with hard drive partitions and formatting can destroy data. Don't blame me if you lose information because of a typo.

First, where is the drive located? Usually you can find it in the output of dmesg after the drive is connected. In my case, it's /dev/sda.
dmesg |grep sd

Be careful that you've found the correct drive. Linux + root privileges lets you do a lot of damage.

If the drive is completely fresh you'll want to partition and format it.

  • sudo fdisk /dev/sda
  • n - new partition
  • p - primary
  • 1
  • <enter> - default start sector
  • <enter> - default end sector
  • w - write changes to the drive

If the drive had a partition already on it, you can delete it with "d". Using "m" will give you a little guidance on help commands.

Format the drive:
sudo mkfs.ext4 /dev/sda1

Find the UUID of the drive.
sudo blkid

Copy the drive's UUID to the clipboard. Create a mount point (directory) where you want the drive mounted.
sudo mkdir /mnt/mydrive

Alter the fstab file; first make a backup.
cd /etc
sudo cp ./fstab ./fstab.orig

Open the fstab file in your text editor of choice. Add your modification line at the end.
UUID=<uuid> /mnt/mydrive ext4 defaults,noatime 0 2

...where <uuid> is the copied UUID from blkid output.

Reboot. Your drive should be mounted when you log back in.

Saturday, January 9, 2016

Powershell is Being a Pain ("Get-ADUser is not recognized...")

While diagnosing some performance issues on a virtual machine, I decided to rebuild my Windows 7 VM. Reload, update (and update and update), and begin the process of reinstalling applications and tweaking preferences.

One of the applications I've grown accustomed to using is a small Powershell script that gives a brief list of stats on a user's Active Directory account. It's pretty handy for quickly figuring out if Bob forgot to reset his password or his account is now locked out from too many incorrect password attempts.

My old machine just ran the script without issues. The new one, however, wanted nothing to do with it.
The term 'Get-ADUser' is not recognized as the name of a cmdlet, function, script file, or operable program. How annoying...

Turns out lots of other people have that issue too, if the Google search results are any indication.

I had the Remote Server Administration Tools for Windows 7 (KB958830) installed, the prerequisite for the ActiveDirectory module that provides Get-ADUser.

The second step was to actually enable the feature, which is, counter-intuitively, then added as a Windows feature.
  1. Click the start menu and search for "windows features", click on "Turn Windows features on or off"
  2. Expand Remote Server Administration Tools
  3. Expand Role Administration Tools
  4. Expand AD DS and AD LDS Tools
  5. Tick the checkbox for Active Directory Module for Windows Powershell
  6. Confirm that it's okay to install the feature
And you know what? It still didn't work.

BUT I discovered that the module did exist; if I opened a PowerShell prompt and first ran "import-module activedirectory," then manually ran the script instead of right-clicking the script and clicking "Run with Powershell," everything worked.

So how do I get it to run by automatically loading that module?

From a Powershell prompt I ran "$profile" to discover where my PowerShell profile is located; it returned 

C:\Users\bsilverstrim\Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1

I opened an Explorer window and navigated to C:\Users\bsilverstrim\Documents. There was no WindowsPowerShell folder.

Next step: make the folder.

Then in that folder I used a text editor to create a file called Microsoft.PowerShell_profile.ps1. In it, I had the line:

import-module activedirectory

I saved the file and closed out of my open PowerShell sessions. Make sure the suffix on that text file is .ps1, as Windows likes to try hiding a .txt extension if you use the common method for creating new text files. I right-clicked the Powershell script on my desktop, ran it with PowerShell, and it worked fine!

Friday, January 1, 2016

My Adventure With Docker on an Early Version Raspberry Pi: Part Three

Time to Make Docker Useful: Dockerizing EINAL

By this point I had decided to use Docker, installed Docker on the Pi, and configured it with a few customization tweaks. The last step is to get EINAL (Email Is Not Always Loved) running in a Docker container.

Docker is supposed to be used to consistently deploy an application to a customized container. What I need to do is gather up the application executable and dependencies, along with any environment dependencies, create a Dockerfile that tells Docker how to configure itself for the application and then run the resulting image in a new container.

Grab the Latest EINAL

I store the source for EINAL in a Git repo located, in this case, on the same machine I'm experimenting with Docker on. Because I am running Docker on a Pi, I need to make sure I use a version of EINAL compiled on the Pi (for the ARM architecture).

I change to my Go src directory, clone my repo, and compile. From my Go workspace, I ran:

git clone /mnt/mydrive/gitrepo/einal.git
go get github.com/howeyc/gopass
go get github.com/mxk/go-imap/imap
cd einal
go install

Now the latest version (confirmed with einal -version) of EINAL is residing in my Go workspace's /bin directory. Go compiles applications as single executables, making the resulting application a little easier to work with for Dockerization.

To keep things tidy, I created a /mnt/mydrive/projects_docker/einal_docker directory for staging the image; first thing to add? Copy the /mnt/mydrive/go_projects/bin/einal binary to the einal_docker directory.

While the executable itself didn't have direct dependencies, I had created files with configuration and search information that would be useful for my running instance. I copied credentials (holding the encrypted credentials), senderstrings (strings searched for in the from: lines of emails), and subjectstrings (strings searched for in the subject: lines) to the einal_docker directory.

Indirect Dependencies

EINAL, in background mode, listens on a definable port for SSH connections. This means it reads the <HOMEDIR>/.ssh/id_rsa file to properly encrypt the connection; since the image I'm using for building the container doesn't have the proper SSH configuration files, it'll need that key generated.

Added to my staging directory: ssh-keygen, and its dependency, libcrypto.so.1.0.0.

More testing showed that EINAL failed when trying to connect to GMail; the container was throwing an x509 error. This is caused by a lack of CA certificates in the container.

Ordinarily this could be fixed with a simple apt-get update and apt-get install routine; this was when I learned that, as of this version, Docker doesn't appear to allow forcing host networking during a build. Something in our network configuration prevented the NAT networking from working properly, so apt-get would consistently fail trying to look up the repo host. The build works by creating a new image for each step, building one upon another and caching each stage until there is a complete success or it halts at a failure (caching the steps up to that point.)

My solution was to run yet another Docker container and run apt-get there to download only the .deb files necessary to install the certificates. I used this to grab them in the running container:

apt-get update
apt-get download ca-certificates && apt-cache depends -i ca-certificates | awk '{print $2}' | xargs  apt-get download

There were a few iterations using different methods of getting the .deb files by themselves. I may have ended up just getting a list of what apt wanted to install, then running apt-get download <package> a few times. Regardless...

I then ran "docker cp <containerID>:/home/<filename> ." from the host to get the .deb packages out of the running container.

Added to my staging folder: ca-certificates_20141019_all.deb,  libssl1.0.0_1.0.1k-3+deb8u2_armhf.deb, and  openssl_1.0.1k-3+deb8u2_armhf.deb

Dockerfile Automates the Build

The last file in my staging directory is the Dockerfile, which outlines the steps to deploy a working container running the application. With only minor tweaks, I should be able to use the files in my staging area and the dockerfile to deploy EINAL to a Docker host (in this case, though, it'll only work on other ARM-compatible hosts.)

Using my text editor, I create a file named Dockerfile with the following contents:

FROM resin/rpi-raspbian

# Create a working directory
RUN mkdir /opt/einal_files
RUN mkdir /opt/einal_files/logfiles

# Add a volume
VOLUME /opt/einal_files/logfiles

# Add a working directory directive
WORKDIR /opt

# Add the ca-certs and deps
ADD ca-certificates_20141019_all.deb /tmp
ADD libssl1.0.0_1.0.1k-3+deb8u2_armhf.deb /tmp
ADD openssl_1.0.1k-3+deb8u2_armhf.deb /tmp
RUN dpkg -i /tmp/*.deb

# Need to generate a keyfile
ADD ssh-keygen /usr/bin
ADD libcrypto.so.1.0.0 /usr/lib/arm-linux-gnueabihf
RUN mkdir /root/.ssh
RUN /usr/bin/ssh-keygen -b 2048 -t rsa -f /root/.ssh/id_rsa -q -N ""

# Add some files
ADD einal /opt
ADD credentials /opt/einal_files
ADD senderstrings /opt/einal_files
ADD subjectstrings /opt/einal_files

# A port to connect and give the "magic word" to
EXPOSE 1234

# Run the command
CMD /opt/einal -background -checkinterval 30 -port 1234

Much of the file is rather self-explanatory, plus there are handy hash-comments. The first line is a mandatory FROM line telling Docker what image to base the build upon (in this case, the rpi-raspbian image from the resin repo.)

I then create a folder for my EINAL files by telling Docker to run mkdir to create the EINAL directory and a logfiles folder for EINAL.

My next step is to create a VOLUME for persistent logfiles. If the container disappears, so would my logfiles, and I didn't want that to happen. If you notice, the Dockerfile only creates the container volume; it doesn't specify a host directory to map the files to. That's because (unless something changes in a later version) a Dockerfile can't map a volume to a specific folder on the host; doing so would make the deployment less generic (remember, the object is to make deployments generic enough that you can deploy across a range of Docker hosts with minimal tweaking...creating host-specific dependencies could break deployments.) The only workaround is the configuration change I made in the previous post, where I pointed Docker at a specific mounted drive on which to store its files.

EINAL works in part by grabbing the current working directory, then tacking on a subdirectory to find the credentials and configuration files and yet another directory (which happens to be the one that is turned into a Volume) for logfiles. The WORKDIR directive sets the current working directory so EINAL isn't creating files in the wrong place or exiting when it can't find the proper configuration files.

The three ADD directives copy the .deb files for ca-certs to the Docker image. The build is running from the folder with the Dockerfile, which is my staging folder, so this is copying the .deb files from that staging folder to the /tmp folder in the Docker container.

Like the commands to run mkdir, the next RUN line tells Docker to run dpkg to install the three deb files in /tmp.

Next I need my encryption keyfiles generated. I ADDed ssh-keygen to the container along with the library it needs to run, created the subdirectory .ssh in root's home (it turns out Docker containers run as the root user), and then used ssh-keygen to create the keyfile needed to make connections.

Now I need the application I was creating the container for in the first place. I add the EINAL executable along with the configuration files to /opt and /opt/einal_files.

EINAL needs a port open when running with -background; the user connects via ssh to that port and enters the phrase that decrypts the credentials file. The EXPOSE directive marks a port to be published to the host (the actual forwarding happens when the container is run), in this case port 1234.

Everything is now in place; the final directive, CMD, runs einal in the background mode on port 1234 and directs the application to check every 30 minutes.

Deployment Time!

Now comes the simple part. Just to make things cleaner, I removed the many unneeded containers and interim images scattered around the system. To be clear, I'm only doing this because I'm not using any other images on this system. Otherwise this would wipe other images and I'd have to rebuild all the applications from their respective Dockerfiles, with the entailed downtime from stupidly wiping away images I might have needed.

Here's a simple cleanup:

docker rm `docker ps --no-trunc -aq`
docker rmi $(docker images -q)

Also, again while no containers are running, I deleted the volume directories to get rid of persistent (test) data.

sudo bash
cd /mnt/mydrive/docker_rootdir/volumes
rm -fr *
exit

Now I build the new image:

docker build .
docker images

The end of the build should give you the final image ID, and the images command also gives a list of available local images.

REPOSITORY           TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
<none>               <none>              8e1a76a5d31f        2 minutes ago       88.07 MB
resin/rpi-raspbian   latest              e97a8531a526        4 days ago          80.28 MB

In this case the image with no repo and no tag is the one I just built. I need to turn that image into a running container:

docker run --net="host" -d -P 8e1a76a5d31f

The -d tells Docker to run detached (or in daemon) mode instead of attaching to a terminal. The -P tells Docker to map the exposed ports to the host, changing iptables entries as needed. And of course the image ID itself is the one I want to run.

I then ssh to the host IP on port 1234, give the proper command and monitor the logs to verify that it's off to the races!

When I want to stop the Docker container, I can just run


docker stop 8e1a76a5d31f

...from the host.

What I've Learned

Docker is an interesting deployment option. The act of turning my application into a deployable set of build directions reminded me of the many interesting ways a build could break; I had forgotten about things like the SSH key file and CA certs that are necessary to run but are normally already installed on hosts. The Go executable itself doesn't have extra libraries as dependencies, but the development environment often (accidentally) shields me from the indirect dependencies.

Docker also teaches you about the deployability of your application and makes you think about how you're modeling your infrastructure. Docker doesn't force a completely strict configuration method on you, so I suspect that my workarounds for the problems I encountered are not completely "portable" for deployment.

For example, my staging directory has the executable, some library files, and .deb files. Probably a better way to work is to have Docker, in the build process, download the proper files with apt.

That would be fine except something in our environment prevented the default networking from working properly, and the Docker build process doesn't support the host networking directive. Workarounds I found online involve passing the proper internal DNS servers to the Docker container, but that didn't work in my case. The best I could do was download the necessary files, meaning I may not have the latest versions if updates are issued, compared to a system where I can run apt-get on a different host.

My method of deploying the executable also entails copying an executable to the staging area, when the "proper" deployment would most likely have Docker pull EINAL from the Git repo itself, along with a current version of Go, and compile EINAL from within the container. That would more fully automate the build process for the latest, newest everything (even though it would still mean a break in the toolchain would keep Docker from deploying the container.)

Docker does seem to strongly encourage you to create an account on the Docker hub and upload images; I'm leery of how this works. If I build or snapshot an image containing a sensitive application, wouldn't that upload the data to a remote server? What if I were running something that had sensitive transient data stored in it? Would there be a nonzero chance this data is copied to a server I don't control? Without complete familiarity with the way Docker works and the "Docker way" of doing things, I'm not sure I'm not accidentally uploading things to an untrusted server for other people to get access to. When I tried figuring out how to tag my image so I could, for example, always run or monitor an "einal_versionxyz_image", the instructions I found sounded like I could only tag the image by pushing it to a Docker hub account.

The promise of Docker is quite intriguing, and I'm not familiar with all the tricks to completely automate and monitor container deployment. I feel like I've only touched upon the most basic deployment of a Docker image. The documentation I found suggests enhancements and tweaks are anticipated to make Docker a way to augment deploying and monitoring into a company's own virtual "cloud," or perhaps to automate deployments to existing cloud systems, much like Chef or Puppet is used to configure template virtual machines.

My experience with deploying EINAL was two days of dissection and testing to get a working container. It wasn't simple, but it was something I feel was worth doing and learning from!