Sunday, November 11, 2012

Romney's Whale Fail: ORCA

I was recently introduced to the story that the Romney campaign had a massive IT failure during the election.

I try not to follow the mudslinging and such. I don't have a cable subscription. I don't actively see the latest gaffes and goofs along the way. As such, I hadn't been aware of the Romney campaign bragging in media outlets about its state-of-the-art voter tracking system that was set to put it leagues above anything the Obama campaign had in place.

I ran into the story after the fact, so it's hard for me to know whether I would have predicted the scale of fail this project ended up becoming. Fortunately we have enough angry and schadenfreude-filled individuals that details are available for those who are curious.

The gist was that ORCA was meant for volunteers on the ground at polling places to track who was out voting and report back to Romney Central. The campaign could then coordinate targeted robo-calls to Republicans urging them to go out and vote, while collecting the most up-to-date information on the progress of the election.

The Technology Fail

In the video, the communications director for the campaign, Gail Gitcho, gives the following details:
  • 800 volunteers will be working in the Garden (their headquarters?) collecting information.
  • They will have volunteers in the swing states, where votes "really do matter for the outcome of this election."
  • The purpose of ORCA is really to spot low turnout in the target precincts so registered Republicans there can be called
  • Narwhal is what Obama's campaign calls their similar tracking system, and Orcas are the predator to the narwhal, so that's the origin of the clever name.
That's really not much in the way of details, but I suppose it makes for a nice soundbite in an interview. I'll not comment on the fact that this highlighted yet another reason I am ambivalent about voting on our electoral college system (they're targeting only those precincts that "really do matter," eh? Thanks...) nor will I point out the irony that the campaign has characterized Obama supporters as entitled, lazy, unmotivated leeches and apparently ORCA was meant to motivate the Republicans who haven't gotten off their arses to vote for the real leader in the presidential race.

No need to point those out.

But what else can we find about the details of ORCA?

The Huffington Post had an article about plans for the massive poll-monitoring system:
  • It will rely on 34,000 volunteers in swing states to send back data
  • Volunteers will be using smartphones to send information
  • It will be a web application
  • Volunteers log in, see names and ages of eligible voters, and report who has voted.
  • Incorrect information, fraud, etc. can be reported from the application.
  • There is some kind of social media tie-in so volunteers can send instant messages of what they're seeing in their polling places.
  • A link to what is claimed to be the training manual for the software shows what could be a rather strange screwup in the FAQ section. The article says: "The answer to Question 13 -- 'Am I allowed to speak on my cell phone inside the polling place?' -- states, 'Yes you may be allowed to use the smart phone inside the polling place. Please follow your poll manager's instructions.' That answer appears to have been swapped with the answer to Question 11 -- 'Am I allowed to use the smart phone app inside the polling place?' -- which currently reads, 'No, you are only allowed to speak on your phone outside the voting area.'"
I'm hesitant to believe that is the actual training manual for volunteers; it is a 3-page document that literally appears to be something whipped up with PowerPoint. At best, it's a quick-reference guide for people with short attention spans; it's short on details. There must be more somewhere.

What other information can we find? An ArsTechnica piece claims the program was created over the course of 7 months.

Seven. Months.

The campaign apparently hired Microsoft and a consulting firm to create the application. Like other sources, it said that the application would be used by 37,000 volunteers sending data back to 800 volunteers at the headquarters, and there was a backup voice system that allowed people to phone in results if they couldn't access the web system.

It also says there were 11 database servers, one web server and one application server.

It's important to understand that when you hear these numbers, what isn't brought up is the architecture of the application(s) involved. We don't know how the application was structured; an inefficient architecture could double or triple the number of servers and/or bandwidth required to achieve what a "properly" architected solution would require. Even so, if those numbers are accurate, I find it troubling.

Why?

I work for a company that deals with some big numbers in terms of access and also happens to rely on Microsoft databases in the back end. Here are some numbers based on public information:
  • 95 million page views a month
  • 800 HTTP requests a second
  • 180 DNS requests a second
  • 55 Megabits per second
What was this running on?
  • 10 web servers
  • 2 database servers
  • 2 HAProxy servers
  • 2 Redis servers
What we see here is that the heavy work comes at the front end...the user interface...rather than the database side. There are a number of web servers handling the heavy lifting along with proxy servers to accommodate user interaction. The article gives technical specs for the systems used at the time, which were (and continue to be) beefy servers.

In other words, the big traffic concern is on the web server side, not the database side.
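To put those figures side by side, here is a quick back-of-envelope calculation. It is only a sketch: the inputs are the numbers quoted above, the 30-day month is my own assumption, and averages say nothing about peak load.

```python
# Back-of-envelope arithmetic on the figures quoted in this post.
# Assumption: a 30-day month. Averages hide peaks, so treat these as rough.

PAGE_VIEWS_PER_MONTH = 95_000_000
HTTP_REQUESTS_PER_SECOND = 800
WEB_SERVERS = 10
DATABASE_SERVERS = 2

# ORCA's reported layout, per the ArsTechnica piece.
ORCA_WEB_SERVERS = 1
ORCA_DATABASE_SERVERS = 11

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumed 30-day month


def average_page_views_per_second(page_views: int, seconds: int) -> float:
    """Average rate over the whole month, ignoring daily peaks and troughs."""
    return page_views / seconds


def requests_per_server(requests_per_second: float, servers: int) -> float:
    """Even split of the request rate across the given number of servers."""
    return requests_per_second / servers


avg_views = average_page_views_per_second(PAGE_VIEWS_PER_MONTH, SECONDS_PER_MONTH)
per_web_server = requests_per_server(HTTP_REQUESTS_PER_SECOND, WEB_SERVERS)

print(f"average page views per second: {avg_views:.1f}")
print(f"HTTP requests per second per web server: {per_web_server:.0f}")
print(f"web servers per database server here: {WEB_SERVERS / DATABASE_SERVERS:.0f}")
print(f"database servers per web server for ORCA: "
      f"{ORCA_DATABASE_SERVERS / ORCA_WEB_SERVERS:.0f}")
```

That works out to roughly 37 page views a second on average, with each web server taking about 80 of the 800 HTTP requests a second, and five web servers for every database server. ORCA's reported layout is the other way around: 11 database servers behind a single web server.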

The article is quoting ORCA as using 11 database servers and only one application server and one web server?

Again, I don't know what the application is written in and what framework was used. But the numbers quoted give me some pause; at least enough to stop and say that is something that requires some in-depth real-world testing.
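For a sense of what even the crudest version of that testing looks like, here is a minimal sketch. Everything in it is hypothetical: StubHandler stands in for the application, and run_load_test and demo are names I made up for illustration. None of it reflects how ORCA was built or tested. A real test would point at a staging copy of the system and mimic what tens of thousands of volunteers would actually do.

```python
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Opener that ignores proxy settings so the local stub is reached directly.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class StubHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the application under test; always answers 200."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet during the run


def timed_get(url: str) -> tuple[bool, float]:
    """Return (succeeded, seconds taken) for one GET request."""
    start = time.perf_counter()
    try:
        with OPENER.open(url, timeout=5) as response:
            response.read()
            ok = response.status == 200
    except OSError:  # covers URLError, timeouts and connection errors
        ok = False
    return ok, time.perf_counter() - start


def run_load_test(url: str, total_requests: int, concurrency: int) -> dict:
    """Fire total_requests GETs using up to `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_get, [url] * total_requests))
    durations = sorted(seconds for _, seconds in results)
    return {
        "requests": total_requests,
        "failures": sum(1 for ok, _ in results if not ok),
        "slowest_seconds": durations[-1],
    }


def demo(total_requests: int = 200, concurrency: int = 20) -> dict:
    """Start the stub on a free local port, load it, then shut it down."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        return run_load_test(url, total_requests, concurrency)
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns a small summary with the request count, the number of failures, and the slowest response time. The point is less the tool than the habit: push the request count and concurrency up until something breaks, and do it well before the day that matters.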

Followed by the Training and Testing Fails

Next in the articles we see claims that the human side of the system had failed. The SD Times claimed in an article that ORCA wasn't released until 6AM on Election Day.

A massive, brand new system, set to work with over 40,000 people, was released the day it was meant to "go live" for use? Are you insane?

This, of course, led to other quirks that I'd come to expect when releasing a new project. None of the quirks are things you'd want to find on the day you're expected to make your best showing. Things like users not being able to log in because of incorrect username/password/PIN combinations, and volunteers not being able to bring up the site because they typed "http" instead of "https" in the URL (which could have been a relatively easy fix if they had tested this beforehand).
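The usual fix for the "http" versus "https" problem is to have the plain-HTTP side answer every request with a redirect to the secure address. Here is a minimal sketch using only Python's standard library. The names https_location, RedirectToHttps, and serve are mine, as is the default port of 8080; I have no idea what ORCA's servers actually ran, and in practice this is often a line or two of web server or load balancer configuration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def https_location(host_header: str, path: str) -> str:
    """Build the https:// URL for a request that arrived over plain http."""
    # Drop any explicit port (e.g. ":80") so the browser uses the HTTPS default.
    # Simplification: IPv6 literal hosts such as "[::1]" are not handled.
    hostname = host_header.split(":", 1)[0]
    return f"https://{hostname}{path}"


class RedirectToHttps(BaseHTTPRequestHandler):
    """Answer every GET or HEAD with a permanent redirect to the https URL."""

    def do_GET(self):
        # self.path is the request target, including any query string.
        target = https_location(self.headers.get("Host", "localhost"), self.path)
        self.send_response(301)
        self.send_header("Location", target)
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_HEAD = do_GET


def serve(port: int = 8080) -> None:
    """Run the redirector; port 8080 is an arbitrary choice for local testing."""
    HTTPServer(("", port), RedirectToHttps).serve_forever()
```

Calling serve() starts the redirector, and a browser pointed at the plain-HTTP address gets bounced to the secure one instead of a blank page.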

Apparently it wasn't until 6PM on Election Day that they admitted the passwords and PINs issued for people in North Carolina and Colorado were wrong.

At one point Comcast shut them down because they thought it was a denial of service attack. In the ArsTechnica piece it was said: "They told us Comcast thought it was a denial of service attack and shut it down," Dittuobu recounted. "(Centinello) was giddy about it," he added, presumably because he thought that so much traffic was a sign of heavy system use.

He was happy about the traffic being blocked? I'm really hoping this is a miscommunication. I can't imagine someone building a system dependent on receiving information for analysis being blocked by mistake and having it interpreted as a good thing during a key time in the election. "We're so popular they're cutting us off! Isn't that great?!"

No, it's not.

The Ars piece also stated that training packets arrived on Election Day Eve as late as 10PM, consisting of 60+ pages of instructions and voter rolls. (I don't know if that means the 3-page "manual" wasn't really the manual, since it didn't say how big the voter rolls were.) But really, they expected tech-illiterate volunteers to print all this out the night before?

Wouldn't a campaign spending millions on advertising be able to afford to print these packets ahead of time and deliver them to volunteer centers for distribution?

What do I take away from this?
  • Inadequate training for users
  • Inadequate testing of the application using real-world usage models
  • Inadequate communication of problems as they were occurring
Overall the program had a systemic failure at multiple levels. It wasn't just application design, or hardware, or training, or implementation and execution; it was all of them.

What I also found surprising, although in hindsight I probably shouldn't have, was the hubris behind the application rollout. The campaign was bragging about this program leading up to the election. They ignored problems as they were occurring. And they didn't appear to hold anyone accountable for the project.

Oh, the accountability...

Who Was to Blame?

I'm of course not privy to what happened behind closed doors, but the public story released in the aftermath should tell you something about leadership.

Bad things happen. Projects go south. Things go wrong.

Leaders are people who may not have directly had a hand in the problem, but they are the ones for whom the buck stops. Consequently they are responsible for acknowledging when something goes wrong, learning from the mistakes, and planning how to move ahead.

Things may get ugly in the "war room." Heads may roll. New holes are reamed. Maybe there's some screaming or moments of unprofessional language. But at some point the public response is formed and the game face is put on. Then the public sees the leader at work.

Mitt Romney was campaigning to become President of the United States. He repeatedly criticized Barack Obama's leadership of the country. Surely, he'd take responsibility for this failure and take his lumps.

Well, no. Not from what I found.

Redstate.com reported that it was all the consultants' fault. From the article:

They say that the truth is the consultants essentially used the Romney campaign as a money making scheme, forcing employees to spin false data as truth in order to paint a rosy picture of a successful campaign as a form of job security.

I have to admit, however, there's a bit of humor in reading the quote, “the Obama training manuals made ORCA look like drunken monkey slapped together a powerpoint.”

Instead of acknowledging the mistakes, the campaign apparently decided that it was all the consultants' fault, and now everything will sink away from memory because the election is over.

Funny how that works.

The failures here play out more often than most people notice. It's almost a running gag: hire consultants, slap together an application with impossible deadlines, then blame the consultants when everything collapses. And of course, the people who hired the consultants in the first place take no responsibility for not actually performing oversight on the project.

In the end this once again demonstrated what kind of leader we almost elected: one filled with hubris, ego, and a culture of blame. From the articles I've read on the project, I had the distinct impression that the campaign lived in some kind of bubble reality that denies the existence of issues that seem obvious to people outside that bubble.

The unfortunate part is that the events that unfolded here will no doubt be forgotten until they play out yet again in four years, and this will become little more than a footnote to quote when the next major technology failure occurs in a political campaign. Yay!
