Monday, April 21, 2014

Performing an Old School DD Over Netcat Clone With Speed Mysteries on the Way

This Post Evolved

I initially wrote this as a tutorial (okay, yet another tutorial) on using dd and netcat to clone systems because I was having trouble getting a successful clone with our existing toolset.

What happened was that my modified attempts at speeding up the process, which by all rights should have worked, didn't. The tutorial, written as notes following along with the process, became part of a head-scratching mystery.

I'm sure there are other admins out there that will know that "of course" this happened and "he should have done this" to find the problem. But I'm also sure there are others who can relate to my discoveries and troubleshooting process.

What follows is material that you can read and suss out your own notes for a dd-over-netcat solution, along with a bit of a narrative ride investigating a what-the-hell puzzle. I have just enough variables thrown into the mix that you may not run into these issues. But if you do, especially when you run a mixed platform environment, this scenario may sound all too familiar.

The Cloning

I previously ranted about the difficulty in cloning Windows and how it is seemingly architected to make the cloning process more difficult. But hey...if it were simple, we wouldn't have to find creative solutions to seemingly simple tasks, right?

So how do you perform an old-school clone?

If you have 2 drives that are the same in size, or you need to clone out from a source disk that is smaller than the target drive, this method should work.

Here's how I did it, and from the description you should be able to adapt it to your needs.

I have one Windows 7 laptop with our custom settings and software installs; it needs to be duplicated to three other laptops of the same make and model (and hard disk sizes.)

First, Windows has to be prepped. I have software installed for the laptop to act as a pseudo-kiosk and lock settings down. I had to tell this software to temporarily disable itself and allow for an "imaging mode," which, if the Internet didn't lie to me, tells the software not only to disable itself but also not to load its custom drivers. Otherwise there could be some fun times ahead of you.

The next step in prepping Windows is running Sysprep. This strips out system-specific information and puts the machine in OOBE (out of box experience) mode...that's what causes it to ask questions like how to connect to a network and what name you want for the machine when it first boots up.

Just running %WINDIR%\system32\sysprep\sysprep.exe will bring up the little GUI interface for it. I select the "generalize" and "OOBE" options. Otherwise, run the executable with the "/generalize /oobe /shutdown" options. All turned off? Good. Leave it off. If you turn it on, the monster will try to run its out-of-box setup and chew through the Sysprepped state you just created.
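
If you'd rather skip the GUI entirely, the whole thing collapses into one line using the same switches mentioned above:

%WINDIR%\system32\sysprep\sysprep.exe /generalize /oobe /shutdown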

Second, I will need a bootable Linux distribution. There are instructions galore on the webbertubes for creating a bootable USB image. I used UNetbootin. It's an application that automates the creation of a USB distribution. And I mean automates. Right down to downloading a distribution from the source website for you, so you don't need an ISO ready to go first. And UNetbootin has downloads for Windows, Linux and OS X. Create the USB flash drive boot install and plug it into your source computer.
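
If you'd rather stay in dd-land and already have an ISO downloaded, recent Ubuntu ISOs are hybrid images that can be written straight to the flash drive from a Linux terminal. This is just a sketch, not the UNetbootin route I actually took; the ISO name is a placeholder and /dev/sdX is whatever your USB stick shows up as (get that wrong and you'll overwrite something you love):

sudo dd if=ubuntu-desktop.iso of=/dev/sdX bs=4M
sync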

Third step is to boot the original laptop, but catch it before Windows boots. You want to go into the BIOS (or "setup" as it's becoming more popularly labeled) and change the boot order to boot first from the USB device. I put this step here because it was easier with the USB drive already plugged in; in my particular case the machine saw the drive and setup specifically listed it as a boot device by name. Save settings, reboot. As a variation, you can use a one-time boot menu to boot from the USB drive, or set it up ahead of time to boot from "USB Device." This entirely depends on your BIOS and your ability to puzzle through settings.

Fourth step is to actually boot to Linux. I used Ubuntu. Boot it LIVE. Do NOT run the installer. That would be tremendously kick-yourself-stupid at this point. Also, don't let it boot from the hard disk. That'll ruin your Sysprep state. Boot the Live CD version. Once it's open, navigate to a terminal. The beautiful, beautiful command line.

Fifth step is to prep my computer to get the image. I want to image this out to multiple systems, so I'm going to create an image file that is then served out to the target laptops, rather than keep running the copy from the source laptop to the target laptops.

On my Mac (yes, cross platform!) I open a prompt and switch to my external drive, where I can spare 500 gig.

Note that yes, you can make the image smaller. Dd will create an exact image of the hard disk being sourced; you can compress it and decompress on the receiving end. And if the drive is mostly empty this will be significantly faster. However, this is also another place where you can make a mistake. Get the procedure working the simpler way first. Then worry about complicating your life a little more.

That's my advice, anyway. If you're a "Screw it! We'll do it LIVE!" risk taker, feel free to use a compression variant of the commands.

The Mac is the RECEIVER in this case. It's getting data FROM the source (reference) laptop. So I can open a connection to LISTEN for data from it.

nc -l 19000 | dd of=./DriveImageName.img

This runs Netcat, tells it to listen on port 19000, then pipes that data to dd, with an output file of workingdirectory/imagename. On the Mac a warning popped up asking if I wanted to allow listening on the network, because it otherwise needs a firewall rule; say yes. Also, I was already on the drive I wanted to save to. That's why there's a period for the current directory; otherwise give a full path name.

Do you know your computer's IP address? If not, grab it from another terminal session or your network configuration utility. You'll need it for the...

Sixth step which is sending your image! On the source computer, search through dmesg to find the device pointing to the hard drive. Usually it'll be something like /dev/sda or /dev/sdb, but this is entirely dependent on your configuration and the Linux detection mechanism. Search the messages for the device matching your hard disk size.

Write it down. Verify that it's correct by using fdisk to print the partition table. If you want to be extra careful, mount the Windows partition and verify it's the correct stuff. Be. Careful.
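
From the live session's terminal, that verification looks something like this. I'm assuming the disk turned out to be /dev/sda and that the Windows partition is the second one, which is typical for Windows 7 but by no means guaranteed:

dmesg | grep -i 'sd[a-z]'               # find the disk matching your drive's size
sudo fdisk -l /dev/sda                  # print the partition table to double check
sudo mkdir -p /mnt/check
sudo mount -o ro /dev/sda2 /mnt/check   # optional: peek at the Windows partition, read-only
ls /mnt/check
sudo umount /mnt/check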

And if you mounted the drive to check it, UNMOUNT IT. For the copy to properly work you want it to be completely unmounted. Write down the drive device. And triple check your command before hitting enter.

What command?

dd if=/dev/device | nc TargetIP 19000

Of course you replace the device with sda or sdb or whatever you found was the device, and the TargetIP is where you're sending the image (in my case the Mac.)

The command will look like it's doing nothing. I went back to my Mac and in another console told it to do a directory listing and lo and behold, DriveImageName.img was created and rapidly growing.

This can take a couple of hours. When completed, the dd command returns to the prompt and your DriveImageName.img file will be approximately the size of your source system's hard drive.

The Seventh step is to take the image and overlay it into a target laptop. This is destructive. That image file is the whole hard drive. That means boot sector, multiple partitions (including the recovery on the original laptop), the whole shebang. I didn't copy a particular partition...this was the lock, the stock, and the barrel. Just so you know.

Shut down the LiveCD Linux on the reference (source) laptop. Close it up. Set it aside. Stick the USB drive into the sacrificial target laptop. Configure the BIOS to boot from USB drive as you did with the source laptop (or use the one-time boot). Go to the command prompt once Linux is up and running on the network. 

Now it's time to reverse the flow of the data stream. On the TARGET laptop, scour the dmesg logs for the name of the hard drive. THIS COULD BE DIFFERENT FROM THE SOURCE LAPTOP.

I know, it shouldn't be. But it was for me. Don't assume the detection will be the same. Find that proper device name.

All set? Took notes? Good.

Then we'll move to the eighth step, which is telling the laptop to accept the image. From a command prompt:

nc -l 19000 | dd of=/dev/device

Look familiar? It resembles the command I used on the Mac for listening for the incoming data stream. Mainly because that's what it's doing. Only this time the output file (of) is the device name pointing to the laptop's hard drive.

Now we need the drive data. Do you have the laptop's IP address? If not, grab it from another terminal session.
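
On the Ubuntu live session, ifconfig is the quickest way to get it (eth0 is an assumption here; your wired interface may be named differently):

ifconfig eth0 | grep 'inet addr'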

Step nine is on the Mac. Let's send the file.

dd if=./DriveImageName.img | nc LaptopIP 19000

Double check your information before hitting enter, especially on the laptop, and let it rip. The Mac will read the image file and stream it to Netcat, sending it to the IP address on port 19000. Again nothing will seem to happen, until several hours later, when the command prompt abruptly returns control after telling you how many records were sent out.

The tenth step is to tell the laptop to reboot. When it starts the boot cycle (or it powers down, if you told it to shut down) you yank the USB drive, boot it to setup, tell it to boot from the hard drive now, and do a boot into Windows. It should run the "out of box experience" setup, and once you enter a name and user and blah blah...WINDOWS.

Then repeat steps seven through ten for each of the other target systems.

How Do I Know If It Is Doing Anything?

The process is relatively quiet. That is annoying, to say the least...how do you know those antsy hours of waiting aren't going to waste?

Method 1: tcpdump. I opened another terminal and ran tcpdump, where I saw a helluva lot of packets heading to the target machine's IP. That at least tells me it's working.
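
The incantation was along these lines (a sketch; substitute the target's actual IP, and add -i with your interface name if tcpdump picks the wrong one):

sudo tcpdump -n host TargetIP and port 19000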

Method 2: Activity Monitor. On the Mac, I can run Activity Monitor, which gives some surprisingly useful information. I open the "network" tab and look for nc, and Activity Monitor will tell me how much data has transferred.

Note, though, that if you're compressing the data, you won't be able to tell how far along the image transfer actually is. You don't get guaranteed compression ratios, and if the drive was mostly empty, you've got a lot of nothing that compresses into negative quantum data or something like that. I saw it on Star Trek.

If you aren't compressing the transfer, you still get an idea of the remaining time, though not an exact one, since block sizes and packet sizes affect the reading. But approximate is better than nothing, right?

Method 3: pv. Pipe Viewer was one of my favorite methods of checking the progress of piped transfers back in the day. The live-boot Linux and the Mac I used to model and check these instructions didn't have pv installed by default, and I was too lazy to install it for my one-off purposes here. If you're going to do this frequently (which I did back in the days of getting a #$%^ lab to work) or stream to multiple systems, this thing is fantastic. Just make sure you insert it into the pipeline at the right point to get an accurate read on data flow.
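
If you do have pv handy, it slots into the sending side of the pipeline, something like this (a sketch; /dev/sda and the 500G size are placeholders, and the -s size is only needed if you want an ETA and percentage):

dd if=/dev/sda | pv -s 500G | nc TargetIP 19000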

Method 4: Sending the right signal to dd should make it throw out a status. The command

kill -USR1 dd_pid

...where "dd_pid" is dd's process ID should throw the status to standard error. The link above combines it with the watch command so you can periodically throw out the status update to standard error every X seconds.

Note none of these will tell you if the data is successfully being sent or read or written properly. Dd is pretty stupid when it comes to errors. I haven't tested what happens if you dump a file at a machine that isn't actually reading it, so all the bits are banging away like a legion of Goa'uld hitting the iris on the Stargate. That would be kind of awesome if the bits made thumpy noises like that as they hit the firewall, though.

Let's Make It Faster: The Compressioning

The initial copy of the configured laptop to my Mac's FireWire drive took about 2.5 hours (max speed on the interface from my Mac is 800 Mb/sec). The copy from the Mac FireWire drive to the laptop to be overwritten took roughly 5.6 hours. 

Ouch?

Why? It could have been a difference in block size. A FireWire bottleneck. Cache. Something not quite right in the network card driver detected at boot. I could experiment more and find another way to tune this, but let's first try a simpler optimization.

We'll play with compression.

I have an uncompressed image on the FireWire drive. I don't particularly feel like potentially corrupting or otherwise screwing up an image that I now know is working. Therefore, I'm not going to compress the image on the drive. I'll compress it in-flight.

Now, the first few times I tried this, the clone failed; the process hung, and after Activity Monitor said around 8GB had transferred, data would simply stop flowing. No indication why. I thought maybe something was "off" in the network configuration, so I tuned a few settings (OS X seems to have some awful defaults). That didn't work, but the transfer consistently seemed to fail around the 8GB mark. I finally made some progress by changing a few things around in the copy. I'm not quite sure what happened, but it was strange that the non-compressing copies worked while the bzip2-compressing method kept dying on me.

First, I boot another laptop and adjust to boot from the USB Linux distro, then open a terminal and tell it to listen for a network connection after grabbing the IP with ifconfig.

nc -l 19000 | bzip2 -d | dd bs=16M of=/dev/drivedevice

It's pretty close to what I used before for getting the image, only this time instead of piping from Netcat to the dd command, it first pipes the output to bzip2 with the "decompress" switch. The stream of gibbledibble goes from Netcat to the decompressor to dd.

I'm also adding a block-size of 16 meg. This seemed to be a key in getting the copy to work when streaming from bzip2.

And remember, this is the target laptop, so dd uses the "of" switch.

Second, it's time to send the file from my Mac. In a terminal, I use:

dd bs=16m if=./DriveImageName.img | bzip2 -c -z | nc TargetDeviceIP 19000

Hit enter, and we're off to the races.

This command, again, is similar to my previous commands, but this time I first use dd to read in the image file, pipe that as a stream to bzip2 which compresses the data stream, then dump the results to the target laptop. Also notice I used a lowercase "m" on the Mac, and a capital "M" on the Linux system. Oh, the joy of minor differences in utilities. Linux is Linux, OS X is BSD-ish. Muss up the m vs. M and you'll get an error message.

Well That Was Bad...

The copy took approximately 15 hours.

Seriously? How does the addition of compression triple the transfer time?

My first guess is that it wasn't the compression, but rather something odd in the disk caching or data transfer for the FireWire drive, and the block size was throwing it out of whack. Blackmagic Design's free Disk Speed Test told me the drive was giving 67 MB/s write and 41 MB/s read speeds (which in itself is a little strange...I would have expected faster read than write speeds. Perhaps that's write caching magic.)

Let's retry the test, but this time on the target laptop I'll use:

nc -l 19000 | bzip2 -d | dd bs=1M of=/dev/drivedevice

...and on the Mac side, I'll use:

dd bs=1m if=./DriveImageName.img | bzip2 -c -z | nc TargetDeviceIP 19000

My working theory here is that something in the point where the data is pulled from the FireWire drive (OS X's I/O scheduler? The physical construction of the drive and its cache?) is mismatched with the way bzip2 is reading in chunks to compress, and that mismatch is enough to throw the pipeline out of whack. Did the change to the smaller block size have an effect?


My iostat numbers (sudo iostat -w 1 disk_device) seemed fairly consistent at pushing 3 MB/s using the 1M block size, although there were periods where it shot up to 6 MB/s. Did it translate into any time saved versus the 16M block size?

This took 15 hours.

What the $%^, Lana?!

Okay, let's see what we have so far. If I dd from the FireWire drive into netcat, across the network to the remote system's netcat, into the target's dd and onto the target drive, it takes roughly 5.6 hours. I insert compression, and the time triples. And the pipeline requires an explicit block size when I insert the compression, or else it seems to hang.

I still suspect something is problematic with the I/O scheduler in OS X, but I described the problem to some other intelligent people who suggested it may be related to bzip2 not being multithreaded. It was speculative, but in checking top while running the transfer, we could see that on my multi-core, hyperthreaded, Intel-based Mac the bzip2 process was consistently holding near the 100% mark, dipping into the 60% realm once in a while. When a process holds at 100%, it can indicate that the process has a single processor core gripped around the throat and isn't multithreaded, or at least not in a way that lets the scheduler dole the workload out over multiple cores (else you would see a steady over-100% utilization).
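
If you want to watch this yourself on the Mac, sorting top by processor use is enough (Activity Monitor's CPU tab shows the same thing):

top -o cpu   # watch the %CPU column for bzip2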

Fortunately there is a (relatively) simple way to test if the compressing stage is the issue.

The Pre-Compressioning

Up to now the Mac is pulling the image file data from the FireWire drive over the bus, compressing it into memory, then pouring it into Netcat and then into the network stack.

If we pre-compress the image on the drive, the Mac will pull the (compressed) image file data from the FireWire drive over the I/O bus and pour it directly into Netcat and the network. If the slowness lay in bzip2, or in bzip2's single-threaded core-hogging, this will bypass that issue altogether. It also means there's less to read from the drive, since the image is smaller, plus whatever advantage comes from the Mac not tying up a processor crunching the data down.

First, we compress the file.

time bzip2 -c DriveImageName.img > DriveImageName.img.bz2

Time isn't really needed; I am using it because I wondered how long it would take to compress the image on the FireWire drive. In case you're not familiar and didn't divine the use of the time command, it tells you how much time the subsequent command took to complete.

The important part is the bzip2 command. The -c is telling it to stream to standard output; the greater-than is redirecting that output to a new file with a .bz2 suffix. If I ran bzip2 alone it would compress the file, but in the process destroy the original and replace it with the compressed version. I'd like to keep both since I know that the original file is not corrupted.



I'll note that this time the iostat output was pretty consistent, bouncing between 7 and 10 MB/s as the bzip2 process read from and wrote to the FireWire drive, slightly more than twice the I/O of the earlier dd read. I also noted that the KB/t was one meg, just as it was with the dd process that had a one-meg block size specified. Before, I'd thought that specifying a one-meg block size was what caused that KB/t number. This hints that I/O transfers from the FireWire interface top out at one-meg chunks.

Later on in the compression...as in several hours later...iostat was giving results like this:


I suspect the image file was in a portion of the drive that didn't really have data written to the sectors in the image. Just an interesting note and bit of speculation.

When it was finished, time reported:
real 796m39.899s
user 649m0.946s
sys 11m8.899s

I'm getting quite an education on expectations here.

Second I set up the Linux machine to receive the file. 

nc -l 19000 | bzip2 -d | dd bs=1M of=/dev/device

I kept the same block size this time around to maintain the conditions of the previous 15-hour attempt.

Third I tell the Mac to read the file and dump it into the network funnel.

cat ./DriveImageName.img.bz2 | nc TargetDeviceIP 19000

The file is...well, it's a file, so I can cat it directly to standard output. Dd is useful because it can read raw devices. This command will cat the compressed file (remember to use the .bz2 version...) directly to Netcat and over the network. 

The target system in step 2 above will decompress the stream of data and write it directly to the device.

Did It Improve This Time?


The image file shrank considerably, which makes sense, since the majority of the imaged disk is "blanked" hard disk sectors and thus highly compressible. The image file on the FireWire drive went from 466GB to 108GB.

So the file to be read was reduced to less than a quarter of the original size, but despite this, the transfer took approximately 12 hours. An improvement over the 15 hours, but still much more than the original 5.6.

The compressed file is smaller, so there's less to read and stream. It's already compressed, so there's no overhead on the sending machine due to running bzip2.

This could mean that Netcat only accepts more data as the downstream pipeline drains, without buffering much of what is sent (or with only a small/limited buffer space allocated). In that case the target computer could be the bottleneck as it decompresses and writes the data.

Maybe The Compressor Sucks?

Let's try a quick experiment with a different compressor/decompressor. I compressed the image with gzip (gzip -c DriveImageName.img > DriveImageName.img.gz) to prep the transfer rather than perform an on-the-fly compress. Notice that, like bzip2, I used -c to send output to standard output and redirected it to another file in order to preserve the original image file.

Right off the bat I see the gzip process is hovering between 50% and 80%, whereas bzip2 took up roughly 100% of a core. Like bzip2, gzip is not multithreaded, but it apparently isn't pegging the core either.

Gzip, without specifying a greater degree of compression, shrank the 466GB file down to 106GB (the bzip2 version is 108GB, if you're keeping score.)

First the target is set to listen for the compressed data stream.


nc -l 19000 | zcat | dd of=/dev/<DriveDevice>

Note that zcat is a form of gunzip/gzip that will decompress from standard input and output to standard output.

Second I send the file from my Mac.

cat ./DriveImageName.img.gz | nc <TargetIP> 19000

After a few hours I checked on the process. Strange...the channel was still open, so the cat process was still dumping data, but it seemed to be quiet, according to iostat.



There was one little burst of data there. What was the laptop doing?


Hmm...KB_wrtn goes through periods of 0, then a burst, then back to zero. I'm guessing that zcat is working with blocks of data to decompress, then as the block (or cache) is finished, it dumps that portion out to dd, which is then written. There is a lot of reading from the drive, though...I don't know why that is happening.

The system monitor on Ubuntu lists gzip using around 700KiB resident and 9MiB virtual memory. Dd is listed with 800KiB and 15MiB virtual in use. So if there's a large portion of memory in use, it's not coming from there.

The command free -hm on Ubuntu says that I have 133M free, and the -/+ buffers/cache line says 6.9G free. So somewhere there is a lot of caching going on...I'm guessing it's related to the data being fed to zcat.

So how long did it take?

Dd on the Ubuntu system said 17932 seconds. That's about 299 minutes, or just about 5 hours. Slightly faster than the plain copy's 5.6.

What Are the Takeaways From This?

There are a few conclusions I've reached from this adventure.

  1. The network isn't the limiting factor in the copy. This should be somewhat obvious. The transfer is the transfer is the transfer. Yes, this imaging would be faster if I did it over a direct connection to the target machine from a portable USB drive, but that defeats the purpose of "dd over netcat." Also, the transfer from a USB source to an internal drive target increases the chances of screwing up the transfer if you haven't done this before.
  2. You'd think compression would make the process go faster. Apparently not always. 
  3. The choice of compressor really makes a difference.
  4. These numbers are seriously harshing my calm. I can't help but think there is something "off" in them; I think I'll have to test some more in a separate writeup.

I Tried the Copy, Why Is My Attempt Failing?

I don't know. But some common guesses:
  • You don't have access. I ran most of these through elevated privileges (sudo is your friend) so I can access /dev files without worrying about that. Which also makes this process a little more dangerous.
  • You typo'd something, accessing the wrong device or using the wrong port. I told you to triple check your commands before hitting Enter. I warned you. Hopefully you didn't break something or reverse the target/source configurations. That last one will be a huge problem.
  • Corruption. A copy corrupted in the process of dd, the target file is saved on a drive with a problem sector, the datastream was interrupted, the sun farted a flare at just the wrong moment, who knows?
  • Drive mismatch. Different manufacturer, sizes, some other random gorbblot that shouldn't fail in fact is causing a fail.
  • You are writing a file to a specific partition (which won't work in this case) or possess a severe misunderstanding of device files versus files. That's a favorite I still run into. It leads to people writing a disk image file to something like a CD and burning it, so instead of overlaying a filesystem, they get a CD with MyDisc.ISO burned to it as one big file, and then they ask, "Why doesn't this work?"
That's just what occurred to me off the top of my head. There are probably other reasons. Most of my readers are imaginary to begin with, so lay off.

Variations

There are a few variations that I can think of. You can experiment with other compression utilities. Or you can use this method to directly write from one system to another. On the source you dd with the input from the source drive, pipe it to netcat, which sends it to netcat on the target system, piping that into dd with the output file being the target drive. Direct one-on-one imaging with no in-between storage or a giant file.
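
As a sketch, the direct machine-to-machine version looks like this, with the same caveats about device names as before (/dev/sda on both ends is an assumption, the block size is optional, and the listener goes up on the target first):

nc -l 19000 | dd bs=1M of=/dev/sda        # on the target, live-booted to Linux
dd bs=1M if=/dev/sda | nc TargetIP 19000  # then on the source, also live-booted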

If you really want to experiment, there are some great things that happen when you throw different variables into the mix. From hard disk firmware abstracting the physical drive specs from what is exposed to the software to the size of your packets shuffling around the network, you can discover how to finely tune your transfers so they are optimized for use in your specific network and hardware scenario at a cost of several days of your life. The best way to get started on this is using different block sizes in dd, so it reads in bigger or smaller chunks than the default. You can sometimes get faster speeds in your transfers doing this. Or you might fragment the hell out of your packets and slow things down. Me, I needed this to actually work, so I didn't keep playing with it like it was a lab. But my imaginary readers out there are free to experiment with this until they can get a perfect throughput scenario.

Secure Dat Data

One of the sites I found while Googling for a reminder of how to properly use dd over netcat tested using ssh to tunnel the stream to a remote computer. Maybe it's because the author likes benchmark porn. I'm not sure. But it's true that ssh encrypts the transfer, so if someone is intercepting your traffic they can't get the contents of the copied workstation.

Personally, I think that if someone is sniffing that level of your traffic on the internal network, you might have a bigger problem. 

And if you're cloning a drive over the Internet, you're probably strange or doing something desperate in the first place. 

"But the target machine is in a remote office!"

Yeah, I classify that as desperate. Why aren't you using a dedicated VPN? Ugh.

Anyway, if you really really wanted to do that, keep in mind ssh does encrypt things and encryption adds overhead. You thought the dd was slow before? Add a few percent of CPU power to the encryption part. It adds up. But at least no one is eavesdropping.
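
If you really do go that route, the shape of the command is roughly this. It's a sketch that assumes sshd is running in the target's live environment and that you can log in as root (or adjust it for sudo), which is not the default on a live CD:

dd bs=1M if=/dev/sda | ssh root@TargetIP 'dd bs=1M of=/dev/sda'   # lowercase "1m" if the sending side is the Mac's dd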

But really. Get a VPN configured.

