A long fat network (LFN) is a high-bandwidth, high-latency network. "High latency" is relative here, meaning high compared to a LAN.
I ran into the LFN phenomenon on my last data centre relocation. We moved the data centre from head office to a site 400 km away, for a round-trip latency of 6 ms. We had a 1 Gbps link. We struggled to get a few hundred Mbps out of large file transfers, and one application had to be kept back at head office because it transferred large files back and forth between the client machines at head office and its servers in the data centre.
I learned that you can calculate the maximum throughput to expect over such a network. The key number is the "bandwidth delay product" (BDP): the bandwidth times the round-trip latency. One way to interpret the BDP is as the maximum useful window size for sending data; a window any bigger than that buys you no further performance improvement.
For our 1 Gbps network with 6 ms latency, the BDP was 750 KB. Most TCP stacks in the Linux world implement TCP window scaling (RFC 1323) and would quickly auto-tune to send and receive 750 KB at a time (if there was enough memory available on both sides for such a send and receive buffer).
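The arithmetic is easy to check from the command line. A quick sketch with our link's numbers (awk just does the multiplication):
# Bandwidth-delay product: how much data has to be in flight to fill the pipe.
bandwidth_bps=1000000000   # 1 Gbps link
rtt_s=0.006                # 6 ms round-trip latency
echo "$bandwidth_bps $rtt_s" | awk '{ printf "BDP = %.0f KB\n", $1 * $2 / 8 / 1000 }'
# Prints: BDP = 750 KB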
The SMB 1.0 protocol, used by pretty much anything you would be doing on pre-Vista Windows, is limited to 64 KB blocks. That's badly sub-optimal for an LFN. Vista and later versions of Windows use SMB 2.0, which can use larger block sizes when talking to each other. Samba 3.6 is the first version of Samba to support SMB 2.0.
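To see why that hurts, divide the window by the round trip. With at most 64 KB outstanding per connection on a 6 ms round trip, the rough ceiling (ignoring protocol overhead and any request pipelining) is:
# Throughput ceiling for a single connection limited to a 64 KB window:
window_bytes=65536   # 64 KB
rtt_s=0.006          # 6 ms round trip
echo "$window_bytes $rtt_s" | awk '{ printf "max throughput = %.0f Mbps\n", $1 * 8 / $2 / 1000000 }'
# Prints: max throughput = 87 Mbps -- a long way short of 1 Gbps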
We were a typical corporate network in late 2011 (read: one with lots of Windows machines), so those machines were likely to suffer the effects of an LFN.
Note that there's not much you can do about it unless both the source and destination machines can do large window sizes. The key factor is the latency, and the latency ultimately comes down to the speed of light. You can't speed that up.
We had all sorts of fancy WAN acceleration technology, and we couldn't get it to help. In fact, it made things worse in some situations, and we never could explain why. Compression might help in some cases, if it gets you more bytes through the window size you have, but that depends on how compressible your data is.
(Sidebar: If you're estimating latency because you can't yet measure it, remember that the speed of light in fibre is only about two-thirds of the speed of light in a vacuum, 3 × 10^8 m/s.)
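For example, the estimate for our 400 km move works out like this (straight-line distance; real fibre routes are longer, and the equipment along the way adds its own delay):
# Round-trip propagation delay over 400 km of fibre, at roughly two-thirds of c:
distance_m=400000
speed_mps=200000000
echo "$distance_m $speed_mps" | awk '{ printf "round-trip propagation = %.0f ms\n", 2 * $1 / $2 * 1000 }'
# Prints: round-trip propagation = 4 ms -- the rest of our measured 6 ms presumably
# came from the longer real-world route and the gear in the path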
There are a couple of good posts that give more detail here and here.
Sunday, 22 January 2012
Know What You're Building
"Know what you're building" seems like an obvious thing to say, but I don't think we do it that well in IT. For my recent data centre relocation project, we applied that principle successfully to a couple of areas. The network lead wrote up exactly what he was building, and the storage lead listed out every device he needed. But we never did a complete "final state" description of the new data centre.
It all worked pretty well, although we needed a number of meetings during the design phase of our new data centre -- laying out the racks, non-rack equipment, power, cabling for the networks. I think we needed to have a lot of meetings because there isn't a commonly accepted way to draw a plan of a data centre that covers the requirements of all the people in the room.
I'm running into the issue again in a smaller way now that we're designing the new central communication room for the equipment that used to be in the old data centre, but needs to remain behind for local operations (mostly the network gear to service a large office building).
Just as a refresher, here are all the people you need to involve:
- The server team(s) know the servers' physical dimensions and weight, how many network ports they have and how those ports need to be configured, whether the servers need SAN-attached storage, what their backup requirements are, and how much power and cooling each server needs
- The network team(s) know the network devices (which have most of the same requirements as servers), the approach for connecting everything (which determines the cables and patch panels needed), and the cabling itself (which may affect the weight on cable trays or floor loading)
- The storage team(s) know the switching devices, which have most of the same requirements as the network devices
- The electrical engineer or consultant needs to know all the power requirements and placement of all the equipment
- The mechanical engineer or consultant needs to know the cooling requirements and placement of all the equipment
- The structural engineer or consultant needs to know the weight and placement of all the equipment
- The trades who actually build it all need to know exactly what they're building
- There's likely some other poor person, maybe a building architect, who has to pull this all together
Add to all that the fact that the technology in a data centre is constantly changing, at least in terms of the number and type of servers in the room. Also, the requirements and constraints tend to be circular: For example, the number of network ports on a server affects the amount of network gear you need, which affects how many servers you can have (either through port capacity or rack space), which affects how much power and cooling you need but also how many network ports you need.
You also have to worry about other details that can seriously derail an otherwise great plan. For example, when running fibre, you need to make sure it's the right kind of fibre and that it has the right connectors. Power cables in a data centre come in many varieties, so again you need to make sure the power distribution units (PDUs) in the racks can actually be connected to your servers.
With all this, it can be hard for people to come to an agreement on what to build. We don't have well-established ways of describing what's going to be built in a way that everyone understands. There's software to help do this, but it tends to be unreasonably expensive for a medium-sized enterprise.
Regardless of how hard or expensive it is, there's a lot of value in figuring out what you're going to build before you build it. We were successful describing what to build with Excel and Word documents and floor-plan drawings. We had to be extremely careful about versions and keeping the different documents in sync. In the end, happily, it all worked out.
Friday, 11 November 2011
Running Over the WAN After Relocating a Data Centre
My current data centre relocation has us moving the data centre about 400 km away from its current location, which added 6 ms to the total round-trip latency. We implemented WAN acceleration in certain locations to address the issue, and we've learned some lessons in the process. This post is about those lessons.
We have offices all over the province, so not everyone sees the 6 ms change in latency as a negative. Many users are now closer to the data centre than they were before, and we always had users who had worse than 6 ms latency to our data centre. That gave us a lot of confidence that everything would be fine after the relocation.
However, the old data centre location was the head office, so a large number of users are now experiencing latency where they never did before, including senior management. Most of the remote sites were much smaller than head office.
The one or two issues we've had up to recently were due to our phased approach to moving. In one case we had to move a shared database server without moving all the application servers that used it. After the move, we had to do a quick move of one application server, because we discovered it just couldn't live far from its database server.
That changed recently. Like many organizations, we have shared folders on Windows file shares. Windows file shares are generally considered a performance risk for data centre relocations when latency changes. In preparation, we implemented WAN acceleration technology.
We moved the main file share, and by about 10 AM we were getting lots of calls to the help desk about slow performance. After an hour or two of measuring and testing, we decided to turn off WAN acceleration to improve the performance. Indeed, the calls to the help desk stopped after we turned off the WAN acceleration.
Analysis showed that the Windows file share was using SMB signing. SMB signing not only prevents the WAN accelerator from doing its job, but the number of log messages being written by the WAN accelerator may have actually been degrading performance to worse than an un-accelerated state.
So we turned off SMB signing, and tried again a few days later. No luck. Around 9:30 AM we started to get lots of calls, and again we turned off the WAN acceleration. We're lucky that performance is acceptable even without WAN acceleration (for the time being -- we'll need it soon).
We're still working this issue, so I don't know what the final solution is. I'll update this post when I know.
A non-technical lesson learned: If I were to implement WAN acceleration again, I'd get all the silos in a room in the planning stages, before I even bought anything. I'd make the network people, Windows administrators, and storage administrators understand each others' issues. I would have the WAN accelerator vendor and the storage device vendor at the table as well. And I'd make everyone research the topic using Google so we could find out what issues other people ran into.
Oh, and one final lesson learned: Bandwidth hasn't been an issue at all. In this day and age, 1 Gbps WAN connections are within the reach of a medium-sized organization's budget. We're finding 1 Gbps is more than enough bandwidth, even with the large data replication demands of our project. And those demands will go away once the data centre is fully relocated.
Living with Virtualization
In 2006, I was project manager on a VMware implementation for a health care organization. We virtualized 200 servers in six weeks, after a planning phase of about 2 months. Out of that experience I wondered, "Did virtualization have anything to offer a smaller business?" So I set up a box at home and converted my home "data centre" into a virtualized data centre using VMware's Server product, which was the free product at the time.
After five years it's been an interesting experience and I've learned a lot. At the end of the day, I'm pretty convinced that the small business that has a few servers running in a closet in their office doesn't have a lot to gain from virtualizing within the "closet". (I'm still a big fan of virtualization in a medium or large organization.) I'm going to switch back to running a single server with all the basic services I run (backup, file share, DNS, DHCP, NTP) on a single server image.
I had one experience where the VM approach benefited me: As newer desktops and laptops came into the house, the version of the backup client installed on them by default was newer than the backup master on my backup server (I use Bacula). Rather than play around with installing and updating different versions of the backup client or master, I simply upgraded the backup master VM to a new version of Ubuntu and got the newer version of Bacula. I didn't have to worry about what other parts of my infrastructure I was going to affect by doing the upgrade.
The downside was that I spent a lot of time fooling around with VMware to make it work. Most kernel upgrades required a recompile of the VMware tools on each VM, which was a pain. I spent a fair bit of time working through an issue with timekeeping on the guests versus the VMware host that periodically caused my VMs to slow to a crawl.
Connecting to the web management interface and console plug-in always seemed to be a bit of a black art, and it got worse over time. At the moment, I still don't think modern versions of Firefox can connect to a running VM's console, so I have to keep an old version around for when I need to do something with a VM's console (before ssh comes up).
My set-up wasn't very robust in the face of power failures. When the power went off, the VMs would leave their lock files behind. Then, when the power came back, the physical machine would restart but the VMs wouldn't. I would have to go in by hand and clean up the lock files. And often I wouldn't even know there'd been a power failure, so I'd waste a bit of time trying to figure out what was wrong. I should have had a UPS, but that wouldn't solve all the instances where something would crash leaving a lock file behind.
All in all, and even if I had automated some of that, the extra level of complexity didn't buy me anything. In fact, it cost me a lot of time.
Some of these problems would have been solved by using the ESX family of VMware products, but the license fees guarantee that the economics don't work for a small business.
I originally started out planning to give Xen a try, but it turned out not to work with the current (at the time) version of Ubuntu. Today I would try KVM. I played around with it a bit last year and it looked fine for a server VM platform. I needed better USB support, so I switched to VirtualBox. VirtualBox worked fine for me to run the Windows XP VM I used to need to run my accounting program, but it has the free version/enterprise version split that makes me uncomfortable for business use.
So my next home IT project will be to move everything back to a simpler, non-virtualized platform. I'll still keep virtualization around for my sandbox. It's been great to be able to spin up a VM to run, say, an instance of Drupal to test upgrades before rolling them out to my web site, or to try out Wordpress, or anything else I need to try.
My blog posts about interesting steps along the virtualization road are here.
Tuesday, 30 August 2011
Linuxcon 2011 Part II
I went to a lot of cloud computing-related talks at Linuxcon 2011. One of the better ones was by Mark Hinkle of cloud.com.
One of his slides showed what he considers the five characteristics of cloud computing. Two important ones for him are self service and a measured service. I think those are two useful criteria for distinguishing a real cloud from a plain VMware cluster.
One thing that was clear listening to all the talks, including Mark's, is the role of open source in the large clouds. Basically, anyone big is building their service on the open source cloud stacks. Of course, there are a number of open source cloud stacks, and one of the challenges is to pick which one to use.
Fortunately, there are serious supporters behind the three main stacks. Eucalyptus has a company called Eucalyptus Systems backing it now, headed up by Marten Mickos of MySQL fame. Cloudstack has cloud.com, which is part of Citrix. And the OpenStack project is backed by Rackspace and NASA.
One factor that seems to be important is the hypervisors supported by the cloud stack. OpenStack supports the most right now.
Something that struck me listening to the talks is that the cloud, like so much in IT, isn't a slam dunk solution by itself. You need to know what problem you want to solve, and then figure out how to use the cloud to solve it, if indeed the cloud is a solution to your problem.
Related to that insight, it's clear that unless you solve the problem of monitoring your infrastructure with Zenoss or Nagios, and of provisioning it with Puppet or the like, then you're not going to see much benefit from the cloud.
Saturday, 20 August 2011
Linuxcon 2011 Part I
Linux is 20 years old this year, and Linuxcon was in Vancouver, so I had to sign up. The conference ended yesterday. There were a lot of good speakers. As a bonus, we also got to hear some poor guy from HP give a keynote about HP's great WebOS play, at almost exactly the same time as his company was killing the product line.
What I was looking for, frankly, was a business opportunity for a small consultant/system integrator like Jade Systems to use Linux to help businesses with 1,000 servers, give or take a zero at the end. The most obvious opportunity I came away with is storage.
I've written before about the cost of enterprise storage. There are tremendous opportunities with hardware solutions like Backblaze's storage bricks, and the software that will make it all work is Gluster. Install Gluster on a couple of boxes with storage, and you have synchronous replication (locally) or asynchronous replication (over the WAN). It provides what you need to store your virtual machines and move them around your data centre as load or availability needs dictate. It can be your large, reliable, network attached storage device for all your spreadsheets and documents.
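Getting a basic replicated volume going is not a big project, either. A minimal sketch, assuming two Linux hosts that each have a spare filesystem mounted at /data/brick (hostnames and paths are placeholders):
# On server1: pool the two boxes and create a two-way replicated volume.
gluster peer probe server2
gluster volume create sharevol replica 2 server1:/data/brick server2:/data/brick
gluster volume start sharevol
# On a client with the GlusterFS client packages installed:
mount -t glusterfs server1:/sharevol /mnt/sharevol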
Gluster grew out of the needs of a supercomputing project at Lawrence Livermore Labs in 2004 and has an impressive list of users today. They're working to integrate with the OpenStack cloud computing stack to provide a complete cloud storage solution for OpenStack.
This is certainly a solution that could support a business case.
Tuesday, 5 October 2010
CFOs: Use the Cloud Now
It occurred to me that there's an easy way for CFOs and CEOs to use the cloud right now, without waiting for the IT department to touch a single piece of equipment. Here's how:
Ask your IT department how many servers and how much data you have. (Ask how much data is actually being used, not how much capacity you have.) Then, go to Amazon's site for cloud services and calculate how much it would cost to host that on Amazon. Finally, call in the CIO and ask her why your IT infrastructure budget is a lot higher than what it would cost to host on Amazon. It will be. You're asking for the whole infrastructure budget, not just the cost of the equipment.
For example, suppose you have 460 Windows servers and 200 TBs of data. Amazon has different prices for different size servers, but start by assuming all your servers are what Amazon calls "large". Your annual cost for that (October, 2010) is $2.5M. That includes 400 Mbps of network traffic into and out of the data centre 24 hours per day.
Ask your CIO what services you're getting that justify the premium you pay for having an in-house IT infrastructure department.
In reality, your CIO's no dummy. She'll be able to give you a pretty good story about why the IT infrastructure budget is so much. That's when you can use an independent IT consultant who's not owned by a company selling the infrastructure that drives up your costs. The real value comes when you start to use the benchmark cost of Amazon to identify and drive improvements in the value provided by your infrastructure department.
For example, when your CIO is talking about the services she provides, ask her when she's going to offer servers that can be spun up by a user, through a web site, with no intervention at all from the IT infrastructure group, like on Amazon? Or when the business will be able to downsize how much it's paying if it discovers that it doesn't need a large server, like on Amazon? Or when you'll start paying only for the data storage you're using, and not for a bunch of empty disk that you had to buy for "future growth", like on Amazon?
And that's how to use the cloud without changing one piece of technology.
Friday, 10 September 2010
The Cost of Storage: Reality Check
A friend pointed me at this awesome blog post from Backblaze, who sell cloud storage: Petabytes on a budget: How to build cheap cloud storage | Backblaze Blog. They build their own storage boxes based on a commodity motherboard running Linux, and standard open source software.
Backblaze gets a cost per gigabyte of under $0.12. Yes, 12 cents per GB. And that's per GB of RAID 6 storage. It's easy to find storage costing $12 or more per GB from the mainstream storage vendors -- two orders of magnitude more. The blog post also compares prices of storage. They show a price difference of up to 2,000 times!
I think there are a lot of areas of IT that are fundamentally broken. Storage is an area that is most obviously broken, and these price differences should make that clear.
What I find really interesting is Backblaze's approach. They published their hardware design in the blog post. They've open-sourced their hardware. The supplier of their cabinet is already offering the cabinet as a product because they've had so much demand. People are buying and building these boxes, and I'm sure it won't be long before lots of open source software becomes available that provides storage solutions based on this hardware.
This gives hope. In ten years, perhaps, open source will do to storage what it's doing to CPU cycles and the operating system business -- get rid of the artificial cost imposed by proprietary vendors who hoard technology.
Friday, 7 May 2010
Privacy and the Cloud
A friend pointed me at articles from the Privacy Commissioners of Canada and Ontario about cloud computing. By and large they're good articles, and they raise points that you should consider.
I want to put a bit of context around them. I don't think the cloud should be dismissed because of privacy concerns, but I wouldn't blindly jump onto the cloud, either.
The article from the Privacy Commissioner of Canada had quite a few comments that weren't directly related to privacy, and I think some of them need to be looked at.
First, the Privacy Commissioner of Canada states that cloud computing can mean an ongoing cost instead of a one-time fee. But there is no such thing as a one-time fee in computing. Your computing gear lasts three to five years; you need to replace it, and you need to service it while you own it. It's much better to treat computing costs as a monthly cost, either by using the lease price or by using the depreciation that your accountant would use.
Consumer lack of control refers to the challenge of moving from one cloud provider to another. For example, you want to take your blog from Blogger to Wordpress. It's an absolutely important point to consider with cloud computing. It's also an absolutely important point to consider when you use proprietary software (e.g. Microsoft) on your own equipment. There is a roughly equivalent amount of technical effort to switch to a different platform in either scenario.
In fact, technically you always have a way to get your data from a web site. The terms of service of the web site may prevent it, but technically you can do it. That's not always the case with a proprietary, in-house solution.
Compromising meaningful consent refers to the fact that the cloud tends towards a single provider of most services: Facebook, Google (for search), Twitter are all dominant in their sphere. However, twenty-five years of Microsoft wasn't exactly a world of diversity, either. Again, it's the monoculture that's undesirable, not the means by which we arrive at a monoculture.
Most of the Ontario Privacy Commissioner's paper is actually about identity. I am not by any means an expert on identity. I learned some interesting things from the Ontario Privacy Commissioner's paper.
One point I'd like to draw your attention to: Identity is impossible without the cloud, or at least the Internet. Most of the effective, practical identity mechanisms rely on a trusted third party. I believe the experts can demonstrate that this is required. You need the Internet to get to the trusted third party, and that third party is effectively a cloud service.
(What I mean by "practical" in the previous sentence is to rule out the public/private key approaches that work, but are too much of a pain for even most geeks to use.)
Finally, I want to step away from the privacy commissioners and talk about one aspect of the cloud debate: Many IT people are reluctant to embrace the cloud. Here is an example of IT backlash against the cloud. It's important to remember that IT jobs will disappear as users migrate to the cloud. If you work in a 4,000 person organization you probably have a couple of people working full-time to support Exchange (the back end of your e-mail system). If your organization used gmail, they wouldn't be needed.
What's that got to do with privacy? Well, it affects the cases that the IT experts bring forward. For example, you'll hear about the Chinese infiltration of gmail (an attack on a cloud service), but you won't be reminded about the Chinese attacks on Tibetan nationalists and supporters, which were primarily about compromising people's personal computers.
I know that Google has way smarter people than me working on security, and they do it full time. I think I have a reasonably secure network, but I don't even have time to monitor it to see if I'm being compromised. Security and privacy will be a differentiating factor in the evolution of cloud providers. The market advantage will go to those who provide the level of privacy their customers desire.
In the proprietary, self-hosted world, security and privacy are usually the last thing that gets any resources, because the competitive pressures are always something else.
Sunday, 4 April 2010
Looking for IP Addresses in Files
I've moved a couple of data centres. And I've virtualized a lot of servers. In all cases, the subnets in which the servers were installed changed. If anything depends on hard-coded IP addresses, it's going to break when the server moves.
The next data centre I move, I'm going to search all the servers for files that contain hard-coded IP addresses. The simplest thing to do for Linux and Unix is this:
egrep -R "\b([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}\b" root_of_codeThe regular expression matches one to three digits followed by a "." exactly three times, then matches one to three digits, with word boundaries at either end.
That's not the most exact match of an IP address, because valid IP addresses won't have anything higher than 255 in each component. This is more correct:
egrep -R "\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b" /!(tmp|proc|dev|lib|sys) >/tmp/ips.outIt yields about two percent fewer lines when scanning a Linux server (no GUI installed). (Thanks to this awesome site for the regular expression.)
When I run the above egrep command from "/", I have problems. There are a few directories I had to exclude: /tmp, /proc, /dev, /lib and /sys. I used this file pattern match to get all the files in root except those directories:
/!(tmp|proc|dev|lib|sys)
The reason I wanted to exclude /tmp is that I wanted to put the output somewhere. /tmp is a good place, and by excluding it I didn't end up writing to the file while reading it. /sys on a Linux server has recursive directories in it. /proc and /dev have special files in them that cause egrep to simply wait forever. /lib also caused egrep to stop, but I'm not sure why (apparently certain combinations of regular expressions and files cause egrep to take a very long time -- perhaps that's what happened in /lib.)
I'll write about how to do this for Windows in another post. I'll also write about how to do it across a large number of servers.
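In the meantime, here's a minimal sketch of one way to fan the scan out over ssh. It assumes key-based ssh access, placeholder hostnames, and that the egrep command above is saved in a local script called ipscan.sh (with shopt -s extglob at the top so the /!(...) pattern works under bash):
#!/bin/sh
# Copy the scan script to each server, run it, and collect the results locally.
for host in server1 server2 server3; do
    scp ipscan.sh "$host":/tmp/ipscan.sh
    ssh "$host" 'bash /tmp/ipscan.sh' > "ips-$host.out"
done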
Friday, 2 April 2010
The Cost of Storage
Over the years I've seen SAN storage cost between C$10 and C$20 per GB (C$ is approximately equal to US$ right now). This is the cost of the frame, a mix of disks, redundant director-class fibre channel switches with a number of 48 port cards in each switch, management computer, and a variety of management and replication software. The cost doesn't include the HBAs in the servers, or cabling.
The price above is for a raw GB, before you account for the capacity lost to whatever RAID levels you apply.
The management and replication software in all the above cases was the basic stuff you need to manage a SAN and replicate it. There was no fancy de-duplication or information lifecycle management going on.
The costs above also didn't include the cost of training dedicated storage staff to set up and manage a SAN, or the higher salary you'll have to pay to keep them after you train them.
Compare that to direct attached storage: Right now I can get a 1TB drive for less than C$300, or about 30 cents per GB. If I put that in a RAID 1 configuration with a RAID controller (less than $400 easily) you would still be paying less than $1 per GB.
I get RAID 1 storage for an order of magnitude cheaper than raw storage on a SAN. No need for special management software. You've got rsync for replication. You can use standard tools that everyone knows how to use.
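The back-of-the-envelope arithmetic, using the street prices above:
# Mirrored direct-attached storage: two 1 TB drives plus a RAID controller.
drive_cost=300; controller_cost=400; usable_gb=1000
echo "$drive_cost $controller_cost $usable_gb" | awk '{ printf "cost per usable GB = $%.2f\n", (2*$1 + $2) / $3 }'
# Prints: cost per usable GB = $1.00 -- against $10 to $20 per raw GB on the SAN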
No wonder Google uses direct-attached storage in commodity servers for their index. It's just way more cost-effective. What's your business case for SAN-attached storage?
Monday, 15 June 2009
Open Web Vancouver 2009
I went to Open Web Vancouver 2009 last week. It's a two day, low-key conference about open technologies for developers, testers and others at that level of the business. It's a very well-run, well-attended and interesting conference, and very inexpensive.
The most interesting thing I heard about was PhoneGap. It's JavaScript that runs on all the major smart phones, so you have fewer cross-platform issues. And it gives web applications access to some of the functionality in the phone not normally accessible to a web application. On the iPhone, this means the current location and vibration.
There was a good workshop with City of Vancouver staff about their recent direction to open up the city's data, as well as moving to open standards and open source software. The first priority is the data. They're hoping that people will take the City's data and mash it up in useful ways. There's a Google Group about this at http://groups.google.com/group/vancouver-data.
18 months ago there was a lot of stuff about Ruby on Rails at this conference. This time the Drupal community was big. There was a presentation from Momentum magazine about how a volunteer built their website in Drupal. I thought they'd found money to have a professional develop the site, it's so good.
And Mozilla Messaging (i.e. Mozilla Thunderbird) is based in Vancouver. Who knew?
Thursday, 12 February 2009
"Architecture" Gets a Time-Out
Anne Thomas Manes published a great post about how the economic downturn means the end of big IT projects being done for some vaguely defined "value". There's a good podcast about it here.
I think the key messages are:
- IT can provide spectacular gains in business value only by transforming the IT department (people, processes and technology), not by changing the interface engine between the same set of disparate applications
- SOA is dead not because the technology is necessarily wrong, but because businesses aren't going to spend money on projects that don't provide immediate business value
- IT provides value by understanding the business and providing what the business needs. That's not a simple process and is fraught with political minefields within and outside the IT department
Friday, 18 July 2008
Why IT's So Hard
Why is providing reliable IT infrastructure so hard? Here's a good example.
There was a fire in downtown Vancouver this week that knocked out power to a good part of downtown for up to three days. Angela noted that the Internet was slow the day of the fire. I know there's a major network hub in the area of the fire at Harbour Centre, and I suspected that something had gone wrong there, despite all the precautions that would have been taken. Now I have proof.
The fire knocked out power to the network hub, and the generator kicked in as planned, but the Vancouver Fire and Rescue Services were sucking so much water to fight the fire that the generator had to shut down because it wasn't getting enough cooling water. Not only was that hard to predict, it would have been really hard to test -- I suppose the Fire Department would have loved an excuse to play with their hoses, but I'm not sure the City would have wanted them to run a test that tried to use up all the water in downtown Vancouver.
Friday, 18 April 2008
Securing DNS/bind/named
This is another late posting of some notes from when I built new infrastructure servers on VMs to replace the aging PowerPC Macs that ran my network.
The security guide I was pointed to when my ISP told me I had a badly configured name server requires that you create a /var/named directory:
sudo mkdir /var/named
# Give the bind group ownership and write access to named's working directory.
sudo chgrp bind /var/named
sudo chmod 770 /var/named
# setgid so files created inside stay group-owned by bind.
sudo chmod g+s /var/named
sudo mkdir /var/log/named
sudo chmod 770 /var/log/named
sudo chmod g+s /var/log/named
Wednesday, 9 April 2008
Installing Bacula
To install bacula with MySQL (after you do this):
sudo apt-get install mysql-server bacula-director-mysql
Then you have to set up exim4, the mail system. Choose:
mail sent by smarthost; no local mail
After you install the MySQL version of the bacula director, you can install the rest of bacula this way, and also install some recommended packages:
sudo apt-get install bacula
sudo apt-get install dds2tar scsitools sg3-utils
I had these notes from an earlier set-up of exim4:
Look into setting up /etc/aliases later to redirect mail to more useful places. Also, make sure the domain of the outgoing address is one known to the outside world (e.g. jadesystems.ca) or the SMTP server will probably reject the message.
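A minimal example of the kind of /etc/aliases redirection I mean (the "admin" address is a placeholder; Debian/Ubuntu's default exim4 configuration reads /etc/aliases directly):
# Send root's local mail to an address a human actually reads.
echo "root: admin@jadesystems.ca" | sudo tee -a /etc/aliases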
Bacula: Backups
To install bacula on Ubuntu, you need to add the universe repositories to /etc/apt/sources.list. It's just a matter of uncommenting four lines:
deb http://ca.archive.ubuntu.com/ubuntu/ dapper universe
deb-src http://ca.archive.ubuntu.com/ubuntu/ dapper universe
...
deb http://security.ubuntu.com/ubuntu dapper-security universe
deb-src http://security.ubuntu.com/ubuntu dapper-security universe
Then:
sudo apt-get update
The standard install of bacula uses SQLite, which the bacula guy reports as having problems...
Tuesday, 8 April 2008
Copying VMs
I tried copying my tiny Ubuntu VM, and it ran, except eth0 wouldn't come up, and of course the host name was wrong.
To fix eth0, you have to update /etc/iftab with the new VMWare-generated MAC address for the Ethernet interface. I added a script to the base VM in /usr/local/sbin/changemac to make it easier:
sudo vi /usr/local/sbin/changemac
And add:
#!/bin/sh
# Pull the MAC address VMware generated for eth0 out of ifconfig and rewrite /etc/iftab to match.
mac=`ifconfig -a | grep "HWaddr" | cut -d " " -f 11`
echo "eth0 mac $mac arp 1" > /etc/iftab
Then do:
sudo chmod u+x /usr/local/sbin/changemac
Note that you're adding the script to the "template" VM, so you'll only have to create the script once for each template you create, not each time you create a new VM.
Now you can copy the "template" VM. Make sure the "template" VM isn't running. Log in to the VMWare host, change to the directory where you have the VMs, and copy the VM:
cd /usr/local/vmware/Virtual\ Machines
sudo cp -R --preserve=permissions,owner old_VM_directory new_VM_directory
Now in the VMWare console:
- Import the new VM and start it.
- Log in at the console and run /usr/local/sbin/changemac.
- Change /etc/hostname, /etc/dhcp3/dhclient.conf, and /etc/hosts to have the host name you want for the new machine.
- Reboot.
If you forget to change the host name in /etc/dhcp3/dhclient.conf the first time around:
- Change it
- Type sudo date and then enter your password. This is just to make sure that sudo isn't going to prompt you for passwords
- Type sudo ifdown eth0 && sudo ifup eth0
Monday, 7 April 2008
Firewall on the VM Quick Reference
Here's how to set up the firewall on the VMs. This is my /etc/iptables.rules:
*filter
:INPUT ACCEPT [273:55355]
:FORWARD ACCEPT [0:0]
:LOGNDROP - [0:0]
:OUTPUT ACCEPT [92376:20668252]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
# Accept SSH so we can manage the VM
-A INPUT -i eth0 -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -i lo -j ACCEPT
# Allow ping (Zenoss uses it to see if you're up).
-A INPUT -p icmp --icmp-type echo-request -j ACCEPT
# Allow SNMP.
-A INPUT -p udp -s 0/0 --sport 1024:65535 --dport 161:162 -j ACCEPT
# Silently block NetBIOS because we don't want to hear about Windows
-A INPUT -p udp --dport 137:139 -j DROP
-A INPUT -j LOGNDROP
# Drop and log the rest.
-A LOGNDROP -p tcp -m limit --limit 5/min -j LOG --log-prefix "Denied TCP: " --log-level 7
-A LOGNDROP -p udp -m limit --limit 5/min -j LOG --log-prefix "Denied UDP: " --log-level 7
-A LOGNDROP -p icmp -m limit --limit 5/min -j LOG --log-prefix "Denied ICMP: " --log-level 7
-A LOGNDROP -j DROP
COMMIT
More on this later.
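In the meantime, here's how rules like these typically get loaded, as a sketch for Debian/Ubuntu-style networking (the interface stanza name is an assumption):
# Load the rules once by hand:
sudo iptables-restore < /etc/iptables.rules
# To reload them whenever eth0 comes up, add this line to the eth0 stanza in /etc/network/interfaces:
#   pre-up iptables-restore < /etc/iptables.rules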