Wednesday, January 25, 2012

Security Opportunity in the BYOD World

I recently came across a post by Floyd Strimling at Zenoss titled 5 Cloud Predictions for 2012 – Private Clouds, PaaS, & Hypervisor Uptick.  Prediction number 3 in particular got my attention:
"You’ve been hearing a lot about BYOD (bring your own device) and employees using free cloud providers for services in the name of productivity outside of the purview of internal IT. In 2012, we will see a large Enterprise experience a breach because of these so-called productivity enhancing cloud applications. An innocent act by a group of users will set this movement back."
My employer is an SMB that takes advantage of both BYOD and free cloud-based services. The drivers in our case are agility and cost savings, though the order may vary depending on whom you ask. The pace of business and the resource constraints of the average SMB often preclude following proper risk analysis practices when it comes to security. As a result, a real understanding of the ALE (annualized loss expectancy) and ROI of implementing stronger security measures is lost.

The same is likely true in the enterprise, where departments and teams often exhibit SMB behaviors in order to meet their goals. This practice brings us to the real opportunity raised by the prediction: defining and developing frameworks, software and/or services that can be quickly and cheaply plugged into the enterprise to allow for more secure adoption of mobile/cloud productivity services. Making the protection and integration seamless for the user will be key to adoption. A startup that can crack this nut will be in a great position for the next decade.

Thursday, June 18, 2009

F5 BigIP Email Alerts

If you do not have a monitoring package that ties into your F5 LTM via SNMP, email notifications provide a decent alternative. You can configure the BIG-IP 9 software to notify an email address or alias about the alerts that concern you.

The first step is to configure Postfix on your LTM devices. F5 provides a step-by-step doc in SOL3664 on the AskF5 site. Essentially, configuring Postfix boils down to 3 things.

1. edit /etc/postfix/main.cf. The F5 solution provides one configuration; I recommend reviewing the options in the config and modifying main.cf to match the specifics of your mail infrastructure. Postfix is a popular mail server, so there is a lot of useful information out in the Googleverse.

2. start up Postfix
  • # bigstart start postfix
or
  • From the Main tab of the BIG-IP Configuration utility, click System
  • Click Services
  • Select the box next to postfix
  • Click the Start button
3. test your setup
  • # echo test | mail <recipient address>
  • View the mail queue to ensure the message was sent by typing the following command: mailq
  • To send any unsent mail, type the following command: postfix flush
An optional, but recommended, step is to create an email alias for the team members you want to receive the notifications.
  • Edit the /etc/postfix/aliases file.
  • Add a line to the end of the file with your alias info.
    • example: pool-alarms: someone@support.com, otherguy@support.com, metoo@support.com
  • Run the command newaliases to rebuild the aliases database Postfix uses
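Before running newaliases, it can help to sanity-check the alias line you are about to append. The Python sketch below is a hypothetical helper (not part of BIG-IP or Postfix) that builds a line in the "name: addr1, addr2" form shown above and parses it back. The address check is deliberately crude and is only an assumption about what counts as suspicious; Postfix does its own parsing.

```python
def format_alias(name, recipients):
    """Build one /etc/postfix/aliases entry in the 'name: addr1, addr2' form."""
    if not recipients:
        raise ValueError("an alias needs at least one recipient")
    for addr in recipients:
        # Crude, hypothetical sanity rule; not Postfix's real syntax check.
        if "@" not in addr or "," in addr or " " in addr:
            raise ValueError(f"suspicious address: {addr!r}")
    return f"{name}: " + ", ".join(recipients)


def parse_alias(line):
    """Split an aliases line back into (name, [recipients])."""
    name, _, rest = line.partition(":")
    return name.strip(), [a.strip() for a in rest.split(",") if a.strip()]


if __name__ == "__main__":
    line = format_alias("pool-alarms", ["someone@support.com", "otherguy@support.com"])
    print(line)  # pool-alarms: someone@support.com, otherguy@support.com
    print(parse_alias(line))
```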
Once mail is functioning on the LTM device, the alerts can be set up. F5 offers two ways to do this: modifying the standard alerts (SOL3667) and creating custom alerts (SOL3727). I will focus on the first option here. One note on the custom alert option: if your custom alert covers a subset of an event already covered by a standard alert, it will not generate an alert. For example, if you create an alert for a specific pool member going down, the general pool-member-down alert will capture the event first and the custom alert will not fire. If there is a way around this, I have not found it yet.

To configure the standard alerts for email notifications do the following:
1. back up the /etc/alertd/alert.conf file by typing the following command:
cp /etc/alertd/alert.conf /etc/alertd/alert.conf.bak
2. edit the /etc/alertd/alert.conf file
From SOL3667
This file consists of numerous alert definitions in the following format:

alert ALERT_NAME {
snmptrap OID=""
}

The alert definitions may appear similar to the following example:

alert BIGIP_BIGPIPE_BP_CONFIGURATION_LOADED {
snmptrap OID=".1.3.6.1.4.1.3375.2.4.0.28"
}

Modify the alert definition for each alert that you want to receive an email as follows:

email toaddress=""
fromaddress=""
body=""

Important: Alert entries must be separated with a semi-colon ( ; ) character. You must add a semi-colon to the end of the line for the previous alert entry.

In the following example, the previous alert entry is an snmptrap entry. The modified alert sends an email using the email toaddress, fromaddress, and body options:

alert BIGIP_BIGPIPE_BP_CONFIGURATION_LOADED {
snmptrap OID=".1.3.6.1.4.1.3375.2.4.0.28";
email toaddress="demo@askf5.com"
fromaddress="root@bigip1.askf5.com"
body="The test of this Solution worked!"
}

Note: You can send the email to multiple recipients by separating the email addresses with a comma ( , ) character, as shown in the following example:

email toaddress="demo@askf5.com,demo2@askf5.com"
3. Save and exit the file
4. Restart alertd by issuing the command:
# bigstart restart alertd
5. test your settings. The simplest way to do this is to trigger the alert for real. For the example above, running bigpipe load should generate a message.
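The semicolon rule in step 2 is the easiest thing to get wrong when editing alert.conf by hand. As a hedged illustration, the Python function below (a hypothetical helper, not an F5 tool) takes one alert definition in the format quoted from SOL3667 and appends an email action, adding the ; to the previous entry if it is missing. It assumes the definition's last line is the closing brace and that the entry just before it is the one that needs the separator; it does not try to parse the whole file.

```python
def add_email_action(block, to_addrs, from_addr, body):
    """Append an email action to one alert.conf definition.

    Entries inside an alert are separated by ';', so the entry just
    before the new email action gets one if it lacks it.
    """
    lines = block.strip().splitlines()
    if len(lines) < 3 or lines[-1].strip() != "}":
        raise ValueError("expected 'alert NAME {', at least one entry, and '}'")
    prev = lines[-2].rstrip()
    if not prev.endswith(";"):
        lines[-2] = prev + ";"
    email = [
        f'email toaddress="{",".join(to_addrs)}"',  # multiple recipients: comma-separated
        f'fromaddress="{from_addr}"',
        f'body="{body}"',
    ]
    return "\n".join(lines[:-1] + email + ["}"])


if __name__ == "__main__":
    original = (
        "alert BIGIP_BIGPIPE_BP_CONFIGURATION_LOADED {\n"
        'snmptrap OID=".1.3.6.1.4.1.3375.2.4.0.28"\n'
        "}"
    )
    print(add_email_action(original, ["demo@askf5.com"],
                           "root@bigip1.askf5.com", "The test of this Solution worked!"))
```

For these arguments the output matches the modified BIGIP_BIGPIPE_BP_CONFIGURATION_LOADED example in step 2.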

Another option is to use the logger command, which lets you write messages to syslog-ng directly. This may be the preferable method in production environments for testing things like pool member offline alerts, since nothing actually has to go down. To use logger, do the following:
  • find the syslog message string in the alert you configured for email notification. For example, in alert BIGIP_MCPD_MCPDERR_NODE_ADDRESS_MON_DOWN "Node (.*?) monitor status down." the quoted section is the syslog message.
  • replace the (.*?) regular expression with valid information from your config. For the example above, choose the IP address of one of the monitored nodes.
    • logger -p local0.warning "Node 10.10.10.10 monitor status down."
  • The command writes a syslog message with facility local0 and priority warning, and an SNMP trap will be generated. This event should trigger the alertd email.
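If the test message differs from the alert's pattern by even one character, alertd will not match it, so you might generate the logger command from the pattern rather than typing it. A minimal Python sketch, assuming the pattern contains exactly one (.*?) group as in the example above; the function name is hypothetical. shlex.quote wraps the message in single quotes, which the shell treats the same as the double quotes used above.

```python
import re
import shlex


def logger_command(alert_pattern, value, priority="local0.warning"):
    """Fill the single (.*?) group of an alertd message pattern and
    return a logger command line whose message matches that pattern."""
    if alert_pattern.count("(.*?)") != 1:
        raise ValueError("this sketch only handles patterns with one (.*?) group")
    message = alert_pattern.replace("(.*?)", value)
    # Confirm the concrete message still matches the alert's regex.
    if not re.fullmatch(alert_pattern, message):
        raise ValueError(f"{message!r} does not match {alert_pattern!r}")
    return f"logger -p {priority} {shlex.quote(message)}"


if __name__ == "__main__":
    print(logger_command("Node (.*?) monitor status down.", "10.10.10.10"))
    # logger -p local0.warning 'Node 10.10.10.10 monitor status down.'
```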

Tuesday, June 9, 2009

Blade Server Networking Basics pt. 2

Part 1 of this entry discussed some architectural basics for setting up a blade environment. Part 2 will discuss some of the configuration details required to make the designs in Figure 1 and Figure 2 work.
As with any network build, you will first need to figure out how you plan to segment your environment. This will help determine your VLAN and subnet needs. For this discussion we will keep things fairly simple: 3 VLANs.
  1. VLAN20 - server access on 10.0.20.0/24 (NAT at upstream firewall)
  2. VLAN172 - private network for backups on 172.16.0.0/24
  3. VLAN100 - private network for device management on 10.0.100.0/24
To get things rolling you will need to assign IPs to and configure the management modules of your chassis. Depending on the vendor, this may require one IP per module (primary/secondary config) or just a single IP (active/standby config) that moves if the active module fails. Refer to the specific configuration instructions for the vendor's chassis. Once management access is set up, you can set up the blade switch modules for IP access. Again, how this is achieved depends on the device, but you will want to use an IP from the same range (10.0.100.0/24 in this case) for the internal connection. Setting things up this way provides network access (SSH or telnet) to configure the switch through the chassis' management port in the event connectivity to the blade switch's external ports is lost. Continuing with our hardware from Part 1, we can now begin to configure the switch via the Cisco CLI.
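Before handing out addresses to the management modules and switches, it can be worth checking the plan mechanically. The Python sketch below uses the standard ipaddress module to confirm the three subnets from the list above do not overlap and to report which VLAN a given address falls in. The address 10.0.100.11 is a hypothetical example, not one from this setup.

```python
import ipaddress

# Addressing plan from the list above (VLAN ID -> subnet).
PLAN = {
    20: ipaddress.ip_network("10.0.20.0/24"),    # server access
    172: ipaddress.ip_network("172.16.0.0/24"),  # backups
    100: ipaddress.ip_network("10.0.100.0/24"),  # device management
}


def overlaps(plan):
    """Return pairs of VLAN IDs whose subnets overlap (should be empty)."""
    items = sorted(plan.items())
    return [
        (a, b)
        for i, (a, net_a) in enumerate(items)
        for b, net_b in items[i + 1:]
        if net_a.overlaps(net_b)
    ]


def vlan_for(ip, plan):
    """Return the VLAN whose subnet contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    return next((vid for vid, net in plan.items() if addr in net), None)


if __name__ == "__main__":
    print(overlaps(PLAN))                 # []
    print(vlan_for("10.0.100.11", PLAN))  # 100 (hypothetical switch mgmt IP)
```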

The first thing to do is configure an interface for the switch on another VLAN. This provides a second point of access to the switch when needed.
interface Vlan20
ip address 10.0.20.20 255.255.255.0
no ip route-cache

The IBM BladeCenter H in this example has 14 slots for blades, with servers in the first 4 slots. To start, I recommend shutting down all the ports for the unused blade slots. This prevents you from potentially hosing your network when plugging in new servers.
ciscoswitch# conf term
ciscoswitch(config)# int range gi0/5-14
ciscoswitch(config-if)# shut
Next, configure the port-channel that will connect to the external 3750 switches. Let's assume we are working with the model illustrated in Figure 1. (The main setup difference for the other design is that 2 x 2-port port-channels would be created rather than the 1 x 4-port port-channel in this example.) The external ports for the Cisco blade switch are numbered 17-20. (Notice that ports 15 and 16 are skipped? These ports are for the internal connectivity between the management modules and the blade switches.) The following shows a sample config for a port-channel using ports 17-20.
interface Port-channel6
switchport trunk allowed vlan 20,172
switchport mode trunk
switchport nonegotiate
link state group 1 upstream
!
...
!
interface GigabitEthernet0/17
description To-3750-Top-25
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 20,172
switchport mode trunk
switchport nonegotiate
channel-group 6 mode active
! link-type point-to-point is recommended for rapid-PVST+ mode only
spanning-tree link-type point-to-point
!
interface GigabitEthernet0/18
description To-3750-Top-26
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 20,172
switchport mode trunk
switchport nonegotiate
channel-group 6 mode active
spanning-tree link-type point-to-point
!
interface GigabitEthernet0/19
description To-3750-Top-27
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 20,172
switchport mode trunk
switchport nonegotiate
channel-group 6 mode active
spanning-tree link-type point-to-point
!
interface GigabitEthernet0/20
description To-3750-Top-28
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 20,172
switchport mode trunk
switchport nonegotiate
channel-group 6 mode active
spanning-tree link-type point-to-point
...

A matching config will need to be created on the 3750 (top location, ports 25-28) to complete the port-channel. The 3750 config should not include the link state group 1 upstream statement in the port-channel config though. The reason will be explained shortly.
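Four nearly identical stanzas (plus a matching set on the 3750) invite copy-and-paste drift, and the trunk settings on every member should agree with the port-channel interface. One way to keep them consistent is to generate them. The Python sketch below is a hypothetical helper that prints the blade-switch member stanzas in the same form as above, using a single VLAN list for all of them. The port numbers, channel-group 6 and the description naming come from the example; the output is meant to be reviewed, not pasted blindly.

```python
def member_stanza(port, peer_port, group=6, vlans="20,172"):
    """One blade-switch uplink member, mirroring the stanzas above."""
    return "\n".join([
        f"interface GigabitEthernet0/{port}",
        f"description To-3750-Top-{peer_port}",
        "switchport trunk encapsulation dot1q",
        f"switchport trunk allowed vlan {vlans}",
        "switchport mode trunk",
        "switchport nonegotiate",
        f"channel-group {group} mode active",
        "spanning-tree link-type point-to-point",
        "!",
    ])


def uplink_members(first_port=17, first_peer=25, count=4, group=6, vlans="20,172"):
    """Blade ports 17-20 paired with 3750 ports 25-28 by default."""
    return "\n".join(
        member_stanza(first_port + i, first_peer + i, group, vlans)
        for i in range(count)
    )


if __name__ == "__main__":
    print(uplink_members())
```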

Now we are ready to configure the ports connecting the servers. For this example we are using a standard IBM HS21 blade that has two NICs, one to each blade switch. I want high availability for these servers but also need to connect to two different subnets. To accomplish this, configure the ports as trunks that have access to both VLANs. On the servers, use the NIC vendor's teaming software (Windows) or NIC bonding and 802.1q tagging (Linux) to create virtual interfaces built from NIC pairs that pass VLAN-tagged traffic.

The following is an example of the blade switch config for ports 1-4.
interface GigabitEthernet0/1
description blade1
switchport trunk allowed vlan 20,172
switchport mode trunk
link state group 1 downstream
spanning-tree portfast trunk
spanning-tree bpdufilter enable
!
interface GigabitEthernet0/2
description blade2
switchport trunk allowed vlan 20,172
switchport mode trunk
link state group 1 downstream
spanning-tree portfast trunk
spanning-tree bpdufilter enable
!
interface GigabitEthernet0/3
description blade3
switchport trunk allowed vlan 20,172
switchport mode trunk
link state group 1 downstream
spanning-tree portfast trunk
spanning-tree bpdufilter enable
!
interface GigabitEthernet0/4
description blade4
switchport trunk allowed vlan 20,172
switchport mode trunk
link state group 1 downstream
spanning-tree portfast trunk
spanning-tree bpdufilter enable
!
The most important part of this config is the statement link state group 1 downstream. This statement is paired with the upstream statement in the port-channel config, and together they monitor the status of the port-channel. If the entire port-channel goes down (i.e. all 4 links are lost, perhaps to a 3750 failure), all of the ports in group 1 downstream are brought down as well. Why would I want to do this? At any one time, the teamed NICs of the servers may be passing traffic across either switch; which switches, and how much traffic, depends on whether the teaming mode is load balancing or failover. Teaming/bonding monitors NIC health at the link level. Without the link state changes, the teaming software will not see any change in upstream connectivity and will continue to pass traffic through this port. This will result in intermittent connectivity (load balancing method) or possibly complete loss of connectivity (failover method) to the server. With the link state groups set up correctly, the server may experience only a few dropped pings at worst.
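The failure case above can be made concrete with a toy model. The Python sketch below is a simplification, not a description of IOS internals: it assumes a failover-mode team where eth0 connects to blade switch A and is preferred while its link is up, eth1 connects to blade switch B, and the team sees only the link state of its blade-facing port. It shows why, without link state tracking, the team keeps sending traffic into a switch that has lost all of its uplinks.

```python
def blade_port_up(uplinks_up, tracking):
    """A blade-facing port stays up unless tracking is on and every
    upstream port-channel member is down."""
    return not tracking or any(uplinks_up)


def failover_team(uplinks_a, uplinks_b, tracking):
    """Failover-mode team: eth0 -> blade switch A, eth1 -> blade switch B.

    Returns (active_nic, server_reachable)."""
    uplinks = {"eth0": uplinks_a, "eth1": uplinks_b}
    for nic in ("eth0", "eth1"):  # eth0 is preferred while its link is up
        if blade_port_up(uplinks[nic], tracking):
            return nic, any(uplinks[nic])
    return None, False


if __name__ == "__main__":
    a_lost = [False] * 4  # all 4 members of switch A's port-channel down
    b_ok = [True] * 4
    print(failover_team(a_lost, b_ok, tracking=False))  # ('eth0', False)
    print(failover_team(a_lost, b_ok, tracking=True))   # ('eth1', True)
```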

Friday, June 5, 2009

Blade Server Networking Basics pt. 1

Blade technology has permeated the datacenter ecosystem. I consider these chassis to be the IT generalist's "dream machines" since they combine a little bit of everything into one neat package. Actually, pairing a blade environment with virtualization should make any techhead salivate... but I digress.

Blades present some opportunity (and need) to apply a little more network savvy than the typical VLAN|port|plug-and-play server setup. The concepts and settings I discuss below come from experience with IBM BladeCenter and HP BladeSystem configurations, but I'm sure they can be applied to offerings from Dell and the like.

The first difference right out of the gate is how the physical NICs of the blade are handled. Since there is no port on the server to plug into, connectivity must be handled through the chassis. The chassis provides hardwired connections from the blade slots in the front to the module bays in the back. These hardwired routes can be used to provide Ethernet and/or Fibre Channel connections to the outside world and between blades. For example, for the blade in slot 1, blade port 1 maps to port 1 in module bay 1, blade port 2 maps to port 1 in module bay 2, and so on, depending on how many bays the chassis provides. What is important to note is what type of device is installed on which port of your server blade. For instance, a common configuration of the IBM HS21 blade comes with 2 NIC ports and 2 HBA ports. These are installed as the first 4 ports of the blade, with the 2 NICs first. When configuring the chassis, you will want to install the appropriate modules in the appropriate bays to match this port layout. Also, make sure future blade purchases match this configuration. *** Note: There are options available, such as HP VirtualConnect switches, which provide for user-defined mapping of ports to blades. These will not be discussed here.***
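To make the mapping rule concrete, here is a small Python sketch of the simplified scheme described above: blade port N goes to module bay N, landing on the internal port that matches the blade's slot. It also checks that the module in each bay matches the blade's port type. The module type labels are hypothetical, and real chassis (especially ones with high-speed bays or VirtualConnect) can map ports differently.

```python
# Port layout of the HS21 configuration described above: 2 NICs, then 2 HBAs.
HS21_PORTS = ["ethernet", "ethernet", "fibre", "fibre"]


def module_port(slot, blade_port):
    """Simplified mapping: blade port N lands in module bay N,
    on the internal port numbered after the blade's slot."""
    return blade_port, slot


def bay_mismatches(blade_ports, bay_modules):
    """List blade ports whose module bay holds the wrong (or no) module."""
    problems = []
    for port, kind in enumerate(blade_ports, start=1):
        bay = port  # blade port N is hardwired to module bay N
        installed = bay_modules.get(bay)
        if installed != kind:
            problems.append(f"blade port {port} ({kind}) -> bay {bay}: {installed or 'empty'}")
    return problems


if __name__ == "__main__":
    bays = {1: "ethernet", 2: "ethernet", 3: "fibre", 4: "fibre"}
    print(bay_mismatches(HS21_PORTS, bays))  # []
    print(module_port(3, 2))                 # (2, 3): bay 2, internal port 3
```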

Two ways to connect the server to the outside world are via a pass-thru port or a switch module. Most vendors offer a pass-thru port option, which is basically a way to extend the hardwired connection from the blade slot to a standard RJ45 port. Personally, I have never used this type of connectivity and find the concept somewhat limiting (though there may be uses I have not thought of).

The switch module alternative provides more flexibility and offers familiar configuration options from vendors like Cisco, Nortel and HP. There are many choices and capabilities out there, but for the purposes of this post I'll stick with the Cisco 4-port Gb Ethernet Switch Module. Switch modules of this ilk provide 4 external 10/100/1000Mb copper ports and 1 internal port for each blade slot. As mentioned above, the chassis bay the switch module is installed into must match up with the NIC ports on the blade. If you do not see link on the blade ports, you've likely installed the modules in the wrong bays. The management interface of the chassis should also report an error.

Continuing with the physical setup, Figures 1 and 2 below show some options for connecting the external ports to a redundant switch infrastructure. In these methods throughput is maximized by aggregating the 4 ports per switch via EtherChannel. The resulting port-channel is then configured for 802.1q trunking to pass the VLAN traffic configured for the blades (more on that later).

The configuration illustrated in Figure 1 shows an IBM BladeCenter H connected to a stacked pair of Cisco 3750G switches. A stacking cable is used in this instance, but a port-channel between the switches can work as well. This setup uses a 1-to-1 HA pairing between the BC switches and the 3750s. If any one of the devices fails, all traffic will flow across the other path. The upside of this config is that it's easy to trace, which makes troubleshooting simpler. The downside is there may be a brief interruption to some servers as the path changes (this depends on how teaming is configured for the server NICs).

Figure 1

Figure 2 is similar to the previous example except that it splits each switch module's connections between the two 3750s. Note that this setup will only work with a stacking cable, since the stack makes the 2 switches behave as a single unit; a single port-channel cannot be configured to span two separate switches. The advantage of this configuration is that the loss of one 3750 will not impact the downstream servers. Both blade switches will continue to function over their two remaining ports. The downside is a slightly more complicated implementation. It is important to document and label carefully to keep the port mappings straight.

Figure 2
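The trade-off between the two figures comes down to how many uplinks each blade switch keeps when one 3750 fails. The short Python sketch below counts them, assuming 4 uplinks per blade switch wired as described for each figure; the switch names are hypothetical labels.

```python
# Uplinks from each blade switch to each 3750, per the two figures.
# Switch names are hypothetical labels.
FIGURE_1 = {"bcsw-1": {"3750-1": 4}, "bcsw-2": {"3750-2": 4}}
FIGURE_2 = {
    "bcsw-1": {"3750-1": 2, "3750-2": 2},
    "bcsw-2": {"3750-1": 2, "3750-2": 2},
}


def surviving_uplinks(wiring, failed_switch):
    """Uplinks each blade switch keeps when one 3750 fails."""
    return {
        bcsw: sum(n for peer, n in links.items() if peer != failed_switch)
        for bcsw, links in wiring.items()
    }


if __name__ == "__main__":
    print(surviving_uplinks(FIGURE_1, "3750-1"))  # {'bcsw-1': 0, 'bcsw-2': 4}
    print(surviving_uplinks(FIGURE_2, "3750-1"))  # {'bcsw-1': 2, 'bcsw-2': 2}
```

In the Figure 1 wiring one blade switch is left with no uplinks (so its servers must fail over), while in Figure 2 both keep half their bandwidth.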


This concludes part 1 of this discussion. Part 2 will cover how to configure the blade switches and servers to take advantage of the configurations illustrated above.

Tuesday, June 2, 2009

Linux 802.1q Tagging

In a previous post I discussed the process of setting up Linux NIC bonding. This is a great way to provide high availability to your server by creating a single virtual interface across two physical switches. But what do you do when you only have 2 physical NICs, want HA, and need to connect to 2 (or more) LAN segments? VLAN tagging.

VLAN (802.1q) tagging inserts a 4-byte tag into the Ethernet header of each frame, carrying the VLAN ID and priority. To make this work you must connect to a managed switch (hubs and unmanaged switches will not work) with the ports set up in "trunk" mode. Configuring switches is a topic for another post, but briefly, you can configure a port to pass tagged traffic for multiple VLANs (Google VLAN and trunking for more information).

If the NIC driver is VLAN capable, tagged interfaces can be configured as follows:

In /etc/sysconfig/network-scripts:
# cp ifcfg-ethX ifcfg-ethX.Y    # where X is the interface number and Y is the VLAN ID
# vi ifcfg-ethX.Y
change DEVICE to ethX.Y, modify the IP information as appropriate for the new VLAN, and add the following line to the end:
VLAN=yes

This same procedure can be used to create 802.1Q interfaces from bonded NICs as well.
example ifcfg-bond0.5 (interface bond0 on VLAN 5):
DEVICE=bond0.5
IPADDR=10.0.70.50
NETMASK=255.255.255.0
NETWORK=10.0.70.0
BROADCAST=10.0.70.255
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
VLAN=yes
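Since NETWORK and BROADCAST must agree with IPADDR and NETMASK, it can be handy to derive them rather than type them. The Python sketch below (a hypothetical helper, run on any machine with Python 3) produces the contents of an ifcfg file in the same form as the bond0.5 example; for the arguments shown it reproduces that example line for line. It assumes the same set of keys as above and does not cover other initscripts options.

```python
import ipaddress


def vlan_ifcfg(parent, vlan_id, ipaddr, netmask):
    """Return (filename, contents) for a tagged interface such as bond0.5.

    NETWORK and BROADCAST are derived from the address and netmask."""
    if not 1 <= vlan_id <= 4094:  # 802.1q VLAN IDs 0 and 4095 are reserved
        raise ValueError(f"invalid VLAN ID {vlan_id}")
    iface = ipaddress.ip_interface(f"{ipaddr}/{netmask}")
    device = f"{parent}.{vlan_id}"
    fields = [
        ("DEVICE", device),
        ("IPADDR", ipaddr),
        ("NETMASK", netmask),
        ("NETWORK", iface.network.network_address),
        ("BROADCAST", iface.network.broadcast_address),
        ("ONBOOT", "yes"),
        ("BOOTPROTO", "none"),
        ("USERCTL", "no"),
        ("VLAN", "yes"),
    ]
    return f"ifcfg-{device}", "".join(f"{k}={v}\n" for k, v in fields)


if __name__ == "__main__":
    name, text = vlan_ifcfg("bond0", 5, "10.0.70.50", "255.255.255.0")
    print(name)
    print(text, end="")
```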

I have found the combination of bonding and tagging to be especially useful in blade environments. Most blade servers come standard with 2 NICs that map to ports on two separate switch modules (or pass-thru modules). I like to configure the servers for HA but also need to isolate my production traffic from my NetBackup traffic. This solution has served me well.

Monday, June 1, 2009

Linux Bonding (aka NIC teaming)

One question that often comes up when transferring from the Windows to the Linux world is "How do I setup NIC teaming?". The answer is "bonding". A great document on what is involved can be found at The Linux Foundation.
"The Linux bonding driver provides a method for aggregating multiple network interfaces into a single logical bonded interface. The behavior of the bonded interfaces depends upon the mode; generally speaking, modes provide either hot standby or load balancing services. Additionally, link integrity monitoring may be performed." - The Linux Foundation
For those seeking a shortcut, below are steps I have successfully used in the past to configure bonding using initscripts on Red Hat EL:

1. Add an alias to /etc/modprobe.conf
alias bond0 bonding
options bond0 mode=active-backup miimon=100

This example loads the bonding module with the option for active-backup (failover teaming). The miimon setting is the time in ms between health checks of the interfaces.

2. In /etc/sysconfig/network-scripts edit the ifcfg-ethX files for each of the NICs to be bonded.
example: ifcfg-eth0
DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none

3. Next, create a script file for bond0 called ifcfg-bond0. In this example I am using a static IP config, but DHCP can be used as well (if you really wanted to).
example: ifcfg-bond0
DEVICE=bond0
IPADDR=xxx.xxx.xxx.xxx
NETMASK=xxx.xxx.xxx.xxx
NETWORK=xxx.xxx.xxx.xxx
BROADCAST=xxx.xxx.xxx.xxx
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
4. Finally, run one of the following commands as root to restart the network subsystem:
/etc/rc.d/init.d/network restart
or
service network restart
The new bonded interface, bond0, should now be alive and functioning. Before you call it a day, I recommend running through some failover tests to ensure this new HA setting of your server functions as planned. On a personal note, I always document the steps and results of my failover testing. This can be useful down the road if something goes wrong. There is nothing worse than finding out you forgot a setting the hard way!
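For the failover tests, the bonding driver reports its state in /proc/net/bonding/bond0, including the currently active slave and each slave's MII status. The Python sketch below pulls those fields out so you can record them before and after pulling a cable. It assumes the "Key: value" line layout that file uses and looks only at the handful of keys shown; the function name is hypothetical.

```python
import sys


def bond_status(text):
    """Extract mode, active slave and per-slave MII status from the
    contents of /proc/net/bonding/bondN."""
    status = {"mode": None, "active": None, "slaves": {}}
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            status["mode"] = value
        elif key == "Currently Active Slave":
            status["active"] = value
        elif key == "Slave Interface":
            current = value
            status["slaves"][current] = None
        elif key == "MII Status" and current is not None:
            # The first MII Status line (before any slave) is the bond's own.
            status["slaves"][current] = value
    return status


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/proc/net/bonding/bond0"
    try:
        with open(path) as f:
            print(bond_status(f.read()))
    except OSError:
        print(f"could not read {path} (is the bonding module loaded?)")
```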

As a follow up, I plan on posting a brief howto on setting up VLAN tagging. This is useful for those of us who have limited NICs but need to connect to multiple VLANs. It also works well with bonded interfaces.

Friday, May 29, 2009

ESX host disconnected from VirtualCenter

I recently ran into a problem when checking a customer's VMware cluster. One of the ESX servers showed disconnected in Virtual Center and would not reconnect. The VI3 setup is fairly simple for this customer, just a two node DRS cluster with 11 VMs in production.

Doing some research on Google, I found a similar problem/resolution here:
Malaysia VMware Communities

This solution did not work for me but it did get me pointed in the right direction.
  • Ran the command: service vmware-vpxa status --> indicated the service was offline
  • Attempted to restart the service: service vmware-vpxa start --> this failed with the output "Another process is already running for this config file"
  • Checked the running processes to see if it was hung: ps -elf | grep vpxa --> found the process running with a D in the S (state) column, meaning uninterruptible sleep, usually while waiting on disk I/O. A process in this state cannot be killed, even with SIGKILL.
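Spotting the D state by eye is easy with one process, but the check can also be scripted against saved output (for example from ps -elf > ps.txt, copied to a workstation with Python 3). The sketch below is a hypothetical helper; it assumes the standard ps -elf header (F S UID PID ... CMD) with every column filled in, which is how procps normally prints it, and it treats CMD as the last column because it may contain spaces.

```python
def stuck_processes(ps_output, name):
    """Return (pid, cmd) for processes whose command contains `name` and
    whose state (the S column of `ps -elf`) is D, uninterruptible sleep."""
    lines = ps_output.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()
    state_col = header.index("S")
    pid_col = header.index("PID")
    cmd_col = header.index("CMD")
    stuck = []
    for line in lines[1:]:
        fields = line.split(None, cmd_col)  # CMD is last and may contain spaces
        if len(fields) <= cmd_col:
            continue
        cmd = fields[cmd_col]
        if name in cmd and fields[state_col] == "D":
            stuck.append((int(fields[pid_col]), cmd))
    return stuck
```

Usage: stuck_processes(open("ps.txt").read(), "vpxa") returns an empty list when nothing matching is stuck; the grep line itself is excluded naturally because it is not in state D.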
Unfortunately, this meant my fix would incur some downtime for the 5 VMs that were currently running on the disconnected server. The first thing I needed to do was shutdown and migrate the VMs to the other cluster member. I performed the following steps to achieve this:
  • Logged into each VM's guest OS on the affected ESX server and shut it down
  • Logged into Virtual Center via the VI Client to remove the ESX server and all of its VM guests from the inventory
  • Added the VM guests back into the cluster via the Datastore view. You can do this by right clicking on the Datastore where the VM resides and selecting "browse datastore". Navigate to the .vmx file of the desired VM, right click it and select "add to inventory".
  • Once the VM guests were back in the cluster I started them back up. This brought the customer's resources back online.
Now the ESX host issue could be addressed without impacting the production VMs further. Since the process could not be killed, I wanted to reboot the ESX host to see if it would clear the error. I took the following steps:
  • SSH to the problem ESX host and ran: vmware-cmd -l --> This listed all of the VM guests the ESX host thought it still owned.
  • Ran the command: vmware-cmd -s unregister "path-to-vmx-found-in-previous-step" --> repeat this for each VM guest to clear it
  • Ran the command: vmware-cmd -l --> confirmed all VM associations were gone
At this point everything was prepared to safely reboot the ESX host.
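With several VMs to clear, it can be less error-prone to build the unregister commands from the vmware-cmd -l listing than to retype each path. A Python sketch (run on a workstation with Python 3 against the copied listing), assuming the listing prints one .vmx path per line; the helper name is hypothetical, and the generated commands should be reviewed before running them in the service console.

```python
import shlex


def unregister_commands(vm_list_output):
    """Turn `vmware-cmd -l` output (one .vmx path per line) into
    `vmware-cmd -s unregister` commands, quoting paths with spaces."""
    return [
        f"vmware-cmd -s unregister {shlex.quote(path)}"
        for path in (line.strip() for line in vm_list_output.splitlines())
        if path.endswith(".vmx")
    ]
```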

In my case, the ESX host did not come back up cleanly after the reboot and indicated a corruption problem with initrd. To clear this I ran esxcfg-boot -b from the console and rebooted again. Once the ESX server was back online it could be added back to the cluster via the Inventory view in Virtual Center. The add is accomplished by right clicking on the cluster name and selecting "add host". With the host back in the cluster I used VMotion to redistribute some of the load back to it.

These steps resolved the VC disconnect, but the root cause was not yet identified. The "disk wait" status of the hung process pointed to either a hardware problem with the disk or controller, or a driver bug. No problem was found with either hardware device, and since applying the latest updates to the ESX host, the problem has not recurred.

KB Article 1003409 from the VMware site was also quite useful in diagnosing the disconnect problem.