I don’t often do much in the way of hardware, but recently I had some problems with a couple of our ProLiant DL180 G5 servers and the controller for their RAID arrays.

Our building lost power, and the generator that was supposed to come on never did. The UPS was only specced to cover the gap between a power outage and the generator coming on, so after a few minutes all of the servers powered down. Not exactly what I wanted to come in to on Monday morning, but it did highlight a couple of needs that I had raised over the past couple of years.

When I powered everything back up, only 2 servers came up with any major problems. Both appeared to be issues with the Smart Array controller: the server would reach that point in the POST, sit at “SmartArray E200 initializing” for a few minutes, and then fail to boot. I contacted support and they had me do a few things:

1. Boot from the Easy Setup CD, go to Maintenance and run Array Diagnostics – Array Diagnostics couldn’t find any installed array controllers.

2. Update the flash ROM for the server – there is a download for the server that creates a bootable USB stick you can use to update a server that won’t boot into the OS.

3. Reseat the cache module on the controller card – there’s a chip on the Smart Array controller card that at first glance looks like a chubby stick of memory.

4. Upgrade the storage firmware using the Maintenance CD – there is another download on the HP site that takes their Maintenance CD and creates a bootable USB drive, and you can then update/replace some of the drivers on it. Once you boot off the USB stick, it can automatically detect any updates and apply them. It also failed to see that a storage controller was installed.

5. Boot with the cache removed from the SA E200 – they then had me remove the cache module and boot the server with the slot on the controller empty.

6. Move the Smart Array card to another slot, clear CMOS, and upgrade firmware with Smart Update Manager – I considered this their Hail Mary before they replaced the controller. They had me move the controller card to another slot on the server, clear the CMOS (this is done by holding down a button on the motherboard labeled CMOS; when you power back up, the system date and time will need to be reset), and then try to update using the USB key from step 4.

So after all this we were still where we started, and they sent out a new Smart Array E200 controller card.

I replaced the controller card and still had the exact same problem.

I got back on the phone with support after quickly running through the six steps above myself, to save having to run back and forth to the server when support asked me to try them again. Since I had already exhausted all the prompts on their screen (I guess), they considered this an odd one-off issue (funny, given that I had two servers doing the same thing) and had to get special instructions, which amounted to replacing the cache module.

The new cache module came the next day, and once it was installed the server booted normally.

Oddly, for the second server, the next support person I got insisted it was the motherboard, since booting with the cache module removed made no difference, and set up a tech to be dispatched with a motherboard.

I spoke with the tech and he confirmed my suspicion that the motherboard was most likely not faulty and that it was more likely the cache module.  He explained that it used to be that removing the cache module would allow the server to boot, but he has recently found that if the server shipped with the cache module installed, the servers seem to expect it to be there and if it’s not (or it’s faulty) the controller can’t initialize.

So he came out the next day with a new cache module, installed it and the server worked fine.

I’ve been having a problem accessing my logon scripts through the Group Policy Management Console. I would open the policy that the script was associated with, drill down to the logon scripts, and click Show Files. I’d get a window with all of the logon scripts associated with that policy, but when I right-clicked the .vbs file, it would tell me I didn’t have access. Logons processed the script correctly, so it wasn’t that the policy was corrupt.

It appears to be a result of Internet Explorer’s Enhanced Security Configuration in Server 2003. Knowledge Base article 815141 (http://support.microsoft.com/kb/815141) covers the enhanced security, but what we’re particularly interested in is about a third of the way down, where it talks about security zones:

Access to scripts, executable files, and other files on Universal Naming Convention (UNC) shared folders is restricted unless the shared folder is added to the Local intranet zone explicitly.

So I went into Internet Explorer, Tools -> Internet Options, Security tab, clicked Local intranet, and then Sites. I added \\mydomain.int to the list, and now I can edit the logon scripts through GPMC.
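If you need the same setting for several admin accounts or servers, it can also be written to the registry instead of clicking through the dialog. The Python sketch below rests on assumptions I haven't verified against KB 815141: that IE stores per-site zone assignments under HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings\ZoneMap\Domains (or ZoneMap\EscDomains while Enhanced Security Configuration is on), that a UNC host is mapped through the "file" scheme, and that zone 1 is Local intranet. Before relying on it, add the site through the GUI once and check where the entry lands in regedit.

```python
import sys

# Assumption: IE zone numbers, where 1 = Local intranet
LOCAL_INTRANET_ZONE = 1
ZONEMAP = r"Software\Microsoft\Windows\CurrentVersion\Internet Settings\ZoneMap"


def zone_map_entry(host, esc_enabled=True):
    """Return (subkey, value_name, zone) that maps a UNC host to Local intranet.

    Assumption: with Enhanced Security Configuration on, entries live under
    EscDomains instead of Domains, and UNC paths use the "file" scheme.
    """
    branch = "EscDomains" if esc_enabled else "Domains"
    return ZONEMAP + "\\" + branch + "\\" + host, "file", LOCAL_INTRANET_ZONE


def add_to_local_intranet(host, esc_enabled=True):
    """Write the entry for the current user (Windows only)."""
    import winreg  # only available on Windows

    subkey, name, zone = zone_map_entry(host, esc_enabled)
    with winreg.CreateKey(winreg.HKEY_CURRENT_USER, subkey) as key:
        winreg.SetValueEx(key, name, 0, winreg.REG_DWORD, zone)


if __name__ == "__main__" and sys.platform == "win32":
    add_to_local_intranet("mydomain.int")
```

Log off and back on (or at least restart IE and GPMC) before testing, since both read the zone settings when they start.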

One of the lovely students here was found to have our local admin password, something that inevitably seems to happen every other year. How he got it is a discussion for another post which will include what measures we put in place to try and keep it from happening again.

The more immediate concern is how do we change the local password on 1000+ computers without having to visit each one? I’ve written a VBScript that takes a comma-delimited file, with the first column being the IP and the second being the computer name, and outputs two lists: the computers that were changed and the ones that were not. I got the input file from our DHCP server by exporting the scope for the building where the password was compromised, then got rid of the other columns that I didn’t care about. I also broke the file down into smaller chunks, just to make it more manageable and so that I could send the building tech the computers that failed in smaller batches.
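Breaking the trimmed export into batches is easy enough by hand, but here is a small Python sketch of the same idea. It assumes the file has already been cut down to the IP,ComputerName columns the script expects; the batch size of 100 and the workstations_01.txt naming are placeholders I picked for illustration.

```python
import csv
import os


def read_workstations(path):
    """Read 'IP,ComputerName' rows, skipping blank or short lines."""
    with open(path, newline="") as f:
        return [(row[0].strip(), row[1].strip()) for row in csv.reader(f) if len(row) >= 2]


def chunk(rows, size):
    """Split rows into consecutive batches of at most `size` entries."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def write_batches(rows, out_dir, size=100):
    """Write workstations_01.txt, workstations_02.txt, ... in the same IP,Name format."""
    paths = []
    for n, batch in enumerate(chunk(rows, size), start=1):
        path = os.path.join(out_dir, "workstations_%02d.txt" % n)
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(batch)
        paths.append(path)
    return paths
```

Each batch file can then be copied over c:\script\workstations.txt in turn before running the script.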

The script I wrote takes this file as input (c:\script\workstations.txt) and writes two files (c:\script\Done.txt and c:\script\NotDone.txt). Done contains the computers that the script was able to contact and change the local admin password on; NotDone has the ones that weren’t changed, usually because we were unable to contact them.

I don’t want to claim this all as my own, so I did borrow the IsAlive function from another post I found. If I can find it again, I will link to that post.

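Option Explicit

' Reads "IP,ComputerName" lines from src, resets the local Administrator
' password on each machine that answers a ping, and logs the results.
Dim fso, user, ts, temp, src, ComputerArr
Dim dstGood, dstBad, tsGood, tsBad
Set fso = CreateObject("Scripting.FileSystemObject")
src = "c:\script\workstations.txt"
dstGood = "c:\script\Done.txt"
dstBad = "c:\script\NotDone.txt"

If Not fso.FileExists(src) Then
	WScript.Echo "File: " & src & " cannot be found."
	WScript.Quit
End If

' 1 = ForReading; 2 = ForWriting, and True creates the output file if it doesn't exist
Set ts = fso.OpenTextFile(src, 1)
Set tsGood = fso.OpenTextFile(dstGood, 2, True)
Set tsBad = fso.OpenTextFile(dstBad, 2, True)
Do Until ts.AtEndOfStream
	temp = Trim(ts.ReadLine)
	ComputerArr = Split(temp, ",")
	' Skip blank or malformed lines
	If UBound(ComputerArr) >= 1 Then
		If IsAlive(ComputerArr(0)) Then
			WScript.Echo "Ping Success: " & ComputerArr(1)
			' Trap errors so one failed password change doesn't stop the whole run
			Set user = Nothing
			On Error Resume Next
			Set user = GetObject("WinNT://" & ComputerArr(0) & "/Administrator,user")
			user.SetPassword "YourNewPassword"
			user.SetInfo
			If Err.Number = 0 Then
				tsGood.WriteLine ComputerArr(1)
			Else
				WScript.Echo "Password change failed: " & ComputerArr(1) & " (" & Err.Description & ")"
				tsBad.WriteLine ComputerArr(1)
			End If
			Err.Clear
			On Error GoTo 0
		Else
			WScript.Echo "Ping Failed: " & ComputerArr(1)
			tsBad.WriteLine ComputerArr(1)
		End If
	End If
Loop

ts.Close
tsGood.Close
tsBad.Close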

Function IsAlive(strHost)
	Const OpenAsASCII = 0
	Const FailIfNotExist = 0
	Const ForReading = 1
	Dim objShell, objFSO, sTempFile, fFile
	Set objShell = CreateObject("WScript.Shell")
	Set objFSO = CreateObject("Scripting.FileSystemObject")
	sTempFile = objFSO.GetSpecialFolder(2).ShortPath & "\" & objFSO.GetTempName
	objShell.Run "%comspec% /c ping.exe -n 2 -w 500 " & strHost & ">" & sTempFile, 0, True
	Set fFile = objFSO.OpenTextFile(sTempFile, ForReading, FailIfNotExist, OpenAsASCII)
	Select Case InStr(fFile.ReadAll, "TTL=")
		Case 0
			IsAlive = False
		Case Else
			IsAlive = True
	End Select
	fFile.Close
	objFSO.DeleteFile(sTempFile)
	Set objFSO = Nothing
	Set objShell = Nothing
End Function

We all have them: applications that perform a backup periodically and place it inside the program’s own folder structure. This is great if you’re trying to recover from a problem within the application, but what if the hard drive goes? And most of the time, the program just keeps dumping backups and never cleans up old ones.

I wrote this script to handle both problems. At the beginning of the code you can set strsrc and strdst to the source and destination that you need for your application. You can also set the age variable to the number of days you want to retain backups in each directory.

As always, I don’t claim that my code is pretty, just that it works. Feel free to modify or comment on it.

import os
import shutil
import time
from datetime import date, timedelta

strsrc = "c:\\program files\\Application\\backup"  # where the application drops its backups
strdst = "S:\\bkup"                                 # off-disk copy of the backups
age = 14                                            # days of backups to keep in each directory

strDeleteFromDate = date.today() - timedelta(days=age)


def last_modified(path):
    # time.localtime() returns (year, month, day, ...), in that order
    year, month, day = time.localtime(os.stat(path).st_mtime)[:3]
    return date(year, month, day)


# Source: delete expired backups and copy the rest to the destination
for dirpath, dirnames, filenames in os.walk(strsrc):
    for item in filenames:
        strFileLoc = os.path.join(dirpath, item)
        strLastUsed = last_modified(strFileLoc)
        if strLastUsed < strDeleteFromDate:
            print("/* %s, %s, Delete */" % (item, strLastUsed))
            os.remove(strFileLoc)
        else:
            print("/* %s, %s, Keep */" % (item, strLastUsed))
            # copy2 keeps the original timestamp, so the age check below still works
            shutil.copy2(strFileLoc, strdst)

# Destination: prune copies that are older than the retention period
for dirpath, dirnames, filenames in os.walk(strdst):
    for item in filenames:
        strFileLoc = os.path.join(dirpath, item)
        strLastUsed = last_modified(strFileLoc)
        if strLastUsed < strDeleteFromDate:
            print("/* %s, %s, Delete bkup */" % (item, strLastUsed))
            os.remove(strFileLoc)
        else:
            print("/* %s, %s, Keep bkup */" % (item, strLastUsed))

For whatever reason, the time on our computers always seems to be a debate here. Periodically I go in and adjust the time on our domain controller by forcing a time sync or, in the past, by adjusting it manually.

The problem is that if you adjust the time and it ends up more than 5 minutes off from the OS X clients, the users won’t be able to log in (the Kerberos authentication behind the domain logins rejects clock skew beyond 5 minutes by default). So today I went in and forced a time sync on our domain controller (it turns out the NTP port hadn’t been opened on our new firewall). I wanted to stem the calls from people not being able to log in, so I set out to force a time update on the OS X clients. Searching Google turned up a nice command, ntpdate. If you send “ntpdate -u <Server-Address>” to an OS X client, it will force the client to update its time from that server.

This let me force a time update on ~400 active workstations using this command and Apple Remote Desktop. I broke the clients into smaller groups so as not to overwhelm the server acting as our primary time server, but I don’t know whether that was really necessary.
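To spot-check whether a machine is inside that 5-minute window before or after pushing ntpdate, you can ask the time server directly. This is a rough Python sketch of a bare SNTP query: a 48-byte client request to UDP port 123, reading the server’s transmit timestamp from bytes 40 to 47 of the reply. It ignores network round-trip delay, so treat the offset as approximate, and the 300-second limit assumes the default Kerberos tolerance.

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds from 1900-01-01 (NTP epoch) to 1970-01-01 (Unix epoch)
MAX_SKEW_SECONDS = 5 * 60      # assumed default Kerberos clock-skew tolerance


def parse_transmit_time(packet):
    """Return the server's transmit timestamp (Unix seconds) from an NTP reply."""
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2.0 ** 32


def query_offset(server, timeout=2.0):
    """Return (server time - local time) in seconds, ignoring network delay."""
    request = b"\x1b" + 47 * b"\0"  # LI=0, version 3, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(request, (server, 123))
        reply, _ = s.recvfrom(1024)
    return parse_transmit_time(reply) - time.time()


def within_kerberos_skew(offset):
    """True if the offset is small enough for logins to keep working."""
    return abs(offset) <= MAX_SKEW_SECONDS
```

For example, `within_kerberos_skew(query_offset("<Server-Address>"))` run on a client tells you whether it still needs the ntpdate push.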

Another one of the videos I did for our tech people, on how to NetBoot a computer from within OS X and choose your image.

This is a video that I did for our internal people on how to use System Image Utility to create a NetInstall image in OS X. It’s not a very complicated video, but I think a video made it easier for them than a written walkthrough would have.

I’ve decided to post a couple of videos that I did for our internal tech staff on imaging for OS X. They should be going up over the next few days. They’re not very complex, but I figured since they helped our staff, maybe they’ll help someone else.

I added a new page to the site that lists applications I use. I’ve got about a dozen utilities and programs up there and plan to add to it as I remember programs that have helped me.

I’d love to hear if anyone has found something better than what I have listed; I’m always looking for things that make my life easier or just work better.

It can be found under Content on the right and is called Stuff I Use.

One thing that has always bothered me about my network is the monitoring/notification system we have in place: WhatsUp Gold version 8.0. I know they’ve updated this product since then, but I didn’t want to spend the money to upgrade when I had network equipment and servers that desperately needed to be replaced. And it’s not that WUG didn’t notify us when equipment went down; it was filling that role.

I recently undertook setting up Nagios (www.nagios.org) to replace it. If you’ve never heard of Nagios, follow that link and take a look; it’s a nice free monitoring solution that is easily expanded/modified/customized.

So I will probably write a couple of posts as I configure Nagios and get things customized the way I want them.

My post for today is getting Nagios to send notifications via Growl to a computer. Growl pops up little notifications in the corner of your screen (like when a download finishes or an FTP connection disconnects), and I thought this would be a good way to get notified of status changes on equipment. The basis for this was taken from http://www.mail-archive.com/nagios-users@lists.sourceforge.net/msg26900.html, but some details were left out, so this post will cover everything from start to finish.

1. You’ll need to install Growl onto the computer that you want to receive notifications on. The PC version can be downloaded here [http://www.growlforwindows.com/gfw/] and the Mac version can be downloaded here [http://growl.info/].

2. You need to configure Growl to receive notifications over the network. This is done on the Network tab under OS X, but under the Security tab on Windows. Check Allow Network Notifications. On the Mac, check the option to require a password and enter a password; on the PC, click the + under Password Manager and enter your password there.

3. Now we’re ready to get on the Nagios server. You need to install Net::Growl from CPAN, so from a terminal enter: cpan Net::Growl

4. Now you need to create a file that makes use of Net::Growl. The script from the above-mentioned post is what I used:

#!/usr/bin/perl -w
#
# Created by Mathieu Gagné 2009
#

use strict;
use warnings;
use IO::Socket::INET;
use Net::Growl;
use Getopt::Long qw(:config no_ignore_case bundling);

# Default values
my $application = 'Nagios';
my $title = 'Alert';
my $message = '';
my $priority = 2;
my $sticky = 0;
my $destination = 'localhost';
my $password = '';

my $help = 0;

my $pod2usage = sub {
  # Load Pod::Usage only if needed.
  require "Pod/Usage.pm";
  import Pod::Usage;

  pod2usage(@_);
};

# Declare and retrieve options
GetOptions(
  'h|help'          => \$help,
  'a|application=s' => \$application,
  't|title=s'       => \$title,
  'm|message=s'     => \$message,
  'P|priority=i'    => \$priority,
  's|sticky'        => \$sticky,
  'H|host=s'        => \$destination,
  'p|password=s'    => \$password,
) or $pod2usage->(1);

# Print help
$pod2usage->(1) if $help;

# Validate options
if ( $application eq '' ) {
  die "Error: Missing mandatory option: application\n";
}

if ( $title eq '' ) {
  die "Error: Missing mandatory option: title\n";
}

if ( $message eq '' ) {
  die "Error: Missing mandatory option: message\n";
}

if ( $priority eq '' ) {
  die "Error: Missing mandatory option: priority\n";
}

if ( $password eq '' ) {
  die "Error: Missing mandatory option: password\n";
}

#
# Main program
#

# Set up the Socket
my %addr = (
  PeerAddr => $destination,
  PeerPort => Net::Growl::GROWL_UDP_PORT,
  Proto    => 'udp',
);

my $s = IO::Socket::INET->new ( %addr ) || die "Could not create socket: $!\n";

# Register the application
my $p = Net::Growl::RegistrationPacket->new(
  application => $application,
  password    => $password,
);

$p->addNotification();

print $s $p->payload();

# Send a notification
$p = Net::Growl::NotificationPacket->new(
  application => $application,
  title       => $title,
  description => $message,
  priority    => $priority,
  sticky      => $sticky,
  password    => $password,
);

print $s $p->payload();

close($s);

I saved this file alongside the Nagios config in /usr/local/nagios/etc and called it Growl.pl. You’ll need to make the file executable: chmod +x /usr/local/nagios/etc/Growl.pl

5. Now to tell Nagios how to use it. Edit /usr/local/nagios/etc/objects/commands.cfg and add the following two commands (also from the mentioned post, with one modification):

define command {
  command_name  notify-host-by-growl
  command_line  /usr/local/nagios/etc/Growl.pl -H YourIP -p YourPassword -a Nagios -t Alert -m "$NOTIFICATIONTYPE$ Alert $HOSTNAME$[$HOSTADDRESS$] is $HOSTSTATE$" -s
}

define command {
  command_name  notify-service-by-growl
  command_line  /usr/local/nagios/etc/Growl.pl -H YourIP -p YourPassword -a Nagios -t Alert -m "$NOTIFICATIONTYPE$ Alert $HOSTNAME$[$HOSTADDRESS$]/$SERVICEDESC$ is $SERVICESTATE$" -s
}

You need to modify these to use your IP and password. I also added -s to the end to make the notifications sticky (in Growl.pl, -s is a flag and takes no value). This means that you have to clear the notification from your desktop yourself. I wanted this so that if a notification arrived while I was away from my desk, it would still be there when I got back. This can also be configured in the Growl client on the computer, but I figured I’d do it in both places (belt and suspenders).

Since a few people here receive alerts, I make multiple copies of the commands and append the person’s name to each command_name (for example, notify-host-by-growl-rob).
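Copying the two definitions by hand for each person gets tedious, so here is a small Python sketch that generates them. The people mapping (a short name to the reserved IP and Growl password of that person’s computer) and the example values are hypothetical; the command_line mirrors the definitions above, including -s for sticky notifications. Paste the output into commands.cfg.

```python
COMMAND_TEMPLATE = """define command {{
  command_name  notify-{kind}-by-growl-{person}
  command_line  /usr/local/nagios/etc/Growl.pl -H {ip} -p {password} -a Nagios -t Alert -m "{message}" -s
}}
"""

# Nagios macros used in the message text for each notification type
MESSAGES = {
    "host": "$NOTIFICATIONTYPE$ Alert $HOSTNAME$[$HOSTADDRESS$] is $HOSTSTATE$",
    "service": "$NOTIFICATIONTYPE$ Alert $HOSTNAME$[$HOSTADDRESS$]/$SERVICEDESC$ is $SERVICESTATE$",
}


def growl_commands(people):
    """Return one host and one service command per person.

    `people` maps a short name (used as the command_name suffix) to a
    (reserved IP, Growl password) tuple for that person's computer.
    """
    blocks = []
    for person, (ip, password) in sorted(people.items()):
        for kind in ("host", "service"):
            blocks.append(COMMAND_TEMPLATE.format(
                kind=kind, person=person, ip=ip, password=password, message=MESSAGES[kind]))
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical example values; replace with your own reservations/passwords
    print(growl_commands({"rob": ("192.168.1.50", "YourPassword")}))
```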

Note: Obviously the computers getting the notifications have to always have the same IP, either through static assignment on the computer or through a DHCP reservation (my preference).

6. Now you need to attach these commands to the contacts that should use them. Edit /usr/local/nagios/etc/objects/contacts.cfg, choose the contact that gets Growl notifications, and add the following to their definition:

service_notification_commands notify-service-by-email, notify-service-by-growl-rob
host_notification_commands notify-host-by-email, notify-host-by-growl-rob

7. Check your configuration by running:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

8. If everything checks out, restart Nagios (/etc/init.d/nagios restart) and you should start receiving notifications via Growl on your computer.

I’m also looking into Prowl for the iPhone and doing similar for notifications for that. There’s a good writeup on that here [http://reluctanthacker.rollett.org/content/setting-nagios3-send-prowl-notifications].

© 2013 Suffusion theme by Sayontan Sinha