Archive for the 'Systems Administration' Category

Feb 23 2010

Fun with macs, CACs, and Certs (and iPhone dev).

Just a quick post to save others some time and a little pain:

OSX 10.6.2 + SCR3110 CAC reader + new GX4 CAC card == No love on OSX. Keychain sees the card as empty.

However, 10.6.2 + VMWare + win7-64 + reader + CAC card + IE works just fine without any of the add-on software used on XP. You’ll need to pass the reader through to the VM by clicking on the little USB icons in the bottom right.

On another note, a Verisign EAC certificate loaded in your keychain will cause codesign to hang for 8-10 minutes while it asks ocspd to validate the cert. This also happens when you use Keychain Access to try to figure out why it's taking so long to sign things. Work around it by either dropping your network when you need to sign things or, more permanently, dropping your network and then using Keychain Access to remove the cert altogether. Save yourself the pain and load the EAC cert directly into Firefox and use that browser to access the EAC-enabled sites.

And finally, if you have the reader plugged in with a card in it and try to sign an iPhone application you will probably get the error: CSSMERR_DL_MISSING_VALUE. Keychain Access on 10.6.2 recognizes the reader and if the card is plugged in, Keychain Access seems to want to try and use it for signing. Take the card out of the reader and try again.

>>> Karl

May 19 2008

Experiences with cfengine.

At a customer site I’ve had both the pleasure and pain of working with cfengine as a means to control the environment from a single touchpoint. The client decided to save the precious little time they had and to use readily available packages for the software instead of building the software from scratch. This resulted in mixed versions between the RedHat and Solaris machines with RedHat running the 2.2.x series code and Solaris on 2.1.x. What followed over the months were several exercises in frustration.
Cfengine was picked after an evaluation showed that it was capable of updating system files across machines and pushing files from the repository to the individual hosts. With the proof of concept working and with high hopes from the functionality referred to in the documentation, cfengine was rolled onto the network.
The first task was relatively straightforward: ensure that the staff members' logins were always present in the sudoers file. The site was using several group aliases to define various sudoers functions, which led to order dependencies in the file. The solution found was to delete all of the alias and group definitions and re-add them on each pass. This worked until it was realized that the rewrite would happen from cron every 15 minutes. Not quite what was hoped for. The configuration was ultimately altered to explicitly check for missing team members before beginning any of the updates.
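
The "check before you rewrite" guard can be sketched outside cfengine. This is only a minimal Python illustration of the idea, not the configuration used at the site: the function name, the alias name, and the logins are all hypothetical, and the parser ignores sudoers line continuations and multiple aliases on one line.

```python
import re

def missing_members(sudoers_text, alias, required):
    """Return the required logins that are not listed in the given User_Alias."""
    present = set()
    for line in sudoers_text.splitlines():
        line = line.split("#", 1)[0].strip()  # crude comment stripping
        match = re.match(r"User_Alias\s+(\w+)\s*=\s*(.*)", line)
        if match and match.group(1) == alias:
            names = match.group(2).split(",")
            present.update(name.strip() for name in names if name.strip())
    return sorted(set(required) - present)

# Hypothetical sudoers fragment and team list.
sample = "User_Alias ADMINS = alice, bob\nADMINS ALL=(ALL) ALL\n"
needs_update = missing_members(sample, "ADMINS", ["alice", "bob", "carol"])
print(needs_update)
```

Only when the returned list is non-empty would the delete-and-re-add pass run, so a healthy file is left alone on each 15 minute cron cycle.
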
The next task was to distribute a dozen or so files from the central repository to all 250 or so machines. This worked almost perfectly and required only a bit of tuning of the MaxConnections, MaxCfengines, and SplayTime directives. The current challenge with this task is to ensure that all of the installed machines pull on an update, as it seems that occasionally a small percentage of them, typically Solaris, refuse to pull the files without some manual cfrun intervention.
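
For readers who have not met splaying before, the general idea is that each host waits a different, but repeatable, amount of time before contacting the server, so 250 clients do not all connect at once. The sketch below shows that idea in Python; it is an assumption-laden illustration, not cfengine's own algorithm, and the function name and hostname are made up.

```python
import hashlib

def splay_delay_seconds(hostname, splay_minutes):
    """Map a hostname to a stable delay in the range [0, splay_minutes * 60)."""
    window = splay_minutes * 60
    if window <= 0:
        return 0
    # Hash the hostname so the same host always gets the same delay.
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window

# Hypothetical host with a 10 minute splay window.
delay = splay_delay_seconds("web01.example.com", 10)
print(delay)
```

A larger window spreads the load further but also delays how quickly the last host picks up a change, which is the trade-off behind tuning this alongside the connection limits.
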
An Editfiles directive is provided to allow the editing of arbitrary files. Using this for the next task, editing the crontab on Linux was straightforward, and Linux cron noticed the change immediately. On Solaris this was not the case. Editing the files was the easy part. After editing the file, the cron daemon needs to be made aware of the change, so either the process needs to be restarted or the crontab command needs to be used to reread the file. Unfortunately, the Process directive proved to be of little use on the older cfengine, and the Shell directive was used to write custom inline shell scripts to handle the stopping and starting of cron.
The most recent change to the network involved manipulating the fstab on both OSes, removing an NFS mount, and then changing the former mount point from a directory to a symlink. There is an Unmount directive that would have worked beautifully had it worked on both OSes. It worked fine with Linux, unmounting the FS and deleting the entry and directory for us. Sadly, on Solaris it did nothing. To promote overall consistency, it was decided to perform the manipulations on both operating systems in the same manner. An Editfiles stanza was used to manipulate the (v)fstabs, while a long chain of shell commands ensured that the filesystem was unmounted, the fstabs were edited, and the symlink was created. Each component set a success or failure flag to enable the script to proceed to the next step or fail and alert as the case may be. This is probably the most complex script used at the customer site.
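
The flag-per-step pattern described above can be sketched generically. This is a minimal Python illustration, not the script used at the site: the step names are hypothetical and the actions are stand-ins for the real unmount, edit, and symlink commands.

```python
def run_steps(steps):
    """Run (name, action) pairs in order and stop at the first failure.

    Each action is a zero-argument callable that returns True on success.
    Returns (completed_step_names, failed_step_name_or_None).
    """
    completed = []
    for name, action in steps:
        try:
            succeeded = bool(action())
        except Exception:
            succeeded = False  # treat an exception as a failed step
        if not succeeded:
            return completed, name
        completed.append(name)
    return completed, None

# Hypothetical steps; the second one simulates a failure.
steps = [
    ("unmount_nfs", lambda: True),
    ("edit_fstab", lambda: False),
    ("create_symlink", lambda: True),
]
done, failed = run_steps(steps)
print(done, failed)
```

Because the runner stops at the first failed step, the symlink is never created over a filesystem that is still mounted, and the name of the failed step is available for the alert.
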
I’m unable to say for certain how much of the pain with the software is caused by the mixed versions being used on the network. It could be that if there were time to build and upgrade the software across the machines to the same version, the problems would magically disappear. It could also be that, while powerful and seemingly capable, cfengine does things in a slightly different manner than a sysadmin would, forcing the would-be cfengine administrator to double check everything they write. (An example of this is the symlink directive, which uses target->source whereas the ln command uses source->target.) Documentation and error reporting for the product were also a challenge, especially with the mixed versions. The version specific source was often consulted to understand what an error meant or to determine why a specific section of code was being ignored. Cfengine will remain in place at the customer site for the foreseeable future, as despite its issues it is still better than touching each machine individually. Only time will tell if a complete upgrade to a consistent version will remove some of the pain or if this is simply the way of cfengine.
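
On the argument order point, it helps to pin down which convention a tool follows before trusting it. As a small illustration, Python's `os.symlink` takes the existing path first and the link name second, the same order as `ln -s`. The helper name and paths below are made up for the example, and it assumes a Unix-like system where unprivileged users can create symlinks.

```python
import os
import tempfile

def make_link(existing_path, link_path):
    """Create link_path as a symlink pointing at existing_path.

    Same argument order as `ln -s existing_path link_path`.
    Returns the path the new link points at.
    """
    os.symlink(existing_path, link_path)
    return os.readlink(link_path)

with tempfile.TemporaryDirectory() as workdir:
    data_dir = os.path.join(workdir, "data")
    link = os.path.join(workdir, "data_link")
    os.mkdir(data_dir)
    pointed_at = make_link(data_dir, link)
    # Compare what the link points at with the existing directory.
    print(pointed_at == data_dir)
```

Writing the same link in a tool that puts the link name first means swapping the two arguments, which is exactly the kind of detail that is worth double checking.
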

Feb 18 2008

Performance Testing – Tools of the Trade

In the previous article we covered a brief load test scenario. It involved running load against a webserver and driving the server to failure to determine its capacity. What I did not discuss were the specific tools that could have been used during the testing period.

Client Side:

Wget – A non-interactive command line tool to fetch https, http, and ftp items. It can easily be run from inside a script in a tight loop or used in recursive/mirror mode for crawling. It can be found at http://www.gnu.org/software/wget/.
cURL – Another non-interactive command line tool, with much more extensive protocol support. Available from http://curl.haxx.se/. This tool was designed to fetch single URLs, so it requires integration into a script to mirror/copy websites if such behavior is desired.
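
The "tight loop" approach mentioned for the client side tools can also be sketched in a few lines of Python using only the standard library. This is a rough single-threaded illustration, not a replacement for wget or cURL; the function names are hypothetical and the URL in the usage note is a placeholder.

```python
import time
import urllib.request

def fetch_url(url, timeout=10.0):
    """Fetch one URL and return the number of bytes read."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return len(response.read())

def timed_requests(url, count, fetch=fetch_url):
    """Call fetch(url) count times and return a list of (ok, seconds) tuples."""
    results = []
    for _ in range(count):
        start = time.perf_counter()
        try:
            fetch(url)
            ok = True
        except Exception:
            ok = False  # count any error as a failed request
        results.append((ok, time.perf_counter() - start))
    return results
```

Calling `timed_requests("http://test-server.example/", 100)` against a server you own would give a list of success flags and latencies; running several copies in parallel is what actually generates load.
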

The server side has several options depending on the OS being run. There are a few commands that are present on all OSes by default, and then there are OS specific commands as well.

Server Side:

Iostat provides thorough IO statistics for the machine. The command is native on Solaris and is available on Linux with the installation of the sysstat package. After watching the output run over time you can tell if the disk is lightly used, overloaded, or somewhere in between. You can also tell if the drives are overwhelmed with many small transactions or content with large streaming writes. This is a good all-round tool to get familiar with.

Next in line in the *stat family is vmstat. While iostat gives us a view into the disk subsystem, vmstat provides similar insight into the memory subsystem on the host. The command is natively present on Solaris and Linux. Vmstat is capable of showing you statistics on swap, memory paging characteristics, memory fault characteristics, and some additional information about run queues on the machine and CPU utilization. This is another good general tool to be familiar with.

Top is the next tool I used, and is probably the single most popular tool for getting an at-a-glance view into what the system is doing. It is a curses based application that periodically updates itself and constantly shows the top 20ish running items sorted by CPU utilization. While present on Linux, the application is not in the default Solaris installation. www.sunfreeware.com offers the package for download. Top typically requires minimal skill to interpret its output, which makes it a good first line tool to see why a machine is behaving oddly.

Solaris offers two additional tools for observing system behavior: prstat and mpstat [edit: Seems Linux offers mpstat as well. It contains similar information to the Solaris version].

Prstat is very similar to top and shows the top 20 or so processes by CPU utilization but, through various command line options, can also provide insight into threads and offer microstate accounting information.

Mpstat is more in line with the vm/iostat line of commands and shows per CPU, or per processor set, statistics such as context switches, system calls, etc. The command can be used to determine if an application is thrashing the CPUs in the machine or, more generally, what the machine’s CPUs are doing.

Netstat is a utility which allows the user to see what the network stack on a machine is doing. It is useful for looking at the number of open sessions as well as the number of sessions in TIME_WAIT.
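
Counting sessions by state is easy to script once the netstat output is captured. The sketch below is a minimal Python example that assumes the connection state is the last column of each line, as in typical `netstat -an` TCP output; the function name and the sample addresses are invented for illustration, and the state list covers common Linux and Solaris spellings but may not be exhaustive.

```python
from collections import Counter

# Common TCP state names; spelling varies slightly between platforms.
TCP_STATES = {
    "ESTABLISHED", "SYN_SENT", "SYN_RECV", "SYN_RCVD",
    "FIN_WAIT1", "FIN_WAIT2", "FIN_WAIT_1", "FIN_WAIT_2",
    "TIME_WAIT", "CLOSE_WAIT", "LAST_ACK", "LISTEN",
    "CLOSING", "CLOSED", "CLOSE", "IDLE", "BOUND",
}

def count_tcp_states(netstat_output):
    """Count lines per TCP state, reading the state from the last column."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if fields and fields[-1] in TCP_STATES:
            counts[fields[-1]] += 1
    return counts

# Made-up sample in the style of Linux `netstat -an` output.
sample = """Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 10.0.0.5:80 10.0.0.9:51000 TIME_WAIT
tcp 0 0 10.0.0.5:80 10.0.0.9:51001 TIME_WAIT
tcp 0 0 10.0.0.5:80 10.0.0.7:40000 ESTABLISHED
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
"""
print(count_tcp_states(sample)["TIME_WAIT"])
```

In practice the text would come from running `netstat -an` on the server under test and feeding its output to the function at intervals during the load run.
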

Finally, the sar command is also available to see what a given host is doing. The command is natively present on Solaris and is available on Linux through the sysstat package. I personally tend not to use it, as I can generally determine what I need from watching the *stat commands.

There is one additional tool which I use on Solaris and OSX, and that is dtrace. Dtrace is capable of instrumenting virtually anything on the fly and can provide amazing insight into what an application, or even the host, is doing at that moment without resorting to the instrumentation of binaries or adding typical splatterprint debug code to the binaries.

Essentially, I use the commands in approximately this order: top/prstat first to get a general overview of what the machine is doing. For more insight I open additional sessions on the machine and run iostat, vmstat, netstat, and possibly mpstat if available. Finally if I’m still unable to determine what the host is doing, and the server is running Solaris or OSX, I’ll use dtrace to look into the kernel or at the application for insight into whatever problem I’m chasing.

Whichever commands you decide to use, and whatever order you run them in, keep in mind one important point. Observing an experiment can affect the output of said experiment. This means that running all of those tools on a very burdened server may be enough to push the machine over the edge of unresponsiveness.

This concludes our brief introduction to the tools of the trade. If there is a demand and time permits I intend to write articles on the best practices for each of the commands.

If you like what you’ve read, please share the blog with others. If you have any questions or comments, feel free to send me email at kmajer at karlmajer dot com.
