One of the many things server administrators are asked to provide nowadays is high availability for the applications that run on the servers we support. Having worked in both 24/7/365 and 8/5 shops, I have narrowed the goal of providing acceptable uptime down to three basic items: hardware, operating system, and applications.
Hardware:
Having been in this industry for over 22 years, I have seen hardware make incredible advances in stability, and I have seen servers in all kinds of environments. Even if your servers aren’t in a controlled HVAC environment, most will run stably for years, even under adverse conditions. If you have physical access to your servers, make sure they aren’t being choked for airflow. Keep the firmware and BIOS updated. Vendors have made this easy compared to the early days, when upgrading these could render your servers unusable.
Operating System:
I am primarily a Windows Server administrator, and Microsoft has made great advances in providing a stable operating system. I remember the days of NT 3.51, when one had to reboot every week to fix memory leaks and the like.
Read the event logs; they tell you stuff! I’m amazed at how many server admins don’t read the event logs. I’m fortunate to have a monitoring system that alerts me on specific event IDs, but I have been in environments where I had to read the event logs manually. Remember the old saying, “An ounce of prevention is worth a pound of cure.” Catching an event that reports a failed hard drive in a RAID array can prevent hours of downtime.
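As a rough illustration of alerting on specific event IDs, here is a minimal Python sketch. It assumes the event log has already been exported into a list of records. The `EventRecord` fields, the values in `WATCHED_EVENT_IDS`, and the `find_alerts` function are hypothetical names made up for this example; they are not real Windows event IDs or part of any monitoring product.

```python
from dataclasses import dataclass

# Hypothetical watch list: replace with the event IDs that matter in your environment.
WATCHED_EVENT_IDS = {1001, 1002}


@dataclass
class EventRecord:
    event_id: int
    source: str
    message: str


def find_alerts(records, watched_ids=WATCHED_EVENT_IDS):
    """Return only the records whose event ID is on the watch list."""
    return [r for r in records if r.event_id in watched_ids]


if __name__ == "__main__":
    # Made-up sample records standing in for an exported event log.
    sample = [
        EventRecord(1001, "raid-controller", "Drive 2 in array has failed"),
        EventRecord(5000, "service-manager", "Service started"),
    ]
    for record in find_alerts(sample):
        print(f"ALERT {record.event_id} from {record.source}: {record.message}")
```

The point is simply that a short watch list turns a long log into a handful of lines worth reading.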
Patch your OS. Like most system administrators, I don’t count this among my favorite tasks, and I think Microsoft releases too many patches, but if you apply them quarterly you can still keep your SLA. The days of fighting through driver issues are pretty much gone, but you still need to keep drivers updated. I opt to use the manufacturer’s drivers over Microsoft’s.
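To check whether quarterly patching fits an SLA, the arithmetic can be sketched as follows. The 99.9% uptime target and the two-hour patch window are assumed numbers for illustration only, and `downtime_budget_hours` and `patching_fits_sla` are hypothetical helper names. The sketch also assumes that planned maintenance counts against the SLA, which is not true of every agreement.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_budget_hours(sla_percent):
    """Hours of downtime per year allowed by an uptime SLA given in percent."""
    return HOURS_PER_YEAR * (100 - sla_percent) / 100


def patching_fits_sla(sla_percent, windows_per_year, hours_per_window):
    """True if planned patch downtime stays within the yearly downtime budget."""
    planned = windows_per_year * hours_per_window
    return planned <= downtime_budget_hours(sla_percent)


if __name__ == "__main__":
    # Assumed figures: 99.9% uptime target, four 2-hour patch windows per year.
    budget = downtime_budget_hours(99.9)
    print(f"Yearly downtime budget at 99.9%: {budget:.2f} hours")
    print("Quarterly 2-hour windows fit:", patching_fits_sla(99.9, 4, 2))
```

With these assumed numbers, four two-hour windows use 8 of the roughly 8.76 hours available, which leaves very little room for unplanned outages.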
Never waste a reboot! If you need to reboot a server for maintenance outside your normal window, apply patches before rebooting.
Applications:
Applications typically fall into three categories with respect to resource utilization: CPU intensive, memory intensive, and disk I/O intensive. Knowing what your applications require before provisioning them into production can save you a lot of headaches down the road and help you achieve the SLA you're after.
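One simple way to put an application into one of these three categories is to compare its average utilization of each resource during testing. The sketch below is only an assumption about how that might be done: `dominant_resource` is a hypothetical helper, and the percentages would have to come from whatever performance counters you already collect.

```python
def dominant_resource(cpu_pct, memory_pct, disk_io_pct):
    """Return the name of the resource with the highest average utilization."""
    usage = {"cpu": cpu_pct, "memory": memory_pct, "disk_io": disk_io_pct}
    # max() over the dict keys, ranked by their utilization values
    return max(usage, key=usage.get)


if __name__ == "__main__":
    # Made-up averages from a hypothetical test run of a database-style workload.
    print(dominant_resource(cpu_pct=35, memory_pct=60, disk_io_pct=85))
```

A real sizing exercise would look at peaks as well as averages, but even this crude comparison tells you which resource to provision most carefully.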
- Brian Pastre, Premier IT, Inc.