Tuesday, August 15, 2006

100% CPU Utilization - How to pinpoint the root of the problem

Ever had that irritating problem where, all of a sudden, EVERYTHING slows to a grinding halt? I've had it numerous times, and it always seemed to be some sort of BADWARE infesting my machine.

This time around, one of our computer labs is facing the very same problem. But after trying every anti-virus, anti-spyware, anti-malware, and anti-trojan tool, plus HijackThis and the rest, nothing turned up whatsoever, and the CPU was still at 100%. So, with some help, I did a little experiment... (you can use it in the future to diagnose the root cause of your OS problem, if all else fails)

First, using XP's built-in System Configuration Utility (type "msconfig" at the command prompt), I chose "Diagnostic Startup", which disables ALL Windows services except, of course, the essential/critical ones, and also disables ALL startup applications.

After rebooting, I found that the problem had gone, so I knew right away that it was an OS problem, not hardware, and that it had something to do with either the startup programs or the Windows services. Optimistically, it could be just one culprit, or a combination of a few. So I proceeded to step 2, "Selective Startup", which lets you choose what to run and what to disable. First, I enabled everything except the Windows services, leaving "Load System Services" unchecked.

Upon rebooting, there were no problems and no 100% CPU utilization. So I could deduce that it wasn't a startup program problem; it was one of the MANY services that run on XP. So this time I selected the system services as well, but went to the "Services" tab, and here's the ingenious part. I applied a binary search to pinpoint the culprit, assuming that a single service was the root of the problem. Basically, I split the list into two halves (A & B) and enabled only the first half. Upon rebooting, I found that the CPU utilization was normal, so I knew the problem must be in the second half (B) of the services list. I then split B into two halves, C & D, and repeated this halving until I was down to a few services starting with "P".
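For the curious, here's a minimal Python sketch of that halving procedure. It assumes, as I did, that exactly one service is the culprit. The `is_bad` callback is a hypothetical stand-in for one reboot-and-check cycle (enable only the given services, reboot, see whether the CPU pins at 100%), and the service names in the example run are made up for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def find_culprit(items: Sequence[T], is_bad: Callable[[Sequence[T]], bool]) -> T:
    """Return the single item whose presence triggers the problem.

    is_bad(subset) models one test reboot: enable only `subset`, boot,
    and return True if the CPU pins at 100%. Assumes exactly one culprit
    that always misbehaves when it is enabled.
    """
    candidates: List[T] = list(items)
    if not candidates:
        raise ValueError("nothing to search")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        if is_bad(first_half):
            candidates = first_half          # problem came back: culprit is in this half
        else:
            candidates = candidates[mid:]    # CPU was normal: culprit is in the other half
    return candidates[0]


# Illustrative run with a made-up service list and a simulated reboot check.
services = ["Automatic Updates", "DHCP Client", "Event Log",
            "Print Spooler", "Themes", "Windows Audio"]
print(find_culprit(services, lambda subset: "Print Spooler" in subset))  # → Print Spooler
```

The nice thing about halving is the reboot count: with a list of 80 services, the loop above needs at most 7 test reboots, rather than dozens if you disabled services one at a time.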

Guess what it came down to? The Print Spooler service! Who would've thunk it? And the lab computers weren't even hooked up to any damned printer! Well, I did a little research on the web and lo and behold... seems like I wasn't the only one. It turns out that even if you don't have a printer, chances are you have a virtual printer such as the Microsoft Office Document Image Writer installed. It acts like a printer, but doesn't produce any printed output. Unfortunately, when any dumb soul hits "Print" and sends a document off for 'printing', it gets queued up in that spooler. And these jobs don't expire! Nor is the space infinite! Much like a normal printer's buffer/cache/memory (commonly 2MB to 16MB these days), it can get FULL! So that was the problem!
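If you want to confirm that the queue really is piling up before wiping it, a quick look at the spool folder helps. Here's a minimal Python sketch that counts the files in that folder and adds up their sizes. The default path assumes a standard install where the folder lives at %SystemRoot%\system32\spool\PRINTERS, and `spool_usage` is just a hypothetical name for illustration.

```python
import os
from pathlib import Path
from typing import Tuple

# Assumed default location of the spool folder on a standard Windows install.
SPOOL_DIR = Path(os.environ.get("SystemRoot", r"C:\WINDOWS")) / "system32" / "spool" / "PRINTERS"


def spool_usage(spool_dir: Path = SPOOL_DIR) -> Tuple[int, int]:
    """Return (file_count, total_bytes) for the files sitting in spool_dir.

    Subdirectories are skipped; only regular files are counted.
    """
    count = 0
    total = 0
    for entry in spool_dir.iterdir():
        if entry.is_file():
            count += 1
            total += entry.stat().st_size
    return count, total
```

Calling `spool_usage()` on an affected machine, and again after clearing the queue, should show the difference; a folder full of leftover jobs is a strong hint.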

All I needed to do then was simple. Just STOP the Print Spooler service via the Computer Management console (right-click My Computer > Manage > Services), empty the "%SystemRoot%\system32\spool\PRINTERS" folder, START the Print Spooler service again, and you're DONE! Of course, if you don't have a printer or don't plan to attach one in the near future, you can always disable that service altogether to avoid future problems. I hope Microsoft fixes this in the future, possibly by giving spooled items an expiry, maybe up to 30 days, or letting the user specify one. OR by getting rid of the MS Doc Image thing altogether! Haven't figured out what it's good for yet...
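The same STOP / empty / START routine can be scripted. Below is a minimal Python sketch, assuming it runs as an administrator on the affected machine. It calls the standard `net stop spooler` and `net start spooler` commands; `reset_spooler` and its injectable `run` parameter are hypothetical names that let you test the sequence without touching a real service. If other services depend on the spooler, `net stop` may ask for confirmation, so treat this as a starting point rather than a finished tool.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def run_command(args: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError if it fails."""
    subprocess.run(list(args), check=True)


def reset_spooler(spool_dir: Path,
                  run: Callable[[Sequence[str]], None] = run_command) -> int:
    """Stop the spooler, empty spool_dir, then start the spooler again.

    Returns the number of top-level entries removed. The spooler is
    restarted even if clearing the folder raises an error.
    """
    run(["net", "stop", "spooler"])
    removed = 0
    try:
        for entry in spool_dir.iterdir():
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)   # remove a real subdirectory and its contents
            else:
                entry.unlink()         # remove a file (or a symlink itself)
            removed += 1
    finally:
        run(["net", "start", "spooler"])
    return removed
```

On a real machine you would call something like `reset_spooler(Path(os.environ["SystemRoot"]) / "system32" / "spool" / "PRINTERS")` from an administrator prompt.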

Well, done with that problem. On to the next!