Dear Colleagues,
One of the key skills employers want from their people is the ability to solve problems, or troubleshoot, as we generally call it in engineering. As with common sense, good troubleshooting isn’t so common around here.
Your engineering career is likely to contain a major component of troubleshooting and then fixing, perhaps mainly remedying someone else’s mistakes. The trick, I believe, is to keep your mind completely open when troubleshooting, to avoid preconceived ideas which can throw you completely off track, and to work through all the phases listed below systematically.
Here are a few suggested steps for when you are next confronted by a problem:
1. Identify and define the real issue
When someone reports a problem to you, don’t bet your bottom dollar that it is the actual problem. Seen through the eyes of a user, the report of the situation may not reflect engineering reality. Ensure you get a careful explanation and, if possible, a demonstration of the problem. It is your job to ascertain what the real problem is in real engineering terms. Often a problem presents intermittently. Don’t walk away from it, however, presuming it has gone forever. It hasn’t.
Recently, when trying to tune a process control loop which the operators had complained was sluggish, I found that I was actually dealing with high-frequency signals (an aliasing problem). It wasn’t a tuning problem after all, but a filtering one.
Another challenge was fixing a network switch which refused to recognize the MAC addresses of new devices connected to the network. It turned out that the problem had nothing to do with the switch itself and everything to do with a faulty power supply, which was feeding the switch well below the required voltage (160 V AC rather than 230 V AC).
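For those who have not met aliasing before, here is a minimal sketch in Python of why a fast signal can masquerade as a slow one. The numbers are hypothetical (9 Hz noise read at a 10 Hz scan rate) and are not the figures from my loop; the function names are my own for illustration.

```python
import math

def alias_frequency(f_signal_hz: float, f_sample_hz: float) -> float:
    """Apparent frequency of a sine at f_signal_hz when sampled at f_sample_hz."""
    # Fold the signal frequency back into the band 0 .. f_sample_hz / 2.
    nearest_multiple = round(f_signal_hz / f_sample_hz) * f_sample_hz
    return abs(f_signal_hz - nearest_multiple)

def sample_sine(freq_hz: float, f_sample_hz: float, n_samples: int) -> list[float]:
    """Values a sampler would read from a unit sine wave at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * k / f_sample_hz)
            for k in range(n_samples)]

# Assumed values for illustration only.
noise_hz, scan_hz = 9.0, 10.0
apparent_hz = alias_frequency(noise_hz, scan_hz)

fast = sample_sine(noise_hz, scan_hz, 20)
slow = sample_sine(apparent_hz, scan_hz, 20)

# The sampled 9 Hz wave is the sign-inverted twin of a sampled 1 Hz wave,
# so at this scan rate the two cannot be told apart by frequency.
worst_mismatch = max(abs(a + b) for a, b in zip(fast, slow))
print(f"{noise_hz} Hz sampled at {scan_hz} Hz looks like {apparent_hz} Hz")
print(f"worst mismatch against an inverted {apparent_hz} Hz wave: {worst_mismatch:.1e}")
```

To the controller, the samples look like a slow wobble in the process, which is why no amount of retuning helps; the signal has to be filtered before it is sampled.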
2. Reproduce the problem
It is best to reproduce the problem where possible. You can then observe the full sequence of events, view the error messages and analyse other variables that may be affecting it.
This is terribly hard to do with intermittent problems such as occasional interference on communication cables or common mode voltage problems. In these cases, you have to work out what the possible causes of the problem were.
3. Localise, isolate and home in
Now you have to zone in on the equipment or software module that is responsible for the problem. Penetrate the thicket of equipment and find the precise element causing it. Remember that seemingly unrelated elements can cause problems. It is also vitally important to identify exactly what happened before the problem occurred: was a card changed out and the IP address not updated on the server?
4. Make a plan
Ensure that you assess carefully what is required. As one of my regular correspondents remarked: beware the law of unexpected consequences. The process of fixing something may cause other, unexpected problems (a colleague of mine located and remedied severe harmonic problems in a plant network, but blew up three of my precious variable speed drives with an overvoltage condition). When going through your plan step by step to find the best remedy, you may find that other issues appear which you hadn’t considered.
5. Trace your steps
Ensure that when you fix the problem, you know exactly what you have done in case you need to retrace your steps later to put the equipment back into its original state.
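One way to keep track is to note the "as found" and "as left" value of everything you touch. Below is a minimal sketch in Python; the class names, the items and the values are all hypothetical, made up purely to illustrate the idea.

```python
from dataclasses import dataclass

@dataclass
class Change:
    item: str    # what was touched
    before: str  # value as found
    after: str   # value as left

class ChangeLog:
    """Records each change so the original state can be restored."""

    def __init__(self) -> None:
        self._changes: list[Change] = []

    def record(self, item: str, before: str, after: str) -> None:
        self._changes.append(Change(item, before, after))

    def rollback_plan(self) -> list[str]:
        # Undo in reverse order: the last change made is the first reverted.
        return [
            f"{c.item}: set back to {c.before} (was changed to {c.after})"
            for c in reversed(self._changes)
        ]

# Hypothetical entries for illustration only.
log = ChangeLog()
log.record("Loop gain", "1.2", "0.8")
log.record("Input filter time constant", "0 s", "2 s")

plan = log.rollback_plan()
for step in plan:
    print(step)
```

A notebook and a pen do the same job. What matters is that every change is written down with its original value, in the order it was made.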
6. Test and retest
Test and retest over a period of time before accepting that the problem has been fixed. If there is any doubt about whether the problem has been fixed or not, there is no doubt. It is, most probably, still a problem.
Most people tend to walk away when they think they have remedied the problem, only to be called back later because they haven’t finished the job.
7. Document for an absolute moron
People who come after you may not be aware of what you have done and how you have solved the problem. The problem may reappear or something similar may happen to another piece of equipment. So – document for someone who may have no knowledge of what you have done.
Again, this phase of the project is often forgotten. Then, when a similar problem occurs on another part of the plant, everyone goes through the same learning curve again.
8. Communicate with the client or user and your buddies
Often the user is not convinced the problem has been fixed. Your job is to communicate honestly what you have done and why the problem has been fixed. Don’t treat the user as a complete idiot, but as a real partner in operating your facility. This is important for your credibility (and for the engineering profession). Ensure all your buddies (and your boss) are also apprised of how you fixed the problem; it could save them a lot of stress in their work.
I like Anthony J. D’Angelo’s take on fixing things: ‘Become a fixer, not just a fixture’.
Yours in engineering learning,
Steve
Mackay’s Musings – 6th January’14 #546
125,273 readers – www.eit.edu.au/cms/news/blog-steve-mackay