
Wednesday, September 20, 2017

10 Critical “Ask Yourself Questions” for ICT & Telecommunications O&M Engineers before executing a major or critical network change procedure

After reading this article, you are going to be a better O&M engineer than you were before landing on this page.

75% of the work carried out by O&M (Operations & Maintenance) engineers consists of system upgrades, migrations, swap-outs, replacements and configuration changes. Most of the time, O&M engineers are criticized and blamed by planning engineers for being reactive rather than proactive. We are known for being firefighters rather than fire preventers. This is about to change for you.

I am four months away from completing 9 years in telecommunications operations and maintenance (O&M), and I have learnt that the ability to pull off a smooth change operation (cutover, upgrade, swap-out, migration, configuration change, expansion) depends more on how well you prepare beforehand than on your skills or experience!
You could be highly skilled with enormous experience, but if you don’t make a good effort to prepare before you begin a major change operation, you are likely to create a mess and, believe me, that will leave a bad mark on your record and reputation.

The preparation I am talking about here includes making sure you have the right tools (software and hardware), the pre-checklist, the actual execution, the post-checklist and service monitoring.

In this article, I have compiled a list of critical questions that you should ask yourself before embarking on that major change operation. Trust me, this will save you the bumpy ride that is common with all major network change operations.

Ask Yourself Questions

As an O&M engineer, you need to read and answer the questions below to ensure that all situations are considered prior to starting work or making any system/network changes. If the answer to any of the questions below is NO, then you need to STOP and reorganize yourself. The questions are not in any particular order; you just have to go through all of them.
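The "any NO means STOP" rule can be written down as a small go/no-go gate. This is only a minimal sketch: the function name `go_no_go` and the dictionary layout are my own assumptions for illustration, not part of any standard tool.

```python
def go_no_go(answers):
    """Return (ok, blockers).

    answers maps each question to True (YES) or False (NO).
    ok is True only when every single answer is YES.
    """
    blockers = [question for question, yes in answers.items() if not yes]
    return (len(blockers) == 0, blockers)


# Hypothetical answers for two of the ten questions.
answers = {
    "Do I know why this work is being performed?": True,
    "Do I have the updated MOP and supporting documents?": False,
}

ok, blockers = go_no_go(answers)
if not ok:
    print("STOP. Reorganize before proceeding:")
    for question in blockers:
        print(" -", question)
```

The point of returning the list of blockers, and not just a yes/no, is that it tells you exactly what to go and fix before the next attempt.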

1.  Do I know why this work is being performed?

Most of the time, O&M engineers take instructions from planning engineers or solution architects without making a detailed analysis and understanding why the work is being performed. It could be as minor as a simple restart of a system process, but you need to find out why you have been asked to restart that process.

2.  Am I trained and qualified to do this work?

Usually, technical work requires that you possess a certain skill set before you can execute any operation. This question helps you to evaluate your skill set and level. If you find yourself in a situation where you lack some of the skills needed to execute the work, make sure to have someone with those skills join you or be on standby to support you when you get stuck. It’s not a sign of weakness. You can’t know everything, and nobody understands everything; that’s why there are technical teams, so that we can complement each other.

3.  Do I have the updated MOP (Method of Procedure) and supporting documents to carry out this work?

A method of procedure (MOP) is a document that outlines, in detail, all the specific steps of the work to be performed. It includes all the pre- and post-implementation system health checks.
It’s usually prepared by a more specialized technical team, for example the research and development (R&D) team. These are the people who actually designed and built the system that you are about to work on. So, don’t feel shy about asking the vendor for the MOP prior to performing major changes on their systems.

4.  Have I walked through the MOP and supporting documents, and do I know which network elements and services are going to be impacted during this procedure?

Don’t just stop at getting the MOP and stashing it under your desk! Read it in detail and, while at it, perform what we call an impact analysis of all the services and network elements that are going to be affected by that procedure. Also identify the level and severity of the impact.

5.  Have I identified and notified everybody (customers, internal groups, stakeholders) who will be directly affected by this work?

Having performed the impact analysis, and with a clear picture of which services/systems will be impacted, you need to formally write to all the stakeholders. At a bare minimum, the notification should go out 3 days prior to performing the work.
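The 3-day rule is simple date arithmetic, and it is easy to check mechanically. Here is a minimal sketch; the function names and the example maintenance date are hypothetical, and only the 3-day minimum comes from the rule above.

```python
from datetime import datetime, timedelta

MIN_NOTICE = timedelta(days=3)  # bare-minimum notice period


def latest_notification_time(change_start, min_notice=MIN_NOTICE):
    """Latest moment the notification can go out and still meet the notice period."""
    return change_start - min_notice


def notice_is_sufficient(sent_at, change_start, min_notice=MIN_NOTICE):
    """True if the notification was sent at least min_notice before the change."""
    return sent_at <= latest_notification_time(change_start, min_notice)


# Hypothetical maintenance window starting at 01:00 on 23 September 2017.
change_start = datetime(2017, 9, 23, 1, 0)
print(latest_notification_time(change_start))  # 2017-09-20 01:00:00
```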

6. Can I prevent or control service interruptions?

Still with reference to the impact analysis that you performed, ask yourself if you can prevent or control the impact on services. This will keep the downtime of critical services to a minimum. Your boss will be happy if you take that extra step, and it will also show that you have the customer’s business at heart 😊

7. Is this the right time to perform this work?

Choose the proper time for your maintenance window: pick a time that has minimum traffic. Yet again, you are trying to have minimal downtime, and this will save the business a lot of money.
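If you have hourly traffic figures for the service, picking the quietest window can be done with a few lines of code. The sketch below assumes a list of 24 hourly traffic values (index 0 is 00:00 to 01:00); the function name `quietest_window` and the sample numbers are made up for illustration.

```python
def quietest_window(hourly_traffic, window_hours):
    """Return (start_hour, total_traffic) of the contiguous window with the least traffic.

    The window is allowed to wrap around midnight.
    """
    n = len(hourly_traffic)
    if not 1 <= window_hours <= n:
        raise ValueError("window_hours must be between 1 and the number of samples")
    best_start, best_total = 0, None
    for start in range(n):
        total = sum(hourly_traffic[(start + i) % n] for i in range(window_hours))
        if best_total is None or total < best_total:
            best_start, best_total = start, total
    return best_start, best_total


# Made-up traffic profile: quiet in the early morning, busy for the rest of the day.
traffic = [50, 40, 10, 5, 8, 30] + [60] * 18
print(quietest_window(traffic, 2))  # (3, 13)
```

With this sample data, a 2-hour window starting at 03:00 carries the least traffic, so that is where you would schedule the work.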

8. Have I monitored the service and confirmed that it has been stable for at least 24 hours prior to starting any changes?

Before you make any changes/upgrades to the system, make sure it’s free of errors and faults. Make sure you have visibility of the current status of the system. If there are any existing alarms/faults, make sure to capture and report them; otherwise, you will find yourself trying to troubleshoot old faults that are not a result of your operation.
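One way to avoid chasing old faults is to save the alarm list before the change and compare it with the alarm list afterwards. The sketch below assumes you can export the active alarms as plain strings; the alarm names and the function name `compare_alarms` are hypothetical.

```python
def compare_alarms(before, after):
    """Split alarms into pre-existing, new and cleared, based on two snapshots."""
    before, after = set(before), set(after)
    return {
        "pre_existing": sorted(before & after),  # were there before, still there
        "new": sorted(after - before),           # appeared after the change
        "cleared": sorted(before - after),       # disappeared after the change
    }


# Hypothetical snapshots taken before and after the operation.
before = ["LINK_DOWN port 3", "FAN_FAIL shelf 1"]
after = ["FAN_FAIL shelf 1", "HIGH_CPU card 2"]

report = compare_alarms(before, after)
print(report["new"])  # ['HIGH_CPU card 2']
```

Only the alarms under "new" are candidates for being caused by your operation; the "pre_existing" ones should already be in your pre-change report.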

9.  Do I have the proper equipment and tools to perform this work?

Tools can be software tools (terminal clients, username/password, monitoring tools, TFTP/FTP servers, diagnostic software, etc.) or hardware tools (console cable, screwdrivers, meters, etc.).

10. Is everything in place to allow me to quickly and safely restore service if I hit a snag?

This should cover the fall-back procedure, system backup, configuration backup, escalation procedure and hotlines. If you are a few minutes away from the end of your maintenance window and things are not working out, you need to have a plan to roll back and restore the system to the last working configuration, save it, and plan for another day.
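The roll-back decision is easier to make under pressure if you fix the rule beforehand. Below is a minimal sketch of one such rule, under my own assumptions: you know roughly how long the fall-back procedure takes (`rollback_duration`), and you roll back as soon as there is only just enough time left in the window to complete it. The function name and the times used are hypothetical.

```python
from datetime import datetime, timedelta


def must_roll_back(now, window_end, rollback_duration, change_verified):
    """True if the change is not yet verified and the rollback must start now
    in order to finish before the maintenance window closes."""
    if change_verified:
        return False
    return now + rollback_duration >= window_end


# Hypothetical window ending at 05:00, with a fall-back that takes about 30 minutes.
window_end = datetime(2017, 9, 23, 5, 0)
rollback_duration = timedelta(minutes=30)

print(must_roll_back(datetime(2017, 9, 23, 4, 35), window_end, rollback_duration, False))  # True
```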


I have tried to cover the major “ask yourself questions” in operations and maintenance. If you feel I have left some out, please feel free to leave a comment and I will be happy to add them to this article. Otherwise, I wish you a smooth operation.
