75% of the work carried out by O&M (Operations & Maintenance) engineers consists
of system upgrades, migrations, swap-outs, replacements and configuration changes.
Most of the time, O&M engineers are criticized and blamed by planning engineers for
being reactive rather than proactive. We are known as firefighters rather than fire
preventers. This is about to change for you. After reading this article, you are
going to be a better O&M engineer than you were before landing on this page.
I am four months away from completing nine years in telecommunications operations
and maintenance (O&M), and I have learnt that the ability to pull off a smooth
change operation (cutover, upgrade, swap-out, migration, configuration change,
expansion) depends mainly on how well you prepare before the operation rather than
on your skills or experience!
You could be highly skilled with enormous experience, but if you don’t make a real
effort to prepare before you begin a major change operation, you are likely to
create a mess, and believe me, that will leave a bad mark on your record and
reputation.
The preparation I am talking about includes making sure you have the right tools
(software and hardware), a pre-checklist, a plan for the actual execution, a
post-checklist and service monitoring.
In this article I have compiled a list of critical questions that you should ask
yourself before embarking on that major change operation. Trust me, this will save
you the bumpy ride that is common to major network change operations.
Ask Yourself Questions
As an O&M engineer, you need to read and answer the questions below to ensure that
all situations are considered before starting work or making any system/network
changes. If the answer to any of the questions below is NO, then you need to STOP
and reorganize yourself. The questions are not in any particular order; you just
have to go through all of them.
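If you like to keep the checklist next to your change ticket, the "any NO means STOP" rule can be captured in a few lines of Python. This is a minimal sketch, not a tool the article prescribes: the question keys are hypothetical shorthand for the ten questions that follow, and an unanswered question is deliberately treated as a NO.

```python
# A minimal go/no-go gate for the ten questions.
# The keys are hypothetical shorthand, one per question, in the article's order.
QUESTIONS = [
    "know_why_work_is_performed",  # 1
    "trained_and_qualified",       # 2
    "have_updated_mop",            # 3
    "impact_analysis_done",        # 4
    "stakeholders_notified",       # 5
    "interruptions_controlled",    # 6
    "right_time",                  # 7
    "stable_for_24h",              # 8
    "tools_ready",                 # 9
    "fallback_ready",              # 10
]


def go_no_go(answers: dict[str, bool]) -> tuple[bool, list[str]]:
    """Return (go, blockers). A question with no recorded answer counts as NO."""
    blockers = [q for q in QUESTIONS if not answers.get(q, False)]
    return (not blockers, blockers)


if __name__ == "__main__":
    answers = {q: True for q in QUESTIONS}
    answers["fallback_ready"] = False  # e.g. the configuration backup is not taken yet
    go, blockers = go_no_go(answers)
    print("GO" if go else f"STOP and reorganize: {blockers}")
```

Returning the list of blockers, rather than a bare yes/no, tells you exactly what to sort out before you try again.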
1. Do I know why this work is being performed?
Most of the time, O&M engineers take instructions from planning engineers or
solution architects without a detailed analysis and understanding of why the work
is being performed. It could be as minor as a simple restart of a system process,
but you need to find out why you have been asked to restart that process.
2. Am I trained and qualified to do this work?
Technical work usually requires a certain skill set before you can execute any
operation. This question helps you evaluate your skill set and level. If you find
that you lack some of the skills needed to execute the work properly, make sure
someone with those skills joins you or is on standby to support you when you get
stuck. It’s not a sign of weakness: nobody knows everything, and that’s why we have
technical teams, so that we can complement each other.
3. Do I have the updated MOP (Method of Procedure) and supporting documents to carry out this work?
A method of procedure (MOP) is a document that outlines, in detail, all the
specific steps of the work to be performed. It includes all the pre- and
post-implementation system health checks.
It’s usually prepared by a more specialized technical team, for example the
research and development (R&D) team. These are the people who actually designed and
built the system that you are about to work on. So don’t be shy about asking the
vendor for the MOP before performing major changes on their systems.
4. Have I walked through the MOP and supporting documents, and do I know which network elements and services are going to be impacted during this procedure?
Don’t just stop at getting the MOP and stashing it under your desk! Read it in
detail and, while at it, perform what we call an impact analysis of all the
services and network elements that are going to be affected by the procedure. Also
identify the level and severity of the impact.
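One way to make the impact analysis concrete is to model which elements and services depend on which, then walk that dependency graph from the element you are about to touch. The sketch below assumes a hand-built, hypothetical topology (`core-router-1`, `agg-switch-1`, `svc:voice` and so on) and hypothetical severity ratings; in practice the dependencies would come from your own network inventory, and the severity scale from whatever the business has agreed.

```python
from collections import deque

# Hypothetical topology: each key maps to the elements or services that depend on it.
DEPENDENTS = {
    "core-router-1": ["agg-switch-1", "agg-switch-2"],
    "agg-switch-1": ["svc:voice", "svc:internet"],
    "agg-switch-2": ["svc:internet", "svc:iptv"],
}

# Hypothetical severity ratings agreed with the business (higher means worse).
SEVERITY = {"svc:voice": 3, "svc:internet": 2, "svc:iptv": 1}


def impacted(element: str, graph: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk collecting every element and service downstream of `element`."""
    seen: set[str] = set()
    queue = deque([element])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def impact_report(element: str) -> list[tuple[str, int]]:
    """Impacted services with their severity, worst first (ties sorted by name)."""
    services = [n for n in impacted(element, DEPENDENTS) if n.startswith("svc:")]
    return sorted(((s, SEVERITY.get(s, 0)) for s in services), key=lambda x: (-x[1], x[0]))


if __name__ == "__main__":
    print(sorted(impacted("core-router-1", DEPENDENTS)))
    print(impact_report("core-router-1"))
```

The `seen` set keeps the walk from visiting a node twice, so a service fed by two paths (here `svc:internet`) is listed once, and a loop in the topology cannot make it run forever.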
5. Have I identified and notified everybody (customers, internal groups, stakeholders) who will be directly affected by this work?
Having performed the impact analysis, you now have a clear picture of which
services/systems will be impacted, so you need to formally write to all the
stakeholders. At a bare minimum, the notification should go out 3 days before
performing the work.
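The 3-day rule is simple date arithmetic, but it is easy to get wrong when a window starts just after midnight. A minimal sketch using Python’s standard `datetime` module; the window date below is a made-up example, and the 3-day minimum is the only number taken from the article.

```python
from datetime import datetime, timedelta

# The article's bare minimum: notify stakeholders at least 3 days before the work.
MIN_NOTICE = timedelta(days=3)


def latest_notification_time(window_start: datetime, min_notice: timedelta = MIN_NOTICE) -> datetime:
    """Latest moment the notification can go out and still meet the lead time."""
    return window_start - min_notice


def notice_ok(notified_at: datetime, window_start: datetime, min_notice: timedelta = MIN_NOTICE) -> bool:
    """True if stakeholders were notified at least `min_notice` before the window opens."""
    return window_start - notified_at >= min_notice


if __name__ == "__main__":
    window = datetime(2024, 6, 10, 1, 0)  # hypothetical window start: 01:00
    print(latest_notification_time(window))              # 2024-06-07 01:00:00
    print(notice_ok(datetime(2024, 6, 7, 9, 0), window))  # False: only 2 days 16 hours of notice
```

Note how a notification sent on the morning of the 7th already misses a window that opens at 01:00 on the 10th.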
6. Can I prevent or control service interruptions?
Still referring to the impact analysis that you performed, ask yourself whether you
can prevent or control the impact on services. Doing so keeps the downtime of
critical services to a minimum. Your boss will be happy if you take that extra
step, and it will also show that you have the customer’s business at heart 😊
7. Is this the right time to perform this work?
Choose the proper time for your maintenance window: pick a period with minimum
traffic. Once again, you are trying to keep downtime to a minimum, and this will
save the business a lot of money.
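If your monitoring system can export average traffic per hour, finding the quietest window is a sliding-window minimum. The sketch below assumes such a 24-value hourly profile (the numbers are invented for illustration, in whatever unit your counters use) and lets the window wrap past midnight, since many quiet periods span 23:00 to 02:00.

```python
def quietest_window(hourly_traffic: list[float], hours: int) -> tuple[int, float]:
    """Return (start_hour, total_traffic) of the quietest `hours`-long window.

    `hourly_traffic[h]` is the average traffic observed in hour h (00..23); the
    window may wrap past midnight, e.g. 23:00 to 02:00.
    """
    n = len(hourly_traffic)
    if not 0 < hours <= n:
        raise ValueError("window length must be between 1 and len(hourly_traffic)")
    best_start, best_total = 0, float("inf")
    for start in range(n):
        total = sum(hourly_traffic[(start + i) % n] for i in range(hours))
        if total < best_total:  # strict '<' keeps the earliest start on ties
            best_start, best_total = start, total
    return best_start, best_total


if __name__ == "__main__":
    # Hypothetical 24-hour traffic profile (any unit: Erlangs, Gbit/s, sessions).
    traffic = [30, 20, 12, 8, 6, 9, 25, 60, 80, 85, 90, 88,
               86, 84, 82, 85, 90, 95, 100, 98, 90, 75, 60, 45]
    start, total = quietest_window(traffic, hours=3)
    print(f"Quietest 3-hour window starts at {start:02d}:00 (total {total})")
```

A single day’s profile can mislead; averaging the same hour over several weeks, and checking for known busy days, gives a more trustworthy answer.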
8. Have I monitored the service and confirmed it has been stable for at least 24 hours prior to starting any changes?
Before you make any changes/upgrades to the system, make sure it’s error- and
fault-free. Make sure you have visibility of the current status of the system. If
there are any existing alarms/faults, make sure to capture and report them;
otherwise you will find yourself trying to troubleshoot old faults that were not
caused by your operation.
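Capturing the existing alarms pays off after the change: comparing the post-change alarm list against the pre-change snapshot separates faults your operation may have caused from faults that were already there. A minimal sketch, assuming alarms can be exported as hypothetical `"ALARM_NAME:element"` strings with the time each was raised; real alarm formats vary by vendor.

```python
from datetime import datetime, timedelta

# Alarms are modelled here as hypothetical "ALARM_NAME:element" strings.


def stable_for(alarm_raised_at: list[datetime], now: datetime,
               period: timedelta = timedelta(hours=24)) -> bool:
    """True if no alarm was raised within the last `period` before `now`."""
    return all(t < now - period for t in alarm_raised_at)


def new_alarms(baseline: set[str], current: set[str]) -> set[str]:
    """Alarms present after the change that were not in the pre-change snapshot."""
    return current - baseline


def cleared_alarms(baseline: set[str], current: set[str]) -> set[str]:
    """Pre-existing alarms that disappeared after the change."""
    return baseline - current


if __name__ == "__main__":
    baseline = {"LINK_DOWN:port-3"}                      # captured and reported before the work
    current = {"LINK_DOWN:port-3", "HIGH_TEMP:shelf-1"}  # snapshot after the work
    print(new_alarms(baseline, current))      # {'HIGH_TEMP:shelf-1'}: investigate this one
    print(cleared_alarms(baseline, current))  # set()
```

Here `LINK_DOWN:port-3` was already reported before the work, so only `HIGH_TEMP:shelf-1` needs your attention.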
9. Do I have the proper equipment and tools to perform this work?
Tools can be software tools (terminal clients, usernames/passwords, monitoring
tools, TFTP/FTP servers, diagnostic software, etc.) or hardware tools (console
cables, screwdrivers, meters, etc.).
10. Is everything in place to allow me to quickly and safely restore service if I hit a snag?
This should cover the fallback procedure, system backup, configuration backup,
escalation procedure and hotlines. If you are a few minutes away from the end of
your maintenance window and things are not working out, you need a plan to roll
back and restore the system to the last working configuration, save, and plan for
another day.
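The fallback logic can be summarized as: take a configuration backup, apply the change, keep re-running the post-checks, and restore the backup if they have not passed by a rollback deadline that still leaves time to complete the rollback itself. The sketch below is a simulation under stated assumptions: `device` is a hypothetical dict standing in for a real network element, and `apply_change` and `post_checks` are hypothetical hooks you would replace with your MOP steps. The clock and sleep functions are injectable so the logic can be rehearsed without waiting.

```python
import copy
import time
from typing import Callable


def change_with_fallback(
    device: dict,
    apply_change: Callable[[dict], None],
    post_checks: Callable[[dict], bool],
    rollback_deadline: float,
    retry_interval_s: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Apply a change; restore the backed-up config if post-checks fail by the deadline.

    `device` is a hypothetical stand-in: a dict whose "config" key holds the running
    configuration. `rollback_deadline` is a `clock()` reading chosen to leave enough
    of the maintenance window to complete the rollback itself.
    """
    backup = copy.deepcopy(device["config"])  # configuration backup before any change
    try:
        apply_change(device)
    except Exception:
        device["config"] = backup  # change failed outright: restore immediately
        return "rolled back"
    while True:
        if post_checks(device):
            return "committed"
        if clock() + retry_interval_s >= rollback_deadline:
            break  # no time for another attempt: stop troubleshooting
        sleep(retry_interval_s)
    device["config"] = backup  # back to the last working configuration
    return "rolled back"
```

The deep copy matters: a plain reference to `device["config"]` would be modified along with the live configuration, leaving nothing to roll back to.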
I have tried to cover the major “ask yourself” questions in operations and
maintenance. If you feel I have left some out, please feel free to leave a comment
and I will be happy to add them to this article. Otherwise, I wish you a smooth
operation.