
Showing posts with label systems operations and maintenance. Show all posts

Wednesday, September 20, 2017

10 Critical “Ask Yourself Questions” for ICT & Telecommunications O&M Engineers before executing a major or critical network change procedure


75% of the work carried out by O&M (Operations & Maintenance) engineers consists of system upgrades, migrations, swap-outs, replacements and configuration changes. O&M engineers are often criticized and blamed by planning engineers for being reactive rather than proactive. We are known as firefighters rather than fire preventers. This is about to change for you. After reading this article, you are going to be a better O&M engineer than you were before landing on this page.

I am four months away from completing 9 years in telecommunications operations and maintenance (O&M), and I have learnt that the ability to pull off a smooth change operation (cutover, upgrade, swap-out, migration, configuration change, expansion) depends more on how well you prepare beforehand than on your skills or experience!
You could be highly skilled and vastly experienced, but if you don’t make a good effort to prepare before you begin a major change operation, you are likely to create a mess, and believe me, that will leave a bad mark on your record and reputation.

The preparation I am talking about here includes making sure you have the right tools (software and hardware), the pre-checklist, the actual execution, the post-checklist and service monitoring.

In this article I have compiled a list of critical questions that you should ask yourself before embarking on that major change operation. Trust me, this will save you the bumpy ride that is common with major network change operations.

Ask Yourself Questions

As an O&M engineer, you need to read and answer the questions below to ensure that all situations are considered before starting work or making any system/network changes. If the answer to any of the questions below is NO, you need to STOP and reorganize yourself. The questions are not in any particular order; you just have to go through all of them.

1.  Do I know why this work is being performed?

Most of the time, O&M engineers take instructions from planning engineers or solution architects without analyzing and understanding in detail why the work is being performed. It could be as minor as a simple restart of a system process, but you still need to find out why you have been asked to restart that process.

2.  Am I trained and qualified to do this work?

Technical work usually requires that you possess a certain skill set before you can execute an operation. This question helps you evaluate your skill set and level. If you find that you lack some of the skills needed to execute the work, make sure someone with those skills joins you or is on standby to support you when you get stuck. It’s not a sign of weakness. Nobody understands everything, and that’s why we work in technical teams: so that we can complement each other.

3.  Do I have the updated MOP (Method of Procedure) and supporting documents to carry out this work?

A method of procedure (MOP) is a document that outlines in detail all the specific steps of the work to be performed. It includes all the pre- and post-implementation system health checks.
It’s usually prepared by a more specialized technical team, for example the research and development (R&D) team. These are the people who actually designed and built the system that you are about to work on. So don’t be shy about asking the vendor for the MOP before performing major changes on their systems.

4.  Have I walked through the MOP and supporting documents, and do I know which network elements and services are going to be impacted during this procedure?

Don’t just stop at getting the MOP and stashing it under your desk! Read it in detail and, while at it, perform what we call an impact analysis of all the services and network elements that are going to be affected by the procedure. Also identify the level and severity of the impact.

5.  Have I identified and notified everybody (customers, internal groups, stakeholders) who will be directly affected by this work?

Having performed the impact analysis, you have a clear picture of which services/systems will be impacted, and you need to formally write to all the stakeholders. At a bare minimum, the notification should go out 3 days before the work is performed.

6. Can I prevent or control service interruptions?

Still with reference to the impact analysis that you performed, ask yourself whether you can prevent or control the impact on services. This will minimize the downtime of critical services. Your boss will be happy if you take that extra step, and it will also show that you have the customer’s business at heart 😊

7. Is this the right time to perform this work?

Choose the proper time for your maintenance window: pick a period with minimum traffic. Once again, you are trying to keep downtime minimal, and this will save the business a lot of money.

8. Have I monitored the service to be stable for at least 24 hours prior to starting any changes?

Before you make any changes/upgrades to the system, make sure it’s free of errors and faults, and make sure you have visibility of its current status. If there are any existing alarms/faults, capture and report them; otherwise you will find yourself troubleshooting old faults that were not caused by your operation.

9.  Do I have the proper equipment and tools to perform this work?

Tools can be software tools (terminal clients, username/password, monitoring tools, TFTP/FTP servers, diagnostic software, etc.) or hardware tools (console cable, screwdrivers, meters, etc.).

10. Is everything in place to allow me to quickly and safely restore service if I hit a snag?

This should cover the fall-back procedure, system backup, configuration backup, escalation procedure and hotlines. If you are a few minutes away from the end of your maintenance window and things are not working out, you need to have a plan to roll back, restore the system to the last working configuration, save it and plan for another day.
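
The STOP rule above (any NO answer means you do not proceed) can be turned into a small go/no-go gate. The following Python sketch is only an illustration: the shortened question labels and the helper names `blocking_questions` and `notice_is_sufficient` are my own assumptions for this example, not part of any standard tool.

```python
from datetime import date, timedelta
from typing import Dict, List

# Short labels for the ten questions above (wording abbreviated for this sketch).
QUESTIONS: List[str] = [
    "Know why the work is being performed",
    "Trained and qualified",
    "Have the updated MOP and supporting documents",
    "Walked through the MOP and know the impact",
    "Notified everybody affected",
    "Can prevent or control service interruptions",
    "Right time for the work",
    "Service stable for at least 24 hours",
    "Proper equipment and tools",
    "Can quickly and safely restore service",
]


def blocking_questions(answers: Dict[str, bool]) -> List[str]:
    """Return every question answered NO; unanswered questions count as NO."""
    return [q for q in QUESTIONS if not answers.get(q, False)]


def notice_is_sufficient(notified_on: date, change_on: date, min_days: int = 3) -> bool:
    """Check the 'at least 3 days before' notification rule from question 5."""
    return change_on - notified_on >= timedelta(days=min_days)


if __name__ == "__main__":
    answers = {q: True for q in QUESTIONS}
    answers["Proper equipment and tools"] = False  # example: console cable missing
    blockers = blocking_questions(answers)
    if blockers:
        print("STOP and reorganize. Open items:")
        for q in blockers:
            print(" -", q)
    else:
        print("GO")
    print("Notice OK:", notice_is_sufficient(date(2017, 9, 17), date(2017, 9, 20)))
```

Treating an unanswered question as NO is a deliberate choice in this sketch: a question you skipped is a situation you have not considered.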


I have tried to cover the major “ask yourself questions” in operations and maintenance. If you feel I have left some out, please feel free to leave a comment and I will be happy to add them to this article. Otherwise, I wish you a smooth operation.

Tuesday, August 22, 2017

Serial Console Settings for the Ericsson AXE/APG40 Based Servers

Below is an example of an Ericsson AXE/APG40 based server:

[Figure: AXE/APG based server showing where to connect the console]
APG stands for Adjunct Processor Group
AXE is not an acronym but an Ericsson product code.

Below are the serial settings to use when connecting to an Ericsson server based on the AXE/APG40 architecture.

[Figure: Serial settings; note the baud rate]



Monday, August 14, 2017

Accton Edge Core Switch - CPU Utilization reaching 100% Causing Latency, Packet Loss and Jitter


Summary:

High CPU utilization on Accton Edge Core switch

Problem or Goal:

SW01-0#show process cpu 
             Process Name 5Sec(%%) 1Min(%%) 5Min(%%) 15Min(%%) Runtime(ms)
                tRootTask   0.00    0.00    0.00    0.00             0
                 tExcTask   0.00    0.00    0.00    0.00          4450
                 tLogTask   0.00    0.00    0.00    0.00             0
                   bcmDPC   0.00    0.00    0.00    0.00             0
                bcmCNTR.0   1.20    0.97    0.88    0.86     403860440
                    bcmTX   0.20    0.08    0.01    0.03      38288910
           bcmXGS3AsyncTX   0.00    0.00    0.00    0.00             0
                bcmCNTR.1   0.20    0.77    0.64    0.65     384888780
                bcmLINK.0   0.20    0.25    0.30    0.44     207997140
                bcmLINK.1   0.60    0.20    0.27    0.40     205123250
                    bcmRX   3.20    2.77    2.41    2.33     932885260
                   ipnetd   9.20    4.05    1.37    0.63     150098590
                SYS_TIMER   0.00    0.00    0.00    0.00            30
                 TASK_MON   0.00    0.00    0.01    0.02      13190510
                     TPLG   0.00    0.00    0.00    0.00       2905040
               STACK_CTRL   0.00    0.00    0.00    0.00             0
         SWDRV_CACHE_TASK   0.00    0.00    0.00    0.00           570
                    SWDRV  11.60   12.60   12.57   12.51    2450845024
             AMTRDRV_TASK   7.40    9.42    9.91    8.46     437234870
            SWDRVL3_CACHE   0.00    0.00    0.00    0.00        723040
                     BSTM   2.40    2.40    2.21    2.18     867314220
                      ISC   0.00    0.00    0.00    0.00        772490
             ISC_CALLBACK   0.00    0.00    0.00    0.00        458880
ISC_LOW_PRIORITY_CALLBACK   0.00    0.00    0.00    0.00             0
                       FS   0.00    0.00    0.00    0.00             0
                   SYSDRV  21.60   21.87   19.88   16.61     394608318
                   SWCTRL   0.00    0.00    0.00    0.00       1075650
                     NMTR   0.00    0.00    0.00    0.00        435680
                     LACP   0.00    0.02    0.03    0.04      39156480
                   AMTRL3   0.00    0.00    0.00    0.00       2383260
                      STA   4.80    6.15    6.22    6.26    3519756390
                     VLAN   0.00    0.00    0.00    0.00        109550
                     GARP   0.00    0.00    0.00    0.00             0
                     IGMP   0.60    1.25    1.15    1.24     436226180
                   RADIUS   0.00    0.00    0.00    0.00             0
                    DOT1X   0.00    0.02    0.00    0.00        383180
                     IMTR   0.00    0.05    0.04    0.05      24671280
                 IML_TASK   0.40    0.25    0.15    0.15      67940540
                     P2IP   2.20    1.37    1.21    1.26     583185760
                     VRRP   0.40    0.15    0.11    0.13      46062760
                    zNCFG   0.00    0.00    0.00    0.00             0
                     zNSM   0.00    0.00    0.00    0.00      17161360
                    zOSPF   0.00    0.00    0.00    0.00           100
                     zRIP   0.00    0.00    0.00    0.00         15570
                   SYSLOG   0.00    0.00    0.00    0.00        615010
                     SMTP   0.00    0.00    0.00    0.00         70630
                  DNS_RES   0.00    0.00    0.00    0.00       4623530
                DNS_PROXY   0.00    0.00    0.00    0.01       4722020
                   KEYGEN   0.00    0.00    0.00    0.00        678350
                     HTTP   0.00    0.00    0.00    0.00        681710
                 TELNET_S   0.00    0.00    0.00    0.00           730
                 TELNET_D   0.00    0.00    0.00    0.00        469450
                     DNLD   0.00    0.00    0.00    0.00             0
                    FXFER   0.00    0.00    0.00    0.00         76520
                     DHCP   0.00    0.00    0.00    0.00       5536000
                     SNTP   0.00    0.00    0.00    0.00       1277890
                SSHD_MAIN   0.00    0.02    0.01    0.01       4395100
                     LLDP   0.00    2.48    2.18    1.88    1093983890
                     SNMP  46.60    8.44   12.15   21.13     500340188
                 CLITASK0   0.00    0.00    0.01    0.02      11541680
                 KICK_NET   0.00    0.00    0.00    0.00          2170
                     TN12   1.00    1.44    0.37    0.12          1080
                     UI37   1.20    1.04    0.34    0.11           980
                     UI08   0.00    0.00    0.00    0.00         30770
===========================================================================
                  total = 100.00   78.06   74.43   77.53

SW01-0#show snmp 
System Contact:  
System Location: 
SNMP Agent: enabled
SNMP traps: 
 Authentication: enabled
 Link-up-down:   enabled

Cause:

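The SNMP agent seems to be using most of the CPU resources: the SNMP task shows 46.60% in the 5-second column and 21.13% in the 15-minute column, the highest of any task in both, while the 5-second total sits at 100%.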
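
To pick out the busiest tasks without reading the whole table by eye, the `show process cpu` output can be ranked by its 5-second column. The Python sketch below assumes the column layout shown in the output above. The function name `top_tasks` and the shortened `SAMPLE` text are assumptions made for this illustration.

```python
import re
from typing import List, Tuple

# A task row: name, four percentage columns, then the runtime in ms.
ROW = re.compile(
    r"^\s*(\S+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+)\s*$"
)


def top_tasks(output: str, n: int = 3) -> List[Tuple[str, float]]:
    """Return the n tasks with the highest 5-second CPU percentage."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line)
        if match:  # header, separator and total lines do not match
            rows.append((match.group(1), float(match.group(2))))
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


# A few rows copied from the first capture above.
SAMPLE = """
             Process Name 5Sec(%%) 1Min(%%) 5Min(%%) 15Min(%%) Runtime(ms)
                   ipnetd   9.20    4.05    1.37    0.63     150098590
                    SWDRV  11.60   12.60   12.57   12.51    2450845024
                   SYSDRV  21.60   21.87   19.88   16.61     394608318
                     SNMP  46.60    8.44   12.15   21.13     500340188
===========================================================================
                  total = 100.00   78.06   74.43   77.53
"""

if __name__ == "__main__":
    for name, pct in top_tasks(SAMPLE):
        print(f"{name:>10} {pct:6.2f}")
```

On these sample rows the ranking puts SNMP first, followed by SYSDRV and SWDRV.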

Solution:

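Disable the SNMP agent. Bear in mind that any monitoring system polling this switch over SNMP will no longer get a response.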

SW01-0#configure 
SW01-0(config)#no snmp-server ?
  community  Defines SNMP community access string
  contact    Sets the system contact string
  enable     Enables this device to send SNMP traps
  engine-id  Configure engine-id
  group      Configure group name
  host       Specifies SNMP notification operation recipients
  location   Sets the system location string
  user       Configure user name
  view       Configure view name
  <cr>
SW01-0(config)#no snmp-server 
SW01-0(config)#end 

SW01-0#show snmp 
System Contact:
System Location: 
SNMP Agent: disabled
SNMP traps: 
 Authentication: enabled
 Link-up-down:   enabled

Problem Solved?

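Yes. With the SNMP agent disabled, the SNMP task dropped from 46.60% to 0.00% in the 5-second column, and the 5-second total fell from 100.00% to 76.80%.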

SW01-0#show process cpu 
             Process Name 5Sec(%%) 1Min(%%) 5Min(%%) 15Min(%%) Runtime(ms)
                tRootTask   0.00    0.00    0.00    0.00             0
                 tExcTask   0.00    0.00    0.00    0.00          4450
                 tLogTask   0.00    0.00    0.00    0.00             0
                   bcmDPC   0.00    0.00    0.00    0.00             0
                bcmCNTR.0   1.00    1.00    0.84    0.77     403865160
                    bcmTX   0.00    0.00    0.02    0.01      38289030
           bcmXGS3AsyncTX   0.00    0.00    0.00    0.00             0
                bcmCNTR.1   0.60    0.70    0.74    0.72     384893630
                bcmLINK.0   0.20    0.10    0.12    0.30     207999140
                bcmLINK.1   0.00    0.00    0.13    0.27     205125060
                    bcmRX   2.20    2.20    2.46    2.33     932899850
                   ipnetd   4.40    2.20    2.26    1.81     150110680
                SYS_TIMER   0.00    0.00    0.00    0.00            30
                 TASK_MON   0.00    0.00    0.00    0.02      13190640
                     TPLG   0.00    0.00    0.00    0.00       2905080
               STACK_CTRL   0.00    0.00    0.00    0.00             0
         SWDRV_CACHE_TASK   0.00    0.00    0.00    0.00           570
                    SWDRV  13.60   13.00   12.80   12.67    2450925764
             AMTRDRV_TASK  11.20    9.80   11.48    9.72     437295640
            SWDRVL3_CACHE   0.00    0.00    0.00    0.00        723070
                     BSTM   1.80    2.10    2.35    2.24     867328350
                      ISC   0.00    0.00    0.00    0.00        772500
             ISC_CALLBACK   0.00    0.00    0.00    0.00        458880
ISC_LOW_PRIORITY_CALLBACK   0.00    0.00    0.00    0.00             0
                       FS   0.00    0.00    0.00    0.00             0
                   SYSDRV  21.60   22.40   22.29   19.06     394725708
                   SWCTRL   0.00    0.00    0.00    0.00       1075650
                     NMTR   0.00    0.00    0.00    0.00        435690
                     LACP   0.00    0.00    0.02    0.05      39156940
                   AMTRL3   0.00    0.00    0.00    0.00       2383270
                      STA   7.20    6.66    6.30    6.26    3519796310
                     VLAN   0.00    0.00    0.00    0.00        109550
                     GARP   0.00    0.00    0.00    0.00             0
                     IGMP   1.60    1.86    1.34    1.26     436234430
                   RADIUS   0.00    0.00    0.00    0.00             0
                    DOT1X   0.00    0.00    0.00    0.00        383180
                     IMTR   0.00    0.00    0.03    0.04      24671570
                 IML_TASK   0.00    0.26    0.22    0.18      67941810
                     P2IP   3.80    2.80    1.80    1.49     583195950
                     VRRP   0.00    0.00    0.15    0.14      46063700
                    zNCFG   0.00    0.00    0.00    0.00             0
                     zNSM   0.00    0.00    0.00    0.00      17161360
                    zOSPF   0.00    0.00    0.00    0.00           100
                     zRIP   0.00    0.00    0.00    0.00         15570
                   SYSLOG   0.00    0.00    0.00    0.00        615010
                     SMTP   0.00    0.00    0.00    0.00         70630
                  DNS_RES   0.00    0.00    0.01    0.00       4623600
                DNS_PROXY   0.00    0.00    0.00    0.00       4722040
                   KEYGEN   0.00    0.00    0.00    0.00        678360
                     HTTP   0.00    0.00    0.00    0.00        681710
                 TELNET_S   0.00    0.00    0.00    0.00           730
                 TELNET_D   0.00    0.00    0.00    0.00        469450
                     DNLD   0.00    0.00    0.00    0.00             0
                    FXFER   0.00    0.00    0.00    0.00         76520
                     DHCP   0.00    0.00    0.00    0.00       5536020
                     SNTP   0.00    0.00    0.00    0.00       1277890
                SSHD_MAIN   0.00    0.06    0.00    0.01       4395160
                     LLDP   0.20    0.06    2.01    1.99    1093996020
                     SNMP   0.00    0.00    1.36   11.84     500418668
                 CLITASK0   0.00    0.00    0.01    0.02      11541830
                 KICK_NET   0.00    0.00    0.00    0.00          2170
                     TN12   4.20    1.66    0.72    0.50          4290
                     UI37   3.20    1.53    0.71    0.47          4100
                     UI08   0.00    0.00    0.00    0.00         30770
===========================================================================
                  total =  76.80   68.39   70.17   74.17
SW01-0#
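
To put a number on the improvement, the `total =` lines of the two captures can be compared column by column. This is a minimal Python sketch. The helper name `totals` is an assumption for this example, and the two input strings are copied from the outputs above.

```python
import re
from typing import Tuple

# The summary line: "total = <5Sec> <1Min> <5Min> <15Min>"
TOTAL = re.compile(r"total\s*=\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)")


def totals(output: str) -> Tuple[float, ...]:
    """Extract the four total CPU percentages from 'show process cpu' output."""
    match = TOTAL.search(output)
    if match is None:
        raise ValueError("no 'total =' line found")
    return tuple(float(value) for value in match.groups())


before = totals("                  total = 100.00   78.06   74.43   77.53")
after = totals("                  total =  76.80   68.39   70.17   74.17")

if __name__ == "__main__":
    for label, b, a in zip(("5Sec", "1Min", "5Min", "15Min"), before, after):
        print(f"{label:>5}: {b:6.2f} -> {a:6.2f} ({a - b:+.2f})")
```

The longer averages move less because they still include time from before the change, so the 5-second and 1-minute columns are the ones to watch right after a fix.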