Aayush: weblog


Operations And Management (OAM) fundamentals for IMS

Posted by Aayush Bhatnagar on December 8, 2009


Introduction:

OA&M is a central requirement of mission critical telecommunication products.

3GPP defines a generic framework for realizing an OAM architecture. This architecture can be applied for the case of IMS, in order to manage the IMS core network entities. OAM frameworks are implemented as Element Management Systems (EMS) and Network Management Systems (NMS).

Architectural Overview:

The OAM architectural framework is based on Integration Reference Points (IRPs). Each IRP sets out the requirements for one management interface and defines that interface completely. An IRP is described by three documents: a requirements specification, an Information Service (IS) specification and a Solution Set (SS). The solution set is the technology-specific realization of the IRP's requirements.

There are four possible solution sets that may be considered for implementation of the IRPs:

1. SNMP Solution Set.

2. CORBA Solution Set.

3. SOAP Solution Set.

4. XML Solution Set.

Of these solution sets, SNMP and CORBA have been around the longest. SNMP is widely regarded as the de facto standard for northbound integration and is generally preferred, whereas CORBA has not seen comparable adoption. The SOAP and XML solution sets are additional options for IRP realization.

The emphasis has been on keeping the IRPs technology independent, which is why multiple solution sets are provided.

Deployment Architecture:

Each network element that has to be managed contains a management client, called an IRP agent. The IRP agent interacts with the management server, also known as the IRP manager. In the context of OAM, the management server is the Element Management System for that network element. These element management systems in turn integrate with a central network management system.

The overall deployment architecture is shown in the figure below:

The figure above illustrates the following management interfaces:

1. Between the network element and the element manager.

2. Between the element manager and the network manager.

3. Between different element managers.

There can also be interfaces between the network managers of different service provider domains. Those interfaces are not shown in the figure above. The interface between the network element and the element manager implements one of the four proposed solution sets. SNMP is the most widespread protocol used at this interface.

Several element managers integrate with a centralized network management system. At the NMS, higher-level functions such as topology management, service provider interfaces, points of interconnect and interfaces towards OSS and BSS systems can be managed and monitored.

Management information also flows between the network elements themselves. However, that interface is not shown as the information may not necessarily be used for monitoring purposes.
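The agent, element manager and network manager roles described in this section can be sketched in a few lines of Python. This is only an in-memory illustration: the class names, the `Notification` fields and the rule that an element manager ignores notifications from elements it does not manage are all assumptions made for the example, and no real solution set (SNMP, CORBA, SOAP or XML) is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Notification:
    """A management notification sent by an IRP agent (fields are illustrative)."""
    source_ne: str   # name of the network element that raised it
    kind: str        # e.g. "alarm" or "config-change"
    detail: str


@dataclass
class NetworkManager:
    """Central NMS: collects notifications forwarded by element managers."""
    received: list = field(default_factory=list)

    def on_forward(self, ems_name: str, note: Notification) -> None:
        self.received.append((ems_name, note))


@dataclass
class ElementManager:
    """EMS, acting as IRP manager for the network elements registered with it."""
    name: str
    nms: NetworkManager
    managed: set = field(default_factory=set)

    def register(self, ne_name: str) -> None:
        self.managed.add(ne_name)

    def on_notify(self, note: Notification) -> bool:
        # Assumption: notifications from unregistered elements are dropped.
        if note.source_ne not in self.managed:
            return False
        self.nms.on_forward(self.name, note)
        return True


@dataclass
class IrpAgent:
    """Management client embedded in a network element."""
    ne_name: str
    ems: ElementManager

    def notify(self, kind: str, detail: str) -> bool:
        return self.ems.on_notify(Notification(self.ne_name, kind, detail))


nms = NetworkManager()
ems = ElementManager("ems-cscf", nms)
ems.register("s-cscf-1")

accepted = IrpAgent("s-cscf-1", ems).notify("alarm", "link to HSS lost")
rejected = IrpAgent("unknown-ne", ems).notify("alarm", "not a managed element")
```

Here the first notification reaches the NMS through the element manager, while the second is dropped at the EMS because the sender was never registered.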

Integration Reference Point (IRP) applications:

Some practical IRP applications are discussed in this section. Not all IRPs are covered.

Some of the needed applications are those of FCAPS (Fault, Configuration, Accounting, Performance and Security).

a) Alarm Management IRP (Fault Management):

Network elements may encounter certain operational faults. These faults have to be escalated in the form of alarms to the element management system. There are two types of faults that may need to be handled.

The first category is that of an ADAC (Automatically Detected and Automatically Cleared) fault. Such faults are raised by the network element and then cleared by the network element itself after a certain period of time. These are not critical faults and do not hinder the operational state of the system.

The second category is that of ADMC (Automatically Detected and Manually Cleared) faults. These faults require the element manager or administrator to clear the alarm after taking corrective action.
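The difference between the two categories comes down to who is allowed to clear the alarm. A minimal sketch follows, assuming a hypothetical `Alarm` record with a `clear_mode` field. It also assumes that an operator may clear an ADAC alarm early, which the text above does not state.

```python
from dataclasses import dataclass
from enum import Enum


class ClearMode(Enum):
    ADAC = "automatically detected, automatically cleared"
    ADMC = "automatically detected, manually cleared"


@dataclass
class Alarm:
    alarm_id: str
    clear_mode: ClearMode
    active: bool = True

    def clear(self, by_operator: bool) -> bool:
        """Try to clear the alarm; return True if it is cleared afterwards."""
        if not self.active:
            return True
        # An ADMC alarm stays active until an operator clears it.
        if self.clear_mode is ClearMode.ADMC and not by_operator:
            return False
        self.active = False
        return True


transient = Alarm("db-retry", ClearMode.ADAC)
hardware = Alarm("fan-failure", ClearMode.ADMC)

auto_cleared = transient.clear(by_operator=False)    # NE clears it on its own
auto_refused = hardware.clear(by_operator=False)     # NE may not clear ADMC
manually_cleared = hardware.clear(by_operator=True)  # administrator clears it
```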

Faults and alarms that are raised by network elements fall under the following basic categories:

1. Hardware faults:

These alarms signify faults in the physical hardware that hosts the application. Hardware failures can cause serious service interruption and need to be conveyed to the element management system. Hardware failures are usually ADMC faults.

2. Software faults:

These alarms signify faults that may occur in the internal modules of a software subsystem of the network element. Such faults may occur due to protocol level errors, exception handling conditions, database errors etc.

3. Functional faults:

These alarms signify that there is an issue with the functionality of the software module that may potentially impede service delivery and may cause potential loss of service.

4. Loss of NE capabilities:

These alarms signify that the network element has lost its capability to provide service due to congestion or overload conditions for example.

5. Communication link failures with other NEs:

These alarms signify that the network element has lost contact with some other network elements. For example, the S-CSCF may lose its connection to the HSS or the Charging System, or database connectivity may be lost.

Fault recovery and corrective actions depend upon the nature of the fault and the process defined for its recovery by the carrier.

b) Configuration Management IRP and Trace Control:

Configuration management pertains to the manipulation of configurable parameters of the network element by the element manager. This may include logging management, activation or deactivation of tracing on the network element, subscriber specific tracing, enabling or disabling of lawful interception for a certain subscriber etc.

It also includes configuring a high availability cluster of the network element, adding new nodes, removing faulty nodes for servicing, dumping CDRs to an alternate location etc.

Configuration management is of two types:

1. Passive Configuration Management: Passive CM signifies that the management entity gets notified of configuration changes at the network element by virtue of alarms that are raised at the NE.

2. Active Configuration Management: Active CM signifies that the management entity actively participates in manipulating the configurable parameters of the network element.

There can be two approaches by which we may use configuration management. These are:

1. Basic Configuration Management: This involves the changing of a single configurable parameter. Basic CM operations are singular in nature and take place serially.

2. Bulk Configuration Management: This involves the execution of configuration actions by executing a batch file or by spawning a cron job. Such operations affect many configurable parameters at once and are usually automated.
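The two approaches can be contrasted with a small sketch. `ConfigStore` is a hypothetical in-memory stand-in for a network element's parameter set, and the parameter names are made up. The all-or-nothing behaviour of the bulk operation is a design assumption for this example, not something the 3GPP framework is claimed to require.

```python
class ConfigStore:
    """In-memory stand-in for a network element's configurable parameters."""

    def __init__(self, initial: dict):
        self._params = dict(initial)

    def get(self, name: str):
        return self._params[name]

    def set_parameter(self, name: str, value) -> None:
        """Basic CM: change one existing parameter."""
        if name not in self._params:
            raise KeyError(f"unknown parameter: {name}")
        self._params[name] = value

    def apply_bulk(self, changes: dict) -> None:
        """Bulk CM: validate the whole batch first, then apply it in one step."""
        unknown = [name for name in changes if name not in self._params]
        if unknown:
            raise KeyError(f"unknown parameters: {unknown}")
        self._params.update(changes)


store = ConfigStore({"db_pool_size": 10, "trace_level": "INFO"})

# Basic CM: one parameter at a time.
store.set_parameter("db_pool_size", 20)

# Bulk CM: several parameters in one batch.
store.apply_bulk({"db_pool_size": 32, "trace_level": "DEBUG"})

# A batch containing an unknown parameter is rejected as a whole.
try:
    store.apply_bulk({"trace_level": "WARN", "no_such_param": 1})
    batch_rejected = False
except KeyError:
    batch_rejected = True
```

Validating before applying means a bad batch leaves the element in its previous, known configuration, which is usually what an operator wants from an automated job.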

Configuration management is an important aspect of the product life cycle of a telecom product. In production grade systems, all changes to the network element are carried out using the configuration management reference point. Some of the configurable parameters may also need to be changed for improved performance of the product under certain conditions.

For example: database connection pool size, tracing level and other product specific parameters.

c) Performance Management (Alarm IRP):

Performance of a network element is determined by examining performance counters. As per the architecture, performance management re-uses the Alarm integration reference point.

Performance counters may be of varying types. Network element vendors may also define some of their own proprietary performance counter categories.

However, there are some basic concepts of performance measurement that remain common to most network elements:

1. Traffic measurements and BHCA: This includes the examination of traffic counters pertaining to successfully handled calls and the current BHCA load on the system in production. This gives a fair idea of system performance.

2. Call failure rate: This pertains to the examination of failed calls, failure reasons and the rate of failure as a percentage of total calls. These figures help the administrator judge whether the system is delivering carrier grade performance (5 nines) or not.

3. Network evaluation: This pertains to evaluating the performance of the system after configuration changes take effect in production. It helps the operator judge whether the changes benefited overall system performance.

4. Quality of Service: For media and data intensive applications, there may be performance measurements regarding QoS. This may include parameters such as packet loss and jitter. For more detailed information, please refer to ITU-T Recommendation E.880.

5. Resource consumption: This provides information regarding the resource consumption by the application at a given load. This includes memory footprint, threads, congestion levels, CPU utilization, response time etc.
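Two of the measurements above reduce to simple arithmetic over counters. The sketch below uses made-up counter values and assumes that BHCA is read off as the largest hourly call-attempt count in the sample; a real network element may define its busy hour differently.

```python
def call_failure_rate(attempted: int, failed: int) -> float:
    """Failed calls as a percentage of attempted calls."""
    if attempted < 0 or failed < 0 or failed > attempted:
        raise ValueError("invalid counters")
    if attempted == 0:
        return 0.0
    return 100.0 * failed / attempted


def busy_hour_call_attempts(hourly_attempts: list) -> int:
    """BHCA taken as the largest hourly call-attempt count in the sample."""
    if not hourly_attempts:
        raise ValueError("no samples")
    return max(hourly_attempts)


# Hypothetical call-attempt counters for four consecutive hours.
hourly = [12000, 45000, 90000, 61000]
bhca = busy_hour_call_attempts(hourly)

# Hypothetical failure counter for the busiest hour.
rate = call_failure_rate(attempted=90000, failed=45)
```

With these sample figures the busy hour carries 90000 attempts, and 45 failures out of 90000 attempts is a failure rate of 0.05 percent.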

Performance data may be written to a file and dumped to the filesystem. This file can later be parsed and presented to the administrator on the Graphical User Interface (GUI).

In certain cases, when performance degrades below a certain level, alarms need to be raised to the network manager. Examples of such conditions are overload, congestion, resource exhaustion and an increase in the call failure rate. Performance related alarms may be escalated depending upon which performance threshold has been crossed. For example:

  • A new alarm notification A1 (minor) is generated when Level 1 is crossed as the measured value rises.
  • A changed alarm notification A1 (critical) is generated when Levels 2 and 3 are crossed as the value rises further.
  • A changed alarm notification A1 (minor) is generated when Levels 3 and 2 are crossed again as the value falls.
  • A cleared alarm notification A1 is generated when Level 1 is crossed as the value falls. Alternatively, this alarm may be marked as an ADMC alarm, where the alarm is cleared manually by the administrator.

Alarm conditions may be evaluated periodically, once per configured monitoring period. This process is illustrated below:

In this regard, performance management functions use the Alarm IRP to communicate with the element manager and raise alarms relating to performance of the network element.
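The threshold escalation described above can be sketched as a small state machine that is fed one sample per monitoring period. The class name, the numeric levels and the "at most one notification per sample" rule are assumptions for illustration. The rule that a critical alarm drops back to minor only below Level 2 is one reading of the example above.

```python
class ThresholdMonitor:
    """Tracks one alarm (A1) against three rising threshold levels."""

    def __init__(self, level1, level2, level3):
        if not level1 < level2 < level3:
            raise ValueError("levels must be strictly increasing")
        self.l1, self.l2, self.l3 = level1, level2, level3
        self.severity = None  # None means no active alarm

    def sample(self, value):
        """Evaluate one periodic sample; return (event, severity) or None."""
        previous = self.severity
        if previous is None:
            if value >= self.l3:
                self.severity = "critical"
            elif value >= self.l1:
                self.severity = "minor"
        elif previous == "minor":
            if value >= self.l3:
                self.severity = "critical"
            elif value < self.l1:
                self.severity = None
        else:  # previous == "critical"
            if value < self.l1:
                self.severity = None
            elif value < self.l2:
                self.severity = "minor"

        if self.severity == previous:
            return None  # no threshold-relevant change in this period
        if previous is None:
            return ("new", self.severity)
        if self.severity is None:
            return ("cleared", previous)
        return ("changed", self.severity)


# Hypothetical CPU utilization samples (percent), one per monitoring period.
monitor = ThresholdMonitor(level1=70, level2=80, level3=90)
events = [monitor.sample(v) for v in [50, 75, 85, 95, 85, 75, 60]]
```

For this sample series the monitor raises a new minor alarm at 75, changes it to critical at 95, changes it back to minor at 75 and clears it at 60. The samples in between produce no notification.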

d) Security Management IRP:

Security management has a special meaning when applied to OAM. For the OAM domain, security management refers to the protection of management traffic, operations, authentication of the administrator and role management of various managers that manipulate the network element. The basic goal is to prevent any intentional or accidental damage to the network elements in production.

Security management in OAM should also be accompanied by the generation of security related alarms by the system, in case there is a security breach. What action is taken upon servicing such alarms is based on the local policy of the operator.

Some of the security threats that have been identified by ITU-T in ITU-T Recommendation X.800 are as follows:

  • Masquerade.
  • Eavesdropping.
  • Unauthorized access.
  • Loss or corruption of information.
  • Repudiation.
  • Forgery.
  • Denial of service.

ITU-T Recommendation X.800 also mandates that the following security services be provided in order to protect the network against these security risks:

  • Peer entity authentication.
  • Data origin authentication.
  • Access control service.
  • Connection confidentiality.
  • Connectionless confidentiality.
  • Selective field confidentiality.
  • Traffic flow confidentiality.
  • Connection Integrity with recovery.
  • Connection integrity without recovery.
  • Selective field connection integrity.
  • Connectionless integrity.
  • Selective field connectionless integrity.
  • Non-repudiation with proof of origin.
  • Non-repudiation with proof of delivery.

For managing security, it is suggested that the IRP Agent authenticates with the IRP managing entity so as to establish a trusted relationship between the NE and the EMS. The checks should be bidirectional: the IRP Agent should verify that the IRP manager is authorized to perform the requested operations on the NE, and the IRP manager should verify that the data or traffic it receives comes from a trusted IRP Agent.

If there is any bulk data transfer, then the integrity of the data should be examined. At all times, an activity trace log should be maintained of all the operations that have been performed between the IRP Agent and the IRP manager.
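One common way to check the integrity of a bulk transfer is a keyed hash over the payload, with the outcome written to the activity log. The sketch below uses HMAC-SHA256 from the Python standard library. The shared key, the function names and the JSON log format are assumptions for illustration and are not taken from any 3GPP solution set.

```python
import hashlib
import hmac
import json
import time


def sign_bulk_data(shared_key: bytes, payload: bytes) -> str:
    """Sender side: compute an HMAC-SHA256 tag over the bulk payload."""
    return hmac.new(shared_key, payload, hashlib.sha256).hexdigest()


def verify_bulk_data(shared_key: bytes, payload: bytes, tag: str) -> bool:
    """Receiver side: recompute the tag and compare in constant time."""
    expected = sign_bulk_data(shared_key, payload)
    return hmac.compare_digest(expected, tag)


def audit_entry(actor: str, operation: str, ok: bool) -> str:
    """One line for the activity trace log, serialized as JSON."""
    return json.dumps(
        {"ts": time.time(), "actor": actor, "operation": operation, "ok": ok}
    )


key = b"demo-key-not-for-production"   # hypothetical pre-shared key
payload = b"db_pool_size=32\ntrace_level=DEBUG\n"

tag = sign_bulk_data(key, payload)                 # computed by the sender
intact = verify_bulk_data(key, payload, tag)       # payload unchanged
tampered = verify_bulk_data(key, payload + b"x", tag)  # payload modified

activity_log = [
    audit_entry("ems-cscf", "bulk-config-upload", intact),
    audit_entry("ems-cscf", "bulk-config-upload", tampered),
]
```

A failed verification such as the second one is the kind of discrepancy that would be escalated as a security alarm.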

In case of any discrepancies, the management entity should immediately raise security alarms towards the network management system.

Conclusion:

It is imperative that any IMS deployment is supported by a robust EMS and NMS infrastructure. Without a proper standardized management mechanism in place, network management and support is not possible.

In conclusion, the following are some of the salient features that should characterize a management system. This list is not exhaustive and may be extended as per the requirements of the operator.

The salient features of any management infrastructure (MI) are as follows:

1. The MI should be capable of managing nodes supplied by different vendors in addition to the management system itself.

2. It should support standardized interfaces such as SNMP, CORBA or SOAP as specified by the solution sets earlier.

3. It should provide fault management capabilities.

4. It should facilitate remote management operations towards the network elements.

5. Interoperability with other networks regarding the exchange of management information should be supported.

6. It should be secure.

7. It should be able to restore the operational state of the network element in case of failover or switchover scenarios.

8. It should be possible to install new software releases on the managed network element.

9. It should be possible to install and manage new instances of the network element in the network.

10. It should be possible to conduct tests against a new installation before commissioning it into the live network.

11. Interfaces should be supported towards OSS and BSS subsystems.

12. External non-realtime interfaces towards CRMs should be supported.

There can be many more requirements. However, these twelve requirements are an absolute must for any management infrastructure.

In case you found this post useful, please feel free to provide feedback by leaving comments. Thank you.
