Aayush: weblog

Archive for the ‘I-CSCF’ Category

Race Condition for “tel” URI Routing in IMS – Unknown numbers !

Posted by Aayush Bhatnagar on April 4, 2011


Recently, in a discussion I came across a race condition that occurs in IMS for routing tel URIs which no longer exist in the IMS domain (operator’s network).

The problem statement is as follows:


When the operator de-provisions a telephone number from the IMS core, and another IMS subscriber dials this invalid number, then the call traverses the Proxy CSCF and hits the S-CSCF. I am assuming that it is an intra IMS domain call – synonymous to a local call. Here the called party (B-party) is the invalid subscriber. The S-CSCF executes the originating iFC set for the calling party as per the standards. Then, the S-CSCF realizes that the called party address is a tel URI and an ENUM query needs to be performed. As the subscriber has been de-provisioned from the network completely, the ENUM query fails.

This triggers the S-CSCF to forward the call to BGCF believing that legacy interconnect procedures need to be performed and the called party is a legacy subscriber. The BGCF attempts breakout towards the MGCF, which converts it to an IAM and sends it to the PSTN.

The situation would become even more complex with number portability, where we can no longer filter numbers based on ranges alone at the BGCF.

The PSTN domain would send it back to the IMS core thinking that the call was destined towards IMS. As inter-working will take place again at the MGCF for the incoming call leg, this call would land as a brand new call to the I-CSCF (with a refreshed Max-Forwards header) and a new Call-id.

Now, the I-CSCF will perform the DIAMETER user location query to the HSS and this query would fail, as the tel URI is no longer provisioned there. Based on the procedures in TS 24.229, the I-CSCF will inspect the address (which is the tel URI) and then perform the ENUM query. This query would fail as expected and the call will be forwarded to the BGCF once more, mistaking it for the legacy interconnect case.

The sequence of events above would lead to an infinite loop in the network for this call.

Solution:

The solution to this problem is to avoid the breakout to the PSTN through the BGCF in case the user has dialed an invalid number. In order for this to happen, the ENUM query must not fail. Hence, it is proposed that when the user is de-provisioned from the network, the ENUM mapping of his number is changed to a generic SIP URI such as the following –

sip:doesnotexist@domain

This will ensure that this generic SIP URI is returned when the ENUM query is fired. In this case, the S-CSCF will forward the call to the I-CSCF instead of the BGCF. The I-CSCF will then forward the call to the MRFC by executing PSI subdomain routing in the sense of TS 23.228.

This would ensure that the MRF will play an announcement that the “number does not exist”. This is what is needed in this scenario. But, even here, the I-CSCF needs to do some NETANN magic before forwarding the call to the MRF (refer here: http://tools.ietf.org/html/rfc4240). Hence, the I-CSCF implementation has to be careful to make this work well.
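The loop and the proposed fix can be traced with a small simulation. This is only a sketch under simplifying assumptions: the ENUM database is modelled as a plain dictionary, the function name `route_call` and the hop labels are made up for illustration, and the loop is cut off after a fixed number of PSTN breakouts so that the script terminates.

```python
PLACEHOLDER = "sip:doesnotexist@domain"  # generic SIP URI proposed in this post


def route_call(tel_uri, enum_table, max_breakouts=3):
    """Return the list of hops a call to tel_uri takes in this toy model."""
    path = ["P-CSCF", "S-CSCF"]
    for _ in range(max_breakouts):
        sip_uri = enum_table.get(tel_uri)  # None models a failed ENUM query
        if sip_uri is not None:
            if path[-1] != "I-CSCF":
                path.append("I-CSCF")
            if sip_uri == PLACEHOLDER:
                # De-provisioned number: send the caller to an announcement.
                path += ["MRFC", "announcement: number does not exist"]
            else:
                path.append("terminating S-CSCF")
            return path
        # ENUM failed, so the number is assumed to be a legacy subscriber.
        # The PSTN hands the call straight back to the IMS core.
        path += ["BGCF", "MGCF", "PSTN", "MGCF", "I-CSCF"]
    path.append("stopped: routing loop")
    return path


number = "tel:+15550100"
print(route_call(number, {}))                     # no ENUM entry: loops
print(route_call(number, {number: PLACEHOLDER}))  # placeholder entry: announcement
```

Without an ENUM entry the call keeps bouncing between the BGCF, the PSTN and the I-CSCF until the artificial cut-off is reached. With the placeholder entry it reaches the MRFC on the first pass.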

 

As and when I come across more race conditions in the IMS network, I will post them here (most probably with a possible solution to get around them).

 


Posted in HSS, I-CSCF, IMS, IMS data, IMS procedures, IMS Release 11

Operations And Management (OAM) fundamentals for IMS

Posted by Aayush Bhatnagar on December 8, 2009


Introduction:

OA&M is a central requirement of mission critical telecommunication products.

3GPP defines a generic framework for realizing an OAM architecture. This architecture can be applied for the case of IMS, in order to manage the IMS core network entities. OAM frameworks are implemented as Element Management Systems (EMS) and Network Management Systems (NMS).

Architectural Overview:

The OAM architectural framework is based on Integration Reference Points (IRPs). Each IRP provides a set of requirements for that interface and completely defines the management interface. The IRP is fully described by a requirements specification, an Information Service (IS) specification and finally a Solution Set (SS). The solution set completely realizes the requirements of the given IRP.

There are four possible solution sets that may be considered for implementation of the IRPs:

1. SNMP Solution Set.

2. CORBA Solution Set.

3. SOAP Solution Set.

4. XML Solution Set.

Of these solution sets, SNMP and CORBA have been around for a long time. SNMP is considered a de facto standard for northbound integration and is generally preferred. CORBA has not seen adoption as widespread as SNMP. The SOAP and XML solution sets are additional options for IRP realization.

The stress has been on making the IRPs technology independent. Hence multiple options have been provided.

Deployment Architecture:

Each network element that has to be managed contains a management client. This management client is called an IRP agent and is part of the network element being managed. The IRP agent interacts with the management server, also known as the IRP manager. In the context of OAM, the management server is the Element Management System for that network element. These element management systems further integrate with a central network management system.

The overall deployment architecture is shown in the figure below:

The figure above illustrates the following management interfaces:

1. Between the network element and the element manager.

2. Between the element manager and the network manager.

3. Between different element managers.

There can also be interfaces between the network managers of different service provider domains. Those interfaces are not shown in the figure above. The interface between the network element and the element manager implements one of the four proposed solution sets. SNMP is the most widespread protocol used at this interface.

Several element managers integrate with a centralized network management system. At the NMS, abstract functions such as topology management, service provider interface management, points of interconnect and interfaces towards OSS and BSS systems can be managed and monitored.

Management information also flows between the network elements themselves. However, that interface is not shown as the information may not necessarily be used for monitoring purposes.

Integration Reference Point (IRP) applications:

Some practical IRP applications will be discussed in this section. Not all IRPs are covered.

Some of the needed applications are those of FCAPS (Fault, Configuration, Accounting, Performance and Security).

a) Alarm Management IRP (Fault Management):

Network elements may encounter certain operational faults. These faults have to be escalated in the form of alarms to the element management system. There are two types of faults that may need to be handled.

The first category is that of an ADAC (Automatically Detected and Automatically Cleared) fault. Such faults are raised by the network element and then cleared by the network element itself after a certain period of time. These are not critical faults and do not hinder the operational state of the system.

The second category is that of ADMC (Automatically Detected and Manually Cleared) faults. These faults require that the network element manager or administrator clears the alarms after taking corrective action.
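The difference between the two categories comes down to who is allowed to clear the alarm. A minimal sketch, with class and field names invented for illustration (they are not taken from the 3GPP Alarm IRP):

```python
from dataclasses import dataclass
from enum import Enum


class ClearMode(Enum):
    ADAC = "cleared by the network element itself"
    ADMC = "cleared manually by the administrator"


@dataclass
class Alarm:
    alarm_id: str
    mode: ClearMode
    active: bool = True

    def clear(self, by_operator: bool) -> bool:
        """Try to clear the alarm; return True if it was cleared."""
        if self.mode is ClearMode.ADMC and not by_operator:
            # The network element may not clear an ADMC alarm on its own.
            return False
        # Assumption of this sketch: an operator may also clear an ADAC alarm.
        self.active = False
        return True


link_flap = Alarm("A1", ClearMode.ADAC)
disk_failure = Alarm("A2", ClearMode.ADMC)

print(link_flap.clear(by_operator=False))     # True: the NE clears it itself
print(disk_failure.clear(by_operator=False))  # False: waits for the administrator
print(disk_failure.clear(by_operator=True))   # True
```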

Faults and alarms that are raised by network elements fall under the following basic categories:

1. Hardware faults:

These alarms signify faults in the physical hardware that hosts the application. Hardware failures can cause serious service interruption and need to be conveyed to the element management system. Hardware failures are usually ADMC faults.

2. Software faults:

These alarms signify faults that may occur in the internal modules of a software subsystem of the network element. Such faults may occur due to protocol level errors, exception handling conditions, database errors etc.

3. Functional faults:

These alarms signify that there is an issue with the functionality of the software module that may impede service delivery and cause loss of service.

4. Loss of NE capabilities:

These alarms signify that the network element has lost its capability to provide service, for example due to congestion or overload conditions.

5. Communication link failures with other NEs:

These alarms signify that the network element has lost contact with some other network elements. For example, the S-CSCF may lose its connection to the HSS or the Charging System, or database connectivity may be lost.

Fault recovery and corrective actions depend upon the nature of the fault and the process defined for its recovery by the carrier.

b) Configuration Management IRP and Trace Control:

Configuration management pertains to the manipulation of configurable parameters of the network element by the element manager.  This may include logging management, activation or deactivation of tracing on the network element, subscriber specific tracing, enabling or disabling of lawful interception for a certain subscriber etc.

It also includes configuring a high availability cluster of the network element, adding of new nodes, removal of faulty nodes for servicing, dumping of CDRs to an alternate location etc.

Configuration management is of two types:

1. Passive Configuration Management: Passive CM signifies that the management entity gets notified of configuration changes at the network element by virtue of alarms that are raised at the NE.

2. Active Configuration Management: Active CM signifies that the management entity actively participates in manipulating the configurable parameters of the network element.

There can be two approaches by which we may use configuration management. These are:

1. Basic Configuration Management: This involves the changing of a single configurable parameter. Basic CM operations are singular in nature and take place serially.

2. Bulk Configuration Management: This involves the execution of configuration actions by executing a batch file or by spawning a cron job. Such operations affect many configurable parameters in parallel and are usually automated.
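The two approaches can be contrasted with a short sketch. The parameter names and the validate-then-apply behaviour of the bulk operation are assumptions made for illustration, not something mandated by the Configuration Management IRP.

```python
def set_parameter(config: dict, name: str, value) -> None:
    """Basic CM: change exactly one known parameter."""
    if name not in config:
        raise KeyError(f"unknown parameter: {name}")
    config[name] = value


def apply_bulk(config: dict, changes: dict) -> None:
    """Bulk CM: validate the whole batch first, then apply it in one step."""
    unknown = sorted(name for name in changes if name not in config)
    if unknown:
        raise KeyError(f"unknown parameters: {unknown}")
    config.update(changes)


# Hypothetical configurable parameters of a network element.
ne_config = {"db_pool_size": 10, "trace_level": "INFO", "cdr_dump_path": "/var/cdr"}

set_parameter(ne_config, "trace_level", "DEBUG")  # one change at a time
apply_bulk(ne_config, {"db_pool_size": 25, "cdr_dump_path": "/mnt/cdr"})  # batch
print(ne_config)
```

Validating the batch before touching the configuration means a bad entry cannot leave the network element half-configured.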

Configuration management is an important aspect of the product life cycle of a telecom product. In production grade systems, all changes to the network element are carried out using the configuration management reference point. Some of the configurable parameters may also need to be changed for improved performance of the product under certain conditions.

For example: database connection pool size manipulation, tracing level modifications and other product specific parameters.

c) Performance Management (Alarm IRP):

Performance of a network element is determined by examining performance counters. As per the architecture, performance management re-uses the Alarm integration reference point.

Performance counters may be of varying types. Network element vendors may also define some of their own proprietary performance counter categories.

However, there are some basic concepts of performance measurement that remain common to most network elements:

1. Traffic measurements and BHCA: This includes the examination of traffic counters pertaining to successfully handled calls and the current BHCA load on the system in production. This gives a fair idea of system performance.

2. Call failure rate: This pertains to the examination of the failed calls, failure reasons and the rate of failure as a percentage of total calls. This provides the administrator with valuable figures to judge whether the system provides carrier grade performance (five nines) or not.

3. Network evaluation: This pertains to the evaluation of the performance of the system after certain configuration changes take effect on the production grade system. This helps the operator to evaluate whether the changes that took effect were of any benefit to the overall system performance or not.

4. Quality of Service: For media and data intensive applications, there may be performance measurements regarding QoS. This may include parameters such as packet loss and jitter. For more detailed information, please refer to ITU-T Recommendation E.880.

5. Resource consumption: This provides information regarding the resource consumption by the application at a given load. This includes memory footprint, threads, congestion levels, CPU utilization, response time etc.
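As a small worked example of the call failure rate counter, the sketch below computes the rate as a percentage and checks it against a "number of nines" target. Applying five nines to the call success ratio (at most 1 failed call in 100,000) is an assumption of this sketch; the function names are illustrative.

```python
def call_failure_rate(failed: int, total: int) -> float:
    """Failed calls as a percentage of all calls."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * failed / total


def meets_success_target(failed: int, total: int, nines: int = 5) -> bool:
    """True if failed/total <= 10**-nines (integer arithmetic, no rounding)."""
    return failed * 10 ** nines <= total


print(call_failure_rate(3, 1_000_000))      # percentage of failed calls
print(meets_success_target(3, 1_000_000))   # True: 3 failures per million
print(meets_success_target(20, 1_000_000))  # False: 20 failures per million
```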

Performance data may be written to a file and dumped to the filesystem. This file can later be parsed and presented to the administrator on the Graphical User Interface (GUI).

In certain cases, when the performance degrades below a certain level, alarms need to be raised to the network manager. Some examples of such conditions may be overload, congestion, resource exhaustion, call failure rate increase etc. Performance related alarms may be escalated depending upon which performance threshold has been crossed. For example:

  • A new alarm notification A1 (minor) is generated when Level 1 is crossed on the way up.

  • A changed alarm notification A1 (critical) is generated when Levels 2 and 3 are crossed on the way up.

  • A changed alarm notification A1 (minor) is generated when Levels 3 and 2 are crossed on the way down.

  • A cleared alarm notification A1 is generated when Level 1 is crossed on the way down. Alternatively, this alarm may be marked as an ADMC alarm, where the alarm is cleared manually by the administrator.
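The escalation rules can be read as a small state machine that is evaluated once per monitoring period. The sketch below is one possible reading: the threshold values are made up, and raising a critical alarm directly when a single sample jumps from below Level 1 to above Level 3 is an assumption, since that case is not covered by the rules.

```python
from typing import Optional, Tuple


def evaluate(severity: Optional[str], value: float,
             l1: float, l2: float, l3: float) -> Tuple[Optional[str], Optional[str]]:
    """Return (new_severity, notification) for one sample of a counter.

    severity is None (no alarm), "minor" or "critical".
    notification is None, "new", "changed" or "cleared".
    """
    if severity is None:
        if value >= l3:
            return "critical", "new"   # assumption: jump straight to critical
        if value >= l1:
            return "minor", "new"      # Level 1 crossed upwards
        return None, None
    if value < l1:
        return None, "cleared"         # Level 1 crossed downwards
    if severity == "minor" and value >= l3:
        return "critical", "changed"   # Levels 2 and 3 crossed upwards
    if severity == "critical" and value < l2:
        return "minor", "changed"      # Levels 3 and 2 crossed downwards
    return severity, None


state = None
for sample in [40, 55, 75, 95, 80, 65, 45]:  # e.g. CPU utilisation in percent
    state, note = evaluate(state, sample, l1=50, l2=70, l3=90)
    print(sample, state, note)
```

Because the alarm only drops back to minor once the value falls below Level 2, a counter hovering around Level 3 does not generate a stream of changed notifications.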

Alarm generation may happen periodically by a configured monitoring period. This process is illustrated below:

In this regard, performance management functions use the Alarm IRP to communicate with the element manager and raise alarms relating to performance of the network element.

d) Security Management IRP:

Security management has a special meaning when applied to OAM. For the OAM domain, security management refers to the protection of management traffic, operations, authentication of the administrator and role management of various managers that manipulate the network element. The basic goal is to prevent any intentional or accidental damage to the network elements in production.

Security management in OAM should also be accompanied by the generation of security related alarms by the system, in case there is a security breach. What action is taken upon servicing such alarms is based on the local policy of the operator.

Some of the security threats that have been identified by ITU-T in ITU-T Recommendation X.800 are as follows:

  • Masquerade.
  • Eavesdropping.
  • Unauthorized access.
  • Loss or corruption of information.
  • Repudiation.
  • Forgery.
  • Denial of service.

ITU-T Recommendation X.800 also mandates that the following security services be provided in order to protect the network against these security risks:

  • Peer entity authentication.
  • Data origin authentication.
  • Access control service.
  • Connection confidentiality.
  • Connectionless confidentiality.
  • Selective field confidentiality.
  • Traffic flow confidentiality.
  • Connection integrity with recovery.
  • Connection integrity without recovery.
  • Selective field connection integrity.
  • Connectionless integrity.
  • Selective field connectionless integrity.
  • Non-repudiation, origin.
  • Non-repudiation, delivery.

For managing security, it is suggested that the IRP Agent authenticates with the IRP managing entity so as to establish a trusted relationship between the NE and the EMS. Authentication should be bidirectional. This means that the IRP Agent should check whether the IRP manager is authorized to perform certain operations on the NE. On the other hand, the IRP manager should check whether the data/traffic is being received from a trusted IRP Agent.

If there is any bulk data transfer, then the integrity of the data should be examined. At all times, an activity trace log should be maintained of all the operations that have been performed between the IRP Agent and the IRP manager.
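One simple way to examine the integrity of a bulk transfer is to compare a digest computed by the sender with one computed by the receiver, and to record the outcome in the activity trace log. The sketch below assumes SHA-256 and an in-memory list as the log; both choices, and the function names, are illustrative only.

```python
import hashlib
import hmac


def digest(payload: bytes) -> str:
    """SHA-256 digest of the transferred data, as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def verify_transfer(payload: bytes, expected_digest: str, trace_log: list) -> bool:
    """Check the received payload against the sender's digest and log the result."""
    ok = hmac.compare_digest(digest(payload), expected_digest)
    trace_log.append(("bulk-transfer", "ok" if ok else "integrity-failure"))
    return ok


trace_log = []
sent = b"db_pool_size=25\ntrace_level=DEBUG\n"
sent_digest = digest(sent)  # computed on the sending side

print(verify_transfer(sent, sent_digest, trace_log))                 # True
print(verify_transfer(sent + b"tampered", sent_digest, trace_log))   # False
print(trace_log)
```

A plain digest only detects accidental corruption or tampering when the digest itself travels over a trusted channel. A keyed MAC or a signature would be needed otherwise.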

In case of any discrepancies,  the management entity should immediately raise security alarms towards the network management system.

Conclusion:

It is imperative that any IMS deployment is supported by a robust EMS and NMS infrastructure. Without a proper standardized management mechanism in place, network management and support is not possible.

In conclusion, the following are some of the salient features that should characterize a management system. This list is not exhaustive and may be extended as per the requirements of the operator.

The salient features of any management infrastructure (MI) are as follows:

1. The MI should be capable of managing nodes supplied by different vendors in addition to the management system itself.

2. It should support standardized interfaces such as SNMP, CORBA or SOAP as specified by the solution sets earlier.

3. It should provide fault management capabilities.

4. It should facilitate remote management operations towards the network elements.

5. Interoperability with other networks regarding the exchange of management information should be supported.

6. It should be secure.

7. It should be able to restore the operational state of the network element in case of failover or switchover scenarios.

8. It should be possible to install new software releases into the managed network element.

9. It should be possible to manage and install new instances of the network element into the network.

10. It should be possible to conduct tests against the new installation before commissioning it into the live network.

11. Interfaces should be supported towards OSS and BSS subsystems.

12. External non-realtime interfaces towards CRMs should be supported.

There can be many more requirements. However, these twelve requirements are an absolute must for any management infrastructure.

In case you found this post useful, please feel free to provide feedback by leaving comments. Thank you.

Posted in 3gpp, alarms, configuration, CORBA, EMS, fault management, FCAPS, I-CSCF, IMS, management, monitoring, network elements, NMS, OAM, performance management, security management, SNMP, telecom

Scalability Planning for the IMS Charging architecture

Posted by Aayush Bhatnagar on November 4, 2009


Introduction:

The IP Multimedia subsystem provides a well defined and streamlined architecture for charging multimedia calls and services. The IMS network elements interface to the charging platform over the DIAMETER protocol to enable both pre-paid and post-paid charging.

Architectural Overview of IMS Charging:

The IMS charging platform is sub-divided into two major components:

1. The Charging Data Function (CDF).

2. The Online Charging System (OCS).

The CDF is responsible for receiving triggers for offline (post-paid) charging, while the OCS is responsible for receiving pre-paid charging triggers. The CDF supports the Rf DIAMETER application, while the OCS supports the Ro DIAMETER application interface.

The Charging Data Records (CDRs) are collected and correlated at the Charging Gateway Function (CGF). The CGF acts as a gateway to the Billing System, which performs mediation duties.

Scalability Challenges:

In the IMS architecture, the P-CSCF, S-CSCF, SIP application servers, MRF, MGCF, BGCF and the I-BCF all need to support the offline charging application interface (Rf interface). This means that they all act as charging clients (CTF) in post-paid scenarios and send triggers to the CDF.

The S-CSCF, the MRF and SIP Application servers support the online charging interface towards the OCS. The IMS Gateway function facilitates the online charging functionality in the IMS architecture by acting as a SIP AS.

A quick look at this architectural challenge presents us with a nearly fully connected architecture for the CDF (it interfaces with almost all network elements for post-paid charging). This means that the CDF acts as a multiplexer of incoming DIAMETER commands.

[Figure: charging]

Even for a simple IMS call, the CDF will receive triggers from the P-CSCF and the S-CSCF. If there is an application server involved in the call flow (e.g. for supplementary services), it will receive a trigger from that AS as well. If an I-BCF is present in the network, and the call needs to be terminated in another IMS domain, the I-BCF will also send a trigger to the CDF. This is a very practical scenario, as almost all IMS customers will subscribe to at least one supplementary service and every IMS core network is expected to have border control and peering functions for security (I-BCF acting as an entry and exit point to the network).

This translates into four DIAMETER transactions for the CDF for a single IMS transaction. A single IMS call initiated by an INVITE may have multiple chargeable transactions involved (those for UPDATEs, re-INVITEs and finally for BYE).

The OCS interfaces with only the S-CSCF, the MRFC and SIP application servers. Hence it is expected to be less loaded as compared to the CDF as shown below:

[Figure: online]

Scalability Requirements Quantified:

Let us take an example of one IMS call for the sake of calculation. A reasonable load of 100 calls per second is assumed in the calculations.

We will only consider “chargeable” transactions i.e. transactions for which a DIAMETER trigger will be sent to the CDF. A single call can have an INVITE transaction, an UPDATE transaction, a re-INVITE transaction (assuming there was only 1 re-INVITE in the call) and finally a BYE transaction. Thus, we have 4 “chargeable” transactions. For each transaction, a DIAMETER ACR/ACA exchange takes place between each network node and the CDF.

Even for a reasonable load of 100 calls per second on the IMS core, the DIAMETER transactions for the CDF will need to scale up to 1600 TPS. This provides us with a 1:16 scalability requirement for the Charging Data Function.

For the Online Charging System, only the S-CSCF (through the IMS GWF), SIP application servers (if any in the call path) and the MRFC send charging triggers.

For the sake of calculation, if we have one application server in the path of an IMS call, then a single IMS transaction will result in 2 DIAMETER transactions on the Ro interface. Taking the above assumptions, where a single call has 4 “chargeable” transactions, and a moderate load of 100 cps, the OCS requires to support 800 TPS. This provides us with a 1:8 scalability requirement for the Online Charging System.
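The arithmetic behind these two figures can be written out explicitly. The sketch below simply multiplies the call rate by the number of chargeable transactions per call and by the number of nodes reporting each transaction, using the assumptions of this post (4 reporting nodes towards the CDF: P-CSCF, S-CSCF, one AS and the I-BCF; 2 towards the OCS: S-CSCF and one AS). The function name is illustrative.

```python
def diameter_tps(calls_per_second: int, chargeable_transactions: int,
                 reporting_nodes: int) -> int:
    """DIAMETER transactions per second seen by a charging node."""
    return calls_per_second * chargeable_transactions * reporting_nodes


CPS = 100            # assumed call rate on the IMS core
CHARGEABLE = 4       # INVITE, UPDATE, one re-INVITE, BYE

cdf_tps = diameter_tps(CPS, CHARGEABLE, reporting_nodes=4)  # Rf interface
ocs_tps = diameter_tps(CPS, CHARGEABLE, reporting_nodes=2)  # Ro interface

print(cdf_tps, cdf_tps // CPS)  # 1600 TPS, factor 16
print(ocs_tps, ocs_tps // CPS)  # 800 TPS, factor 8
```

The factor is constant for a given call model, so adding a second application server or more re-INVITEs per call raises it further.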

The scalability factor requirement for both the CDF and OCS is considerable. In case of the OCS, the response time to the CCR will also affect the call setup latency, as the CCR/CCA exchange is synchronous. Until a CCA is received, the SIP signaling is not put through.

Other considerations to plan for scalability:

In view of the above scalability requirements, we can now plan further on the nature of the deployment architecture of the CDF and the OCS. Even though the scalability requirement for the OCS (800 TPS) may seem to be half of what is projected for the CDF (1600 TPS), the OCS is much more complex than the CDF. The online charging system has many internal modules responsible for real-time rate determination, calculation of the units to be debited and account balance management. Moreover, these modules need to be invoked for each IMS call. The OCS also needs to support time based and content based charging paradigms. This means that the OCS node is a mission-critical and real-time charging engine. Apart from real-time traffic, non-real time traffic such as pre-paid balance inquiries, pre-paid recharging etc. also needs to be handled at the OCS (either through an IVR or an SMS based mechanism).

On the other hand, the CDF is responsible for creating and dumping the CDR files. There is no real time rate determination or account balance management involved. The raw CDRs are transported to the billing system (BSS) over FTP, where the itemized billing and mediation takes place.

Deployment planning and possible scenarios:

In view of these architectural differences and varying levels of complexity between the CDF and the OCS, it is clear that there needs to be an architectural separation between the OCS and CDF for deployment. This means that we have to deploy both nodes independently on dedicated machines and possibly in their own independent clusters.

Let us consider a single IMS domain. A single IMS domain will consist of its own S-CSCF, P-CSCF, SIP application servers (as needed), MRF, I-BCF(or a comparable SBC) and the charging platform (consisting of the OCS and the CDF).

For greater scalability, it is proposed to have a dedicated cluster for each charging entity. The OCS application should have its own cluster and so should the CDF. Both the OCS and CDF may share the same gateway (CGF) to interface to the Billing domain. The clusters should have a DIAMETER load balancer (for the Ro and Rf applications) installed to distribute load amongst the cluster instances.

For hardware configuration, the cluster members may reside on the same machine (for smaller deployments) and on a hardware pair (active-standby) to achieve hardware redundancy. Usually, for carrier grade deployments, hardware redundancy is a necessity.

Discussed below are certain deployment configurations of the OCS and the CDF for varying load requirements. The deployment architectural choices shown below also consider hardware redundancy.

NOTE: “Servers” may vary from case to case. You may go for a T-1000, T-2000 or a higher end SUN server. You may also go for a “cheaper” option by using HP servers (8 cores and 16 GB RAM). For large-scale carrier grade deployments, an ATCA is a must (12 blades and above).

Deployment for up to 1 million BHCA (approx 270 CPS):

270 CPS is assumed to be the call rate of SIP signaling during peak load. This is the most common deployment scenario considered in telecom and usually serves as a benchmark.

For IMS deployments of up to 270 CPS, we can have multiple options. We may go for an HP server pair configured as follows:

a) Server-1 has CDF active and OCS as stand-by.

b) Server-2 has CDF stand-by and OCS as active.

This simple deployment will provide us with software and hardware redundancy, while also providing dedicated 8 core servers for each charging application (OCS and CDF).

Software redundancy can be performed by providing 2 independent processes of the CDF and OCS on each server (active) and 2 more processes of the CDF and OCS (for stand-by). A software load balancer may be used for distributing this load amongst the OCS and CDF processes.

This is called a 1+1 active-standby configuration. Each active instance of the CDF is expected to handle up to 4320 TPS at peak load. Each active instance of the OCS is expected to handle up to 2160 TPS at peak load.
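For reference, the peak figures follow from the BHCA number and the 1:16 and 1:8 factors derived earlier. The sketch below redoes the arithmetic; note that 1 million BHCA divided by 3600 seconds is closer to 278 CPS, and the 270 CPS used in this post is a rounded planning figure.

```python
def bhca_to_cps(bhca: int) -> float:
    """Busy hour call attempts converted to calls per second."""
    return bhca / 3600


CDF_FACTOR = 16  # DIAMETER transactions per call on Rf (from the earlier calculation)
OCS_FACTOR = 8   # DIAMETER transactions per call on Ro

print(round(bhca_to_cps(1_000_000), 1))  # 277.8

planning_cps = 270  # rounded figure used in this post
print(planning_cps * CDF_FACTOR)  # 4320 TPS per active CDF instance
print(planning_cps * OCS_FACTOR)  # 2160 TPS per active OCS instance
```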

This deployment scenario is shown below. The red-arrows depict change-over in case of hardware failure. Software fault tolerance is taken care of by switching between the software instances of the OCS and the CDF by using a software load balancer.

[Figure: server-pair]

Each server may host multiple software processes for the CDF and the OCS. Each process can be a full CDF/OCS application in its own right. The load balancer may also be deployed in an active-standby configuration similar to the CDF and the OCS. The CDF and OCS active processes may be more in number, based on the scalability requirements. Each new process of the OCS or the CDF may be instantiated using a CLI or over SNMP, when the element management systems get an alarm of possible overload or traffic peaks.

This kind of deployment architecture is shown below:

[Figure: failover]

Other interfaces for this deployment can be a command line interface (CLI) for polling the system and performing administrative tasks. Other interfaces will be over SNMP to the northbound management systems for monitoring the system health and servicing alarms.

Other possibilities:

In case of higher loads, the system can switch to a 1+1 Active-Active configuration, where all software processes of the CDF and the OCS are accepting DIAMETER requests. The load balancer will handle the distribution of the requests amongst the processes. In case the capacity is still not sufficient, new application processes of the CDF and the OCS can be started by firing the appropriate CLI command to scale horizontally.
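The behaviour of the load balancer in the active-standby and active-active modes can be sketched as follows. This is a toy round-robin model: the class name, the process names and the promotion policy (the first standby takes over from a failed active process) are assumptions for illustration, and no real DIAMETER traffic is handled.

```python
class ProcessPool:
    """Round-robin distribution over the active processes of one charging node."""

    def __init__(self, active, standby):
        self.active = list(active)
        self.standby = list(standby)
        self._next = 0

    def pick(self) -> str:
        """Choose the process that receives the next DIAMETER request."""
        if not self.active:
            raise RuntimeError("no active process available")
        target = self.active[self._next % len(self.active)]
        self._next += 1
        return target

    def fail(self, name: str) -> None:
        """Remove a failed process and promote a standby, if one is left."""
        self.active.remove(name)
        if self.standby:
            self.active.append(self.standby.pop(0))

    def go_active_active(self) -> None:
        """Switch to active-active: every process accepts requests."""
        self.active += self.standby
        self.standby = []


cdf = ProcessPool(active=["cdf-1", "cdf-2"], standby=["cdf-3", "cdf-4"])
print([cdf.pick() for _ in range(3)])  # alternates between cdf-1 and cdf-2
cdf.fail("cdf-1")                      # cdf-3 is promoted
print(cdf.active)
cdf.go_active_active()                 # cdf-4 joins as well
print(cdf.active)
```

Scaling horizontally then amounts to appending a newly started process to the active list.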

For deployments of over 1 million BHCA, as shown earlier, the load on the charging platform will be even higher. To cater to such traffic, a time will come when the system needs to scale out. We may require a dedicated server pair each for the CDF and the OCS.

Another interim option is to have all active CDF processes on one machine and all active OCS processes on another machine, and to scale horizontally by increasing the number of processes.

Conclusion:

This post was an attempt to quantify the challenges at hand for scaling the charging platform in the context of IMS. As seen, the scalability requirements for the charging platform are more challenging than those of the other core network nodes. Under the assumptions above, the charging load is a multiple of the load on the IMS core (16 times for the CDF and 8 times for the OCS), so it grows quickly as the call rate increases. Hence, a robust scalability architecture needs to be devised for catering to the same.

Posted in 3gpp, DIAMETER, DIAMETER charging, I-CSCF, post-paid, pre-paid, S-CSCF