Analytics

Oracle : Displaying multiple records in one row

Brief History of the Process:
The process merges customer’s CDRs data along with offer information and prepares a cumulative data insight on a daily basis.
The major concerns for issues are due to extraneous volume of data that it processes. Volume of CDRs are
120 Million on average and subscribers offer details are – 250 millions.
The existing process first summarizes CDRS into an intermediate table with avg volume of 15 millions.
Subsequent part of the process was written in basic pl/sql cursor for loop where it opens up the offers details information from the subscriptions table (avg volume – 250 mill) and merges all offers appending with ‘#’ for each subscriber summarized from the CDR table.
The average execution time was 14 hours with huge load with extreme CPU usage in the system. On occasions when hot back ups run days, the process extended up to 21 – 24 hours to execute.
We are in pressure to minimize the load in the system along with time constraint to finish the process in time as these data are in must need to Finance who there after extracts mission critical report out of it.

Analysis And Solution:
Once I have taken the process first for tuning on it, my concerns was lying with volume of records it processes. I analysed and made my mind to create a dynamic table with details of offers in a hash separated values along with the subscriber. This would give me a ready in hand details of subscriber with their offers segregated by ‘#’. This can easily be summarised at further to get the final cumulative table.
However the new thought that keep knocking the head was how to get the list of string separated by hash and create a table through single SQL by passing the existing cursor for loop.
I had two solutions in hand.
1.  To use the analytic ROW_NUMBER() function to simulate a hierarchy in the ordered data, and then in an
outer query use SYS_CONNECT_BY_PATH(new in 9i) to show that “hierarchy” as one line, and take the last
value in each group using the handy KEEP (DENSE_RANK LAST) construction (also added in 9i.)
This does not result in a particularly efficient or elegant query, but at least
  • It is self-contained, as no PL/SQL functions or object types are required, and
  • The results are ordered , the added advantage.
The query goes as below.
create table fact_services_details parallel 4
as
select subscriber,
ltrim(max(sys_connect_by_path(service_name,’#’))
keep (dense_rank last order by curr),’#’) as services
from
(
select subs.subscriber,
bs.service_name,
row_number() over (partition by subs.subscriber order by bs.service_name) as curr,
row_number() over (partition by subs.subscriber order by bs.service_name) -1 prev
from bolton_services bs, subscriptions subs
where subs.offerid = bs.offer_id
and subs.status <> -1
and subs.next_bill > sysdate
and bs.report_finance = ‘Y’
)
group by subscriber
connect by prev = prior curr and subscriber=prior subscriber
start with curr=1;
——————————————————————————————————————————–
Hurray, The table got created in 1 hour 45 mins and complete process finishes in 2 hour 10 mins.
LOG as below.
12-JUN-2013 05:32:59 smp_cds process started at
12-JUN-2013 07:18:41 insert into smp_call_detail_daily_summary..
12-JUN-2013 07:19:44 The process successfully finished..
check for cdr loading…………………………………….. = 12/06/2013 05:10:01
drop table started at…………………………………….. = 12/06/2013 05:10:01
assigning table string……………………………………. = 12/06/2013 05:10:01
create the table…………………………………………. = 12/06/2013 05:10:01
new column add started at…………………………………. = 12/06/2013 05:30:05
create index started at…………………………………… = 12/06/2013 05:30:05
compile the procedure…………………………………….. = 12/06/2013 05:31:05
gather stat on table smp_call_detail_summary………………… = 12/06/2013 05:31:06
call new process…………………………………………. = 12/06/2013 05:32:59
PL/SQL procedure successfully completed.
———————————————————————————————————-
2.       The second solution in hand was  harnessing the dark power of XML, which was written to get the same
cumulative table using XMLAGG functions.
create table fact_services_details parallel 4
as
select subscriber,
rtrim
( xmlagg (xmlelement (bs, service_name || ‘#’) order by service_name).extract (‘//text()’)
,’#’ ) AS SERVICES
FROM  bolton_services bs, subscriptions subs
where subs.offerid = bs.offer_id
and subs.status <> -1
AND subs.next_bill > SYSDATE
and bs.report_finance = ‘Y’
GROUP BY subscriber;
———————————————————————————————————-
This doesn’t go with optimum results as the table creation was taking more time than the first solution.
I had to discard this idea in all and stick to the first approach.
This went live with first solution and running fine for last two weeks.
The business is happy to get the result in time and  my team is happy for the reduced load in the system even during the busy month-end days when back ups are running.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s