WANG Hui,LI Lang
(Dept.of computer science,Hengyang Normal University,Hengyang Hunan 421002,China)
Application of Materialized View as Aggregate Table in Data Warehouse
WANG Hui,LI Lang
(Dept.of computer science,Hengyang Normal University,Hengyang Hunan 421002,China)
Aggregate tables store the pre-calculated summaries,and are critical in query performance tuning in a data warehouse.Analyzed the three methods of aggregates data building:triggers,stored procedures and materialized views.Stated the methods,usages and advantages of materialized views via examples,i.e.flexible refresh method,reduction of burdensome programming workload,query rewrite mechanism insuring application independence and so on.Indicated materialized views are ideal choice for aggregate tables.
materialized view;aggregate table;data warehouse
With the advent of the database application systems and information explosion in 1990s,people have acquired more and more data.Business analysts and decision-makers have to face the issues how to wisely use the data to make business decisions and direct business activities,thus Decision Support System(DSS)and On-Line Analytical Processing(OLAP)are in existence.The foundation of all DSS processing is to build a data warehouse.
A data warehouse is a subject-oriented,integrated,nonvolatile,and time-variant collection of data in support of management's decisions[1],the various applications of which may be in a few industries,such as finance,telecommunication,manufacture,retail sales,transportation and so on.In a data warehouse,huge historical data are stored,major operations are query and load,data access usually is for a group and access is infrequently,all of which are different from the traditional legacy database system,therefore the data modeling in a data warehouse should be different.Currently the industry has concluded that dimensional modeling is the most viable technique for data warehouse[2].
In dimensional model there are fact tables,where the numerical performance measurements of the business are stored,and dimension tables,which contain the textual descriptors of the business.In addition,in order to improve the query performance of data warehouse,there are some aggregate tables,which are pre-calculated summaries of the most granular data at higher levels along the dimension hierarchies[3].
Since the aggregate tables are pre-calculated summaries derived from the fact and dimension tables,it is critical how they are loaded or updated.There are three methods for aggregates data building.
1)Trigger:Create some triggers on fact and dimension tables.The advantage of the method is that the update of aggregate table is immediate,no delay as long as the fact or dimension table is loaded or updated;whereas the disadvantage is that the update time of aggregate is not flexible,the fact or dimension table change is finished only after all derived aggregate tables are updated,which is timeconsuming.The issue is vital and seriously impacts the performance of data warehouse,especially in business hours.
2)Stored procedure:Create some database stored procedures for updating aggregate tables.The benefit is that they are flexible,and can be run on schedule or manually;whereas the drawback is that building stored procedure is a tough and burdensome work,and requires developers'experience.
3)Materialized view:Create materialized views for aggregate tables.The merit is that the update of materialized view is also flexible,and can be achieved immediately,periodically or manually[4].The materialized view is a good practical feature of DBMS,and should be taken good advantage of to reduce the workload.
A materialized view(MV)is a schema object that provides indirect access to table data by storing the results of a query,and can be used to summarize,compute,replicate,and distribute da-ta[5].
MV is similar to index in a few ways[6],whereas CREATE MV statement is somewhat similar to CREATE VIEW.
MV can be used in a few fields:
1)Data replication.MV allows maintaining the copies of remote data on the local node.
2)ETL implementation[7][8].The usage of MV is an option for the ETL technology,which transfers data from data source to data warehouse.
3)Aggregate table data building.Creating MV is a good choice for aggregate table derived from fact and dimension tables in data warehouse.
As mentioned above,creating MV is a good choice for aggregate table.The following is an example for MV usage.
Figure 1 Fact and dimension in star schema
Suppose there are one fact table and three dimension tables in a star schema shown in Figure 1,one aggregate table could be summarizing the totals for a category,per city,per month shown in Figure 2.
Figure 2 Aggregate table
Where“BUILD IMMEDIATE”,which is default,indicates to build the data when the MV is created,another option is“BUILD DEFERRED”,which indicates to build the data later on;
For“REFRESH”there are three options:(a)“FAST”indicates the incremental refresh method,(b)“COMPLETE”indicates the complete refresh method,(c)“FORCE”(default)indicates to use“FAST”to refresh the MV if possible,use“COMPLETE”otherwise;
“ON DEMAND”(default)indicates MV to be refreshed by calling the refresh procedure,another option is“ON COMMIT”meaning a fast refresh to occur whenever an operation on a base table is committed.
“ENABLE QUERY REWRITE”indicates the MV is considered available for rewriting queries,another option is“DISABLE QUERY REWRITE”(default)meaning the MV unavailable for rewriting queries.
In order to use“FAST”refresh method for a MV,which is the most possible case in a data warehouse,materialized view log on each base table should be created.
If“ON DEMAND”is specified in the CREATE MV statement,which is also the most possible case in a data warehouse,system procedure“dbms_mview.refresh”needs to be called to refresh the MV.The procedure can be run manually or periodically with job queues.
The advantages of MV as aggregate table in a data warehouse are:
(1)MV is the feature of DBMS,and no burdensome programming work is needed for building and refreshing data of MV,which DBMS can handle.
(2)MV actually stores the query results,and can improve the query performance of data warehouse by the pre-calculated summaries.
(3)MV can be refreshed instantly,manually or periodically,which is very flexible.In a data warehouse environment MV is usually refreshed periodically,or manually if necessary.
(4)MV in a data warehouse is usually accessed through the query rewrite mechanism,transpar-ent to the end user or application,and can be added,altered or dropped just like index without impacting the validity and availability of the SQL statements in the applications..
Aggregate table plays an important role as a performance-tuning tool in a data warehouse,and MV is an ideal choice for aggregate table building.The paper describes the methods,usages,and advantages of MV as aggregate table,and we believe that it is practical valuable and greatly helpful in data warehouse applications.
[1]INMON W.H.Building the Data Warehouse[M].4th ed Indiana,USA,Wiley Publication Inc.2005
[2]KIMBALL R.,ROSS M.The Data Warehouse Toolkit 2nd ed.:The Complete Guide to Dimensional Modeling[M].New York USA,John Wiley and Sons,Inc.2002
[3]PONNIAH P.Data Warehousing Fundamentals for IT Professionals[M].2nd ed.New Jersey USA John Wiley&Sons,Inc.2010
[4]ZHU Wen,Mao Qin-h(huán)ui,Xue Yan,et al.Analysis and comparison on maintenance algorithms of materialized view in data warehouse[J].Modern Computer,2008(4):58-60
[5]CYRAN M,LANE P.Oracle Database Concepts 10g release[M].California,USA,Oracle Corp.2003
[6]CHEN Yi-xin,Ni Zi-wei.Application of materialized view in oracle large DB query[J].Fujian Computer,2011(10):147-148
[7]KIMBALL R.,CASERTA J.The Data Warehouse ETL Toolkit[M].Indiana,USA,Wiley Publication Inc.2004
[8]XIE Ren-dong,YANG Jun,Application of oracle materialized view in data warehouse[J].Computer Knowledge and Technology,2008,2(3):421-423
物化視圖作為匯總表在數(shù)據(jù)倉(cāng)庫(kù)中的應(yīng)用
汪 輝,李 浪
(衡陽(yáng)師范學(xué)院 計(jì)算機(jī)科學(xué)系,湖南 衡陽(yáng) 421002)
匯總表存儲(chǔ)的是事先計(jì)算好的匯總數(shù)據(jù),在數(shù)據(jù)倉(cāng)庫(kù)查詢調(diào)優(yōu)中是至關(guān)重要的。分析了匯總表數(shù)據(jù)生成的三種方法:即觸發(fā)器、存儲(chǔ)過程、物化視圖;用實(shí)例闡述了物化視圖的使用方法和優(yōu)點(diǎn),即靈活的數(shù)據(jù)刷新方式、減少繁重的編程工作、查詢重寫機(jī)制保證應(yīng)用獨(dú)立性等,指出了物化視圖是實(shí)現(xiàn)匯總表的理想選擇。
物化視圖;匯總表;數(shù)據(jù)倉(cāng)庫(kù)
CLC nunber:TP311
AArtical ID:1673-0313(2012)06-0059-04
date:2012-10-12
The Scientific Research Fund of Hunan Provincial Education Department(No:11B018),the Scientific Research fund of Hengyang Normal University(No 11B43)
Biography:Wang Hui(1964— ),male,born in Nanjing,Jiangsu province,MS,ever worked in Canada for 10years as architect in a few High-Tech companies,designed a few data warehouse applications.Research fields:Data Warehouse,BI,OLAP,Data Mining etc.Li Lang(1971— ),male,born in Hunan province,PhD,professor in Hengyang Normal University,Research fields:Embedded security,data processing etc.