In this post, we will take an overview of Business Intelligence (BI). We will begin by describing the general term BI and then go towards details of implementation of BI using Microsoft BI platform.
Every organization is always in the process of increasing its revenue. No organization says that we have done some amount of business last year and henceforth we will continue with the same every year. In order to increase the business, the organization has to decide what will be the business growth for the next year and also for further 5 years and so on. The owner of the organization or the Board of Directors can not take a figure out of blue and say that it is the target for the next year. In order to arrive at some number the management has to take into consideration various aspects like the current industry situation, the company’s niche area, weakness and strengths of the company and competitors etc. If the organization is in the business for last 10 years and each year the growth is 10 % then this also becomes the deciding factor. For knowing all the above factors we need to keep the data and utilize it as and when required. Business Intelligence tools help in storing the data, analyzing the data and utilizing it for reports. Business Intelligence can be useful for any size of organization.
Let us take a particular example. Suppose there is a company in production business. They are coming up with a very good product which will provide large profits for them. The organization knows the cyclic trend when the products are launched till the time the sale increases. This trend can be available from previous product launches. The company may take certain decisions based on market survey over the years which suggests the perfect season when the prospective clients may have some funds to spend.
Let us take another example. Let us say we have a company which is in training business. The company deals with all computer related trainings mainly with 2 flavors. First one is where the clients / customers are trained in company’s premises. This type of business is called retail business. Other type of business is where the trainers go to various corporate for delivering trainings. Let us call it Corporate Training Business. Let us focus on how the company will increase the retail business. The company will give advertisements, inserts etc. Suppose the company wants to target a chunk of students who will have a lot of free time once they finish their secondary school or higher secondary school exams. The company approaches various schools, colleges, private institutes and collects the addresses. Thus the data can be gathered by various persons with various means. Some may collect data with a laptop on Access database, some with Excel while some may collect the data on papers. This way the company collects data from various sources. When the exams are approaching they send the students mailers saying “Best wishes for the coming exams and when the exams are over please visit us as we are starting special batches”. Thus a lot of students enquire after the exams and some of them can become actual clients. This was one way of increasing business. This newly gathered data can be combined with the data company is having for previous years. The data can be repeated if some of these students had already visited the institute. We need to omit on the repeated entries. The company can use the complete data to find out information like the colleges or locations from where maximum enquiries came. By keeping this data the company can find out if there is any slack period for walk in enquiries over the years. The management may require to know which were the months for maximum enquiry and enquiry to enrollment ratio.
If we consider the corporate training business, the company has to know where the industry is going, what kind of expertise will be required in future so that it keeps the staff ready before some competitor takes the chunk. If the company is in this business, then they can also have some trend such as which is slack period, they can keep database of students who have done short term courses and approach them for long duration courses. The board of directors can take decision how to increase the business if they can look at some trend reports or analysis.
For doing all these activities, the company needs some tools, for gathering data, processing data creating reports from the data.
Microsoft provides the suite of tools in one bundle along with the database engine called Business Intelligence Development Studio (commonly called as BIDS). The three main templates are SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS). As the names suggest SSIS is used for gathering and integrating data, SSAS for processing and analyzing gathered data and SSRS for creating reports.
Let us concentrate on SQL Server Integration Services first.
SQL Server Integration Services (SSIS)
This tool is very useful for gathering data from various sources. As we had already seen in one example where the company fetches students’ data from various places, this data can be any form.
It can be in the form of Excel sheets, access database or even in text files. Before the data is utilized for sending mailers as mentioned above it has to be in uniform format. Some students’ information can be found at multiple places but we need to send them mailer only once. There can be some information which we can omit while creating our database. In order to fetch the data from various sources, doing some cleaning jobs before putting in SQL Server database SSIS provides us various tools.
With SSIS we have a concept called Package which is the basic unit of work. Each SSIS project template can have multiple packages. These packages have extension of dtsx (Data Transformation Services with XML format). With SQL Server the information from package is stored in XML format.
The package will have all the necessary connections as it is the primary task for integrating data. Multiple packages can not share same connection manager. It can be copied from one package to another.
In this package we will have to decide on the flow of the processes for example we will initially fetch the data from flat files (text files), remove unnecessary information, truncate on extra field area and then put the data into SQL Server database. This flow can be specified with the help of Control Flow. This control flow can be called as the flowchart or the workflow. The Control Flow consists of different types of tasks like containers (loops, sequence container), data preparation tasks (data profiling, file system, FTP, XML etc), process communication tasks (execute package, Send mail, MSMQ etc) and the Data Flow Task
Even though the Data Flow Task is a part of Control Flow Task it has a lot of importance. This process is also called as ETL. ETL stands for Extract, Transform and Load. SSIS provides various data sources, destinations and a lot of transform tools. As mentioned sources can vary from excel, text files, XML files, access or even SQL server. Similarly the destinations can be disparate. Depending on how the gathered data is we need various transformation tools. We may have to compute some columns, create some derived columns, sort the data, and change the type for the data etc.
We can use variables with various scopes with packages with which some information can be given on the fly when the package is running. We can log some information; create event handlers for various tasks or when error is generated.
Just creating the package will not be enough, we need to execute it also. For running or executing the package there are various ways. One will be with the help of BIDs itself, second by using command line utilities like DTEXEC or DTEXECUI. We can also deploy the package from test machine to the production machine and run from their by putting it as a part of SQL Server Agent job. Depending on the security policy we can set the restriction about who can execute the package. We can also use configuration files with which certain information can be changed or given at the time of deployment of package.
Once the data is integrated and put at a common place say SQL Server database we can process and analyze the data with the help of analysis services.
SQL Server Analysis Services (SSAS)
This tool mainly deals with processing the already gathered information/ data. The data for any organization can be of two types, one which is required for a day to basis and the other which is for historical purposes. If we take the same example of a training organization, we can say that the existing students’ information will be required daily where as the past students’ information will be historical data. Once the student completes a particular course the company does not delete his/her information as it may be required sometime later when we have to send mailers for further courses or organize alumni events. Every organization thus has to maintain 2 different data one called as transactional data (OLTP- Online) and the other called historical or data warehouse.
The main purpose of transactional data is to have faster insertion and updates of data as against the historical data is mainly used for querying purpose. Thus the norms of data integrity, data normalization can be different for the 2 types of data. The size of data is of primary importance for transactional whereas the historical data can grow as much as required. The main requirement of keeping the historical data is the moment any query is fired it should give the instant result. Thus the data can be repeated, computed or aggregated columns can be stored as a part of Data Warehouse data.
The term Data Warehouse stands for the complete data for any organization. There are 2 schools of thoughts for gathering and creating Data Warehouse for any organization. The 2 experts who strongly follow their thoughts are Ralph Kimball and Bill Inmon. Kimball believes that all different departments can create their data storage and then the data can be integrated to form the Data Warehouse (BOTTOM-UP APPROACH). Inmon on the other hand has the opposite approach where once the complete Data Warehouse is ready automatically individual department’s data will also be ready (TOP-DOWN APPROACH). There is no wrong or right way as the final aim is to create and build Data Warehouse. Every organization can decide on its own approach.
With Microsoft SQL Server Analysis Services template we can keep the data in the aggregated in the form of a cube so as to make query processing in least time. This template provides various terms like Data Source, Data Source View, Cubes, Dimension and Fact tables.
While querying data we need information like number of students, total sales figures, total courses conducted which is in numeric form. Just saying 10000 as the number of students will not be meaningful but if we say this figure is for last 6 months or 1 year it will be more meaningful. We can also have information for various students locality wise or college wise in order to get more data.
In these examples we see main two categories one as numeric measures and other as slices and dices against which these numeric measures are queried. Normally table which stores all the information (all students’ data) is called as Fact table. The axis for fact tables is stored in Dimension tables.
The thumb rule can be primary keys are stored I dimension tables (students locality, colleges) and the subsequent secondary keys are stored in fact tables.
We can store this information in already aggregated format in the form of Cubes so as to fetch the data faster. In order to create Cubes first we need to provide the source from which the data is gathered.
With Data Source we can specify the provider from where the data comes which can be in the form of SQL Server database, DB2, Oracle or even hierarchical format.
Once the Data Source is set we can create a window for looking at the data. Suppose the SQL Server Database from where data is fetched consists of 50 tables, whereas our requirement is only for the data coming from Human Resources department. So we can create a window/ view which comprises of only those table which are required. Another advantage of DSV (Data Source View) can be if the tables are not related in the database we can establish the logical relationship. We can also create computed columns, thus the data can be manipulated without affecting the original data.
The different tables as a part of DSV can be of two types either Star schema or snowflake schema. The star schema specifies there is a single fact table and multiple dim tables related to it which looks like a star. In snowflake as the name suggest, one dim table can act as a fact for another dim table thus forming snowflake.
Once a cube is created it can be deployed on the Analysis Services server from where it can be queried by browsing or creating reports on it. We can specify whether the data in the cube should be automatically updated once the underlying fact or dim tables data is changed or not. This is possible by setting the storage settings as MOLAP (multidimensional OLAP), ROLAP (Realtime / Relational OLAP) or HOLAP (Hybrid OLAP).
A cube comprises of fact tables and dim tables. Maximum data will be stored in fact. When the storage becomes very high in volume it can be stored across multiple partitions. The numeric measures are provided from fact tables while the dim tables provide the primary keys based on which the data can be sliced and queried. Mostly for every cube a special type of dimension is required called Time Dimension. The data with time or to be specific date can be sliced by giving time attributes like month, quarter, semester or even year.
As already mentioned the data can be kept in already aggregated format in Analysis services server so as to query it faster. We have to keep in mind a very important factor that the data in Data Warehouse should be periodically updated from the transactional data. Thus for this periodic updating, we may have to go back to SSIS to fetch data.
Once the data is processed we need to present it in a user friendly manner so that certain decisions can be taken. The data can be presented in the form of pure HTML, Excel sheet, PDF or in the form of chart. This job can be achieved by Reporting Services
SQL Server Reporting Services (SSRS)
With the help of BIDS (Business Intelligence Development Studio) there are various templates provides for reports, Report Server Project, Report Server Wizard, Report Model project. The first 2 are for creating reports and the third will be used to create a report model based on which ad-hoc report can be created using Report Builder.
For creation of any report first we will require the data on which the report is based on. Like with SSAS we can create Data source for report. The data can be fetched from OLTP database, OLAP database, any other database like Oracle, Teradata or even on hierarchical data. The Data Source can be shared across multiple reports or can be different for different reports depending upon the requirements. The reports created are stored in XML format with the extension as RDL which stands for Report Description/Definition Language.
There report can be created by using a wizard which will provide step by step report creation followed by certain format for report after creation. Once the report created can be viewed with the help of preview tab from BIDS. The report can be deployed on Report Server which by default is rendered in HTML format. Before deploying the report we need to specify the Target Server URL for Report Server which is as follows
http://<machine Name>/ReportServer (in case of default instance)
http://<machine name>/ReportServer_<instance name> (in case of non default or named instance)
The _ will be replaced by $ in SQL Server 2005.
For creation of reports without using wizard reports there are various controls provided like Table, Matrix, Chart, List and Tablix (which is combination of Table and Matrix). The field list from a data source can be specified which can also comprise of parameters if required. For creation of dataset query analyzer is provided with the help of which we can select the different tables and the respective fields from them. If the tables are related in the database the relationship will be automatically fetched, if not it can be created.
We can specify default and available values for parameters used. These values can be some constant values or it can be from a query. The parameter can have multiple values if specified. In order to get the parameters data from another query we can create another dataset in the same report.
As already mentioned once reports created they can be deployed to the server with the help of BIDS. All reports can be deployed in one go. Another deployment method is to start Report Server and then choose the tab for Upload file. We can programmatically deploy reports by using rs utility.
It may that after seeing a particular report or trend, the management requests for some other report. In order to create the new report we may have to go back to integration stage and thus SSIS, SSAS and SSRS continue in cyclic manner.
In this article we discussed general concepts of Business Intelligence. Business Intelligence comprises of various tools which are used for gathering, storing, analyzing and providing access to information required for better understanding for organizations. These analysis and reports can be used for betterment of the organization. How every organization needs to use Business Intelligence. Later we talked about the 3 mail tools provided by Microsoft SQL Server called Integration Services, Analysis Services and Reporting Services. We also saw the main features provided by these tools.