In one of the previous article Exploring Business Intelligence (BI) with SQL Server , I discussed an overview of Business Intelligence and the three main tools provided by Microsoft for the same. The three tools are SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS). In this article we will discuss the various components in SSIS, its architecture and how to create a small package using SSIS.
BI is a set of tools and technologies provided to make better business decisions. These tools can be used for querying and reporting, data mining, data gathering or OLAP (Online analytical processing). Every organization deals with a lot of data. Every department in any organization has its own way of data storage. Some prefer to store in excel, some use access while some departments may store the data in relational database like SQL Server. If we are creating a report for Order Processing and we get the customers’ orders in printed form we require data entry for this. Suppose this data is entered with excel sheet. We have an Inventory processing system purchased from a third party where the data is stored in Oracle and we store all our customers’ information in SQL Server. We want to create a report which uses data from all these disparate sources. We can put all these data in a common platform with the help of SSIS.
As the name suggests SSIS is used for fetching data (from disparate sources), doing some changes on it (like changing the type from excel to SQL, or creating calculated field) and put the data on a uniform format which can be subsequently used for further analyzing or reporting.
SSIS is a service so you can have only one service on a single box. We have 2 runtime engines in the form of control flow and data flow. It is commonly called as ETL (Extract, Transform and Load). With Business Intelligence Development Studio (BIDS) we have graphical tools as well as wizards for creating and debugging the packages. Packages form the intricate component in SSIS template. Each package is a basic unit of deployment. Every package has control flow and data flow element. Each package has single control flow which is like the workflow for the package. Data flow is a component of control flow but provided with a separate tab as most of the packages will use it for ETL.
With ETL, apart from extracting and loading data from various sources and destinations we can do a lot of transformations to the data. We can merge data from 2 sources into one, we can convert column data type to suit the destination data type, the data can be sorted before putting in, and we can split the data by giving a condition so as to continue with only conditional data and many more tasks.
When we start the template of SQL Server Integration services, with the help of BIDS, we get following screen
The template automatically creates a package with the name as “Package.dtsx”. The extension dtsx stands for data transformation services with XML format. Everything in the package will be stored in XML format. The following screen shows the separate tab for Data Flow.
The following screen shows some of the components from control flow tab from toolbox
With the data flow we get three distinct components -- Data sources, Data Transformations and Data Destinations as shown here
With Data flow, we normally fetch data from various sources, do any data cleansing on it if required and finally load into destination.
We also have debugging facility with control flow, where we can add break points when required. With Data flow we have data viewer as a debugger which will show the data in grid, histogram, scatter plot or column chart.
Apart from control flow and data flow, other components of package include connection managers, variables, Event handlers and log providers. When we are working with data source or data destinations we require specifying the connections from which we can fetch data or load data into. These can be created by dragging and dropping data source or destination component on the design area and configuring it or by right clicking in the empty area for connection managers as shown below
We can create variables with different scope with each package. SSIS has 2 types of variables, system variables and user defined variable. System variables store information about the running packages. The scope of user defined variables can be for complete package or for a control on the package. The variable name is case sensitive. These variables can be used for passing properties at run time, specify the value for loop iterations etc.
At run time, control flow components like loops, sequence containers and also packages raise events. Event handlers are used to enhance functionality depending upon the event raised like OnError, OnProgress, OnTaskFailed etc. The events for Package are shown below:
Remember if any event has no handlers, event is raised to the next container up in hierarchy for package.
Logging can be provided for tasks, containers or packages. When you add logging to the package, you need to specify the provider and the location. The following figure shows types of log.
You can configure log by giving the name and description. You also need to give the connection manager log needs.
In this article, we had a quick overview of SSIS and the different features it provides. In the next article, I will show you how to create your first SSIS package with the help of Import Export Wizard