Nested Case Statement in SQL Server

This article is a continuation of SQL Server CASE Statement and CASE WHEN Examples. Today we will learn about the Nested CASE statement in SQL Server.

We can nest CASE statements, similar to the nested IF statements found in most programming languages.

Let us see an example.

select ename, job, sal, case  -- Outer Case
    when ename like 'A%' then
    case when sal >= 1500 then 'A' -- Nested Case
    end
    when ename like 'J%' then
    case when sal >= 2900 then 'J' -- Nested Case
    end
end as "Name-Grade"            
From Emp

Image 7-Nested-Case

CASE expressions can be nested up to a maximum of 10 levels.

In the following example, all 10 levels are used.

Declare @x int
set @x = 1
Select 
case when @x <= 100 then  -- Level 1
 case when @x <= 90 then -- Level 2
    case when @x <= 80 then -- Level 3
        case when @x <= 70 then -- Level 4
            case when @x <= 60 then -- Level 5
                case when @x <= 50 then -- Level 6
                    case when @x <= 40 then -- Level 7
                       case when @x <= 30 then --Level 8
                         case when @x <= 20 then -- Level 9
                           case when @x <= 10 then -- Level 10
                                100
End End End End End End End End End 
End as "Nested Case" 
--Ending all levels!

Image 8-Nested-Case-Upto-10-Levels

If nesting exceeds 10 levels, SQL Server throws an error:
Declare @x int
set @x = 1
Select 
case when @x <= 100 then  -- Level 1
case when @x <= 90 then -- Level 2
case when @x <= 80 then -- Level 3
    case when @x <= 70 then -- Level 4
        case when @x <= 60 then -- Level 5
            case when @x <= 50 then -- Level 6
                case when @x <= 40 then -- Level 7
                   case when @x <= 30 then --Level 8
                     case when @x <= 20 then -- Level 9
                       case when @x <= 10 then -- Level 10
                          case when @x <= 11 then -- Level 11
                             100
End End End End End End End End End End 
End as "Nested Case"

Msg 125, Level 15, State 4, Line 14
Case expressions may only be nested to level 10.

SQL Server SELECT INTO Temp Table

Many a time we get an output and then need to query it further. In such scenarios, instead of writing the entire query repeatedly and filtering it differently each time, we can dump the output of the main query into a temporary table, i.e. SELECT INTO a temp table. Later we can query the temporary table and refine the output as needed.
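The general pattern looks like this (the table and column names here are placeholders, not part of the samples below):

```sql
-- SELECT ... INTO creates the target table on the fly;
-- a #-prefixed name makes it a session-local temporary table.
SELECT OrderID, Amount
INTO #Filtered_Orders
FROM Orders
WHERE Amount > 1000;

-- Later queries hit the pre-computed rows only:
SELECT * FROM #Filtered_Orders;
```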

Let us create the sample tables.

CREATE TABLE EMP
       (EMPNO Integer Primary Key,
        ENAME Varchar(15),
        JOB Varchar(9),
        MGR Integer,
        HIREDATE datetime,
        SAL Integer,
        COMM Integer,
        DEPTNO Integer);

INSERT INTO EMP VALUES
        (7369, 'SMITH',  'CLERK',     7902, '12/17/80',
        800, NULL, 20);
INSERT INTO EMP VALUES
        (7499, 'ALLEN',  'SALESMAN',  7698,
        '5/20/81',1600,  300, 30);
INSERT INTO EMP VALUES
        (7521, 'WARD',   'SALESMAN',  7698,
        '5/22/81', 1250,  500, 30);
INSERT INTO EMP VALUES
        (7566, 'JONES',  'MANAGER',   7839,
        '4/2/81',  2975, NULL, 20);
INSERT INTO EMP VALUES
        (7654, 'MARTIN', 'SALESMAN',  7698,
        '9/28/81', 1250, 1400, 30);
INSERT INTO EMP VALUES
        (7698, 'BLAKE',  'MANAGER',   7839,
        '5/1/81',  2850, NULL, 30);
INSERT INTO EMP VALUES
        (7782, 'CLARK',  'MANAGER',   7839,
        '6/9/81',  2450, NULL, 10);
INSERT INTO EMP VALUES
        (7788, 'SCOTT',  'ANALYST',   7566,
        '12/9/82', 3000, NULL, 20);
INSERT INTO EMP VALUES
        (7839, 'KING',   'PRESIDENT', NULL,
        '11/17/81', 5000, NULL, 10);
INSERT INTO EMP VALUES
        (7844, 'TURNER', 'SALESMAN',  7698,
        '9/8/1981',  1500,    0, 30);
INSERT INTO EMP VALUES
        (7876, 'ADAMS',  'CLERK',     7788,
        '1/12/83', 1100, NULL, 20);
INSERT INTO EMP VALUES
        (7900, 'JAMES',  'CLERK',     7698,
        '12/3/81',   950, NULL, 30);
INSERT INTO EMP VALUES
        (7902, 'FORD',   'ANALYST',   7566,
        '12/3/81',  3000, NULL, 20);
INSERT INTO EMP VALUES
        (7934, 'MILLER', 'CLERK',     7782,
        '1/23/82', 1300, NULL, 10);


CREATE TABLE DEPT
       (DEPTNO INTEGER Primary Key,
        DNAME VARCHAR(14),
        LOC VARCHAR(13) );

INSERT INTO DEPT VALUES
        (10,'ACCOUNTING','NEW YORK');
INSERT INTO DEPT VALUES (20,'RESEARCH','DALLAS');
INSERT INTO DEPT VALUES
        (30,'SALES','CHICAGO');
INSERT INTO DEPT VALUES
        (40,'OPERATIONS','BOSTON');

Now we will write a complex-looking select statement, as shown here –

select ename, emp.deptno, sal, 
           case when sal >= 5000 then 'Grade A'
                when sal >= 3000 then 'Grade B'
                when sal >= 2500 then 'Grade C'
                when sal >= 2000 then 'Grade D'
                when sal >= 1000 then 'Grade E'
                else
                'Grade F'
            end as Sal_Grade,
            datepart(yy, hiredate) as Year_Of_Hiredate,
            DName,
           Sal + IsNull(Comm,0) as Total,
           Case
            when Comm is null or Comm = 0 then 'Commission not provided'
            when Comm is not null then 'Commission provided'
            End as Commission_Status,
               SubString(Loc,1,1) as Location_ID
from emp join dept
on Emp.deptno = Dept.deptno

The output of this query is as follows –

complex-select-output

In this query, we have used the following techniques:

1. Two Case Functions
2. Date Function
3. IsNull Function
4. Substring Function
5. Inner Join

Now let us improvise on this output. Put it into a local temporary table by issuing the following query:

Select * into #Emp_Info
from(select ename, emp.deptno, sal, 
           case when sal >= 5000 then 'Grade A'
                when sal >= 3000 then 'Grade B'
                when sal >= 2500 then 'Grade C'
                when sal >= 2000 then 'Grade D'
                when sal >= 1000 then 'Grade E'
                else
                'Grade F'
            end as Sal_Grade,
            datepart(yy, hiredate) as Year_Of_Hiredate,
            DName,
           Sal + IsNull(Comm,0) as Total,
           Case
            when Comm is null or Comm = 0 then 'Commission not provided'
            when Comm is not null then 'Commission provided'
            End as Commission_Status,
               SubString(Loc,1,1) as Location_ID
from emp join dept
on Emp.deptno = Dept.deptno
) as Emp_Dept

So now these 14 processed rows are the actual records of the #Emp_Info table. When we query this table, the inner join, CASE, date, substring and IsNull functions are not executed again and again, which can be a significant performance boost!

Let’s explore different outputs querying the #Emp_Info table now.

Query 1 – Display records of Grade B employees

Select * from #Emp_Info
where Sal_Grade = 'Grade B'
go

gradeb-employees
Query 2 – Display grade wise total of salaries

Select Sal_Grade, Sum(Sal) as Salary_Sum
from #Emp_Info
Group by Sal_Grade
go

sql-select-into-temp

Query 3 – Display employee names, totals, DName and year of hiredate for all employees who joined in 1981.

Select Ename, DName, Total, Year_of_Hiredate
from #Emp_Info
where Year_of_Hiredate = 1981

sql-select

So as we can see, doing a T-SQL SELECT INTO a temp table lets us fire additional queries on the output and, in some cases, gives a performance boost too.

Importance of RAID in Databases

This article was modified on August 31st, 2016.

RAID is a disk system that contains multiple disk drives, called an array, to provide greater performance, fault tolerance and storage capacity at a moderate cost. While configuring your server system, you typically have to make a choice between hardware RAID and software RAID for the server’s internal disk drives.

RAID System

RAID systems are widely used as storage solutions to get the best I/O performance, depending on whether the application is write intensive or read intensive. DBA Ray Rankins mentions in his book that for database-related applications, in order to minimize disk head movement and maximize I/O performance, it’s a good practice to spread random I/Os (data changes) and sequential I/Os (for the transaction log) across different disk subsystems. I agree with and endorse his view, as SQL Server, or for that matter any other database, is very much an I/O intensive system.

I will be using SQL Server as an example to explain the importance of RAID in databases; however, you can implement these concepts in your database of choice. The concepts, more or less, remain the same.

Although RAID is not a part of a database like SQL Server, implementing RAID can directly affect the way SQL Server performs. There are many RAID arrays available, such as RAID 0, RAID 1, RAID 3, RAID 4, RAID 5, RAID 6, RAID 10 and RAID 01. In this article, we will discuss the ones you will likely encounter as a SQL Server DBA, i.e. RAID levels 0, 1, 5 and 10, and also discuss their advantages and disadvantages from a fault tolerance and performance perspective.

Note: RAID is not a replacement for backups. Backups are essential for any system.

Different RAID Levels (Advantages and Disadvantages)


We will discuss only RAID 0, 1, 5 and 10, from a database perspective.

RAID 0 – Also known as Disk Striping, RAID 0 does not provide redundancy or fault tolerance; instead it writes data across two or more drives in an alternating fashion. This provides the best read/write I/O performance. If you had 8 chunks of data, for example, chunks 1, 3, 5 and 7 would be written to the first drive, and chunks 2, 4, 6 and 8 to the second drive, all in a fixed (sequential) order. RAID 0 has a simple design, is easy to implement, and has no parity overhead. The drawback is that any piece of data lives on only one disk, so if one disk fails, the data stored on it is lost.

RAID 1 – Also known as Disk Mirroring, RAID 1 provides a redundant, identical copy of a selected disk and thus provides good fault tolerance. It can be implemented with 2 drives. The disadvantage is that it has a big storage overhead and a high cost/capacity ratio.

RAID 5 – Also known as Disk Striping with Parity, RAID 5 stripes data across multiple drives and writes parity bits across the drives. Data redundancy is provided by the parity information. It can be implemented with 3 or more disks and is a popular choice among DBAs. Since data and parity information are spread across the disk array, a stripe’s data and its parity always sit on different disks. If one disk fails, just replace it with a new disk and the array rebuilds itself. RAID 5 has a higher read rate and makes good use of capacity. The drawbacks of RAID 5 are slower write rates and slow rebuild times.

RAID 10 – Also known as mirroring with striping, RAID 10 is a combination of RAID 1 + RAID 0. RAID 10 uses a striped array of disks that are then mirrored to another identical set of striped disks. This level uses at least four hard disks, and additional disks must be added in even numbers. The data is first placed into mirrored pairs at the lower level. Next, the controller selects a member from each mirrored pair and stripes the data into a new logical volume. Since RAID 10 handles random writes well, it provides the best performance for write-intensive applications (like video editing). The drawback is that it is expensive.

Which RAID is Suitable for my Database?


Now that you have an overview of RAID levels, let’s look at which RAID is suitable for a database. The answer depends on a variety of factors. Do you want availability, performance or cost? What are your requirements for fault tolerance and performance? Here’s a quick snapshot of the performance and fault tolerance provided by RAID:

sqlserver-raid-performance

When it comes to a database like SQL Server, no single RAID level will suit every need. In most cases, SQL Server performs large reads and small writes. So for write-heavy databases, RAID 5 is not a good choice; RAID 10, on the contrary, is a good option for databases with more write operations.

Here are some points and best practices to keep in mind while deciding the RAID system for your database.
  • Your data, logs, tempdb and backups should be on separate physical drives or sets of disks (arrays).
  • RAID 1 is often chosen to store operating systems, binaries, index filegroups and database transaction log files. Fault tolerance and good write speed are critical to log and index performance.
  • Since log files are written sequentially and are read only for recovery operations, the recommended RAID for log files is RAID 1 or 10. If your RAID 1 is at 100% usage, choose RAID 10 for better performance.
  • For data files with random access and read-heavy data volumes, striping is important, so RAID 5 or 10 is recommended.
  • For data files which require good write performance, RAID 10 is recommended. Use a battery-backed caching RAID controller for better write performance.
  • For tempdb files needing good read/write performance, RAID 0, 1 or 10 is recommended. Although tempdb contains temporary data and DBAs often go in for RAID 0 for tempdb, keep in mind that SQL Server requires tempdb to carry out many of its activities. So if you want your system to always be available, rethink RAID 0.
  • Putting log files or tempdb on a RAID 5 array is not recommended, since RAID 5 does not perform well for write operations. DBAs, however, have conflicting opinions on this point.
  • Choose small, fast drives over large, slow drives.
Make sure you do your own research, so you can make an informed decision! In a real scenario, you may not have the luxury of deciding your server configuration due to the cost associated with it. Keeping these points in mind may help you in such scenarios.

Please use the comments section and share your opinions!

SQL Server Default Port

In this article, we are going to explore the default port number of SQL Server required for getting connected to different applications.

What is a Port? It is an endpoint of communication in an operating system. A port is always associated with the IP address of a host and the protocol type of the communication, thereby completing the destination or origination address of a communication session. A port is identified for each address and protocol by a 16-bit number, commonly known as the port number.
When front-end applications want to connect to SQL Server, the port number is essential.

The default port number of SQL Server is 1433.

How to get the port number in SQL Server?

Note - The steps are shown in a SQL Server 2012 environment. They are largely applicable to SQL Server 2014 too.

Step 1: Open the SQL Server Configuration Manager window through Start > All Programs > Microsoft SQL Server 2012 > Configuration Tools 

sccm-navigation

Step 2: Explore SQL Native Client 11.0 Configuration (32bit)

sql-native-client-configuration

Step 3: Click on the Client Protocols option within it –

client-protocols

Step 4: Double click on TCP/IP in the right-hand pane

tcp-ip

We will get to know the Default Port as shown here

1433-default-port

The default port 1433 is the official Internet Assigned Numbers Authority (IANA) socket number for SQL Server. Client systems use TCP 1433 to connect to the database engine; SQL Server Management Studio (SSMS) uses the port to manage SQL Server instances across the network. You can reconfigure SQL Server to listen on a different port, but 1433 is by far the most common implementation.
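You can also find the port a running instance is listening on from T-SQL itself. For example, assuming your login has the VIEW SERVER STATE permission, the following query against the sys.dm_exec_connections DMV returns the TCP port of your own connection:

```sql
-- Returns the TCP port of the current session's connection
SELECT local_tcp_port
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;
```

On a default installation this typically returns 1433; for a shared-memory connection the value is NULL.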

Some other default SQL Server ports:

TCP 1434 – For Dedicated Admin Connection
UDP 1434 – For SQL Server Named Instance
TCP 2383 – For Default port of SSAS
TCP 2382 – For Named instances of SSAS
TCP 135 – For SSIS & T-SQL Debugger
TCP 80 and 443 – For SSRS
TCP 4022 – For Service Broker

other-default-ports

Shrink a Live SQL Server Database and Logs - Caveats and Best Practices

Updated on August 20th, 2016.

If you have deleted a lot of data in the database and truncated some tables, you may want to shrink the database files to recover the unused space. Shrinking moves the data pages at the end of the file into unoccupied space available at the front of the file.

You can use the DBCC SHRINKDATABASE command. For example, this command

DBCC SHRINKDATABASE (your_database,10) 

will shrink the database files, leaving 10 percent free space. 

You can also shrink the log file alone using the DBCC SHRINKFILE command. Take a full backup of the database and run the following:

ALTER DATABASE your_database SET RECOVERY SIMPLE; 

DBCC SHRINKFILE (your_database_Log, 10); 

ALTER DATABASE your_database SET RECOVERY FULL; 

This will shrink the log file to 10 MB. 

Caveats of shrinking database and logs:

1. SHRINK commands should not be run frequently, as they increase index fragmentation.

2. It is a fully logged operation, which increases the log size and slows down performance.

3. Shrinking is not possible while the database is being backed up or restored.

4. Shrinking is not possible when a columnstore index is enabled.

Best practices

1. Use SHRINK only if a lot of operations such as TRUNCATE and DROP have led to unused space.

2. Take a full backup before shrinking database files. If you shrink the log file, set the recovery model to SIMPLE before shrinking and back to FULL afterwards.

3. Never turn on the AUTO_SHRINK option.

4. Always rebuild indexes after shrinking the database files to remove fragmentation.
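As a sketch of that last practice (your_table here is a placeholder name), the indexes of an affected table can be rebuilt, and fragmentation checked, like this:

```sql
-- Rebuild all indexes on the table to remove the
-- fragmentation introduced by the shrink operation
ALTER INDEX ALL ON dbo.your_table REBUILD;

-- Inspect fragmentation via the index physical stats DMV
SELECT index_id, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats
     (DB_ID(), OBJECT_ID('dbo.your_table'), NULL, NULL, 'LIMITED');
```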

SQL Server Update statement with JOIN

In this article, we are specifically going to see a practical usage of the UPDATE statement in SQL Server when the condition is based on a JOIN with another table.

Example:

CREATE TABLE EMP
       (EMPNO Integer Primary Key,
        ENAME Varchar(15),
        JOB Varchar(9),
        MGR Integer,
        HIREDATE datetime,
        SAL Integer,
        COMM Integer,
        DEPTNO Integer);
go
INSERT INTO EMP VALUES
        (7369, 'SMITH',  'CLERK',     7902, '12/17/80',
        800, NULL, 20);
INSERT INTO EMP VALUES
        (7499, 'ALLEN',  'SALESMAN',  7698,
        '5/20/81',1600,  300, 30);
INSERT INTO EMP VALUES
        (7521, 'WARD',   'SALESMAN',  7698,
        '5/22/81', 1250,  500, 30);
INSERT INTO EMP VALUES
        (7566, 'JONES',  'MANAGER',   7839,
        '4/2/81',  2975, NULL, 20);
INSERT INTO EMP VALUES
        (7654, 'MARTIN', 'SALESMAN',  7698,
        '9/28/81', 1250, 1400, 30);
INSERT INTO EMP VALUES
        (7698, 'BLAKE',  'MANAGER',   7839,
        '5/1/81',  2850, NULL, 30);
INSERT INTO EMP VALUES
        (7782, 'CLARK',  'MANAGER',   7839,
        '6/9/81',  2450, NULL, 10);
INSERT INTO EMP VALUES
        (7788, 'SCOTT',  'ANALYST',   7566,
        '12/9/82', 3000, NULL, 20);
INSERT INTO EMP VALUES
        (7839, 'KING',   'PRESIDENT', NULL,
        '11/17/81', 5000, NULL, 10);
INSERT INTO EMP VALUES
        (7844, 'TURNER', 'SALESMAN',  7698,
        '9/8/1981',  1500,    0, 30);
INSERT INTO EMP VALUES
        (7876, 'ADAMS',  'CLERK',     7788,
        '1/12/83', 1100, NULL, 20);
INSERT INTO EMP VALUES
        (7900, 'JAMES',  'CLERK',     7698,
        '12/3/81',   950, NULL, 30);
INSERT INTO EMP VALUES
        (7902, 'FORD',   'ANALYST',   7566,
        '12/3/81',  3000, NULL, 20);
INSERT INTO EMP VALUES
        (7934, 'MILLER', 'CLERK',     7782,
        '1/23/82', 1300, NULL, 10);
go

CREATE TABLE DEPT
       (DEPTNO INTEGER Primary Key,
        DNAME VARCHAR(14),
        LOC VARCHAR(13) );
go
INSERT INTO DEPT VALUES
        (10,'ACCOUNTING','NEW YORK');
INSERT INTO DEPT VALUES (20,'RESEARCH','DALLAS');
INSERT INTO DEPT VALUES
        (30,'SALES','CHICAGO');
INSERT INTO DEPT VALUES
        (40,'OPERATIONS','BOSTON');

Let us see the two tables.

select * from emp
go
select * from dept
go

emp-dept-table

Here we can see that this is a typical parent-child table relationship. Many a time, the end user wants to see the Employee Names (Enames) and Department Names (Dnames). But for that, they need to resort to the inner join technique. Right now, there are only 14 rows in the emp table and 4 rows in dept.

But in a real production environment, the number of records will be much higher, so this may result in a performance overhead. In such scenarios, the DBA may therefore decide to add the DName column to the Emp table itself to eliminate the JOIN overhead.

Note: Here we are moving towards an intentional De-Normalization technique.

The following ALTER Table command is used for that.

Alter Table Emp
Add Dname Varchar(40)

But now the problem is that the DName column will be Null for existing records. Let us check it.

select * from emp

sql-adding-dname-column

So now we want to update each Dname as per the Dname of Dept table. For this we will issue the Update statement with join.

Update Emp 
set Dname  = Dept.DName
From Emp, Dept
where Emp.Deptno = Dept.Deptno

Technique: Here the update-with-join technique works in the following sequential steps.

1. Every record of Emp is taken.
2. Its deptno is matched with the deptno of the Dept table.
3. The corresponding DName of the Dept table is assigned to that row.
4. The next row of Emp is taken.
5. Steps 2 & 3 are repeated.
6. This process continues until the last row of the Emp table is processed.
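The same update can also be written with the explicit JOIN syntax, which many find more readable; in SQL Server both forms are equivalent:

```sql
Update Emp
set Dname = Dept.DName
From Emp join Dept
on Emp.Deptno = Dept.Deptno
```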

So now let us see the Emp table again.

select * from emp

after-update-with-sql-join

And that’s how we can use the Update statement with JOIN in SQL Server.

SQL Server Substring with CharIndex

In this article we are going to explore the T-SQL function CharIndex and also how to use it with another T-SQL function Substring().

CharIndex: This function returns the location of a substring in a string. By default, it is not case sensitive and is useful to check the position of a particular character or digit in a string. I have seen this function being used on product codes and different types of IDs for validation purposes, or to derive some business understanding of that code/ID. Here’s the syntax:

CHARINDEX( Substring or Character, string, [start_position] )

where..

Substring or Character - The string that you want to find. It must be of a string data type such as char or varchar.
String - The string, number or alphanumeric value to search within.
start_position - Optional. The position in the string where the search will start. The first position is 1.

The function returns bigint if the string being searched is of the varchar(max), nvarchar(max), or varbinary(max) data types; otherwise, it returns an int.

Substring with CharIndex examples:

select charindex('C','ABCDEF',1) 
-- Returns 3

select charindex('C','ABCDEF') 
-- Still returns 3 as start_position is optional; the default is 1

select charindex('c','ABCDEF',1) 
-- By default not case sensitive, so returns 3

select charindex('c','ABCDEF' COLLATE Latin1_General_CS_AS) 
-- Returns 0 as lowercase c is not found.

select charindex('Z','ABCDEF',1) 
-- Returns 0 as there is no Z character

select charindex('A','ABCDAEFG',2) 
-- Returns the second position of A, i.e. 5

select charindex('A','ABCDAEFG',9) 
-- Returns 0 as the start position is beyond the end of the string

select charindex('4',123456) 
-- Searching within a pure number; returns 4 as '4' is in the fourth position

select charindex('curry','sqlservercurry.com') 
-- Returns 10, the position where the first
-- occurrence of the substring 'curry' begins

Using CharIndex with Substring:

First let us check the functionality of the Substring function. It extracts a part of the string from the specified position.

Syntax:

Substring(String, Position or Index, Number of characters to extract)

select substring('abcdef', 2, 3) 
-- Returns bcd: 2 is the starting position and 3 is the number of characters to extract

Now the best usage of CharIndex with Substring is to create a functionality of “Text to Columns”.

Many a time, names are stored in the format “FirstName whitespace LastName” in a single column. In such cases, the need is to separate the first name into one column and the last name into another for display purposes.

Consider the following example.

create table Authors
(AuthorID Varchar(2),
 AuthorName Varchar(60)
 )
 go
Insert into Authors Values('A1','Suprotim Agarwal')
Insert into Authors Values('A2','Mahesh Sabnis')
Insert into Authors Values('A3','Mandar Mulay')
go
select * from Authors
go

sql-default-author-table

Now let us say we want to display the first name and last name in different columns. For that, we will use the Substring function and, within it, the CharIndex function to find the position of the white space.

See the following query:

select substring(AuthorName,1, charindex(' ',AuthorName)) as [First Name],
substring(AuthorName,(charindex(' ',AuthorName)) + 1, len(AuthorName)) as [Last Name]
from Authors
go

substring-with-charindex

Explanation:

First Name - substring(AuthorName,1, charindex(' ',AuthorName)) extracts all characters from the first position up to the first white space.

Last Name - substring(AuthorName,(charindex(' ',AuthorName)) + 1, len(AuthorName)) starts extracting from the position just after the first white space; the len function ensures all remaining characters are extracted.
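One caveat worth noting: if an AuthorName contains no white space at all, charindex returns 0 and the split above misbehaves. A defensive variant (a sketch, not part of the original example) guards against this with a CASE expression:

```sql
select case when charindex(' ', AuthorName) > 0
            then substring(AuthorName, 1, charindex(' ', AuthorName) - 1)
            else AuthorName   -- no space: treat the whole value as first name
       end as [First Name],
       case when charindex(' ', AuthorName) > 0
            then substring(AuthorName, charindex(' ', AuthorName) + 1, len(AuthorName))
            else ''           -- no space: no last name to extract
       end as [Last Name]
from Authors
```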

And that’s how we use the SQL Server T-SQL Substring function, as well as Substring with CharIndex.

SQL Server Delete Duplicate Rows

There can be two types of duplication of rows in a table:

1. The entire row is duplicated because there is no primary key or unique key.
2. Only the primary key or unique key value differs, but all remaining values are the same.

Scenario 1: Delete duplicate rows without a primary key or unique key.

Let us create the following example.

create table customers1
(CustId Int,
 CustName Varchar(20),
 CustCity Varchar(20),
 Passport_Number Varchar(20))
 go
Insert into customers1 Values(1, 'John', 'Paris', 'P123X78')
Insert into customers1 Values(2, 'Martin', 'London', 'L873X92')
Insert into customers1 Values(3, 'Smith', 'New York', 'N293Y99')
Insert into customers1 Values(1, 'John', 'Paris', 'P123X78')
go
select * from customers1
go

sql-row-duplicate

We want to remove one of the duplicate records of John.

By issuing the following summary query, we can see which records are duplicated.

select Custid, CustName, CustCity, Passport_Number from customers1
Group by Custid, CustName, CustCity, Passport_Number
Having count(*) > 1

duplicate-row

Now we will add this row to a local temporary table.

Select * into #Temp_customers1 from customers1 where 1 = 2
Insert into #Temp_customers1
select Custid, CustName, CustCity, Passport_Number from customers1
Group by Custid, CustName, CustCity, Passport_Number
Having count(*) > 1

Now the situation is that the duplicate row is in the local temporary table. All we need to do now is delete the records from the main table customers1 that match a custid in the local temporary table.

Delete from customers1
where custid in (select Custid from #Temp_customers1)

Will the above query work? Not entirely: by using the above query, we lost all copies of the duplicate record! Let us see the table again.

select * from customers1
go

deleting-all-duplicate-rows

Now, to keep one record of John, we will take the help of the local temporary table again. Let us add the record from the temporary table back into the customers1 table.

Insert into Customers1
select * from #Temp_customers1
go

Finally, we have a single record of John. Let us confirm by looking at the Customers1 table.

select * from customers1
go

removing-duplicate-row

Once done, we can drop the local temporary table.
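As an alternative to the temp-table approach, on SQL Server 2005 and later the same result can be achieved in a single statement with the ROW_NUMBER() function, which numbers the copies within each duplicate group so we can delete all but the first. This is a sketch of the technique, not part of the walkthrough above:

```sql
-- Number identical rows within each group of matching column values,
-- then delete every copy after the first
With Numbered as
(select *, row_number() over
    (partition by CustId, CustName, CustCity, Passport_Number
     order by CustId) as rn
 from customers1
)
Delete from Numbered where rn > 1
```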

Scenario 2: Delete duplicate rows where the primary key or unique key value is different but the remaining values are the same.

Let us create the following example.

create table customers2
(CustId Int Primary Key,
 CustName Varchar(20),
 CustCity Varchar(20),
 Passport_Number Varchar(20))
 go
Insert into customers2 Values(1, 'John', 'Paris', 'P123X78')
Insert into customers2 Values(2, 'Martin', 'London', 'L873X92')
Insert into customers2 Values(3, 'Smith', 'New York', 'N293Y99')
Insert into customers2 Values(4, 'John', 'Paris', 'P123X78')
Insert into customers2 Values(5, 'John', 'Paris', 'P123X78')
select * from customers2
go

Here are the same customer records, but this time John’s record has been added thrice with different customer ids but the same passport number!

duplicate-rows-different-custids

Scenario 2.a: Delete Duplicate rows but keep one using CTE

We need to use the Self Join technique initially to check for duplicate records containing different custids but the same passport number.

select distinct a.*
from customers2 a join customers2 b
on a.custid <> b.custid and a.CustName =  b.CustName
and a.CustCity = b.CustCity  and a.Passport_Number =  b.Passport_Number

selfjoin-showing-duplicate-rows


Now we have realized that custids 1, 4 & 5 are duplicates. The self-join accompanied by a delete statement will give us the desired output: keeping the last duplicate record while eliminating all the previous duplicates. We will use a Common Table Expression (CTE) and put the self-join query in it.

With Duplicates as
(select distinct a.custid as Customer_ID
from customers2 a join customers2 b
on a.custid <> b.custid and a.CustName =  b.CustName
and a.CustCity = b.CustCity  and a.Passport_Number =  b.Passport_Number
) 
Delete from Customers2
where custid in (select Customer_ID from Duplicates)
and custid <> (select max(Customer_ID) from Duplicates)

Let’s check which rows got deleted.

select * from customers2
go

keeping-last-record

Scenario 2.b: Delete all duplicate records but keep the first original one

Let’s first truncate the customers2 table and add the same rows again.

Truncate Table customers2
go
Insert into customers2 Values(1, 'John', 'Paris', 'P123X78')
Insert into customers2 Values(2, 'Martin', 'London', 'L873X92')
Insert into customers2 Values(3, 'Smith', 'New York', 'N293Y99')
Insert into customers2 Values(4, 'John', 'Paris', 'P123X78')
Insert into customers2 Values(5, 'John', 'Paris', 'P123X78')
go

The only change in the subquery is that we need to use min(Customer_ID) instead of max(Customer_ID).

So the query will be as follows.

With Duplicates as
(select distinct a.custid as Customer_ID
from customers2 a join customers2 b
on a.custid <> b.custid and a.CustName =  b.CustName
and a.CustCity = b.CustCity  and a.Passport_Number =  b.Passport_Number
) 
Delete from Customers2
where custid in (select Customer_ID from Duplicates)
and custid <> (select min(Customer_ID) from Duplicates)

Let us confirm this in the customers2 table.

select * from customers2
go

keeping-first-record


And that’s how we can delete duplicate records in SQL Server, both in tables without a primary key and in tables with one, while keeping one original row.

T-SQL Insert Into Table Examples

In T-SQL, we add records into a table (SQL Server) using a Data Manipulation Language (DML) command known as the INSERT statement. In this article, we are going to explore the different capabilities of the T-SQL INSERT statement.

Let us create a table Table1 with 3 columns A, B and C

create table Table1
(A int,
 B Varchar(10),
 C Date)

Example 1: Add one complete row, date in American format

Here we will mention the actual column values in the Values clause in a sequence that matches the columns mentioned in the CREATE TABLE statement.

Syntax: Insert into Table_Name Values(Col1 Value, Col2 Value….., Coln Value)

Insert into Table1 Values (1,'Aa', '12/13/2014')

Example 2: Add one complete row, date in British format

The date can be entered either in American or British format.

Insert into Table1 Values (2,'Bb', '13-Dec-2014')

Example 3: Add one partial row, specifying Null

Sometimes we do not want to add values in every column. Let us create a new row by specifying values for the A and C columns only.

Insert into Table1 Values (3, Null, '14-Dec-2014') 

Example 4: Add one partial row, specifying column names

Consider there are 25 columns in a SQL table, and we want to create a new row by specifying values for only 3 columns. In that case, mentioning Null for the remaining 22 columns will be cumbersome. So instead, we can mention just those 3 columns after the table name, and supply values for only those 3 columns. The remaining 22 columns will get null values by default. Caution: none of the omitted columns should have a primary key or NOT NULL constraint.

Syntax: Insert into Table_Name (Column1, Column3, Column5) Values(Col1 Value, Col3 Value, Col5 Value)

Insert into Table1(A,C) Values (4,'15-Dec-2014')

Note: The columns can be mentioned in any sequence after the table name, but the values then need to be mentioned in the same sequence.
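A quick way to see the "remaining columns get null" behavior is with Python's sqlite3 module (SQLite rather than SQL Server, but partial inserts behave the same way here; dates are kept as TEXT in this sketch).

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Same shape as Table1 from the examples; SQLite has no DATE type, so C is TEXT.
con.execute("CREATE TABLE Table1 (A INT, B TEXT, C TEXT)")

# Only A and C are listed after the table name, so B is implicitly filled with NULL.
con.execute("INSERT INTO Table1 (A, C) VALUES (4, '15-Dec-2014')")

row = con.execute("SELECT A, B, C FROM Table1").fetchone()
print(row)  # (4, None, '15-Dec-2014') -- the unmentioned column got NULL
```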

Example 5: Ignore the identity column while adding a row

If the table has an identity column (auto number generation property), then by default we cannot add a value to that column.

Let us create table Table2 with A as an identity column and B, C as integer columns.

create table Table2
(A int identity,
 B int,
 C int)

To add a record, use the syntax demonstrated in Example 4.

Insert into Table2 (B,C) Values(100,200)
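The same idea can be sketched with Python's sqlite3 module; SQLite has no `identity` keyword, so this sketch uses its closest analog, an INTEGER PRIMARY KEY column, which is auto-generated when omitted from the insert.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY is SQLite's analog of an identity column:
# omit it from the column list and SQLite generates the value.
con.execute("CREATE TABLE Table2 (A INTEGER PRIMARY KEY, B INT, C INT)")
con.execute("INSERT INTO Table2 (B, C) VALUES (100, 200)")
row = con.execute("SELECT A, B, C FROM Table2").fetchone()
print(row)  # (1, 100, 200) -- A was generated automatically
```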

Example 6: Ignore the computed column while adding a row

Let us create table Table3 with A, B and C as 3 columns; C will be a computed column.

create table Table3
(A int,
 B int,
 C as (a + b))

Insert into Table3 Values(200, 300)

The column C will implicitly get value 500!

Example 7: Bulk Insertion – Select along with Union All

If we want to add many records to a table through one statement, then a SELECT statement can be used with the INSERT statement instead of the Values clause.

Let us create table Table4 first.

create table Table4
(a int, b varchar(10)
)

Now if we want to add five rows in Table4, we will use a SELECT statement along with UNION ALL.

Insert into Table4
Select 1,'A'
Union All
Select 2,'B'
Union All
Select 3,'C'
Union All
Select 4,'D'
Union All
Select 5,'E'
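The INSERT ... SELECT with UNION ALL pattern runs unchanged on SQLite too, so here is a minimal runnable sketch using Python's sqlite3 module (an analog, not SQL Server itself).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Table4 (a INT, b TEXT)")
# One INSERT statement fed by a SELECT built from UNION ALL,
# exactly as in the T-SQL example above.
con.execute("""
INSERT INTO Table4
SELECT 1,'A' UNION ALL
SELECT 2,'B' UNION ALL
SELECT 3,'C' UNION ALL
SELECT 4,'D' UNION ALL
SELECT 5,'E'
""")
count = con.execute("SELECT COUNT(*) FROM Table4").fetchone()[0]
print(count)  # 5 rows added in a single statement
```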

Example 8: Bulk Insertion – Select along with Union All, violation of constraint

If any record violates the constraint, then the entire set gets rejected.

create table Table5
(a int check (a <= 3), b varchar(10)
)
Insert into Table5
Select 1,'A'
Union All
Select 2,'B'
Union All
Select 3,'C'
Union All
Select 4,'D'
Union All
Select 5,'E'

No rows will get added in Table5.
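This all-or-nothing behavior can be demonstrated with Python's sqlite3 module as well: SQLite also backs out every row of a statement when one row violates a CHECK constraint (statement-level atomicity, matching the SQL Server behavior described above).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Table5 (a INT CHECK (a <= 3), b TEXT)")
try:
    # One statement inserting five rows; rows 4 and 5 violate the CHECK.
    con.execute("""
    INSERT INTO Table5
    SELECT 1,'A' UNION ALL SELECT 2,'B' UNION ALL SELECT 3,'C'
    UNION ALL SELECT 4,'D' UNION ALL SELECT 5,'E'
    """)
except sqlite3.IntegrityError as e:
    print("rejected:", e)

# The failed statement leaves no rows behind -- the first three are backed out too.
count = con.execute("SELECT COUNT(*) FROM Table5").fetchone()[0]
print(count)  # 0
```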

Example 9: Bulk Insertion – Multiple rows through Values clause itself

From SQL Server 2008 onwards, the Values clause has been strengthened to add multiple rows.

Syntax:

Insert Into TableName
Values    (……………..),
           (……………..),
           (……………..),
           (……………..)

So now in Table4, we will add three records using the Values clause.

Insert into Table4
Values (6,'F'),(7,'G'),(8,'H')

Note: If any one record fails then the entire batch gets rejected.

Example 10: Bulk Insertion – Copying records of another table

Many times we need to copy records from one table to another.

Insert into Target_Table
Select * from Source_Table

Note: Here it is assumed that the structure of the source and target tables is the same. That means the number of columns should be the same, and positional columns should be compatible in data type and size.

Let us create table Table6 similar to Table4

create table Table6
(a int, b varchar(10)
)

Now all records of Table4 need to be inserted in Table6.

Insert into Table6
Select * from Table4


Example 11: Bulk Insertion – Copying records of another table, but not in all columns.

Let us create table Table7 similar to Table4

create table Table7
(a int, b varchar(10)
)

Here we need to mention the column names. Suppose we want to add records taking values of column B only; then the command will be:

Insert into Table7(B)
Select B from Table4

Conclusion:


In this article we have seen how to add records into a SQL Server table using the T-SQL Insert statement, through multiple examples.

SQL Server Alter Table with Add Column

There are many scenarios where we need to do structural changes to a table, i.e. adding a column, modifying data type, dropping a column and so on. This change to an existing table is possible by issuing a SQL Server Data Definition Language (DDL) T-SQL command known as Alter Table.

In this article we are specifically focusing on adding a new column in a table using SQL Server Alter Table.

Scenario 1: Adding columns to empty table

When a table has just been created and there are no records in it, there is no problem in adding one or more columns to that table.

Let us see the following example:

create table customers1
(custid varchar(5),
 custname varchar(50)
)
go

Scenario 1.a: Adding single column:

After creating the table, we realize that a date of birth (DOB) column needs to be added. The following command will help us achieve this.

Alter Table customers1
Add DOB Date
go

Scenario 1.b: Adding multiple columns:

Adding two more columns is also very easy. Let us add nationality and gender as two more columns, separating each definition with a comma.

Alter Table customers1
Add Nationality Varchar(40), Gender Char(1)
go

Scenario 1.c: Adding columns with constraints:

Adding a column with a constraint such as primary key, unique key, foreign key, not null, check or default is again not challenging in an empty table.

Note – A primary key constraint cannot be added on the new column if there is already a primary key on any other existing column.

Let us add SSN and deposit amount as two columns, SSN having primary key and deposit amount having check constraint.

Alter Table customers1
Add SSN Varchar(10) Primary Key, 
Deposit_Amount Integer Check(Deposit_Amount >= 200)
go

Scenario 2: Adding columns to table having records

When a column is added to a table with existing records, then for those records, by default a null value is assigned by SQL Server.


Let us create table customers2 and add 2 records in it.

create table customers2
(custid varchar(5),
 custname varchar(50)
 )
 Go
Insert into customers2 Values('C1','John')
Insert into customers2 Values('C2','Martin')
go

Scenario 2.a: Adding columns with default null values on existing records.

Alter Table customers2
Add DOB Date
go

Let us see the table Customers2

select * from customers2

sqlserver-nulls-new-column
Image2-No-Default-Value
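The "existing rows get null" behavior can be sketched with Python's sqlite3 module (SQLite, not SQL Server, but adding a column with no default fills existing rows with NULL in the same way).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers2 (custid TEXT, custname TEXT)")
con.execute("INSERT INTO customers2 VALUES ('C1','John')")
con.execute("INSERT INTO customers2 VALUES ('C2','Martin')")

# Add a column with no default: the pre-existing rows receive NULL.
con.execute("ALTER TABLE customers2 ADD DOB DATE")
dobs = [r[0] for r in con.execute("SELECT DOB FROM customers2")]
print(dobs)  # [None, None]
```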

Scenario 2.b: Adding columns with default value

It might happen that you want to add a column specifying some default value and there are existing records in the table. But in the default setup, those existing records will not get that value.

Let us add token amount as a column to this Customers2 table with a default value of 100.

Alter Table customers2
Add Token_Amount Integer Default 100
go

Let us see the table Customers2

select * from customers2

sql-no-default-value
Image2-No-Default-Value

Note – The default value will be effective only when a new record is added.

Scenario 2.c: Adding columns with default value and applying those values on existing records.

Let us add another column tax with default value as 50. This value needs to be added to the existing records as well. We need to explicitly specify the clause With Values.


Alter Table customers2
Add Tax Integer Default 50 With Values
go

default-value-withclause
Image3-Default-Value-WithClause

Scenario 2.d: Adding columns with Not Null constraint on existing records.

Let us add a column Country with a Not Null constraint. Now this will be contradictory because the earlier two records cannot have a null value for Country.

Let us try.

Alter Table customers2
Add Country Varchar(50) Not Null
go

It will throw the following error.

ALTER TABLE only allows columns to be added that can contain nulls

To rectify this, we need to mention a default value, which will safeguard the earlier records.

Scenario 2.e: Adding columns with not null constraint and default value on existing records

Let’s have 'India' as the default value.

Alter Table customers2
Add Country Varchar(50) Default 'India' Not Null
go

Now this works fine. Let us see the table again.

select * from customers2

default-value-notnull
Image4-Default-Value-NotNull
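Both halves of this scenario can be sketched with Python's sqlite3 module: SQLite likewise rejects adding a NOT NULL column without a default, and with a default it succeeds and applies the value to existing rows (the error text differs from SQL Server's, but the behavior is analogous).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers2 (custid TEXT, custname TEXT)")
con.executemany("INSERT INTO customers2 VALUES (?,?)",
                [('C1', 'John'), ('C2', 'Martin')])

# NOT NULL without a default is rejected, as in Scenario 2.d.
try:
    con.execute("ALTER TABLE customers2 ADD Country TEXT NOT NULL")
except sqlite3.OperationalError as e:
    print("rejected:", e)

# NOT NULL plus a default works, and the existing rows receive the default.
con.execute("ALTER TABLE customers2 ADD Country TEXT NOT NULL DEFAULT 'India'")
countries = [r[0] for r in con.execute("SELECT Country FROM customers2")]
print(countries)  # ['India', 'India']
```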


Conclusion:

In this article, we have seen the following scenarios of using Alter Table to add a column to a SQL Server table:

1. Adding one or multiple columns to an empty table.
2. Adding a column to a table having records; those records get null as the default value.
3. Adding a column with a default value to a table having records
  • In absence of the With Values clause
  • In presence of the With Values clause
4. Adding a column with a Not Null constraint to a table having records
  • In absence of the Default clause
  • In presence of the Default clause