Calculate SQL Azure Database Size

I was looking for a reliable way to programmatically determine the size of a SQL Azure database. After evaluating many solutions, I finally found one shared by Dimitri Furman of the SQL Server team.

His solution uses the sys.database_files catalog view together with the FILEPROPERTY function and its 'SpaceUsed' argument.

For those new to sys.database_files: this system catalog view stores information and properties about each file of a database. Since it is a database-level view, it gives information about files in the current database only. Five properties that could be of interest are: logical file name, physical file name, initial size, maximum size and growth increment.

To determine how much space is used in a file, you can use FILEPROPERTY with SpaceUsed.

Here's an example: SELECT FILEPROPERTY('SomeFile', 'SpaceUsed'); I have often used the FILEPROPERTY function in the past while monitoring the progress of a SHRINK operation.
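As a side note, a minimal sketch of such a monitoring query (not part of the original post) compares each file's allocated size with the space actually used:

-- Compare allocated size vs. space used for each file in the current database.
-- Both 'size' and FILEPROPERTY(..., 'SpaceUsed') are reported in 8 KB pages.
SELECT name AS LogicalFileName,
       size * 8 / 1024 AS AllocatedMB,
       CAST(FILEPROPERTY(name, 'SpaceUsed') AS bigint) * 8 / 1024 AS UsedMB
FROM sys.database_files;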

Query for calculating Size of a SQL Azure Database

Here’s how to combine sys.database_files with FILEPROPERTY to programmatically calculate the size of a SQL Azure database.

SELECT 
SUM(CAST(FILEPROPERTY(name, 'SpaceUsed') AS bigint) * 8192.) 
AS DatabaseSizeInBytes,
SUM(CAST(FILEPROPERTY(name, 'SpaceUsed') AS bigint) * 8192.)/1024 /1024
AS DatabaseSizeInMB,
SUM(CAST(FILEPROPERTY(name, 'SpaceUsed') AS bigint) * 8192.)/1024/1024/1024 
AS DatabaseSizeInGB
FROM sys.database_files
WHERE type_desc = 'ROWS';

Let’s understand this query.

FILEPROPERTY() returns an int value for a file name stored within sys.database_files. As noted earlier, the view covers only the files of the current database; if the file is not present, NULL is returned.

SpaceUsed is reported in pages, and a page is 8 KB (8192 bytes) in SQL Server, so multiplying by 8192 gives the total bytes. Dividing twice by 1024 converts the output to MB, and dividing three times by 1024 converts it to GB.

CAST converts the value to bigint so that the multiplication by 8192 does not overflow the int range.

If anybody is wondering about the dot (.) after 8192, it is there to implicitly convert the result to a decimal value.

Please note that log files are excluded (via type_desc = 'ROWS') for the purpose of determining database size.

To know about Azure SQL Database resource limits, check https://docs.microsoft.com/en-in/azure/sql-database/sql-database-resource-limits

Remove Non-Alphabet Characters from a String–SQL Server

Suppose you have a string that contains numbers and other special characters, and you want to keep only the alphabetic characters and remove everything else. One way to do this is to use a WHILE loop that parses each character.

Here is the code:

declare @str varchar(20)
set @str = 'ab12#89L(h12k'
Select @str as 'Original String'

declare @temp_str varchar(20), @i int
select @temp_str = '', @i = 1

while @i <= len(@str)
begin
    -- append the current character only if it is an alphabet (a-z or A-Z)
    set @temp_str = @temp_str + case when substring(@str, @i, 1) like '[a-zA-Z]' then substring(@str, @i, 1) else '' end
    set @i = @i + 1
end

select @temp_str as 'String with Alphabets'


The code inside the WHILE loop takes each character, checks whether it is in the range a-z or A-Z, and appends it to a second variable. Other characters are not appended, so at the end of the loop the second variable contains only the alphabetic characters.
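If you need this in more than one place, the same loop can be wrapped in a scalar function. This is only a sketch; the function name dbo.RemoveNonAlpha and the varchar(200) sizes are assumptions, not part of the original post.

create function dbo.RemoveNonAlpha (@str varchar(200))
returns varchar(200)
as
begin
    declare @temp_str varchar(200) = '', @i int = 1
    while @i <= len(@str)
    begin
        -- keep the character only when it is a-z or A-Z
        if substring(@str, @i, 1) like '[a-zA-Z]'
            set @temp_str = @temp_str + substring(@str, @i, 1)
        set @i = @i + 1
    end
    return @temp_str
end
go

select dbo.RemoveNonAlpha('ab12#89L(h12k') as 'String with Alphabets'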


GroupBy Clause - SQL Server vs MySQL

The GROUP BY clause is used to group data based on specific columns, along with summary information. However, there are some differences in how SQL Server and MySQL handle this clause.

Let us create this testing table with some sample data

create table testing
(
    sales_id int,
    product_id char(7),
    sales_date datetime,
    sales_amount decimal(12,2)
)


insert into testing(sales_id,product_id,sales_date,sales_amount)
select 1,'PR00001','2001-01-01',1200.00 union all
select 2,'PR00002','2003-01-21',3000.50 union all
select 3,'PR00002','2003-01-21',2500.00 union all
select 4,'PR00001','2002-02-15',1000.00 union all
select 5,'PR00003','2005-12-19',3000.50 union all
select 6,'PR00003','2006-11-01',8000.00 union all
select 7,'PR00004','2007-04-22',350.00 union all
select 8,'PR00004','2007-04-22',590.00 union all
select 9,'PR00004','2007-04-22',590.00 union all
select 10,'PR00001','2008-05-27',4800.50


If you want to get the total sales amount for each product, you can write this query in both SQL Server and MySQL:

select product_id,sum(sales_amount) as sales_amount from testing
group by product_id
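
Based on the sample rows inserted above, both systems should return something like this (totals computed by hand from the data, so treat them as illustrative):

product_id    sales_amount
----------    ------------
PR00001       7000.50
PR00002       5500.50
PR00003       11000.50
PR00004       1530.00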


As per ANSI SQL, all columns that are not part of aggregate functions should be included in the GROUP BY clause.

If you run the following code in SQL Server

select product_id,sum(sales_amount) as sales_amount from testing

You will get an error

Msg 8120, Level 16, State 1, Line 1
Column 'testing.product_id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.


However if you run this code in MySQL, you will get the following result

product_id     sales_amount
----------     ------------
PR00001        25031.50

This is because MySQL (in its traditional default mode) allows non-aggregated columns in the SELECT list even when they are omitted from the GROUP BY clause: it aggregates over the whole set and simply displays one value (typically the first it encounters) for those columns along with the total of the summary column. In this case, it displays the first product_id along with the total of all products. Note that from MySQL 5.7 onwards the ONLY_FULL_GROUP_BY SQL mode is enabled by default, and this query then raises an error instead.

The following is also possible in MySQL

select *,sum(sales_amount) as sales_amount from testing

As explained, it will list all the columns of the first row along with the total of sales_amount. You need to be aware of this behavior in MySQL while using the GROUP BY clause; one way to make the intent explicit is sketched below.
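
If you are on MySQL 5.7 or later, a small sketch of making such queries explicit (these are standard MySQL features, but the exact statements below are illustrative, not from the original article):

-- Fail fast on ambiguous queries for the current session
SET SESSION sql_mode = CONCAT(@@sql_mode, ',ONLY_FULL_GROUP_BY');

-- Or state explicitly that any value of product_id is acceptable
SELECT ANY_VALUE(product_id) AS product_id, SUM(sales_amount) AS sales_amount
FROM testing;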

Nested Case Statement in SQL Server

This article is a continuation of SQL Server CASE Statement and CASE WHEN Examples. Today we will learn about nested CASE statements in SQL Server.

We can nest CASE statements similar to nested ifs that we find in most programming languages.

Let us see an example (it uses the Emp sample table that is created later in this article, in the SQL Server select into temp table section).

select ename, job, sal, case  -- Outer Case
    when ename like 'A%' then
    case when sal >= 1500 then 'A' -- Nested Case
    end
    when ename like 'J%' then
    case when sal >= 2900 then 'J' -- Nested Case
    end
end as "Name-Grade"            
From Emp

Image 7-Nested-Case
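
As a side note, this particular nesting can also be flattened into a single CASE with compound conditions; a sketch against the same Emp table that should give the same Name-Grade output for these rules:

select ename, job, sal,
    case
        when ename like 'A%' and sal >= 1500 then 'A'
        when ename like 'J%' and sal >= 2900 then 'J'
    end as "Name-Grade"
From Emp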

CASE expressions can be nested only up to 10 levels.

In the following example, the limit of 10 is completely utilized.

Declare @x int
set @x = 1
Select
case when @x <= 100 then                        -- Level 1
  case when @x <= 90 then                       -- Level 2
    case when @x <= 80 then                     -- Level 3
      case when @x <= 70 then                   -- Level 4
        case when @x <= 60 then                 -- Level 5
          case when @x <= 50 then               -- Level 6
            case when @x <= 40 then             -- Level 7
              case when @x <= 30 then           -- Level 8
                case when @x <= 20 then         -- Level 9
                  case when @x <= 10 then       -- Level 10
                    100
End End End End End End End End End
End as "Nested Case"
--Ending all levels!

Image 8-Nested-Case-Upto-10-Levels

If nesting exceeds 10 levels, SQL Server throws an error.

Declare @x int
set @x = 1
Select
case when @x <= 100 then                        -- Level 1
  case when @x <= 90 then                       -- Level 2
    case when @x <= 80 then                     -- Level 3
      case when @x <= 70 then                   -- Level 4
        case when @x <= 60 then                 -- Level 5
          case when @x <= 50 then               -- Level 6
            case when @x <= 40 then             -- Level 7
              case when @x <= 30 then           -- Level 8
                case when @x <= 20 then         -- Level 9
                  case when @x <= 10 then       -- Level 10
                    case when @x <= 11 then     -- Level 11
                      100
End End End End End End End End End End
End as "Nested Case"

Msg 125, Level 15, State 4, Line 14
Case expressions may only be nested to level 10.

SQL Server select into temp table

Many a time we get an output and then need to query that output further. In such scenarios, instead of writing the entire query repeatedly and then tweaking or filtering it differently each time, we can dump the output of the main query into a temporary table, i.e. SELECT INTO a temp table. Later we can query the temporary table and refine the output as needed.

Let us create the sample tables.

CREATE TABLE EMP
       (EMPNO Integer Primary Key,
        ENAME Varchar(15),
        JOB Varchar(9),
        MGR Integer,
        HIREDATE datetime,
        SAL Integer,
        COMM Integer,
        DEPTNO Integer);

INSERT INTO EMP VALUES
        (7369, 'SMITH',  'CLERK',     7902, '12/17/80',
        800, NULL, 20);
INSERT INTO EMP VALUES
        (7499, 'ALLEN',  'SALESMAN',  7698,
        '5/20/81',1600,  300, 30);
INSERT INTO EMP VALUES
        (7521, 'WARD',   'SALESMAN',  7698,
        '5/22/81', 1250,  500, 30);
INSERT INTO EMP VALUES
        (7566, 'JONES',  'MANAGER',   7839,
        '4/2/81',  2975, NULL, 20);
INSERT INTO EMP VALUES
        (7654, 'MARTIN', 'SALESMAN',  7698,
        '9/28/81', 1250, 1400, 30);
INSERT INTO EMP VALUES
        (7698, 'BLAKE',  'MANAGER',   7839,
        '5/1/81',  2850, NULL, 30);
INSERT INTO EMP VALUES
        (7782, 'CLARK',  'MANAGER',   7839,
        '6/9/81',  2450, NULL, 10);
INSERT INTO EMP VALUES
        (7788, 'SCOTT',  'ANALYST',   7566,
        '12/9/82', 3000, NULL, 20);
INSERT INTO EMP VALUES
        (7839, 'KING',   'PRESIDENT', NULL,
        '11/17/81', 5000, NULL, 10);
INSERT INTO EMP VALUES
        (7844, 'TURNER', 'SALESMAN',  7698,
        '9/8/1981',  1500,    0, 30);
INSERT INTO EMP VALUES
        (7876, 'ADAMS',  'CLERK',     7788,
        '1/12/83', 1100, NULL, 20);
INSERT INTO EMP VALUES
        (7900, 'JAMES',  'CLERK',     7698,
        '12/3/81',   950, NULL, 30);
INSERT INTO EMP VALUES
        (7902, 'FORD',   'ANALYST',   7566,
        '12/3/81',  3000, NULL, 20);
INSERT INTO EMP VALUES
        (7934, 'MILLER', 'CLERK',     7782,
        '1/23/82', 1300, NULL, 10);


CREATE TABLE DEPT
       (DEPTNO INTEGER Primary Key,
        DNAME VARCHAR(14),
        LOC VARCHAR(13) );

INSERT INTO DEPT VALUES
        (10,'ACCOUNTING','NEW YORK');
INSERT INTO DEPT VALUES (20,'RESEARCH','DALLAS');
INSERT INTO DEPT VALUES
        (30,'SALES','CHICAGO');
INSERT INTO DEPT VALUES
        (40,'OPERATIONS','BOSTON');

Now we will write a complex-looking SELECT statement as shown here –

select ename, emp.deptno, sal, 
           case when sal >= 5000 then 'Grade A'
                when sal >= 3000 then 'Grade B'
                when sal >= 2500 then 'Grade C'
                when sal >= 2000 then 'Grade D'
                when sal >= 1000 then 'Grade E'
                else
                'Grade F'
            end as Sal_Grade,
            datepart(yy, hiredate) as Year_Of_Hiredate,
            DName,
           Sal + IsNull(Comm,0) as Total,
           Case
            when Comm is null or Comm = 0 then 'Commission not provided'
            when Comm is not null then 'Commission provided'
            End as Commission_Status,
               SubString(Loc,1,1) as Location_ID
from emp join dept
on Emp.deptno = Dept.deptno

The output of this query is as follows –

complex-select-output

In this query, we have used the following techniques:

1. Two Case Functions
2. Date Function
3. IsNull Function
4. Substring Function
5. Inner Join

Now let us work further with this output. Put it into a local temporary table by issuing the following query:

Select * into #Emp_Info
from(select ename, emp.deptno, sal, 
           case when sal >= 5000 then 'Grade A'
                when sal >= 3000 then 'Grade B'
                when sal >= 2500 then 'Grade C'
                when sal >= 2000 then 'Grade D'
                when sal >= 1000 then 'Grade E'
                else
                'Grade F'
            end as Sal_Grade,
            datepart(yy, hiredate) as Year_Of_Hiredate,
            DName,
           Sal + IsNull(Comm,0) as Total,
           Case
            when Comm is null or Comm = 0 then 'Commission not provided'
            when Comm is not null then 'Commission provided'
            End as Commission_Status,
               SubString(Loc,1,1) as Location_ID
from emp join dept
on Emp.deptno = Dept.deptno
) as Emp_Dept

These 14 processed rows are now the actual records of the #Emp_Info table. So when we query this table, the inner join, CASE, date, SUBSTRING and ISNULL functions do not get executed again and again. This can be a big performance boost!
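
If the follow-up queries filter or group on a particular column, you could also index the temp table. A sketch only; the index name and the choice of Sal_Grade are assumptions based on the queries below:

-- Optional: speed up the Sal_Grade filter and grouping used in the queries below
create nonclustered index IX_EmpInfo_SalGrade on #Emp_Info (Sal_Grade);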

Let’s explore different outputs querying the #Emp_Info table now.

Query 1 – Display records of Grade B employees

Select * from #Emp_Info
where Sal_Grade = 'Grade B'
go

gradeb-employees

Query 2 – Display grade-wise total of salaries

Select Sal_Grade, Sum(Sal) as Salary_Sum
from #Emp_Info
Group by Sal_Grade
go

sql-select-into-temp

Query 3 – Display employee names, totals, DName and year of hiredate for all employees who have joined in 1981.

Select Ename, DName, Total, Year_of_Hiredate
from #Emp_Info
where Year_of_Hiredate = '1981'

sql-select

So as we can see, doing a T-SQL SELECT INTO a temp table helps us fire additional queries on the output and, in some cases, gives a performance boost too.
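
When you are done, it is a good habit to drop the temp table explicitly rather than waiting for the session to end; a minimal sketch:

-- SQL Server 2016 and later; on older versions use:
-- IF OBJECT_ID('tempdb..#Emp_Info') IS NOT NULL DROP TABLE #Emp_Info
drop table if exists #Emp_Info;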

Importance of RAID in Databases

This article was modified on August 31st, 2016.

RAID is a disk system that combines multiple disk drives, called an array, to provide greater performance, fault tolerance and storage capacity at a moderate cost. While configuring your server system, you typically have to choose between hardware RAID and software RAID for the server's internal disk drives.

RAID System

RAID systems are widely used as storage solutions to get the best I/O performance, depending on whether the application is write intensive or read intensive. DBA Ray Rankins mentions in his book that for database-related applications, in order to minimize disk head movement and maximize I/O performance, it is good practice to spread random I/Os (data changes) and sequential I/Os (for the transaction log) across different disk subsystems. I agree and endorse his view, as SQL Server, or for that matter any other database, is very much an I/O intensive system.

I will be using SQL Server as an example to explain the importance of RAID in databases; however, you can apply these concepts to your database of choice. The concepts, more or less, remain the same.

Although RAID is not a part of a database like SQL Server, implementing RAID can directly affect the way SQL Server performs. There are many RAID arrays available, such as RAID 0, RAID 1, RAID 3, RAID 4, RAID 5, RAID 6, RAID 10 and RAID 01. In this article, we will discuss the ones you will likely encounter as a SQL Server DBA, i.e. RAID levels 0, 1, 5 and 10, and also discuss their advantages and disadvantages from a fault tolerance and performance perspective.

Note: RAID is not a replacement for backups. Backups are very essential for any system.

Different RAID Levels (Advantages and Disadvantages)


We will be discussing only RAID 0, 1, 5 and 10 (from a database perspective).

RAID 0 – Also known as Disk Striping, RAID 0 does not provide redundancy or fault tolerance; instead it writes data across two or more drives in an alternating fashion. This provides the best read/write I/O performance. If you had 8 chunks of data, for example, chunks 1, 3, 5 and 7 would be written to the first drive, and chunks 2, 4, 6 and 8 to the second drive, but all in a fixed (sequential) order. RAID 0 has a simple design, is easy to implement and has no parity overhead. The drawback is that any piece of data lives on only one disk, so if one disk fails, the data stored on the array is lost.

RAID 1 - Also known as Disk Mirroring, RAID 1 provides a redundant, identical copy of a selected disk and thus provides good fault tolerance. It can be implemented with 2 drives. The disadvantage is that it has a big storage overhead and a high cost/capacity ratio.

RAID 5 - Also known as Disk Striping with Parity, RAID 5 stripes data across multiple drives and writes parity bits across the drives. Data redundancy is provided by the parity information. It can be implemented with 3 or more disks and is a popular choice among DBAs. Data and parity information are arranged on the disk array so that the two are always on different disks. If one disk fails, just replace it with a new disk and the array rebuilds itself. RAID 5 has a higher read rate and makes good use of capacity. The drawbacks of RAID 5 are slower write rates and slow rebuild times.

RAID 10 - Also known as mirroring with striping, RAID 10 is a combination of RAID 1 + RAID 0. RAID 10 uses a striped array of disks that are then mirrored to another identical set of striped disks. This level uses at least four hard disks, and additional disks must be added in even numbers. The data is first placed into mirrored pairs at the lower level; next, the controller selects a member from each mirrored pair and stripes the data into a new logical volume. RAID 10 handles random writes well, so it provides the best performance for write-intensive applications (like video editing). The drawback is that it is expensive.

Which RAID is Suitable for my Database?


Now that you have an overview of RAID levels, let's look at which RAID is suitable for a database. The answer to this question depends on a variety of factors. Do you want availability, performance or cost? What are your requirements for fault tolerance and performance? Here's a quick snapshot of the performance and fault tolerance provided by each RAID level:

sqlserver-raid-performance

When it comes to a database like SQL Server, no single RAID level will suit every need. In most cases, SQL Server performs large reads and small writes. So for databases where write operations dominate, RAID 5 is not a good choice; on the contrary, RAID 10 is a good option for write-heavy databases.

Here are some points and best practices to keep in mind while deciding the RAID system for your database.
  • Your Data, Logs, tempdb, and backups should be on separate physical drives or a set of disks (array).
  • RAID 1 is often chosen to store operating systems, binaries, index filegroups and database transaction log files. Fault tolerance and good write speed are critical to log and index performance.
  • Since log files are written sequentially and are read only for recovery operations, the recommended RAID for log files is RAID 1 or 10. If your RAID 1 is at 100% usage, choose RAID 10 for better performance.
  • For data files with random access and read-heavy data volumes, striping is important, so the recommended RAID is 5 or 10.
  • For data files which require good write performance, RAID 10 is recommended. Use a battery-backed caching RAID controller for better write performance.
  • For tempdb files, which need good read/write performance, RAID 0, 1 or 10 is recommended. Although tempdb contains temporary data and DBAs often opt for RAID 0 for tempdb, keep in mind that SQL Server requires tempdb to carry out many of its activities. So if you want your system to be always available, rethink RAID 0.
  • Putting log files or tempdb on a RAID 5 array is not recommended, since RAID 5 does not perform well for write operations. DBAs, however, have conflicting opinions on this point.
  • Choose small, fast drives over large, slow drives.

Make sure you do your own research, so you can make an informed decision! In a real scenario, you may not have the luxury of deciding your server configuration due to the cost associated with it, so keeping these points in mind may help you in such scenarios.
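
Related to the first point in the list above, a quick way to see where your data, log and tempdb files actually live is to query sys.master_files; a minimal sketch:

-- Lists every database file with its type and physical location, so you can
-- verify that data, log and tempdb files sit on separate drives/arrays.
SELECT DB_NAME(database_id) AS DatabaseName,
       type_desc,
       physical_name
FROM sys.master_files
ORDER BY DatabaseName, type_desc;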

Please use the comments section and share your opinions!

SQL Server Default Port

In this article, we are going to explore the default port number of SQL Server, which is required for connecting from different applications.

What is a Port? It is an endpoint of communication in an operating system. A port is always associated with an IP address of a host, and the protocol type of the communication, thereby completing the destination or origination address of a communication session. A port is identified for each address and protocol by a 16-bit number, commonly known as the port number.
When front-end applications want to connect to SQL Server, the port number is essential.

The default port number of SQL Server is 1433.

How to get the port number in SQL Server?

Note - The steps are shown through SQL Server 2012 environment. They are largely applicable to SQL Server 2014 too.

Step 1: Open the SQL Server Configuration Manager window through Start > All Programs > Microsoft SQL Server 2012 > Configuration Tools 

sccm-navigation

Step 2: Explore SQL Native Client 11.0 Configuration (32bit)

sql-native-client-configuration

Step 3: Click on Client Protocols option within it  –

client-protocols

Step 4: Double click on TCP/IP from the right hand side pane

tcp-ip

We will get to know the Default Port as shown here

1433-default-port

The default port 1433 is the official Internet Assigned Numbers Authority (IANA) socket number for SQL Server. Client systems use TCP 1433 to connect to the database engine; SQL Server Management Studio (SSMS) uses this port to manage SQL Server instances across the network. You can reconfigure SQL Server to listen on a different port, but 1433 is by far the most common implementation.
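
If you prefer T-SQL over Configuration Manager, one option (a sketch; it reports the port of your current connection, so it assumes you are connected over TCP) is to query sys.dm_exec_connections:

-- Returns the TCP port the current session is connected on
-- (NULL when connected over shared memory or named pipes)
SELECT local_tcp_port
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;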

Some other default SQL Server ports:

TCP 1434 – For Dedicated Admin Connection
UDP 1434 – For SQL Server Named Instance
TCP 2383 – For Default port of SSAS
TCP 2382 – For Named instances of SSAS
TCP 135 – For SSIS & T-SQL Debugger
TCP 80 and 443 – For SSRS
TCP 4022 – For Service Broker

other-default-ports

Shrink a Live SQL Server Database and Logs - Caveats and Best Practices

Updated on August 20th, 2016.

If you have deleted a lot of data in the database and truncated some tables, you may want to shrink the database files to recover the unused space. Basically, shrinking moves the data pages at the end of the file into unoccupied space available at the front of the file.

You can use the DBCC SHRINKDATABASE command. For example, this command

DBCC SHRINKDATABASE (your_database,10) 

will decrease database size and allow for 10 percent free space. 

You can also shrink the log file alone using the DBCC SHRINKFILE command. Take a full backup of the database and run the following:

ALTER DATABASE your_database SET RECOVERY SIMPLE; 

DBCC SHRINKFILE (your_database_Log, 10); 

ALTER DATABASE your_database SET RECOVERY FULL; 

This will shrink the log file to a target size of 10 MB.

Caveats of shrinking database and logs:

1. SHRINK commands should not be run frequently, as they increase index fragmentation.

2. It is a fully logged operation, which increases the log size and slows down performance.

3. Shrinking is not possible while the current database is being backed up or restored.

4. Shrinking is not possible when a columnstore index is enabled.

Best practices

1. Use SHRINK only if a lot of operations such as TRUNCATE and DROP have led to unused space.

2. Take a full backup before shrinking database files. If you shrink the log file, set the recovery model to SIMPLE before shrinking and back to FULL after shrinking.

3. Never turn on the AUTO_SHRINK option.

4. Always rebuild indexes after shrinking the database files to remove the fragmentation it introduces (a sketch follows below).
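
A minimal sketch of that rebuild step, assuming a hypothetical table dbo.YourTable (repeat for each heavily fragmented table, or script it across the database):

-- Rebuild all indexes on one table after the shrink has completed
ALTER INDEX ALL ON dbo.YourTable REBUILD;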

SQL Server Update statement with JOIN

In this article, we are specifically going to see a practical usage of the UPDATE statement in SQL Server when the condition is based on a JOIN with another table.

Example:

CREATE TABLE EMP
       (EMPNO Integer Primary Key,
        ENAME Varchar(15),
        JOB Varchar(9),
        MGR Integer,
        HIREDATE datetime,
        SAL Integer,
        COMM Integer,
        DEPTNO Integer);
go
INSERT INTO EMP VALUES
        (7369, 'SMITH',  'CLERK',     7902, '12/17/80',
        800, NULL, 20);
INSERT INTO EMP VALUES
        (7499, 'ALLEN',  'SALESMAN',  7698,
        '5/20/81',1600,  300, 30);
INSERT INTO EMP VALUES
        (7521, 'WARD',   'SALESMAN',  7698,
        '5/22/81', 1250,  500, 30);
INSERT INTO EMP VALUES
        (7566, 'JONES',  'MANAGER',   7839,
        '4/2/81',  2975, NULL, 20);
INSERT INTO EMP VALUES
        (7654, 'MARTIN', 'SALESMAN',  7698,
        '9/28/81', 1250, 1400, 30);
INSERT INTO EMP VALUES
        (7698, 'BLAKE',  'MANAGER',   7839,
        '5/1/81',  2850, NULL, 30);
INSERT INTO EMP VALUES
        (7782, 'CLARK',  'MANAGER',   7839,
        '6/9/81',  2450, NULL, 10);
INSERT INTO EMP VALUES
        (7788, 'SCOTT',  'ANALYST',   7566,
        '12/9/82', 3000, NULL, 20);
INSERT INTO EMP VALUES
        (7839, 'KING',   'PRESIDENT', NULL,
        '11/17/81', 5000, NULL, 10);
INSERT INTO EMP VALUES
        (7844, 'TURNER', 'SALESMAN',  7698,
        '9/8/1981',  1500,    0, 30);
INSERT INTO EMP VALUES
        (7876, 'ADAMS',  'CLERK',     7788,
        '1/12/83', 1100, NULL, 20);
INSERT INTO EMP VALUES
        (7900, 'JAMES',  'CLERK',     7698,
        '12/3/81',   950, NULL, 30);
INSERT INTO EMP VALUES
        (7902, 'FORD',   'ANALYST',   7566,
        '12/3/81',  3000, NULL, 20);
INSERT INTO EMP VALUES
        (7934, 'MILLER', 'CLERK',     7782,
        '1/23/82', 1300, NULL, 10);
go

CREATE TABLE DEPT
       (DEPTNO INTEGER Primary Key,
        DNAME VARCHAR(14),
        LOC VARCHAR(13) );
go
INSERT INTO DEPT VALUES
        (10,'ACCOUNTING','NEW YORK');
INSERT INTO DEPT VALUES (20,'RESEARCH','DALLAS');
INSERT INTO DEPT VALUES
        (30,'SALES','CHICAGO');
INSERT INTO DEPT VALUES
        (40,'OPERATIONS','BOSTON');

Let us see the two tables.

select * from emp
go
select * from dept
go

emp-dept-table

Here we can see that this is a typical parent-child table relationship. Many a time, the end user wants to see the employee names (Ename) and department names (Dname), but for that they have to resort to an inner join. Right now, there are only 14 rows in the emp table and 4 rows in dept.

But in a real production environment, the number of records will be much higher, so this may result in a performance overhead. Therefore, in such scenarios the DBA may decide to add the DName column to the Emp table itself to eliminate the JOIN overhead.

Note: Here we are moving towards an intentional De-Normalization technique.

The following ALTER Table command is used for that.

Alter Table Emp
Add Dname Varchar(40)

But now the problem is that for existing records, the DName column will be NULL. Let us check it.

select * from emp

sql-adding-dname-column

So now we want to update each DName as per the DName of the Dept table. For this we will issue the UPDATE statement with a join.

Update Emp 
set Dname  = Dept.DName
From Emp, Dept
where Emp.Deptno = Dept.Deptno
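
The same statement can also be written with explicit JOIN syntax, which many find easier to read; a sketch that should behave identically here:

Update Emp
set Dname = Dept.DName
From Emp
inner join Dept
    on Emp.Deptno = Dept.Deptno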

Technique: Here the update technique with join is used in the following sequential steps.

1. Every record of Emp is taken.
2. Its deptno is matched with deptno of Dept table.
3. The corresponding DName of Dept table is assigned to that row.
4. Next row of Emp is taken.
5. Step numbers 2 & 3 are repeated.
6. This process continues till the last row of the Emp table is reached.

So now let us see the Emp table again.

select * from emp

after-update-with-sql-join

and that’s how we can use the Update statement with JOIN in SQL Server.

SQL Server Substring with CharIndex

In this article we are going to explore the T-SQL function CharIndex and also how to use it with another T-SQL function Substring().

CharIndex: This function returns the location of a substring in a string. With the default (case-insensitive) collations it is not case sensitive, and it is useful for checking the position of a particular character or digit in a string. I have seen this function being used in product codes and different types of IDs for validation purposes, or to get some business understanding of that code/ID. Here's the syntax:

CHARINDEX( Substring or Character, string, [start_position] )

where..

Substring or Character - The string that you want to find. It has to be from the string data type family, such as char or varchar.
String - The string or number or an alphanumeric value to search within.
start_position - Optional. The position in string where the search will start. The first position is 1.

The function returns bigint if the string being searched (expressionToSearch) is of the varchar(max), nvarchar(max), or varbinary(max) data types; otherwise, it returns an int.

Substring with CharIndex examples:

select charindex('C','ABCDEF',1) 
-- Returns 3

select charindex('C','ABCDEF') 
-- Still returns 3 as start_position is optional; the default is 1

select charindex('c','ABCDEF',1) 
-- By default not case sensitive, so returns 3

select charindex('c','ABCDEF' COLLATE Latin1_General_CS_AS) 
-- Returns 0 as lowercase c is not found with a case-sensitive collation

select charindex('Z','ABCDEF',1) 
-- Returns 0 as there is no Z character

select charindex('A','ABCDAEFG',2) 
-- Returns the second position of A, i.e. 5

select charindex('A','ABCDAEFG',9) 
-- Returns 0 as there is no A at or after the ninth position (the string has only 8 characters)

select charindex('4',123456) 
-- Searching within a pure number (implicitly converted to a string); returns 4 as '4' is at the fourth position

select charindex('curry','sqlservercurry.com') 
-- Looks for the whole substring 'curry'; its first 
-- occurrence starts at position 10, so returns 10

Using CharIndex with Substring:

First let us check the functionality of the Substring function. It extracts a part of the string from the specified position.

Syntax:

Substring(String, Position or Index, Number of characters to extract)

select substring('abcdef', 2, 3) 
-- Returns bcd: 2 is the starting position and 3 is the number of characters to extract

Now the best usage of CharIndex with Substring is to create a functionality of “Text to Columns”.

That means many a time names are stored in the format “FirstName whitespace LastName” in a single column. In such cases, the need is to separate the first name into one column and the last name into another for display purposes.

Consider the following example.

create table Authors
(AuthorID Varchar(2),
 AuthorName Varchar(60)
 )
 go
Insert into Authors Values('A1','Suprotim Agarwal')
Insert into Authors Values('A2','Mahesh Sabnis')
Insert into Authors Values('A3','Mandar Mulay')
go
select * from Authors
go

sql-default-author-table

Now let us say we want to display the first name and last name in different columns. For that, we will use the Substring function, and within it we will use the CharIndex function to find the position of the white space.

See the following query:

select substring(AuthorName,1, charindex(' ',AuthorName)) as [First Name],
substring(AuthorName,(charindex(' ',AuthorName)) + 1, len(AuthorName)) as [Last Name]
from Authors
go

substring-with-charindex

Explanation:

First Name - substring(AuthorName, 1, charindex(' ', AuthorName)) extracts all the characters from the first position up to (and including) the first white space; subtract 1 from the CHARINDEX result if you do not want the trailing space.

Last Name - substring(AuthorName, (charindex(' ', AuthorName)) + 1, len(AuthorName)) extracts from the position just after the first white space; LEN(AuthorName) is used as the length so that all remaining characters are extracted.
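
One caveat: if an AuthorName contains no space, CHARINDEX returns 0 and the query above returns an empty first name and the whole value as the last name. A sketch of a more defensive version (the NULLIF approach here is one common workaround, not part of the original article):

-- When there is no space, NULLIF turns the 0 from CHARINDEX into NULL,
-- so both derived columns become NULL instead of misleading values.
select substring(AuthorName, 1, nullif(charindex(' ', AuthorName), 0) - 1) as [First Name],
       substring(AuthorName, nullif(charindex(' ', AuthorName), 0) + 1, len(AuthorName)) as [Last Name]
from Authors
go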

And that’s how we use the SQL Server T-SQL Substring function, as well as Substring with CharIndex.