SQL Server Inner Join - Concepts and Best practices

In a database such as SQL Server, we create tables which are related to each other through a common column. For example, if we have to store customers’ related information, then the customer’s personal information gets stored in one table and the same customer’s transaction information, gets stored in another table. Both the tables have CustID as a common column.

We can implement RDBMS (Relational Database Management System) concepts through any database, such as SQL Server. So the same entity’s information flows down in different tables. This is part of the Normalization process within RDBMS framework.

However as part of the requirement, we may need to show a report containing customer information from multiple tables, on a matching column condition. Here arises the need of using the technique of Joins in SQL scripting.

Example - Let us create an Employee specific set of 2 tables in a new database.

Execute the following script in your instance of SQL Server.

Use Master
go
Create Database Inner_Join
go
Use Inner_Join
go
CREATE TABLE EMP
       (EMPNO Integer,
        ENAME Varchar(15),
        JOB Varchar(9),
        MGR Integer,
        HIREDATE datetime,
        SAL Integer,
        COMM Integer,
        DEPTNO Integer)
go
INSERT INTO EMP VALUES
        (7369, 'SMITH', 'CLERK', 7902, '12/17/80', 800, NULL, 20)
INSERT INTO EMP VALUES
        (7499, 'ALLEN', 'SALESMAN', 7698, '5/20/81', 1600, 300, 30)
INSERT INTO EMP VALUES
        (7521, 'WARD', 'SALESMAN', 7698, '5/22/81', 1250, 500, 30)
INSERT INTO EMP VALUES
        (7566, 'JONES', 'MANAGER',   7839, '4/2/81', 2975, NULL, 20)
INSERT INTO EMP VALUES
        (7654, 'MARTIN', 'SALESMAN', 7698, '9/28/81', 1250, 1400, 30)
INSERT INTO EMP VALUES
        (7698, 'BLAKE', 'MANAGER',   7839, '5/1/81', 2850, NULL, 30)
INSERT INTO EMP VALUES
        (7782, 'CLARK', 'MANAGER',   7839, '6/9/81', 2450, NULL, 10)
INSERT INTO EMP VALUES
        (7788, 'SCOTT',  'ANALYST',  7566, '12/9/82', 3000, NULL, 20)
INSERT INTO EMP VALUES
        (7839, 'KING',   'PRESIDENT', NULL, '11/17/81', 5000, NULL, 10)
INSERT INTO EMP VALUES
        (7844, 'TURNER', 'SALESMAN', 7698, '9/8/1981',1500,    0, 30)
INSERT INTO EMP VALUES
        (7876, 'ADAMS', 'CLERK', 7788, '1/12/83', 1100, NULL, 20)
INSERT INTO EMP VALUES
        (7900, 'JAMES', 'CLERK', 7698,  '12/3/81', 950, NULL, 30)
INSERT INTO EMP VALUES
        (7902, 'FORD',   'ANALYST', 7566, '12/3/81', 3000, NULL, 20)
INSERT INTO EMP VALUES
        (7934, 'MILLER', 'CLERK',  7782, '1/23/82', 1300, NULL, 10)
go
CREATE TABLE DEPT
       (DEPTNO INTEGER,
        DNAME VARCHAR(14),
        LOC VARCHAR(13) )
go
INSERT INTO DEPT VALUES (10,'ACCOUNTING','NEW YORK')
INSERT INTO DEPT VALUES (20,'RESEARCH','DALLAS')
INSERT INTO DEPT VALUES (30,'SALES','CHICAGO')
INSERT INTO DEPT VALUES (40,'OPERATIONS','BOSTON')
go

Let us see the tables to understand the relationship.

use Inner_Join
go
select * from dept
go
select * from emp
go

dept-no

You can observe that Deptno is the common field in both the tables, Dept & Emp. Department details are in Dept table and Employee table contains a DEPTNO column which represents the Department Number an Employee belongs to. Let us see two records of Emp table to get a clarity. Consider the record of SMITH in Emp table. Smith’s Deptno is 20, which means Smith’s Department Name (DNAME) is RESEARCH. Similarly consider ALLEN’s record. His Deptno is 30, which means Allen’s DNAME is Sales. So for the first two records, we have mapped the common column Deptno!
Now we want to display the ENAME and DNAME for the matching Deptno. This is possible by a technique within SQL scripting that is called as Joins.

There are multiple types of joins, but in this article we are going to focus on Inner Join or also known as Equi-Join.

Editorial Note: Check this artilce to understand Types of JOIN in SQL Server - Inner, Self, Outer and Cross JOIN

So what is Inner Join? Inner join is displaying data from multiple tables on matching values of common field (s), within the associated tables.

Inner Join Syntax

There are two types of syntaxes for Inner join:

1) Non-ANSI
2) ANSI

Let us see the Non-ANSI syntax first:

select ename, dname
from emp, dept
where emp.deptno = dept.deptno

non-ansi-sql

One thing to note is that in the from clause of SQL statement, we have two tables separated by a comma. The mapping of common column is done in the where clause.

How does it work internally? Whenever a JOIN command is given, then each record of the first table is evaluated with each record of the second table. That means a Cartesian product happens and then as per the matching values of where clause, those records are shown. As per the Emp and Dept tables, 14 records of Emp are cross evaluated with 4 records of Dept table. So overall 14 * 4 = 56 evaluations will happen and matching value records are shown.

Now, let us remove the where clause and see what happens.

select ename, dname
from emp, dept

where-clause
rows-affected

Oops!! The Cartesian product result is displayed now! Due to the absence of where clause, each evaluation has been displayed as well. So this output is logically going wrong! Actually we have accidentally landed in another type of join that is Cross Join.

Now, let us have the common field for display purpose. We will include the Deptno column along with Ename and Dname columns in the column list.

select ename, dname, deptno
from emp, dept
where emp.deptno = dept.deptno

It has thrown the following error –

ambiguous-column

The common field’s name is same in both the tables. So a confusion arose in understanding that deptno from which table should be considered. The error would not have come if the common column had different name. In such a case where name is same, it is mandatory to mention the table name and . (dot) separator as prefix .

Let us rectify the above query.

select ename, dname, emp.deptno
from emp, dept
where emp.deptno = dept.deptno

inner-join-result

Ideally table name prefix should be mentioned for each column irrespective of whether it is common or uncommon. Advantage of this approach is that it will give clarity to any other user who is not habitual with the tables & its columns.

So the same query will be written as follows:

select emp.ename, dept.dname, emp.deptno
from emp, dept
where emp.deptno = dept.deptno

But in projects, table names many a times are lengthy. So mentioning a lengthy table name repeatedly the will be cumbersome. To avoid that, introduce a table alias in the from clause. Use that alias in the column list and where clause.

JOIN – Best practice

So now I am coming to the best practice of writing a JOIN statement.

select e.ename, d.dname, e.deptno
from emp e, dept d
where e.deptno = d.deptno

Internally, from clause is getting executed first, then the where clause and finally the select clause as per the above statement. 

INNER JOIN – ANSI Syntax

So far we have seen the NON-ANSI syntax of joining tables. Now let us see the ANSI syntax. As per ANSI Joins, there are two changes:

1. The comma within from clause gets replaced by Join type specification
2. The JOIN condition has to be given using ON clause instead of where clause.

The select statement will look like this:

select e.ename, d.dname, e.deptno
from emp e Inner Join dept d
On e.deptno = d.deptno

Amongst the multiple types of joins, the default type of join is an Inner Join. So Inner is an optional keyword. The query can have only Join as the keyword which will be as follows: 

select e.ename, d.dname, e.deptno
from emp e Join dept d
On e.deptno = d.deptno 

In Non-ANSI style of join, we have observed that if the where clause is skipped, then it shows a Cartesian product output. On the same lines, let us try to omit the On clause in ANSI and let’s see what happens.

select e.ename, d.dname, e.deptno
from emp e Join dept d

It is throwing an error.

incorrect-syntax

It means that On clause is mandatory in ANSI type of join. The advantage of ANSI join is that it is a dedicated inner join and it prevents an accidental Cartesian product.

About one-to-many relationship for Inner join

The inner or equi join will be logically correct if we have one-to-many relationship between the values of common field. In the Emp and Dept tables, there is such relationship, so the output is correct, i.e a matching deptno is once in the dept table and it is multiple times in the emp table.

What if there is a many-to-many relationship within the tables? Right now the deptno column is not having a primary constraint in the Dept table. So we can add one more record of the same deptno 10.

Insert into Dept Values(10, 'Testing', 'CALIFORNIA')
Go

So right now there are 2 records of the same deptno 10 in dept table.

Let us see that –

select * from dept

duplicate-records

Now we will execute the same ANSI join statement again.

select e.ename, d.dname, e.deptno
from emp e Join dept d
On e.deptno = d.deptno 

The output has logically gone wrong!

wrong-output

CLARK, KING and MILLER are displayed in both Accounting & Testing departments.

An Inner join will give correct result only when there is one-to-many relationship. Therefore normally this problem gets eliminated when we have Primary Key and Foreign Key relationship within the common field.

Let us delete this additional record of deptno 10 so that everything goes back to normal.

Delete from Dept
where DName = 'Testing'
go

Important features of Inner Join

1. Common field name may be same or different.
2. The data type and size of common fields should be ideally the same.
3. If the data type is not same, then using type casting functions the data types should be made same in the On clause.
4. There should be one-to-many relationship within the common field values.
I hope through this article, your Inner Join principle now is as sound as a dollar!


No comments: