Monday, January 18, 2016

#BackToBasics: Returning all rows from one table that do not exists in another table



In the A challenge for 2016....... accepted! post I said that I would write a post once every month that would explain very basic stuff about SQL Server. Today is the first post, in this post we will take a look at how you can return rows from one table which do not exist in another table.

First we need to create our two tables, these tables will be very simple, each will have only 1 column

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
CREATE TABLE MainData (ID int)
GO

INSERT MainData values(1)
INSERT MainData values(2)
INSERT MainData values(4)
INSERT MainData values(5)
INSERT MainData values(7)
GO


CREATE TABLE SomeData (ID2 int)
GO

INSERT SomeData values(1)
INSERT SomeData values(2)
INSERT SomeData values(3)
INSERT SomeData values(4)
INSERT SomeData values(6)
INSERT SomeData values(8)
GO

Now that our tables are set up it is time to look at the various queries.

NOT IN

This is one of the simpler ways, it is almost a direct translation from English,  select all from this table which is not in the other table
Here is what the query looks like

1
2
SELECT t2.* FROM SomeData t2
WHERE ID2 NOT IN(SELECT ID FROM MainData)

This is the output

ID2
--------
3
6
8

The output is correct, our MainData table has only these values: 1,2,4,5 and 7. The SomeData table has the values 3, 6 and 8, these values do not exist in our MainData table. As you can see the NOT IN query is very simple and will work most of the time, there are however two scenarios where it could be problematic using NOT IN. We will look at these two scenarios at the end of this post.

NOT EXISTS

Using NOT EXISTS is very similar to NOT IN, one addition you have to make is adding a WHERE clause, you are in essence doing a JOIN condition but not returning anything that satisfied this condition. Here is what the query looks like

1
2
SELECT t2.* FROM SomeData t2
WHERE NOT EXISTS (SELECT ID FROM MainData t1 WHERE t1.ID = t2.ID2)

The results for this query are the same as the NOT IN query

LEFT JOIN

A LEFT JOIN query returns all the data from both table, it will return NULL values for the rows that don't have a matched in the outer joined table. If you run the following query


1
2
SELECT * FROM SomeData t2
LEFT JOIN MainData t1 ON t1.ID = t2.ID2

Here is what the results look like

ID2 ID
----    -----
1 1
2 2
3 NULL
4 4
6 NULL
8 NULL


As you can see the rows with the values 3,6 and 8 in the 1st column have a NULL value in the 2nd column.

The LEFT JOIN query is more complex compared to NOT IN and NOT EXISTS. You also need to know  that this query will return data from both tables, this is why we need to specify SELECT t2.*. I also specified t2.* in the other two queries, this was however not needed since  NOT IN and NOT EXISTS only return data from one table. Like in the NOT EXISTS query, you also specify a JOIN condition in the LEFT JOIN query. Finally in the WHERE clause you are filtering out the data which does exists by asking for all the rows where t1.ID is NULL
Here is what the query looks like and the results are the same as for the NOT IN and the NOT EXISTS queries

1
2
3
SELECT t2.* FROM SomeData t2
LEFT JOIN MainData t1 ON t1.ID = t2.ID2
WHERE t1.ID IS NULL


EXCEPT

Using EXCEPT is very easy, the query is basically return every from one table EXCEPT what is returned by the bottom query. Here is what the query looks like

1
2
3
SELECT t2.* FROM SomeData t2
EXCEPT
SELECT ID FROM MainData

This query will return the same data as the other queries. Using EXCEPT is pretty simple and straightforward but I have to warn you, EXCEPT does a sort and is the worst performing query of all the ones mentioned here. I don't recommend using EXCEPT for any big tables


Some problems you might encounter when using NOT IN 


 Take a look at the query below, do you see anything wrong? Run the following query

1
2
SELECT t2.* FROM SomeData t2
WHERE ID2 NOT IN(SELECT ID2 FROM MainData)

You don't get any errors but the query returns nothing. The problem is that the MainData table does not contain a column named ID2, it is named ID. Since the ID2 table does exists in the SomeData table SQL Server does not throw an error. If you use LEFT JOIN or NOT EXISTS, you cannot make this mistake. Run the following two queries to see what happens


1
2
3
4
5
6
SELECT t2.* FROM SomeData t2
WHERE NOT EXISTS (SELECT ID FROM MainData t1 WHERE t1.ID2 = t2.ID2)

SELECT t2.* FROM SomeData t2
LEFT JOIN MainData t1 ON t1.ID2 = t2.ID2
WHERE t1.ID IS NULL
Here is the output

Msg 207, Level 16, State 1, Line 75
Invalid column name 'ID2'.
Msg 207, Level 16, State 1, Line 79
Invalid column name 'ID2'.

As you can see you got an error.

NULL values will also cause a problem when using NOT IN. Add the following NULL value to the MainData table

1
INSERT MainData values(NULL)

If you run all the 3 different queries again, you will notice something

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
SELECT t2.ID2 as 'Not in' FROM SomeData t2
WHERE ID2 NOT IN(SELECT ID FROM MainData)

SELECT t2.ID2 as 'Not Exists' FROM SomeData t2
WHERE NOT EXISTS (SELECT ID FROM MainData t1 WHERE t1.ID = t2.ID2)

SELECT t2.ID2 as 'Left Join' FROM SomeData t2
LEFT JOIN MainData t1 ON t1.ID = t2.ID2
WHERE t1.ID IS NULL

SELECT t2.ID2 as 'Except' FROM SomeData t2
EXCEPT
SELECT ID FROM MainData
Here are the results

Not in
-----------

Not Exists
-----------
3
6
8

Left Join
-----------
3
6
8

Except
-----------
3
6
8

Do you see what happened? The NOT IN query is not returning anything, a NULL value is not equal to anything, not even to another NULL value.

So that's it for this post, I showed you four ways to return values from one table which do not exists in another table. I also showed you why NOT IN might cause some problems. In general I like to use EXISTS and NOT EXISTS unless I need data from both table, in that case I will use a JOIN.

Till next month. If you want me to cover a topic leave me a comment.

2 comments:

Unknown said...

To "fix" your left join, you can use a left outer join (outer meaning don't include commonalities between tables).

Denis Gobo said...

??? LEFT JOIN and LEFT OUTER JOIN is the same thing (OUTER is an optional keyword)