You Should Never Use IN In SQL To JOIN With Another Table
Do you use the following syntax?
SELECT *
FROM TABLE
WHERE CoLumn IN (SELECT Column FROM TABLE)
OR this?
SELECT *
FROM TABLE
WHERE CoLumn NOT IN (SELECT Column FROM TABLE)
Do NOT use that, it will cause problems sooner or later. Don't believe me? Let's take a look
First create these 2 tables and populate them with some sample data
CREATE TABLE TestTable1 (id1 int)
CREATE TABLE TestTable2 (id2 int)
INSERT TestTable1 VALUES(1)
INSERT TestTable1 VALUES(2)
INSERT TestTable1 VALUES(3)
INSERT TestTable2 VALUES(1)
INSERT TestTable2 VALUES(2)
Now let's run the IN query
SELECT *
FROM TestTable1
WHERE id1 IN (SELECT id2 FROM TestTable2)
-----
id1
1
2
No problems here right?
What if by mistake you wrote id1 instead of id2?
SELECT *
FROM TestTable1
WHERE id1 IN (SELECT id1 FROM TestTable2)
-----
id1
1
2
3
Oops all 3 rows are returned, if you just run this SELECT id1 FROM TestTable2 you will get this error
Server: Msg 207, Level 16, State 3, Line 1
Invalid column name 'id1'.
So what happens? SQL Server sees column id1 and says "yes I have that it is in the TestTable1 table, I can use that"
What can we do? Use EXISTS because you will get an error instead of a wrong resultset
SELECT *
FROM t1
WHERE EXISTS (SELECT * FROM TestTable2 t2 WHERE t2.id2 = t1.id1 )
-----
id1
1
2
A JOIN will do the same as EXISTS
SELECT t1.*
FROM TestTable1 t1
JOIN TestTable2 t2 ON t2.id2 = t1.id1
-----
id1
1
2
Now let's try NOT IN
SELECT *
FROM TestTable1
WHERE id1 NOT IN (SELECT id2 FROM TestTable2)
-----
id1
3
No problem right?
Add a NULL value to the TestTable2 table
INSERT TestTable2 VALUES(NULL)
Let's try running it again
SELECT *
FROM TestTable1
WHERE id1 NOT IN (SELECT id2 FROM TestTable2)
Where are my rows? Nowhere, since NULL is not equal to anything including another NULL SQL just returns nothing
What happens when you use NOT EXISTS?
SELECT *
FROM TestTable1 t1
WHERE NOT EXISTS (SELECT * FROM TestTable2 t2 WHERE t2.id2 = t1.id1 )
-----
id1
3
That works without a problem
What about a LEFT JOIN?
SELECT t1.*
FROM TestTable1 t1
LEFT JOIN TestTable2 t2 ON t2.id2 = t1.id1
WHERE t2.id2 IS NULL
-----
id1
3
That works without a problem also
So from now on use EXISTS, NOT EXISTS, JOIN and LEFT JOIN
DO NOT use IN or NOT IN ever again. Okay?
SELECT *
FROM TABLE
WHERE CoLumn IN (SELECT Column FROM TABLE)
OR this?
SELECT *
FROM TABLE
WHERE CoLumn NOT IN (SELECT Column FROM TABLE)
Do NOT use that, it will cause problems sooner or later. Don't believe me? Let's take a look
First create these 2 tables and populate them with some sample data
CREATE TABLE TestTable1 (id1 int)
CREATE TABLE TestTable2 (id2 int)
INSERT TestTable1 VALUES(1)
INSERT TestTable1 VALUES(2)
INSERT TestTable1 VALUES(3)
INSERT TestTable2 VALUES(1)
INSERT TestTable2 VALUES(2)
Now let's run the IN query
SELECT *
FROM TestTable1
WHERE id1 IN (SELECT id2 FROM TestTable2)
-----
id1
1
2
No problems here right?
What if by mistake you wrote id1 instead of id2?
SELECT *
FROM TestTable1
WHERE id1 IN (SELECT id1 FROM TestTable2)
-----
id1
1
2
3
Oops all 3 rows are returned, if you just run this SELECT id1 FROM TestTable2 you will get this error
Server: Msg 207, Level 16, State 3, Line 1
Invalid column name 'id1'.
So what happens? SQL Server sees column id1 and says "yes I have that it is in the TestTable1 table, I can use that"
What can we do? Use EXISTS because you will get an error instead of a wrong resultset
SELECT *
FROM t1
WHERE EXISTS (SELECT * FROM TestTable2 t2 WHERE t2.id2 = t1.id1 )
-----
id1
1
2
A JOIN will do the same as EXISTS
SELECT t1.*
FROM TestTable1 t1
JOIN TestTable2 t2 ON t2.id2 = t1.id1
-----
id1
1
2
Now let's try NOT IN
SELECT *
FROM TestTable1
WHERE id1 NOT IN (SELECT id2 FROM TestTable2)
-----
id1
3
No problem right?
Add a NULL value to the TestTable2 table
INSERT TestTable2 VALUES(NULL)
Let's try running it again
SELECT *
FROM TestTable1
WHERE id1 NOT IN (SELECT id2 FROM TestTable2)
Where are my rows? Nowhere, since NULL is not equal to anything including another NULL SQL just returns nothing
What happens when you use NOT EXISTS?
SELECT *
FROM TestTable1 t1
WHERE NOT EXISTS (SELECT * FROM TestTable2 t2 WHERE t2.id2 = t1.id1 )
-----
id1
3
That works without a problem
What about a LEFT JOIN?
SELECT t1.*
FROM TestTable1 t1
LEFT JOIN TestTable2 t2 ON t2.id2 = t1.id1
WHERE t2.id2 IS NULL
-----
id1
3
That works without a problem also
So from now on use EXISTS, NOT EXISTS, JOIN and LEFT JOIN
DO NOT use IN or NOT IN ever again. Okay?
Labels: tip
Social bookmark this




14 Comments:
I beg to differ. The examples you gave are not very convincing - one is an obvious mistake nobody should make, and the second - null value in an a key column - is a mistake even bigger.
The *real* reason to use NOT EXISTS over NOT IN is performance: with NOT IN the database engine cannot use indexes, and needs to perform a full table scan.
Nice tips! Thanks!
......or, instead of abandoning a useful feature, write proper sql.
Wow, this has to be one of the worst sql suggestions I've ever seen. IN is there for a reason, and the reasons you've given against are ridiculous. You could argue for some kind of JOIN purity or from perf, but instead came up with what?
What's the point of this? Sorry, I don't get it..
Ehr... interesting title, disappointing post :-(
hmm..i'll have to try this tomorrow.. i had just refactored a group of stored procedures from using a join to using IN..
the join was causing lots and lots of disk I/O as the joined table is very very large but the # of records needed was very small..
converting to IN allowed more efficient use of the existing indexes and lessened disk i/o..the resulting query plan used a left-semi join if i remember correctly..
i'll have to see if there is a difference with EXISTS..
leszek, true not exists will perform better than not in however it will be an index scan not a table scan. You can't have NULLs in a PK column
>>one is an obvious mistake nobody should make
Yes but if you do make it you are not warned (no error is generated)
I think you should avoid using ALL the SQL keywords... that would eliminate a whole host of analagous errors. Might want to avoid using all keywords in all languages, actually.
Use aliases and it will protect you against typing errors:
SELECT *
FROM TestTable1 t1
WHERE t1.id1 IN (SELECT t2.id1 FROM TestTable2 t2)
Now you get an error instead of a silent wrong operation.
I'm sorry, but this article is absolutely absurd.
Your argument is because you might type something wrong? Please. Remove this article, it's pointless.
oh noes! I typed SLEGRPPP instead of SELECT, never use 'SELECT' again!
>>oh noes! I typed SLEGRPPP instead of SELECT, never use 'SELECT' again!
You completely missed the point, when you mistype select you will get an error, when you mistype the column name you won't
Geez, these developers are friggin' morons. Such a simple article, and they don't seem to get it.
The issue, as stated numerous times in the article, is that no error is generated, NOT THAT THE SQL IS "POOR".
I RARELY use the "IN" syntax when joining, but sometimes just as a quick one-off I do. That's when I noticed this behaviour, and found this great article.
pretty interesting article..
Post a Comment
<< Home