I'm wondering whether the number of IDs in a list affects query performance.

Example query:

```sql
SELECT * FROM foos WHERE foos.ID NOT IN (2, 4, 5, 6, 7)
```

Here (2, 4, 5, 6, 7) can be an arbitrarily long list.

And how many is too many (in terms of order of magnitude)?

UPDATE: I'm asking because I have two databases. One of them (read-only) is the source of items, and the other contains items that have already been processed by an operator. Every time the operator asks for a new item from the read-only database, I want to exclude the items that have already been processed.


Best Answer


Yes, the number of IDs in the list will affect performance. A network packet is only so big, for example (in MySQL the whole statement has to fit within `max_allowed_packet`), and the database has to parse all that noise into something logically equivalent to a series of:

```sql
WHERE foos.ID <> 2
  AND foos.ID <> 4
  AND foos.ID <> 5
  AND ...
```
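To make the "list length grows the statement" point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is only a stand-in for MySQL, and the `foos` table and its rows are made up for the demo. The sketch builds the query with one bound placeholder per excluded ID, so both the SQL text and the parameter count grow linearly with the list, and it checks that the `NOT IN` form and the chained `<>` form return the same rows. It says nothing about how either engine evaluates the list internally.

```python
import sqlite3


def fetch_not_in(conn, excluded):
    """Return ids NOT IN `excluded`, one bound parameter per ID (assumes non-empty)."""
    placeholders = ", ".join("?" for _ in excluded)  # "?, ?, ?, ..."
    sql = f"SELECT id FROM foos WHERE id NOT IN ({placeholders}) ORDER BY id"
    return [row[0] for row in conn.execute(sql, excluded)]


def fetch_chained(conn, excluded):
    """Same filter written as id <> a AND id <> b AND ... (assumes non-empty)."""
    conditions = " AND ".join("id <> ?" for _ in excluded)
    sql = f"SELECT id FROM foos WHERE {conditions} ORDER BY id"
    return [row[0] for row in conn.execute(sql, excluded)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foos (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO foos (id) VALUES (?)", [(i,) for i in range(1, 11)])

excluded = [2, 4, 5, 6, 7]
not_in_ids = fetch_not_in(conn, excluded)
chained_ids = fetch_chained(conn, excluded)
print(not_in_ids)   # [1, 3, 8, 9, 10]
print(chained_ids)  # [1, 3, 8, 9, 10]
```

Binding the values as parameters (instead of pasting them into the SQL string) avoids injection problems, but it does not make a huge list free: the statement still has to be sent and parsed.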

You should consider other ways to let your query know about this set.
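One common way to do that (a swapped-in technique, not something from the question) is to load the IDs into a temporary table and anti-join against it. Below is a minimal sketch, again with Python's `sqlite3` standing in for MySQL; the `processed` table name and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foos (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO foos (id, name) VALUES (?, ?)",
    [(i, f"item {i}") for i in range(1, 11)],
)

# Hypothetical set of already-processed IDs; in practice this could be very long.
processed_ids = [2, 4, 5, 6, 7]

# Load the IDs into a temporary table instead of inlining them in the SQL text.
conn.execute("CREATE TEMP TABLE processed (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO processed (id) VALUES (?)", [(i,) for i in processed_ids])

# Anti-join: keep only foos rows with no matching row in the temp table.
rows = conn.execute(
    """
    SELECT f.id, f.name
    FROM foos AS f
    WHERE NOT EXISTS (SELECT 1 FROM processed AS p WHERE p.id = f.id)
    ORDER BY f.id
    """
).fetchall()

remaining_ids = [r[0] for r in rows]
print(remaining_ids)  # [1, 3, 8, 9, 10]
```

The primary key on the temp table gives the lookup an index, and the query text stays the same size no matter how many IDs are excluded.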

Here is a wacky rewrite of that query that might perform a little better:

```sql
SELECT * FROM foos
LEFT JOIN (
    SELECT 2 id UNION
    SELECT 4 UNION
    SELECT 5 UNION
    SELECT 6 UNION
    SELECT 7
) NOT_IDS USING (id)
WHERE NOT_IDS.id IS NULL;
```
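If the list comes from application code, the `NOT_IDS` derived table can be generated rather than typed by hand. Here is a minimal sketch with Python's `sqlite3` (SQLite standing in for MySQL; the helper name `build_not_ids_query` is hypothetical). It is the same `LEFT JOIN ... IS NULL` anti-join as above, with each ID bound as a parameter. Note that the SQL text still grows with the list, so whether it beats `NOT IN` is something to measure, not assume.

```python
import sqlite3


def build_not_ids_query(ids):
    """Build the LEFT JOIN ... IS NULL anti-join with an inline UNION derived table.

    Hypothetical helper; assumes `ids` is non-empty. Values are bound, not pasted in.
    """
    selects = ["SELECT ? AS id"] + ["SELECT ?"] * (len(ids) - 1)
    not_ids = " UNION ".join(selects)
    sql = (
        "SELECT foos.id FROM foos "
        f"LEFT JOIN ({not_ids}) AS NOT_IDS USING (id) "
        "WHERE NOT_IDS.id IS NULL "
        "ORDER BY foos.id"
    )
    return sql, list(ids)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foos (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO foos (id) VALUES (?)", [(i,) for i in range(1, 11)])

sql, params = build_not_ids_query([2, 4, 5, 6, 7])
remaining = [row[0] for row in conn.execute(sql, params)]
print(remaining)  # [1, 3, 8, 9, 10]
```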

The NOT_IDS subquery does work as shown by the following:

```
mysql> SELECT * FROM
    -> (
    -> SELECT 2 id UNION
    -> SELECT 4 UNION
    -> SELECT 5 UNION
    -> SELECT 6 UNION
    -> SELECT 7
    -> ) NOT_IDS;
+----+
| id |
+----+
|  2 |
|  4 |
|  5 |
|  6 |
|  7 |
+----+
5 rows in set (0.00 sec)

mysql>
```

Just for fun, and given your update, I'm going to suggest a different strategy:

You could join across tables like so ...

```sql
-- cols stands for your actual column list
INSERT INTO db1.foos (cols)
SELECT src.cols
FROM db2.foos src
LEFT JOIN db1.foos dst ON src.pk = dst.pk
WHERE dst.pk IS NULL;
```

I'm not sure how the optimizer will handle this, or whether it will be faster than what you're doing now (I'd guess it depends on your indexing strategy).
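Here is a runnable sketch of that cross-database anti-join, using Python's `sqlite3` with two attached in-memory databases standing in for `db1` (processed items) and `db2` (read-only source). The column names `pk` and `name` and the sample rows are assumptions made for the demo; in MySQL you would simply qualify the tables with the real database names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two separate in-memory databases stand in for db1 (processed) and db2 (source).
conn.execute("ATTACH DATABASE ':memory:' AS db1")
conn.execute("ATTACH DATABASE ':memory:' AS db2")

for schema in ("db1", "db2"):
    conn.execute(f"CREATE TABLE {schema}.foos (pk INTEGER PRIMARY KEY, name TEXT)")

conn.executemany(
    "INSERT INTO db2.foos (pk, name) VALUES (?, ?)",
    [(i, f"item {i}") for i in range(1, 6)],
)
# Rows 2 and 4 have already been processed.
conn.executemany(
    "INSERT INTO db1.foos (pk, name) VALUES (?, ?)",
    [(2, "item 2"), (4, "item 4")],
)

# Copy every source row that has no match in the destination (anti-join on pk).
conn.execute(
    """
    INSERT INTO db1.foos (pk, name)
    SELECT src.pk, src.name
    FROM db2.foos AS src
    LEFT JOIN db1.foos AS dst ON src.pk = dst.pk
    WHERE dst.pk IS NULL
    """
)
conn.commit()

pks = [r[0] for r in conn.execute("SELECT pk FROM db1.foos ORDER BY pk")]
print(pks)  # [1, 2, 3, 4, 5]
```

Testing `dst.pk IS NULL` rather than some other destination column matters: a non-key column could legitimately be NULL on a matched row, while the primary key is only NULL when no match exists.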

Are the databases on the same server? If so, you can write a cross-database query with a LEFT JOIN and keep the rows where the joined side is NULL (for an example, see "Querying multiple databases at once"). Otherwise, you can write a stored procedure, pass the IDs in as a string, and split them inside it with a regular expression. I have a similar problem, but with an in-memory database and a Postgres database. Luckily my situation is (In...)