What are the most common SQL anti-patterns?

Sql Problem Overview

All of us who work with relational databases have learned (or are learning) that SQL is different. Eliciting the desired results, and doing so efficiently, involves a tedious process partly characterized by learning unfamiliar paradigms, and finding out that some of our most familiar programming patterns don't work here. What are the common antipatterns you've seen (or yourself committed)?

Sql Solutions

Solution 1 - Sql

I am consistently disappointed by most programmers' tendency to mix their UI-logic in the data access layer:

SELECT
    FirstName + ' ' + LastName as "Full Name",
    case UserRole
        when 2 then "Admin"
        when 1 then "Moderator"
        else "User"
    end as "User's Role",
    case SignedIn
        when 0 then "Logged in"
        else "Logged out"
    end as "User signed in?",
    Convert(varchar(100), LastSignOn, 101) as "Last Sign On",
    DateDiff('d', LastSignOn, getDate()) as "Days since last sign on",
    AddrLine1 + ' ' + AddrLine2 + ' ' + AddrLine3 + ' ' +
        City + ', ' + State + ' ' + Zip as "Address",
    'XXX-XX-' + Substring(
        Convert(varchar(9), SSN), 6, 4) as "Social Security #"
FROM Users

Normally, programmers do this because they intend to bind their dataset directly to a grid, and its just convenient to have SQL Server format server-side than format on the client.

Queries like the one shown above are extremely brittle because they tightly couple the data layer to the UI layer. On top of that, this style of programming thoroughly prevents stored procedures from being reusable.

Solution 2 - Sql

Here are my top 3.

Number 1. Failure to specify a field list. (Edit: to prevent confusion: this is a production code rule. It doesn't apply to one-off analysis scripts - unless I'm the author.)

SELECT *
Insert Into blah SELECT *

should be

SELECT fieldlist
Insert Into blah (fieldlist) SELECT fieldlist

Number 2. Using a cursor and while loop, when a while loop with a loop variable will do.

DECLARE @LoopVar int

SET @LoopVar = (SELECT MIN(TheKey) FROM TheTable)
WHILE @LoopVar is not null
BEGIN
  -- Do Stuff with current value of @LoopVar
  ...
  --Ok, done, now get the next value
  SET @LoopVar = (SELECT MIN(TheKey) FROM TheTable
    WHERE @LoopVar < TheKey)
END

Number 3. DateLogic through string types.

--Trim the time
Convert(Convert(theDate, varchar(10), 121), datetime)

Should be

--Trim the time
DateAdd(dd, DateDiff(dd, 0, theDate), 0)

I've seen a recent spike of "One query is better than two, amiright?"

SELECT *
FROM blah
WHERE (blah.Name = @name OR @name is null)
  AND (blah.Purpose = @Purpose OR @Purpose is null)

This query requires two or three different execution plans depending on the values of the parameters. Only one execution plan is generated and stuck into the cache for this SQL text. That plan will be used regardless of the value of the parameters. This results in intermittent poor performance. It is much better to write two queries (one query per intended execution plan).

Solution 3 - Sql

Human readable password fields, egad. Self explanatory.
Using LIKE against indexed columns, and I'm almost tempted to just say LIKE in general.
Recycling SQL-generated PK values.
Surprise nobody mentioned the god-table yet. Nothing says "organic" like 100 columns of bit flags, large strings and integers.
Then there's the "I miss .ini files" pattern: storing CSVs, pipe delimited strings or other parse required data in large text fields.
And for MS SQL server the use of cursors at all. There's a better way to do any given cursor task.

Edited because there's so many!

Solution 4 - Sql

Don't have to dig deep for it: Not using prepared statements.

Solution 5 - Sql

Using meaningless table aliases:

from employee t1,
department t2,
job t3,
...

Makes reading a large SQL statement so much harder than it needs to be

Solution 6 - Sql

var query = "select COUNT(*) from Users where UserName = '" 
            + tbUser.Text 
            + "' and Password = '" 
            + tbPassword.Text +"'";

Blindly trusting user input
Not using parameterized queries
Cleartext passwords

Solution 7 - Sql

My bugbears are the 450 column Access tables that have been put together by the 8 year old son of the Managing Director's best friends dog groomer and the dodgy lookup table that only exists because somebody doesn't know how to normalise a datastructure properly.

Typically, this lookup table looks like this:

ID INT,
Name NVARCHAR(132),
IntValue1 INT,
IntValue2 INT,
CharValue1 NVARCHAR(255),
CharValue2 NVARCHAR(255),
Date1 DATETIME,
Date2 DATETIME

I've lost count of the number of clients I've seen who have systems that rely on abominations like this.

Solution 8 - Sql

The ones that I dislike the most are

Using spaces when creating tables, sprocs etc. I'm fine with CamelCase or under_scores and singular or plurals and UPPERCASE or lowercase but having to refer to a table or column [with spaces], especially if [ it is oddly spaced] (yes, I've run into this) really irritates me.
Denormalized data. A table doesn't have to be perfectly normalized, but when I run into a table of employees that has information about their current evaluation score or their primary anything, it tells me that I will probably need to make a separate table at some point and then try to keep them synced. I will normalize the data first and then if I see a place where denormalization helps, I'll consider it.
Overuse of either views or cursors. Views have a purpose, but when each table is wrapped in a view it's too much. I've had to use cursors a few times, but generally you can use other mechanisms for this.
Access. Can a program be an anti-pattern? We have SQL Server at my work, but a number of people use access due to it's availabilty, "ease of use" and "friendliness" to non-technical users. There is too much here to go into, but if you've been in a similar environment, you know.

Solution 9 - Sql

use SP as the prefix of the store procedure name because it will first search in the System procedures location rather than the custom ones.

Solution 10 - Sql

Overuse of temporary tables and cursors.

Solution 11 - Sql

For storing time values, only UTC timezone should be used. Local time should not be used.

Solution 12 - Sql

using @@IDENTITY instead of SCOPE_IDENTITY()

Quoted from this answer :

@@IDENTITY returns the last identity value generated for any table in the current session, across all scopes. You need to be careful here, since it's across scopes. You could get a value from a trigger, instead of your current statement.
SCOPE_IDENTITY returns the last identity value generated for any table in the current session and the current scope. Generally what you want to use.
IDENT_CURRENT returns the last identity value generated for a specific table in any session and any scope. This lets you specify which table you want the value from, in case the two above aren't quite what you need (very rare). You could use this if you want to get the current IDENTITY value for a table that you have not inserted a record into.

Solution 13 - Sql

Re-using a 'dead' field for something it wasn't intended for (e.g. storing user data in a 'Fax' field) - very tempting as a quick fix though!

Solution 14 - Sql

select some_column, ...
from some_table
group by some_column

and assuming that the result will be sorted by some_column. I've seen this a bit with Sybase where the assumption holds (for now).

Solution 15 - Sql

SELECT FirstName + ' ' + LastName as "Full Name", case UserRole when 2 then "Admin" when 1 then "Moderator" else "User" end as "User's Role", case SignedIn when 0 then "Logged in" else "Logged out" end as "User signed in?", Convert(varchar(100), LastSignOn, 101) as "Last Sign On", DateDiff('d', LastSignOn, getDate()) as "Days since last sign on", AddrLine1 + ' ' + AddrLine2 + ' ' + AddrLine3 + ' ' + City + ', ' + State + ' ' + Zip as "Address", 'XXX-XX-' + Substring(Convert(varchar(9), SSN), 6, 4) as "Social Security #" FROM Users

Or, cramming everything into one line.

Solution 16 - Sql

The FROM TableA, TableB WHERE syntax for JOINS rather than FROM TableA INNER JOIN TableB ON
Making assumptions that a query will be returned sorted a certain way without putting an ORDER BY clause in, just because that was the way it showed up during testing in the query tool.

Solution 17 - Sql

Learning SQL in the first six months of their career and never learning anything else for the next 10 years. In particular not learning or effectively using windowing/analytical SQL features. In particular the use of over() and partition by.

> Window functions, like aggregate > functions, perform an aggregation on a > defined set (a group) of rows, but > rather than returning one value per > group, window functions can return > multiple values for each group.

See O'Reilly SQL Cookbook Appendix A for a nice overview of windowing functions.

Solution 18 - Sql

I need to put my own current favorite here, just to make the list complete. My favorite antipattern is not testing your queries.

This applies when:

Your query involves more than one table.
You think you have an optimal design for a query, but don't bother to test your assumptions.
You accept the first query that works, with no clue about whether it's even close to optimized.

And any tests run against atypical or insufficient data don't count. If it's a stored procedure, put the test statement into a comment and save it, with the results. Otherwise, put it into a comment in the code with the results.

Solution 19 - Sql

Temporary Table abuse.

Specifically this sort of thing:

SELECT personid, firstname, lastname, age
INTO #tmpPeople
FROM People
WHERE lastname like 's%'

DELETE FROM #tmpPeople
WHERE firstname = 'John'

DELETE FROM #tmpPeople
WHERE firstname = 'Jon'

DELETE FROM #tmpPeople
WHERE age > 35

UPDATE People
SET firstname = 'Fred'
WHERE personid IN (SELECT personid from #tmpPeople)

Don't build a temporary table from a query, only to delete the rows you don't need.

And yes, I have seen pages of code in this form in production DBs.

Solution 20 - Sql

Contrarian view: over-obsession with normalization.

Most SQL/RBDBs systems give one lots of features (transactions, replication) that are quite useful, even with unnormalized data. Disk space is cheap, and sometimes it can be simpler (easier code, faster development time) to manipulate / filter / search fetched data, than it is to write up 1NF schema, and deal with all the hassles therein (complex joins, nasty subselects, etc).

I have found the over-normalized systems are often premature optimization, especially during early development stages.

(more thoughts on it... http://writeonly.wordpress.com/2008/12/05/simple-object-db-using-json-and-python-sqlite/)

Solution 21 - Sql

I just put this one together, based on some of the SQL responses here on SO.

It is a serious antipattern to think that triggers are to databases as event handlers are to OOP. There's this perception that just any old logic can be put into triggers, to be fired off when a transaction (event) happens on a table.

Not true. One of the big differences are that triggers are synchronous - with a vengeance, because they are synchronous on a set operation, not on a row operation. On the OOP side, exactly the opposite - events are an efficient way to implement asynchronous transactions.

Solution 22 - Sql

Stored Procedures or Functions without any comments...

Solution 23 - Sql

I don't know it's an "official" anti-pattern, but I dislike and try to avoid string literals as magic values in a database column.

An example from MediaWiki's table 'image':

img_media_type ENUM("UNKNOWN", "BITMAP", "DRAWING", "AUDIO", "VIDEO", 
    "MULTIMEDIA", "OFFICE", "TEXT", "EXECUTABLE", "ARCHIVE") default NULL,
img_major_mime ENUM("unknown", "application", "audio", "image", "text", 
    "video", "message", "model", "multipart") NOT NULL default "unknown",

(I just notice different casing, another thing to avoid)

I design such cases as int lookups into tables ImageMediaType and ImageMajorMime with int primary keys.

date/string conversion that relies on specific NLS settings

CONVERT(NVARCHAR, GETDATE())

without format identifier

Solution 24 - Sql

Identical subqueries in a query.

Solution 25 - Sql

The Altered View - A view that is altered too often and without notice or reason. The change will either be noticed at the most inappropriate time or worse be wrong and never noticed. Maybe your application will break because someone thought of a better name for that column. As a rule views should extend the usefulness of base tables while maintaining a contract with consumers. Fix problems but don't add features or worse change behavior, for that create a new view. To mitigate do not share views with other projects and, use CTEs when platforms allow. If your shop has a DBA you probably can't change views but all your views will be outdated and or useless in that case.
The !Paramed - Can a query have more than one purpose? Probably but the next person who reads it won't know until deep meditation. Even if you don't need them right now chances are you will, even if it's "just" to debug. Adding parameters lowers maintenance time and keep things DRY. If you have a where clause you should have parameters.

The case for no CASE -

SELECT  
CASE @problem  
  WHEN 'Need to replace column A with this medium to large collection of strings hanging out in my code.'  
    THEN 'Create a table for lookup and add to your from clause.'  
  WHEN 'Scrubbing values in the result set based on some business rules.'  
    THEN 'Fix the data in the database'  
  WHEN 'Formating dates or numbers.'   
    THEN 'Apply formating in the presentation layer.'  
  WHEN 'Createing a cross tab'  
    THEN 'Good, but in reporting you should probably be using cross tab, matrix or pivot templates'   
ELSE 'You probably found another case for no CASE but now I have to edit my code instead of enriching the data...' END

Solution 26 - Sql

The two I find the most, and can have a significant cost in terms of performance are:

Using cursors instead of a set based expression. I guess this one occurs frequently when the programmer is thinking procedurely.
Using correlated sub-queries, when a join to a derived table can do the job.

Solution 27 - Sql

Putting stuff in temporary tables, especially people who switch from SQL Server to Oracle have a habit of overusing temporary tables. Just use nested select statements.

Solution 28 - Sql

Developers who write queries without having a good idea about what makes SQL applications (both individual queries and multi-user systems) fast or slow. This includes ignorance about:

physical I/O minimization strategies, given that most queries' bottleneck is I/O not CPU
perf impact of different kinds of physical storage access (e.g. lots of sequential I/O will be faster than lots of small random I/O, although less so if your physical storage is an SSD!)
how to hand-tune a query if the DBMS produces a poor query plan
how to diagnose poor database performance, how to "debug" a slow query, and how to read a query plan (or EXPLAIN, depending on your DBMS of choice)
locking strategies to optimize throughput and avoid deadlocks in multi-user applications
importance of batching and other tricks to handle processing of data sets
table and index design to best balance space and performance (e.g. covering indexes, keeping indexes small where possible, reducing data types to minimum size needed, etc.)

Solution 29 - Sql

Using primary keys as a surrogate for record addresses and using foreign keys as a surrogate for pointers embedded in records.

Solution 30 - Sql

Using SQL as a glorified ISAM (Indexed Sequential Access Method) package. In particular, nesting cursors instead of combining SQL statements into a single, albeit larger, statement. This also counts as 'abuse of the optimizer' since in fact there isn't much the optimizer can do. This can be combined with non-prepared statements for maximum inefficiency:

DECLARE c1 CURSOR FOR SELECT Col1, Col2, Col3 FROM Table1

FOREACH c1 INTO a.col1, a.col2, a.col3
    DECLARE c2 CURSOR FOR
        SELECT Item1, Item2, Item3
            FROM Table2
            WHERE Table2.Item1 = a.col2
    FOREACH c2 INTO b.item1, b.item2, b.item3
        ...process data from records a and b...
    END FOREACH
END FOREACH

The correct solution (almost always) is to combine the two SELECT statements into one:

DECLARE c1 CURSOR FOR
    SELECT Col1, Col2, Col3, Item1, Item2, Item3
        FROM Table1, Table2
        WHERE Table2.Item1 = Table1.Col2
        -- ORDER BY Table1.Col1, Table2.Item1

FOREACH c1 INTO a.col1, a.col2, a.col3, b.item1, b.item2, b.item3
    ...process data from records a and b...
END FOREACH

The only advantage to the double loop version is that you can easily spot the breaks between values in Table1 because the inner loop ends. This can be a factor in control-break reports.

Also, sorting in the application is usually a no-no.

Solution 31 - Sql

I just came across view definition like this:

CREATE OR REPLACE FORCE VIEW PRICE (PART_NUMBER, PRICE_LIST, LIST_VERSION ...)
AS
  SELECT sp.MKT_PART_NUMBER,
    sp.PRICE_LIST,
    sp.LIST_VERSION,
    sp.MIN_PRICE,
    sp.UNIT_PRICE,
    sp.MAX_PRICE,
...

There are 50 or so columns in the view. Some developers take a small pride torturing others by not providing column aliases, so one have to count column offset in both places in order to be able to figure out what column in a view corresponds to.

Solution 32 - Sql

I have seen too many people holding on for dear life to IN (...) while totally oblivious to EXISTS. For a good example, see Symfony Propel ORM.

Solution 33 - Sql

Application Joins Not solely an SQL issue, but looking for descriptions of the problem and finding this question, I was surprised it wasn't listed.

As I've heard the phrase used, an application join, is when you pull a set of rows out of each of two or more tables and then join them in your (Java) code with a pair of nested for loops. This burdens the system (your app and the database) with having to identify the whole cross product, retrieving it and sending it to the appication. Assuming the app can filter the cross product down as fast as the database (dubious), just cutting the result set down sooner means less data transfer.

Solution 34 - Sql

Joining redundant tables into a query like this:

select emp.empno, dept.deptno
from emp
join dept on dept.deptno = emp.deptno;

Solution 35 - Sql

Having 1 table

code_1
value_1
code_2
value_2
...
code_10
value_10

Instead of having 3 tables

code, value and code_value

You never know when you may have need more than 10 couples code, value.

You don't waste disk space if you only need one couple.

Solution 36 - Sql

re: using @@IDENTITY instead of SCOPE_IDENTITY()

you should use neither; use output instead

cf. https://connect.microsoft.com/SQLServer/feedback/details/328811/scope-identity-sometimes-returns-incorrect-value

Solution 37 - Sql

My favourite SQL anti-patterns:

JOIN on non-unique columns and using SELECT DISTINCT to trim the result.

Creating a view joining many tables just to select few columns from one table.

 CREATE VIEW my_view AS 
     SELECT * FROM table1
     JOIN table2 ON (...)
     JOIN table3 ON (...);

 SELECT col1, col2 FROM my_view WHERE col3 = 123;

Solution 38 - Sql

Maybe not an anti pattern but it annoys me is when DBA's of certain DB's (ok I'm talking about Oracle here) write SQL Server code using Oracle style and code conventions and complain when it runs so bad. Enough with the cursors Oracle people! SQL is meant to be set based.

Solution 39 - Sql

Not using the With clause or a proper join and relying on sub-queries.

Anti-Pattern:

select 
 ...
from data
where RECORD.STATE IN (
          SELECT STATEID
            FROM STATE
           WHERE NAME IN
                    ('Published to test',
                     'Approved for public',
                     'Published to public',
                     'Archived'
                    ))

Better:
I like using the with clause to make my intent more readable.

with valid_states as (
          SELECT STATEID
            FROM STATE
           WHERE NAME IN
                    ('Published to test',
                     'Approved for public',
                     'Published to public',
                     'Archived'
                    )
select  ... from data, valid_states
where data.state = valid_states.state

Best:

select 
  ... 
from data join states using (state)
where 
states.state in  ('Published to test',
                     'Approved for public',
                     'Published to public',
                     'Archived'
                    )

Content Type	Original Author	Original Content on Stackoverflow
Question	dkretz	View Question on Stackoverflow
Solution 1 - Sql	Juliet	View Answer on Stackoverflow
Solution 2 - Sql	Amy B	View Answer on Stackoverflow
Solution 3 - Sql	annakata	View Answer on Stackoverflow
Solution 4 - Sql	stesch	View Answer on Stackoverflow
Solution 5 - Sql	Tony Andrews	View Answer on Stackoverflow
Solution 6 - Sql	user1228	View Answer on Stackoverflow
Solution 7 - Sql	Pete OHanlon	View Answer on Stackoverflow
Solution 8 - Sql	Jamal Hansen	View Answer on Stackoverflow
Solution 9 - Sql	Oscar Cabrero	View Answer on Stackoverflow
Solution 10 - Sql	Rockcoder	View Answer on Stackoverflow
Solution 11 - Sql	Frank Schwieterman	View Answer on Stackoverflow
Solution 12 - Sql	Brann	View Answer on Stackoverflow
Solution 13 - Sql	FruitBreak	View Answer on Stackoverflow
Solution 14 - Sql	Adrian Pronk	View Answer on Stackoverflow
Solution 15 - Sql	Jasper Bekkers	View Answer on Stackoverflow
Solution 16 - Sql	Joel Coehoorn	View Answer on Stackoverflow
Solution 17 - Sql	Brian	View Answer on Stackoverflow
Solution 18 - Sql	dkretz	View Answer on Stackoverflow
Solution 19 - Sql	geofftnz	View Answer on Stackoverflow
Solution 20 - Sql	Gregg Lind	View Answer on Stackoverflow
Solution 21 - Sql	dkretz	View Answer on Stackoverflow
Solution 22 - Sql	Bliek	View Answer on Stackoverflow
Solution 23 - Sql	devio	View Answer on Stackoverflow
Solution 24 - Sql	EvilTeach	View Answer on Stackoverflow
Solution 25 - Sql	jason saldo	View Answer on Stackoverflow
Solution 26 - Sql	Mitch Wheat	View Answer on Stackoverflow
Solution 27 - Sql	tuinstoel	View Answer on Stackoverflow
Solution 28 - Sql	Justin Grant	View Answer on Stackoverflow
Solution 29 - Sql	Walter Mitty	View Answer on Stackoverflow
Solution 30 - Sql	Jonathan Leffler	View Answer on Stackoverflow
Solution 31 - Sql	Tegiri Nenashi	View Answer on Stackoverflow
Solution 32 - Sql	sayap	View Answer on Stackoverflow
Solution 33 - Sql	Chris	View Answer on Stackoverflow
Solution 34 - Sql	Tony Andrews	View Answer on Stackoverflow
Solution 35 - Sql	Luc M	View Answer on Stackoverflow
Solution 36 - Sql	Denis Valeev	View Answer on Stackoverflow
Solution 37 - Sql	Iwo Banas	View Answer on Stackoverflow
Solution 38 - Sql	Craig	View Answer on Stackoverflow
Solution 39 - Sql	Brian	View Answer on Stackoverflow