What is the reason not to use select *?

Sql

Sql Problem Overview


I've seen a number of people claim that you should specifically name each column you want in your select query.

Assuming I'm going to use all of the columns anyway, why would I not use SELECT *?

Even considering the question SQL query - Select * from view or Select col1, col2, … colN from view, I don't think this is an exact duplicate as I'm approaching the issue from a slightly different perspective.

One of our principles is to not optimize before it's time. With that in mind, it seems like using SELECT * should be the preferred method until it is proven to be a resource issue or the schema is pretty much set in stone. Which, as we know, won't occur until development is completely done.

That said, is there an overriding issue to not use SELECT *?

Sql Solutions


Solution 1 - Sql

The essence of the quote of not prematurely optimizing is to go for simple and straightforward code and then use a profiler to point out the hot spots, which you can then optimize to be efficient.

When you use select * you're make it impossible to profile, therefore you're not writing clear & straightforward code and you are going against the spirit of the quote. select * is an anti-pattern.


So selecting columns is not a premature optimization. A few things off the top of my head ....

  1. If you specify columns in a SQL statement, the SQL execution engine will error if that column is removed from the table and the query is executed.
  2. You can more easily scan code where that column is being used.
  3. You should always write queries to bring back the least amount of information.
  4. As others mention if you use ordinal column access you should never use select *
  5. If your SQL statement joins tables, select * gives you all columns from all tables in the join

The corollary is that using select * ...

  1. The columns used by the application is opaque
  2. DBA's and their query profilers are unable to help your application's poor performance
  3. The code is more brittle when changes occur
  4. Your database and network are suffering because they are bringing back too much data (I/O)
  5. Database engine optimizations are minimal as you're bringing back all data regardless (logical).

Writing correct SQL is just as easy as writing Select *. So the real lazy person writes proper SQL because they don't want to revisit the code and try to remember what they were doing when they did it. They don't want to explain to the DBA's about every bit of code. They don't want to explain to their clients why the application runs like a dog.

Solution 2 - Sql

If your code depends on the columns being in a specific order, your code will break when there are changes to the table. Also, you may be fetching too much from the table when you select *, especially if there is a binary field in the table.

Just because you are using all the columns now, it doesn't mean someone else isn't going to add an extra column to the table.

It also adds overhead to the plan execution caching since it has to fetch the meta data about the table to know what columns are in *.

Solution 3 - Sql

One major reason is that if you ever add/remove columns from your table, any query/procedure that is making a SELECT * call will now be getting more or less columns of data than expected.

Solution 4 - Sql

  1. In a roundabout way you are breaking the modularity rule about using strict typing wherever possible. Explicit is almost universally better.

  2. Even if you now need every column in the table, more could be added later which will be pulled down every time you run the query and could hurt performance. It hurts performance because

    • You are pulling more data over the wire; and
    • Because you might defeat the optimizer's ability to pull the data right out of the index (for queries on columns that are all part of an index.) rather than doing a lookup in the table itself

When TO use select *

When you explicitly NEED every column in the table, as opposed to needing every column in the table THAT EXISTED AT THE TIME YOU WROTE THE QUERY. For example, if were writing an DB management app that needed to display the entire contents of the table (whatever they happened to be) you might use that approach.

Solution 5 - Sql

There are a few reasons:

  1. If the number of columns in a database changes and your application expects there to be a certain number...
  2. If the order of columns in a database changes and your application expects them to be in a certain order...
  3. Memory overhead. 8 unnecessary INTEGER columns would add 32 bytes of wasted memory. That doesn't sound like a lot, but this is for each query and INTEGER is one of the small column types... the extra columns are more likely to be VARCHAR or TEXT columns, which adds up quicker.
  4. Network overhead. Related to memory overhead: if I issue 30,000 queries and have 8 unnecessary INTEGER columns, I've wasted 960kB of bandwidth. VARCHAR and TEXT columns are likely to be considerably larger.

Note: I chose INTEGER in the above example because they have a fixed size of 4 bytes.

Solution 6 - Sql

If your application gets data with SELECT * and the table structure in the database is changed (say a column is removed), your application will fail in every place that you reference the missing field. If you instead include all the columns in your query, you application will break in the (hopefully) one place where you initially get the data, making the fix easier.

That being said, there are a number of situations in which SELECT * is desirable. One is a situation that I encounter all the time, where I need to replicate an entire table into another database (like SQL Server to DB2, for example). Another is an application written to display tables generically (i.e. without any knowledge of any particular table).

Solution 7 - Sql

I actually noticed a strange behaviour when I used select * in views in SQL Server 2005.

Run the following query and you will see what I mean.

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[starTest]') AND type in (N'U'))
DROP TABLE [dbo].[starTest]
CREATE TABLE [dbo].[starTest](
	[id] [int] IDENTITY(1,1) NOT NULL,
	[A] [varchar](50) NULL,
	[B] [varchar](50) NULL,
	[C] [varchar](50) NULL
) ON [PRIMARY]

GO

insert into dbo.starTest
select 'a1','b1','c1'
union all select 'a2','b2','c2'
union all select 'a3','b3','c3'

go
IF  EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'[dbo].[vStartest]'))
DROP VIEW [dbo].[vStartest]
go
create view dbo.vStartest as
select * from dbo.starTest
go

go
IF  EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'[dbo].[vExplicittest]'))
DROP VIEW [dbo].[vExplicittest]
go
create view dbo.[vExplicittest] as
select a,b,c from dbo.starTest
go


select a,b,c from dbo.vStartest
select a,b,c from dbo.vExplicitTest

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[starTest]') AND type in (N'U'))
DROP TABLE [dbo].[starTest]
CREATE TABLE [dbo].[starTest](
	[id] [int] IDENTITY(1,1) NOT NULL,
	[A] [varchar](50) NULL,
	[B] [varchar](50) NULL,
	[D] [varchar](50) NULL,
	[C] [varchar](50) NULL
) ON [PRIMARY]

GO

insert into dbo.starTest
select 'a1','b1','d1','c1'
union all select 'a2','b2','d2','c2'
union all select 'a3','b3','d3','c3'

select a,b,c from dbo.vStartest
select a,b,c from dbo.vExplicittest

Compare the results of last 2 select statements. I believe what you will see is a result of **Select *** referencing columns by index instead of name.

If you rebuild the view it will work fine again.

EDIT

I have added a separate question, “select * from table” vs “select colA, colB, etc. from table” interesting behaviour in SQL Server 2005 to look into that behaviour in more details.

Solution 8 - Sql

You might join two tables and use column A from the second table. If you later add column A to the first table (with same name but possibly different meaning) you'll most likely get the values from the first table and not the second one as earlier. That won't happen if you explicitly specify the columns you want to select.

Of course specifying the columns also sometimes causes bugs if you forget to add the new columns to every select clause. If the new column is not needed every time the query is executed, it may take some time before the bug gets noticed.

Solution 9 - Sql

I understand where you're going regarding premature optimization, but that really only goes to a point. The intent is to avoid unnecessary optimization in the beginning. Are your tables unindexed? Would you use nvarchar(4000) to store a zip code?

As others have pointed out, there are other positives to specifying each column you intend to use in the query (such as maintainability).

Solution 10 - Sql

When you're specifying columns, you're also tying yourself into a specific set of columns and making yourself less flexible, making Feuerstein roll over in, well, whereever he is. Just a thought.

Solution 11 - Sql

**SELECT *** is not always evil. In my opinion, at least. I use it quite often for dynamic queries returning a whole table, plus some computed fields.

For instance, I want to compute geographical geometries from a "normal" table, that is a table without any geometry field, but with fields containing coordinates. I use postgresql, and its spatial extension postgis. But the principle applies for many other cases.

An example:

  • a table of places, with coordinates stored in fields labeled x, y, z:

    CREATE TABLE places (place_id integer, x numeric(10, 3), y numeric(10, 3), z numeric(10, 3), description varchar);

  • let's feed it with a few example values:

    INSERT INTO places (place_id, x, y, z, description) VALUES
    (1, 2.295, 48.863, 64, 'Paris, Place de l'Étoile'),
    (2, 2.945, 48.858, 40, 'Paris, Tour Eiffel'),
    (3, 0.373, 43.958, 90, 'Condom, Cathédrale St-Pierre');

  • I want to be able to map the contents of this table, using some GIS client. The normal way is to add a geometry field to the table, and build the geometry, based on the coordinates. But I would prefer to get a dynamic query: this way, when I change coordinates (corrections, more accuracy, etc.), the objects mapped actually move, dynamically. So here is the query with the **SELECT ***:

    CREATE OR REPLACE VIEW places_points AS
    SELECT *,
    GeomFromewkt('SRID=4326; POINT ('|| x || ' ' || y || ' ' || z || ')')
    FROM places;

Refer to postgis, for GeomFromewkt() function use.

  • Here is the result:

    SELECT * FROM places_points;

place_id |   x   |   y    |   z    |         description          |                            geomfromewkt
----------+-------+--------+--------+------------------------------+--------------------------------------------------------------------
1 | 2.295 | 48.863 | 64.000 | Paris, Place de l'Étoile | 01010000A0E61000005C8FC2F5285C02405839B4C8766E48400000000000005040
2 | 2.945 | 48.858 | 40.000 | Paris, Tour Eiffel | 01010000A0E61000008FC2F5285C8F0740E7FBA9F1D26D48400000000000004440 3 | 0.373 | 43.958 | 90.000 | Condom, Cathédrale St-Pierre | 01010000A0E6100000AC1C5A643BDFD73FB4C876BE9FFA45400000000000805640 (3 lignes)

The rightmost column can now be used by any GIS program to properly map the points.

  • If, in the future, some fields get added to the table: no worries, I just have to run again the same VIEW definition.

I wish the definition of the VIEW could be kept "as is", with the *, but hélas it is not the case: this is how it is internally stored by postgresql:

SELECT places.place_id, places.x, places.y, places.z, places.description, geomfromewkt(((((('SRID=4326; POINT ('::text || places.x) || ' '::text) || places.y) || ' '::text) || places.z) || ')'::text) AS geomfromewkt FROM places;

Solution 12 - Sql

Even if you use every column but address the row array by numeric index you will have problems if you add another row later on.

So basically it is a question of maintainability! If you don't use the * selector you will not have to worry about your queries.

Solution 13 - Sql

Selecting only the columns you need keeps the dataset in memory smaller and therefor keeps your application faster.

Also, a lot of tools (e.g. stored procedures) cache query execution plans too. If you later add or remove a column (particularly easy if you're selecting off a view), the tool will often error when it doesn't get back results that it expects.

Solution 14 - Sql

It's ok when you're doing exists(select * ...) since it never gets expanded. Otherwise it's really only useful when exploring tables with temporary select statments or if you had a CTE defined above and you want every column without typing them all out again.

Solution 15 - Sql

It makes your code more ambiguous and more difficult to maintain; because you're adding extra unused data to the domain, and it's not clear which you've intended and which not. (It also suggests that you might not know, or care.)

Solution 16 - Sql

To answer you question directly: Do not use "SELECT *" when it makes your code more fragle to changes to the underlying tables. Your code should break only when a change is made to the table that directly affects requirments of your program.

Your application should take advantage of the abstraction layer that Relational access provides.

Solution 17 - Sql

I don't use SELECT * simply because it is nice to see and know what fields I am retrieving.

Solution 18 - Sql

Generally bad to use 'select *' inside of views because you will be forced to recompile the view in the event of a table column change. Changing the underlying table columns of a view you will get an error for non-existant columns until you go back and recompile.

Solution 19 - Sql

Just to add one thing that no one else has mentioned. Select * returns all the columns, someone may add a column later that you don't necessarily want the users to be able to see such as who last updated the data or a timestamp or notes that only managers should see not all users, etc.

Further, when adding a column, the impact on existing code should be reviewed and considered to see if changes are needed based on what information is stored in the column. By using select *, that review will often be skipped because the developer will assume that nothing will break. And in fact nothing may explicitly appear to break but queries may now start returning the wrong thing. Just because nothing explicitly breaks, doesn't mean that there should not have been changes to the queries.

Solution 20 - Sql

because "select * " will waste memory when you don't need all the fields.But for sql server, their performence are the same.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionNotMeView Question on Stackoverflow
Solution 1 - SqlRobert PaulsonView Answer on Stackoverflow
Solution 2 - SqlBobView Answer on Stackoverflow
Solution 3 - SqlahockleyView Answer on Stackoverflow
Solution 4 - SqlJohnFxView Answer on Stackoverflow
Solution 5 - SqlPowerlordView Answer on Stackoverflow
Solution 6 - SqlMusiGenesisView Answer on Stackoverflow
Solution 7 - SqlkristofView Answer on Stackoverflow
Solution 8 - SqlKaniuView Answer on Stackoverflow
Solution 9 - SqlJim B-GView Answer on Stackoverflow
Solution 10 - SqlorbfishView Answer on Stackoverflow
Solution 11 - SqlPierreView Answer on Stackoverflow
Solution 12 - SqlmarkusView Answer on Stackoverflow
Solution 13 - SqlSoldarnalView Answer on Stackoverflow
Solution 14 - SqldotjoeView Answer on Stackoverflow
Solution 15 - SqldkretzView Answer on Stackoverflow
Solution 16 - SqlMetroView Answer on Stackoverflow
Solution 17 - SqllkesslerView Answer on Stackoverflow
Solution 18 - SqlChristopher KleinView Answer on Stackoverflow
Solution 19 - SqlHLGEMView Answer on Stackoverflow
Solution 20 - SqlFloatFishView Answer on Stackoverflow