Doesn't Linq to SQL miss the point? Aren't ORM-mappers (SubSonic, etc.) sub-optimal solutions?

SqlLinq to-SqlLinq to-Entities

Sql Problem Overview


I'd like the community's take on some thoughts I've had about Linq to Sql and other ORM mappers.

I like Linq to Sql and the idea of expressing data access logic (or CRUD operations in general) in your native development tongue rather than having to deal with the "impedance mismatch" between C# and SQL. For example, to return an ObjectDataSource-compatible list of Event instances for a business layer, we use:

return db.Events.Select(c => new EventData() { EventID = c.EventID, Title = c.Title })

If I were to implement this using old SQL-to-C# constructs, I'd have to create a Command class, add the EventID parameter (using a string to describe the "@EventID" argument), add the SQL query string to the Command class, execute the command, and then use (cast-type)nwReader["FieldName"] to pull each returned field value and assign it to a member of a newly created instance of my EventData class (yuck).

So, that is why people like Linq/SubSonic/etc. and I agree.

However, in the bigger picture I see a number of things that are wrong. My sense is that Microsoft also sees something wrong and that is why they are killing Linq to SQL and trying to move people to Linq to Entities. Only, I think that Microsoft is doubling-down on a bad bet.

So, what is wrong?

The problem is that there are architecture astronauts, especially at Microsoft, who look at Linq to Sql and realize that it is not a true data management tool: there are still many things you cannot do easily of comfortably in C# and they aim to fix it. You see this manifested in the ambitions behind Linq to Entities, blog posts about the revolutionary nature of Linq and even the LinqPad challenge.

And the problem with that is that it assumes that SQL is the problem. That is, in order to reduce a mild discomfort (impedance mismatch between SQL and C#), Microsoft has proposed the equivalent of a space suit (full isolation) when a band-aid (Linq to SQL or something similar) would do just fine.

As far as I can see, developers are quite smart enough to master the relational model and then apply it intelligently in their development efforts. In fact, I would go one further and say that Linq to SQL, SubSonic, etc. are already too complex: the learning curve isn't that much different from mastering SQL itself. Since, for the foreseeable future, developers must master SQL and the relational model, we're now faced with learning two query / CRUD languages. Worse yet, Linq is often difficult to test (you don't have a query window), removes us one layer from the real work we are doing (it generates SQL), and has very clumsy support (at best) for SQL constructs like Date handling (e.g. DateDiff), "Having" and even "Group By".

What is the alternative? Personally, I don't need a different model for data access like Linq to Entities. I'd prefer to simply pop up a window in Visual Studio, enter and validate my SQL, and then press a button to generate or supplement a C# class to encapsulate the call. Since you already know SQL, wouldn't you prefer to just enter something like this:

Select EventID, Title From Events Where Location=@Location

and end up with an EventData class that A) contains the EventID and Title fields as properties and B) has a factory method that takes a 'Location' string as an argument and that generates a List<EventData>? You'd have to think carefully about the object model (the above example obviously doesn't deal with that) but the fundamental approach of still using SQL while eliminating the impedance mismatch appeals to me a great deal.

The question is: am I wrong? Should Microsoft rewrite the SQL infrastructure so that you don't have to learn SQL / relational data management any more? Can they rewrite the SQL infrastructure in this way? Or do you think that a very thin layer on top of SQL to eliminate the pain of setting up parameters and accessing data fields is quite sufficient?

Update I wanted to promote two links to the top because I think that they capture important aspects of what I am after. First, CodeMonkey points out an article entitled "The Vietnam of Computer Science." It takes a while to get started but is a very interesting read. Second, AnSGri points to one of Joel Spolsky's more prominent pieces: The Law of Leaky Abstractions. It isn't exactly on topic but it is close and is a great read.

Update 2: I've given the "answer" to ocdecio although there are many great answers here and the choice of the "right" answer is purely subjective. In this case, his answer squared with what I think is truly the best practice given the current state of technology. This is an area that I fully expect to evolve, however, so things may well change. I'd like to thank everyone who contributed, I've upvoted everyone who I think gave a thoughtful answer.

Sql Solutions


Solution 1 - Sql

Let me preface this by saying that I am a dyed-in-the-wool database guy.

As a gross over-generalization: Developers don't know SQL. Developers don't really want to know SQL. They can write it, they can design tables, but it makes them feel icky. They tend to do stupid things when the necessary query is more than a simple join. Not because the developers are stupid -- because they can't be bothered. They like living in a world where they only have to deal with one concept space; moving from objects to tables and back is a context switch the price for which they don't like paying.

This doesn't mean they are bad, or wrong; it means there is an opportunity for improvement. If your customers (in this case, developers using your framework) don't like SQL and tables -- give them a layer of abstraction that lets them get away without dealing with the underlying mess.

It's the same logic that makes garbage collection / automated memory management a big hit. Yes, developers can deal with it; yes, they can write code that is better optimized without it; but not having to deal with it makes them happier and more productive.

Solution 2 - Sql

I think the popularity of ORMs has been spawned by developers developing data layers and writing the same CRUD code over and over again application after application. ORMs are just another tool/technology that lets developers spend less time writing the same SQL statements over and over and concentrate on the logic of the application instead (hopefully).

Solution 3 - Sql

For at least 6 years I have been using my own ORM that is based on a very simple concept: projection. Each table is projected into a class, and SQL is generated on the fly based on the class definition. It still requires me to know SQL but it takes care of the 90% simple CRUD, and I never had to manage connections, etc - and it works for the major DB vendors.

I'm happy with what I have and didn't find anything worth dropping it for.

Solution 4 - Sql

IMHO, OR/M is not only about 'abstracting the SQL away' or hiding the SQL, or enabling multi-DBMS support.

It enables you to put more focus on your problem domain, since you have to spent less time writing the boring CRUD SQL queries. On the other hand, if you are using a good OR/M, this OR/M should enable you to write SQL queries if this seems to be necessary.

An OR/M can be a powerful tool if you use it properly; it can take care of lazy loading, polymorphic queries / associatons ...
Don't get me wrong; there's nothing wrong with plain SQL, but, if you have to take care yourself of translating your (well thought and normalized) relational model to an expressive OO/domain model, then I think you're spending way to much time doing plumbing.

Using an OR/M also does not mean that you -as a developer- should have no knowledge of SQL. The contrary is true imho.
Knowing SQL and knowing how to write an efficient SQL query, will -imho- enable you to use an OR/M properly.

I must also admit that I'm writing this with NHibernate in mind. This is the OR/M that I'm using atm, and I haven't used Linq to SQL or Linq to entities (yet).

Solution 5 - Sql

Linq's design and the linq to entities framework certainly has uses as an orm tool, but the big idea is that it will be used as a common api to query ANY data source, not just RDBMS's.

I remember reading that linq, while obviously designed with SQL in mind, is meant to be a query language for any data store. You can write linq queries for SQL, but you can also theoretically write linq queries that target ldap, filesystem's, exchange, web services, ad infinitum. You can't use SQL for those programs.

You also need to learn a different API for almost every data store. Linq gives everyone a common target to create a data access API.

Whether this vision works or not remains to be seen, but that is the idea. I think as we want systems to inter-operate more and more we may find some very nice uses for l2e.

I'll add some references if I can find them again.

http://laribee.com/blog/2007/03/17/linq-to-entities-vs-nhibernate/

Solution 6 - Sql

I agree 100%. A lot of this is a result of the fact that procedural coding skills and SQL coding skills are very different; and this fact is not widely acknowledged. So, until that realization hits, programmers search for ways to make their skillset transferable from one domain to the other.

It doesn't work.

You simply must learn how to phase-shift: learn how to think about your code differently depending on which domain you are addressing.

Has anyone else noticed how much more complex and verbose a query becomes when it's mapped from SQL to LINQ? Like, in all the online examples?

Solution 7 - Sql

You should stop worrying and learn to love the ORM. Abstractions such as these will help us focus our skills and make advances in the field.

There is still plenty of room to take advantage of the functional skills you have acquired and apply them in the application layer. This is in fact one of the strengths of LINQ to SQL over other ORM's.

I can only agree with many of the other comments. The time you save, you can focus on refining your domain model and make a better application. And, once you've pinpointed the bottleneck, use to create optimized SQL.

What might not be immediately obvious is that the ORM comes with a number of features that are really nice. The identity map that helps avoid loading items over and over, lazy loading helps you express the domain with less plumbing and the unit of work helps you track changes and optimize database writes.

Solution 8 - Sql

As Dmitriy pointed out, developers don't know SQL. More precisely, the majority know SQL, but don't understand it and definitely don't like, so they tend to search for the magic bullet, creating the demand for things like Linq to make the illusion (hm, abstraction) that they don't use anything different than their beloved classes.

That's very bad, as the law of leaky abstractions always holds true.

Some ORM solutions are definite good (e.g. JPA/Hibernate), not because using them you don't have to worry about SQL. In fact, to use JPA effectively you need very deep understanding of the DB in general, querying abilities in particular. The good point is that they make the machine do the boring work, to the point where it autogenerates entire database from scratch.

Linq to SQL, as I think, doesn't solve real problem. It's kind of other presentation, nothing more. It might be good, though it overcomplicates the already complex language. On the other hand, Linq to Objects is very interesting concept, because it's kind of sql-querying the collections.

Solution 9 - Sql

Historical perspective.

When Codd et. al. originally were working out the theory and implementation of relational databases, one entirely separate issue was "How do we query these things"? A number of strategies and syntaxes were proposed, with various strengths and weaknesses. One of those candidates was SQL (in its earliest form, obviously.) Another was QBE (Query By Example). Another was called "quel", I believe; and there were several others. SQL certainly didn't become dominant because it was acclaimed as superior to all others. Unfornately, though, the others have pretty much disappeared, to the poverty of us all (because they could be used simultaneously on the same data.)

If Microsoft has a good story that they are reviving one of these other languages, or have invented a worthy addition, then I think we would be well-advised to listen up. But so far all I've seen is yet another "One Ring To Rule Them All".

There's a hell of a lot of thought and rigor behind SQL, and a lot of time-proven durability. Microsoft has a certain history of believing that their admittedly top-grade development organization can out-think the rest of us, including our collective institutional memories. It doesn't seem often to work that way. As long as we're bonded to relational data stores, we should think twice about superset abstraction paradigms that move us away from the bare metal with promises of equal or better performance.

Solution 10 - Sql

Being the author of an ORM project myself, I have to admit that my response to a question like this is apt to be a bit biased. I've already developed some of my own thoughts about the answer prior to reading the question, so I'm already somewhat anchored.

I will say that my reason for developing an ORM wasn't because of the "impedance mismatch" between SQL and imperative programming, but rather solely for the purpose of becoming database-platform agnostic. The former issue of having to write more code to manage persistence is a small hurdle that's easily resolved if you only work with one database vendor. Becoming database platform agnostic is a much more challenging problem and imo has a much larger impact on the profitability of your business assuming that like me you plan to sell software to other people (and are not just using it in house).

When I started working on my ORM tools several years ago, the concept was impractical in my preferred language, most people I spoke to didn't understand why I was working on it and some well respected voices in the community had as much as written articles in trade magazines stating that what I had already done was not only impossible but also undesirable. Some of the same reasons were given - it's too complex, it's limiting and it adds overhead. Today the same community has at least three popular database abstraction tools (although there is some debate about the definition of the term ORM). The reason why I mention this is because when I started working on these tools, the original objections carried a lot more weight than they do now. The underlying technology both hardware and software has changed in the intervening years to make these tools much more practical in the long run. My tendency is to try and take a long-view of software and work on solutions that are maybe not quite practical yet but that will become practical soon. So given that I wouldn't count out LINQ to Entities as a good solution for certain problems.

I also tend to prefer more options rather than less. So while I may support the idea behind developing LINQ to Entities, I'm less apt to support killing off LINQ to SQL merelyu because LINQ to Entities has become available. Objects are great for doing what objects do, there's no question about that... In my (again biased) opinion, a problem occurs in that people see any given tool or software paradigm as a "magic bullet" and want to insist that everything must be that way. It's well known that a relational database is very good at handling certain other tasks, reporting being a good example. So in my opinion, it's kind of shooting yourself in the foot to insist that everything must be entities, because then you're forcing yourself to use an inefficient tool for the job. So with regard to reporting in particular, getting rid of LINQ to SQL and using only LINQ to Entities at least on the surface sounds to me like the http://en.wikipedia.org/wiki/Abstraction_inversion">abstraction inversion anti pattern.

So I guess the synopsis for my answer is this: use a hammer if your problem is a nail - use a screwdriver if your problem is a screw.

Solution 11 - Sql

Dmitry's statement that Developer's don't like SQL may have lot of truth but the solution isn't only ORM. The solution is to hire as part of the development team a Development DBA. In my company my .net development team has an excellent Oracle DBA who does absolutely no production dba work. His role in our team is data modelling, physical db design, creating stored procs, data analysis etc. He is the reason our db side is so clean and performing. All our DB access is via stored procs.

What is a development DBA ? http://www.simple-talk.com/sql/database-administration/what-use-is-a-development-dba/

Solution 12 - Sql

I do both database and distributed application programming (web and compiled) and feel like taking the time to develop stored-procedure based data access layers is time well spent. Personally, I prefer to do data modeling and identify the needed procs early in the development process... seems to help uncover design/interface logic/structure issues.

I'm not a big fan of inline database programming (whether the sql code is hand or machine generated). I believe that the database is the foundation of one's application and that taking the time it to hand-code stored procs is worthwhile.

That said, I am intrigued by the OODBMS concept and hope that someday I'll get to work on some in a production environment.

Solution 13 - Sql

If you want a database to perform as it scales, it has to be designed and normalized according to database relational model best practices.

If you let the Object Model and the ORM dictate your data model, you will just end up with a poor data model which is not normalized and/or contains artifacts from the object model.

1 table = 1 class is a bad idea.

To start with you never, ever, have classes which represent many-to-many or many-to-one link tables. These correspond to child collections in the object model - the links themselves are not objects.

If you treat your database as tables to simply hold your objects in rows (one common approach) and give the application direct access to tables, you are giving up all the benefits of defining an interface layer of database services that the database can use to protect its perimeter.

ORMs have their place, but if you use them to simply persist your objects as designed in your object model, your database will not be a relational database, and it will not be able to be used as one for the purposes of ETL, reporting, refactoring, etc.

Solution 14 - Sql

I don't like any of the current solutions - but i prefer more choices over less choices ;-)

i used an OODBMS in a project long ago that was the closest to getting it right (unfortunatley the product had some heinous bugs and terrible tech support) - object persistence was largely invisible, handled behind the scenes (via preprocessor code-generator) and controlled by very simple Update, Delete, and Find methods and simple containers.

I'd love to see this (and perhaps some object-entity model does this well already) again, with OCL (Object Constraint Language) as the native multi-object query language

Solution 15 - Sql

I think the logic behind these things is that the overhead of building and running an SQL statement in the framework layer is insignificant compared to overhead in other parts of the system (e.g., a HTTP request round trip is orders of magnitude longer).

The advantages - fast development, queries that fit in with the language rather than being escaped strings, etc, often outweigh the disadvantages. If performance is an issue, then you can optimise later.

I don't think that "not needing to know SQL" is a factor. Any decent developer will need to know SQL at some point in their development. The idea of a database abstraction layer is to remove the effort of generating boilerplate code for database queries.

Indeed in our system in Java, I created a code generator utility that took annotated classes and generated a full CRUD API, and handling all those quirky things you pick up over time. Far easier to create your model and annotate it than to slog through writing a DAO. Scale such a tool up and you end up with LINQ or Hibernate or myriad other ORM DB solutions.

Solution 16 - Sql

There was another question here that asked about ORMs in general. Check that out for some more discussion of whether or not the impedance mismatch is as big a deal as all that.

Solution 17 - Sql

I think the real solution that they needed was something more like SQL literal. VB.Net 9.0 supports XML Literals, which allow you to write XML right in your code, and ensure that it's validated and meets the DTD. A similarly nice feature would be SQL literals that allow you to write inline SQL code in your .Net code, and have it validated by the IDE. There would need to be some sort of plugin architecture to verifying the information against the database, but that could be easily written for the popular database engines. This would provide what I think to be the real solution, to the problem they were trying to solve, without resorting to sub-optimal solutions.

Solution 18 - Sql

Codemonkey makes a very good point: stored procedures offer a level of abstraction that fulfills some of the promise of the more complex ORM models. This isn't intuitive at first glance but I can think of an example right from my own code. I have a "check-in" sproc that touches nearly a dozen tables (who does this ID belong to? do they have a membership? Are there any messages? Do they get points for this check-in? etc.).

Modeling this in C# - no matter how sophisticated the data model - would never be a good idea as it would necessitate many trips "over the wire" to check on data and update various tables. While it is true that you can handle sprocs in Linq or other ORMS, all I need is a wrapper class that allows me to call the sproc with standard C# parameters. In fact, this was such a pressing need for me that I wrote a code generator in T-SQL to create such wrappers.

Solution 19 - Sql

I have to agree that the outburst of ORM tools largely stems from the annoyance of writing SQL, dealing with whatever DB driver you have to use, internally abstracting between DBs (Oracle vs SQL Server, vs whatever for code reusability), and transforming data types.

Ideal solution? definately not! However, I think this is an iteration in the process of better merging the application with the data store. After all, in most cases, having a DB without an accessing application written in whatever language is practically useless. (would you really ever just install oracle, and expect all employees to work in SQLPlus directly?)

As such, I think the data store and the application are just moving together in a way that the application developer can more easilly interact with the data store.

For .NET in particular, what I'd much rather see instead of Linq is a built in way to map a class definition to an underlying data source to an underlying data store, and have all the plumbing just work.

In other words I could just say "Employee.Name = 'rally25rs';" and the object's property would change, and underneath that it would persist to the data store itself. Just ditch SQL completely instead of going 1/2 way like Linq to SQL does now.

Of course, a solution like that brings up issues with performance, transaction handling, etc.

Maybe the entire concept of programming language and relational database need to be rethought and realigned? At their core, they are completely separate and disjointed entities. Maybe its time to just ditch the "relational sql aware database" and build the next thing, whee executing a line of code would directly operate on the database.

Solution 20 - Sql

there´s no problem with linq, but with those who use it and ignore what happens "behind the scenes"

I still prefer to get my hands dirty with SQL. At least i´ll know exatcly whats happening.

Solution 21 - Sql

Ted Neward wrote a great essay on his take on this subject - that ORM's are the "Vietnam" of computer science...

http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

Solution 22 - Sql

Most people have missed an essential point: in most cases, you are significantly more productive when querying in LINQ than in SQL. I've written an article on why this is so.

When I set the LINQPad Challenge, I wasn't joking: I do nearly all of my ad-hoc querying in LINQ because most queries can be written more quickly and reliably in LINQ than in SQL. I've also designed and worked on large business applications using LINQ to SQL and seen a major gains in productivity. This is not "architecture astronaut" stuff - LINQ to SQL is a productive and practical technology that drives this very site.

The biggest hindrance with LINQ is failing to properly learn it. I've seen so many LINQ queries that are horrible transliterations of SQL queries to back this up. If you write LINQ queries using only your knowledge of SQL, the end result can only be the same - or worse - than SQL.

Solution 23 - Sql

I just discovered this question. I guess it's pretty much played out by now, but I'm going to throw in my two cents anyway...

I only want to write code for the things that aren't obvious.

CRUD code is obvious. I don't want to write it. Therefore, ORMs are a good idea.

This doesn't mean that ORMs don't have problems, but the problems are with execution, not intent. As ORMs mature, the problems will diminish, and the productivity gains already available for simple scenarios will eventually extend to complex scenarios as well.

LINQ is also a good idea. Others have mentioned a number of the advantages, but all I have to do is think about the first time I tried to do a pivot in LINQ where I didn't know the number of columns in advance. Or the first time I realized I didn't have to create a new DataView every time I wanted to sort or filter something. LINQ empowers me to do everything I want to do with data in C#, rather than having to figure out how divide up the work between SQL and C#.

So, yes, ORMs, LINQ, and other emerging technologies are suboptimal solutions, but they don't miss the point, and they won't be suboptimal forever.

Solution 24 - Sql

I wanted to write this as a reply to @SquareCog reply here, but it told me I had -1836 characters left. S.O. noob here so apologies if I've done this wrong.


In the 18th century gentleman of leisure used to study science. At that time science in its entirety was not such a large subject that a reasonably intelligent person couldn't understand it all. By which I mean a single learned fellow could understand the entirety of scientific thinking of the time.

As time has gone by hundreds of new fields of science have been discovered and each one researched to the point where these days very few people can even understand the entirety of a single complete field of science.

So it is with programming.

These days the programming language field is large enough and growing fast enough that it is as much as can be reasonably be expected of a developer to know the entirety of his own specialised languages(s). For a skilled coder to also be expected to understand the entirety of the database field too, including database design, the nuances of native SQL and how it operates on different databases and the administration of those databases too, is possibly asking a bit much.

I think some of the responders here are glossing over some of the complexities of developing a large performant enterprise level database, knowing a 'handful of SQL statements' most certainly does not cut it. Which is why most large software houses have dedicated teams of database developers. As Stroustrup says, 'Divide and conquer' is the only way to effectively deal with the complexity inherent in developing and managing large software projects.

Developers don't dislike working with SQL because they are lazy or because it makes them feel 'icky'. They dislike working with SQL because they are smart. They know better than anyone that only someone who specialises in SQL will deliver the highest quality database functionality and that for them to be put into the position of being 'jack of all trades' developers is a suboptimal development strategy.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionMark BrittinghamView Question on Stackoverflow
Solution 1 - SqlSquareCogView Answer on Stackoverflow
Solution 2 - SqlJim AndersonView Answer on Stackoverflow
Solution 3 - SqlOtávio DécioView Answer on Stackoverflow
Solution 4 - SqlFrederik GheyselsView Answer on Stackoverflow
Solution 5 - SqlTrevor AbellView Answer on Stackoverflow
Solution 6 - SqldkretzView Answer on Stackoverflow
Solution 7 - SqlCristian LibardoView Answer on Stackoverflow
Solution 8 - SqlansgriView Answer on Stackoverflow
Solution 9 - SqldkretzView Answer on Stackoverflow
Solution 10 - SqlIsaac DealeyView Answer on Stackoverflow
Solution 11 - SqlkanadView Answer on Stackoverflow
Solution 12 - SqlcodemonkeyView Answer on Stackoverflow
Solution 13 - SqlCade RouxView Answer on Stackoverflow
Solution 14 - SqlSteven A. LoweView Answer on Stackoverflow
Solution 15 - SqlJeeBeeView Answer on Stackoverflow
Solution 16 - SqlHamish SmithView Answer on Stackoverflow
Solution 17 - SqlKibbeeView Answer on Stackoverflow
Solution 18 - SqlMark BrittinghamView Answer on Stackoverflow
Solution 19 - SqlCodingWithSpikeView Answer on Stackoverflow
Solution 20 - SqlDonOctavioDelFloresView Answer on Stackoverflow
Solution 21 - SqlfuzzboneView Answer on Stackoverflow
Solution 22 - SqlJoe AlbahariView Answer on Stackoverflow
Solution 23 - SqldevuxerView Answer on Stackoverflow
Solution 24 - SqlNeutrinoView Answer on Stackoverflow