SQLAlchemy + SQL Injection

PythonSecuritySqlalchemy

Python Problem Overview


What are the best practices for mitigating SQL injection attacks when using SQLAlchemy?

Python Solutions


Solution 1 - Python

tldr: Avoid raw SQL as much as possible.

The accepted answer is lazy and incorrect. The filter method accepts raw SQL, and if used in that way, is fully susceptible to SQL injection attacks. For instance, if you were to accept a value from a url and combine it with raw sql in the filter, you are open to attack:

session.query(MyClass).filter("foo={}".format(getArgs['val']))

using the above code and the below url, you would be injecting SQL in to your filter statement. The code above would return all rows in your database.

URL encoded:

https://example.com/?val=2%20or%201%20=%201

Easier to understand (URL decoded):

https://example.com/?val=2 or 1 = 1

Solution 2 - Python

> If you have any "special" characters (such as semicolons or > apostrophes) in your data, they will > be automatically quoted for you by the > SQLEngine object, so you don't have to > worry about quoting. This also means > that unless you deliberately bypass > SQLAlchemy's quoting mechanisms, > SQL-injection attacks are basically > impossible.

[per http://www.rmunn.com/sqlalchemy-tutorial/tutorial.html]

Solution 3 - Python

To add to @Tendrid answer. I did a little investigation using quiet naive approach. filter method has *criterion as its argument, several other ORM Query methods have similar argument.

In case of filter method *criterion argument ends up passed into _literal_as_text, which in case of string - marks it as safe sql (please correct me if I'm wrong). Therefore it makes it unsafe.

Here is outcome of ORM Query class method investigation with *criterion argument:

filter   - uses _literal_as_text (NOT SAFE)
having   - uses _literal_as_text (NOT SAFE)

distinct - uses _literal_as_label_reference (NOT SAFE)
group_by - uses _literal_as_label_reference (NOT SAFE)
order_by - uses _literal_as_label_reference (NOT SAFE)

join     - uses model attributes to resolve relation (SAFE)

Examples of possible method missuses (to keep it simple, string formatting is skipped):

db.session.query(User.login).group_by('login').having('count(id) > 4; select name from roles').all()
db.session.query(User.login).distinct('name) name from roles /*').order_by('*/').all()
db.session.query(User.login).order_by('users_login; select name from roles').all()
db.session.query(User.login).group_by('login union select name from roles').all()

Note that these methods are only unsafe if string literal is passed.

Solution 4 - Python

I'm inclined to agree with @Tendrid's answer.

If you write this:

session.query(MyClass).filter("foo={}".format(getArgs['val']))

... you are creating an injection vulnerability.

SqlAlchemy's approach is to use bound parameters to avoid these injection attacks. The way you're meant to use filter() is to write:

session.query(MyClass).filter(MyClass.foo == getArgs['va'])

As SqlAlchemy has overloaded python's operators like == to escape the SQL correctly (and avoid injection). See here

There is a warning about this buried in the documentation of SqlAlchemy here that says:

> Always use bound parameters > > As mentioned at the beginning of this section, textual SQL is not the > usual way we work with SQLAlchemy. However, when using textual SQL, a > Python literal value, even non-strings like integers or dates, should > never be stringified into SQL string directly; a parameter should > always be used. This is most famously known as how to avoid SQL > injection attacks when the data is untrusted. However it also allows > the SQLAlchemy dialects and/or DBAPI to correctly handle the incoming > input for the backend. Outside of plain textual SQL use cases, > SQLAlchemy’s Core Expression API otherwise ensures that Python literal > values are passed as bound parameters where appropriate.

And there's a section in the glossary on bound parameters here

It says:

> Bound parameters are the primary means in which data is passed to the > DBAPI database driver. While the operation to be invoked is based on > the SQL statement string, the data values themselves are passed > separately, where the driver contains logic that will safely process > these strings and pass them to the backend database server, which may > either involve formatting the parameters into the SQL string itself, > or passing them to the database using separate protocols. > > The specific system by which the database driver does this should not > matter to the caller; the point is that on the outside, data should > always be passed separately and not as part of the SQL string itself. > This is integral both to having adequate security against SQL > injections as well as allowing the driver to have the best > performance.

Basically that means this:

session.query(MyClass).filter("foo={}".format(getArgs['val']))

... is broken because you're passing the data to the filter() together with the SQL statement foo=<data>.

You're meant to always keep the statement and data separate, i.e.:

session.query(MyClass).filter(MyClass.foo == getArgs['va'])

or

session.query(MyClass).filter_by(foo=getArgs['va'])

As then SqlAlchemy can work it's magic and do the escaping with bound parameters.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionMikeView Question on Stackoverflow
Solution 1 - PythonTendridView Answer on Stackoverflow
Solution 2 - PythonAndreas JungView Answer on Stackoverflow
Solution 3 - PythonaisbaaView Answer on Stackoverflow
Solution 4 - PythonDonalView Answer on Stackoverflow