sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings

PythonUnicodeSqliteZlib

Python Problem Overview


Using SQLite3 in Python, I am trying to store a compressed version of a snippet of UTF-8 HTML code.

Code looks like this:

...
c = connection.cursor()
c.execute('create table blah (cid integer primary key,html blob)')
...
c.execute('insert or ignore into blah values (?, ?)',(cid, zlib.compress(html)))

At which point at get the error:

sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

If I use 'text' rather than 'blob' and don't compress the HTML snippet, it works all fine (db is to large though). When I use 'blob' and compress via Python zlib library, I get the above error message. I looked around but couldn't find a simple answer for this one.

Python Solutions


Solution 1 - Python

If you want to use 8-bit strings instead of unicode string in sqlite3, set approptiate text_factory for sqlite connection:

connection = sqlite3.connect(...)
connection.text_factory = str

Solution 2 - Python

Found the solution, I should have spent just a little more time searching.

Solution is to 'cast' the value as a Python 'buffer', like so:

c.execute('insert or ignore into blah values (?, ?)',(cid, buffer(zlib.compress(html))))

Hopefully this will help somebody else.

Solution 3 - Python

In order to work with the BLOB type, you must first convert your zlib compressed string into binary data - otherwise sqlite will try to process it as a text string. This is done with sqlite3.Binary(). For example:

c.execute('insert or ignore into blah values (?, ?)',(cid, 
sqlite3.Binary(zlib.compress(html))))

Solution 4 - Python

Syntax:

5 types of possible storage: NULL, INTEGER, TEXT, REAL and BLOB

BLOB is generally used to store pickled models or dill pickled models

> cur.execute('''INSERT INTO Tablename(Col1, Col2, Col3, Col4) VALUES(?,?,?,?)''', 
                                      [TextValue, Real_Value, Buffer(model), sqlite3.Binary(model2)])
> conn.commit()

> # Read Data:
> df = pd.read_sql('SELECT * FROM Model, con=conn) 
> model1 = str(df['Col3'].values[0]))
> model2 = str(df['Col'].values[0]))

Solution 5 - Python

You could store the value using repr(html) instead of the raw output and then use eval(html) when retrieving the value for use.

c.execute('insert or ignore into blah values (?, ?)',(1, repr(zlib.compress(html))))

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionR. HillView Question on Stackoverflow
Solution 1 - PythonzagView Answer on Stackoverflow
Solution 2 - PythonR. HillView Answer on Stackoverflow
Solution 3 - PythonMarioVilasView Answer on Stackoverflow
Solution 4 - PythonPranzellView Answer on Stackoverflow
Solution 5 - PythonzwalkerView Answer on Stackoverflow