How to stack multiple LSTMs in Keras?

Tensorflow · Deep Learning · Keras · Lstm · Keras Layer

Tensorflow Problem Overview


I am using the deep-learning library Keras and trying to stack multiple LSTM layers, with no luck. Below is my code:

model = Sequential()
model.add(LSTM(100,input_shape =(time_steps,vector_size)))
model.add(LSTM(100))

The above code raises an error on the third line: Exception: Input 0 is incompatible with layer lstm_28: expected ndim=3, found ndim=2

The input X is a tensor of shape (100, 250, 50). I am running Keras with the TensorFlow backend.

Tensorflow Solutions


Solution 1 - Tensorflow

You need to add return_sequences=True to the first layer so that its output tensor has ndim=3, i.e. shape (batch_size, timesteps, hidden_size).

Please see the following example:

from keras.models import Sequential
from keras.layers import LSTM, Dense

# expected input data shape: (batch_size, timesteps, data_dim)
model = Sequential()
model.add(LSTM(32, return_sequences=True,
               input_shape=(timesteps, data_dim)))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32, return_sequences=True))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32))  # return a single vector of dimension 32
model.add(Dense(10, activation='softmax'))

From: https://keras.io/getting-started/sequential-model-guide/ (search for "stacked lstm")
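
To see the shape difference concretely, here is a quick check (a sketch using tf.keras; the batch size, sequence length, and layer sizes are illustrative, not from the answer):

```python
import numpy as np
from tensorflow.keras.layers import LSTM

# Dummy batch: 4 samples, 10 timesteps, 8 features per step.
x = np.zeros((4, 10, 8), dtype="float32")

seq_out = LSTM(32, return_sequences=True)(x)  # output at every timestep
last_out = LSTM(32)(x)                        # output at last timestep only

print(seq_out.shape)   # (4, 10, 32) -- 3D, a valid input for another LSTM
print(last_out.shape)  # (4, 32)     -- 2D, triggers the ndim error above
```

The 3D output preserves the time axis, which is exactly what the next LSTM layer needs to iterate over.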

Solution 2 - Tensorflow

A detailed explanation of @DanielAdiwardana's answer: we need to set return_sequences=True on every LSTM layer except the last one.

Setting this flag to True tells Keras that the LSTM output should contain the output for every time step (a 3D tensor), so the next LSTM layer can process the full sequence.

If this flag is False, the LSTM returns only its last output (a 2D tensor), which is not a valid input for another LSTM layer.

from keras.models import Sequential
from keras.layers import LSTM, Dense

# expected input data shape: (batch_size, timesteps, data_dim)
model = Sequential()
model.add(LSTM(32, return_sequences=True,
               input_shape=(timesteps, data_dim)))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32, return_sequences=True))  # returns a sequence of vectors of dimension 32
model.add(LSTM(32))  # return a single vector of dimension 32
model.add(Dense(10, activation='softmax'))

On a side note: the last Dense layer is added to get the output into the format the user needs. Here Dense(10) produces a one-hot-style output for a classification task with 10 classes; it generalises to n neurons for a classification task with n classes.

If you are using LSTMs for regression (or time series forecasting), you may use Dense(1) instead, so that only one numeric value is output.
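
As a sketch of that regression variant (using tf.keras; the hidden sizes are illustrative, and the timesteps/feature counts are taken from the question's (100, 250, 50) input):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

timesteps, data_dim = 250, 50  # per-sample shape from the question

reg = Sequential()
reg.add(LSTM(32, return_sequences=True, input_shape=(timesteps, data_dim)))
reg.add(LSTM(32))   # last LSTM: return_sequences defaults to False
reg.add(Dense(1))   # single numeric output for regression
reg.compile(optimizer="adam", loss="mse")

# One prediction per sample:
print(reg.predict(np.zeros((2, timesteps, data_dim), dtype="float32"),
                  verbose=0).shape)  # (2, 1)
```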

Solution 3 - Tensorflow

Example code like this should work:

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

regressor = Sequential()

regressor.add(LSTM(units=50, return_sequences=True, input_shape=(33, 1)))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units=50))
regressor.add(Dropout(0.2))

regressor.add(Dense(units=1))

regressor.compile(optimizer='adam', loss='mean_squared_error')

regressor.fit(X_train, y_train, epochs=10, batch_size=4096)
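
Note that X_train and y_train are not defined in the answer above; a minimal dummy-data sketch that exercises the same architecture (with epochs and batch size shrunk for speed, and written against tf.keras):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

# Dummy data matching input_shape=(33, 1): 64 samples, 33 timesteps, 1 feature.
X_train = np.random.rand(64, 33, 1).astype("float32")
y_train = np.random.rand(64, 1).astype("float32")

regressor = Sequential()
regressor.add(LSTM(50, return_sequences=True, input_shape=(33, 1)))
regressor.add(Dropout(0.2))
regressor.add(LSTM(50))  # last LSTM returns a 2D tensor for the Dense head
regressor.add(Dropout(0.2))
regressor.add(Dense(1))
regressor.compile(optimizer="adam", loss="mean_squared_error")

regressor.fit(X_train, y_train, epochs=1, batch_size=32, verbose=0)
print(regressor.predict(X_train[:2], verbose=0).shape)  # (2, 1)
```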

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type              Original Author       Original Content on Stackoverflow
Question                  Tamim Addari          View Question on Stackoverflow
Solution 1 - Tensorflow   Daniel De Freitas     View Answer on Stackoverflow
Solution 2 - Tensorflow   shantanu pathak       View Answer on Stackoverflow
Solution 3 - Tensorflow   Elvin Aghammadzada    View Answer on Stackoverflow