Predict fails on a model containing the Attention layer. #20429
Hi @HGS-mbayer - When using an Attention layer together with Conv2D, here are some steps to consider for model creation.
Here is a model combining an Attention layer with a convolutional network.
Attached gist for reference here. |
Your example works fine for me; however, I have models that were trained with TensorFlow 2.9.1 (tf-keras 2.6.0 for that version of TensorFlow, if I'm not mistaken). They worked fine until I upgraded to Keras 3.6.0 with the torch backend. I went back and tried (I think) every version of Keras 3.0+, and they all fail with the following code snippet. Here is another example that is more closely related to my architecture:
import keras
import numpy as np
def create_model(dims: tuple[int, int, int], num_classes: int):
width, height, bands = dims
ip = (width, height, bands)
inputs = keras.layers.Input(shape=ip)
conv2d_1 = keras.layers.Conv2D(8, (3, 3), activation='relu')(inputs)
maxpool = keras.layers.MaxPooling2D(pool_size=(2, 2))(conv2d_1)
att_out = keras.layers.Attention()([maxpool, maxpool])
conv2d_2 = keras.layers.Conv2D(8, (3, 3), activation='relu')(att_out)
output2 = keras.layers.Conv2D(num_classes, (1, 1), activation='softmax')(conv2d_2)
model = keras.Model(inputs=inputs, outputs=output2)
return model
model = create_model((464, 464, 4), 2)
data = np.random.randn(1, *model.input_shape[1:])
test = model.predict(data)
Using TensorFlow 2.15.0 (the final version of TensorFlow to use Keras 2), the same code snippet succeeds. If I upgrade to the latest TensorFlow, which also brings in Keras 3, it fails as before.
import numpy as np
import tensorflow as tf
def create_model(dims: tuple[int, int, int], num_classes: int):
width, height, bands = dims
ip = (width, height, bands)
inputs = tf.keras.layers.Input(shape=ip)
conv2d_1 = tf.keras.layers.Conv2D(8, (3, 3), activation='relu')(inputs)
maxpool = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(conv2d_1)
att_out = tf.keras.layers.Attention()([maxpool, maxpool])
conv2d_2 = tf.keras.layers.Conv2D(8, (3, 3), activation='relu')(att_out)
output2 = tf.keras.layers.Conv2D(num_classes, (1, 1), activation='softmax')(
conv2d_2
)
model = tf.keras.Model(inputs=inputs, outputs=output2)
return model
model = create_model((464, 464, 4), 2)
data = np.random.randn(1, *model.input_shape[1:])
test = model.predict(data)
I guess what I'm trying to get to the bottom of is this:
Thanks again for your assistance! |
The root of the error is that the shape of the tensor produced by the convolutional and pooling layers does not match what the Attention layer expects. The Attention layer expects 3D inputs of shape (batch_size, sequence_length, feature_dim), while Conv2D and MaxPooling2D produce 4D outputs of shape (batch_size, height, width, channels). Without a Reshape to convert the pooled output into that 3D format, a dimension mismatch error occurs. The fix is to reshape the MaxPooling output: a Reshape layer transforms the 4D output of the pooling layer into a 3D tensor of shape (batch_size, sequence_length, feature_dim) before it is fed to Attention. |
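To make the reshape fix concrete, here is a minimal sketch of the model with Reshape layers inserted around the Attention layer. This is an illustrative adaptation of the snippet posted earlier in the thread, not the exact code from the linked gist; the flattening of (height, width) into a single sequence dimension is one straightforward choice.

```python
import keras
import numpy as np


def create_model(dims: tuple[int, int, int], num_classes: int):
    width, height, bands = dims
    inputs = keras.layers.Input(shape=(width, height, bands))
    x = keras.layers.Conv2D(8, (3, 3), activation="relu")(inputs)
    x = keras.layers.MaxPooling2D(pool_size=(2, 2))(x)

    # Attention expects 3D input: flatten the spatial dims so that
    # (batch, H, W, C) becomes (batch, H*W, C).
    h, w, c = x.shape[1], x.shape[2], x.shape[3]
    seq = keras.layers.Reshape((h * w, c))(x)
    att = keras.layers.Attention()([seq, seq])

    # Restore the spatial layout so further Conv2D layers can be applied.
    x = keras.layers.Reshape((h, w, c))(att)
    x = keras.layers.Conv2D(8, (3, 3), activation="relu")(x)
    outputs = keras.layers.Conv2D(num_classes, (1, 1), activation="softmax")(x)
    return keras.Model(inputs=inputs, outputs=outputs)


model = create_model((16, 16, 4), 2)
data = np.random.randn(1, 16, 16, 4).astype("float32")
pred = model.predict(data, verbose=0)
```

With a 16x16x4 input, the first Conv2D yields 14x14, pooling yields 7x7, so the sequence length passed to Attention is 49; the two trailing Conv2D layers then reduce the output to 5x5 with num_classes channels.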
Thanks @divyashreepathihalli for the explanation.
This is the convolution-with-Attention model architecture, created just as you described. I'll create a tutorial example with a convolutional network and an Attention layer. |
Thanks for taking the time on this issue. The more I think about this, the more I can't help but believe a regression has occurred.
|
Running predict on a model containing an Attention layer causes a RuntimeError due to a dimension issue.
Example Code
Here is a dummy model to reproduce the issue.
Training also appears to fail:
Traceback