We have a way of applying the matrix, but the accuracy is very bad, so we must be doing something wrong. We've modified the forward() method of the Detect class as shown below. Concatenation and reconstruction work fine as long as the matrix is not applied; the results only go wrong once we add the matrix multiplication, which confirms that the problem lies in applying the matrix, not in the concatenation and reconstruction.
# Load the matrix, which is torch.Size([4, 3]), made for 4 classes with 3 dimensions.
prototypes = torch.from_numpy(np.load('prototypes4.npy')).float()


class Detect(nn.Module):
    def forward(self, x):
        """Concatenates and returns predicted bounding boxes and class probabilities."""
        for i in range(self.nl):
            x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
        if self.training:  # Training path
            shape = x[0].shape
            # concatenating the three heads together, using the view function to shape the heads
            # from [batch, bbox and cls data, width, height] to [batch, bbox and cls data, width * height]
            x_cat = torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2)
            # splitting the full tensor so we obtain two tensors of shape [batch, bbox, width * height] and [batch, cls, width * height]
            box, cls = x_cat.split((self.reg_max * 4, self.nc), 1)
            # removing the class probability of one class (so we can recalculate this probability later with the matrix)
            # shape: [batch, num classes - 1, width * height]
            cls = cls[:, :3, :]
            # applying the matrix, gaining a [batch, num classes, width * height] tensor
            cls = torch.stack([torch.mm(batch_mat.t().to('mps'), prototypes.t().to('mps')).t().to('mps') for batch_mat in cls])
            # reshaping the "x_cat" tensor
            new_x_cat = torch.cat([box.to('mps'), cls], 1)
            # before the actual training, a couple of test runs are done; here we can't apply and use our matrix normally (not needed either)
            try:
                # reshaping the "x" list of tensors, using a split and view command
                x = list(new_x_cat.split((6400, 1600, 400), dim=2))
                x[0] = x[0].view(shape[0], self.no, 80, -1)
                x[1] = x[1].view(shape[0], self.no, 40, -1)
                x[2] = x[2].view(shape[0], self.no, 20, -1)
            except Exception as e:
                print(e)
            return x

        # Inference path
        shape = x[0].shape  # BCHW
        x_cat = torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2)
        if self.dynamic or self.shape != shape:
            self.anchors, self.strides = (x.transpose(0, 1) for x in make_anchors(x, self.stride, 0.5))
            self.shape = shape

        if self.export and self.format in ("saved_model", "pb", "tflite", "edgetpu", "tfjs"):  # avoid TF FlexSplitV ops
            box = x_cat[:, : self.reg_max * 4]
            cls = x_cat[:, self.reg_max * 4 :]
        else:
            box, cls = x_cat.split((self.reg_max * 4, self.nc), 1)
        dbox = self.decode_bboxes(box)

        if self.export and self.format in ("tflite", "edgetpu"):
            # Precompute normalization factor to increase numerical stability
            # See https://github.com/ultralytics/ultralytics/issues/7371
            img_h = shape[2]
            img_w = shape[3]
            img_size = torch.tensor([img_w, img_h, img_w, img_h], device=box.device).reshape(1, 4, 1)
            norm = self.strides / (self.stride[0] * img_size)
            dbox = dist2bbox(self.dfl(box) * norm, self.anchors.unsqueeze(0) * norm[:, :2], xywh=True, dim=1)

        # keeping only the first three class channels, as in the training path
        cls = cls[:, :3, :]
        # Application of the matrix
        cls = torch.stack([torch.mm(batch_mat.t().to('mps'), prototypes.t().to('mps')).t().to('mps') for batch_mat in cls])
        y = torch.cat((dbox, cls.sigmoid()), 1)
        return y if self.export else (y, x)
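The per-sample loop over torch.mm in the code above can probably be written as one broadcasted matrix product, which also avoids hard-coding the 'mps' device. The sketch below only checks that the two forms agree on random stand-in tensors; it is not a fix for the accuracy problem. The helper name apply_prototypes and the random values are hypothetical, and only the shapes (a [4, 3] matrix and a [batch, 3, anchors] class tensor) are taken from the post.

```python
import torch


def apply_prototypes(cls: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    """Map [batch, dims, anchors] to [batch, classes, anchors] (hypothetical helper)."""
    # torch.matmul broadcasts the [classes, dims] matrix over the batch dimension.
    # Moving the matrix to cls.device replaces the hard-coded .to('mps') calls.
    return torch.matmul(prototypes.to(cls.device, cls.dtype), cls)


torch.manual_seed(0)
prototypes = torch.randn(4, 3)  # random stand-in for the contents of prototypes4.npy
cls = torch.randn(2, 3, 8400)   # stand-in: batch of 2, 3 dims, 6400 + 1600 + 400 anchors

batched = apply_prototypes(cls, prototypes)

# Reference: the per-sample loop from the post, with the device moves removed.
looped = torch.stack([torch.mm(m.t(), prototypes.t()).t() for m in cls])
```

If the two results match, the loop and the broadcasted product are interchangeable, so the choice between them should not affect accuracy, only readability and speed.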
-
Hi, we're a group of university students trying to implement a paper on top of YOLO that introduces a matrix to improve object detection accuracy. We would like to request the assistance of those who are familiar with the internals of the YOLO framework and architecture. We're not looking for cooperation, but rather for someone who can point us to the things we are trying to find in the codebase.

The GitHub repository of the paper is https://github.com/tkasarla/max-separation-as-inductive-bias

They provide a custom script to generate the matrix. Inside the file you can adjust nr_classes at the bottom. I've set this to 4, since we train with a custom dataset that has exactly 4 classes. When you run this program, a new file called prototypes<NR_CLASSES>.npy is generated. This file contains the actual matrix.

They provide a demo for AlexNet and ResNet. Inside LT_CIFAR/train_cifar.py they show how the matrix is applied, see line 45: https://github.com/tkasarla/max-separation-as-inductive-bias/blob/main/LT_CIFAR/train_cifar.py#L45

The output on line 45 is torch.Size([128, 4]) and the prototypes matrix is torch.Size([4, 3]), which suits a matrix multiplication very well. The matrix should be applied to the class probabilities, not to the bounding boxes.

We want to apply the matrix to YOLO, but v8 has quite an advanced structure. The matrix should be applied to the class probabilities of the output layer before the loss is computed, as can be seen on line 45 of the demo in the paper's GitHub repo. Inside nn/tasks.py, in the loss() of the BaseModel(), there is the preds variable, which I think is roughly equivalent to the output layer, but I'm not sure whether it contains the class probabilities, because preds contains three tensor objects and we have no idea what they represent.

I hope someone can clarify where the output layer class probabilities reside in the code, so that I can possibly apply this 4x3 matrix there. It would be of great help.
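As a rough illustration of the shapes in question: the training path of the Detect head quoted in this thread returns a list of three tensors, one per detection scale, each shaped [batch, 4 * reg_max + nc, height, width]. Assuming preds is that list, the class channels can be separated as in the sketch below. Everything here is a stand-in: the tensors are random, reg_max = 16 and the 640x640 input size are assumed values, and the tetrahedron matrix only imitates the shape of prototypes4.npy (the real file may contain different numbers).

```python
import torch

nc, reg_max = 4, 16    # 4 classes from the post; reg_max = 16 is an assumed value
no = nc + 4 * reg_max  # channels per anchor: box distribution + class scores

# Dummy stand-in for the three per-scale tensors (640x640 input assumed).
preds = [torch.randn(2, no, s, s) for s in (80, 40, 20)]

batch = preds[0].shape[0]
# Flatten each scale to [batch, no, H * W] and join along the anchor axis.
flat = torch.cat([p.view(batch, no, -1) for p in preds], dim=2)
# The first 4 * reg_max channels describe boxes, the last nc channels are class scores.
box, cls = flat.split((4 * reg_max, nc), dim=1)

# Regular-tetrahedron vertices as a stand-in for prototypes4.npy (hypothetical values).
prototypes = torch.tensor(
    [[1.0, 1.0, 1.0], [1.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]]
) / 3 ** 0.5

# Mirror the slicing used in the reply: keep 3 channels, project onto the 4 prototypes.
logits = torch.matmul(prototypes, cls[:, :3, :])  # [batch, 4, 8400]
```

Under these assumptions the class scores sit in the last nc channels of each of the three tensors, so a 4x3 matrix would act on that slice only and leave the box channels untouched.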