Requesting help for research project #7646

kkoomen · 2024-01-17T18:35:56Z

Hi, we're a group of university students trying to implement a paper on top of YOLO that introduces a matrix to improve object detection accuracy. We would like to request the assistance of those who are familiar with the internals of the YOLO framework and architecture. We’re not looking for cooperation, but rather for someone who can point us to the things we are trying to find within the codebase.

The GitHub repository of the paper is https://github.com/tkasarla/max-separation-as-inductive-bias

They provide a custom script to generate the matrix. Inside the file you can adjust nr_classes at the bottom. I've set this to 4, since we train on a custom dataset that has exactly 4 classes. Running the script generates a new file called prototypes<NR_CLASSES>.npy, which contains the actual matrix.
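For anyone following along, a quick sanity check on the generated matrix can catch a wrong shape early. The sketch below does not use the paper's script: it builds a hand-written regular tetrahedron as a stand-in for prototypes4.npy (an assumption), and the helper name describe_prototypes is made up for illustration. With the real file, the stand-in would be replaced by np.load('prototypes4.npy').

```python
import numpy as np

def describe_prototypes(protos: np.ndarray) -> dict:
    """Return the shape, the row norms and the largest pairwise cosine similarity."""
    norms = np.linalg.norm(protos, axis=1)
    unit = protos / norms[:, None]
    cos = unit @ unit.T
    # Ignore the diagonal, which is always 1 (each row compared with itself)
    off_diag = cos[~np.eye(len(protos), dtype=bool)]
    return {"shape": protos.shape, "norms": norms, "max_cosine": float(off_diag.max())}

# Stand-in for np.load("prototypes4.npy"): the vertices of a regular tetrahedron
stand_in = np.array(
    [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=np.float64
) / np.sqrt(3)

info = describe_prototypes(stand_in)
print(info["shape"], info["max_cosine"])
```

For four unit vectors in three dimensions, the most separated arrangement has a pairwise cosine of -1/3, so a value far from that would suggest the file was generated for a different number of classes.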

They provide a demo for AlexNet and ResNet. Inside LT_CIFAR/train_cifar.py they show how the matrix is applied, see line 45: https://github.com/tkasarla/max-separation-as-inductive-bias/blob/main/LT_CIFAR/train_cifar.py#L45

The outputs on line 45 have shape torch.Size([128, 4]) and the prototypes have shape torch.Size([4, 3]), which suits a matrix multiplication well. The matrix should be applied to the class probabilities, not to the bounding boxes.
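A small shape check with random stand-in tensors shows the two ways a [4, 3] matrix can be combined with a batch of 128 rows. Only the second direction yields one value per class; which direction train_cifar.py actually uses is worth confirming against line 45, since this sketch does not reproduce the paper's code.

```python
import torch

batch, nr_classes = 128, 4
prototypes = torch.randn(nr_classes, nr_classes - 1)  # stand-in for the [4, 3] matrix

# Direction described above: [128, 4] @ [4, 3] -> [128, 3]
outputs_4d = torch.randn(batch, nr_classes)
projected = outputs_4d @ prototypes

# Opposite direction: [128, 3] @ [3, 4] -> [128, 4], one score per class
outputs_3d = torch.randn(batch, nr_classes - 1)
class_scores = outputs_3d @ prototypes.t()

print(projected.shape, class_scores.shape)
```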

We want to apply the matrix to YOLO, but v8 has quite an advanced structure. The matrix should be applied to the class probabilities of the output layer before the loss is computed, as can be seen on line 45 of the demo in the paper's GitHub repo. Inside nn/tasks.py, in the loss() method of BaseModel, there is the preds variable, which I think is roughly equivalent to the output layer. I'm not sure whether it contains the class probabilities, though, because preds contains three tensor objects and we have no idea what they represent.
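Judging from the Detect code posted later in this thread, the three tensors appear to be one feature map per detection scale, each with 4 * reg_max box channels followed by nc class channels. A minimal sketch with dummy tensors, assuming reg_max = 16, nc = 4 and a 640x640 input (grids of 80x80, 40x40 and 20x20); none of these values are read from a real model.

```python
import torch

batch, reg_max, nc = 2, 16, 4
no = 4 * reg_max + nc  # channels per location: box distribution bins + class scores

# Dummy stand-ins for the three tensors in preds, one per scale
preds = [torch.randn(batch, no, s, s) for s in (80, 40, 20)]

# Flatten each map to [batch, no, H*W] and join along the location axis
flat = torch.cat([p.view(batch, no, -1) for p in preds], dim=2)

# The first 4 * reg_max channels are box data, the last nc are class scores
box, cls = flat.split((4 * reg_max, nc), dim=1)

print(flat.shape)
print(box.shape, cls.shape)
```

Under these assumptions the class tensor holds raw scores for every grid location of all three scales, which is where a per-class transformation would have to act.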

I hope someone can clarify where the output-layer class probabilities reside in the code, so that I can apply this 4x3 matrix there. It would be of great help.

kkoomen · 2024-01-18T21:53:22Z

We have found a way of applying the matrix, but the accuracy is very bad, so we must be doing something wrong.

We've modified Detect.forward() in nn/modules/head.py. We are working with 4 classes, so it is hardcoded for that, just to get our first version out. Later on, we can adjust it to work for any number of classes.

The concatenation and reconstruction work fine without the matrix. Accuracy only degrades once the matmul with the matrix is added, which indicates that the problem lies in how the matrix is applied, not in the concatenation and reconstruction.
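The round trip without the matrix can be checked in isolation. The sketch below uses dummy tensors with the same hardcoded sizes as the code below (grids of 80, 40 and 20, so 6400, 1600 and 400 locations); reg_max = 16 is an assumption.

```python
import torch

batch, reg_max, nc = 2, 16, 4
no = 4 * reg_max + nc
grids = (80, 40, 20)
x = [torch.randn(batch, no, g, g) for g in grids]

# Concatenate the three heads along the flattened location axis
x_cat = torch.cat([xi.view(batch, no, -1) for xi in x], dim=2)

# Split back using the per-scale location counts and restore the 4D shape
sizes = [g * g for g in grids]  # [6400, 1600, 400]
parts = x_cat.split(sizes, dim=2)
rebuilt = [part.reshape(batch, no, g, g) for part, g in zip(parts, grids)]

round_trip_ok = all(torch.equal(a, b) for a, b in zip(x, rebuilt))
print(round_trip_ok)
```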

import numpy as np
import torch
import torch.nn as nn

# Load the matrix, which is torch.Size([4, 3]): 4 classes, each with a 3-dimensional prototype.
prototypes = torch.from_numpy(np.load('prototypes4.npy')).float()

class Detect(nn.Module):

    def forward(self, x):
        """Concatenates and returns predicted bounding boxes and class probabilities."""
        for i in range(self.nl):
            x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)

        if self.training:  # Training path
            shape = x[0].shape

            # concatenating the three heads together, using the view function to reshape each head
            # from [batch, bbox and cls data, height, width] to [batch, bbox and cls data, height * width]
            x_cat = torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2)

            # splitting the full tensor so we obtain two tensors of shape [batch, bbox, width * height] and [batch, cls, width * height]
            box, cls = x_cat.split((self.reg_max * 4, self.nc), 1)

            # removing the class probability of one class. (so we can recalculate this probability later with the matrix)
            # shape: [batch, num classes - 1, width * height]
            cls = cls[:, :3, :]

            # applying the matrix, gaining a [batch, num classes, width * height] tensor
            # (the matrix is moved to the device of cls instead of a hardcoded 'mps')
            cls = torch.stack([torch.mm(batch_mat.t(), prototypes.t().to(cls.device)).t() for batch_mat in cls])

            # reshaping the "x_cat" tensor
            new_x_cat = torch.cat([box, cls], 1)

            # before the actual training, a couple of test runs are done, here we can't apply and use our matrix normally (not needed either)
            try:

                # reshaping the "x" list of tensors, using a split and view command
                x = list(new_x_cat.split((6400, 1600, 400), dim=2))

                x[0] = x[0].view(shape[0], self.no, 80, -1)
                x[1] = x[1].view(shape[0], self.no, 40, -1)
                x[2] = x[2].view(shape[0], self.no, 20, -1)

            except Exception as e:
                print(e)

            return x

        # Inference path
        shape = x[0].shape  # BCHW
        x_cat = torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2)
        if self.dynamic or self.shape != shape:
            self.anchors, self.strides = (x.transpose(0, 1) for x in make_anchors(x, self.stride, 0.5))
            self.shape = shape

        if self.export and self.format in ("saved_model", "pb", "tflite", "edgetpu", "tfjs"):  # avoid TF FlexSplitV ops
            box = x_cat[:, : self.reg_max * 4]
            cls = x_cat[:, self.reg_max * 4 :]
        else:
            box, cls = x_cat.split((self.reg_max * 4, self.nc), 1)
        dbox = self.decode_bboxes(box)

        if self.export and self.format in ("tflite", "edgetpu"):
            # Precompute normalization factor to increase numerical stability
            # See https://github.com/ultralytics/ultralytics/issues/7371
            img_h = shape[2]
            img_w = shape[3]
            img_size = torch.tensor([img_w, img_h, img_w, img_h], device=box.device).reshape(1, 4, 1)
            norm = self.strides / (self.stride[0] * img_size)
            dbox = dist2bbox(self.dfl(box) * norm, self.anchors.unsqueeze(0) * norm[:, :2], xywh=True, dim=1)

        cls = cls[:, :3, :]

        # Application of the matrix
        cls = torch.stack([torch.mm(batch_mat.t(), prototypes.t().to(cls.device)).t() for batch_mat in cls])
        y = torch.cat((dbox, cls.sigmoid()), 1)
        return y if self.export else (y, x)
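The per-sample loop that applies the matrix can also be written as a single batched contraction, which makes the shapes easier to reason about. A sketch with random stand-in tensors; it only shows that the two forms agree numerically and says nothing about whether this is the right place to apply the matrix.

```python
import torch

batch, dims, nr_classes, locations = 2, 3, 4, 8400
prototypes = torch.randn(nr_classes, dims)  # stand-in for the [4, 3] matrix
cls = torch.randn(batch, dims, locations)   # stand-in for the sliced class tensor

# Per-sample loop, as in the code above: [N, 3] @ [3, 4] -> [N, 4], transposed to [4, N]
looped = torch.stack([torch.mm(m.t(), prototypes.t()).t() for m in cls])

# Single batched contraction over the shared 3-dimensional axis d
batched = torch.einsum("kd,bdn->bkn", prototypes, cls)

same = torch.allclose(looped, batched, atol=1e-5)
print(looped.shape, same)
```

Because torch.matmul broadcasts over leading dimensions, prototypes @ cls is a third way to write the same operation.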

pderrenger Feb 6, 2024
Maintainer

@kkoomen it seems you're working on integrating a custom matrix into the YOLOv8 architecture to modify the class probabilities before loss computation. The preds variable you mentioned indeed contains the predictions, but it's structured to include both bounding box coordinates and class probabilities.

To apply your matrix to the class probabilities, you'll need to isolate the class predictions from preds. YOLOv8's detection head is anchor-free and has no separate objectness score. During training, preds holds one tensor per detection scale, each of shape [batch_size, 4 * reg_max + num_classes, height, width]: the first 4 * reg_max channels encode the bounding boxes and the last num_classes channels are the raw class scores (before the sigmoid).

Here's a simplified approach to apply your matrix:

1. Load your matrix as a PyTorch tensor and ensure it's on the same device as your model's parameters.
2. During the forward pass, after obtaining the predictions, split the tensor into bounding boxes and class probabilities.
3. Apply the matrix multiplication to the class probabilities.
4. Concatenate the modified class probabilities back with the bounding boxes.
5. Continue with the loss computation.

Here's a conceptual snippet of how you might modify the forward pass:

# Assuming 'preds' is a single tensor of shape [batch, anchors, 4 * reg_max + num_classes]
# (for example, the three scales flattened and concatenated)
# and 'prototypes' is your matrix loaded as a PyTorch tensor

# Ensure prototypes is on the same device as the predictions
prototypes = prototypes.to(preds.device)

# Split the predictions into components
boxes, cls_probs = split_predictions(preds, num_classes)

# Apply the matrix to class probabilities
# Assuming cls_probs is of shape [batch, anchors, num_classes]
batch_size = cls_probs.shape[0]
cls_probs = cls_probs.reshape(-1, num_classes)  # Flatten to [batch * anchors, num_classes]
cls_probs = torch.matmul(cls_probs, prototypes)  # Apply the matrix
cls_probs = cls_probs.reshape(batch_size, -1, prototypes.shape[1])  # Restore [batch, anchors, prototype dims]

# Concatenate the modified class probabilities with the boxes
modified_preds = torch.cat((boxes, cls_probs), dim=-1)

# Continue with loss computation using 'modified_preds'

Please note that the above code is highly conceptual and will need to be adapted to fit the exact structure of your predictions tensor and the YOLOv8 architecture. You'll also need to write the split_predictions function to correctly split the predictions tensor into its components.
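A minimal sketch of what such a helper could look like, assuming the layout used by the Detect code earlier in the thread (box channels first, class channels last, no separate objectness channel) and predictions already arranged as [batch, anchors, channels]. The name split_box_and_cls and all sizes are hypothetical and not part of Ultralytics.

```python
import torch

def split_box_and_cls(preds: torch.Tensor, num_classes: int):
    """Split [batch, anchors, box_channels + num_classes] into box and class parts."""
    box_channels = preds.shape[-1] - num_classes
    return preds.split((box_channels, num_classes), dim=-1)

batch_size, anchors, reg_max, num_classes = 2, 8400, 16, 4
preds = torch.randn(batch_size, anchors, 4 * reg_max + num_classes)  # dummy predictions
prototypes = torch.randn(num_classes, num_classes - 1)               # stand-in for the [4, 3] matrix

boxes, cls_scores = split_box_and_cls(preds, num_classes)

# matmul broadcasts over the batch and anchor axes, so no flattening is needed
projected = cls_scores @ prototypes.to(preds.device)  # [batch, anchors, num_classes - 1]
modified = torch.cat((boxes, projected), dim=-1)

print(boxes.shape, projected.shape, modified.shape)
```

Note that the class part changes width after the multiplication, so any loss code downstream that expects num_classes channels would need to be adapted accordingly.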

Remember to carefully test and validate your implementation to ensure that the matrix is being applied correctly and that the tensor shapes are consistent throughout the forward pass. Good luck with your research project! 🚀
