What is the problem this feature will solve?

I'm developing a skeleton_demo.py using the MMPose webcam API based on this tutorial (Skeleton-based model (PoseC3D) for Real-Time Webcam Inference #2155), and while it works, there are two challenges I need help with:

1. PoseC3D currently doesn't support multi-class classification.
2. PoseC3D's input requires multiple frames, not bounding boxes. This creates a challenge when multiple bounding boxes appear in a single frame: how do I handle labeling in such cases? (See the sketch below for the direction I am considering.)

I'm struggling to resolve these issues. Any help would be appreciated.
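For challenge 2, the direction I am considering is to keep one skeleton sequence per tracked person instead of merging boxes, so each person can get an independent label. A minimal sketch (illustrative only, not working code; the 'track_id'/'keypoints' keys and the (num_keypoints, 3) layout are assumptions matching the tracker output in my code below):

from collections import defaultdict
from typing import Dict, List

import numpy as np


def group_keypoints_by_track(frames: List[List[Dict]]) -> Dict[int, np.ndarray]:
    """Turn per-frame pose results into one keypoint sequence per track ID.

    `frames` is a list over frames; each entry is a list of pose dicts with
    (assumed) keys 'track_id' and 'keypoints', the latter an array of shape
    (num_keypoints, 3) holding x, y, score.
    """
    sequences = defaultdict(list)
    for frame in frames:
        for person in frame:
            sequences[person['track_id']].append(
                np.asarray(person['keypoints'], dtype=np.float32))
    # Stack each track into a (num_frames, num_keypoints, 3) array; each
    # array can then be passed to the recognizer on its own, so labels do
    # not have to be shared across people in the same frame.
    return {tid: np.stack(kpts) for tid, kpts in sequences.items()}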
What is the feature?
A webcam-based skeleton_demo.py built on the MMPose webcam API.
What alternatives have you considered?
Here's my code.

1. mmpose/mmpose/apis/webcam/nodes/model_nodes/pose_tracker_node.py
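(Imports abridged to those used in this snippet; this is my best reconstruction. The webcam-API symbols Node, NODES, Message and get_config_path come from the surrounding mmpose.apis.webcam package, and recognize_pose_model_batch is my own addition in mmpose/apis/inference.py, shown as file 2 below.)

import copy
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

import mmcv
import numpy as np

from mmaction.apis import init_recognizer
from mmpose.apis import (get_track_id, inference_top_down_pose_model,
                         init_pose_model)
from mmpose.apis.inference import recognize_pose_model_batch  # custom helper
from mmpose.core import Smoother

try:
    from mmdet.apis import inference_detector, init_detector
    has_mmdet = True
except (ImportError, ModuleNotFoundError):
    has_mmdet = False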
def _merge_bbox(bboxes: List[Dict], ratio=0.5):
    """Merge bboxes in a video to create a new bbox that covers the region
    where the hand moves in the video."""
    if len(bboxes) <= 1:
        return bboxes

    # Sort by area so bboxes[0] is always the largest box
    bboxes.sort(key=lambda b: _compute_area(b), reverse=True)
    merged = False
    for i in range(1, len(bboxes)):
        small_area = _compute_area(bboxes[i])
        # Intersection of the largest box and the current box
        x1 = max(bboxes[0]['bbox'][0], bboxes[i]['bbox'][0])
        y1 = max(bboxes[0]['bbox'][1], bboxes[i]['bbox'][1])
        x2 = min(bboxes[0]['bbox'][2], bboxes[i]['bbox'][2])
        y2 = min(bboxes[0]['bbox'][3], bboxes[i]['bbox'][3])
        area_ratio = (abs(x2 - x1) * abs(y2 - y1)) / small_area
        if area_ratio > ratio:
            # Expand the largest box to cover the current box
            bboxes[0]['bbox'][0] = min(bboxes[0]['bbox'][0],
                                       bboxes[i]['bbox'][0])
            bboxes[0]['bbox'][1] = min(bboxes[0]['bbox'][1],
                                       bboxes[i]['bbox'][1])
            bboxes[0]['bbox'][2] = max(bboxes[0]['bbox'][2],
                                       bboxes[i]['bbox'][2])
            bboxes[0]['bbox'][3] = max(bboxes[0]['bbox'][3],
                                       bboxes[i]['bbox'][3])
            merged = True
            break
    if merged:
        bboxes.pop(i)
        return _merge_bbox(bboxes, ratio)
    else:
        # return the largest bounding box
        return [bboxes[0]]
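# Illustration of the merge rule above (hypothetical 4-element xyxy boxes,
# assuming `_compute_area`, defined elsewhere in this file, returns the area
# of b['bbox']): two boxes whose intersection covers more than `ratio` of
# the smaller box are fused into their union, e.g.
#   _merge_bbox([{'bbox': [0, 0, 100, 100]}, {'bbox': [20, 20, 120, 120]}])
#   returns [{'bbox': [0, 0, 120, 120]}]
# while boxes overlapping less than `ratio` leave only the largest box.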
@dataclass
class TrackInfo:
    next_id: int = 0
    last_objects: List = None


@NODES.register_module()
class PoseTrackerNode(Node):
"""Perform object detection and top-down pose estimation. Only detect objects every few frames, and use the pose estimation results to track the object at interval. Note that MMDetection is required for this node. Please refer to `MMDetection documentation <https://mmdetection.readthedocs.io/en /latest/get_started.html>`_ for the installation guide. Parameters: name (str): The node name (also thread name) det_model_cfg (str): The config file of the detection model det_model_checkpoint (str): The checkpoint file of the detection model pose_model_cfg (str): The config file of the pose estimation model pose_model_checkpoint (str): The checkpoint file of the pose estimation model input_buffer (str): The name of the input buffer output_buffer (str|list): The name(s) of the output buffer(s) enable_key (str|int, optional): Set a hot-key to toggle enable/disable of the node. If an int value is given, it will be treated as an ascii code of a key. Please note: (1) If ``enable_key`` is set, the ``bypass()`` method need to be overridden to define the node behavior when disabled; (2) Some hot-keys are reserved for particular use. For example: 'q', 'Q' and 27 are used for exiting. Default: ``None`` enable (bool): Default enable/disable status. Default: ``True`` device (str): Specify the device to hold model weights and inference the model. Default: ``'cuda:0'`` det_interval (int): Set the detection interval in frames. For example, ``det_interval==10`` means inference the detection model every 10 frames. Default: 1 class_ids (list[int], optional): Specify the object category indices to apply pose estimation. If both ``class_ids`` and ``labels`` are given, ``labels`` will be ignored. If neither is given, pose estimation will be applied for all objects. Default: ``None`` labels (list[str], optional): Specify the object category names to apply pose estimation. See also ``class_ids``. Default: ``None`` bbox_thr (float): Set a threshold to filter out objects with low bbox scores. Default: 0.5 kpt2bbox_cfg (dict, optional): Configure the process to get object bbox from its keypoints during tracking. Specifically, the bbox is obtained from the minimal outer rectangle of the keyponits with following configurable arguments: ``'scale'``, the coefficient to expand the keypoint outer rectangle, defaults to 1.5; ``'kpt_thr'``: a threshold to filter out low-scored keypoint, defaults to 0.3. See ``self.default_kpt2bbox_cfg`` for details smooth (bool): If set to ``True``, a :class:`Smoother` will be used to refine the pose estimation result. Default: ``True`` smooth_filter_cfg (str): The filter config path to build the smoother. Only valid when ``smooth==True``. Default to use an OneEuro filter Example:: >>> cfg = dict( ... type='PoseTrackerNode', ... name='pose tracker', ... det_model_config='demo/mmdetection_cfg/' ... 'ssdlite_mobilenetv2_scratch_600e_coco.py', ... det_model_checkpoint='https://download.openmmlab.com' ... '/mmdetection/v2.0/ssd/' ... 'ssdlite_mobilenetv2_scratch_600e_coco/ssdlite_mobilenetv2_' ... 'scratch_600e_coco_20210629_110627-974d9307.pth', ... pose_model_config='configs/wholebody/2d_kpt_sview_rgb_img/' ... 'topdown_heatmap/coco-wholebody/' ... 'vipnas_mbv3_coco_wholebody_256x192_dark.py', ... pose_model_checkpoint='https://download.openmmlab.com/mmpose/' ... 'top_down/vipnas/vipnas_mbv3_coco_wholebody_256x192_dark' ... '-e2158108_20211205.pth', ... det_interval=10, ... labels=['person'], ... smooth=True, ... device='cuda:0', ... # `_input_` is an executor-reserved buffer ... 
input_buffer='_input_', ... output_buffer='human_pose') >>> from mmpose.apis.webcam.nodes import NODES >>> node = NODES.build(cfg) """default_kpt2bbox_cfg: Dict=dict(scale=1.5, kpt_thr=0.3)
    def __init__(
            self,
            name: str,
            model_config: str,
            model_checkpoint: str,
            det_model_config: str,
            det_model_checkpoint: str,
            pose_model_config: str,
            pose_model_checkpoint: str,
            input_buffer: str,
            output_buffer: Union[str, List[str]],
            enable_key: Optional[Union[str, int]] = None,
            enable: bool = True,
            device: str = 'cuda:0',
            det_interval: int = 1,
            class_ids: Optional[List] = None,
            labels: Optional[List] = None,
            bbox_thr: float = 0.5,
            kpt2bbox_cfg: Optional[dict] = None,
            smooth: bool = False,
            smooth_filter_cfg: str = 'configs/_base_/filters/one_euro.py',
            min_frame: int = 16,
            fps: int = 30,
            score_thr: float = 0.7):
        assert has_mmdet, \
            f'MMDetection is required for {self.__class__.__name__}.'
        super().__init__(name=name, enable_key=enable_key, enable=enable)

        self.model_config = mmcv.Config.fromfile(model_config)
        self.model_checkpoint = model_checkpoint
        self.det_model_config = get_config_path(det_model_config, 'mmdet')
        self.det_model_checkpoint = det_model_checkpoint
        self.pose_model_config = get_config_path(pose_model_config, 'mmpose')
        self.pose_model_checkpoint = pose_model_checkpoint
        self.device = device.lower()
        self.class_ids = class_ids
        self.labels = labels
        self.bbox_thr = bbox_thr
        self.det_interval = det_interval
        if not kpt2bbox_cfg:
            kpt2bbox_cfg = self.default_kpt2bbox_cfg
        self.kpt2bbox_cfg = copy.deepcopy(kpt2bbox_cfg)
        self.det_countdown = 0
        self.track_info = TrackInfo()
        if smooth:
            smooth_filter_cfg = get_config_path(smooth_filter_cfg, 'mmpose')
            self.smoother = Smoother(smooth_filter_cfg, keypoint_dim=2)
        else:
            self.smoother = None
        self._clip_buffer = []  # items: (clip message, num of frames)
        self.score_thr = score_thr
        self.min_frame = min_frame
        self.fps = fps

        self.det_model = init_detector(
            self.det_model_config,
            self.det_model_checkpoint,
            device=self.device)
        self.pose_model = init_pose_model(
            self.pose_model_config,
            self.pose_model_checkpoint,
            device=self.device)

        # register buffers
        self.register_input_buffer(input_buffer, 'input', trigger=True)
        self.register_output_buffer(output_buffer)
    def bypass(self, input_msgs):
        return input_msgs['input']

    @property
    def total_clip_length(self):
        """Total number of buffered frames across all clips."""
        return sum(clip[1] for clip in self._clip_buffer)
    def _extend_clips(self, clips: List[Message]):
        """Push the newly loaded clips from buffer, and discard old clips."""
        for clip in clips:
            clip_length = clip.get_image().shape[0]
            self._clip_buffer.append((clip, clip_length))

        # Walk backwards from the second-newest clip; keep just enough
        # clips to reach `min_frame` frames and drop everything older.
        total_length = 0
        for i in range(-2, -len(self._clip_buffer) - 1, -1):
            total_length += self._clip_buffer[i][1]
            if total_length >= self.min_frame:
                self._clip_buffer = self._clip_buffer[i:]
                break

    def _merge_clips(self):
        """Concat the clips into a longer video, and gather bboxes."""
        videos = [clip[0].get_image() for clip in self._clip_buffer]
        video = np.stack(videos, axis=0)
        bboxes = []
        for clip in self._clip_buffer:
            objects = clip[0].get_objects(lambda x: x.get('label') == 'person')
            bboxes.append(_merge_bbox(objects))
        bboxes = list(filter(len, bboxes))
        return video, bboxes, self._clip_buffer[0][0].get_image().shape

    def process(self, input_msgs):
        input_msg = input_msgs['input']
        img = input_msg.get_image()

        if self.det_countdown == 0:
            # get objects by detection model
            self.det_countdown = self.det_interval
            preds = inference_detector(self.det_model, img)
            single_objects_det = self._post_process_det(preds)
        else:
            # get objects by pose tracking
            single_objects_det = self._get_objects_by_tracking(img.shape)
        self.det_countdown -= 1

        single_objects_pose, _ = inference_top_down_pose_model(
            self.pose_model,
            img,
            single_objects_det,
            bbox_thr=self.bbox_thr,
            format='xyxy')

        single_objects, next_id = get_track_id(
            single_objects_pose,
            self.track_info.last_objects,
            self.track_info.next_id,
            use_oks=False,
            tracking_thr=0.3)

        self.track_info.next_id = next_id
        self.track_info.last_objects = single_objects.copy()

        # Pose smoothing
        if self.smoother:
            single_objects = self.smoother.smooth(single_objects)

        for obj in single_objects:
            obj['det_model_cfg'] = self.det_model.cfg
            obj['pose_model_cfg'] = self.pose_model.cfg

        input_msg.update_objects(single_objects)

        self._extend_clips([input_msg])
        # `_merge_clips` returns (video, bboxes, shape)
        video, bboxes, shape = self._merge_clips()

        if self.total_clip_length >= self.min_frame and len(
                single_objects) > 0 and max(map(len, single_objects)) > 0:
            # Init the PoseC3D model, normalizing the pipeline to the
            # current frame size
            h, w = shape[0], shape[1]
            for component in self.model_config.data.test.pipeline:
                if component['type'] == 'PoseNormalize':
                    component['mean'] = (w // 2, h // 2, .5)
                    component['max_value'] = (w, h, 1.)
            self.model = init_recognizer(self.model_config,
                                         self.model_checkpoint, self.device)

            # Inference pose
            print('Start Inferencing....')
            pred_label, pred_score, bboxes = recognize_pose_model_batch(
                self.model,
                self.det_model,
                self.pose_model,
                self.score_thr,
                self.bbox_thr,
                video,
                shape)
            result = bboxes[-1][0]  # sorted by bbox area
            if pred_score > self.score_thr:
                result['label'] = pred_label
                input_msg.update_objects([result])

        return input_msg
2. mmpose/apis/inference.py
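For reference, this is roughly how a PoseC3D recognizer consumes a skeleton clip, following the pattern in mmaction2's demo/demo_skeleton.py (the config path, checkpoint and array shapes below are placeholders, and the API may differ across mmaction2 versions):

import numpy as np
from mmaction.apis import inference_recognizer, init_recognizer

config_file = ('configs/skeleton/posec3d/'
               'slowonly_r50_u48_240e_ntu60_xsub_keypoint.py')  # placeholder
checkpoint_file = 'posec3d_checkpoint.pth'  # placeholder
model = init_recognizer(config_file, checkpoint_file, device='cuda:0')

num_person, num_frame, num_keypoint = 1, 48, 17
h, w = 480, 640

# PoseC3D takes a dict of keypoint sequences rather than raw frames, which
# is why bounding boxes alone are not enough: every tracked person needs
# its own keypoint sequence.
fake_anno = dict(
    frame_dir='',
    label=-1,
    img_shape=(h, w),
    original_shape=(h, w),
    start_index=0,
    modality='Pose',
    total_frames=num_frame,
    keypoint=np.zeros((num_person, num_frame, num_keypoint, 2),
                      dtype=np.float16),
    keypoint_score=np.zeros((num_person, num_frame, num_keypoint),
                            dtype=np.float16))

results = inference_recognizer(model, fake_anno)  # [(label_idx, score), ...]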