estimate_pose_multi_view

Skill class for ai.intrinsic.estimate_pose_multi_view skill.

The estimate_pose_multi_view skill can be used to estimate the pose of objects in the scene given at least 2 cameras (and a maximum of 4 cameras). We recommend using 3 cameras in a triangular configuration for the best estimation performance. The skill uses the input images obtained from the cameras to estimate the 6 DOF pose of every instance of the desired object in the scene. The CAD model of the object to be estimated is defined at training time, which is why it is important that the object matches the CAD model provided.

The skill succeeds if the number of detections is equal to or greater than the min_num_instances specified, and returns the pose, score, and ID for each detected object instance. This skill does not update the pose of the detected object instances in the belief world. Instead, skills such as update_world or sync_product_objects can use the output of this skill to update the belief world.

The skill can use any ML multi-view pose estimator or One Shot pose estimator, as long as the pose estimator was created on the same camera configuration.
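The success criterion and result shape described above can be sketched as follows. This is a minimal illustration only: the `PoseEstimate` class, its field names, and `skill_succeeds` are assumptions for the example, not the actual skill API.

```python
from dataclasses import dataclass

@dataclass
class PoseEstimate:
    # Illustrative stand-in for one detected instance: a 6 DOF pose,
    # a detection score, and an instance ID (field names are assumptions).
    pose: tuple        # e.g. (x, y, z, qw, qx, qy, qz)
    score: float
    instance_id: int

def skill_succeeds(estimates, min_num_instances=0):
    # The skill succeeds when the number of detections is equal to
    # or greater than min_num_instances (default 0).
    return len(estimates) >= min_num_instances

detections = [
    PoseEstimate((0.1, 0.2, 0.3, 1.0, 0.0, 0.0, 0.0), score=0.92, instance_id=0),
    PoseEstimate((0.4, 0.1, 0.3, 1.0, 0.0, 0.0, 0.0), score=0.78, instance_id=1),
]
assert skill_succeeds(detections, min_num_instances=2)
assert not skill_succeeds(detections, min_num_instances=3)
```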

Prerequisites

Required assets:

  • Cameras : The skill can be run with a minimum of 2 cameras and a maximum of 4. If fewer than 4 cameras are available, the remaining slots can be filled with a camera that has already been passed. The cameras should be placed at similar distances to the objects to be detected. The distances and angles of all cameras with respect to the objects in the region of interest should fall within the pose range specified at training time.
  • Pose estimator : The pose estimator for the object to be detected should be created in advance, following the ML Multi‐view pose estimator training process or the One Shot pose estimator creation process.

The cameras must be extrinsically calibrated with respect to each other. This means that, if the solution is running in the real world, the relative positions and orientations of the cameras should match the real-world positions (up to the calibration accuracy). Follow the camera-to-camera calibration process for more information on how to calibrate multiple cameras.

Usage Example

  • Set the Cameras to be used for pose estimation. The skill provides up to 4 possible slots. However, if fewer than 4 cameras are available, the remaining slots can be filled with a camera that has already been passed in.
  • Set the Pose estimator to be used for the object that needs to be detected. The Pose estimator should be of type ML Multi‐view pose estimator or One Shot pose estimator.

(Optional parameters):

  • Set the Min num instances parameter to define the minimum number of instances that the pose estimator should detect in the scene. The skill will fail if not enough parts are detected in the scene. Default: 0.
  • Set Update object pose to update the pose of an object in the belief world with the first returned pose of the pose estimator.
  • Set Visibility score params to additionally calculate the visibility score of each detected part in the scene. The visibility score defines how visible the part is, and can be used for grasp planning to tell the robot which part to pick up first.
  • Set Region of interest to limit the search space for the pose estimator. Only objects within the region of interest will be considered for pose estimation. This also reduces the runtime for one shot pose estimation.
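To illustrate how the region of interest and visibility score can be used together downstream, here is a minimal sketch: detections outside the region of interest are discarded, and the remainder are ordered by visibility so a grasp planner can pick the most visible part first. The detection dicts, field names, and axis-aligned ROI box are assumptions for this example, not the skill's internal representation.

```python
def in_roi(position, roi_min, roi_max):
    # Axis-aligned box check (illustrative): keep a detection only
    # when its position lies inside the region of interest.
    return all(lo <= p <= hi for p, lo, hi in zip(position, roi_min, roi_max))

detections = [
    {"id": 0, "position": (0.2, 0.1, 0.5), "visibility": 0.95},
    {"id": 1, "position": (1.5, 0.1, 0.5), "visibility": 0.99},  # outside ROI
    {"id": 2, "position": (0.6, -0.2, 0.4), "visibility": 0.40},
]
roi_min, roi_max = (0.0, -0.5, 0.0), (1.0, 0.5, 1.0)

# Keep detections inside the ROI, then order by visibility score so the
# most visible part is considered first for grasping.
candidates = sorted(
    (d for d in detections if in_roi(d["position"], roi_min, roi_max)),
    key=lambda d: d["visibility"],
    reverse=True,
)
assert [d["id"] for d in candidates] == [0, 2]
```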

When the skill is run, the skill dialog will display two images:

  • A raw image from the first camera
  • An annotated image where every detected object instance is highlighted.

Update the digital twin (belief world)

To update the digital twin (belief world) with all the pose estimates returned by this skill, you can use the sync_product_objects skill or update_world skill as described in the [estimate_pose skill documentation](https://flowstate.intrinsic.ai/docs/skill_guides/perception/estimate_pose/#update-the-digital-twin-belief-world).

Parameters

pose_estimator

ID of the pose estimator.

min_num_instances

Minimum number of instances that must be detected. The skill fails if not enough instances can be found. If unset, this value defaults to 0.

update_object_pose

Update pose for a single object. If multiple instances of the same object are in the scene, the pose with the highest detection score will be updated.
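The "highest detection score wins" rule described above can be sketched as follows; the dict-based estimates and the `pose_to_apply` helper are illustrative assumptions, not the skill's actual implementation.

```python
def pose_to_apply(estimates):
    # When multiple instances of the same object are detected, the pose
    # with the highest detection score is used to update the object.
    return max(estimates, key=lambda e: e["score"])

estimates = [
    {"score": 0.71, "pose": "pose_a"},
    {"score": 0.93, "pose": "pose_b"},
]
best = pose_to_apply(estimates)
assert best["pose"] == "pose_b"
```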

visibility_score_params

Optional parameter to compute visibility score for pose estimates.

log_debug_data

Optional parameter to log images and metadata for debugging purposes.

capture_data

If specified, the skill uses the capture_result_locations within to retrieve images captured earlier.

If not specified, the skill captures new images directly from the cameras.

inference_timeout_sec

Maximum allowed time in seconds to run the pose estimation inference. This includes the pose estimator creation and the actual pose estimation.

region_of_interest

If set, the pose estimator will restrict the search space to the region of interest.

Returns

estimates

Pose estimates returned by the pose estimator.

root_ts_target

Poses of the returned pose estimates. Note that these are redundant with the poses returned in 'estimates', but allow easier use in combination with other skills like sync_product_objects.

Error Code

The skill does not have error codes yet.