Applied AI / Industrial perception

Robotic bin-picking perception

An RGB-D industrial perception pipeline that segments visible object instances, extracts 3D geometry features from depth, and ranks candidate objects for grasp selection.

Status: Public repository and technical report

Stack: Python, YOLO, PyTorch, Open3D, NumPy

Input data: RGB, depth, masks, camera intrinsics, BOP-format data

Output: Object-level features, candidate rankings, validation metrics, and visual comparisons

Figure: Comparison of the input scene, ground-truth masks, and model predictions, from the robotic bin-picking perception repository.

Project notes

This page is meant as a quick read: what the project did, how it was built, and what the reported results mean.

Pick selection needs more than a segmentation mask.
In cluttered scenes, a visible object is not necessarily the best one to pick. The project therefore adds geometry features and a learned ranking so that the candidate picks within each image can be compared against one another.
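Comparing candidates within each image, rather than across the whole dataset, amounts to grouping scored candidates by image and sorting each group. A minimal sketch, assuming every candidate already carries a scalar score (for example, the ranker's output); the function name rank_within_images and the (image_id, object_id, score) tuple layout are hypothetical, not the repository's API.

```python
from collections import defaultdict


def rank_within_images(candidates):
    """Rank candidates separately for each image.

    candidates: iterable of (image_id, object_id, score) tuples (hypothetical layout).
    Returns {image_id: [object_id, ...]} ordered from highest to lowest score.
    Scores are only ever compared against other candidates from the same image.
    """
    groups = defaultdict(list)
    for image_id, object_id, score in candidates:
        groups[image_id].append((object_id, score))
    # Sort each image's candidates by descending score; Python's sort stays
    # stable with reverse=True, so tied candidates keep their input order.
    return {
        image_id: [obj for obj, _ in sorted(items, key=lambda t: t[1], reverse=True)]
        for image_id, items in groups.items()
    }


rows = [("img0", "a", 0.2), ("img0", "b", 0.9), ("img1", "d", 0.5), ("img0", "c", 0.4)]
ranked = rank_within_images(rows)  # {"img0": ["b", "c", "a"], "img1": ["d"]}
picks = {image_id: order[0] for image_id, order in ranked.items()}  # {"img0": "b", "img1": "d"}
```

The pick for each image is simply the first entry of its ranked list, which is also the quantity a top-k evaluation inspects.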
YOLO masks, RGB-D geometry, and a learned ranker.
For each visible instance mask, the depth pixels inside the mask are back-projected into a point cloud using the camera intrinsics. Each point cloud is converted into object-level features, and the candidates are then ranked by a PyTorch MLP trained against a heuristic target score.
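Back-projection follows the standard pinhole camera model: a pixel (u, v) with depth z maps to x = (u - cx) z / fx and y = (v - cy) z / fy. A minimal NumPy sketch, assuming a depth map aligned with the RGB image, a boolean instance mask, and a 3x3 intrinsic matrix K. The function name backproject_mask and its depth_scale argument are hypothetical; the scale parameter is there because BOP-format data stores a per-image depth scale alongside the intrinsics. Open3D, which the project lists in its stack, offers equivalent utilities; plain NumPy is used here only to make the arithmetic visible.

```python
import numpy as np


def backproject_mask(depth, mask, K, depth_scale=1.0):
    """Back-project the masked depth pixels into camera-frame 3D points.

    depth: (H, W) array of raw depth values, multiplied by depth_scale.
    mask:  (H, W) boolean instance mask.
    K:     (3, 3) pinhole intrinsics [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    Returns an (N, 3) array of (x, y, z); pixels with missing depth are dropped.
    """
    K = np.asarray(K, dtype=np.float64)
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    v, u = np.nonzero(mask)  # row (v) and column (u) indices inside the mask
    z = depth[v, u].astype(np.float64) * depth_scale
    valid = np.isfinite(z) & (z > 0)  # missing depth is commonly stored as 0
    u, v, z = u[valid], v[valid], z[valid]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


depth = np.zeros((4, 4))
depth[0, 3] = 4.0  # valid pixel
depth[1, 2] = 2.0  # valid pixel on the principal point
mask = np.zeros((4, 4), dtype=bool)
mask[0, 3] = mask[1, 2] = mask[2, 2] = True  # (2, 2) has zero depth and is dropped
K = [[2.0, 0.0, 2.0], [0.0, 2.0, 1.0], [0.0, 0.0, 1.0]]
points = backproject_mask(depth, mask, K)  # [[2, -2, 4], [0, 0, 2]]
```

The number of rows returned is a natural source for a valid-point count, since it excludes mask pixels whose depth reading was missing.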
Depth, visibility, position, extent, class, and orientation.
The feature extractor uses point-cloud statistics, bounding boxes, image-position cues, valid-point counts, object class, centroid, approximate extent, and PCA orientation axes.
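The 3D part of this feature list can be computed directly from a back-projected point cloud. The sketch below is an illustrative subset, not the repository's extractor: it covers the centroid, an axis-aligned extent, the valid-point count, and PCA orientation axes, and leaves out the image-space cues (bounding boxes, image position) and the object class. The function name object_features and its dictionary keys are hypothetical.

```python
import numpy as np


def object_features(points):
    """Compute simple object-level geometry features from an (N, 3) point cloud.

    Returns the valid-point count, centroid, axis-aligned extent, mean depth
    (z of the centroid in the camera frame), and PCA axes sorted by decreasing variance.
    """
    points = np.asarray(points, dtype=np.float64)
    n = len(points)
    if n < 3:
        raise ValueError("need at least 3 points for a meaningful PCA")
    centroid = points.mean(axis=0)
    extent = points.max(axis=0) - points.min(axis=0)  # axis-aligned bounding-box size
    cov = np.cov(points - centroid, rowvar=False)  # (3, 3) covariance, rows are points
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: ascending eigenvalues, column eigenvectors
    order = np.argsort(eigvals)[::-1]  # largest variance first
    return {
        "num_points": n,
        "centroid": centroid,
        "extent": extent,
        "mean_depth": centroid[2],
        "pca_variances": eigvals[order],
        "pca_axes": eigvecs[:, order].T,  # one unit-length principal axis per row
    }


pts = [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [2.0, 0.0, 1.0], [3.0, 0.1, 1.0]]
feats = object_features(pts)
```

np.linalg.eigh is used because a covariance matrix is symmetric; since it returns eigenvalues in ascending order, the sketch reverses them so the first axis points along the object's greatest spread. The sign of each axis is arbitrary, so a downstream model may need a sign convention (or the absolute components) to keep orientation features consistent.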
Good agreement with the heuristic target and known-mask baseline.
The ranker was evaluated on 2,016 object-candidate rows, reaching R² = 0.9307 and Pearson r = 0.9661 against the heuristic target. In the integrated YOLO-plus-ranker pipeline, the object selected by the ranker was among the top three candidates by heuristic score in 81.9 percent of scenes.
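For reference, the three reported quantities can be computed as follows. This is a sketch under stated assumptions: per-row predictions and heuristic targets are available as arrays, and within each scene the two score vectors are aligned candidate by candidate. The helper names are hypothetical. R² is 1 - SS_res / SS_tot, Pearson r is the normalized covariance, and a scene counts as a top-three hit when the ranker's highest-scoring candidate is among the three highest heuristic scores.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination of predictions against the target values."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between targets and predictions."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def top_k_hit_rate(scenes, k=3):
    """Fraction of scenes where the ranker's pick is in the heuristic top k.

    scenes: list of (heuristic_scores, predicted_scores) pairs, one per scene,
    with both sequences indexed by the same candidates.
    """
    hits = 0
    for heuristic, predicted in scenes:
        chosen = int(np.argmax(predicted))  # candidate the ranker would pick
        top_k = np.argsort(heuristic)[::-1][:k]  # indices of the k best heuristic scores
        hits += int(chosen in top_k)
    return hits / len(scenes)


scenes = [
    ([0.9, 0.1, 0.5, 0.3], [0.2, 0.1, 0.8, 0.0]),  # pick index 2 is in heuristic top 3: hit
    ([0.1, 0.2, 0.3, 0.9], [0.9, 0.0, 0.1, 0.2]),  # pick index 0 is heuristic-worst: miss
]
rate = top_k_hit_rate(scenes, k=3)  # 0.5
```

Because R² measures agreement with the target values themselves, it can never exceed r², which only measures linear association; the reported figures are consistent with this (0.9661² ≈ 0.933 ≥ 0.9307).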
The repo includes code, figures, report, and evaluation scripts.
Dataset files are referenced separately and are not committed to the repo. The project code and documentation are released under the MIT License.