
2D Semantic Segmentation of a Robotic Surgical Scene

The Challenge

An ultimate goal for robotic surgery could be one in which surgical tasks are performed autonomously, with better accuracy than human surgeons achieve. Several technical building blocks are essential to making this goal a reality. One such engineering problem is the real-time segmentation and tracking of anatomical and surgical objects during surgery. Solutions to this problem have many practical applications in surgeon training and, when combined with Augmented Reality and Virtual Reality, in the development of patient safety tools. In this case study, we developed multiple computer vision (CV) models to detect and track anatomical and surgical objects of interest, using state-of-the-art CV architectures on annotated data provided by Intuitive Surgical.



Data Preparation: To make the most of the limited training data, we applied a number of data augmentation techniques so that the models had more varied examples to learn from.


Base Model Selection: Drawing on work from non-healthcare domains, we explored a number of Convolutional Neural Network architectures, such as ResNet, DeepLab V3, Fast RCNN, and UNet, on various AI platforms, with cost and speed of execution as key factors.


Model Training: We arrived at two different models: one using DeepLab V3 (TensorFlow) for segmentation accuracy, and one using Fast RCNN (PyTorch) to generate inferences quickly enough to meet real-time segmentation needs.


Hardware: A combination of cloud-hosted GPUs and local GPU compute engines was used for data preparation, training, and inference generation. The elastic scaling capability of this hardware let us tailor our computing costs to our needs.


Experiments: We experimented with a number of base models, including UNet, and with different AI platforms to arrive at an approach that balanced time and cost constraints against model accuracy.
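As an illustration of the Data Preparation step above, here is a minimal sketch of paired augmentation for segmentation data. The specific transforms (horizontal flip, random crop, brightness jitter), the function name `augment_pair`, and its parameters are assumptions chosen for illustration; the case study does not say which augmentations were actually used. The point it shows is that geometric transforms must be applied identically to the image and its label mask, while photometric changes apply to the image only.

```python
import numpy as np


def augment_pair(image, mask, rng, crop_size):
    """Apply the same random flip and crop to an image and its label mask.

    image: (H, W, 3) uint8 array; mask: (H, W) integer class labels.
    Hypothetical helper, not the pipeline used in the case study.
    """
    assert image.shape[:2] == mask.shape
    crop_h, crop_w = crop_size
    height, width = mask.shape

    # Random horizontal flip, applied to both arrays so labels stay aligned.
    if rng.random() < 0.5:
        image = image[:, ::-1]
        mask = mask[:, ::-1]

    # One random crop window, shared by the image and the mask.
    top = rng.integers(0, height - crop_h + 1)
    left = rng.integers(0, width - crop_w + 1)
    image = image[top:top + crop_h, left:left + crop_w]
    mask = mask[top:top + crop_h, left:left + crop_w]

    # Brightness jitter is photometric, so it changes the image only.
    gain = rng.uniform(0.8, 1.2)
    image = np.clip(image.astype(np.float32) * gain, 0, 255).astype(np.uint8)
    return image, mask.copy()
```

Calling `augment_pair(image, mask, np.random.default_rng(0), (256, 256))` repeatedly on one annotated frame yields differently flipped, cropped, and lit training pairs whose masks still line up with their pixels.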



on challenging scenes with sparse ground truth


on easier scenes with dense ground truth


on certain anatomical objects
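The case study does not state which metric underlies the results above. One common choice for semantic segmentation is per-class intersection over union (IoU), sketched below as an assumption rather than as the metric actually reported. The `ignore_index` value of 255 is also an assumption: it marks unlabeled pixels so that scenes with sparse ground truth are scored only where annotations exist.

```python
import numpy as np


def per_class_iou(pred, target, num_classes, ignore_index=255):
    """Return {class_id: IoU} over labeled pixels only.

    pred, target: integer label arrays of the same shape.
    Pixels where target == ignore_index are excluded from scoring.
    """
    valid = target != ignore_index
    ious = {}
    for cls in range(num_classes):
        pred_c = (pred == cls) & valid
        target_c = (target == cls) & valid
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            # Class is absent from both prediction and ground truth.
            continue
        intersection = np.logical_and(pred_c, target_c).sum()
        ious[cls] = float(intersection / union)
    return ious
```

Reporting the score per class, rather than as a single average, makes it possible to see which anatomical objects a model handles well and which it does not.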


The data used in this effort was generated in porcine labs and is far from the practical reality of human surgery. Research in this area needs large numbers of high-quality annotated surgical videos, together with their metadata, to provide adequate data for AI/ML model development. The current supervised learning approach is not scalable because of the time and effort it takes to label the anatomical objects of interest. Additionally, it is difficult to label soft tissue accurately without the domain expertise of a clinician or surgeon, and any flaws introduced in the labelling process carry over into model accuracy. Privacy concerns and the lack of incentives for surgeons to provide high-quality surgical data are major inhibitors of meaningful progress. The model accuracy achieved is acceptable at best and far from good enough.