Endoscopic View
Incorporating AR/VR into robotic surgery can enable novel safety tools that augment the capabilities of surgeons. A key engineering building block for this is knowing the depth of the anatomical structures within the endoscopic view. Depth gives a better understanding of the spatial relationships between surgical and anatomical objects, which gives the surgical team options to minimize risk while keeping human experts in control.
In this case study, we used the surgeon's stereo view (left and right images) together with the camera calibration metadata provided by Intuitive Surgical to develop a depth estimation model that predicts the depth of every pixel in each frame. Using only the stereo view, an interactive 3D view can be built that shows the surgical scene from multiple perspectives, which in turn supports more accurate and useful safety tools.
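For context, the geometric link between a stereo pair and per-pixel depth can be sketched as follows. This is the standard relation for a rectified pair (depth = focal length × baseline / disparity), not necessarily the exact computation used in this case study. The function name, the focal length, and the baseline below are illustrative assumptions; in practice these values would come from the camera calibration metadata.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_mm, min_disp=1e-6):
    """Convert a disparity map (pixels) to depth (same unit as the baseline).

    Assumes a rectified stereo pair. Pixels with no usable disparity
    are returned as NaN rather than as an arbitrarily large depth.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.nan)
    valid = disparity > min_disp
    depth[valid] = focal_px * baseline_mm / disparity[valid]
    return depth

# Hypothetical calibration values, chosen only for illustration.
disp = np.array([[10.0, 20.0], [0.0, 40.0]])
depth = disparity_to_depth(disp, focal_px=1000.0, baseline_mm=4.0)
```

Larger disparities map to smaller depths, so nearby tissue and instruments are resolved more finely than distant structures.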
Data Preparation: As a first step, the video was separated into frames, and the point cloud was augmented with a concept of locality based on depth. A camera rotation was then applied to the point cloud to generate a depth map to be used for training.
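One way to picture the rotate-and-reproject step is the minimal NumPy sketch below. It assumes a pinhole camera model; the intrinsics (fx, fy, cx, cy), the rotation axis, and all function names are hypothetical and are not taken from the original pipeline. The locality augmentation is not shown.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map (H, W) to an (N, 3) point cloud in camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = np.isfinite(depth) & (depth > 0)
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def rotation_y(angle_rad):
    """Rotation matrix about the camera's vertical (y) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def render_depth(points, fx, fy, cx, cy, shape):
    """Project points to a depth map, keeping the nearest point per pixel."""
    h, w = shape
    depth = np.full((h, w), np.inf)
    front = points[:, 2] > 0  # discard points behind the camera
    x, y, z = points[front].T
    u = np.round(fx * x / z + cx).astype(int)
    v = np.round(fy * y / z + cy).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    # Unbuffered minimum acts as a simple z-buffer.
    np.minimum.at(depth, (v[inside], u[inside]), z[inside])
    depth[np.isinf(depth)] = np.nan  # pixels that received no point
    return depth

# Toy example: a flat surface 50 units away, viewed after a 5 degree rotation.
depth = np.full((4, 4), 50.0)
pts = backproject(depth, fx=100.0, fy=100.0, cx=1.5, cy=1.5)
rotated = pts @ rotation_y(np.deg2rad(5.0)).T
new_depth = render_depth(rotated, 100.0, 100.0, 1.5, 1.5, depth.shape)
```

Pixels that receive no point after the rotation are left as NaN, which makes them easy to mask out later during training.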
Base Model Selection: We explored several base models to validate our data preparation process and settled on UNet for this purpose. We then iteratively augmented the UNet model, eventually arriving at a PSMNet-like architecture that produced the best model inferences.
Model Training: We adjusted the magnitude of the gradient as needed, masked the portions of the frames where we found bad data, and trained with a mean absolute error loss and the Adam optimizer with weight decay, along with other parameters used to iteratively assign weights to different samples based on the data at hand.
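The masking and sample weighting can be illustrated with a small NumPy sketch of a masked mean absolute error (MAE). The function name, the array shapes, and the per-sample weighting scheme are assumptions made for illustration; the original training code is not shown here and may differ. In a deep learning framework the same quantity would be computed on tensors and minimized with the optimizer.

```python
import numpy as np

def masked_mae(pred, target, mask, sample_weights=None):
    """Mean absolute error over trusted pixels, optionally weighted per sample.

    pred, target, mask: arrays of shape (B, H, W); mask is True where the
    ground-truth depth is considered reliable.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)
    # Masked-out pixels contribute zero error, even if the target is NaN there.
    abs_err = np.where(mask, np.abs(pred - target), 0.0)
    valid_per_sample = mask.sum(axis=(1, 2))
    per_sample = abs_err.sum(axis=(1, 2)) / np.maximum(valid_per_sample, 1)
    if sample_weights is None:
        sample_weights = np.ones(len(per_sample))
    # Samples with no valid pixels are dropped from the weighted average.
    weights = np.asarray(sample_weights, dtype=np.float64) * (valid_per_sample > 0)
    total = weights.sum()
    return float((weights * per_sample).sum() / total) if total > 0 else 0.0

# One 2x2 sample with a single bad (NaN) ground-truth pixel.
pred = np.array([[[1.0, 2.0], [3.0, 4.0]]])
target = np.array([[[2.0, 2.0], [np.nan, 6.0]]])
loss = masked_mae(pred, target, mask=np.isfinite(target))
```

Normalizing by the number of valid pixels keeps the loss comparable between frames with different amounts of masked data.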
Experiments: We ran a number of experiments to generate a rotating 3D view of the surgical scene from the 2D frame and its depth map, so that the scene can be viewed from different perspectives.
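A rotating view of this kind can be approximated by orbiting the reconstructed points around the scene centroid, as in the sketch below. The function name, the number of views, and the angle range are hypothetical choices for illustration. Each rotated cloud would still need to be projected back to an image, with its pixel colors, to produce a frame of the animation.

```python
import numpy as np

def orbit_views(points, n_views=8, max_angle_deg=15.0):
    """Yield (angle, points) pairs rotated about a vertical axis through the centroid."""
    pivot = points.mean(axis=0)
    angles = np.deg2rad(np.linspace(-max_angle_deg, max_angle_deg, n_views))
    for angle in angles:
        c, s = np.cos(angle), np.sin(angle)
        rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
        # Rotate about the pivot instead of the camera origin so the scene
        # stays centered in view while the viewpoint changes.
        yield angle, (points - pivot) @ rot.T + pivot

# Toy point cloud with three points, in camera coordinates.
pts = np.array([[0.0, 0.0, 50.0], [10.0, 0.0, 60.0], [-10.0, 5.0, 55.0]])
views = list(orbit_views(pts, n_views=3, max_angle_deg=15.0))
```

Keeping the angle range small limits the holes that appear where the original viewpoint never observed the surface.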