Evaluating Faster R-CNN for cataract surgery tool detection using microscopy video

Title: Evaluating Faster R-CNN for cataract surgery tool detection using microscopy video
Publication Type: Conference Abstract
Year of Publication: 2022
Authors: Lee, H.Y.
Secondary Authors: Hisey, R.
Tertiary Authors: Holden, M.
Subsidiary Authors: Liu, J., Ungi, T., Fichtinger, G., & Law, C.
Conference Name: Imaging Network of Ontario Symposium

Introduction: Traditional methods of cataract surgery skill assessment rely on supervision by human experts, which exposes trainees to interobserver variability and inconsistent feedback. Alternative measures such as sensor-based instrument motion analysis promise objective assessment [1]. However, sensor-based systems are logistically complicated and expensive to obtain. Previous studies have demonstrated a strong correlation between sensor-based metrics and two-dimensional motion metrics obtained from object detection [2]. Reliable object detection is the foundation for computing such performance metrics. Therefore, the objective of this study is to evaluate how well an object detection network, the Faster Region-Based Convolutional Neural Network (FRCNN), recognizes cataract surgery tools in microscopy video.

Methods: Microscope video was recorded for 25 trials of cataract surgery on an artificial eye. The trials were performed by a cohort of one senior surgeon and four junior surgeons, and were manually annotated with bounding box locations of the cataract surgery tools (Figure 1). The surgical tools used were forceps, diamond keratomes, viscoelastic cannulas, and cystotome needles. An FRCNN [3] was trained on a total of 130,614 frames for object detection. We used five-fold cross-validation with a leave-one-user-out scheme: in each fold, all videos from one surgeon were reserved for testing, and the frames from the remaining 20 videos were divided between training and validation. Network performance was evaluated via mean average precision (mAP), with average precision defined as the area under the precision/recall curve. A detection was considered correct when the intersection over union (IoU) between the ground truth and predicted bounding boxes was greater than 0.5.

Results: The overall mAP of the network was 0.63. Tool-specific mAPs ranged between 0.49 and 0.96 (Table 1). The high accuracy in detection of the cystotome needle is likely due to the distinct size and shape of the tool tip. The diamond keratome had the lowest mAP of any of the tools recognized, which may be attributed to variations in the appearance of the tool tip (Figure 2).

Conclusions: The FRCNN recognized the surgical tools used in cataract surgery with reasonably high accuracy (overall mAP of 0.63). With tool recognition established, our next goal is to use this network to compute motion-based performance metrics. Future work will validate these performance metrics against those obtained from sensor-based tracking and against expert evaluations. This serves as a first step towards providing consistent and accessible feedback for future trainees learning cataract surgery.
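
The leave-one-user-out scheme described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the identifiers `surgeon_1` to `surgeon_5` and the trial names are hypothetical, and the layout of five videos per surgeon is inferred from the 25 trials and the 20 videos that remain in each fold. How the remaining videos were divided between training and validation is not stated, so that step is left out.

```python
from collections import defaultdict


def leave_one_user_out(videos):
    """Yield (held_out_user, test_videos, remaining_videos), one fold per user.

    `videos` is a list of (user_id, video_id) pairs. All videos of the
    held-out user go to the test set; the rest are returned together and
    still need to be divided into training and validation.
    """
    by_user = defaultdict(list)
    for user, video in videos:
        by_user[user].append(video)

    for held_out in sorted(by_user):
        test = list(by_user[held_out])
        remaining = [
            video
            for user in sorted(by_user)
            if user != held_out
            for video in by_user[user]
        ]
        yield held_out, test, remaining


# Hypothetical layout: 5 surgeons x 5 trials = 25 videos.
videos = [
    (f"surgeon_{s}", f"surgeon_{s}_trial_{t}")
    for s in range(1, 6)
    for t in range(1, 6)
]
folds = list(leave_one_user_out(videos))

for held_out, test, remaining in folds:
    print(held_out, len(test), "test videos,", len(remaining), "remaining")
```

Splitting by surgeon rather than by frame keeps every frame from the test surgeon out of training, so each fold measures performance on an operator the network has not seen.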
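
The evaluation criterion in the Methods (a detection counts as correct when IoU exceeds 0.5, and average precision is the area under the precision/recall curve) can be sketched as follows. The abstract does not state the matching or interpolation rules, so this sketch assumes a common PASCAL VOC-style convention: detections are ranked by confidence, each ground-truth box can be matched once, and the area is computed under the monotone precision envelope. The function names and the box format `(x1, y1, x2, y2)` are illustrative choices, not taken from the study.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """Average precision for one tool class.

    detections:   list of (frame_id, score, box)
    ground_truth: dict mapping frame_id -> list of boxes
    """
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {f: [False] * len(boxes) for f, boxes in ground_truth.items()}

    # Walk detections from most to least confident, marking true positives.
    tp_flags = []
    for frame, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt_box in enumerate(ground_truth.get(frame, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        is_tp = best_iou > iou_threshold and not matched[frame][best_j]
        if is_tp:
            matched[frame][best_j] = True
        tp_flags.append(is_tp)

    # Precision and recall after each ranked detection.
    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / k)
        recalls.append(tp / n_gt)

    # Make precision non-increasing in recall, then integrate.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """Mean of per-class average precisions, given as a dict class -> AP."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Toy example with made-up boxes: two frames, one ground-truth box each.
gt = {0: [(0, 0, 10, 10)], 1: [(0, 0, 10, 10)]}
dets = [
    (0, 0.9, (0, 0, 10, 10)),    # exact match
    (1, 0.8, (20, 20, 30, 30)),  # no overlap, false positive
    (1, 0.7, (1, 1, 10, 10)),    # IoU 0.81, true positive
]
ap = average_precision(dets, gt)
```

In the toy example the false positive ranked between the two correct detections lowers the area under the curve below 1, which is how the metric penalizes confident but wrong boxes.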

PerkWeb Citation Key: Lee2022a