Abstract

Background

There is increasing availability of operative video for use in surgical training. Emerging technologies can now assess video footage and automatically generate metrics that could be harnessed to improve the assessment of operative performance. However, a comprehensive understanding of which technology features are most impactful in surgical training is lacking. The aim of this scoping review was to explore the current use of automated video analytics in surgical training.

Methods

PubMed, Scopus, the Web of Science, and the Cochrane database were searched, to 29 September 2023, following PRISMA extension for scoping reviews (PRISMA-ScR) guidelines. Search terms included ‘trainee’, ‘video analytics’, and ‘education’. Articles were screened independently by two reviewers to identify studies that applied automated video analytics to trainee-performed operations. Data on the methods of analysis, metrics generated, and application to training were extracted.

Results

Of the 6736 articles screened, 13 studies were identified. Computer vision tracking was the most common method of video analysis. Metrics were described for processes (for example movement of instruments), outcomes (for example intraoperative phase duration), and critical safety elements (for example critical view of safety in laparoscopic cholecystectomy). Automated metrics were able to differentiate between skill levels (for example consultant versus trainee) and correlated with traditional methods of assessment. There was a lack of longitudinal application to training and only one qualitative study reported the experience of trainees using automated video analytics.

Conclusion

The performance metrics generated from automated video analysis are varied and encompass several domains. Validation of analysis techniques and the metrics generated are a priority for future research, after which evidence demonstrating the impact on training can be established.

Introduction

Surgery, a complex and high-stakes medical intervention, requires technical finesse, as well as shrewd decision-making. Improving patient outcomes is of particular importance, given approximately half a million operations are performed worldwide each day1. Whilst many factors associated with a poorer postoperative outcome are challenging to modify (for example functional status)2, two-thirds of adverse events in surgical patients occur during the intraoperative phase3, with over half being preventable (for example bleeding)4. Good technical performance is associated with reduced mortality in patients undergoing gastric bypass surgery5 and is a predictor of postoperative complications after surgery for gastric cancer6. Non-technical skills of surgery, including decision-making, situational awareness, communication, teamwork, and leadership7, are equally influential with regard to performance, with breakdown in team communication being the most common failure identified from adverse intraoperative events3.

Assessment of intraoperative skills is an important aspect of surgical training. The Objective Structured Assessment of Technical Skills (OSATS) and its multiple derivatives can reliably assess technical proficiency8 and demonstrate development over time9. However, such assessment is time-consuming to perform, resource intensive, and prone to observer bias. The methods of non-technical skill assessment (for example the Non-Technical Skills for Surgeons (NOTSS) framework), although validated10, face similar challenges when implemented in the real-world environment.

Since its inception, minimally invasive surgery (MIS) has become the established surgical approach across multiple specialties11. Given the nature of MIS, the potential availability of video for education is vast and its use is supported by the current literature. Multiple-choice test performance and intraoperative knowledge of medical students are superior when operative video-based teaching is used, compared with conventional methods12,13. For surgical trainees, improved performance, scored using a global assessment scale, is also observed in laparoscopic colorectal surgery14 and the performance of surgical trainees can be further improved with the addition of expert coaching15.

Perhaps the most utilized application of operative video is post-hoc review. For trainees performing inguinal hernia repair, postoperative video-based assessment is feasible16 and, after reviewing the video footage with a supervising surgeon, subsequent performance has been shown to improve17. The limitations of facilitated debriefing using video are that it is time-consuming to perform and inter-observer variation can make it difficult to determine a trainee’s true level of competence. Alternatively, self-assessment of operative video is associated with a reduced learning curve and circumvents the need for expert involvement18. However, negative correlation has been shown between performance and self-assessment scores, with a concerning trend for individuals who perform poorly to rate themselves highly19.

In recent years, the development of artificial intelligence (AI) in healthcare has increased rapidly20. Whilst there are now numerous clinical applications21, an emerging field is the use of AI to enhance surgical education. The accuracy of ChatGPT (OpenAI, San Francisco, CA, USA), an application based on large language models, with regard to answering board examination questions in orthopaedics has been explored22 and advanced simulators now allow trainees to increase caseload attainment, without exposing patients to harm, in urological settings23. An even more promising area of development is the analysis of video to assess operative performance. Motion tracking has been applied with favourable results in simulated settings for both open and laparoscopic skills24,25. However, these studies required the use of physical markers on the target object, limiting the transferability into the live operating room environment. Recently, more advanced techniques that can analyse operative video and produce performance metrics without the need for adjuncts have been developed26. Automated analysis of non-technical skills using operative video has also been trialled27.

Given the challenges of traditional methods of skill assessment and the need to train surgeons who are competent, there is a growing need for an objective and reliable method of providing feedback that is complementary to expert-based assessment. With the wealth of video available, alongside rapidly developing technologies that can perform analytics, there is the potential to provide meaningful feedback on performance. The aim of this scoping review was to explore the current uses of automated analyses from real-world operative video in the context of surgical training.

Methods

The framework developed by Arksey and O’Malley28 and the PRISMA extension for scoping reviews (PRISMA-ScR)29 were used in the design, conduct, and reporting of this scoping review. A scoping review was chosen to gain a broad understanding of this emerging field in surgical education.

Definitions

Automated operative video analytics was defined as the use of footage from any part of a real-world operation to which non-human assessment was applied. The assessment could be performed during or after the operation. Application to surgical training was defined as the inclusion of surgical trainees or fellows in the study and the production of metrics from the automated assessment that were used to guide feedback and training.

Search strategy

Electronic searches of PubMed, Scopus, the Web of Science, and the Cochrane database were performed from database inception to 29 September 2023. The search criteria were developed with the assistance of a medical librarian with key search terms including ‘surgeon’, ‘trainee’, ‘resident’, ‘video analytics’, ‘education’, and ‘training’ combined with the Boolean operators AND and OR. The full search criteria for each database are available in the Supplementary Methods. The reference lists of the included studies were also searched for relevant articles. Covidence (Veritas Health Innovation, Melbourne, Victoria, Australia) was used to manage references throughout the review and extraction process.
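For illustration only, a simplified combination of these terms might take the following form; the full registered strategy for each database is reproduced in the Supplementary Methods.

```
("surgeon" OR "trainee" OR "resident")
AND ("video analytics")
AND ("education" OR "training")
```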

Study eligibility

Given the broad search criteria adopted, title screening was conducted by a single reviewer (L.D.), with sensitivity prioritized over specificity, to prevent exclusion of potentially relevant studies. Abstract and full-text screening was performed independently by two reviewers (L.D. and C.P.B.). Disagreements were settled by either consensus or discussion with a third reviewer (S.Y.). Full texts were considered to have met the inclusion criteria when: at least one method of automated video analysis was used; the setting was a real-world operation; operative video performed by a trainee was used; and metrics were generated from the assessment with the potential to facilitate feedback and training. Participants had to have at least one video analysed for inclusion. In addition, qualitative studies exploring trainee and trainer experience of using automated operative video analytics were also eligible for inclusion. Articles were excluded when they involved a simulated or virtual reality setting or when they were reviews, case reports, editorials, commentaries, or conference abstracts. Studies were also excluded when they included video analysis based solely on expert assessment (for example OSATS), relied on non-video analysis of performance (for example kinematic data from robotic arms), or described a method of automated analysis that did not produce specific metrics that could be applied to training.

Data charting

All data from the included full texts were extracted independently by two reviewers (L.D. and C.P.B.) using a predefined data extraction template. Extraction data included, but were not limited to, demographics of participants, operation type, method of automated video analytics used, metrics generated, and application to training. Full extraction data are available in the Supplementary Methods.

Summarizing data

Data extracted from the included studies are described narratively, with quantitative analysis applied, where appropriate (for example total number of participants included in studies). Study characteristics extracted included first author, year of publication, primary country of research, and study type. Quality assessment of quantitative studies was performed independently by two reviewers (L.D. and C.P.B.) using the Medical Education Research Study Quality Instrument (MERSQI)30. This tool assesses study design, sampling, type of data, validity of evaluation instrument, data analysis, and outcomes, to provide an overall score, with a maximum of 18 (Table S1). Disagreements were settled by either consensus or discussion with a third reviewer (S.Y.). For studies with ‘not applicable’ MERSQI items, scores were based on maximum points available, adjusted to a standard denominator of 18.
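As a minimal sketch of this adjustment (the scores below are hypothetical; the instrument itself is summarized in Table S1):

```python
def adjusted_mersqi(score: float, max_available: float) -> float:
    """Rescale a MERSQI score to the standard denominator of 18 when
    'not applicable' items reduce the achievable maximum below 18."""
    return score / max_available * 18.0

# Example: a study scoring 10 of an achievable 14 points is reported
# as 10/14 * 18, approximately 12.9 out of 18.
print(round(adjusted_mersqi(10, 14), 1))  # 12.9
```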

Results

The search generated a total of 10 520 articles, of which 3788 were duplicates and were removed. Of the 6736 articles screened, 39 underwent full-text review. A total of 9 studies met the inclusion criteria, with a further 4 studies identified from searches of reference lists, giving a total of 13 studies (Fig. 1)31–43. Reasons for exclusion during full-text review included simulated setting (10 articles), no automated analysis (8 articles), no application to training (6 articles), and being conference abstracts (2 articles). All of the included studies were published within the last 10 years, with the majority (8 studies) published within the last 5 years.

Fig. 1 PRISMA flow chart

Quality assessment of the included studies

The mean adjusted MERSQI score for the included studies was 12.73 out of 18. All of the studies reported objective measures and, with the exception of one feasibility study that was not powered for statistical analysis35, all went beyond purely descriptive analysis. Full MERSQI scoring is available in Table S2.

General study characteristics

The majority of studies were observational and one qualitative study was included. There was one multicentre study (17 centres included), with the number of institutions not reported in five studies. Cataract surgery and cholecystectomy were the two most common operations assessed and open surgery was the most common approach (6 studies). None of the included studies reported any patient or clinical outcomes (for example mortality). The median number of videos included for analysis was 69 (range 6–1107). Full study characteristics are available in Table 1.

Table 1

General study characteristics

| Study | Country | Study type | Specialty | Operation | Approach | Number of videos in study |
|---|---|---|---|---|---|---|
| Azari et al.31 | USA | PCS | Pan-specialty | Multiple | Open | 103 |
| Balal et al.32 | UK | PCS | Ophthalmology | Cataract surgery | Open | 120 |
| Din et al.33 | UK | PCS | Ophthalmology | Cataract surgery | Open | 22 |
| Frasier et al.34 | USA | PCS | Pan-specialty | Multiple | Open | 138 |
| Glarner et al.35 | USA | PCS | Plastic surgery | Reduction mammoplasty | Open | 6 |
| Hameed et al.36 | Canada | Qualitative | Hepatobiliary | Cholecystectomy | Laparoscopic | Not applicable |
| Humm et al.37 | UK | RCS | Hepatobiliary | Cholecystectomy | Laparoscopic | 159 |
| Kitaguchi et al.38 | Japan | RCS | Colorectal | Transanal total mesorectal excision | Endoscopic | 45 |
| Lee et al.39 | South Korea | RCS | Endocrinology | Thyroidectomy | Robotic | 40 |
| Smith et al.40 | UK | PCS | Ophthalmology | Cataract surgery | Open | 20 |
| Wawrzynski et al.41 | UK | PCS | Otolaryngology | Dacryocystorhinostomy | Endoscopic | 20 |
| Wu et al.42 | China | PCS | Hepatobiliary | Cholecystectomy | Laparoscopic | 1107 |
| Yang et al.43 | USA | PCS | Colorectal | Rectopexy | Robotic | 92 |

PCS, prospective cohort study; RCS, retrospective cohort study.

Participant details

There were a total of 107 trainees in the nine studies that reported participant numbers (mean(s.d.) 12(9.2)). The majority of studies (8 studies) did not report the grade of training. A total of five studies reported the baseline skills of trainees, using the total number of operations previously performed as the sole marker of skill. One study used baseline skills to differentiate between novices, intermediates, and experts.

All of the included studies had a comparator group, with a total of 152 attendings or experts in the ten studies reporting these numbers (mean(s.d.) 15(17.5)). In contrast to trainees, the baseline skills of attendings and experts were more frequently reported (7 studies), using the prior number of operations performed as the marker of skill. Full participant characteristics are available in Table 2.

Table 2

Characteristics of study participants

| Study | Number of trainees | Level of training | Baseline reported* | Number of experts for comparison | Baseline reported |
|---|---|---|---|---|---|
| Azari et al.31 | 3 | Resident (unspecified) | Unknown | 6 | Unknown |
| Balal et al.32 | 20 | Resident (unspecified) | <200 operations performed | 20 | >1000 operations performed |
| Din et al.33 | 31 | Resident (unspecified) | <200 operations performed | 31 | >1000 operations performed |
| Frasier et al.34 | 3 | PGY3–PGY5 | Not reported | 6 | Unknown |
| Glarner et al.35 | 3 | PGY3–PGY5 | Unknown | 3 | 8–30 years of experience |
| Hameed et al.36 | 13 | Fellow, Senior resident, Junior resident | Unknown | 7 | Mean of 57 laparoscopic cholecystectomies per year |
| Humm et al.37 | Unknown | Resident (unspecified) | Unknown | 1 | Unknown |
| Kitaguchi et al.38 | Unknown | Unknown | Novice <10 operations, intermediate 10–30 operations | Unknown | >30 operations performed |
| Lee et al.39 | Unknown | Fellow | Unknown | Unknown | Unknown |
| Smith et al.40 | 10 | Resident (unspecified) | <200 operations performed | 10 | >1000 operations performed |
| Wawrzynski et al.41 | 10 | Resident (unspecified) | <20 operations performed | 10 | >100 operations performed |
| Wu et al.42 | 14 | Resident (unspecified) | Unknown | 58 | Unknown |
| Yang et al.43 | Unknown | PGY3–PGY7 | Unknown | Unknown | Unknown |

PGY, postgraduate year. *Definition used in each study to quantify experience of participants. In all studies, except one, reporting baseline, the number of operations previously performed was used.

Method of automated video analysis

The most common method of automated video analysis was computer vision tracking (7 studies). This was used in all of the studies that assessed open surgery and one of the studies that assessed endoscopic procedures; three of these studies required manual identification of the region of interest (ROI) before computer vision tracking could complete the assessment automatically31,34,35. More advanced AI-based approaches to video analysis were used in the remaining studies, with convolutional neural networking (CNN) being the most common method (4 studies) (Table 3). A CNN processes images by learning to recognize visual patterns. Pyramid scene parsing networking (PSPN), based on image segmentation, was also used to generate heat maps in laparoscopic cholecystectomy36.
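As a minimal sketch of this style of analysis, the fragment below seeds point features inside a manually specified ROI (here a hard-coded rectangle standing in for the manual step described above) and accumulates their movement with OpenCV's pyramidal Lucas–Kanade optical flow. The included studies used their own pipelines, so this is illustrative only.

```python
import cv2
import numpy as np

def roi_path_length(video_path: str, roi: tuple) -> float:
    """Track point features seeded inside a manually chosen ROI and
    return the mean per-feature path length in pixels."""
    x, y, w, h = roi  # manual ROI, e.g. drawn around a hand or instrument
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        raise ValueError(f"Could not read {video_path}")
    prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Seed trackable corner features only inside the ROI
    mask = np.zeros_like(prev_gray)
    mask[y:y + h, x:x + w] = 255
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50,
                                  qualityLevel=0.01, minDistance=5, mask=mask)
    travelled = np.zeros(len(pts))

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        step = np.linalg.norm(new_pts - pts, axis=2).ravel()
        travelled += np.where(status.ravel() == 1, step, 0.0)  # ignore lost points
        pts, prev_gray = new_pts, gray

    cap.release()
    return float(travelled.mean())
```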

Table 3

Summary of the included studies

| Study | Method of analysis | Metric(s) produced | Application to training |
|---|---|---|---|
| Azari et al.31 | Computer vision tracking | Fluidity of motion; motion economy; tissue handling | Metric correlated with alternative assessment tool |
| Balal et al.32 | Computer vision tracking | Intraoperative phase duration; instrument distance travelled; instrument use | To differentiate between different skill sets |
| Din et al.33 | Computer vision tracking | Instrument distance travelled; instrument use | Metric correlated with alternative assessment tool; to differentiate between different skill sets |
| Frasier et al.34 | Computer vision tracking | Speed and acceleration of movement | To differentiate between different skill sets |
| Glarner et al.35 | Computer vision tracking | Instrument use; bimanual dexterity; displacement, velocity, and acceleration | To differentiate between different skill sets |
| Hameed et al.36 | PSPN | Heat maps | Qualitative study |
| Humm et al.37 | CNN | Total operative duration; intraoperative phase duration | To differentiate between different skill sets |
| Kitaguchi et al.38 | CNN | Generated suturing score | To differentiate between different skill sets |
| Lee et al.39 | CNN | Classification of skill (novice, skilled, and expert) | Metric correlated with alternative assessment tool; to differentiate between different skill sets |
| Smith et al.40 | Computer vision tracking | Total operative duration; instrument distance travelled; instrument use | To differentiate between different skill sets |
| Wawrzynski et al.41 | Computer vision tracking | Total operative duration; instrument distance travelled; instrument use | To differentiate between different skill sets |
| Wu et al.42 | TSM | CVS achievement | Metric correlated with years of experience |
| Yang et al.43 | CNN | Instrument distance travelled; bimanual dexterity; depth perception | Metric correlated with alternative assessment tool; to differentiate between different skill sets |

PSPN, pyramid scene parsing networking; CNN, convolutional neural networking; TSM, temporal shifting module; CVS, critical view of safety.

Metrics produced

Metrics generated varied between studies and can be classified as either process or outcome based. Process-based metrics uniformly analysed motion and, in all but one study, were generated using computer vision tracking. Distance travelled and total movement of hands/instruments were the most common motions measured, although bimanual dexterity, speed, acceleration, and fluidity of movements were also reported. These metrics were generated for entire procedures, specific intraoperative phases, or individual tasks.
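A minimal sketch of how such process-based metrics fall out of a tracked trajectory, assuming a hypothetical array of instrument-tip pixel coordinates and a known frame rate:

```python
import numpy as np

def motion_metrics(xy: np.ndarray, fps: float) -> dict:
    """Process-based metrics from a tracked trajectory.

    xy  : array of shape (n_frames, 2), pixel coordinates over time
    fps : frame rate, used to convert per-frame steps into seconds
    """
    step = np.diff(xy, axis=0)            # per-frame displacement vectors
    dist = np.linalg.norm(step, axis=1)   # pixels moved per frame
    speed = dist * fps                    # pixels per second
    accel = np.diff(speed) * fps          # pixels per second squared
    return {
        "path_length": float(dist.sum()),               # total distance travelled
        "mean_speed": float(speed.mean()),
        "mean_abs_accel": float(np.abs(accel).mean()),  # crude fluidity proxy
    }
```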

Outcome-based metrics were more diverse, were largely generated from advanced AI-based analysis, and were assessed either during specific phases of the operation or for individual tasks. Intraoperative phase durations were assessed in laparoscopic cholecystectomy and cataract surgery, whilst performance scores were generated for the quality of suturing in transanal total mesorectal excision (TaTME). Two studies generated metrics aimed at assessing critical safety elements in laparoscopic cholecystectomy: scores to determine achievement of the critical view of safety and heat maps to demonstrate areas of safe and unsafe dissection. The full list of metrics produced is available in Table 3.
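For outcome-based metrics such as phase duration, the computation downstream of the AI model is simple: given one predicted phase label per frame (the labels and phase name below are hypothetical), durations are frame counts divided by the frame rate.

```python
from collections import Counter

def phase_durations(frame_phases: list, fps: float) -> dict:
    """Seconds spent in each intraoperative phase, given one predicted
    phase label per video frame (e.g. from a CNN phase classifier)."""
    return {phase: n / fps for phase, n in Counter(frame_phases).items()}

# e.g. phase_durations(labels, fps=25.0)["hepatocystic triangle dissection"]
```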

Application to training

The most common application to training was the use of metrics to discriminate between different skill levels (10 studies), typically distinguishing trainees from attendings/experts. One study compared different groups of trainees (novice versus intermediate). All studies were able to distinguish between groups with high levels of accuracy. In addition, five studies compared the metrics produced with an alternative assessment tool. These tools were based on expert assessment and included OSATS, the Global Evaluative Assessment of Robotic Skills (GEARS), the Objective Structured Assessment of Cataract Surgical Skill (OSACSS), and a performance rubric created by the study authors that had previously been validated using GEARS. All studies reported that the metrics produced from analysis correlated with the validated tool used. The number of human assessors ranged from one to six; assessors were described as experts (2 studies) or board-certified surgeons (2 studies), with no information given in one study. Of the studies relying on board-certified surgeons, one reported that the surgeons were experienced in using the AI model created to perform the video analysis. No study produced metrics for the assessment of non-technical skills or reported how the metrics produced could be applied to assess non-technical skills. Furthermore, none of the included studies explored the impact on training outcomes (for example learning curves).

A qualitative study aimed to assess the educational value of using AI-generated heat maps to display areas of safe and unsafe dissection in laparoscopic cholecystectomy. Trainees and trainers strongly agreed (73%) that using heat maps after surgery for coaching and feedback could be effective. Most suggested that heat maps should be incorporated into the surgical training curriculum, particularly for those at an earlier stage in training. However, 40% reported that, even if readily available, they would not use the technology routinely. Some raised concerns over potential dependence on it for making intraoperative decisions. Full results of the included studies are available in Table S3.

Discussion

This scoping review explores the current methods of automated analysis of intraoperative video in the context of surgical training. This is a rapidly evolving field, with most of the included studies being published within the last 5 years.

Several methods of automating the assessment of surgical video were identified. Computer vision, the broad term used to describe an algorithm's ability to process and understand visual data20, was used to generate kinematic data. Early techniques utilized tracking of objects between individual frames extracted from video31,34,35 and others relied on the principle of point feature tracking32,33,40,41. The included studies were effective at tracking movements of the surgeon's hand during suturing and knot tying31 and tracking surgical instruments during cataract surgery40. The need for manual identification of the ROI before automated tracking was a limitation in several studies31,34,35. Human-dependent steps in these studies presented potential barriers to the scalability of the methods used. Minimizing the input required from trainees and trainers to collect and analyse operative video not only improves engagement but also likely reduces the impact on cognitive load during the operation.

Newer approaches, utilizing AI, aim to overcome some of these barriers. CNN was able to automatically detect and track surgical instruments to assess motion38,39,43. Surgical tool recognition is fundamental to this approach, with several studies reporting high accuracy with regard to identifying the presence or absence of instruments, as well as differentiating between instrument types44,45. Furthermore, tool usage determined using AI correlates strongly with that determined by human assessment46. Limitations of AI-based video analysis include the need for accurately annotated video for training AI models. However, the introduction of open data sets has diminished this obstacle47.
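As a minimal sketch of per-frame tool detection, the fragment below uses torchvision's off-the-shelf Faster R-CNN as a stand-in for a detector fine-tuned on annotated surgical video (the COCO weights loaded here do not know surgical instruments); box centroids from consecutive frames could then feed the kinematic metrics described in the Results.

```python
import torch
import torchvision

# Stand-in model: in practice this would be fine-tuned on annotated
# surgical video so that labels correspond to instrument classes.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

@torch.no_grad()
def detect_tools(frame: torch.Tensor, threshold: float = 0.8):
    """Return (label, box) pairs for confident detections in one frame.

    frame : float tensor of shape (3, H, W), values scaled to [0, 1]
    """
    out = model([frame])[0]  # dict with 'boxes', 'labels', 'scores'
    keep = out["scores"] > threshold
    return list(zip(out["labels"][keep].tolist(), out["boxes"][keep].tolist()))
```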

It is not surprising that the most commonly produced metrics reflect kinematic data. Traditionally collected using sensors applied to the surgeon or instrument, kinematic data have long been used as markers of technical proficiency, due to strong correlation with expert-based assessment scores48,49, even when deployed during complex operations50. Insights into operative performance can be gained by analysing motion throughout an entire operation (for example total distance travelled) or within an individual task. Of the five studies that assessed individual tasks, four were based on motion analysis and included critical steps, such as suturing of bowel anastomoses and TaTME. Importantly, this granular technique is now being automatically assessed in both laparoscopic and robotic simulated settings51,52.

Operative phase recognition is an emerging field of AI workflow analysis53,54. The concept of surgical workflow analysis is based on the premise that an operation is rarely a singular event but rather a series of complex sequences that build upon one another. Variations in these sequences are more common for less experienced surgeons55 and, thus, being able to efficiently analyse phases has great potential for improving post-hoc feedback. Several applications of phase analysis, such as duration of hepatocystic triangle dissection in laparoscopic cholecystectomy37, tool usage in phacoemulsification during cataract surgery32, and skill determination during the first phase of robot-assisted thyroidectomy39, were identified in the present review. Analysis of surgical gestures, defined as granular interactions between surgical instruments and human tissue56, is another emerging field. Preliminary studies have identified that gesture type (for example to indicate ‘dissect’ or ‘coagulate’) and gesture efficiency (for example use of ineffective or erroneous gestures) can distinguish experts from novices57–59. Most studies in this field are in the simulated setting; therefore, the potential application of surgical gestures to improve trainee surgical performance in the operating room remains unclear.

The majority of the included studies used the metrics produced from their analysis to differentiate between skill sets, a finding that is not consistently replicated in the broader literature60. A limitation of using this method in the included studies was that very few reported the level of training or the baseline skills of the participants. For example, the term 'trainee' is ambiguous and is equally applicable to someone in their first year of surgical training as to someone in their last. It may also be applied to fully certified surgeons learning a new procedure or approach (for example robotics).

Automatically generated metrics were also compared with validated, expert-based tools. Most studies compared metrics of similar properties (for example tool usage with bimanual dexterity); however, other extractable metrics from video, such as clearness of the operating field from blood, have also been shown to correlate with global expert-based assessments61. Although there is growing evidence to support the accuracy of automated assessments compared with expert-based methods48,62, video-based assessments can only provide metrics from the material provided. In certain scenarios, wider contextual factors need consideration to guide accurate training and assessment. This has led some to advocate a hybrid approach, incorporating expert assessment with objective data generated from AI63.
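As a sketch of how such a comparison is typically made, Spearman's rank correlation between an automated metric and paired expert ratings can be computed as below (all numbers are hypothetical):

```python
from scipy.stats import spearmanr

# Hypothetical paired observations: one automated metric and one
# expert-assigned global rating (e.g. OSATS) per video.
path_length_px = [412.0, 388.5, 301.2, 295.8, 240.1, 198.7]
expert_rating = [14, 15, 19, 21, 24, 27]

rho, p = spearmanr(path_length_px, expert_rating)
print(f"Spearman rho = {rho:.2f}, P = {p:.4f}")  # shorter paths, higher ratings
```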

Evidence exploring the experiences of trainees and trainers who use metrics generated from video analysis is currently lacking. Hameed et al.36 report mixed opinions from trainees and trainers on an AI-based assessment of safe dissection planes in laparoscopic cholecystectomy. Most agreed that there was potential for use in surgical training; however, up to 40% said that, even if readily available, they would not routinely use it. Reservations are also shared by patients: a recent survey reported that, although most recognized the educational benefit of having their procedure recorded for training64, concerns remained over non-maleficence with regard to data storage and access65. Robust protocols are needed to address these legitimate concerns and frameworks should be established to manage potential data breaches.

The present scoping review identifies the methods and metrics generated from automated video analysis of operative surgery. Establishing robust validity of automated video analytics is essential if this innovation is to help shape future training. It is also crucial that consensus be reached on which metrics of operative performance are most relevant for surgical trainees. No study reported longitudinal use of metrics by trainees to aid in their skill development. Training is a continuous process, requiring regular review of performance and goal setting. Understanding the longer-term impact of new technologies on training is a priority, so they can be applied in the most effective way. A second research priority is to address the lack of automated assessment of non-technical skills in surgery. Exploration of this in simulated and non-simulated settings is in progress27,66.

The present review has some limitations. There were no restrictions with regard to surgical procedure or approach in the inclusion criteria, which likely increased heterogeneity in the findings. The metrics relevant for one procedure or skill are likely different from those required for another. It was surprising that MIS was not the most common surgical approach, given its video-based nature. Inclusion of only real-world operations may have resulted in under-representation of MIS, as research in MIS is still largely confined to the simulated setting, particularly in robotics.

The current landscape of automated video analytics in surgical training shows promise, both in terms of the metrics produced and the video analytics used. However, there is a paucity of evidence with regard to understanding both the end user experience and the longitudinal benefit of automated video analysis for surgical trainees. To embrace this new technology, educators, curriculum developers, and professional bodies must support validation studies in training environments, before widespread adoption, for the benefit of trainees, trainers, and patients.

Funding

L.D. is supported by a Medical Education Fellowship funded by NHS Lothian. R.J.E.S. is funded by NHS Research Scotland (NRS) via a clinician post.

Acknowledgements

The authors would like to thank Marshall Dozier for assistance in generating the search criteria.

Author contributions

Lachlan Dick (Conceptualization, Data curation, Formal analysis, Methodology, Resources, Software, Visualization, Writing—original draft, Writing—review & editing), Connor P. Boyle (Data curation, Writing—review & editing), Richard J. E. Skipworth (Conceptualization, Supervision, Writing—review & editing), Douglas M. Smink (Writing—review & editing), Victoria R. Tallentire (Conceptualization, Supervision, Writing—review & editing), and Steven Yule (Conceptualization, Supervision, Writing—review & editing)

Disclosure

The authors declare no conflict of interest.

Supplementary material

Supplementary material is available at BJS Open online.

Data availability

Some data may be available upon reasonable request.

References

1. Haynes AB, Weiser TG, Berry WR, Lipsitz SR, Breizat AHS, Dellinger EP et al. A surgical safety checklist to reduce morbidity and mortality in a global population. N Engl J Med 2009;360:491–499
2. Ou-Young J, Boggett S, El Ansary D, Clarke-Errey S, Royse CF, Bowyer AJ. Identifying risk factors for poor multidimensional recovery after major surgery: a systematic review. Acta Anaesthesiol Scand 2023;67:1294–1305
3. Gawande AA, Zinner MJ, Studdert DM, Brennan TA. Analysis of errors reported by surgeons at three teaching hospitals. Surgery 2003;133:614–621
4. Gawande AA, Thomas EJ, Zinner MJ, Brennan TA. The incidence and nature of surgical adverse events in Colorado and Utah in 1992. Surgery 1999;126:66–75
5. Birkmeyer JD, Finks JF, O'Reilly A, Oerline M, Carlin AM, Nunn AR et al. Surgical skill and complication rates after bariatric surgery. N Engl J Med 2013;369:1434–1442
6. Fecso AB, Bhatti JA, Stotland PK, Quereshy FA, Grantcharov TP. Technical performance as a predictor of clinical outcomes in laparoscopic gastric cancer surgery. Ann Surg 2019;270:115–120
7. Flin R, Yule S, Paterson-Brown S, Maran N, Rowley D, Youngson G. Teaching surgeons about non-technical skills. Surgeon 2007;5:86–89
8. Martin JA, Regehr G, Reznick R, Macrae H, Murnaghan J, Hutchison C et al. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg 1997;84:273–278
9. Niitsu H, Hirabayashi N, Yoshimitsu M, Mimura T, Taomoto J, Sugiyama Y et al. Using the objective structured assessment of technical skills (OSATS) global rating scale to evaluate the skills of surgical trainees in the operating room. Surg Today 2013;43:271–275
10. Yule S, Flin R, Maran N, Rowley D, Youngson G, Paterson-Brown S. Surgeons' non-technical skills in the operating room: reliability testing of the NOTSS behavior rating system. World J Surg 2008;32:548–556
11. St John A, Caturegli I, Kubicki NS, Kavic SM. The rise of minimally invasive surgery: 16 year analysis of the progressive replacement of open surgery with laparoscopy. JSLS 2020;24:e2020.00076
12. Friedl R, Höppler H, Ecard K, Scholz W, Hannekum A, Stracke S. Development and prospective evaluation of a multimedia teaching course on aortic valve replacement. Thorac Cardiovasc Surg 2006;54:1–9
13. Friedl R, Höppler H, Ecard K, Scholz W, Hannekum A, Öchsner W et al. Multimedia-driven teaching significantly improves students' performance when compared with a print medium. Ann Thorac Surg 2006;81:1760–1766
14. Crawshaw B, Steele S, Lee E, Delaney C, Mustain C, Russ A et al. Failing to prepare is preparing to fail: a single-blinded, randomized controlled trial to determine the impact of a preoperative instructional video on the ability of residents to perform laparoscopic right colectomy. Dis Colon Rectum 2016;59:28–34
15. Singh P, Aggarwal R, Tahir M, Pucher PH, Darzi A. A randomized controlled study to evaluate the role of video-based coaching in training laparoscopic skills. Ann Surg 2015;261:862–869
16. Driscoll PJ, Paisley AM, Paterson-Brown S. Video assessment of basic surgical trainees' operative skills. Am J Surg 2008;196:265–272
17. Cauraugh JH, Martin M, Martin KK. Modeling surgical expertise for motor skill acquisition. Am J Surg 1999;177:331–336
18. Andersen SAW, Guldager M, Mikkelsen PT, Sørensen MS. The effect of structured self-assessment in virtual reality simulation training of mastoidectomy. Eur Arch Otorhinolaryngol 2019;276:3345–3352
19. Varban OA, Thumma JR, Carlin AM, Ghaferi AA, Dimick JB, Finks JF. Evaluating the impact of surgeon self-awareness by comparing self versus peer ratings of surgical skill and outcomes for bariatric surgery. Ann Surg 2022;276:128–132
20. Hashimoto DA, Rosman G, Rus D, Meireles OR. Artificial intelligence in surgery: promises and perils. Ann Surg 2018;268:70–76
21. Hung AJ, Chen J, Che Z, Nilanon T, Jarc A, Titus M et al. Utilizing machine learning and automated performance metrics to evaluate robot-assisted radical prostatectomy performance and predict outcomes. J Endourol 2018;32:438–444
22. Lum ZC. Can artificial intelligence pass the American Board of Orthopaedic Surgery examination? Orthopaedic residents versus ChatGPT. Clin Orthop Relat Res 2023;481:1623–1630
23. Rivas JG, Vázquez CT, Ruiz CB, Taratkin M, Marenco JL, Cacciamani GE et al. Artificial intelligence and simulation in urology. Actas Urol Esp (Engl Ed) 2021;45:524–529
24. Arts EEA, Leijte E, Witteman BPL, Jakimowicz JJ, Verhoeven B, Botden SMBI. Face, content, and construct validity of the take-home eoSim augmented reality laparoscopy simulator for basic laparoscopic tasks. J Laparoendosc Adv Surg Tech 2019;29:1419–1426
25. Hillemans V, van de Mortel X, Buyne O, Verhoeven BH, Botden SMBI. Objective assessment for open surgical suturing training by finger tracking can discriminate novices from experts. Med Ed 2023;28:2198818
26. Khalid S, Goldenberg M, Grantcharov T, Taati B, Rudzicz F. Evaluation of deep learning models for identifying surgical actions and measuring performance. JAMA Netw Open 2020;3:e201664
27. Likosky D, Yule SJ, Mathis MR, Dias RD, Corso JJ, Zhang M et al. Novel assessments of technical and nontechnical cardiac surgery quality: protocol for a mixed methods study. JMIR Res Protoc 2021;10:e22536
28. Arksey H, O'Malley L. Scoping studies: towards a methodological framework. Int J Soc Res Methodol 2005;8:19–32
29. Tricco AC, Lillie E, Zarin W, O'Brien KK, Colquhoun H, Levac D et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med 2018;169:467–473
30. Reed DA, Cook DA, Beckman TJ, Levine RB, Kern DE, Wright SM. Association between funding and quality of published medical education research. JAMA 2007;298:1002–1009
31. Azari D, Frasier L, Quamme S, Greenberg C, Pugh C, Greenberg J et al. Modeling surgical technical skill using expert assessment for automated computer rating. Ann Surg 2019;269:574–581
32. Balal S, Smith P, Bader T, Tang H, Sullivan P, Thomsen A et al. Computer analysis of individual cataract surgery segments in the operating room. Eye 2019;33:313–319
33. Din N, Smith P, Emeriewen K, Sharma A, Jones S, Wawrzynski J et al. Man versus machine: software training for surgeons—an objective evaluation of human and computer-based training tools for cataract surgical performance. J Ophthalmol 2016;2016:3548039
34. Frasier LL, Azari DP, Ma Y, Quamme SRP, Radwin RG, Pugh CM et al. A marker-less technique for measuring kinematics in the operating room. Surgery 2016;160:1400–1413
35. Glarner C, Hu Y, Chen C, Radwin R, Zhao Q, Craven M et al. Quantifying technical skills during open operations using video-based motion analysis. Surgery 2014;156:729–734
36. Hameed MS, Laplante S, Masino C, Khalid MU, Zhang H, Protserov S et al. What is the educational value and clinical utility of artificial intelligence for intraoperative and postoperative video analysis? A survey of surgeons and trainees. Surg Endosc 2023;37:9453–9460
37. Humm G, Peckham-Cooper A, Hamade A, Wood C, Dawas K, Stoyanov D et al. Automated analysis of intraoperative phase in laparoscopic cholecystectomy: a comparison of one attending surgeon and their residents. J Surg Educ 2023;80:994–1004
38. Kitaguchi D, Teramura K, Matsuzaki H, Hasegawa H, Takeshita N, Ito M. Automatic purse-string suture skill assessment in transanal total mesorectal excision using deep learning-based video analysis. BJS Open 2023;7:zrac176
39. Lee D, Yu H, Kwon H, Kong H, Lee K, Kim H. Evaluation of surgical skills during robotic surgery by deep learning-based multiple surgical instrument tracking in training and actual operations. J Clin Med 2020;9:1964
40. Smith P, Tang L, Balntas V, Young K, Athanasiadis Y, Sullivan P et al. "PhacoTracking": an evolving paradigm in ophthalmic surgical training. JAMA Ophthalmol 2013;131:659–661
41. Wawrzynski JR, Smith P, Tang L, Hoare T, Caputo S, Siddiqui AA et al. Tracking camera control in endoscopic dacryocystorhinostomy surgery. Clin Otolaryngol 2015;40:646–650
42. Wu S, Chen Z, Liu R, Li A, Cao Y, Wei A et al. SurgSmart: an artificial intelligent system for quality control in laparoscopic cholecystectomy: an observational study. Int J Surg 2023;109:1105–1114
43. Yang JH, Goodman ED, Dawes AJ, Gahagan JV, Esquivel MM, Liebert CA et al. Using AI and computer vision to analyze technical proficiency in robotic surgery. Surg Endosc 2023;37:3010–3017
44. Demirel D, Palmer B, Sundberg G, Karaman B, Halic T, Kockara S et al. Scoring metrics for assessing skills in arthroscopic rotator cuff repair: performance comparison study of novice and expert surgeons. Int J Comput Assist Radiol Surg 2022;17:1823–1835
45. Jaafari J, Douzi S, Douzi K, Hssina B. The impact of ensemble learning on surgical tools classification during laparoscopic cholecystectomy. J Big Data 2022;9:49
46. Yamazaki Y, Kanaji S, Kudo T, Takiguchi G, Urakawa N, Hasegawa H et al. Quantitative comparison of surgical device usage in laparoscopic gastrectomy between surgeons' skill levels: an automated analysis using a neural network. J Gastrointest Surg 2022;26:1006–1014
47. Ríos MS, Molina-Rodriguez MA, Londoño D, Guillén CA, Sierra S, Zapata F et al. Cholec80-CVS: an open dataset with an evaluation of Strasberg's critical view of safety for AI. Sci Data 2023;10:194
48. Aggarwal R, Grantcharov T, Moorthy K, Milland T, Papasavas P, Dosis A et al. An evaluation of the feasibility, validity, and reliability of laparoscopic skills assessment in the operating room. Ann Surg 2007;245:992–999
49. Chen CH, Hu YH, Yen TY, Radwin RG. Automated video exposure assessment of repetitive hand activity level for a load transfer task. Hum Factors 2013;55:298–308
50. Ghodoussipour S, Reddy SS, Ma R, Huang D, Nguyen J, Hung AJ. An objective assessment of performance during robotic partial nephrectomy: validation and correlation of automated performance metrics with intraoperative outcomes. J Urol 2021;205:1294–1302
51. Oropesa I, Sánchez-González P, Chmarra MK, Lamata P, Fernández A, Sánchez-Margallo JA et al. EVA: laparoscopic instrument tracking based on endoscopic video analysis for psychomotor skills assessment. Surg Endosc 2013;27:1029–1039
52. Peng W, Xing Y, Liu R, Li J, Zhang Z. An automatic skill evaluation framework for robotic surgery training. Int J Med Robot 2019;15:e1964
53. Hashimoto DA, Rosman G, Witkowski ER, Stafford C, Navarrete-Welton AJ, Rattner DW et al. Computer vision analysis of intraoperative video: automated recognition of operative steps in laparoscopic sleeve gastrectomy. Ann Surg 2019;270:414–421
54. Hung AJ, Oh PJ, Chen J, Ghodoussipour S, Lane C, Jarc A et al. Experts vs super-experts: differences in automated performance metrics and clinical outcomes for robot-assisted radical prostatectomy. BJU Int 2019;123:861–868
55. Forestier G, Petitjean F, Senin P, Riffaud L, Henaux PL, Jannin P. Finding discriminative and interpretable patterns in sequences of surgical activities. Artif Intell Med 2017;82:11–19
56. Ma R, Ramaswamy A, Xu J, Trinh L, Kiyasseh D, Chu TN et al. Surgical gestures as a method to quantify surgical performance and predict patient outcomes. NPJ Digit Med 2022;5:187
57. Inouye DA, Ma R, Nguyen JH, Laca J, Kocielnik R, Anandkumar A et al. Assessing the efficacy of dissection gestures in robotic surgery. J Robot Surg 2023;17:597–603
58. Ma R, Vanstrum EB, Nguyen JH, Chen A, Chen J, Hung AJ. A novel dissection gesture classification to characterize robotic dissection technique for renal hilar dissection. J Urol 2021;205:271–275
59. Baghdadi A, Hussein AA, Ahmed Y, Cavuoto LA, Guru KA. A computer vision technique for automated assessment of surgical performance using surgeons' console-feed videos. Int J Comput Assist Radiol Surg 2019;14:697–707
60. Lin S, Qin F, Bly RA, Moe KS, Hannaford B. Automatic sinus surgery skill assessment based on instrument segmentation and tracking in endoscopic video. In: Li Q, Leahy R, Dong B, Li X (eds), Multiscale Multimodal Medical Imaging. Cham: Springer International Publishing, 2020, 93–100 (Lecture Notes in Computer Science)
61. Liu D, Jiang T, Wang Y, Miao R, Shan F, Li Z. Surgical skill assessment on in-vivo clinical data via the clearness of operating field. In: Shen D, Liu T, Peters TM, Staib LH, Essert C, Zhou S et al. (eds), Medical Image Computing and Computer Assisted Intervention—MICCAI 2019. Cham: Springer International Publishing, 2019, 476–484 (Lecture Notes in Computer Science)
62. Law H, Ghani K, Deng J. Surgeon technical skill assessment using computer vision based analysis. In: Proceedings of the Second Machine Learning for Healthcare Conference. PMLR 2017;68:88–99 (https://proceedings.mlr.press/v68/law17a.html)
63. Johnsson V, Søndergaard MB, Kulasegaram K, Sundberg K, Tiblad E, Herling L et al. Validity evidence supporting clinical skills assessment by artificial intelligence compared with trained clinician raters. Med Educ 2024;58:105–117
64. Gallant JN, Brelsford K, Sharma S, Grantcharov T, Langerman A. Patient perceptions of audio and video recording in the operating room. Ann Surg 2022;276:e1057
65. Walsh R, Kearns EC, Moynihan A, Gerke S, Duffourc M, Compagnucci MC et al. Ethical perspectives on surgical video recording for patients, surgeons and society: systematic review. BJS Open 2023;7:zrad063
66. Elek RN, Haidegger T. Next in surgical data science: autonomous non-technical skill assessment in minimally invasive surgery training. J Clin Med 2022;11:7533

Author notes

Presented as a talking poster at the Association of Surgeons of Great Britain and Ireland Congress, Belfast, UK, 2024.
