Synopsis

Artificial intelligence (AI) is poised to revolutionize many aspects of science, including the study of evolutionary morphology. While classical AI methods such as principal component analysis and cluster analysis have been commonplace in the study of evolutionary morphology for decades, recent years have seen increasing application of deep learning to ecology and evolutionary biology. As digitized specimen databases become increasingly prevalent and openly available, AI is offering vast new potential to circumvent long-standing barriers to rapid, big data analysis of phenotypes. Here, we review the current state of AI methods available for the study of evolutionary morphology, which are most developed in the area of data acquisition and processing. We introduce the main available AI techniques, categorizing them into 3 stages based on their order of appearance: (1) machine learning, (2) deep learning, and (3) the most recent advancements in large-scale models and multimodal learning. Next, we present case studies of existing approaches using AI for evolutionary morphology, including image capture and segmentation, feature recognition, morphometrics, and phylogenetics. We then discuss the prospects for near-term advances in specific areas of inquiry within this field, including the potential of new AI methods that have not yet been applied to the study of morphological evolution. In particular, we note key areas where AI remains underutilized and could be used to enhance studies of evolutionary morphology. This combination of current methods and potential developments has the capacity to transform the evolutionary analysis of the organismal phenotype into evolutionary phenomics, leading to an era of “big data” that aligns the study of phenotypes with genomics and other areas of bioinformatics.

Synopsis (Polish)

Sztuczna inteligencja (AI) może w przyszłości zrewolucjonizować wiele aspektów nauki, w tym badanie morfologii ewolucyjnej. Chociaż klasyczne instrumenty sztucznej inteligencji, takie jak analiza głównych składowych i analiza skupień, są powszechne od dziesięcioleci w badaniach morfologii ewolucyjnej, w ostatnich latach obserwuje się coraz szersze zastosowanie uczenia głębokiego (deep learning) w ekologii i biologii ewolucyjnej. W miarę jak cyfrowe bazy danych okazów stają się coraz bardziej powszechne i ogólnodostępne, sztuczna inteligencja oferuje nowy, ogromny potencjał w zakresie omijania długotrwałych barier utrudniających szybką analizę dużych zbiorów danych fenotypowych. Prezentujemy przegląd obecnego stanu wiedzy o najbardziej rozwiniętych metodach AI używanych w badaniach morfologii ewolucyjnej do pozyskiwania i przetwarzania danych. Przedstawiamy główne dostępne techniki sztucznej inteligencji, dzieląc je na trzy etapy w zależności od kolejności ich występowania: (1) uczenie maszynowe, (2) uczenie głębokie oraz (3) najnowsze osiągnięcia w modelach wielkoskalowych i uczeniu multimodalnym. Następnie przedstawiamy studia przypadków wykorzystujące sztuczną inteligencję w badaniach morfologii ewolucyjnej, w tym przechwytywania i segmentacji obrazu, rozpoznawania cech, morfometrii i filogenetyki. Następnie omawiamy perspektywę krótkoterminowych postępów w konkretnych obszarach badań w tej dziedzinie, w tym potencjał nowych metod sztucznej inteligencji, które nie znalazły jeszcze zastosowania w badaniach nad morfologią ewolucyjną. W szczególności zwracamy uwagę na kluczowe obszary, w których sztuczna inteligencja pozostaje jeszcze niewykorzystana i można ją wykorzystać do usprawnienia badań nad morfologią ewolucyjną.
To połączenie obecnych metod i potencjalnych rozwiązań może w przyszłości przekształcić analizę ewolucyjną fenotypu organizmu w fenomikę ewolucyjną, prowadząc do ery „dużych zbiorów danych”, które dopasowują badanie fenotypów do genomiki i innych dziedzin bioinformatyki.

Synopsis (Simplified Chinese)

人工智能正处于变革科学诸多领域的前沿, 进化形态学也不例外。尽管诸如主成分分析和聚类分析等传统方法已经在进化形态学研究中应用数十年, 但近年来, 深度学习在生态学和进化生物学中的应用日益增多。随着数字化标本数据库的广泛普及和开放共享, 人工智能展现出巨大的潜力, 能够克服长期以来高速分析大量表型数据中的效率障碍。在此, 我们回顾了当前用于进化形态学研究的人工智能方法, 发现这些方法的应用主要集中在数据获取和处理方面。本文中, 我们先介绍了现有的主要人工智能技术, 并根据它们出现的顺序将其分类为三个阶段:(1)机器学习, (2)深度学习, 和(3)最近的大规模模型和多模态学习。接下来, 我们通过案例分析展示了利用人工智能进行进化形态学研究的现有方法, 包括图像捕捉与分割、特征识别、形态计量学和系统发育学。随后, 我们探讨了该领域在特定研究方向上的近期进展前景, 特别是尚未应用于形态进化研究的潜在人工智能方法。此外, 我们还指出了人工智能在若干关键领域中应用的不足之处, 并探讨了其在进一步提升进化形态学研究中的潜在价值。这种当前方法与未来发展的结合, 能够将表型的进化分析转变为进化表型学, 使得表型研究在大数据时代下与基因组学及其他生物信息学领域接轨。

Synopsis (Traditional Chinese)

人工智慧正處於變革科學諸多領域的前沿, 進化形態學也不例外。儘管主成分分析和群聚分析等傳統方法已在進化形態學研究中應用數十年, 但近年來, 深度學習在生態學和進化生物學中的應用日益增加。隨著數位化標本資料庫的廣泛普及與開放共享, 人工智慧展現出巨大的潛力, 能夠克服長期以來高速分析大量表型資料中的效率障礙。本文回顧了目前用於進化形態學研究的人工智慧方法, 並發現這些方法的應用主要集中在資料獲取與處理方面。本文中, 我們先介紹現有的主要人工智慧技術, 並依據它們出現的順序將其分為三個階段:(1)機器學習, (2)深度學習, 以及(3)最近的大規模模型和多模態學習。接下來, 我們透過案例分析展示了利用人工智慧進行進化形態學研究的現有方法, 包括影像擷取與分割、特徵識別、形態計量學與系統發育學。隨後, 我們探討了該領域在特定研究方向上的近期進展前景, 特別是尚未應用於形態進化研究的潛在人工智慧方法。此外, 我們還指出了人工智慧在若干關鍵領域中應用的不足之處, 並探討了其在進一步提升進化形態學研究中的潛在價值。當前方法與未來發展的結合, 有望將表型的進化分析轉變為進化表型學, 使表型研究在大數據時代能與基因體學及其他生物資訊學領域接軌。

Synopsis (Japanese)

人工知能(AI)は、進化形態学を含む多くの科学分野において、革新をもたらす大きな可能性を秘めています。主成分分析やクラスター分析といった従来の手法は、何十年にもわたって進化形態学の研究に用いられてきましたが、近年では深層学習の応用が生態学および進化生物学において急速に進展しています。デジタル化された標本データベースがますます普及し、公開される中で、AIはこれまでの制約を打破し、迅速かつ大規模な表現型データの解析を可能にする大きな潜在力を示しています。 本論文では、現在進化形態学の研究に利用されているAI技術の現状を概観し、その多くがデータ取得および処理に集中していることを明らかにします。さらに、技術の発展過程に基づいて、AI技術を以下の3つの段階に分類します:(1)機械学習、(2)深層学習、(3)最新の大規模モデルおよびマルチモーダル学習。次に、AIを活用した進化形態学研究の実例として、画像追跡と分割、特徴認識、形態計量学、系統学の事例を紹介します。 続いて、形態進化の研究において、まだ十分に応用されていない新たなAI技術の可能性について議論し、今後の具体的な研究分野における短期的な進展の見通しを探ります。特に、AIが十分に活用されていない重要な分野を指摘し、それが進化形態学研究のさらなる発展にどのように寄与し得るかを検討します。このように、現行の方法と今後の技術進歩が融合することで、表現型の進化分析が進化表現型学へと発展し、ビッグデータ時代における表現型研究がゲノミクスや他の生命情報科学分野とより密接に統合されることが期待されます。

Synopsis (Italian)

L'intelligenza artificiale (IA) è destinata a rivoluzionare molti aspetti della scienza, incluso lo studio della morfologia evolutiva. Mentre metodi classici di IA, come l'analisi delle componenti principali e l'analisi dei cluster, sono stati comunemente utilizzati nello studio della morfologia evolutiva per decenni, negli ultimi anni si è assistito a un aumento dell'applicazione del deep learning all'ecologia e alla biologia evolutiva. Con la crescente diffusione e disponibilità di database aperti di esemplari digitalizzati, l'IA offre nuove potenzialità per superare le barriere che storicamente hanno impedito l'analisi rapida di grandi quantità di dati di fenotipi. In questo lavoro, esaminiamo lo stato attuale dei metodi di IA disponibili per lo studio della morfologia evolutiva, che si sono per lo più sviluppati nell'acquisizione e nella lavorazione dei dati. Forniamo un'introduzione alle principali tecniche di IA disponibili, suddividendole in tre tipi in base all'ordine in cui sono state utilizzate per la prima volta in studi: (1) machine learning, (2) deep learning e (3) gli sviluppi più recenti nei modelli su larga scala e nel multimodal learning. Successivamente, presentiamo esempi di studi che utilizzano metodi esistenti di IA in diverse applicazioni nel campo della morfologia evolutiva, inclusi la cattura e segmentazione delle immagini, il riconoscimento dei caratteri, la morfometria e la filogenetica. Discutiamo poi di avanzamenti futuri nel breve periodo in aree di ricerca specifiche all'interno di questo campo, incluse potenziali applicazioni di metodi di IA che non sono ancora stati applicati allo studio della morfologia evolutiva. In particolare, evidenziamo alcune aree importanti in cui l'IA è ancora sotto-utilizzata e potrebbe essere impiegata per migliorare gli studi di morfologia evolutiva. 
I metodi attualmente utilizzati e i potenziali sviluppi hanno, in modo combinato, la capacità di trasformare le analisi sull'evoluzione del fenotipo degli organismi in “fenomica evolutiva”, un'area di ricerca innovativa che grazie ai “big data” allinea lo studio dei fenotipi con la genomica e altre aree della bioinformatica.

Synopsis (French)

L'intelligence artificielle (IA) est destinée à révolutionner de nombreux aspects de la science, y compris l'étude de la morphologie évolutive. Alors que les méthodes classiques d'IA, telles que l'analyse en composantes principales et l'analyse de clusters, ont été couramment utilisées dans l'étude de la morphologie évolutive depuis des décennies, ces dernières années ont vu une augmentation de l'application du deep learning à l'écologie et à la biologie évolutive. Avec la diffusion croissante et la disponibilité de bases de données ouvertes d'échantillons numérisés, l'IA offre de nouvelles potentialités pour surmonter les barrières qui ont historiquement empêché l'analyse rapide de grandes quantités de données sur les phénotypes. Dans ce travail, nous examinons l'état actuel des méthodes d'IA disponibles pour l'étude de la morphologie évolutive, qui se sont principalement développées dans l'acquisition et le traitement des données. Nous fournissons une introduction aux principales techniques d'IA disponibles, en les divisant en trois types en fonction de l'ordre dans lequel elles ont été utilisées pour la première fois dans les études: (1) l'apprentissage automatique (machine learning), (2) l'apprentissage profond (deep learning) et (3) les développements les plus récents dans les modèles à grande échelle et l'apprentissage multimodal (multimodal learning). Ensuite, nous présentons des exemples d'études utilisant des méthodes d'IA existantes dans diverses applications dans le domaine de la morphologie évolutive, y compris la capture et la segmentation d'images, la reconnaissance des caractères, la morphométrie et la phylogénétique. Nous discutons ensuite des avancées futures à court terme dans des domaines de recherche spécifiques de ce champ, y compris des applications potentielles de méthodes d'IA qui n'ont pas encore été appliquées à l'étude de la morphologie évolutive. 
En particulier, nous mettons en évidence certains domaines importants où l'IA est encore sous-utilisée et pourrait être utilisée pour améliorer les études de morphologie évolutive. Les méthodes actuellement utilisées et les développements potentiels ont la capacité combinée de transformer l'analyse de l'évolution des phénotypes des organismes en “phénomique évolutive”, un domaine de recherche innovant qui, grâce aux “big data”, aligne l'étude des phénotypes avec la génomique et d'autres domaines de la bio-informatique.

Synopsis (German)

Künstliche Intelligenz (KI) hat tiefgreifende Änderungen in vielen Aspekten der Wissenschaft ausgelöst, so auch im Forschungsbereich der evolutionären Morphologie. Während klassische KI-Methoden wie die Hauptkomponentenanalyse und die Clusteranalyse bei der Untersuchung der evolutionären Morphologie seit Jahrzehnten gang und gäbe sind, wurde in den letzten Jahren zunehmend Deep Learning in der Ökologie und Evolutionsbiologie eingesetzt. Da eine wachsende Masse von Forschungsdaten in digitalen Datenbanken öffentlich zugänglich gemacht wird, können mittels KI nun schnelle und gleichzeitig umfangreiche phänotypische Analysen basierend auf Big Data durchgeführt werden. In dieser Arbeit geben wir einen Überblick über den aktuellen Stand der für die Untersuchung der evolutionären Morphologie verfügbaren und im Bereich der Datenerfassung und -verarbeitung am weitesten entwickelten KI-Methoden. Wir stellen die wichtigsten verfügbaren KI-Techniken vor und ordnen sie drei nacheinander aufgetretenen Stufen zu: (1) maschinelles Lernen, (2) Deep Learning und (3) die jüngsten Fortschritte bei groß angelegten Modellen und multimodalem Lernen. Danach stellen wir Fallstudien zu bestehenden Ansätzen der Nutzung von KI im Bereich der evolutionären Morphologie vor, beispielsweise Bilderfassung und Segmentierung, Merkmalserkennung, Morphometrie und Phylogenetik. Anschließend diskutieren wir mögliche Fortschritte in bestimmten Bereichen der Forschung auf diesem Gebiet in naher Zukunft, darunter auch solche KI-Methoden, die bisher noch nicht zur Untersuchung der morphologischen Evolution angewendet wurden. Wir weisen insbesondere auf Schlüsselbereiche hin, in denen noch weiteres Potenzial zur Nutzung von KI besteht und in denen KI zur Verbesserung der Studien zur evolutionären Morphologie eingesetzt werden könnte.
Diese Kombination aus bestehenden Methoden und aktuellen Neuentwicklungen könnte die evolutionäre Analyse des Phänotyps von Organismen in eine evolutionäre Phänomik verwandeln, die in enger Verbindung mit der Genomik und anderen Bereichen der Bioinformatik zu einer Ära integrativer „big data“ führt.

Synopsis (Malagasy)

Ny faharanitan-tsaina artifisialy (artificial intelligence—AI) dia afaka ny hampiova sy hanavao ny lafiny maro eo amin'ny siansa, anisan'izany ny fandalinana ny morfolojia evolisionera (evolutionary morphology). Na dia efa nampiasaina nandritra ny taona maro tamin'ny morfolojia evolisionera aza ny fomba AI klasika toy ny fanadihadiana ny singa fototra (principal component analysis) sy ny fanasokajiana (Cluster Analysis), dia hita fa mihamaro ny fampiasana ny fianarana lalina (deep learning) eo amin'ny ekolojia sy ny biolojia evolisionera tato anatin'ny taona vitsivitsy. Efa mihamaro sy misokatra amin'ny besinimaro ireo tahiry elektronika misy ireo santionany ankehitriny, ka dia manome fahafahana goavana ny AI hanampy amin'ny famahana ny sakana lehibe amin'ny fanadihadiana haingana ireo tahiry goavana momba ny fenôtypika. Ato amin'ity asa ity, dia jerentsika ny toeran'ny fandrosoana ankehitriny amin'ny fampiasana AI amin'ny fandalinana ny morfolojia evolisionera, izay efa nandroso indrindra eo amin'ny sehatry ny fakana sy fanodinana angona. Asehontsika ireo teknika AI lehibe azo ampiasaina, izay mizara telo arakaraka ny filaharany amin'ny vanim-potoana nipoirany: (1) ny fianarana milina (machine learning), (2) ny fianarana lalina (deep learning), ary (3) ny fandrosoana farany indrindra amin'ny modely lehibe sy ny fianarana mitambatra maromaro (multimodal learning). Avy eo dia asehontsika ireo tranga efa nampiasana AI amin'ny morfolojia evolisionera, anisan'izany ny fakana sy fizarana sary, ny fahafantarana ny endri-javatra, ny morfometrika, ary ny fiaviana (phylogenetics). Manaraka izany, dia resahintsika ny fanantenana ho amin'ny fivoarana akaiky amin'ny lafiny manokana eo amin'ity sehatra ity, anisan'izany ny fampiasana ireo fomba AI vaovao izay mbola tsy nampiasaina amin'ny fandalinana ny fivoaran'ny morfolojia. Asongadintsika manokana ireo tontolo lehibe izay mbola tsy ampy fampiasana AI ary azo ampiasaina hanatsarana ny fandalinana ny morfolojia evolisionera. 
Ity fampitambarana ireo fomba ankehitriny sy ny fandrosoana mety hitranga ity dia manana ny fahefana manova tanteraka ny fandalinana evolisionera ny fenôtypika ho lasa “big data” ka mampifanakaiky ny fandalinana ny fenôtypika sy ny genomika ary ny sehatra hafa ao amin'ny bioinformatika.

Synopsis (Spanish)

La inteligencia artificial (IA) está destinada a revolucionar muchos aspectos de la ciencia, incluido el estudio de la morfología evolutiva. Aunque los métodos clásicos de IA, como el análisis de componentes principales y el análisis de clústeres, han sido habituales en el estudio de la morfología evolutiva durante décadas, en los últimos años se ha observado una creciente aplicación del aprendizaje profundo a la ecología y la biología evolutiva. A medida que las bases de datos de especímenes digitalizados se vuelven cada vez más frecuentes y están disponibles de forma abierta, la IA está ofreciendo un nuevo y vasto potencial para sortear las barreras existentes desde hace mucho tiempo para el análisis rápido de grandes datos de fenotipos. Aquí revisamos el estado actual de los métodos de IA disponibles para el estudio de la morfología evolutiva, que están más desarrollados en el área de adquisición y procesamiento de datos. Presentamos las principales técnicas de IA disponibles, categorizándolas en tres etapas en función de su orden de aparición: (1) aprendizaje automático, (2) aprendizaje profundo, y (3) los avances más recientes en modelos a gran escala y aprendizaje multimodal. A continuación, presentamos estudios de casos de enfoques existentes que utilizan la IA para la morfología evolutiva, incluyendo la captura y segmentación de imágenes, el reconocimiento de rasgos, la morfometría y la filogenética. A continuación, analizamos las perspectivas de avances a corto plazo en áreas específicas de investigación dentro de este campo, incluido el potencial de nuevos métodos de IA que aún no se han aplicado al estudio de la evolución morfológica. En concreto, señalamos las áreas clave en las que la IA sigue infrautilizada y que podrían utilizarse para mejorar los estudios de morfología evolutiva. 
Esta combinación de métodos actuales y desarrollos potenciales tiene la capacidad de transformar el análisis evolutivo del fenotipo del organismo en fenómica evolutiva, conduciendo a una era de “big data” que alinee el estudio de los fenotipos con la genómica y otras áreas de la bioinformática.

Synopsis (Catalan)

La intel·ligència artificial (IA) està destinada a revolucionar molts aspectes de la ciència, inclòs l'estudi de la morfologia evolutiva. Encara que els mètodes clàssics d'IA, com l'anàlisi de components principals i l'anàlisi de clústers, han estat habituals en l'estudi de la morfologia evolutiva durant dècades, en els últims anys s'ha observat una creixent aplicació de l'aprenentatge profund a l'ecologia i la biologia evolutiva. A mesura que les bases de dades d'espècimens digitalitzats es tornen cada vegada més freqüents i estan disponibles de manera oberta, la IA està oferint un nou i vast potencial per a eludir les barreres existents des de fa molt temps per a l'anàlisi ràpida de grans dades de fenotips. Aquí revisem l'estat actual dels mètodes d'IA disponibles per a l'estudi de la morfologia evolutiva, que estan més desenvolupats en l'àrea d'adquisició i processament de dades. Presentem les principals tècniques d'IA disponibles, categoritzant-les en tres etapes en funció del seu ordre d'aparició: (1) aprenentatge automàtic, (2) aprenentatge profund, i (3) els avanços més recents en models a gran escala i aprenentatge multimodal. A continuació, presentem estudis de casos d'enfocaments existents que utilitzen la IA per a la morfologia evolutiva, incloent-hi la captura i segmentació d'imatges, el reconeixement de trets, la morfometria i la filogenètica. A continuació, analitzem les perspectives d'avanços a curt termini en àrees específiques de recerca dins d'aquest camp, inclòs el potencial de nous mètodes d'IA que encara no s'han aplicat a l'estudi de l'evolució morfològica. En concret, assenyalem les àrees clau en les quals la IA segueix infrautilitzada i que podrien utilitzar-se per a millorar els estudis de morfologia evolutiva.
Aquesta combinació de mètodes actuals i desenvolupaments potencials té la capacitat de transformar l'anàlisi evolutiva del fenotip de l'organisme en fenòmica evolutiva, conduint a una era de “big data” que alineï l'estudi dels fenotips amb la genòmica i altres àrees de la bioinformàtica.

Synopsis (Norwegian)

Kunstig intelligens (KI) er i ferd med å revolusjonere mange aspekter av vitenskap, inkludert studiet av evolusjonær morfologi. Selv om klassiske KI-metoder som hovedkomponentanalyse og klyngeanalyse har vært vanlige i studiet av evolusjonær morfologi i flere tiår, har de senere årene sett økende bruk av dyp læring innen økologi og evolusjonsbiologi. Etter hvert som digitaliserte eksemplardatabaser blir stadig mer utbredt og åpent tilgjengelige, tilbyr KI et stort potensial for å omgå langvarige hindringer for rask, storskala dataanalyse av fenotyper. Her gjennomgår vi dagens KI-metoder tilgjengelig for studiet av evolusjonær morfologi, som er mest utviklet innen dataregistrering og prosessering. Vi introduserer de viktigste tilgjengelige KI-teknikkene, og kategoriserer dem i tre stadier basert på deres rekkefølge av fremvekst: (1) maskinlæring, (2) dyp læring, og (3) de nyeste fremskrittene innen storskalamodeller og multimodal læring. Deretter presenterer vi casestudier av eksisterende tilnærminger som bruker KI for evolusjonær morfologi, inkludert bildeinnsamling og segmentering, funksjonsgjenkjenning, morfometri og fylogenetikk. Vi diskuterer deretter fremtidige fremskritt på spesifikke forskningsområder innen dette feltet, inkludert potensialet til nye KI-metoder som ennå ikke er anvendt på studiet av morfologisk evolusjon. Spesielt påpeker vi nøkkelområder hvor KI fortsatt er underutnyttet og kan brukes til å forbedre studier av evolusjonær morfologi. Denne kombinasjonen av nåværende metoder og potensielle utviklinger har kapasitet til å forvandle den evolusjonære analysen av organismens fenotype til evolusjonær fenomikk, og innlede en æra med “big data” som bringer studiet av fenotyper i tråd med genomikk og andre områder av bioinformatikk.

Synopsis (Portuguese)

A inteligência artificial (IA) está preparada para revolucionar muitos aspectos da ciência, incluindo o estudo da morfologia evolutiva. Embora os métodos clássicos de IA, como a análise de componentes principais e a análise de agrupamentos, tenham sido comuns no estudo da morfologia evolutiva durante décadas, nos últimos anos temos visto uma aplicação crescente da aprendizagem profunda à ecologia e à biologia evolutiva. À medida que os bancos de dados de espécimes digitalizados se tornam cada vez mais predominantes e disponíveis abertamente, a IA oferece um novo e vasto potencial para contornar barreiras de longa data à análise rápida de fenótipos de big data. Aqui, revisamos o estado atual dos métodos de IA disponíveis para o estudo da morfologia evolutiva, que são mais desenvolvidos na área de aquisição e processamento de dados. Apresentamos as principais técnicas de IA disponíveis, categorizando-as em três estágios com base em sua ordem de aparecimento: (1) o ‘machine learning’, (2) o ‘deep learning’ e (3) os avanços mais recentes em modelos de grande escala e o ‘multimodal learning’. A seguir, apresentamos estudos de caso de abordagens existentes usando IA para morfologia evolutiva, incluindo captura e segmentação de imagens, reconhecimento de características, morfometria e filogenética. Em seguida, discutimos o prospecto para avanços de curto prazo em áreas específicas de investigação neste campo, incluindo o potencial de novos métodos de IA que ainda não foram aplicados ao estudo da evolução morfológica. Em particular, notamos áreas-chave onde a IA permanece subutilizada e poderia ser usada para aprimorar estudos de morfologia evolutiva. Esta combinação de métodos atuais e desenvolvimentos potenciais tem a capacidade de transformar a análise evolutiva do fenótipo do organismo em fenômica evolutiva, levando a uma era de “big data” que alinha o estudo dos fenótipos com a genômica e outras áreas da bioinformática.

Synopsis (Dutch)

Artificiële intelligentie (AI) staat op het punt een revolutie teweeg te brengen in verschillende aspecten van de wetenschap, inclusief de studie van evolutionaire morfologie. Klassieke AI-methodes zoals principale-componentenanalyse en clusteranalyse worden al decennialang regelmatig gebruikt in de studie van evolutionaire morfologie, maar in recente jaren is het gebruik van “deep learning” in de studie van ecologie en evolutionaire biologie toegenomen. Nu gedigitaliseerde exemplaardatabases steeds gebruikelijker en vaak openbaar beschikbaar zijn, biedt AI veel nieuwe mogelijkheden om langbestaande barrières te omzeilen, waaronder die voor de snelle analyse van “big data” van fenotypen. In deze wetenschappelijke beoordeling vatten we de beschikbare AI-methodes samen die geschikt zijn voor de studie van evolutionaire morfologie, met de focus op die welke het verst ontwikkeld zijn voor data-acquisitie en -verwerking. Wij introduceren de voornaamste AI-technieken en sorteren ze in drie groepen, gebaseerd op de volgorde van verschijning: (1) machine learning, (2) deep learning, en (3) de meest recente vooruitgang in grootschalige modellen en multimodaal leren. Daarna presenteren wij casestudies van bestaande toepassingen van AI voor de wetenschap van evolutionaire morfologie, inclusief het gebruik voor beeldvastlegging en -segmentatie, kenmerkherkenning, morfometrie en fylogenetica. We bespreken ook het vooruitzicht van de vooruitgang op korte termijn op dit gebied, inclusief de potentie van nieuwe AI-methodes die nog niet gebruikt worden binnen het gebied van evolutionaire morfologie. We bespreken vooral de gebieden waar AI onderbenut wordt en waar het gebruik van AI de studie van evolutionaire morfologie zou kunnen versterken.
De combinatie van huidige methoden en hun potentiële ontwikkelingen heeft de capaciteit om de evolutionaire analyse van het organismale fenotype te transformeren naar evolutionaire fenomica, leidend tot een tijdperk van “big data” dat de studie van fenotypen in lijn brengt met genomica en andere gebieden van de bio-informatica.

Synopsis (Papiamento)

Inteligensia Artificial (IA) ta para cla pa revolutioná siensa den hopi sentido, incluyendo den e studio di morfologia evolucionario. E uso di metodonan classico di IA manera análisis di componente principal y análisis di cluster, den e estudio di morfologia evolucionario ta comun pa década. Den e ultimo anjanan tin un cresemento di e application di “deep learning” den e studio di ecologia y biologia evolucionario. Awor cu mas datonan ta wordo colecciona ambos digital y publico, IA ta ofrece un manera nobo pa evitá bareranan cu ta prevení e análisis lihe di “big data” relationando cu e estudio di fenotiponan. Den e revista científica aki nos ta declará e formanan di IA metodonan disponible pa e estudio di morfologia evolucionario cu ta mas desaroyá den e área di acuerdo y tratamento di data. Nos lo introduci e maneranan principal di IA y categorisá nan den: (1) machine learning, (2) deep learning, y (3) e desaroyonan mas reciente den “large-scale models” y “multimodal learning”. Tambe nos lo papia di estudio di metodonan ecsistente cu por wordo usa den morfologia evolucionario, incluyendo esnan uza pa e capturation y segmentacion di imagen, reconosemento di característica, morfometria y filogenetica. E or' ei nos ta discuti e potencial di metodonan nobo di IA cu te awor no a wordo usa den e studio di morfologia evolucionario. Particularmente, nos ta discuti e areanan unda IA no ta wordo utilizá optimalmente pa mehora e studio di morfologia evolucionario. E combinacion di e metodonan classico y e desaroyo di metodonan nobo tin e poder pa por transformá con nos ta analysá e fenotipo di organismo den fenomica evolucionario. Esaki lo resulta den un era di “big data” cu ta alinea cu e estudio di fenotiponan cu genomica y otro areanan di bioinformática.

Synopsis (Afrikaans)

Kunsmatige intelligensie (KI) het die potensiaal om baie aspekte van die wetenskap te revolusioneer, insluitend die studie van evolusionêre morfologie. Terwyl klassieke KI-metodes soos hoofkomponent-analise en klusteranalise al dekades algemeen in die studie van evolusionêre morfologie is, het die afgelope jare toenemende toepassing van “deep learning” op ekologie en evolusionêre biologie gesien. Namate gedigitaliseerde monsterdatabasisse al hoe meer algemeen en openlik beskikbaar word, bied KI nuwe potensiaal om langdurige hindernisse vir vinnige, grootdata-ontleding van fenotipes te omseil. Hier hersien ons die huidige stand van KI-metodes wat beskikbaar is vir die studie van evolusionêre morfologie, wat die meeste ontwikkel is op die gebied van data-verkryging en -verwerking. Ons stel die belangrikste beskikbare KI-tegnieke bekend en kategoriseer hulle in drie stadiums gebaseer op hul voorkomsvolgorde: (1) masjienleer, (2) “deep learning”, en (3) die mees onlangse vordering in grootskaalse modelle en multimodale leer. Volgende bied ons gevallestudies aan van bestaande benaderings wat KI gebruik vir evolusionêre morfologie, insluitend beeldvaslegging en segmentering, kenmerkherkenning, morfometrie en filogenetika. Ons bespreek dan die prospektus vir vordering op die kort termyn in spesifieke areas van ondersoek binne hierdie veld, insluitend die potensiaal van nuwe KI-metodes wat nog nie toegepas is op die studie van morfologiese evolusie nie. Ons let veral op sleutelareas waar KI onderbenut bly en gebruik kan word om studies van evolusionêre morfologie te verbeter. Hierdie kombinasie van huidige metodes en potensiële ontwikkelings het die vermoë om die evolusionêre analise van die organismefenotipe in evolusionêre fenomika te omskep, wat lei tot 'n era van “big data” wat die studie van fenotipes in lyn bring met genomika en ander areas van bioinformatika.

Introduction

The rapid proliferation of tools using artificial intelligence (AI) has highlighted both its immense potential and the numerous challenges its implementation faces in biological sciences. Traditional AI methods (i.e., machine learning) have been widely used in biology for decades; indeed, common analytical methods such as principal component analysis (PCA) and cluster analysis are both types of machine learning (ML). Since the early 2010s, deep learning (DL) has gained significant traction and is increasingly applied to biological problems, including image analysis (Angermueller et al. 2016; Moen et al. 2019; Hallou et al. 2021; Liu et al. 2021b; Pratapa et al. 2021; Akçakaya et al. 2022; Ravindran 2022; Li et al. 2023) and molecular analysis (Atz et al. 2021; Kuhn et al. 2021; Kwon et al. 2021; Audagnotto et al. 2022; Korfmann et al. 2023), among other broad topics within ecology and evolutionary biology (Lürig et al. 2021; Borowiec et al. 2022; Pichler and Hartig 2023).
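As a concrete illustration of the classical ML methods named above, the sketch below runs PCA followed by a simple k-means cluster analysis on simulated morphometric measurements. This is purely didactic and not taken from any of the cited studies: the dataset, group means, and dimensions are invented, and only NumPy is assumed to be available.

```python
# Illustrative sketch (not from the cited studies): PCA + k-means cluster
# analysis on a toy morphometric dataset. All data here are simulated.
import numpy as np

rng = np.random.default_rng(0)
# Simulate 60 specimens x 5 linear measurements from two "morphotypes".
measurements = np.vstack([
    rng.normal(10.0, 1.0, size=(30, 5)),
    rng.normal(14.0, 1.0, size=(30, 5)),
])

# --- PCA: project measurements onto the two main axes of variation ---
centered = measurements - measurements.mean(axis=0)
# Rows of vt are the principal axes, ordered by decreasing variance.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ vt[:2].T          # PC scores, shape (60, 2)

# --- k-means cluster analysis (k = 2) on the PC scores ---
centroids = scores[[0, -1]]           # crude init: first and last specimen
for _ in range(20):
    # Assign each specimen to its nearest centroid, then update centroids.
    dists = np.linalg.norm(scores[:, None] - centroids[None], axis=2)
    labels = dists.argmin(axis=1)
    centroids = np.array([scores[labels == k].mean(axis=0) for k in range(2)])

print(scores.shape)       # (60, 2)
print(len(set(labels)))   # 2
```

Because the two simulated morphotypes are well separated, the clustering recovers the original groups; with real specimen data, the number of clusters and the initialization would need more care.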

One key area of biological inquiry relevant to diverse topics is the field of evolutionary morphology, which aims to characterize and reconstruct the evolution of organismal phenotypes (e.g., Alberch et al. 1979; Love 2003). The scope of evolutionary morphology is vast, encompassing pattern, process, and mechanism, from cellular to macroevolutionary levels, across the entire 3.7-billion-year history of life on Earth and, consequently, often involves large datasets (e.g., Cooney et al. 2017, 2019; Price et al. 2019; Goswami and Clavel 2024; Hoyal Cuthill et al. 2024). Comparative evolutionary analyses in particular require large sample sizes for robustness in statistical analysis or evolutionary modeling (e.g., Cardini and Elton 2007; Guillerme and Cooper 2016a, 2016b). Researchers commonly face a trade-off between the breadth and depth of their study, as, typically, high-resolution morphological datasets must sacrifice taxonomic, ecological, or chronological coverage owing to time or computational limitations (e.g., Bardua et al. 2019a; Goswami et al. 2019; Rummel et al. 2024). AI offers an unparalleled opportunity to bridge this breadth–depth gap and thus transform the field into “big data” science, thereby supporting the development of evolutionary morphology. By making large-scale data analysis more feasible, integrating AI into this field will ultimately allow a better understanding of the drivers and mechanisms of morphological evolution.

Here, we focus on the applications of AI to the study of evolutionary morphology, exploring not only preexisting uses but also the potential of recently developed AI methods that have not yet been applied to the study of morphological evolution. We introduce the main available AI techniques, categorizing them into three groups based on their order of appearance: (1) ML, (2) DL, and (3) recent advancements in DL from transformers to large-scale models. Next, we present existing AI approaches in the order of a common lifecycle of evolutionary morphological studies: (1) data acquisition, (2) image data processing, (3) phenomics, and (4) evolutionary analysis. We also focus on 10 case studies in which AI can benefit evolutionary morphological studies, and provide a table of AI tools already available that can be integrated into evolutionary morphology research. Finally, we discuss key areas of evolutionary morphology where applications of AI are promising but currently limited.

Evolution of AI methods

We begin by providing the key definitions necessary for a base-level understanding of this review. These primarily center on the nested relationships of AI, ML, and DL (Fig. 1), but also include the adjacent and overlapping field of computer vision. Because AI applications for evolutionary morphology primarily involve the analysis of images or text, computer vision is often an integral part of such applications, including most of those discussed here. However, it is worth noting that computer vision is not limited to AI but is also present in numerous applications for image data that do not involve AI (e.g., Samoili et al. 2020). Further methodological definitions are provided where required in the main text.

  • Artificial intelligence, or AI, is particularly challenging to define, as its scope is extremely broad. The European AI strategy (European Commission 2018) provides a definition as follows: “Artificial Intelligence refers to systems that display intelligent behaviour by analysing their environment and taking action—with some degree of autonomy—to achieve specific goals,” leaving the interpretation of intelligent behavior open to the reader. Russell and Norvig (2021) provide a more operative definition of AI, as a system that can either “reason,” act human-like, or act rationally.

  • Machine learning, or ML, is a subset of AI that can be defined as “the ability of systems to automatically learn, decide, predict, adapt, and react to changes, improving from experience and data, without being explicitly programmed” (Amalfitano et al. 2024).

  • Neural networks are ML models consisting of layers of interconnected nodes (neurons). Each neuron receives the input, processes it using mathematical functions with parameters, and passes the output to the next layer, enabling the network to learn patterns and make decisions based on data.

  • Deep learning, or DL, is a branch of ML wherein learning is achieved through complex neural networks with many layers and parameters. With a large number of parameters, DL models are able to make predictions on difficult tasks or extract features with data that have complex structures.

  • Computer vision is a multidisciplinary field of computer science that enables machines to interpret, analyze, and understand visual information from the world, through image and video processing algorithms. It refers to using computers for pattern recognition in two-dimensional (2D) and three-dimensional (3D) digital media. While many applications of computer vision for evolutionary morphology involve AI, it is not limited to AI and is applied in diverse fields.
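The layered structure in these definitions can be illustrated with a minimal sketch in plain Python: each neuron computes a weighted sum of its inputs plus a bias and applies a nonlinearity, and a fully connected layer is simply a list of such neurons. All weights and inputs below are arbitrary illustrative values, not a trained model.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus a bias,
    passed through a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weight_rows, biases):
    """A fully connected layer: every neuron sees every input."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# A tiny two-layer network with fixed, illustrative parameters:
# two inputs -> two hidden neurons -> one output neuron.
hidden = layer([0.5, -1.2], [[0.8, 0.3], [-0.5, 0.9]], [0.1, -0.2])
output = layer(hidden, [[1.0, -1.0]], [0.0])
```

A deep network in the DL sense simply stacks many such layers, with the parameters learned from data rather than fixed by hand.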

Broad definitions, relationships, and differences between artificial intelligence, machine learning, and deep learning, the sequential development of each successive subset, and their broad introductions over time (Carbonell et al. 1983; Goodfellow et al. 2016).
Fig. 1


Classical machine learning

Prior to the development of DL, ML methods had been successfully used for classifying, clustering, and predicting structured data, such as tabular data. Techniques like random forests (Breiman 2001) and K-means clustering (MacQueen 1967) have been widely used in evolutionary morphology studies (Dhanachandra et al. 2015; Pinheiro et al. 2022). These methods are typically end-to-end, whereby data are inputted, and the methods learn patterns to generate results. Meanwhile, when it comes to image data, classical computer vision pipelines were composed of two separate computational steps. The first involved the extraction of local or global characteristics (features) that were deemed useful for a task from images. This meant that, for example, the borders and edges of an image needed to be identified, and subsequently, an object could be detected based on the edges, as in the active contours (Kass et al. 1988) and level set methods (Osher and Sethian 1988; Chan and Vese 1999). The extracted features were then used as inputs to ML algorithms that were optimized for structured data.
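As a minimal illustration of these end-to-end classical methods, the sketch below implements K-means clustering (in the spirit of MacQueen 1967 and Lloyd 1982) in plain Python; the input points are invented stand-ins for paired morphometric measurements.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain K-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        centroids = [tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters

# Two toy "morphotypes": invented (length, width) measurement pairs.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (4.8, 5.1), (5.3, 4.9)]
centroids, clusters = kmeans(points, k=2)
```

Each point is assigned to its nearest centroid and each centroid is then moved to the mean of its cluster, repeating until the grouping stabilizes; here the six toy points settle into two clusters of three.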

Subsequent efforts were devoted to the design of methods to extract relevant structures within an image, such as Haar features (Papageorgiou et al. 1998), scale-invariant feature transform (Lowe 2004), histogram of oriented gradients (Dalal and Triggs 2005), Fisher kernels (Perronnin and Dance 2007; Perronnin et al. 2010), and curvelets (Candès et al. 2006). These engineered (or hand-crafted, or heuristic) features were then often used as inputs for ML methods, which can be broadly classified into the following approaches: classification, clustering, and dimension reduction (Lloyd 1982; Cortes and Vapnik 1995; Breiman 2001; Jolliffe and Cadima 2016). Although DL architectures and convolutional neural networks (CNNs) had already been proposed in the early 1990s (LeCun et al. 1989), their success was limited due to a lack of computational power and the availability of large datasets needed to fully exploit their capabilities. However, there were some attempts to design ML systems that could learn the extraction of optimal linear features for downstream tasks (e.g., classification, detection, and clustering) within a boosting framework (Vedaldi et al. 2007).
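To convey the flavor of such engineered features, the toy sketch below captures the core idea behind the histogram of oriented gradients: finite-difference gradients are accumulated, weighted by magnitude, into orientation bins. A real HOG implementation adds cells, blocks, and normalization; the resulting histogram would then be fed to a downstream classifier.

```python
import math

def orientation_histogram(image, n_bins=8):
    """The core idea behind HOG-style features: finite-difference gradients,
    accumulated by magnitude into orientation bins over interior pixels."""
    h, w = len(image), len(image[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % math.pi  # unsigned orientation in [0, pi)
            hist[min(int(angle / math.pi * n_bins), n_bins - 1)] += magnitude
    return hist

# A toy "image" with a vertical edge: all gradient energy is horizontal,
# so it lands in the first orientation bin.
img = [[0, 0, 9, 9]] * 4
hist = orientation_histogram(img)
```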

Deep learning

Although artificial neurons (McCulloch and Pitts 1943) and then artificial neural networks (Rosenblatt 1958) were introduced several decades ago, they were often outperformed by other methods, especially ensembles of decision trees like random forests (Breiman 2001) or boosted trees (Chen and Guestrin 2016) across a variety of tasks at that time. This was mainly due to the difficulty in training fully connected networks (wherein the neurons of each layer are connected to all neurons in the following layer) with more than a few layers. Even when shared-weights approaches and CNNs were introduced (Fukushima 1980; LeCun et al. 1989), they remained on the fringe of the AI community, with the primary bottlenecks being the computational power required to build networks with multiple layers and the amount of data needed to train such systems.

As the availability of data and the performance of computer hardware improved, especially with the advent of graphics processing units (GPUs), deep CNNs rose to prominence in the field of computer vision. A key turning point was reached in 2012, when a deep CNN achieved the best result in the ImageNet Large Scale Visual Recognition Challenge (classifying millions of images into thousands of classes) (Krizhevsky et al. 2017). Ever since, computer vision tasks have been dominated by solutions using deep neural networks (DNNs; a key DL technique), to the extent that learning with DNNs is now generally referred to as AI, a name formerly used only for methods trying to solve general intelligence tasks, rather than specific tasks. In recent years, DL has undergone significant expansion into diverse domains, demonstrating its adaptability and offering promising solutions to challenges in various fields such as physics, medicine, and even gaming (Silver et al. 2016; Shallue and Vanderburg 2018; Raissi et al. 2019; Poon et al. 2023). Concurrently, neural network-based methods such as long short-term memory (LSTM) (Hochreiter and Schmidhuber 1996) and recurrent neural networks (RNNs) (Graves et al. 2013) have been applied to sequential data and have shown great results for handling text and time series data, which has led to them being widely used in natural language processing (NLP) tasks (Zhou et al. 2015; Canizo et al. 2019).

The difficulty of gathering a large enough dataset to fully train a DL model for a specific task can be mitigated by the assumption that many low-level features learned by large models are general enough for most tasks (Tan et al. 2018). Under this assumption, the features learned for one task can also be transferred to a different task. A technique frequently used in DL is the use of pretrained models that are then fine-tuned (the entire model adapts to the new task) or used for transfer learning (only the final layers of the model are trained) (e.g., Mathis et al. 2018). Using pretrained models reduces the need for large datasets, often improves model performance, and saves training time and resources (Devlin et al. 2019; Dosovitskiy et al. 2021). A common example is the use of models pretrained with the ImageNet dataset for downstream tasks (Ren et al. 2016; Chen et al. 2017), such as in Sun et al. (2018), where an ImageNet-based model was used for object detection from underwater videos in marine ecology.
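The division of labor in transfer learning (frozen pretrained layers, trainable final layers) can be sketched conceptually. In the illustrative toy below, a fixed random projection stands in for the pretrained feature extractor, its weights are never updated, and only a final logistic layer is trained; the data and all parameters are invented for illustration.

```python
import math
import random

rng = random.Random(42)

# Stand-in for a pretrained feature extractor: its weights are FROZEN
# and never updated during training.
frozen_w = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(4)]

def extract(x):
    """The frozen 'pretrained' layer: a fixed nonlinear projection."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in frozen_w]

# Only the final layer (a logistic head) is trained on the new task.
head_w, head_b = [0.0] * 4, 0.0
data = [([0.0, 0.0], 0), ([0.1, 0.2], 0), ([1.0, 1.0], 1), ([0.9, 1.1], 1)]

for _ in range(500):  # plain stochastic gradient descent on the head only
    for x, y in data:
        f = extract(x)
        p = 1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(head_w, f)) + head_b)))
        g = p - y  # gradient of the logistic loss w.r.t. the logit
        head_w = [w - 0.5 * g * v for w, v in zip(head_w, f)]
        head_b -= 0.5 * g

def predict(x):
    f = extract(x)
    return 1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(head_w, f)) + head_b)))
```

Because only the small head is optimized, very few labeled examples are needed, which is the practical appeal of fine-tuning and transfer learning.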

Transformer, large-scale AI models, and multimodal learning

In 2017, a model architecture known as Transformer was developed to address many NLP tasks, such as translation (Vaswani et al. 2017; Vydana et al. 2021). Transformer uses a self-attention mechanism, allowing each token (e.g., a word, subword, or phrase) to interact with every other token during training. Transformer can handle more information than RNNs and LSTM, can analyze contextual information, and is also better suited to parallelization. Since its introduction, Transformer has become state-of-the-art for many NLP tasks (Ahmed et al. 2017; Baevski and Auli 2019). Until 2020, most computer vision models were CNN-based; since then, Transformers have also been implemented as the backbone architecture for computer vision models (Dosovitskiy et al. 2021; Liu et al. 2021c). A common method is to divide an image into patches, which are treated as sequential inputs similar to tokens in NLP tasks (Dosovitskiy et al. 2021). With Transformer applied in this way, models can recognize patterns and relationships between different parts of an image.
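The self-attention mechanism at the heart of Transformer can be sketched in a few lines of plain Python. This toy version omits the learned query/key/value projections and multi-head structure of a real Transformer: each token attends to every token, with attention weights obtained from a scaled dot product followed by a softmax.

```python
import math

def self_attention(tokens):
    """Scaled dot-product self-attention over a sequence of token vectors.
    For clarity, queries, keys, and values are the raw tokens themselves;
    a real Transformer learns separate Q/K/V projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]  # softmax attention weights
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out

# Three toy 2D "tokens"; each output row mixes information from all inputs.
mixed = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Each output vector is a weighted average of all token vectors, which is how every part of a sequence (or every image patch) can influence every other part.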

Research has shown that having large and diverse datasets allows models to generalize well and perform more accurately (Russakovsky et al. 2015; Goodfellow et al. 2016). Supervised learning is a common learning strategy that requires all training data to be manually labeled. However, gathering a sufficient quantity of labeled data is often extremely labor-intensive, as most applications require relatively large training datasets. Different training strategies are applied to tackle this problem (Fig. 2). Semi-supervised learning uses both labeled and unlabeled data for training (Zhu and Goldberg 2022), weakly supervised learning uses less accurately labeled data for training (Lin et al. 2016), and self-supervised learning only uses unlabeled data (He et al. 2021). These strategies allow DL models to leverage as much data as possible without the need for extensive manual work.

An overview of existing learning strategies and the levels of labeling used in these strategies.
Fig. 2


Self-supervised learning has been widely used in NLP studies. One example uses parts of sentences as input data to predict entire sentences, thereby allowing all the unlabeled text to be considered as training data (Devlin et al. 2019). Models trained with masked sentences can be used as powerful pretrained models for fine-tuning on downstream tasks. With access to more training data and larger model architectures, generative models like the Generative Pre-trained Transformer (GPT) family were developed (Radford et al. 2018, 2019; Brown et al. 2020). Recent GPT models (e.g., GPT-3.5 and GPT-4) are capable of performing exceptionally well on many NLP tasks, even when doing zero-shot (no training needed for new tasks) or few-shot (only a few training samples needed) learning (e.g., Brown et al. 2020).

Contrastive learning is a self-supervised learning strategy that is widely used in computer vision (Wu et al. 2018; Oord et al. 2019). The idea of contrastive learning is to train a model to map similar instances (e.g., different views of the same image) close together while mapping dissimilar images farther apart in the feature space. Although different approaches have been designed to map similar/dissimilar instances (Chen et al. 2020; He et al. 2020), the fundamental concept remains the same. As a result, contrastive learning enables models to capture intricate visual patterns and semantics without the need for labeled data, thereby improving performance on downstream tasks. More recently, masked images (where parts of images are obscured) have been used to predict the original images, achieving promising results (He et al. 2021).
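The contrastive objective can be made concrete with a small sketch of an InfoNCE-style loss (in the spirit of Oord et al. 2019): the loss is small when the anchor embedding is most similar to its positive (another view of the same image) and large otherwise. All embedding vectors below are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style objective: low when the anchor is more similar to its
    positive (another view of the same image) than to any negative."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    exps = [math.exp(s / temperature) for s in sims]
    return -math.log(exps[0] / sum(exps))

# Embeddings where the positive is close to the anchor versus a mismatched pairing:
good = contrastive_loss([1.0, 0.1], [0.9, 0.2], [[-1.0, 0.3], [0.0, -1.0]])
bad = contrastive_loss([1.0, 0.1], [-1.0, 0.3], [[0.9, 0.2], [0.0, -1.0]])
```

Minimizing this loss pulls the two views of the same instance together in feature space while pushing other instances apart, with no labels required.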

These learning strategies enable the training of large models using unlabeled data or only a small set of labeled data, which is particularly applicable to the biological sciences, given the wealth of data available in natural history collections (Johnson et al. 2023). Additionally, AI has been successfully applied to process various data modalities, including text, images, and videos (Radford et al. 2021). Multimodal learning can be implemented by combining features extracted from different data modalities into one feature space. This enables tasks such as generating images with text descriptions or generating descriptions for images (Radford et al. 2021). With more data available (e.g., through self-supervised learning) and the advancement of AI models (e.g., Transformer), the field of multimodal learning is rapidly evolving. There have been a few implementations of multimodal models on biological data (Stevens et al. 2024). In evolutionary morphology, multimodal learning can effectively process diverse data modalities, such as photographs, micro-computed tomography (micro-CT) scans, and 3D mesh models.

A full review of the aforementioned three major stages in the development of AI is beyond the scope of this paper, and there are numerous other subfields of AI not explicitly reviewed in this section, such as robotics (Dumiak 2008) and graph neural networks (Dettmers et al. 2018). Nonetheless, these methods hold substantial potential for the study of evolutionary morphology and, where appropriate, will be noted in the subsequent sections focused on current usage and future applications in this field.

AI for evolutionary morphology

In this next section, we pivot toward a goal-oriented review and prospectus of applications of AI in evolutionary morphology, with accompanying case studies. We present an overview of currently available AI tools for evolutionary morphology studies in four sections: data acquisition, image data processing, phenomics, and evolutionary analysis. We introduce these methods with a schematic of generalized AI workflows (Fig. 3), which are expanded in the following sections.

Schematic of a common workflow using manual and AI approaches for evolutionary morphological analysis involving 3D images and meshes. The main steps are: (a) segmenting the specimen from the background; (b) isolating the scan into target regions; and (c) extracting phenomic data from the isolated regions.
Fig. 3


Data acquisition

The first step of acquiring data is to collect the relevant samples, which are to be used in the subsequent investigation under appropriate best practices (e.g., Parham et al. 2012). For analysis of evolutionary morphology, this includes obtaining not only the data that are being measured but also the corresponding metadata, such as details about museum specimens (e.g., Smith and Blagoderov 2012; Davies et al. 2017; Ioannides et al. 2017; Johnson et al. 2023). The suitability, quality, and quantity of data are of critical importance to the development and implementation of AI models. Data should be diverse and clean; fulfilling these requirements can make a larger difference than model choice, and without data that conform to these requirements, good models will perform badly (Whang et al. 2023). Diverse data include enough examples of each class of interest. Determining how much data is enough depends on the specific problem at hand. Scarce data can be expanded using existing databases or by employing pretrained networks for transfer learning (Sharif Razavian et al. 2014). Even so, DL models can be successful on small training sets: few-shot learning is a form of transfer learning that uses training data with only 1–20 examples of each class (Wang et al. 2021b). Scarce data also tend to be imbalanced across classes; the model may then find it difficult to discriminate the scarcely represented classes and perform unreliably (Schneider et al. 2020). Clean data minimize errors in training datasets. Preprocessing a dataset increases the suitability of the data for training and can include contrast enhancement, noise reduction, and masking, where a portion of the image is designated for further analysis (Lürig et al. 2021).

Data scarcity and imbalance can also be mitigated by additional data collection or artificial data expansion (e.g., data augmentation). Alternatively, an imbalance can be tackled by explicitly accounting for biases in the training algorithm (Buda et al. 2018). Augmentation effectively increases the size of the training set without new data collection by manipulating images to create “new” images from the existing data. This can be achieved by rotating, mirroring, scaling, or altering the pixel values (Shorten and Khoshgoftaar 2019; Mulqueeney et al. 2024a). This process must be controlled with the aim of the model in mind. For example, for planktonic foraminifera, the chirality of a species can be important in species classification, meaning augmentation by mirroring (i.e., horizontal flipping) makes the labeled image into a facsimile of a different species (Hsiang et al. 2019).
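A minimal sketch of such augmentation, including the caveat about mirroring chiral specimens, might look like the following; images are represented as nested lists of pixel values.

```python
def rotate90(image):
    """Rotate an image (a list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def mirror(image):
    """Horizontal flip. CAUTION: for chiral taxa (e.g., coiled foraminifera),
    mirroring turns a labeled image into a facsimile of a different species,
    so this augmentation must sometimes be excluded."""
    return [row[::-1] for row in image]

def augment(image, allow_mirroring=True):
    """Generate extra training images from one labeled image."""
    variants, current = [], image
    for _ in range(3):  # the three non-trivial 90-degree rotations
        current = rotate90(current)
        variants.append(current)
    if allow_mirroring:
        variants.append(mirror(image))
    return variants

img = [[1, 2],
       [3, 4]]
```

Restricting the transformation set (here, disabling mirroring) is exactly the kind of control "with the aim of the model in mind" that augmentation requires.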

Identifying and cataloguing specimen data

Many, perhaps even most, studies of evolutionary morphology are based primarily on data housed within museum collections. However, museum collections are rarely fully catalogued, and searching for a specific specimen or for representatives of particular groups can be challenging. This difficulty arises because data are often inconsistent in quality and structure, particularly in large collections (Dutia and Stack 2021). Some of the key challenges to address in cataloguing museum specimens include recognizing species and extracting taxonomic data and metadata to enable effective searches. AI can play a key role in this, particularly when it comes to tasks of digitizing, identifying, cataloguing, and locating specimens within collections.

At its most basic definition, digitization involves the creation of digital objects from physical items. Within museums, this often takes the form of photographing, scanning, or filming physical specimens (Blagoderov et al. 2012). However, traditional ways of digitizing artifacts, such as digitizing each specimen individually, can be invasive to the specimen, time-consuming, and not very cost-effective (Price et al. 2018). This has led to a series of innovations that can help advance museum digitization, from drawer scanning (Schmidt et al. 2012), which enables multiple specimens to be digitized at once, to special rotating platforms that, when combined with photogrammetry techniques, allow for the 3D scanning of specimens, while avoiding the use of more expensive or time-consuming scanning techniques, as seen in Ströbel et al. (2018) and Medina et al. (2020). ML can further advance these innovations, for example through the use of computer vision techniques and CNNs to segment individual specimens from whole-drawer scans (Blagoderov et al. 2012; Hudson et al. 2015; Hansen et al. 2020).

DL has recently been applied to many types of biological specimens and collections (e.g., Soltis et al. 2020). These methods have been developed and applied extensively to recognize species, metadata, traits, and even life history stages of digitized specimens. This is most established in the botanical sciences, where flat herbarium sheets are easily digitized in large numbers as 2D photographs (Gehan et al. 2017; Goëau et al. 2022). This has also led to advances in DL-based classification and segmentation tasks of traits within digitized herbarium sheets (e.g., Weaver et al. 2020; Walker et al. 2022), such as with LeafMachine (Weaver and Smith 2023), a tool for automatic plant trait extraction that detects everything from flowers and leaves to rulers and specimen labels. In some instances, albeit to a lesser degree, species identification methods have also been applied to digitized photographs of animal collections (e.g., Macleod 2017; Ling et al. 2023). Applications of DL to species identification of both plants and animals from photographs have been greatly enhanced by citizen science, resulting in useful online tools such as iNaturalist (Unger et al. 2021) and Pl@ntNet (Goëau et al. 2013). CNN algorithms have yielded promising results and can correctly distinguish morphologically similar species (Feng et al. 2021; Hollister et al. 2023). Other ML methods, such as those described by Wilson et al. (2023), have also been applied to rescaling images, improving their quality, and extracting metadata from images of museum specimens, allowing for automatic feeding of this information into databases.

Beyond images of the specimens themselves, AI approaches for capturing information from specimen labels can save vast amounts of manual effort for cataloguing specimens and making key data searchable (Case Study 1). Together, species identification and taxonomic and metadata extraction methods from images represent a powerful tool for unlocking the full potential of natural history collections. These approaches can make data more discoverable and usable for documenting biodiversity both in collections and in the field (Schuettpelz et al. 2017; Wäldchen and Mäder 2018; White et al. 2020; Karnani et al. 2022).

Information on specimens is not limited to museum catalogues but is also available in the wealth of scientific publications detailing and imaging specimens for varied purposes. However, extracting taxonomic data from the literature to describe or identify living and fossil species is a time-consuming task. It may also be difficult to find the first appearance of a species name, to correctly identify all synonyms for a taxon, and to account for more recent taxonomic reclassifications. Recently, a few research groups have attempted to tackle this problem using ML, with both NLP and other DNN algorithms successfully applied to extract scientific terms and taxonomic names from scientific articles (e.g., Le Guillarme and Thuiller 2022). This is a relatively new application of ML, and more work is required to train models on a variety of sources, including articles in different languages and historic publications.
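As a crude illustration of what taxonomic-name extraction involves, the rule-based sketch below matches the surface pattern of Latin binomials with a regular expression. Its false positive on ordinary capitalized prose is exactly the brittleness that motivates learned NLP models over hand-written rules.

```python
import re

# Latin binomials have a strong surface pattern: a capitalized genus name
# followed by a lowercase specific epithet.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+)\s([a-z]{3,})\b")

def find_binomials(text):
    """Return the unique candidate binomials found in a passage of text."""
    return sorted({f"{genus} {species}" for genus, species in BINOMIAL.findall(text)})

text = ("Specimens of Panthera leo and Panthera pardus were compared; "
        "The material was collected in 1932.")

# Note the spurious match "The material": rules alone cannot distinguish
# capitalized prose from genuine taxonomic names.
names = find_binomials(text)
```

A trained named-entity recognition model would instead use context to accept the genuine names and reject the false positive.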

Once these data are captured, we need effective tools to search for connected specimens. ML has not yet been adopted on a large enough scale to allow searching global natural history collections and connecting specimens. For instance, Dutia and Stack (2021) created “Heritage Connector,” a toolkit for using ML to allow better connectivity of specimens in collections and publications. This software achieved a precision score of greater than 85% with science museum records of 282,259 objects from 7743 organizations. Further development of software such as this, applied on a global scale, will improve access to the vast specimen data held in natural history collections worldwide.

Case Study 1: Machine learning within museum digitization and data collection

  • (A)

    Label extraction within digitization pipelines

ML tools, along with the latest digitization innovations, have allowed for the development of techniques that enable digitizers to extract information from labels automatically while digitizing specimens. For example, a cost-effective and efficient pinned insect digitization process was introduced by Price et al. (2018), which involved placing the specimen within a light box and capturing a handful of photographs simultaneously with multiple cameras from varying angles. The framework described there and in Salili-James et al. (2022b) shows how ML can be used to merge labels from the differently angled images into clean, unobstructed label images, from which textual information can be extracted automatically for digitization purposes (Fig. 4). The first step in this process relies on DL tools such as CNNs to locate labels in the multiple images of the specimen. Next, various mathematical and computer vision tools are used to “stitch” the found labels together to produce one clear image of each label. These labels can then be fed into an optical character recognition (OCR) algorithm and then an NLP algorithm to transcribe the text and automatically obtain trait information. This leads to a streamlined, automated pipeline to extract label information that helps speed up digitization efforts.

  • (B)

    Knowledge graphs within digitization pipelines

An example of the workflow described in Salili-James et al. (2022b): (a) using a setup introduced in Price et al. (2018), the algorithm uses a CNN model to segment all labels found on each of the four images of the specimen, (b) for each label, it then merges the four layers together in order to have one version of each label, which can be fed into an automatic transcription algorithm using OCR, and (c) an example of a merged label, with a sample of the automatically transcribed text above it.
Fig. 4


Transcribing text from labels is one step of the data extraction process, but understanding the textual data and conceptualizing the data and specimen within a larger picture is another, much bigger goal, which may one day be accomplished with knowledge graphs (Dettmers et al. 2018). Knowledge graphs are structured and contextual data models that represent semantic information about concepts, entities, relationships, and events (Fig. 5). Broadly, this can lead to representations of data structured in graphs with interlinking entities, allowing users to define relationships between different items within large datasets. The Natural History Museum, London, has recently begun developing knowledge graphs, with an initial focus on herbarium sheets (Gu et al. 2022). Herbarium sheets contain a wealth of information from handwritten and printed labels appended to the sheet, to handwritten text containing trait and specimen information written directly on the herbarium sheet. Knowledge graphs can help determine links between herbarium sheet specimens, and subsequently, combined with other ML techniques, can be embedded into digitization workflows (Gu et al. 2023). For example, with integrated OCR tools, incorrect or missing information can be corrected and filled in, by analyzing the relationships present in the graphs. Knowledge graphs can be used to identify data outliers, find historical errors, and clean data, making them a great tool for digitization.
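The triple-based structure of a knowledge graph, and the kind of relationship-following queries described above, can be sketched as follows; the herbarium records here are entirely hypothetical.

```python
# A knowledge graph as a set of (subject, predicate, object) triples,
# describing hypothetical herbarium-sheet records.
triples = {
    ("sheet_001", "has_taxon", "Quercus robur"),
    ("sheet_001", "collected_by", "A. Smith"),
    ("sheet_002", "has_taxon", "Quercus robur"),
    ("sheet_002", "collected_in", "1897"),
    ("A. Smith", "active_in", "1890s"),
}

def objects(subject, predicate):
    """Follow one edge type out of a node."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def subjects_linking_to(obj, predicate):
    """Reverse lookup: which records point at this entity?"""
    return {s for s, p, o in triples if o == obj and p == predicate}

# Link specimens through shared entities: sheets of the same taxon.
same_taxon = subjects_linking_to("Quercus robur", "has_taxon")

# Fill a gap by analyzing relationships: sheet_001 lacks a collection date,
# but its collector's active period suggests a candidate range to verify.
candidate_period = {period
                    for collector in objects("sheet_001", "collected_by")
                    for period in objects(collector, "active_in")}
```

Real knowledge-graph systems add typed schemas, embeddings, and inference, but the principle is the same: missing or inconsistent fields are flagged and filled by traversing relationships between entities.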

An example of (a) the facets of natural history collection data making up a node of a knowledge graph and (b) a whole knowledge graph showing interconnectivity between nodes.
Fig. 5


Image and scan data collection

While we refer to the use of images for specimen cataloguing earlier, here we focus on the details of image data collection for analysis. The use of images is central to the study of evolutionary morphology, from simple drawings and photographs to computed tomography (CT) scans (Cunningham et al. 2014). Two-dimensional digitization often involves photographing collections (i.e., specimens, drawers, etc.), whereas three-dimensional digitization involves generating images of specimens using techniques such as photogrammetry, surface scanning, or volumetric scanning.

Present-day efforts to digitize specimens with 2D images for large-scale data acquisition and utilization often involve some automated processes, which can streamline both digitization and the interpretation of data (Case Study 2). Recent studies (Scott and Livermore 2021; Salili-James et al. 2022b) describe software that uses ML models to identify regions of interest (ROIs) in 2D images. Once trained, AI software can capture photographs, segment ROIs, and complete other tasks for large collection datasets. This streamlines the overall acquisition and processing of digital data. Over time, ML software becomes more accurate and efficient as it learns through training datasets and is exposed to more data.
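A minimal, classical stand-in for ROI identification (thresholding followed by a bounding box) illustrates the kind of output such segmentation steps produce; trained ML models replace the fixed threshold with learned detection.

```python
def roi_bounding_box(image, threshold):
    """Locate a region of interest by simple thresholding: the bounding box
    of all pixels brighter than `threshold`. A classical stand-in for the
    learned ROI detectors used in digitization software."""
    coords = [(y, x) for y, row in enumerate(image)
              for x, v in enumerate(row) if v > threshold]
    if not coords:
        return None  # nothing above threshold: no ROI found
    ys, xs = zip(*coords)
    return (min(ys), min(xs), max(ys), max(xs))  # top, left, bottom, right

# A dark background with one bright "specimen".
scan = [[0, 0, 0, 0, 0],
        [0, 9, 8, 0, 0],
        [0, 7, 9, 0, 0],
        [0, 0, 0, 0, 0]]
box = roi_bounding_box(scan, threshold=5)
```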

The ability to generate high-resolution 3D images has increased exponentially in recent years, particularly with initiatives for mass scanning of collections and databases for open sharing of image data, including DigiMorph (Rowe 2002), Phenome10K (Goswami 2015), and MorphoSource (Boyer et al. 2016). These images can then undergo segmentation or region identification and data extraction, where specific components are identified and separated from the image for further processing or evaluation.

Novel and potentially more efficient scanning methods are continuously emerging. For instance, a neural radiance field (NeRF) is a fully connected neural network that can generate a 3D scan of an object by inputting photographs of it from different viewpoints (Martin-Brualla et al. 2021). Compared with traditional photogrammetry and CT scanning, this method is able to compute 3D scans based only on sparse images (Yu et al. 2021). While the resolution and accuracy are typically inferior to a full 3D scan, it can make 3D data capture more accessible and faster for some objects (e.g., extremely large specimens).

Case Study 2: Robotics for digitization

  • (A)

    Machine learning and robotics for specimen digitization

One technological advancement that can aid digitization is robotics; robots are already in use in other sectors, such as book scanning at libraries (Dumiak 2008). Though robotic arms are usually highly expensive, their prices have been decreasing (Zhang et al. 2022), and one can now purchase a robotic arm for less than £20,000 (Stanford University 2022). This has enabled digitization teams within museums such as the Natural History Museum, London, to start exploring robotics for digitization research (Scott et al. 2023). Here, the goal was to have a collaborative robot (cobot) aid a digitizer in the mass digitization of certain specimens (Fig. 6). Computer vision can be combined with robotics in order to identify, move, and scan specimens, with synthetic specimens used during the training stages in order to mitigate the risks of specimen damage. Thereafter, by implementing CNN algorithms and/or reinforcement learning, a robotic arm can form part of a pipeline that enables digitization teams to mass digitize multitudes of specimens, possibly even overnight, revolutionizing museum digitization work.

  • (B)

    Automation of specimen digitization

A Techman 500 robotic arm in action at the Natural History Museum, London, placing down a sample pinned specimen from a Lepidoptera collection. Here, the robotic arm has been trained to locate the specimen from the drawer, and then pick it up and place it on a board in order to scan the specimen. Synthetic specimens were used in the training stage for this task.
Fig. 6


The use of automated robotics for digitization and high-throughput data collection has historically been applied to 2D methods such as photography. Three-dimensional data, such as micro-CT data, can also be collected with new robotic technologies like autoloaders (Rau et al. 2021). Autoloaders allow users to set up multiple specimens for micro-CT and synchrotron scanning, set distinct parameters for each scan, and subsequently run the autoloader without supervision. The autoloader processes specimens in a queue, pulling each from the stand using a robotic arm and applying the parameters specified for that scan (van de Kamp et al. 2018; Rau et al. 2021). This fully automated process results in greater acquisition efficiency, as more specimens can be digitized when scanning proceeds without technician supervision. While the use of robotic technology to digitize collections can greatly increase the efficiency of image collection, the improvements are more than mechanical. Robots can learn behaviors through reinforcement learning (trial and error, with actions rewarded and/or penalized). By interacting with the environment (e.g., the digitization room), robots can learn optimal actions that maximize rewards (e.g., successfully imaging a specimen).
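The reinforcement learning loop just described can be illustrated with a deliberately simplified sketch: a tabular Q-learning agent on a toy one-dimensional "bench," rewarded only when it reaches the scanning station. Real cobot training involves continuous states and far richer reward signals; the state space, reward, and parameters below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
N_STATES, ACTIONS = 5, (-1, +1)          # positions along a bench; move left/right
Q = np.zeros((N_STATES, len(ACTIONS)))   # action-value table
alpha, gamma, eps = 0.5, 0.9, 0.1        # learning rate, discount, exploration

for _ in range(500):                     # training episodes (trial and error)
    s = 0                                # cobot starts at the specimen drawer
    while s != N_STATES - 1:             # goal state: the scanning station
        # epsilon-greedy: mostly exploit the best-known action, sometimes explore
        a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
        s2 = int(np.clip(s + ACTIONS[a], 0, N_STATES - 1))
        r = 1.0 if s2 == N_STATES - 1 else 0.0   # reward: specimen imaged
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

policy = Q.argmax(axis=1)                # learned behavior: always move right
```

After training, the greedy policy moves the agent toward the reward in every state, which is the essence of how a cobot could learn a digitization routine from success/failure feedback alone.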

Image data processing

Capturing image data has become increasingly widespread in recent years, with large programs focused on mass scanning of natural history collections (Hedrick et al. 2020). The bottleneck has now shifted to processing images in order to obtain usable phenotypic data. Here, we focus on the major aspects of image data processing: segmentation for feature extraction and element isolation for both 2D and 3D data.

Image segmentation refers to dividing an image into meaningful areas or objects and extracting ROIs, allowing for targeted analysis and interpretation of visual content (Yu et al. 2024). There are two main types of segmentation: semantic segmentation, in which all objects of a class are grouped as one entity; and instance segmentation, in which objects of the same class are distinguished from one another. These types of segmentation facilitate numerous computer vision tasks, including object recognition by isolating objects or regions within an image (Garcia-Garcia et al. 2018; Jin et al. 2022), object tracking (Zhao et al. 2021), and interpreting a scene with multiple objects (Byeon et al. 2015). This process has traditionally been performed without DL (Otsu 1979; Najman and Schmitt 1994; Boykov et al. 1999; Nock and Nielsen 2004; Dhanachandra et al. 2015; Minaee and Wang 2019); however, such approaches remain subjective (Joskowicz et al. 2019) and time-intensive (Hughes et al. 2022). In recent years, novel DL methods have enabled models to achieve high accuracy on common benchmarks (LeCun et al. 2015; Kale and Thorat 2021; Luo et al. 2021; Zhao et al. 2021; Yu et al. 2022). DL-based segmentation methods are state-of-the-art for many image segmentation challenges and often outperform other methods.

2D image segmentation

There are many applications of automated methods on 2D image datasets for morphological studies. Many studies have aimed to extract features such as size (Al-Kofahi et al. 2018), shape (Schwartz and Alfaro 2021; Lürig 2022), and pixel values (Van Den Berg et al. 2020; He et al. 2022) from segmented photographic images, and used the extracted features to quantify organismal morphology. Moreover, due to the properties of 2D radiological images (e.g., magnetic resonance imaging [MRI] and CT), such as distinguishable grayscale values, specific segmentation models have been developed, particularly for applications to medical images (Ronneberger et al. 2015). Some studies have used these models to segment these radiological images and measure morphological features (e.g., Norman et al. 2018; Montagne et al. 2021). As these automated segmentation methods allow for greater consistency among measurements, they make measurements more repeatable, and, particularly in medical fields, they allow for better longitudinal studies (Willers et al. 2021). These methods applied to 2D images can be adopted for studying evolutionary morphology, for instance in evo-devo or histology; however, as an increasing number of investigations seek more detailed measurements, 2D images may lack sufficient spatial or internal structure information, making segmentation on 3D cross-section images crucial.
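Once an image is segmented, per-object features like those above can be extracted automatically. The sketch below, assuming SciPy's `ndimage` module is available, labels connected components in a binary mask and reports each object's size, mean pixel value, and a crude bounding-box shape proxy; the particular feature choices are illustrative.

```python
import numpy as np
from scipy import ndimage

def region_features(binary_mask, intensity):
    """Per-object size, mean pixel value, and bounding-box aspect ratio
    from a segmented (binary) 2D image."""
    labels, n = ndimage.label(binary_mask)      # instance labeling
    feats = []
    for i, sl in enumerate(ndimage.find_objects(labels), start=1):
        obj = labels[sl] == i
        h, w = obj.shape
        feats.append({
            "area": int(obj.sum()),                              # size
            "mean_intensity": float(intensity[sl][obj].mean()),  # pixel values
            "aspect": w / h,                                     # crude shape proxy
        })
    return feats
```

Applied across thousands of standardized specimen photographs, a table of such features becomes the raw material for downstream morphometric comparison.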

3D image segmentation

AI approaches to image segmentation have been adapted for 3D data, being routinely applied to image stacks generated from CT (Ait Skourt et al. 2018; Kendrick et al. 2022) and MRI (Milletari et al. 2016; Lösel et al. 2020). In addition, user-friendly tools for segmenting medical images have been developed that offer built-in features for automatic image segmentation such as Dragonfly (Comet Technologies Canada Inc. 2022) and Biomedisa (Lösel et al. 2020). These have since been applied to datasets on biological systems (Lösel et al. 2023; Rolfe et al. 2023; Mulqueeney et al. 2024a) (Case Study 3).

Beyond increasing the efficiency of segmentation over manual thresholding, DL-assisted segmentation may be beneficial whenever thresholding ROIs is not possible. For example, when specimens being scanned are very dense, scans may not have a consistent perceived density (e.g., Alathari 2015; Furat et al. 2019). Another case where DL segmentation may be useful for CT data is when segmenting regions of an object made of the same material (e.g., if an object of a single material ossifies as a single structure but has varying patterns of ossification along the structure) or when multiple objects have similar densities. Objects with similar densities may be displayed at the same grayscale value throughout the scan and thus may be difficult to distinguish. Scans such as these are often also very noisy as a result of the high power of the beam needed to penetrate them, frequently resulting in artifacts and irregularities within images (Das et al. 2022). DL segmentation models can be trained to overcome these issues and segment scans based on visual patterns when a minimal number of slices are prelabeled (Tuladhar et al. 2020). Noteworthy uses of this approach include distinguishing fossils from rock matrices with a comparable composition within CT images (e.g., Yu et al. 2022; Edie et al. 2023), a common problem when imaging paleontological specimens.

Case Study 3: Image segmentation for volume rendering

DL tools such as Biomedisa (Lösel et al. 2020) have emerged as powerful solutions for automating feature extraction from 3D images (Fig. 7). They offer an efficient alternative to labor-intensive manual image segmentation methods. In the study by Mulqueeney et al. (2024a), a range of different training sets were used to train a CNN in order to segment CT image data of planktic foraminifera, with the accuracy of each of these models then compared. The results showed that the efficacy of these neural networks was influenced by the quality of input data and the size of the selected training set. In the context of this case study, this is reflected in the ability of different networks to extract specific traits. In the smaller training sets, predicting the volumetric and shape measurements for internal structures presents a greater challenge compared to the external structure, primarily due to sediment infill (Zarkogiannis et al. 2020a, 2020b). However, by increasing the size of the training set through selecting additional specimens or by applying data augmentation, this problem is mitigated. This reaffirms the principle that expanding the training set leads to the production of better DL models (Bardis et al. 2020; Narayana et al. 2020), albeit with diminishing returns as accuracy approaches 100% (Kavzoglu 2009). These findings highlight how training sets can be designed for precise image segmentation applicable to obtaining a wide range of traits.
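The data augmentation step mentioned above can be sketched simply: each annotated slice is expanded into several geometrically transformed copies, with the identical transform applied to image and label so the segmentation stays aligned. A minimal NumPy version using the eight rotations and reflections of a 2D slice (real pipelines typically add elastic deformations and intensity jitter as well):

```python
import numpy as np

def augment_pair(image, label):
    """Yield geometric variants of an (image, label) training slice.
    The same transform must be applied to both so labels stay aligned."""
    for k in range(4):                               # 0/90/180/270 degree rotations
        img_r, lab_r = np.rot90(image, k), np.rot90(label, k)
        yield img_r, lab_r
        yield np.fliplr(img_r), np.fliplr(lab_r)     # plus mirrored copies

# One annotated slice becomes eight geometrically distinct training examples.
```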

Workflow from Mulqueeney et al. (2024a) for producing training data and applying a CNN to perform automated image segmentation, reproduced under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). The workflow includes (a) the creation of training data for the input into Biomedisa and (b) an example application of the trained CNN to automate the process of generating segmentation (label) data.
Fig. 7


Isolating regions

Isolation of regions within a segmented 2D or 3D image allows for more in-depth analysis of specific areas of focus. In 2D analysis, these methods are present in behavioral ecology and neuroscience, where limb tracking of segmented species in video footage is used to infer behavior of individuals (Mathis et al. 2018; Marks et al. 2022). As in 2D, 3D semantic segmentation using CNNs has started gaining traction, notably in the fields of pathology (Schneider et al. 2021; Rezaeitaleshmahalleh et al. 2023), engineering (Kong and Li 2018; Bhowmick et al. 2020), and materials science (Holm et al. 2020; Zhu et al. 2020), and is similarly useful for evolutionary morphology. For example, extracting individual structures, such as sutures, from micro-CT scans of whole crania allows detailed analysis of their morphology and the factors driving their evolution (Case Study 4).

Case Study 4: Image segmentation for automatic trait extraction

Segmentation can also be used to extract phenotypic features directly, such as sutures from their surrounding bones. Cranial sutures are fibrous bands of connective tissue that form the joints between the cranial bones of vertebrates (White et al. 2021). Because sutures are small regions, measuring them is a highly time-consuming and skill-intensive task. Ongoing work of several authors (M.C., A.G., E.G., Y.H., and O.K.-C.) addresses this methodological challenge using DL models (Fig. 8). First, a dataset was created by segmenting only a specified number of slices (e.g., one out of every 100 slices), which was then split into training and validation sets for model training. Additionally, a test set was created with sutures segmented throughout the entire stack for a few scans, which was used for a robust evaluation of accuracy. Because sutures occupy only a small fraction of each scan, a class imbalance issue arises, which can be mitigated through a weighted training approach, with sutures given more weight during training than the surrounding bones. The model performance was evaluated on the test set using the intersection over union for segmented regions. After selecting the best model, sutures for the rest of the scans were predicted and reviewed to generate high-quality suture segmentations. The resulting manually checked and corrected segmentations can be used as a new training set to enhance model performance or used for downstream analysis. Subsequently, features from the extracted sutures can be quantified. Beyond sutures, such a pipeline would be applicable to segmenting (both in 2D and 3D) any open- or close-ended structure, biological or not, that is defined by the interactions between other structures (e.g., cranial endocasts, chambers in mollusc shells, cracks in bones and other materials, and junctions between cells).
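Two ingredients of this pipeline, the weighted training objective and the intersection-over-union metric, can be sketched in a few lines of NumPy. The weight values below are illustrative, not those used in the ongoing study:

```python
import numpy as np

def weighted_cross_entropy(prob_fg, target, w_fg=10.0, w_bg=1.0):
    """Pixelwise binary cross-entropy with the rare class (sutures)
    up-weighted to counter class imbalance. Weights are illustrative."""
    eps = 1e-7
    p = np.clip(prob_fg, eps, 1 - eps)
    loss = -(w_fg * target * np.log(p) + w_bg * (1 - target) * np.log(1 - p))
    return loss.mean()

def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks (the evaluation metric)."""
    inter = np.logical_and(pred_mask, true_mask).sum()
    union = np.logical_or(pred_mask, true_mask).sum()
    return inter / union if union else 1.0
```

With the up-weighted loss, a model that misses a suture pixel is penalized far more than one that mislabels a background pixel, steering training away from the trivial "all background" solution.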

A workflow for extracting sutures on micro-CT scans. This workflow includes (a) segmenting sutures on micro-CT scans of mammal skulls. Segmented sutures are used to generate (b) 3D reconstructions, which can then be used to calculate (c) suture measurements.
Fig. 8


In addition to segmentation, isolating regions in 3D meshes is a method of separating scans into biologically meaningful regions (Case Study 5). This approach, however, comes with some important challenges, such as the trade-off between computational cost and the quality of 3D data. Current methods typically employ human-created 3D meshes as benchmarks (Chen et al. 2009), which tend to have low polygon counts and thus do not reflect most biological datasets. As a result, the isolation of regions in 3D meshes has proven challenging, with various methods attempting to overcome quality issues in the CT data (Shu et al. 2022; Sun et al. 2023). For example, work by Schneider et al. (2021) attempted to address this by developing a segmentation pipeline able to process higher-polygon and nonmanifold meshes. This is important for geometric morphometrics, where variations in morphology of focal specimens are only discernible when meshes have sufficient polygons to properly map their topology.

Finally, while feature extraction of known phenotypes from supervised learning is relatively straightforward, it is less clear whether unknown or novel phenotypes are similarly recognizable or whether trained models can accommodate large amounts of variation, both of which will be common in analyses of evolutionary morphology. Nonetheless, applying AI to 3D data with species or features not included in the training set has great potential, particularly in light of promising applications of unsupervised learning to discover unknown phenotypes, for example in cell morphology (Choi et al. 2021).

Case Study 5: Feature extraction and region isolation on 3D meshes

In another example from our own work, we sought to conduct a landmark-free 3D morphometric study of skull shape in mammals on 3D meshes, but needed to isolate structures such as cranial ornaments (i.e., antlers and horns) and teeth from the specimen. This is common in analyses using landmarks, as these structures can dominate the variation in an analysis or may have more nonbiological variation due to preservation (i.e., missing teeth). These structures may also warrant their own shape analysis, independent from the skull. To accomplish this, we applied an existing tool for this task, MedMeshCNN (Schneider et al. 2021), which uses Blender, an open-source 3D software package (Blender Online Community 2018). To delineate regions, edges of a mesh are assigned to a specific class (e.g., horns/antlers, teeth, and skull in Fig. 9), resulting in meshes annotated with the ROIs. A model is then trained on the annotated meshes, which can then be applied to other specimens.

Workflow for segmenting horns/antlers and teeth from a skull using Blender.
Fig. 9


Phenomics

Phenotypes encompass morphology, behavior, development, and physiology, all of which mediate an organism's interactions with other species and its habitat. Phenomics is the organism-wide, high-dimensional extension of the study of phenotype (Houle et al. 2010). Analysis of phenomes thus entails a variety of traits, all of which are essential in understanding the dynamics of organismal evolution, yet the resolution at which we can currently measure them is limited. Here, we discuss how AI techniques can be used to more effectively describe phenotypic traits specific to morphology, with sections related to discrete and meristic traits, univariate measures, shape, color, and pose estimation.

Discrete and meristic traits

Morphological traits underpin the study of phenotypic evolution within phylogenetic systematics (Hennig 1966). Discrete and meristic traits are those manually scored by researchers, with discrete data including presence and absence and meristic data referring to counts. Discrete and meristic traits are useful for evolutionary analyses of morphology, evidenced by foundational works of morphological disparity (Foote 1993, 1997; see Goswami and Clavel 2024 for a full review). Discrete traits are also critical for diverse aspects of evolutionary study; for example, they are essential to time-calibrate molecular phylogenies and to reconstruct phylogenetic relations among extinct taxa (Smith and Turner 2005; Lee and Palci 2015). However, morphological traits for phylogenetic applications have many limitations (Lee and Palci 2015), as they can be time-consuming and difficult to collect due to subjective interpretation and potential errors (Wiens 2001).

AI tools have shown potential in recognizing and extracting discrete and meristic traits to build morphological matrices for phylogenetic analysis in a quicker and more robust way. AI methods, including CNNs, have been successfully applied on small training datasets to recognize species and extract both discrete and meristic traits (Wäldchen and Mäder 2018). Other examples include using ML tools to extract, classify, and count reproductive structures (Love et al. 2021; Goëau et al. 2022), as well as to produce basic measurements such as leaf size (Weaver et al. 2020; Hussein et al. 2021). These methods have also been shown to work on x-ray scans of fossil leaves (Wilf et al. 2021), including counting stomatal and epidermal cells for paleoclimatic analysis (Zhang et al. 2023). A similar CNN algorithm has also been successfully applied to classify freshwater fish by genera from the Amazon region using photographs of museum specimens, for which traits were recognized with 97% accuracy (Robillard et al. 2023). In identifying animal species traits, random forest algorithms have also shown promising results. For example, they performed better than traditional linear discriminant analysis at discriminating between species of snakes from field photographs when given a set of morphological traits (Smart et al. 2021). Overall, these algorithms have the potential to be used in morphological trait extraction and phylogenetic analysis.
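To illustrate the random-forest idea in miniature, the sketch below builds an ensemble of single-feature "stumps," each fit to a bootstrap resample with a randomly chosen trait, and classifies synthetic two-species trait data by majority vote. This is a didactic toy on invented data, not the implementation used in the cited studies (which rely on full decision trees, e.g., via scikit-learn):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(X, y, feat):
    """Best threshold classifier on a single trait (a 'decision stump')."""
    best, best_err = None, np.inf
    for t in np.unique(X[:, feat]):
        for lo, hi in ((0, 1), (1, 0)):
            err = np.mean(np.where(X[:, feat] <= t, lo, hi) != y)
            if err < best_err:
                best, best_err = (feat, t, lo, hi), err
    return best

def fit_forest(X, y, n_trees=31):
    """Miniature random forest: stumps fit on bootstrap samples of specimens,
    each restricted to one randomly chosen trait."""
    n, d = X.shape
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, n)        # bootstrap resample of specimens
        feat = int(rng.integers(0, d))     # random trait for this stump
        forest.append(fit_stump(X[idx], y[idx], feat))
    return forest

def predict(forest, X):
    votes = np.stack([np.where(X[:, f] <= t, lo, hi) for f, t, lo, hi in forest])
    return (votes.mean(axis=0) > 0.5).astype(int)   # majority vote

# Synthetic example: two 'species' differing in one trait, plus two noise traits.
n = 100
X = np.column_stack([
    np.concatenate([rng.normal(10, 1, n), rng.normal(14, 1, n)]),  # informative
    rng.normal(0, 1, 2 * n),                                       # noise
    rng.normal(0, 1, 2 * n),                                       # noise
])
y = np.concatenate([np.zeros(n, int), np.ones(n, int)])
forest = fit_forest(X, y)
```

Even with two uninformative traits included, the ensemble's vote is dominated by stumps that found the discriminating trait, which is the property that makes random forests robust for species delimitation from mixed trait sets.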

Univariate measures

Univariate metrics have dominated morphometrics for centuries, but the extraction of univariate traits from a substantial pool of individuals has historically been a laborious and time-consuming process, imposing limitations on available data (Fenberg et al. 2016). Addressing this challenge, AI tools have emerged as effective solutions, streamlining the extraction of univariate traits, including lengths, mass, and size, particularly in 2D images. For instance, neural networks have proven adept at extracting linear measurements, as illustrated by the accurate forewing length extraction of 17,000 specimens of butterflies (Wilson et al. 2023). Moreover, these AI techniques have extended their capabilities beyond simple length measures, such as by measuring plant leaf areas (Kishor Kumar et al. 2017; Mohammadi et al. 2021). Advanced techniques have further enabled the measurement of length across individual anatomical regions, offering a more nuanced understanding than traditional whole-body length measures (Ariede et al. 2023). These techniques have also enabled the extraction of shape proxies, such as ellipticity (Freitas et al. 2023), and the simultaneous analysis of multiple univariate traits (Fernandes et al. 2020).

AI methodologies have seamlessly extended their proficiency from extracting 2D univariate traits to 3D, by employing analogous methods to obtain linear measurements of both length and width within 3D images (Hu et al. 2020; Lu et al. 2023). Some of these methods have the ability to concurrently extract multiple length measurements or features from 3D images (Wu et al. 2021; Yu et al. 2021). Moreover, they can provide volumetric measures of multiple components through segmentation (Lösel et al. 2023; Mulqueeney et al. 2024a).

Shape

Univariate or linear morphometrics has been a tool in evolutionary morphological analysis for centuries (Zelditch et al. 2004), but recent years have seen an explosion of geometric (landmark-based) and surface morphometrics, greatly increasing the scope for capturing and quantifying organismal shape (Mitteroecker and Schaefer 2022). While surface methods are relatively new, they are expanding rapidly and offer great potential to increase understanding of evolutionary dynamics (Bardua et al. 2019b). Currently, this step is overwhelmingly manual, representing a significant bottleneck for big data phenomic analyses from comparative datasets (Goswami and Clavel 2024). Thus, automated approaches that can provide high-resolution measures of shape would be hugely influential, enabling large-scale comparative analyses and more reproducible decisions in trait descriptions.

Landmark-based geometric morphometrics

Landmark-based geometric morphometrics is a multivariate methodology, which requires the placement of landmarks that produce 2D or 3D coordinates by labeling homologous anatomical loci to describe biological shapes (Adams et al. 2004; Mitteroecker and Schaefer 2022). Raw coordinates are then transformed using a superimposition method, commonly Procrustes analysis, which uses scaling, rotation, and translation to register objects to a common reference frame so that only biological variation remains (Bookstein 1997). The main advantages of geometric morphometrics include the capacity to densely sample complex shapes in three dimensions, the ability to localize variation, the retention of information on biological homology, and the utility of coordinate data for numerous downstream analyses, from macroevolutionary (e.g., Goswami et al. 2022) to biomechanical analysis (e.g., Pollock et al. 2022). However, the manual placement of landmarks is time-consuming and lacks repeatability (Shearer et al. 2017), especially for big comparative datasets, even when using semi-automated methods (Bardua et al. 2019a).
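The superimposition step can be made concrete. The NumPy sketch below performs ordinary Procrustes alignment of one landmark configuration onto another, removing translation (via centroids), scale (via centroid size), and rotation (via the standard SVD solution); reflections are not treated specially here, and generalized Procrustes analysis of many specimens would iterate this alignment against a mean shape.

```python
import numpy as np

def procrustes_align(ref, target):
    """Ordinary Procrustes superimposition: remove translation, scale,
    and rotation so that only shape differences remain."""
    A = ref - ref.mean(axis=0)            # translate centroids to the origin
    B = target - target.mean(axis=0)
    A = A / np.linalg.norm(A)             # scale both to unit centroid size
    B = B / np.linalg.norm(B)
    U, _, Vt = np.linalg.svd(B.T @ A)     # optimal rotation (SVD solution)
    R = U @ Vt
    return A, B @ R

def procrustes_distance(ref, target):
    """Residual shape difference after superimposition."""
    A, B_aligned = procrustes_align(ref, target)
    return np.linalg.norm(A - B_aligned)
```

Configurations that differ only by position, size, and orientation collapse to (numerically) zero distance, whereas genuine shape change leaves a nonzero residual that downstream analyses operate on.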

Automated landmarking techniques have been developed to minimize the user's workload by automating the placement of homologous landmarks. A variety of approaches for automated landmarking via statistical image analysis have been developed in recent decades, frequently relying on image registration to propagate landmarks from one set of scans, or a generic template, to another (Young and Maga 2015; Maga et al. 2017). While these methods have improved in accuracy, they still often lack precision in identifying anatomical loci, even in closely related taxa, particularly around highly variable regions (Devine et al. 2020). Therefore, to improve the obtained results, others have attempted to use DL and computer vision to address the problem of landmark annotation.

One promising approach, currently applied only to 2D images, uses DL for the full process of automatically placing landmarks on specimens (Porto and Voje 2020; Case Study 6). Approaches currently available for 3D images combine image registration and AI, such as Devine et al. (2020), wherein deformable image registration is used to detect the landmarks and then DL is used to optimize their placement, thereby improving accuracy after mapping of landmarks from a template to specimens. Both of these approaches have been shown to reduce both data collection time and error and increase repeatability, thereby supporting phenomic-scale data collection for large datasets. Unfortunately, all present applications behave poorly with even a moderate amount of variation, effectively limiting applications to analysis of conspecifics or congeneric species at present.

Case Study 6: Geometric morphometrics—automated landmarking

AI has been successfully applied to automate placement of landmarks and semilandmarks in samples of fruit flies (Porto and Voje 2020; Salifu et al. 2022), bryozoan colonies (Porto and Voje 2020), and mice (Devine et al. 2020; Porto et al. 2021). Perhaps the most advanced implementation of DL for landmark placement at present uses a supervised learning approach combining object detection and shape prediction to annotate landmarks (Fig. 10) (Porto and Voje 2020). Object detection, using a histogram of gradient features rather than the more common but less efficient CNN approach, was used to first identify the structure of interest, followed by shape prediction to annotate landmarks. This approach was successfully applied to three datasets of varying complexity, with object detection in particular performing well for all datasets. While only implemented for 2D images at present, the speed of data collection achieved in that study is remarkable (e.g., >13,000 bryozoan zooids annotated in 3 min, approximately the time needed to manually annotate one zooid; Porto and Voje 2020) and demonstrates the potential of AI applications to geometric morphometrics and the need to develop implementations for 3D data.

Workflow for automated landmarking in Porto and Voje (2020), showing (a) the object detection framework where a training set is used to first extract features and then perform classification, and (b) perform shape prediction using a cascade shape regression model to refine the landmark predictions. Reproduced under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).
Fig. 10


Landmark-free geometric morphometrics

Despite these advancements in landmark-based methods, they remain limited as they rely on homologous points of comparison. As a result, they quickly lose explanatory value with increasingly disparate taxa, as homologous points become more difficult to identify and thus fewer in number (Goswami et al. 2019). The introduction of new landmark-free approaches for the analysis of shape may allow us to overcome some of these issues, though the need for grounding in homology will always be a constraint, as well as a critical requirement for maintaining biological meaningfulness, of this approach.

Landmark-free or homology-free methods aim to describe the entire shape of specimens without using landmarks. There are several methods within this family and currently most do not directly use AI, but we note a few that are promising areas of current development. The most common approaches either decimate a mesh into a large number of pseudolandmarks (i.e., points without any homology) (Boyer et al. 2015; Pomidor et al. 2016) or use an atlas-based diffeomorphic approach (Durrleman et al. 2014; Toussaint et al. 2021). These approaches allow shapes that do not share homology to be compared and limit the loss of geometric information, but they may be sensitive to factors outside of shape, including alignment, scaling, and modality (Mulqueeney et al. 2024b). Nonetheless, they offer a potentially rich source of data for AI applications, as we discuss here with a particular emphasis on diffeomorphic methods.

Broadly, diffeomorphic methods place a shape on a deformable grid that can be stretched and compressed, using mathematical tools called diffeomorphisms, until it resembles other shapes. These methods, often referred to as elastic shape analysis, can be used to quantify dissimilarities between shapes, register shapes together, and analyze morphometry, all without requiring landmarking. As described in Hartman et al. (2023), these methods can be categorized into two sections: those that apply to parameterized surfaces and those on nonparameterized surfaces (i.e., containing no known point landmarks). Techniques that incorporate these methods include large deformation diffeomorphic metric mapping (Beg et al. 2005), the square root velocity framework (Srivastava et al. 2011), and currents (Benn et al. 2019). One way elastic landmark-free techniques are proving increasingly useful is when analyzing morphometry in a 2D sense, for example, when studying the boundaries of objects seen in images. Here, instead of requiring landmarks on the boundaries, the boundary curve is analyzed as a whole (Salili-James et al. 2022a). Importantly, this also allows for possible invariances to be handled. For example, the metrics within methods can be made to be invariant to shape-preserving transformations, such as scaling, translation, rotation, and/or reparameterization (i.e., where on the boundary the curve starts/ends).
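As an illustration of such invariances, the square root velocity framework (Srivastava et al. 2011) maps a curve c(t) to q(t) = c'(t)/sqrt(|c'(t)|), in which translation drops out and, after normalization, scale does too. A minimal NumPy sketch for discretely sampled planar boundary curves (rotation and reparameterization alignment, which the full framework also handles, are omitted here):

```python
import numpy as np

def srvf(curve):
    """Square root velocity representation of a sampled curve:
    q(t) = c'(t) / sqrt(|c'(t)|), normalized for scale invariance."""
    vel = np.gradient(curve, axis=0)                    # finite-difference derivative
    speed = np.linalg.norm(vel, axis=1, keepdims=True)
    q = vel / np.sqrt(np.maximum(speed, 1e-12))         # guard against zero speed
    return q / np.linalg.norm(q)

def srvf_distance(c1, c2):
    """L2 distance between SRVFs: invariant to translation and scale
    (rotation/reparameterization alignment omitted for brevity)."""
    return np.linalg.norm(srvf(c1) - srvf(c2))
```

Because the representation is built from the curve's derivative, two boundary outlines that differ only by where they sit on the page, or by magnification, compare as identical, while genuine outline differences accumulate in the distance.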

Diffeomorphic methods can also be expanded into higher dimensions as seen with open curves (Lahiri et al. 2015) and closed curves (Klassen and Srivastava 2006)—this can prove particularly useful in the analysis of curves on surfaces in evolutionary datasets. Here, elastic shape analysis allows for dimensionality reduction with a tool analogous to the classic ML tool, PCA, within the true space of the shapes of the objects in a dataset (Srivastava et al. 2011). Given these advances, there has been widespread recent research on elastic methods focused on surfaces (Jermyn et al. 2017; Pierson et al. 2021; Hartman et al. 2023).

Methods using integral geometry belong to another family of approaches used to compare the surfaces of the selected objects (Wang et al. 2021a; Lin et al. 2024). These methods can avoid issues of invariance and alignment to the same extent as the landmark-free approaches noted earlier; however, their efficacy at comparing disparate datasets currently remains untested, resulting in limited applications. Additionally, these approaches have drawn some concerns over ignoring homology (Mitteroecker and Schaefer 2022), though there is great potential for reintroducing homology by combining them with AI tools for feature or trait extraction, as described earlier and demonstrated in Case Study 5.

Overall, these approaches could be used not only to study the shape of specific homologous elements, but also could accelerate studies of modularity and integration (Zelditch and Goswami 2021), which rely on large sample sizes to assess the relationships among structures, how those relationships reflect genetic, developmental, and functional associations among traits, and how they influence the evolution of morphology over shallow to deep timescales. As with landmark-based geometric morphometrics, despite the attention being paid to new AI techniques and its great potential for automating the quantification of shape, there are at present few applications to datasets above the species level.

Color

Color and patterning are key evolutionary components in taxa as diverse as insects, fishes, birds, and reptiles because of their importance in crypsis, aposematism, mimicry, communication, and sexual selection (Cuthill et al. 2017). Understanding how these patterns evolve is, therefore, crucial for understanding broader evolutionary themes such as natural and sexual selection, convergence, parallel evolution, and character displacement (Caro 2017). Color patterning can help researchers to recognize and discriminate between species and is commonly used in taxonomic, behavioral, and ecological studies (e.g., Sinpoo et al. 2019). Traditionally, studies have been limited to qualitative descriptions, which has restricted analyses to relatively small sample sizes due to the difficulty of manually comparing large numbers of diverse and complex patterns and color combinations (Hoyal Cuthill et al. 2024). Quantitative analyses of color patterning have become more common in recent years, with important large-scale studies being carried out in birds (Dale et al. 2015; Cooney et al. 2019) and butterflies (Van der Bijl et al. 2020; Hoyal Cuthill et al. 2024). Automated and semi-automated methods have been developed to segment color from images (He et al. 2022; Weller et al. 2024) and to quantify and analyze color patterns (Maia et al. 2019).

ML offers a potential solution by processing vast amounts of data, using large image datasets of museum specimens for training and analysis (Case Study 7). ML uses feature extraction and classification to process images for species identification (Wäldchen and Mäder 2018), enabling the comparison of color patterning by quantifying both spectral (i.e., color and luminance) and spatial (i.e., the distribution of pattern elements) properties of color patterns across multiple specimens. This reduces the workload by removing the need to process images manually (Maia et al. 2019). One successful implementation is the analysis of field-based camera trap images: a study of Serengeti camera trap images achieved a 96% success rate compared with a crowdsourced team of human volunteers (Norouzzadeh et al. 2018). ML has further been used to identify individuals within species of small birds (Ferreira et al. 2020), pandas (Hou et al. 2020), and primates (Guo et al. 2020) based on only minute differences in color pattern.
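A minimal sketch of the spectral side of such a pipeline, assuming specimen photographs are already loaded as RGB arrays (the 8-bins-per-channel choice here is arbitrary), is to summarize each specimen as a normalized color histogram and compare specimens by histogram intersection:

```python
import numpy as np

def color_histogram(image, bins=8):
    """Flattened, normalized joint RGB histogram: a simple spectral summary of a specimen image."""
    # image: (H, W, 3) uint8 array
    hist, _ = np.histogramdd(image.reshape(-1, 3), bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist.ravel() / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical color composition."""
    return np.minimum(h1, h2).sum()

# Two toy "specimens": one uniformly red, one uniformly blue
red_wing = np.zeros((32, 32, 3), dtype=np.uint8)
red_wing[..., 0] = 200
blue_wing = np.zeros((32, 32, 3), dtype=np.uint8)
blue_wing[..., 2] = 200

sim_self = histogram_intersection(color_histogram(red_wing), color_histogram(red_wing))
sim_cross = histogram_intersection(color_histogram(red_wing), color_histogram(blue_wing))
```

Real analyses (e.g., with the tools of Maia et al. 2019) work in calibrated color spaces and add spatial pattern statistics, but the same summarize-then-compare logic applies and scales readily to thousands of images.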

Data preparation and analysis workflows can be greatly improved with AI, and some of the most significant progress in this area has been made on museum bird specimens. DL methods have been applied to segment and extract plumage from images, which greatly increases the speed of processing color information (Cooney et al. 2022; He et al. 2022). This approach has also been combined with pose estimation methods that identify specific anatomical regions of birds so that color information can be extracted per body part (He et al. 2023). Automated methods are much faster and less subjective than manual methods for color segmentation, but are less flexible. Van der Bijl et al. (2020) used a color profiling approach to assess sexual dimorphism in 369 species of butterflies, using pixelated images to produce a linear sequence of coordinates containing lightness and color values. This method is effective but time-consuming, because each specimen must be photographed and the images manipulated and standardized by hand; the study therefore covered only 2% of the estimated 18,500 extant species of butterflies.

Case Study 7: Color

The wings of butterflies (Lepidoptera) are often brightly colored and conspicuous, and have evolved a high diversity of color and pattern complexity across approximately 18,500 extant species. In many species, color patterns are highly variable between sexes and are thought to have evolved through sexual selection, a hypothesis supported by behavioral studies (Panchen 1980). Qualitative observations have suggested that male birdwing butterflies (Lepidoptera: Papilionidae) can be more brightly colored than females (Vigneron et al. 2008), and that males from different regions may be visually more divergent than the equivalent females. Hoyall Cuthill et al. (2024) tested these observations by using ML to quantify and characterize sexual and interspecific variation in wing patterning within this group. Euclidean spatial embeddings of 16,734 dorsal and ventral photographs of birdwing butterfly specimens (3 genera, 35 species, and 131 recognized subspecies) were generated using DL with a triplet-trained CNN. In this method, the CNN is trained to place images so that the Euclidean distances between images of the same species are small relative to the distances to images of different species (Fig. 11). CNNs can capture and compare features across multidimensional image embeddings and can access any variation within the image that is informative for their designated task, opening up avenues of analysis that were not previously possible. The approach was able to reconstruct the phenotypic evolution of wing patterns and to quantify sexual disparity for the first time, revealing high male image disparity in some species and supporting divergent selection on wing patterns in males, consistent with sexual selection. The dataset represents the entire collection of the Natural History Museum, London, the largest and most comprehensive collection of birdwing butterflies on Earth, highlighting the high-throughput ability of ML methods.
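The triplet objective underlying this training scheme can be written down compactly: for an anchor image a, a positive p (same species), and a negative n (different species), the network is penalized unless the anchor–positive distance beats the anchor–negative distance by a margin. A sketch of the loss on toy precomputed embedding vectors (in the real pipeline these come from the CNN):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: zero once the same-species pair is closer than the
    different-species pair by at least `margin` (Euclidean distances)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

# Toy 3D embeddings: two conspecific images close together, one other species far away
a = np.array([0.0, 0.0, 0.0])
p = np.array([0.1, 0.0, 0.0])   # same species: d_pos = 0.1
n = np.array([1.0, 1.0, 0.0])   # different species: d_neg is about 1.41

loss_good = triplet_loss(a, p, n)   # already well separated, so the loss is 0.0
loss_bad = triplet_loss(a, n, p)    # roles swapped, so a large penalty results
```

Minimizing this loss over many image triplets is what pushes conspecific images together and heterospecific images apart in the embedding space.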

Fig. 11 Patterns of phenotypic similarity in birdwing butterfly genera and sexes, from Hoyall Cuthill et al. 2024 and reproduced under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). Points represent individual photographs, and proximity represents image similarity. Shown are (a) genera, (b) sex, and (c) embedded images.

Pose estimation

Pose estimation predicts the relative positions of body parts to each other and is used to recognize different animal poses and their changes during locomotion (Pereira et al. 2019). While estimation is usually conducted on static images (Wei et al. 2016), these capabilities have also been adapted to recognize and quantify movement (Mathis et al. 2018). Indeed, parsing kinematic patterns from videos has become a hallmark of locomotion, biomechanics, and behavioral studies, contributing to the rapid transformation of these fields (e.g., Karashchuk et al. 2021). Pose estimation is a relatively simple computer vision problem, based on the annotation of training sets from images. Originally, algorithms were unable to recognize parts that were not sufficiently distinct from the background, an issue called the “background problem” (Diaz et al. 2013), and mitigating it required the placement of markers on the moving parts prior to filming. This problem was amplified in video estimation, where motion blur poses an additional significant challenge, requiring extensive and highly specific training datasets (Nath et al. 2019). In light of these issues, the main novelty in the field has been the development of computer vision algorithms able to handle video analyses with smaller datasets and without markers, such as the recently introduced DeepLabCut toolbox (Mathis et al. 2018; Nath et al. 2019), which has quickly become the standard tool for marker-free 3D pose estimation (Fig. 12). Its capabilities are based on transfer learning: the underlying neural network was pretrained on large datasets, allowing DL to be applied to much smaller supervised datasets (Mathis et al. 2018).
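Once keypoints have been tracked, downstream kinematic quantities are simple to derive. The sketch below uses hypothetical coordinates standing in for pose-estimation output (DeepLabCut, for instance, exports tracked keypoints as tables) and computes a joint angle per frame from three tracked points:

```python
import numpy as np

def joint_angle(proximal, joint, distal):
    """Angle (degrees) at `joint` between the two limb segments, per frame.
    Inputs are (n_frames, 2) arrays of x, y keypoint coordinates."""
    v1 = proximal - joint
    v2 = distal - joint
    cos = np.sum(v1 * v2, axis=1) / (np.linalg.norm(v1, axis=1) * np.linalg.norm(v2, axis=1))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Three frames of toy tracking data for hip, knee, and ankle markers
hip = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])
knee = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
ankle = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, -1.0]])

angles = joint_angle(hip, knee, ankle)  # 90, 45, and 180 degrees across the three frames
```

Time series of such angles, speeds, and stride metrics are the raw material of the kinematic analyses described above.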

Fig. 12 Simplified pipeline for markerless motion tracking and pose estimation from videos using DeepLabCut (Mathis et al. 2018). Limb-reduced skinks (Camaiti et al. 2023) are here used as an example of locomotion tracking.

Efforts are being made within the field of pose estimation to bridge gaps between biological and computer science expertise. This is increasingly important in the games and animation industries, where there is a need to model animal behaviors, and the resulting tools can be applied to evolutionary morphology. Manually editing each keyframe can be a painstaking task, so physics-based models have been employed for years (e.g., for automatically animating horse gaits) (Huang et al. 2013). In recent years, ML tools have been incorporated to automate the process further, such as in the software WeightShift, which combines full-body physics-based animation with AI to animate characters (Chapman et al. 2020), or in animating the locomotion of quadrupeds using neural networks (Zhang et al. 2018). Another area of pose estimation that has recently benefited from ML is natural language interfaces. AmadeusGPT (Ye et al. 2023) is a natural language interface for DeepLabCut that integrates pose estimation with object segmentation (Kirillov et al. 2023). With this, the end user can describe a query and obtain outputs without needing to code.

Evolutionary analysis

AI has the capacity to transform our ability to capture morphology for evolutionary analysis, as detailed earlier. Thus far, it has perhaps had the greatest impact on data acquisition, which has long been a primary bottleneck for studies of evolutionary morphology. Nonetheless, we are already seeing AI approaches implemented for the analysis of diverse questions in evolutionary biology, although these have not yet reached the full potential of AI across the field. Next, we discuss a range of topics within evolutionary morphology that have already benefited from AI applications and identify key areas that are promising for development. We also provide a table of tools (Table 1) that are already available for applying AI to evolutionary morphology.

Table 1.

Currently available tools using AI that are applicable to research in evolutionary morphologya

Tool name/library | Capabilities | Supported data types | Programming language | Reference

Acquiring textual data
NLTK, spaCy (Python libraries) | Natural language processing (NLP); for example, extracting scientific words/taxonomic names from journal articles | Text | Python | Bird et al. (2009)
TaxoNERD (Python library) | Extracts scientific names, common names, and name abbreviations; can link taxa mentioned to a reference taxonomy (e.g., NCBI Taxonomy, GBIF Backbone, and TAXREF) | Tabular data, text, images | Python or R | Le Guillarme and Thuiller (2022)
Pytesseract (Python library) | Optical character recognition (OCR) to turn images into text; Python wrapper for Google's Tesseract engine | Images | Python | Dome and Sathe (2021), Tesseract OCR (2021), and Hoffstaetter (2022)
Google Vision | Deep learning application programming interface to perform OCR | Images | N/A | Walton et al. (2020) and Vision AI (n.d.)

Deep learning
PyTorch, TensorFlow (Python libraries) | DL frameworks | Tabular data (arrays, matrices, etc.), image-based data, text, audio | Python | Abadi et al. (2015) and Paszke et al. (2019)
Scikit-learn (Python library) | Tools for ML: classification methods (e.g., support vector machines), clustering methods (e.g., K-means clustering), dimension reduction (e.g., PCA), and neural networks | A variety of data types, from tabular data to image and sound data | Python | Pedregosa et al. (2011)
PIL, scikit-image, opencv-python (Python libraries) | Image processing and computer vision tools; for example, thresholding and contour extraction with snakes (active contours) | Images | Python | van der Walt et al. (2014)
MONAI, Biomedisa (Python libraries) | DL tools designed for processing medical images | Images, especially medical images | Python | Lösel et al. (2020) and Cardoso et al. (2022)
LeafMachine | DL and CV tools for trait extraction and measurement from botanical images | Botanical images, particularly herbarium sheets | Python or GUI | Weaver and Smith (2023)

Image processing software
ORS Dragonfly, Avizo-Amira, VGSTUDIO MAX | Software for processing and segmenting medical and cross-sectional images; AI-based segmentation methods are also supported | Medical images | Not open-source, but supports Python scripting | Dragonfly: Comet Technologies Canada Inc. (2022); Avizo: Thermo Fisher Scientific (2021)
3D Slicer, ImageJ, ilastik | Open-source software for processing medical and cross-sectional images; users can add extensions such as SlicerMorph and Weka Segmentation, or build new extensions | Medical images | C++, Python, Qt | Schneider et al. (2012), Kikinis et al. (2013), Arganda-Carreras et al. (2017), Berg et al. (2019), and Rolfe et al. (2021)

Tools that can be used in evolutionary morphology
MeshCNN | Mesh classification and segmentation; can be used for segmenting 3D mesh models of specimens | 3D mesh models | Python | Hanocka et al. (2019)
Detectron2 (ML library) | Object detection; can be used for identifying a specimen in an image | Images | Python | Wu and Kirillov (2019)
Segment Anything | A pretrained segmentation tool that can generate decent segmentation results | Images | Python | Kirillov et al. (2023)
DeepLabCut | A tool for placing keypoints on images and videos | Images and videos | Python | Mathis et al. (2018) and Nath et al. (2019)
Pl@ntNet | Species ID through identification of traits for plants | Images | N/A; input images directly to the online tool (identify.plantnet.org) | Pl@ntNet IPT (2023)
FloraIncognita | Species ID and identification of traits for plants | Images | N/A; input images directly to the online tool (floraincognita.com) | Mäder et al. (2021)
Fishial.ai | Species ID and feature recognition for fish | Images | N/A; input images directly to the web portal (portal.fishial.ai) | Fishial.ai (2019)
Merlin Bird ID | Species ID for birds from descriptions, photographs, and sound recordings | Images, audio | N/A; input images directly to the mobile app (merlin.allaboutbirds.org) | Cornell Lab of Ornithology (2024)
Wolfram Mathematica | Identifying the type of specimen in an image; categorizing traits of specimens from images | Images | Wolfram Language, C/C++, Java | Wolfram Research, Inc. (2024)
MaxEnt | Modeling ecological niches of taxa | Species occurrence data, environmental rasters | Java | Phillips et al. (2024)
Hierarchy-guided neural network (HGNN) | Combining hierarchical classification information with phenotypic data | Images | Python | Elhamod et al. (2022)

aWe include coding libraries, websites, and software, along with their applications within evolutionary morphology, the data types they support, and the programming language where applicable. The table is broken into four main sections: acquiring textual data, deep learning, image processing software, and tools for evolutionary morphology research. This table will be regularly updated at https://phenomeai.org/.


Clustering and classification

Classifying individual specimens is an initial step in many evolutionary studies, but is often time-consuming. In recent years, more efficient methods using ML image clustering have become widespread in the classification of individuals into distinct species (Punyasena et al. 2012; Barré et al. 2017; Wäldchen and Mäder 2018; Hsiang et al. 2019; Valan et al. 2019). Current research predominantly employs CNNs (Krizhevsky et al. 2017), which excel at extracting features from images and providing probability estimates to assign images to specific species classes. These methods tend to focus on classifying species and rarely describe the relationships between classes or higher-level classification, though there is some preliminary work in this area. For example, Kiel (2021) describes a method combining DL and computer vision approaches to train a CNN to categorize images of bivalve species into family groupings based on known taxonomy. For the images of each species, the algorithm estimated the probability of membership in each family, and the results demonstrate that this approach was accurate for family-level classification and, to a lesser extent, for topology estimation.
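The aggregation step of such an approach is straightforward to sketch. Assuming hypothetical per-image probabilities over candidate families from a trained CNN, each species can be assigned the family with the highest average probability across its images:

```python
import numpy as np

# Hypothetical CNN outputs: each row is one image's probability distribution over 3 families
probs = np.array([
    [0.7, 0.2, 0.1],   # image 1 of species A
    [0.6, 0.3, 0.1],   # image 2 of species A
    [0.1, 0.8, 0.1],   # image 1 of species B
    [0.2, 0.6, 0.2],   # image 2 of species B
])
species = np.array([0, 0, 1, 1])  # which species each image belongs to
families = ["Family X", "Family Y", "Family Z"]  # hypothetical family labels

# Average the probabilities per species, then assign the most probable family
mean_probs = np.stack([probs[species == s].mean(axis=0) for s in np.unique(species)])
assigned = [families[i] for i in mean_probs.argmax(axis=1)]
```

Averaging before taking the argmax lets weakly informative individual images contribute without any single image dominating the species-level assignment.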

Morphometric data are also available for use in species classification, and in recent years ML methods have been employed to classify species accurately from morphometric data (e.g., Salifu et al. 2022; Devine et al. 2023; Lin et al. 2024; Case Study 8). For instance, Elsayed et al. (2023) developed an automated approach using CNNs for identifying and classifying 2D images of tooth fossils from various animals, including sharks, elephants, hyraxes, and primates. Additionally, elastic shape analysis can be used as a preprocessing technique for ML by quantifying differences between object shapes before feeding them into classic ML tools such as K-nearest neighbor (KNN) classifiers, as in Salili-James et al. (2022a). Here, diffeomorphic methods were used to quantify differences between the shapes of objects in various 2D image datasets, from gastropod shells to leaves, and a KNN classifier was then trained to classify genus and species based purely on the morphology of the objects in the image. Moreover, recent studies have combined DL techniques with elastic shape analysis, such as Hartman et al. (2021), in which a Siamese neural network was trained to predict geodesic distances between curves computed with diffeomorphic methods, in order to analyze and classify objects such as the boundary curves of leaves from the notable Swedish Leaf Dataset (Söderkvist 2001, 2016).
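The distance-then-classify logic of these pipelines can be sketched without the elastic-shape machinery itself: given a precomputed vector of pairwise shape distances from a query specimen to a set of labeled training shapes (toy values here; in practice these would be diffeomorphic or geodesic distances), KNN classification takes only a few lines:

```python
import numpy as np
from collections import Counter

def knn_predict(dist_to_train, train_labels, k=3):
    """Majority vote among the k training shapes closest to the query."""
    nearest = np.argsort(dist_to_train)[:k]
    return Counter(train_labels[nearest]).most_common(1)[0][0]

# Toy precomputed shape distances from one query to six labeled training shapes
train_labels = np.array(["gastropod", "gastropod", "gastropod", "leaf", "leaf", "leaf"])
dist_to_train = np.array([0.2, 0.3, 0.9, 1.5, 1.8, 1.1])

label = knn_predict(dist_to_train, train_labels, k=3)  # the three nearest shapes are gastropods
```

Because the classifier consumes only distances, any shape metric, elastic, landmark-based, or learned, can be swapped in without changing the classification step.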

Each of these techniques must identify distinct morphological attributes for grouping, posing challenges for taxa with few specimens, such as many fossil taxa. Despite these constraints, the ability to use ML algorithms to differentiate, cluster, and classify taxa based on morphology has vast potential for fields from species delimitation and detection to phylogenetics.

Case Study 8: Specimen classification from images

In a recent study, Hou et al. (2020) introduced the ADMorph dataset and used it to train and evaluate DL models for the morphological analysis of 3D digital microfossils. The study focused on enhancing the accuracy of DL models by testing the segmentation performance of multiview CNNs (Su et al. 2015), PointNet (Charles et al. 2017), and VoxNet (Maturana and Scherer 2015). The ADMorph dataset is valuable for developing and evaluating DL algorithms to precisely analyze and classify microfossil structures. Building on this foundational work, a subsequent project by Hou et al. (2021) further expanded the application of DL by automating the segmentation process. This study delineated and classified approximately 500 fish microfossils within CT images, showcasing the potential of DL models to significantly streamline and enhance the accuracy of morphological analysis in paleontological research.

Species delimitation

Species delimitation, as opposed to classification, requires the ability to identify whether individuals belong to a population, which in some cases may lead to the assignment of individuals to new taxonomic entities (Case Study 9). Genomic species delimitation methods have been used extensively over the last decade, including Bayesian species delimitation (Yang 2015) and unsupervised ML algorithms that predict clusters of individuals from genomic data (Derkarabetian et al. 2019). More recently, CNNs have been employed to build a morphology–molecule network that integrates morphological and molecular data for species delimitation (Yang et al. 2022). Despite their widespread adoption and increasing applications in taxonomy, these methods cannot deal with taxa that are not present in the training set, rendering them ineffective for identifying novel or undiscovered species.

Emerging techniques in one-class classification (Perera and Patel 2019) and open set recognition (Geng et al. 2021) offer promising avenues for extending species identification beyond the classes seen during training on image data. However, inherent challenges remain: these techniques are currently used for outlier detection and would need to be adapted to establish species. An alternative approach is to use phenotypic traits as a basis for delimitation. Individuals can be grouped into self-similar clusters by analyzing phenotypic traits, forming the basis for delineating populations and species (Ezard et al. 2010). Traditionally, Gaussian mixture models fitted by maximum likelihood have been used for this purpose (Fraley and Raftery 2002; Baylac et al. 2003). However, deep Gaussian mixture models (Viroli and McLachlan 2019), which incorporate ML techniques, may prove more suitable: they can represent more complex structure, enabling them to capture intricate relationships within data. Combined with the increasing ability to acquire image or trait data rapidly, they may allow for a more nuanced and comprehensive understanding of taxonomy.
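As an illustration of the mixture-model route, the sketch below uses the classical (non-deep) Gaussian mixture implementation in scikit-learn on simulated two-trait measurements, choosing the number of candidate groups by the Bayesian information criterion (BIC):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Simulated phenotypic measurements (two traits) for two hidden groups
rng = np.random.default_rng(42)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(60, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(60, 2))
traits = np.vstack([group_a, group_b])

# Fit mixtures with 1-4 components and keep the BIC-optimal model
models = [GaussianMixture(n_components=k, random_state=0).fit(traits) for k in range(1, 5)]
best = min(models, key=lambda m: m.bic(traits))
clusters = best.predict(traits)
```

Here BIC recovers the two simulated groups; with real trait data the chosen number of clusters is a hypothesis about population structure, not a final taxonomic decision.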

On balance, unsupervised or semi-supervised AI-based integrative taxonomic tools have the potential to play a key role in furthering species discovery. In addition to phenotypic traits, researchers are obtaining additional suites of organismal data such as acoustics, behavior, and ecology. AI will be key to bringing these complex datasets together for a biologically meaningful interpretation of a species.

Case Study 9: Delimiting species

Species delimitation methods have the potential to discover new species in natural history collections. Hansen et al. (2020) created an image database of 65,841 museum specimens representing 361 species of carabid beetles from Britain and Ireland (Fig. 13). A pretrained CNN model was fine-tuned on 31,533 images, validated on 25,334 images, and tested on 19,164 images, assigning 51.9% of test images to the correct species and 74.9% to the correct genus. The authors acknowledge that specimen size was a key factor in correctly identifying specimens, but model improvements may correct for this, and the applications can be extended beyond high-throughput analysis of museum collections to identifying species in the field using camera traps. Combined with further classification and clustering tools, such as heatmap analysis (Hollister et al. 2023), these models may one day be used to identify potential new species.

Fig. 13

Images of carabid beetles used by Hansen et al. (2020) to train and test a CNN model for identifying beetle species. Image has been cropped from the original, and is reproduced under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).

Phylogenies—building trees

Evolutionary studies frequently require data on the relatedness between taxa, structured in the form of phylogenetic trees. ML is increasingly applied to address some of the limitations of traditional methods (Sapoval et al. 2022; Mo et al. 2024). These include the often extremely high computational expense of Bayesian and maximum likelihood approaches (Azouri et al. 2021, 2023), model misspecification (Abadi et al. 2020; Burgstaller-Muehlbacher et al. 2023), and issues with missing data in distance-based approaches (Bhattacharjee and Bayzid 2020). However, the accuracy and scalability of these methods remain uncertain (Sapoval et al. 2022). Additionally, a significant obstacle for ML methods is the scarcity of training data for tree inference (Sapoval et al. 2022). Due to the uncertainty associated with phylogenetic inference, a ground truth of phylogenies is fundamentally unknowable, leading to a reliance on simulated data that may not accurately reflect evolutionary relationships (Sapoval et al. 2022).

In recent decades, genetic data have dominated phylogenetic studies, due to well-studied models of nucleotide evolution and the sheer quantity of genomic data available (Misof et al. 2014; Álvarez-Carretero et al. 2022). Consequently, recent reviews of ML approaches for tree building (Sapoval et al. 2022; Mo et al. 2024) have focused on molecular-based phylogenetics instead of morphology-based phylogenetics. However, for many extinct species, molecular data are often not available, meaning morphological data must be used (Lee and Palci 2015).

Approaches that have been applied to sequence data have the potential to be adapted for use on morphological data. CNNs and RNNs have been employed to infer quartet (four-taxon) topologies using simulated sequence alignments and protein data (Suvorov et al. 2020; Zou et al. 2020). In simulated quartet experiments, these networks outperform traditional methods like maximum likelihood, especially in scenarios of high substitution-rate heterogeneity, which are challenging for many standard models (Zou et al. 2020). However, more recent analyses contest this, and traditional methods have outperformed neural networks when the number of taxa is increased above four (Zaharias et al. 2022). These methods have mostly been applied to individual sequences, but applying them to species trees involves further complexities such as incomplete lineage sorting and introgression (Maddison and Knowles 2006; Degnan and Rosenberg 2009; Suvorov et al. 2020). The restriction to few taxa and the complexity of species tree inference are emerging areas of research, addressed, for example, in recent work applying generative adversarial networks (GANs; Goodfellow et al. 2014) to simulated data.

While ML methods to estimate evolutionary relationships from genetic data have begun to be explored, either by approximating distances between taxa or by directly inferring topologies, application to morphological data remains challenging (Case Study 10). For molecular phylogenetics, there are many complex substitution models available to describe the evolution of sequence data (Hasegawa et al. 1985; Tavaré 1986). The same cannot be said for morphological evolution, where the underlying processes are more difficult to model (Lee and Palci 2015) and where there is no clearly defined smallest unit of change across the tree of life. Methods such as Phyloformer (Nesterenko et al. 2022; Case Study 10A) still rely on models of sequence evolution to compute topologies. Without explicit models of morphological evolution or an ability to discern homology, such methods may be prone to the confounding effects of homoplasy and convergent evolution (Case Study 10B). Even without an explicit model, phenotypic trees built from ML-extracted morphological features can closely match phylogenies based on genetic models (Hoyal Cuthill et al. 2019), but the comparison between the two remains difficult.

Estimating nucleotide substitution models for large sequence datasets through traditional maximum likelihood methods is computationally intensive. More recently, DNNs were used to create ModelTeller (Abadi et al. 2020) and ModelRevelator (Burgstaller-Muehlbacher et al. 2023), two approaches to phylogenetic model selection that focus specifically on identifying the most appropriate substitution models for large datasets, for which traditional methods are computationally prohibitive. With continuous increases in the size of datasets used for generating phylogenetic hypotheses, methods such as these will be key to assessing the most suitable models before phylogenies are built. While both of these DNNs focus on molecular substitution models, their existence opens the possibility of developing analogous systems for selecting morphological evolutionary models.

One common issue for several phylogenetic methods (including maximum likelihood, Bayesian, and maximum parsimony), regardless of data type, is the use of heuristic searches. In the case of Bayesian approaches, model parameters (e.g., tree topology and branch lengths) are adjusted, and the likelihood of each adjustment is then estimated and compared. This approach explores tree space for a set number of iterations, aiming to identify more optimal parameter combinations, but it is limited by the extent of the tree search, which makes it computationally expensive. Recently, ML methods have been applied to improve the efficiency of heuristic searches by predicting which neighboring trees will increase the likelihood without actually calculating the value, thereby reducing computational load (Azouri et al. 2021, 2023).

Another major challenge in both molecular and morphological phylogenetic studies is the impact of missing data, especially for distance-based methods, where many of the most commonly used algorithms (e.g., neighbor joining) require data with no missing entries (Bhattacharjee and Bayzid 2020). In molecular phylogenetic studies, this refers to missing bases in sequences. For morphological data, it can result from incomplete specimens in which certain traits or biological structures are missing or difficult to measure or score. Previous studies have shown that missing data negatively affect the accuracy of tree inference methods (Wiens 2006; Roure et al. 2013). Methods such as PhyloMissForest (Pinheiro et al. 2022), which is based on a random forest approach, and two approaches proposed by Bhattacharjee and Bayzid (2020) use ML to estimate missing distance values within a distance matrix and may outperform traditional statistical methods. Overall, while there are numerous areas in which ML can improve phylogenetic inference for diverse data types and methodological approaches, it is at present very poorly developed, particularly for morphological data.
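The distance-imputation idea can be sketched concretely. The toy example below simulates additive distances on a star tree, hides one entry, and predicts it by regressing observed pairwise distances on each taxon's distances to two complete "reference" taxa. A plain linear model stands in for the random forest regressors used by PhyloMissForest; the setup and all values are illustrative:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n = 10
b = rng.uniform(0.5, 2.0, n)       # branch lengths on a hypothetical star tree
D = b[:, None] + b[None, :]        # additive pairwise distances d(i,j) = b_i + b_j
np.fill_diagonal(D, 0.0)

refs = [8, 9]                      # fully observed "reference" taxa
missing = (0, 1)                   # pretend this distance was never measured

def feats(i, j):
    # Each pair is described by its taxa's distances to the reference taxa
    return [D[i, k] for k in refs] + [D[j, k] for k in refs]

pairs = [p for p in combinations(range(8), 2) if p != missing]
X = np.array([feats(i, j) + [1.0] for i, j in pairs])   # +intercept column
y = np.array([D[i, j] for i, j in pairs])
w, *_ = np.linalg.lstsq(X, y, rcond=None)               # linear stand-in model

pred = np.array(feats(*missing) + [1.0]) @ w            # imputed distance
```

PhyloMissForest instead learns these relationships with random forests, which can capture the nonlinear structure of distances on more realistic tree shapes.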

Case Study 10: Creating phylogenetic frameworks

  • (A)

    Molecular and ML tree building: phyloGAN and Phyloformer

ML-based phylogenetic inference has largely been limited to inferring unrooted tree topologies for quartets of taxa, as the number of plausible trees increases super-exponentially with the number of taxa (Felsenstein 1978; Suvorov et al. 2020). The phyloGAN model is an ML approach that utilizes heuristic search strategies through a GAN to build trees from molecular data (Smith and Hahn 2023). It uses two networks: a generator that suggests new topologies, and a discriminator trained to differentiate real from generated data, estimating the fit of proposed topologies and alignments (Fig. 14). This method imitates the heuristic search employed by many traditional methods to explore tree space for more optimal trees. PhyloGAN improves on the number of taxa that can be considered compared to the previously mentioned methods, as demonstrated using seven species of fungi, but remains limited compared to traditional methods and is hampered by lengthy computational times (Smith and Hahn 2023).

Fig. 14

Overview of phyloGAN (Smith and Hahn 2023) wherein a generator generates tree topologies with branch lengths utilizing nearest neighbor interchange (NNI) and subtree pruning and regrafting (SPR) methods. These trees are then evaluated within the discriminator to identify real versus generated data. Reproduced under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).

Another molecular ML tree-building approach is Phyloformer, which computes distances between molecular sequences in a multiple sequence alignment (MSA) (Nesterenko et al. 2022). This method simulates trees and then uses probabilistic models of sequence evolution, working backward, to simulate MSAs. Supervised learning is used to train a transformer-based model to reverse-engineer the phylogeny from an associated MSA. The algorithm estimates pairwise evolutionary distances between sequences, from which a tree can then be inferred using traditional methods such as neighbor joining. Phyloformer outperforms standard distance-based methods and, owing to its higher computational speed, offers an advantage over maximum likelihood.

  • (B)

    Convergence in morphological ML tree building

One of the key challenges in utilizing morphological data to build phylogenetic trees has been morphological convergence. Kiel (2021) used feature extraction to estimate family-level classifications of bivalves and, by grouping them into orders and subclasses, attempted to overcome the limitations posed by trait convergence. The degree of certainty in these classifications was then used as a proxy for morphological similarity between families and to construct a distance matrix, which was in turn used to cluster the families and infer a topology. While this method did find significantly more bivalve families clustering with members of their known subclasses than expected by chance, the resulting phylogeny did indicate many unlikely placements. When multiple CNNs trained at different taxonomic levels were combined, the resulting phylogeny more closely matched the expected clustering based on existing taxonomic standing.
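The core trick, treating a classifier's output probabilities as a similarity signal, can be sketched with made-up numbers: mean softmax vectors per group are compared pairwise, and the resulting distance matrix indicates which groups fall together. This is a heavily simplified stand-in for the CNN-based pipeline of Kiel (2021):

```python
import numpy as np

# Hypothetical mean softmax vectors (over the same three output classes)
# for three groups; the numbers are invented for illustration.
mean_probs = {
    "A": np.array([0.70, 0.20, 0.10]),
    "B": np.array([0.60, 0.30, 0.10]),
    "C": np.array([0.05, 0.10, 0.85]),
}
names = list(mean_probs)
P = np.stack([mean_probs[n] for n in names])

# Pairwise Euclidean distances between mean classifier outputs act as a
# crude proxy for morphological similarity between groups.
Dm = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
np.fill_diagonal(Dm, np.inf)
nearest = {names[i]: names[Dm[i].argmin()] for i in range(len(names))}
```

Groups that a classifier confuses often (here, "A" and "B") end up close in this derived space, which is exactly why convergent morphologies can mislead such distances unless, as in Kiel (2021), classifiers at several taxonomic levels are combined.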

  • (C)

    Supplementing morphological data with “known” phylogenetic data

Adaïmé et al. (2024) present a novel method incorporating preexisting phylogenetic information into ML training. Pollen grains were classified into “known” or novel taxa by multiple CNNs trained on morphological characters (shape, internal structure, and texture). The morphological features of these taxa were then fed into a phylogenetic embedding model, which uses a preexisting molecular phylogeny as the ground truth. The model uses the “known” phylogenetic positions of taxa as a template to guide how morphological characters are used to estimate phylogeny. It is considered accurate if it can use the morphological characters of known taxa to infer a phylogeny closely matching the molecular phylogeny, and subsequently place novel and fossil taxa (for which genetic data have not been or cannot be obtained) into the topology (Fig. 15). The authors tested the accuracy of the method by taking taxa with “known” phylogenetic placements and treating them as if they were novel. The model placed these pseudonovel taxa in their “correct” respective subclades with high support. By transforming morphological characters based on preexisting phylogenetic information, this method improves upon clustering based on morphological similarity alone. However, there are potential concerns over the assumption of a ground truth in phylogenetics.

Fig. 15

Three neural networks trained in Adaïmé et al. (2024) to score the shape, internal structures, and texture. The scores are combined and images are classified into known versus new taxa. Reproduced under a Creative Commons CC-BY-NC-ND license.

Phylogenetic comparative methods and evolutionary modeling

Using a phylogenetic framework to estimate the evolution of clades and traits has become a core part of evolutionary morphology over the past few decades (Felsenstein 1985; Adams and Collyer 2019). Analysis of trait variation across phylogenies and through time relies on the availability of well-supported topologies and time calibration. Recent advances in genome sequencing and big data approaches to taxonomic sampling and trait data collection have increased the availability of time-calibrated phylogenies (Álvarez-Carretero et al. 2022). In turn, this has enhanced our ability to reliably map the evolution of traits on phylogenies and consider phylogenetic relations when examining relationships between traits across multiple taxa.

The potential of AI to reconstruct trait evolution within a phylogenetic framework has been theoretically documented. For instance, Ruder (2017) described a multitask learning approach in which an ML framework consolidates data from various tasks. This is achieved through an algorithm that minimizes the variance of estimators by employing a penalty term to align models more closely, facilitating the simultaneous estimation of ancestral states for multiple characters. Similarly, Ho et al. (2019) illustrated the theoretical application of ML to the ancestral estimation of phenotypic traits through a multitask learning approach applied to Brownian motion models of continuous biological traits. The study showed that this approach enhanced ancestral estimations compared to maximum likelihood models, albeit with a minor bias introduced in the phylogenetic estimates.
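The Brownian motion calculation underlying such approaches can be made concrete: on a fixed tree, the maximum likelihood estimate of the root state is the generalized least squares mean of the tip values, weighted by the inverse of the phylogenetic covariance matrix. A worked toy example (tree and trait values are purely illustrative):

```python
import numpy as np

# Toy ultrametric tree ((A:1,B:1):1,C:2). Under Brownian motion the trait
# covariance between tips is the shared branch length from the root
# (up to a rate constant sigma^2).
C = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])
x = np.array([1.0, 2.0, 4.0])   # observed tip trait values (illustrative)
one = np.ones(3)

Ci = np.linalg.inv(C)
# GLS / maximum-likelihood estimate of the ancestral (root) state
root = (one @ Ci @ x) / (one @ Ci @ one)   # = 18/7, approx. 2.571
```

Multitask learning extends this single-character calculation by estimating several characters jointly, with a penalty term tying their models together.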

Despite theoretical advances, there are currently few practical applications of ML approaches to estimate trait evolution. A known issue that would benefit from an AI-based modeling approach is the assignment of distinct rates of character evolution to different parts of a given phylogenetic tree (e.g., King and Lee 2015). ML would enable the simultaneous pooling of multiple data sources, including distributions of states at the tips of phylogenetic trees, branch lengths, node ages, uncertainty in node resolution, and hidden states, and consideration of a wide variety of complex models that may better reflect phenomic datasets (Goswami and Clavel 2024). This would allow the assessment of trait covariations, studies of modularity and integration, changes through time using existing phylogenies, and probabilities of key innovations versus gradual variations.

ML approaches could also facilitate the comparison of simulations across trees. Furthermore, ML methods could account for phylogenetic relatedness in analyses of trait correlations. In the field of bioinformatics, using DNN and convolutional graph network (CGN) architectures in phylogenetic profiling for protein interactions improved predictions (Moi and Dessimoz 2022). In particular, combining CGN with a graphical representation of tree topology allowed for prediction across multiple species and could be used to predict pairwise interaction across time. Using these algorithms in conjunction with phylogenetic information is currently exploratory but could potentially streamline and improve multiple aspects of estimating trait evolution and ancestral states, allowing better modeling of the complex factors underlying evolution on a phenomic scale (Niemi 2020).

Function and adaptive landscapes

In evolutionary biology, adaptive landscapes are conceptual frameworks that illustrate the relationship between the phenotype of an organism and its fitness within a specific ecological context (McGhee 1980; Simpson 1984; McGhee 1999). They provide a visual representation of trait space under natural selection, in which peaks correspond to trait combinations conferring higher fitness than the surrounding regions of the landscape (Arnold 2003). Over evolutionary time, genetic variation, mutation, recombination, and natural selection drive populations toward regions of higher fitness (Arnold et al. 2001). Models of trait diversification can help trace the adaptive peaks of species through time as they adapt to different ecological niches or respond to environmental shifts (e.g., Martin and Wainwright 2013). The study of adaptive landscapes is key both to understanding the adaptive mechanisms giving rise to biodiversity and to predicting the future adaptive potential of species in light of anthropogenic habitat loss and climate change.

Functional adaptive landscape analysis uses the morphology and function of skeletal elements to model landscapes (Polly et al. 2016; Dickson and Pierce 2019; Jones et al. 2021; Tseng et al. 2023). In paleontology, functional adaptive landscapes commonly employ finite element analysis (FEA) as a functional metric (Polly et al. 2016; Deakin et al. 2022). ML algorithms, once trained on initial FEA results, can replace FEA to predict the behavior of a beam in a one-dimensional system. Neural networks have been suggested to provide more accurate predictions of FEA results than boosted regression trees or random forest algorithms (Vurtur Badarinath et al. 2021). Additionally, AI has been increasingly applied to FEA-based biomechanical modeling (Liu 2019; Galbusera et al. 2020; Mouloodi et al. 2021). These techniques can be applied to data extracted from static images, 3D image data (Galbusera et al. 2020), and even motion capture (Mouloodi et al. 2021). This is particularly useful for creating models of the range of appendicular motion, relationships between internal organs, and even models of cytokinesis (Huiskes and Hollister 1993; Ross 2005; Shi et al. 2010).
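The surrogate-modeling idea can be sketched in a few lines: a regression model is trained on precomputed beam responses and then queried in place of the solver. Here, the analytic Euler–Bernoulli cantilever tip deflection stands in for FEA output so that the example is self-contained; a real application would train on solver results, and all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
E_mod, I_sec = 200e9, 1e-6   # illustrative material/section constants

# "Training data": analytic cantilever tip deflection
# delta = F * L**3 / (3 * E * I) stands in for FEA solver output.
F = rng.uniform(100.0, 1000.0, 200)   # applied loads (N)
L = rng.uniform(0.5, 2.0, 200)        # beam lengths (m)
delta = F * L**3 / (3 * E_mod * I_sec)

# Surrogate: linear regression in log space (exact for this formula)
X = np.column_stack([np.log(F), np.log(L), np.ones_like(F)])
w, *_ = np.linalg.lstsq(X, np.log(delta), rcond=None)

def predict(f, l):
    """Query the surrogate instead of re-running the solver."""
    return np.exp(np.array([np.log(f), np.log(l), 1.0]) @ w)

pred = predict(500.0, 1.0)
true = 500.0 * 1.0**3 / (3 * E_mod * I_sec)
```

Trained on genuine FEA outputs across a morphological dataset, such a surrogate would let adaptive-landscape analyses evaluate function at many more points in trait space than running the solver directly would allow.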

Phenome–environment and ecometrics

One of the most established areas of phenotypic analysis is the quantification of relationships between phenomes (the sum of phenotypic traits) and the environmental context in which they evolved (e.g., DeGusta and Vrba 2005; Stubbs and Benton 2016; Panciroli et al. 2017; Benevento et al. 2019). The end goal of many studies using this approach is to assign an ecomorphological characterization to phenotypic traits and to parse their ecological signal (e.g., Fortuny et al. 2011; Barr 2018). AI has been implemented in this field through algorithms, such as random forest, that infer present and past ecomorphologies from high-dimensional ecomorphological data (Spradley et al. 2019; Rabinovich 2021; Sosiak and Barden 2021; Mahendiran et al. 2022). Similarly, ML procedures have been used to discriminate and sort phenotypes according to their membership in specific ecomorphs or ecological guilds (MacLeod et al. 2022). These studies have highlighted the advantages of AI-based approaches over standard procedures used to test the links between morphology and ecology, such as canonical variate analysis (Albrecht 1980).

The related field of ecometrics is a taxon-free approach to quantifying the distribution of functional traits across space and time (Eronen et al. 2010). Ecometric correspondence between environmental and phenotypic data is used to develop transfer functions, which can be used to reconstruct paleoenvironments or, in combination with species distribution modeling (SDM), to model future spatial distributions of phenotypes under predicted climatic scenarios (Vermillion et al. 2018; Parker et al. 2023). Existing work uses linear and maximum likelihood approaches to perform ecometric modeling. These approaches are limited to one or two climate inputs, normally restricting analyses to annual precipitation and mean annual temperature (Parker et al. 2023). Using a random forest approach, however, would enable the model to use any number of climatic variables. Similarly, SDMs can be built using CNNs, capturing nonlinear transformations across multiple variables (Botella et al. 2018). DL approaches to quantifying phenome–environment relationships would enable models to better capture the complex factors contributing to climate and trait distributions, as in studies of trait evolution.
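As a minimal sketch of a transfer function using more than two climate variables, the snippet below predicts a trait value by k-nearest-neighbor averaging over three simulated climate variables; a random forest regressor would slot into the same role. All data are simulated:

```python
import numpy as np

rng = np.random.default_rng(3)
# Simulated communities: a functional trait depends nonlinearly on three
# climate variables (e.g., temperature, precipitation, seasonality).
clim = rng.uniform(0.0, 1.0, (1000, 3))
trait = 2 * clim[:, 0] + np.sin(3 * clim[:, 1]) + 0.5 * clim[:, 2] ** 2

def transfer(query, k=20):
    """k-NN transfer function: a simple stand-in for a random forest model."""
    d = np.linalg.norm(clim - query, axis=1)
    return trait[np.argsort(d)[:k]].mean()

q = np.array([0.5, 0.5, 0.5])
est = transfer(q)                              # reconstructed trait value
true = 2 * 0.5 + np.sin(1.5) + 0.5 * 0.25      # underlying expectation
```

Because nothing limits the length of the climate feature vector, additional variables can be added without changing the model, which is exactly the flexibility linear and maximum likelihood transfer functions lack.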

Niches and niche evolution

ML algorithms, including boosted regression tree and random forest, have become standard methodologies for modeling the ecological niches of taxa and, by extension, their potential spatial distribution. Over the past decade, research has extensively focused on predicting the ecological effects of climate change by using ecological niche modeling (Qin et al. 2017; Deb et al. 2020; Tang et al. 2021; Karuppaiah et al. 2023). The most prominent ML model in this area is the maximum entropy modeling method (MaxEnt), which has been applied in thousands of studies since its introduction (Phillips et al. 2006; Merow et al. 2013).

MaxEnt's ubiquity in the scientific literature is due in part to the algorithm requiring relatively few inputs (only species occurrences and geographic data) and relying on biologically reasonable assumptions, most notably that a taxon will occupy as large an area as possible (maximum distribution entropy; Phillips et al. 2006; Elith et al. 2011). These assumptions have also produced an abundance of literature critiquing and subsequently optimizing MaxEnt's statistical assumptions and processes (Cobos et al. 2019; Low et al. 2021; Sillero and Barbosa 2021; Campos et al. 2023).

Studies that use MaxEnt or other ML methods tend to consider niches as static entities, with many publications “projecting” the same niche onto environmental rasters representing distinct points in time, sometimes thousands or millions of years ago (Saupe et al. 2019). Niche evolution studies have instead relied on measuring the contemporary niche overlap of different taxa (usually via the methodology of Broennimann et al. 2012), considering the similarities and differences within a phylogenetic context (Doré et al. 2023; Padilla-García et al. 2023; Vasconcelos et al. 2023). While both approaches are useful in understanding ecological evolution across time, they are limited by their discrete temporal sampling—niches change continuously across space and time, and an individual niche of a taxon may also change over time.

ML methods could be developed further to identify and accommodate niches that change over time. Taxon occurrences sometimes have associated temporal metadata, which could be used by an AI tool to predict continuous changes in a niche in the recent past or near future. This could prove especially valuable in studying the effects of climate change at higher resolution. On a geological timescale, the predicted ecological niches of fossil taxa (modeled with environmental data representing periods in deep time) could be used to calibrate, and thus further validate, models of continuous niche evolution across phylogenetic trees.

Prospectus

The scope of evolutionary biology is immense, involving the history of life on Earth over the past 3.7 billion years. For the vast majority of species that ever lived, the only available data are morphological in nature; thus, studying morphology is crucial for understanding the evolution of organisms. Yet, methods for capturing morphological data remain largely manual, presenting a bottleneck for the study of morphological evolution, particularly in comparison to other biological fields with mature methods for “omics”-level analyses. The use of AI is bringing about a massive transformation in the field of evolutionary morphology, both for data capture and analysis. Integrating AI techniques into this area will become increasingly important as the field continues to move toward larger-scale analyses and bigger data.

As we have discussed, AI has been successfully applied to a range of data-acquisition tasks in evolutionary morphology, and its pace of development and accessibility to nonexperts are only increasing. For example, AI is already making it quicker to generate, refine, and access image data in larger quantities and/or at greater resolutions than ever before. Large gaps remain, however, including discriminating features or ROIs, extracting discrete traits or 3D morphometric data from datasets with large amounts of variation (which are common in comparative evolutionary analysis), and applying AI to improve evolutionary models for morphological data. There are also numerous challenges in making AI tools accessible to nonspecialists and in finding affordable and sustainable solutions for the storage, annotation, and processing of datasets that are larger in both number and size. These areas should be the focus of efforts over the coming years. While we have detailed applications of AI to several research areas in morphological evolution, there are many more in which AI has yet to make a significant impact. Next, we note a few subfields of evolutionary morphology that have clear pathways for improvement through AI. Finally, we close with some considerations on the accessibility and environmental effects of using AI in research.

Emerging fields

Retrodeformation

Several studies have demonstrated that fossil data are critical for accurately estimating phenotypic evolution through deep time (e.g., Slater et al. 2012; Goswami and Clavel 2024, and references therein). A common challenge in paleontology is encountering fossils that have undergone taphonomic distortion via brittle or plastic deformation (Schlager et al. 2018; Kammerer et al. 2020). This distortion can severely hinder attempts to assess and quantify intra- and interspecific shape variation by introducing nonbiological variation. Consequently, in addition to being poorly integrated into phylogenetic analyses, fossil data are often excluded from geometric morphometric and phylogenetic comparative analyses. Retrodeformation is the process of restoring the original shape of an object by reversing taphonomic distortion (Lautenschlager 2016; Herbst et al. 2022). While landmark- and symmetry-based procedures to perform these operations manually are available (Schlager et al. 2018), they are time-consuming and can only be applied to relatively small datasets, limiting the taxonomic breadth of studies.

AI provides an opportunity to automate and enhance this process. ML models, such as neural networks, can be trained to recognize and correct specific types of deformation. These models can learn patterns of distortion and apply appropriate corrections. In the future, AI may aid in the reconstruction of 3D objects or scans of distorted or even completely flattened fossils, thereby helping to recover valuable 3D morphology. Once models have been trained on a dataset of naturally distorted fossils and manually performed retrodeformation simulations, they can be integrated into software applications or embedded in hardware systems for real-time correction and analysis. The choice of AI algorithm will depend on the specific application and the nature of the deformations to be corrected. For instance, de Oliveira Coelho (2015) used logistic model trees to predict the temperature at which human bone was burnt. Similarly, Zeng et al. (2021) used a support vector machine algorithm to detect small geological faults. Such methods could be adapted to estimate the extent of brittle and ductile deformation a fossil has undergone, enabling evolutionary morphologists to apply inverse transformations to correct the distortion.

Histology

Histology examines the microscopic structure and morphology of tissues, encompassing both contemporary and fossil tissues in the field of paleohistology. Historically, paleohistology has provided insights into growth, physiology, and development, while its application has expanded to investigate tissue form and function. For instance, Bailleul et al. (2012, 2019) explored the function of duck-billed dinosaur dental batteries through paleohistological techniques. The advent of AI tools has significantly advanced histology, particularly in histopathology, enhancing cancer recognition and clinical oncology (Shmatko et al. 2022). AI holds promise for increasing throughput in pattern recognition tasks. Current applications of AI in biological research include the quality assessment of histological images (Haghighat et al. 2022) and the characterization of herbivore diets through microhistological analysis (Filella et al. 2023). Moreover, neural networks have been employed to identify primary and secondary osteon regions, producing segmented maps of various osteon regions. This segmentation, in conjunction with phylogenetic analysis, has elucidated developmental pathways leading to miniaturization in theropod dinosaurs (Qin et al. 2022b). The potential for AI in histological studies is substantial, particularly within the context of investigating evolutionary morphology using landmark-free morphometrics, marking it as a promising avenue for future research.

Genome–phenome mapping

AI has been applied in two main areas of genome–phenome association (GPA): the medical sciences and food production. This is not surprising, as both are umbrella areas of research with high societal impact. Different AI algorithms have been applied in a variety of genome-wide association studies related to human health, helping to link genetic variants to different pathologies in complex ways (Long et al. 2023). Neural network approaches have also been developed both to understand the association between small genomic mutations and clinical phenotypes (Mieth et al. 2021) and to untangle the complex correlations between microbial communities and diseases (Liu et al. 2021a). In an agricultural setting, similar neural network approaches have been used to predict potential phenotypes in genetically modified rice crops (Islam et al. 2023). More complex approaches have recently been tested on both human and agricultural datasets and have been found to predict not just the genomic or phenomic component, but also potential new associations between the two, through the use of weighted deep matrix GPA (Tan et al. 2022). GPA approaches have clear implications for the future of evolutionary biology and phenomics studies, for example by making it possible to connect morphological changes with specific mutations. However, these methods are still in their infancy and have yet to find wider application outside of the medical and agricultural fields.

Evo-devo

ML has been successfully applied to the study of gene expression in the embryonic development of model organisms (Feltes et al. 2018; Naert et al. 2021; Čapek et al. 2023). Algorithms have also been developed to aid in phenotyping and staging embryos and to recognize diseases and malformations (e.g., Jeanray et al. 2015; Al-Saaidah et al. 2017). In evolutionary developmental biology (evo-devo), these methods have only recently been applied to phenotype identification. A few pilot studies have been conducted using both image and morphometric data on human cells, model organisms, and plants (Masaeli et al. 2016; Cai and Ge 2017; Chen et al. 2020). CNNs have been used to extract visual patterns from images, to aid embryo staging, and to analyze changes in phenotype during ontogeny (Feltes et al. 2018; Naert et al. 2021). Through further development of these methods and their application to nonmodel organisms, it will be possible to conduct more thorough studies comparing developmental phenotypes across multiple lineages.

Data engineering

Focus should be placed not only on potential fields and AI methods but also on the morphological data itself. Data engineering involves the preparation of data before any analysis or methods can be applied; this is undoubtedly a crucial aspect of AI as a whole. With the acceleration of data acquisition, along with previously collected data (MorphoSource: Boyer et al. 2016; Phenome10K: Goswami 2015; DigiMorph: Rowe 2002), the volume of usable data is increasing. To take full advantage of the potential of AI and “big data” in evolutionary morphology studies, previously collected data must first be transformed into AI-ready formats. These data are then suitable for exploring learning strategies such as self-supervised learning. Morphological data, like medical data, often require extensive domain knowledge for labeling, making the creation of labeled training sets time-consuming; using unlabeled data in training could therefore be a viable option. To improve both data quality and performance evaluation, interdisciplinary collaboration is essential: biologists can help AI experts tailor methods to better suit the specific data. Preparing these large phenomics datasets for algorithm training is the first step toward integrating AI methods into large-scale phenomics studies.
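As a minimal sketch of what transforming records into an "AI-ready" format can mean in practice, the following Python snippet maps heterogeneous specimen metadata onto a single fixed schema with normalized values. All field names, file names, and values here are hypothetical illustrations, not drawn from any of the repositories cited above:

```python
import json

# Hypothetical raw records as they might be exported from two different
# collection databases with inconsistent field names and formatting.
raw_records = [
    {"Taxon": " Panthera leo ", "scan": "skull_001.tif", "Voxel size (mm)": "0.05"},
    {"taxon_name": "panthera LEO", "scan_file": "skull_002.tif", "voxel_mm": 0.05},
]

# Map each source-specific key onto one target schema field.
KEY_MAP = {
    "Taxon": "taxon", "taxon_name": "taxon",
    "scan": "image", "scan_file": "image",
    "Voxel size (mm)": "voxel_mm", "voxel_mm": "voxel_mm",
}

def to_ai_ready(record):
    """Rename keys to one fixed schema and normalize the values."""
    out = {}
    for key, value in record.items():
        field = KEY_MAP.get(key)
        if field is None:
            continue  # drop fields the target schema does not define
        if field == "taxon":
            genus, species = str(value).split()
            value = f"{genus.capitalize()} {species.lower()}"
        elif field == "voxel_mm":
            value = float(value)
        out[field] = value
    return out

clean = [to_ai_ready(r) for r in raw_records]
print(json.dumps(clean, indent=2))
```

Both raw records now yield the same taxon spelling and a numeric voxel size, so a downstream training pipeline can consume them without per-source special cases.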

Accessibility and considerations

Until very recently, most AI models were built and applied using Python libraries such as Caffe, TensorFlow, and PyTorch (Jia et al. 2014; Abadi et al. 2015; Paszke et al. 2019), requiring both AI and programming knowledge. Additionally, running these models required specialized, expensive hardware, such as the GPUs commonly used to train AI models. Consequently, the required level of expert understanding of AI and costly hardware restricted accessibility for many researchers in the biological sciences.

As AI continues to advance, it is becoming increasingly accessible to nonexperts and more affordable to implement due to several factors: (1) increasingly user-friendly software has reduced the need for in-depth AI-related knowledge; (2) the growth of open-source and pretrained models has significantly reduced the computational resources, data, and time required to develop AI models; and (3) the advent of cloud-based AI services has allowed researchers to access powerful AI models without investing in local GPUs. Furthermore, with the expansion of large language models in the public domain, via services such as ChatGPT (Brown et al. 2020), many researchers can now use AI to learn how to code, enabling them to program and train their own models (Cooper et al. 2024).

Despite these advancements, many challenges remain and certain aspects require a degree of caution. AI relies on the data it has been trained on and on the people who developed it. Biases in the data, as well as the cognitive biases of developers, can introduce or amplify biases in an algorithm and can lead to incorrect results (Mehrabi et al. 2021; Zhang et al. 2022). Attention must also be given to data cleaning and preprocessing to mitigate the effects of unrefined training data. Additionally, for data within the natural sciences, especially in the public domain, particular attention must be given to maintaining consistency during the data processing stages, for example, ensuring that morphological or taxonomic annotations follow a standard framework.
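One simple, automatable check of that kind of annotation consistency is to reduce each label to a canonical form and flag labels that occur under several spellings. A minimal Python sketch, with invented labels for illustration:

```python
from collections import defaultdict

# Illustrative annotation labels as they might appear after pooling
# public datasets labeled by different researchers.
labels = ["Humerus", "humerus ", "HUMERUS", "femur", "Femur", "tibia"]

def canonical(label):
    """Reduce a label to a canonical key: trimmed and lower-cased."""
    return label.strip().lower()

# Group the observed spellings under each canonical key.
variants = defaultdict(set)
for label in labels:
    variants[canonical(label)].add(label)

# Flag canonical labels that appear under more than one spelling,
# so they can be harmonized before model training.
inconsistent = {key: sorted(spellings)
                for key, spellings in variants.items()
                if len(spellings) > 1}
print(inconsistent)
```

Real harmonization would also need synonym tables and expert review; this pass only surfaces the mechanical inconsistencies (case, whitespace) that silently split one class into several during training.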

Moreover, even with cloud-based AI services, storage and processing power are expensive and will be limiting factors for many researchers. Additionally, the environmental impact of AI cannot be overlooked, particularly as many studies in our fields aim to protect the natural world and limit human-caused climate change and destruction of biodiversity. Evolutionary morphology studies increasingly involve the collection and storage of large quantities of image data. These datasets are currently limited by the hours of manual input required, but will only increase in size as AI approaches allow for more efficient processing and analysis, leading to larger, more complex studies that in turn require increased hardware and energy input. Training large-scale models can consume substantial amounts of energy, contributing to carbon emissions, although admittedly the models trained and used in evolutionary biology are unlikely to be as large as those from tech giants like Google, Meta, and OpenAI. Some studies using large-scale genetic datasets have estimated the carbon footprint of their computational analyses (Philippe et al. 2019; Qin et al. 2022a). More formal approaches to sustainable computer science are being developed in the form of emission calculation tools (Lacoste et al. 2019; Lannelongue et al. 2021), assessments of their suitability for various approaches (Bouza et al. 2023), and proposed principles for greener computational science in the future (Lannelongue et al. 2023). As the scale of AI models and the demand for AI continue to grow, it will be increasingly important for us to evaluate the environmental impact of future studies in evolutionary morphology.
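The core calculation behind such emission calculators is simple enough to sketch: energy drawn (runtime × hardware power × a data-center efficiency factor, the PUE) multiplied by the carbon intensity of the local electricity grid. The default constants below are illustrative placeholders, not values taken from any of the cited tools:

```python
def training_footprint_gco2e(runtime_h, device_power_w, pue=1.67,
                             carbon_intensity_g_per_kwh=475.0):
    """Estimate training emissions in grams of CO2-equivalent.

    energy (kWh) = runtime (h) * device power (W) * PUE / 1000
    emissions    = energy * grid carbon intensity (g CO2e per kWh)

    The PUE and carbon-intensity defaults are illustrative only; real
    estimates should use the values for the actual data center and grid.
    """
    energy_kwh = runtime_h * device_power_w * pue / 1000.0
    return energy_kwh * carbon_intensity_g_per_kwh

# e.g., a 24-hour training run on a single ~300 W GPU:
footprint_g = training_footprint_gco2e(24, 300)
print(round(footprint_g / 1000.0, 2), "kg CO2e")
```

Even this rough relation makes the trade-off explicit: halving runtime, choosing lower-power hardware, or running in a region with a cleaner grid each scales the estimate linearly.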

To conclude, we have provided an introduction to and overview of the current and potential future applications of AI to evolutionary morphology. At present, many of these methods remain technical and difficult to apply, owing to the need for advanced coding knowledge and access to capable hardware such as high-memory GPUs or high-performance computing systems. Developments are, therefore, required to make these methods more widely accessible and to allow their capabilities and limitations to be better understood and addressed. As AI becomes more accessible and tailored toward applications central to the study of evolutionary biology, we expect that it will transform the study of evolutionary morphology. By accelerating and improving the capture and analysis of “big data” on phenotypes for diverse comparative datasets, AI will allow the realization of evolutionary phenomics and launch a new phase in the study of past and present biodiversity.

Acknowledgments

For thought-provoking and valuable conversations that have broadened our thinking, we thank Katie Collins, Thomas Ezard, and the members of the AI and Innovation group at the Natural History Museum. We gratefully thank our families, friends, and colleagues who assisted in translating our abstract, including Yannick Okou Able, Oliver Hawlitschek, Michaël Ramalanjaona, Matthew T. Wisdom, Bjarte Rettedal, Lynn L. R. de Miranda, Adam Cieplinski, Monish Prasad, Megan Storme Gathercole, and Inèz Faul. Some abstract translations were assisted by ChatGPT, alongside native and non-native language speakers.

Funding

This work was supported by Leverhulme Trust [grant RPG-2021-424 to A.G., M.C., E.G., and Y.H.]; Natural Environmental Research Council [grants NE/S007210/1 and NE/P019269/1 to J.M.M.; NE/S007229/1 to E.C.W., N.S.B., and J.M.; and NE/S007415/1 to E.S.E.H. and O.K.-C.]; BBSRC [grant BB/X014819/1 to A.G. and L.E.R.]; Lateinamerika-Zentrum Zürich (Switzerland) to G.R.-deL.; EU Horizon 2020 Marie Skłodowska-Curie Actions to A.V.M.; a Daphne Jackson Research Fellowship funded by the Anatomical Society to V.H.; UKRI [grant EP/Y010256/1 to A.K. and T.W.]; SUMMIT grant sponsored by Charles Wilson and Rowena Olegario to A.S.J., Q.G., and S.T.S.P.; and funding from NHS-X, GSK, and Ely-Lilly to E.G.

Conflict of interest

The authors declare no competing interests.

Data availability

No new data were generated or analyzed in support of this research. The tools table in this paper will be kept updated at http://www.phenomeai.org/.

References

Abadi S, Avram O, Rosset S, Pupko T, Mayrose I. 2020. ModelTeller: model selection for optimal phylogenetic reconstruction using machine learning. Mol Biol Evol 37:3338–52.

Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M et al. 2015. TensorFlow: large-scale machine learning on heterogeneous systems. TensorFlow (https://www.tensorflow.org/).

Adaïmé M-É, Kong S, Punyasena SW. 2024. Deep learning approaches to the phylogenetic placement of extinct pollen morphotypes. PNAS Nexus 3:pgad419.

Adams DC, Rohlf FJ, Slice DE. 2004. Geometric morphometrics: ten years of progress following the ‘revolution’. Ital J Zool 71:5–16.

Adams DC, Collyer ML. 2019. Phylogenetic comparative methods and the evolution of multivariate phenotypes. Annu Rev Ecol Evol Syst 50:405–25.

Ahmed K, Keskar NS, Socher R. 2017. Weighted transformer network for machine translation. arXiv published online.

Ait Skourt B, El Hassani A, Majda A. 2018. Lung CT image segmentation using deep neural networks. Procedia Comput Sci 127:109–13.

Akçakaya M, Yaman B, Chung H, Ye JC. 2022. Unsupervised deep learning methods for biological image reconstruction and enhancement: an overview from a signal processing perspective. IEEE Signal Process Mag 39:28–44.

Alathari T. 2015. Feature extraction in volumetric images [doctoral thesis]. University of Southampton, Physical Sciences and Engineering. ePrints Soton (https://eprints.soton.ac.uk/379936/).

Alberch P, Gould SJ, Oster GF, Wake DB. 1979. Size and shape in ontogeny and phylogeny. Paleobiology 5:296–317.

Albrecht GH. 1980. Multivariate analysis and the study of form with special reference to canonical variate analysis. Am Zool 20:679–93.

Al-Kofahi Y, Zaltsman A, Graves R, Marshall W, Rusu M. 2018. A deep learning-based algorithm for 2-D cell segmentation in microscopy images. BMC Bioinf 19:365.

Al-Saaidah B, Al-Nuaimy W, Al-Taee M, Young I, Al-Jubouri Q. 2017. Identification of tail curvature malformation in zebrafish embryos. ICIT 2017–8th International Conference on Information Technology Proceedings. p. 588–93.

Álvarez-Carretero S, Tamuri AU, Battini M, Nascimento FF, Carlisle E, Asher RJ, Yang Z, Donoghue PCJ, dos Reis M. 2022. A species-level timeline of mammal evolution integrating phylogenomic data. Nature 602:263–7.

Amalfitano D, Faralli S, Hauck JCR, Matalonga S, Distante D. 2024. Artificial intelligence applied to software testing: a tertiary study. ACM Comput Surv 56:1–38.

Angermueller C, Pärnamaa T, Parts L, Stegle O. 2016. Deep learning for computational biology. Mol Syst Biol 12:878.

Arganda-Carreras I, Kaynig V, Rueden C, Eliceiri KW, Schindelin J, Cardona A, Sebastian Seung H. 2017. Trainable Weka segmentation: a machine learning tool for microscopy pixel classification. Bioinformatics 33:2424–6.

Ariede RB, Lemos CG, Batista FM, Oliveira RR, Agudelo JFG, Borges CHS, Iope RL, Almeida FLO, Hashimoto DT. 2023. Computer vision system using deep learning to predict rib and loin yield in the fish Colossoma macropomum. Anim Genet 54:375–88.

Arnold SJ. 2003. Performance surfaces and adaptive landscapes. Integr Comp Biol 43:367–75.

Arnold SJ, Pfrender ME, Jones AG. 2001. The adaptive landscape as a conceptual bridge between micro- and macroevolution. Genetica 112/113:9–32.

Atz K, Grisoni F, Schneider G. 2021. Geometric deep learning on molecular representations. Nat Mach Intell 3:1023–32.

Audagnotto M, Czechtizky W, De Maria L, Käck H, Papoian G, Tornberg L, Tyrchan C, Ulander J. 2022. Machine learning/molecular dynamic protein structure prediction approach to investigate the protein conformational ensemble. Sci Rep 12:10018.

Azouri D, Abadi S, Mansour Y, Mayrose I, Pupko T. 2021. Harnessing machine learning to guide phylogenetic-tree search algorithms. Nat Commun 12:1983.

Azouri D, Granit O, Alburquerque M, Mansour Y, Pupko T, Mayrose I. 2023. The tree reconstruction game: phylogenetic reconstruction using reinforcement learning. arXiv published online.

Baevski A, Auli M. 2019. Adaptive input representations for neural language modeling. arXiv published online.

Bailleul AM, Hall BK, Horner JR. 2012. First evidence of dinosaurian secondary cartilage in the post-hatching skull of Hypacrosaurus stebingeri (Dinosauria: Ornithischia). PLoS One 7:e36112.

Bailleul AM, O'Connor J, Schweitzer MH. 2019. Dinosaur paleohistology: review, trends, and new avenues of investigation. PeerJ 7:e7764.

Bardis M, Houshyar R, Chantaduly C, Ushinsky A, Glavis-Bloom J, Shaver M, Chow D, Uchio E, Chang P. 2020. Deep learning with limited data: organ segmentation performance by U-Net. Electronics 9:1199.

Bardua C, Felice RN, Watanabe A, Fabre A-C, Goswami A. 2019a. A practical guide to sliding and surface semilandmarks in morphometric analyses. Integr Org Biol 1:obz016.

Bardua C, Wilkinson M, Gower DJ, Sherratt E, Goswami A. 2019b. Morphological evolution and modularity of the caecilian skull. BMC Evol Biol 19:30.

Barr AW. 2018. Ecomorphology. In: Croft DA, Su DF, Simpson SW, editors. Methods in paleoecology: vertebrate paleobiology and paleoanthropology. Cham: Springer International Publishing. p. 339–49.

Barré P, Stöver BC, Müller KF, Steinhage V. 2017. LeafNet: a computer vision system for automatic plant species identification. Ecol Inform 40:50–6.

Baylac M, Villemant C, Simbolotti G. 2003. Combining geometric morphometrics with pattern recognition for the investigation of species complexes. Biol J Linn Soc 80:89–98.

Beg MF, Miller MI, Trouvé A, Younes L. 2005. Computing large deformation metric mappings via geodesic flows of diffeomorphisms. Int J Comput Vision 61:139–57.

Benevento GL, Benson RBJ, Friedman M. 2019. Patterns of mammalian jaw ecomorphological disparity during the Mesozoic/Cenozoic transition. Proc R Soc B Biol Sci 286:20190347.

Benn J, Marsland S, McLachlan RI, Modin K, Verdier O. 2019. Currents and finite elements as tools for shape space. J Math Imaging Vision 61:1197–220.

Berg S, Kutra D, Kroeger T, Straehle CN, Kausler BX, Haubold C, Schiegg M, Ales J, Beier T, Rudy M et al. 2019. Ilastik: interactive machine learning for (bio)image analysis. Nat Methods 16:1226–32.

Bhattacharjee A, Bayzid MS. 2020. Machine learning-based imputation techniques for estimating phylogenetic trees from incomplete distance matrices. BMC Genomics 21:497.

Bhowmick S, Nagarajaiah S, Veeraraghavan A. 2020. Vision and deep learning-based algorithms to detect and quantify cracks on concrete surfaces from UAV videos. Sensors 20:6299.

Bird S, Loper E, Klein E. 2009. Natural language processing with Python: analyzing text with the natural language toolkit. Sebastopol, California: O'Reilly Media, Inc.

Blagoderov V, Kitching I, Livermore L, Simonsen T, Smith V. 2012. No specimen left behind: industrial scale digitization of natural history collections. ZooKeys 209:133–46.

Blender Online Community. 2018. Blender—a 3D modelling and rendering package. Stichting Blender Foundation (https://www.blender.org).

Bookstein FL. 1997. Landmark methods for forms without landmarks: morphometrics of group differences in outline shape. Med Image Anal 1:225–43.

Borowiec ML, Dikow RB, Frandsen PB, McKeeken A, Valentini G, White AE. 2022. Deep learning as a tool for ecology and evolution. Methods Ecol Evol 13:1640–60.

Botella C, Joly A, Bonnet P, Monestiez P, Munoz F. 2018. A deep learning approach to species distribution modelling. In: Joly A, Vrochidis S, Karatzas K, Karppinen A, Bonnet P, editors. Multimedia tools and applications for environmental & biodiversity informatics. Cham: Springer International Publishing. p. 169–99.

Bouza L, Bugeau A, Lannelongue L. 2023. How to estimate carbon footprint when training deep learning models? A guide and review. Environ Res Commun 5:115014.

Boyer DM, Puente J, Gladman JT, Glynn C, Mukherjee S, Yapuncich GS, Daubechies I. 2015. A new fully automated approach for aligning and comparing shapes. Anat Rec 298:249–76.

Boyer DM, Gunnell GF, Kaufman S, McGeary TM. 2016. MorphoSource: archiving and sharing 3-D digital specimen data. Paleontol Soc Papers 22:157–81.

Boykov Y, Veksler O, Zabih R. 1999. Fast approximate energy minimization via graph cuts. Proceedings of the Seventh IEEE International Conference on Computer Vision. p. 377–84.

Breiman L. 2001. Random forests. Mach Learn 45:5–32.

Broennimann O, Fitzpatrick MC, Pearman PB, Petitpierre B, Pellissier L, Yoccoz NG, Thuiller W, Fortin M-J, Randin C, Zimmermann NE et al. 2012. Measuring ecological niche overlap from occurrence and spatial environmental data. Global Ecol Biogeogr 21:481–97.

Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A et al. 2020. Language models are few-shot learners. arXiv published online.

Buda M, Maki A, Mazurowski MA. 2018. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw 106:249–59.

Burgstaller-Muehlbacher S, Crotty SM, Schmidt HA, Reden F, Drucks T, Von Haeseler A. 2023. ModelRevelator: fast phylogenetic model estimation via deep learning. Mol Phylogenet Evol 188:107905.

Byeon W, Breuel TM, Raue F, Liwicki M. 2015. Scene labeling with LSTM recurrent neural networks. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. p. 3547–55.

Cai Z, Ge S. 2017. Machine learning algorithms improve the power of phytolith analysis: a case study of the tribe Oryzeae (Poaceae). J Syst Evol 55:377–84.

Camaiti M, Evans AR, Hipsley CA, Hutchinson MN, Meiri S, de Oliveira Anderson R, Slavenko A, Chapple DG. 2023. Macroecological and biogeographical patterns of limb reduction in the world's skinks. J Biogeogr 50:428–40.

Campos JC, Garcia N, Alírio J, Arenas-Castro S, Teodoro AC, Sillero N. 2023. Ecological niche models using MaxEnt in Google Earth Engine: evaluation guidelines and recommendations. Ecol Inform 76:102147.

Candès E, Demanet L, Donoho D, Ying L. 2006. Fast discrete curvelet transforms. Multiscale Model Simul 5:861–99.

Canizo M, Triguero I, Conde A, Onieva E. 2019. Multi-head CNN–RNN for multi-time series anomaly detection: an industrial case study. Neurocomputing 363:246–60.

Čapek D, Safroshkin M, Morales-Navarrete H, Toulany N, Arutyunov G, Kurzbach A, Bihler J, Hagauer J, Kick S, Jones F et al. 2023. EmbryoNet: using deep learning to link embryonic phenotypes to signaling pathways. Nat Methods 20:894–901.

Carbonell JG, Michalski RS, Mitchell TM. 1983. Machine learning: a historical and methodological analysis. AI Mag 4:69.

Cardini A, Elton S. 2007. Sample size and sampling error in geometric morphometric studies of size and shape. Zoomorphology 126:121–34.

Cardoso MJ, Li W, Brown R, Ma N, Kerfoot E, Wang Y, Murrey B, Myronenko A, Zhao C, Yang D et al. 2022. MONAI: an open-source framework for deep learning in healthcare. arXiv published online.

Caro T. 2017. Wallace on coloration: contemporary perspective and unresolved insights. Trends Ecol Evol 32:23–30.

Chan T, Vese L. 1999. An active contour model without edges. In: Nielsen M, Johansen P, Olsen OF, Weickert J, editors. Scale-space theories in computer vision. Berlin and Heidelberg: Springer. p. 141–51.

Chapman D, Daoust T, Ormos A, Lewis J. 2020. WeightShift: accelerating animation at Framestore with physics. Eurographics/ACM SIGGRAPH Symposium on Computer Animation—Showcases.

Charles RQ, Su H, Kaichun M, Guibas LJ. 2017. PointNet: deep learning on point sets for 3D classification and segmentation. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. p. 77–85.

Chen L, Pan XY, Guo W, Gan Z, Zhang YH, Niu Z, Huang T, Cai YD. 2020. Investigating the gene expression profiles of cells in seven embryonic stages with machine learning algorithms. Genomics 112:2524–34.

Chen L-C, Papandreou G, Schroff F, Adam H. 2017. Rethinking atrous convolution for semantic image segmentation. arXiv published online.

Chen T, Guestrin C. 2016. XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. p. 785–94.

Chen X, Golovinskiy A, Funkhouser T. 2009. A benchmark for 3D mesh segmentation. ACM Trans Graph 28:1–12.

Choi HJ, Wang C, Pan X, Jang J, Cao M, Brazzo JA, Bae Y, Lee K. 2021. Emerging machine learning approaches to phenotyping cellular motility and morphodynamics. Phys Biol 18:041001.

Cobos ME, Peterson AT, Barve N, Osorio-Olvera L. 2019. kuenm: an R package for detailed development of ecological niche models using MaxEnt. PeerJ 7:e6281.

Comet Technologies Canada Inc. 2022. Dragonfly. Dragonfly Software (https://www.theobjects.com/dragonfly).

Cooney CR, He Y, Varley ZK, Nouri LO, Moody CJA, Jardine MD, Liker A, Székely T, Thomas GH. 2022. Latitudinal gradients in avian colourfulness. Nat Ecol Evol 6:622–9.

Cooney CR, Varley ZK, Nouri LO, Moody CJA, Jardine MD, Thomas GH. 2019. Sexual selection predicts the rate and direction of colour divergence in a large avian radiation. Nat Commun 10:1773.

Cooney CR, Bright JA, Capp EJ, Chira AM, Hughes EC, Moody CJ, Nouri LO, Varley ZK, Thomas GH. 2017. Mega-evolutionary dynamics of the adaptive radiation of birds. Nature 542:344–7.

Cooper N, Clark AT, Lecomte N, Qiao H, Ellison AM. 2024. Harnessing large language models for coding, teaching, and inclusion to empower research in ecology and evolution. Methods Ecol Evol doi:10.1111/2041-210X.14325.

Cornell Lab of Ornithology. 2024. Merlin Bird ID (https://merlin.allaboutbirds.org/).

Cortes C, Vapnik V. 1995. Support-vector networks. Mach Learn 20:273–97.

Cunningham JA, Rahman IA, Lautenschlager S, Rayfield EJ, Donoghue PCJ. 2014. A virtual world of paleontology. Trends Ecol Evol 29:347–57.

Cuthill IC, Allen WL, Arbuckle K, Caspers B, Chaplin G, Hauber ME, Hill GE, Jablonski NG, Jiggins CD, Kelber A et al. 2017. The biology of color. Science 357:eaan0221.

Dalal N, Triggs B. 2005. Histograms of oriented gradients for human detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). Vol. 1. IEEE. p. 886–93.

Dale J, Dey CJ, Delhey K, Kempenaers B, Valcu M. 2015. The effects of life history and sexual selection on male and female plumage coloration. Nature 527:367–70.

Das S, Nayak GK, Saba L, Kalra M, Suri JS, Saxena S. 2022. An artificial intelligence framework and its bias for brain tumor segmentation: a narrative review. Comput Biol Med 143:105273.

Davies TG, Rahman IA, Lautenschlager S, Cunningham JA, Asher RJ, Barrett PM, Bates KT, Bengtson S, Benson RBJ, Boyer DM et al. 2017. Open data and digital morphology. Proc R Soc B Biol Sci 284:20170194.

de Oliveira Coelho JPV. 2015. Unwarping heated bones: a quantitative analysis of heat-induced skeletal deformations using 3D geometric morphometrics [doctoral dissertation]. University of Coimbra.

Deakin WJ, Anderson PS, Boer W, Smith TJ, Hill JJ, Rücklin M, Donoghue PC, Rayfield EJ. 2022. Increasing morphological disparity and decreasing optimality for jaw speed and strength during the radiation of jawed vertebrates. Sci Adv 8:eabl3644.

Deb JC, Forbes G, MacLean DA. 2020. Modelling the spatial distribution of selected North American woodland mammals under future climate scenarios. Mamm Rev 50:440–52.

Degnan JH, Rosenberg NA. 2009. Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol Evol 24:332–40.

DeGusta D, Vrba E. 2005. Methods for inferring paleohabitats from discrete traits of the bovid postcranial skeleton. J Archaeolog Sci 32:1115–23.

Derkarabetian S, Castillo S, Koo PK, Ovchinnikov S, Hedin M. 2019. A demonstration of unsupervised machine learning in species delimitation. Mol Phylogenet Evol 139:106562.

Dettmers T, Minervini P, Stenetorp P, Riedel S. 2018. Convolutional 2D knowledge graph embeddings. Proc AAAI Conf Artif Intell 32:1811–8.

Devine J, Kurki HK, Epp JR, Gonzalez PN, Claes P. 2023. Classifying high-dimensional phenotypes with ensemble learning. bioRxiv published online.

Devine J, Aponte JD, Katz DC, Liu W, Vercio LDL, Forkert ND, Marcucio R, Percival CJ, Hallgrímsson B. 2020. A registration and deep learning approach to automated landmark detection for geometric morphometrics. Evol Biol 47:246–59.

Devlin J, Chang M-W, Lee K, Toutanova K. 2019. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv published online.

Dhanachandra N, Manglem K, Chanu YJ. 2015. Image segmentation using K-means clustering algorithm and subtractive clustering algorithm. Procedia Comput Sci 54:764–71.

Diaz R, Hallman S, Fowlkes CC. 2013. Detecting dynamic objects with multi-view background subtraction. 2013 IEEE International Conference on Computer Vision (ICCV). IEEE. p. 273–80.

Dickson BV, Pierce SE. 2019. Functional performance of turtle humerus shape across an ecological adaptive landscape. Evolution 73:1265–77.

Dome S, Sathe AP. 2021. Optical character recognition using Tesseract and classification. 2021 International Conference on Emerging Smart Computing and Informatics (ESCI). IEEE. p. 153–8.

Doré M, Willmott K, Lavergne S, Chazot N, Freitas AVL, Fontaine C, Elias M. 2023. Mutualistic interactions shape global spatial congruence and climatic niche evolution in Neotropical mimetic butterflies. Ecol Lett 26:843–57.

Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al. 2021. An image is worth 16×16 words: transformers for image recognition at scale. arXiv published online.

Dumiak M. 2008. Book-scanning robots digitize delicate texts. IEEE Spectr 45:18.

Durrleman S, Prastawa M, Charon N, Korenberg JR, Joshi S, Gerig G, Trouvé A. 2014. Morphometry of anatomical shape complexes with dense deformations and sparse parameters. Neuroimage 101:35–49.

Dutia K, Stack J. 2021. Heritage connector: a machine learning framework for building linked open data from museum collections. Appl AI Lett 2:e23.

Edie SM, Collins KS, Jablonski D. 2023. High-throughput micro-CT scanning and deep learning segmentation workflow for analyses of shelly invertebrates and their fossils: examples from marine Bivalvia. Front Ecol Evol 11:1127756.

Elhamod M, Diamond KM, Maga AM, Bakis Y, Bart HL Jr, Mabee P, Dahdul W, Leipzig J, Greenberg J, Karpatne A. 2022. Hierarchy-guided neural network for species classification. Methods Ecol Evol 13:642–52.

Elith J, Phillips SJ, Hastie T, Dudík M, Chee YE, Yates CJ. 2011. A statistical explanation of MaxEnt for ecologists. Divers Distrib 17:43–57.

Elsayed OR, ElKot YG, ElRefaai DA, Abdelfattah HM, ElSayed M, Hamdy A. 2023. Automated identification and classification of teeth fossils. 2023 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC). IEEE. p. 179–86.

Eronen JT, Polly PD, Fred M, Damuth J, Frank DC, Mosbrugger V, Scheidegger C, Stenseth NC, Fortelius M. 2010. Ecometrics: the traits that bind the past and present together. Integr Zool 5:88–101.

European Commission. 2018. A definition of AI: main capabilities and scientific disciplines (https://digital-strategy.ec.europa.eu/en/library/definition-artificial-intelligence-main-capabilities-and-scientific-disciplines).

Ezard TH, Pearson PN, Purvis A. 2010. Algorithmic approaches to aid species’ delimitation in multidimensional morphospace. BMC Evol Biol 10:175.

Felsenstein J. 1978. The number of evolutionary trees. Syst Zool 27:27.

Felsenstein J. 1985. Phylogenies and the comparative method. Am Nat 125:1–15.

Feltes BC, Grisci BI, Poloni JDF, Dorn M. 2018. Perspectives and applications of machine learning for evolutionary developmental biology. Mol Omics 14:289–306.

Fenberg PB, Self A, Stewart JR, Wilson RJ, Brooks SJ. 2016. Exploring the universal ecological responses to climate change in a univoltine butterfly. J Anim Ecol 85:739–48.

Feng D, De Siqueira AD, Yang S, Tran T, Bodrito T, Van Der Walt S. 2021. machine-shop/mothra: v1.0-rc.2 (https://github.com/machine-shop/mothra/tree/v1.0-rc.2).

Fernandes AFA, Turra EM, De Alvarenga ÉR, Passafaro TL, Lopes FB, Alves GFO, Singh V, Rosa GJM. 2020. Deep learning image segmentation for extraction of fish body measurements and prediction of body weight and carcass traits in Nile tilapia. Comput Electron Agric 170:105274.

Ferreira AC, Silva LR, Renna F, Brandl HB, Renoult JP, Farine DR, Covas R, Doutrelant C. 2020. Deep learning-based methods for individual recognition in small birds. Methods Ecol Evol 11:1072–85.

Filella JB, Bonilla Q, C. C, Quispe E, Dalerum F. 2023. Artificial intelligence as a potential tool for micro-histological analysis of herbivore diets. Eur J Wildl Res 69:11.

Fishial.ai. 2019. Fishial (https://www.fishial.ai/).

Foote M. 1997. The evolution of morphological diversity. Annu Rev Ecol Evol Syst 28:129–52.

Foote M. 1993. Discordance and concordance between morphological and taxonomic diversity. Paleobiology 19:185–204.

Fortuny J, Marcé-Nogué J, De Esteban-Trivigno S, Gil L, Galobart À. 2011. Temnospondyli bite club: ecomorphological patterns of the most diverse group of early tetrapods. J Evol Biol 24:2040–54.

Fraley C, Raftery AE. 2002. Model-based clustering, discriminant analysis, and density estimation. J Am Statist Assoc 97:611–31.

Freitas MV, Lemos CG, Ariede RB, Agudelo JFG, Neto RRO, Borges CHS, Mastrochirico-Filho VA, Porto-Foresti F, Iope RL, Batista FM et al. 2023. High-throughput phenotyping by deep learning to include body shape in the breeding program of pacu (Piaractus mesopotamicus). Aquaculture 562:738847.

Fukushima K. 1980. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern 36:193–202.

Furat O, Wang M, Neumann M, Petrich L, Weber M, Krill CE, Schmidt V. 2019. Machine learning techniques for the segmentation of tomographic image data of functional materials. Front Mater 6:145.

Galbusera F, Cina A, Panico M, Albano D, Messina C. 2020. Image-based biomechanical models of the musculoskeletal system. Eur Radiol Exp 4:49.

Garcia-Garcia A, Orts-Escolano S, Oprea S, Villena-Martinez V, Martinez-Gonzalez P, Garcia-Rodriguez J. 2018. A survey on deep learning techniques for image and video semantic segmentation. Appl Soft Comput 70:41–65.

Gehan MA, Fahlgren N, Abbasi A, Berry JC, Callen ST, Chavez L, Doust AN, Feldman MJ, Gilbert KB, Hodge JG et al. 2017. PlantCV v2: image analysis software for high-throughput plant phenotyping. PeerJ 5:e4088.

Geng C, Huang S-J, Chen S. 2021. Recent advances in open set recognition: a survey. IEEE Trans Pattern Anal Mach Intell 43:3614–31.

Goëau H, Bonnet P, Joly A, Bakić V, Barbe J, Yahiaoui I, Selmi S, Carré J, Barthélémy D, Boujemaa N et al. 2013. Pl@ntNet mobile app. Proceedings of the 21st ACM International Conference on Multimedia. ACM. p. 423–4.

Goëau H, Lorieul T, Heuret P, Joly A, Bonnet P. 2022. Can artificial intelligence help in the study of vegetative growth patterns from herbarium collections? An evaluation of the tropical flora of the French Guiana forest. Plants (Basel) 11:530.

Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. 2014. Generative adversarial nets. Proceedings of the 27th International Conference on Neural Information Processing Systems—Volume 2, NIPS'14. Cambridge, MA: MIT Press. p. 2672–80.

Goodfellow IJ, Bengio Y, Courville A. 2016. Deep learning. Cambridge, MA: MIT Press.

Goswami A. 2015. Phenome10K: a free online repository for 3-D scans of biological and palaeontological specimens (www.phenome10k.org).

Goswami A, Noirault E, Coombs EJ, Clavel J, Fabre A-C, Halliday TJD, Churchill M, Curtis A, Watanabe A, Simmons NB et al. 2022. Attenuated evolution of mammals through the Cenozoic. Science 378:377–83.

Goswami A, Watanabe A, Felice RN, Bardua C, Fabre A-C, Polly PD. 2019. High-density morphometric analysis of shape and integration: the good, the bad, and the not-really-a-problem. Integr Comp Biol 59:669–83.

Goswami A, Clavel J. 2024. Morphological evolution in a time of phenomics. EcoEvoRxiv published online.

Graves A, Mohamed A, Hinton G. 2013. Speech recognition with deep recurrent neural networks. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. p. 6645–9.

Gu Q, Scott B, Smith V. 2023. Planetary knowledge base: semantic transcription using graph neural networks. Biodivers Inf Sci Stand 7:e111168.

Gu Q, Scott B, Smith V. 2022. Enhancing botanical knowledge graphs with machine learning. Biodivers Inf Sci Stand 6:e91384.

Guillerme T, Cooper N. 2016a. Assessment of available anatomical characters for linking living mammals to fossil taxa in phylogenetic analyses. Biol Lett 12:20151003.

Guillerme T, Cooper N. 2016b. Effects of missing data on topological inference using a total evidence approach. Mol Phylogenet Evol 94:146–58.

Guo S, Xu P, Miao Q, Shao G, Chapman CA, Chen X, He G, Fang D, Zhang H, Sun Y et al. 2020. Automatic identification of individual primates with deep learning techniques. iScience 23:101412.

Haghighat M, Browning L, Sirinukunwattana K, Malacrino S, Khalid Alham N, Colling R, Cui Y, Rakha E, Hamdy FC, Verrill C et al. 2022. Automated quality assessment of large digitised histology cohorts by artificial intelligence. Sci Rep 12:5002.

Hallou A, Yevick HG, Dumitrascu B, Uhlmann V. 2021. Deep learning for bioimage analysis in developmental biology. Development 148:dev199616.

Hanocka R, Hertz A, Fish N, Giryes R, Fleishman S, Cohen-Or D. 2019. MeshCNN: a network with an edge. ACM Trans Graph 38:1–12.

Hansen OLP, Svenning J, Olsen K, Dupont S, Garner BH, Iosifidis A, Price BW, Høye TT. 2020. Species-level image classification with convolutional neural network enables insect identification from habitus images. Ecol Evol 10:737–47.

Hartman E, Sukurdeep Y, Charon N, Klassen E, Bauer M. 2021. Supervised deep learning of elastic SRV distances on the shape space of curves. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. p. 4425–33.

Hasegawa M, Iida Y, Yano T, Takaiwa F, Iwabuchi M. 1985. Phylogenetic relationships among eukaryotic kingdoms inferred from ribosomal RNA sequences. J Mol Evol 22:32–8.

Hartman E, Sukurdeep Y, Klassen E, Charon N, Bauer M. 2023. Elastic shape analysis of surfaces with second-order Sobolev metrics: a comprehensive numerical framework. Int J Comput Vision 131:1183–209.

He K, Chen X, Xie S, Li Y, Dollár P, Girshick R. 2021. Masked autoencoders are scalable vision learners. arXiv published online.

He K, Fan H, Wu Y, Xie S, Girshick R. 2020. Momentum contrast for unsupervised visual representation learning. arXiv published online.

He Y, Cooney CR, Maddock S, Thomas GH. 2023. Using pose estimation to identify regions and points on natural history specimens. PLoS Comput Biol 19:e1010933.

He Y, Varley ZK, Nouri LO, Moody CJA, Jardine MD, Maddock S, Thomas GH, Cooney CR. 2022. Deep learning image segmentation reveals patterns of UV reflectance evolution in passerine birds. Nat Commun 13:5068.

Hedrick BP, Heberling JM, Meineke EK, Turner KG, Grassa CJ, Park DS, Kennedy J, Clarke JA, Cook JA, Blackburn DC et al. 2020. Digitization and the future of natural history collections. Bioscience 70:243–51.

Hennig W. 1966. Phylogenetic systematics. Champaign, IL: University of Illinois Press.

Herbst EC, Meade LE, Lautenschlager S, Fioritti N, Scheyer TM. 2022. A toolbox for the retrodeformation and muscle reconstruction of fossil specimens in Blender. R Soc Open Sci 9:220519.

Ho LST, Dinh V, Nguyen CV. 2019. Multi-task learning improves ancestral state reconstruction. Theor Popul Biol 126:33–9.

Hochreiter S, Schmidhuber J. 1996. LSTM can solve hard long time lag problems. In: Mozer MC, Jordan M, Petsche T, editors. Advances in neural information processing systems. Cambridge, MA: MIT Press.

Hoffstaetter S. 2022. pytesseract: Python-tesseract is a Python wrapper for Google's Tesseract-OCR (https://github.com/madmaze/pytesseract).

Hollister JD, Cai X, Horton T, Price BW, Zarzyczny KM, Fenberg PB. 2023. Using computer vision to identify limpets from their shells: a case study using four species from the Baja California peninsula. Front Mar Sci 10:1167818.

Holm EA, Cohn R, Gao N, Kitahara AR, Matson TP, Lei B, Yarasi SR. 2020. Overview: computer vision and machine learning for microstructural characterization and analysis. Metall Mater Trans A 51:5985–99.

Hou J, He Y, Yang H, Connor T, Gao J, Wang Y, Zeng Y, Zhang J, Huang J, Zheng B et al. 2020. Identification of animal individuals using deep learning: a case study of giant panda. Biol Conserv 242:108414.

Hou Y, Canul-Ku M, Cui X, Hasimoto-Beltran R, Zhu M. 2021. Semantic segmentation of vertebrate microfossils from computed tomography data using a deep learning approach. J Micropalaeontol 40:163–73.

Houle D, Govindaraju DR, Omholt S. 2010. Phenomics: the next challenge. Nat Rev Genet 11:855–66.

Hoyal Cuthill JF, Guttenberg N, Huertas B. 2024. Male and female contributions to diversity among birdwing butterfly images. Commun Biol 7:774.

Hoyal Cuthill JF, Guttenberg N, Ledger S, Crowther R, Huertas B. 2019. Deep learning on butterfly phenotypes tests evolution's oldest mathematical model. Sci Adv 5:eaaw4967.

Hsiang AY, Brombacher A, Rillo MC, Mleneck-Vautravers MJ, Conn S, Lordsmith S, Jentzen A, Henehan MJ, Metcalfe B, Fenton IS et al. 2019. Endless forams: >34,000 modern planktonic foraminiferal images for taxonomic training and automated species recognition using convolutional neural networks. Paleoceanogr Paleoclimatol 34:1157–77.

Hu W, Zhang C, Jiang Y, Huang C, Liu Q, Xiong L, Yang W, Chen F. 2020. Nondestructive 3D image analysis pipeline to extract rice grain traits using X-ray computed tomography. Plant Phenomics 2020:3414926.

Huang T, Huang Y, Lin W. 2013. Real-time horse gait synthesis. Comput Anim Virtual Worlds 24:87–95.

Hudson LN, Blagoderov V, Heaton A, Holtzhausen P, Livermore L, Price BW, Van Der Walt S, Smith VS. 2015. Inselect: automating the digitization of natural history collections. PLoS One 10:e0143402.

Hughes EC, Edwards DP, Bright JA, Capp EJR, Cooney CR, Varley ZK, Thomas GH. 2022. Global biogeographic patterns of avian morphological diversity. Ecol Lett 25:598–610.

Huiskes R, Hollister SJ. 1993. From structure to process, from organ to cell: recent developments of FE-analysis in orthopaedic biomechanics. J Biomech Eng 115:520–7.

Hussein BR, Malik OA, Ong WH, Slik JWF. 2021. Automated extraction of phenotypic leaf traits of individual intact herbarium leaves from herbarium specimen images using deep learning based semantic segmentation. Sensors (Basel) 21:4549.

Ioannides M, Davies R, Chatzigrigoriou P, Papageorgiou E, Leventis G, Nikolakopoulou V, Athanasiou V. 2017. 3D digital libraries and their contribution in the documentation of the past. In: Ioannides M, Magnenat-Thalmann N, Papagiannakis G, editors. Mixed reality and gamification for cultural heritage. Cham: Springer International Publishing. p. 161–99.

Islam T, Kim CH, Iwata H, Shimono H, Kimura A. 2023. DeepCGP: a deep learning method to compress genome-wide polymorphisms for predicting phenotype of rice. IEEE/ACM Trans Comput Biol Bioinf 20:2078–88.

Jeanray N, Marée R, Pruvot B, Stern O, Geurts P, Wehenkel L, Muller M. 2015. Phenotype classification of zebrafish embryos by supervised learning. PLoS One 10:1–20.

Jermyn IH, Kurtek S, Laga H, Srivastava A. 2017. Elastic shape analysis of three-dimensional objects. Synthesis lectures on computer vision. Cham: Springer International Publishing.

Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T. 2014. Caffe: convolutional architecture for fast feature embedding. arXiv published online.

Jin B, Cruz L, Goncalves N. 2022. Pseudo RGB-D face recognition. IEEE Sensors J 22:21780–94.

Johnson KR, Owens IFP; The Global Collection Group. 2023. A global approach for natural history museum collections. Science 379:1192–4.

Jolliffe IT, Cadima J. 2016. Principal component analysis: a review and recent developments. Philos Trans A Math Phys Eng Sci 374:20150202.

Jones KE, Dickson BV, Angielczyk KD, Pierce SE. 2021. Adaptive landscapes challenge the “lateral-to-sagittal” paradigm for mammalian vertebral evolution. Curr Biol 31:1883–92.

Joskowicz L, Cohen D, Caplan N, Sosna J. 2019. Inter-observer variability of manual contour delineation of structures in CT. Eur Radiol 29:1391–9.

Kale RS, Thorat S. 2021. Image segmentation techniques with machine learning. Int J Sci Res Comput Sci Eng Inf Technol 7:232–5.

Kammerer CF, Deutsch M, Lungmus JK, Angielczyk KD. 2020. Effects of taphonomic deformation on geometric morphometric analysis of fossils: a study using the dicynodont Diictodon feliceps (Therapsida, Anomodontia). PeerJ 8:e9925.

Karashchuk P, Rupp KL, Dickinson ES, Walling-Bell S, Sanders E, Azim E, Brunton BW, Tuthill JC. 2021. Anipose: a toolkit for robust markerless 3D pose estimation. Cell Rep 36:109730.

Karnani K, Pepper J, Bakiş Y, Wang X, Bart H, Breen DE, Greenberg J. 2022. Computational metadata generation methods for biological specimen image collections. Int J Digit Libr 25:1–18.

Karuppaiah V, Maruthadurai R, Das B, Soumia PS, Gadge AS, Thangasamy A, Ramesh SV, Shirsat DV, Mahajan V, Krishna H et al. 2023. Predicting the potential geographical distribution of onion thrips, Thrips tabaci, in India based on climate change projections using MaxEnt. Sci Rep 13:7934.

Kass M, Witkin A, Terzopoulos D. 1988. Snakes: active contour models. Int J Comput Vision 1:321–31.

Kavzoglu T. 2009. Increasing the accuracy of neural network classification using refined training data. Environ Model Softw 24:850–8.

Kendrick C, Buckley M, Brassey C. 2022. MiTiSegmenter: software for high throughput segmentation and meshing of microCT data in microtiter plate arrays. MethodsX 9:101849.

Kiel S. 2021. Assessing bivalve phylogeny using deep learning and computer vision approaches. bioRxiv published online.

Kikinis R, Pieper SD, Vosburgh KG. 2013. 3D Slicer: a platform for subject-specific image analysis, visualization, and clinical support. In: Jolesz F, editor. Intraoperative imaging and image-guided therapy. New York: Springer. p. 277–89.

King B, Lee MSY. 2015. Ancestral state reconstruction, rate heterogeneity, and the evolution of reptile viviparity. Syst Biol 64:532–44.

Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L, Xiao T, Whitehead S, Berg AC, Lo W-Y et al. 2023. Segment anything. Proceedings of the IEEE/CVF International Conference on Computer Vision. p. 4015–26.

Kishor Kumar M, Senthil Kumar R, Sankar V, Sakthivel T, Karunakaran G, Tripathi P. 2017. Non-destructive estimation of leaf area of durian (Durio zibethinus)—an artificial neural network approach. Sci Hortic 219:319–25.

Klassen E, Srivastava A. 2006. Geodesics between 3D closed curves using path-straightening. In: Leonardis A, Bischof H, Pinz A, editors. Computer Vision—ECCV 2006. Berlin and Heidelberg: Springer. p. 95–106.

Kong X, Li J. 2018. Vision-based fatigue crack detection of steel structures using video feature tracking. Comput-Aided Civ Infrastruct Eng 33:783–99.

Korfmann K, Gaggiotti OE, Fumagalli M. 2023. Deep learning in population genetics. Genome Biol Evol 15:evad008.

Krizhevsky A, Sutskever I, Hinton GE. 2017. ImageNet classification with deep convolutional neural networks. Commun ACM 60:84–90.

Kuhn T, Hettich J, Davtyan R, Gebhardt JCM. 2021. Single molecule tracking and analysis framework including theory-predicted parameter settings. Sci Rep 11:9465.

Kwon Y, Kang S, Choi Y-S, Kim I. 2021. Evolutionary design of molecules based on deep learning and a genetic algorithm. Sci Rep 11:17304.

Lacoste A, Luccioni A, Schmidt V, Dandres T. 2019. Quantifying the carbon emissions of machine learning. arXiv published online.

Lahiri S, Robinson D, Klassen E. 2015. Precise matching of PL curves in RN in the square root velocity framework. arXiv published online.

Lannelongue L, Aronson H-EG, Bateman A, Birney E, Caplan T, Juckes M, McEntyre J, Morris AD, Reilly G, Inouye M. 2023. GREENER principles for environmentally sustainable computational science. Nat Comput Sci 3:514–21.

Lannelongue L, Grealey J, Inouye M. 2021. Green algorithms: quantifying the carbon footprint of computation. Adv Sci 8:2100707.

Lautenschlager S. 2016. Reconstructing the past: methods and techniques for the digital restoration of fossils. R Soc Open Sci 3:160342.

Le Guillarme N, Thuiller W. 2022. TaxoNERD: deep neural models for the recognition of taxonomic entities in the ecological and evolutionary literature. Methods Ecol Evol 13:625–41.

LeCun Y, Bengio Y, Hinton G. 2015. Deep learning. Nature 521:436–44.

LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD. 1989. Backpropagation applied to handwritten zip code recognition. Neural Comput 1:541–51.

Lee MSY, Palci A. 2015. Morphological phylogenetics in the genomic age. Curr Biol 25:R922–9.

Li X, Zhang Y, Wu J, Dai Q. 2023. Challenges and opportunities in bioimage analysis. Nat Methods 20:958–61.

Lin D, Dai J, Jia J, He K, Sun J. 2016. ScribbleSup: scribble-supervised convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Lin H, Zhang W, Mulqueeney JM, Brombacher A, Searle-Barnes A, Nixon M, Cai X, Ezard T. 2024. 3DKMI: a MATLAB package to generate shape signatures from Krawtchouk moments and an application to species delimitation in planktonic foraminifera. Methods Ecol Evol published online.

Ling MH, Ivorra T, Heo CC, Wardhana AH, Hall MJR, Tan SH, Mohamed Z, Khang TF. 2023. Machine learning analysis of wing venation patterns accurately identifies Sarcophagidae, Calliphoridae and Muscidae fly species. Med Vet Entomol 37:767–81.

Liu GR. 2019. FEA-AI and AI-AI: two-way deepnets for real-time computations for both forward and inverse mechanics problems. Int J Comput Methods 16:1950045.

Liu Y, Wang S-L, Zhang J-F, Zhang W, Zhou S, Li W. 2021a. DMFMDA: prediction of microbe-disease associations based on deep matrix factorization using Bayesian personalized ranking. IEEE/ACM Trans Comput Biol Bioinf 18:1763–72.

Liu Z, Jin L, Chen J, Fang Q, Ablameyko S, Yin Z, Xu Y. 2021b. A survey on applications of deep learning in microscopy image analysis. Comput Biol Med 134:104523.

Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B. 2021c. Swin transformer: hierarchical vision transformer using shifted windows. arXiv published online.

Lloyd S. 1982. Least squares quantization in PCM. IEEE Trans Inf Theory 28:129–37.

Long E, Wan P, Chen Q, Lu Z, Choi J. 2023. From function to translation: decoding genetic susceptibility to human diseases via artificial intelligence. Cell Genomics 3:100320.

Lösel PD, Monchanin C, Lebrun R, Jayme A, Relle JJ, Devaud J-M, Heuveline V, Lihoreau M. 2023. Natural variability in bee brain size and symmetry revealed by micro-CT imaging and deep learning. PLoS Comput Biol 19:e1011529.

Lösel PD, Van De Kamp T, Jayme A, Ershov A, Faragó T, Pichler O, Tan Jerome N, Aadepu N, Bremer S, Chilingaryan SA et al. 2020. Introducing Biomedisa as an open-source online platform for biomedical image segmentation. Nat Commun 11:5577.

Love AC. 2003. Evolutionary morphology, innovation, and the synthesis of evolutionary and developmental biology. Biol Philos 18:309–45.

Love NLR, Bonnet P, Goëau H, Joly A, Mazer SJ. 2021. Machine learning undercounts reproductive organs on herbarium specimens but accurately derives their quantitative phenological status: a case study of Streptanthus tortuosus. Plants (Basel) 10:2471.

Low BW, Zeng Y, Tan HH, Yeo DCJ. 2021. Predictor complexity and feature selection affect Maxent model transferability: evidence from global freshwater invasive species. Divers Distrib 27:497–511.

Lowe DG. 2004. Distinctive image features from scale-invariant keypoints. Int J Comput Vision 60:91–110.

Lu Y, Wang R, Hu T, He Q, Chen ZS, Wang J, Liu L, Fang C, Luo J, Fu L et al. 2023. Nondestructive 3D phenotyping method of passion fruit based on X-ray micro-computed tomography and deep learning. Front Plant Sci 13:1087904.

Luo D, Zeng W, Chen J, Tang W. 2021. Deep learning for automatic image segmentation in stomatology and its clinical application. Front Med Technol 3:767836.

Lürig MD. 2022. phenopype: a phenotyping pipeline for Python. Methods Ecol Evol 13:569–76.

Lürig MD, Donoughe S, Svensson EI, Porto A, Tsuboi M. 2021. Computer vision, machine learning, and the promise of phenomics in ecology and evolutionary biology. Front Ecol Evol 9:642774.

Macleod N. 2017. On the use of machine learning in morphometric analysis. In: Lestrel PE, editor. Biological Shape Analysis—Proceedings of the 4th International Symposium. World Scientific. p. 134–71.

MacLeod N, Price B, Stevens Z. 2022. What you sample is what you get: ecomorphological variation in Trithemis (Odonata, Libellulidae) dragonfly wings reconsidered. BMC Ecol Evol 22:43.

MacQueen J. 1967. Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Oakland, CA, USA. p. 281–97.

Maddison WP, Knowles LL. 2006. Inferring phylogeny despite incomplete lineage sorting. Syst Biol 55:21–30.

Mäder P, Boho D, Rzanny M, Seeland M, Wittich HC, Deggelmann A, Wäldchen J. 2021. The Flora Incognita app—interactive plant species identification. Methods Ecol Evol 12:1335–42.

Maga AM, Tustison NJ, Avants BB. 2017. A population level atlas of Mus musculus craniofacial skeleton and automated image-based shape analysis. J Anat 231:433–43.

Mahendiran M, Parthiban M, Azeez PA. 2022. Signals of local bioclimate-driven ecomorphological changes in wild birds. Sci Rep 12:15815.

Maia R, Gruson H, Endler JA, White TE. 2019. pavo 2: new tools for the spectral and spatial analysis of colour in R. Methods Ecol Evol 10:1097–107.

Marks M, Jin Q, Sturman O, Von Ziegler L, Kollmorgen S, Von Der Behrens W, Mante V, Bohacek J, Yanik MF. 2022. Deep-learning-based identification, tracking, pose estimation and behaviour classification of interacting primates and mice in complex environments. Nat Mach Intell 4:331–40.

Martin CH, Wainwright PC. 2013. Multiple fitness peaks on the adaptive landscape drive adaptive radiation in the wild. Science 339:208–11.

Martin-Brualla R, Radwan N, Sajjadi MS, Barron JT, Dosovitskiy A, Duckworth D. 2021. NeRF in the wild: neural radiance fields for unconstrained photo collections. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. p. 7210–9.

Masaeli M, Gupta D, O'Byrne S, Tse HTK, Gossett DR, Tseng P, Utada AS, Jung HJ, Young S, Clark AT et al. 2016. Multiparameter mechanical and morphometric screening of cells. Sci Rep 6:1–11.

Mathis A, Mamidanna P, Cury KM, Abe T, Murthy VN, Mathis MW, Bethge M. 2018. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat Neurosci 21:1281–9.

Maturana D, Scherer S. 2015. VoxNet: a 3D convolutional neural network for real-time object recognition. 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE. p. 922–8.

McCulloch WS, Pitts W. 1943. A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys 5:115–33.

McGhee GR. 1999. Theoretical morphology: the concept and its applications. New York: Columbia University Press.

McGhee GR. 1980. Shell form in the biconvex articulate Brachiopoda: a geometric analysis. Paleobiology 6:57–76.

Medina JJ, Maley JM, Sannapareddy S, Medina NN, Gilman CM, McCormack JE. 2020. A rapid and cost-effective pipeline for digitization of museum specimens with 3D photogrammetry. PLoS One 15:e0236417.

Mehrabi N, Morstatter F, Saxena N, Lerman K, Galstyan A. 2021. A survey on bias and fairness in machine learning. ACM Comput Surv 54:1–35.

Merow C, Smith MJ, Silander JA. 2013. A practical guide to MaxEnt for modeling species’ distributions: what it does, and why inputs and settings matter. Ecography 36:1058–69.

Mieth B, Rozier A, Rodriguez JA, Höhne MMC, Görnitz N, Müller K-R. 2021. DeepCOMBI: explainable artificial intelligence for the analysis and discovery in genome-wide association studies. NAR Genom Bioinform 3:lqab065.

Milletari F, Navab N, Ahmadi S-A. 2016. V-Net: fully convolutional neural networks for volumetric medical image segmentation. 2016 Fourth International Conference on 3D Vision (3DV). IEEE. p. 565–71.

Minaee S, Wang Y. 2019. An ADMM approach to masked signal decomposition using subspace representation. IEEE Trans Image Process 28:3192–204.

Misof B, Liu S, Meusemann K, Peters RS, Donath A, Mayer C, Frandsen PB, Ware J, Flouri T, Beutel RG et al. 2014. Phylogenomics resolves the timing and pattern of insect evolution. Science 346:763–7.

Mitteroecker P, Schaefer K. 2022. Thirty years of geometric morphometrics: achievements, challenges, and the ongoing quest for biological meaningfulness. Am J Biol Anthropol 178:181–210.

Mo YK, Hahn MW, Smith ML. 2024. Applications of machine learning in phylogenetics. Mol Phylogenet Evol 196:108066.

Moen E, Bannon D, Kudo T, Graf W, Covert M, Van Valen D. 2019. Deep learning for cellular image analysis. Nat Methods 16:1233–46.

Mohammadi V, Minaei S, Mahdavian AR, Khoshtaghaza MH, Gouton P. 2021. Estimation of leaf area in bell pepper plant using image processing techniques and artificial neural networks. 2021 IEEE International Conference on Signal and Image Processing Applications (ICSIPA). IEEE. p. 173–8.

Moi D, Dessimoz C. 2022. Reconstructing protein interactions across time using phylogeny-aware graph neural networks. bioRxiv published online.

Montagne S, Hamzaoui D, Allera A, Ezziane M, Luzurier A, Quint R, Kalai M, Ayache N, Delingette H, Renard-Penna R. 2021. Challenge of prostate MRI segmentation on T2-weighted images: inter-observer variability and impact of prostate morphology. Insights Imaging 12:71.

Mouloodi S, Rahmanpanah H, Gohari S, Burvill C, Tse KM, Davies HMS. 2021. What can artificial intelligence and machine learning tell us? A review of applications to equine biomechanical research. J Mech Behav Biomed Mater 123:104728.

Mulqueeney JM, Searle-Barnes A, Brombacher A, Sweeney M, Goswami A, Ezard THG. 2024a. How many specimens make a sufficient training set for automated three-dimensional feature extraction? R Soc Open Sci 11:rsos.240113.

Mulqueeney JM, Ezard THG, Goswami A. 2024b. Assessing the application of landmark-free morphometrics to macroevolutionary analyses. bioRxiv published online.

Naert T, Çiçek Ö, Ogar P, Bürgi M, Shaidani NI, Kaminski MM, Xu Y, Grand K, Vujanovic M, Prata D et al. 2021. Deep learning is widely applicable to phenotyping embryonic development and disease. Development (Cambridge) 148:1–18.

Najman L, Schmitt M. 1994. Watershed of a continuous function. Signal Process 38:99–112.

Narayana PA, Coronado I, Sujit SJ, Wolinsky JS, Lublin FD, Gabr RE. 2020. Deep-learning-based neural tissue segmentation of MRI in multiple sclerosis: effect of training set size. Magn Reson Imaging 51:1487–96.

Nath T, Mathis A, Chen AC, Patel A, Bethge M, Mathis MW. 2019. Using DeepLabCut for 3D markerless pose estimation across species and behaviors. Nat Protoc 14:2152–76.

Nesterenko L, Boussau B, Jacob L. 2022. Phyloformer: towards fast and accurate phylogeny estimation with self-attention networks. bioRxiv published online.

Niemi MO. 2020. Phylogenetic machine learning methods and application to mammal dental traits and bioclimatic variables [master's thesis]. Helsinki: University of Helsinki.

Nock R, Nielsen F. 2004. Statistical region merging. IEEE Trans Pattern Anal Mach Intell 26:1452–8.

Norman B, Pedoia V, Majumdar S. 2018. Use of 2D U-Net convolutional neural networks for automated cartilage and meniscus segmentation of knee MR imaging data to determine relaxometry and morphometry. Radiology 288:177–85.

Norouzzadeh MS, Nguyen A, Kosmala M, Swanson A, Palmer MS, Packer C, Clune J. 2018. Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proc Nat Acad Sci USA 115:E5716–25.

Oord A, Li Y, Vinyals O. 2019. Representation learning with contrastive predictive coding. arXiv published online.

Osher S, Sethian JA. 1988. Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations. J Comput Phys 79:12–49.

Otsu N. 1979. A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9:62–6.

Padilla-García N, Šrámková G, Záveská E, Šlenker M, Clo J, Zeisek V, Lučanová M, Rurane I, Kolář F, Marhold K. 2023. The importance of considering the evolutionary history of polyploids when assessing climatic niche evolution. J Biogeogr 50:86–100.

Panchen AL. 1980. Notes on the behaviour of Rajah Brooke's birdwing butterfly, Trogonoptera brookiana. Entomol Rec J Var 92:98–102.

Panciroli E, Janis C, Stockdale M, Martín-Serra A. 2017. Correlates between calcaneal morphology and locomotion in extant and extinct carnivorous mammals. J Morphol 278:1333–53.

Papageorgiou CP, Oren M, Poggio T. 1998. A general framework for object detection. Proceedings of the Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271). p. 555–62.

Parham JF, Donoghue PCJ, Bell CJ, Calway TD, Head JJ, Holroyd PA, Inoue JG, Irmis RB, Joyce WG, Ksepka DT et al. 2012. Best practices for justifying fossil calibrations. Syst Biol 61:346–59.

Parker AK, Müller J, Boisserie J-R, Head JJ. 2023. The utility of body size as a functional trait to link the past and present in a diverse reptile clade. Proc Nat Acad Sci USA 120:e2201948119.

Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L et al. 2019. PyTorch: an imperative style, high-performance deep learning library. In: Wallach H, Larochelle H, Beygelzimer A, d'Alché-Buc F, Fox E, Garnett R, editors. Advances in neural information processing systems 32. Curran Associates, Inc. p. 8024–35.

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al. 2011. Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–30.

Pereira TD, Aldarondo DE, Willmore L, Kislin M, Wang SS-H, Murthy M, Shaevitz JW. 2019. Fast animal pose estimation using deep neural networks. Nat Methods 16:117–25.

Perera P, Patel VM. 2019. Learning deep features for one-class classification. IEEE Trans Image Process 28:5450–63.

Perronnin F, Dance C. 2007. Fisher kernels on visual vocabularies for image categorization. 2007 IEEE Conference on Computer Vision and Pattern Recognition. p. 1–8.

Perronnin F, Sánchez J, Mensink T. 2010. Improving the Fisher kernel for large-scale image classification. Computer Vision—ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5–11, 2010, Proceedings, Part IV. Berlin and Heidelberg: Springer. p. 143–56.

Philippe H, Poustka AJ, Chiodin M, Hoff KJ, Dessimoz C, Tomiczek B, Schiffer PH, Müller S, Domman D, Horn M et al. 2019. Mitigating anticipated effects of systematic errors supports sister-group relationship between Xenacoelomorpha and Ambulacraria. Curr Biol 29:1818–26.e6.

Phillips SJ, Anderson RP, Schapire RE. 2006. Maximum entropy modeling of species geographic distributions. Ecol Modell 190:231–59.

Phillips SJ, Dudík M, Schapire RE. 2024. Maxent software for modeling species niches and distributions (http://biodiversityinformatics.amnh.org/open_source/maxent/).

Pichler M, Hartig F. 2023. Machine learning and deep learning—a review for ecologists. Methods Ecol Evol 14:994–1016.

Pierson E, Daoudi M, Tumpach A-B. 2021. A Riemannian framework for analysis of human body surface. arXiv published online.

Pinheiro D, Santander-Jimenéz S, Ilic A. 2022. PhyloMissForest: a random forest framework to construct phylogenetic trees with missing data. BMC Genomics 23:377.

Pl@ntNet IPT. 2023. Pl@ntNet (https://plantnet.org/en/).

Pollock TI, Panagiotopoulou O, Hocking DP, Evans AR. 2022. Taking a stab at modelling canine tooth biomechanics in mammalian carnivores with beam theory and finite-element analysis. R Soc Open Sci 9:220701.

Polly PD, Stayton CT, Dumont ER, Pierce SE, Rayfield EJ, Angielczyk KD. 2016. Combining geometric morphometrics and finite element analysis with evolutionary modeling: towards a synthesis. J Vertebr Paleontol 36:e1111225.

Pomidor BJ, Makedonska J, Slice DE. 2016. A landmark-free method for three-dimensional shape analysis. PLoS One 11:e0150368.

Poon
STS
,
Hanna
FWF
,
Lemarchand
F
,
George
C
,
Clark
A
,
Lea
S
,
Coleman
C
,
Sollazzo
G
.
2023
.
Detecting adrenal lesions on 3D CT scans using a 2.5D deep learning model
.
medRxiv
published online
().

Porto A, Rolfe S, Maga AM. 2021. ALPACA: a fast and accurate computer vision approach for automated landmarking of three-dimensional biological structures. Methods Ecol Evol 12:2129–44.

Porto A, Voje KL. 2020. ML-morph: a fast, accurate and general approach for automated detection and landmarking of biological structures in images. Methods Ecol Evol 11:500–12.

Pratapa A, Doron M, Caicedo JC. 2021. Image-based cell phenotyping with deep learning. Curr Opin Chem Biol 65:9–17.

Price SA, Friedman ST, Corn KA, Martinez CM, Larouche O, Wainwright PC. 2019. Building a body shape morphospace of teleostean fishes. Integr Comp Biol 59:716–30.

Price BW, Dupont S, Allan EL, Blagoderov V, Butcher AJ, Durrant J, Holtzhausen P, Kokkini P, Livermore L, Hardy H et al. 2018. ALICE: angled label image capture and extraction for high throughput insect specimen digitisation. OSF Preprints published online.

Punyasena SW, Tcheng DK, Wesseln C, Mueller PG. 2012. Classifying black and white spruce pollen using layered machine learning. New Phytol 196:937–44.

Qin A, Liu B, Guo Q, Bussmann RW, Ma F, Jian Z, Xu G, Pei S. 2017. Maxent modeling for predicting impacts of climate change on the potential distribution of Thuja sutchuenensis Franch., an extremely endangered conifer from southwestern China. Glob Ecol Conserv 10:139–46.

Qin Y, Havulinna AS, Liu Y, Jousilahti P, Ritchie SC, Tokolyi A, Sanders JG, Valsta L, Brożyńska M, Zhu Q et al. 2022a. Combined effects of host genetics and diet on human gut microbiota and incident disease in a single population cohort. Nat Genet 54:134–42.

Qin Z, Qin F, Li Y, Yu C. 2022b. Intelligent objective osteon segmentation based on deep learning. Front Earth Sci 10:783481.

Rabinovich JE. 2021. Morphology, life cycle, environmental factors and fitness—a machine learning analysis in kissing bugs (Hemiptera, Reduviidae, Triatominae). Front Ecol Evol 9:651683.

Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J et al. 2021. Learning transferable visual models from natural language supervision. In: Meila M, Zhang T, editors. Proceedings of the 38th International Conference on Machine Learning, Proceedings of Machine Learning Research. New York: PMLR. p. 8748–63.

Radford A, Narasimhan K, Salimans T, Sutskever I. 2018. Improving language understanding by generative pre-training.

Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. 2019. Language models are unsupervised multitask learners (https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf).

Raissi M, Perdikaris P, Karniadakis GE. 2019. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J Comput Phys 378:686–707.

Rau C, Marathe S, Bodey AJ, Storm M, Batey D, Cipiccia S, Li P, Ziesche R. 2021. High-throughput micro and nano-tomography. In: Müller B, Wang G, editors. Developments in x-ray tomography XIII. San Diego: SPIE. p. 49.

Ravindran S. 2022. Five ways deep learning has transformed image analysis. Nature 609:864–6.

Ren S, He K, Girshick R, Sun J. 2016. Faster R-CNN: towards real-time object detection with region proposal networks. arXiv published online.

Rezaeitaleshmahalleh M, Mu N, Lyu Z, Zhou W, Zhang X, Rasmussen TE, McBane RD, Jiang J. 2023. Radiomic-based textural analysis of intraluminal thrombus in aortic abdominal aneurysms: a demonstration of automated workflow. J Cardiovasc Transl Res 16:1123–34.

Robillard AJ, Trizna MG, Ruiz-Tafur M, Dávila Panduro EL, de Santana CD, White AE, Dikow RB, Deichmann JL. 2023. Application of a deep learning image classifier for identification of Amazonian fishes. Ecol Evol 13:1–9.

Rolfe S, Pieper S, Porto A, Diamond K, Winchester J, Shan S, Kirveslahti H, Boyer D, Summers A, Maga AM. 2021. SlicerMorph: an open and extensible platform to retrieve, visualize and analyse 3D morphology. Methods Ecol Evol 12:1816–25.

Rolfe SM, Whikehart SM, Maga AM. 2023. Deep learning enabled multi-organ segmentation of mouse embryos. Biol Open 12:bio059698.

Ronneberger O, Fischer P, Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015. Lecture Notes in Computer Science. Presented at the International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich: Springer. p. 234–41.

Rosenblatt F. 1958. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 65:386–408.

Ross CF. 2005. Finite element analysis in vertebrate biomechanics. Anat Rec A Discov Mol Cell Evol Biol 283:253–8.

Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol 30:197–214.

Rowe T. 2002. DigiMorph (https://www.digimorph.org).

Ruder S. 2017. An overview of multi-task learning in deep neural networks. arXiv published online.

Rummel AD, Sheehy ET, Schachner ER, Hedrick BP. 2024. Sample size and geometric morphometrics methodology impact the evaluation of morphological variation. Integr Org Biol 6:obae002.

Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al. 2015. ImageNet large scale visual recognition challenge. arXiv published online.

Russell SJ, Norvig P. 2021. Artificial intelligence: a modern approach. 4th ed. Harlow: Pearson.

Salifu D, Ibrahim EA, Tonnang HEZ. 2022. Leveraging machine learning tools and algorithms for analysis of fruit fly morphometrics. Sci Rep 12:7208.

Salili-James A, Mackay A, Rodriguez-Alvarez E, Rodriguez-Perez D, Mannack T, Rawlings TA, Palmer AR, Todd J, Riutta TE, Macinnis-Ng C et al. 2022a. Classifying organisms and artefacts by their outline shapes. J R Soc Interface 19:20220493.

Salili-James A, Scott B, Smith V. 2022b. ALICE Software: machine learning & computer vision for automatic label extraction. Biodivers Inf Sci Stand 6:e91443.

Samoili S, Cobo ML, Gomez E, De Prato G, Martinez-Plumed F, Delipetrev B. 2020. AI watch. Defining artificial intelligence. Towards an operational definition and taxonomy of artificial intelligence (https://publications.jrc.ec.europa.eu/repository/handle/JRC118163).

Sapoval N, Aghazadeh A, Nute MG, Antunes DA, Balaji A, Baraniuk R, Barberan CJ, Dannenfelser R, Dun C, Edrisi M et al. 2022. Current progress and open challenges for applying deep learning across the biosciences. Nat Commun 13:1728.

Saupe EE, Farnsworth A, Lunt DJ, Sagoo N, Pham KV, Field DJ. 2019. Climatic shifts drove major contractions in avian latitudinal distributions throughout the Cenozoic. Proc Nat Acad Sci USA 116:12895–900.

Schlager S, Profico A, Vincenzo FD, Manzi G. 2018. Retrodeformation of fossil specimens based on 3D bilateral semi-landmarks: implementation in the R package “Morpho”. PLoS One 13:e0194073.

Schmidt S, Balke M, Lafogler S. 2012. DScan—a high-performance digital scanning system for entomological collections. ZooKeys 209:183–91.

Schneider CA, Rasband WS, Eliceiri KW. 2012. NIH Image to ImageJ: 25 years of image analysis. Nat Methods 9:671–5.

Schneider L, Niemann A, Beuing O, Preim B, Saalfeld S. 2021. MedmeshCNN—enabling MeshCNN for medical surface models. Comput Methods Programs Biomed 210:106372.

Schneider S, Greenberg S, Taylor GW, Kremer SC. 2020. Three critical factors affecting automated image species recognition performance for camera traps. Ecol Evol 10:3503–17.

Schuettpelz E, Frandsen PB, Dikow RB, Brown A, Orli S, Peters M, Metallo A, Funk VA, Dorr LJ. 2017. Applications of deep convolutional neural networks to digitized natural history collections. Biodivers Data J 5:e21139.

Schwartz ST, Alfaro ME. 2021. Sashimi: a toolkit for facilitating high-throughput organismal image segmentation using deep learning. Methods Ecol Evol 12:2341–54.

Scott B, Livermore L. 2021. Extracting data at scale: machine learning at the Natural History Museum. Biodivers Inf Sci Stand 5:e74031.

Scott B, Salili-James A, Smith V. 2023. Robot-in-the-loop: prototyping robotic digitisation at the Natural History Museum. Biodivers Inf Sci Stand 7:e112947.

Shallue CJ, Vanderburg A. 2018. Identifying exoplanets with deep learning: a five-planet resonant chain around Kepler-80 and an eighth planet around Kepler-90. Astron J 155:94.

Sharif Razavian A, Azizpour H, Sullivan J, Carlsson S. 2014. CNN features off-the-shelf: an astounding baseline for recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. p. 806–13.

Shearer BM, Cooke SB, Halenar LB, Reber SL, Plummer JE, Delson E, Tallman M. 2017. Evaluating causes of error in landmark-based data collection using scanners. PLoS One 12:e0187452.

Shi D, Wang Y, Ai Z. 2010. Effect of anterior cruciate ligament reconstruction on biomechanical features of knee in level walking: a meta-analysis. Chin Med J (Engl) 123:3137.

Shmatko A, Ghaffari Laleh N, Gerstung M, Kather JN. 2022. Artificial intelligence in histopathology: enhancing cancer research and clinical oncology. Nature Cancer 3:1026–38.

Shorten C, Khoshgoftaar TM. 2019. A survey on image data augmentation for deep learning. J Big Data 6:60.

Shu Z, Yang S, Wu H, Xin S, Pang C, Kavan L, Liu L. 2022. 3D shape segmentation using soft density peak clustering and semi-supervised learning. Comput-Aided Des 145:103181.

Sillero N, Barbosa AM. 2021. Common mistakes in ecological niche models. Int J Geogr Inf Sci 35:213–26.

Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529:484–9.

Simpson GG. 1984. Tempo and mode in evolution. New York: Columbia University Press.

Sinpoo C, Disayathanoowat T, Williams PH, Chantawannakul P. 2019. Prevalence of infection by the microsporidian Nosema spp. in native bumblebees (Bombus spp.) in northern Thailand. PLoS One 14:e0213171.

Slater GJ, Harmon LJ, Alfaro ME. 2012. Integrating fossils with molecular phylogenies improves inference of trait evolution. Evolution 66:3931–44.

Smart U, Ingrasci MJ, Sarker GC, Lalremsanga H, Murphy RW, Ota H, Tu MC, Shouche Y, Orlov NL, Smith EN. 2021. A comprehensive appraisal of evolutionary diversity in venomous Asian coralsnakes of the genus Sinomicrurus (Serpentes: Elapidae) using Bayesian coalescent inference and supervised machine learning. J Zool Syst Evol Res 59:2212–77.

Smith VS, Blagoderov V. 2012. Bringing collections out of the dark. ZooKeys 209:1–6.

Smith ML, Hahn MW. 2023. Phylogenetic inference using generative adversarial networks. Bioinformatics 39:btad543.

Smith ND, Turner AH. 2005. Morphology's role in phylogeny reconstruction: perspectives from paleontology. Syst Biol 54:166–73.

Söderkvist OJO. 2016. Swedish leaf dataset. Linköping: Linköping University.

Söderkvist OJO. 2001. Computer vision classification of leaves from Swedish trees [master's thesis]. Linköping: Linköping University.

Soltis PS, Nelson G, Zare A, Meineke EK. 2020. Plants meet machines: prospects in machine learning for plant biology. Appl Plant Sci 8:e11371.

Sosiak CE, Barden P. 2021. Multidimensional trait morphology predicts ecology across ant lineages. Funct Ecol 35:139–52.

Spradley JP, Glazer BJ, Kay RF. 2019. Mammalian faunas, ecological indices, and machine-learning regression for the purpose of paleoenvironment reconstruction in the Miocene of South America. Palaeogeogr Palaeoclimatol Palaeoecol 518:155–71.

Srivastava A, Klassen E, Joshi SH, Jermyn IH. 2011. Shape analysis of elastic curves in Euclidean spaces. IEEE Trans Pattern Anal Mach Intell 33:1415–28.

Stevens S, Wu J, Thompson MJ, Campolongo EG, Song CH, Carlyn DE, Dong L, Dahdul WM, Stewart C, Berger-Wolf T et al. 2024. BioCLIP: a vision foundation model for the tree of life. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. p. 19412–24.

Ströbel B, Schmelzle S, Blüthgen N, Heethoff M. 2018. An automated device for the digitization and 3D modelling of insects, combining extended-depth-of-field and all-side multi-view imaging. ZooKeys 759:1–27.

Stubbs TL, Benton MJ. 2016. Ecomorphological diversifications of Mesozoic marine reptiles: the roles of ecological opportunity and extinction. Paleobiology 42:547–73.

Su H, Maji S, Kalogerakis E, Learned-Miller E. 2015. Multi-view convolutional neural networks for 3D shape recognition. Proceedings of the IEEE International Conference on Computer Vision. p. 945–53.

Sun C-Y, Yang Y-Q, Guo H-X, Wang P-S, Tong X, Liu Y, Shum H-Y. 2023. Semi-supervised 3D shape segmentation with multilevel consistency and part substitution. Computational Visual Media 9:229–47.

Sun X, Shi J, Liu L, Dong J, Plant C, Wang X, Zhou H. 2018. Transferring deep knowledge for object recognition in low-quality underwater videos. Neurocomputing 275:897–908.

Suvorov A, Hochuli J, Schrider DR. 2020. Accurate inference of tree topologies from multiple sequence alignments using deep learning. Syst Biol 69:221–33.

Tan H, Qiu S, Wang J, Yu G, Guo W, Guo M. 2022. Weighted deep factorizing heterogeneous molecular network for genome-phenome association prediction. Methods 205:18–28.

Tan C, Sun F, Kong T, Zhang W, Yang C, Liu C. 2018. A survey on deep transfer learning. In: Kůrková V, Manolopoulos Y, Hammer B, Iliadis L, Maglogiannis I, editors. Artificial neural networks and machine learning—ICANN 2018. Cham: Springer International Publishing. p. 270–9.

Tang X, Yuan Y, Li X, Zhang J. 2021. Maximum entropy modeling to predict the impact of climate change on pine wilt disease in China. Front Plant Sci 12:652500.

Tavaré S, Balding DJ, Griffiths RC, Donnelly P. 1997. Inferring coalescence times from DNA sequence data. Genetics 145:505–18.

Tesseract OCR. 2021. Tesseract documentation (https://tesseract-ocr.github.io).

Toussaint N, Redhead Y, Vidal-García M, Lo Vercio L, Liu W, Fisher EMC, Hallgrímsson B, Tybulewicz VLJ, Schnabel JA, Green JBA. 2021. A landmark-free morphometrics pipeline for high-resolution phenotyping: application to a mouse model of Down syndrome. Development 148:dev188631.

Tseng ZJ, Garcia-Lara S, Flynn JJ, Holmes E, Rowe TB, Dickson BV. 2023. A switch in jaw form–function coupling during the evolution of mammals. Philos Trans R Soc Lond B Biol Sci 378:20220091.

Tuladhar A, Schimert S, Rajashekar D, Kniep HC, Fiehler J, Forkert ND. 2020. Automatic segmentation of stroke lesions in non-contrast computed tomography datasets with convolutional neural networks. IEEE Access 8:94871–9.

Unger S, Rollins M, Tietz A, Dumais H. 2021. iNaturalist as an engaging tool for identifying organisms in outdoor activities. J Biol Educ 55:537–47.

Valan M, Makonyi K, Maki A, Vondráček D, Ronquist F. 2019. Automated taxonomic identification of insects with expert-level accuracy using effective feature transfer from convolutional networks. Syst Biol 68:876–95.

van de Kamp T, Schwermann AH, dos Santos Rolo T, Lösel PD, Engler T, Etter W, Faragó T, Göttlicher J, Heuveline V, Kopmann A et al. 2018. Parasitoid biology preserved in mineralized fossils. Nat Commun 9:3325.

Van Den Berg CP, Troscianko J, Endler JA, Marshall NJ, Cheney KL. 2020. Quantitative colour pattern analysis (QCPA): a comprehensive framework for the analysis of colour patterns in nature. Methods Ecol Evol 11:316–32.

Van Der Bijl W, Zeuss D, Chazot N, Tunström K, Wahlberg N, Wiklund C, Fitzpatrick JL, Wheat CW. 2020. Butterfly dichromatism primarily evolved via Darwin's, not Wallace's, model. Evol Lett 4:545–55.

van der Walt S, Schönberger JL, Nunez-Iglesias J, Boulogne F, Warner JD, Yager N, Gouillart E, Yu T; the scikit-image contributors. 2014. scikit-image: image processing in Python. PeerJ 2:e453.

Vasconcelos T, Boyko JD, Beaulieu JM. 2023. Linking mode of seed dispersal and climatic niche evolution in flowering plants. J Biogeogr 50:43–56.

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I. 2017. Attention is all you need. arXiv published online.

Vedaldi A, Favaro P, Grisan E. 2007. Boosting invariance and efficiency in supervised learning. 2007 IEEE 11th International Conference on Computer Vision. IEEE. p. 1–8.

Vermillion WA, Polly PD, Head JJ, Eronen JT, Lawing AM. 2018. Ecometrics: a trait-based approach to paleoclimate and paleoenvironmental reconstruction. In: Croft DA, Su DF, Simpson SW, editors. Methods in paleoecology, vertebrate paleobiology and paleoanthropology. Cham: Springer International Publishing. p. 373–94.

Vigneron JP, Kertész K, Vértesy Z, Rassart M, Lousse V, Bálint Z, Biró LP. 2008. Correlated diffraction and fluorescence in the backscattering iridescence of the male butterfly Troides magellanus (Papilionidae). Phys Rev E 78:021903.

Viroli C, McLachlan GJ. 2019. Deep Gaussian mixture models. Stat Comput 29:43–51.

Vision AI. n.d. Google Cloud (https://cloud.google.com/vision).

Vurtur Badarinath P, Chierichetti M, Davoudi Kakhki F. 2021. A machine learning approach as a surrogate for a finite element analysis: status of research and application to one-dimensional systems. Sensors 21:1654.

Vydana HK, Karafiat M, Zmolikova K, Burget L, Cernocky H. 2021. Jointly trained transformers models for spoken language translation. ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. p. 7513–7.

Wäldchen J, Mäder P. 2018. Machine learning for image-based species identification. Methods Ecol Evol 9:2216–25.

Walker BE, Tucker A, Nicolson N. 2022. Harnessing large-scale herbarium image datasets through representation learning. Front Plant Sci 12:1–12.

Walton S, Livermore L, Dillen M, Smedt SD, Groom Q, Koivunen A, Phillips S. 2020. A cost analysis of transcription systems. Res Ideas Outcomes 6:e56211.

Wang L, Shao J, Fang F. 2021a. Propensity model selection with nonignorable nonresponse and instrument variable. Stat Sin published online.

Wang Y, Yao Q, Kwok JT, Ni LM. 2021b. Generalizing from a few examples: a survey on few-shot learning. ACM Comput Surv 53:1–34.

Weaver WN, Ng J, Laport RG. 2020. LeafMachine: using machine learning to automate leaf trait extraction from digitized herbarium specimens. Appl Plant Sci 8:e11367.

Weaver WN, Smith SA. 2023. From leaves to labels: building modular machine learning networks for rapid herbarium specimen analysis with LeafMachine2. Appl Plant Sci 11:e11548.

Wei S-E, Ramakrishna V, Kanade T, Sheikh Y. 2016. Convolutional pose machines. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. p. 4724–32.

Weller HI, Hiller AE, Lord NP, Van Belleghem SM. 2024. recolorize: an R package for flexible colour segmentation of biological images. Ecol Lett 27:e14378.

Whang SE, Roh Y, Song H, Lee J-G. 2023. Data collection and quality challenges in deep learning: a data-centric AI perspective. VLDB J 32:791–813.

White AE, Dikow RB, Baugh M, Jenkins A, Frandsen PB. 2020. Generating segmentation masks of herbarium specimens and a data set for training segmentation models using deep learning. Appl Plant Sci 8:1–8.

White HE, Goswami A, Tucker AS. 2021. The intertwined evolution and development of sutures and cranial morphology. Front Cell Dev Biol 9:653579.

Wiens JJ. 2006. Missing data and the design of phylogenetic analyses. J Biomed Inform 39:34–42.

Wiens JJ. 2001. Character analysis in morphological phylogenetics: problems and solutions. Syst Biol 50:689–99.

Wilf P, Wing SL, Meyer HW, Rose JA, Saha R, Serre T, Cúneo NR, Donovan MP, Erwin DM, Gandolfo MA et al. 2021. An image dataset of cleared, x-rayed, and fossil leaves vetted to plant family for human and machine learning. PhytoKeys 187:93–128.

Willers C, Bauman G, Andermatt S, Santini F, Sandkühler R, Ramsey KA, Cattin PC, Bieri O, Pusterla O, Latzin P. 2021. The impact of segmentation on whole-lung functional MRI quantification: repeatability and reproducibility from multiple human observers and an artificial neural network. Magn Reson Med 85:1079–92.

Wilson RJ, De Siqueira AF, Brooks SJ, Price BW, Simon LM, Van Der Walt SJ, Fenberg PB. 2023. Applying computer vision to digitised natural history collections for climate change research: temperature-size responses in British butterflies. Methods Ecol Evol 14:372–84.

Wolfram Research, Inc. 2024. Mathematica (https://www.wolfram.com/mathematica/).

Wu D, Wu D, Feng H, Duan L, Dai G, Liu X, Wang K, Yang P, Chen G, Gay AP et al. 2021. A deep learning-integrated micro-CT image analysis pipeline for quantifying rice lodging resistance-related traits. Plant Commun 2:100165.

Wu Y, Kirillov A. 2019. Detectron2 (https://github.com/facebookresearch/detectron2).

Wu Z, Xiong Y, Yu S, Lin D. 2018. Unsupervised feature learning via non-parametric instance-level discrimination. arXiv published online.

Yang B, Zhang Z, Yang C-Q, Wang Y, Orr MC, Wang H, Zhang A-B. 2022. Identification of species by combining molecular and morphological data using convolutional neural networks. Syst Biol 71:690–705.

Yang Z. 2015. The BPP program for species tree estimation and species delimitation. Curr Zool 61:854–65.

Ye S, Lauer J, Zhou M, Mathis A, Mathis MW. 2023. AmadeusGPT: a natural language interface for interactive animal behavioral analysis. arXiv published online.

Young R, Maga AM. 2015. Performance of single and multi-atlas based automated landmarking methods compared to expert annotations in volumetric microCT datasets of mouse mandibles. Front Zool 12:33.

Yu C, Qin F, Li Y, Qin Z, Norell M. 2022. CT segmentation of dinosaur fossils by deep learning. Front Earth Sci 9:805271.

Yu C, Qin F, Watanabe A, Yao W, Li Y, Qin Z, Liu Y, Wang H, Jiangzuo Q, Hsiang AY et al. 2024. Artificial intelligence in paleontology. Earth Sci Rev 252:104765.

Yu L, Shi J, Huang C, Duan L, Wu D, Fu D, Wu C, Xiong L, Yang W, Liu Q. 2021. An integrated rice panicle phenotyping method based on X-ray and RGB scanning and deep learning. Crop J 9:42–56.

Zaharias P, Grosshauser M, Warnow T. 2022. Re-evaluating deep neural networks for phylogeny estimation: the issue of taxon sampling. J Comput Biol 29:74–89.

Zarkogiannis SD, Antonarakou A, Fernandez V, Mortyn PG, Kontakiotis G, Drinia H, Greaves M. 2020a. Evidence of stable foraminifera biomineralization during the last two climate cycles in the tropical Atlantic Ocean. J Marine Sci Eng 8:737.

Zarkogiannis SD, Kontakiotis G, Gkaniatsa G, Kuppili VSC, Marathe S, Wanelik K, Lianou V, Besiou E, Makri P, Antonarakou A. 2020b. An improved cleaning protocol for foraminiferal calcite from unconsolidated core sediments: hyPerCal—a new practice for micropaleontological and paleoclimatic proxies. J Marine Sci Eng 8:998.

Zelditch ML, Fink WL, Sheets HD, Swiderski DL. 2004. Geometric morphometrics for biologists: a primer. New York: Elsevier.

Zelditch ML, Goswami A. 2021. What does modularity mean? Evol Devel 23:377–403.

Zeng A, Yan L, Huang Y, Ren E, Liu T, Zhang H. 2021. Intelligent detection of small faults using a support vector machine. Energies 14:6242.

Zhang D, Maslej N, Brynjolfsson E, Etchemendy J, Lyons T, Manyika J, Ngo H, Niebles JC, Sellitto M, Sakhaee E et al. 2022. The AI Index 2022 annual report (https://aiindex.stanford.edu/ai-index-report-2022/).

Zhang H, Starke S, Komura T, Saito J. 2018. Mode-adaptive neural networks for quadruped motion control. ACM Trans Graph 37:145:1–11.

Zhang L, Wang Y, Ruhl M, Xu Y, Zhu Y, An P, Chen H, Yan D. 2023. Machine-learning-based morphological analyses of leaf epidermal cells in modern and fossil ginkgo and their implications for palaeoclimate studies. Palaeontology 66:e12684.

Zhao M, Liu Q, Jha A, Deng R, Yao T, Mahadevan-Jansen A, Tyska MJ, Millis BA, Huo Y. 2021. VoxelEmbed: 3D instance segmentation and tracking with voxel embedding based deep learning. In: Lian C, Cao X, Rekik I, Xu X, Yan P, editors. Machine learning in medical imaging (Lecture Notes in Computer Science). Cham: Springer International Publishing. p. 437–46.

Zhou C, Sun C, Liu Z, Lau F. 2015. A C-LSTM neural network for text classification. arXiv published online.

Zhu S, Brazil G, Liu X. 2020. The edge of depth: explicit constraints between segmentation and depth. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. p. 13113–22.

Zhu X, Goldberg AB. 2022. Introduction to semi-supervised learning. Switzerland: Springer Nature.

Zou Z, Zhang H, Guan Y, Zhang J. 2020. Deep residual neural networks resolve quartet molecular phylogenies. Mol Biol Evol 37:1495–507.

Author notes

Joint first-authors.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
