Estimating animal pose using deep learning: a trained deep learning model outperforms morphological analysis
Keywords: animal pose estimation, deep learning, morphological analysis
INTRODUCTION: Analyzing animal behavior helps researchers understand animals' decision-making processes, and helper tools are rapidly becoming an indispensable part of many interdisciplinary studies. However, estimating animal pose remains challenging because existing tools are limited and sensitive to the specific recording environment. In recent years, deep learning has been introduced as an alternative that can overcome these challenges.
OBJECTIVES: This study investigates how deep learning models can be applied to accurately predict animal behavior, in comparison with traditional morphological analysis based on image pixels.
METHODS: A tracking device, the Transparent Omnidirectional Locomotion Compensator (TOLC), is used to record videos covering a wide range of animal behavior. The recorded videos feature two insects: a walking red imported fire ant (Solenopsis invicta) and a walking fruit fly (Drosophila melanogaster). Body parts such as the head, legs, and thorax are estimated using an open-source deep-learning toolbox. A deep learning model, ResNet-50, is trained separately for the fire ant and for the fruit fly to predict their body parts. For each insect, 500 image frames were annotated by humans and then compared with the predictions of the deep learning model as well as with the points generated by the morphological analysis.
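To illustrate what a pixel-based centroid estimate can look like, the sketch below thresholds a grayscale frame and averages the coordinates of the dark pixels. It is only a simplified stand-in: the abstract does not specify which morphological operations were used, so the function name `threshold_centroid`, the threshold value, and the assumption of a dark animal on a bright background are all hypothetical.

```python
import numpy as np

def threshold_centroid(frame, threshold):
    """Return the (x, y) centroid of pixels darker than `threshold`.

    Assumes a dark animal on a bright background (an assumption, not
    something stated in the abstract). Returns None if no pixel qualifies.
    """
    mask = frame < threshold          # foreground = dark pixels
    ys, xs = np.nonzero(mask)         # row (y) and column (x) indices
    if xs.size == 0:
        return None
    return float(xs.mean()), float(ys.mean())

# Toy 5x5 frame: bright background with a dark 3x2 blob.
frame = np.full((5, 5), 255, dtype=np.uint8)
frame[1:4, 2:4] = 0
centroid = threshold_centroid(frame, 128)
print(centroid)
```

A centroid of this kind depends directly on the threshold and on lighting, which is one way a pixel-based method can be sensitive to the recording environment.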
RESULTS: For the fire ant, over 500 image frames, the average distance between the deep learning-predicted centroids and the human-annotated centroids is 2.54, while the average distance between the morphological analysis-generated centroids and the human-annotated centroids is 6.41. For the fruit fly, over 477 image frames, the average distance between the deep learning-predicted centroids and the human-annotated centroids is 2.43, while the average distance between the morphological analysis-generated centroids and the human-annotated centroids is 5.06.
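The comparison above rests on an average per-frame distance between two sets of centroids. A minimal sketch of such a metric follows, assuming Euclidean distance in image coordinates; the abstract does not state the distance measure or its units, and the function name `mean_centroid_distance` and the sample coordinates are hypothetical.

```python
import numpy as np

def mean_centroid_distance(predicted, annotated):
    """Average Euclidean distance between paired (x, y) centroids.

    Both inputs hold one (x, y) pair per frame, in the same frame order.
    """
    predicted = np.asarray(predicted, dtype=float)
    annotated = np.asarray(annotated, dtype=float)
    if predicted.shape != annotated.shape:
        raise ValueError("inputs must contain the same number of points")
    # Per-frame distance, then the mean over all frames.
    return float(np.linalg.norm(predicted - annotated, axis=1).mean())

# Illustrative coordinates only, not data from the study.
annotated = [(0.0, 0.0), (10.0, 10.0)]
predicted = [(3.0, 4.0), (10.0, 10.0)]
result = mean_centroid_distance(predicted, annotated)
print(result)
```

Under this metric, a lower value means the estimated centroids lie closer to the human annotations.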
CONCLUSION: In this paper, we demonstrate that the deep learning model outperforms traditional morphological analysis in terms of estimating animal pose in a series of video frames.
This is an open-access article distributed under the terms of the Creative Commons Attribution CC BY 3.0 license, which permits unlimited use, distribution, and reproduction in any medium so long as the original work is properly cited.