Publication Date:
2024-05-23
Description:
The ongoing biodiversity crisis, driven by factors such as land-use change and global
warming, emphasizes the need for effective ecological monitoring methods. Acoustic monitoring of
biodiversity has emerged as an important tool for this purpose. Detecting human voices in soundscape
monitoring projects is useful both for analyzing human disturbance and for privacy filtering. Despite
significant strides in deep learning in recent years, the deployment of large neural networks on
compact devices poses challenges due to memory and latency constraints. Our approach focuses
on leveraging knowledge distillation techniques to design efficient, lightweight student models for
speech detection in bioacoustics. In particular, we employed the MobileNetV3-Small-Pi model to
create compact yet effective student architectures to compare against the larger EcoVAD teacher model,
a well-regarded voice detection architecture in eco-acoustic monitoring. The comparative analysis
examined various configurations of the MobileNetV3-Small-Pi-derived student models to
identify the best-performing one. We also evaluated different distillation techniques
to determine the most effective method for model selection. Our findings revealed that
the distilled models exhibited comparable performance to the EcoVAD teacher model, indicating a
promising approach to overcoming computational barriers for real-time ecological monitoring.
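
The description does not state which distillation objective was used. As an illustration only, the sketch below shows one common form of knowledge distillation (temperature-softened teacher targets combined with a hard-label loss) for a two-class speech / no-speech detector. The function names, the `temperature` and `alpha` defaults, and the two-logit output layout are assumptions made for this example, not details taken from the article.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and teacher-student KL divergence.

    label: 0 = no speech, 1 = speech (assumed layout for this sketch).
    alpha: weight on the hard-label term; (1 - alpha) weights the soft term.
    """
    # Hard term: cross-entropy of the student against the ground-truth label.
    p_student = softmax(student_logits)
    hard = -math.log(p_student[label])

    # Soft term: KL(teacher || student) on temperature-softened distributions.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s)
               for t, s in zip(q_teacher, q_student) if t > 0)

    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


if __name__ == "__main__":
    # Hypothetical logits for one audio segment that contains speech.
    print(distillation_loss([-1.0, 1.5], [-2.0, 2.0], label=1))
```

When the student reproduces the teacher's logits exactly, the soft term vanishes and only the weighted hard-label loss remains, so the objective rewards both matching the teacher and predicting the correct label.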
Keywords:
passive acoustic monitoring; eco-acoustics; deep learning; knowledge distillation; bioacoustics; classification; transfer learning; speech detection
Repository Name:
National Museum of Natural History, Netherlands
Type:
info:eu-repo/semantics/article
Format:
application/pdf