Article

AnimalEnvNet: A Deep Reinforcement Learning Method for Constructing Animal Agents Using Multimodal Data Fusion

1 School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
2 Engineering Research Center for Forestry-Oriented Intelligent Information Processing, National Forestry and Grassland Administration, Beijing 100083, China
3 Institute of Forest Resource Information Techniques, Chinese Academy of Forestry, Beijing 100091, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(14), 6382; https://doi.org/10.3390/app14146382
Submission received: 11 June 2024 / Revised: 20 July 2024 / Accepted: 20 July 2024 / Published: 22 July 2024

Abstract

Simulating animal movement has long been a central focus of wildlife behaviour research. Conventional modelling methods have difficulty representing changes over time and space accurately, and they generally do not make effective use of telemetry data. This paper therefore introduces a new deep reinforcement learning technique, AnimalEnvNet, which combines historical trajectory data and remote sensing images to construct an animal agent, overcoming the constraints of conventional modelling approaches. We selected the giant panda as the study subject and carried out experiments using GPS trajectory data, Google Earth images, and Sentinel-2A remote sensing images. The experimental findings indicate that AnimalEnvNet converges during supervised learning training, attaining a minimal mean absolute error (MAE) of 28.4 m in single-step prediction against actual trajectories. After reinforcement learning training, the agent can replicate animal locomotion for up to 12 steps while keeping the error within 1000 m. This offers a novel approach and perspective for simulating animal behaviour.

1. Introduction

Due to the increasing occurrence of global climate change, the ecological environment is encountering unparalleled difficulties [1,2]. This alteration not only has a direct effect on the natural environments where wild animals live, but it also significantly influences their movements during migration, their methods of searching for food, and their techniques for staying alive [3,4]. Therefore, it is essential to gain a more profound comprehension of the behavioural tendencies shown by wild animals in relation to climate change [5].
Animal movement modelling refers to the systematic analysis of animal movement behaviour using quantitative methods [6]. Nevertheless, conventional modelling techniques that rely on binary location data have inherent constraints in accurately representing the intricacies of animal movement [7]. Technological improvements have enabled the use of sophisticated deep learning and reinforcement learning methods, which provide unparalleled versatility [8]. The current approaches for modelling animal behaviour include mathematical theory-based methods, agent-based methods, and intelligence methods based on deep learning [9,10,11]. The step-selection function (SSF) is a common technique in animal locomotion modelling that incorporates environmental variables into the model to simulate movement [12]. Nevertheless, the SSF usually assumes that the animal’s decision-making process is unchanging and does not vary over time, which may not fully capture the dynamic character of animal behaviour.
Mathematical theory-based approaches serve as essential tools for analysing animal movement behaviour [13]. Among these models, those based on random walks can effectively replicate animal movement over extended periods, although their foundation is mostly stochastic [14,15]. Without additional data about an animal’s movements, it is challenging to distinguish behavioural changes arising from environmental or spatial interactions solely by observing and analysing movement trajectories. Models such as Carlson and Soucek’s [16] firefly mating behaviour model, Adeva’s [17] bee foraging activity model, and Molina-Delgado et al.’s [18] rat behaviour model can accurately replicate animal behaviour in specific situations. However, their capacity to depict the intricate cause-and-effect relationships between behaviour and the environment in real-life settings is limited; when there are multiple causal links and uncertainty between behaviour and environment, more intricate and thorough models and algorithms are often necessary. Agent-based methodologies build models from the behaviours of individual agents [19]. For instance, Heit et al. [7] employed a Hidden Markov Model (HMM) together with GPS data to replicate the three-dimensional motion of a puma, overcoming the constraints of conventional two-dimensional models. Anderson et al. [20] developed an agent-based model that integrates environmental factors to replicate the movement patterns of individual animals, offering a different approach from conventional techniques. Agent-based modelling systems can explore the interactions between animals and their environment, but their reliance on extensive environmental data makes them difficult to apply in real situations. Deep learning methods primarily utilise architectures such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs): CNNs are employed for dimensionality reduction in the feature space, while RNNs are designed for forecasting time-series data [21]. Long short-term memory (LSTM) is a specific type of RNN that excels at retaining information over extended durations, which makes it highly proficient at sequence prediction tasks [22,23]. Roy et al. [48] employed generative adversarial networks (GANs) built from LSTM and CNN generators and discriminators, and conducted experiments with various structural combinations on a dataset of seabird foraging trajectories; among these, GANs composed of CNN-CNN achieved the best results. However, GANs still fall short in capturing local-scale descriptive statistics, such as step-speed distributions. Wijeyakulasuriya et al. [21] developed a general framework for forecasting animal locomotion. Their research compared single-step and multi-step long-term prediction outcomes using random forests, neural networks, and recurrent neural networks. The findings revealed that deep learning techniques performed better in single-step prediction, whereas traditional models performed better in multi-step long-term prediction. Among the deep learning models, LSTM had the highest performance in multi-step long-term prediction.
Constructing machine learning and deep learning models is more straightforward compared to classic parametric motion models, as the latter sometimes rely on limiting assumptions. Nevertheless, the interpretability of machine learning and deep learning models is more challenging compared to parametric motion models.
Remote sensing images are visual representations of surface data acquired using remote sensing technologies. The Earth’s surface is remotely viewed and recorded using sensors deployed on platforms such as satellites, aircraft, drones, and other similar devices. Both classic machine learning algorithms and deep learning models face a common challenge: they are unable to incorporate complex environmental variables, particularly unstructured data in the form of remote sensing images. Remote sensing images include abundant information about the surrounding environment, which can be incorporated as environmental variables into animal movement models. The challenge lies in extracting valuable information from remote sensing images in an efficient manner.
Reinforcement learning is a machine learning approach that acquires optimal decisions through iterative interactions with the environment [24,25]. In this learning paradigm, an agent makes decisions based on the current state at each time step and adjusts its strategy according to feedback rewards from the environment, in order to maximise the total reward over the long run [26]. The advancement of deep learning-based technologies for remote sensing images has facilitated the creation of agents that can replicate animal movement behaviour [27,28,29,30]. The progress of these technologies has also made animal trajectory data and environmental data easier to obtain. Wildlife monitoring technologies, such as GPS collars, tags, and acoustic localisation, are extensively used [31]. Furthermore, drones and satellite remote sensing images can provide detailed and precise surface data [32]. Agent-based animal movement models enable an individual-based approach to manipulating important variables, such as internal states, external stimuli, locomotion skills, and navigation abilities [33,34]. This allows for a more thorough investigation of the underlying mechanisms that influence animal behaviour. However, such models still face a number of challenges, such as the complexity of animal movement and its interaction with the environment [35]. In order to tackle the difficulties encountered in modelling wildlife behaviour, it is crucial to investigate novel methods for simulating animal behaviour; in particular, agent models that can incorporate various data sources and replicate animal movement behaviour are of the utmost importance.
The main contributions of this study are as follows:
1. We introduce a new technique called AnimalEnvNet, which combines LSTM, CNN, and Attention Mechanism components to extract and merge features from trajectory data and remote sensing images. This technique successfully simulates animal behavioural tendencies.
2. Using a deep reinforcement learning framework, we improved the model’s capacity to generalise and adapt by allowing the agents to perceive environmental input and acquire animal behaviour strategies.
3. This work addresses the shortcomings of conventional animal behaviour modelling methodologies, offering fresh insights and approaches for investigating the behavioural patterns of animals.

2. Materials

2.1. Trajectory Data

This study utilised GPS sensor data to automatically collect the latitude and longitude coordinates, along with corresponding timestamps, of a wild giant panda every two hours between 22 November 2017 and 10 May 2018. The data collection was focused on a region within Sichuan Province, China, specifically between 28.8532659° and 29.0683029° N and 102.3025714° and 102.4376521° E. Figure 1 displays the trajectory data, with the horizontal axis representing longitude and the vertical axis representing latitude. We gathered a total of 1925 data points for the giant panda. During data cleaning, we removed 123 data points that were missing or corrupted. The data used in this investigation were gathered exclusively for this study and are currently not publicly accessible.
Additionally, we utilised a publicly available African elephant dataset, which includes the migration trajectory data of three elephants, specifically identified by the codes A9, A8, and A7. This dataset was sourced from Movebank (movebank.org, accessed on 22 July 2024, study name “African elephant (Migration) Chamaillé-Jammes Hwange NP”, study ID 307786785), and published in the Movebank Data Repository.

2.2. Remote Sensing Image Data

The European Space Agency launched Sentinel-2 [36,37], an advanced Earth observation satellite designed to detect subtle changes on the Earth’s surface using the visible and near-infrared spectral bands. Google Earth serves as a comprehensive global platform for geographical information, providing satellite imagery and geographic information services worldwide. However, Google Earth images cover only the red, green, and blue spectral bands. Thus, in this study, we treat them as data supplementary to the Sentinel images, using their abundant surface texture information to improve the analysis. The research area is shown in Figure 2.
Figure 1. Panda trajectory information.
Figure 2. Remote sensing image map of the research area. (A,B) Sentinel-2 remote sensing image example. (C,D) Google Earth remote sensing image example.

3. Methods

A technical roadmap of this paper is shown in Figure 3, consisting of the following three components:
1. During the data preprocessing stage, the historical trajectory data and remote sensing image data are carefully processed.
2. We propose a method for constructing animal agents through multimodal data fusion and evaluate the feasibility of the resulting AnimalEnvNet model in simulating animal behaviour.
3. We use a multi-layer perceptron-based Actor-Critic model for policy generation and value evaluation in reinforcement learning tasks.

3.1. Data Preprocessing

3.1.1. Trajectory Data

To ensure consistency in the distance measures in both the east–west and north–south directions, we transformed the longitude and latitude coordinates to projected coordinates, since degrees of longitude and latitude correspond to different ground distances. We then performed differential transformation processing: the differences in the panda’s movement between each successive step, represented by XDIFF and YDIFF, were computed, giving a clearer picture of its locomotion patterns. The distribution of these differences is shown in Figure 4, with the horizontal axis representing XDIFF and YDIFF, and the vertical axis representing the count.
In addition, time plays an important part in the movement of animals. We split the date into separate month, day, and hour fields and applied one-hot encoding to the time fields; the other variables related to animal movement were standardised. In the traditional Chinese calendar, dates follow the lunar day, which is based on the phases of the moon. We therefore determined the corresponding lunar date and added a lunar day field to the dataset, since moonlight affects animal movement and the lunar date more directly reflects the moon’s waxing and waning. The fields of the trajectory dataset are shown in Table 1. Since animal trajectories are time-series data, they must be transformed into supervised learning data. To do this, we used a sliding-window methodology in which the previous N trajectory points and their corresponding motion parameters are used as input, while the (N + 1)-th point is used as the output. The supervised learning data form a labelled training dataset: each sample comprises input features, such as the panda’s current position and time, and an associated target value, the panda’s subsequent position. Using these data, the model learns the mapping between the input features and the target value, allowing it to predict the panda’s movement.
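To make the sliding-window step concrete, the following is a minimal sketch (not the authors' exact pipeline) of how per-step features and displacements can be converted into supervised samples; the window length, feature count, and array layout are illustrative assumptions.

```python
import numpy as np

def make_supervised(features: np.ndarray, targets: np.ndarray, window: int):
    """features: (T, F) per-step feature rows; targets: (T, 2) XDIFF/YDIFF displacements."""
    X, y = [], []
    for t in range(len(features) - window):
        X.append(features[t:t + window])   # previous N steps as input
        y.append(targets[t + window])      # (N + 1)-th step displacement as label
    return np.stack(X), np.stack(y)

# Illustrative example: 1802 cleaned points, 16 assumed features per step, window of 8 steps
feats = np.random.rand(1802, 16).astype(np.float32)
diffs = np.random.rand(1802, 2).astype(np.float32)
X, y = make_supervised(feats, diffs, window=8)
print(X.shape, y.shape)  # (1794, 8, 16) (1794, 2)
```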

3.1.2. Remote Sensing Image Data

Initially, we conducted radiometric calibration and atmospheric correction on the Sentinel L1C image, transformed it into more precise L2A data, and performed reprojection to guarantee the precision of the spatial information. To ensure consistency in data processing and improve the accuracy of the analysis, we resampled all bands of the Sentinel images to a uniform resolution of 10 m. At the same time, we set the resolution of the Google Earth images to 6 m, because too high a resolution results in an excessive amount of data and is incompatible with factors such as the resolution of the agent’s motion and the alignment of feature maps. Hence, we opted for a resolution of 6 m to improve the overall performance and efficiency of data processing.
Figure 3. The technical framework.
Figure 4. Histogram depicting the correlation between the discrepancies in XDIFF and YDIFF.
Table 1. Trajectory dataset fields.

Serial Number  Field Name  Range of Values  Remarks
1              Month       1–12             One-Hot Encoding
2              Day         1–31             One-Hot Encoding
3              Lunar day   1–31             One-Hot Encoding
4              Hour        0–23             One-Hot Encoding
5              Minute      0–59             One-Hot Encoding
6              Seconds     /                Normalise
7              XDIFF       /                Normalise
8              YDIFF       /                Normalise
9              Height      /                Normalise
In order to enhance training efficiency and reduce resource use, we pre-cropped the remote sensing images before training, using square buffer zones as the cropping regions. Pre-cropping allowed us to use multi-threaded data loading during training with PyTorch data loaders, which reduced CPU utilisation and sped up the training procedure.
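As a rough illustration of this setup, the sketch below pairs pre-cropped image patches stored on disk with trajectory windows in a PyTorch Dataset so that a DataLoader with several workers can load them in parallel; the file layout, array shapes, and class name are assumptions rather than the authors' code.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class TrajectoryPatchDataset(Dataset):
    """Pairs pre-cropped image patches on disk with trajectory windows and targets."""

    def __init__(self, traj_windows, targets, patch_paths):
        self.traj_windows = traj_windows   # (N, window, F) trajectory features
        self.targets = targets             # (N, 2) next-step XDIFF/YDIFF
        self.patch_paths = patch_paths     # N paths to pre-cropped .npy patches

    def __len__(self):
        return len(self.patch_paths)

    def __getitem__(self, idx):
        patch = np.load(self.patch_paths[idx])        # e.g. (bands, H, W), cropped in advance
        return (torch.from_numpy(self.traj_windows[idx]),
                torch.from_numpy(patch.astype(np.float32)),
                torch.from_numpy(self.targets[idx]))

# num_workers > 0 enables the parallel loading described above
# loader = DataLoader(TrajectoryPatchDataset(X, y, paths), batch_size=32,
#                     shuffle=True, num_workers=4)
```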

3.2. AnimalEnvNet Network

We created a new neural network architecture named AnimalEnvNet. The network has three essential modules: trajectory feature extraction, environment feature extraction, and feature fusion with dimensionality reduction. The trajectory feature extraction module leverages LSTM to capture the temporal information in animal movement trajectories, while the environment feature extraction module utilises multiple CNNs: LSTM excels at capturing temporal patterns in animal movement trajectories, whereas CNNs are proficient at extracting spatial characteristics from remote sensing images. AnimalEnvNet therefore combines the benefits of processing time-series data and high-resolution image data, allowing the integrated use of environmental and trajectory data and improving the prediction accuracy and adaptability of the model. The feature fusion module, which includes dimensionality reduction, uses attention mechanisms to selectively combine and compress features, reducing redundancy. Compared to existing models, AnimalEnvNet processes both animal trajectory data and high-resolution remotely sensed image data through multimodal data fusion, which allows the model to capture the complexity of animal behaviour and its interaction with the environment more fully. The proposed model, AnimalEnvNet, is shown in Figure 5.

3.2.1. Trajectory Feature Extraction

The trajectory data include both date and motion parameter information, which are stored in distinct formats and so need independent processing. The LSTM model attains precise manipulation of input sequence information via the incorporation of memory cells, input gates, forget gates, and output gates [38]. Each layer of the LSTM must calculate the following functions for every element in the input sequence:
$$i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi})$$
$$f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf})$$
$$g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg})$$
$$o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho})$$
$$c_t = f_t \odot c_{t-1} + i_t \odot g_t$$
$$h_t = o_t \odot \tanh(c_t)$$
In these equations, $i_t$, $f_t$, $g_t$, and $o_t$ denote the activations of the input gate, forget gate, cell candidate, and output gate, respectively; $c_t$ is the memory cell state at the current time step and $h_t$ the hidden state at the current time step. $\sigma$ is the sigmoid activation function and $\tanh$ the hyperbolic tangent. $W_{ii}$, $W_{if}$, $W_{ig}$, and $W_{io}$ are the weight matrices applied to the input of each gate, while $W_{hi}$, $W_{hf}$, $W_{hg}$, and $W_{ho}$ are the weight matrices applied to the previous hidden state. $b_{ii}$, $b_{if}$, $b_{ig}$, $b_{io}$ and the corresponding $b_{h\cdot}$ terms are the bias terms of each gate. $x_t$ is the current input, $h_{t-1}$ the hidden state of the previous time step, and $c_{t-1}$ the memory cell state of the previous time step. $\odot$ denotes element-wise multiplication.
In the trajectory feature extraction module, we use two LSTM blocks to independently process the discrete variables, such as dates, and the normalised motion parameters. The outputs of the two LSTM blocks are one-dimensional features of size 200 and 425, respectively. These features are concatenated and fed into a fully connected layer, producing a one-dimensional trajectory feature of size 625.
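A minimal PyTorch sketch of this trajectory branch is given below, assuming the last hidden state of each LSTM serves as its output and that the input feature sizes shown are placeholders; only the 200-, 425-, and 625-unit sizes come from the text.

```python
import torch
import torch.nn as nn

class TrajectoryEncoder(nn.Module):
    """Two LSTM blocks (date fields and motion parameters) fused into a 625-dim feature."""

    def __init__(self, date_dim=100, motion_dim=4):   # input sizes are placeholders
        super().__init__()
        self.date_lstm = nn.LSTM(date_dim, 200, batch_first=True)
        self.motion_lstm = nn.LSTM(motion_dim, 425, batch_first=True)
        self.fc = nn.Linear(200 + 425, 625)

    def forward(self, date_seq, motion_seq):
        # date_seq: (B, T, date_dim) one-hot date fields; motion_seq: (B, T, motion_dim)
        _, (h_date, _) = self.date_lstm(date_seq)
        _, (h_motion, _) = self.motion_lstm(motion_seq)
        fused = torch.cat([h_date[-1], h_motion[-1]], dim=1)   # (B, 625) concatenated features
        return self.fc(fused)                                   # (B, 625) trajectory feature
```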

3.2.2. Environmental Feature Extraction

We use remote sensing data obtained from both Sentinel and Google Earth images, which have distinct spatial resolutions. Directly upsampling the low-resolution images or downsampling the high-resolution images would result in either significant processing expense or reduced accuracy. To achieve a balance between computational economy and accuracy, we use several convolutional structures to resize the features of the remote sensing images to a uniform size, which reduces the processing requirements. Ultimately, we resize the Sentinel-2A image to a dimension of (201, 201), with a pixel size of 10 m, and the Google Earth image to a dimension of (334, 334), with a pixel size of 6 m. Both cover an area of roughly 1 square kilometre.
Within the module for extracting features from remote sensing images, we first use three distinct convolutional blocks to individually process the Sentinel-2A and Google Earth images, ensuring that the feature maps are standardised to the same dimensions. Abstract features are extracted progressively through a series of convolution and pooling operations at multiple levels [39]. The selection and use of convolution and subsampling techniques in this work are based on their notable benefits in processing remote sensing image data. The convolution procedure efficiently extracts spatial features by systematically moving the filter across the input image and finds crucial environmental information associated with animal movement. Subsampling reduces computational complexity and memory requirements by decreasing the spatial dimensions of the feature map, while maintaining important characteristics and limiting overfitting. The computing formula for the convolution kernel at a certain location in the image is as follows:
$$W_{ij} = f\left(\sum_{x=1}^{s}\sum_{y=1}^{s} p_{i+x-1,\,j+y-1} \cdot k_{xy} + b\right)$$
Here, $W_{ij}$ is the output value at position $(i, j)$ in the output feature map, $f$ is the activation function, $s$ is the size of the filter, $p_{i+x-1,\,j+y-1}$ is the input value at position $(i+x-1, j+y-1)$ in the input feature map, $k_{xy}$ is the value of the filter at position $(x, y)$, and $b$ is the bias term.
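The following hedged sketch illustrates how two convolutional branches of this kind can reduce the 10 m Sentinel-2A input (201 × 201) and the 6 m Google Earth input (334 × 334) to feature maps of a common spatial size before fusion; the band counts, channel widths, kernel sizes, and adaptive pooling step are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),                      # subsampling halves the spatial size
    )

sentinel_branch = nn.Sequential(              # (B, 12, 201, 201) Sentinel-2A input (band count assumed)
    conv_block(12, 32), conv_block(32, 64), conv_block(64, 96),
    nn.AdaptiveAvgPool2d((25, 25)),           # standardise both branches to a 25 x 25 grid
)
google_branch = nn.Sequential(                # (B, 3, 334, 334) Google Earth RGB input
    conv_block(3, 32), conv_block(32, 64), conv_block(64, 96),
    nn.AdaptiveAvgPool2d((25, 25)),
)

s_feat = sentinel_branch(torch.randn(2, 12, 201, 201))   # (2, 96, 25, 25)
g_feat = google_branch(torch.randn(2, 3, 334, 334))      # (2, 96, 25, 25)
```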

3.2.3. Feature Fusion and Dimensionality Reduction

The core of the Attention Mechanism is to compute the dot product between the query and the key to measure their correlation [40]. The formula is:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
Here, Attention denotes the Attention Mechanism, where $Q$ is the query matrix, $K$ the key matrix, and $V$ the value matrix; softmax is the softmax activation function. The expression $\frac{QK^{T}}{\sqrt{d_k}}$ is the product of the query matrix and the transposed key matrix, scaled by $\sqrt{d_k}$, where $d_k$ is the dimension of the key vectors.
This study utilises a multi-head Attention Mechanism that involves several sets of distinct query, key, and value matrices. It calculates different attention weights for each set. Each attention head records distinct attention orientations and characteristic representations, allowing the model to concentrate on several levels, positions, or types of information concurrently. Through acquiring knowledge about the allocation of weight across several heads, the model effectively manages intricate inputs and derives comprehensive feature representations. The formula is as stated below:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_n)\,W^{O}$$
$$\mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q},\, K W_i^{K},\, V W_i^{V})$$
This function takes three inputs, $Q$, $K$, and $V$, which are linearly transformed for each attention head using the weight matrices $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$. $W^{O}$ is a further weight matrix used to linearly transform the concatenated outputs of the attention heads, and each $\mathrm{head}_i$ is the result of applying attention for the $i$-th head.
Within this module, we first pass the remote sensing images from the various sources through multi-layer convolution blocks to extract their features, and the attention layer is then used to combine these features. To align the features of the different modalities, we reshape the output of the trajectory feature extraction module to match the shape of the remote sensing image features and duplicate this reshaped output 128 times, creating a three-dimensional feature with 128 channels and a size of (25, 25). We then merge the integrated remote sensing image features with the trajectory features to create a fusion feature with 320 channels and a size of (25, 25). The combined feature is passed through a multi-head Attention Mechanism to further consolidate and enhance the correlations between features. Finally, the fusion features are reduced using multi-layer convolution blocks, producing two-dimensional features with dimensions of (25, 25).
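The fusion step described above can be sketched as follows, assuming the shapes given in the text (a 625-dimensional trajectory feature reshaped to 25 × 25 and tiled to 128 channels, concatenated with image features to 320 channels); the 192-channel image feature map and the use of PyTorch's built-in multi-head attention are assumptions for illustration.

```python
import torch
import torch.nn as nn

B = 2
traj = torch.randn(B, 625)                    # output of the trajectory branch
img = torch.randn(B, 192, 25, 25)             # combined image features (192 channels assumed)

traj_map = traj.view(B, 1, 25, 25).repeat(1, 128, 1, 1)    # tile to (B, 128, 25, 25)
fused = torch.cat([img, traj_map], dim=1)                   # (B, 320, 25, 25) fusion feature

tokens = fused.flatten(2).transpose(1, 2)                   # (B, 625, 320): one token per grid cell
mha = nn.MultiheadAttention(embed_dim=320, num_heads=8, batch_first=True)
attended, _ = mha(tokens, tokens, tokens)                   # self-attention over the 25 x 25 grid
attended = attended.transpose(1, 2).view(B, 320, 25, 25)    # back to a feature map for the conv reducer
```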
We developed two distinct kinds of output layer: one for continuous action spaces and another for discrete action spaces. First, the two-dimensional features are flattened into one-dimensional vectors and fed into a fully connected layer with 625 hidden units, from which the final prediction is produced. In the continuous action space, the output layer is a fully connected layer producing a one-dimensional array of size 2, which predicts the distances of animal movement in the x- and y-directions. In the discrete action space, the output layer is a fully connected layer producing a one-dimensional array of size 2560, with output values limited to the range 0 to 1 by the sigmoid activation function; these values represent the one-hot encoding of the agent’s movement direction and distance. In our study, we discretised the continuous variables, such as direction and distance, into bins and then applied one-hot encoding to these bins. For example, the direction was divided into eight bins (each representing a 45-degree range), and the distance was divided into several bins based on range intervals.

3.3. Agent Training by Reinforcement Learning

We utilised a multi-layer perceptron as the Actor-Critic model, comprising two primary components: the Actor Network and the Critic Network [41]. The Actor Network is responsible for generating the policy to select actions based on the observed environmental conditions. The Critic Network is used to assess the value of the actions generated by the actor in the current environmental state. Figure 6 shows the specific structure of the model.
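As a hedged illustration of this Actor-Critic setup, the sketch below shows a pair of simple multi-layer perceptrons acting as Actor and Critic over a flattened observation and a discrete action space; the layer widths and dimensions are placeholders, not the authors' configuration.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """MLP policy network producing a distribution over discrete actions."""

    def __init__(self, obs_dim, n_actions, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

class Critic(nn.Module):
    """MLP value network estimating the state value of the current observation."""

    def __init__(self, obs_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs):
        return self.net(obs).squeeze(-1)
```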

3.3.1. Action Space

We used a deep neural network comprising CNN and LSTM to forecast the animal’s displacement in the x- and y-directions in the subsequent time step. This was achieved by using remote sensing images and the animal’s present position as input data. In the context of supervised learning, this objective may be achieved by undergoing training over numerous batches. However, in the field of reinforcement learning, the policy network may encounter difficulties, such as non-convergence or very slow convergence owing to the large environmental space and continuous action space. In order to tackle the difficulties presented by the ongoing range of actions in reinforcement learning training, we redefine the animal’s action space by dividing it into two components: the direction of movement and the distance of movement. The purpose of this formulation is to simplify the training of the animal agent in reinforcement learning by using a discrete action space.
$$\mathrm{Action} = \{\mathit{angle},\ \mathit{distance}\}$$
The direction of movement is quantified in degrees and rounded to the nearest whole number, spanning from 0 to 360. The displacement is quantified in metres and rounded to the nearest whole number, ranging from 0 to 6600. In addition, we reduce the size of the action space further by lowering the action resolution: we used a 3 m action resolution to strike a balance between computational efficiency and the level of detail in the action space. By converting the continuous action space into a discrete one and decreasing the granularity of the action space, we greatly reduce the computational complexity and shorten the training time of the animal agent, while still meeting the conditions required for modelling animal behaviour.
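A minimal sketch of this discretisation is shown below: a continuous displacement (dx, dy) is mapped to a whole-degree direction and a distance rounded to the 3 m resolution, and back again. The joint index encoding is an illustrative assumption.

```python
import math

ANGLE_BINS = 360        # whole-degree directions, 0..359
MAX_DIST = 6600         # metres
DIST_RES = 3            # metres per distance bin (the 3 m action resolution)
DIST_BINS = MAX_DIST // DIST_RES + 1

def encode_action(dx, dy):
    """Map a continuous displacement (metres) to a single discrete action index."""
    angle = round(math.degrees(math.atan2(dy, dx))) % ANGLE_BINS
    dist_bin = min(round(math.hypot(dx, dy) / DIST_RES), DIST_BINS - 1)
    return angle * DIST_BINS + dist_bin

def decode_action(action):
    """Recover the (dx, dy) displacement represented by a discrete action index."""
    angle, dist_bin = divmod(action, DIST_BINS)
    rad = math.radians(angle)
    dist = dist_bin * DIST_RES
    return dist * math.cos(rad), dist * math.sin(rad)

a = encode_action(120.0, -45.0)
print(a, decode_action(a))   # the decoded step is quantised to 1 degree and 3 m
```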

3.3.2. Observation Space

The observation space is partitioned into two distinct components: time and location. The time variable indicates the exact moment that the agent performs the current action, with a precision level of one hour. The location specifies the present coordinates of the agent, with a precision level at the metre scale.
$$\mathrm{Observation} = \{\mathit{time},\ \mathit{location},\ \mathit{images}_1,\ \mathit{images}_2,\ \ldots,\ \mathit{images}_n\}$$
This equation defines an observation as a set containing the time, the location, and a series of images $\mathit{images}_1, \mathit{images}_2, \ldots, \mathit{images}_n$.
During practical operation, we designate the agent’s present location as the focal point and choose a certain area around it, known as the agent’s viewport, with a buffer size. Next, we extract the remote sensing images that are included inside this specified region. This process guarantees that the agent may acquire environmental data in the vicinity of its present position, and by modifying the buffer size, we can effectively manage the dimensions of the viewport.
When we crop images, we take into account the variety of information contained in the image and may use different forms of remote sensing images that capture several bands of data. The data collected by these bands can provide valuable environmental characteristics, helping the agent obtain a thorough perception and understanding of the surrounding environment.
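The viewport extraction can be sketched as follows, assuming the agent's position has already been converted to a pixel row and column in each raster; the zero-padding behaviour near image edges is an assumption for illustration.

```python
import numpy as np

def crop_viewport(raster, row, col, half_size):
    """raster: (bands, H, W); returns a (bands, 2*half_size+1, 2*half_size+1) window."""
    bands, H, W = raster.shape
    size = 2 * half_size + 1
    out = np.zeros((bands, size, size), dtype=raster.dtype)        # zero-pad near the edges
    r0, r1 = max(row - half_size, 0), min(row + half_size + 1, H)
    c0, c1 = max(col - half_size, 0), min(col + half_size + 1, W)
    out[:, r0 - (row - half_size):r1 - (row - half_size),
           c0 - (col - half_size):c1 - (col - half_size)] = raster[:, r0:r1, c0:c1]
    return out

patch = crop_viewport(np.random.rand(12, 1000, 1000), row=500, col=500, half_size=100)
print(patch.shape)   # (12, 201, 201) -- the 10 m Sentinel viewport size used in this study
```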

3.3.3. Agent Reward Calculation

The agent’s learning aim is to accurately replicate the locomotion patterns of real animals within a designated area. Hence, it is essential to quantify the resemblance between the trajectories produced by the agent and the actual trajectories. The Euclidean distance [42,43] was used as the metric for quantifying trajectory similarity: it is the straight-line distance between two points in a two-dimensional coordinate system. When measuring trajectory similarity, we treat each trajectory as a sequence of coordinate points and compute the sum of the Euclidean distances between corresponding coordinate points of the two trajectories to obtain an overall distance measurement. The computation is as follows:
$$\mathrm{Reward} = \frac{1}{\alpha \cdot \left| D_{\mathrm{last}} \right| + \beta \cdot \mathrm{MAE}(D_{\mathrm{all}})}$$
In the formula, $\mathrm{Reward}$ is the reward the agent receives from the environment after each action. The weight parameter for the final-step deviation is denoted $\alpha$, and the weight parameter for the deviation of all generated points from the true trajectory is denoted $\beta$; in this work, we use weight values of 0.8 and 0.2, respectively.
In addition, we permit a certain degree of discrepancy between the calculated trajectory and the actual trajectory at specified time intervals within a given trajectory. If the last step closely aligns with the actual trajectory, it is deemed acceptable. In order to emphasise the impact of the final step action on the total reward, we calculate the reward as the reciprocal of the sum of the mean absolute error of the Euclidean distances between all generated trajectories and the real data, and the absolute value of the Euclidean distance between the last generated trajectory and the real position.
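A hedged sketch of this reward computation, using the paper's weights of 0.8 and 0.2, is given below; the small epsilon added to avoid division by zero is an implementation assumption.

```python
import numpy as np

def trajectory_reward(generated, real, alpha=0.8, beta=0.2, eps=1e-8):
    """generated, real: (T, 2) arrays of (x, y) positions in metres."""
    step_dists = np.linalg.norm(generated - real, axis=1)   # Euclidean distance per step
    d_last = abs(step_dists[-1])                            # deviation of the final step
    mae_all = step_dists.mean()                             # mean deviation over the whole trajectory
    return 1.0 / (alpha * d_last + beta * mae_all + eps)

gen = np.array([[0.0, 0.0], [10.0, 5.0], [22.0, 9.0]])
ref = np.array([[0.0, 0.0], [12.0, 4.0], [20.0, 10.0]])
print(trajectory_reward(gen, ref))
```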

4. Results

4.1. Experimental Design

The starting learning rate in the experiment was set at 0.001. In the experiment, we adopted the configuration shown in Table 2. To evaluate the generalisation ability of the model, we divided each dataset into a training dataset and a validation dataset. We randomly selected 70% of the data as the training dataset and the remaining 30% as the validation dataset.

4.2. Comparison and Evaluation Results of Different Models

We used the mean absolute error (MAE) to evaluate the similarity between predicted and actual trajectories, owing to its computational efficiency and simplicity. While the Fréchet distance is a robust measure of trajectory similarity, it requires significantly more computational resources. MAE quantifies the average magnitude of the discrepancies between a model’s predicted values and the actual observations. Its formula is as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|$$
Here, MAE is the mean absolute error, $n$ is the sample size, $\hat{y}_i$ is the predicted value for the $i$-th observation, and $y_i$ is the actual value for the $i$-th observation.
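For reference, the MAE definition above can be checked with a few illustrative values:

```python
import numpy as np

y_pred = np.array([105.0, 98.0, 120.0])
y_true = np.array([100.0, 100.0, 110.0])
mae = np.abs(y_pred - y_true).mean()   # (5 + 2 + 10) / 3 ≈ 5.67 m
print(mae)
```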
This work aims to examine the impact of environmental information on the model. To achieve this, we propose a baseline model called LSTM-basic, and we also compare against state-of-the-art GAN models. The LSTM-basic model is employed to extract trajectory features and comprises two LSTM units, an Attention Mechanism unit, and a fully connected unit. During operation, the time and motion parameters are fed into the two LSTM units for separate processing, and the outputs of both units are then concatenated. The concatenated features are further processed by the Attention Mechanism unit and the fully connected unit, which produce the final output. GANs are composed of a generator and a discriminator: the generator generates simulated trajectory data, while the discriminator attempts to differentiate between real and generated trajectory data. We chose MSELoss as the loss function for the continuous action space in order to quantify the discrepancies between the model’s predictions and the actual values. During training, components that share the same structure can use identical model weights, which enhances the model’s robustness across different data conditions. Detailed parameters of the different models are shown in Table 3, and the training outcomes are displayed in Table 4.
The LSTM-basic model has an MAE of 102.9, the AnimalEnvNet model has an MAE of 28.4, and the GANs have an MAE of 55.5. The AnimalEnvNet model exhibits the lowest MAE value, indicating that it has the smallest average prediction error and estimates the real values most accurately. AnimalEnvNet has both a lower initial MAE and a faster convergence rate than the other models: by around round 45, its performance reaches a stable MAE of around 40. In contrast, the LSTM-basic and GAN models stabilise at higher MAE values of approximately 110 and 70, respectively.
The LSTM-basic model has a loss of 0.0021, while the AnimalEnvNet model has a loss of 0.0002. The AnimalEnvNet model demonstrates better data fitting during training, with a smaller discrepancy between its predicted and actual values. Regarding training speed, the AnimalEnvNet model reached a stable performance level around the 50th batch. Although AnimalEnvNet requires a longer training period, with an average training duration of approximately 49 s per batch, its final performance is significantly superior. Figure 7 compares the MAE results of the models, and Figure 8 compares their MSELoss results.
In conclusion, the AnimalEnvNet model achieves a single-step trajectory prediction MAE of 28.4 m. By incorporating environmental information feature modules, AnimalEnvNet demonstrates a significant improvement in single-step prediction accuracy.
To further validate the generalisation capability of the models, we conducted experiments on the African elephant dataset. The results of the training and validation processes are shown in Figure 9 and Figure 10. Similar to the giant panda dataset, the AnimalEnvNet model also exhibited the best predictive performance on the African elephant dataset.
Despite the use of different datasets, the AnimalEnvNet model was still able to quickly converge during the training process and maintain the lowest prediction error. In contrast, the GAN model and LSTM-basic model had similar error performances, but the GAN model slightly outperformed the LSTM-basic model. By conducting experiments on two different datasets, it can be seen that the AnimalEnvNet model has strong generalisation capabilities across different trajectory datasets. This indicates that the AnimalEnvNet model has high robustness in capturing temporal and spatial features, performing excellently not only on the giant panda dataset but also maintaining low prediction errors on the African elephant dataset.

4.3. Results of Reinforcement Training for Agent

Table 5 shows the parameters of reinforcement learning training. At the start of each episode, the agent navigates the environment according to its current policy and obtains rewards at each time step. If the inverse of the reward exceeds 1000, it signifies that the agent has strayed from the genuine activity region of the animals, and the episode ends. Each action corresponds to a one-hour time span in the real world. Once a certain number of episodes have passed, the agent samples data from the experience buffer in order to learn. Both the Actor and Critic models undergo repeated updates, using lambda_gae_adv and lambda_entropy to regulate the advantage estimation and the exploration ratio.
Figure 11 shows the agent’s learning curve during the training procedure. An evident surge is seen after about 20 iterations, indicating that the agent begins to learn effectively and navigate the environment. Once the ordinate reaches around 12,000, the average reward stabilises, indicating that the agent has acquired efficient strategies within the environment and is nearing, or has attained, near-optimal performance.
Figure 12 shows the mean number of steps completed by the agent during the training procedure. A total of 200 batches were trained, with each batch comprising 4000 episodes. The x-axis shows the training batches, while the y-axis reflects the average number of steps performed by the agent per episode. The agent’s average number of completed steps in the first 20 batches is around five. The main reason for this is the division of the possible actions into discrete options and the restrictions on how far the agent may advance, which prevent it from satisfying the criteria for ending an episode. As training advances, starting from the 20th batch, the mean number of steps completed by the agent rises. After 200 training batches, covering more than eight hundred thousand episodes, the average number of steps stabilises at 12.
Figure 13 presents visual information on the differences between the real and predicted trajectories. It consists of four subplots labelled A, B, C, and D, each comparing real trajectories (left) and predicted trajectories (right) at different time points. In all subplots, the starting points are marked, and the number of steps is indicated by colour gradients. The trajectory positions are plotted on a coordinate grid with latitude and longitude axes.
In conclusion, the AnimalEnvNet model’s single-step trajectory prediction on this dataset achieves a mean absolute error of around 25.8 m. By incorporating an environmental information feature module, AnimalEnvNet improves its single-step prediction accuracy and surpasses the LSTM-basic model, which relies only on trajectory information.

4.4. Ablation Study

We conducted an ablation study to analyse the impact of each component of the AnimalEnvNet model. The key components considered include the LSTM module, the CNN module, the Attention Mechanism, and the Feature Fusion with Dimensionality Reduction module. To assess the contribution of each component, we designed the following ablation experiments:
1. AnimalEnvNet: the complete AnimalEnvNet model with all components included.
2. A: directly combines the outputs of LSTM and CNN without using the Attention Mechanism.
3. B: replaces the CNN module with simple fully connected layers.
4. C: replaces the LSTM module with simple fully connected layers.
5. D: removes the Feature Fusion with Dimensionality Reduction module and directly concatenates the outputs of LSTM and CNN without feature fusion.
We evaluated the performance of each model using three metrics: MAE, the convergence rate, and the training time. The results of the ablation study are shown in Table 6.
Figure 13. Comparison of differences between actual and predicted trajectories. (A–D) represent four different moments.
The MAE of AnimalEnvNet is 28.4 m, indicating the highest prediction accuracy. Removing the Attention Mechanism increases the MAE to 40.2 m, demonstrating the importance of the Attention Mechanism in focusing on relevant features. Removing the CNN module significantly increases the MAE to 55.3 m, highlighting the critical role of spatial feature extraction. The model without the LSTM module has the highest MAE at 60.7 m, indicating the importance of LSTM in capturing temporal dependencies. Removing the Feature Fusion module increases the MAE to 45.1 m, showing the benefit of combining features from different modalities.
In terms of convergence rate, the complete model reaches stable performance within 55 batches, while the models without the Attention Mechanism, CNN, LSTM, and Feature Fusion module require 80, 100, 120, and 90 batches, respectively. This indicates that all components contribute to faster convergence, with LSTM and CNN being the most critical. Regarding training time, the complete model takes an average of 49 s per batch. The simplified models show a reduction in training time due to their reduced complexity, but this comes at the cost of increased MAE and longer convergence times. The ablation study highlights the significance of each component in the AnimalEnvNet model. The LSTM module is crucial for capturing temporal dependencies, the CNN module is essential for spatial feature extraction, the Attention Mechanism enhances the model’s ability to focus on important features, and the Feature Fusion module effectively combines multimodal data. Removing any of these components results in a substantial drop in performance, confirming their importance in the model.

5. Discussion

Simulating animal movement trajectories has long been an important area of study in the field of animal behaviour research. Traditional approaches, such as random walk models [44] and HMMs [45], are often used in this type of research. These approaches are very effective at capturing local behavioural patterns, but they have limits when it comes to capturing complicated changes over time and space [46,47]. Roy et al. [48] achieved exceptional results in simulating foraging trajectories by using GANs. However, GANs generally focus on generating trajectories that resemble the real data distribution, placing less emphasis on learning and adapting to environmental conditions and behavioural strategies [49]. AnimalEnvNet also has greater training stability than trajectory simulation with GANs, since GAN models are susceptible to training instability. Although GAN models are capable of producing lifelike trajectories, they are limited in their capacity to adjust to changes in the environment and behavioural strategies. AnimalEnvNet, which uses reinforcement learning-trained agents, is capable of imitating animal movement for up to 12 steps with an error limit of 1000 m. This showcases its ability to adapt to changes in the environment and acquire novel strategies through learning.
While our model has shown some level of success in forecasting animal movement, it fails to provide a clear explanation for the precise impacts of environmental factors on locomotor behaviour. The primary objective of our work is to forecast movement trajectories rather than elucidating the underlying reasons driving movement. However, comprehending the impact of the environment on animal locomotion is a crucial facet of ecological and behavioural investigation. Hence, our methodology has certain constraints in this aspect.
Our model has satisfactory performance in making forecasts for short time intervals, with an error of approximately 28.4 m. However, the accuracy of our predictions drastically diminishes when forecasting over longer time periods, with an error of around 1 km after 12 h. This implies that simulations may have limited efficacy in imitating animal motions over extended periods of time. Forecasting animal movements over extended time periods is a highly intricate undertaking that encompasses numerous aspects, including the stochastic nature of animal behaviour, fluctuations in the environment, and various sources of uncertainty. Hence, it is crucial that we engage in explicit discussions regarding the constraints and anticipated outcomes of models. While the model’s accuracy diminishes when making predictions over longer time periods, it remains valuable for short-term predictions and recognising behavioural patterns. To enhance the model’s long-term forecasting capability, future research may explore the incorporation of additional environmental factors and the utilisation of more advanced time-series analysis techniques. Furthermore, the implementation of the model should be customised to suit unique requirements. For instance, the model’s exceptional accuracy in predicting short-term outcomes would be highly valuable for monitoring animal behaviour and responding to emergencies in wildlife reserves. Ultimately, although our method has certain limits, it remains highly valuable for studying animal locomotor behaviour and making short-term predictions. In future investigations, it is necessary to precisely delineate the practical application scenarios of the model and thoroughly investigate potential enhancements.

6. Conclusions

One notable benefit of AnimalEnvNet is its incorporation of animal trajectory data and remote sensing image data. Through the integration of these two data sources, AnimalEnvNet can enhance the precision of its predictions of animal movement trajectories. Remote sensing imagery offers comprehensive environmental data, including topography, vegetation distribution, and human activities, which are crucial determinants of animal behaviour and migration. AnimalEnvNet efficiently incorporates this information by employing deep reinforcement learning methods, thereby improving the accuracy and robustness of the model’s predictions. The AnimalEnvNet model was experimentally validated and shown to be highly proficient at predicting single-step trajectories, with an average MAE of 28.4 m; this performance significantly surpasses that of previous models. In addition, AnimalEnvNet performs well in multi-step prediction, anticipating animal migration paths for up to 12 h with errors kept under 1 km. Subsequent studies can investigate ways to amplify the influence of environmental factors and employ more sophisticated time-series analysis methods to enhance the model’s ability to make accurate long-term predictions.

Author Contributions

Z.C. completed the experiments and wrote the paper; D.W. and F.Z. designed the specific scheme; L.D. and X.Z. completed the result data analysis; H.Z. and X.J. collected the field data; X.Z. modified and directed the writing of the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Fundamental Research Funds of CAF: CAFYBB2022SY028.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Abrahms, B.; Carter, N.H.; Clark-Wolf, T.J.; Gaynor, K.M.; Johansson, E.; McInturff, A.; Nisi, A.C.; Rafiq, K.; West, L. Climate change as a global amplifier of human–wildlife conflict. Nat. Clim. Chang. 2023, 13, 224–234. [Google Scholar] [CrossRef]
  2. Yuan, X.; Wang, Y.; Ji, P.; Wu, P.; Sheffield, J.; Otkin, J.A. A global transition to flash droughts under climate change. Science 2023, 380, 187–191. [Google Scholar] [CrossRef]
  3. Hill, G.M.; Kawahara, A.Y.; Daniels, J.C.; Bateman, C.C.; Scheffers, B.R. Climate change effects on animal ecology: Butterflies and moths as a case study. Biol. Rev. 2021, 96, 2113–2126. [Google Scholar] [CrossRef] [PubMed]
  4. Harvey, J.A.; Tougeron, K.; Gols, R.; Heinen, R.; Abarca, M.; Abram, P.K.; Basset, Y.; Berg, M.; Boggs, C.; Brodeur, J. Scientists’ warning on climate change and insects. Ecol. Monogr. 2023, 93, e1553. [Google Scholar] [CrossRef]
  5. Shaw, A.K. Causes and consequences of individual variation in animal movement. Mov. Ecol. 2020, 8, 12. [Google Scholar] [CrossRef] [PubMed]
  6. Jonsen, I.D.; Flemming, J.M.; Myers, R.A. Robust state–space modeling of animal movement data. Ecology 2005, 86, 2874–2880. [Google Scholar] [CrossRef]
  7. Heit, D.R.; Wilmers, C.C.; Ortiz-Calo, W.; Montgomery, R.A. Incorporating vertical dimensionality improves biological interpretation of hidden Markov model outputs. Oikos 2023, 2023, e09820. [Google Scholar] [CrossRef]
  8. Matsuo, Y.; LeCun, Y.; Sahani, M.; Precup, D.; Silver, D.; Sugiyama, M.; Uchibe, E.; Morimoto, J. Deep learning, reinforcement learning, and world models. Neural Netw. 2022, 152, 267–275. [Google Scholar] [CrossRef]
  9. Patterson, T.A.; Thomas, L.; Wilcox, C.; Ovaskainen, O.; Matthiopoulos, J. State-space models of individual animal movement. Trends Ecol. Evol. 2008, 23, 87–94. [Google Scholar] [CrossRef]
  10. Hooten, M.B.; Johnson, D.S.; McClintock, B.T.; Morales, J.M. Animal Movement: Statistical Models for Telemetry Data; CRC Press: Boca Raton, FL, USA, 2017. [Google Scholar]
  11. Nathan, R.; Monk, C.T.; Arlinghaus, R.; Adam, T.; Alós, J.; Assaf, M.; Baktoft, H.; Beardsworth, C.E.; Bertram, M.G.; Bijleveld, A.I. Big-data approaches lead to an increased understanding of the ecology of animal movement. Science 2022, 375, eabg1780. [Google Scholar] [CrossRef]
  12. Cífka, O.; Chamaillé-Jammes, S.; Liutkus, A. MoveFormer: A Transformer-based model for step-selection animal movement modelling. bioRxiv 2023. [Google Scholar] [CrossRef]
  13. Sakiyama, T.; Gunji, Y.-P. Emergence of an optimal search strategy from a simple random walk. J. R. Soc. Interface 2013, 10, 20130486.
  14. Reynolds, A.M. Towards a mechanistic framework that explains correlated random walk behaviour: Correlated random walkers can optimize their fitness when foraging under the risk of predation. Ecol. Complex. 2014, 19, 18–22.
  15. Reynolds, A.M. Mussels realize Weierstrassian Lévy walks as composite correlated random walks. Sci. Rep. 2014, 4, 4409.
  16. Carlson, A.D.; Souček, B. Computer simulation of firefly flash sequences. J. Theor. Biol. 1975, 55, 353–370.
  17. Garcia Adeva, J.J. Simulation modelling of nectar and pollen foraging by honeybees. Biosyst. Eng. 2012, 112, 304–318.
  18. Molina-Delgado, M.; Padilla-Mora, M.; Fonaguera, J. Simulation of behavioral profiles in the plus-maze: A Classification and Regression Tree approach. Biosystems 2013, 114, 69–77.
  19. van Vuuren, B.J.; Potgieter, L.; van Vuuren, J.H. An agent-based simulation model of Eldana saccharina Walker. Nat. Resour. Model. 2017, 31, e12153.
  20. Anderson, J.H.; Downs, J.A.; Loraamm, R.; Reader, S. Agent-based simulation of Muscovy duck movements using observed habitat transition and distance frequencies. Comput. Environ. Urban Syst. 2017, 61, 49–55.
  21. Wijeyakulasuriya, D.A.; Eisenhauer, E.W.; Shaby, B.A.; Hanks, E.M. Machine learning for modeling animal movement. PLoS ONE 2020, 15, e0235750.
  22. Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 31, 1235–1270.
  23. Smagulova, K.; James, A.P. A survey on LSTM memristive neural network architectures and applications. Eur. Phys. J. Spec. Top. 2019, 228, 2313–2324.
  24. Press, C.; Heyes, C.; Kilner, J.M. Learning to understand others’ actions. Biol. Lett. 2010, 7, 457–460.
  25. Pineau, J.; Bellemare, M.G.; Islam, R.; Henderson, P.; François-Lavet, V. An Introduction to Deep Reinforcement Learning. Found. Trends Mach. Learn. 2018, 11, 219–354.
  26. Najar, A.; Bonnet, E.; Bahrami, B.; Palminteri, S. The actions of others act as a pseudo-reward to drive imitation in the context of social reinforcement learning. PLoS Biol. 2020, 18, e3001028.
  27. Jonsen, I.D.; Grecian, W.J.; Phillips, L.; Carroll, G.; McMahon, C.; Harcourt, R.G.; Hindell, M.A.; Patterson, T.A. aniMotum, an R package for animal movement data: Rapid quality control, behavioural estimation and simulation. Methods Ecol. Evol. 2023, 14, 806–816.
  28. Chiara, V.; Kim, S.Y. AnimalTA: A highly flexible and easy-to-use program for tracking and analysing animal movement in different environments. Methods Ecol. Evol. 2023, 14, 1699–1707.
  29. Arce Guillen, R.; Lindgren, F.; Muff, S.; Glass, T.W.; Breed, G.A.; Schlägel, U.E. Accounting for unobserved spatial variation in step selection analyses of animal movement via spatial random effects. Methods Ecol. Evol. 2023, 14, 2639–2653.
  30. Scharf, H.R.; Buderman, F.E. Animal movement models for multiple individuals. Wiley Interdiscip. Rev. Comput. Stat. 2020, 12, e1506.
  31. He, P.; Klarevas-Irby, J.A.; Papageorgiou, D.; Christensen, C.; Strauss, E.D.; Farine, D.R. A guide to sampling design for GPS-based studies of animal societies. Methods Ecol. Evol. 2023, 14, 1887–1905.
  32. Alhichri, H.; Alswayed, A.S.; Bazi, Y.; Ammour, N.; Alajlan, N.A. Classification of Remote Sensing Images Using EfficientNet-B3 CNN Model With Attention. IEEE Access 2021, 9, 14078–14094.
  33. Nonaka, E.; Holme, P. Agent-based model approach to optimal foraging in heterogeneous landscapes: Effects of patch clumpiness. Ecography 2007, 30, 777–788.
  34. Cristiani, E.; Menci, M.; Papi, M.; Brafman, L. An all-leader agent-based model for turning and flocking birds. J. Math. Biol. 2021, 83, 45.
  35. Rew, J.; Park, S.; Cho, Y.; Jung, S.; Hwang, E. Animal Movement Prediction Based on Predictive Recurrent Neural Network. Sensors 2019, 19, 4411.
  36. Fletcher, K.; European Space Agency. Sentinel-2: ESA’s Optical High-Resolution Mission for GMES Operational Services; ESA Communications: Noordwijk, The Netherlands, 2012; p. vi. 70p.
  37. Senty, P.; Guzinski, R.; Grogan, K.; Buitenwerf, R.; Ardö, J.; Eklundh, L.; Koukos, A.; Tagesson, T.; Munk, M. Fast Fusion of Sentinel-2 and Sentinel-3 Time Series over Rangelands. Remote Sens. 2024, 16, 1833.
  38. Sherstinsky, A. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys. D Nonlinear Phenom. 2020, 404, 132306.
  39. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaria, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53.
  40. Niu, Z.; Zhong, G.; Yu, H. A review on the attention mechanism of deep learning. Neurocomputing 2021, 452, 48–62.
  41. Zhou, C.; Huang, B.; Fränti, P. A review of motion planning algorithms for intelligent robots. J. Intell. Manuf. 2022, 33, 387–424.
  42. Fabbri, R.; Costa, L.D.F.; Torelli, J.C.; Bruno, O.M. 2D Euclidean distance transform algorithms: A comparative survey. ACM Comput. Surv. 2008, 40, 1–44.
  43. Eschmann, J. Reward function design in reinforcement learning. In Reinforcement Learning Algorithms: Analysis and Applications; Springer: Cham, Switzerland, 2021; pp. 25–33.
  44. Reynolds, A.M.; Lepretre, L.; Bohan, D.A. Movement patterns of Tenebrio beetles demonstrate empirically that correlated-random-walks have similitude with a Lévy walk. Sci. Rep. 2013, 3, 3158.
  45. Togunov, R.R.; Derocher, A.E.; Lunn, N.J.; Auger-Méthé, M. Characterising menotactic behaviours in movement data using hidden Markov models. Methods Ecol. Evol. 2021, 12, 1984–1998.
  46. Proulx, C.L.; Proulx, L.; Blouin-Demers, G. Improving the realism of random walk movement analyses through the incorporation of habitat bias. Ecol. Model. 2013, 269, 18–20.
  47. Griffiths, C.A.; Patterson, T.A.; Blanchard, J.L.; Righton, D.A.; Wright, S.R.; Pitchford, J.W.; Blackwell, P.G. Scaling marine fish movement behavior from individuals to populations. Ecol. Evol. 2018, 8, 7031–7043.
  48. Roy, A.; Fablet, R.; Bertrand, S.L. Using generative adversarial networks (GAN) to simulate central-place foraging trajectories. Methods Ecol. Evol. 2022, 13, 1275–1287.
  49. Saxena, D.; Cao, J. Generative adversarial networks (GANs) challenges, solutions, and future directions. ACM Comput. Surv. 2021, 54, 1–42.
Figure 5. Overview of the study area.
Figure 6. The agent reinforcement training network framework.
Figure 7. Comparison of model MAE results.
Figure 8. Comparison of model MSELoss results.
Figure 9. Comparison of model MAE results.
Figure 10. Comparison of model MSELoss results.
Figure 11. Reward learning curve.
Figure 12. Average steps of agent in single episode.
Table 2. Configuration of experiment.
Configuration Type | Model/Parameter
Processor | Intel Core i5 9400F
Graphics Card | Nvidia Tesla P100 16 GB
Memory | 64 GB DDR4
Hard Disk | 1 TB SSD
Operating System | Ubuntu 18.04
Deep Learning Framework | PyTorch 2.1.0
Table 3. Detailed information of the different model parameters.
Model Name | Architecture | Learning Parameters
LSTM-basic | 2 LSTM units, Attention, Fully Connected | Loss: MSELoss, LR: 0.001
GANS | Generator and Discriminator | Loss: MSELoss, LR: 0.0002
AnimalEnvNet | Deep reinforcement learning, LSTM, Attention, Fully Connected | Loss: MSELoss, LR: 0.001
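For illustration, the sketch below shows how an LSTM-plus-attention predictor of the kind listed in Table 3 (LSTM layers, an attention step, and a fully connected head trained with MSELoss at LR 0.001) could be assembled in PyTorch. The layer sizes, sequence length, and the use of (x, y) fixes as the only input features are assumptions for demonstration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TrajectoryPredictor(nn.Module):
    """Minimal sketch of an LSTM + attention + fully connected head for
    single-step trajectory prediction (hypothetical layer sizes)."""
    def __init__(self, input_dim=2, hidden_dim=64, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)           # per-step attention score
        self.fc = nn.Linear(hidden_dim, 2)             # next (x, y) position

    def forward(self, x):                              # x: (batch, seq_len, 2)
        out, _ = self.lstm(x)                          # (batch, seq_len, hidden)
        weights = torch.softmax(self.attn(out), dim=1) # normalise over time steps
        context = (weights * out).sum(dim=1)           # attention-weighted summary
        return self.fc(context)                        # (batch, 2)

# One hypothetical training step on dummy data
model = TrajectoryPredictor()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

seq = torch.randn(32, 10, 2)    # 32 trajectories, 10 past GPS fixes each
target = torch.randn(32, 2)     # observed next position
loss = criterion(model(seq), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```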
Table 4. Model performance comparison in single-step prediction.
Model Name | Training Batch | Model Loss | MAE (m) | Mean | Model Size
LSTM-basic | 100 | 0.0021 | 102.9 | 109.2 | 2 MB
GANS | 100 | 0.0007 | 55.5 | 79.5 | 17 MB
AnimalEnvNet | 100 | 0.0002 | 28.4 | 64.2 | 18 MB
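The MAE values in Table 4 are reported in metres against the actual trajectories. As a minimal sketch, one plausible way to compute such a single-step positional error is shown below, assuming predicted and observed next locations are given in a projected, metre-based coordinate system; the function name and the Euclidean reading of the error are illustrative assumptions.

```python
import numpy as np

def mean_absolute_error_m(pred, true):
    """Mean positional error (metres) between predicted and observed
    next locations, both given as (N, 2) arrays of projected coordinates."""
    pred, true = np.asarray(pred, dtype=float), np.asarray(true, dtype=float)
    return float(np.mean(np.linalg.norm(pred - true, axis=1)))

# Example with dummy coordinates (metres)
print(mean_absolute_error_m([[10.0, 5.0], [0.0, 0.0]],
                            [[12.0, 5.0], [3.0, 4.0]]))  # -> 3.5
```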
Table 5. Parameters of reinforcement learning training.
Parameter Name | Value | Remarks
batch-size | 32 | Batch size
learn-rate | 1 × 10⁻² | Learning rate
gamma | 0.99 | Reward discount factor
buffer_size | 512 | Experience buffer size
repeat_count | 16 | Number of network updates
eval_step | 64 | Number of evaluation steps
reward_scale | 1 × 10⁴ | Reward scaling
lambda_gae_adv | 0.95 | Advantage estimation parameter
lambda_entropy | 0.05 | Exploration scaling
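As a hedged illustration, the Table 5 settings could be gathered into a configuration dictionary for a PPO-style actor-critic trainer, with gamma and lambda_gae_adv used in generalised advantage estimation. The dictionary keys mirror the parameter names in Table 5, but the helper function and any surrounding trainer are hypothetical and not the authors' implementation.

```python
import numpy as np

# Hyperparameters from Table 5, collected for a hypothetical PPO-style trainer.
rl_config = {
    "batch_size": 32,          # samples per gradient update
    "learning_rate": 1e-2,     # optimiser step size
    "gamma": 0.99,             # reward discount factor
    "buffer_size": 512,        # length of the experience (rollout) buffer
    "repeat_count": 16,        # network updates per collected buffer
    "eval_step": 64,           # steps per evaluation rollout
    "reward_scale": 1e4,       # multiplier applied to raw rewards
    "lambda_gae_adv": 0.95,    # GAE lambda for advantage estimation
    "lambda_entropy": 0.05,    # entropy bonus weight (exploration)
}

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalised advantage estimation, illustrating how gamma and
    lambda_gae_adv would typically enter the policy update."""
    adv, last = np.zeros(len(rewards)), 0.0
    for t in reversed(range(len(rewards))):
        next_v = values[t + 1] if t + 1 < len(values) else 0.0  # bootstrap 0 at episode end
        delta = rewards[t] + gamma * next_v - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv
```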
Table 6. Ablation study results.
Model Name | MAE (m) | Convergence Rate | Training Time (Seconds/Batch)
AnimalEnvNet | 28.4 | 55 batches | 49
A | 40.2 | 80 batches | 45
B | 55.3 | 100 batches | 35
C | 60.7 | 120 batches | 30
D | 45.1 | 90 batches | 40