1. Introduction
The Internet hosts many kinds of data and information that intersect and are stored on social networks, prompting many research fields to pay attention to them. Users on social networks obtain the resources they need by visiting web pages [1,2,3,4]. The key to studying social networks is to analyze how they are used. The analyzed data can be used to improve the social network itself, making it more convenient for users to browse the data they require; to analyze users’ preferences so that advertisements can be delivered at designated points; and to analyze user behavior and predict the transactions that users participate in [5,6]. One of the main ways to obtain these results is to perform extensive data analysis on the weblogs of these sites [7,8]. Every time a user requests a page or one of its resources, such as video or sound, a new record is added to the site’s weblog [9]. This record contains information about the user’s favourite pages (i.e., the most frequently visited pages), the sequence of visits to ordinary pages, and even hints at the user’s characteristics. This analysis method is called web usage mining (WUM) based on weblog information [10,11,12]. To use WUM, a sequence of interactions between a single user and a web page needs to be extracted. The resulting file should contain at least the following fields: the user’s IP address, the timestamp, the requested resource, the status code of the operation, the referrer (the previous web address before entering the page), and the browser used [13,14]. By analyzing this sequence, the pattern of the user’s access to the web page can be obtained.
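The fields listed above can be extracted from a raw server log line. The sketch below is a minimal illustration, assuming Apache's combined log format; the regular expression and field names are our own, not part of any WUM toolkit.

```python
import re

# Minimal parser for one line of an Apache-style combined log.
# Captures the fields needed for web usage mining: IP, timestamp,
# request, status code, referrer, and user agent.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_log_line(line):
    """Return a dict of WUM fields, or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

entry = parse_log_line(
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 2326 '
    '"http://example.com/start.html" "Mozilla/5.0"'
)
```

Grouping such entries by IP address is then the first step toward session division and access-pattern analysis.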
The value of big data is essentially that it provides new thinking and new means for humans to understand complex systems. In theory, a virtual digital image of the natural world, carrying its running rules, can be constructed by digitizing the real world on a sufficiently small scale of time and space. Given sufficient computing power and efficient data analysis methods, an in-depth analysis of this virtual digital image makes it possible to understand and discover the operating behavior, state, and laws of existing complex systems. Big data thus provides a new way of thinking and new means for exploring objective laws and for transforming nature and society. The high and increasing demand for big data mining and analysis pushes the field toward more efficient quantum research technology [15] to close the gap in technological capability, improve the efficiency of information extraction, and promote further scientific progress. In view of these requirements, this paper exploits the special advantages of quantum algorithms over traditional iterative algorithms [16] in the emerging field of quantum neural networks, and studies an autonomous neural network learning scheme [17] for weblog information extraction, highlighting the high-value prospects of the quantum field in big data mining and analysis.
This search problem can be addressed with Grover's quantum search algorithm [18], which uses the amplitude amplification approach of quantum computing to attain quadratic acceleration in unsorted search problems. Grover's algorithm has been efficiently simulated on classical computer systems using a quantum computer language (QCL). For an unordered database search, the Grover algorithm achieves quadratic acceleration compared with the classical algorithm, and analysis, induction, and variant research have been carried out on it [19,20]. Beyond the lower bound, Grover's methodology can also be used when the fraction of target terms is unknown. The fixed-point strategy and the trial-and-error method are the two quantum search strategies that can be implemented.
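The quadratic acceleration discussed above can be seen in a toy state-vector simulation of Grover's iteration (phase oracle plus inversion about the mean). This is a didactic sketch, not the QCL implementation referenced in the literature.

```python
import numpy as np

def grover_search(n_qubits, target):
    """Toy state-vector simulation of Grover's search for one marked index."""
    K = 2 ** n_qubits
    state = np.full(K, 1 / np.sqrt(K))       # uniform superposition (H on all qubits)
    iterations = int(np.floor(np.pi / 4 * np.sqrt(K)))
    for _ in range(iterations):
        state[target] *= -1                  # oracle: flip the phase of the marked item
        state = 2 * state.mean() - state     # diffusion: inversion about the mean
    return np.abs(state) ** 2                # sampling probabilities

probs = grover_search(4, target=7)           # K = 16 unsorted entries, 3 iterations
```

After roughly (π/4)·√K iterations the marked index dominates the output distribution, whereas a classical scan needs on the order of K queries.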
The binary QNN model in this paper is based on the Grover algorithm and a QNN supervised learning algorithm. First, the traditional Grover algorithm is analyzed and improved: Younes' algorithm is used to raise the search efficiency and is inserted into the iterative learning process of the quantum neural network [21,22]. Quantum processes are then used to promote the efficiency of the algorithm and the neural network, realizing synchronized searching and learning across multiple networks [23,24] in each iteration to improve the efficiency of solutions [25]. The quantum neural network learning scheme in this paper can be applied to quickly find a user's IP address in massive weblogs, and then to accurately and efficiently identify and classify relevant, valuable information such as the IP addresses in network logs. The results can be sorted by the number of search categories to identify and analyze user behavior accurately. The scheme can not only quickly discover a user's record of specific interests, but can also divide the sessions of a single user, which ensures the accuracy of user record identification and improves the effectiveness of user activity queries.
This article is structured as follows.
Section 2 introduces the trial-and-error method (Younes' algorithm) used in this paper and the basic principles of the QNN supervised learning classification task.
Section 3 describes our binary QNN model.
Section 4 describes the evolution process of the original dataset, and provides entropy analysis to show the advantages of this model.
Section 5 summarizes and discusses the role of the proposed model in the development of user behavior pattern prediction.
3. The Binary QNN Model
We simulate the creation of a binary analysis algorithm that uses quantum states to process information, as shown in
Figure 2. The algorithm proposed in this paper is uniformly referred to as BQM in the following. As shown in
Figure 2, BQM uses a specified variable component subcircuit and an H gate to replace the Oracle. The variable component subcircuit, based on the training data, can conditionally flip a flag qubit. The flag qubit is then used as part of the H gate to guide a Grover search toward the index of the specified example; i.e., the state of the flag qubit, such as “0” or “1”, determines the probability of success in identifying the target index. BQM optimizes the trainable parameters of the variable component subcircuit: when the corresponding training instance is positive, the success probability of sampling a target index is maximized; otherwise, BQM minimizes this success probability. By inheriting attributes from the Grover search algorithm and performing binary classification tasks on these attributes while allowing search constraints to be set [35], the design of our algorithm has advantages in terms of query complexity. Under the above observation, a quantum classifier must have certain advantages [36,37,38].
3.1. Preprocessing Stage of the Binary Classification Task
In the preprocessing stage, the binary classification task extends the dataset
T defined in Equation (1) into an extended dataset.
To apply the Grover search algorithm to obtain the target index, the training rules for the data pairs are as follows.
The last pair of the extended dataset is taken directly from the original dataset. The first K pairs are uniformly sampled from the subset of the original dataset whose labels are the opposite of the label of the last pair. Two sub-datasets are constructed, which contain only those examples with the labels ‘0’ and ‘1’, respectively. When the last pair carries one label, the first K pairs are sampled from the sub-dataset with the opposite label, and likewise in the reverse situation.
Various quantum classifiers encode the data into quantum states in different ways [39]. For notational convenience, we denote the quantum state corresponding to each example through a coding operation.
3.2. The Training Process of the Learning Scheme
Compared with the traditional Grover algorithm, combining the variational learning method with the Grover search algorithm produces quantum advantages [40,41,42,43,44]. The adopted variable component subcircuit is designed to find a hyperplane that keeps the last pair of the extended dataset away from the first
K pairs of samples.
For the variational quantum circuits in BQM, a NISQ device scheme consists of trainable single-qubit gates and fixed two-qubit gates such as CNOT or CZ, which implement generation and discrimination tasks using variational hybrid quantum-classical algorithms. The circuit is arranged in layers, where each layer contains parameterized single-qubit gates and, at most, a fixed number of two-qubit gates with the same layout.
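As a concrete illustration of such a layer, the following numpy sketch applies trainable RY rotations followed by a fixed CNOT to a 2-qubit state vector. The gate choice and layout are illustrative assumptions, not the exact circuit of Figure 3.

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation gate (trainable parameter theta)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

# Fixed two-qubit entangling gate: CNOT with qubit 0 as control.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def layer(state, theta0, theta1):
    """One hardware-efficient layer: RY(theta0) x RY(theta1), then CNOT."""
    return CNOT @ np.kron(ry(theta0), ry(theta1)) @ state

state = np.array([1.0, 0.0, 0.0, 0.0])   # |00>
out = layer(state, np.pi / 2, 0.0)        # produces (|00> + |11>)/sqrt(2)
```

Stacking C such layers, with one trainable angle per single-qubit gate, gives the multiple-layer parameterized circuit used later in Section 3.5.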
In the optimal situation, given the initial state defined in Equation (9), the variable component subcircuit is applied to achieve the following goals:
- 1.
If the first K pairs of samples correspond to one label, the expected state is
- 2.
If the first K pairs of samples correspond to the other label, the expected state is
The label (‘0’ or ‘1’) is encoded as the first qubit of the feature register. As shown in
Figure 2A, the initial state is prepared. Our binary QNN model then iteratively applies the H gate to the first qubit of the feature register, controlled by the index register, and, to finish the first cycle, applies the diffusion operator to the index register. All of these quantum operations, including
U, constitute one cycle.
Define the combined operation accordingly. Except in the first loop, the binary QNN model repeatedly applies the unitary operation
U to the initial state, with the application of the unitary operation replaced as shown by the brown shading in
Figure 2B. Following the traditional Grover search algorithm [45], before making quantum measurements, the binary QNN model queries
U and the diffusion operator for a prescribed total number of times.
3.3. The Evolution of the Quantum State
We now analyze how the quantum states evolve under these operations:
3.4. The Loss Function
The observations above lead to Theorem 1, whose proof is given below.
Theorem 1. For BQM, under the optimal setting, if the label of the last item of the extended dataset takes the target value, the probability of the sampling result being the target index is asymptotically 1.
Proof of Theorem 1. We discuss the two cases for the label of the last entry of the extended dataset.
In the first case, given the label of the final item in the extended dataset, it is possible to determine from Equation (14) that, after the first cycle, the generated state of BQM is
According to the formula above, the chance of picking any index when
U is applied is the same. After
U is applied repeatedly, by induction, the state after a further
n cycles changes as
where
n is any positive integer, and the probability of sampling the target index follows accordingly. In the last loop, the quantum operation defined in Equation (13) is applied to the state, and the resulting state is
where the first equality uses Equation (18); the second equality uses Equation (15) and exploits the application of the diffusion operator.
In the second case, given the label of the last item in the extended dataset, it is possible to determine from Equation (16) that, after the first cycle, the generated state of BQM is
According to the formula above, the chance of picking any index when
U is applied is the same. After
U is applied repeatedly, by induction, the state after a further
n cycles changes as follows:
where
n is any positive integer. In the last loop, the quantum operation defined in Equation (13) is applied to the state, and the resulting state is
where the first equality uses Equation (21), and the second equality uses Equation (16) to design the feature register: the
H gate flips the phase of the components whose first feature-register qubit is set, and the last equality comes from the application of the diffusion operator.
According to Equation (22), in the ideal situation, the probability of sampling the target index is close to 1. The result of Equation (19) shows that, in the other case, the probability of sampling the target index never increases. Thus, the sampling probability of the target result asymptotically approaches one if and only if the label of the last term of the extended dataset takes the corresponding value. □
According to this special property of BQM established in Theorem 1, the output distribution differs for different labels of the input while performing the binary classification task. By the analysis of Theorem 1 above, one computational basis state will be present in the output state of BQM with probability close to 1 for one label of the last item, whereas the matching output state for the other label will contain the same computational basis state with a much lower probability.
According to the mechanism of the Grover search algorithm, the loss function of BQM is deduced as
where one factor is the sign function and the remaining term is defined in Equation (12).
When minimizing this loss function, the success probability of sampling the first feature qubit as ’1’ (’0’) is maximized (minimized) according to the label of the last item.
3.5. Gradient-Based Parameter Optimization
The optimization method in this paper uses a multiple-layer parameterized quantum circuit (MPQC), following the principle that the arrangement of quantum gates in each layer is the same [46]. The operation formed by each layer is expressed accordingly, and the quantum state produced by the MPQC is
where
C is the total number of layers. BQM uses the MPQC to construct the variable component subcircuit.
The circuit layout at layer
l is shown in
Figure 3. When the number of layers is
C, the total number of trainable parameters of BQM grows linearly with
C.
The update rules of BQM at each iteration are as follows
where the step-size coefficient is the learning rate. Given the explicit form of the loss defined in Equation (23), the gradient can be rewritten as
where the first quantity is the label of the last item in the extended dataset, the second is the symbol function, and ∏ is the measurement operator.
BQM employs a gradient-based method, following the parameter-shift rule, to obtain the gradient and optimize the circuit parameters. The parameter-shift rule [47] iteratively calculates each gradient entry.
For each gradient entry, only the corresponding parameter is shifted, i.e.,
Combining Equations (26)–(28), the update rule of BQM at the e-th iteration is
where the remaining quantities are as defined above.
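The parameter-shift idea can be checked on a one-parameter toy circuit. Under the standard rule for rotation gates (shift of π/2, as in the cited literature), the gradient of the expectation value is exact, not a finite-difference approximation; the function names below are ours.

```python
import numpy as np

def expectation_z(theta):
    """<0| RY(theta)^† Z RY(theta) |0> = cos(theta) for one qubit."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    psi = np.array([c, s])                 # RY(theta)|0>
    z = np.array([[1, 0], [0, -1]])        # Pauli-Z observable
    return psi @ z @ psi

def parameter_shift_grad(f, theta):
    """Gradient entry via the parameter-shift rule with shift pi/2."""
    return (f(theta + np.pi / 2) - f(theta - np.pi / 2)) / 2

theta = 0.7
grad = parameter_shift_grad(expectation_z, theta)   # equals -sin(theta) exactly
```

On hardware, each of the two shifted expectation values is itself estimated from repeated measurements, which is why the measurement counts in Sections 4.2 and 4.4 scale with the number of trainable parameters.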
3.6. Circuit Implementation of Label Prediction
After the training of BQM is completed, the trained circuit (as shown in
Figure 4) can be used to predict the label of a new instance with low query complexity.
Denoting the new input accordingly, we encode it into quantum states using the same encoding method used during training, and then apply the trained variable component subcircuit to the encoded state.
When the size of the dataset loaded by the binary QNN model is K, a well-trained binary QNN model can obtain the target index with reduced query complexity.
3.7. Synthetic Construction of Datasets
Given a training example, the embedding function used to encode it into a quantum state is represented as
where
g is a specified mapping function. The above formula means that g(xi) can be converted into a series of quantum operations, the implementation of which is shown in
Figure 5a. To encode multiple training examples into quantum states simultaneously, we treat the embedding as a controlled version, the implementation of which is shown in
Figure 5b.
3.8. The Details of BQM
The implementation of GBLS is shown in
Figure 5c, in which the data-encoding unitary consists of a controlled set of quantum operations. The implementation of the encoding unitary depends on the batch size B. For the quantum kernel classifier with BCE loss and with MSE loss, it can be seen from Formula (30) that the encoding unitary is
For a quantum kernel classifier with MSE loss, the implementation of the encoding unitary is the same as that of BQM, as shown in
Figure 5.
4. Results
We now analyze the security of the proposed BQM, present the evolution and generation process of the dataset intuitively using dot plots, and evaluate the algorithm's performance through entropy analysis and testing [48,49,50].
4.1. Dataset Evolution
This algorithm's dataset generation process is shown in
Figure 6. It uses the first K pairs and the last pair of the original dataset for label classification and uniform sampling, dividing all data pairs into sub-datasets according to whether the
y value is 1 or 0.
In the figure, the yellow cubes represent data pairs whose value of y is 1, the blue cubes represent data pairs whose value of y is 0, and the points circled in red represent the sampled data points. The specific rules are as follows:
- 1.
Figure 6a represents the original dataset, in which each data pair is drawn as an `A' cube or a `B' cube according to its label;
- 2.
Figure 6b represents the uniform sampling from the sub-dataset labeled 1 in
Figure 6a to generate a new dataset;
- 3.
Figure 6c represents the uniform sampling from the sub-dataset labeled 0 in
Figure 6a to generate a new dataset.
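The split-then-sample rules above can be sketched in a few lines; the function names and the toy (x, y) pairs are illustrative assumptions, not the experimental data.

```python
import random

def split_by_label(pairs):
    """Divide (x, y) pairs into sub-datasets keyed by label, as in Figure 6a."""
    return {label: [p for p in pairs if p[1] == label] for label in (0, 1)}

def uniform_sample(sub_dataset, k, seed=0):
    """Uniformly sample k pairs (with replacement) from one sub-dataset."""
    rng = random.Random(seed)
    return [rng.choice(sub_dataset) for _ in range(k)]

original = [("x1", 1), ("x2", 0), ("x3", 1), ("x4", 0), ("x5", 1)]
subsets = split_by_label(original)
new_dataset = uniform_sample(subsets[1], k=4)   # Figure 6b analogue
```

The Figure 6c analogue is obtained the same way from `subsets[0]`.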
4.2. Stability and Convergence Analysis
Define a utility bound R as a measure to evaluate the distance between the optimization result and the stationary point in the optimized environment.
For the BQM quantum classifier under a depolarization noise setting, the utility bound of the output after
j iterations is
where
P is the depolarization rate,
l is the total number of trainable parameters,
K is the number of measurements used to estimate the quantum expectation value,
d is the circuit depth of the variable component subcircuit, and
B is the batch size.
We use the decay rate of the utility bound to define the asymptotic convergence rate of this optimization algorithm [51,52]. According to Equation (33), the attenuation rate of the utility bound is slower than a linear rate, which shows that this algorithm has a sublinear convergence rate.
In this setting, we input each sample in turn to the variable component subcircuits to obtain the corresponding measurement outcomes. Once this set is collected, the gradient can be estimated from it. Assuming that the number of measurements required to estimate the derivative of the j-th parameter is K, estimating the full gradient over all l parameters requires on the order of Kl measurements in total.
Given the above definition of the utility bound R, the results show that a large batch size B can guarantee a better realization of the utility bound R by increasing the total number of measurements.
4.3. Performance Analysis under Depolarization Noise
We employ depolarization channels to mimic the system noise, since both the number of measurements and the quantum system's noise are constrained. We next examine how well BQM performs in the presence of depolarization noise [53].
Given a quantum state, we define the depolarizing channel acting on it as
where
P is the depolarization rate, and
l is the total number of trainable parameters.
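The depolarizing channel has the standard closed form rho -> (1 - P) * rho + P * I / d on a d-dimensional density matrix, which can be applied directly; the sketch below is a minimal numpy illustration with our own function name.

```python
import numpy as np

def depolarize(rho, p):
    """Depolarizing channel: rho -> (1 - p) * rho + p * I / d."""
    d = rho.shape[0]
    return (1 - p) * rho + p * np.eye(d) / d

# Pure state |0><0| on one qubit, depolarization rate p = 0.2:
rho = np.array([[1.0, 0.0], [0.0, 0.0]])
noisy = depolarize(rho, 0.2)   # [[0.9, 0.0], [0.0, 0.1]]
```

The channel is trace-preserving and contracts every state toward the maximally mixed state I/d, which is what degrades the estimated expectation values in the simulations below.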
We compare the performance of BQM with two other quantum kernel classifiers when quantum system noise and measurement delays are considered. Here, BQM stands for the binary classification quantum neural network model based on the optimized Grover algorithm proposed in this paper. The two classifiers compared with BQM are denoted “BCE” and “MSE”: “BCE” stands for the quantum kernel classifier with binary cross-entropy loss, and “MSE” for the quantum kernel classifier with mean square error loss (B = N). We simulated each of the three classifiers 10 times and collected the statistics.
Figure 7 illustrates the simulation results. After 20 epochs, the BQM, BCE, and MSE quantum classifiers achieve the same performance. It can be observed that the quantum classifier with MSE loss has a lower convergence speed and a larger variance than the BQM and BCE classifiers. This phenomenon shows that using BQM for classification tasks with different batch sizes is meaningful. In
Table 1, we compare the average training and testing accuracies of the BQM, BCE, and MSE quantum classifiers in the last stage. Considering measurement error and quantum gate noise, BQM still achieves a stable performance because of its minimal variance.
In Figure 7, the labels `BQM', `BCE', and `MSE' denote the Grover-based binary QNN model, the quantum kernel classifier with BCE loss, and the quantum kernel classifier with MSE loss, respectively. The training and testing accuracies of the BQM quantum classifier are shown in the left and right panels. The vertical bars represent the variation in training and testing accuracy at each iteration, where all hyperparameter settings are the same as those used in the numerical simulation above.
According to the numerical simulation results of the three quantum classifiers in
Figure 7, BQM obtains a good utility bound R in our tests. When the classifiers achieve basically comparable performance, BQM reduces the number of measurements required by a factor of K = 4 compared with the quantum classifiers with BCE loss and MSE loss (B = N). This result shows that when N is larger, there is a large separation in computational efficiency between BQM and the previous B = N quantum classifiers. In other words, compared with the other two quantum classifiers, the number of measurements required by BQM is reduced by a factor of four, demonstrating BQM's efficacy.
4.4. Complexity Comparison Analysis
After the variable component subcircuit of the quantum classifier receives the encoded data, the measurement operator defines one measurement as one query. The iterative process of this algorithm is shown in
Figure 8. According to the quantum classifier's training mechanism, calculating the total number of measurements for the variable component subcircuits is equivalent to the query complexity of obtaining the gradient within one time frame.
The next step is to derive the number of measurements required by a quantum kernel classifier with BCE loss in one period. For the dataset
T, the BCE loss is
where one term is the label of each example and the other is the prediction probability of that label; the output of its quantum circuit is
where the circuit is defined in Equation (25) and ∏ is the measurement operator. According to the parameter-shift rule, the derivative of the BCE loss satisfies
where the shifted term is defined in Equation (28). To obtain the gradient of the BCE loss from the above equation, we need to feed each training example to the quantum kernel classifier to estimate the expectation values, and then calculate the coefficients.
BQM uses the superposition property of the loss function
L defined in Equation (17) to obtain the gradient. According to Equation (23), the gradient of BQM satisfies
where the label refers to the last pair in the training examples. According to the equation above, the gradient may be calculated using 2K measurements, where the first
K measurements approximate one shifted expectation and the last
K measurements approximate the other.
The complexity comparison analysis determines the effect of the dataset size on the binary QNN model in this paper. The standard Grover search algorithm's search complexity is O(√K) for data entries of size
K and a total number of trainable parameters
l. The optimal classification complexity of the algorithm can be seen in
Table 2. The reduced query complexity of BQM implies a potential quantum advantage in completing the classification task.
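To make the comparison concrete, the following back-of-the-envelope sketch contrasts a classical linear scan with the roughly (π/4)·√K Grover iteration count for an unsorted search over K = 1024 entries; the function names are ours, and constant factors and lower-order terms are dropped.

```python
import math

def classical_queries(K):
    # A linear scan over an unsorted list of K entries needs O(K) queries.
    return K

def grover_queries(K):
    # Grover iterations before measuring: floor((pi / 4) * sqrt(K)).
    return math.floor(math.pi / 4 * math.sqrt(K))

speedup = classical_queries(1024) / grover_queries(1024)   # ~40x for K = 1024
```

The gap widens as K grows, which is the source of the potential quantum advantage claimed above.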