Search Results (90)

Search Parameters:
Keywords = semantic similarity recognition

23 pages, 8056 KiB  
Article
Construction of Three-Dimensional Semantic Maps of Unstructured Lawn Scenes Based on Deep Learning
by Xiaolin Xie, Zixiang Yan, Zhihong Zhang, Yibo Qin, Hang Jin, Cheng Zhang and Man Xu
Appl. Sci. 2024, 14(11), 4884; https://doi.org/10.3390/app14114884 - 4 Jun 2024
Abstract
Traditional automatic gardening pruning robots generally employ electronic fences to delineate their working boundaries. To quickly determine the working area of a robot, we combined an improved DeepLabv3+ semantic segmentation model with a simultaneous localization and mapping (SLAM) system to construct a three-dimensional (3D) semantic map. To reduce the computational cost of future deployment on resource-constrained mobile robots, we replaced the DeepLabv3+ backbone network, ResNet50, with MobileNetV2, decreasing the number of network parameters and improving recognition speed. In addition, we introduced an efficient channel attention (ECA) mechanism to enhance the accuracy of the network, forming the improved Multiclass MobileNetV2 ECA DeepLabv3+ (MM-ED) model. By integrating this model with the SLAM system, the framework generates a 3D semantic point cloud map of a lawn working area and converts it into octree and occupancy grid maps, providing technical support for future autonomous robot operation and navigation. We created a lawn dataset containing 7500 images, using our own annotated images as ground truth, and employed it for the experiments. The results showed that the proposed MM-ED model achieved 91.07% MIoU and 94.71% MPA. On an RTX 3060 Laptop GPU, the frame rate reached 27.69 frames per second, demonstrating superior recognition performance compared to similar semantic segmentation architectures and better adaptation to SLAM systems.
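The ECA mechanism named in this abstract is a standard building block; as a hedged illustration, here is a minimal PyTorch sketch of a typical ECA module (the kernel size and placement are assumptions, since the abstract does not specify them):

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: a 1D conv over pooled channel descriptors."""
    def __init__(self, k_size=3):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # x: (B, C, H, W) -> per-channel descriptor (B, C, 1, 1)
        y = self.avg_pool(x)
        # Treat channels as a 1D sequence: (B, 1, C), convolve across channels
        y = self.conv(y.squeeze(-1).transpose(-1, -2))
        y = self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))
        return x * y.expand_as(x)  # reweight channels
```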
(This article belongs to the Special Issue Advanced 2D/3D Computer Vision Technology and Applications)
21 pages, 2274 KiB  
Article
Category-Based Effect on False Memory of People with Down Syndrome
by Ching-Fen Hsu, Qian Jiang and Shi-Yu Rao
Brain Sci. 2024, 14(6), 538; https://doi.org/10.3390/brainsci14060538 - 24 May 2024
Viewed by 249
Abstract
Background: People with Down syndrome (DS) are deficient in verbal memory but relatively preserved in visuospatial perception. Verbal memories are related to semantic knowledge. In people with DS, receptive ability is better than expressive ability but still lags seriously behind that of age-matched controls; this lag may result in weak semantic integration. Aims: This study examined the semantic integration ability of people with DS using false-memory tasks, focusing on possible differences in the number of false memories induced by nouns versus verbs. Methods and Procedures: The false-memory task involved two phases. In the study phase, ten word lists of semantically related associates were presented. In the recognition phase, participants judged whether the words presented had been heard before. Three types of words were tested: previously presented associates, semantically related lures, and semantically unrelated new words. Outcomes and Results: People with DS showed the lowest accuracy overall among groups across the tested word types. In processing lures, they recognized words less accurately than mental-age-matched (MA) controls. In processing unrelated words, they responded least accurately among the groups. In processing associates, they showed recognition rates similar to the MA controls but were less accurate than chronological-age-matched (CA) controls. No difference between nouns and verbs emerged in recognition among groups, though college students responded faster to nouns than to verbs. Further topic-wise comparisons of errors across syntactic categories revealed group differences on specific concepts, suggesting that people with DS have an atypical semantic organization. Conclusions and Implications: People with DS showed mixed patterns of semantic integration in false-memory tasks, with delayed responses to associates and deviant responses to lures and unrelated words. They also showed distinct patterns in processing nouns and verbs in the topic-wise comparisons, suggesting that they form false memories differently across syntactic categories. We conclude that people with DS develop a deviant semantic structure, hence showing problems in language and social cognition. Category-based rehabilitation is suggested for people with DS to improve their semantic knowledge through lexical connections.
19 pages, 26836 KiB  
Article
Single-Species Leaf Detection against Complex Backgrounds with YOLOv5s
by Ziyi Wang, Xiyou Su and Shiwei Mao
Forests 2024, 15(6), 894; https://doi.org/10.3390/f15060894 - 21 May 2024
Viewed by 345
Abstract
Accurate and rapid localization and identification of tree leaves are of significant importance for urban forest planning and environmental protection. Existing object detection neural networks are complex and often large, which hinders their deployment on mobile devices and compromises their efficiency in detecting plant leaves, especially against complex backgrounds. To address this issue, we collected eight common types of tree leaves against complex urban backgrounds to create a single-species leaf dataset. Each image in this dataset contains only one tree species but may include multiple leaves. These leaves share similar shapes and textures, and their colors resemble various real-world backgrounds, making them difficult to distinguish and accurately identify and thereby challenging the model's precision in localization and recognition. We propose a lightweight single-species leaf detection model, SinL-YOLOv5, which is only 15.7 MB. First, we integrated an SE module into the backbone to adaptively adjust the channel weights of feature maps, enhancing the expression of critical features such as leaf contours and textures. Then, we developed an adaptive weighted bi-directional feature pyramid network, SE-BiFPN, utilizing the SE module within the backbone; this improves information transfer between the network's deep semantic features and shallow contour-texture features, accelerating detection and improving accuracy. Finally, to enhance model stability during learning, we introduced an angle-cost-based bounding box regression loss function (SIoU), which incorporates directional information between ground-truth and predicted boxes, allowing more effective learning of the position and size of leaf edges and enhancing the model's accuracy in locating leaves. We validated the improved model on the single-species leaf dataset. Compared to YOLOv5s, SinL-YOLOv5 achieved an increase of nearly 4.7 percentage points in mAP@0.5 and processed an additional 20 frames per second, significantly improving both the accuracy and speed of localization and recognition. With this improved model, we achieved accurate and rapid detection of eight common types of single-species tree leaves against complex urban backgrounds, providing technical support for urban forest surveys, urban forestry planning, and urban environmental conservation.
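The SE module this abstract builds on is the standard squeeze-and-excitation block; a minimal PyTorch sketch (the reduction ratio is an assumed default, not taken from the paper):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: reweight channels by globally pooled statistics."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: (B, C, 1, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),  # excitation: per-channel gate in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # scale each channel by its learned importance
```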
(This article belongs to the Special Issue Computer Application and Deep Learning in Forestry)
28 pages, 2121 KiB  
Article
Task-Adaptive Multi-Source Representations for Few-Shot Image Recognition
by Ge Liu, Zhongqiang Zhang and Xiangzhong Fang
Information 2024, 15(6), 293; https://doi.org/10.3390/info15060293 - 21 May 2024
Viewed by 373
Abstract
Conventional few-shot learning (FSL) mainly focuses on knowledge transfer from a single source dataset to a recognition scenario that has only a few training samples available but is still similar to the source domain. In this paper, we consider a more practical FSL setting in which multiple semantically different datasets are available to address a wide range of FSL tasks, especially recognition scenarios beyond natural images, such as remote sensing and medical imagery; we refer to this as multi-source cross-domain FSL. To tackle the problem, we propose a two-stage learning scheme, termed learning and adapting multi-source representations (LAMR). In the first stage, we propose a multi-head network to obtain efficient multi-domain representations, where all source domains share the same backbone except for the last parallel projection layers, which provide domain specialization. We train the representations in a multi-task setting where each in-domain classification task is handled by a cosine classifier. In the second stage, considering that instance discrimination and class discrimination are crucial for robust recognition, we propose two contrastive objectives for adapting the pre-trained representations to be task-specialized on the few-shot data. Careful ablation studies verify that LAMR significantly improves representation transferability, showing consistent performance boosts. We also extend LAMR to single-source FSL by introducing a dataset-splitting strategy that equally splits one source dataset into sub-domains. The empirical results show that LAMR achieves state-of-the-art performance on the BSCD-FSL benchmark and competitive performance on mini-ImageNet, highlighting its versatility and effectiveness for FSL on both natural and domain-specific imagery.
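The cosine classifier named above is a common FSL component; a minimal PyTorch sketch, assuming a fixed temperature-style scale (the paper's exact scale and initialization are not given in the abstract):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """Classify by scaled cosine similarity between embeddings and class weights."""
    def __init__(self, feat_dim, num_classes, scale=10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale  # temperature; sharpens the softmax over similarities

    def forward(self, x):
        # Normalize features and class prototypes to unit length, so logits
        # depend only on angle, not magnitude.
        x = F.normalize(x, dim=-1)
        w = F.normalize(self.weight, dim=-1)
        return self.scale * x @ w.t()  # (B, num_classes) cosine logits
```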
(This article belongs to the Special Issue Few-Shot Learning for Knowledge Engineering and Intellectual System)
25 pages, 6796 KiB  
Article
RQ-OSPTrans: A Semantic Classification Method Based on Transformer That Combines Overall Semantic Perception and “Repeated Questioning” Learning Mechanism
by Yuanjun Tan, Quanling Liu, Tingting Liu, Hai Liu, Shengming Wang and Zengzhao Chen
Appl. Sci. 2024, 14(10), 4259; https://doi.org/10.3390/app14104259 - 17 May 2024
Viewed by 327
Abstract
Pre-trained language models based on Transformers possess exceptional general text-understanding capabilities, empowering them to manage a variety of tasks adeptly. However, their topic classification ability is seriously affected by long colloquial texts, expressions that are semantically similar yet worded completely differently, and text errors introduced by speech recognition. We propose a long-text topic classification method called RQ-OSPTrans to address these challenges effectively. To this end, two parallel modules learn the long texts: a repeat question module and an overall semantic perception module. The overall semantic perception module applies average pooling to the semantic embeddings produced by BERT, followed by multi-layer perceptron learning. The repeat question module learns the text-embedding matrix, extracting detailed clues for classification with words as the fundamental elements. Comprehensive experiments demonstrate that RQ-OSPTrans achieves a generalization performance of 98.5% on the Chinese dataset THUCNews, state-of-the-art performance on the arXiv-10 dataset (84.4%), and performance comparable with other state-of-the-art pre-trained models on the AG's News dataset. Finally, validating RQ-OSPTrans in a specific task scenario on our custom-built dataset CCIPC indicates that our method outperforms the baseline methods on small-scale domain-specific datasets.
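As a rough sketch of the overall semantic perception module described above (average pooling over BERT embeddings followed by an MLP), assuming hypothetical head sizes and a generic Chinese BERT checkpoint, neither of which is specified in the abstract:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class OverallSemanticPerception(nn.Module):
    """Average-pool BERT token embeddings, then classify with an MLP."""
    def __init__(self, num_classes, hidden=256, bert_name="bert-base-chinese"):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        dim = self.bert.config.hidden_size
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, num_classes))

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state
        # Masked mean pooling: average only over real (non-padding) tokens.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (out * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return self.mlp(pooled)
```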
19 pages, 7263 KiB  
Article
SCFNet: Lightweight Steel Defect Detection Network Based on Spatial Channel Reorganization and Weighted Jump Fusion
by Hongli Li, Zhiqi Yi, Liye Mei, Jia Duan, Kaimin Sun, Mengcheng Li, Wei Yang and Ying Wang
Processes 2024, 12(5), 931; https://doi.org/10.3390/pr12050931 - 2 May 2024
Viewed by 799
Abstract
The goal of steel defect detection is to enhance recognition accuracy and accelerate detection speed with fewer parameters. However, steel sample detection is challenging because of feature ambiguity, low contrast, and similarity among inter-class features. Moreover, limited computing capability makes it difficult for small and medium-sized enterprises to deploy and utilize such networks effectively. We therefore propose a novel lightweight steel defect detection network (SCFNet) based on spatial channel reconstruction and deep feature fusion. The network adopts a lightweight and efficient feature extraction module (LEM) for multi-scale feature extraction, enhancing its ability to extract blurry features. Simultaneously, we adopt spatial and channel reconstruction convolution (ScConv) to reconstruct the spatial and channel features of the feature maps, enhancing the spatial localization and semantic representation of defects. Additionally, we adopt the weighted bidirectional feature pyramid network (BiFPN) for defect feature fusion, enhancing the model's ability to detect low-contrast defects. Finally, we discuss the impact of different data augmentation methods on model accuracy. Extensive experiments on the NEU-DET dataset yield a final model achieving an mAP of 81.2% while requiring only 2.01 M parameters and 5.9 GFLOPs of computation. Compared to state-of-the-art object detection algorithms, our approach achieves higher detection accuracy with fewer computational resources, effectively balancing model size and detection accuracy.
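The weighted BiFPN fusion mentioned above is typically implemented as "fast normalized fusion"; a minimal, generic PyTorch sketch (not the paper's exact code):

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """BiFPN-style fast normalized fusion of same-shape feature maps."""
    def __init__(self, num_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_inputs))  # learnable per-input weights
        self.eps = eps

    def forward(self, feats):
        # ReLU keeps the weights non-negative; normalization makes them sum to 1,
        # so the fused map is a learned convex combination of its inputs.
        w = torch.relu(self.w)
        w = w / (w.sum() + self.eps)
        return sum(wi * f for wi, f in zip(w, feats))
```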
18 pages, 2271 KiB  
Article
Document Retrieval System for Biomedical Question Answering
by Harun Bolat and Baha Şen
Appl. Sci. 2024, 14(6), 2613; https://doi.org/10.3390/app14062613 - 20 Mar 2024
Viewed by 678
Abstract
In this paper, we describe our biomedical document retrieval system and answer extraction module, which are part of a biomedical question answering system. Approximately 26.5 million PubMed articles are indexed as a corpus with the Apache Lucene text search engine. Our proposed system consists of three parts. The first is the question analysis module, which analyzes the question and enriches it with biomedical concepts related to its wording. The second is the document retrieval module, in which the system is tested with different information retrieval models, such as the Vector Space Model, Okapi BM25, and Query Likelihood. The third is the document re-ranking module, which re-arranges the documents retrieved in the previous step. We tested the proposed system on the training questions of BioASQ challenge Task 6B. We obtained the best MAP score in the document retrieval phase when using Query Likelihood with Dirichlet smoothing. We used the sequential dependence model in the re-ranking phase, but it produced a worse MAP score than the previous phase. In the similarity calculation, we included Named Entity Recognition (NER), UMLS Concept Unique Identifiers (CUIs), and UMLS Semantic Types of the words in the question to find the sentences containing the answer. With this approach, we observed a performance enhancement of roughly 25% for the top 20 outcomes over the other method employed in this study, which relies solely on textual similarity.
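The retrieval model that scored best here, Query Likelihood with Dirichlet smoothing, has a compact closed form; a toy sketch with invented documents and collection statistics:

```python
import math
from collections import Counter

def ql_dirichlet_score(query_terms, doc_terms, collection_prob, mu=2000.0):
    """Log query-likelihood of a document under Dirichlet smoothing:
    p(w|d) = (tf(w,d) + mu * p(w|C)) / (|d| + mu)."""
    tf = Counter(doc_terms)
    dlen = len(doc_terms)
    score = 0.0
    for w in query_terms:
        p_wc = collection_prob.get(w, 1e-9)  # background model p(w|C)
        score += math.log((tf[w] + mu * p_wc) / (dlen + mu))
    return score

# Toy usage: rank two "documents" for the query "protein binding".
coll = {"protein": 0.01, "binding": 0.005, "cell": 0.02}
d1 = "protein binding assay protein".split()
d2 = "cell cycle regulation".split()
print(ql_dirichlet_score(["protein", "binding"], d1, coll) >
      ql_dirichlet_score(["protein", "binding"], d2, coll))  # True
```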
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications—2nd Edition)
15 pages, 563 KiB  
Article
Camouflaged Object Detection Based on Deep Learning with Attention-Guided Edge Detection and Multi-Scale Context Fusion
by Yalin Wen, Wei Ke and Hao Sheng
Appl. Sci. 2024, 14(6), 2494; https://doi.org/10.3390/app14062494 - 15 Mar 2024
Viewed by 848
Abstract
In nature, camouflaged objects have features such as colors and textures that closely resemble their backgrounds, creating visual illusions that help them hide and protect themselves from predators. This similarity also makes camouflaged object detection (COD) very challenging. COD methods that rely on deep neural networks are gaining increasing attention; they improve model performance and computational efficiency by extracting edge information and fusing multi-layer features. Our improvement is based on enhancing the efficiency of the encode-decode process. We developed a variant model that combines Swin Transformer (Swin-T) and EfficientNet-B7, integrating the strengths of both and employing an attention-guided tracking module to efficiently extract edge information and identify objects in camouflaged environments. Additionally, we incorporated dense skip links to enhance the aggregation of deep-level feature information. A boundary-aware attention module is incorporated into the final layer of the initial shallow-information recognition phase; it uses the Fourier transform to quickly relay specific edge information from the initially obtained shallow semantics to subsequent stages, thereby more effectively achieving feature recognition and edge extraction. In the later phase, focused on deep semantic extraction, we employ a dense skip joint attention module to improve the decoder's performance and efficiency in capturing precise deep-level information, feature recognition, and edge extraction; this module efficiently identifies the specifics and edge information of undetected camouflaged objects across channels and spatial positions. Unlike previous methods, we introduce an adaptive pixel-strength loss function for handling key captured information. Compared with 26 previously proposed methods on 4 measurement metrics across three current benchmark datasets (CHAMELEON, CAMO, COD10K), our proposed method exhibits strong competitive performance.
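The boundary-aware module's use of the Fourier transform to relay edge information can be approximated by frequency-domain high-pass filtering; a generic, hedged sketch of that idea (the paper's actual module is more elaborate and its filter design is not given in the abstract):

```python
import torch

def fourier_edge_map(feat, keep_ratio=0.1):
    """Extract high-frequency (edge-like) content from a feature map by
    zeroing low frequencies in the 2D Fourier domain."""
    # feat: (B, C, H, W)
    f = torch.fft.fftshift(torch.fft.fft2(feat), dim=(-2, -1))
    B, C, H, W = feat.shape
    cy, cx = H // 2, W // 2
    ry, rx = int(H * keep_ratio), int(W * keep_ratio)
    # Zero out the low-frequency centre; what remains encodes edges/texture.
    f[..., cy - ry:cy + ry, cx - rx:cx + rx] = 0
    return torch.fft.ifft2(torch.fft.ifftshift(f, dim=(-2, -1))).real

edges = fourier_edge_map(torch.randn(1, 3, 64, 64))
```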
(This article belongs to the Special Issue Advances in Image Recognition and Processing Technologies)
23 pages, 640 KiB  
Article
Intent Identification by Semantically Analyzing the Search Query
by Tangina Sultana, Ashis Kumar Mandal, Hasi Saha, Md. Nahid Sultan and Md. Delowar Hossain
Modelling 2024, 5(1), 292-314; https://doi.org/10.3390/modelling5010016 - 22 Feb 2024
Viewed by 753
Abstract
Understanding and analyzing a user's search intent semantically, based on their input query, has emerged as an intriguing challenge in recent years. The task suffers from small-scale human-labeled training data, which produces very poor hypotheses for rare words. The majority of data portals employ keyword-driven search functionality to explore content within their repositories, but keyword-based search cannot identify users' search intent accurately. Integrating a query-understanding framework into keyword search engines has the potential to enhance their performance, bridging the gap in interpreting the user's search intent more effectively. In this study, we propose a novel approach that focuses on spatial and temporal information, phrase detection, and semantic similarity recognition to detect the user's intent from the search query. We use an n-gram probabilistic language model for phrase detection. Furthermore, we propose a probability-aware gated mechanism for RoBERTa (a Robustly Optimized BERT Pretraining Approach) embeddings to detect the user's intent semantically. We analyze and compare the performance of the proposed scheme with existing state-of-the-art schemes, and a detailed case study validates the model's proficiency in semantic analysis, emphasizing its adaptability and potential for real-world applications where nuanced intent understanding is crucial. The experimental results demonstrate that our proposed system significantly improves the accuracy of detecting users' search intent as well as the quality of classification during search.
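The n-gram phrase detection step can be illustrated with a word2vec-style count-based bigram scorer; a hedged sketch, since the abstract does not give the paper's exact scoring function:

```python
from collections import Counter

def detect_phrases(sentences, min_count=5, threshold=10.0):
    """Word2vec-style bigram phrase detection:
    score(a, b) = (count(ab) - min_count) * N / (count(a) * count(b)).
    Bigrams scoring above `threshold` are treated as phrases."""
    unigrams, bigrams = Counter(), Counter()
    for toks in sentences:           # sentences: iterable of token lists
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    total = sum(unigrams.values())
    phrases = set()
    for (a, b), n_ab in bigrams.items():
        score = (n_ab - min_count) * total / (unigrams[a] * unigrams[b])
        if score > threshold:
            phrases.add((a, b))      # e.g. ("new", "york") -> "new_york"
    return phrases
```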
22 pages, 24530 KiB  
Article
Identifying Land Use Functions in Five New First-Tier Cities Based on Multi-Source Big Data
by Wangmin Yang, Yang Ye, Bowei Fan, Shuang Liu and Jingwen Xu
Land 2024, 13(3), 271; https://doi.org/10.3390/land13030271 - 21 Feb 2024
Cited by 1 | Viewed by 789
Abstract
With the continuous development of big data technology, semantically rich multi-source big data provides broader prospects for research on urban land use function recognition. This study relied on POI data and OSM data covering the central urban areas of five new first-tier cities. The TF-IDF algorithm was used to identify the land use functional layout of the study areas, and a confusion matrix was established for accuracy verification. The results show that: (1) a common feature of the five cities is that residential land, commercial service land, public management and service land, and green space and open space land together account for over 90% of parcels by both total number and area; (2) the Kappa coefficients all fell in the range [0.61, 0.80], indicating substantial agreement in the accuracy evaluation; (3) Chengdu and Tianjin have the highest degree of land use function mixing, followed by Xi'an, Nanjing, and Hangzhou; and (4) among the five new first-tier cities, Hangzhou and Nanjing have the most similar land use function structure layouts. This study attempts to reveal the current land use situation of the five cities, providing a reference for urban development planning and management.
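To illustrate the TF-IDF idea applied to land use recognition, a toy sketch that treats each land parcel as a "document" whose "words" are the POI categories inside it, then labels the parcel by its dominant category; the parcel contents here are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

parcels = {
    "parcel_A": "restaurant restaurant mall hotel restaurant",
    "parcel_B": "residence residence school residence clinic",
    "parcel_C": "park park museum park",
}
vec = TfidfVectorizer()
X = vec.fit_transform(parcels.values())     # rows: parcels, cols: POI categories
terms = vec.get_feature_names_out()
for name, row in zip(parcels, X.toarray()):
    print(name, "->", terms[row.argmax()])  # highest-TF-IDF functional category
```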
(This article belongs to the Special Issue Planning for Sustainable Urban and Land Development)
18 pages, 4329 KiB  
Article
Advancing Human Motion Recognition with SkeletonCLIP++: Weighted Video Feature Integration and Enhanced Contrastive Sample Discrimination
by Lin Yuan, Zhen He, Qiang Wang and Leiyang Xu
Sensors 2024, 24(4), 1189; https://doi.org/10.3390/s24041189 - 11 Feb 2024
Viewed by 682
Abstract
This paper introduces SkeletonCLIP++, an extension of our prior work in human action recognition that emphasizes the use of semantic information beyond traditional label-based methods. The first innovation, Weighted Frame Integration (WFI), shifts video feature computation from simple averaging to a weighted frame approach, enabling a more nuanced representation of human movements in line with semantic relevance. Another key development, Contrastive Sample Identification (CSI), introduces a novel discriminative task within the model: identifying the most similar negative sample among positive ones, which enhances the model's ability to distinguish between closely related actions. Finally, BERT Text Encoder Integration (BTEI) leverages the pre-trained BERT model as our text encoder to refine performance. Empirical evaluations on the HMDB-51, UCF-101, and NTU RGB+D 60 datasets show consistent improvements, especially on smaller datasets. SkeletonCLIP++ thus offers a refined approach to human action recognition, ensuring semantic integrity and detailed differentiation in video data analysis.
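The Weighted Frame Integration idea (weighting frames by semantic relevance rather than averaging uniformly) might look roughly like the following sketch; the similarity-based weighting and the temperature are assumptions, not the paper's published design:

```python
import torch
import torch.nn.functional as F

def weighted_frame_integration(frame_feats, text_feat, temperature=0.07):
    """Pool per-frame features into one video feature, weighting each frame
    by its cosine similarity to a text embedding."""
    # frame_feats: (T, D) per-frame embeddings; text_feat: (D,)
    f = F.normalize(frame_feats, dim=-1)
    t = F.normalize(text_feat, dim=-1)
    weights = F.softmax(f @ t / temperature, dim=0)           # (T,) relevance
    return (weights.unsqueeze(-1) * frame_feats).sum(dim=0)   # (D,) video feature

video_feat = weighted_frame_integration(torch.randn(16, 512), torch.randn(512))
```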
(This article belongs to the Special Issue Smart Sensing Technology for Human Activity Recognition)
16 pages, 34884 KiB  
Article
ControlFace: Feature Disentangling for Controllable Face Swapping
by Xuehai Zhang, Wenbo Zhou, Kunlin Liu, Hao Tang, Zhenyu Zhang, Weiming Zhang and Nenghai Yu
J. Imaging 2024, 10(1), 21; https://doi.org/10.3390/jimaging10010021 - 11 Jan 2024
Viewed by 1612
Abstract
Face swapping is an intriguing and intricate task in the field of computer vision. Most mainstream face swapping methods currently employ face recognition models to extract identity features and inject them into the generation process. Nonetheless, such methods often struggle to transfer identity information effectively, so the generated results fail to achieve high identity similarity to the source face. Furthermore, if identity information can be accurately disentangled, face swapping becomes controllable, offering users more choices. In pursuit of this goal, we propose ControlFace, a new face swapping framework based on the disentanglement of identity information. We disentangle the structure and texture of the source face, encoding and characterizing each as separate feature embeddings. According to the semantic level of each feature representation, we inject them into the corresponding feature mapper and fuse them in the latent space of StyleGAN. Owing to this disentanglement of structure and texture, we can controllably transfer parts of the identity features. Extensive experiments and comparisons with state-of-the-art face swapping methods demonstrate the superiority of our framework in transferring identity information, producing high-quality face images, and enabling controllable face swapping.
(This article belongs to the Section Image and Video Processing)
18 pages, 11160 KiB  
Article
mid-DeepLabv3+: A Novel Approach for Image Semantic Segmentation Applied to African Food Dietary Assessments
by Thierry Roland Baban A Erep and Lotfi Chaari
Sensors 2024, 24(1), 209; https://doi.org/10.3390/s24010209 - 29 Dec 2023
Viewed by 867
Abstract
Recent decades have witnessed the development of vision-based dietary assessment (VBDA) systems, which generally consist of three main stages: food image analysis, portion estimation, and nutrient derivation. The effectiveness of the first stage depends heavily on accurate segmentation and image recognition models and on the availability of high-quality training datasets. Food image segmentation still faces various challenges, and most existing research focuses mainly on Asian and Western food images. This study is therefore based on food images from sub-Saharan Africa, which pose their own problems, such as inter-class similarity and dishes with mixed-class food. This work focuses on the first stage of VBDA and introduces two notable contributions. First, we propose mid-DeepLabv3+, an enhanced food image segmentation model based on DeepLabv3+ with a ResNet50 backbone; our approach adds a middle layer in the decoder path and a SimAM block after each extracted backbone feature layer. Second, we present CamerFood10, the first food image dataset specifically designed for sub-Saharan African food segmentation, covering 10 classes of the most-consumed food items in Cameroon. On our dataset, mid-DeepLabv3+ outperforms benchmark convolutional neural network models for semantic image segmentation, with an mIoU (mean Intersection over Union) of 65.20%, a +10.74% improvement over DeepLabv3+ with the same backbone.
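The SimAM block added after each backbone feature layer is parameter-free and compact; a minimal PyTorch sketch of the standard formulation (the regularizer e_lambda is an assumed default):

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free attention: weight each activation by an energy term
    derived from how much it deviates from its channel mean."""
    def __init__(self, e_lambda=1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x):
        # x: (B, C, H, W)
        n = x.shape[2] * x.shape[3] - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)  # squared deviation
        v = d.sum(dim=(2, 3), keepdim=True) / n            # per-channel variance
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5        # inverse energy
        return x * torch.sigmoid(e_inv)
```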
20 pages, 1805 KiB  
Article
Chinese Fine-Grained Named Entity Recognition Based on BILTAR and GlobalPointer Modules
by Weijun Li, Jintong Liu, Yuxiao Gao, Xinyong Zhang and Jianlai Gu
Appl. Sci. 2023, 13(23), 12845; https://doi.org/10.3390/app132312845 - 30 Nov 2023
Viewed by 692
Abstract
The task of fine-grained named entity recognition (NER) is to locate entities in text and classify them into predefined fine-grained categories. At present, Chinese fine-grained NER methods only use a pretrained language model to encode the characters in a sentence and lack the ability to extract deep semantic, sequence, and position information. The sequence annotation method is character-based and does not handle entity boundaries, and fine-grained entity categories are highly similar, making them difficult to distinguish. To solve these problems, this paper constructs the BILTAR deep semantic extraction module and adds a GlobalPointer module to improve the accuracy of Chinese fine-grained named entity recognition. The BILTAR module extracts deep semantic features from the encodings of the pretrained language model, using higher-quality features to improve model performance. In the GlobalPointer module, the model first adds rotary position encoding information to the feature vectors, using position information to achieve data enhancement. Finally, the GlobalPointer module considers all possible entity boundaries and computes a score for each candidate boundary in every category, which improves the accuracy of entity recognition. In experiments on CLUENER 2020 and the micro Chinese fine-grained NER dataset, the F1 scores of the proposed model reached 80.848% and 75.751%, respectively. In ablation experiments, the proposed method outperformed the strongest baseline model and improved the performance of the basic model.
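The rotary position encoding added before GlobalPointer scoring rotates each consecutive pair of feature dimensions by a position-dependent angle, so dot products between positions encode their relative distance; a generic NumPy sketch of the mechanism (not the paper's exact implementation):

```python
import numpy as np

def apply_rope(x):
    """Rotary position embedding: x has shape (seq_len, dim), dim even."""
    seq_len, dim = x.shape
    pos = np.arange(seq_len)[:, None]                    # (seq_len, 1)
    freqs = 10000.0 ** (-np.arange(0, dim, 2) / dim)     # (dim/2,)
    ang = pos * freqs                                    # (seq_len, dim/2)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                   # 2D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

q = apply_rope(np.random.randn(128, 64))
```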
(This article belongs to the Section Computing and Artificial Intelligence)
12 pages, 1982 KiB  
Article
Development of Bleeding Artificial Intelligence Detector (BLAIR) System for Robotic Radical Prostatectomy
by Enrico Checcucci, Pietro Piazzolla, Giorgia Marullo, Chiara Innocente, Federico Salerno, Luca Ulrich, Sandro Moos, Alberto Quarà, Gabriele Volpi, Daniele Amparore, Federico Piramide, Alexandru Turcan, Valentina Garzena, Davide Garino, Sabrina De Cillis, Michele Sica, Paolo Verri, Alberto Piana, Lorenzo Castellino, Stefano Alba, Michele Di Dio, Cristian Fiori, Eugenio Alladio, Enrico Vezzetti and Francesco Porpiglia
J. Clin. Med. 2023, 12(23), 7355; https://doi.org/10.3390/jcm12237355 - 28 Nov 2023
Cited by 4 | Viewed by 1083
Abstract
Background: Addressing intraoperative bleeding remains a significant challenge in the field of robotic surgery. This research develops a solution utilizing convolutional neural networks (CNNs): a system capable of forecasting instances of intraoperative bleeding during robot-assisted radical prostatectomy (RARP) and promptly notifying the surgeon about bleeding risks. Methods: A multi-task learning (MTL) CNN was introduced, leveraging a modified version of the U-Net architecture, to categorize video input as either "absence of blood accumulation" (0) or "presence of blood accumulation" (1). To facilitate seamless interaction with the neural networks, the Bleeding Artificial Intelligence-based Detector (BLAIR) software was created using the Python Keras API and built upon the PyQt framework. A subsequent clinical assessment compared BLAIR's bleeding identification performance against that of a urologist, and various perioperative variables were gathered. For optimal MTL-CNN training parameterization, a multi-task loss function was adopted to enhance the accuracy of event detection by taking advantage of the semantic segmentation of surgical tools. Additionally, the Multiple Correspondence Analysis (MCA) approach was employed to assess software performance. Results: The MTL-CNN achieved an event recognition accuracy of 90.63%. When evaluating BLAIR's predictive ability and its capacity to pre-warn surgeons of potential bleeding incidents, the density plot highlighted a striking similarity between BLAIR and human assessments; in fact, BLAIR exhibited a faster response. Notably, the MCA analysis revealed no discernible distinction between the software and human performance in accurately identifying instances of bleeding. Conclusion: The BLAIR software achieved over 90% accuracy in predicting bleeding events during RARP, underscoring the potential of AI to assist surgeons during interventions and exemplifying the positive impact AI applications can have on surgical procedures.
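Since BLAIR was built with the Keras API, the multi-task loss that couples bleeding classification with the auxiliary tool-segmentation task might be sketched as below; the weighting factor and head designs are assumptions, not the paper's published code:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()             # bleeding: 0/1 label
seg_ce = tf.keras.losses.SparseCategoricalCrossentropy()  # tool segmentation masks

def multi_task_loss(y_cls, p_cls, y_seg, p_seg, alpha=0.5):
    """Weighted sum of the classification loss and an auxiliary
    segmentation loss; alpha balances the two tasks."""
    return bce(y_cls, p_cls) + alpha * seg_ce(y_seg, p_seg)
```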
(This article belongs to the Section Nephrology & Urology)