Svoboda | Graniru | BBC Russia | Golosameriki | Facebook
Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Few-Shot Image Classification of Crop Diseases Based on Vision-Language Models

Version 1 : Received: 19 June 2024 / Approved: 20 June 2024 / Online: 20 June 2024 (15:08:05 CEST)

How to cite: Zhou, Y.; Yan, H.; Ding, K.; Cai, T.; Zhang, Y. Few-Shot Image Classification of Crop Diseases Based on Vision-Language Models. Preprints 2024, 2024061456. https://doi.org/10.20944/preprints202406.1456.v1 Zhou, Y.; Yan, H.; Ding, K.; Cai, T.; Zhang, Y. Few-Shot Image Classification of Crop Diseases Based on Vision-Language Models. Preprints 2024, 2024061456. https://doi.org/10.20944/preprints202406.1456.v1

Abstract

Accurate crop diseases classification is crucial for ensuring food security and enhancing agricultural productivity. However, existing crop disease classification algorithms primarily focus on a single image modality and typically require a large number of samples. Our research counters these issues by using pre-trained Vision-Language Models (VLMs), which enhances multimodal synergy for better crop disease classification than traditional unimodal approaches. Firstly, we apply the multimodal model Qwen-VL to generate meticulous textual descriptions for representative disease images selected through clustering from the training set, which will serve as prompt text for generating classifier weights. Compared to using solely the language model for prompt text generation, this approach better captures and conveys fine-grained and image-specific information, thereby enhancing prompt quality. Secondly, we integrate the cross-attention and the SE (Squeeze-and-Excitation) attention into the training-free mode VLCD (Vision-Language model for Crop Diseases classification) and the training-required mode VLCD-T (VLCD-Training) respectively for prompt text processing, enhancing classifier weights by emphasizing key text features. Experimental outcomes conclusively prove our method’s heightened classification effectiveness in few-shot crop disease scenarios, tackling data limitations and intricate disease recognition issues. It offers a pragmatic tool for agricultural pathology and reinforces the smart farming surveillance infrastructure.

Keywords

Few-Shot Learning; Crop Disease Classification; Vision-Language Models; Attention Mechanisms

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning

Comments (0)

We encourage comments and feedback from a broad range of readers. See criteria for comments and our Diversity statement.

Leave a public comment
Send a private comment to the author(s)
* All users must log in before leaving a comment
Views 0
Downloads 0
Comments 0


×
Alerts
Notify me about updates to this article or when a peer-reviewed version is published.
We use cookies on our website to ensure you get the best experience.
Read more about our cookies here.