Convolutional Neural Network Technique for Distinguishing Nine Varieties of Vegetable Crops

Young-Yeol Cho; Ki Young Choi; Bo Hyun Sung; Nayoung Kwak

doi:10.22698/jales.20240011

Research Article

Journal of Agricultural, Life and Environmental Sciences. 30 June 2024. 123-131
https://doi.org/10.22698/jales.20240011

Young-Yeol Cho¹,²*

Ki Young Choi³

Bo Hyun Sung⁴

Nayoung Kwak⁴

¹Professor, Horticultural Science Major, Jeju National University, Jeju 63243, Korea

²Researcher, Research Institute for Subtropical Agriculture and Animal Biotechnology, SARI, Jeju National University, Jeju, 63243, Korea

³Associate Professor, Department of Smart farm and Agricultural industry, Kangwon National University, Chunchon, 24341, Korea

⁴Graduate student, Department of Horticultural Science, Jeju National University, Jeju, 63243, Korea

*Corresponding Author

License (open-access, https://creativecommons.org/licenses/by-nc/4.0/):

This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

In agriculture, research on neural network models for image classification is needed to accurately categorize crops by type and health condition, distinguish them from other species, and thereby minimize crop losses. This study compared multiple neural network models to select the optimal model for classifying images of the seedlings of nine vegetables, namely carrot, Kimchi cabbage, kohlrabi, lettuce, mallow, mustard, pak-choi, spinach, and sweet pepper. The best model was selected based on its accuracy (precision, recall, and F1 score) from eight trained models, namely DenseNet201, InceptionResNetV2, InceptionV3, MobileNetV2, ResNet152V2, VGG16, VGG19, and Xception. The models were trained on a 9-class dataset for 20 epochs with a batch size of 32, the Adam optimizer, and a learning rate of 0.001. The DenseNet201 model exhibited the highest accuracy and was, therefore, selected as the optimal model. With a batch size of 128, the Adam optimizer, and a learning rate of 0.001, this model exhibited high precision, recall, and F1 score, and its superiority was confirmed using a confusion matrix. The recognition performance of the DenseNet201 model is expected to improve further by using images of more plant species, exploring more networks, and optimizing the hyperparameters.

Keywords

Confusion matrix

Convolutional neural network

DenseNet201

Image classification

Vegetables

Introduction

Automatic classification of agricultural products has become possible owing to recent advances in digital image processing technology. Visual inspection is subjective and influenced by the observer’s psychological state, so more accurate and objective image processing is needed. Research on neural network models for image classification in agriculture has been increasing and is expected to support the efficient production and management of agricultural products (Elnemr, 2019; Han et al., 2020; Kalaivani et al., 2022; Milioto et al., 2018; Moon et al., 2020; Purwaningsih et al., 2018; Samiei et al., 2020; Sandeep Kumar et al., 2018; Yang and Xu, 2021). Seedling classification has become extremely important in precision agriculture. Accurate classification is crucial because seedlings compete with plants of the same or other species for nutrients, water, and sunlight. In particular, identifying weeds among crops and removing them early can improve crop productivity and quality (Elnemr, 2019; Milioto et al., 2018; Sandeep Kumar et al., 2018).

Among the neural network approaches that have been applied to removing unwanted vegetation, Convolutional Neural Networks (CNNs) are widely used as an efficient method. CNNs extract features from data to identify patterns and are commonly used for analyzing visual images. The network consists of multiple layers, such as convolutional, pooling, and fully connected layers. Convolutional layers use filters to perform convolution operations on the input images and extract specific features; pooling layers then reduce the size of the feature maps while maintaining their important features; and fully connected layers combine the extracted features to classify the image. CNNs extract features from input images through repeated convolution and pooling operations. A convolution operation slides a filter of a specified size over the input image at regular intervals, multiplies the filter by the underlying image region, and sums the results. This process extracts the features that occur in specific parts of the input image. Pooling operations reduce the image size, which lowers the computational cost and provides invariance to small changes in the location of features. The extracted features are then connected through fully connected layers to obtain the final classification result. This staged feature extraction makes the structure highly effective for image recognition. This technology has been used for classifying weeds (Elnemr, 2019; Milioto et al., 2018; Sandeep Kumar et al., 2018), classifying vegetable and flower seedlings (Xiao et al., 2019), estimating the fresh weight of sweet pepper (Moon et al., 2020), classifying tomato diseases (Han et al., 2020) and strawberry pests and diseases (Choi et al., 2022), classifying plant seedling images (Kalaivani et al., 2022), and estimating the number of days after transplanting for lettuce (Baek et al., 2023).

In the early stages of seedling growth, image classification based on object recognition can significantly impact seedling development. Therefore, early classification during growth is crucial. Classifying crops can be challenging, especially when they grow together. In this study, a seedling classification model was established using transfer learning based on convolutional neural networks. Seedling samples were collected, and the established model was trained and tested.
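
The convolution and pooling operations described in the Introduction can be illustrated with a minimal sketch in plain Python. The 5 × 5 image, the 2 × 2 edge filter, and the choice of max pooling are assumptions made only for illustration; they are not taken from the models used in this study.

```python
def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image (no padding), multiply element-wise, and sum."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for a in range(kh):
                for b in range(kw):
                    acc += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out


def max_pool2d(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            row.append(max(fmap[i + a][j + b]
                           for a in range(size) for b in range(size)))
        out.append(row)
    return out


# Hypothetical image: a bright vertical stripe on a dark background.
image = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
]
# Hypothetical filter that responds to dark-to-bright edges (left to right).
edge_kernel = [[-1, 1], [-1, 1]]

features = conv2d(image, edge_kernel)  # 4 x 4 feature map
pooled = max_pool2d(features)          # 2 x 2 map after pooling
```

The feature map is positive where the stripe begins and negative where it ends, and pooling halves each spatial dimension while keeping the strongest response in each window.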

Precision, recall, and F1 score are metrics used to evaluate the performance of classification models (Kumaratenna and Cho, 2024). Precision measures how accurate the positive predictions of the model are, recall measures the proportion of actual positive observations correctly identified by the model, and F1 score is the harmonic mean of precision and recall, which provides a balanced overall evaluation of the model performance. Therefore, we selected an optimal neural network model for the classification of nine vegetable seedling images and evaluated its precision, recall, and F1 score.

Materials and Methods

Plant materials

Nine vegetable species were used for image classification: carrot (Dream7, Danong Co., Korea), Kimchi cabbage (Asia Mini F₁, Asiaseed Co., Korea), kohlrabi (Greenkohl, Asiaseed, Korea), lettuce (Summer Cheongchukmyeon, Jeliseed Co., Korea), mallow (Asia Chi Ma, Asiaseed Co., Korea), mustard (Jeil Cheong, Jeliseed Co., Korea), pak-choi (Jeil, Jeliseed Co., Korea), spinach (Susilo, Asiaseed Co., Korea), and sweet pepper (Volidano, Enxa Zaden, Australia). A total of 868 plant images were obtained using mobile phones (iPhone 12 Pro, SE2, and 8, Apple, USA; Galaxy S22, Samsung, Korea) from the time the vegetable leaves appeared. The images were classified into nine labels, as shown in Fig. 1.

https://cdn.apub.kr/journalsite/sites/ales/2024-036-02/N0250360206/images/ales_36_02_06_F1.jpg

Fig. 1.

Samples of captured images of the nine classes of vegetables to test the convolutional neural network models.

Artificial Intelligence Model

Eight artificial intelligence models were used: DenseNet201, InceptionResNetV2, InceptionV3, MobileNetV2, ResNet152V2, VGG16, VGG19, and Xception. All the models were trained using a transfer learning approach with a 224 × 224 pixel input resolution. Data augmentation techniques were used to enhance model training; these transformation methods significantly expanded the dataset, thereby improving the models’ performance. To select the best model, model accuracy was used as a criterion to minimize the root mean squared error (RMSE) between the estimated and actual values. Because the highest accuracy was obtained with 20 epochs, a batch size of 32, the Adam optimizer, and a learning rate of 0.001, all models were analyzed under these same hyperparameters. The code for this experiment was written in Python (ver. 3.8.8) using libraries such as TensorFlow, NumPy, pandas, and Keras.
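
As an illustration of the transfer learning setup described above, the following is a minimal sketch using the Keras API bundled with TensorFlow. The hyperparameter values come from the text; everything else is an assumption, because the article does not describe the classification head, which layers were frozen, or the label encoding. The names `HYPERPARAMS`, `steps_per_epoch`, and `build_classifier` are hypothetical.

```python
import math

# Hyperparameters as reported in the text.
HYPERPARAMS = {
    "num_classes": 9,
    "epochs": 20,
    "batch_size": 32,
    "learning_rate": 0.001,
    "input_shape": (224, 224, 3),
}


def steps_per_epoch(num_images, batch_size):
    """Number of mini-batches needed to pass over every image once."""
    return math.ceil(num_images / batch_size)


def build_classifier(params=HYPERPARAMS):
    """ImageNet-pretrained DenseNet201 backbone with a new softmax head (assumed design)."""
    # Imported here so the settings above can be inspected without TensorFlow installed.
    import tensorflow as tf

    base = tf.keras.applications.DenseNet201(
        include_top=False,            # drop the original 1000-class ImageNet head
        weights="imagenet",
        input_shape=params["input_shape"],
        pooling="avg",                # global average pooling over the last feature map
    )
    base.trainable = False            # assumption: only the new head is trained

    outputs = tf.keras.layers.Dense(
        params["num_classes"], activation="softmax")(base.output)
    model = tf.keras.Model(inputs=base.input, outputs=outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=params["learning_rate"]),
        loss="sparse_categorical_crossentropy",  # assumes integer class labels
        metrics=["accuracy"],
    )
    return model
```

If all 868 images were passed through the network once per epoch, `steps_per_epoch(868, 32)` would give 28 updates per epoch and `steps_per_epoch(868, 128)` would give 7; the article does not state the training and validation split, so these are upper bounds.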

Performance metrics

In this study, three performance evaluation metrics, namely precision, recall, and F1 score, were used to select the optimal model and assess its performance. The formulas are given in Equations (1), (2), and (3), respectively.

(1)

\text{Precision} = \frac{TP}{TP + FP}

(2)

\text{Recall} = \frac{TP}{TP + FN}

(3)

\text{F1 score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

Precision is the proportion of instances predicted as positive that are truly positive. Recall is the proportion of actual positive instances that are correctly predicted as positive. F1 score considers both precision and recall to measure the overall performance of the model. True positive (TP) represents cases in which the true answer is predicted as true; false positive (FP) represents cases in which the false answer is predicted as true; false negative (FN) represents cases in which the true answer is predicted as false; and true negative (TN) represents cases in which the false answer is predicted as false.
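
The three metrics can be computed directly from the TP, FP, and FN counts. The sketch below uses hypothetical counts for a single class; they are not results from this study.

```python
def precision(tp, fp):
    """Share of positive predictions that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts: 9 correct detections, 1 false alarm, 3 misses.
tp, fp, fn = 9, 1, 3
p = precision(tp, fp)  # 9 / 10
r = recall(tp, fn)     # 9 / 12
f1 = f1_score(p, r)    # equivalently 2*TP / (2*TP + FP + FN) = 18 / 22
```

Because the F1 score is a harmonic mean, it lies between the two inputs and is pulled toward the lower one, so a model cannot reach a high F1 score by excelling at only one of precision or recall.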

Results and Discussion

In this study, 868 vegetable images were classified using eight different artificial intelligence models. The DenseNet201, InceptionV3, MobileNetV2, ResNet152V2, VGG16, and Xception models exhibited high accuracies of over 80%. Among them, the DenseNet201 model showed the best learning results with an accuracy of 90.5% (Table 1). These results were obtained under the same conditions (epochs, batch size, optimizer, and learning rate) for all models.

Table 1.

Analysis of classification accuracy of eight convolutional neural network models

Model	Accuracy (%)	Model	Accuracy (%)
DenseNet201	90.5^z	ResNet152V2	82.5
InceptionResNetV2	78.3	VGG16	80.8
InceptionV3	82.3	VGG19	67.6
MobileNetV2	85.0	Xception	80.9

^z The hyperparameters used for training were 9 classes, 20 epochs, a batch size of 32, the Adam optimizer, and a learning rate of 0.001.
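
The model selection step can be reproduced from the values in Table 1 with a few lines of Python. The variable names are hypothetical; the numbers are copied from the table.

```python
# Scores (%) from Table 1, all obtained under the same training conditions.
score_pct = {
    "DenseNet201": 90.5,
    "InceptionResNetV2": 78.3,
    "InceptionV3": 82.3,
    "MobileNetV2": 85.0,
    "ResNet152V2": 82.5,
    "VGG16": 80.8,
    "VGG19": 67.6,
    "Xception": 80.9,
}

# Rank the models from best to worst score.
ranked = sorted(score_pct.items(), key=lambda item: item[1], reverse=True)
best_model, best_score = ranked[0]

# Models that exceed the 80% level discussed in the text.
above_80 = [name for name, score in ranked if score > 80.0]
```

Six of the eight models exceed 80%, with InceptionResNetV2 and VGG19 falling below that level.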

Kalaivani et al. (2022) reported that the optimization of hyperparameters such as batch size, optimizer, and learning rate for VGG19 and ResNet101 models is needed to achieve a performance that exceeds that of CNN in classifying 12 plant species. Gupta et al. (2020) used ResNet50, VGG16, VGG19, Xception, MobileNetV2, and CNN models to classify 12 plant species and found that ResNet50 achieved an accuracy of 95.23%. However, in this study, hyperparameters were not optimized for the models other than InceptionResNetV2 and VGG19, because all of those models showed an accuracy of 80% or higher.

Densely Connected Convolutional Network (DenseNet) achieves better performance with fewer parameters than ResNet and Pre-Activation ResNet (Huang et al., 2018). In particular, the DenseNet201 model used in this study consisted of a CNN with 201 layers. It consists of five Dense Blocks and two transition layers (Fig. 2), and the input image size is 224 × 224 pixels.

https://cdn.apub.kr/journalsite/sites/ales/2024-036-02/N0250360206/images/ales_36_02_06_F2.jpg

Fig. 2.

A deep DenseNet201 model with five dense blocks. The layers between two adjacent blocks are referred to as transition layers and feature-map sizes were changed via convolution and pooling.

DenseNet introduces the concept of a Dense Block, which is composed of multiple layers; pooling operations are performed between Dense Blocks. The BN, 1 × 1 conv, and 2 × 2 avg_pool operations used for this purpose are called transition layers. DenseNets offer several advantages: they alleviate the vanishing gradient problem, enhance feature propagation, encourage feature reuse, and reduce the number of parameters. Hence, they can achieve high performance with fewer computations and show significant improvements over other recent architectures (Huang et al., 2018).
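
Feature reuse in a Dense Block can be made concrete with simple channel arithmetic: each layer receives the concatenation of all preceding feature maps and adds a fixed number of new ones (the growth rate), and the transition layer then shrinks the result. The sketch below is illustrative only; the input width, number of layers, growth rate, and compression factor are assumed values, not figures reported in this article.

```python
def dense_block_channels(in_channels, num_layers, growth_rate):
    """Return the input width seen by each layer and the block's output width."""
    # Layer i sees the block input plus the outputs of the i layers before it.
    widths = [in_channels + i * growth_rate for i in range(num_layers)]
    out_channels = in_channels + num_layers * growth_rate
    return widths, out_channels


def transition_channels(in_channels, compression=0.5):
    """Channels left after the 1 x 1 convolution of a transition layer."""
    return int(in_channels * compression)


# Assumed example: 64 input channels, 6 layers, growth rate 32.
widths, out_ch = dense_block_channels(in_channels=64, num_layers=6, growth_rate=32)
after_transition = transition_channels(out_ch)
```

Each layer only has to produce 32 new feature maps even though it sees up to 224 input channels, which is why dense connectivity can keep the parameter count low.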

Elnemr (2019) classified 12 types of crops (three crops and nine weeds) and evaluated the results using accuracy, precision, recall, and F1 score. These evaluation metrics were also used in the present study. The DenseNet201 model exhibited high accuracy (Table 2) with a precision of 92%, a macro-average F1 score of 91%, and a weighted-average F1 score of 92%.

Table 2.

Performance metrics of the DenseNet201 model

Seedling	Precision	Recall	F1-Score	Support
Kimchi cabbage	0.90	0.85	0.87	71
Carrot	0.99	0.99	0.99	113
Kohlrabi	0.88	0.85	0.86	117
Lettuce	0.89	0.94	0.91	109
Mallow	0.89	1.00	0.94	54
Mustard	0.92	0.93	0.93	67
Pak-choi	0.92	0.93	0.93	116
Spinach	0.94	0.81	0.87	77
Sweet pepper	0.91	0.98	0.94	141
Accuracy			0.92	865
Macro avg	0.92	0.91	0.91	865
Weighted avg	0.92	0.92	0.92	865

^z The hyperparameters used for training were 9 classes, 20 epochs, a batch size of 128, the Adam optimizer, and a learning rate of 0.001.
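
The macro and weighted averages in Table 2 differ only in how the classes are weighted: the macro average treats every class equally, whereas the weighted average uses the support (number of images) of each class. The sketch below recomputes both from the rounded per-class values in the table, so agreement with the reported averages is expected only to within about ±0.01.

```python
# (seedling, precision, recall, F1 score, support) copied from Table 2.
table2 = [
    ("Kimchi cabbage", 0.90, 0.85, 0.87, 71),
    ("Carrot",         0.99, 0.99, 0.99, 113),
    ("Kohlrabi",       0.88, 0.85, 0.86, 117),
    ("Lettuce",        0.89, 0.94, 0.91, 109),
    ("Mallow",         0.89, 1.00, 0.94, 54),
    ("Mustard",        0.92, 0.93, 0.93, 67),
    ("Pak-choi",       0.92, 0.93, 0.93, 116),
    ("Spinach",        0.94, 0.81, 0.87, 77),
    ("Sweet pepper",   0.91, 0.98, 0.94, 141),
]

total_support = sum(row[4] for row in table2)

# Macro average: plain mean over the nine classes.
macro_f1 = sum(row[3] for row in table2) / len(table2)

# Weighted average: each class contributes in proportion to its support.
weighted_f1 = sum(row[3] * row[4] for row in table2) / total_support

# The support-weighted recall equals the overall accuracy,
# because recall_i * support_i is the number of correct predictions in class i.
weighted_recall = sum(row[2] * row[4] for row in table2) / total_support
```

The recomputed weighted recall is close to the accuracy of 0.92 reported in the table, which is a useful consistency check on the per-class values.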

For all vegetables classified using the DenseNet201 model, the precision and recall values were at least 88% and 81%, respectively, with kohlrabi having the lowest precision and spinach having the lowest recall. The F1 score was at least 86% for all vegetables. These results show that the DenseNet201 model achieved high precision, recall, and F1 score when classifying vegetable images. Gupta et al. (2020) reported high precision, recall, and F1 scores for the VGG16 and ResNet50 models used for classifying 12 plant species and selected the ResNet50 model because it showed a higher validation accuracy than VGG16.

According to Fig. 3, the error of the DenseNet201 model decreased and its accuracy increased from epoch 2 onwards. Neither the error nor the accuracy changed when the number of training epochs was increased from 20 to 30. Additionally, a batch size of 128 yielded higher precision, recall, and F1 scores than a batch size of 32 (data not shown).

https://cdn.apub.kr/journalsite/sites/ales/2024-036-02/N0250360206/images/ales_36_02_06_F3.jpg

Fig. 3.

Training and validation loss (A) and accuracy (B) of the DenseNet201 model. The hyperparameters used to perform training were 9 classes, 20 epochs, a batch size of 128, Adam optimizer, and a learning rate of 0.001.

On examining the confusion matrix (Fig. 4) for crop classification, it was apparent that the predicted values of the model matched the actual results well for each crop. In general, a good model’s confusion matrix has high values on the diagonal, and in this study, the high diagonal values indicate that the predictions of the model were accurate. However, when inspecting the cases in which the model made four or more incorrect predictions, Kimchi cabbage was frequently misclassified as lettuce, kohlrabi as mugwort, mustard as sweet pepper, watercress as lettuce, and spinach as kohlrabi or sweet pepper.
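
The way a confusion matrix is read, including the search for cells with four or more incorrect predictions, can be sketched as follows. The three-class matrix and its labels are hypothetical and do not reproduce Fig. 4; the convention that rows are actual classes and columns are predicted classes is also an assumption.

```python
# Hypothetical 3-class confusion matrix: rows = actual, columns = predicted.
labels = ["class_a", "class_b", "class_c"]
cm = [
    [18, 2, 0],
    [5, 14, 1],
    [0, 4, 16],
]


def per_class_metrics(cm):
    """Return (precision, recall) for each class from the confusion matrix."""
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]                                # diagonal cell
        fn = sum(cm[k]) - tp                         # rest of the row
        fp = sum(cm[i][k] for i in range(n)) - tp    # rest of the column
        p = tp / (tp + fp) if (tp + fp) else 0.0
        r = tp / (tp + fn) if (tp + fn) else 0.0
        metrics.append((p, r))
    return metrics


def frequent_confusions(cm, labels, threshold=4):
    """List off-diagonal cells (actual, predicted, count) at or above the threshold."""
    n = len(cm)
    return [(labels[i], labels[j], cm[i][j])
            for i in range(n) for j in range(n)
            if i != j and cm[i][j] >= threshold]


metrics = per_class_metrics(cm)
confusions = frequent_confusions(cm, labels)
accuracy = sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))
```

In this example, `confusions` lists class_b predicted as class_a (5 times) and class_c predicted as class_b (4 times), which mirrors the kind of inspection applied to Fig. 4.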

https://cdn.apub.kr/journalsite/sites/ales/2024-036-02/N0250360206/images/ales_36_02_06_F4.jpg

Fig. 4.

Confusion matrix of the DenseNet201 model. The hyperparameters used for training were 9 classes, 20 epochs, a batch size of 128, Adam optimizer, and a learning rate of 0.001.

According to Yang and Xu (2021), 33.8% of the applications of deep learning technology in horticulture from 2016 to 2021 were for species and variety classification. Artificial intelligence (AI) technology has been widely used for plant classification, and a CNN-based classification using images has shown a high accuracy of 92.96%. Elnemr (2019) reported an accuracy of 94.38% using CNN classification technology that can distinguish between three crops and nine weeds. Such classification technology can be useful for crop management, and it is expected to be developed further to classify more types of crops. It can be very helpful in crop production, especially for removing weeds directly without harming the crops or applying herbicides. Therefore, these technologies are expected to play critical roles in agriculture. Sandeep Kumar et al. (2018) reported that crop productivity is significantly affected by weed control. Weeds compete with crops for nutrients, water, and light, which reduces crop yields (Elnemr, 2019); therefore, weed removal is crucial, and quick classification of weeds and crops could help to improve crop productivity. According to Sandeep Kumar et al. (2018), convolutional neural networks can differentiate weeds from carrots with 95% confidence. However, the efficiency and accuracy of weed detection remain an issue, and the application of deep learning technology is necessary to solve this problem. Additionally, by using time-based image classification technology, it is possible to monitor the development and vitality of seedlings at low cost (Samiei et al., 2020) and determine their quality. This technology can also be used to continuously manage farms at various growth stages (Elnemr, 2019; Milioto et al., 2018; Yang and Xu, 2021).

CNNs require a large amount of labeled data to perform well. If the dataset is small or lacks diversity, the CNN may not generalize well to new or unseen examples. In this study, 868 images were used, but more images are likely needed to increase accuracy. In addition, the study needs to be expanded from nine vegetable crops to more vegetable crops.

In this study, 868 images across 9 classes were obtained and classified using artificial intelligence techniques. Despite the limited number of images, they were successfully utilized for training, thanks to the application of data augmentation techniques. Data augmentation is a technique that increases the quantity of data by utilizing geometric transformations and task-based transformation methods. Geometric transformations involve altering the geometric properties of image data, such as resizing, flipping, cropping, and rotating, to expand the dataset. Task-based transformation methods focus on specific objects for recognition, utilizing techniques like cutting and flipping to extend the dataset (Choi et al., 2022; Shorten and Khoshgoftaar, 2019).
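
The geometric transformations mentioned above can be sketched on a tiny image stored as a nested list. The article does not state which transformations or parameters were applied, so the functions below (`flip_horizontal`, `flip_vertical`, `rotate90_clockwise`, `center_crop`, `augment`) are hypothetical illustrations rather than the pipeline used in this study.

```python
def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def rotate90_clockwise(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def center_crop(img, size):
    """Cut a size x size window from the middle of the image."""
    top = (len(img) - size) // 2
    left = (len(img[0]) - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]


def augment(img):
    """Return the original image together with three transformed copies."""
    return [img, flip_horizontal(img), flip_vertical(img), rotate90_clockwise(img)]


# Toy 3 x 3 "image" whose pixel values make the transformations easy to follow.
img = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
augmented = augment(img)
```

Each source image yields several label-preserving variants, which is how a set of 868 photographs can provide many more training examples.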

This study proposed a DenseNet201 model for object recognition based on images of vegetable seedlings. The model is intended for use in horticulture, and its recognition performance is to be improved in the future by collecting more diverse images of various plant species. To this end, more networks will be evaluated and the hyperparameters will be optimized to achieve higher recognition accuracy. We hope that this will contribute to improving agricultural productivity.

Acknowledgements

This work was supported by the Korea Institute of Planning and Evaluation for Technology in Food, Agriculture and Forestry (IPET) and the Korea Smart Farm R&D Foundation (KosFarm) through the Smart Farm Innovation Technology Development Program, funded by the Ministry of Agriculture, Food and Rural Affairs (MAFRA), the Ministry of Science and ICT (MSIT), and the Rural Development Administration (RDA) (421009043HD020).

References

Baek, Y. T., Sul, S. G., Cho, Y. Y. (2023) Estimation of days after transplanting using an artificial intelligence CNN (Convolutional Neural Network) model in a closed-type plant factory. Hortic Sci Technol 41:81-90. doi:10.7235/HORT.20230008

Choi, Y. W., Kim, N. E., Paudel, B., Kim, H. T. (2022) Strawberry pests and disease detection technique optimized for symptoms using deep learning algorithm. J Bio-Env Con 31:255-260. doi:10.12791/KSBEC.2022.31.3.255

Elnemr, H. A. (2019) Convolutional neural network architecture for plant seedling classification. Int J Adv Comput Sci Appl 10:319-325. doi:10.14569/IJACSA.2019.0100841

Gupta, K., Rani, R., Bahia, N. K. (2020) Plant-seedling classification using transfer learning-based deep convolutional neural networks. Int J Agr & Environ Inform Sys 11:25-40. doi:10.4018/IJAEIS.2020100102

Han, H. S., Kim, D. H., Chae, J. W., Lee, S. A., Kim, Y. J., Cho, H. U., Cho, H. C. (2020) A study of tomato disease classification system based on deep learning. Trans Korean Inst Electr Eng 69:349-355. doi:10.5370/KIEE.2020.69.2.349

Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K. Q. (2018) Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2261-2269. doi:10.1109/CVPR.2017.243

Kalaivani, K. S., Kanimozhiselvi, C. S., Priyadharshini, N., Nivedhashri, S., Nandhini, R. (2022) Classification of plant seedling using deep learning techniques. In: Hemanth, D. J., Pelusi, D., Vuppalapati, C. (eds) Intelligent data communication technologies and internet of things. Lecture Notes on Data Engineering and Communications Technologies, vol. 101. Springer, Singapore. doi:10.1007/978-981-16-7610-9_76

Kumaratenna, K. P. S., Cho, Y. Y. (2024) Tea leaf disease classification using artificial intelligence (AI) models. J Bio-Env Con 33:1-11. doi:10.12791/KSBEC.2024.33.1.001

Milioto, A., Lottes, P., Stachniss, C. (2018) Real-time semantic segmentation of crop and weed for precision agriculture robots leveraging background knowledge in CNNs. In: 2018 IEEE Intl Conf on Robotics & Automation. doi:10.1109/ICRA.2018.8460962 (arXiv: doi:10.48550/arXiv.1709.06764)

Moon, T. W., Park, J. Y., Son, J. E. (2020) Estimation of sweet pepper crop fresh weight with convolutional neural network. Protected Hortic Plant Fac 29:381-387. doi:10.12791/KSBEC.2020.29.4.381

Purwaningsih, T., Anjani, I. A., Utami, P. B. (2018) Convolutional neural networks implementation for chili classification. In: 2018 International Symposium on Advanced Intelligent Informatics (SAIN) 190-194. doi:10.1109/SAIN.2018.8673373

Samiei, S., Rasti, P., Ly Vu, J., Rousseau, D. (2020) Deep learning-based detection of seedling development. Plant Methods 16:103. doi:10.1186/s13007-020-00647-9

Sandeep Kumar, K., Rajeswari, Usha, B. N. (2018) Convolution neural network based weed detection in horticulture plantation. Int J Sci Res Rev 7:41-47. doi:16.10089.IJSRR.2018.V7I06.5245.2394

Shorten, C., Khoshgoftaar, T. M. (2019) A survey on image data augmentation for deep learning. J Big Data 6:60. doi:10.1186/s40537-019-0197-0

Xiao, Z., Tan, Y., Liu, X., Yang, S. (2019) Classification method of plug seedlings based on transfer learning. Appl Sci 9:2725. doi:10.3390/app9132725

Yang, B., Xu, Y. (2021) Application of deep-learning approaches in horticultural research: a review. Hortic Res 8:123. doi:10.1038/s41438-021-00560-9

Journal of Agricultural, Life and Environmental Sciences ISSN:2233-8322(Print) 2508-870X(Online)
