Edinburgh Research Explorer Machine Learning for Plant Phenotyping Needs Image Processing

We found the article by Singh et al. [1] extremely interesting because it introduces and showcases the utility of machine learning for high-throughput data-driven plant phenotyping. With this letter we aim to emphasize the role that image analysis and processing have in the phenotyping pipeline beyond what is suggested in [1], both in analyzing phenotyping data (e.g., to measure growth) and when providing effective feature extraction to be used by machine learning. Key recent reviews have shown that it is image analysis itself (what the authors of [1] consider as part of pre-processing) that has brought a renaissance in phenotyping [2].

At the same time, the lack of robust methods to analyze these images is now the new bottleneck [3][4][5]. And this bottleneck is not easy to overcome. As the following aims to illustrate, it is coupled to the imaging system and the environment but also to the analysis task at hand, and it requires new skills to help deal with the challenges introduced.
A successful high-throughput image-based phenotyping system starts with the imaging approach itself. The choices are to image many plants simultaneously or one plant at a time, the latter requiring movable systems to bring the plant to the camera or vice versa. These systems add cost but have the benefit of isolating the object of interest. In turn this simplifies processing, for example by facilitating object segmentation, i.e., the image analysis process that isolates the plant from the background (e.g., soil), as Figure 1(A) shows. [There are many image processing tasks related to how we perceive and analyze an object of interest, such as segmentation, detection, tracking, and many others.] When plants are not isolated, segmentation can be extremely complex because the objects of interest may touch and overlap each other (known as occlusion), as in Figure 1(B). In the open field [6] this becomes considerably more complex: light variations, plant movement due to wind, and other factors are introduced, and the background (e.g., other plants) may look like the subject of interest, as Figure 1(C) illustrates. Thus, the process of extracting information from image data is directly linked with the setup and the environment.
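To make the easy case of Figure 1(A) concrete, the following is a minimal sketch of plant-versus-soil segmentation, assuming an RGB image of an isolated plant on a soil background. It thresholds an excess-green index, which is one common heuristic and not a method prescribed in [1] or in this letter. The function name, the threshold of 0.1, and the toy pixel values are illustrative assumptions.

```python
def excess_green_mask(image, threshold=0.1):
    """Return a binary mask (list of lists) marking likely plant pixels.

    image: list of rows, each row a list of (r, g, b) tuples in 0..255.
    threshold: assumed cut-off on the excess-green index (illustrative value).
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            total = r + g + b
            if total == 0:
                # Pure black pixel: no color information, treat as background.
                mask_row.append(0)
                continue
            # Chromatic coordinates reduce sensitivity to overall brightness.
            rn, gn, bn = r / total, g / total, b / total
            excess_green = 2 * gn - rn - bn
            mask_row.append(1 if excess_green > threshold else 0)
        mask.append(mask_row)
    return mask


# Toy 4x4 image (assumed values): a green plant patch on brown soil.
SOIL = (120, 90, 60)
LEAF = (40, 160, 50)
toy_image = [
    [SOIL, SOIL, SOIL, SOIL],
    [SOIL, LEAF, LEAF, SOIL],
    [SOIL, LEAF, LEAF, SOIL],
    [SOIL, SOIL, SOIL, SOIL],
]

plant_mask = excess_green_mask(toy_image)
plant_area = sum(sum(row) for row in plant_mask)  # projected area in pixels
```

A rule of this kind tends to work only when the background differs clearly in color from the plant. In a tray of overlapping plants or in the field, where the background may itself be green, a single color threshold is unlikely to suffice.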
In some cases, the analysis task is hard simply because of the information sought, as a recent article describes in depth [3]. For example, Figure 1(D) illustrates the task of segmenting individual plant leaves [7] to estimate per-leaf growth (when this task is repeated longitudinally [8]). Here occlusion and the lack of discernible boundaries (edges) between leaves make segmentation difficult, and additional information (e.g., depth) may be required.
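The difficulty caused by touching leaves can be illustrated with a small sketch. Assuming a binary plant mask is already available, counting 4-connected foreground regions recovers the number of leaves only when the leaves are separated. The function name and the toy masks below are hypothetical and serve only to illustrate the point.

```python
from collections import deque


def count_components(mask):
    """Count 4-connected foreground regions in a binary mask (list of lists)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # New region found: flood-fill it with a breadth-first search.
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


# Two leaves separated by a background column: both are recovered.
separate_leaves = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]
# The same two leaves touching: the mask alone merges them into one region.
touching_leaves = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]

n_separate = count_components(separate_leaves)
n_touching = count_components(touching_leaves)
```

In the second mask nothing distinguishes one leaf from the other, which is why extra cues such as edges or depth may be needed for per-leaf analysis.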
While image analysis may help us identify plant parts and extract relevant traits, typically it is their aggregation across a study that could provide suitable input for machine learning. There is a need for mechanisms to represent the image data in a way that machine learning algorithms can use, and this process is known as feature extraction (another component bundled under pre-processing in [1]). At present, features need to be designed and extracted carefully under expert supervision, which requires specific domain knowledge (a process known as feature engineering). Translating this knowledge into image analysis protocols and image filters (e.g., edge detectors) requires significant image processing expertise and skills. For example, in drought tolerance studies one can rely on the overall amount of green or yellow pixels as potential features. However, this simple approach may not always let us discriminate between stressed and non-stressed plants. It is well known in machine learning that finding good features for the application at hand is intrinsic to an effective use of learning approaches (even sophisticated ones). Thus, image processing is key to obtaining accurate and reliable phenotypic results.
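The green/yellow pixel example can be sketched as a hand-engineered feature extractor. This is a minimal illustration that assumes an RGB image and a binary plant mask are already available. The color rules and their ratios (1.2 and 1.5) are assumptions chosen for the toy data, not calibrated values from any study.

```python
def color_fraction_features(image, mask):
    """Return (green_fraction, yellow_fraction) over the masked plant pixels.

    image: list of rows of (r, g, b) tuples; mask: same shape, 1 = plant pixel.
    The color rules below are illustrative assumptions, not calibrated values.
    """
    green = yellow = plant = 0
    for image_row, mask_row in zip(image, mask):
        for (r, g, b), is_plant in zip(image_row, mask_row):
            if not is_plant:
                continue
            plant += 1
            if g > 1.2 * r and g > 1.2 * b:
                green += 1   # green channel clearly dominates
            elif r > 1.5 * b and g > 1.5 * b:
                yellow += 1  # red and green both high relative to blue
    if plant == 0:
        return (0.0, 0.0)
    return (green / plant, yellow / plant)


# Toy data (assumed values): three plant pixels and one soil pixel.
GREEN = (40, 160, 50)
YELLOW = (200, 190, 40)
SOIL = (120, 90, 60)
toy_image = [[GREEN, YELLOW], [GREEN, SOIL]]
toy_mask = [[1, 1], [1, 0]]

features = color_fraction_features(toy_image, toy_mask)
```

The resulting two-element vector is the kind of input a classifier could consume. Its quality depends on the mask and on the color rules, so weak segmentation or poorly chosen rules would carry over to the learning stage.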
Solving the phenotyping bottleneck requires machine learning, but also good image processing and good features, which significantly broadens the required skill set from a practitioner's perspective. The last few years have brought significant progress in bringing image analysis experts closer to plant biology through a variety of targeted actions that help diffuse skills and know-how. There are both isolated workshops aimed at training biologists in image analysis (e.g., IAMPS 1) and new workshop series that run in conjunction with major computer vision conferences 2,3 to help introduce new scientists to this exciting application area of image analysis (e.g., 'Computer Vision Problems in Plant Phenotyping'). A recent special issue on Computer Vision and Image Analysis in Plant Phenotyping provided a good summary of the advances that resulted from these efforts [9]. These workshops also served as the hosting venue for image-based phenotyping challenges 4, which led to a collation study summarizing the results [7].
However, we should not dismiss the recent potential to devise intelligent algorithms that start from raw images and arrive directly at a phenotyping decision or trait. After all, this is the promise of deep learning, which is making waves in the news, when a significant amount of annotated data is available to learn from. These algorithms find optimal features from the raw data (the images), in a process known as representation learning, which are then used to train supervised counterparts. We are not there yet, but some early findings have appeared in the context of phenotyping, e.g., to count leaves [10].
The promise of deep learning (and machine learning in general) cannot materialize without the availability of annotated data. Thus, recent efforts to lower the entry barrier and accelerate this process have aimed at releasing open-access data together with suitable performance evaluation protocols (see [11,12] and http://www.plant-phenotyping.org/datasets). The diffusion and adoption of such datasets as benchmarks will allow for the parallel growth of methods and the fair comparison of approaches in the years to come. In addition, in the field, where experimental design is poorer due to reduced control over confounding variables and the imaging setup is less than ideal, it is the combination of machine learning and computer vision that can make a significant contribution to meeting the phenotyping challenges of this demanding domain. Here again the availability of data will be critical, and efforts such as the one described in [7] are a good start towards this goal.
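As an illustration of what a performance evaluation protocol might compute, the sketch below scores a predicted segmentation mask against an annotated reference using the Dice overlap coefficient. Dice is a widely used overlap measure, but whether the datasets in [11,12] prescribe exactly this measure is not stated here, so it should be read as an assumption. The function name and the toy masks are hypothetical.

```python
def dice_score(predicted, reference):
    """Dice overlap between two binary masks of equal shape (lists of lists).

    Returns 2*|P and R| / (|P| + |R|), or 1.0 when both masks are empty.
    """
    intersection = predicted_area = reference_area = 0
    for predicted_row, reference_row in zip(predicted, reference):
        for p, r in zip(predicted_row, reference_row):
            if p:
                predicted_area += 1
            if r:
                reference_area += 1
            if p and r:
                intersection += 1
    if predicted_area + reference_area == 0:
        return 1.0  # two empty masks agree perfectly
    return 2 * intersection / (predicted_area + reference_area)


# Toy masks (assumed): the prediction misses one plant pixel and adds a false one.
reference_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
predicted_mask = [
    [0, 1, 1, 1],
    [0, 1, 0, 0],
]

score = dice_score(predicted_mask, reference_mask)
```

Reporting one agreed score on one shared, annotated dataset is what makes results from different methods directly comparable.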
To conclude, to make leaps towards addressing future issues of agricultural demand, phenotyping will certainly play a key role and will be aided by innovations in machine learning and computer vision and by multidisciplinary collaboration among the biological, engineering, and computer sciences.

Figure 1: The process of segmentation (delineation of plant from background or leaves from each other) changes in complexity according to the imaging conditions and the task at hand. A: Plant segmentation of isolated plants. B: Tray with overlapping plants. C: Image from the field (adapted from the dataset presented in [6], reproduced according to the Creative Commons Attribution 4.0 International License, http://creativecommons.org/licenses/by/4.0/). D: Leaf segmentation of isolated plants. When plants are isolated (A or C, right), reliable segmentation procedures exist. However, when we image many plants together in the lab (B) or in the field (C, left), segmentation becomes much harder because plants touch each other and overlap. The process is inherently hard when objects cannot be isolated before segmentation, e.g., when we want to delineate each leaf within a single plant (D). Before machine learning can be used for phenotyping, the process of segmentation is more often than not necessary in order to design good features.