
Cherry Powdery Mildew Detector Handbook

Claudia Cifaldi edited this page Mar 2, 2023 · 4 revisions

Business Case Assessment

  1. What are the business requirements?
  • The client is interested in conducting a study to visually differentiate a cherry leaf that is healthy from one that contains powdery mildew.
  • The client is interested in predicting if a cherry leaf is healthy or contains powdery mildew.
  • The client is interested in obtaining a prediction report of the examined leaves.
  1. Is there any business requirement that can be answered with conventional data analysis?
  • Yes. Conventional data analysis can answer Business Requirement #1, the study to visually differentiate a healthy cherry leaf from one that contains powdery mildew. Predicting the condition of a leaf (Business Requirement #2) requires a model that assigns each cherry leaf one of two categories, healthy or infected, which makes it a classification problem. It can be framed as a binary classification (healthy vs. not healthy) or in a multiclass style, where each image receives exactly one label from a set of classes (here the set has only two members: healthy and infected).
  1. Does the client need a dashboard or an API endpoint?
  • The client needs a dashboard. Streamlit was chosen to develop it, and the dashboard contains the following pages, as requested by the client:
    • Summary page (homepage, presentation of the dashboard)
    • Leaves visualizer (see Business Requirement #1)
    • Model performance (plots of model performance with explanation)
    • Mildew detector (upload widget and download button)
  1. What does the client consider as a successful project outcome?
  • A study showing how to visually differentiate a cherry leaf that is healthy from one that contains powdery mildew (Business Requirement #1).
  • The capability to predict whether a cherry leaf is healthy or contains powdery mildew with sufficient accuracy (>86%), using a model that is potentially applicable to other crops.
  1. Can you break down the project into Epics and User Stories?

The following tasks are part of the CRISP-DM process used to develop this project (see the Project page).

  • Information gathering and data collection.
  • Data visualization, cleaning, and preparation.
  • Model training, optimization and validation.
  • Dashboard planning, designing, and development.
  • Dashboard deployment and release.
  1. Ethical or Privacy concerns?
  • The client provided the data under an NDA (non-disclosure agreement), so the data should only be shared with professionals who are officially involved in the project.
  1. Does the data suggest a particular model?
  • The data suggests an image classifier that indicates whether a particular cherry leaf is healthy or contains powdery mildew. Two approaches (binary and multiclass classification) are evaluated in the README.md.
  1. What are the model's inputs and intended outputs?
  • The input is a cherry leaf image from the Kaggle dataset, and the output (shown on and downloadable from the dashboard) is a prediction of whether the cherry leaf is healthy or contains powdery mildew.
  1. What are the criteria for the performance goal of the predictions?
  • We agreed with the client on an accuracy target of 86%.
  1. How will the client benefit?
  • The client can avoid supplying the market with a product of compromised quality.
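
The performance criterion above can be expressed as a simple check on a held-out test set. The following is a minimal sketch, not the project's actual evaluation code: the function names and the example labels are hypothetical, and treating the 86% target as an inclusive lower bound is an assumption.

```python
ACCURACY_TARGET = 0.86  # target agreed with the client (inclusive bound is an assumption)


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def meets_target(y_true, y_pred, target=ACCURACY_TARGET):
    """Return True when the accuracy reaches the agreed target."""
    return accuracy(y_true, y_pred) >= target


# Hypothetical labels: 0 = healthy, 1 = powdery mildew
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  # one healthy leaf misclassified

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"meets target = {meets_target(y_true, y_pred)}")
```

In practice the labels would come from the test split and the predictions from the trained model; the check itself stays the same.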

Project Considerations

Business Requirement 1

The study includes an analysis of:

  • average images and variability images for each class (healthy or powdery mildew),
  • the differences between average healthy and average powdery mildew cherry leaves,
  • an image montage for each class.
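
The average and variability images listed above are pixel-wise statistics over all images of a class. The sketch below illustrates the idea with NumPy, assuming the images have already been loaded, resized to a common shape, and scaled to [0, 1]. The random arrays are synthetic stand-ins for the real dataset, and the function name is hypothetical.

```python
import numpy as np


def mean_and_variability(images: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return the pixel-wise mean and standard deviation of an image stack.

    `images` has shape (n_images, height, width, channels).
    """
    return images.mean(axis=0), images.std(axis=0)


# Synthetic stand-ins for the two classes (assumption: real images would be
# read from disk and resized to one common shape before stacking).
rng = np.random.default_rng(seed=0)
healthy = rng.uniform(0.4, 0.6, size=(20, 8, 8, 3))
mildew = rng.uniform(0.5, 0.9, size=(20, 8, 8, 3))

healthy_avg, healthy_var = mean_and_variability(healthy)
mildew_avg, mildew_var = mean_and_variability(mildew)

# Difference between the average healthy and average powdery mildew image
avg_difference = healthy_avg - mildew_avg
print(avg_difference.shape)
```

Each resulting array has the shape of a single image, so it can be displayed with any image plotting function.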

Business Requirement 2

  • Develop an ML system that is capable of predicting whether a cherry leaf is healthy or contains powdery mildew, using a neural network to map the relationships between the features and the labels.
  • Take GitHub and Heroku file size restrictions into consideration before pushing to GitHub and deploying on Heroku. GitHub needs Git LFS (Large File Storage) to push files larger than 100 MB, and the maximum Heroku slug size is 500 MB.
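
Oversized files can be caught before a push with a small script. This is a minimal sketch under stated assumptions: the function name is hypothetical, the limit is taken as 100 MiB, and anything inside a `.git` directory is skipped.

```python
from pathlib import Path

GITHUB_FILE_LIMIT_BYTES = 100 * 1024 * 1024  # assumed limit: 100 MiB per file


def files_over_limit(root, limit=GITHUB_FILE_LIMIT_BYTES):
    """Return (path, size_in_bytes) pairs for files larger than `limit`.

    Results are sorted from largest to smallest. Files inside a `.git`
    directory are ignored.
    """
    root_path = Path(root)
    oversized = []
    for path in root_path.rglob("*"):
        if ".git" in path.relative_to(root_path).parts:
            continue
        if path.is_file():
            size = path.stat().st_size
            if size > limit:
                oversized.append((path, size))
    return sorted(oversized, key=lambda item: item[1], reverse=True)
```

Calling `files_over_limit(".")` from the repository root lists the files that would need Git LFS or should be excluded from the commit. The same function with a different `limit` gives a rough view of which files contribute most to the deployed size.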

Dashboard Expectations

The dashboard contains:

  • A project summary page, showing the project dataset summary and the client's requirements.
  • A page listing your findings related to a study to visually differentiate a cherry leaf that is healthy from one that contains powdery mildew.
  • A page containing:
    • A link to download a set of cherry leaf images for live prediction (you may use the Kaggle repository that was provided to you).
    • A user interface with a file uploader widget. The user should be able to upload multiple images. For each image, the dashboard displays the image and a prediction statement indicating whether the cherry leaf is healthy or contains powdery mildew, along with the probability associated with this statement.
    • A table with the image name and prediction results, and a download button to download the table.
  • A page indicating your project hypothesis and how you validated it across the project.
  • A technical page displaying your model performance.
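
The prediction statement and the downloadable table described above can be sketched in plain Python. This is an illustration rather than the dashboard's actual code: it assumes the model outputs a single sigmoid probability for the powdery mildew class, and the function names, file names, column names, and the 0.5 threshold are all hypothetical.

```python
import csv
import io


def prediction_statement(probability_mildew, threshold=0.5):
    """Map a powdery mildew probability to a (label, confidence) pair."""
    if probability_mildew >= threshold:
        return "powdery mildew", probability_mildew
    return "healthy", 1.0 - probability_mildew


def build_report(results):
    """Build table rows from a mapping of image name to mildew probability."""
    rows = []
    for name, probability in results.items():
        label, confidence = prediction_statement(probability)
        rows.append(
            {"Name": name, "Result": label, "Probability": round(confidence, 4)}
        )
    return rows


def report_to_csv(rows):
    """Serialize the report rows to CSV text for a download button."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["Name", "Result", "Probability"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical model outputs for two uploaded images
report = build_report({"leaf_01.jpg": 0.93, "leaf_02.jpg": 0.08})
print(report_to_csv(report))
```

In a Streamlit app, the CSV text could be passed as the `data` argument of `st.download_button`, and the rows could be shown with a table widget.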