HBS D3 Machine Learning for Leaders 26 Oct 2022

HBS D3 Machine Learning for Leaders 26 Oct 2022

  • AI versus ML (as definitions)
    • AI is a useful, informal term (no formal, precise definition in industry or academia at this time)
  • ML has a precise meaning
    • technique that lets software learn behavior from sample data, rather than from pre-defined rules
  • Example: rule-based programming vs ML for spam filtering
    • how do you filter bad email?
    • traditional approach:
      • rules (if email contains X + if email contains Y + if email contains Z + …)
    • ML:
      • model learns patterns from labeled example emails
      • classify by learning pattern
      • model: a mathematical representation of patterns and insights
      • model tries to classify new emails as spam or not spam
      • model is refined to reduce errors
      • the bigger (and more varied) the data set, the better the training
  • How does a machine learn?
    • teachable machine at google
      • https://teachablemachine.withgoogle.com/v1/
    • exercise
      • create labeled image data from source based on pre-existing model
      • system will learn patterns and classify images
      • person face, orange, banana
    • what led to errors in model?
    • what examples would improve the model?
  • ML development typically involves re-use
    • datasets
      • large data sets are expensive and time onsuming
      • often re-used
      • new datasets are created from scratch for specific use cases
    • ML models
      • could and on-device platforms offer AI products that may be used and re-used with no modification
      • DIY projects often start with pre-trained models
  • From classification to generation
    • teachable machine did classification
      • given input predict 1 of 3 classes
    • some models can predict thousands of classes
      • example: given a snippet of text, predict the next workd
      • this enables generation of prose, software code, etc
    • may seem like dramatic different, capability, but underlying issues remain the same
      • training data drives performance
      • bias is an issue
      • results are often noisy
  • generating text: why does it matter?
    • researchers realized that high-performance autocomplete can be used to tackle many different problems, controlled by a plain English interface
    • I saw the Red Sox play at (+ geographic knowledge)
    • 3 + 7 = (+ arithmetic)
    • GPT-3 (OpenAI, 2020) helped make this apparent
  • global conversation about AI principles
    • organizations, companies, HigherEd, governments, etc.
  • Google’s AI principles
    • AI should
      • be socially beneficial
      • avoid create or reinforce bias
      • built for safety
      • be accountable to people
      • privacy design
      • scientifici excellence
      • be made available for uses that accord w/ these principles
    • Google will not pursue AI projects that:
      • likely to cause harm
      • direct injury
      • surveillance violating international norms and laws
      • contravene international law and human rights
  • Concrete action for accountability:
    • good documentation
    • data cards
      • dataset provenance
      • intended / suited able use cases
      • data-set make-up, distributions
    • model cards
      • model provenance
      • model usage
      • ethics-informed evaluation
  • Explainability: can you explain the output of a ML system?
    • explanation for whom?
    • what do we mean by explain?
    • helps developers identity problems
    • id sources of error bias
    • empower users of ML
  • Is ML a black box?
    • not inherently, but it IS complex
    • tools for developers to understand models
    • new ways to explain
    • can customize explanation to roles
    • some limitations
      • can’t always provide comprehensive explanations
      • research continues
  • Case Study on Diabetic Retinopathy (DR)

    • fastest cause of preventable blindness
    • not always enough doctors to diagnose
      • India, for example (shortage of 127k eye doctors)
      • 45% of patients suffer vision loss before diagnosis
    • diagnosing DR is difficult
      • requires expertise
      • disagreements amongst experts
      • disagreements are resolved thru discussion
      • develop software linking data gathering and corresponding discussions
    • how do you train a model?
      • take an existing model and re-train it
      • Google started with an existing model that classified images of broccoli, fish, fire trucks etc
      • 130K images of eye scans provided and model re-trained
    • models CAN be retrained for a new task
    • incorporate privacy design principles
      • transparency
      • consent
      • access
    • health regulators have been thinking about automation for a long time
      • governance documentation by federal govt
    • avoid creating or reinforcing unfair bias?
      • what groups need to be well-represented in eye scan dataset?
      • age, sex, pupil size, image quality (in DR use case)
    • ARDA (Automated Retinal Disease Assessment) was developed
    • result to date
      • screened 5k patients
      • model performance is on par with eye speicalists
      • published 2 papers
      • did better than generalists, not quite as good as specialists
      • opened screening sites in Thailand
    • Observations
      • nurses required less specialists in remote facilities
      • wait times for referrals went from weeks to minutes
      • imagines readable by humans were often deemed too “low quality”
      • poor internet connectivity led to frustrating wait times
    • Dataset size
      • if you’re using a robust, pre-trained ML model, and re-training it for a more specific use case, the dataset used for re-training could be smaller
      • text prediction, in particular, seems to work well w/ these types of lower dataset sizes
      • more research being done on this
    • articles
      • Healthcare AI systems that put people at the center

      • https://dl.acm.org/doi/abs/10.1145/3313831.3376718

  • Takeaways
    • data is critical across systems
      • wide range of systems, but for each one, training data is the driver for performance, bias, etc
    • we can explain some things, some of the time
      • very contextual, depends on specific application and systems
    • Human-AI interaction
      • evaluate beyond benchmarks, direct resources towards understanding people in the loop, societal consequences