Interactivity, Interpretability, and Generative Models
Kris Sankaran
ksankaran@wisc.edu
Lab: go.wisc.edu/pgb8nl
16 | June | 2024
INFORMS ALIO-ASOCIO
Slides: go.wisc.edu/3u4m16
1 / 35

Generalist Models

  1. Modern machine learning models are being designed to solve many problems simultaneously.

  2. Multimodal datasets are becoming the norm, and new systems allow navigation across many sources.

  3. We are also seeing increasingly rich ways to interact with them.

2 / 35

What can go wrong?

5 / 35

What Makes a Model Interpretable?


6 / 35

What Makes a Model Interpretable?


This is a difficult question... let's start with an easier one.

7 / 35

What Makes a Visualization Good?


8 / 35

Key Properties

A good visualization is:

  1. Legible: It omits extraneous, distracting elements.
  2. Annotated: It shows data within the problem context.
  3. Information Dense: It shows relevant variation efficiently.

9 / 35

Below-the-Surface

More subtly, it should pay attention to:

  1. Data Provenance: If we don't know the data sources, we should be skeptical of anything that's shown, no matter how compelling.
  2. Audience: The effectiveness of a visualization depends on the visual vocabulary of its audience.
  3. Prioritization: Every design emphasizes some comparisons over others. Are the "important" patterns visible?
  4. Interactivity: Does it engage the reader's problem solving capacity?

We should think about model interpretability with the same nuance that we think about data visualization.

11 / 35

Vocabulary

  1. Interpretable Model: A model that, by virtue of its design, is easy for its stakeholders to accurately describe and alter.
  2. Explainability Technique: A method that shapes our mental models about black box systems.

12 / 35

Vocabulary

  1. Local Explanation: An artifact for reasoning about individual predictions.

  2. Global Explanation: An artifact for reasoning about an entire model.

13 / 35

Interpretability Examples

14 / 35

Hypothetical Study

Problem: Imagine sampling longitudinal microbiome profiles from 500 study participants, some of whom eventually developed a disease. Can we discover any microbiome-related risk factors? This simulation is motivated by microbiome studies of HIV risk (Gosmann et al., 2017).
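
The slide leaves the simulation details to the accompanying figure. As a rough, purely illustrative sketch (the sample sizes, taxa counts, and effect sizes below are assumptions, not the actual study design), longitudinal profiles with a disease-associated taxon could be generated like this:

```python
import numpy as np

rng = np.random.default_rng(2024)
n_subjects, n_timepoints, n_taxa = 500, 10, 50

# Hypothetical setup: subjects who eventually develop the disease carry a
# taxon whose relative abundance drifts upward over the study.
disease = rng.binomial(1, 0.3, size=n_subjects)
baseline = rng.dirichlet(np.ones(n_taxa), size=n_subjects)   # per-subject compositions
drift = np.linspace(0, 1, n_timepoints)                      # time effect

profiles = np.empty((n_subjects, n_timepoints, n_taxa))
for i in range(n_subjects):
    for t in range(n_timepoints):
        alpha = baseline[i] * 100                            # concentration parameters
        alpha[0] += 50 * drift[t] * disease[i]               # risk-associated taxon 0
        profiles[i, t] = rng.dirichlet(alpha)                # one composition per visit
```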

15 / 35

Transformers

  1. A principle of deep learning is that end-to-end optimization is more general than expert design.
  2. We can apply the GPT-2 architecture to our problem, viewing a sequence of microbiome profiles as a sequence of words.
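
The slide does not show the implementation. A minimal sketch of the idea using Hugging Face's GPT2Model, in which each time point's profile is linearly projected into the token-embedding space; the class name, layer sizes, and last-token pooling are illustrative assumptions:

```python
import torch
import torch.nn as nn
from transformers import GPT2Config, GPT2Model

class MicrobiomeGPT(nn.Module):
    """Treat each microbiome profile as a 'token' and classify the whole series."""
    def __init__(self, n_taxa=50, hidden=128, n_classes=2):
        super().__init__()
        config = GPT2Config(n_embd=hidden, n_layer=4, n_head=4, vocab_size=1)
        self.project = nn.Linear(n_taxa, hidden)      # profile -> token embedding
        self.backbone = GPT2Model(config)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, profiles):                      # profiles: (batch, time, n_taxa)
        tokens = self.project(profiles)
        hidden = self.backbone(inputs_embeds=tokens).last_hidden_state
        return self.classifier(hidden[:, -1])         # predict from the final time point

model = MicrobiomeGPT()
logits = model(torch.randn(8, 10, 50))                # toy batch
```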

16 / 35

Transformers

Applying a transformer model to the raw series, we reach a hold-out performance of ~ 83.2%, which is nearly as good as a model with knowledge of the true underlying features.

18 / 35

Embeddings

In text data, we can understand context-dependent meaning by looking for clusters in the PCA of embeddings (Coenen et al., 2019). Exploring these views is itself a form of interaction with the model.

19 / 35

Embeddings

We can build the analogous visualization for our microbiome problem. Samples that are nearby in the embedding space are similar w.r.t. predictive features.
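
A rough scikit-learn sketch of how such a map can be drawn. Here `embeddings` is a random placeholder for per-sample vectors pooled from the transformer (e.g., hidden states averaged over time points), and `disease` stands in for the outcome label:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Placeholders: in practice these come from the fitted model and the study labels.
embeddings = np.random.normal(size=(500, 128))
disease = np.random.binomial(1, 0.3, size=500)

pca = PCA(n_components=2).fit(embeddings)
coords = pca.transform(embeddings)

plt.scatter(coords[:, 0], coords[:, 1], c=disease, s=8)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.title("Samples in embedding space, colored by outcome")
```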

20 / 35

Interpolations

Another common technique is to analyze linear interpolations in this space (Liu et al., 2019). This figure traces out the microbiome profiles between two samples.
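
A sketch of the interpolation idea, reusing `embeddings` and the fitted `pca` from the previous snippet; the endpoints and number of steps are arbitrary choices:

```python
import numpy as np
from sklearn.metrics import pairwise_distances

a, b = embeddings[0], embeddings[1]              # two samples to interpolate between
steps = np.linspace(0, 1, 20)[:, None]
path = (1 - steps) * a + steps * b               # straight line in embedding space
path_2d = pca.transform(path)                    # overlay on the PCA scatterplot

# Which real samples does the path pass closest to? Their profiles trace
# the transition between the two endpoints.
nearest = pairwise_distances(path, embeddings).argmin(axis=1)
```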

21 / 35

Concept Bottlenecks

Alternatively, we can explain a decision by reducing the arbitrary feature space to a set of human-interpretable concepts (Koh et al., 2020). This is part of a larger body of work that attempts to establish shared language/representations for interacting with models.

22 / 35

Concept Bottlenecks

We reconfigure our transformer model to first predict the concept label before making a final classification.
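
One way the reconfiguration could look, again only a sketch in the spirit of Koh et al. (2020): the label is predicted from the concepts alone, so the concepts act as a bottleneck. Names and sizes are illustrative, and training would combine a concept loss with the usual label loss.

```python
import torch
import torch.nn as nn
from transformers import GPT2Config, GPT2Model

class ConceptBottleneck(nn.Module):
    """Sequence -> predicted concepts -> label; the label sees only the concepts."""
    def __init__(self, n_taxa=50, hidden=128, n_concepts=5, n_classes=2):
        super().__init__()
        config = GPT2Config(n_embd=hidden, n_layer=4, n_head=4, vocab_size=1)
        self.project = nn.Linear(n_taxa, hidden)
        self.backbone = GPT2Model(config)
        self.to_concepts = nn.Linear(hidden, n_concepts)
        self.to_label = nn.Linear(n_concepts, n_classes)

    def forward(self, profiles):
        tokens = self.project(profiles)
        hidden = self.backbone(inputs_embeds=tokens).last_hidden_state[:, -1]
        concepts = torch.sigmoid(self.to_concepts(hidden))   # interpretable intermediate
        return concepts, self.to_label(concepts)
```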

23 / 35

Concept Bottlenecks

Performance is in fact slightly better than before (84%), and we also obtain concept labels to help us explain each instance's prediction.

24 / 35

Interactivity

25 / 35

Scientific Generative Models

  1. Simulators have emerged as a general problem-solving device across various domains, many of which now have rich, open-source libraries.
  2. Where is the interface with statistics?
    • Experimental design, model building, and decision-making.

The E3SM is used for long-term climate projections.

26 / 35

Scientific Generative Models

  1. Simulators have emerged as a general problem-solving device across various domains, many of which now have rich, open-source libraries.
  2. Where is the interface with statistics?
    • Experimental design, model building, and decision-making.

Splatter generates synthetic single-cell genomics data.

27 / 35

Grammar of Generative Models

Transparent simulators can be built by interactively composing simple modules. Probabilistic programming has streamlined the process.

a. Regression
b. Hierarchy
c. Latent Structure
d. Temporal Variation
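
As an illustration of this kind of composition, here is a small NumPyro program that combines a regression slope over time with hierarchical group intercepts; NumPyro is just one possible probabilistic-programming backend, and latent structure or richer temporal terms would be added in the same modular style:

```python
import numpy as np
import numpyro
import numpyro.distributions as dist
from jax import random
from numpyro.infer import MCMC, NUTS

def model(time, group, y=None):
    beta = numpyro.sample("beta", dist.Normal(0.0, 1.0))         # (a) regression slope on time
    mu = numpyro.sample("mu", dist.Normal(0.0, 1.0))             # (b) hierarchy: shared mean...
    tau = numpyro.sample("tau", dist.HalfNormal(1.0))
    with numpyro.plate("group", int(group.max()) + 1):
        alpha = numpyro.sample("alpha", dist.Normal(mu, tau))    # ...with per-group intercepts
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
    with numpyro.plate("data", time.shape[0]):
        numpyro.sample("y", dist.Normal(alpha[group] + beta * time, sigma), obs=y)

# Toy data and inference
time = np.tile(np.arange(10.0), 20)
group = np.repeat(np.arange(20), 10)
y = 0.5 * time + np.random.normal(size=time.size)
mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=500)
mcmc.run(random.PRNGKey(0), time, group, y)
```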

28 / 35

Discrepancy and Iterability

By learning a discriminator to contrast real vs. simulated data, we can systematically improve the assumed generative mechanism.
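
A minimal version of this check with scikit-learn; the two arrays below are random placeholders for the same summary features computed on real and simulated samples:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

real = np.random.normal(0.0, 1.0, size=(200, 20))        # placeholder feature matrices
simulated = np.random.normal(0.2, 1.0, size=(200, 20))

X = np.vstack([real, simulated])
y = np.concatenate([np.ones(len(real)), np.zeros(len(simulated))])

clf = GradientBoostingClassifier()
auc = cross_val_score(clf, X, y, scoring="roc_auc", cv=5).mean()
# AUC near 0.5: the simulator is hard to tell apart from the real data.
# AUC near 1.0: a clear discrepancy; fit clf on all of (X, y) and inspect
# clf.feature_importances_ to see which features drive it.
```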

29 / 35

Covasim

Following the outbreak of COVID-19, the research community came together to build simulators that could inform pandemic response.

  • E.g., "What would happen if we held classes remotely for two weeks?"
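
A sketch of how such a what-if could be posed directly in Covasim, assuming its hybrid population (which includes a school contact layer 's') and the clip_edges intervention; the population size, dates, and plotted quantities are illustrative, and details may differ across Covasim versions:

```python
import covasim as cv

# Baseline vs. removing school contacts between days 30 and 44 ("two weeks remote").
base = cv.Sim(pop_type='hybrid', pop_size=20_000, n_days=90, label='Baseline')
remote = cv.Sim(
    pop_type='hybrid', pop_size=20_000, n_days=90, label='Two weeks remote',
    interventions=cv.clip_edges(days=[30, 44], changes=[0.0, 1.0], layers='s'),
)

msim = cv.MultiSim([base, remote])
msim.run()
msim.plot(to_plot=['new_infections', 'cum_infections'])
```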

30 / 35

Covasim

Covasim is an example of an agent-based model. Starting from local interaction rules, it lets us draw global inferences.

Statistical emulators mimic the relationship between input hyperparameters and output data, substantially reducing the computational burden.
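
A toy emulator in this spirit: fit a Gaussian process to a handful of expensive simulator runs so that the input-output relationship can be queried cheaply, with uncertainty. The run_simulator function below is a cheap stand-in for an actual (e.g., Covasim) run:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def run_simulator(beta):
    """Stand-in for one expensive simulation at transmissibility `beta`."""
    return 1e4 / (1 + np.exp(-400 * (beta - 0.015))) + np.random.normal(0, 100)

# A few expensive runs at design points...
design = np.linspace(0.005, 0.03, 8).reshape(-1, 1)
outputs = np.array([run_simulator(b) for b in design.ravel()])

# ...and a cheap emulator interpolating between them.
gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gp.fit(design, outputs)

grid = np.linspace(0.005, 0.03, 200).reshape(-1, 1)
mean, sd = gp.predict(grid, return_std=True)    # emulated response with uncertainty
```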

31 / 35

Thank you!

Contact: ksankaran@wisc.edu

Acknowledgments

  • Lab Members: Margaret Thairu, Hanying Jiang, Shuchen Yan, Yuliang Peng, Kaiyan Ma, Kai Cui, Sam Merten, and Kobe Uko
  • Funding: NIGMS R01GM152744.

33 / 35

References

Coenen, A. et al. (2019). "Visualizing and Measuring the Geometry of BERT". In: ArXiv abs/1906.02715.

Gosmann, C. et al. (2017). "Lactobacillus-Deficient Cervicovaginal Bacterial Communities Are Associated with Increased HIV Acquisition in Young South African Women". In: Immunity 46, pp. 29–37.

Koh, P. W. et al. (2020). "Concept Bottleneck Models". In: ArXiv abs/2007.04612.

Liu, Y. et al. (2019). "Latent Space Cartography: Visual Analysis of Vector Space Embeddings". In: Computer Graphics Forum 38.

34 / 35
