Modern machine learning models are being designed to solve many problems simultaneously.
Multimodal datasets are becoming the norm, and new systems allow navigation across many sources.
We are also seeing increasingly rich ways to interact with them.
This is a difficult question... let's start with an easier one.
A good visualization is:
More subtly, it should pay attention to:
We should think about model interpretability with the same nuance that we bring to data visualization.
Local Explanation: An artifact for reasoning about individual predictions.
Global Explanation: An artifact for reasoning about an entire model.
Problem: Imagine sampling longitudinal microbiome profiles from 500 study participants, some of whom eventually developed a disease. Can we discover any microbiome-related risk factors? This simulation is motivated by microbiome studies of HIV risk (Gosmann et al., 2017).
Applying a transformer model to the raw series, we reach a hold-out performance of about 83.2%, which is nearly as good as a model with knowledge of the true underlying features.
In text data, we can understand context-dependent meaning by looking for clusters in the PCA of embeddings (Coenen et al., 2019). These represent a type of interaction.
We can build the analogous visualization for our microbiome problem. Samples that are nearby in the embedding space are similar w.r.t. predictive features.
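A minimal sketch of the PCA step, assuming the embeddings have already been extracted as a samples-by-dimensions array. The random `embeddings` array below is a hypothetical stand-in for the transformer's output, not the data from the simulation.

```python
import numpy as np

def pca_project(embeddings, n_components=2):
    """Project the rows of `embeddings` onto their top principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
# Hypothetical embeddings: two groups of 50 samples in 16 dimensions.
embeddings = np.vstack([
    rng.normal(loc=0.0, size=(50, 16)),
    rng.normal(loc=3.0, size=(50, 16)),
])
coords = pca_project(embeddings)  # one 2D point per sample, ready to plot
```

Coloring the points in `coords` by predicted class or by a known feature is one way to check whether nearby samples really share predictive structure.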
Another common technique is to analyze linear interpolations in this space (Liu et al., 2019). This figure traces out the microbiome profiles between two samples.
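A sketch of linear interpolation between two embedded samples. How interpolated points are mapped back to microbiome profiles is not stated here; the nearest-neighbor lookup below is an assumption, and `embeddings` is again a hypothetical stand-in.

```python
import numpy as np

def interpolate(z_start, z_end, n_steps=5):
    """Evenly spaced points on the line segment from z_start to z_end."""
    weights = np.linspace(0.0, 1.0, n_steps)[:, None]
    return (1 - weights) * z_start + weights * z_end

def nearest_samples(path, embeddings):
    """Index of the observed sample closest to each point on the path."""
    dists = np.linalg.norm(embeddings[None, :, :] - path[:, None, :], axis=2)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 16))  # hypothetical sample embeddings
path = interpolate(embeddings[0], embeddings[1], n_steps=7)
neighbors = nearest_samples(path, embeddings)  # samples to display along the path
```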
Alternatively, we can explain a decision by reducing the arbitrary feature space to a set of human-interpretable concepts (Koh et al., 2020). This is part of a larger body of work that attempts to establish shared language/representations for interacting with models.
We reconfigure our transformer model to first predict the concept label before making a final classification.
Performance is in fact slightly better than before (84%), and we also obtain concept labels to help us explain each instance's prediction.
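The two-stage structure can be sketched as follows. This is not the transformer used above: it swaps in plain logistic regression for both stages, trains them one after the other, and uses made-up concepts and data, purely to show the "predict concepts first, then the label" pattern.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def add_intercept(X):
    return np.hstack([X, np.ones((len(X), 1))])

def fit_logistic(X, Y, lr=0.5, n_iter=500):
    """One logistic regression per column of Y, fit by gradient descent."""
    X1 = add_intercept(X)
    W = np.zeros((X1.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        W -= lr * X1.T @ (sigmoid(X1 @ W) - Y) / len(X1)
    return W

def predict_proba(X, W):
    return sigmoid(add_intercept(X) @ W)

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
# Hypothetical binary concepts, each a simple function of the raw features.
concepts = np.column_stack([X[:, 0] > 0, X[:, 1] + X[:, 2] > 0]).astype(float)
# Hypothetical outcome: positive only when both concepts are present.
y = (concepts.sum(axis=1) == 2).astype(float)[:, None]

W_concept = fit_logistic(X, concepts)    # stage 1: features -> concepts
c_hat = predict_proba(X, W_concept)
W_label = fit_logistic(c_hat, y)         # stage 2: predicted concepts -> label
y_hat = predict_proba(c_hat, W_label)

concept_accuracy = np.mean((c_hat > 0.5) == (concepts > 0.5))
label_accuracy = np.mean((y_hat > 0.5) == (y > 0.5))
```

Because the label is predicted only from `c_hat`, each prediction can be explained by reading off the predicted concepts for that instance.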
The E3SM is used for long-term climate projections.
Splatter generates synthetic single-cell genomics data.
Transparent simulators can be built by interactively composing simple modules. Probabilistic programming has streamlined the process.
a. Regression
b. Hierarchy
c. Latent Structure
d. Temporal Variation
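As a rough illustration of composing modules, the sketch below chains a regression module and a hierarchy module into one simulator using plain NumPy rather than a probabilistic programming library. All function and parameter names are hypothetical.

```python
import numpy as np

def regression_module(x, beta):
    """Regression: the mean depends linearly on the covariates."""
    return x @ beta

def hierarchy_module(groups, n_groups, sigma_group, rng):
    """Hierarchy: every group draws its own random offset."""
    offsets = rng.normal(scale=sigma_group, size=n_groups)
    return offsets[groups]

def simulate(x, groups, n_groups, beta, rng, sigma_group=1.0, sigma_noise=0.1):
    """Compose the two modules, then add observation noise."""
    mean = regression_module(x, beta) + hierarchy_module(
        groups, n_groups, sigma_group, rng
    )
    return rng.normal(loc=mean, scale=sigma_noise)

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
groups = rng.integers(0, 10, size=200)
beta = np.array([1.0, -2.0, 0.5])
y = simulate(x, groups, n_groups=10, beta=beta, rng=rng)
```

Latent structure or temporal variation could be added as further modules that contribute their own terms to `mean`.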
By learning a discriminator to contrast real vs. simulated data, we can systematically improve the assumed generative mechanism.
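One way to make this concrete, sketched under simple assumptions: train a classifier to tell real from simulated samples and inspect its held-out accuracy. Accuracy near 0.5 suggests the simulator is hard to distinguish from the data; higher accuracy points to a mismatch worth fixing. The logistic-regression discriminator and the Gaussian toy data are illustrative choices only.

```python
import numpy as np

def discriminator_accuracy(real, simulated, lr=0.1, n_iter=300, seed=0):
    """Held-out accuracy of a logistic regression separating real from simulated."""
    rng = np.random.default_rng(seed)
    X = np.vstack([real, simulated])
    y = np.concatenate([np.ones(len(real)), np.zeros(len(simulated))])
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)  # standardize features
    X = np.hstack([X, np.ones((len(X), 1))])            # intercept column
    idx = rng.permutation(len(X))
    train, test = idx[: len(X) // 2], idx[len(X) // 2 :]
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X[train] @ w))
        w -= lr * X[train].T @ (p - y[train]) / len(train)
    return np.mean((X[test] @ w > 0) == (y[test] == 1))

rng = np.random.default_rng(1)
real = rng.normal(loc=1.0, size=(500, 5))
poor_sim = rng.normal(loc=0.0, size=(500, 5))  # wrong mean: easy to detect
good_sim = rng.normal(loc=1.0, size=(500, 5))  # matches the real distribution
acc_poor = discriminator_accuracy(real, poor_sim)
acc_good = discriminator_accuracy(real, good_sim)
```

The features the discriminator relies on most indicate which aspects of the generative mechanism to revise.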
Following the outbreak of COVID-19, the research community came together to build simulators that could inform pandemic response.
Covasim is an example of an agent-based model. Starting from local interaction rules, it lets us draw global inferences.
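A toy agent-based epidemic illustrates the idea. This is not Covasim's API or model; it is a hypothetical susceptible-infectious-recovered simulation in which each infectious agent meets a few random agents per day, and every parameter value is made up.

```python
import numpy as np

def simulate_epidemic(n_agents=500, n_days=60, contacts_per_day=5,
                      p_transmit=0.05, p_recover=0.1, n_seed=5, seed=0):
    """Return daily counts of (susceptible, infectious, recovered) agents."""
    rng = np.random.default_rng(seed)
    S, I, R = 0, 1, 2
    state = np.full(n_agents, S)
    state[rng.choice(n_agents, n_seed, replace=False)] = I
    history = []
    for _ in range(n_days):
        infectious = np.flatnonzero(state == I)
        # Local rule 1: each infectious agent contacts random agents.
        contacts = rng.integers(0, n_agents, size=(len(infectious), contacts_per_day))
        exposed = contacts[rng.random(contacts.shape) < p_transmit]
        newly_infected = exposed[state[exposed] == S]
        # Local rule 2: each infectious agent recovers with fixed probability.
        recovering = infectious[rng.random(len(infectious)) < p_recover]
        state[newly_infected] = I
        state[recovering] = R
        history.append(np.bincount(state, minlength=3))
    return np.array(history)

history = simulate_epidemic()  # the global epidemic curve emerges from local rules
```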
Statistical emulators mimic the relationship between input hyperparameters and output data, substantially reducing the computational burden.
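A minimal emulator sketch, assuming a Gaussian process mean predictor with an RBF kernel (a common choice, but an assumption here). `slow_simulator` is a hypothetical cheap function standing in for an expensive simulation with one input hyperparameter.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5):
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

def fit_emulator(theta_train, y_train, jitter=1e-6):
    """Return a function giving the GP posterior mean at new inputs."""
    K = rbf_kernel(theta_train, theta_train) + jitter * np.eye(len(theta_train))
    alpha = np.linalg.solve(K, y_train)
    return lambda theta_new: rbf_kernel(theta_new, theta_train) @ alpha

def slow_simulator(theta):
    """Hypothetical stand-in for an expensive simulator run."""
    return np.sin(3 * theta) + 0.5 * theta

theta_train = np.linspace(0, 2, 15)  # a small design of simulator runs
emulate = fit_emulator(theta_train, slow_simulator(theta_train))

theta_grid = np.linspace(0, 2, 50)   # cheap predictions at new settings
max_error = np.max(np.abs(emulate(theta_grid) - slow_simulator(theta_grid)))
```

Once fit, the emulator can be evaluated at many hyperparameter settings for the cost of a matrix product instead of a full simulation.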
Contact: ksankaran@wisc.edu
Acknowledgments
Coenen, A. et al. (2019). "Visualizing and Measuring the Geometry of BERT". In: ArXiv abs/1906.02715.
Gosmann, C. et al. (2017). "Lactobacillus‐Deficient Cervicovaginal Bacterial Communities Are Associated with Increased HIV Acquisition in Young South African Women". In: Immunity 46, pp. 29–37.
Koh, P. W. et al. (2020). "Concept Bottleneck Models". In: ArXiv abs/2007.04612.
Liu, Y. et al. (2019). "Latent Space Cartography: Visual Analysis of Vector Space Embeddings". In: Computer Graphics Forum 38.