Repository for the session on global-, model-, and representation-level interpretations.
- Experiment 1 (15 minutes):
- Load one pretrained model from the Transformers library
- Load dataset of texts with part-of-speech (POS) annotations
- Run pretrained model on texts and extract representations
- Train and evaluate a linear classifier that predicts POS tags from the representations
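A minimal sketch of Experiment 1, assuming `bert-base-cased` and the Universal Dependencies `en_ewt` treebank loaded through the `datasets` library (both are illustrative choices, as are the first-sub-word alignment and the loading flags, which may vary with your `datasets` version); the probe is a scikit-learn logistic regression on last-layer representations:

```python
# Sketch of Experiment 1 (model, dataset, and alignment choices are assumptions).
import torch
from datasets import load_dataset
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "bert-base-cased"                    # any Transformers encoder works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME).eval()

# POS-annotated corpus; Universal Dependencies EWT is one convenient option.
ud = load_dataset("universal_dependencies", "en_ewt", trust_remote_code=True)

def word_representations(split, limit=500):
    """One last-layer vector per word (first sub-word), its UPOS tag, and the word string."""
    feats, tags, words = [], [], []
    for ex in split.select(range(limit)):
        enc = tokenizer(ex["tokens"], is_split_into_words=True,
                        return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]     # (sub-words, dim)
        seen = set()
        for pos, word_id in enumerate(enc.word_ids()):
            if word_id is None or word_id in seen:
                continue                                   # skip special and continuation pieces
            seen.add(word_id)
            feats.append(hidden[pos].numpy())
            tags.append(ex["upos"][word_id])
            words.append(ex["tokens"][word_id])
    return feats, tags, words

X_train, y_train, train_words = word_representations(ud["train"])
X_test, y_test, test_words = word_representations(ud["validation"])

probe = LogisticRegression(max_iter=1000)                  # linear probe
probe.fit(X_train, y_train)
print("POS probing accuracy:", probe.score(X_test, y_test))
```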
- Experiment 2 (10 minutes):
- Repeat the same for representations from all layers and compare accuracy across layers
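Experiment 2 can reuse `tokenizer`, `model`, and `ud` from the sketch above, requesting `output_hidden_states=True` and fitting one probe per layer; the helper below is again only a sketch:

```python
# Per-layer variant of the probe above (reuses tokenizer, model, and ud from the previous sketch).
import numpy as np

def layer_representations(split, layer, limit=500):
    feats, tags = [], []
    for ex in split.select(range(limit)):
        enc = tokenizer(ex["tokens"], is_split_into_words=True,
                        return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = model(**enc, output_hidden_states=True).hidden_states[layer][0]
        seen = set()
        for pos, word_id in enumerate(enc.word_ids()):
            if word_id is None or word_id in seen:
                continue
            seen.add(word_id)
            feats.append(hidden[pos].numpy())
            tags.append(ex["upos"][word_id])
    return np.array(feats), np.array(tags)

n_layers = model.config.num_hidden_layers + 1              # embedding layer + transformer layers
for layer in range(n_layers):
    Xtr, ytr = layer_representations(ud["train"], layer)
    Xte, yte = layer_representations(ud["validation"], layer)
    acc = LogisticRegression(max_iter=1000).fit(Xtr, ytr).score(Xte, yte)
    print(f"layer {layer:2d}: accuracy {acc:.3f}")
```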
- Experiment 3 (10 minutes):
- Repeat the same for a non-linear classifier
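For Experiment 3, one option is to swap the linear probe for a small MLP, here scikit-learn's `MLPClassifier` (the architecture is an arbitrary choice), reusing the features from the Experiment 1 sketch:

```python
# Non-linear probe: replace the linear model with a small MLP (one illustrative choice).
from sklearn.neural_network import MLPClassifier

mlp_probe = MLPClassifier(hidden_layer_sizes=(256,), max_iter=300)
mlp_probe.fit(X_train, y_train)
print("MLP probing accuracy:", mlp_probe.score(X_test, y_test))
```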
- Experiment 4 (10 minutes):
- Create a control experiment with random labels, following Hewitt and Liang (2019)
- Calculate selectivity and compare to previous results
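A sketch of the control task in the spirit of Hewitt and Liang (2019): every word type gets one fixed random label, and selectivity is the gap between POS accuracy and control accuracy. It assumes the word strings (`train_words`, `test_words`) were collected alongside the representations, as in the Experiment 1 sketch:

```python
# Control task: one fixed random label per word type; selectivity = POS acc - control acc.
import random

def make_control_mapping(words, n_classes=17, seed=0):
    """Assign every word type one fixed random label (17 matches the number of UPOS tags)."""
    rng = random.Random(seed)
    return {w: rng.randrange(n_classes) for w in dict.fromkeys(words)}

mapping = make_control_mapping(train_words + test_words)
y_train_ctrl = [mapping[w] for w in train_words]
y_test_ctrl = [mapping[w] for w in test_words]

ctrl_probe = LogisticRegression(max_iter=1000).fit(X_train, y_train_ctrl)
pos_acc = probe.score(X_test, y_test)            # probe from the Experiment 1 sketch
ctrl_acc = ctrl_probe.score(X_test, y_test_ctrl)
print(f"selectivity = {pos_acc:.3f} - {ctrl_acc:.3f} = {pos_acc - ctrl_acc:.3f}")
```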
- Other topics as time permits:
- Other word-level linguistic properties besides parts-of-speech
- Sentence-level properties, via aggregation of word-level representations or via dedicated sentence tokens (e.g., [CLS])
- Structural probe
- Methods for finding linguistic information in attention weights
- Other models from the Transformers library
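For the attention-weights topic above, a minimal starting point (reusing `tokenizer` and `model` from the first sketch) is to request `output_attentions=True`; what to measure on the weights is left open:

```python
# Extract raw attention weights; analysis of them is up to the session.
enc = tokenizer("The quick brown fox jumps over the lazy dog .".split(),
                is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    attentions = model(**enc, output_attentions=True).attentions
# Tuple with one tensor per layer, each of shape (batch, heads, seq_len, seq_len).
print(len(attentions), tuple(attentions[0].shape))
```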
- Examining units of a classifier.
- Load a pretrained VGG classifier trained to classify scenes.
- Load a dataset of scene images, as well as a pretrained segmentation network.
- Run the classifier on the scene images to visualize top-activating images for each unit.
- Measure agreement (e.g., intersection over union) between unit activations and segmentation classes to identify unit semantics.
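A sketch of the unit-visualization step: hook one convolutional layer of a VGG classifier and rank images by each unit's maximum activation. torchvision's ImageNet VGG-16 and a local `./scenes` folder stand in for the Places-trained classifier and scene dataset used in the session; the agreement step would then threshold each unit's spatial activation map and compare it with segmentation masks.

```python
# Hook the last conv layer of VGG-16 and find top-activating images per unit (placeholders noted above).
import torch
from torch.utils.data import DataLoader
from torchvision import models, transforms
from torchvision.datasets import ImageFolder

vgg = models.vgg16(weights="IMAGENET1K_V1").eval()      # stand-in for a Places-trained VGG
preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
loader = DataLoader(ImageFolder("./scenes", preprocess), batch_size=32)

activations = []                                        # per-batch max activation per unit
layer = vgg.features[28]                                # last conv layer of VGG-16
hook = layer.register_forward_hook(
    lambda _m, _i, out: activations.append(out.amax(dim=(2, 3)).detach()))  # (batch, units)

with torch.no_grad():
    for images, _ in loader:
        vgg(images)
hook.remove()

acts = torch.cat(activations)                           # (n_images, n_units)
top_images = acts.topk(5, dim=0).indices                # top-5 image indices per unit
print("top-activating image indices for unit 0:", top_images[:, 0].tolist())
```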
- Examining units of a GAN generator.
- Repeat the same, but for a pretrained GAN generator trained to generate scenes.
- Examine units across layers.
- Test units using interventions.
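A sketch of a unit intervention on a GAN generator: zero out one channel ("unit") of an intermediate layer and compare the generated images before and after. `load_generator()`, the layer name `"layer4"`, and the `z_dim` attribute are hypothetical placeholders for whichever pretrained scene GAN is used:

```python
# Zero one generator unit via a forward hook and compare outputs (loader and names are hypothetical).
import torch

generator = load_generator().eval()                     # hypothetical loader returning an nn.Module
layer = dict(generator.named_modules())["layer4"]       # hypothetical intermediate layer name
UNIT = 37                                               # a unit identified in the dissection step

def ablate_unit(_module, _inputs, output):
    output = output.clone()
    output[:, UNIT] = 0.0                               # zero the unit's feature map
    return output

z = torch.randn(4, generator.z_dim)                     # z_dim: assumed attribute of the generator
with torch.no_grad():
    original = generator(z)
    handle = layer.register_forward_hook(ablate_unit)
    ablated = generator(z)
    handle.remove()

# If the unit encodes, say, "tree", trees should vanish from `ablated` relative to `original`.
print((original - ablated).abs().mean().item())
```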