Repository for the session on global-, model-, and representation-level interpretations.
- Experiment 1 (15 minutes):
- Load one pretrained model from the Transformers library
- Load dataset of texts with part-of-speech (POS) annotations
- Run pretrained model on texts and extract representations
- Train and evaluate a linear classifier that predicts POS tags from the representations
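A minimal sketch of Experiment 1, assuming `bert-base-cased` and the Universal Dependencies `en_ewt` treebank loaded through the `datasets` library (both are illustrative choices, as are the first-sub-word alignment and the loading flags, which may vary with your `datasets` version); the probe is a scikit-learn logistic regression on last-layer representations:

```python
# Sketch of Experiment 1 (model, dataset, and alignment choices are assumptions).
import torch
from datasets import load_dataset
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "bert-base-cased"                    # any Transformers encoder works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME).eval()

# POS-annotated corpus; Universal Dependencies EWT is one convenient option.
ud = load_dataset("universal_dependencies", "en_ewt", trust_remote_code=True)

def word_representations(split, limit=500):
    """One last-layer vector per word (first sub-word), its UPOS tag, and the word string."""
    feats, tags, words = [], [], []
    for ex in split.select(range(limit)):
        enc = tokenizer(ex["tokens"], is_split_into_words=True,
                        return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]     # (sub-words, dim)
        seen = set()
        for pos, word_id in enumerate(enc.word_ids()):
            if word_id is None or word_id in seen:
                continue                                   # skip special and continuation pieces
            seen.add(word_id)
            feats.append(hidden[pos].numpy())
            tags.append(ex["upos"][word_id])
            words.append(ex["tokens"][word_id])
    return feats, tags, words

X_train, y_train, train_words = word_representations(ud["train"])
X_test, y_test, test_words = word_representations(ud["validation"])

probe = LogisticRegression(max_iter=1000)                  # linear probe
probe.fit(X_train, y_train)
print("POS probing accuracy:", probe.score(X_test, y_test))
```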
- Experiment 2 (10 minutes):
- Repeat the same for representations from all layers and compare accuracy across layers
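Experiment 2 can reuse `tokenizer`, `model`, and `ud` from the sketch above, requesting `output_hidden_states=True` and fitting one probe per layer; the helper below is again only a sketch:

```python
# Per-layer variant of the probe above (reuses tokenizer, model, and ud from the previous sketch).
import numpy as np

def layer_representations(split, layer, limit=500):
    feats, tags = [], []
    for ex in split.select(range(limit)):
        enc = tokenizer(ex["tokens"], is_split_into_words=True,
                        return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = model(**enc, output_hidden_states=True).hidden_states[layer][0]
        seen = set()
        for pos, word_id in enumerate(enc.word_ids()):
            if word_id is None or word_id in seen:
                continue
            seen.add(word_id)
            feats.append(hidden[pos].numpy())
            tags.append(ex["upos"][word_id])
    return np.array(feats), np.array(tags)

n_layers = model.config.num_hidden_layers + 1              # embedding layer + transformer layers
for layer in range(n_layers):
    Xtr, ytr = layer_representations(ud["train"], layer)
    Xte, yte = layer_representations(ud["validation"], layer)
    acc = LogisticRegression(max_iter=1000).fit(Xtr, ytr).score(Xte, yte)
    print(f"layer {layer:2d}: accuracy {acc:.3f}")
```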
- Experiment 3 (10 minutes):
- Repeat the same for a non-linear classifier
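For Experiment 3, one option is to swap the linear probe for a small MLP, here scikit-learn's `MLPClassifier` (the architecture is an arbitrary choice), reusing the features from the Experiment 1 sketch:

```python
# Non-linear probe: replace the linear model with a small MLP (one illustrative choice).
from sklearn.neural_network import MLPClassifier

mlp_probe = MLPClassifier(hidden_layer_sizes=(256,), max_iter=300)
mlp_probe.fit(X_train, y_train)
print("MLP probing accuracy:", mlp_probe.score(X_test, y_test))
```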
- Experiment 4 (10 minutes):
- Create a control experiment with random labels, following Hewitt and Liang (2019)
- Calculate selectivity and compare to previous results
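A sketch of the control task in the spirit of Hewitt and Liang (2019): every word type gets one fixed random label, and selectivity is the gap between POS accuracy and control accuracy. It assumes the word strings (`train_words`, `test_words`) were collected alongside the representations, as in the Experiment 1 sketch:

```python
# Control task: one fixed random label per word type; selectivity = POS acc - control acc.
import random

def make_control_mapping(words, n_classes=17, seed=0):
    """Assign every word type one fixed random label (17 matches the number of UPOS tags)."""
    rng = random.Random(seed)
    return {w: rng.randrange(n_classes) for w in dict.fromkeys(words)}

mapping = make_control_mapping(train_words + test_words)
y_train_ctrl = [mapping[w] for w in train_words]
y_test_ctrl = [mapping[w] for w in test_words]

ctrl_probe = LogisticRegression(max_iter=1000).fit(X_train, y_train_ctrl)
pos_acc = probe.score(X_test, y_test)            # probe from the Experiment 1 sketch
ctrl_acc = ctrl_probe.score(X_test, y_test_ctrl)
print(f"selectivity = {pos_acc:.3f} - {ctrl_acc:.3f} = {pos_acc - ctrl_acc:.3f}")
```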
- Other topics as time permits:
- Other word-level linguistic properties besides parts-of-speech
- Sentence-level properties, via aggregation of word-level representations or via dedicated sentence tokens (e.g., [CLS])
- Structural probe
- Methods for finding linguistic information in attention weights
- Other models from the Transformers library
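For the attention-weights topic above, a minimal starting point (reusing `tokenizer` and `model` from the first sketch) is to request `output_attentions=True`; what to measure on the weights is left open:

```python
# Extract raw attention weights; analysis of them is up to the session.
enc = tokenizer("The quick brown fox jumps over the lazy dog .".split(),
                is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    attentions = model(**enc, output_attentions=True).attentions
# Tuple with one tensor per layer, each of shape (batch, heads, seq_len, seq_len).
print(len(attentions), tuple(attentions[0].shape))
```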
- Examining units of a classifier.
- Load a pretrained VGG classifier trained to classify scenes.
- Load a dataset of scene images, as well as a pretrained segmentation network.
- Run the classifier on the scene images to visualize top-activating images for each unit.
- Measure agreement (e.g., intersection over union) between unit activations and segmentation classes to identify unit semantics.
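A sketch of the unit-visualization step: hook one convolutional layer of a VGG classifier and rank images by each unit's maximum activation. torchvision's ImageNet VGG-16 and a local `./scenes` folder stand in for the Places-trained classifier and scene dataset used in the session; the agreement step would then threshold each unit's spatial activation map and compare it with segmentation masks.

```python
# Hook the last conv layer of VGG-16 and find top-activating images per unit (placeholders noted above).
import torch
from torch.utils.data import DataLoader
from torchvision import models, transforms
from torchvision.datasets import ImageFolder

vgg = models.vgg16(weights="IMAGENET1K_V1").eval()      # stand-in for a Places-trained VGG
preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
loader = DataLoader(ImageFolder("./scenes", preprocess), batch_size=32)

activations = []                                        # per-batch max activation per unit
layer = vgg.features[28]                                # last conv layer of VGG-16
hook = layer.register_forward_hook(
    lambda _m, _i, out: activations.append(out.amax(dim=(2, 3)).detach()))  # (batch, units)

with torch.no_grad():
    for images, _ in loader:
        vgg(images)
hook.remove()

acts = torch.cat(activations)                           # (n_images, n_units)
top_images = acts.topk(5, dim=0).indices                # top-5 image indices per unit
print("top-activating image indices for unit 0:", top_images[:, 0].tolist())
```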
- Examining units of a GAN generator.
- Repeat the same, but for a pretrained GAN generator trained to generate scenes.
- Examine units across layers.
- Test units using interventions.
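A sketch of a unit intervention on a GAN generator: zero out one channel ("unit") of an intermediate layer and compare the generated images before and after. `load_generator()`, the layer name `"layer4"`, and the `z_dim` attribute are hypothetical placeholders for whichever pretrained scene GAN is used:

```python
# Zero one generator unit via a forward hook and compare outputs (loader and names are hypothetical).
import torch

generator = load_generator().eval()                     # hypothetical loader returning an nn.Module
layer = dict(generator.named_modules())["layer4"]       # hypothetical intermediate layer name
UNIT = 37                                               # a unit identified in the dissection step

def ablate_unit(_module, _inputs, output):
    output = output.clone()
    output[:, UNIT] = 0.0                               # zero the unit's feature map
    return output

z = torch.randn(4, generator.z_dim)                     # z_dim: assumed attribute of the generator
with torch.no_grad():
    original = generator(z)
    handle = layer.register_forward_hook(ablate_unit)
    ablated = generator(z)
    handle.remove()

# If the unit encodes, say, "tree", trees should vanish from `ablated` relative to `original`.
print((original - ablated).abs().mean().item())
```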