- Each student will be randomly assigned 2 topics, one about NLP and one about Python.
- The main focus will be on the NLP topic, but sufficient knowledge (at least the equivalent of a passing grade) must be demonstrated in both topics to pass the exam.
- Students will have at least 15 minutes of preparation time.
- The exam grade counts for 50% of the final grade.
| Date | Time | Location |
|---|---|---|
| December 20 (Wed) | 08:00 | Q.B203 |
| January 3 (Wed) | 10:00 | Q.B203 |
| January 17 (Wed) | 10:00 | Q.B203 |
- POS-tags: definition, examples from the Universal Tagset (not all tags, just DET, NOUN, ADJ, VERB)
- Tagging in general: definitions and terminology (from lecture notebook)
- NP-chunking: definition, examples
- NER: definition, examples
- Limitations of naive approaches (non-unique POS-tags, capital letters in NER)
- Supervised learning, labeled data in general
- Sequential data: windows approach, problem with long-distance relationships
- Simple POS-tagger: the most frequently seen POS-tag in a given context of words
- What is POS-tagging?
- The Viterbi algorithm (k=2,3)
- Backtracking
- labeled corpus
- train/test set, seen/unseen data
- word accuracy, sentence accuracy
- Binary classification metrics (TP/TN/FP/FN table, precision/recall, F-score)
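The Viterbi algorithm and backtracking from the list above can be sketched in a few lines of Python. This is a minimal bigram (k=2) version; all probability values used below are made-up toy numbers, not estimates from a real corpus.

```python
# A minimal bigram (k = 2) Viterbi sketch for HMM POS-tagging.
# All probabilities below are hypothetical toy values, not corpus estimates.

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` (dynamic programming)."""
    # table[i][t] = (best prob. of tagging words[:i+1] ending in tag t, backpointer)
    table = [{t: (start_p.get(t, 0.0) * emit_p[t].get(words[0], 0.0), None)
              for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            best_prev, best_p = None, 0.0
            for prev in tags:
                p = (table[i - 1][prev][0]
                     * trans_p[prev].get(t, 0.0)
                     * emit_p[t].get(words[i], 0.0))
                if p > best_p:
                    best_prev, best_p = prev, p
            column[t] = (best_p, best_prev)
        table.append(column)
    # Backtracking: start from the best final tag, follow backpointers.
    last = max(tags, key=lambda t: table[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(table[i][path[-1]][1])
    return path[::-1]
```

With toy transition and emission tables for DET and NOUN, `viterbi(["the", "dog"], ...)` recovers the expected `["DET", "NOUN"]` sequence.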
- Basic concepts: morphemes, tags, analysis and generation
- Why do we need morphology: tasks
- Phenomena:
- free and bound morphemes, affix placement, affix types
- compounding, clitics
- non-concatenative morphology: reduplication, templates, ablaut
- language types
- Computational morphology:
- components: lexicon, morphotactics, orthography
- analysis, generation
- Finite state:
- FSA, FST, mathematical description
- Operations on FSTs: inversion, composition, projection
- Morphology:
- FSTs as morphological analyzers; analysis / generation, backtracking
- lexc
- Related tasks:
- spell checking, lemmatization, tokenization
- how a morphological FST can be used for these tasks
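As a tiny illustration of how an FSA encodes morphotactics, here is a toy acceptor over morpheme sequences. The lexicon, state names, and transitions are made up for illustration; real analyzers (e.g. written in lexc) compile far larger lexicons into the same kind of machine.

```python
# A toy finite-state acceptor for simple verb morphotactics
# (stem + optional suffix). States and transitions are hypothetical.

def accepts(morphemes, transitions, accepting, start="start"):
    """Run the FSA over a morpheme sequence; accept iff it ends in a final state."""
    state = start
    for m in morphemes:
        state = transitions.get(state, {}).get(m)
        if state is None:          # no matching transition: reject
            return False
    return state in accepting

# state -> {morpheme: next state}
VERB_FSA = {
    "start": {"walk": "stem", "talk": "stem"},
    "stem": {"s": "end", "ed": "end"},
}
FINAL = {"stem", "end"}            # a bare stem is also a well-formed word
```

`accepts(["walk", "ed"], VERB_FSA, FINAL)` succeeds, while a suffix without a stem is rejected.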
- Basics: definition, grammaticality
- Concepts: POS (open & closed), grammatical relations (predicate, arguments, adjuncts), subcategorization (valency, transitivity), constituency (tests)
- Phrase-structure grammars:
- Components: production rules, terminals, nonterminals
- Derivation, the parse tree
- Chomsky Normal Form and the conversion algorithm
- Dependency Grammar:
- Difference from PSG
- The dependency tree
- Parsing
- Top-down and bottom-up parsing
- Challenges: nondeterminism and ambiguity (global/local); examples
- Dynamic programming; the CKY algorithm
- PCFG
- Motivation and mathematical formulation
- Training and testing: treebanks and PARSEVAL
- Problems and solutions: subcategorization and lexicalization, independence and annotation
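The dynamic-programming idea behind CKY can be sketched as a recognizer for a grammar in Chomsky Normal Form. The toy grammar and lexicon below are hypothetical.

```python
# A minimal CKY recognizer sketch for a grammar in Chomsky Normal Form.
# The toy grammar in the test below is made up for illustration.

def cky_recognize(words, lexical, binary, start="S"):
    """Return True iff `words` is derivable from `start`.

    lexical: dict terminal -> set of nonterminals (rules A -> word)
    binary:  list of (A, B, C) triples for rules A -> B C
    """
    n = len(words)
    # table[i][j] = set of nonterminals deriving words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(lexical.get(w, ()))
    for span in range(2, n + 1):          # width of the span
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):     # split point
                for a, b, c in binary:
                    if b in table[i][k] and c in table[k][j]:
                        table[i][j].add(a)
    return start in table[0][n]
```

With rules S → NP VP, NP → DET N, VP → V NP, the recognizer accepts "the dog saw the cat" and rejects scrambled input.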
- Translation divergences: systematic, idiosyncratic and lexical divergences; examples
- The classical model
- Translation steps and approaches; the Vauquois triangle
- Direct translation, transfer, interlingua. Advantages and disadvantages of each
- Problems with the classical model
- Mathematical description of statistical MT:
- the noisy channel model
- language and translation models; fidelity and fluency
- Training and decoding (parallel corpora, decoding as search)
- Evaluation:
- Manual evaluation (what and how)
- Overview of BLEU (modified precision, brevity penalty, micro average)
- Word alignment; examples and arity
- Phrase acquisition for phrase-based MT
- IBM Model 1: algorithm and mathematical formulation
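The EM training loop of IBM Model 1 fits in a short sketch: collect expected alignment counts (E-step), then renormalize into translation probabilities (M-step). The parallel corpus in the test is a standard toy example; a real implementation would also model the NULL word, which is omitted here.

```python
# A toy EM sketch for IBM Model 1 translation probabilities t(f | e).
# No NULL word is modeled; the corpus used for testing is a toy example.
from collections import defaultdict

def ibm_model1(corpus, iterations=10):
    """corpus: list of (source_tokens, target_tokens) pairs.
    Returns t[(f, e)], an estimate of P(f | e)."""
    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)                # expected counts c(f, e)
        total = defaultdict(float)                # expected counts c(e)
        for fs, es in corpus:
            for f in fs:
                z = sum(t[(f, e)] for e in es)    # normalize over alignments
                for e in es:
                    c = t[(f, e)] / z             # E-step: fractional count
                    count[(f, e)] += c
                    total[e] += c
        for (f, e) in count:                      # M-step: renormalize
            t[(f, e)] = count[(f, e)] / total[e]
    return t
```

On the classic das/Haus/Buch toy corpus, the co-occurrence signal pulls `t(das | the)` above `t(Haus | the)` after a few iterations.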
- Representations: distributional and ontology-based approaches, pros and cons for each.
- Simple word relationships: synonymy, hypernymy.
- Common lexical resources: WordNet, FrameNet.
- Measuring word similarity: approaches and issues.
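A minimal sketch of the distributional approach to word similarity: represent each word by counts of its neighbours and compare the resulting vectors with cosine similarity. The corpus and window size in the test are toy choices.

```python
# Distributional word similarity sketch: context-count vectors + cosine.
from collections import Counter
from math import sqrt

def context_vectors(sentences, window=2):
    """Map each word to a Counter of words seen within `window` positions."""
    vectors = {}
    for sent in sentences:
        for i, w in enumerate(sent):
            ctx = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
            vectors.setdefault(w, Counter()).update(ctx)
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

Words sharing contexts ("dog" and "cat" below) come out more similar than words that do not, which is exactly the distributional hypothesis, and also hints at a known issue: antonyms sharing contexts would score high too.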
Relation extraction
- task definition, versions, examples
- rule-based approach, pros and cons
- supervised learning approach, pros and cons
- semi-supervised methods
Sentiment analysis
- task definition, examples
- supervised learning approach, common features
Question answering
- task description, existing systems
- major approaches and their limitations
- The IR-based approach:
- question processing
- query generation
- retrieval
- answer ranking
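The retrieval step of the IR-based pipeline can be sketched with bag-of-words overlap scoring. The stopword list and scoring are deliberately naive, made-up illustrations; real systems use weighted ranking (e.g. tf-idf) and a separate answer-extraction stage.

```python
# A toy sketch of the retrieval step in IR-based question answering:
# question processing (tokenize, drop stopwords) -> query -> best document.
import re

STOPWORDS = {"what", "is", "the", "a", "of"}   # hypothetical tiny list

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def retrieve(question, documents):
    """Score documents by query-term overlap and return the best match."""
    query = set(tokenize(question)) - STOPWORDS
    return max(documents, key=lambda d: len(query & set(tokenize(d))))
```

Given two candidate passages, the one sharing more query terms with the processed question wins.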
- What is Jupyter?
- cell types, cell magic
- kernel
- short history of Python
- Python community, PEP, Pythonista
- args, kwargs
- default arguments
- lambda functions
- generators, yield statement
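The function-related topics above fit into one short sketch: `*args`/`**kwargs`, a default argument, a generator with `yield`, and a lambda. The function names are made up for illustration.

```python
# *args/**kwargs, default arguments, generators and lambdas in one sketch.

def summary(*args, sep=", ", **kwargs):
    """Collect positional and keyword arguments into one string."""
    parts = [str(a) for a in args] + [f"{k}={v}" for k, v in kwargs.items()]
    return sep.join(parts)

def countdown(n):
    """A generator: yields n, n-1, ..., 1 lazily, one value per next() call."""
    while n > 0:
        yield n
        n -= 1

square = lambda x: x * x   # lambda: an anonymous one-expression function
```

`summary(1, 2, mode="fast")` produces `"1, 2, mode=fast"`, and `list(countdown(3))` materializes the lazy sequence `[3, 2, 1]`.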
- static vs. dynamic typing
- built-in types (numeric, boolean), operators
- mutability
- list vs. tuple
- operators
- advanced indexing
- extra: time complexity of basic operations (lookup, insert, remove, append etc.)
- set operations
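Mutability, list vs. tuple, and set operations can be demonstrated in a few lines:

```python
# Mutability in a nutshell: lists change in place, tuples cannot,
# and assignment copies the reference, not the object.
nums = [1, 2, 3]
alias = nums               # same object, not a copy
alias.append(4)            # the change is visible through both names

point = (1, 2)             # tuples are immutable and hashable (usable as dict keys)

# Set operations: union, intersection, difference
a, b = {1, 2, 3}, {2, 3, 4}
union, inter, diff = a | b, a & b, a - b
```

After the `append`, `nums` is `[1, 2, 3, 4]` even though the call went through `alias`, which is the classic aliasing pitfall.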
- character encodings: Unicode vs. UTF-8, encoding, decoding
- extra: Python 2 vs. Python 3 encodings
- common string operations
- string formatting (at least two solutions)
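Two string-formatting solutions (plus the older %-style for comparison):

```python
# str.format, f-strings, and %-formatting produce the same result here.
name, score = "Ada", 0.5
s1 = "{} scored {:.2f}".format(name, score)
s2 = f"{name} scored {score:.2f}"
s3 = "%s scored %.2f" % (name, score)
```

All three yield `"Ada scored 0.50"`; f-strings are the most readable when the values are local variables.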
- data attributes, methods, class attributes
- inheritance, `super`
- duck typing
- magic methods, operator overloading
- assignment, shallow copy, deep copy
- object introspection
- class decorators, static methods, class methods
- properties
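Most of the class-related topics above can be tied together in one compact sketch; the `Token`/`WordToken` names are made up for illustration.

```python
# Classes, inheritance, super(), magic methods, class attributes and properties.

class Token:
    count = 0                      # class attribute shared by all instances

    def __init__(self, text):
        self.text = text           # data attribute
        Token.count += 1

    def __repr__(self):            # magic method
        return f"Token({self.text!r})"

    def __eq__(self, other):       # operator overloading: ==
        return isinstance(other, Token) and self.text == other.text

class WordToken(Token):
    def __init__(self, text, pos):
        super().__init__(text)     # delegate to the parent initializer
        self.pos = pos

    @property
    def upper(self):               # computed, read-only attribute
        return self.text.upper()
```

Note that `WordToken("dog", "NOUN") == Token("dog")` holds via the inherited `__eq__`, an example of duck typing at the level of comparisons.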
- basic list comprehension (you should be able to write it on paper)
- generator expressions
- extra: iteration protocol, writing an iterator class
- set and dict comprehension
- yield keyword
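All four comprehension forms in one sketch, with a generator expression for comparison:

```python
# List, set and dict comprehensions, plus a lazy generator expression.
words = ["the", "Dog", "barks", "the", "Cat"]
lengths = [len(w) for w in words]        # list comprehension
lower = {w.lower() for w in words}       # set comprehension (deduplicates)
first = {w: w[0] for w in words}         # dict comprehension
total = sum(len(w) for w in words)       # generator expression: no list built
```

The generator expression feeds `sum` one value at a time, which matters for large inputs.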
- basic keywords
- defining exception classes
- basic usage
- defining context managers
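Defining an exception class and a context manager can both be sketched briefly; the `ParseError` and `tracked` names are hypothetical.

```python
# A custom exception class and a generator-based context manager.
from contextlib import contextmanager

class ParseError(Exception):
    """Hypothetical application-specific exception."""

@contextmanager
def tracked(log):
    """Append enter/exit markers around the with-block, even on errors."""
    log.append("enter")
    try:
        yield log                  # the with-block runs here
    finally:
        log.append("exit")         # guaranteed cleanup
```

The `finally` clause is what makes this a proper context manager: the exit marker is appended even if the with-block raises. The same behavior could be written as a class with `__enter__` and `__exit__`.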
- What are decorators?
- the `@wraps` decorator
- decorators with parameters
- extra: classes as decorators
- map, filter, reduce
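A decorator with a parameter, `functools.wraps`, and the map/filter/reduce trio in one sketch; the `repeat`/`greet` names are made up for illustration.

```python
# A parameterized decorator using functools.wraps, plus map/filter/reduce.
from functools import wraps, reduce

def repeat(times):                      # the outer function takes the parameter
    def decorator(func):
        @wraps(func)                    # preserves func.__name__ and __doc__
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(times):
                result = func(*args, **kwargs)
            return result
        return wrapper
    return decorator

@repeat(times=2)
def greet(name):
    """Return a greeting."""
    return f"hello {name}"

squares = list(map(lambda x: x * x, [1, 2, 3]))
evens = list(filter(lambda x: x % 2 == 0, [1, 2, 3, 4]))
total = reduce(lambda acc, x: acc + x, [1, 2, 3, 4])
```

Without `@wraps`, `greet.__name__` would report `"wrapper"`, which is exactly the metadata problem `wraps` exists to solve.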
- Why is it good to package your code?
- main object (`ndarray`)
- `shape`, `dtype`
- dimension (`len(shape)`)
- elementwise operations (`exp`, `sin`, `+ - * /`, `pow` or `**`)
- indexing
- `reshape`
- matrix operations: `dot`, `transpose`
- reductions: `sum` along a given axis
- broadcasting
  - the `None` index
  - examples:
    - row-wise normalization
    - one-liner for a complex grid
- some matrix constructions: `ones`, `zeros`, `eye`, `diag`
- scipy
  - sparse matrices
  - solving linear equations, svd (sparse and dense)
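The row-wise normalization example above combines a reduction, the `None` index, and broadcasting in one line (assuming NumPy is installed):

```python
# Row-wise normalization: reduction along an axis + broadcasting via None.
import numpy as np

m = np.arange(6.0).reshape(2, 3)      # ndarray with shape (2, 3), dtype float64
row_sums = m.sum(axis=1)              # reduction along axis 1 -> shape (2,)
normalized = m / row_sums[:, None]    # None adds an axis: (2, 1) broadcasts to (2, 3)
```

After the division, each row of `normalized` sums to 1; the `None` index is what lifts `row_sums` from shape `(2,)` to `(2, 1)` so broadcasting lines the sums up against the rows.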