Subject Index
adversarial testing, 6, 121
Amazon Mechanical Turk (AMT), 36, 37, 46, 47, 57, 59, 60, 63, 66, 71, 72, 75, 86, 88
artificial intelligence (AI), 4-6, 22, 99, 122
attention, 12-14, 22
attribute-value graph, 114
BERT, 1, 16, 22, 34, 37-39, 41-43, 80, 86, 88, 91, 92, 116, 120
bidirectional, 3, 13, 16, 38, 39, 59, 81, 82, 84, 86, 88, 93
bilinear, 103
Chomsky Hierarchy, 98
cognitive load, 74, 75, 78, 79, 87
compression effect, 2, 72, 75, 77-79, 87
Context-Free Grammar (CFG), 55
    Probabilistic CFG Parser (PCFG), 53, 55, 57
Convolutional Neural Network (CNN), 1, 11, 12, 14, 18, 19, 22, 24, 25, 28, 29, 90, 99
cosine, 102
cross-serial dependency, 115
dependency parser, 24, 26, 34, 41-43, 63, 91, 105, 109
discourse coherence, 74, 78, 79, 87
Dynamic Syntax, 114
ELMo, 41-43, 91
empiricism, 119
entropy, 35, 65
    cross entropy, 8, 9, 35, 65
    Maximum Entropy (ME), 54
feed forward network, 1, 7, 8, 10, 13, 21
formal
    concepts, 1
    elements, 1
    limits, 6
    syntax, 3
function
    cross entropy, 8
    non-linear, 7
    sigmoid, 7, 8, 11, 18
    softmax, 9, 12, 18, 20
    tanh, 11
Gated Recurrent Unit (GRU), 24, 25, 29, 30, 90, 99
Gibbs sampling, 54
GPT
    GPT-2, 1, 14, 15, 34, 80, 86, 88, 95, 96, 108, 116
    GPT-3, 1, 15, 16, 22, 43
gradience, 46, 66, 72, 93, 101, 109, 110, 117, 119
grammar, 3, 4
    Arc Pair Grammar (APG), 115
    Combinatory Categorial Grammar (CCG), 114
    Context-Free Grammar (CFG), 115
    formal, 93, 116, 118
    formalism, 113, 116
    generative, 115
    Head-Driven Phrase Structure Grammar (HPSG), 114
    Lexical Functional Grammar (LFG), 114
    Mildly Context-Sensitive Grammar (MCSG), 115
    model theoretic, 115
    pregroup, 4, 103, 104
    synchronous TAG, 115
    transformational, 113
    Tree Adjoining Grammar (TAG), 115
    type logical categorial grammar, 114
grammar induction, 5, 6, 98
grammaticality, 45, 46, 66, 92, 94, 109
Identification In the Limit (IIL), 98
image description, 105, 106, 110
Kneser-Ney smoothing, 35
knowledge distillation, 39, 92
L2 distance/norm, 41, 91
lambda (λ)-calculus, 114
Lambek calculus, 114
language learning/acquisition, 3-5, 7, 91, 96, 98, 116, 119
language model, 2, 3, 10, 14, 24-27, 30, 33, 35, 45, 51, 55-57, 63, 67, 72, 92-95, 97, 106, 117, 118
    n-gram, 51, 53, 54, 57
    Bayesian Hidden Markov Model (BHMM), 51, 54, 57
    bigram, 97
    LSTM, 62-67, 72-74, 80, 87, 88
    Recurrent Neural Network LM (RNNLM), 53, 54, 57, 60, 64
    Topic Driven Language Model (TDLM), 72-74, 80, 88
    two-tier Bayesian Hidden Markov Model (2T), 51, 54, 57
lexical embedding, 9, 14, 15, 18, 20, 22, 27, 30, 33, 42, 43, 80, 86, 91, 94, 101
Long Short Term Memory (LSTM), 1, 2, 11, 12, 14, 18, 19, 22, 24, 25, 27-30, 33, 90-92, 94, 99, 106
machine translation, 2, 4, 12, 13, 46, 51, 59, 60, 63, 66, 67, 69, 70, 74, 75, 86, 88, 99, 105, 106, 110
Markov Chain Monte Carlo procedure, 54
Maximum Entropy, 54, 116
Minimalist Program, 113
multi-modal representation, 99, 108, 110, 120, 121
nativism, 119
natural language inference (NLI), 4, 39-41, 90
neuroscience, 5, 34, 122
paraphrase task, 1, 4, 17-22, 72, 87
parsing
    CCG, 116
    exponential time, 115
    polynomial time, 115
    probabilistic, 118
    statistical, 116
Pearson correlation, 20, 41, 55, 57, 61, 64, 66, 71, 73, 79, 82, 90, 93
Penn Tree Bank (PTB), 41, 42
perplexity, 35, 36, 65, 66
probability, 45, 46, 55, 56, 66
    (non-)normalised, 9
    conditional, 7, 28, 53
    distribution, 8-10, 12, 20, 24, 35, 39, 45, 46, 54-56, 66, 72, 117, 119
    logprob, 55, 56, 66, 83
    model, 9, 119
    of a sentence, 15, 35, 37, 62, 73, 80, 83, 117
    unigram, 35, 56, 83, 94
quantum mechanics, 118
Recurrent Neural Network (RNN), 1, 2, 10-13, 22, 91, 93, 94, 97, 105, 106, 120
    RNN Grammar (RNNG), 38, 39, 92, 94, 108
    TreeRNN, 39, 40, 42, 90, 105, 109
regression
    linear, 71, 76-81, 84, 87, 88
    to the mean, 3, 77, 78
    total least squares, 2, 77, 78, 87
Reinforcement Learning (RL), 99, 110, 121, 122
semantics
    distributional, 4, 100, 110
    dynamic, 101
    formal, 101, 106, 110, 117
    underspecified, 101
sentence acceptability, 2, 3, 45, 46, 51, 55, 56, 60-62, 64, 66, 69-75, 78, 80, 86-88, 93, 94, 97, 110, 117, 118
sentiment analysis, 39, 40, 90
seq2seq, 4, 12, 22, 99, 110
Spearman correlation, 21, 41, 73, 90
supertagger, 116
syntactic constructions
    centre embedding, 95
    filler-gap, 95
    garden path, 95
    negative polarity, 95
    pseudo clefts, 95
    reflexive pronouns, 95
transformers, 1, 3, 13-15, 22, 34, 39, 42, 43, 59, 66, 80, 86, 88, 93, 94, 106, 108, 116
tree probe, 41-43, 91, 109
type theory, 4, 101, 118
Type Theory with Records (TTR), 114
typed feature structure, 114
unidirectional, 3, 14, 15, 63, 80, 82, 84, 86, 88
unification, 114
Universal Grammar (UG), 96, 110, 119
vector, 4, 9-13, 18, 19, 22, 24, 25, 41-43, 72, 91, 99, 102, 104-106, 109, 117
Vector Space Model (VSM), 101, 103-105, 110
XLNet, 80, 81, 84, 86, 88