Kaggle: Tweet Sentiment Extraction - common methods

Note This post is the first half of an overall summary of the competition. The second half is here. Before we start: I attended two NLP competitions in June, Tweet Sentiment Extraction and Jigsaw Multilingual Toxic Comment Classification, and I’m happy to be a Kaggle Expert from now on :) Tweet Sentiment Extraction Goal: The objective of this competition is to “Extract support phrases for sentiment labels”. More precisely, the competition asks Kagglers to build a model that can figure out which word or phrase in a tweet best supports its labeled sentiment....

July 1, 2020 · 11 min

Recurrent Neural Network (RNN) and Long Short Term Memory (LSTM)

Sequence Data There is a lot of sequence data in applications. Here are some examples: Machine translation, from text sequence to text sequence. Text summarization, from text sequence to text sequence. Sentiment classification, from text sequence to categories. Music generation, from nothing or some simple input (character, integer, etc.) to a wave sequence. Named entity recognition (NER)...
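
For the sentiment-classification case (text sequence in, category out), here is a minimal PyTorch sketch of an LSTM classifier; the vocabulary size, layer widths, and number of classes are arbitrary assumptions, not values from the post:

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """Sequence of token ids -> sentiment category (toy sizes are assumptions)."""
    def __init__(self, vocab_size=1000, embed_dim=32, hidden_dim=64, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):              # (batch, seq_len)
        x = self.embed(token_ids)              # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(x)             # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1])                # (batch, num_classes) logits

logits = LSTMClassifier()(torch.randint(0, 1000, (2, 10)))
print(logits.shape)  # torch.Size([2, 3])
```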

June 4, 2020 · 5 min

Intro to Deep Learning and Backpropagation

Deep Learning vs. Machine Learning The major difference between Deep Learning and Machine Learning techniques is the problem-solving approach. Deep Learning techniques tend to solve the problem end to end, whereas Machine Learning techniques need the problem to be broken down into different parts that are solved first, with their results combined at a final stage. Forward Propagation The general procedure is the following: $$ \begin{aligned} a^{(1)}(x) &= w^{(1)^T} \cdot x + b^{(1)} \\ h^{(1)}(x) &= g_1(a^{(1)}(x)) \\ a^{(2)}(x) &= w^{(2)^T} \cdot h^{(1)}(x) + b^{(2)} \\ h^{(2)}(x) &= g_2(a^{(2)}(x)) \\ &\;\;\vdots \\ a^{(L+1)}(x) &= w^{(L+1)^T} \cdot h^{(L)}(x) + b^{(L+1)} \\ h^{(L+1)}(x) &= g_{L+1}(a^{(L+1)}(x)) \end{aligned} $$...
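
As a quick numerical companion to these equations, here is a minimal NumPy sketch of the forward pass; the layer sizes, activation functions, and random weights are illustrative assumptions, not taken from the post:

```python
import numpy as np

def forward(x, weights, biases, activations):
    """Forward propagation: h^{(l)} = g_l(w^{(l)T} h^{(l-1)} + b^{(l)})."""
    h = x
    for W, b, g in zip(weights, biases, activations):
        a = W.T @ h + b      # pre-activation a^{(l)}(x)
        h = g(a)             # activation h^{(l)}(x)
    return h

relu = lambda a: np.maximum(a, 0)
softmax = lambda a: np.exp(a - a.max()) / np.exp(a - a.max()).sum()

# toy 2-layer network: 3 inputs -> 4 hidden units -> 2 outputs (shapes are assumptions)
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 4)), rng.normal(size=(4, 2))]
biases = [np.zeros(4), np.zeros(2)]
x = np.array([0.5, -1.0, 2.0])
print(forward(x, weights, biases, [relu, softmax]))
```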

May 26, 2020 · 7 min

Log-Linear Model, Conditional Random Field (CRF)

Log-Linear model Let $x$ be an example, and let $y$ be a possible label for it. A log-linear model assumes that $$ p(y | x ; w)=\frac{\exp [\sum_{j=1}^J w_{j} F_{j}(x, y)]}{Z(x, w)} $$ where the partition function $$ Z(x, w)=\sum_{y^{\prime}} \exp [\sum_{j=1}^J w_{j} F_{j}\left(x, y^{\prime}\right)] $$ Note that $\sum_{y^{\prime}}$ sums over all possible labels $y^{\prime}$. Therefore, given $x$, the label predicted by the model is $$ \hat{y}=\underset{y}{\operatorname{argmax}} p(y | x ; w)=\underset{y}{\operatorname{argmax}} \sum_{j=1}^J w_{j} F_{j}(x, y) $$...
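
A minimal NumPy sketch of this prediction rule; the label set, feature function $F$, and weights below are made-up assumptions for illustration (note that $Z(x, w)$ cancels in the argmax):

```python
import numpy as np

def log_linear_predict(x, labels, w, F):
    """argmax_y sum_j w_j F_j(x, y), with p(y | x; w) computed via the partition function."""
    scores = {y: w @ F(x, y) for y in labels}               # sum_j w_j F_j(x, y)
    Z = sum(np.exp(s) for s in scores.values())              # partition function Z(x, w)
    probs = {y: np.exp(s) / Z for y, s in scores.items()}    # p(y | x; w)
    return max(probs, key=probs.get), probs

# toy example: x is a word, F returns J = 2 hand-made indicator features (assumptions)
F = lambda x, y: np.array([float(x.endswith("ing") and y == "VERB"),
                           float(x[0].isupper() and y == "NOUN")])
w = np.array([1.5, 2.0])
print(log_linear_predict("running", ["NOUN", "VERB"], w, F))
```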

May 19, 2020 · 8 min

Gaussian mixture model (GMM), k-means

Gaussian mixture model (GMM) A Gaussian mixture model is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. Interpretation from geometry: $p(x)$ is a weighted sum of multiple Gaussian distributions. $$p(x)=\sum_{k=1}^{K} \alpha_{k} \cdot \mathcal{N}\left(x | \mu_{k}, \Sigma_{k}\right) $$ Interpretation from the mixture model setup: $K$, the total number of Gaussian components. $x$, a sample (observed variable)....
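
A small Python sketch of the weighted-sum density, using SciPy for the Gaussian pdf; the mixture weights, means, and variances below are arbitrary assumptions:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, alphas, mus, sigmas):
    """p(x) = sum_k alpha_k * N(x | mu_k, Sigma_k)."""
    return sum(a * multivariate_normal.pdf(x, mean=mu, cov=S)
               for a, mu, S in zip(alphas, mus, sigmas))

# toy 1-D mixture with K = 2 components (weights, means, variances are assumptions)
alphas = [0.3, 0.7]
mus = [0.0, 5.0]
sigmas = [1.0, 2.0]
print(gmm_density(1.0, alphas, mus, sigmas))
```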

May 16, 2020 · 6 min

Probabilistic Graphical Model (PGM)

Probabilistic Graphical Model (PGM) Definition: A probabilistic graphical model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. In general, a PGM obeys the following rules: $$ \begin{aligned} &\text {Sum Rule: } p\left(x_{1}\right)=\int p\left(x_{1}, x_{2}\right) d x_{2}\\ &\text {Product Rule: } p\left(x_{1}, x_{2}\right)=p\left(x_{1} | x_{2}\right) p\left(x_{2}\right)\\ &\text {Chain Rule: } p\left(x_{1}, x_{2}, \cdots, x_{p}\right)=\prod_{i=1}^{p} p\left(x_{i} | x_{i+1}, x_{i+2}, \ldots, x_{p}\right)\\ &\text {Bayes' Rule: } p\left(x_{1} | x_{2}\right)=\frac{p\left(x_{2} | x_{1}\right) p\left(x_{1}\right)}{p\left(x_{2}\right)} \end{aligned} $$...
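
These rules can be checked numerically on a small discrete joint distribution; the 2×2 table below is a made-up example, not from the post:

```python
import numpy as np

# made-up joint p(x1, x2) over two binary variables; rows index x1, columns index x2
p = np.array([[0.1, 0.3],
              [0.2, 0.4]])

# Sum rule: p(x1) = sum_{x2} p(x1, x2), and similarly for p(x2)
p_x1 = p.sum(axis=1)
p_x2 = p.sum(axis=0)

# Product rule: p(x1, x2) = p(x1 | x2) p(x2)
p_x1_given_x2 = p / p_x2                    # each column is p(x1 | x2)
assert np.allclose(p, p_x1_given_x2 * p_x2)

# Bayes' rule: p(x1 | x2) = p(x2 | x1) p(x1) / p(x2)
p_x2_given_x1 = p / p_x1[:, None]           # each row is p(x2 | x1)
assert np.allclose(p_x1_given_x2, p_x2_given_x1 * p_x1[:, None] / p_x2)

print(p_x1, p_x2)
```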

May 11, 2020 · 8 min