Graph Convolutional Neural Network - Spatial Convolution

Note This is the second post of the Graph Neural Networks (GNNs) series. Convolutional graph neural networks (ConvGNNs) generalize the operation of convolution from grid data to graph data. The main idea is to generate a node $v$’s representation by aggregating its own features $\mathbf{x}_{v}$ and its neighbors’ features $\mathbf{x}_{u}$, where $u \in N(v)$. Different from RecGNNs, ConvGNNs stack a fixed number of graph convolutional layers with different weights to extract high-level node representations....
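To make the aggregation idea concrete, here is a minimal NumPy sketch of one spatial graph-convolution step. It assumes mean aggregation over neighbors followed by a shared linear map and ReLU; the function name, toy graph, and dimensions are illustrative, not taken from the post.

```python
import numpy as np

def spatial_graph_conv(X, adj, W):
    # X   : (n, d_in)   node features, row v is x_v
    # adj : (n, n)      binary adjacency matrix (no self-loops)
    # W   : (d_in, d_out) shared learnable weights for this layer
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)  # guard isolated nodes
    neighbor_mean = adj @ X / deg                     # mean over N(v)
    h = (X + neighbor_mean) @ W                       # combine self + neighbors
    return np.maximum(h, 0.0)                         # ReLU non-linearity

# Toy graph: a path 0-1-2 with 2-dim node features
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
adj = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
W = np.random.randn(2, 4) * 0.1
H = spatial_graph_conv(X, adj, W)  # (3, 4) new node representations
```

Stacking k such layers, each with its own W as described above, lets a node's representation absorb information from its k-hop neighborhood.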

August 20, 2020 · 9 min

Introduction to Graph Neural Network (GNN)

Note This is the first post of the Graph Neural Networks (GNNs) series. Background and Intuition There is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependencies between objects. For example, in e-commerce, a graph-based learning system can exploit the interactions between users and products to make highly accurate recommendations. In chemistry, molecules are modeled as graphs....

August 17, 2020 · 9 min

Kaggle: Jigsaw Multilingual Toxic Comment Classification - top solutions

Before we start Two of my previous posts might be helpful in getting a general understanding of the top solutions of this competition. Please feel free to check them out. Knowledge Distillation clearly explained Common Multilingual Language Modeling methods (M-Bert, LASER, MultiFiT, XLM) Jigsaw Multilingual Toxic Comment Classification Use TPUs to identify toxic comments across multiple languages. Overview of the competition Jigsaw Multilingual Toxic Comment Classification is the 3rd annual competition organized by the Jigsaw team....

August 11, 2020 · 9 min

Multi-lingual: M-Bert, LASER, MultiFiT, XLM

Multilingual models are a type of machine learning model that can understand different languages. In this post, I’m going to discuss four common multi-lingual language models: Multilingual-Bert (M-Bert), Language-Agnostic SEntence Representations (LASER Embeddings), Efficient multi-lingual language model fine-tuning (MultiFiT), and Cross-lingual Language Model (XLM). Ways of tokenization Word-based tokenization Word-based tokenization works well for a morphologically poor language like English, but results in very large and sparse vocabularies for morphologically rich languages, such as Polish and Turkish....
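As a toy illustration of that vocabulary blow-up (not code from the post): every inflected Turkish form of "ev" ("house") becomes a new word type under word-based tokenization, while a handful of reusable subword units covers them all. The subword inventory here is hand-picked for the example; real systems learn it with BPE or WordPiece.

```python
turkish_forms = ["ev", "evler", "evlerim", "evlerimde"]  # "house" + suffixes

word_vocab = set(turkish_forms)
print(len(word_vocab))  # -> 4: each inflected form is its own word type

subwords = ["ev", "ler", "im", "de"]  # 4 reusable units cover all forms

def greedy_segment(word, units):
    # Longest-match-first segmentation, similar in spirit to WordPiece
    pieces, i = [], 0
    while i < len(word):
        piece = next(u for u in sorted(units, key=len, reverse=True)
                     if word.startswith(u, i))
        pieces.append(piece)
        i += len(piece)
    return pieces

for w in turkish_forms:
    print(w, "->", greedy_segment(w, subwords))
# evlerimde -> ['ev', 'ler', 'im', 'de']
```

The word-level vocabulary keeps growing with every new suffix combination, whereas the subword inventory stays fixed, which is exactly why subword schemes dominate multilingual modeling.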

August 8, 2020 · 15 min

Knowledge Distillation

Currently, especially in NLP, very large scale models are being trained, and a large portion of them can’t even fit on an average person’s hardware. We can instead train a small network that runs within the limited computational resources of a mobile device, but small models can’t extract the many complex features that are handy for generating predictions, unless you devise some elegant algorithm to do so. Plus, due to the law of diminishing returns, a great increase in model size maps to only a small increase in accuracy....
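The full post explains the method in depth; as a quick preview, here is a minimal PyTorch sketch of the classic soft-target distillation objective from Hinton et al. (2015). The temperature T and mixing weight alpha are illustrative defaults, not tuned values from the post.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft term: KL divergence between the temperature-softened teacher and
    # student distributions, scaled by T^2 to keep gradient magnitudes
    # comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard term: ordinary cross-entropy with the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage: batch of 8 examples, 10 classes
s = torch.randn(8, 10, requires_grad=True)  # student logits
t = torch.randn(8, 10)                      # frozen teacher logits
y = torch.randint(0, 10, (8,))
distillation_loss(s, t, y).backward()
```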

August 5, 2020 · 7 min

Kaggle: Tweet Sentiment Extraction - top solutions

Note This post is the second part of the overall summarization of the competition. The first half is here. Noteworthy ideas in the 1st place solution Idea First step: use transformers to extract token-level start and end probabilities. Second step: feed these probabilities to a character-level model. This step gave the team a huge improvement in the final score, since it handled the “noise” in the data properly. Last step:...
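To give a flavor of that token-to-character handoff, here is one way such probabilities might be broadcast onto characters as input features for a char-level model. This is a hypothetical sketch, not the team's actual code; the text, offsets, and probability values are made up.

```python
import numpy as np

def token_probs_to_char_features(text, token_offsets, start_probs, end_probs):
    # token_offsets: per-token (char_start, char_end) spans, e.g. from a
    # tokenizer's offset mapping. Returns (len(text), 2) per-char features.
    feats = np.zeros((len(text), 2), dtype=np.float32)
    for (s, e), ps, pe in zip(token_offsets, start_probs, end_probs):
        feats[s:e, 0] = ps  # every char in the token inherits its start prob
        feats[s:e, 1] = pe  # ... and its end prob
    return feats

text = "so happy today"
offsets = [(0, 2), (3, 8), (9, 14)]  # hypothetical token spans
feats = token_probs_to_char_features(
    text, offsets,
    start_probs=[0.1, 0.8, 0.1],
    end_probs=[0.05, 0.15, 0.8],
)
# A char-level CNN/RNN head would consume `feats` and re-predict the span
# boundaries at character granularity, correcting label "noise".
```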

July 2, 2020 · 14 min