BERT and RoBERTa

BERT Recap Overview BERT (Bidirectional Encoder Representations from Transformers) uses a “masked language model” to randomly mask some tokens from the input and predict the original vocabulary id of each masked token. BERT shows that “pre-trained representations reduce the need for many heavily-engineered task-specific architectures”. BERT Specifics There are two steps in the BERT framework: pre-training and fine-tuning. During pre-training, the model is trained on unlabeled data over different pre-training tasks....
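
As a quick illustration of the masked-LM objective mentioned above, here is a minimal Python sketch of how input token ids might be corrupted before prediction. The 15% rate and the 80/10/10 replacement scheme follow the original BERT paper; the [MASK] id and vocabulary size are illustrative assumptions, not taken from the post.

```python
import random

MASK_ID = 103          # [MASK] in the standard BERT vocab (assumption)
VOCAB_SIZE = 30522     # BERT-base vocabulary size (assumption)

def mask_tokens(token_ids, mask_prob=0.15):
    """Return (corrupted_ids, labels); labels are -100 for positions the loss ignores."""
    corrupted, labels = [], []
    for tok in token_ids:
        if random.random() < mask_prob:
            labels.append(tok)                 # model must predict the original id
            r = random.random()
            if r < 0.8:
                corrupted.append(MASK_ID)      # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(random.randrange(VOCAB_SIZE))  # 10%: random token
            else:
                corrupted.append(tok)          # 10%: keep the original token
        else:
            corrupted.append(tok)
            labels.append(-100)                # unmasked positions are not predicted
    return corrupted, labels
```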

September 18, 2020 · 3 min

Paper Review - What Does BERT Look At? An Analysis of BERT’s Attention

Authors: Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning

September 14, 2020 · 4 min

Kaggle: Jigsaw Multilingual Toxic Comment Classification - top solutions

Before we start Two of my previous posts might be helpful for getting a general understanding of the top solutions of this competition. Please feel free to check them out: Knowledge Distillation clearly explained and Common Multilingual Language Modeling methods (M-Bert, LASER, MultiFiT, XLM). Jigsaw Multilingual Toxic Comment Classification Use TPUs to identify toxic comments across multiple languages. Overview of the competition Jigsaw Multilingual Toxic Comment Classification is the 3rd annual competition organized by the Jigsaw team....

August 11, 2020 · 9 min

Multi-lingual: M-Bert, LASER, MultiFiT, XLM

Multilingual models are a type of machine learning model that can understand different languages. In this post, I’m going to discuss four common multi-lingual language models: Multilingual-Bert (M-Bert), Language-Agnostic SEntence Representations (LASER Embeddings), Efficient multi-lingual language model fine-tuning (MultiFiT), and Cross-lingual Language Model (XLM). Ways of tokenization Word-based tokenization Word-based tokenization works well for a morphologically poor language like English, but results in very large and sparse vocabularies for morphologically rich languages, such as Polish and Turkish....
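
To make the vocabulary-size point concrete, here is a toy Python sketch contrasting a word-level vocabulary with a subword one on a few Turkish inflections of “house”; the forms and the hand-written subword splits are illustrative assumptions, not output from a real tokenizer.

```python
corpus = ["ev", "evler", "evde", "evlerde", "evden"]   # inflected forms of 'house'

word_vocab = set(corpus)                               # one entry per surface form
subword_splits = {
    "ev": ["ev"], "evler": ["ev", "##ler"], "evde": ["ev", "##de"],
    "evlerde": ["ev", "##ler", "##de"], "evden": ["ev", "##den"],
}
subword_vocab = {piece for pieces in subword_splits.values() for piece in pieces}

print(len(word_vocab))     # 5 -- grows with every new inflection
print(len(subword_vocab))  # 4 -- 'ev' plus a few shared suffix pieces
```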

August 8, 2020 · 15 min

Kaggle: Tweet Sentiment Extraction - top solutions

Note This post is the second part of the overall summarization of the competition. The first half is here. Noteworthy ideas in the 1st place solution Idea First step: Use transformers to extract token-level start and end probabilities. Second step: Feed these probabilities to a character-level model. This step gave the team a huge improvement on the final score since it handled the “noise” in the data properly. Last step:...
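
A rough sketch of that first step, assuming a Hugging Face transformers span-extraction head; the model name and the example tweet are placeholders, the QA head here is randomly initialized rather than fine-tuned, and the character-level second stage is not shown.

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

MODEL = "roberta-base"   # placeholder; top teams used various pretrained encoders
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForQuestionAnswering.from_pretrained(MODEL)  # span head is untrained here

tweet, sentiment = "my boss is bullying me...", "negative"
inputs = tokenizer(sentiment, tweet, return_tensors="pt")     # sentiment acts as the "question"

with torch.no_grad():
    outputs = model(**inputs)

start_probs = torch.softmax(outputs.start_logits, dim=-1)  # per-token start probability
end_probs = torch.softmax(outputs.end_logits, dim=-1)      # per-token end probability
# These token-level probabilities would then be fed to a character-level model.
```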

July 2, 2020 · 14 min

Kaggle: Tweet Sentiment Extraction - common methods

Note This post is the first part of the overall summarization of the competition. The second half is here. Before we start I attended two NLP competitions in June, Tweet Sentiment Extraction and Jigsaw Multilingual Toxic Comment Classification, and I’m happy to be a Kaggle Expert from now on :) Tweet Sentiment Extraction Goal: The objective of this competition is to “Extract support phrases for sentiment labels”. More precisely, this competition asks Kagglers to construct a model that can figure out which word or phrase in a tweet best supports its labeled sentiment....
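
For readers unfamiliar with the task, a minimal Python sketch of its input/output shape, using a made-up tweet rather than real competition data:

```python
# Given the tweet text and its sentiment label, the model must return the
# substring of the tweet that supports that label.
example = {
    "text": "what a gorgeous sunny morning",
    "sentiment": "positive",
}
target = "gorgeous sunny morning"   # the supporting phrase to be extracted
```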

July 1, 2020 · 11 min