Can Too Much BERT Be Bad for You?
BERT and GPT-2: we all love language models… I mean, who doesn’t? Language models like BERT and GPT-2 (and GPT-3) have had an enormous impact on the entire NLP field. Most of the models that obtained groundbreaking results on the famous GLUE benchmark are based on BERT. I, too, have benefited from BERT, since I released a library for topic modeling and some HuggingFace …
Technical Guide
Online Experiments Tricks – Variance Reduction
Why do we need variance reduction? When we run online experiments or A/B tests, we need to ensure the test has high statistical power, so that we have a high probability of detecting the experimental effect if it exists. What factors affect power? Sample size, the sampling variance of the experiment metric, the significance level alpha, and the effect size. The …
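To make the four factors concrete, here is a minimal sketch of a power calculation. It is not taken from the linked post: it assumes a two-sided two-sample z-test with equal group sizes and a known, shared standard deviation, and the function name `two_sample_power` and its parameters are hypothetical, chosen for illustration.

```python
from statistics import NormalDist


def two_sample_power(effect, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    Assumptions (illustrative, not from the original post): equal group
    sizes, a known standard deviation `sigma` shared by both groups, and
    a normally distributed difference in means.
    """
    z = NormalDist()
    # Standard error of the difference between the two group means.
    se = sigma * (2 / n_per_group) ** 0.5
    # Critical value for a two-sided test at significance level alpha.
    z_crit = z.inv_cdf(1 - alpha / 2)
    # True effect expressed in standard-error units.
    shift = abs(effect) / se
    # Probability that the test statistic lands in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


baseline = two_sample_power(effect=0.5, sigma=1.0, n_per_group=64)
reduced_variance = two_sample_power(effect=0.5, sigma=0.5, n_per_group=64)
print(f"power with sigma=1.0: {baseline:.3f}")
print(f"power with sigma=0.5: {reduced_variance:.3f}")
```

With the sample size, alpha, and effect size held fixed, lowering `sigma` is the only change between the two calls, and the power goes up. That is the motivation for variance reduction: it buys power without collecting more samples.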
How To Visualize Databases As Network Graphs In Python
At work I recently faced the challenge of having to analyze the data model of an SQL database consisting of more than 500 tables with thousands of relations. At this scale, the built-in visualization function of phpMyAdmin is insufficient for getting a deep understanding of the structure. What I needed was a tool in which I could apply various filters (e.g., table and …
A Guide To Knowledge Graphs
Table of Contents
Introduction
- What is a Knowledge Graph (KG)?
- Why KG?
- How to use KG?
KG in practice
- Open source KGs
- Creating custom KG
- KG ontology
- Hosting KG (database)
- Query facts from KG

Introduction
In this section, we will introduce KG by asking some simple but intuitive questions about KG. In fact, we will cover the what, why, and how of the knowledge …
Practical Guide To Ensemble Learning
Ensemble learning is a machine learning technique that combines multiple models into a single group model, known as an ensemble model. The ensemble model aims to perform better than each model alone or, failing that, at least as well as the best individual model in the group. In this article, you will learn popular ensemble …
Hands-on Survival Analysis With Python
Survival analysis is a popular statistical method for investigating the expected duration of time until an event of interest occurs. It is known in medicine as patients' survival time analysis, in engineering as reliability analysis or time-to-failure analysis, and in economics as duration analysis. Besides these disciplines, survival analysis can also be used by HR …
The Creative Side Of Vision Transformers
What's creativity? The most widely accepted definition is the following: “Creativity is the capability of creating novel things.” It is considered one of the most important and irreplaceable traits of humankind. But if this is such a special characteristic, it would be impossible for a neural network to imitate it, wouldn't it? Well, not exactly. Today we are facing some …
Building A Face Recognition System Using Scikit Learn In Python
What’s face recognition? Face recognition is the task of comparing an unknown individual’s face to images in a database of stored records. The mapping could be one-to-one or one-to-many, depending on whether we are running face verification or face identification. In this tutorial, we are interested in building a facial identification system that will verify if …
Data Scientist’s Guide To Efficient Coding In Python
In this article, I want to share a few tips for writing cleaner code that I have absorbed over the last year, mainly from pair programming. Generally speaking, making them part of my everyday coding routine has helped me write high-quality Python scripts that are easy to maintain and scale over time. Ever wondered why senior developers’ code looks so …
How To Do Bayesian A/B Testing At Scale
If you’ve read my previous post, you already know why I think you should move to Bayesian A/B testing. In this post, I give a short overview of the statistical models behind Bayesian A/B tests and present the ways we implemented them at Wix.com, where we run A/B tests at a massive scale. I have included some practical examples in Python throughout this post. You can easily …