5 Basic Statistics Concepts Data Scientists Need to Know
Today, we’re going to look at 5 basic statistics concepts that data scientists need to know and how to apply them most effectively. Curated by Benjamin Libor.

Statistical Features 
Statistical features are probably the most widely used statistics concept in data science. They are often the first technique you apply when exploring a dataset, and they include things like bias, variance, mean, median, percentiles, and many others. They are all fairly easy to understand and implement in code! Check out the graphic below for an illustration.

Tutorial: youtube.com
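
As a minimal sketch (not taken from the linked tutorial), Python's standard-library `statistics` module can compute several of the features named above. The helper name `describe` and the sample data are illustrative assumptions. Note that `statistics.variance` returns the sample variance (dividing by n - 1), while `statistics.pvariance` gives the population variance, and that `statistics.quantiles` uses the "exclusive" interpolation method by default, so its percentiles can differ slightly from other tools.

```python
import statistics


def describe(values):
    """Return a few common summary statistics for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance (n - 1 denominator)
        "stdev": statistics.stdev(values),  # sample standard deviation
        "q1": q1,  # 25th percentile
        "q3": q3,  # 75th percentile
        "iqr": q3 - q1,  # interquartile range
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
summary = describe(data)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The median and interquartile range are less sensitive to outliers than the mean and variance, which is why they are often reported side by side.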

Probability Distributions 
We can define probability as the chance that some event will occur. In data science this is commonly quantified in the range 0 to 1, where 0 means we are certain the event will not occur and 1 means we are certain it will occur. A probability distribution is then a function that gives the probabilities of all possible values (outcomes) of the experiment. Check out the graphic below for an illustration.

Tutorial: khanacademy.org
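
To make the definition concrete, here is a small sketch of one discrete distribution, the binomial: the number of successes in n independent trials, each succeeding with probability p. The choice of distribution, the parameters, and the function name `binomial_pmf` are illustrative assumptions. The point is that every probability lies between 0 and 1 and that the probabilities of all possible values together sum to 1.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5  # e.g. 10 flips of a fair coin
distribution = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

for k, prob in distribution.items():
    print(f"P(X = {k:2d}) = {prob:.4f}")

# A valid distribution: each probability is in [0, 1] and they sum to 1
total = sum(distribution.values())
print(f"sum = {total:.6f}")
```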

Dimensionality Reduction 
The term dimensionality reduction is quite intuitive: we have a dataset and we would like to reduce the number of dimensions it has. In data science, the dimensions are the feature variables, so the goal is to describe the data with fewer features while keeping as much of the useful information as possible. Check out the graphic below for an illustration.

Tutorial: arxiv.org
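
The entry does not name a specific method. Principal component analysis (PCA) is one common choice, so the sketch below uses it as an assumed example: it reduces 2-D points to a single coordinate by projecting them onto the direction of greatest variance, using only the standard library. The function `pca_2d_to_1d` and the sample points are hypothetical; for more than two features, a linear-algebra library would normally compute the eigenvectors instead of the closed-form 2x2 solution used here.

```python
import math


def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component.

    Returns (projected coordinates, unit direction, share of variance kept).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    if a + c == 0:
        raise ValueError("all points are identical")

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)

    # Matching eigenvector: (lam - c, b) solves (C - lam*I) v = 0 when b != 0
    if b != 0:
        vx, vy = lam - c, b
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm

    # Each 2-D point becomes one coordinate along the principal direction
    projected = [x * vx + y * vy for x, y in centered]
    explained = lam / (a + c)  # eigenvalues sum to the trace a + c
    return projected, (vx, vy), explained


points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0), (2.3, 2.7)]
projected, direction, explained = pca_2d_to_1d(points)
print("direction:", direction)
print(f"variance kept: {explained:.1%}")
print("1-D coordinates:", [round(v, 3) for v in projected])
```

The "variance kept" ratio is the usual way to judge how much information a reduced representation preserves.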

Over and Under Sampling 
Over- and undersampling are techniques used for classification problems. Sometimes a classification dataset is heavily imbalanced toward one class. For example, we might have 2000 examples for class 1 but only 200 for class 2. That will throw off many of the machine learning techniques we use to model the data and make predictions! Over- and undersampling counteract this: oversampling adds copies (or synthetic variants) of minority-class examples, while undersampling removes some majority-class examples, so that the classes end up more balanced. Check out the graphic below for an illustration.

Tutorial: youtube.com
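
A minimal sketch of the two simplest variants, random oversampling and random undersampling, using the 2000-vs-200 example from this entry. The helper names `random_oversample` and `random_undersample` are hypothetical, and each example is treated as an opaque item. In practice, resampling is normally applied only to the training split so that evaluation data keeps its real class balance.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate random examples of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        extra = rng.choices(members, k=target - count)  # sampling WITH replacement
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so every class matches the smallest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        members = [x for x, lab in zip(X, y) if lab == label]
        kept = rng.sample(members, target)  # sampling WITHOUT replacement
        X_out.extend(kept)
        y_out.extend([label] * target)
    return X_out, y_out


# Toy imbalanced data: 2000 examples of class 1, 200 of class 2
X = list(range(2200))
y = [1] * 2000 + [2] * 200

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print("original:    ", Counter(y))
print("oversampled: ", Counter(y_over))
print("undersampled:", Counter(y_under))
```

Oversampling keeps all the original data but can encourage overfitting to repeated minority examples; undersampling avoids duplicates but discards information from the majority class.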

Bayesian Statistics 
Fully understanding why we use Bayesian statistics requires us to first understand where frequentist statistics fails. Frequentist statistics is the type of statistics most people think of when they hear the word “probability”. It applies math to analyze the probability of some event occurring, where the only data we compute on is data we have already observed (for example, how often the event occurred in the past). Bayesian statistics, by contrast, also incorporates a prior belief and updates it with the observed data using Bayes’ theorem.

Tutorial: youtube.com
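
As an illustrative sketch (the coin example, the Beta(10, 10) prior, and the function names are assumptions, not from the source), compare a frequentist estimate of a coin's probability of heads with a Bayesian one. The frequentist version is the maximum-likelihood estimate from the observed flips alone. The Bayesian version uses a Beta prior, which is conjugate to the binomial likelihood, so the posterior after observing the flips is again a Beta distribution.

```python
def frequentist_estimate(heads, flips):
    """Maximum-likelihood estimate of P(heads): uses only the observed data."""
    return heads / flips


def bayesian_update(prior_a, prior_b, heads, flips):
    """Beta(a, b) prior + binomial data -> Beta(a + heads, b + tails) posterior."""
    return prior_a + heads, prior_b + (flips - heads)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


heads, flips = 3, 3  # three flips, all heads
print("frequentist estimate:", frequentist_estimate(heads, flips))

# Hypothetical prior belief: the coin is probably close to fair
post_a, post_b = bayesian_update(10, 10, heads, flips)
print(f"posterior: Beta({post_a}, {post_b}), mean = {beta_mean(post_a, post_b):.3f}")
```

After three heads in three flips, the frequentist estimate is 1.0, while the posterior mean is 13/23 (about 0.57), pulled toward the fair-coin prior. As more flips are observed, the data increasingly outweighs the prior.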
CURATOR
Benjamin Libor
Creator behind Spreadshare, interested in design and content creation of all sorts.
TAGS
STATS