# Problem Statement

Given a person’s attributes (age, sex, BMI, smoking status, and so on), we have to predict their insurance cost.

# Data description

• age is a real number
• sex is a binary variable (male or female)
• bmi (body mass index) is a real number
• children is the number of children a person has
• smoker is a binary variable
• region is a categorical variable
• charges is the dependent variable (y)
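Since the features mix numeric, binary, and categorical variables, they usually have to be converted to numbers before fitting a regression model. The following is a minimal sketch of one way to do that, not the dataset's actual preprocessing: the record, the region labels in `REGIONS`, and the `encode` helper are all hypothetical and used only for illustration.

```python
# Hypothetical record; the values are made up for illustration.
record = {
    "age": 35, "sex": "female", "bmi": 27.5,
    "children": 2, "smoker": "no", "region": "southwest",
}

# Assumed region labels; the real dataset may use different ones.
REGIONS = ["northeast", "northwest", "southeast", "southwest"]


def encode(rec):
    """Turn one record into a list of numeric features."""
    features = [
        float(rec["age"]),
        1.0 if rec["sex"] == "male" else 0.0,    # binary: male = 1, female = 0
        float(rec["bmi"]),
        float(rec["children"]),
        1.0 if rec["smoker"] == "yes" else 0.0,  # binary: smoker = 1
    ]
    # One-hot encode the categorical region variable.
    features += [1.0 if rec["region"] == r else 0.0 for r in REGIONS]
    return features


x = encode(record)
print(x)
```

The binary variables become a single 0/1 column each, while region is spread over one column per label so that no artificial ordering is imposed on the regions.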

# Linear regression from scratch


# Problem Statement

• Given years of experience, we have to predict a person's salary. Since the dependent variable, salary, is continuous, this is a regression problem.
• We will use three methods to estimate the regression parameters: gradient descent (an optimization method), the statistical method (a closed-form formula), and the scikit-learn linear regression library.
• After that, we will compare the parameters learned by all three methods.
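The comparison described above can be sketched in plain Python for the first two methods. This is an illustrative sketch under stated assumptions, not the course's implementation: the data points, the function names, the learning rate, and the number of epochs are all made up. The third method would typically use scikit-learn's `LinearRegression`, which is left out here to keep the example dependency-free.

```python
# Made-up data: years of experience and salary (in thousands).
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [40.0, 45.0, 50.0, 55.0, 60.0]


def fit_closed_form(x, y):
    """Least-squares estimates from the statistical formula."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_gradient_descent(x, y, lr=0.05, epochs=5000):
    """Minimize mean squared error with batch gradient descent."""
    n = len(x)
    intercept, slope = 0.0, 0.0
    for _ in range(epochs):
        # Prediction error for every data point under the current parameters.
        errors = [intercept + slope * xi - yi for xi, yi in zip(x, y)]
        grad_intercept = (2 / n) * sum(errors)
        grad_slope = (2 / n) * sum(e * xi for e, xi in zip(errors, x))
        intercept -= lr * grad_intercept
        slope -= lr * grad_slope
    return intercept, slope


closed = fit_closed_form(years, salary)
descent = fit_gradient_descent(years, salary)
print("closed form:     ", closed)
print("gradient descent:", descent)
```

On well-behaved data like this, both methods should arrive at nearly the same intercept and slope. Gradient descent only approaches the closed-form answer, and it can diverge if the learning rate is too large for the scale of the inputs.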

# Measure of central tendency and Dispersion

Say you have a dataset in which one column is numeric and contains 1000 data points (rows). Going through every data point is hard and time-consuming. To overcome this, we use descriptive statistics, which describe the data and make the task much simpler. We also use visualizations such as histograms and box plots to understand the distribution of the data.

# What are summary statistics?

Rather than examining all 1000 rows, a summary statistic condenses the column into a single number that gives an idea of the whole dataset.
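As a rough illustration, the sketch below summarizes a column of 1000 values with a handful of numbers using Python's standard `statistics` module. The values are synthetic (drawn from a normal distribution with an arbitrary mean and spread) and stand in for a real numeric column.

```python
import random
import statistics

random.seed(0)  # fixed seed so the synthetic column is reproducible

# 1000 synthetic values standing in for a numeric column.
values = [random.gauss(50, 10) for _ in range(1000)]

summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "stdev": statistics.stdev(values),  # sample standard deviation
    "min": min(values),
    "max": max(values),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Five numbers replace 1000 rows: the mean and median indicate where the data is centered, while the standard deviation and the min/max range indicate how spread out it is.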

There are basically two types of summarizing techniques.