Sign in

Problem Statement

Given the person’s attribute: Age, Sex, BMI, Smoker etc. we have to predict insurance cost.

Data description

  • Age is a real number
  • Sex binary variable male and female
  • bmi (body mass index) is a real number
  • children is number of children a person has
  • smoker is a binary variable
  • region is class variable
  • charges is dependent variable (y)

Importing required libraries and loading dataset

importing libraries

For this article the prerequisite is: Andrew Ng’s linear regression lecture.

clickhere to check out the course.

Problem Statement:

  • Given the Years of experience we have to predict salary of the person. Since dependent variable salary is continuous, it is a regression problem.
  • We will use three methods i.e. Gradient descent (Optimization method), Statistical method (formula) and Scikit learn Linear regression library to estimate the regression parameter.
  • After that we will compare the parameters learned from all these three methods.

Importing libraries and Loading dataset

Importing libraries and Loading dataset

Let’s say you have dataset in which one column has numeric data type and there are 1000 data points (rows) in that column, It is hard and time consuming to go through each and every data point, hence to overcome this problem we use descriptive statistics which describes our data and makes our task much more simpler. We also use visualizations such as histogram and boxplot to understand the distribution of the data.

What is summary statistics?

Rather than understanding 1000 rows, summary statistics only has 1 number which can give the idea of whole data.

There are basically 2 types of summarizing techniques.

Mayank Dubey

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store