DATA PREPROCESSING

Data preprocessing is a very important phase in building a machine learning model.

There are six steps to take care of:

1.IMPORTING REQUIRED LIBRARIES

2.IMPORTING DATASET

3.TAKING CARE OF MISSING VALUES

4.ENCODING CATEGORICAL DATA

Encoding the independent variables
Encoding the dependent variable

5.SPLITTING THE DATA INTO THE TRAINING SET AND TEST SET

6.FEATURE SCALING

IMPORT THE LIBRARIES

There are three libraries that are important to import:

1.numpy (used for working with arrays)

2.pandas (used for data manipulation and analysis)

3.matplotlib (used to plot graphs)

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

IMPORTING THE DATASETS

To perform operations on the dataset, we first need to import it. For this we use the pandas library.

dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values #independent variable
y = dataset.iloc[:, -1].values #dependent variable
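To see what the two iloc slices do, here is a minimal sketch that reads a tiny CSV from an in-memory string instead of Data.csv. The column names and values are made up for illustration; the real Data.csv may look different.

```python
import io
import pandas as pd

# Hypothetical stand-in for Data.csv; column names and values are assumptions.
csv_text = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
"""

dataset = pd.read_csv(io.StringIO(csv_text))
X = dataset.iloc[:, :-1].values  # all rows, every column except the last
y = dataset.iloc[:, -1].values   # all rows, only the last column

print(X.shape)  # (3, 3)
print(y)        # ['No' 'Yes' 'No']
```

This split only works as intended when the dependent variable is the last column of the file.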

TAKING CARE OF MISSING VALUES

We need to take care of missing data in the dataset (if we ignore it, it may lead to bad predictions).

There are many ways to handle missing values. One of them is to replace the missing values with the mean, median, or mode of the column.

For this, we need to import a class from the scikit-learn library (scikit-learn is a popular machine learning library).

The class name is SimpleImputer.

We can go through the documentation of the SimpleImputer class:

https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer

code:

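from sklearn.impute import SimpleImputer
# replace each NaN with the mean of its column
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(X[:, 1:3])  # columns 1 and 2 are assumed to be the numeric ones
X[:, 1:3] = imputer.transform(X[:, 1:3])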
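The same idea can be tried on a small array without any file. This is only a sketch: the array below is made-up data, and it assumes that columns 1 and 2 hold the numeric values, as in the snippet above.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up feature matrix: one text column and two numeric columns with gaps.
X = np.array([
    ["France", 44.0, 72000.0],
    ["Spain", np.nan, 48000.0],
    ["Germany", 30.0, np.nan],
], dtype=object)

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
# fit_transform learns the column means and fills the gaps in one call
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])

print(X[1, 1])  # 37.0, the mean of 44 and 30
print(X[2, 2])  # 60000.0, the mean of 72000 and 48000
```

Changing strategy to 'median' or 'most_frequent' gives the median and mode variants mentioned earlier.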

ENCODING CATEGORICAL DATA

ENCODING INDEPENDENT VARIABLE

This is not only for independent variables; we can also use it for the dependent variable. Here I am showing it for an independent variable.
If a categorical variable has three categories to be encoded, we use one-hot encoding. The machine works only with numbers, and a single 0/1 value cannot tell three categories apart, so each category gets its own 0/1 column.
For this we use the OneHotEncoder class from scikit-learn.
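A minimal sketch of one-hot encoding is shown here. It assumes the categorical column is column 0 and uses made-up data; wrapping OneHotEncoder in a ColumnTransformer is one common way to encode a single column and keep the rest, not necessarily the only one. For the dependent variable, the sketch uses LabelEncoder, which is also an assumption about how you may want to encode it.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Made-up data: column 0 is categorical with three categories.
X = np.array([
    ["France", 44.0, 72000.0],
    ["Spain", 27.0, 48000.0],
    ["Germany", 30.0, 54000.0],
], dtype=object)
y = np.array(["No", "Yes", "No"])

# One-hot encode column 0 and pass the other columns through unchanged.
ct = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [0])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
X_encoded = np.array(ct.fit_transform(X))

# Encode the dependent variable as integers.
y_encoded = LabelEncoder().fit_transform(y)

print(X_encoded.shape)  # (3, 5): three 0/1 columns plus the two numeric ones
print(y_encoded)        # [0 1 0]
```

The new 0/1 columns follow the sorted category order (France, Germany, Spain), so the Spain row starts with 0, 0, 1.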
