Data preprocessing is a very important phase in building a machine learning model.
There are 6 phases to take care of, including:
1. IMPORTING REQUIRED LIBRARIES
2. IMPORTING DATASET
3. TAKING CARE OF MISSING VALUES
4. ENCODING CATEGORICAL DATA
- encoding the independent variables
- encoding the dependent variable
IMPORTING REQUIRED LIBRARIES
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
IMPORTING THE DATASETS
To perform operations on a dataset, we first need to import it. For importing the dataset we use the pandas library.
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values  # independent variables (every column except the last)
y = dataset.iloc[:, -1].values  # dependent variable (the last column)
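To see what the split into X and y produces, here is a small sketch that reads a CSV from an in-memory string instead of a file. The column names and rows are made up for illustration; the real contents of Data.csv are not shown in this post.

```python
import io

import pandas as pd

# Hypothetical stand-in for Data.csv (columns and values are assumptions).
csv_text = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
"""

# read_csv accepts a file path or any file-like object.
dataset = pd.read_csv(io.StringIO(csv_text))

# Every column except the last becomes the feature matrix.
X = dataset.iloc[:, :-1].values
# The last column becomes the target vector.
y = dataset.iloc[:, -1].values

print(X.shape)
print(y)
```

With three rows and four columns, X holds the first three columns of each row and y holds the Purchased column.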
TAKING CARE OF MISSING VALUES
We need to take care of missing data in the dataset; if we ignore it, it may lead to bad predictions.
There are many ways to handle missing values. One of them is to replace the null values with the mean, median, or mode of the column.
https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer
code:
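from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(X[:, 1:3])  # learn the column means from the numeric columns (indices 1 and 2)
X[:, 1:3] = imputer.transform(X[:, 1:3])  # replace each NaN with the mean of its column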
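Here is a small self-contained sketch of the same idea, comparing the mean and median strategies on a tiny array. The numbers are invented for illustration and the variable names (features, mean_filled, median_filled) are my own. For the mode, SimpleImputer also accepts strategy='most_frequent'.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up numeric columns (age, salary) with one missing value in each.
features = np.array([
    [44.0, 72000.0],
    [np.nan, 48000.0],
    [30.0, np.nan],
    [38.0, 60000.0],
])

# Replace each NaN with the mean of the non-missing values in its column.
mean_imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
mean_filled = mean_imputer.fit_transform(features)

# Same idea, but using the column median instead.
median_imputer = SimpleImputer(missing_values=np.nan, strategy="median")
median_filled = median_imputer.fit_transform(features)

print(mean_filled)
print(median_filled)
```

The median is usually the safer choice when a column has outliers, since a few extreme values can pull the mean a long way.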
ENCODING CATEGORICAL DATA
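As a rough sketch of what this step usually looks like, the example below assumes scikit-learn's OneHotEncoder for a categorical independent variable and LabelEncoder for the dependent variable. The post does not say which encoders it uses, and the sample values here are made up.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Made-up categorical feature column (one column, four rows).
countries = np.array([["France"], ["Spain"], ["Germany"], ["Spain"]])
# Made-up yes/no target column.
purchased = np.array(["No", "Yes", "No", "Yes"])

# Independent variable: one 0/1 column per category.
# Categories are sorted alphabetically: France, Germany, Spain.
onehot = OneHotEncoder()
countries_encoded = onehot.fit_transform(countries).toarray()

# Dependent variable: map each label to an integer (No -> 0, Yes -> 1).
label_encoder = LabelEncoder()
purchased_encoded = label_encoder.fit_transform(purchased)

print(countries_encoded)
print(purchased_encoded)
```

One-hot encoding is used for the feature so that the model does not read an order into the country names, while a plain integer label is enough for a two-class target.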