Gender Classification Machine Learning Model using Sk-learn in Python with 97%+ accuracy and deployment

Overview

Gender-classification

Machine Learning

This is a ML model to classify Male and Females using some physical characterstics Data. Python Libraries like Pandas,Numpy and Sklearn are used In this.

Data set credits: Kaggle.com

1. Importing Libraries

import pandas as pd
import sklearn
import numpy as np
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.ensemble import VotingClassifier,RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

2. Loading Data and exploring data

data = pd.read_csv("L:\Gender classification\gender_classification_v7.csv")
data.head(20)
#checking for null values
data.isnull().sum()
data.describe()

Visualizing physical characters & diffrences using Graphs and plots

#visualising forehead length data
sns.lineplot(data['forehead_width_cm'],data['forehead_height_cm'], hue=data["gender"])

Graph

#visualising nose length data
sns.lineplot(data['nose_long'],data['nose_wide'], hue=data["gender"])

Graph

3. Encoding data and splitting data

twogender = {'Female':0, 'Male':1}
data['gender'] = data['gender'].map(twogender)

X = data.drop('gender', axis=1)
y = data['gender']

#splitting data for testing and traing process
from sklearn.model_selection import train_test_split, GridSearchCV
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3)
print(X_train.shape)
print(X_val.shape)
print(y_train.shape)
print(y_val.shape)

Now we will test diffrent Sklearn Models to find best accuracy

4. Importing All required prerequisites

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, plot_confusion_matrix
from sklearn.ensemble import RandomForestClassifier

5. Decision tree-classifier

dt = DecisionTreeClassifier(random_state=0)

dt.fit(X_train, y_train)
dt_pred = dt.predict(X_val)
dt_acc = accuracy_score(y_val, dt_pred)
print('Accuracy of Decision Tree is: {:.2f}%'.format(dt_acc*100))

6. RandomforestClassifier

rf = RandomForestClassifier(random_state=0)

rf.fit(X_train, y_train)
rf_pred = rf.predict(X_val)
rf_acc = accuracy_score(y_val, rf_pred)
print('Accuracy of Random Forest is: {:.2f}%'.format(rf_acc*100))

7. Logistic regression

lr = LogisticRegression(random_state=0)

lr.fit(X_train, y_train)
lr_pred = lr.predict(X_val)
lr_acc = accuracy_score(y_val, lr_pred)
print('Accuracy of Logistic Regression is: {:.2f}%'.format(lr_acc*100))

8. K-nearest neighbour

knn = KNeighborsClassifier()
params = {'n_neighbors':[2,3,4,5,6,7,8,9]}

model = GridSearchCV(knn, params, cv=5)
model.fit(X_train, y_train)
model.best_params_

kn = KNeighborsClassifier(n_neighbors=8)

kn.fit(X_train, y_train)
kn_pred = kn.predict(X_val)
kn_acc = accuracy_score(y_val, kn_pred)
print('Accuracy of KNeighbors is: {:.2f}%'.format(kn_acc*100))

RESULTS

1. Accuracy of Decision Tree is: 96.87%

It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules and each leaf node represents the outcome. It is a graphical representation for getting all the possible solutions to a problem/decision based on given conditions.

2. Accuracy of Random Forest is: 97.53%

Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset.

3. Accuracy of Logistic Regression is: 97.27%

Logistic regression is one of the most popular Machine Learning algorithms, which comes under the Supervised Learning technique. It is used for predicting the categorical dependent variable using a given set of independent variables

4. Accuracy of KNeighbors is: 97.20%

K-NN algorithm assumes the similarity between the new case/data and available cases and put the new case into the category that is most similar to the available categorie K-NN algorithm stores all the available data and classifies a new data point based on the similarity. This means when new data appears then it can be easily classified into a well suite category by using K- NN algorithm.

Deployment process(in-complete)

File index.html(interface for deployment of webapp)

HTML TEMPLATE

import pandas as pd
import sklearn
import numpy as np
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.ensemble import VotingClassifier,RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import confusion_matrix

data = pd.read_csv("L:\Gender classification\gender_classification_v7.csv")

#encoding data
twogender = {'Female':0, 'Male':1}
data['gender'] = data['gender'].map(twogender)

X = data.drop('gender', axis=1)
y = data['gender']

#splitting data for testing and traing process
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, plot_confusion_matrix


X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3)

from sklearn.ensemble import RandomForestClassifier
#randomforestClassifier 
rf = RandomForestClassifier(random_state=0)

rf.fit(X_train, y_train)
rf_pred = rf.predict(X_val)
rf_acc = accuracy_score(y_val, rf_pred)
print('Accuracy of Random Forest is: {:.2f}%'.format(rf_acc*100))

#pickeling model
import pickle
pickle.dump(rf,open("model.pkl","wb"))

model =pickle.load(open("model.pkl","rb"))

this creates a model.pkl file and stores model

Contribution(s)

Contributions are always welcome! You can contribute to this project in the following way:

  • Deployment of model
  • Accuracy improvement
  • Bug fixes

Author

  • Aryan Raj

ForTheBadge built-with-love by Aryan Raj

Owner
Aryan raj
Computer Science and Engineering , SRM Institute of Science and Technology, Kattankulathur, Chennai
Aryan raj
Improving Transferability of Representations via Augmentation-Aware Self-Supervision

Improving Transferability of Representations via Augmentation-Aware Self-Supervision Accepted to NeurIPS 2021 TL;DR: Learning augmentation-aware infor

hankook 38 Sep 16, 2022
This code implements constituency parse tree aggregation

README This code implements constituency parse tree aggregation. Folder details code: This folder contains the code that implements constituency parse

Adithya Kulkarni 0 Oct 11, 2021
Official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model.

This is the official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model.

peng gao 42 Nov 26, 2022
Source code of our TTH paper: Targeted Trojan-Horse Attacks on Language-based Image Retrieval.

Targeted Trojan-Horse Attacks on Language-based Image Retrieval Source code of our TTH paper: Targeted Trojan-Horse Attacks on Language-based Image Re

fine 7 Aug 23, 2022
Blender Add-on that sets a Material's Base Color to one of Pantone's Colors of the Year

Blender PCOY (Pantone Color of the Year) MCMC (Mid-Century Modern Colors) HG71 (House & Garden Colors 1971) Blender Add-ons That Assign a Custom Color

Don Schnitzius 15 Nov 20, 2022
Using modified BiSeNet for face parsing in PyTorch

face-parsing.PyTorch Contents Training Demo References Training Prepare training data: -- download CelebAMask-HQ dataset -- change file path in the pr

zll 1.6k Jan 08, 2023
The final project of "Applying AI to EHR Data" of "AI for Healthcare" nanodegree - Udacity.

Patient Selection for Diabetes Drug Testing Project Overview EHR data is becoming a key source of real-world evidence (RWE) for the pharmaceutical ind

Omar Laham 1 Jan 14, 2022
PiRapGenerator - Make anyone rap the digits of pi

PiRapGenerator Make anyone rap the digits of pi (sample files are of Ted Nivison

7 Oct 02, 2022
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

Pattern Pattern is a web mining module for Python. It has tools for: Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM par

Computational Linguistics Research Group 8.4k Jan 03, 2023
An end-to-end project on customer segmentation

End-to-end Customer Segmentation Project Note: This project is in progress. Tools Used in This Project Prefect: Orchestrate workflows hydra: Manage co

Ocelot Consulting 8 Oct 06, 2022
A framework to train language models to learn invariant representations.

Invariant Language Modeling Implementation of the training for invariant language models. Motivation Modern pretrained language models are critical co

6 Nov 16, 2022
SAS output to EXCEL converter for Cornell/MIT Language and acquisition lab

CORNELLSASLAB SAS output to EXCEL converter for Cornell/MIT Language and acquisition lab Instructions: This python code can be used to convert SAS out

2 Jan 26, 2022
The Wearables Development Toolkit - a development environment for activity recognition applications with sensor signals

Wearables Development Toolkit (WDK) The Wearables Development Toolkit (WDK) is a framework and set of tools to facilitate the iterative development of

Juan Haladjian 114 Nov 27, 2022
Enhancing Aspect-Based Sentiment Analysis with Supervised Contrastive Learning.

Enhancing Aspect-Based Sentiment Analysis with Supervised Contrastive Learning. Enhancing Aspect-Based Sentiment Analysis with Supervised Contrastive

<a href=[email protected](SZ)"> 7 Dec 16, 2021
PyTorch code of "SLAPS: Self-Supervision Improves Structure Learning for Graph Neural Networks"

SLAPS-GNN This repo contains the implementation of the model proposed in SLAPS: Self-Supervision Improves Structure Learning for Graph Neural Networks

60 Dec 22, 2022
Playable Video Generation

Playable Video Generation Playable Video Generation Willi Menapace, Stéphane Lathuilière, Sergey Tulyakov, Aliaksandr Siarohin, Elisa Ricci Paper: ArX

Willi Menapace 136 Dec 31, 2022
magiCARP: Contrastive Authoring+Reviewing Pretraining

magiCARP: Contrastive Authoring+Reviewing Pretraining Welcome to the magiCARP API, the test bed used by EleutherAI for performing text/text bi-encoder

EleutherAI 43 Dec 29, 2022
Official code for 'Pixel-wise Energy-biased Abstention Learning for Anomaly Segmentationon Complex Urban Driving Scenes'

PEBAL This repo contains the Pytorch implementation of our paper: Pixel-wise Energy-biased Abstention Learning for Anomaly Segmentation on Complex Urb

Yu Tian 117 Jan 03, 2023
Build tensorflow keras model pipelines in a single line of code. Created by Ram Seshadri. Collaborators welcome. Permission granted upon request.

deep_autoviml Build keras pipelines and models in a single line of code! Table of Contents Motivation How it works Technology Install Usage API Image

AutoViz and Auto_ViML 102 Dec 17, 2022
Implementation of Squeezenet in pytorch, pretrained models on Cifar 10 data to come

Pytorch Squeeznet Pytorch implementation of Squeezenet model as described in https://arxiv.org/abs/1602.07360 on cifar-10 Data. The definition of Sque

gaurav pathak 86 Oct 28, 2022