Heart Disease Prediction Using Keras Deep Learning
![Heart disease prediction using Keras deep learning](https://www.pythian.com/hs-fs/hubfs/Imported_Blog_Media/Post-2.jpeg?width=675&height=450&name=Post-2.jpeg)
Exploratory Analysis
Before we start the detailed data analysis, let's begin with an exploratory analysis to understand how the data is distributed and to extract some preliminary insights. First things first, download the CSV file from the link provided above. If you are working in Google Colab, upload it to the notebook session:

```python
from google.colab import files

# Opens a file picker; returns a dict mapping each uploaded file name to its raw bytes
uploaded = files.upload()
```

Then import the dataset into a pandas DataFrame and print the first 20 records:
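```python
import io

import pandas as pd

# files.upload() returns raw bytes, so wrap them in a file-like object for read_csv
Data = pd.read_csv(io.BytesIO(uploaded['heart.csv']))
print(Data.head(20))
```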
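Because files.upload() hands back raw bytes rather than a file path, io.BytesIO is what lets pd.read_csv parse the upload. The sketch below mimics that step outside Colab with a hypothetical in-memory `uploaded` dict holding a made-up three-row CSV (not the real heart.csv data). Outside Colab you would normally just pass a file path to pd.read_csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for the dict returned by google.colab's files.upload():
# file name -> raw bytes. The rows below are made up, not real heart.csv records.
uploaded = {
    "heart.csv": b"age,sex,chol,target\n63,1,233,1\n37,1,250,1\n41,0,204,0\n"
}

# read_csv needs a path or a file-like object, so wrap the bytes in BytesIO
Data = pd.read_csv(io.BytesIO(uploaded["heart.csv"]))

print(Data.shape)      # (rows, columns)
print(Data.head(20))   # head(20) returns every row when there are fewer than 20
```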
![Dataset in Pandas dataframe](https://www.pythian.com/hs-fs/hubfs/Imported_Blog_Media/Post-22.png?width=716&height=400&name=Post-22.png)
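Next, let's check how the records are distributed between the two target classes:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Count the records in each target class (0 = no heart disease, 1 = heart disease)
f = sns.countplot(x='target', data=Data)
f.set_title("Heart disease distribution")
f.set_xticklabels(['No Heart disease', 'Heart Disease'])
plt.xlabel("");
```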
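The countplot is a visual form of a simple count. To read the exact numbers and the class proportions, value_counts does the same job; the sketch below uses a small made-up target column rather than the real data.

```python
import pandas as pd

# Made-up labels for illustration (0 = no heart disease, 1 = heart disease)
Data = pd.DataFrame({"target": [1, 1, 0, 1, 0, 1, 1, 0]})

counts = Data["target"].value_counts().sort_index()                # records per class
shares = Data["target"].value_counts(normalize=True).sort_index()  # fraction per class

print(counts)
print(shares)
```

The proportions also give a baseline for the test accuracy reported later: a model that always predicts the majority class already scores that class's share.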
![Heart disease distribution](https://www.pythian.com/hs-fs/hubfs/Imported_Blog_Media/Post-23.png?width=489&height=450&name=Post-23.png)
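Breaking the same counts down by gender:

```python
# Split each target class by sex; the legend assumes sex is encoded 0 = female, 1 = male
f = sns.countplot(x='target', data=Data, hue='sex')
plt.legend(['Female', 'Male'])
f.set_title("Heart disease by gender")
f.set_xticklabels(['No Heart disease', 'Heart Disease'])
plt.xlabel("");
```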
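pd.crosstab produces the counts behind the gender plot as a table, and with normalize="index" it gives the share of each sex in each target class. The sex and target values below are made up; as in the plot legend, 0 is assumed to mean female and 1 male.

```python
import pandas as pd

# Made-up records; sex is assumed encoded 0 = female, 1 = male (matching the plot legend)
Data = pd.DataFrame({
    "sex":    [1, 1, 0, 1, 0, 0, 1, 1],
    "target": [1, 0, 1, 1, 1, 0, 0, 1],
})

# Rows: sex, columns: target, cells: number of records in each combination
table = pd.crosstab(Data["sex"], Data["target"])
print(table)

# Row-normalized: fraction of each sex falling into each target class
rate = pd.crosstab(Data["sex"], Data["target"], normalize="index")
print(rate)
```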
![Heart disease by gender](https://www.pythian.com/hs-fs/hubfs/Imported_Blog_Media/Post-24.png?width=487&height=450&name=Post-24.png)
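To see how the attributes relate to one another, plot the Pearson correlation matrix of all 14 attributes as a heatmap:

```python
# Create the figure first so the size applies to this heatmap
plt.figure(figsize=(50, 50))

# Pairwise Pearson correlations between all attributes, annotated to 2 decimal places
heat_map = sns.heatmap(Data.corr(method='pearson'), annot=True, fmt='.2f', linewidths=2)
heat_map.set_xticklabels(heat_map.get_xticklabels(), rotation=45);
```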
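To turn the heatmap into a short ranked list, one option is to take the target column of the same correlation matrix and sort the other attributes by the absolute value of their correlation with it. The columns below are real heart.csv attribute names, but the values are a small made-up sample. Keep in mind that Pearson correlation only captures linear association.

```python
import pandas as pd

# Made-up values for a few heart.csv columns
Data = pd.DataFrame({
    "age":     [63, 37, 41, 56, 57, 57],
    "chol":    [233, 250, 204, 236, 354, 192],
    "thalach": [150, 187, 172, 178, 163, 148],
    "target":  [1, 1, 1, 1, 0, 0],
})

corr = Data.corr(method="pearson")

# Correlation of each attribute with the target, strongest (in absolute value) first
ranking = (
    corr["target"]
    .drop("target")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(ranking)
```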
![Heatmap between all 14 attributes](https://www.pythian.com/hs-fs/hubfs/Imported_Blog_Media/Post-25.png?width=500&height=500&name=Post-25.png)
Building Keras Binary Classifier
After exploring the data, it's time to build a Keras classifier to predict heart disease. We split the dataset into a training set and a testing set using scikit-learn's sklearn.model_selection.train_test_split() function, holding out 30 percent of the records for testing:

```python
from sklearn.model_selection import train_test_split

# InputScaled: the scaled feature matrix; Target: the label column
# (both come from an earlier preprocessing step not shown here)
Input_train, Input_test, Target_train, Target_test = train_test_split(
    InputScaled, Target, test_size=0.30, random_state=5)

print(Input_train.shape)
print(Input_test.shape)
print(Target_train.shape)
print(Target_test.shape)
```

Here is the size of each of the above sets, respectively:
```
(212, 13)
(91, 13)
(212, 1)
(91, 1)
```

We'll use the Keras Sequential model.
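InputScaled and Target are used above without being defined in this section. Assuming they come from standardizing the 13 feature columns and keeping target as a single-column frame, a minimal end-to-end sketch looks like this. StandardScaler is an assumption (the original scaling method isn't shown), the column names are hypothetical, and random data with the same 303 × 14 shape stands in for heart.csv. It also shows where the 212/91 split comes from: scikit-learn rounds the 30 percent test share up, so ceil(0.30 × 303) = 91 test records and 212 training records.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Random stand-in with the same shape as heart.csv: 303 rows, 13 features + target
feature_names = [f"f{i}" for i in range(13)]  # hypothetical column names
Data = pd.DataFrame(rng.normal(size=(303, 13)), columns=feature_names)
Data["target"] = rng.integers(0, 2, size=303)

# Assumed preprocessing: standardize each feature to zero mean and unit variance
Input = Data.drop(columns="target")
Target = Data[["target"]]              # double brackets keep a (303, 1) DataFrame
InputScaled = StandardScaler().fit_transform(Input)

Input_train, Input_test, Target_train, Target_test = train_test_split(
    InputScaled, Target, test_size=0.30, random_state=5)

print(Input_train.shape, Input_test.shape, Target_train.shape, Target_test.shape)
```

Fitting the scaler on all 303 rows before splitting lets test-set statistics leak into training; fitting it on Input_train only and then transforming Input_test avoids that. Passing stratify=Target to train_test_split would also keep the class ratio the same in both splits.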
```python
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
# Hidden layer 1: 30 neurons fed by the 13 input features
model.add(Dense(30, input_dim=13, activation='tanh'))
# Hidden layer 2: 20 neurons
model.add(Dense(20, activation='tanh'))
# Output layer: a single neuron; sigmoid squashes the output into (0, 1)
model.add(Dense(1, activation='sigmoid'))
```

First, we create a Sequential model. Then we add three fully connected layers, two hidden and one output, each defined with the Dense class. The first hidden layer has 30 neurons and receives 13 inputs (input_dim=13), one for each feature attribute; it uses the tanh activation function. The second hidden layer has 20 neurons, also with tanh activation. The output layer has a single neuron with the sigmoid activation function, which suits binary classification because it produces a value between 0 and 1 that can be read as the probability of heart disease. Let's compile and fit the model:
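Each Dense layer computes activation(x · W + b), where W is the layer's weight matrix and b its bias vector. The NumPy sketch below reproduces the forward pass of the same 13 → 30 → 20 → 1 architecture with random, untrained weights, purely to show the shapes and activations involved; the dense() helper is hypothetical, and Keras additionally handles weight initialization and training.

```python
import numpy as np

rng = np.random.default_rng(42)

def dense(x, n_in, n_out, activation):
    """Hypothetical fully connected layer: activation(x @ W + b), random untrained weights."""
    W = rng.normal(scale=0.1, size=(n_in, n_out))  # weight matrix
    b = np.zeros(n_out)                            # bias vector
    return activation(x @ W + b)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=(5, 13))      # a batch of 5 records with 13 features each

h1 = dense(x, 13, 30, np.tanh)    # hidden layer 1 -> shape (5, 30)
h2 = dense(h1, 30, 20, np.tanh)   # hidden layer 2 -> shape (5, 20)
p = dense(h2, 20, 1, sigmoid)     # output layer   -> shape (5, 1), values in (0, 1)

print(h1.shape, h2.shape, p.shape)
```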
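```python
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Train for 100 passes (epochs) over the training data, logging progress each epoch
model.fit(Input_train, Target_train, epochs=100, verbose=1)
```

The compile function takes three arguments: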
- The adam optimizer: an algorithm for first-order gradient-based optimization that adapts the step size for each weight during training.
- The binary_crossentropy loss function: the logarithmic loss (log loss), which Keras calls binary_crossentropy for binary classification problems.
- The accuracy metric: the fraction of correctly classified records, used to evaluate the model's performance during training and testing.
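For reference, binary cross-entropy averages −[y·log(p) + (1 − y)·log(1 − p)] over the records, where y is the true label and p the predicted probability; binary accuracy counts a prediction as correct when p, thresholded at 0.5, matches y. A NumPy sketch of both follows; the clipping constant eps is an assumed small value that keeps log() away from zero, not a claim about Keras internals.

```python
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean log loss; predictions are clipped to [eps, 1 - eps] to avoid log(0)."""
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def binary_accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of records whose thresholded prediction matches the label."""
    return np.mean((p_pred > threshold).astype(int) == y_true)

y = np.array([1, 0, 1, 0])
p = np.array([0.9, 0.1, 0.4, 0.2])

print(binary_crossentropy(y, p))  # the 0.4 for a true 1 contributes most of the loss
print(binary_accuracy(y, p))      # 3 of 4 thresholded predictions match
```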
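```python
# Print the layer-by-layer architecture and parameter counts
model.summary()

# Evaluate on the held-out test set; score is [loss, accuracy]
score = model.evaluate(Input_test, Target_test, verbose=0)
print('Model Accuracy = ', score[1])
```

Here is the output I got: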
```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_4 (Dense)              (None, 30)                420
_________________________________________________________________
dense_5 (Dense)              (None, 20)                620
_________________________________________________________________
dense_6 (Dense)              (None, 1)                 21
=================================================================
Total params: 1,061
Trainable params: 1,061
Non-trainable params: 0
_________________________________________________________________
Model Accuracy = 0.9010988965139284
```

Evaluated on the test data, the model is about 90.1 percent accurate.
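The Param # column follows directly from the layer sizes: a Dense layer with n_in inputs and n_out neurons has n_in × n_out weights plus n_out biases. A quick plain-Python check, along with the test-set arithmetic behind the accuracy: since the test set has 91 records, the reported value corresponds to 82 correct predictions.

```python
# (inputs, neurons) for each Dense layer in the model above
layers = [(13, 30), (30, 20), (20, 1)]

# Weights (n_in * n_out) plus one bias per neuron
params = [n_in * n_out + n_out for n_in, n_out in layers]
print(params)       # [420, 620, 21]
print(sum(params))  # 1061

# Accuracy on the 91-record test set with 82 correct predictions
print(82 / 91)
```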
Summary
We have trained a Keras model to classify heart disease using the open source dataset. Although we achieved this result on a small dataset, you can apply the same concepts of data exploration, feature engineering and model building to bigger datasets. I have made the code available here in a GitHub repo; please feel free to download and experiment with it. As always, happy learning!

Note: This was originally posted on Medium.