Covid 19 - India State Wise analysis


 

Covid EDA on Indian Covid Dataset

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
import warnings
warnings.filterwarnings("ignore")

India reported 34,703 cases and 553 deaths in the last 24 hours. As cases and fatalities dip, the recovery rate has risen to 97.17%, and the country's active caseload has declined to 4,64,357.

[Image: image.png]

Don't be like them. Stay home, stay safe. Now let's start our EDA.

In [3]:
data= pd.read_csv("../input/latest-covid19-india-statewise-data/Latest Covid-19 India Status.csv")
In [4]:
data.head()
Out[4]:
        State/UTs  Total Cases  Active  Discharged  Deaths  Active Ratio  Discharge Ratio  Death Ratio
0     Maharashtra      6061404  119558     5819901  121945          1.97            96.02         2.01
1          Kerala      2924165  101343     2809587   13235          3.47            96.08         0.45
2       Karnataka      2843810   76528     2732242   35040          2.69            96.08         1.23
3      Tamil Nadu      2479696   38191     2408886   32619          1.54            97.14         1.32
4  Andhra Pradesh      1889513   38338     1838469   12706          2.03            97.30         0.67
In [5]:
data.isnull().sum()
Out[5]:
State/UTs          0
Total Cases        0
Active             0
Discharged         0
Deaths             0
Active Ratio       0
Discharge Ratio    0
Death Ratio        0
dtype: int64
In [6]:
data.describe()
Out[6]:
        Total Cases         Active    Discharged         Deaths  Active Ratio  Discharge Ratio  Death Ratio
count  3.600000e+01      36.000000  3.600000e+01      36.000000     36.000000        36.000000    36.000000
mean   8.447676e+05   14534.916667  8.191366e+05   11096.083333      2.902500        95.844444     1.252778
std    1.209487e+06   28312.184103  1.164892e+06   21092.496707      4.069466         3.977242     0.567247
min    7.467000e+03      31.000000  7.308000e+03       4.000000      0.070000        79.000000     0.040000
25%    5.863075e+04    1422.500000  5.664575e+04     776.500000      0.407500        96.065000     0.927500
50%    4.270470e+05    2779.000000  4.085465e+05    4431.000000      1.425000        97.200000     1.315000
75%    9.629365e+05    7856.000000  9.502918e+05   12838.250000      3.252500        98.185000     1.660000
max    6.061404e+06  119558.000000  5.819901e+06  121945.000000     20.550000        99.540000     2.700000

Some stats gathered from the dataset
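
The notebook does not define the three ratio columns. The sketch below assumes each one is the corresponding count expressed as a percentage of Total Cases, and checks that assumption against the five rows printed by data.head(). The names `head` and `checks` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# The five rows shown by data.head() in this notebook.
head = pd.DataFrame({
    "State/UTs": ["Maharashtra", "Kerala", "Karnataka", "Tamil Nadu", "Andhra Pradesh"],
    "Total Cases": [6061404, 2924165, 2843810, 2479696, 1889513],
    "Active": [119558, 101343, 76528, 38191, 38338],
    "Discharged": [5819901, 2809587, 2732242, 2408886, 1838469],
    "Deaths": [121945, 13235, 35040, 32619, 12706],
    "Active Ratio": [1.97, 3.47, 2.69, 1.54, 2.03],
    "Discharge Ratio": [96.02, 96.08, 96.08, 97.14, 97.30],
    "Death Ratio": [2.01, 0.45, 1.23, 1.32, 0.67],
})

# Assumption: ratio = 100 * count / Total Cases, rounded to two decimals.
checks = {}
for count_col, ratio_col in [("Active", "Active Ratio"),
                             ("Discharged", "Discharge Ratio"),
                             ("Deaths", "Death Ratio")]:
    recomputed = 100 * head[count_col] / head["Total Cases"]
    # Allow for the two-decimal rounding of the published ratios.
    close = np.isclose(recomputed, head[ratio_col], rtol=0, atol=0.0051)
    checks[ratio_col] = bool(close.all())

print(checks)
```

If the assumption holds for the remaining rows as well, the ratio columns carry no information beyond the four count columns.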

In [7]:
plt.figure(figsize=(6,6))
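# Correlate numeric columns only; 'State/UTs' is text and makes corr() fail on recent pandas.
sns.heatmap(data.select_dtypes('number').corr(),annot=True,cmap='summer')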
plt.title("Heatmap of the dataset")
Out[7]:
Text(0.5, 1.0, 'Heatmap of the dataset')

The heatmap shows high correlation between Active and Discharged, between Active and Deaths, and between Total Cases and both Active and Deaths.
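
To read the strongest relationships off a correlation matrix without scanning the heatmap by eye, the pairs can be ranked by absolute correlation. This is a minimal sketch: `strongest_pairs` is a hypothetical helper, and it is run here on only the five rows shown by data.head(), so the numbers it prints will differ from the full 36-row heatmap.

```python
import pandas as pd

def strongest_pairs(df, top=3):
    """Return the `top` column pairs with the largest absolute Pearson correlation."""
    corr = df.select_dtypes("number").corr()
    cols = list(corr.columns)
    # Visit each unordered pair once (upper triangle, no diagonal).
    pairs = {
        (a, b): corr.loc[a, b]
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
    }
    return pd.Series(pairs).sort_values(key=abs, ascending=False).head(top)

# Count columns from the five rows shown by data.head().
head = pd.DataFrame({
    "Total Cases": [6061404, 2924165, 2843810, 2479696, 1889513],
    "Active": [119558, 101343, 76528, 38191, 38338],
    "Discharged": [5819901, 2809587, 2732242, 2408886, 1838469],
    "Deaths": [121945, 13235, 35040, 32619, 12706],
})

result = strongest_pairs(head)
print(result)
```

On the full dataset the same call would be `strongest_pairs(data)`.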

In [8]:
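# distplot is deprecated since seaborn 0.11; histplot with a KDE overlay is the closest replacement.
sns.histplot(data['Active'],kde=True,stat='density',color='blue')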
plt.title("Active cases in India",fontsize=15)
Out[8]:
Text(0.5, 1.0, 'Active cases in India')
In [9]:
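# distplot is deprecated since seaborn 0.11; histplot with a KDE overlay is the closest replacement.
sns.histplot(data['Deaths'],kde=True,stat='density',color='red')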
plt.title("Death cases in India",fontsize=15)
Out[9]:
Text(0.5, 1.0, 'Death cases in India')
In [10]:
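# distplot is deprecated since seaborn 0.11; histplot with a KDE overlay is the closest replacement.
sns.histplot(data['Discharged'],kde=True,stat='density',color='yellow')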
plt.title("Discharged cases in India",fontsize=15)
Out[10]:
Text(0.5, 1.0, 'Discharged cases in India')

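The Active, Deaths and Discharged counts are right-skewed: most states have relatively low counts, and a few states have very high ones.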
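
One quick way to see the skew without a plot is to compare the mean with the median: in a right-skewed distribution a few large values pull the mean well above the median. The sketch below uses the mean and 50% values printed by data.describe() earlier; the dictionary layout and names are only for illustration.

```python
# Mean and median (50%) values copied from the data.describe() output.
summary = {
    "Active": {"mean": 14534.916667, "median": 2779.0},
    "Deaths": {"mean": 11096.083333, "median": 4431.0},
    "Discharged": {"mean": 8.191366e05, "median": 4.085465e05},
}

# A mean-to-median ratio well above 1 points to a long right tail.
ratios = {col: s["mean"] / s["median"] for col, s in summary.items()}

for col, ratio in ratios.items():
    print(f"{col}: mean is {ratio:.1f}x the median")

# On the full DataFrame, pandas can report sample skewness directly:
# data[["Active", "Deaths", "Discharged"]].skew()
```

All three ratios are at least 2, which agrees with the long right tails in the distribution plots.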

In [11]:
plt.figure(figsize=(10,10))
sns.barplot(x='State/UTs',y='Total Cases',palette='CMRmap',data=data)
plt.title("The total cases as per State/UTs are",fontsize=15)
plt.xticks(rotation=90)
Out[11]:
(array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
        34, 35]),
 [Text(0, 0, 'Maharashtra'),
  Text(1, 0, 'Kerala'),
  Text(2, 0, 'Karnataka'),
  Text(3, 0, 'Tamil Nadu'),
  Text(4, 0, 'Andhra Pradesh'),
  Text(5, 0, 'Uttar Pradesh'),
  Text(6, 0, 'West Bengal'),
  Text(7, 0, 'Delhi'),
  Text(8, 0, 'Chhattisgarh'),
  Text(9, 0, 'Rajasthan'),
  Text(10, 0, 'Odisha'),
  Text(11, 0, 'Gujarat'),
  Text(12, 0, 'Madhya Pradesh'),
  Text(13, 0, 'Haryana'),
  Text(14, 0, 'Bihar'),
  Text(15, 0, 'Telengana'),
  Text(16, 0, 'Punjab'),
  Text(17, 0, 'Assam'),
  Text(18, 0, 'Jharkhand'),
  Text(19, 0, 'Uttarakhand'),
  Text(20, 0, 'Jammu and Kashmir'),
  Text(21, 0, 'Himachal Pradesh'),
  Text(22, 0, 'Goa'),
  Text(23, 0, 'Puducherry'),
  Text(24, 0, 'Manipur'),
  Text(25, 0, 'Tripura'),
  Text(26, 0, 'Chandigarh'),
  Text(27, 0, 'Meghalaya'),
  Text(28, 0, 'Arunachal Pradesh'),
  Text(29, 0, 'Nagaland'),
  Text(30, 0, 'Sikkim'),
  Text(31, 0, 'Mizoram'),
  Text(32, 0, 'Ladakh'),
  Text(33, 0, 'Dadra and Nagar Haveli and Daman and Diu'),
  Text(34, 0, 'Lakshadweep'),
  Text(35, 0, 'Andaman and Nicobar')])

Maharashtra has the highest number of total cases in India. Kerala and Karnataka rank second and third, with little difference between their totals (2,924,165 and 2,843,810).

In [12]:
plt.figure(figsize=(7,7))
labels = data.index
plt.pie(x='Active',data=data[:5],labels='State/UTs',startangle=90,autopct='%.1f%%')
plt.title("Active Cases in Top 5 states in India", fontsize = 24) 
plt.tight_layout() 
plt.show()

Among the top 5 states, 32% of the active cases are in Maharashtra alone, with 27.1% in Kerala and 20.5% in Karnataka.
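
The percentages on the pie chart are shares of the combined total of the five plotted states, not of the whole country, because autopct reports each wedge as a fraction of the values passed in. A minimal sketch that reproduces them from the five rows shown by data.head() (the names `top5` and `share` are illustrative):

```python
import pandas as pd

# Active cases for the first five rows of the dataset.
top5 = pd.Series(
    [119558, 101343, 76528, 38191, 38338],
    index=["Maharashtra", "Kerala", "Karnataka", "Tamil Nadu", "Andhra Pradesh"],
    name="Active",
)

# Share of each state within these five states, in percent.
share = (100 * top5 / top5.sum()).round(1)
print(share)
```

This gives 32.0, 27.1 and 20.5 for Maharashtra, Kerala and Karnataka, matching the chart labels.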

In [13]:
plt.figure(figsize=(7,7))
labels = data.index
plt.pie(x='Total Cases',data=data[:5],labels='State/UTs',startangle=90,autopct='%.1f%%')
plt.title("Total Cases in Top 5 states in India", fontsize = 24) 
plt.tight_layout() 
plt.show()

Maharashtra has the highest count, with 37.4% of the total cases among the top 5 states.

In [14]:
plt.figure(figsize=(7,7))
labels = data.index
plt.pie(x='Discharged',data=data[:5],labels='State/UTs',startangle=90,autopct='%.1f%%')
plt.title("People Discharged in Top 5 states in India", fontsize = 24) 
plt.tight_layout() 
plt.show()

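It's good to see that the most affected state also accounts for the largest share of discharged patients among the top 5. Note that this chart shows discharged counts, not the discharge ratio.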

In [15]:
plt.figure(figsize=(7,7))
labels = data.index
plt.pie(x='Deaths',data=data[:5],labels='State/UTs',startangle=90,autopct='%.1f%%')
plt.title("Deaths in Top 5 states in India", fontsize = 24) 
plt.tight_layout() 
plt.show()

Among the top 5 states, 56.6% of the deaths occurred in Maharashtra, followed by 16.3% in Karnataka.
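
Because the pie charts normalise over five states only, Maharashtra's share of the nationwide figures is smaller. A rough estimate can be worked out from the data.describe() output alone, since the national total of a column is its mean multiplied by the row count (36). This is a sketch using the rounded means printed above, so the results are approximate.

```python
# Values copied from the data.describe() and data.head() outputs.
n_rows = 36
mean_total_cases = 8.447676e05
mean_deaths = 11096.083333
maharashtra_cases = 6061404
maharashtra_deaths = 121945

# National total = mean * number of rows.
national_cases = n_rows * mean_total_cases
national_deaths = n_rows * mean_deaths

cases_share = 100 * maharashtra_cases / national_cases
deaths_share = 100 * maharashtra_deaths / national_deaths

print(f"Share of national cases:  {cases_share:.1f}%")
print(f"Share of national deaths: {deaths_share:.1f}%")
```

By this estimate Maharashtra holds roughly a fifth of all cases and about 30% of all deaths in the dataset, compared with 37.4% and 56.6% when only the top 5 states are considered.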

In [16]:
sns.scatterplot(x='Active Ratio',y='Death Ratio',data=data[:10],palette='Spectral',legend='brief',hue='State/UTs')
plt.title("Top 10 Active Ratio to Deaths Ratio in India",fontsize=15)
Out[16]:
Text(0.5, 1.0, 'Top 10 Active Ratio to Deaths Ratio in India')

Among the top 10 states, Maharashtra has the highest death ratio relative to its active ratio. Delhi stands second, and Kerala has the lowest.

In [17]:
# Note: sort_data is not used below; the plot takes the first 10 rows of data as loaded.
sort_data= data.sort_values(by='Active Ratio',ascending=False)
sns.barplot(x='State/UTs',y='Active Ratio',data=data[:10],palette='copper',hue='Death Ratio')
plt.xticks(rotation=90)
plt.title("Top 10 Active Ratio to Deaths Ratio in India",fontsize=15)
Out[17]:
Text(0.5, 1.0, 'Top 10 Active Ratio to Deaths Ratio in India')


In [18]:
sns.lineplot(y='Active Ratio',data=data[:10],x='State/UTs')
plt.xticks(rotation=90)
plt.title("Line Plot for Active Ratio",fontsize=15)
Out[18]:
Text(0.5, 1.0, 'Line Plot for Active Ratio')

Kerala has the highest Active Ratio among the top 10 states plotted here.
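
The plots in this section use data[:10], which takes the first ten rows as loaded rather than the ten largest values of the plotted column. The data.describe() output shows a maximum Active Ratio of 20.55, far above Kerala's 3.47, so the nationwide ranking differs. A minimal sketch of ranking by the column itself with nlargest, shown on the five rows from data.head() (on the full dataset it would be `data.nlargest(10, 'Active Ratio')`):

```python
import pandas as pd

# State names and Active Ratio from the five rows shown by data.head().
head = pd.DataFrame({
    "State/UTs": ["Maharashtra", "Kerala", "Karnataka", "Tamil Nadu", "Andhra Pradesh"],
    "Active Ratio": [1.97, 3.47, 2.69, 1.54, 2.03],
})

# Rank by the column of interest instead of relying on row order.
top_by_ratio = head.nlargest(3, "Active Ratio")
print(top_by_ratio)
```

Within these five rows the order is Kerala, Karnataka, Andhra Pradesh, which differs from the row order of the file.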

In [19]:
sns.barplot(x='State/UTs',y='Death Ratio',data=data[:10],hue='Death Ratio')
plt.xticks(rotation=90)
plt.title("Top 10 Death Ratio in India",fontsize=15)
Out[19]:
Text(0.5, 1.0, 'Top 10 Death Ratio in India')

The death ratios in Maharashtra and Delhi are the highest among the top 10 states.

In [20]:
sns.lineplot(y='Death Ratio',data=data[:10],x='State/UTs')
plt.xticks(rotation=90)
plt.title(" Lineplot for Death Ratio",fontsize=15)
Out[20]:
Text(0.5, 1.0, ' Lineplot for Death Ratio')
In [21]:
sns.scatterplot(x='Discharge Ratio',y='Active Ratio',data=data[:10],palette='twilight',hue='State/UTs')
plt.title("Discharge Ratio for top 10 states",fontsize=15)
Out[21]:
Text(0.5, 1.0, 'Discharge Ratio for top 10 states')

People in Kerala and Karnataka have been discharged the most from hospitals.

In [22]:
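# Plot 'Discharge Ratio' on the y-axis to match the title (the original plotted 'Death Ratio').
sns.barplot(x='State/UTs',y='Discharge Ratio',data=data[:10],hue='Discharge Ratio')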
plt.xticks(rotation=90)
plt.title("Top 10 Discharge Ratio in India",fontsize=15)
Out[22]:
Text(0.5, 1.0, 'Top 10 Discharge Ratio in India')
In [23]:
sns.lineplot(y='Discharge Ratio',data=data[:10],x='State/UTs')
plt.xticks(rotation=90)
plt.title(" Lineplot for Discharge Ratio",fontsize=15)
Out[23]:
Text(0.5, 1.0, ' Lineplot for Discharge Ratio')

Rajasthan has the highest Discharge Ratio among the top 10 states. It is good to see people returning home from hospitals.

In [24]:
data['Recovered']=data['Total Cases']-(data['Active']+data['Deaths'])
In [25]:
data.head()
Out[25]:
        State/UTs  Total Cases  Active  Discharged  Deaths  Active Ratio  Discharge Ratio  Death Ratio  Recovered
0     Maharashtra      6061404  119558     5819901  121945          1.97            96.02         2.01    5819901
1          Kerala      2924165  101343     2809587   13235          3.47            96.08         0.45    2809587
2       Karnataka      2843810   76528     2732242   35040          2.69            96.08         1.23    2732242
3      Tamil Nadu      2479696   38191     2408886   32619          1.54            97.14         1.32    2408886
4  Andhra Pradesh      1889513   38338     1838469   12706          2.03            97.30         0.67    1838469

Recovered is a custom column: Total Cases minus the sum of Active and Deaths. In the five rows shown above it is identical to the existing Discharged column.
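
Whether the derived column adds anything can be checked by comparing it with Discharged row by row. A minimal sketch on the five rows shown by data.head() (on the full dataset the same comparison would run on `data`):

```python
import pandas as pd

# Count columns from the five rows shown by data.head().
head = pd.DataFrame({
    "State/UTs": ["Maharashtra", "Kerala", "Karnataka", "Tamil Nadu", "Andhra Pradesh"],
    "Total Cases": [6061404, 2924165, 2843810, 2479696, 1889513],
    "Active": [119558, 101343, 76528, 38191, 38338],
    "Discharged": [5819901, 2809587, 2732242, 2408886, 1838469],
    "Deaths": [121945, 13235, 35040, 32619, 12706],
})

# Same formula as the notebook cell that creates 'Recovered'.
head["Recovered"] = head["Total Cases"] - (head["Active"] + head["Deaths"])

# True when the derived column duplicates 'Discharged' in every row.
identical = bool((head["Recovered"] == head["Discharged"]).all())
print("Recovered equals Discharged:", identical)
```

For these rows the two columns match exactly, so the Recovered plots below repeat the information already in Discharged.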

In [26]:
plt.figure(figsize=(10,10))
sns.barplot(x='State/UTs',y='Recovered',palette='CMRmap',data=data)
plt.title("The Recovered cases as per State/UTs are",fontsize=15)
plt.xticks(rotation=90)
In [27]:
sns.jointplot(x='Recovered',y='State/UTs',data=data)
plt.xticks(rotation=90)
Out[27]:
(array([0., 1., 2.]), [])
In [28]:
sns.barplot(x='State/UTs',y='Recovered',data=data[:10])
plt.xticks(rotation=90)
plt.title("Top 10 most affected States",fontsize=15)
Out[28]:
Text(0.5, 1.0, 'Top 10 most affected States')
In [29]:
plt.figure(figsize=(10,5))
sns.pointplot(x='State/UTs',y='Recovered',data=data[:10],color='Red')
plt.xticks(rotation=90)
plt.title("recovered in line Plot",fontsize=15)
Out[29]:
Text(0.5, 1.0, 'recovered in line Plot')
In [30]:
plt.figure(figsize=(7,7))
labels = data.index
plt.pie(x='Recovered',data=data[:10],labels='State/UTs',startangle=90,autopct='%.1f%%')
plt.title("Recovery in Top 10 states in India", fontsize = 24) 
plt.tight_layout() 
plt.show()

Summary

1. Maharashtra has been affected the most in India.
2. Kerala stands second, followed by Karnataka.
3. Rajasthan has the best discharge ratio among the top 10 states.
4. Discharge to Active Ratio is highest in Kerala, followed by Karnataka.
5. Among the top 10 states, the death ratio is highest in Maharashtra and Delhi.
6. Among the top 10 states, the Active Ratio is highest in Kerala and lowest in Delhi and Uttar Pradesh.
7. Recovery is also highest in Maharashtra; Kerala and Karnataka stand 2nd and 3rd respectively.
