data_exploration
  |   Source

Exploring Data With Pandas

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
pd.set_option('max_columns', 50)
%matplotlib inline

Series

There are three ways to create a series.

In [3]:
# first method: Involves passing a list without index to pd.
a = pd.Series(['food', 8, 'july', 9.0, [2,3], 'data'])
print a
0      food
1         8
2      july
3         9
4    [2, 3]
5      data
dtype: object
In [5]:
# second method: Involves passing a list of data and a list on index
data = pd.Series(['food', 8, 'july', 9.0, [2,3], 'data'], 
                index=['a','b','c', 'd', 'e', 'f'])
print data
a      food
b         8
c      july
d         9
e    [2, 3]
f      data
dtype: object
In [8]:
# third method: Involves passing a dictionary
data = {'Lagos': 1000, 'Kaduna': 1300, 'Abuja': 900, 'Anambra': 1100,
     'Plateau': 450, 'Ebonyi': None}
states = pd.Series(data)
print states
Abuja       900.0
Anambra    1100.0
Ebonyi        NaN
Kaduna     1300.0
Lagos      1000.0
Plateau     450.0
dtype: float64

One can basically perform any operations on the series like a numpy array.

In [11]:
print states['Lagos']  # index with dictionary keys
1000.0
In [12]:
print states[states > 1000]  # boolean indexing
Anambra    1100.0
Kaduna     1300.0
dtype: float64
In [13]:
print states > 1000
Abuja      False
Anambra     True
Ebonyi     False
Kaduna      True
Lagos      False
Plateau    False
dtype: bool
In [14]:
print 'Abuja' in states  # check for membership
True
In [15]:
print states.notnull()  # check for availablility of values
Abuja       True
Anambra     True
Ebonyi     False
Kaduna      True
Lagos       True
Plateau     True
dtype: bool
In [18]:
print states[states.isnull()]   # expose null states
Ebonyi   NaN
dtype: float64

DataFrame

It's basically a table-like structure. It has rows and columns.

How are DataFrames birthed

Let's explore some of the ways we can read in data.

In [25]:
# we can create a new dataframe
data = {'year': [1986, 1990, 1994, 1998, 2002, 2006, 2011, 2014],
        'team': ['Argentina', 'Germany', 'Brazil', 'France', 'Brazil', 'Italy', 'Spain', 'Germany'],
        'goal_for': [3, 1, 3, 3, 2, 5, 1, 1],
        'goal_against': [2, 0, 2, 0, 0, 3, 0, 0]}
soccer = pd.DataFrame(data, columns=['year', 'team', 'goal_for', 'goal_against'])
soccer
Out[25]:
year team goal_for goal_against
0 1986 Argentina 3 2
1 1990 Germany 1 0
2 1994 Brazil 3 2
3 1998 France 3 0
4 2002 Brazil 2 0
5 2006 Italy 5 3
6 2011 Spain 1 0
7 2014 Germany 1 0
In [26]:
from_csv = pd.read_csv('Secondary.csv')
from_csv.head()
Out[26]:
Name of School LGA Ownership Type of School Email Phone Numbers Link
0 ALTERNATIVE HIGH SCHOOL FOR GIRLS, Agboju State owned Amuwo-Odofin Secondary altergirlssc@lagosschoolsonline.com 08037218349 =HYPERLINK("http://www.lagosschoolsonline.com/...
1 A & T COLLEGE, Tabon-Tabon Privately owned Agege Secondary 0803 385 4578, 0803 325 8098 =HYPERLINK("http://www.lagosschoolsonline.com/...
2 A P COLLEGE, Ejigbo Privately owned Oshodi-Isolo Secondary anyebe_peter@yahoo.com 0805 193 2373, 0806 683 7468 =HYPERLINK("http://www.lagosschoolsonline.com/...
3 A-U-D JUNIOR COMPREHENSIVE HIGH SCHOOL State owned Oshodi-Isolo Secondary okotaaudjnr@yahoo.com 0802 033 0476 =HYPERLINK("http://www.lagosschoolsonline.com/...
4 A-U-D JUNIOR GRAMMAR SCHOOL State owned Oshodi-Isolo Secondary 08023509792 =HYPERLINK("http://www.lagosschoolsonline.com/...

So let's write to excel and also read from excel. This might require installing xlrd, so let's do pip install xlrd

In [27]:
soccer.head()
Out[27]:
year team goal_for goal_against
0 1986 Argentina 3 2
1 1990 Germany 1 0
2 1994 Brazil 3 2
3 1998 France 3 0
4 2002 Brazil 2 0
In [28]:
soccer.to_excel('soccer.xlsx')
In [29]:
del soccer
In [30]:
soccer = pd.read_excel('soccer.xlsx')
In [31]:
soccer.head()
Out[31]:
year team goal_for goal_against
0 1986 Argentina 3 2
1 1990 Germany 1 0
2 1994 Brazil 3 2
3 1998 France 3 0
4 2002 Brazil 2 0

Reading From SQL database

In [43]:
import os
basedir = os.path.realpath('.')
from pandas.io import sql
import sqlite3
#conn = sqlite3.connect('sqlite:////' + os.path.join(basedir, 'data.sqlite'))
conn = sqlite3.connect('data.sqlite')
In [45]:
soccer.to_sql(name='test',con=conn)
In [49]:
query = "SELECT * FROM test"
result = pd.read_sql(query, con=conn)
result.head()
Out[49]:
index year team goal_for goal_against
0 0 1986 Argentina 3 2
1 1 1990 Germany 1 0
2 2 1994 Brazil 3 2
3 3 1998 France 3 0
4 4 2002 Brazil 2 0

Read data from url

In [50]:
url = 'https://raw.githubusercontent.com/kantologist/abuja_schools/master/data/international-schools.csv'
from_url = pd.read_csv(url)
from_url.head()
Out[50]:
Name of School School Description Type of School Address Phone Numbers School Website Link
0 Aduvie International School Aduvie International School offers educational... international-schools Plot 273, BO8, Cadastral Zone, Jahi District, ... 0808 939 4474 =HYPERLINK("http://www.aduvieinternationalscho... =HYPERLINK("http://www.finelib.com/listing/Adu...
1 British Canadian International Education BCIE is a reputable independent organisation t... international-schools 24B, Djibouti Crescent, 5th House after Zartec... 0817-149-5402, 0909-347-8930, 0806-678-8636 =HYPERLINK("http://www.bcie.co.uk/offices/nige... =HYPERLINK("http://www.finelib.com/listing/Bri...
2 Center for Teaching & Learning (CTL) CTL Academy is an international school offerin... international-schools 2, Oladunni Ayandipo Close, Katampe Extension,... 0818 194 6446, 0703 514 7277 =HYPERLINK("http://www.ctlacademy.org/","Visit... =HYPERLINK("http://www.finelib.com/listing/Cen...
3 Efab International School Efab International School provides educational... international-schools Along 2nd Avenue, Efab City Estate, Mbora, Abu... 0817 490 9071 =HYPERLINK("http://www.efabintlschool.com/","V... =HYPERLINK("http://www.finelib.com/listing/Efa...
4 FAMAKS International School FAMAKS International School provides education... international-schools 8, Adekunle Fajyi Close, Off Ajayi Crowther St... 0808 305 1079 =HYPERLINK("http://www.famakschools.com/","Vis... =HYPERLINK("http://www.finelib.com/listing/FAM...

So let's explore a sample data. Lagos Schools

In [52]:
schools = from_csv
schools.head()
Out[52]:
Name of School LGA Ownership Type of School Email Phone Numbers Link
0 ALTERNATIVE HIGH SCHOOL FOR GIRLS, Agboju State owned Amuwo-Odofin Secondary altergirlssc@lagosschoolsonline.com 08037218349 =HYPERLINK("http://www.lagosschoolsonline.com/...
1 A & T COLLEGE, Tabon-Tabon Privately owned Agege Secondary 0803 385 4578, 0803 325 8098 =HYPERLINK("http://www.lagosschoolsonline.com/...
2 A P COLLEGE, Ejigbo Privately owned Oshodi-Isolo Secondary anyebe_peter@yahoo.com 0805 193 2373, 0806 683 7468 =HYPERLINK("http://www.lagosschoolsonline.com/...
3 A-U-D JUNIOR COMPREHENSIVE HIGH SCHOOL State owned Oshodi-Isolo Secondary okotaaudjnr@yahoo.com 0802 033 0476 =HYPERLINK("http://www.lagosschoolsonline.com/...
4 A-U-D JUNIOR GRAMMAR SCHOOL State owned Oshodi-Isolo Secondary 08023509792 =HYPERLINK("http://www.lagosschoolsonline.com/...
In [53]:
schools.info()   # data inspection
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2098 entries, 0 to 2097
Data columns (total 7 columns):
Name of School    2098 non-null object
LGA               2098 non-null object
Ownership         2083 non-null object
Type of School    2098 non-null object
Email             2097 non-null object
Phone Numbers     2098 non-null object
Link              2098 non-null object
dtypes: object(7)
memory usage: 114.8+ KB
In [54]:
schools.dtypes
Out[54]:
Name of School    object
LGA               object
Ownership         object
Type of School    object
Email             object
Phone Numbers     object
Link              object
dtype: object
In [56]:
schools.describe()
Out[56]:
Name of School LGA Ownership Type of School Email Phone Numbers Link
count 2098 2098 2083 2098 2097 2098 2098
unique 1699 7 26 2 1351 1686 1708
top COMMUNITY JUNIOR SECONDARY SCHOOL Privately owned Alimosho Secondary =HYPERLINK("http://www.lagosschoolsonline.com/...
freq 4 1166 386 2097 418 7 2
In [57]:
schools.head()
Out[57]:
Name of School LGA Ownership Type of School Email Phone Numbers Link
0 ALTERNATIVE HIGH SCHOOL FOR GIRLS, Agboju State owned Amuwo-Odofin Secondary altergirlssc@lagosschoolsonline.com 08037218349 =HYPERLINK("http://www.lagosschoolsonline.com/...
1 A & T COLLEGE, Tabon-Tabon Privately owned Agege Secondary 0803 385 4578, 0803 325 8098 =HYPERLINK("http://www.lagosschoolsonline.com/...
2 A P COLLEGE, Ejigbo Privately owned Oshodi-Isolo Secondary anyebe_peter@yahoo.com 0805 193 2373, 0806 683 7468 =HYPERLINK("http://www.lagosschoolsonline.com/...
3 A-U-D JUNIOR COMPREHENSIVE HIGH SCHOOL State owned Oshodi-Isolo Secondary okotaaudjnr@yahoo.com 0802 033 0476 =HYPERLINK("http://www.lagosschoolsonline.com/...
4 A-U-D JUNIOR GRAMMAR SCHOOL State owned Oshodi-Isolo Secondary 08023509792 =HYPERLINK("http://www.lagosschoolsonline.com/...
In [58]:
schools.tail(3)
Out[58]:
Name of School LGA Ownership Type of School Email Phone Numbers Link
2095 DAMTOSE COLLEGE, Alakuko Privately owned Ifako/Ijaiye Secondary damtose-schools@yahoo.com 0802 949 2237, 0812 917 5576 =HYPERLINK("http://www.lagosschoolsonline.com/...
2096 DANMARTINS COLLEGE, Alagbado Privately owned Alimosho Secondary danmartins 43@yahoo.com, akestlabinternationa... 0802 307 8222, 0805 302 8289, 0803 070 0944 =HYPERLINK("http://www.lagosschoolsonline.com/...
2097 DANSOL HIGH SCHOOL, Ikeja Privately owned Ikeja Secondary dansolhighschool@yahoo.co.uk 0805 869 7186, 0802 932 7843, 0802 343 870... =HYPERLINK("http://www.lagosschoolsonline.com/...
In [59]:
schools[200:205]
Out[59]:
Name of School LGA Ownership Type of School Email Phone Numbers Link
200 AYANGBUREN SENIOR HIGH SCHOOL State owned Ikorodu Secondary gburensnrsch@lagosschoolsonline.com 0802-327-0643 =HYPERLINK("http://www.lagosschoolsonline.com/...
201 AYEDERE AJIBOLA JUNIOR HIGH SCHOOL, Ketu State owned Kosofe Secondary ayeajijunior@lagosschoolsonline.com 0803-074 3806 =HYPERLINK("http://www.lagosschoolsonline.com/...
202 AYEDERE AJIBOLA SENIOR HIGH SCHOOL, Ketu State owned Kosofe Secondary ayederesnr@lagosschoolsonline.com 0803 373 2921 =HYPERLINK("http://www.lagosschoolsonline.com/...
203 AYO-OLUSEUN COLLEGE, Alapere Privately owned Kosofe Secondary 0702 808 1787, 0802 339 9150 =HYPERLINK("http://www.lagosschoolsonline.com/...
204 AYOBAMI COLLEGE, Egan-Igando Privately owned Alimosho Secondary ayobami/pschool@yahoo.com 0802 303 5154 =HYPERLINK("http://www.lagosschoolsonline.com/...

oops! LGA and Ownership are mixed up. Let's fix that.

In [75]:
schools.columns
Out[75]:
Index([u'Name of School', u'LGA', u'Ownership', u'Type of School', u'Email',
       u'Phone Numbers', u'Link'],
      dtype='object')
In [83]:
schools.rename(columns= {'LGA':'Ownership', 'Ownership':'LGA'}, inplace=True )
schools.columns
Out[83]:
Index([u'Name of School', u'Ownership', u'LGA', u'Type of School', u'Email',
       u'Phone Numbers', u'Link'],
      dtype='object')
In [85]:
schools['Ownership'].head()
Out[85]:
0        State owned
1    Privately owned
2    Privately owned
3        State owned
4        State owned
Name: Ownership, dtype: object
In [86]:
schools[['LGA', 'Ownership']].head()
Out[86]:
LGA Ownership
0 Amuwo-Odofin State owned
1 Agege Privately owned
2 Oshodi-Isolo Privately owned
3 Oshodi-Isolo State owned
4 Oshodi-Isolo State owned
In [88]:
schools[(schools.Ownership == 'State owned')].head(3)
Out[88]:
Name of School Ownership LGA Type of School Email Phone Numbers Link
0 ALTERNATIVE HIGH SCHOOL FOR GIRLS, Agboju State owned Amuwo-Odofin Secondary altergirlssc@lagosschoolsonline.com 08037218349 =HYPERLINK("http://www.lagosschoolsonline.com/...
3 A-U-D JUNIOR COMPREHENSIVE HIGH SCHOOL State owned Oshodi-Isolo Secondary okotaaudjnr@yahoo.com 0802 033 0476 =HYPERLINK("http://www.lagosschoolsonline.com/...
4 A-U-D JUNIOR GRAMMAR SCHOOL State owned Oshodi-Isolo Secondary 08023509792 =HYPERLINK("http://www.lagosschoolsonline.com/...
In [90]:
schools[(schools.Ownership == 'State owned') & (schools.LGA == 'Oshodi-Isolo')].head(3)
Out[90]:
Name of School Ownership LGA Type of School Email Phone Numbers Link
3 A-U-D JUNIOR COMPREHENSIVE HIGH SCHOOL State owned Oshodi-Isolo Secondary okotaaudjnr@yahoo.com 0802 033 0476 =HYPERLINK("http://www.lagosschoolsonline.com/...
4 A-U-D JUNIOR GRAMMAR SCHOOL State owned Oshodi-Isolo Secondary 08023509792 =HYPERLINK("http://www.lagosschoolsonline.com/...
9 A.U.D. COMPREHENSIVE JUNIOR HIGH SCHOOL State owned Oshodi-Isolo Secondary audcomp@lagosschoolsonline.com 0802 033 0476 =HYPERLINK("http://www.lagosschoolsonline.com/...
In [91]:
schools.iloc[[0, 78, 300]]
Out[91]:
Name of School Ownership LGA Type of School Email Phone Numbers Link
0 ALTERNATIVE HIGH SCHOOL FOR GIRLS, Agboju State owned Amuwo-Odofin Secondary altergirlssc@lagosschoolsonline.com 08037218349 =HYPERLINK("http://www.lagosschoolsonline.com/...
78 AJUNMONI SENIOR SECONDARY SCHOOL, Mushin State owned Mushin Secondary ajumoni2014@lagosschoolsonline.com 0802-3166-414 =HYPERLINK("http://www.lagosschoolsonline.com/...
300 CARRINTON COLLEGE, Ajao Estate Privately owned Oshodi-Isolo Secondary archwood.school@yahoo.com 0803 305 5394, 0811 716 5698 =HYPERLINK("http://www.lagosschoolsonline.com/...
In [93]:
schools.groupby('LGA').count().head()
Out[93]:
Name of School Ownership Type of School Email Phone Numbers Link
LGA
Badagry 71 71 71 71 71 71
Ibeju-Lekki 34 34 34 34 34 34
Ikeja 71 71 71 71 71 71
Mushin 86 86 86 86 86 86
Shomolu 53 53 53 53 53 53
In [94]:
schools.groupby('LGA').size().tail()
Out[94]:
LGA
Mushin            6
Ojo             156
Oshodi-Isolo    157
Ownership         1
Surulere        145
dtype: int64

End For now

Reference

  1. Greg Reda's blog
Comments powered by Disqus