CBSE CS and IP

CBSE Class 11 & 12 Computer Science and Informatics Practices Python Materials, Video Lecture


head() and tail() functions of Series Object

Pandas Series provides two very useful methods for extracting the data from the top and bottom of the Series Object. These methods are head() and tail().

head() Method

head() method is used to get the elements from the top of the series. By default, it gives 5 elements.

Syntax:
<Series Object> . head(n = 5)

Example:
Consider the following Series s. All the operations below are performed on this Series.
import pandas as pd
s = pd.Series({'A':1,'B':2,'C':3,'D':4,'E':5,'F':6,'G':7,'H':8})
print(s)

Output:
A    1
B    2
C    3
D    4
E    5
F    6
G    7
H    8
dtype: int64


head() Function without argument

If we do not give any argument to the head() function, it returns the first 5 values from the top by default.
s.head()
Output:
A    1
B    2
C    3
D    4
E    5
dtype: int64


head() Function with Positive Argument

When a positive number n is provided, the head() function extracts the top n rows from the Series object. In the example given below, the argument is 7, so 7 rows from the top are extracted.
s.head(7)
Output:
A    1
B    2
C    3
D    4
E    5
F    6
G    7
dtype: int64


head() Function with Negative Argument

We can also provide a negative value inside the head() function. For a negative value -n, head() returns all the elements except the last n. The Series s has 8 elements, so head(-7) leaves out the last 7 and gives only the first one.
s.head(-7)

Output:
A    1
dtype: int64
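
The behaviour of head() for positive and negative arguments can be checked against positional slicing. The following is a small sketch, assuming the same Series s as above. The comparison with s.iloc[:n] is an illustration added here and is not part of the original example.

```python
import pandas as pd

s = pd.Series({'A':1,'B':2,'C':3,'D':4,'E':5,'F':6,'G':7,'H':8})

# head(n) selects the same elements as the positional slice s.iloc[:n]
first_three = s.head(3)         # A, B, C
all_but_last_two = s.head(-2)   # A to F (last 2 elements left out)

print(first_three.equals(s.iloc[:3]))         # True
print(all_but_last_two.equals(s.iloc[:-2]))   # True
print(list(s.head(-7).index))                 # ['A']
```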

tail() Method

tail() method gives the elements of series from the bottom.

Syntax:
<Series Object> . tail(n = 5)

Example:
Consider the following Series s. All the operations below are performed on this Series.
import pandas as pd
s = pd.Series({'A':1,'B':2,'C':3,'D':4,'E':5,'F':6,'G':7,'H':8})
print(s)

Output:
A    1
B    2
C    3
D    4
E    5
F    6
G    7
H    8
dtype: int64


tail() function without argument

If we do not provide any argument, the tail() function gives by default 5 values from the bottom of the Series object.
s.tail()
'''
Output:
D    4
E    5
F    6
G    7
H    8
dtype: int64
'''

tail() function Positive argument

When a positive number n is provided, the tail() function gives the bottom n elements of the Series object.
s.tail(7)
'''
Output:
B    2
C    3
D    4
E    5
F    6
G    7
H    8
dtype: int64
'''

tail() function Negative argument

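When a negative number -n is provided, tail() returns all the elements except the first n. The Series s has 8 elements, so tail(-7) leaves out the first 7 and gives only the last one: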

s.tail(-7)
'''
Output:
H    8
dtype: int64
'''
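
The tail() results can be checked against positional slicing in the same way. This is a minimal sketch, assuming the same Series s as above. The comparison with s.iloc is added here only for illustration.

```python
import pandas as pd

s = pd.Series({'A':1,'B':2,'C':3,'D':4,'E':5,'F':6,'G':7,'H':8})

last_three = s.tail(3)           # F, G, H
all_but_first_two = s.tail(-2)   # C to H (first 2 elements left out)

print(last_three.equals(s.iloc[-3:]))         # True
print(all_but_first_two.equals(s.iloc[2:]))   # True
print(list(s.tail(-7).index))                 # ['H']
```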



Pandas Series dtype Parameter

Python Pandas Series dtype

USE OF DTYPE IN SERIES

The dtype parameter is used to specify the data type of the Series elements. Since a single Series has a single data type, we can assign our own data type using the dtype parameter.

Series Method Prototype

In the previous posts we have seen the data parameter and the index parameter; in this post we will discuss the dtype parameter. By default, the dtype is inferred from the Series elements.

<Series Object> =  pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

dtype can be:
  • If not specified, this will be inferred from data.
  • Any NumPy data type

a)  dtype - inferred from data

When we do not provide the dtype at the time of Series creation, pandas infers it automatically from the data of the Series. Check the following examples:

(i) Creating Series with NumPy array

In the following example, we create the Series from a NumPy array. All the values are integers, so pandas takes the integer data type of the array as the data type of the Series. It is shown as "int32" here; the exact type depends on the platform and the NumPy version, and on many systems the same code gives "int64".

import numpy as np
import pandas as pd
arr = np.array([1,2,3,4])
print(arr)          
s = pd.Series(arr)
print(s)
print(type(s))

'''
Output:
-------
[1 2 3 4]

0    1
1    2
2    3
3    4
dtype: int32

<class 'pandas.core.series.Series'> 
'''

(ii) Using List

When we use a list for the creation of a Series, the index of the Series is the same as the index of the list. As shown in the example, the list myList has three values; these values can be of any type. Here the list contains both strings and an integer, hence pandas gives the dtype as "object".
import pandas as pd
myList = ['A','B',2]
s = pd.Series(myList)
print(s)
print(type(s))

'''
Output:
-------
0    A
1    B
2    2
dtype: object

<class 'pandas.core.series.Series'>
'''

(iii) Using Dictionary

When a dictionary is used for Series creation, the values of that dictionary are used as the data of the Series and the keys are used as the data labels (index) of the Series. In this case, all the Series elements are integers, hence the data type is "int64".
import pandas as pd
d = {'A':1, 'B':2, 'C':3}
print(d)
s = pd.Series(d)
print(s)
print(type(s))

'''
Output:
------
{'A': 1, 'B': 2, 'C': 3}

A    1
B    2
C    3
dtype: int64

<class 'pandas.core.series.Series'>
'''

b) Any NumPy Data Type

At the time of Series Creation, we can provide our own data type using dtype Parameter. Following is the list of NumPy Data Type.

Data Type     Description
bool_         Boolean (True or False) stored as a byte
int8          Byte (-128 to 127)
int16         Integer (-32768 to 32767)
int32         Integer (-2147483648 to 2147483647, about -2.15E+9 to 2.15E+9)
int64         Integer (about -9.22E+18 to 9.22E+18)
uint8         Unsigned integer (0 to 255)
uint16        Unsigned integer (0 to 65535)
uint32        Unsigned integer (0 to about 4.29E+9)
uint64        Unsigned integer (0 to about 1.84E+19)
float16       Half precision signed float
float32       Single precision signed float
float64       Double precision signed float
complex64     Complex number: two 32-bit floats (real and imaginary components)
complex128    Complex number: two 64-bit floats (real and imaginary components)
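
The ranges of the integer types listed above can be checked with NumPy itself. The sketch below uses np.iinfo and np.finfo; it is added here only as a way to verify the limits and is not part of the original post.

```python
import numpy as np

# np.iinfo reports the smallest and largest value of an integer type
for t in (np.int8, np.int16, np.int32, np.int64,
          np.uint8, np.uint16, np.uint32, np.uint64):
    info = np.iinfo(t)
    print(t.__name__, info.min, info.max)

# np.finfo gives the same kind of information for floating-point types
print(np.finfo(np.float32).max)
```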

import pandas as pd
import numpy as np
s = pd.Series(data = [1,2,3,4], dtype = np.int16)
print(s)

'''
#Output:
0    1
1    2
2    3
3    4
dtype: int16
'''

s = pd.Series(data = [1,2,3,4], dtype = np.float32)                
print(s)
'''
#Output:
0    1.0
1    2.0
2    3.0
3    4.0
dtype: float32
'''

In the above example, you can see that we have provided "np.int16" and "np.float32" respectively for the same data [1,2,3,4]. 
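
The data type can also be given as a string, and the data type of an existing Series can be changed afterwards. The following is a minimal sketch of both; the use of astype() is an addition made here for illustration.

```python
import numpy as np
import pandas as pd

# dtype given as a string instead of a NumPy type
s1 = pd.Series([1, 2, 3, 4], dtype='int16')
print(s1.dtype)    # int16

# astype() returns a new Series with the requested data type
s2 = s1.astype(np.float32)
print(s2.dtype)    # float32
print(s1.dtype)    # int16 (the original Series is unchanged)
```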

Pandas Series Index Parameter

PANDAS Series Method Index Parameter

Series Method Prototype:

<Series Object> = pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

  • In this post, we are going to discuss index parameter.

By default, Series() gives the indexes as 0, 1, 2 and so on. In this post, I am going to discuss how to provide user-defined indexes using the index parameter of Series(). It is used to provide the data labels for the data values. The number of indexes should be the same as the number of data values. The indexes can be provided in the following three ways:
  • The Index taken by Dictionary
  • Specifying Index Using List
  • Indexes with Scalar Data Value

a) The index taken by Dictionary

If we are providing the data in the form of a dictionary, then the keys of the dictionary will become the indexes and values of the dictionary will become data of the Series.
import pandas as pd
d = {'A':1, 'B':2, 'C':3}
print(d)
s = pd.Series(d)
print(s)
print(type(s))

'''
Output:
------
{'A': 1, 'B': 2, 'C': 3}

A    1
B    2
C    3
dtype: int64

<class 'pandas.core.series.Series'>
'''

b) Specifying Index Using List

We can provide indexes using a List, the number of elements in the list should be the same as the number of data values. As shown in the example below the index are X, Y and Z and the data elements are 1, 2 and 3.
import pandas as pd
l = [1,2,3]
s = pd.Series(data= l, index = ['X','Y','Z'])
print(s)
'''
Output
------
X    1
Y    2
Z    3
dtype: int64
'''

l = [1,2,3]
s = pd.Series(l, ['X','Y','Z'])
print(s)
'''
Output
------
X    1
Y    2
Z    3
dtype: int64
'''

Note: The length of the data and the index should be the same. Otherwise, pandas raises a ValueError.
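
The error for a length mismatch can be seen with a short sketch like the one below. The variable names are chosen only for this illustration, and the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

data = [1, 2, 3]
raised = False
try:
    # 3 data values but only 2 index labels
    s = pd.Series(data, index=['X', 'Y'])
except ValueError as e:
    raised = True
    print("ValueError:", e)
```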

c) Indexes with Scalar Value

If we are creating the Series with Scalar Value, then we can provide indexes in the form of a list. The total values in the Series will depend upon the number of elements in the index. For example, you can see the code below. In this code, the scalar value is "Hello" and there are three elements in Series s, because we have provided three indexes.    
import pandas as pd
s = pd.Series("Hello",[1,2,3])
print(s)

'''
#Output:
1    Hello
2    Hello
3    Hello
dtype: object
'''

Pandas Series - A Pandas Data Structure (How to create Pandas Series?)

Pandas Series in Python
Series is a one-dimensional data structure of Pandas used for data analysis. It can contain values of different types (in that case its dtype is "object"). A Series object has two main components:
  • An array of actual data
  • An associated array of indexes or data labels
Pandas Series
Fig-1 Series

Like list, there are indexes in Series, but in series, we can assign our own indexes (Data labels). The labels need not be unique but must be a hashable type. 

How to create pandas series?

To create a Series object, the data (for example a list) is passed to the Series() function. This function is available inside the pandas module. Hence, before creating a Series object we have to import the pandas library using:

import pandas

Following is the syntax for Series Creation:

<Series Object> = pandas.Series(data)
<Series Object> = pandas.Series(data=None, index=None, dtype=None, name=None, copy=False)

data: data parameter provides data for Series creation, it can be a Sequence or Scalar Value 
index: This is used to change the default index of Series
dtype: It is used to change the default datatype of Series 
name: To give a name to Series
copy: A Boolean value that tells whether to copy the input data; by default it is False

By default, the index of series will be integers starting from 0,1,2...etc. as shown above in Fig-1 Series.
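
The name parameter and the default index can be seen in a short sketch like the following. The name 'marks' and the values are hypothetical and chosen only for this illustration.

```python
import pandas as pd

# 'marks' is a hypothetical name given to the Series
s = pd.Series([87, 67, 89], name='marks')

print(s.name)          # marks
print(list(s.index))   # [0, 1, 2]
```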

Use of different types of Data

In Series Data Structure, we can use our own data for creating the series. This data can be of different types. 

<Series Object> =  pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

The data can be:
  • Without Data (Empty Series)
  • A scalar value
  • A python sequence (ex:- list, tuple, dictionary etc.)
  • A ndarray
Let us discuss each one by one with Pandas Series Examples.

a) Without Data (Empty Series)

If we do not provide any data, pandas creates an empty Series. In older versions of pandas the default data type of an empty Series was "float64"; from pandas 2.0 onwards it is "object". In the following example, no parameter is given to the Series() function, so an empty Series is created.
import pandas as pd
s = pd.Series()
print(s)         ## Series([], dtype: float64) in older pandas; dtype: object from pandas 2.0
print(type(s))   ## <class 'pandas.core.series.Series'>
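
Since the default data type of an empty Series depends on the pandas version, one option is to give the dtype explicitly. The sketch below assumes that "float64" is the type wanted; any other valid dtype could be used instead.

```python
import pandas as pd

# Giving dtype explicitly makes the result the same across pandas versions
s = pd.Series(dtype='float64')

print(len(s))     # 0
print(s.dtype)    # float64
print(s.empty)    # True
```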


b) A Scalar Value

A Pandas Series can be created with only a single parameter holding a scalar value. This will create a Series object with only a single value. In the following example, only the value 20 is provided, and pandas creates a Series object 's' with the value 20.
import pandas as pd
s = pd.Series(20)
print(s)
print(type(s))

'''
## Output
-----------
0    20
dtype: int64

<class 'pandas.core.series.Series'>
'''


c) A python sequence (ex:- list, tuple, dictionary etc.)

A Python sequence can be used for the creation of Series Object. Any Python Sequence like list, tuple or dictionary can be used for this purpose. Next, we will create a series using List and Dictionary.

Using List: When we use a list for the creation of a Series, the index of the Series is the same as the index of the list. As shown in the example, the list myList has three values; these values can be of any type.
import pandas as pd
myList = ['A','B',2]
s = pd.Series(myList)
print(s)
print(type(s))

'''
Output:
-------
0    A
1    B
2    2
dtype: object

<class 'pandas.core.series.Series'>
'''

Using Dictionary: When a dictionary is used for Series creation, the values of that dictionary are used as the data of the Series and the keys are used as the data labels (index) of the Series.
import pandas as pd
d = {'A':1, 'B':2, 'C':3}
print(d)
s = pd.Series(d)
print(s)
print(type(s))

'''
Output:
------
{'A': 1, 'B': 2, 'C': 3}

A    1
B    2
C    3
dtype: int64

<class 'pandas.core.series.Series'>
'''
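
A tuple can be used in the same way as a list. The following is a small sketch with hypothetical values, added because tuples are mentioned above but not shown.

```python
import pandas as pd

t = (10, 20, 30)       # a tuple of integers
s = pd.Series(t)

print(list(s.index))   # [0, 1, 2]
print(s.tolist())      # [10, 20, 30]
```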


d) A ndarray

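NumPy is a core library for numerical and scientific computation. We can use a one-dimensional NumPy array for Series creation. The index or data labels of the Series will be 0, 1, 2 and so on by default. The dtype shown in the output (int32 here) depends on the platform and the NumPy version; on many systems it is int64.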
import numpy as np
import pandas as pd
arr = np.array([1,2,3,4])
print(arr)          
s = pd.Series(arr)
print(s)
print(type(s))

'''
Output:
-------
[1 2 3 4]

0    1
1    2
2    3
3    4
dtype: int32

<class 'pandas.core.series.Series'> 
'''


Python Pandas - Introduction

Following are the key points about Pandas:
  1. Python Pandas is an open source Python library.
  2. The name Pandas is derived from "Panel Data".
  3. Pandas is a popular choice for Data Analysis and Data Science because it is very simple and easy to use.
  4. The main author of Pandas is Wes McKinney.
  5. Pandas offers high-performance, easy-to-use data structures and data analysis tools.

Data Analysis – Process of evaluating big datasets using analytical and statistical tools to discover useful information and conclusion to support business decision-making.

Pandas Installation

Pandas library can be installed in your python by opening a command prompt and running the following command:
pip install pandas
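
To check that the installation worked, pandas can be imported and its version printed. This is a minimal check; the version number printed depends on what pip installed.

```python
import pandas as pd

# If the import succeeds, pandas is installed; __version__ shows which release
print(pd.__version__)
```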

Why Pandas is used in Python

Pandas is one of the most popular libraries in the Scientific Python ecosystem for doing data analysis. Pandas is capable of many tasks, including:
  • It can read and write data in many file formats (CSV, Excel etc.) and work with many data types (integer, float, string etc.).
  • It can calculate in all ways data is organized, i.e. row and column-wise analysis.
  • It can easily select subsets of data from bulky data sets and even combine multiple datasets together.
  • It supports visualization by integrating with libraries such as matplotlib and seaborn.
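
Row-wise and column-wise analysis can be illustrated with a small sketch. The DataFrame below uses hypothetical marks of three students in two subjects, and sum() stands in for any calculation of this kind.

```python
import pandas as pd

# Hypothetical marks of three students in two subjects
df = pd.DataFrame({'Maths': [80, 90, 70], 'Science': [60, 75, 95]},
                  index=['A', 'B', 'C'])

col_totals = df.sum(axis=0)   # column-wise: one total per subject
row_totals = df.sum(axis=1)   # row-wise: one total per student

print(col_totals)
print(row_totals)
```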

Pandas Data Structure

Data Structure - A data structure is a particular way of organizing data in a computer so that it can be used effectively. For example, we can store a list of items using the list data structure.
  • There are four built-in data structures in Python - list, tuple, dictionary and set.
  • Pandas offers many data structures to handle a variety of data.
  • Out of these, the two basic data structures of Pandas are Series and DataFrame.

Series - A Series is a Pandas data structure that represents a one-dimensional array-like object containing an array of data and an associated array of data labels, called its index. 
Pandas Series

DataFrame - A DataFrame is a two dimensional labelled array-like, Pandas data structure that stores an ordered collection of columns that can store data of different types.
Pandas DataFrame


Note: Both Series and DataFrame are objects in python.
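
The two data structures can be created side by side as in the sketch below. The names and scores are sample values chosen for this illustration.

```python
import pandas as pd

# A Series: one-dimensional data with an index (data labels)
s = pd.Series([87, 67, 89], index=['A', 'B', 'C'])

# A DataFrame: two-dimensional, each column can have its own data type
df = pd.DataFrame({'Name': ['Sachin', 'Dhoni', 'Virat'],
                   'Score': [87, 67, 89]},
                  index=['A', 'B', 'C'])

print(isinstance(s, pd.Series))       # True
print(isinstance(df, pd.DataFrame))   # True
print(s.ndim, df.ndim)                # 1 2
```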







Remove or Replace any character from Python Pandas DataFrame Column

removing a character from dataframe column


If you are searching for the solution on How to remove a Character from Pandas DataFrame Columns, you have come to the right place.

You can remove or replace any character from any column from your Pandas DataFrame using the following code:


dataframe_name    = df   ## your dataframe name 
dataframe_col_idx = 0    ## dataframe column index, on which you want to perform operation
char_to_replace   = 'a'  ## char which you want to replace
replaced_char     = 'XX' ## char/string into which you want to replace, '' in case to remove 

## name of the column at the given index
dataframe_col_name = dataframe_name.columns[dataframe_col_idx]

n = 0
## items() is used because iteritems() was removed in pandas 2.0
for (i,j) in dataframe_name.items():
    if i == dataframe_col_name:
        for name in j:            
            dataframe_name.iloc[n,dataframe_col_idx] = name.replace(char_to_replace,replaced_char)
            n = n + 1



You can use the above code to remove or replace any character from DataFrame column. Below are the instructions on how to use the above code:

  1. Set the dataframe_name variable to your dataframe.
  2. Give the index (in the form of an integer) of your column in the dataframe_col_idx variable.
  3. Give the character which you want to replace in char_to_replace.
  4. Give the character or string that should replace it in replaced_char.
If you just want to remove a character, simply give replaced_char as '' (an empty string).

Consider the following example:

import pandas as pd

d = {'Name':['Sachin','Dhoni','Virat','Rohit','Shikhar','Sachin'],
     'Age':[26,25,25,24,31,33],
     'Score':[87,67,89,55,47,90]}

df = pd.DataFrame(d,index = ['A','B','C','D','E','F'])

df

Output:
      Name  Age  Score
A   Sachin   26     87
B    Dhoni   25     67
C    Virat   25     89
D    Rohit   24     55
E  Shikhar   31     47
F   Sachin   33     90


To remove every occurrence of the character 'a' from the column 'Name', we can use the following code:
dataframe_name    = df   ## your dataframe name 
dataframe_col_idx = 0    ## dataframe column index, on which you want to perform operation
char_to_replace   = 'a'  ## char which you want to replace
replaced_char     = ''   ## char/string into which you want to replace, '' in case to remove 

## name of the column at the given index
dataframe_col_name = dataframe_name.columns[dataframe_col_idx]

n = 0
## items() is used because iteritems() was removed in pandas 2.0
for (i,j) in dataframe_name.items():
    if i == dataframe_col_name:
        for name in j:            
            dataframe_name.iloc[n,dataframe_col_idx] = name.replace(char_to_replace,replaced_char)
            n = n + 1



DataFrame df after running the above code:
     Name  Age  Score
A   Schin   26     87
B   Dhoni   25     67
C    Virt   25     89
D   Rohit   24     55
E  Shikhr   31     47
F   Schin   33     90


You can see how the character 'a' has been removed from the dataframe. Using this code you can also remove special characters from your dataframe.
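
The same result can also be obtained without a loop. The sketch below uses the str.replace() method of a pandas column; this is an alternative to the code of this post, not the method used in it, and it assumes that all the values in the column are strings.

```python
import pandas as pd

d = {'Name':['Sachin','Dhoni','Virat','Rohit','Shikhar','Sachin'],
     'Age':[26,25,25,24,31,33],
     'Score':[87,67,89,55,47,90]}
df = pd.DataFrame(d, index=['A','B','C','D','E','F'])

# regex=False treats 'a' as a plain character, not as a pattern
df['Name'] = df['Name'].str.replace('a', '', regex=False)

print(df['Name'].tolist())
# ['Schin', 'Dhoni', 'Virt', 'Rohit', 'Shikhr', 'Schin']
```

With regex=False, special characters such as '.' or '$' are also replaced literally.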