CBSE CS and IP: Python Pandas


head() and tail() functions of Series Object

Pandas Series provides two very useful methods for extracting the data from the top and bottom of the Series Object. These methods are head() and tail().

head() Method

head() method is used to get the elements from the top of the series. By default, it gives 5 elements.

Syntax:

<Series Object> . head(n = 5)

Example:

Consider the following Series s. We will perform all the operations below on it.

import pandas as pd
s = pd.Series({'A':1,'B':2,'C':3,'D':4,'E':5,'F':6,'G':7,'H':8})
print(s)

Output:
A    1
B    2
C    3
D    4
E    5
F    6
G    7
H    8
dtype: int64

head() Function without argument

If we do not give any argument to the head() function, it returns the first 5 values by default.

s.head()
Output:
A    1
B    2
C    3
D    4
E    5
dtype: int64

head() Function with Positive Argument

When a positive number n is provided, the head() function extracts the top n rows from the Series Object. In the example below, I have given 7, so 7 rows from the top have been extracted.

s.head(7)
Output:
A    1
B    2
C    3
D    4
E    5
F    6
G    7
dtype: int64
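The two edge cases not covered above can be checked with a short sketch: an n larger than the Series and an n of zero. The variable names `everything` and `nothing` are only illustrative.

```python
import pandas as pd

s = pd.Series({'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5, 'F': 6, 'G': 7, 'H': 8})

# Asking for more rows than exist simply returns the whole Series
everything = s.head(20)

# Asking for zero rows returns an empty Series
nothing = s.head(0)

print(len(everything))  # 8
print(len(nothing))     # 0
```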

head() Function with Negative Argument

We can also provide a negative value inside the head() function. For a negative value -n, it returns all the elements except the last n. Our Series has 8 elements, so head(-7) leaves out the last 7 and returns only the first one.

s.head(-7)

Output:
A    1
dtype: int64
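One way to read a negative argument is as a slice. The sketch below compares head(-n) with the positional slice s.iloc[:-n]; the names `from_head` and `from_slice` are only illustrative.

```python
import pandas as pd

s = pd.Series({'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5, 'F': 6, 'G': 7, 'H': 8})

n = 7
from_head = s.head(-n)     # everything except the last 7 elements
from_slice = s.iloc[:-n]   # the same selection written as a slice

print(from_head)
print(from_head.equals(from_slice))  # True
```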

tail() Method

tail() method is used to get the elements from the bottom of the series. By default, it gives 5 elements.

Syntax:

<Series Object> . tail(n = 5)

Example:

Consider the same Series s as before. We will perform all the operations below on it.

import pandas as pd
s = pd.Series({'A':1,'B':2,'C':3,'D':4,'E':5,'F':6,'G':7,'H':8})
print(s)

Output:
A    1
B    2
C    3
D    4
E    5
F    6
G    7
H    8
dtype: int64

tail() function without argument

If we do not provide any argument, the tail() function gives by default 5 values from the bottom of the Series Object.

s.tail()
'''
Output:
D    4
E    5
F    6
G    7
H    8
dtype: int64
'''

tail() function with Positive argument

When a positive number n is provided, the tail() function gives the bottom n elements of the Series Object.

s.tail(7)
'''
Output:
B    2
C    3
D    4
E    5
F    6
G    7
H    8
dtype: int64
'''

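tail() function with Negative argument

When a negative number -n is provided, the tail() function returns all the elements except the first n. Our Series has 8 elements, so tail(-7) leaves out the first 7 and returns only the last one: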

s.tail(-7)
'''
Output:
H    8
dtype: int64
'''
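As with head(), a negative argument to tail() can be read as a slice. This is a small sketch comparing tail(-n) with s.iloc[n:]; the variable names are only illustrative.

```python
import pandas as pd

s = pd.Series({'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5, 'F': 6, 'G': 7, 'H': 8})

# tail(-3) skips the first 3 elements and keeps the rest
from_tail = s.tail(-3)
from_slice = s.iloc[3:]

print(from_tail)
print(from_tail.equals(from_slice))  # True
```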

Pandas Series dtype Parameter

cbsecsip Class 12 IP , Python Pandas

USE OF DTYPE IN SERIES

dtype parameter is used to provide the data type of the Series elements. Since a single Series can have only one data type, we can assign our own data type using the dtype parameter.

Series Method Prototype

In the previous posts we have seen the data parameter and the index parameter. In this post we will discuss the dtype parameter. By default, the dtype is inferred from the Series elements.

<Series Object> = pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

dtype can be:

If not specified, this will be inferred from data.
Any NumPy data type

a) dtype - inferred from data

When we do not provide the dtype at the time of Series creation, pandas infers it automatically from the data of the Series. Check the following examples:

(i) Creating Series with NumPy array

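In the following example, we are creating the Series using a NumPy array. The data type of the Series is shown as int32. pandas takes it from the NumPy array: all the values are integers, and on the machine used here NumPy's default integer type is int32. This default depends on the platform and NumPy version (for example, Windows with NumPy versions before 2.0 gives int32), so the same code may print int64 on your system.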

import numpy as np
import pandas as pd
arr = np.array([1,2,3,4])
print(arr)          
s = pd.Series(arr)
print(s)
print(type(s))

'''
Output:
-------
[1 2 3 4]

0    1
1    2
2    3
3    4
dtype: int32

<class 'pandas.core.series.Series'> 
'''

(ii) Using List

When we use a list for the creation of a Series, the Series takes the same index as the list (0, 1, 2 and so on). As shown in the example, the list myList has three values, and these values can be of any type. Here the list contains both strings and an integer, hence pandas has given the dtype as "object".

import pandas as pd
myList = ['A','B',2]
s = pd.Series(myList)
print(s)
print(type(s))

'''
Output:
-------
0    A
1    B
2    2
dtype: object

<class 'pandas.core.series.Series'>
'''

(iii) Using Dictionary

When a dictionary is used for Series creation, the values of that dictionary are used as the data of the Series and the keys are used as the data labels (index) of the Series. In this case, all the Series elements are integers, hence the data type is "int64".

import pandas as pd
d = {'A':1, 'B':2, 'C':3}
print(d)
s = pd.Series(d)
print(s)
print(type(s))

'''
Output:
------
{'A': 1, 'B': 2, 'C': 3}

A    1
B    2
C    3
dtype: int64

<class 'pandas.core.series.Series'>
'''
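The inference rules described above can be tried on a few small inputs. This is a minimal sketch; the variable names are only illustrative, and the integer case is left out because its width can depend on the platform.

```python
import pandas as pd

floats = pd.Series([1, 2.5, 3])      # one float makes the whole Series float64
mixed = pd.Series(['A', 'B', 2])     # strings mixed with an integer give object
bools = pd.Series([True, False])     # all booleans give bool

print(floats.dtype)  # float64
print(mixed.dtype)   # object
print(bools.dtype)   # bool
```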

b) Any NumPy Data Type

At the time of Series Creation, we can provide our own data type using dtype Parameter. Following is the list of NumPy Data Type.

Data Type	Description
bool_	Boolean (True or False) stored as a byte
int8	Byte (-128 to 127)
int16	Integer (-32768 to 32767)
int32	Integer (-2.15E+9 to 2.15E+9)
int64	Integer (-9.22E+18 to 9.22E+18)
uint8	Unsigned integer (0 to 255)
uint16	Unsigned integer (0 to 65535)
uint32	Unsigned integer (0 to 4.29E+9)
uint64	Unsigned integer (0 to 1.84E+19)
float16	Half precision signed float
float32	Single precision signed float
float64	Double precision signed float
complex64	Complex number: two 32-bit floats (real and imaginary components)
complex128	Complex number: two 64-bit floats (real and imaginary components)
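The exact integer limits in the table can be looked up with NumPy's iinfo() function. The sketch below prints the minimum and maximum of each integer type; the `limits` dictionary is only an illustrative helper.

```python
import numpy as np

int_types = (np.int8, np.int16, np.int32, np.int64,
             np.uint8, np.uint16, np.uint32, np.uint64)

# Collect (min, max) for every integer type in the table
limits = {}
for t in int_types:
    info = np.iinfo(t)
    limits[t.__name__] = (info.min, info.max)
    print(t.__name__, info.min, info.max)
```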

import pandas as pd
import numpy as np
s = pd.Series(data = [1,2,3,4], dtype = np.int16)
print(s)

'''
#Output:
0    1
1    2
2    3
3    4
dtype: int16
'''

s = pd.Series(data = [1,2,3,4], dtype = np.float32)                
print(s)
'''
#Output:
0    1.0
1    2.0
2    3.0
3    4.0
dtype: float32
'''

In the above example, you can see that we have provided "np.int16" and "np.float32" respectively for the same data [1,2,3,4].
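The dtype can also be given as a string name, and an existing Series can be converted with astype(). A minimal sketch, with illustrative variable names:

```python
import numpy as np
import pandas as pd

# 'int16' as a string is accepted in place of np.int16
s16 = pd.Series([1, 2, 3, 4], dtype='int16')

# astype() returns a new Series with the requested dtype
s_float = s16.astype('float32')

print(s16.dtype)      # int16
print(s_float.dtype)  # float32
```

Note that astype() does not change s16 itself; it returns a converted copy.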

Pandas Series Index Parameter

cbsecsip Class 12 IP , Python Pandas

Series Method Prototype:

<Series Object> = pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

In this post, we are going to discuss index parameter.

By default, Series assigns the indexes 0, 1, 2 and so on. In this post, I am going to discuss how to provide user-defined indexes using the index parameter of Series. It is used to provide the data labels for the data values. The number of indexes should be the same as the number of data values. The indexes can be provided in the following three ways:

The Index taken by Dictionary
Specifying Index Using List
Indexes with Scalar Data Value

a) The index taken by Dictionary

If we are providing the data in the form of a dictionary, then the keys of the dictionary will become the indexes and values of the dictionary will become data of the Series.

import pandas as pd
d = {'A':1, 'B':2, 'C':3}
print(d)
s = pd.Series(d)
print(s)
print(type(s))

'''
Output:
------
{'A': 1, 'B': 2, 'C': 3}

A    1
B    2
C    3
dtype: int64

<class 'pandas.core.series.Series'>
'''

b) Specifying Index Using List

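We can provide indexes using a list. The number of elements in the list should be the same as the number of data values. As shown in the example below, the index labels are X, Y and Z and the data elements are 1, 2 and 3. The index can be passed by keyword (first example) or as the second positional argument (second example).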

import pandas as pd
l = [1,2,3]
s = pd.Series(data= l, index = ['X','Y','Z'])
print(s)
'''
Output
------
X    1
Y    2
Z    3
dtype: int64
'''

l = [1,2,3]
s = pd.Series(l, ['X','Y','Z'])
print(s)
'''
Output
------
X    1
Y    2
Z    3
dtype: int64
'''

Note: The length of the data and the index should be the same. Otherwise, pandas will raise a ValueError.
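The error mentioned in the note can be reproduced with a short sketch. Here three data values are paired with only two index labels; the variable `message` is only illustrative.

```python
import pandas as pd

message = None
try:
    # 3 data values but only 2 index labels
    pd.Series([1, 2, 3], index=['X', 'Y'])
except ValueError as err:
    message = str(err)
    print("ValueError:", message)
```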

c) Indexes with Scalar Value

If we are creating the Series with Scalar Value, then we can provide indexes in the form of a list. The total values in the Series will depend upon the number of elements in the index. For example, you can see the code below. In this code, the scalar value is "Hello" and there are three elements in Series s, because we have provided three indexes.

import pandas as pd
s = pd.Series("Hello",[1,2,3])
print(s)

'''
#Output:
1    Hello
2    Hello
3    Hello
dtype: object
'''
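The same repetition works for a numeric scalar. In this sketch the value 5 is repeated once for each index label:

```python
import pandas as pd

s = pd.Series(5, index=['a', 'b', 'c'])
print(s)

# Three labels give three copies of the scalar
print(len(s))    # 3
print(s.sum())   # 15
```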


Pandas Series - A Pandas Data Structure (How to create Pandas Series?)

cbsecsip Class 12 IP , Python Pandas

Series is a one-dimensional Data Structure of Pandas, used for data analysis. It can hold values of any type, but a single Series has one data type; when the values are of mixed types, they are stored with the "object" dtype. A Series type Object has two main components:

An array of actual data
An associated array of indexes or data labels.

Fig-1 Series

Like list, there are indexes in Series, but in series, we can assign our own indexes (Data labels). The labels need not be unique but must be a hashable type.

How to create pandas series?

To create a Series Object, the data (for example, a list) is passed to the Series() function. This function is available inside the pandas module. Hence, before creating a Series Object we have to import the pandas library using:

import pandas

Following is the syntax for Series Creation:

<Series Object> = pandas.Series(data)

<Series Object> = pandas.Series(data=None, index=None, dtype=None, name=None, copy=False)

data: data parameter provides data for Series creation, it can be a Sequence or Scalar Value

index: This is used to change the default index of Series

dtype: It is used to change the default datatype of Series

name: To give a name to Series

copy: Whether to copy the input data. It is a boolean value and is False by default

By default, the index of series will be integers starting from 0,1,2...etc. as shown above in Fig-1 Series.
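The parameters listed above can be combined in one call. The following is a small sketch with made-up data; the names `marks` and 'Score' are only illustrative.

```python
import pandas as pd

marks = pd.Series(data=[87, 67, 89],
                  index=['Sachin', 'Dhoni', 'Virat'],  # replaces the default 0, 1, 2
                  dtype='float64',                     # overrides the inferred integer dtype
                  name='Score')                        # name attached to the Series

print(marks)
print(marks.name, marks.dtype, list(marks.index))
```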

Use of different types of Data

In Series Data Structure, we can use our own data for creating the series. This data can be of different types.

<Series Object> = pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

The data can be:

Without Data (Empty Series)
A scalar value
A python sequence (ex:- list, tuple, dictionary etc.)
A ndarray

Let us discuss each one by one with Pandas Series Examples.

a) Without Data (Empty Series)

If we do not provide any data, then pandas will create an empty Series. In older pandas versions the default data type of an empty Series was "float64"; since pandas 2.0 it is "object". In the following example, you can see that I have not provided any parameter in the Series() function. This will create an empty Series.

import pandas as pd
s = pd.Series()
print(s)         ## Series([], dtype: float64) in older versions; dtype: object since pandas 2.0
print(type(s))   ## <class 'pandas.core.series.Series'>
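Because the default dtype of an empty Series differs between pandas versions, one option is to state the dtype explicitly. A minimal sketch:

```python
import pandas as pd

# Passing dtype makes the result the same on every pandas version
empty = pd.Series(dtype='float64')

print(empty)
print(len(empty))   # 0
print(empty.dtype)  # float64
```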

b) A Scalar Value

A Pandas Series can be created with only a single scalar value. This will create a Series Object with only one element. In the following example, you can see that I have provided only the value 20. A Series Object 's' has been created by Pandas with the value 20.

import pandas as pd
s = pd.Series(20)
print(s)
print(type(s))

'''
## Output
-----------
0    20
dtype: int64

<class 'pandas.core.series.Series'>
'''

c) A python sequence (ex:- list, tuple, dictionary etc.)

A Python sequence can be used for the creation of Series Object. Any Python Sequence like list, tuple or dictionary can be used for this purpose. Next, we will create a series using List and Dictionary.

Using List: When we use a list for the creation of a Series, the Series takes the same index as the list (0, 1, 2 and so on). As shown in the example, the list myList has three values, and these values can be of any type.

import pandas as pd
myList = ['A','B',2]
s = pd.Series(myList)
print(s)
print(type(s))

'''
Output:
-------
0    A
1    B
2    2
dtype: object

<class 'pandas.core.series.Series'>
'''

Using Dictionary: When a dictionary is used for Series creation, the values of that dictionary are used as the data of the Series and the keys are used as the data labels (index) of the Series.

import pandas as pd
d = {'A':1, 'B':2, 'C':3}
print(d)
s = pd.Series(d)
print(s)
print(type(s))

'''
Output:
------
{'A': 1, 'B': 2, 'C': 3}

A    1
B    2
C    3
dtype: int64

<class 'pandas.core.series.Series'>
'''

d) A ndarray

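NumPy is a core library for numerical and scientific computation. We can use a one-dimensional NumPy array for Series creation. The index or data labels of the Series will be 0, 1, 2 and so on by default. The dtype is taken from the array; it is int32 in the output below, but it may be int64 on your system, depending on the platform and NumPy version.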

import numpy as np
import pandas as pd
arr = np.array([1,2,3,4])
print(arr)          
s = pd.Series(arr)
print(s)
print(type(s))

'''
Output:
-------
[1 2 3 4]

0    1
1    2
2    3
3    4
dtype: int32

<class 'pandas.core.series.Series'> 
'''

Python Pandas - Introduction

cbsecsip Class 12 IP , Python Pandas

Following are the key points about Pandas:

Python Pandas is an Open Source Python Library.
The name Pandas is derived from "panel data".
Pandas is a popular choice for Data Analysis and Data Science because it is very simple and easy to use.
The main author of Pandas is Wes McKinney.
Pandas offers high-performance, easy-to-use data structures and data analysis tools.

Data Analysis – Process of evaluating big datasets using analytical and statistical tools to discover useful information and conclusion to support business decision-making.

Pandas Installation

Pandas library can be installed in your python by opening a command prompt and running the following command:

pip install pandas

Why Pandas is used in Python

Pandas is the most popular library in the Scientific Python ecosystem for doing data analysis. Pandas is capable of many tasks, including:
It can read and write many different file formats (CSV, Excel, JSON etc.).
It can perform calculations along both axes of the data, i.e. row-wise and column-wise analysis.
It can easily select subsets of data from bulky data sets and even combine multiple datasets together.
It supports visualization by integrating with libraries such as matplotlib and seaborn.

Pandas Data Structure

Data Structure - A data structure is a particular way of organizing data in a computer so that it can be used effectively. For example, we can store a list of items using the list data structure.

There are four built-in data structures in Python - list, tuple, dictionary and set.
Pandas offers many data structures to handle a variety of data.
Out of these, the two basic Data Structures of Pandas are Series and DataFrame.

Series - A Series is a Pandas data structure that represents a one-dimensional array-like object containing an array of data and an associated array of data labels, called its index.

DataFrame - A DataFrame is a two-dimensional, labelled, array-like Pandas data structure that stores an ordered collection of columns, where each column can store data of a different type.

Note: Both Series and DataFrame are objects in python.
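The relation between the two structures can be seen in a short sketch: selecting a single column of a DataFrame gives a Series. The data and variable names here are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Sachin', 'Dhoni'], 'Age': [26, 25]})

# A single column of a DataFrame is a Series that shares the row index
col = df['Age']

print(type(df).__name__)   # DataFrame
print(type(col).__name__)  # Series
print(list(col.index) == list(df.index))  # True
```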

Remove or Replace any character from Python Pandas DataFrame Column

cbsecsip Python Pandas ,


If you are searching for the solution on How to remove a Character from Pandas DataFrame Columns, you have come to the right place.

You can remove or replace any character from any column from your Pandas DataFrame using the following code:

dataframe_name    = df   ## your dataframe name 
dataframe_col_idx = 0    ## dataframe column index, on which you want to perform operation
char_to_replace   = 'a'  ## char which you want to replace
replaced_char     = 'XX' ## char/string into which you want to replace, '' in case to remove 

## label of the column at the given position
dataframe_col_name = dataframe_name.columns[dataframe_col_idx]

n = 0
for (i,j) in dataframe_name.items():   ## items() replaces iteritems(), which was removed in pandas 2.0
    if i == dataframe_col_name:
        for name in j:            
            dataframe_name.iloc[n,dataframe_col_idx] = name.replace(char_to_replace,replaced_char)
            n = n + 1

You can use the above code to remove or replace any character from DataFrame column. Below are the instructions on how to use the above code:

Change the dataframe_name variable and give your dataframe name.
Give the index (in the form of an integer) of your column in dataframe_col_idx variable.
Now give the character which you want to replace in char_to_replace.
Finally, give in replaced_char the character or string that should take its place.

If you just want to remove a character, simply give replaced_char as '' (an empty string).

Consider the following example:

import pandas as pd

d = {'Name':['Sachin','Dhoni','Virat','Rohit','Shikhar','Sachin'],
     'Age':[26,25,25,24,31,33],
     'Score':[87,67,89,55,47,90]}

df = pd.DataFrame(d,index = ['A','B','C','D','E','F'])

df

Output:

      Name  Age  Score
A   Sachin   26     87
B    Dhoni   25     67
C    Virat   25     89
D    Rohit   24     55
E  Shikhar   31     47
F   Sachin   33     90

If we want to remove every occurrence of the character 'a' from the column 'Name', we can use the following code:

dataframe_name    = df   ## your dataframe name 
dataframe_col_idx = 0    ## dataframe column index, on which you want to perform operation
char_to_replace   = 'a'  ## char which you want to replace
replaced_char     = ''   ## char/string into which you want to replace, '' in case to remove 

## label of the column at the given position
dataframe_col_name = dataframe_name.columns[dataframe_col_idx]

n = 0
for (i,j) in dataframe_name.items():   ## items() replaces iteritems(), which was removed in pandas 2.0
    if i == dataframe_col_name:
        for name in j:            
            dataframe_name.iloc[n,dataframe_col_idx] = name.replace(char_to_replace,replaced_char)
            n = n + 1

DataFrame df after running the above code:

     Name  Age  Score
A   Schin   26     87
B   Dhoni   25     67
C    Virt   25     89
D   Rohit   24     55
E  Shikhr   31     47
F   Schin   33     90

You can see how the character 'a' has been removed from my dataframe. Using this code you can also remove special characters from your dataframe.
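As an alternative to the loop, pandas also offers the vectorized string method Series.str.replace(), which works on the whole column at once. The sketch below is a different technique from the code above, not a copy of it; it rebuilds the same example DataFrame and removes 'a' from the 'Name' column.

```python
import pandas as pd

d = {'Name': ['Sachin', 'Dhoni', 'Virat', 'Rohit', 'Shikhar', 'Sachin'],
     'Age': [26, 25, 25, 24, 31, 33],
     'Score': [87, 67, 89, 55, 47, 90]}
df = pd.DataFrame(d, index=['A', 'B', 'C', 'D', 'E', 'F'])

# regex=False treats 'a' as a plain character, not as a pattern
df['Name'] = df['Name'].str.replace('a', '', regex=False)

print(df)
```

Passing regex=False is a deliberate choice here: it keeps special characters such as '.' or '$' from being interpreted as regular expression syntax.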
