CBSE CS and IP: Python Pandas


What is a Pandas DataFrame? How to Create It?

What is DataFrame?

It is a two-dimensional data structure whose columns can be of different types. It is similar to a spreadsheet, an SQL table, or a dict of Series objects.

Characteristics of DataFrame Object:

It has two indexes, or axes: the row index (axis = 0) and the column index (axis = 1).
The row index is known as the index, and the column index is known as the column names.
Row labels (row index) and column labels (column index) can be numbers, letters or strings.
Different columns can hold values of different types.
A DataFrame is value mutable and size mutable: you can change its values and add or remove rows and columns.
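The characteristics above can be checked on a tiny DataFrame. This is only a minimal sketch: the column names, row labels and values below are made up for illustration.

```python
import pandas as pd

# Small illustrative frame; names and values are invented for this sketch.
df = pd.DataFrame({"name": ["Amy", "Raj"], "age": [35, 38]},
                  index=["r1", "r2"])

print(df.index)    # row labels (axis 0)
print(df.columns)  # column labels (axis 1)
print(df.dtypes)   # each column has its own dtype

df.loc["r1", "age"] = 36       # value mutable: change one cell
df["city"] = ["LA", "Delhi"]   # size mutable: add a new column
print(df.shape)
```

After the two assignments, the frame has the same two rows but three columns.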

Creation of DataFrame

Now we will discuss how to create a Pandas DataFrame. Before creating a DataFrame object, we have to import the pandas library.

import pandas

Syntax for DataFrame Creation:

<df_object> = pandas.DataFrame(data = None, index = None, columns = None, dtype = None, copy = False)
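To see what the index and dtype parameters of this syntax do, here is a small sketch. The dictionary name scores and the row labels are assumptions chosen for the example, not part of the original post.

```python
import pandas as pd

# Hypothetical data: two score columns for three people.
scores = {"Comedy_Score": [9, 7, 8], "Rating_Score": [25, 25, 49]}

# index supplies the row labels, dtype forces one data type on all columns.
df = pd.DataFrame(data=scores,
                  index=["Sheldon", "Raj", "Leonard"],
                  dtype=float)
print(df)
print(df.dtypes)
```

Without index, pandas would number the rows 0, 1, 2; without dtype, it would infer an integer type for these columns.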

1. Dictionary of List / Series

Dictionary of List

import pandas as pd

d = {'first_name': ['Sheldon', 'Raj', 'Leonard', 'Howard', 'Amy'],
        'last_name': ['Copper', 'Koothrappali', 'Hofstadter', 'Wolowitz', 'Fowler'],
        'age': [42, 38, 36, 41, 35],
        'Comedy_Score': [9, 7, 8, 8, 5],
        'Rating_Score': [25, 25, 49, 62, 70]}

df = pd.DataFrame(d)
print(df)


Output
------
  first_name     last_name  age  Comedy_Score  Rating_Score
0    Sheldon        Copper   42             9            25
1        Raj  Koothrappali   38             7            25
2    Leonard    Hofstadter   36             8            49
3     Howard      Wolowitz   41             8            62
4        Amy        Fowler   35             5            70

Dictionary of Series

import pandas as pd
d = {'first_name': pd.Series(['Sheldon', 'Raj', 'Leonard', 'Howard', 'Amy']),
        'last_name': pd.Series(['Copper', 'Koothrappali', 'Hofstadter', 'Wolowitz', 'Fowler']),
        'age': pd.Series([42, 38, 36, 41, 35]),
        'Comedy_Score': pd.Series([9, 7, 8, 8, 5]),
        'Rating_Score': pd.Series([25, 25, 49, 62, 70])}

df = pd.DataFrame(d)
print(df)

Output
------
  first_name     last_name  age  Comedy_Score  Rating_Score
0    Sheldon        Copper   42             9            25
1        Raj  Koothrappali   38             7            25
2    Leonard    Hofstadter   36             8            49
3     Howard      Wolowitz   41             8            62
4        Amy        Fowler   35             5            70

2. From List of List / Dictionaries

List of List

import pandas as pd
l= [ ['Sheldon', 'Copper', 42, 9, 25],
     ['Raj', 'Koothrappali', 38, 7, 25],
     ['Leonard', 'Hofstadter', 36, 8, 49],
     ['Howard', 'Wolowitz', 41, 8, 62],
     ['Amy', 'Fowler', 35, 5, 70] ]

df = pd.DataFrame(l)
print(df)


Output
------
         0             1   2  3   4
0  Sheldon        Copper  42  9  25
1      Raj  Koothrappali  38  7  25
2  Leonard    Hofstadter  36  8  49
3   Howard      Wolowitz  41  8  62
4      Amy        Fowler  35  5  70
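A list of lists carries no column names, which is why the columns above are numbered 0 to 4. A minimal sketch of naming them with the columns parameter follows; the shortened data and the variable names rows and named are assumptions for the example.

```python
import pandas as pd

rows = [['Sheldon', 'Copper', 42],
        ['Raj', 'Koothrappali', 38]]

# columns gives each position in the inner lists a name.
named = pd.DataFrame(rows, columns=['first_name', 'last_name', 'age'])
print(named)
```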

List of Dictionary

import pandas as pd

l = [ {'first_name': 'Sheldon', 'last_name': 'Copper', 'age': 42, 'Comedy_Score': 9, 'Rating_Score': 25},
{'first_name': 'Raj', 'last_name': 'Koothrappali', 'age': 38, 'Comedy_Score': 7, 'Rating_Score': 25},
{'first_name': 'Leonard', 'last_name': 'Hofstadter', 'age': 36, 'Comedy_Score': 8, 'Rating_Score': 49},
{'first_name': 'Howard', 'last_name': 'Wolowitz', 'age': 41, 'Comedy_Score': 8, 'Rating_Score': 62},
{'first_name': 'Amy', 'last_name': 'Fowler', 'age': 35, 'Comedy_Score': 5, 'Rating_Score': 70} ]

df = pd.DataFrame(l)
print(df)

Output
------
  first_name     last_name  age  Comedy_Score  Rating_Score
0    Sheldon        Copper   42             9            25
1        Raj  Koothrappali   38             7            25
2    Leonard    Hofstadter   36             8            49
3     Howard      Wolowitz   41             8            62
4        Amy        Fowler   35             5            70

3. Text / CSV Files

file: data.csv
--------------
first_name,last_name,age,Comedy_Score,Rating_Score
Sheldon, Copper, 42, 9, 25
Raj, Koothrappali, 38, 7, 25
Leonard, Hofstadter, 36, 8, 49
Howard, Wolowitz, 41, 8, 62
Amy, Fowler, 35, 5, 70

import pandas as pd
df = pd.read_csv("data.csv")
# you can give a .txt file also, but the data should be comma separated
print(df)

Output
------
  first_name      last_name  age  Comedy_Score  Rating_Score
0    Sheldon         Copper   42             9            25
1        Raj   Koothrappali   38             7            25
2    Leonard     Hofstadter   36             8            49
3     Howard       Wolowitz   41             8            62
4        Amy         Fowler   35             5            70
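Because data.csv has a space after each comma, the text values are read with a leading space, which is why the last_name column above looks one character wider. A hedged sketch of one way to avoid this is the skipinitialspace parameter of read_csv. To keep the example self-contained, it reads the CSV text from an in-memory string (the name csv_text is an assumption) instead of a file.

```python
import io
import pandas as pd

# Shortened stand-in for data.csv, kept in memory for this sketch.
csv_text = """first_name,last_name,age
Sheldon, Copper, 42
Raj, Koothrappali, 38
"""

raw = pd.read_csv(io.StringIO(csv_text))
clean = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

print(repr(raw.loc[0, "last_name"]))    # value keeps the space after the comma
print(repr(clean.loc[0, "last_name"]))  # space after the comma is skipped
```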

Modifying Pandas Series Elements

cbsecsip Class 12 IP , Python Pandas

If you know how to extract a single element or a slice from a Series, changing Series elements is simple. You can change a single element or a whole slice of the Series object.

To change anything in a Series, access that element and assign it the new value.

Consider the following Series Object:

import pandas as pd
student = pd.Series(
data = ["BOB", "JHON", "RAM", "MOHAN"],
index = ['S1','S2','S3','S4'])
print(student)

'''
S1      BOB
S2     JHON
S3      RAM
S4    MOHAN
dtype: object
'''

<series object> [<index>] = <new data value>
To assign a new value, access the element by its Series label or index position, as shown below, and then provide the new value with the assignment operator.

student['S1'] = "JACK"
student[2] = "PETER"    # position 2; recent pandas versions prefer student.iloc[2]
print(student)

'''
S1     JACK
S2     JHON
S3    PETER
S4    MOHAN
dtype: object
'''
<series object> [start : stop] = <new data value>
To change a slice of values, select the slice using the colon (:), then assign either a scalar value or a list of values.

student['S3':'S4'] = "JACK"
student[0:1] = "PETER"
print(student)

'''
S1    PETER
S2     JHON
S3     JACK
S4     JACK
dtype: object
'''
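The examples above assign a scalar to a slice. The list form mentioned in the text can be sketched as follows, assuming the list has exactly as many values as the slice selects.

```python
import pandas as pd

student = pd.Series(
    data=["BOB", "JHON", "RAM", "MOHAN"],
    index=['S1', 'S2', 'S3', 'S4'])

# A list must match the length of the slice (two labels, two values).
student['S1':'S2'] = ["JACK", "PETER"]

# A scalar is repeated for every element in the slice.
student['S3':'S4'] = "NEW"
print(student)
```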

loc and iloc attributes
You can use loc or iloc to modify existing values in the Series. In both cases, provide the new value using the assignment operator.

student.loc['S3'] = "JACK"
student.iloc[3] = "PETER"
print(student)
'''
S1      BOB
S2     JHON
S3     JACK
S4    PETER
dtype: object
'''


Accessing Pandas Series Slices


Slicing means extracting a part of the Series. Slicing can be done in the following ways:

1. Using indexing operator ( [ start : stop : step ] )
   a) Position wise (slicing includes stop - 1 data)
   b) Data label wise (slicing includes both ends)
      i) With unique data labels
      ii) With duplicate data labels
2. Using .loc attribute
3. Using .iloc attribute

Let us now discuss each type one by one:

1. Using indexing operator( [ start : stop : step ] )

The indexing operator is used for slicing; it works much like list and string slicing. It takes three values: start, stop and step. Start is the starting point of the slice, stop is where the slice ends, and step is the increment between the selected elements. With positions the slice runs up to stop - 1; with data labels the stop label is included.

Start and stop can be Series data labels/index or index positions. Step can be a positive or negative number; its default value is 1.

Let us now discuss what Series Data Labels / Index and Series Index Position are. To see the difference between these two terms, check the Series student given below:

import pandas as pd
student = pd.Series(
data = ["BOB", "JHON", "RAM", "MOHAN"],
index = ['S1','S2','S3','S4'])
print(student)

S1      BOB
S2     JHON
S3      RAM
S4    MOHAN
dtype: object

We have created a Series student with the data elements "BOB", "JHON", "RAM" and "MOHAN" and the data labels/index 'S1', 'S2', 'S3' and 'S4'. Pandas internally maintains a position for each data label, running from 0 up to (length - 1) from the top and from -1 down to -length from the bottom. The two terms compare as follows:

Position   Index   Data_Values
  0/-4        S1       BOB
  1/-3        S2       JHON
  2/-2        S3       RAM
  3/-1        S4       MOHAN

Since our Series student has 4 elements, we have positions starting from 0 up to 3. I hope you have now understood the difference between the Series index and index positions.
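The position table above can be reproduced in code. This is a small sketch that uses iloc for positional access and Index.get_loc to turn a label back into its position; the variable names are chosen for illustration.

```python
import pandas as pd

student = pd.Series(
    data=["BOB", "JHON", "RAM", "MOHAN"],
    index=['S1', 'S2', 'S3', 'S4'])

n = len(student)
rows = []
for pos, label in enumerate(student.index):
    # pos counts from the top, pos - n counts from the bottom
    rows.append((pos, pos - n, label, student.iloc[pos]))
    print(pos, pos - n, label, student.iloc[pos])

# Index.get_loc converts a label back into its position
pos_of_s3 = student.index.get_loc('S3')
print(pos_of_s3)
```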

a) Position wise (slicing includes stop - 1 data)

As discussed, the position is a number that pandas assigns internally to each Series element, so we can use that position to take a slice.

In this type of slicing, the result includes data up to position stop - 1.

syntax:

<Series Object> [start : stop : step]

>>> print(student)
S1      BOB
S2     JHON
S3      RAM
S4    MOHAN
dtype: object

>>> student[0:3:1]
S1     BOB
S2    JHON
S3     RAM
dtype: object

>>> student[-3:-1:1]
S2    JHON
S3     RAM
dtype: object

>>> student[-1:-4:-2]
S4    MOHAN
S2     JHON
dtype: object

b) Data label wise (slicing includes both ends )

We can also use Series data labels for slicing. In this case, start and stop are data labels and step is a number.

syntax:

<Series Object> [start : stop : step]

Since the data labels of a Series can contain duplicates, we will look at slicing with unique and with duplicate data labels separately.

Note: In this type of slicing, both the start and the stop labels are included in the result.

i) With unique data labels

In the following example, all the data labels of the student Series are unique.

>>> print(student)
S1      BOB
S2     JHON
S3      RAM
S4    MOHAN
dtype: object

>>> student['S1':'S4':2]
S1    BOB
S3    RAM
dtype: object

>>> student['S4':'S1':1]
Series([], dtype: object)

>>> student['S4':'S1':-1]
S4    MOHAN
S3      RAM
S2     JHON
S1      BOB
dtype: object

ii) With duplicate data labels

In the following example, the student Series has two identical data labels, S1. If we use a non-unique data label as a slice boundary, pandas raises an error, as in the first slice below. Slicing between labels that are unique, as in the second slice, still works.

>>> print(student)
S1      BOB
S2     JHON
S3      RAM
S1    MOHAN
dtype: object

>>> student['S1':'S2']
KeyError: "Cannot get left slice bound for non-unique label: 'S1'"

>>> student['S2':'S3']
S2    JHON
S3     RAM
dtype: object
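The duplicate-label case can be reproduced with the sketch below. It builds the Series with the repeated label S1, checks Index.is_unique before slicing, and catches the KeyError. Selecting the label on its own with loc is shown as one possible alternative.

```python
import pandas as pd

student = pd.Series(
    data=["BOB", "JHON", "RAM", "MOHAN"],
    index=['S1', 'S2', 'S3', 'S1'])   # 'S1' appears twice

print(student.index.is_unique)

try:
    student['S1':'S2']
    failed = False
except KeyError as err:
    failed = True
    print("KeyError:", err)

# Selecting the duplicated label by itself returns every matching row.
both = student.loc['S1']
print(both)
```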

2. Using ".loc" attribute

Access a group of rows and columns by label(s) or a Boolean array.

Series.loc[ start : stop : step ]
Series.loc[[<list of labels>]]

Consider the following Series Object:

import pandas as pd
student = pd.Series(
data = ["BOB", "JHON", "RAM", "MOHAN"],
index = ['S1','S2','S3','S4'])
print(student)

Series.loc[ start : stop : step ] : Use this form to extract Series slices by index labels over a range. Here start is the first label, stop is the label where the slice ends, and step is the step size used when reading the data. The result includes the stop label.

Example:

student.loc['S1':'S4':2]

'''
S1    BOB
S3    RAM
dtype: object
'''

Series.loc[[<list of labels>]] : Use this form of the loc attribute to access particular elements of a Series object. Here you provide the index labels in the form of a list.

Example:

student.loc[['S1','S4']]

'''
S1      BOB
S4    MOHAN
dtype: object
'''
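The description of loc also mentions a Boolean array, which the examples above do not show. A minimal sketch follows, using a hypothetical marks Series (not part of the original post) so that a numeric condition can be written.

```python
import pandas as pd

# Hypothetical marks for the four students, invented for this sketch.
marks = pd.Series([72, 45, 88, 60], index=['S1', 'S2', 'S3', 'S4'])

mask = marks > 50          # Boolean Series: True where the condition holds
passed = marks.loc[mask]   # keeps only the rows where mask is True
print(passed)
```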

3. Using ".iloc" attribute

Using iloc attribute : Purely integer-location-based indexing for selection by position.

Series.iloc[ start : stop : step ]
Series.iloc[[<list of positions>]]

Consider the following Series Object:

import pandas as pd
student = pd.Series(
data = ["BOB", "JHON", "RAM", "MOHAN"],
index = ['S1','S2','S3','S4'])
print(student)

Series.iloc[ start : stop : step ] : Use this form to extract Series slices by index positions over a range. Here start is the first position, stop is the position where the slice ends, and step is the step size used when reading the data. The result includes data up to position stop - 1.

Example:

student.iloc[0:3:1]

'''
S1     BOB
S2    JHON
S3     RAM
dtype: object
'''

Series.iloc[[<list of positions>]] : Use this form of the iloc attribute to access particular elements of a Series object. Here you provide the index positions in the form of a list.

Example:

student.iloc[[1,2,0]]

'''
S2    JHON
S3     RAM
S1     BOB
dtype: object
'''
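iloc also accepts negative positions and a negative step, just like the position table for the student Series suggests. A short sketch, with variable names chosen for illustration:

```python
import pandas as pd

student = pd.Series(
    data=["BOB", "JHON", "RAM", "MOHAN"],
    index=['S1', 'S2', 'S3', 'S4'])

last_two = student.iloc[-2:]         # positions -2 and -1
reversed_all = student.iloc[::-1]    # whole Series, bottom to top
ends = student.iloc[[0, -1]]         # first and last element

print(last_two)
print(reversed_all)
print(ends)
```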


Accessing Pandas Series Elements


Pandas Series is a 1-D (one-dimensional) Pandas data structure. In the previous post, we saw how to create a Series object. Here we will discuss how to access the elements of a Series in Pandas. There are three ways in which you can access individual Series elements:

By using Data Labels / Index
By using Index Position
By using "at" and "iat" attributes

Syntax:

<Series Object> [ <Valid Index> ]

<Series Object> . at [ <Valid Index> ]

<Series Object> .iat [ <Valid Index position> ]

Let us now discuss what Series Data Labels / Index and Series Index Position are. To see the difference between these two terms, check the Series student given below:

import pandas as pd
student = pd.Series(
data = ["BOB", "JHON", "RAM", "MOHAN"],
index = ['S1','S2','S3','S4'])
print(student)

S1      BOB
S2     JHON
S3      RAM
S4    MOHAN
dtype: object

Position   Index   Data_Values
  0/-4        S1       BOB
  1/-3        S2       JHON
  2/-2        S3       RAM
  3/-1        S4       MOHAN

Since our Series student has 4 elements, we have positions starting from 0 up to 3. I hope you have now understood the difference between the Series index and index positions.

It is time to discuss the three ways in which we can access the Series elements:

1. By using Data Labels / Index

We will take our previous Series student and syntax mentioned above to find the elements by using Data Labels / Index i.e. 'S1', 'S2', 'S3' and 'S4'. Check the below-given examples:

>>> print(student)
S1      BOB
S2     JHON
S3      RAM
S4    MOHAN
dtype: object

>>> student["S1"]
'BOB'

>>> student["S3"]
'RAM'

>>> student["S5"]
## KeyError: 'S5'

2. By using Index Positions

Here again, we will use the Series student to find the elements by using Index Positions. The syntax will remain the same as we have used in our previous example.

>>> print(student)
S1      BOB
S2     JHON
S3      RAM
S4    MOHAN
dtype: object
>>> student[0]
'BOB'

>>> student[-4]
'BOB'

>>> student[2]
'RAM'

>>> student[-2]
'RAM'

>>> student[5]
## Error: there is no position 5 in a Series of 4 elements
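As far as we know, recent pandas releases discourage treating a plain integer in square brackets as a position when the index holds labels such as 'S1'. The iloc attribute is always positional, so the same lookups can be written unambiguously with it. The sketch below is one way to do that and also shows the error raised for a position that does not exist.

```python
import pandas as pd

student = pd.Series(
    data=["BOB", "JHON", "RAM", "MOHAN"],
    index=['S1', 'S2', 'S3', 'S4'])

first = student.iloc[0]     # same element as position 0
third = student.iloc[2]     # same element as position 2
last = student.iloc[-1]     # negative positions count from the bottom

try:
    student.iloc[5]
    out_of_range = False
except IndexError as err:
    out_of_range = True
    print("IndexError:", err)
```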

3. By using "at" and "iat" attributes

We will use the same Series student. Both "at" and "iat" are Series attributes that we can use to access a single element of a Series.

"at": It takes Data Labels or Index to find the elements

"iat": It takes Index positions to extract the elements from Series

Let us check an example of both:

>>> print(student)
S1      BOB
S2     JHON
S3      RAM
S4    MOHAN
dtype: object
 
>>> student.at['S1']
'BOB'
>>> student.iat[0]
'BOB'

>>> student.at['S4']
'MOHAN'
>>> student.iat[-1]
'MOHAN'
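Besides reading a value, at and iat can also be used on the left side of an assignment to change a single element. A minimal sketch follows; the replacement names are made up for the example.

```python
import pandas as pd

student = pd.Series(
    data=["BOB", "JHON", "RAM", "MOHAN"],
    index=['S1', 'S2', 'S3', 'S4'])

student.at['S2'] = "JOHN"    # set by data label
student.iat[-1] = "MOHIT"    # set by index position
print(student)

try:
    student.at['S5']         # reading a label that does not exist
    missing = False
except KeyError:
    missing = True
```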

I hope you have now learnt how to access a Series element by index. Now read the questions given below and try to answer them yourself:

Questions:

1. How do you access the elements of a Pandas Series?
2. What will you write to display the third element of a Series object?
3. How do you get the first element of a Pandas Series?
4. How do you get the last element of a Series object?
5. How do you get the second-last element of a Series object?

Answers:

1. You can access Series elements using either the index (data labels) or index positions.
2. student[2]
3. student[0]
4. student[-1]
5. student[-2]

Python Pandas - Series Attribute


Attributes are the properties of an object. Here we will discuss the Series attributes with programming examples. The important Series attributes according to the CBSE Class 12 Informatics Practices syllabus are given in the table below:

Attributes	Description
Series.index	The index (axis labels) of the Series.
Series.values	Return Series as ndarray or ndarray like depending upon dtype
Series.dtype	Return the dtype object of the underlying data.
Series.shape	Return a tuple of the shape of the underlying data.
Series.nbytes	Return the number of bytes in the underlying data.
Series.ndim	The number of dimensions of the underlying data, by definition 1.
Series.size	Return the number of elements in the underlying data.
Series.hasnans	Return True if the Series contains any NaN values.
Series.empty	Return True if the Series is empty.
at, iat	To access a single value from Series
loc, iloc	To access slices from Series

Let us now check each attribute with a programming example. We will use the following Series student for all of them.

import pandas as pd
student = pd.Series(["Sonal", "Rahul", "Mohan", "Siya",])
print(student)

'''
Output:
0    Sonal
1    Rahul
2    Mohan
3     Siya
dtype: object
'''

1. Series.index

This attribute is used to get the index (axis labels) of the Series. Let us try this attribute on the student Series.

>>> student.index
RangeIndex(start=0, stop=4, step=1)

2. Series.values

The values attribute returns the Series data as an ndarray or an ndarray-like object, depending on the dtype.

>>> student.values
array(['Sonal', 'Rahul', 'Mohan', 'Siya'], dtype=object)

3. Series.dtype

The dtype attribute is used to check the data type of the Series object. Since the student Series holds strings, which are stored with the object dtype, the output below shows 'O'.

>>> student.dtype
dtype('O')

4. Series.shape

The shape attribute gives the shape of the underlying data in the form of a tuple. Since the student Series has 4 elements, the output is (4,).

>>> student.shape
(4,)

5. Series.nbytes

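The nbytes attribute gives the number of bytes used by the underlying data of the Series. The output below shows 32 bytes for the student Series: with the object dtype, the underlying array stores four 8-byte references (on a 64-bit system), and the text of the strings themselves is not counted.

>>> student.nbytes
32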
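The arithmetic behind nbytes is easiest to see on a numeric Series, where it is simply the number of elements multiplied by the size of one element. The ages Series below is a made-up example with an explicit int64 dtype.

```python
import pandas as pd

# Hypothetical numeric Series with a fixed 8-byte integer type.
ages = pd.Series([42, 38, 36, 41], dtype="int64")

per_item = ages.dtype.itemsize   # bytes used by one int64 value
total = ages.nbytes              # bytes used by all the values

print(per_item, ages.size, total)
```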

6. Series.ndim

ndim gives the number of dimensions of the underlying data. Since a Series is a 1-D data structure, it gives 1 for every Series object.

>>> student.ndim
1

7. Series.size

size gives the total number of elements in the Series. Since the student Series has 4 elements, size gives 4.

>>> student.size
4

8. Series.hasnans

hasnans returns a Boolean value. If any of the Series elements is NaN, it returns True; otherwise it returns False.

>>> student.hasnans
False

9. Series.empty

The empty attribute returns True if the Series is empty; otherwise it returns False.

>>> student.empty
False
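The student Series gives False for both hasnans and empty. A short sketch of the opposite cases follows, using a hypothetical marks Series with one missing value and an explicitly empty Series.

```python
import numpy as np
import pandas as pd

# Hypothetical marks with one missing value.
marks = pd.Series([90.0, np.nan, 75.0])
has_missing = marks.hasnans      # a NaN is present

cleaned = marks.dropna()         # drop the missing value
nothing = pd.Series([], dtype="float64")   # Series with no elements

print(has_missing, cleaned.hasnans, nothing.empty)
```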

10. at, iat

at and iat are discussed in detail in the post "Accessing Pandas Series Elements".

11. loc, iloc

loc and iloc are discussed in detail in the post "Accessing Pandas Series Slices".
