Slicing means extracting a part of a Series. A Series can be sliced in the following ways:
- Using the indexing operator ( [ start : stop : step ] )
  - Position wise (the slice ends at stop - 1)
  - Data label wise (the slice includes both ends)
    - With unique data labels
    - With duplicate data labels
- Using the .loc attribute
- Using the .iloc attribute
Let us now discuss each type one by one:
1. Using the indexing operator ( [ start : stop : step ] )
The indexing operator [] slices a Series in much the same way as it slices a Python list or string. It takes three values: start, stop and step. Start is where the slice begins, and the slice then moves towards stop in increments of step. With positions, the element at stop itself is left out (see a) below); with data labels, stop is included (see b) below).
Start and stop can be either Series data labels (index) or index positions. Step can be a positive or a negative integer; its default value is 1.
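To see the defaults at work, here is a minimal sketch that builds the same four-student Series used throughout this post and leaves out start, stop or step in turn. It assumes, as in section a) below, that plain integers inside [] are read as positions because the index holds strings; the variable names are only illustrative.

```python
import pandas as pd

student = pd.Series(data=["BOB", "JHON", "RAM", "MOHAN"],
                    index=["S1", "S2", "S3", "S4"])

# start omitted: the slice begins at the first element (position 0)
first_two = student[:2]
# stop omitted: the slice runs to the end of the Series
from_third = student[2:]
# step of -1: walk the whole Series backwards
reversed_all = student[::-1]
# step of 2: take every second element
every_other = student[::2]

print(list(first_two.index))     # ['S1', 'S2']
print(list(from_third.index))    # ['S3', 'S4']
print(list(reversed_all.index))  # ['S4', 'S3', 'S2', 'S1']
print(list(every_other.index))   # ['S1', 'S3']
```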
Let us now see what Series data labels (index) and Series index positions are. To understand the difference between the two terms, look at the Series student given below:
```python
import pandas as pd

student = pd.Series(data=["BOB", "JHON", "RAM", "MOHAN"],
                    index=['S1', 'S2', 'S3', 'S4'])
print(student)

# Output:
# S1      BOB
# S2     JHON
# S3      RAM
# S4    MOHAN
# dtype: object
```
We have created a Series student whose data elements are "BOB", "JHON", "RAM" and "MOHAN" and whose data labels (index) are 'S1', 'S2', 'S3' and 'S4'. Internally, pandas also keeps a position for each data label: counting from the top, positions run from 0 up to (length - 1); counting from the bottom, they run from -1 down to -length. The two terms line up as follows:
```text
Position   Index   Data_Values
0 / -4     S1      BOB
1 / -3     S2      JHON
2 / -2     S3      RAM
3 / -1     S4      MOHAN
```
Since the Series student has 4 elements, its positions run from 0 up to 3 counting from the top, or from -4 up to -1 counting from the bottom. This is the difference between the Series index (the labels we chose) and index positions (the numbers pandas assigns).
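If you want pandas to tell you the position of a label, the Index method get_loc does that for unique labels. The short sketch below rebuilds the table above: the negative position is simply the positive one minus the length of the Series. It assumes the labels are unique, since get_loc returns a single integer only in that case.

```python
import pandas as pd

student = pd.Series(data=["BOB", "JHON", "RAM", "MOHAN"],
                    index=["S1", "S2", "S3", "S4"])

n = len(student)
rows = []
for label in student.index:
    pos = student.index.get_loc(label)  # position counted from the top
    neg = pos - n                       # same element counted from the bottom
    rows.append((pos, neg, label, student.iloc[pos]))
    print(f"{pos}/{neg}  {label}  {student.iloc[pos]}")

# Output:
# 0/-4  S1  BOB
# 1/-3  S2  JHON
# 2/-2  S3  RAM
# 3/-1  S4  MOHAN
```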
a) Position wise (the slice ends at stop - 1)
As discussed above, a position is a number that pandas assigns to each element internally, so we can use these positions to take a slice of the Series.
In this type of slicing the element at position stop is left out, so with a positive step the data runs only up to stop - 1.
syntax:
<Series Object> [start : stop : step]
```python
>>> print(student)
S1      BOB
S2     JHON
S3      RAM
S4    MOHAN
dtype: object
>>> student[0:3:1]
S1     BOB
S2    JHON
S3     RAM
dtype: object
>>> student[-3:-1:1]
S2    JHON
S3     RAM
dtype: object
>>> student[-1:-4:-2]
S4    MOHAN
S2     JHON
dtype: object
```
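The stop - 1 rule and the negative positions behave just like Python list slicing. A minimal sketch, using Python's built-in slice.indices() to turn each slice from the example above into the concrete positions it visits; positions_for is a hypothetical helper written for this illustration, not a pandas function.

```python
import pandas as pd

student = pd.Series(data=["BOB", "JHON", "RAM", "MOHAN"],
                    index=["S1", "S2", "S3", "S4"])

def positions_for(s, length):
    """Hypothetical helper: positions visited by slice s over `length` items."""
    # slice.indices() resolves negative and omitted bounds for that length
    return list(range(*s.indices(length)))

for s in (slice(0, 3, 1), slice(-3, -1, 1), slice(-1, -4, -2)):
    pos = positions_for(s, len(student))
    print(s, pos, list(student.iloc[s].index))

# Output:
# slice(0, 3, 1) [0, 1, 2] ['S1', 'S2', 'S3']
# slice(-3, -1, 1) [1, 2] ['S2', 'S3']
# slice(-1, -4, -2) [3, 1] ['S4', 'S2']
```

For [-1:-4:-2], the slice starts at position 3, moves down by 2 and stops before position 0 (which is -4), so only positions 3 and 1 are taken.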
b) Data label wise (the slice includes both ends)
We can also slice a Series using its data labels. In this case, start and stop are data labels and step is a number.
syntax:
<Series Object> [start : stop : step]
Since the data labels of a Series can contain duplicates, we will look at slicing with unique and with duplicate data labels separately.
Note: In this type of slicing, both the start and the stop labels are included in the result.
i) With unique data labels
In the following example, all the data labels of the Series student are unique.
```python
>>> print(student)
S1      BOB
S2     JHON
S3      RAM
S4    MOHAN
dtype: object
>>> student['S1':'S4':2]
S1    BOB
S3    RAM
dtype: object
>>> student['S4':'S1':1]
Series([], dtype: object)
>>> student['S4':'S1':-1]
S4    MOHAN
S3      RAM
S2     JHON
S1      BOB
dtype: object
```
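One way to picture why label slicing keeps both ends: look up the positions of the two labels, then take a positional slice whose stop lies one step past the stop label. The sketch below does this with a hypothetical helper, label_slice. It is only an illustration, not how pandas implements slicing internally, and it assumes all labels are unique (get_loc returns a single integer only then). Its results match the three slices above.

```python
import pandas as pd

student = pd.Series(data=["BOB", "JHON", "RAM", "MOHAN"],
                    index=["S1", "S2", "S3", "S4"])

def label_slice(series, start, stop, step=1):
    """Hypothetical helper: emulate series[start:stop:step] for unique labels."""
    i = series.index.get_loc(start)  # position of the start label
    j = series.index.get_loc(stop)   # position of the stop label
    # Label slices keep the stop label, so the positional stop goes one
    # step past it in the direction of travel.
    end = j + 1 if step > 0 else j - 1
    if end < 0:
        end = None  # walking backwards past position 0: run to the start
    return series.iloc[i:end:step]

print(list(label_slice(student, "S1", "S4", 2).index))   # ['S1', 'S3']
print(list(label_slice(student, "S4", "S1", 1).index))   # []
print(list(label_slice(student, "S4", "S1", -1).index))  # ['S4', 'S3', 'S2', 'S1']
```

The empty result for ['S4':'S1':1] follows naturally: moving forward from S4 can never reach S1.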
ii) With duplicate data labels
In the following example, the Series student has two identical data labels, S1 (its index is now 'S1', 'S2', 'S3', 'S1'). Using the non-unique data label as a slice bound raises a KeyError here, while slicing between unique labels still works:
```python
>>> print(student)
S1      BOB
S2     JHON
S3      RAM
S1    MOHAN
dtype: object
>>> student['S1':'S2']
KeyError: "Cannot get left slice bound for non-unique label: 'S1'"
>>> student['S2':'S3']
S2    JHON
S3     RAM
dtype: object
```
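A few ways to work around duplicate labels are sketched below, using the same Series with two S1 labels: check index.is_unique before slicing by label, fetch every element with a repeated label through .loc, or fall back to positions. The last step sorts the index with sort_index(); as far as we understand pandas' behaviour, the label slice then succeeds because the two S1 labels sit next to each other. Treat that last point as an observation for this example rather than a general guarantee.

```python
import pandas as pd

# Same data as above, but the label 'S1' appears twice
student = pd.Series(data=["BOB", "JHON", "RAM", "MOHAN"],
                    index=["S1", "S2", "S3", "S1"])

# False means at least one label is repeated
print(student.index.is_unique)   # False

# Looking up a repeated label returns every element carrying it
both_s1 = student.loc["S1"]
print(list(both_s1.values))      # ['BOB', 'MOHAN']

# Label slicing bounded by the repeated 'S1' raises KeyError
try:
    student["S1":"S2"]
    slice_failed = False
except KeyError as err:
    slice_failed = True
    print("KeyError:", err)

# Position-wise slicing ignores labels, so duplicates do not matter
by_position = student.iloc[0:2]
print(list(by_position.index))   # ['S1', 'S2']

# After sorting, the two 'S1' labels are adjacent and the slice works here
sorted_student = student.sort_index()
print(list(sorted_student["S1":"S2"].index))  # ['S1', 'S1', 'S2']
```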
2. Using the ".loc" attribute
For a Series, .loc accesses a group of elements by label(s) or by a Boolean array. It has two forms:
- Series.loc[ start : stop : step ]
- Series.loc[[<list of labels>]]
Consider the following Series Object:
```python
import pandas as pd

student = pd.Series(data=["BOB", "JHON", "RAM", "MOHAN"],
                    index=['S1', 'S2', 'S3', 'S4'])
print(student)
```
- Series.loc[ start : stop : step ]: extracts a slice of the Series using index labels to give the range. Here start is the label where the slice begins, stop is the label where it ends, and step is the step size used while reading the data. Unlike position-based slicing, the element at stop is included in the result.
Example:
```python
student.loc['S1':'S4':2]
'''
S1    BOB
S3    RAM
dtype: object
'''
```
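Like the [] operator, .loc accepts open-ended and reversed label slices. The advantage of .loc is that it always reads the bounds as labels, whereas [] on this string index read plain integers as positions (section a). A minimal sketch:

```python
import pandas as pd

student = pd.Series(data=["BOB", "JHON", "RAM", "MOHAN"],
                    index=["S1", "S2", "S3", "S4"])

# Omitted bounds work with labels too
from_s3 = student.loc["S3":]           # from 'S3' to the end
up_to_s2 = student.loc[:"S2"]          # from the start up to and including 'S2'
backwards = student.loc["S3":"S1":-1]  # reversed, both ends included

# .loc and [] agree when the bounds are labels
same = student.loc["S1":"S4":2].equals(student["S1":"S4":2])

print(list(from_s3.index))    # ['S3', 'S4']
print(list(up_to_s2.index))   # ['S1', 'S2']
print(list(backwards.index))  # ['S3', 'S2', 'S1']
print(same)                   # True
```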
- Series.loc[[<list of labels>]]: use this form to access particular elements of a Series object. The labels are passed as a list, hence the double square brackets.
Example:
```python
student.loc[['S1','S4']]
'''
S1      BOB
S4    MOHAN
dtype: object
'''
```
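The list form returns elements in the order you list them, and the Boolean-array form mentioned at the start of this section keeps only the elements marked True. A minimal sketch; the mask based on name length is just an illustrative condition, and the KeyError for a missing label reflects the behaviour of pandas 1.0 and later as far as we know.

```python
import pandas as pd

student = pd.Series(data=["BOB", "JHON", "RAM", "MOHAN"],
                    index=["S1", "S2", "S3", "S4"])

# The result follows the order of the label list, not the Series order
picked = student.loc[["S4", "S1"]]

# A Boolean Series of the same length keeps the elements marked True
mask = student.str.len() > 3   # names longer than three letters
long_names = student.loc[mask]

# Every label in the list must exist; a missing one raises KeyError
try:
    student.loc[["S1", "S9"]]
    missing_raised = False
except KeyError:
    missing_raised = True

print(list(picked.index))       # ['S4', 'S1']
print(list(long_names.values))  # ['JHON', 'MOHAN']
print(missing_raised)           # True
```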
3. Using the ".iloc" attribute
The .iloc attribute provides purely integer-location-based indexing, that is, selection by position. It has two forms:
- Series.iloc[ start : stop : step ]
- Series.iloc[[<list of positions>]]
Consider the following Series Object:
```python
import pandas as pd

student = pd.Series(data=["BOB", "JHON", "RAM", "MOHAN"],
                    index=['S1', 'S2', 'S3', 'S4'])
print(student)
```
- Series.iloc[ start : stop : step ]: extracts a slice of the Series using index positions to give the range. Here start is the position where the slice begins, stop is the position where it ends, and step is the step size used while reading the data. As with position-wise slicing using [], the data runs only up to stop - 1.
Example:
```python
student.iloc[0:3:1]
'''
S1     BOB
S2    JHON
S3     RAM
dtype: object
'''
```
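.iloc slices follow the same rules as the position-wise slices in section a), with two details worth seeing: a stop beyond the end is simply clipped, and labels are rejected. A minimal sketch:

```python
import pandas as pd

student = pd.Series(data=["BOB", "JHON", "RAM", "MOHAN"],
                    index=["S1", "S2", "S3", "S4"])

# Negative positions and a negative step work as in section a)
tail_reversed = student.iloc[-1:-3:-1]   # positions 3, 2

# A slice whose stop lies beyond the end is clipped, not an error
clipped = student.iloc[2:10]             # positions 2, 3

# .iloc accepts only positions; label bounds raise TypeError
try:
    student.iloc["S1":"S3"]
    label_rejected = False
except TypeError:
    label_rejected = True

print(list(tail_reversed.index))  # ['S4', 'S3']
print(list(clipped.index))        # ['S3', 'S4']
print(label_rejected)             # True
```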
- Series.iloc[[<list of positions>]]: use this form to access particular elements of a Series object. The index positions are passed as a list, hence the double square brackets.
Example:
```python
student.iloc[[1,2,0]]
'''
S2    JHON
S3     RAM
S1     BOB
dtype: object
'''
```
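A few more details of the list form, in a minimal sketch: negative positions count from the bottom (as in the position table earlier), a position may be repeated, and, unlike a slice, a list containing an out-of-range position raises IndexError.

```python
import pandas as pd

student = pd.Series(data=["BOB", "JHON", "RAM", "MOHAN"],
                    index=["S1", "S2", "S3", "S4"])

# Negative positions count from the bottom
ends = student.iloc[[-1, 0]]       # positions 3 and 0

# A repeated position returns that element each time it appears
repeated = student.iloc[[1, 1]]

# Unlike a slice, a list with an out-of-range position raises IndexError
try:
    student.iloc[[0, 10]]
    out_of_range = False
except IndexError:
    out_of_range = True

print(list(ends.values))     # ['MOHAN', 'BOB']
print(list(repeated.index))  # ['S2', 'S2']
print(out_of_range)          # True
```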