2.3. About Types
2.3.1. SetUp
>>> import pandas as pd
2.3.2. pd.Series
1-dimensional data structure similar to ndarray
Has index
Can have name
>>> s = pd.Series(['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Mallory'])
>>>
>>> s
0      Alice
1        Bob
2      Carol
3       Dave
4        Eve
5    Mallory
dtype: str
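The index and name mentioned above can also be set explicitly. The following is a minimal sketch; the labels 'a', 'b', 'c' and the name 'firstname' are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# A Series with an explicit index and a name (both are optional).
s = pd.Series(
    ['Alice', 'Bob', 'Carol'],
    index=['a', 'b', 'c'],   # hypothetical labels replacing the default 0, 1, 2
    name='firstname',        # hypothetical name
)

print(s.name)          # firstname
print(list(s.index))   # ['a', 'b', 'c']
print(s['b'])          # Bob   (label-based lookup)
print(s.iloc[0])       # Alice (position-based lookup)
```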
2.3.3. pd.DataFrame
2-dimensional object
All columns share the same index
List of Series
Each column must have name
Operations can be executed on columns or rows
>>> df = pd.DataFrame({
... 'firstnames': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Mallory'],
... 'lastnames': ['Apricot', 'Blackthorn', 'Corn', 'Durian', 'Elderberry', 'Melon'],
... 'age': [30, 31, 32, 33, 34, 15],
... })
>>>
>>> df
   firstnames   lastnames  age
0       Alice     Apricot   30
1         Bob  Blackthorn   31
2       Carol        Corn   32
3        Dave      Durian   33
4         Eve  Elderberry   34
5     Mallory       Melon   15
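A short sketch of what "columns share the same index" and "operations on columns or rows" can look like in practice. It uses a shortened version of the table above; the variable names `ages`, `mean_age` and `labels` are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    'firstnames': ['Alice', 'Bob', 'Carol'],
    'age': [30, 31, 32],
})

# Each column is a Series that shares the DataFrame's index.
ages = df['age']
print(type(ages).__name__)           # Series
print(ages.index.equals(df.index))   # True

# Column-wise operation: one result for the whole column.
mean_age = df['age'].mean()
print(mean_age)                      # 31.0

# Row-wise operation: axis=1 passes each row to the function.
labels = df.apply(lambda row: f"{row['firstnames']} ({row['age']})", axis=1)
print(labels.tolist())               # ['Alice (30)', 'Bob (31)', 'Carol (32)']
```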
2.3.4. pd.Timestamp
pd.Timestamp('2000-01-01')
pd.Timestamp('2000-01-01', unit='s', tz='Europe/Warsaw')
>>> pd.Timestamp('2000-01-01')
Timestamp('2000-01-01 00:00:00')
>>> pd.Timestamp('2000-01-01', unit='s', tz='Europe/Warsaw')
Timestamp('2000-01-01 00:00:00+0100', tz='Europe/Warsaw')
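A timezone-aware Timestamp can be converted to another zone and inspected by component. This is a minimal sketch; the target zone 'UTC' and the variable names `warsaw` and `utc` are assumptions chosen for illustration.

```python
import pandas as pd

# Midnight in Warsaw, which is UTC+1 in January.
warsaw = pd.Timestamp('2000-01-01', tz='Europe/Warsaw')

# Same instant expressed in UTC: one hour earlier on the wall clock.
utc = warsaw.tz_convert('UTC')
print(utc)                  # 1999-12-31 23:00:00+00:00

# Individual components are available as attributes and methods.
print(warsaw.year)          # 2000
print(warsaw.day_name())    # Saturday
```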
2.3.5. pd.Timedelta
pd.Timedelta('4 days 20 hours 15 minutes')
>>> pd.Timestamp('2000-01-01 00:00:00') + pd.Timedelta('4 days 20 hours 15 minutes')
Timestamp('2000-01-05 20:15:00')
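The relationship also works in the other direction: subtracting two Timestamps yields a Timedelta. A small sketch, with the variable names `start`, `end` and `elapsed` chosen for illustration:

```python
import pandas as pd

start = pd.Timestamp('2000-01-01 00:00:00')
end = pd.Timestamp('2000-01-05 20:15:00')

# Timestamp - Timestamp gives a Timedelta.
elapsed = end - start
print(elapsed)                   # 4 days 20:15:00
print(elapsed.days)              # 4
print(elapsed.total_seconds())   # 418500.0
```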
2.3.6. pd.DateOffset
>>> mar = pd.Timestamp('2000-03-01 00:00:00')
>>> mar - pd.DateOffset(days=1)
Timestamp('2000-02-29 00:00:00')
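Unlike a Timedelta, a DateOffset follows the calendar, so it can express units such as months whose length varies. A minimal sketch, assuming the dates below as illustrative inputs:

```python
import pandas as pd

# Adding one calendar month clips to the last valid day of the target month.
jan31 = pd.Timestamp('2000-01-31')
feb = jan31 + pd.DateOffset(months=1)
print(feb)        # 2000-02-29 00:00:00  (2000 is a leap year)

# The same "day before March 1st" in a non-leap year.
mar_2001 = pd.Timestamp('2001-03-01')
print(mar_2001 - pd.DateOffset(days=1))   # 2001-02-28 00:00:00
```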
2.3.7. pd.NA
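pd.NA is the pandas marker for a missing value. The sketch below shows a few of its behaviors; the nullable 'Int64' dtype and the sample values are assumptions chosen for illustration.

```python
import pandas as pd

# pd.NA is recognized as missing.
print(pd.isna(pd.NA))    # True

# Arithmetic with a missing value stays missing.
print(pd.NA + 1)         # <NA>

# Logical operations resolve when the result does not depend on the unknown.
print(pd.NA | True)      # True
print(pd.NA & False)     # False

# In a nullable integer Series, None is stored as pd.NA.
s = pd.Series([1, None, 3], dtype='Int64')
print(s.isna().tolist())  # [False, True, False]
print(s.sum())            # 4  (missing values are skipped by default)
```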
2.3.8. pd.Interval
Definition:
>>> digit = pd.Interval(left=0, right=9, closed='both')
>>>
>>> digit
Interval(0, 9, closed='both')
Contains:
>>> 5 in digit
True
>>>
>>> 10 in digit
False
Interval between Timestamps:
>>> year = pd.Interval(left=pd.Timestamp('2000-01-01 00:00:00'),
... right=pd.Timestamp('2001-01-01 00:00:00'),
... closed='left')
>>>
>>> event1 = pd.Timestamp('1999-01-05')
>>> event2 = pd.Timestamp('2000-01-05')
>>>
>>> event1 in year
False
>>>
>>> event2 in year
True
>>>
>>> year.length
Timedelta('366 days 00:00:00')
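The closed parameter decides which endpoints belong to the interval. A minimal sketch of the boundary cases; the intervals `a`, `b` and `c` are hypothetical and used only to show overlaps.

```python
import pandas as pd

year = pd.Interval(left=pd.Timestamp('2000-01-01'),
                   right=pd.Timestamp('2001-01-01'),
                   closed='left')

# closed='left' includes the left endpoint and excludes the right one.
print(pd.Timestamp('2000-01-01') in year)   # True
print(pd.Timestamp('2001-01-01') in year)   # False

# Intervals overlap only if they share at least one point.
a = pd.Interval(0, 1, closed='both')
b = pd.Interval(1, 2, closed='both')      # shares the closed endpoint 1 with a
c = pd.Interval(1, 2, closed='neither')   # endpoint 1 is excluded
print(a.overlaps(b))   # True
print(a.overlaps(c))   # False
```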
2.3.9. pd.Categorical
Limited, fixed set of values
groups = pd.Categorical(['users', 'staff', 'admins'])
>>> groups = pd.Categorical(['users', 'staff', 'admins'])
>>>
>>> groups
['users', 'staff', 'admins']
Categories (3, str): ['admins', 'staff', 'users']
>>>
>>> 'managers' in groups
False
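Internally, a Categorical stores each value as an integer code pointing into its list of categories. The sketch below illustrates this; the extra 'users' entry and the `restricted` example are assumptions added for illustration.

```python
import pandas as pd

groups = pd.Categorical(['users', 'staff', 'admins', 'users'])

# Categories are inferred from the data and sorted.
print(list(groups.categories))   # ['admins', 'staff', 'users']

# Each value is stored as the position of its category.
print(groups.codes.tolist())     # [2, 1, 0, 2]

# With explicit categories, values outside the set become missing (code -1).
restricted = pd.Categorical(['users', 'managers'],
                            categories=['users', 'staff', 'admins'])
print(restricted.codes.tolist())   # [0, -1]
print(restricted.isna().tolist())  # [False, True]
```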
2.3.10. Use Case - 1
>>> status = pd.Categorical(['todo', 'done', 'todo', 'done'])
>>>
>>> status
['todo', 'done', 'todo', 'done']
Categories (2, str): ['done', 'todo']
>>>
>>> 'in progress' in status
False
>>>
>>> 'todo' in status
True
>>>
>>> status.categories
Index(['done', 'todo'], dtype='str')
2.3.11. Use Case - 2
>>> moon_landings = pd.Categorical(['apollo11', 'apollo12', 'apollo14',
... 'apollo15', 'apollo16', 'apollo17'])
>>>
>>> moon_landings
['apollo11', 'apollo12', 'apollo14', 'apollo15', 'apollo16', 'apollo17']
Categories (6, str): ['apollo11', 'apollo12', 'apollo14', 'apollo15', 'apollo16', 'apollo17']
>>>
>>> 'apollo11' in moon_landings
True
>>>
>>> 'apollo13' in moon_landings
False
>>>
>>> moon_landings.categories
Index(['apollo11', 'apollo12', 'apollo14', 'apollo15', 'apollo16', 'apollo17'], dtype='str')
2.3.12. Use Case - 3
>>> fiscalyear2020 = pd.Interval(
... left=pd.Timestamp('2020-01-01'),
... right=pd.Timestamp('2021-01-01'),
... closed='left')
>>>
>>> fiscalyear2021 = pd.Interval(
... left=pd.Timestamp('2021-01-01'),
... right=pd.Timestamp('2022-01-01'),
... closed='left')
>>>
>>>
>>> event1 = pd.Timestamp('2020-04-12')
>>> event2 = pd.Timestamp('2021-07-21')
>>>
>>> event1 in fiscalyear2020
True
>>> event1 in fiscalyear2021
False
>>> event2 in fiscalyear2020
False
>>> event2 in fiscalyear2021
True
2.3.13. Use Case - 4
>>> year_1970 = pd.Interval(left=pd.Timestamp('1970-01-01 00:00:00'),
... right=pd.Timestamp('1971-01-01 00:00:00'),
... closed='left')
>>>
>>> apollo11 = pd.Timestamp('1969-07-16')
>>> apollo13 = pd.Timestamp('1970-04-11')
>>>
>>> apollo11 in year_1970
False
>>>
>>> apollo13 in year_1970
True
>>>
>>> year_1970.length
Timedelta('365 days 00:00:00')
2.3.14. Use Case - 5
>>> colors = pd.Categorical(['red', 'green', 'blue'])
2.3.15. Use Case - 6
>>> ages = pd.Categorical(['child', 'teen', 'adult', 'senior'], ordered=True)
>>>
>>> ages
['child', 'teen', 'adult', 'senior']
Categories (4, str): ['adult' < 'child' < 'senior' < 'teen']
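Note that without an explicit categories argument the order above is alphabetical ('adult' < 'child' < 'senior' < 'teen'), which is probably not the intended age order. A minimal sketch of passing the order explicitly, assuming child < teen < adult < senior is what is meant:

```python
import pandas as pd

ages = pd.Categorical(
    ['child', 'teen', 'adult', 'senior'],
    categories=['child', 'teen', 'adult', 'senior'],  # assumed intended order
    ordered=True,
)

print(list(ages.categories))       # ['child', 'teen', 'adult', 'senior']

# Ordered categoricals support min/max and comparisons.
print(ages.min(), ages.max())      # child senior
print((ages < 'adult').tolist())   # [True, True, False, False]
```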