2.3. About Types

2.3.1. Setup

>>> import pandas as pd

2.3.2. pd.Series

  • 1-dimensional data structure similar to ndarray

  • Has an index (labels for the values)

  • Can have a name

>>> s = pd.Series(['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Mallory'])
>>>
>>> s
0      Alice
1        Bob
2      Carol
3       Dave
4        Eve
5    Mallory
dtype: str
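
The index and name mentioned above can also be set explicitly. A minimal sketch, assuming the labels `'a'`, `'b'`, `'c'` and the name `'firstname'`, which are chosen here only for illustration:

```python
import pandas as pd

# Index labels and name are illustrative assumptions, not part of the original example
s = pd.Series(
    ['Alice', 'Bob', 'Carol'],
    index=['a', 'b', 'c'],
    name='firstname',
)

print(s.index.tolist())  # ['a', 'b', 'c']
print(s.name)            # firstname
print(s['b'])            # Bob    (lookup by index label)
print(s.iloc[0])         # Alice  (lookup by position)
```

Without an explicit `index`, pandas falls back to the integer positions 0, 1, 2, ... as labels, which is what the output above shows.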

2.3.3. pd.DataFrame

  • 2-dimensional object

  • All columns share the same index

  • Behaves like a dict of Series, one Series per column

  • Each column has a name (label)

  • Operations can be executed on columns or rows

>>> df = pd.DataFrame({
...     'firstnames': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Mallory'],
...     'lastnames': ['Apricot', 'Blackthorn', 'Corn', 'Durian', 'Elderberry', 'Melon'],
...     'age': [30, 31, 32, 33, 34, 15],
... })
>>>
>>> df
  firstnames   lastnames  age
0      Alice     Apricot   30
1        Bob  Blackthorn   31
2      Carol        Corn   32
3       Dave      Durian   33
4        Eve  Elderberry   34
5    Mallory       Melon   15
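
The points above (a shared index, columns as Series, operations on columns or rows) can be sketched on a shortened version of the same data. The variable names `ages`, `first_row` and `over_30` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'firstnames': ['Alice', 'Bob', 'Carol'],
    'age': [30, 31, 32],
})

ages = df['age']               # selecting one column returns a Series
first_row = df.loc[0]          # selecting one row returns a Series as well
over_30 = df[df['age'] > 30]   # row filter built from a column operation

print(type(ages).__name__)           # Series
print(ages.index.equals(df.index))   # True (the column shares the DataFrame index)
print(ages.mean())                   # 31.0
print(first_row['firstnames'])       # Alice
print(over_30['firstnames'].tolist())  # ['Bob', 'Carol']
```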

2.3.4. pd.Timestamp

  • pd.Timestamp('2000-01-01')

  • pd.Timestamp('2000-01-01', unit='s', tz='Europe/Warsaw')

>>> pd.Timestamp('2000-01-01')
Timestamp('2000-01-01 00:00:00')
>>> pd.Timestamp('2000-01-01', unit='s', tz='Europe/Warsaw')
Timestamp('2000-01-01 00:00:00+0100', tz='Europe/Warsaw')
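
According to the pandas documentation, `unit` is used for conversion when the input is an int or float, that is, a count of units since the Unix epoch. A minimal sketch with numeric input follows. The value 946684800 is the number of seconds from 1970-01-01 to 2000-01-01 in UTC, and the variable names are illustrative:

```python
import pandas as pd

# 10957 days * 86400 seconds = 946684800 seconds since 1970-01-01 00:00:00 UTC
epoch_seconds = 946_684_800

naive = pd.Timestamp(epoch_seconds, unit='s')
warsaw = pd.Timestamp(epoch_seconds, unit='s', tz='Europe/Warsaw')

print(naive)    # 2000-01-01 00:00:00
print(warsaw)   # 2000-01-01 01:00:00+01:00
```

With numeric input the value is taken as UTC and then converted to the requested time zone, so the Warsaw wall-clock time is one hour ahead.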

2.3.5. pd.Timedelta

  • pd.Timedelta('4 days 20 hours 15 minutes')

>>> pd.Timestamp('2000-01-01 00:00:00') + pd.Timedelta('4 days 20 hours 15 minutes')
Timestamp('2000-01-05 20:15:00')
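
The reverse operation also works: subtracting one Timestamp from another gives a Timedelta. A small sketch reusing the values above (the names `start`, `end` and `delta` are illustrative):

```python
import pandas as pd

start = pd.Timestamp('2000-01-01 00:00:00')
end = pd.Timestamp('2000-01-05 20:15:00')

delta = end - start            # Timestamp - Timestamp -> Timedelta

print(delta)                   # 4 days 20:15:00
print(delta.days)              # 4
print(delta.total_seconds())   # 418500.0
```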

2.3.6. pd.DateOffset

>>> mar = pd.Timestamp('2000-03-01 00:00:00')
>>> mar - pd.DateOffset(days=1)
Timestamp('2000-02-29 00:00:00')
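
A DateOffset follows the calendar, while a Timedelta is a fixed duration. The difference shows up with units such as months, which Timedelta does not support. A minimal sketch (the name `jan31` is illustrative):

```python
import pandas as pd

jan31 = pd.Timestamp('2000-01-31')

# Calendar-aware: one month later, clipped to the last day of February 2000
by_offset = jan31 + pd.DateOffset(months=1)

# Fixed duration: exactly 30 days later
by_delta = jan31 + pd.Timedelta(days=30)

print(by_offset)   # 2000-02-29 00:00:00
print(by_delta)    # 2000-03-01 00:00:00
```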

2.3.7. pd.NA
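
`pd.NA` is the pandas scalar for a missing value. A short sketch of its basic behavior, assuming the nullable `'Int64'` dtype for the Series (the Series content is illustrative):

```python
import pandas as pd

print(pd.isna(pd.NA))   # True
print(pd.NA + 1)        # <NA>  (arithmetic propagates the missing value)
print(pd.NA == 1)       # <NA>  (comparisons propagate it too)
print(pd.NA | True)     # True  (the result is True whatever the missing value is)

# Nullable integer Series: None is stored as pd.NA
s = pd.Series([1, 2, None], dtype='Int64')

print(s.isna().tolist())   # [False, False, True]
print(s.sum())             # 3  (missing values are skipped by default)
```

Because comparisons return `<NA>` rather than `True` or `False`, use `pd.isna()` to test for missing values instead of `==`.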

2.3.8. pd.Interval

Definition:

>>> digit = pd.Interval(left=0, right=9, closed='both')
>>>
>>> digit
Interval(0, 9, closed='both')

Contains:

>>> 5 in digit
True
>>>
>>> 10 in digit
False

Interval between Timestamps:

>>> year = pd.Interval(left=pd.Timestamp('2000-01-01 00:00:00'),
...                    right=pd.Timestamp('2001-01-01 00:00:00'),
...                    closed='left')
>>>
>>> event1 = pd.Timestamp('1999-01-05')
>>> event2 = pd.Timestamp('2000-01-05')
>>>
>>> event1 in year
False
>>>
>>> event2 in year
True
>>>
>>> year.length
Timedelta('366 days 00:00:00')
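
The `closed` argument decides which endpoints belong to the interval. A small sketch comparing `'both'` with `'left'` (the names `both`, `left`, `first` and `second` are illustrative):

```python
import pandas as pd

both = pd.Interval(left=0, right=9, closed='both')   # [0, 9]
left = pd.Interval(left=0, right=9, closed='left')   # [0, 9)

print(0 in both, 9 in both)   # True True
print(0 in left, 9 in left)   # True False

# Two half-open intervals that only touch at 5 share no point
first = pd.Interval(0, 5, closed='left')    # [0, 5)
second = pd.Interval(5, 9, closed='left')   # [5, 9)

print(first.overlaps(second))   # False
```

This is why `closed='left'` suits consecutive periods such as years: the first instant of the next year belongs to exactly one interval.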

2.3.9. pd.Categorical

  • Limited, fixed set of values

  • groups = pd.Categorical(['users', 'staff', 'admins'])

>>> groups = pd.Categorical(['users', 'staff', 'admins'])
>>>
>>> groups
['users', 'staff', 'admins']
Categories (3, str): ['admins', 'staff', 'users']
>>>
>>> 'managers' in groups
False
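
The set of allowed values can also be declared up front with `categories`, independently of which values actually occur. A minimal sketch, with data chosen for illustration:

```python
import pandas as pd

groups = pd.Categorical(
    ['users', 'staff', 'users'],
    categories=['users', 'staff', 'admins'],   # 'admins' is allowed but unused
)

print(groups.categories.tolist())    # ['users', 'staff', 'admins']
print(groups.codes.tolist())         # [0, 1, 0]  (positions in categories)
print('admins' in groups.categories) # True
print('admins' in groups)            # False (no element has this value)
```

Each element is stored as a small integer code pointing into `categories`, so the `in` operator on the Categorical tests the stored values, not the list of allowed categories.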

2.3.10. Use Case - 1

>>> status = pd.Categorical(['todo', 'done', 'todo', 'done'])
>>>
>>> status
['todo', 'done', 'todo', 'done']
Categories (2, str): ['done', 'todo']
>>>
>>> 'in progress' in status
False
>>>
>>> 'todo' in status
True
>>>
>>> status.categories
Index(['done', 'todo'], dtype='str')

2.3.11. Use Case - 2

>>> moon_landings = pd.Categorical(['apollo11', 'apollo12', 'apollo14',
...                                 'apollo15', 'apollo16', 'apollo17'])
>>>
>>> moon_landings
['apollo11', 'apollo12', 'apollo14', 'apollo15', 'apollo16', 'apollo17']
Categories (6, str): ['apollo11', 'apollo12', 'apollo14', 'apollo15', 'apollo16', 'apollo17']
>>>
>>> 'apollo11' in moon_landings
True
>>>
>>> 'apollo13' in moon_landings
False
>>>
>>> moon_landings.categories
Index(['apollo11', 'apollo12', 'apollo14', 'apollo15', 'apollo16', 'apollo17'], dtype='str')

2.3.12. Use Case - 3

>>> fiscalyear2020 = pd.Interval(
...     left=pd.Timestamp('2020-01-01'),
...     right=pd.Timestamp('2021-01-01'),
...     closed='left')
>>>
>>> fiscalyear2021 = pd.Interval(
...     left=pd.Timestamp('2021-01-01'),
...     right=pd.Timestamp('2022-01-01'),
...     closed='left')
>>>
>>>
>>> event1 = pd.Timestamp('2020-04-12')
>>> event2 = pd.Timestamp('2021-07-21')
>>>
>>> event1 in fiscalyear2020
True
>>> event1 in fiscalyear2021
False
>>> event2 in fiscalyear2020
False
>>> event2 in fiscalyear2021
True

2.3.13. Use Case - 4

>>> year_1970 = pd.Interval(left=pd.Timestamp('1970-01-01 00:00:00'),
...                         right=pd.Timestamp('1971-01-01 00:00:00'),
...                         closed='left')
>>>
>>> apollo11 = pd.Timestamp('1969-07-16')
>>> apollo13 = pd.Timestamp('1970-04-11')
>>>
>>> apollo11 in year_1970
False
>>>
>>> apollo13 in year_1970
True
>>>
>>> year_1970.length
Timedelta('365 days 00:00:00')

2.3.14. Use Case - 5

>>> colors = pd.Categorical(['red', 'green', 'blue'])

2.3.15. Use Case - 6

>>> ages = pd.Categorical(
...     ['child', 'teen', 'adult', 'senior'],
...     categories=['child', 'teen', 'adult', 'senior'],
...     ordered=True)
>>>
>>> ages
['child', 'teen', 'adult', 'senior']
Categories (4, str): ['child' < 'teen' < 'adult' < 'senior']
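
When `categories` is not given, pandas infers them in sorted order, so an ordered Categorical of strings is ranked alphabetically. Passing `categories` explicitly fixes the intended ranking. A sketch of what the ordering then enables, with shuffled input data chosen for illustration:

```python
import pandas as pd

ages = pd.Categorical(
    ['teen', 'child', 'senior', 'adult'],
    categories=['child', 'teen', 'adult', 'senior'],   # youngest to oldest
    ordered=True,
)

print(ages.min(), ages.max())        # child senior
print((ages < 'adult').tolist())     # [True, True, False, False]
print(ages.sort_values().tolist())   # ['child', 'teen', 'adult', 'senior']
```

Comparisons, `min()`, `max()` and sorting all follow the declared category order rather than the alphabetical order of the strings.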