5.12. Series String

5.12.1. SetUp

>>> import pandas as pd
>>>
>>>
>>> df = pd.DataFrame([
...     {'firstname': 'Alice', 'lastname': 'Apricot', 'email': 'alice@example.com'},
...     {'firstname': 'Bob', 'lastname': 'Blackthorn', 'email': 'bob@example.com'},
...     {'firstname': 'Carol', 'lastname': 'Corn', 'email': 'carol@example.com'},
...     {'firstname': 'Dave', 'lastname': 'Durian', 'email': 'dave@example.org'},
...     {'firstname': 'Eve', 'lastname': 'Elderberry', 'email': 'eve@example.org'},
...     {'firstname': 'Mallory', 'lastname': 'Melon', 'email': pd.NA},
... ]).convert_dtypes()
>>>
>>> df
  firstname    lastname              email
0     Alice     Apricot  alice@example.com
1       Bob  Blackthorn    bob@example.com
2     Carol        Corn  carol@example.com
3      Dave      Durian   dave@example.org
4       Eve  Elderberry    eve@example.org
5   Mallory       Melon               <NA>
>>>
>>> df.info(memory_usage='deep')
<class 'pandas.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   firstname  6 non-null      string
 1   lastname   6 non-null      string
 2   email      5 non-null      string
dtypes: string(3)
memory usage: 426.0 bytes

5.12.2. Lower

>>> df['firstname'].str.lower()
0      alice
1        bob
2      carol
3       dave
4        eve
5    mallory
Name: firstname, dtype: string

5.12.3. Upper

>>> df['firstname'].str.upper()
0      ALICE
1        BOB
2      CAROL
3       DAVE
4        EVE
5    MALLORY
Name: firstname, dtype: string

5.12.4. Title

>>> df['firstname'].str.title()
0      Alice
1        Bob
2      Carol
3       Dave
4        Eve
5    Mallory
Name: firstname, dtype: string
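
With single-word names, `title()` and `capitalize()` give the same result. A minimal sketch of the difference on multi-word strings follows; the `names` Series is hypothetical sample data, not part of the `df` defined in SetUp.

```python
import pandas as pd

# Hypothetical sample data: lower-cased full names
names = pd.Series(['alice apricot', 'bob blackthorn'], dtype='string')

# title() upper-cases the first letter of every word
titled = names.str.title()

# capitalize() upper-cases only the first letter of the whole string
capitalized = names.str.capitalize()

print(titled.tolist())
print(capitalized.tolist())
```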

5.12.5. Replace

>>> df['firstname'].str.replace('a', 'X')
0      Alice
1        Bob
2      CXrol
3       DXve
4        Eve
5    MXllory
Name: firstname, dtype: string
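
The replacement above is case-sensitive, which is why `Alice` is left unchanged. A possible case-insensitive variant is sketched below. It assumes the pattern is treated as a regular expression (`regex=True`), and the standalone `firstname` Series only mirrors the column from SetUp.

```python
import pandas as pd

# Standalone copy of the 'firstname' column, so the sketch runs on its own
firstname = pd.Series(
    ['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Mallory'],
    dtype='string',
)

# case=False makes the match ignore letter case, so 'A' is replaced too
replaced = firstname.str.replace('a', 'X', case=False, regex=True)

print(replaced.tolist())
```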

5.12.6. Split

>>> df['email'].str.split('@')
0    [alice, example.com]
1      [bob, example.com]
2    [carol, example.com]
3     [dave, example.org]
4      [eve, example.org]
5                    <NA>
Name: email, dtype: object

>>> df['email'].str.split('@', expand=True)
       0            1
0  alice  example.com
1    bob  example.com
2  carol  example.com
3   dave  example.org
4    eve  example.org
5   <NA>         <NA>
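
The expanded columns are labelled `0` and `1`. One way to make them easier to work with is sketched below: `n=1` limits the split to the first `@`, and the column names `user` and `domain` are illustrative choices, not something pandas assigns. The `email` Series is a shortened stand-in for the column from SetUp.

```python
import pandas as pd

# Shortened stand-in for the 'email' column, including a missing value
email = pd.Series(
    ['alice@example.com', 'dave@example.org', pd.NA],
    dtype='string',
)

# Split at most once, so the result always has exactly two columns
parts = email.str.split('@', n=1, expand=True)

# Illustrative column names (an assumption of this sketch)
parts.columns = ['user', 'domain']

print(parts)
```

The row for the missing email stays missing in both columns.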

5.12.7. Extract

>>> df['email'].str.extract(r'([a-z]+)@example\.com')
       0
0  alice
1    bob
2  carol
3   <NA>
4   <NA>
5   <NA>
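
Named groups in the pattern become column names in the result. The sketch below captures both parts of the address; the group names `user` and `domain` and the simplified pattern are assumptions made for illustration, not a complete email validator.

```python
import pandas as pd

# Shortened stand-in for the 'email' column, including a missing value
email = pd.Series(
    ['alice@example.com', 'dave@example.org', pd.NA],
    dtype='string',
)

# (?P<name>...) defines a named group; each group becomes one column
extracted = email.str.extract(r'(?P<user>[a-z]+)@(?P<domain>[a-z.]+)')

print(extracted)
```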