5.12. Series String

5.12.1. SetUp

>>> import pandas as pd
>>>
>>>
>>> df = pd.DataFrame([
...     {'firstname': 'Alice', 'lastname': 'Apricot', 'email': 'alice@example.com'},
...     {'firstname': 'Bob', 'lastname': 'Blackthorn', 'email': 'bob@example.com'},
...     {'firstname': 'Carol', 'lastname': 'Corn', 'email': 'carol@example.com'},
...     {'firstname': 'Dave', 'lastname': 'Durian', 'email': 'dave@example.org'},
...     {'firstname': 'Eve', 'lastname': 'Elderberry', 'email': 'eve@example.org'},
...     {'firstname': 'Mallory', 'lastname': 'Melon', 'email': pd.NA},
... ]).convert_dtypes()
>>>
>>> df
  firstname    lastname              email
0     Alice     Apricot  alice@example.com
1       Bob  Blackthorn    bob@example.com
2     Carol        Corn  carol@example.com
3      Dave      Durian   dave@example.org
4       Eve  Elderberry    eve@example.org
5   Mallory       Melon               <NA>
>>>
>>> df.info(memory_usage='deep')
<class 'pandas.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   firstname  6 non-null      string
 1   lastname   6 non-null      string
 2   email      5 non-null      string
dtypes: string(3)
memory usage: 426.0 bytes

5.12.2. Lower

>>> df['firstname'].str.lower()
0      alice
1        bob
2      carol
3       dave
4        eve
5    mallory
Name: firstname, dtype: string

5.12.3. Upper

>>> df['firstname'].str.upper()
0      ALICE
1        BOB
2      CAROL
3       DAVE
4        EVE
5    MALLORY
Name: firstname, dtype: string

5.12.4. Title

>>> df['firstname'].str.title()
0      Alice
1        Bob
2      Carol
3       Dave
4        Eve
5    Mallory
Name: firstname, dtype: string

5.12.5. Replace

>>> df['firstname'].str.replace('a', 'X')
0      Alice
1        Bob
2      CXrol
3       DXve
4        Eve
5    MXllory
Name: firstname, dtype: string
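
The call above is case-sensitive, so the capital `A` in `Alice` is left alone. A minimal sketch, assuming a case-insensitive replacement is wanted: it rebuilds the `firstname` column as a standalone Series and passes `case=False` together with `regex=True` to `Series.str.replace`.

```python
import pandas as pd

# Same first names as in the SetUp section, as a standalone string Series
firstname = pd.Series(
    ['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Mallory'],
    dtype='string',
)

# case=False makes the match ignore letter case; regex=True treats
# the pattern as a regular expression (here just the literal 'a')
replaced = firstname.str.replace('a', 'X', case=False, regex=True)

print(replaced.tolist())
```

With `case=False` the leading `A` of `Alice` is replaced as well, while names without the letter, such as `Bob` and `Eve`, stay unchanged.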

5.12.6. Split

>>> df['email'].str.split('@')
0    [alice, example.com]
1      [bob, example.com]
2    [carol, example.com]
3     [dave, example.org]
4      [eve, example.org]
5                    <NA>
Name: email, dtype: object
>>> df['email'].str.split('@', expand=True)
       0            1
0  alice  example.com
1    bob  example.com
2  carol  example.com
3   dave  example.org
4    eve  example.org
5   <NA>         <NA>
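
The integer column labels `0` and `1` produced by `expand=True` are easy to mix up later. A minimal sketch of one way to label them, assuming the column names `user` and `domain` (hypothetical names, not part of the original data) and `n=1` so that each address is split at most once:

```python
import pandas as pd

# Same email values as in the SetUp section, including the missing one
email = pd.Series(
    ['alice@example.com', 'bob@example.com', 'carol@example.com',
     'dave@example.org', 'eve@example.org', pd.NA],
    dtype='string',
)

# n=1 limits the split to the first '@'; expand=True returns a DataFrame
parts = email.str.split('@', n=1, expand=True)

# 'user' and 'domain' are illustrative labels chosen for this sketch
parts.columns = ['user', 'domain']

print(parts)
```

The missing email stays missing in both resulting columns, so `parts['domain'].dropna()` yields only the five known domains.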

5.12.7. Extract

>>> df['email'].str.extract(r'([a-z]+)@example\.com')
       0
0  alice
1    bob
2  carol
3   <NA>
4   <NA>
5   <NA>
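
`Series.str.extract` returns one column per capture group, and named groups become column names. A minimal sketch, assuming the group names `user` and `domain` (hypothetical, chosen for illustration) and a deliberately loose domain pattern that is not a full email validator:

```python
import pandas as pd

# Same email values as in the SetUp section, including the missing one
email = pd.Series(
    ['alice@example.com', 'bob@example.com', 'carol@example.com',
     'dave@example.org', 'eve@example.org', pd.NA],
    dtype='string',
)

# (?P<name>...) defines a named group; each group becomes a column.
# The domain pattern [a-z.]+ is a simplification for this data only.
extracted = email.str.extract(r'(?P<user>[a-z]+)@(?P<domain>[a-z.]+)')

print(extracted)
```

Rows that do not match the pattern, including the missing email, come back as missing values in every column.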