5.12. Series String
5.12.1. SetUp
>>> import pandas as pd
>>>
>>>
>>> df = pd.DataFrame([
...     {'firstname': 'Alice', 'lastname': 'Apricot', 'email': 'alice@example.com'},
...     {'firstname': 'Bob', 'lastname': 'Blackthorn', 'email': 'bob@example.com'},
...     {'firstname': 'Carol', 'lastname': 'Corn', 'email': 'carol@example.com'},
...     {'firstname': 'Dave', 'lastname': 'Durian', 'email': 'dave@example.org'},
...     {'firstname': 'Eve', 'lastname': 'Elderberry', 'email': 'eve@example.org'},
...     {'firstname': 'Mallory', 'lastname': 'Melon', 'email': pd.NA},
... ]).convert_dtypes()
>>>
>>> df
   firstname    lastname              email
0      Alice     Apricot  alice@example.com
1        Bob  Blackthorn    bob@example.com
2      Carol        Corn  carol@example.com
3       Dave      Durian   dave@example.org
4        Eve  Elderberry    eve@example.org
5    Mallory       Melon               <NA>
>>>
>>> df.info(memory_usage='deep')
<class 'pandas.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   firstname  6 non-null      string
 1   lastname   6 non-null      string
 2   email      5 non-null      string
dtypes: string(3)
memory usage: 426.0 bytes
5.12.2. Lower
>>> df['firstname'].str.lower()
0      alice
1        bob
2      carol
3       dave
4        eve
5    mallory
Name: firstname, dtype: string
5.12.3. Upper
>>> df['firstname'].str.upper()
0      ALICE
1        BOB
2      CAROL
3       DAVE
4        EVE
5    MALLORY
Name: firstname, dtype: string
5.12.4. Title
>>> df['firstname'].str.title()
0      Alice
1        Bob
2      Carol
3       Dave
4        Eve
5    Mallory
Name: firstname, dtype: string
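The case methods above run on `firstname`, which has no missing values. As a minimal sketch (the `email` Series below is a shortened, illustrative copy of the column from SetUp, not the full data), the same `.str` methods on the nullable `string` dtype should leave a missing entry missing instead of raising an error:

```python
import pandas as pd

# Illustrative subset of the email column, including one missing value.
email = pd.Series(['alice@example.com', 'dave@example.org', pd.NA], dtype='string')

# Case conversion skips the missing entry and keeps it missing.
upper = email.str.upper()

# Predicates behave the same way: the result is nullable, not plain bool.
is_org = email.str.endswith('.org')

print(upper)
print(is_org)
```

Because the predicate result can contain a missing value, it is usually worth deciding explicitly how to treat it (for example with `fillna(False)`) before using it as a filter.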
5.12.5. Replace
>>> df['firstname'].str.replace('a', 'X')
0      Alice
1        Bob
2      CXrol
3       DXve
4        Eve
5    MXllory
Name: firstname, dtype: string
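In the output above `Alice` is unchanged because the match is case-sensitive and the name contains only a capital `A`. A minimal sketch, assuming pandas 2.0 or later (where `regex=False` is the default) and using an illustrative three-name Series, of how a case-insensitive replacement could look:

```python
import pandas as pd

# Illustrative subset of the firstname column.
firstname = pd.Series(['Alice', 'Bob', 'Carol'], dtype='string')

# Literal, case-sensitive replacement: the capital 'A' in 'Alice' is untouched.
literal = firstname.str.replace('a', 'X')

# Treating the pattern as a regex with case=False also matches the capital 'A'.
insensitive = firstname.str.replace('a', 'X', case=False, regex=True)

print(literal.tolist())
print(insensitive.tolist())
```

Passing `regex=True` explicitly keeps the intent clear, since the default for this parameter differs between older and newer pandas versions.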
5.12.6. Split
>>> df['email'].str.split('@')
0    [alice, example.com]
1      [bob, example.com]
2    [carol, example.com]
3     [dave, example.org]
4      [eve, example.org]
5                    <NA>
Name: email, dtype: object
>>> df['email'].str.split('@', expand=True)
       0            1
0  alice  example.com
1    bob  example.com
2  carol  example.com
3   dave  example.org
4    eve  example.org
5   <NA>         <NA>
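With `expand=True` the resulting columns are labelled `0` and `1`. As a hedged sketch (the column names `user` and `domain` are chosen here for illustration, and the Series is a shortened copy of the email column), limiting the number of splits with `n=1` and renaming the columns gives a frame that is easier to work with:

```python
import pandas as pd

# Illustrative subset of the email column, including one missing value.
email = pd.Series(['alice@example.com', 'dave@example.org', pd.NA], dtype='string')

# n=1 splits at most once, so the result always has exactly two columns here.
parts = email.str.split('@', n=1, expand=True)

# Hypothetical column names, assigned only for readability.
parts.columns = ['user', 'domain']

print(parts)
```

The missing email stays missing in both columns, matching the behaviour shown in the output above.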
5.12.7. Extract
>>> df['email'].str.extract(r'([a-z]+)@example\.com')
       0
0  alice
1    bob
2  carol
3   <NA>
4   <NA>
5   <NA>
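The pattern above has a single unnamed group, so the result column is labelled `0`. A minimal sketch, using named groups whose names (`user`, `domain`) are chosen here for illustration and a shortened copy of the email column, of how `str.extract` can return labelled columns:

```python
import pandas as pd

# Illustrative subset of the email column, including one missing value.
email = pd.Series(['alice@example.com', 'dave@example.org', pd.NA], dtype='string')

# Each named group (?P<name>...) becomes a column with that name.
# The dot is escaped so that it matches a literal '.' only.
pattern = r'(?P<user>[a-z]+)@(?P<domain>[a-z]+\.[a-z]+)'
extracted = email.str.extract(pattern)

print(extracted)
```

Rows that do not match the pattern, including missing values, come back as missing in every column.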