3.9. Read SQL

  • File paths works also with URLs

  • SQL functions uses SQLAlchemy, which supports many RDBMS

  • Read SQL query or database table into a DataFrame

3.9.1. SetUp

>>> import pandas as pd
>>>
>>> pd.set_option('display.max_columns', 50)
>>> pd.set_option('display.max_rows', 200)
>>> pd.set_option('display.width', 500)
>>> pd.set_option('display.memory_usage', 'deep')
>>> pd.set_option('display.precision', 4)

3.9.2. Read SQL

  • pd.read_sql()

  • Read SQL query or database table into a DataFrame.

  • This function is a convenience wrapper around read_sql_table and read_sql_query (for backward compatibility)

  • A SQL query will be routed to read_sql_query

  • While a database table name will be routed to read_sql_table.

>>> pd.read_sql()

3.9.3. Read SQL Query

  • pd.read_sql_query()

  • Read SQL query into a DataFrame

>>> pd.read_sql_query()

3.9.4. Read SQL Table

  • pd.read_sql_table()

  • Read SQL database table into a DataFrame

>>> pd.read_sql_table()

3.9.5. Example

>>> import sqlite3
>>> import requests
>>>
>>>
>>> DATA = 'https://python3.info/_static/apollo11.db'
>>> DATABASE = '/tmp/myfile.db'
>>>
>>> SQL = """
...     SELECT *
...     FROM apollo11
... """
>>>
>>>
>>> # Download database
>>> with open(DATABASE, mode='wb') as db:
...     resp = requests.get(DATA)
...     db.write(resp.content)
49152
>>>
>>>
>>> # Read data from database
>>> with sqlite3.connect(DATABASE) as db:
...     df = pd.read_sql(SQL, db, parse_dates=['datetime', 'date'], index_col='datetime')
>>> df.info(memory_usage='deep')
<class 'pandas.DataFrame'>
DatetimeIndex: 250 entries, 1969-07-14 21:00:00 to 1969-08-27 00:00:00
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   date      250 non-null    datetime64[us]
 1   time      250 non-null    str
 2   met       250 non-null    int64
 3   category  250 non-null    str
 4   event     250 non-null    str
dtypes: datetime64[us](1), int64(1), str(3)
memory usage: 25.7 KB
>>> df['event'].head(n=5)
datetime
1969-07-14 21:00:00                          Terminal countdown started.
1969-07-15 16:00:00                 Scheduled 11-hour hold at T-9 hours.
1969-07-16 03:00:00                      Countdown resumed at T-9 hours.
1969-07-16 08:30:00    Scheduled 1-hour 32-minute hold at T-3 hours 3...
1969-07-16 10:02:00           Countdown resumed at T-3 hours 30 minutes.
Name: event, dtype: str

3.9.6. Use Case - 1

SetUp:

>>> import sqlite3
>>> import pandas as pd
>>>
>>> DATABASE = '/tmp/myfile.db'

Prepare data:

>>> DATA = [
...     {'firstname': 'Alice', 'lastname': 'Apricot', 'age': 30},
...     {'firstname': 'Bob', 'lastname': 'Blackthorn', 'age': 31},
...     {'firstname': 'Carol', 'lastname': 'Corn', 'age': 32},
...     {'firstname': 'Dave', 'lastname': 'Durian', 'age': 33},
...     {'firstname': 'Eve', 'lastname': 'Elderberry', 'age': 34},
...     {'firstname': 'Mallory', 'lastname': 'Melon', 'age': 15},
... ]

Prepare SQL statements:

>>> SQL_CREATE = """
...     CREATE TABLE users (
...         id INTEGER PRIMARY KEY AUTOINCREMENT,
...         firstname TEXT NOT NULL,
...         lastname TEXT NOT NULL,
...         age INTEGER NOT NULL
...     );
... """
>>> SQL_INSERT = """
...     INSERT INTO users (firstname, lastname, age)
...     VALUES (:firstname, :lastname, :age);
... """
>>> SQL_SELECT = """
...     SELECT *
...     FROM users;
... """

Write data to database:

>>> with sqlite3.connect(DATABASE) as db:
...     _ = db.execute(SQL_CREATE)
...     _ = db.executemany(SQL_INSERT, DATA)

Read data from database to Pandas DataFrame:

>>> with sqlite3.connect(DATABASE) as db:
...     df = pd.read_sql(SQL_SELECT, db, index_col='id')

Result:

>>> df
   firstname    lastname  age
id
1      Alice     Apricot   30
2        Bob  Blackthorn   31
3      Carol        Corn   32
4       Dave      Durian   33
5        Eve  Elderberry   34
6    Mallory       Melon   15

3.9.7. Assignments