3.9. Read SQL
File paths works also with URLs
SQL functions uses SQLAlchemy, which supports many RDBMS
Read SQL query or database table into a DataFrame
3.9.1. SetUp
>>> import pandas as pd
>>>
>>> pd.set_option('display.max_columns', 50)
>>> pd.set_option('display.max_rows', 200)
>>> pd.set_option('display.width', 500)
>>> pd.set_option('display.memory_usage', 'deep')
>>> pd.set_option('display.precision', 4)
3.9.2. Read SQL
pd.read_sql()Read SQL query or database table into a DataFrame.
This function is a convenience wrapper around
read_sql_tableandread_sql_query(for backward compatibility)A SQL query will be routed to
read_sql_queryWhile a database table name will be routed to
read_sql_table.
>>> pd.read_sql()
3.9.3. Read SQL Query
pd.read_sql_query()Read SQL query into a DataFrame
>>> pd.read_sql_query()
3.9.4. Read SQL Table
pd.read_sql_table()Read SQL database table into a DataFrame
>>> pd.read_sql_table()
3.9.5. Example
>>> import sqlite3
>>> import requests
>>>
>>>
>>> DATA = 'https://python3.info/_static/apollo11.db'
>>> DATABASE = '/tmp/myfile.db'
>>>
>>> SQL = """
... SELECT *
... FROM apollo11
... """
>>>
>>>
>>> # Download database
>>> with open(DATABASE, mode='wb') as db:
... resp = requests.get(DATA)
... db.write(resp.content)
49152
>>>
>>>
>>> # Read data from database
>>> with sqlite3.connect(DATABASE) as db:
... df = pd.read_sql(SQL, db, parse_dates=['datetime', 'date'], index_col='datetime')
>>> df.info(memory_usage='deep')
<class 'pandas.DataFrame'>
DatetimeIndex: 250 entries, 1969-07-14 21:00:00 to 1969-08-27 00:00:00
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 250 non-null datetime64[us]
1 time 250 non-null str
2 met 250 non-null int64
3 category 250 non-null str
4 event 250 non-null str
dtypes: datetime64[us](1), int64(1), str(3)
memory usage: 25.7 KB
>>> df['event'].head(n=5)
datetime
1969-07-14 21:00:00 Terminal countdown started.
1969-07-15 16:00:00 Scheduled 11-hour hold at T-9 hours.
1969-07-16 03:00:00 Countdown resumed at T-9 hours.
1969-07-16 08:30:00 Scheduled 1-hour 32-minute hold at T-3 hours 3...
1969-07-16 10:02:00 Countdown resumed at T-3 hours 30 minutes.
Name: event, dtype: str
3.9.6. Use Case - 1
SetUp:
>>> import sqlite3
>>> import pandas as pd
>>>
>>> DATABASE = '/tmp/myfile.db'
Prepare data:
>>> DATA = [
... {'firstname': 'Alice', 'lastname': 'Apricot', 'age': 30},
... {'firstname': 'Bob', 'lastname': 'Blackthorn', 'age': 31},
... {'firstname': 'Carol', 'lastname': 'Corn', 'age': 32},
... {'firstname': 'Dave', 'lastname': 'Durian', 'age': 33},
... {'firstname': 'Eve', 'lastname': 'Elderberry', 'age': 34},
... {'firstname': 'Mallory', 'lastname': 'Melon', 'age': 15},
... ]
Prepare SQL statements:
>>> SQL_CREATE = """
... CREATE TABLE users (
... id INTEGER PRIMARY KEY AUTOINCREMENT,
... firstname TEXT NOT NULL,
... lastname TEXT NOT NULL,
... age INTEGER NOT NULL
... );
... """
>>> SQL_INSERT = """
... INSERT INTO users (firstname, lastname, age)
... VALUES (:firstname, :lastname, :age);
... """
>>> SQL_SELECT = """
... SELECT *
... FROM users;
... """
Write data to database:
>>> with sqlite3.connect(DATABASE) as db:
... _ = db.execute(SQL_CREATE)
... _ = db.executemany(SQL_INSERT, DATA)
Read data from database to Pandas DataFrame:
>>> with sqlite3.connect(DATABASE) as db:
... df = pd.read_sql(SQL_SELECT, db, index_col='id')
Result:
>>> df
firstname lastname age
id
1 Alice Apricot 30
2 Bob Blackthorn 31
3 Carol Corn 32
4 Dave Durian 33
5 Eve Elderberry 34
6 Mallory Melon 15