PyFireSQL is a SQL-like programming interface to query Cloud Firestore collections using Python.
There is no formal query language to Cloud Firestore - NoSQL collection/document structure. For many instances, we need to use the useful but clunky Firestore UI to navigate, scroll and filter through the endless records. With the UI, we have no way to extract the found documents. Even though we attempted to extract and update by writing a unique program for the specific task, we felt many scripts are almost the same that something must be done to limit the endless program writing. What if we can use SQL-like statements to perform the data extraction, which is both formal and reusable? - This idea will be the motivation for the FireSQL language!
Even though we see no relational data model of (table, row, column), we can easily see the equivalent between table -> collection, row -> document and column -> field in the Firestore data model. The SQL-like statement can be transformed accordingly.
To install PyFireSQL from PyPi,
pip install pyfiresql
To install from PyFireSQL source, checkout the project
cd PyFireSQL
# install require packages
pip install -r requirements.txt
# install (optional) development require packages
pip install -r requirements_dev.txt
python setup.py install
The set of implemented SQL-like DML (Data Manipulation Language) statements are,
FireSQL Statement | Description |
---|---|
SELECT | select documents from a collection |
INSERT | insert new document in a collection |
UPDATE | modify the existing documents in a collection |
DELETE | delete existing documents in a collection |
Please read the details in the corresponding PyFireSQL Documentation @readthedocs.
See
firesql/sql/grammar/firesql.lark
for the FireSQL grammar specification.
We provided a simple firebase SQL interface class that can be easily applied a SQL statement to fetch from Firebase collections.
In PyFireSQL, we offer a simple programming interface to parse and execute firebase SQL.
from firesql.firebase import FirebaseClient
from firesql.sql import FireSQL
# make connection to Cloud Firestore
client = FirebaseClient()
client.connect(credentials_json='credentials.json')
# query via the FireSQL interface - the results are in list of docs (Dict)
query = "SELECT * FROM Users WHERE state = 'ACTIVE'"
fireSQL = FireSQL()
docs = fireSQL.execute(client, query)
After fireSQL.execute()
query completed, the results are a list of docs (as Dict) that satisfied the query.
Then we can pass the list of docs to render into any output format, in our case, the DocPrinter
object can output csv
or json
with the select fields.
from firesql.sql.doc_printer import DocPrinter
docPrinter = DocPrinter()
if format == 'csv':
docPrinter.printCSV(docs, fireSQL.select_fields())
else:
docPrinter.printJSON(docs, fireSQL.select_fields())
For further post-processing, we can use Pandas's Dataframe to perform any data analysis, grouping, sorting and calculations. The list of docs (as Dict) can be directly imported into Dataframe! very convenience.
import pandas as pd
df = pd.DataFrame(docs)
In addition, we provide an interface script firesql-query.py
to accept an FireSQL statement.
usage: firesql-query.py [-h] [-c CREDENTIALS] [-f FORMAT] [-i INPUT]
[-q QUERY]
optional arguments:
-h, --help show this help message and exit
-c CREDENTIALS, --credentials CREDENTIALS
credentials JSON path
-f FORMAT, --format FORMAT
output format (csv|json)
-i INPUT, --input INPUT
FireSQL query input file (required)
-q QUERY, --query QUERY
FireSQL query (required)
For example, finding all ACTIVE
users from Users collection
python firesql-query.py -c credential.json \
-q "SELECT docid, email, state FROM Users WHERE state IN ('ACTIVE')"
docid
is a special column name that is used to project the Firestore document ID.
The default query result is rendered in "csv" output format.
"docid","email","state"
"0r6YWowe9rW65yB1qTKsCe83cCm2","btscheung+real@gmail.com","ACTIVE"
"1utcUa9fdheOlrMe9GOCjrJ3wjh1","btscheung+bennycorp@gmail.com","ACTIVE"
"7CUJOqe6rlOTQuatc27EQGivZfn2","btscheung+twotwo@gmail.com","ACTIVE"
...
Alternatively, by specifying the -f json
output format, the result will be,
[
{"docid": "0r6YWowe9rW65yB1qTKsCe83cCm2", "email": "btscheung+real@gmail.com", "state": "ACTIVE"},
{"docid": "1utcUa9fdheOlrMe9GOCjrJ3wjh1", "email": "btscheung+bennycorp@gmail.com", "state": "ACTIVE"},
{"docid": "7CUJOqe6rlOTQuatc27EQGivZfn2", "email": "btscheung+twotwo@gmail.com", "state": "ACTIVE"},
...
]
For more complicated SQL, we can use -i input.sql
to specify the SQL input file.
input.sql
file:
SELECT u.email, u.state, b.date, b.state
FROM
Users as u JOIN Bookings as b
ON u.email = b.email
WHERE
u.state IN ('ACTIVE') and
b.state IN ('CHECKED_IN', 'CHECKED_OUT') and
b.date >= '2022-03-18T04:00:00'
By execute the input file
python firesql-query.py -c credentials.json -i input.sql
The result will be,
NOTE: the column
state
fromUsers
will be automatically disambiguated by appending the alias prefixu_state
.
"email","u_state","date","state"
"btscheung+bennycorp@gmail.com","ACTIVE","2022-03-18T04:00:00","CHECKED_IN"
"btscheung+bennycorp@gmail.com","ACTIVE","2022-03-18T04:00:00","CHECKED_IN"
"btscheung+hill6@gmail.com","ACTIVE","2022-03-31T04:00:00","CHECKED_IN"
...
Create a source distribution and upload to PyPi
rm -rf dist
rm -rf build
python setup.py sdist
twine upload dist/*