Export Collections to BigQuery

Author: Firebase (https://firebase.google.com)

Description: Sends realtime, incremental updates from a specified Cloud Firestore collection to BigQuery.

Details: Use this extension to export the documents in a Cloud Firestore collection to BigQuery. Exports are realtime and incremental, so the data in BigQuery is a mirror of your content in Cloud Firestore.

The extension creates and updates a dataset containing the following two BigQuery resources:

  • A table of raw data that stores a full change history of the documents within your collection. This table includes a number of metadata fields so that BigQuery can display the current state of your data. The principal metadata fields are timestamp, document_name, and the operation for the document change.
  • A view which represents the current state of the data within your collection. It also shows a log of the latest operation for each document (CREATE, UPDATE, or IMPORT).
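
The relationship between the raw change history and the current-state view can be sketched in a few lines. This is a minimal illustration, not the extension's actual view definition: the rows are made up, the `data` field is an assumed payload column, and dropping documents whose newest change is a DELETE is an assumption about how "current state" is derived. Only `timestamp`, `document_name`, and `operation` come from the description above.

```python
# Hypothetical changelog rows. "data" is an assumed payload field.
changelog = [
    {"timestamp": 1, "document_name": "users/alice", "operation": "CREATE", "data": {"age": 30}},
    {"timestamp": 2, "document_name": "users/bob", "operation": "CREATE", "data": {"age": 25}},
    {"timestamp": 3, "document_name": "users/alice", "operation": "UPDATE", "data": {"age": 31}},
    {"timestamp": 4, "document_name": "users/bob", "operation": "DELETE", "data": None},
]


def latest_state(rows):
    """Keep the newest row per document, then drop documents whose newest row is a DELETE."""
    latest = {}
    for row in rows:
        name = row["document_name"]
        if name not in latest or row["timestamp"] > latest[name]["timestamp"]:
            latest[name] = row
    # Assumption: a deleted document has no current state.
    return {name: row for name, row in latest.items() if row["operation"] != "DELETE"}


current = latest_state(changelog)
for name, row in current.items():
    print(name, row["operation"], row["data"])
```

Because the raw table is append-only, the current state is always recomputable from the history, which is why the view needs no storage of its own.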

If you create, update, delete, or import a document in the specified collection, this extension sends that update to BigQuery. You can then run queries on this mirrored dataset.

Note that this extension listens only for document changes in the specified collection, not for changes in any subcollection. However, you can install additional instances of this extension to listen to a subcollection or to other collections in your database. Alternatively, if the same subcollection exists across documents in a given collection, you can use {wildcard} notation to listen to all of those subcollections (for example: chats/{chatid}/posts).
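
To make the {wildcard} behavior concrete, here is a small sketch of how a collection path template could be matched against a document path. The function `match_collection_path` is hypothetical and written for illustration only; the real matching is done by the Cloud Firestore trigger, not by code like this.

```python
import re
from typing import Dict, Optional


def match_collection_path(template: str, document_path: str) -> Optional[Dict[str, str]]:
    """Return captured wildcard values if the document sits directly in the
    collection described by the template, otherwise None."""
    template_parts = template.split("/")
    doc_parts = document_path.split("/")
    # A document path is its collection path plus one trailing document ID.
    if len(doc_parts) != len(template_parts) + 1:
        return None
    params: Dict[str, str] = {}
    for expected, actual in zip(template_parts, doc_parts):
        wildcard = re.fullmatch(r"\{(\w+)\}", expected)
        if wildcard:
            params[wildcard.group(1)] = actual  # {chatid} matches any single segment
        elif expected != actual:
            return None
    return params


print(match_collection_path("chats/{chatid}/posts", "chats/room1/posts/p1"))
print(match_collection_path("chats", "chats/room1/posts/p1"))
```

The second call returns None: a template of `chats` covers documents directly inside `chats`, but not documents in the `posts` subcollection below it.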

Additional setup

Before installing this extension, you'll need to:

Backfill your BigQuery dataset

This extension only sends the content of documents that have been changed; it does not export your full dataset of existing documents into BigQuery. To backfill your BigQuery dataset with all the documents in your collection, you can run the import script provided by this extension.

Important: Run the import script over the entire collection after installing this extension; otherwise, all writes to your database during the import might be lost.

Generate schema views

After your data is in BigQuery, you can run the schema-views script (provided by this extension) to create views that make it easier to query relevant data. You only need to provide a JSON schema file that describes your data structure, and the schema-views script will create the views.
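
As a rough idea of what such a schema file might contain, the sketch below builds one in Python and writes it as JSON. The layout (a top-level `fields` list of `name`/`type` entries, with nested `fields` for maps) and the field names are assumptions for illustration; check the schema-views script's own documentation for the exact format and supported types.

```python
import json

# Assumed schema layout: each entry names a document field and its type.
# The field names below (name, last_login, friends) are made up.
schema = {
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "last_login", "type": "timestamp"},
        {
            "name": "friends",
            "type": "map",
            "fields": [{"name": "name", "type": "string"}],
        },
    ]
}

schema_json = json.dumps(schema, indent=2)
print(schema_json)  # save this text as the schema file you pass to the script
```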

Billing

To install an extension, your project must be on the Blaze (pay as you go) plan.

  • You will be charged a small amount (typically around $0.01/month) for the Firebase resources required by this extension (even if it is not used).
  • This extension uses other Firebase and Google Cloud Platform services, which have associated charges if you exceed the service’s free tier:
    • BigQuery (this extension writes to BigQuery with streaming inserts)
    • Cloud Firestore
    • Cloud Functions (Node.js 10+ runtime. See FAQs)

Configuration Parameters:

  • Cloud Functions location: Where do you want to deploy the functions created for this extension? You usually want a location close to your database. For help selecting a location, refer to the location selection guide.

  • BigQuery Dataset location: Where do you want to deploy the BigQuery dataset created for this extension? For help selecting a location, refer to the location selection guide.

  • Collection path: What is the path of the collection that you would like to export? You may use {wildcard} notation to match a subcollection of all documents in a collection (for example: chatrooms/{chatid}/posts).

  • Dataset ID: What ID would you like to use for your BigQuery dataset? This extension will create the dataset, if it doesn't already exist.

  • Table ID: What identifying prefix would you like to use for your table and view inside your BigQuery dataset? This extension will create the table and view, if they don't already exist.

  • BigQuery SQL table partitioning option: This parameter lets you partition the BigQuery table and BigQuery view created by the extension based on data ingestion time. You can select one of the following partitioning granularities: HOUR, DAY, MONTH, or YEAR. This generates one partition per hour, day, month, or year, respectively.
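
The effect of the partitioning option can be illustrated by mapping an ingestion time to a partition identifier at each granularity. This is a sketch only: BigQuery assigns partitions itself, and the helper `partition_id` is hypothetical. The identifier formats shown follow BigQuery's usual time-partition naming in UTC.

```python
from datetime import datetime, timezone

# One strftime pattern per granularity offered by the parameter.
PARTITION_FORMATS = {
    "HOUR": "%Y%m%d%H",
    "DAY": "%Y%m%d",
    "MONTH": "%Y%m",
    "YEAR": "%Y",
}


def partition_id(ingested_at: datetime, granularity: str) -> str:
    """Return the partition a row would fall into for the given granularity."""
    return ingested_at.astimezone(timezone.utc).strftime(PARTITION_FORMATS[granularity])


ingested = datetime(2021, 3, 7, 14, 30, tzinfo=timezone.utc)
for granularity in PARTITION_FORMATS:
    print(granularity, partition_id(ingested, granularity))
```

Coarser granularities produce fewer, larger partitions, while HOUR suits high-volume collections that are usually queried over short time windows.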

Cloud Functions:

  • fsexportbigquery: Listens for document changes in your specified Cloud Firestore collection, then exports the changes into BigQuery.
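
Conceptually, the function turns each document change into one row of the raw change history. The sketch below shows that idea under stated assumptions: `change_operation` and `to_changelog_row` are hypothetical helpers, the `data` field is an assumed payload column, and the before/after snapshots are plain dictionaries rather than real Firestore objects. It is not the extension's source code.

```python
from typing import Any, Dict, Optional

Snapshot = Optional[Dict[str, Any]]  # None means the document does not exist


def change_operation(before: Snapshot, after: Snapshot) -> str:
    """Classify a change from the document's state before and after the write."""
    if before is None and after is None:
        raise ValueError("a change needs a before or an after snapshot")
    if before is None:
        return "CREATE"
    if after is None:
        return "DELETE"
    return "UPDATE"


def to_changelog_row(document_name: str, before: Snapshot, after: Snapshot, timestamp: str) -> Dict[str, Any]:
    """Build one changelog row; a deleted document carries no data."""
    return {
        "timestamp": timestamp,
        "document_name": document_name,
        "operation": change_operation(before, after),
        "data": after,
    }


row = to_changelog_row("chats/room1", None, {"title": "General"}, "2021-03-07T14:30:00Z")
print(row)
```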

APIs Used:

  • bigquery-json.googleapis.com (Reason: Mirrors data from your Cloud Firestore collection in BigQuery.)

Access Required:

This extension will operate with the following project IAM roles:

  • bigquery.dataEditor (Reason: Allows the extension to configure and export data into BigQuery.)