Streaming JSON data to BigQuery using Rails 6

angga kusumandaru
3 min read · Mar 14, 2020

How to stream data with a background job and test it using RSpec.

BigQuery is a managed data warehouse SaaS from Google, accessed through a REST API. It can be combined with MapReduce-style processing and has built-in machine learning capabilities.

In this post we will send data to BigQuery. There are several ways to do this from Ruby, for example:

# batch load a CSV file from Google Cloud Storage
table.load "gs://my-bucket/file-name.csv"
# stream rows as JSON
table.insert data_rows

In this tutorial we use the last approach (streaming JSON). For a complete description, see this link

Step 1: Create a service account and install the gems

This assumes you already have Rails 6 installed. First of all, make sure the gems below are in your Gemfile. rspec-rails and webmock are used to test the BigQuery service.

gem 'google-cloud-bigquery'

group :test do
  gem 'rspec-rails'
  gem 'webmock'
end

After that, install them by running bundle install.

This gem authenticates using a Service Account key. Before we create the key, we need to define a Role for the service account. Since we only need to stream data and want to avoid altering or deleting tables, we register these roles:

Roles for streaming data

For example, give this role the name BigQuery Stream.

Go back to the service account page, create a new account, then assign the role to it.

service account generation

Choose JSON as the key type, generate it, and save it to your local machine. (Secure this file, and never put it in the repository.)

Step 2: Create a service to stream data

Set environment variables for the credential path and the project ID, and make sure the credential JSON file lives outside the project directory so it can't accidentally be pushed to the repository.

project ID
BIGQUERY_CREDENTIAL_PATH=/path-to-file/bigquery.json
BIGQUERY_PROJECT_ID=bigquery-test-270003

After that, create a base service class to connect to BigQuery, e.g. base.rb

base.rb

We require the bigquery library and also include ActiveModel so we can collect errors (for example, from field validation failures).

We read the configuration from environment variables and set up the BigQuery client in the initialize method.

Since this base class is meant to be subclassed, each subclass must define its own dataset and table IDs. The base class then loads that dataset and table, and finally calls a method that sends the JSON data to BigQuery.

We check whether the table exists in BigQuery before streaming. Then we send the data and check whether the response is successful: if it fails, we collect the errors so they can be shown to the user and return false; otherwise we return true.
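The original embed isn't reproduced here, so here is a minimal sketch of what base.rb could look like, following the description above (the BigQuery module name, the insert method, and the error-handling details are assumptions):

# app/services/big_query/base.rb
require "google/cloud/bigquery"

module BigQuery
  class Base
    include ActiveModel::Model # gives us an errors collection for failed inserts

    def initialize
      @bigquery = Google::Cloud::Bigquery.new(
        project_id:  ENV["BIGQUERY_PROJECT_ID"],
        credentials: ENV["BIGQUERY_CREDENTIAL_PATH"]
      )
    end

    # rows: an array of hashes, one hash per row to stream
    def insert(rows)
      dataset = @bigquery.dataset(self.class::DATASET_ID)
      table   = dataset&.table(self.class::TABLE_ID)

      if table.nil?
        errors.add(:base, "table not found")
        return false
      end

      response = table.insert(rows)
      return true if response.success?

      # collect per-row insert errors so the caller can show them to the user
      response.insert_errors.each { |e| errors.add(:base, e.errors.to_s) }
      false
    end
  end
end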

Then we can define a subclass for the data we want to send. For example, let's stream the User model.

user_service.rb

We then define DATASET_ID and TABLE_ID for this service and convert the model to JSON using the as_json method.
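A possible sketch of user_service.rb; the dataset and table names below are placeholders you would replace with your own:

# app/services/big_query/user_service.rb
module BigQuery
  class UserService < Base
    DATASET_ID = "app_dataset" # placeholder dataset name
    TABLE_ID   = "users"       # placeholder table name

    # user: an ActiveRecord User instance
    def stream(user)
      insert([user.as_json])
    end
  end
end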

Next, we define a shared example to mock BigQuery: it stubs each request the client makes and defines the JSON response returned for it.

big_query_example.rb
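The exact stubs depend on your google-cloud-bigquery version, so treat the following as a sketch of big_query_example.rb rather than the original: it stubs the OAuth token exchange, the dataset/table lookups, and the streaming insert itself. (WebMock must be enabled via require "webmock/rspec" in rails_helper.rb, and the client still expects a dummy key file at BIGQUERY_CREDENTIAL_PATH in the test environment.)

# spec/support/big_query_example.rb
RSpec.shared_context "stubbed bigquery" do
  before do
    # OAuth token exchange performed by the client library
    stub_request(:post, %r{googleapis\.com/.*token})
      .to_return(
        status: 200,
        body: { access_token: "fake-token", token_type: "Bearer", expires_in: 3600 }.to_json,
        headers: { "Content-Type" => "application/json" }
      )

    # dataset and table metadata lookups
    stub_request(:get, %r{/bigquery/v2/projects/[^/]+/datasets/})
      .to_return(
        status: 200,
        body: { id: "stubbed", tableReference: { projectId: "p", datasetId: "d", tableId: "t" } }.to_json,
        headers: { "Content-Type" => "application/json" }
      )

    # streaming insert: a response without insertErrors counts as success
    stub_request(:post, %r{/insertAll})
      .to_return(
        status: 200,
        body: { kind: "bigquery#tableDataInsertAllResponse" }.to_json,
        headers: { "Content-Type" => "application/json" }
      )
  end
end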

and we create a spec to test this class:

user_service_spec.rb
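A sketch of user_service_spec.rb using that shared context (the user factory is an assumption, and spec/support files are assumed to be auto-loaded from rails_helper.rb):

# spec/services/big_query/user_service_spec.rb
require "rails_helper"

RSpec.describe BigQuery::UserService do
  include_context "stubbed bigquery"

  let(:user) { create(:user) } # assumes a FactoryBot user factory

  describe "#stream" do
    it "returns true when the insert succeeds" do
      expect(described_class.new.stream(user)).to be true
    end

    it "posts the user as JSON to the insertAll endpoint" do
      described_class.new.stream(user)

      expect(WebMock).to have_requested(:post, %r{/insertAll})
        .with { |req| JSON.parse(req.body)["rows"].any? }
    end
  end
end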

For the final step, we create a job so the streaming runs as a background task.

user_stream_job.rb

We check that the record exists before sending, then call the service from the job.
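A minimal sketch of user_stream_job.rb based on that description:

# app/jobs/big_query/user_stream_job.rb
module BigQuery
  class UserStreamJob < ApplicationJob
    queue_as :default

    def perform(user_id)
      user = User.find_by(id: user_id)
      return if user.nil? # skip silently if the record no longer exists

      BigQuery::UserService.new.stream(user)
    end
  end
end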

For the spec, we test the job as follows:

user_stream_job_spec.rb
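A sketch of user_stream_job_spec.rb; here the service is replaced with a double rather than stubbing HTTP, which is one way (not necessarily the original one) to test the job. The enqueueing matcher requires the ActiveJob :test adapter.

# spec/jobs/big_query/user_stream_job_spec.rb
require "rails_helper"

RSpec.describe BigQuery::UserStreamJob do
  let(:user) { create(:user) } # assumes a FactoryBot user factory

  it "enqueues the job with the user id" do
    expect {
      described_class.perform_later(user.id)
    }.to have_enqueued_job(described_class).with(user.id)
  end

  it "streams the user when performed" do
    service = instance_double(BigQuery::UserService, stream: true)
    allow(BigQuery::UserService).to receive(:new).and_return(service)

    described_class.perform_now(user.id)

    expect(service).to have_received(:stream).with(user)
  end
end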

And you can enqueue the job with:

BigQuery::UserStreamJob.perform_later(some_user.id)

Check the BigQuery console to confirm the data was successfully inserted.

user table
