# Customizing the Block for Other Dialects

We built this Block using Google BigQuery because the Google Public Datasets Program makes the relevant data available for free. But many Looker users will want to use this Block with their own databases, so we've outlined the steps you'll need to take to adapt it:

- Get the relevant data into your database
- Install the Looker Block through the Marketplace
- Update the Block to match your database tables and syntax

## Part 1: Data Loading

The Data Block contains dynamic data (like daily case reports) that needs to be updated on a regular basis (mostly daily), as well as static datasets (population by country) that can be loaded just once.

Below are all the tables the Block needs, with their names in the Block, as well as the public addresses where comparable data is available as flat files in an S3 bucket maintained by Starschema. Snowflake customers can also use the Snowflake Data Exchange to access the Starschema tables directly, without having to load them daily. (For instructions on setting this up, see the docs.)

### Dynamic Data

1. **JHU Data**
   - Table name in LookML: `bigquery-public-data.covid19_jhu_csse.summary`
   - CSV file: `s3://starschema.covid/JHU_COVID-19.csv`
   - Snowflake Data Exchange: `"[DATABASE_NAME_OF_SHARE]"."PUBLIC"."JHU_COVID_19"`
   - Update Frequency: Daily
2. **NYT Data**
   - Table name in LookML: `bigquery-public-data.covid19_nyt.us_counties`
   - CSV file: `s3://starschema.covid/NYT_US_COVID19.csv`
   - Snowflake Data Exchange: `"[DATABASE_NAME_OF_SHARE]"."PUBLIC"."NYT_US_COVID_19"`
   - Update Frequency: Daily
3. **Italy Province Data**
   - Table name in LookML: `bigquery-public-data.covid19_italy.data_by_province`
   - CSV file: `s3://starschema.covid/PCM_DPS_COVID19.csv`
   - Update Frequency: Daily
4. **Italy Region Data**
   - Table name in LookML: `bigquery-public-data.covid19_italy.data_by_region`
   - CSV file: `s3://starschema.covid/PCM_DPS_COVID19-DETAILS.csv`
   - Update Frequency: Daily
5. **Kaiser Family Foundation, US State Policies**
   - Table name in LookML: `lookerdata.covid19_block.state_policies`
   - CSV file: `s3://starschema.covid/KFF_US_POLICY_ACTIONS.csv`
   - Update Frequency: Intermittent
6. **Kaiser Family Foundation, US State Mitigations**
   - Table name in LookML: `lookerdata.covid19_block.state_mitigations`
   - CSV file: `s3://starschema.covid/KFF_US_STATE_MITIGATIONS.csv`
   - Update Frequency: Intermittent
7. **COVID-19 Tracking Project**
   - Table name in LookML: `lookerdata.covid19_block.covid19_tracking_project`
   - CSV file: `s3://starschema.covid/CT_US_COVID_TESTS.csv`
   - Update Frequency: Daily
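For warehouses that can read directly from S3, the daily refresh can be scripted against the Starschema bucket. A minimal Snowflake sketch follows; the stage name is illustrative, and the target table is assumed to already exist with columns matching the CSV header:

```sql
-- Illustrative sketch only: stage/table names are assumptions, not defined by the Block.
CREATE OR REPLACE STAGE starschema_covid
  URL = 's3://starschema.covid/';  -- public bucket, no credentials required

-- Full refresh from the latest flat file; schedule daily (e.g. via a Snowflake TASK).
TRUNCATE TABLE jhu_covid_19;
COPY INTO jhu_covid_19
  FROM @starschema_covid/JHU_COVID-19.csv
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"');
```

The same pattern applies to the other daily tables; only the file name and target table change.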

### Static Data

The following static CSV files can be loaded once from S3 and do not need to be reloaded.

1. **Italy Province Stats**
   - Table name in LookML: `lookerdata.covid19_block.italy_province_stats`
   - CSV file: https://looker-datablocks.s3.amazonaws.com/covid-19/italy_province.csv
2. **Italy Region Stats**
   - Table name in LookML: `lookerdata.covid19_block.italy_region_stats`
   - CSV file: https://looker-datablocks.s3.amazonaws.com/covid-19/italy_regions.csv
3. **Population by State by Country**
   - This is the derived table generated by the view `population_by_county_state_country`
   - CSV file: https://looker-datablocks.s3.amazonaws.com/covid-19/population_by_county_state_country_core.csv
4. **Country Region**
   - Table name in LookML: `lookerdata.covid19_block.country_region`
   - CSV file: https://looker-datablocks.s3.amazonaws.com/covid-19/country_region.csv
5. **State Region**
   - Table name in LookML: `lookerdata.covid19_block.state_region`
   - CSV file: https://looker-datablocks.s3.amazonaws.com/covid-19/state_region.csv
6. **Hospital Bed Summary**
   - Table name in LookML: `lookerdata.covid19_block.hospital_bed_summary`
   - CSV file: https://looker-datablocks.s3.amazonaws.com/covid-19/hospital_bed_summary+.csv
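These files are served over HTTPS rather than `s3://`, so one option is to download each file once and load it through an internal stage. A hedged Snowflake sketch (the local file path is illustrative, and the table is assumed to exist with matching columns):

```sql
-- One-time load sketch: run PUT from SnowSQL on the machine holding the download.
PUT file:///tmp/italy_province.csv @%italy_province_stats;

-- COPY from the table's own stage into the table.
COPY INTO italy_province_stats
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"');
```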

### Creating a Snowflake COVID19 DB Share

- See Snowflake's specific instructions for creating the DB share
- The DB share is read-only once created
- You can name the consumer database anything, so each client can use a different database name for this share
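The provider/consumer flow looks roughly like this; all account, database, and share names below are placeholders, so follow Snowflake's data-sharing documentation for the authoritative steps:

```sql
-- Provider side (sketch): expose the COVID19 database as a share.
CREATE SHARE covid19_share;
GRANT USAGE ON DATABASE covid19 TO SHARE covid19_share;
GRANT USAGE ON SCHEMA covid19.public TO SHARE covid19_share;
GRANT SELECT ON ALL TABLES IN SCHEMA covid19.public TO SHARE covid19_share;
ALTER SHARE covid19_share ADD ACCOUNTS = consumer_account;

-- Consumer side: the local database name is the consumer's choice.
CREATE DATABASE covid19 FROM SHARE provider_account.covid19_share;
```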

## Part 2: Installing the Block through the Marketplace

## Part 3: Customizing the Block

Once you have the necessary datasets available in your database and have installed the Block through the Marketplace, you will need to update the table references, column references, and SQL syntax to match your database. You can find examples of migrating syntax below, but you will likely need to make additional tweaks depending on the specific syntax of your database.

### NYT Data

- Snowflake Table Name: `"[DATABASE_NAME]"."PUBLIC"."NYT_US_COVID19"`
- Column name changes (BigQuery -> Snowflake):
  - `state_name` -> `state`
  - `confirmed_cases` -> `cases`
**BigQuery syntax.** This goes in a refinement file in the installed Marketplace project:

```lkml
include: "//block-covid19/views/*.view.lkml"

view: +nyt_data {
  derived_table: {
    sql: select * except (county, county_fips_code),
           case when county = 'Unknown' then concat(county, ' - ', state_name)
                else county end as county,
           case when county = 'Unknown' then NULL
                -- modifying the FIPS code to match other data
                when county = 'New York City' then 36125
                when county = 'Kansas City' then 29095
                else cast(county_fips_code as int64) end as fips
         from `bigquery-public-data.covid19_nyt.us_counties` ;;
    sql_trigger_value: SELECT COUNT(*) FROM `bigquery-public-data.covid19_nyt.us_counties` ;;
  }
}
```

**Snowflake syntax.** This goes in a refinement file in the installed Marketplace project:

```lkml
include: "//block-covid19/views/*.view.lkml"

view: +nyt_data {
  derived_table: {
    sql: SELECT date,
           case when county = 'Unknown' then concat(county, ' - ', state_name)
                else county end as county,
           state as state_name,
           -- modifying the FIPS code to match other data
           case when county = 'Unknown' then NULL
                when county = 'New York City' then 36125
                when county = 'Kansas City' then 29095
                else fips end as fips,
           cases as confirmed_cases,
           deaths as deaths
         FROM "[DATABASE_NAME]"."PUBLIC"."NYT_US_COVID19" ;;
    sql_trigger_value: SELECT COUNT(*) FROM "[DATABASE_NAME]"."PUBLIC"."NYT_US_COVID19" ;;
  }
}
```

### Johns Hopkins (JHU) Data

- Snowflake Table Name: `"[DATABASE_NAME]"."PUBLIC"."JHU_COVID_19"`
- **IMPORTANT: Granularity changes.** The BigQuery table is mostly one record per day, per longitude/latitude (info on cruise ships and null longitudes/latitudes make it messy). The Snowflake table is one record per longitude, latitude, date, *and* `case_type`, where `case_type` can be Confirmed, Deaths, Active, or Recovered. A subquery is needed to bring the Snowflake table to the same granularity as the BigQuery table: the Block uses lag functions, and if the two tables are not at the same grain the lagged values will be incorrect.
- Column name changes (BigQuery -> Snowflake)
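One way to express that subquery is conditional aggregation, collapsing the per-`case_type` rows back into one record per location and date. The location column names below are assumptions; check them against the actual `JHU_COVID_19` share before using this:

```sql
-- Sketch: pivot case_type rows into columns to match the BigQuery grain.
SELECT country_region, province_state, long, lat, date,
       SUM(CASE WHEN case_type = 'Confirmed' THEN cases END) AS confirmed,
       SUM(CASE WHEN case_type = 'Deaths'    THEN cases END) AS deaths
FROM "[DATABASE_NAME]"."PUBLIC"."JHU_COVID_19"
GROUP BY country_region, province_state, long, lat, date
```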