Taking a random sample of a table is often useful when the table is very large and you are still working in an exploratory or investigative mode. A query against a small fraction of the table runs much more quickly and consumes fewer resources.
Here are some example queries to select a random or pseudo-random subset of rows.
Using rand()
The RAND function returns a value in [0, 1), i.e. including zero but not 1, so you can sample a table by filtering rows on it.
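For example, here is a minimal sketch that keeps roughly ten percent of rows, reusing the spotify.spotify_artists table from the examples below (the 0.1 threshold is an assumed value, not from the original):

-- rand
select *
from spotify.spotify_artists
where rand() < 0.1

Because rand() is evaluated independently for each row, the returned fraction is only approximate. To get an exact number of rows instead, you can order by rand() and apply a limit, in the same shape as the farm_fingerprint query below.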
A hash is a deterministic mapping from one value to another, so it is not random, but it can appear random enough to produce a convincing sample. Use a hash function supported in BigQuery to produce a pseudo-random ordering of your table:
Farm fingerprint
-- farm_fingerprint
select *
from spotify.spotify_artists
order by farm_fingerprint(spotify_artists.artist_id)
limit 10
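Because the hash is deterministic, this query returns the same ten rows every time it runs, so the sample is repeatable, which is useful when you want to iterate on a query against a stable subset of the data.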
The TABLESAMPLE clause is currently in preview, but it has a major benefit: it doesn't require a full table scan, so it can be much cheaper and quicker than the methods above. The downside is that the percentage value must be a literal, making this query less flexible than the ones above.
Tablesample
-- tablesample
select *
from spotify.spotify_daily_tracks tablesample system (10 percent)
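Note that TABLESAMPLE works by picking a random subset of the table's data blocks rather than evaluating every row, which is why it avoids the full scan; as a result the sampled rows can change from run to run and the returned fraction is approximate rather than exact.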