
Data Formats

DeclanBuckley edited this page Dec 5, 2023 · 33 revisions


List of implemented data and meta-data formats:

  • JSON-stat.
  • CSV.
  • PX.
  • XLSX.
  • GeoJSON.
  • RDF work in progress.

JSON-stat.

JSON-stat is a simple, lightweight JSON format designed for all kinds of data disseminators. It is best suited to data visualization, mobile apps and open data initiatives.

JSON-stat - Query Schema.

The schema below defines how data may be queried using JSON-stat. See also the JSON-stat schema.

{
	"$schema": "http://json-schema.org/draft-04/schema#",
	"title": "JSON-stat 2.0 Dataset Query Schema",
	"id": "https://json-stat.org/format/schema/2.0/dataset.query.json",
	"description": "This is version 1.00 of the JSON-stat 2.0 Dataset Query Schema (2020-02-27 10:55)",

	"definitions": {
		"strarray": {
			"type": "array",
			"items": {
				"type": "string"
			},
			"uniqueItems": true
		},
		"version": {
			"type": "string",
			"enum": ["2.0"]
		},
		"extension": {
			"type": "object"
		},
		"category": {
			"type": "object",
			"properties": {
				"index": {
					"oneOf":
					[{
							"$ref": "#/definitions/strarray"
						}, {
							"type": "object",
							"additionalProperties": {
								"type": "number"
							}
						}
					]
				}
			},
			"additionalProperties": false
		}
	},
	"type": "object",
	"properties": {
		"class": {
			"type": "string",
			"enum": ["query"]
		},
		"version": {
			"$ref": "#/definitions/version"
		},
		"extension": {
			"$ref": "#/definitions/extension"
		},
		"id": {
			"$ref": "#/definitions/strarray"
		},
		"dimension": {
			"type": "object",
			"additionalProperties": {
				"type": "object",
				"properties": {
					"category": {
						"$ref": "#/definitions/category"
					}
				},
				"additionalProperties": false,
				"required": ["category"]
			}
		}
	},
	"additionalProperties": false,
	"required": ["version", "class", "id", "dimension"]
}
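
As an illustration, a query object that satisfies the schema above could be assembled as follows. This is a minimal sketch rather than PxStat's own code: the helper `build_query` and the dimension ids and category codes (`STATISTIC`, `S01`, `TLIST(A1)` and its years) are hypothetical placeholders.

```python
import json


def build_query(selection):
    """Build a JSON-stat 2.0 query object from {dimension id: [category codes]}.

    Hypothetical helper: it emits only the four properties the schema
    allows and requires (class, version, id, dimension).
    """
    for dim_id, codes in selection.items():
        # The schema's "strarray" definition requires unique items.
        if len(set(codes)) != len(codes):
            raise ValueError(f"duplicate category codes for {dim_id!r}")
    return {
        "class": "query",
        "version": "2.0",
        "id": list(selection),
        "dimension": {
            dim_id: {"category": {"index": list(codes)}}
            for dim_id, codes in selection.items()
        },
    }


# Placeholder dimension ids and category codes, for illustration only.
query = build_query({"STATISTIC": ["S01"], "TLIST(A1)": ["2017", "2018"]})
print(json.dumps(query, indent=2))
```

The sketch uses the array form of `index`; the schema also accepts an object that maps category codes to numbers.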

CSV.

CSV (Comma Separated Values) is a simple file format used to store tabular data, such as a spreadsheet or database.

PX.

PX is a plain file format for statistical files used by a large number of organisations and members of the PcAxis community. The latest version is PX 2013.

Please note that while PxStat follows the same mandatory requirements set in the PX 2013 format, there are some additional requirements and restrictions as outlined below.

PX Requirements.

| Name | Required | Default Value | Description | Language Specific | Example |
|---|---|---|---|---|---|
| CHARSET | N | | The character set used in the file | No | "UTF-8" |
| AXIS-VERSION | Y | | The version of PX | No | "2013" |
| LANGUAGE | N | "en" (configurable) | The main language of the file | No | "en" |
| LANGUAGES | N | | All languages used in the file. Required if more than one language is used | No | "en","ga" |
| CREATION-DATE | Y | | The date/time the PX file was created. Use YYYYMMDD HH:mm | No | "20200406 10:00" |
| MATRIX | Y | | The code to identify the dataset | No | "HPM08" |
| DECIMALS | Y | | The default number of decimal places for the data. This can be overridden for any dimension by setting the individual PRECISION value for the dimension | No | 2 |
| OFFICIAL-STATISTICS | N | "NO" | Indicates if the dataset contains official statistics | No | "YES" or "NO" |
| SUBJECT-AREA | Y | | The Product Name of the dataset | Yes | "Retail Sales - Monthly Series" |
| SUBJECT-CODE | Y | | The Product Code of the dataset | No | "RSMS" |
| CONTENTS | Y | | The Title of the dataset | Yes | "Retail Sales Index (NACE Rev 2)" |
| TITLE | Y | | The Title of the dataset including the list of Classifications and Time series | Yes | "Retail Sales Index (NACE Rev 2) - NACE Group, Month" |
| UNITS | Y | | The default unit of measurement for the data. Also set for each statistic dimension at a dimension level. A configurable value is used as a default (currently "-") | Yes | "Number" |
| STUB | Y | | The list of dimensions that make up the Y axis in the DATA array | Yes | "Month","Region" |
| HEADING | Y | | The list of dimensions that make up the X axis in the DATA array | Yes | "Age Group" |
| CONTVARIABLE | N | | The code used to identify the set of statistics. The name of the individual statistic is taken from CONTENTS if not set | Yes | "Statistic" |
| MAP | N | | Mapping data associated with a dimension. This is a URL with an absolute reference | Yes | MAP("Region")="https://cdn.cso.ie/client/2.0.0/map/en/ie_nuts2.json"; |
| CODES | N | | Multiple entries. The code of each dimension and of each item in that dimension | Yes | CODES("Sex")="-","1","2"; |
| VALUES | Y | | The value of each dimension and of each item in that dimension | Yes | VALUES("Sex")="Both sexes","Male","Female"; |
| TIMEVAL | N | | The code to identify the time dimension and each item. See below for allowed values | Yes | TIMEVAL("TLIST(A1)")=TLIST(A1),"2017","2018"; |
| DOMAIN | N | | The codes for the classifications | Yes | DOMAIN("Region")="C02196V04140"; |
| UNITS | N | | The units for a specific dimension | Yes | UNITS("Exports")="Millions"; |
| SOURCE | Y | | The copyright source of the data. Required by PxStat | Yes | "Central Statistics Office, Ireland" |
| NOTEX | N | | Notes to describe the dataset. Can be multiple lines, each of no more than 256 characters | Yes | |
| PRECISION | N | | The number of decimal places applied specifically to each statistic | Yes | PRECISION("STATISTIC","Growth rate")=2; |
| DATA | Y | | The data point values in row-major order of the dimensions. Defaults to ".." when no data is supplied. The value ".." is configurable in PxStat | No | |
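
To illustrate the statement shapes listed in the Example column, the sketch below splits a single-line PX statement into its keyword, optional sub-keys and values, and parses a CREATION-DATE value. It is an assumption-laden sketch, not the PxStat parser: the function names are hypothetical, and it does not handle multi-line values or language-specific variants of a keyword.

```python
import re
from datetime import datetime

# KEYWORD, optional ("sub","keys"), "=", then everything up to the final ";".
_STATEMENT = re.compile(r'^([A-Z][A-Z0-9-]*)(?:\((.*?)\))?=(.*);$')
# A double-quoted item, or an unquoted item such as 2 or TLIST(A1).
_ITEM = re.compile(r'"([^"]*)"|([^",\s][^",]*)')


def _items(text):
    """Return the comma-separated items of text with their quotes removed."""
    return [
        m.group(1) if m.group(1) is not None else m.group(2).strip()
        for m in _ITEM.finditer(text)
    ]


def parse_statement(line):
    """Split one single-line PX statement into (keyword, subkeys, values)."""
    match = _STATEMENT.match(line.strip())
    if match is None:
        raise ValueError(f"not a single-line PX statement: {line!r}")
    keyword, subkeys, raw = match.groups()
    return keyword, _items(subkeys or ""), _items(raw)


def parse_creation_date(value):
    """Parse a CREATION-DATE value written as YYYYMMDD HH:mm."""
    return datetime.strptime(value, "%Y%m%d %H:%M")


for line in [
    'CODES("Sex")="-","1","2";',
    'PRECISION("STATISTIC","Growth rate")=2;',
    'SOURCE="Central Statistics Office, Ireland";',
]:
    print(parse_statement(line))
print(parse_creation_date("20200406 10:00"))
```

All values are returned as strings; converting entries such as DECIMALS or PRECISION to integers is left to the caller.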

Allowed TIMEVAL values:

| Name | Usage |
|---|---|
| "TLIST(A1)" | Use for annual |
| "TLIST(H1)" | Use for half-yearly |
| "TLIST(Q1)" | Use for quarterly |
| "TLIST(M1)" | Use for monthly |
| "TLIST(W1)" | Use for weekly |
| "TLIST(D1)" | Use for daily |

XLSX.

XLSX is the Office Open XML spreadsheet format, supported by most modern spreadsheet software such as Microsoft Excel, Google Sheets, OpenOffice and LibreOffice.

GeoJSON.

GeoJSON is an open standard format designed for representing simple geographical features, along with their non-spatial attributes. It is based on JSON, the JavaScript Object Notation. The features include points (addresses and locations), line strings (streets, highways and boundaries), polygons (countries, provinces, tracts of land), and multi-part collections of these types.
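
As a small illustration of the structure described above, the sketch below builds a feature collection containing one point feature. The helper names, the coordinates and the properties are made up for the example and are not taken from PxStat.

```python
import json


def point_feature(longitude, latitude, properties):
    """Return a GeoJSON Feature with Point geometry.

    GeoJSON positions are ordered longitude first, then latitude.
    """
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
        "properties": dict(properties),  # the non-spatial attributes
    }


def feature_collection(features):
    """Wrap a sequence of features in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


# Approximate coordinates and made-up properties, for illustration only.
collection = feature_collection([point_feature(-6.26, 53.35, {"name": "Dublin"})])
print(json.dumps(collection, indent=2))
```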

RDF work in progress.

The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model.

It has come to be used as a general method for conceptual description or modelling of information that is implemented in web resources, using a variety of syntax notations and data serialisation formats. It is also used in knowledge management applications.

RDF was adopted as a W3C recommendation in 1999. The RDF 1.0 specification was published in 2004, and the RDF 1.1 specification in 2014.
