Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

# Should {readepi} be {readapi} ?? #70

Open
bahadzie opened this issue Jan 19, 2024 · 1 comment
Open

# Should {readepi} be {readapi} ?? #70

bahadzie opened this issue Jan 19, 2024 · 1 comment

Comments

@bahadzie
Copy link
Member

bahadzie commented Jan 19, 2024

readepi provides functions for importing epidemiological data into R from common health information systems.
README.Rmd - Thibaut Jombart commit 17 months ago.

tldr

  • Reduce the scope and focus of readepi to return data from web API data sources
  • Use ideas from DBI to create an interface specification and test suite for accessing web APIs
  • Implement the interface speficiation for Redcap, DHIS, Fingertips, GoData and ColOpenData

More...

Ideas from DBI

  • Define an interface specification for uniform and consistent data/metadata retrieval from web APIs as a front-end R package
  • Implement a back-end package that meets the interface specification for each web API (e.g. redcap) that is a data source
  • Define a test suite to guarantee back-end packages have implemented the interface as expected

In addition the following are nice to have in R

  • Return data/metadata as dataframes
  • Subset data at source if possible

The Interface

A simple interface for getting the data. The general consensus is around the following pseudo-code

datasource <- readepi(
  connect, # required to establish connection
  filter, # rows
  select # columns
)
# Up to this point, no API request has been made. The previous steps of the pipeline serve to create the request we will make now
data <- fetch_data(datasource)

# OR 

data <- connect_api(credentials) |>
  select() |>
  filter() |>
  # Up to this point, no API request has been made. The previous steps of the pipeline serve to create the request we will make now
  fetch_data()

Discussion

A case was made for removing functionality related to importing data from files because
readepi was only providing a thin wrapper around {rio}.

The same case can be made for removing functionality related to importing data from
relational databases because readepi is a wrapper around DBI.

After a quick review of DBI, my initial proposal was to implement readepi as a backend
to DBI. Upon investigation only 7 of the 45 functions defined in the DBI
spec are relevant to the concept of readepi. DBI is a spec tailored mainly to accessing
data in SQL based relational databases. Its interface abstraction leaks leaks some of this detail.

The semantics of web APIs are not the same as SQL semantics. Ideally any interface should be
implemented in the context of the R ecosystem and made to feel and behave as R objects
are expected to behave. This is where I believe readepi/readpapi should be positioned.

Appendix

A. readepi interface over time

NAMESPACE
read_from_file, read_from_ms_sql_server, read_from_redcap, read_stuff, readepi

NAMESPACE
readCredentials, read_from_file, read_from_ms_sql_server, read_from_redcap,
read_stuff, readepi, show_example_file

NAMESPACE
readCredentials, read_from_file, read_from_ms_sql_server, read_from_redcap,
read_stuff, readepi, showTables, show_example_file, subsetFields, subsetRecords

NAMESPACE
clade_assignment, genome_assembly, readCredentials, read_from_dhis2,
read_from_file, read_from_ms_sql_server, read_from_redcap, read_stuff,
readepi, showTables, show_example_file, subsetFields, subsetRecords

NAMESPACE
clade_assignment, genome_assembly, readCredentials, read_from_file,
read_from_ms_sql_server, read_from_redcap, read_stuff, readepi, showTables,
show_example_file, subsetFields, subsetRecords

NAMESPACE
clade_assignment, genome_assembly, readCredentials, read_from_dhis2,
read_from_file, read_from_ms_sql_server, read_from_redcap, read_stuff,
readepi, showTables, show_example_file, subsetFields, subsetRecords

NAMESPACE
getExtension, read_credentials, read_from_dhis2, read_from_file,
read_from_ms_sql_server, read_from_redcap, readepi, show_example_file,
show_tables, subset_fields, subset_records

NAMESPACE
getExtension, install_odbc_driver, read_credentials, read_from_dhis2,
read_from_file, read_from_ms_sql_server, read_from_redcap, readepi,
show_example_file, show_tables, subset_fields, subset_records

NAMESPACE
check_dhis2_attributes, getExtension, install_odbc_driver, read_credentials,
read_from_dhis2, read_from_file, read_from_ms_sql_server, read_from_redcap,
readepi, show_example_file, show_tables, subset_fields, subset_records

NAMESPACE
check_dhis2_attributes, getExtension, get_data_element_groups,
get_data_elements, get_data_sets, get_organisation_units, install_odbc_driver,
read_credentials, read_from_dhis2, read_from_file, read_from_ms_sql_server,
read_from_redcap, readepi, show_example_file, show_tables, subset_fields,
subset_records

NAMESPACE
check_dhis2_attributes, getExtension, get_data_elements, get_data_sets,
get_fingertips_metadata, get_indicatorID_from_domainID,
get_indicatorID_from_domainName, get_indicatorID_from_indicatorName,
get_indicatorID_from_profile, get_organisation_units, install_odbc_driver,
read_credentials, read_from_dhis2, read_from_file, read_from_fingertips,
read_from_ms_sql_server, read_from_redcap, readepi, show_example_file,
show_tables, subset_fields, subset_records

NAMESPACE
check_dhis2_attributes, getExtension, get_data_elements, get_data_sets, get_fingertips_metadata, get_indicatorID_from_domainID, get_indicatorID_from_domainName, get_indicatorID_from_indicatorName, get_indicatorID_from_profile, get_organisation_units, read_credentials, read_from_dhis2, read_from_file, read_from_fingertips, read_from_ms_sql_server, read_from_redcap, readepi, show_example_file, show_tables, subset_fields, subset_records

NAMESPACE
check_dhis2_attributes, getExtension, get_data_elements, get_data_sets, get_fingertips_metadata, get_indicatorID_from_domainID, get_indicatorID_from_domainName, get_indicatorID_from_indicatorName, get_indicatorID_from_profile, get_organisation_units, read_credentials, read_from_dhis2, read_from_file, read_from_fingertips, read_from_ms_sql_server, read_from_redcap, read_multiple_files, readepi, show_example_file, show_tables, subset_fields, subset_records

NAMESPACE
check_dhis2_attributes, get_data_elements, get_data_sets, get_extension, get_fingertips_metadata, get_ind_id_from_domain_id, get_ind_id_from_domain_name, get_ind_id_from_ind_name, get_ind_id_from_profile, get_organisation_units, read_credentials, read_from_dhis2, read_from_file, read_from_fingertips, read_from_ms_sql_server, read_from_redcap, read_multiple_files, readepi, show_example_file, show_tables, subset_fields, subset_records

NAMESPACE
check_dhis2_attributes, connect_to_server, detect_separator, fetch_data_from_query, get_data_elements, get_data_sets, get_extension, get_fingertips_metadata, get_ind_id_from_domain_id, get_ind_id_from_domain_name, get_ind_id_from_ind_name, get_ind_id_from_profile, get_organisation_units, identify_table_name, read_credentials, read_from_dhis2, read_from_file, read_from_fingertips, read_from_redcap, read_multiple_files, readepi, show_example_file, show_tables, sql_select_data, sql_select_entire_dataset, sql_select_fields_only, sql_select_records_and_fields, sql_select_records_only, sql_server_read_data, visualise_table

NAMESPACE
check_dhis2_attributes, connect_to_server, detect_separator, dhis2_subset_fields, dhis2_subset_records, fetch_data_from_query, get_base_name, get_data_elements, get_data_sets, get_extension, get_fingertips_metadata, get_ind_id_from_domain_id, get_ind_id_from_domain_name, get_ind_id_from_ind_name, get_ind_id_from_profile, get_organisation_units, get_read_file_args, get_read_fingertips_args, get_relevant_data_elt_group, get_relevant_dataset, get_relevant_organisation_unit, identify_table_name, import_redcap_data, read_credentials, read_from_dhis2, read_from_file, read_from_fingertips, read_from_redcap, read_multiple_files, readepi, redcap_get_results, redcap_read, redcap_read_fields, redcap_read_records, redcap_read_rows_columns, show_example_file, show_tables, sql_select_data, sql_select_entire_dataset, sql_select_fields_only, sql_select_records_and_fields, sql_select_records_only, sql_server_read_data, visualise_table

NAMESPACE
check_dhis2_attributes, connect_to_server, detect_separator, dhis2_subset_fields, dhis2_subset_records, fetch_data_from_query, fingertips_subset_columns, fingertips_subset_rows, get_base_name, get_data_elements, get_data_sets, get_extension, get_fingertips_metadata, get_ind_id_from_domain_id, get_ind_id_from_domain_name, get_ind_id_from_ind_name, get_ind_id_from_profile, get_organisation_units, get_profile_name, get_read_file_args, get_read_fingertips_args, get_relevant_data_elt_group, get_relevant_dataset, get_relevant_organisation_unit, identify_table_name, import_redcap_data, read_credentials, read_from_dhis2, read_from_file, read_from_fingertips, read_from_redcap, read_multiple_files, readepi, redcap_get_results, redcap_read, redcap_read_fields, redcap_read_records, redcap_read_rows_columns, show_example_file, show_tables, sql_select_data, sql_select_entire_dataset, sql_select_fields_only, sql_select_records_and_fields, sql_select_records_only, sql_server_read_data, visualise_table

NAMESPACE
get_dhis2_attributes, readepi, show_example_file, show_tables,

B. DBI Reference Functions

*functions that are relevant to {readepi}

Connecting and disconnecting

*dbConnect(drv, ...) - Create a connection to a DBMS

*dbGetInfo(dbObj, ...) - Get DBMS metadata

dbDisconnect(conn, ...) - Disconnect (close) a connection

dbCanConnect(drv, ...) - Check if a connection to a DBMS can be established

dbIsValid(dbObj, ...) - Is this DBMS object still valid?

dbIsReadOnly(dbObj, ...) - Is this DBMS object read only?

dbGetConnectArgs(drv, eval = TRUE, ...) - Get connection arguments

Tables

*dbReadTable(conn, name, ...) - Read database tables as data frames

*dbListTables(conn, ...) - List remote tables

*dbListFields(conn, name, ...) - List field names of a remote table

*dbExistsTable(conn, name, ...) - Does a table exist?

*dbListObjects(conn, prefix = NULL, ...) - List remote objects

dbWriteTable(conn, name, value, ...) - Copy data frames to database tables

dbCreateTable(conn, name, fields, ..., row.names = NULL, temporary = FALSE) - Create a table in the database

dbAppendTable(conn, name, value, ..., row.names = NULL) - Insert rows into a table

dbRemoveTable(conn, name, ...) Remove a table from the database

sqlRownamesToColumn(df, row.names = NA) / sqlColumnToRownames(df, row.names = NA) - Convert row names back and forth between columns

Queries and statements

dbGetQuery(conn, statement, ...) - Retrieve results from a query

dbExecute(conn, statement, ...) - Change database state

Results

dbSendQuery(conn, statement, ...) - Execute a query on a given database connection

dbSendStatement(conn, statement, ...) - Execute a data manipulation statement on a given database connection

dbBind(res, params, ...) / dbBindArrow(res, params, ...) - Bind values to a parameterized/prepared statement

dbFetch(res, n = -1, ...) / fetch(res, n = -1, ...) - Fetch records from a previously executed query

dbGetRowCount(res, ...) - The number of rows fetched so far

dbGetRowsAffected(res, ...) - The number of rows affected

dbGetStatement(res, ...) - Get the statement associated with a result set

dbHasCompleted(res, ...) - Completion status

dbColumnInfo(res, ...) - Information about result types

dbClearResult(res, ...) -Clear a result set

Transactions

dbBegin(conn, ...) / dbCommit(conn, ...) / dbRollback(conn, ...)- Begin/commit/rollback SQL transactions

dbWithTransaction(conn, code, ...) / dbBreak() - Self-contained SQL transactions

SQL

SQL(x, ..., names = NULL) - SQL quoting

dbDataType(dbObj, obj, ...) - Determine the SQL data type of an object

dbQuoteIdentifier(conn, x, ...) - Quote identifiers

dbQuoteLiteral(conn, x, ...) - Quote literal values

dbQuoteString(conn, x, ...) - Quote literal strings

sqlCreateTable(con, table, fields, row.names = NA, temporary = FALSE, ...) - Compose query to create a simple table

sqlAppendTable(con, table, values, row.names = NA, ...) - Compose query to insert rows into a table

sqlData(con, value, row.names = NA, ...) - Convert a data frame into form suitable for upload to an SQL database

dbUnquoteIdentifier(conn, x, ...) - Unquote identifiers

sqlInterpolate(conn, sql, ..., .dots = list()) - Safely interpolate values into an SQL string

Classes

DBIObject-class

DBIDriver-class

DBIConnection-class

DBIResult-class

DBIConnector-class

@chartgerink
Copy link
Member

Hey @bahadzie - thanks for this extensive thought process! I am still getting familiar with everything, so please do tell me if I get things wrong. As a newbie here, I thought readepi's goal was to harmonize reading in data from various sources. That way, I would only need to know how to use readepi regardless of whether I'm reading in data from an API or a relational database.

Did you get feedback from people that they were confused by readepi's goal or preferred a different definition? Or is this from your (+your team's) assessment of doing the work creating the package?

On the topic of scope reduction: My experience with APIs is that it's very hard to know what you'll be getting back, if there is no known standard being used. There might be only a handful of sources or standards in epidemiology (I'm unaware of the landscape), which you are better in assessing. So my question for a redefinition to focus on APIs would be: Are the APIs we would want to read known or unknown in advance?

A generic readapi would of course still be interesting, but does that then still fit the intended purpose with which readepi was started?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants