Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

converting tibble to xts #418

Open
linc0380 opened this issue Mar 12, 2024 · 3 comments
Open

converting tibble to xts #418

linc0380 opened this issue Mar 12, 2024 · 3 comments
Labels
question User questions and open questions about development

Comments

@linc0380
Copy link

Description

I am trying to convert a tibble to an xts object. The conversion works fine with as.xts(). The behaviour using xts() is unexpected, however.

A dataset in xlsx format is downloaded from the St. Louis Fed data archive. Go to https://fred.stlouisfed.org/graph/?g=1icUr, click on Download, and select Excel. I modified the resulting xlsx sheet by deleting all auxiliary information in rows 1 to 10, renaming the date column date. That file is saved as SP500.xlsx. I am not sure how I can make this file available to save you from replicating these steps.

The xlsx file is imported by means of

data <- read_excel("SP500.xlsx", na = "#N/A")

The resulting structure is

!> str(data)
tibble [2,608 × 2] (S3: tbl_df/tbl/data.frame)
  $ date : POSIXct[1:2608], format: "2014-03-12" "2014-03-13" ...
  $ SP500: num [1:2608] 1868 1846 1841 1859 1872 ...

Expected behavior

The following conversion fine:

data <- read_excel('"SP500.xlsx", na = "#N/A")
str(data)
data
data <- as.xts(data)
str(data)
data

resulting in

> str(data)                                                                                                                                                                                                                                                                  
 An xts object on 2014-03-12 / 2024-03-08 containing:
   Data:    double [2608, 1]
   Columns: SP500
   Index:   POSIXct,POSIXt [2608] (TZ: "UTC")
 > data2                                                                                                                                                                                                                                                                      
              SP500
 2014-03-12 1868.20
 2014-03-13 1846.34
 2014-03-14 1841.13
 2014-03-17 1858.83
 2014-03-18 1872.25
 2014-03-19 1860.77
 2014-03-20 1872.01
 2014-03-21 1866.52
 2014-03-24 1857.44
 2014-03-25 1865.62
        ...
 2024-02-26 5069.53
 2024-02-27 5078.18
 2024-02-28 5069.76
 2024-02-29 5096.27
 2024-03-01 5137.08
 2024-03-04 5130.95
 2024-03-05 5078.65
 2024-03-06 5104.76
 2024-03-07 5157.36
 2024-03-08 5123.69

Minimal, reproducible example

Yet using

data <- read_excel('"SP500.xlsx", na = "#N/A")
str(data)
data
data <- xts(x = data$SP500, order.by = data$date)
str(data)
data

results in an xts object with the same structure yet with output

 > str(data)                                                                                                                                                                                                                                                                  
 An xts object on 2014-03-12 / 2024-03-08 containing:
   Data:    double [2608, 1]
   Index:   POSIXct,POSIXt [2608] (TZ: "UTC")
 > data                                                                                                                                                                                                                                                                      
            m.c.seq.row..seq.n...seq.col..drop...FALSE.
 2014-03-12                                     1868.20
 2014-03-13                                     1846.34
 2014-03-14                                     1841.13
 2014-03-17                                     1858.83
 2014-03-18                                     1872.25
 2014-03-19                                     1860.77
 2014-03-20                                     1872.01
 2014-03-21                                     1866.52
 2014-03-24                                     1857.44
 2014-03-25                                     1865.62
        ...
 2024-02-26                                     5069.53
 2024-02-27                                     5078.18
 2024-02-28                                     5069.76
 2024-02-29                                     5096.27
 2024-03-01                                     5137.08
 2024-03-04                                     5130.95
 2024-03-05                                     5078.65
 2024-03-06                                     5104.76
 2024-03-07                                     5157.36
 2024-03-08                                     5123.69

I am not sure what the first output line means.

Session Info

!> sessionInfo()
 R version 4.3.2 (2023-10-31)
 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: Debian GNU/Linux 11 (bullseye)

 Matrix products: default
 BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
 LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.13.so;  LAPACK version 3.9.0

 locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

 time zone: Europe/Berlin
 tzcode source: system (glibc)

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
 [1] readxl_1.4.2 xts_0.13.1   zoo_1.8-12

 loaded via a namespace (and not attached):
  [1] vctrs_0.6.5      httr_1.4.5       cli_3.6.1        rlang_1.1.0
  [5] stringi_1.7.6    swirl_2.4.5      glue_1.6.2       RCurl_1.98-1.6
  [9] brio_1.1.3       fansi_1.0.4      grid_4.3.2       cellranger_1.1.0
 [13] tibble_3.2.1     bitops_1.0-7     yaml_2.3.5       lifecycle_1.0.3
 [17] stringr_1.5.0    compiler_4.3.2   pkgconfig_2.0.3  testthat_3.1.4
 [21] lattice_0.21-8   digest_0.6.29    R6_2.5.1         utf8_1.2.3
 [25] pillar_1.9.0     magrittr_2.0.3   tools_4.3.2
@joshuaulrich
Copy link
Owner

For what it's worth, you can get data from FRED in an xts object using quantmod.

data <- quantmod::getSymbols("SP500", src = "FRED", auto.assign = FALSE)

The first line in the output from xts(x = data$SP500, order.by = data$date) is the column name, which is a monstrosity because data$SP500 is a vector and doesn't have a column name. You can use xts(data["SP500"], data$date) if you want the column name to be "SP500".

I can see how that weird column name is confusing, so I'll change the code so the result of xts() doesn't add a column name for a vector input.

@joshuaulrich joshuaulrich added the question User questions and open questions about development label Mar 12, 2024
@linc0380
Copy link
Author

Thanks for this! Your answer is very illuminating indeed. I didn't realise that data$SP500 returns a simple numeric vector while data["SP500"] preserves the type (and its name!). That a terrific lesson to have learnt today. I wasn't aware of quantmod's capabilities either. Thanks a million for the pointer!

@dimfalk
Copy link

dimfalk commented Jul 20, 2024

I can see how that weird column name is confusing, so I'll change the code so the result of xts() doesn't add a column name for a vector input.

Looking forward for the column name to be adjusted 😏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question User questions and open questions about development
Projects
None yet
Development

No branches or pull requests

3 participants