Amazon Linux 2023 target #39

Open
grantmcdermott opened this issue Aug 10, 2023 · 9 comments

@grantmcdermott

Hi @Enchufa2,

(Apologies in advance if this is out of scope.)

Back in March, Amazon Linux 2023 was launched. tl;dr this is the successor distro to AL2 and will become the default for much of AWS infrastructure (incl. a lot of internal tools).

AL2023 is closer to Fedora than its predecessor and I noticed that copr recently added build targets for it. fedora-copr/copr#2666

Would it be possible to add AL2023 support to cran2copr?

Thanks for considering.

@Enchufa2
Member

Thanks, I would be happy to support AL2023. There are two limitations though:

  1. Disk space in Fedora Copr infrastructure.
  2. Important dependencies that are unavailable in AL2023.

Disk space has been the main limitation in the past for activating new chroots in this project, so I should ask the Fedora Copr team first. If we have their ok, then we could look into the second limitation. For instance, I just tried:

$ podman run --rm -it amazonlinux:2023
$ dnf -q install gdal proj libarrow
Error: Unable to find a match: gdal proj libarrow

So my question is: how do you compile CRAN packages without these dependencies (probably others too)? Is there any EPEL-like repository for AL2023 providing them? If not, would you be willing to help maintain them in a separate repo (maybe another Fedora Copr project)? My guess is that most of them should work just by running the Fedora spec for the AL2023 target, but some adaptations may be needed.

@Enchufa2
Member

Hi, @praiskup @FrostyX, we are talking about the possibility of enabling the AL2023 chroot in iucar/cran. Before assessing feasibility in terms of dependencies, and knowing that disk space has been a problem in the past, I would like to ask whether this is feasible from the infrastructure's point of view.

@praiskup

From that perspective, all the data in your iucar namespace is <= 250G (per backend stats), which includes three Fedora chroots. We have just two AmazonLinux chroots now, so I'd expect <= 200G of new data. That shouldn't cause any technical problems, so go ahead.

@Enchufa2
Member

Thanks, Pavel, we will continue the discussion from there then. Feel free to unsubscribe.

@grantmcdermott
Author

grantmcdermott commented Aug 11, 2023

> So my question is: how do you compile CRAN packages without these dependencies (probably others too)? Is there any EPEL-like repository for AL2023 providing them? If not, would you be willing to help maintain them in a separate repo (maybe another Fedora Copr project)? My guess is that most of them should work just by running the Fedora spec for the AL2023 target, but some adaptations may be needed.

In truth, I am less familiar with both Fedora and AL2023 than I am with other distros. But I believe the preferred approach is to first raise a feature request (FR) on the main AL2023 GitHub repo. From a quick search, I can see that GDAL has already been requested. I didn't see PROJ or arrow, but I am happy to request those. (A current list of all AL2023 packages is available here.)

As regards EPEL, that is not supported. I am certainly happy to help maintain a separate Copr project for the libraries that we can't pass through the main FRs, but may need some handholding to set it up.

@Enchufa2
Member

Update after moving the repo to the cran4linux org and cleaning up the sysreqs a bit:

deps <- read.csv("sysreqs/sysreqs.csv", na.strings="") |>
  subset(build, fedora_rhel, drop=TRUE)

Let's ask AL2023 to install all of them and thus to report what's missing:

out <- suppressWarnings(system2("podman", c(
  "run --rm -it -v $PWD/sysreqs:/mnt:z -w /mnt",
  "public.ecr.aws/amazonlinux/amazonlinux:2023",
  "dnf install -q", paste(shQuote(deps), collapse=" ")
), stdout=TRUE, stderr=TRUE)) |> print()
#> [1] "Error: Unable to find a match: libarrow-dataset-devel bwidget coin-or-Clp-devel coin-or-SYMPHONY-devel devscripts-checkbashisms /usr/bin/exiftool ffmpeg-free-devel gdal-devel geos-devel glpk-devel hdf5-devel hiredis-devel jags-devel leptonica-devel libsodium-devel mecab-devel mariadb-devel netcdf-devel openbugs glibc-devel(x86-32) opencv-devel pocl-devel poppler-cpp-devel poppler-data poppler-glib-devel proj-devel QuantLib-devel redland-devel scala tesseract-devel tiledb-devel udunits2-devel zeromq-devel\r"
#> attr(,"status")
#> [1] 1
out <- sub("\\r", "", strsplit(out, ": ")[[1]][3])
unavailable <- strsplit(out, " ")[[1]] |> print()
#>  [1] "libarrow-dataset-devel"   "bwidget"                 
#>  [3] "coin-or-Clp-devel"        "coin-or-SYMPHONY-devel"  
#>  [5] "devscripts-checkbashisms" "/usr/bin/exiftool"       
#>  [7] "ffmpeg-free-devel"        "gdal-devel"              
#>  [9] "geos-devel"               "glpk-devel"              
#> [11] "hdf5-devel"               "hiredis-devel"           
#> [13] "jags-devel"               "leptonica-devel"         
#> [15] "libsodium-devel"          "mecab-devel"             
#> [17] "mariadb-devel"            "netcdf-devel"            
#> [19] "openbugs"                 "glibc-devel(x86-32)"     
#> [21] "opencv-devel"             "pocl-devel"              
#> [23] "poppler-cpp-devel"        "poppler-data"            
#> [25] "poppler-glib-devel"       "proj-devel"              
#> [27] "QuantLib-devel"           "redland-devel"           
#> [29] "scala"                    "tesseract-devel"         
#> [31] "tiledb-devel"             "udunits2-devel"          
#> [33] "zeromq-devel"

And finally, let's ask Fedora for the source package names of these:

pkgs <- system2("dnf", c(
  "rq -q --qf '%{source_name}'",
  paste("--whatprovides", shQuote(unavailable))
), stdout=TRUE) |> print()
#>  [1] "QuantLib"            "bwidget"             "coin-or-Clp"        
#>  [4] "coin-or-SYMPHONY"    "devscripts"          "ffmpeg"             
#>  [7] "gdal"                "geos"                "glibc"              
#> [10] "glpk"                "hdf5"                "hiredis"            
#> [13] "jags"                "leptonica"           "libarrow"           
#> [16] "libsodium"           "mariadb"             "mecab"              
#> [19] "netcdf"              "openbugs"            "opencv"             
#> [22] "perl-Image-ExifTool" "pocl"                "poppler"            
#> [25] "poppler-data"        "proj"                "redland"            
#> [28] "scala"               "tesseract"           "tiledb"             
#> [31] "udunits2"            "zeromq" 

Therefore:

  • There are a few packages that are strongly required: gdal, geos, proj, udunits2. If these are not available, we cannot build the geospatial stack or anything that depends on it, which amounts to a lot of packages.
  • Some are nice to have, for obvious reasons: ffmpeg, hdf5, hiredis, libarrow, mariadb. We don't lose many packages without them, but some of those packages are pretty useful.
  • The others are far less important in principle, and only a handful of packages are lost. It depends of course on whether you rely on them.

In other words, if you manage to get the geospatial packages accepted, that would be enough to activate the AL2023 chroot. :)
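
For anyone who wants to repeat the "what's missing" check without R, here is a minimal shell sketch of the parsing step above. The function name `parse_unavailable` is hypothetical (it is not part of cran2copr), and it assumes dnf reports all missing names on a single "Unable to find a match" line, as in the output shown earlier.

```shell
# Hypothetical helper: read dnf output on stdin and print the packages
# that dnf could not find, one per line.
parse_unavailable() {
  # 1. drop the carriage return that `podman run -t` appends
  # 2. keep only the error line, stripped of its prefix
  # 3. put each package name on its own line
  tr -d '\r' |
    sed -n 's/^.*Unable to find a match: //p' |
    tr -s ' ' '\n'
}

# Example usage (assumes podman is available; not executed here):
#   podman run --rm public.ecr.aws/amazonlinux/amazonlinux:2023 \
#     dnf install -q --assumeno gdal-devel proj-devel 2>&1 | parse_unavailable
```

Lines that do not contain the error message produce no output, so an empty result means dnf found a match for everything.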

@grantmcdermott
Author

grantmcdermott commented Aug 25, 2023

Super, thanks @Enchufa2. I'm on vacation now but will ping the AL2023 repo with requests when I get a chance!

@grantmcdermott
Author

grantmcdermott commented Feb 1, 2024

Just a minor update on this:

The arrow homepage includes install instructions for binary artifacts on AL2023 (scroll down towards the bottom).

Unfortunately, by default, this pulls in the latest release of libarrow & co. So, there's a good chance that there will be a version mismatch with the R package release (which is normally a couple of months behind for some reason). Nonetheless, I managed to adapt their instructions in a way that pulls in the appropriate arrow system library version(s) based on the available CRAN release:

# Note: No sudo because I assume you are root

# preliminaries: install R and some system deps
dnf install -y R libcurl-devel openssl-devel


# Set env vars for matching up the R and system arrow versions

ARCH=$(uname -m)
R_ARROW_VER=`Rscript -e 'cat(available.packages(filters = list(function(db) db[db[, "Package"] == "arrow", ]), repos = "https://cran.r-project.org")[["Version"]])'`
R_ARROW_VER="${R_ARROW_VER%.*}-1"
ARROW_URL="https://apache.jfrog.io/artifactory/arrow/amazon-linux/2023/${ARCH}/Packages"

# Install

ARROW_ENDPOINT=${R_ARROW_VER}.amzn2023.noarch.rpm
dnf install -y ${ARROW_URL}/apache-arrow-release-${ARROW_ENDPOINT}

ARROW_ENDPOINT=${R_ARROW_VER}.amzn2023.${ARCH}.rpm
dnf install -y ${ARROW_URL}/arrow-devel-${ARROW_ENDPOINT} # For C++
dnf install -y ${ARROW_URL}/arrow-glib-devel-${ARROW_ENDPOINT} # For GLib (C)
dnf install -y ${ARROW_URL}/arrow-acero-devel-${ARROW_ENDPOINT} # For Apache Arrow Acero
dnf install -y ${ARROW_URL}/arrow-dataset-devel-${ARROW_ENDPOINT} # For Apache Arrow Dataset C++
dnf install -y ${ARROW_URL}/arrow-dataset-glib-devel-${ARROW_ENDPOINT} # For Apache Arrow Dataset GLib (C)

# Note: I couldn't get the flight libs to build (see comments below)
# dnf install -y ${ARROW_URL}/arrow-flight-devel-${ARROW_ENDPOINT} # For Apache Arrow Flight C++
# dnf install -y ${ARROW_URL}/arrow-flight-glib-devel-${ARROW_ENDPOINT} # For Apache Arrow Flight GLib (C)
# dnf install -y ${ARROW_URL}/arrow-flight-sql-devel-${ARROW_ENDPOINT} # For Apache Arrow Flight SQL C++
# dnf install -y ${ARROW_URL}/arrow-flight-sql-glib-devel-${ARROW_ENDPOINT} # For Apache Arrow Flight SQL GLib (C)

dnf install -y ${ARROW_URL}/gandiva-devel-${ARROW_ENDPOINT} # For Apache Gandiva C++
dnf install -y ${ARROW_URL}/gandiva-glib-devel-${ARROW_ENDPOINT} # For Apache Gandiva GLib (C)
dnf install -y ${ARROW_URL}/parquet-devel-${ARROW_ENDPOINT} # For Apache Parquet C++
dnf install -y ${ARROW_URL}/parquet-glib-devel-${ARROW_ENDPOINT} # For Apache Parquet GLib (C)

Once that's done, installing and compiling the R arrow package works:

install.packages("arrow")

(Tested on the latest amazonlinux:2023 docker image.)
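
The version matching and the repeated install lines above can also be sketched as two small shell functions. This is only a sketch under stated assumptions: the function names `cran_to_rpm_version` and `arrow_rpm_urls` are hypothetical, the RPM release is assumed to always be `-1` (as in the script above), and the CRAN version is assumed to have either three or four components. The `${R_ARROW_VER%.*}` trimming in the script always drops the last component, which only gives the right result in the four-component case.

```shell
# Hypothetical helper: map a CRAN arrow version to the RPM version-release.
# Keeps major.minor.patch and appends the assumed release "-1".
cran_to_rpm_version() {
  printf '%s-1\n' "$(printf '%s' "$1" | cut -d. -f1-3)"
}

# Hypothetical helper: print one RPM URL per package name.
#   $1 = RPM version-release, $2 = architecture, remaining args = package names
# The body runs in a subshell so its variables do not leak into the caller.
arrow_rpm_urls() (
  ver=$1
  arch=$2
  shift 2
  base="https://apache.jfrog.io/artifactory/arrow/amazon-linux/2023/${arch}/Packages"
  for pkg in "$@"; do
    printf '%s/%s-%s.amzn2023.%s.rpm\n' "$base" "$pkg" "$ver" "$arch"
  done
)

# Example usage (as root on AL2023; not executed here):
#   VER=$(cran_to_rpm_version "$R_ARROW_VER")
#   dnf install -y $(arrow_rpm_urls "$VER" "$(uname -m)" arrow-devel parquet-devel)
```

The `apache-arrow-release` package is `noarch`, so it does not fit this URL pattern and still needs its own install line.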

Comments:

  1. One bummer (given the fact that it's Amazon Linux) is that this configuration doesn't come with S3 support enabled. I'm probably missing some step, but hopefully there's a reasonable solution.
  2. I couldn't get the arrow flight libs to install correctly due to an unresolved dependency version issue. (Related, I believe, to abseil-cpp.) On the release page there's a separate group of targets for, e.g., arrow14-flight* libs that you probably need to work with, but I didn't pursue this much further.
