This is a parking lot for code used in a hobby-only setup of Clear Linux OS on bare metal. Nothing useful for production here.
The use case is to set up a home lab that runs a YARN-managed Hadoop/HDFS/Spark cluster accessed through Zeppelin/RStudio. All software used will be open source. Hardware is whatever I had lying around or could get cheap.
The end goal is an air-gapped environment with a DevOps CI/CD pipeline that makes it easy to use open-source tools for AI/ML on health care data, all while conforming to state-of-the-art standards (e.g. FHIR).
- 1 laptop, upgraded to 32 GB RAM, 1 TB NVMe + 2 TB SSD (Clear Linux OS on an old Lenovo 81MU007NUS IdeaPad S145, 14.0" HD, Pentium 5405U 2.3 GHz)
- 10 SBCs with 8 GB RAM, 128 GB SSD: https://ark.intel.com/content/www/us/en/ark/products/87740/intel-nuc-kit-nuc5ppyh.html
- 1 GPU node: NVIDIA RTX 3070, 64 GB RAM, 4x 2 TB SSD, 1x Sabrient Rocket 2 TB, Ryzen 9 5950X
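
As a rough sizing aid, the hardware above can be tallied to see how much memory and raw disk the lab has in total before YARN or HDFS takes its share. This is only a sketch: the `NodeSpec` helper and the node names are made up for illustration, the SBC drives are assumed to be 128 GB each, and decimal units (1 TB = 1000 GB) are used throughout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical helper describing one class of machine in the lab."""
    name: str
    count: int
    ram_gb: int
    storage_gb: int  # raw storage per node, decimal units (1 TB = 1000 GB)


# Figures copied from the hardware list above.
NODES = [
    NodeSpec("laptop", count=1, ram_gb=32, storage_gb=1000 + 2000),
    # Assumption: each SBC has a 128 GB SSD.
    NodeSpec("sbc", count=10, ram_gb=8, storage_gb=128),
    NodeSpec("gpu-node", count=1, ram_gb=64, storage_gb=4 * 2000 + 2000),
]


def total_ram_gb(nodes):
    """Sum RAM across all machines, in GB."""
    return sum(n.count * n.ram_gb for n in nodes)


def total_storage_gb(nodes):
    """Sum raw storage across all machines, in GB."""
    return sum(n.count * n.storage_gb for n in nodes)


if __name__ == "__main__":
    print(f"Total RAM: {total_ram_gb(NODES)} GB")
    print(f"Total raw storage: {total_storage_gb(NODES)} GB")
```

These are raw totals only; the operating system, YARN overhead, and HDFS replication all reduce what is actually usable.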
- Anchormen: https://anchormen.nl/blog/big-data-services/spark-and-hdfs-with-kubernetes/
- The Deployment Bunny: https://deploymentbunny.com/2014/09/28/building-next-gen-datacenter-the-pelicase-portable-datacenter/
- Reddit: https://www.reddit.com/r/NUCLabs/comments/drblg9/sell_me_on_a_nuc_labcluster/
- Louis Aslett's Amazon Machine Images: https://www.louisaslett.com/RStudio_AMI/
- Google v. Oracle America: https://www.scotusblog.com/case-files/cases/google-llc-v-oracle-america-inc/
- Frank Pasquale's Data-Informed Duties in AI Development: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3503121
- Ubuntu Unleashed 2019 Edition; Matthew Helmke
- Deep Learning with R for Beginners; Hodnett, Wiley, et al.
- Machine Learning with R; Brett Lantz
- Mastering Spark with R; Luraschi, Kuo, et al.
- Web Application Development with R Using Shiny; Chris Beeley
- R Markdown; Xie, Allaire, et al.
- Docker Deep Dive; Nigel Poulton
- The Kubernetes Book; Nigel Poulton
- Spark: The Definitive Guide; Bill Chambers and Matei Zaharia
- Hadoop: The Definitive Guide; Tom White