📊 ai: Epoch AI september update #3259

Merged 3 commits on Sep 10, 2024
39 changes: 39 additions & 0 deletions dag/archive/artificial_intelligence.yml
@@ -110,6 +110,45 @@ steps:
data://grapher/artificial_intelligence/2024-06-19/epoch_compute_intensive:
- data://garden/artificial_intelligence/2024-06-19/epoch_compute_intensive

# Main EPOCH dataset
data://meadow/artificial_intelligence/2024-08-05/epoch:
- snapshot://artificial_intelligence/2024-08-05/epoch.csv
data://garden/artificial_intelligence/2024-08-05/epoch:
- data://meadow/artificial_intelligence/2024-08-05/epoch
data://grapher/artificial_intelligence/2024-08-05/epoch:
- data://garden/artificial_intelligence/2024-08-05/epoch

# EPOCH aggregates by domain
data://garden/artificial_intelligence/2024-08-05/epoch_aggregates_domain:
- data://meadow/artificial_intelligence/2024-08-05/epoch
data://grapher/artificial_intelligence/2024-08-05/epoch_aggregates_domain:
- data://garden/artificial_intelligence/2024-08-05/epoch_aggregates_domain

# EPOCH aggregates by researcher affiliation
data://garden/artificial_intelligence/2024-08-05/epoch_aggregates_affiliation:
- data://garden/artificial_intelligence/2024-08-05/epoch
data://grapher/artificial_intelligence/2024-08-05/epoch_aggregates_affiliation:
- data://garden/artificial_intelligence/2024-08-05/epoch_aggregates_affiliation

# EPOCH dataset on Compute Intensive AI
data://meadow/artificial_intelligence/2024-08-05/epoch_compute_intensive:
- snapshot://artificial_intelligence/2024-08-05/epoch_compute_intensive.csv
data://garden/artificial_intelligence/2024-08-05/epoch_compute_intensive:
- data://meadow/artificial_intelligence/2024-08-05/epoch_compute_intensive
data://grapher/artificial_intelligence/2024-08-05/epoch_compute_intensive:
- data://garden/artificial_intelligence/2024-08-05/epoch_compute_intensive

# EPOCH dataset on Compute Intensive AI, aggregates by country
data://garden/artificial_intelligence/2024-08-05/epoch_compute_intensive_countries:
- data://meadow/artificial_intelligence/2024-08-05/epoch_compute_intensive
data://grapher/artificial_intelligence/2024-08-05/epoch_compute_intensive_countries:
- data://garden/artificial_intelligence/2024-08-05/epoch_compute_intensive_countries

# EPOCH dataset on Compute Intensive AI, aggregates by domain
data://garden/artificial_intelligence/2024-08-05/epoch_compute_intensive_domain:
- data://meadow/artificial_intelligence/2024-08-05/epoch_compute_intensive
data://grapher/artificial_intelligence/2024-08-05/epoch_compute_intensive_domain:
- data://garden/artificial_intelligence/2024-08-05/epoch_compute_intensive_domain

##############################################################################################################

56 changes: 28 additions & 28 deletions dag/artificial_intelligence.yml
@@ -1,44 +1,44 @@
steps:

# Main EPOCH dataset
- data://meadow/artificial_intelligence/2024-08-05/epoch:
-   - snapshot://artificial_intelligence/2024-08-05/epoch.csv
- data://garden/artificial_intelligence/2024-08-05/epoch:
-   - data://meadow/artificial_intelligence/2024-08-05/epoch
- data://grapher/artificial_intelligence/2024-08-05/epoch:
-   - data://garden/artificial_intelligence/2024-08-05/epoch
+ data://meadow/artificial_intelligence/2024-09-09/epoch:
+   - snapshot://artificial_intelligence/2024-09-09/epoch.csv
+ data://garden/artificial_intelligence/2024-09-09/epoch:
+   - data://meadow/artificial_intelligence/2024-09-09/epoch
+ data://grapher/artificial_intelligence/2024-09-09/epoch:
+   - data://garden/artificial_intelligence/2024-09-09/epoch

# EPOCH aggregates by domain
- data://garden/artificial_intelligence/2024-08-05/epoch_aggregates_domain:
-   - data://meadow/artificial_intelligence/2024-08-05/epoch
- data://grapher/artificial_intelligence/2024-08-05/epoch_aggregates_domain:
-   - data://garden/artificial_intelligence/2024-08-05/epoch_aggregates_domain
+ data://garden/artificial_intelligence/2024-09-09/epoch_aggregates_domain:
+   - data://meadow/artificial_intelligence/2024-09-09/epoch
+ data://grapher/artificial_intelligence/2024-09-09/epoch_aggregates_domain:
+   - data://garden/artificial_intelligence/2024-09-09/epoch_aggregates_domain

# EPOCH aggregates by researcher affiliation
- data://garden/artificial_intelligence/2024-08-05/epoch_aggregates_affiliation:
-   - data://garden/artificial_intelligence/2024-08-05/epoch
- data://grapher/artificial_intelligence/2024-08-05/epoch_aggregates_affiliation:
-   - data://garden/artificial_intelligence/2024-08-05/epoch_aggregates_affiliation
+ data://garden/artificial_intelligence/2024-09-09/epoch_aggregates_affiliation:
+   - data://garden/artificial_intelligence/2024-09-09/epoch
+ data://grapher/artificial_intelligence/2024-09-09/epoch_aggregates_affiliation:
+   - data://garden/artificial_intelligence/2024-09-09/epoch_aggregates_affiliation

# EPOCH dataset on Compute Intensive AI
- data://meadow/artificial_intelligence/2024-08-05/epoch_compute_intensive:
-   - snapshot://artificial_intelligence/2024-08-05/epoch_compute_intensive.csv
- data://garden/artificial_intelligence/2024-08-05/epoch_compute_intensive:
-   - data://meadow/artificial_intelligence/2024-08-05/epoch_compute_intensive
- data://grapher/artificial_intelligence/2024-08-05/epoch_compute_intensive:
-   - data://garden/artificial_intelligence/2024-08-05/epoch_compute_intensive
+ data://meadow/artificial_intelligence/2024-09-09/epoch_compute_intensive:
+   - snapshot://artificial_intelligence/2024-09-09/epoch_compute_intensive.csv
+ data://garden/artificial_intelligence/2024-09-09/epoch_compute_intensive:
+   - data://meadow/artificial_intelligence/2024-09-09/epoch_compute_intensive
+ data://grapher/artificial_intelligence/2024-09-09/epoch_compute_intensive:
+   - data://garden/artificial_intelligence/2024-09-09/epoch_compute_intensive

# EPOCH dataset on Compute Intensive AI, aggregates by country
- data://garden/artificial_intelligence/2024-08-05/epoch_compute_intensive_countries:
-   - data://meadow/artificial_intelligence/2024-08-05/epoch_compute_intensive
- data://grapher/artificial_intelligence/2024-08-05/epoch_compute_intensive_countries:
-   - data://garden/artificial_intelligence/2024-08-05/epoch_compute_intensive_countries
+ data://garden/artificial_intelligence/2024-09-09/epoch_compute_intensive_countries:
+   - data://meadow/artificial_intelligence/2024-09-09/epoch_compute_intensive
+ data://grapher/artificial_intelligence/2024-09-09/epoch_compute_intensive_countries:
+   - data://garden/artificial_intelligence/2024-09-09/epoch_compute_intensive_countries

# EPOCH dataset on Compute Intensive AI, aggregates by domain
- data://garden/artificial_intelligence/2024-08-05/epoch_compute_intensive_domain:
-   - data://meadow/artificial_intelligence/2024-08-05/epoch_compute_intensive
- data://grapher/artificial_intelligence/2024-08-05/epoch_compute_intensive_domain:
-   - data://garden/artificial_intelligence/2024-08-05/epoch_compute_intensive_domain
+ data://garden/artificial_intelligence/2024-09-09/epoch_compute_intensive_domain:
+   - data://meadow/artificial_intelligence/2024-09-09/epoch_compute_intensive
+ data://grapher/artificial_intelligence/2024-09-09/epoch_compute_intensive_domain:
+   - data://garden/artificial_intelligence/2024-09-09/epoch_compute_intensive_domain

# Large Language Models and Compute (EPOCH)
data://garden/artificial_intelligence/2024-02-15/epoch_llms:
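In these dag files, each key is an ETL step and the indented list beneath it names the step's dependencies, so every dataset flows snapshot → meadow → garden → grapher, and a version bump means repointing each stage at the new date. As a rough sketch (using only Python's standard library, not the repository's actual runner), a mapping like this resolves to an execution order as follows:

```python
# Minimal sketch of dependency resolution for a few of the steps added in
# this PR. The step URIs are copied from the dag above; the runner itself
# is hypothetical and stands in for the real ETL machinery.
from graphlib import TopologicalSorter

dag = {
    "data://meadow/artificial_intelligence/2024-09-09/epoch": {
        "snapshot://artificial_intelligence/2024-09-09/epoch.csv",
    },
    "data://garden/artificial_intelligence/2024-09-09/epoch": {
        "data://meadow/artificial_intelligence/2024-09-09/epoch",
    },
    "data://grapher/artificial_intelligence/2024-09-09/epoch": {
        "data://garden/artificial_intelligence/2024-09-09/epoch",
    },
}

# static_order() yields each step only after everything it depends on,
# i.e. snapshot -> meadow -> garden -> grapher.
for step in TopologicalSorter(dag).static_order():
    print(step)
```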
@@ -0,0 +1,98 @@
# NOTE: To learn more about the fields, hover over their names.
definitions:
common:
processing_level: major
presentation:
topic_tags:
- Artificial Intelligence
# Learn more about the available fields:
# http://docs.owid.io/projects/etl/architecture/metadata/reference/dataset/
dataset:
update_period_days: 31

# Learn more about the available fields:
# http://docs.owid.io/projects/etl/architecture/metadata/reference/tables/
tables:
epoch:
variables:
domain:
title: Domain
unit: ''
short_unit: ''
description_short: Refers to the specific area, application, or field in which an AI system is designed to operate.
description_processing: |-
In cases where multiple domains were associated with a system, we consolidated these entries under the label "Multiple domains". We also identified domains associated with fewer than 20 notable systems and grouped these under the category "Other".
display:
zeroDay: '1949-01-01'
yearIsDay: true

organization_categorization:
title: Researcher affiliation
unit: ''
short_unit: ''
description_short: Describes the sector where the authors of an AI system have their primary affiliations.
description_from_producer: |-
Systems are categorized as “Industry” if their authors are affiliated with private sector organizations, “Academia” if the authors are affiliated with universities or academic institutions, or “Industry - Academia Collaboration” when at least 30% of the authors are from each.

parameters:
title: Number of parameters
unit: ''
description_short: Total number of learnable variables or weights that the model contains. Parameters are adjusted during the training process to optimize the model's performance.
description_key:
- Parameters are internal variables that machine learning models adjust during their training process to improve their ability to make accurate predictions. They act as the model's "knobs" that are fine-tuned based on the provided data. In deep learning, a subset of artificial intelligence (AI), parameters primarily consist of the weights assigned to the connections between the small processing units called neurons. Picture a vast network of interconnected neurons where the strength of each connection represents a parameter.

- The total number of parameters in a model is influenced by various factors. The model's structure and the number of “layers” of neurons play a significant role. Generally, more complex models with additional layers tend to have a higher number of parameters. Special components of specific deep learning architectures can further contribute to the overall parameter count.

- Understanding the number of parameters in a model is crucial for designing effective models. More parameters can help the model understand complex data patterns, potentially leading to higher accuracy. However, there's a fine balance to strike: if a model has too many parameters, it risks memorizing the specific examples in its training data rather than learning their underlying patterns. Consequently, it may perform poorly when presented with new, unseen data. Achieving the right balance of parameters is a critical consideration in model development.

- In recent times, the AI community has witnessed the emergence of what are often referred to as "giant models." These models boast an astounding number of parameters, reaching into the billions or even trillions. While these huge models have achieved remarkable performance, they have a significant computational cost. Effectively managing and training such large-scale models has become a prominent and active area of research and discussion within the AI field.

display:
numDecimalPlaces: 0
zeroDay: '1949-01-01'
yearIsDay: true
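    # A toy illustration, not part of the metadata: for a fully connected
    # network with layer widths [784, 512, 256, 10], each layer contributes
    # a weight matrix plus a bias vector, so the parameter count is
    #   784*512 + 512 = 401,920
    #   512*256 + 256 = 131,328
    #   256*10  + 10  =   2,570
    #   total         = 535,818 parameters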

training_dataset_size__datapoints:
title: Training dataset size
unit: 'datapoints'
description_short: The number of examples provided to train an AI model. Typically, more data results in a more comprehensive understanding by the model.
description_key:
- Training data size refers to the volume of data employed to train an artificial intelligence (AI) model effectively. It's a representation of the number of examples that the model learns from during its training process. It is a fundamental measure of the scope of the data used in the model's learning phase.

- To grasp the concept of training data size, imagine teaching a friend the art of distinguishing different types of birds. In this analogy, each bird picture presented to your friend corresponds to an individual piece of training data. If you showed them 100 unique bird photos, then the training data size in this scenario would be quantified as 100.

- Training data size is an essential indicator in AI and machine learning. First and foremost, it directly impacts the depth of learning achieved by the model. The more extensive the dataset, the more profound and comprehensive the model's understanding of the subject matter becomes. Additionally, a large training data size contributes significantly to improved recognition capabilities. By exposing the model to a diverse array of examples, it becomes adept at identifying subtle nuances, much like how it becomes skilled at distinguishing various bird species through exposure to a large variety of bird images.

display:
numDecimalPlaces: 0
zeroDay: '1949-01-01'
yearIsDay: true

training_computation_petaflop:
title: Training computation (petaFLOP)
unit: 'petaFLOP'
description_short: Computation is measured in total petaFLOP, where one petaFLOP equals 10¹⁵ [floating-point operations](#dod:flop). Estimates are drawn from the AI literature and carry some uncertainty.
description_key:
- In the context of artificial intelligence (AI), training computation is predominantly measured in floating-point operations, or "FLOP". One FLOP represents a single arithmetic operation involving floating-point numbers, such as addition, subtraction, multiplication, or division. Because AI systems involve vast numbers of such operations, computation is commonly reported in petaFLOP, where one petaFLOP is one quadrillion (10¹⁵) FLOP.

- Modern AI systems are rooted in machine learning and deep learning techniques. These methodologies are notorious for their computational intensity, involving complex mathematical processes and algorithms. During the training phase, AI models process large volumes of data, while continuously adapting and refining their parameters to optimize performance, rendering the training process computationally intensive.

- Many factors influence the magnitude of training computation within AI systems. Notably, the size of the dataset employed for training significantly impacts the computational load. Larger datasets necessitate more processing power. The complexity of the model's architecture also plays a pivotal role; more intricate models lead to more computations. Parallel processing, involving the simultaneous use of multiple processors, also has a substantial effect. Beyond these factors, specific design choices and other variables further contribute to the complexity and scale of training computation within AI.

description_processing: Training computation was converted from its original measurement in FLOP (floating-point operations) to petaFLOP by dividing the raw value by 1e15 (one quadrillion). Expressing training computation in petaFLOP gives a more readable sense of the scale of the computational resources involved, especially for models trained on large datasets with complex architectures.
display:
numDecimalPlaces: 0
zeroDay: '1949-01-01'
yearIsDay: true
presentation:
grapher_config:
title: Training computation

publication_date:
title: Publication date
unit: ''
description_short: The date when the AI system was first published.
description_from_producer: The publication, announcement, or release date of the model, in YYYY-MM-DD format. If the year and month are known but the day is unknown, the day is filled in as YYYY-MM-15. If the year is known but the month and day are unknown, the month and day are filled in as YYYY-07-01.
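
Two processing rules described above are concrete enough to sketch in code: the FLOP-to-petaFLOP conversion and the producer's publication-date imputation. The following is an illustrative Python sketch; the function names are hypothetical and this is not the actual ETL code:

```python
from datetime import date


def flop_to_petaflop(flop: float) -> float:
    """Convert raw training compute from FLOP to petaFLOP (1 petaFLOP = 1e15 FLOP)."""
    return flop / 1e15


def impute_publication_date(year: int, month: int | None = None, day: int | None = None) -> date:
    """Fill unknown date parts as the producer describes: an unknown day becomes
    the 15th of the month; an unknown month and day become July 1st."""
    if month is None:
        return date(year, 7, 1)
    if day is None:
        return date(year, month, 15)
    return date(year, month, day)


# GPT-3's training compute is commonly estimated at ~3.14e23 FLOP:
print(flop_to_petaflop(3.14e23))         # 3.14e+08, i.e. ~314 million petaFLOP
print(impute_publication_date(2020, 5))  # 2020-05-15
print(impute_publication_date(2020))     # 2020-07-01
```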


