Commit: Cyphers/27sync (#4059)
* [ONNX] Add GatherND op to ONNX importer (#3963)

* [ONNX] Added gatherND op to ONNX importer.

* Added tests.

* Removed new line.

* Update onnx_import.in.cpp

* Changed tests.

* Fix serialization of op's type (#4019)

* Fix serializer op name

* cleanup

* Make sure examples compile (#3981)

* Make sure examples compile

* Resolve doc build error due to opset versioning and align dynamic tensor doc to cpp example

* Add latest rc

* Remove deprecated API

* Update brief link summary

* Dist example

* update doc for cpp code examples folder

* Fix typo and toc index

* Build config for example, deprecation in dist test

* style

* Update jenkins-trigger.groovy

Moving to gitlab.

* Fix layernorm flatten issue (#4032)

* fix layernorm flatten issue

* update ut

* check output val

* fix style

* apply tolerance

* [MKLDNN] Emit dgemm for 2D DP FP Dot op (#3990)

* [MLIR] Update MLIR/LLVM repos

* Move MLIR/LLVM repos forward

This includes fix to affine fusion algorithm.

* Fix issues after merge

* Fix lit test

* [MKLDNN] Emit dgemm for 2D DP FP Dot op

Add support for emitting MKLDNN's double precision FP gemm from a 2D double
precision floating point Dot operation.

* Removed unnecessarily duplicated pattern

* Add f64 matmul support to CPU Emitter + unit test

* Add check for DP unsupported bias in cpu_fusion.cpp
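
A minimal sketch (not from this commit) of the kind of graph this enables: a 2D double-precision Dot, which the CPU backend should now lower to MKL-DNN's dgemm. The shapes are illustrative.

    #include <ngraph/ngraph.hpp>

    using namespace ngraph;

    std::shared_ptr<Function> make_f64_dot()
    {
        // Two 2D double-precision inputs; Dot of two 2D tensors is a
        // matrix multiply, the case the dgemm path covers.
        auto A = std::make_shared<op::Parameter>(element::f64, Shape{4, 3});
        auto B = std::make_shared<op::Parameter>(element::f64, Shape{3, 5});
        auto dot = std::make_shared<op::Dot>(A, B);
        return std::make_shared<Function>(OutputVector{dot}, ParameterVector{A, B});
    }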

* Remove GOE from Adjoints class (#3973)

* Change generate_adjoints to take an OutputVector instead of a NodeVector for deltas.

* Cleanup

* Adjoints class convert to use Output<Node>

* More cleanup

* More cleanup

* Post-merge build issues

* Don't push initial bprops multiple times

* Eliminate GOE correctly

* back-compatibility, unit test
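
The user-visible effect appears in the example diffs further down: adjoints.backprop_node(X) becomes adjoints.backprop_output(X). A minimal sketch, assuming loss, delta, and W are Output<Node> values built elsewhere:

    #include <ngraph/ngraph.hpp>

    using namespace ngraph;

    Output<Node> weight_gradient(const Output<Node>& loss,
                                 const Output<Node>& delta,
                                 const Output<Node>& W)
    {
        // Deltas are now passed as an OutputVector rather than a NodeVector.
        autodiff::Adjoints adjoints(OutputVector{loss}, OutputVector{delta});
        // backprop_output returns an Output<Node> directly, so no
        // GetOutputElement (GOE) node is needed to select an output.
        return adjoints.backprop_output(W);
    }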

* Helper in Constant to allow casting values to a different type (#4000)

* Helper in Constant to allow casting values to a different type

Simplify logic needed to extract values from a Constant node, when
the expected data type is specified only as integral or floating point.

* Review comment

* Review comment

Co-Authored-By: Tomasz Socha <[email protected]>

* Style apply
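
A sketch of how such a helper is used; the member name cast_vector is an assumption based on the PR description, not confirmed by this diff:

    #include <cstdint>
    #include <vector>

    #include <ngraph/ngraph.hpp>

    using namespace ngraph;

    void read_constant_values()
    {
        auto c = op::Constant::create(element::i32, Shape{3}, {1, 2, 3});
        // One call site handles any integral or floating-point target type,
        // instead of switching on the constant's element type by hand.
        std::vector<int64_t> as_i64 = c->cast_vector<int64_t>();
        std::vector<float> as_f32 = c->cast_vector<float>();
    }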

* TensorIterator: reshape support (#4038)

* Add second decompose pass to INTERPRETER backend (#4036)

* Update MKLDNN to v1.0.4. (#3951)

* Update MKLDNN to v1.0.4.

Build MKLDNN-v1 by default.

* Add bf16 support check.

* Modify visibility.

* Tested doc build for 0.27 with sitemap for ngraph.ai endpoint (#4014)

* Make sure examples compile

* Resolve doc build error due to opset versioning and align dynamic tensor doc to cpp example

* Add latest rc

* Remove deprecated API

* Update brief link summary

* Dist example

* update doc for cpp code examples folder

* Fix typo and toc index

* Build config for example, deprecation in dist test

* style

* Sitemap for ngraph.ai doc content

* Add title to sitemap

* resolve docbuild warnings resulting from sitemap link labeling

* Doc tag for 0.27.1

* Matmul dyshape_fused_forward_fluid_fix (#4023)

* use constructor_validate_and_infer_types() in CumSum ctor (#4044)

* - use constructor_validate_and_infer_types() in CumSum ctor

* - remove unused variable
- relax rank check

* Warning
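
A sketch of the pattern this refers to, on a hypothetical op (the real CumSum also carries exclusive/reverse attributes):

    #include <memory>

    #include <ngraph/ngraph.hpp>

    using namespace ngraph;

    class MyCumSum : public op::Op
    {
    public:
        static constexpr NodeTypeInfo type_info{"MyCumSum", 0};
        const NodeTypeInfo& get_type_info() const override { return type_info; }

        MyCumSum(const Output<Node>& arg, const Output<Node>& axis)
            : Op({arg, axis})
        {
            // Validation and shape/type inference run as part of
            // construction instead of being deferred.
            constructor_validate_and_infer_types();
        }

        void validate_and_infer_types() override
        {
            set_output_type(0, get_input_element_type(0), get_input_partial_shape(0));
        }

        std::shared_ptr<Node> copy_with_new_args(const NodeVector& new_args) const override
        {
            return std::make_shared<MyCumSum>(new_args.at(0), new_args.at(1));
        }
    };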

* Fix tolerance for all_close_f (#4042)

* Fix tolerance for all_close_f

* Lower tolerance

* use all_close

* Use v1::Gather in ONNX Importer (#4037)
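
For reference, a minimal sketch of constructing the v1 op, which takes the axis as a third (constant) input rather than a constructor attribute:

    #include <memory>

    #include <ngraph/ngraph.hpp>

    using namespace ngraph;

    std::shared_ptr<Node> make_gather(const Output<Node>& data,
                                      const Output<Node>& indices)
    {
        // Gather along axis 0; v0::Gather took the axis as an attribute.
        auto axis = op::Constant::create(element::i64, Shape{}, {0});
        return std::make_shared<op::v1::Gather>(data, indices, axis);
    }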

* Add upgrade and downgrade pass for GroupConvolutionBackpropData ops (#4035)

* Add upgrade and downgrade pass for GroupConvolutionBackpropData ops

- Add up/downgrade passes for GroupConvolutionBackpropData operators
- Improve decompose operation of v0::GroupConvolutionBackpropData to support N-dimensional data
- Add UT for up/downgrade passes.

* Remove unused variable
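
A sketch of running the conversion passes, assuming the v0<->v1 passes of this era (Opset1Upgrade / Opset0Downgrade):

    #include <memory>

    #include <ngraph/ngraph.hpp>
    #include <ngraph/pass/manager.hpp>
    #include <ngraph/pass/opset0_downgrade.hpp>
    #include <ngraph/pass/opset1_upgrade.hpp>

    using namespace ngraph;

    void round_trip_opsets(std::shared_ptr<Function> f)
    {
        // v0 -> v1: ops with up/downgrade support, now including
        // GroupConvolutionBackpropData, are rewritten in place.
        pass::Manager upgrade;
        upgrade.register_pass<pass::Opset1Upgrade>();
        upgrade.run_passes(f);

        // v1 -> v0
        pass::Manager downgrade;
        downgrade.register_pass<pass::Opset0Downgrade>();
        downgrade.run_passes(f);
    }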

* Fixed constant operation for u1 format (#4045)

* Fixed bin constant ops

* Added export

* Fixed buffer size

* Fixed code style

* Fix broken serialize and deserialize for Sum and Product (#4050)

* v1::Reshape downgrade pass + onnx_importer adjustments (#4046)

* Update ONNX importer to use nGraph ops from new opset header (#3994)

* Fix NNP-T naming in README (#4048)
diyessi authored Dec 14, 2019
1 parent 6fb1c5e commit 8adf498
Showing 375 changed files with 2,313 additions and 1,321 deletions.
3 changes: 2 additions & 1 deletion CMakeLists.txt
@@ -164,7 +164,7 @@ option(NGRAPH_UNIT_TEST_ENABLE "Control the building of unit tests" TRUE)
 option(NGRAPH_DOC_BUILD_ENABLE "Control the building of documentation" FALSE)
 option(NGRAPH_TOOLS_ENABLE "Control the building of tool" TRUE)
 option(NGRAPH_CPU_ENABLE "Control the building of the CPU backend" TRUE)
-option(NGRAPH_USE_LEGACY_MKLDNN "Use legacy MKLDNN" TRUE)
+option(NGRAPH_USE_LEGACY_MKLDNN "Use legacy MKLDNN" FALSE)
 option(NGRAPH_MLIR_ENABLE "Control the building of MLIR backend" FALSE)
 option(NGRAPH_INTERPRETER_ENABLE "Control the building of the INTERPRETER backend" TRUE)
 option(NGRAPH_NOP_ENABLE "Control the building of the NOP backend" TRUE)
@@ -621,6 +621,7 @@ endif()
 add_subdirectory(src)
 
 add_subdirectory(test)
+add_subdirectory(doc/examples)
 
 if (NGRAPH_DOC_BUILD_ENABLE)
 add_subdirectory(doc)
4 changes: 2 additions & 2 deletions README.md
@@ -45,8 +45,8 @@ framework and deploying to a variety of hardware targets. We strongly believe in
 providing freedom, performance, and ease-of-use to AI developers.
 
 The diagram below shows deep learning frameworks and hardware targets
-supported by nGraph. NNP-L and NNP-I in the diagram refer to Intel's next generation
-deep learning accelerators: Intel® Nervana™ Neural Network Processor for Learning and
+supported by nGraph. NNP-T and NNP-I in the diagram refer to Intel's next generation
+deep learning accelerators: Intel® Nervana™ Neural Network Processor for Training and
 Inference respectively. Future plans for supporting addtional deep learning frameworks
 and backends are outlined in the [ecosystem] section.
19 changes: 11 additions & 8 deletions cmake/external_mkldnn_v1.cmake
@@ -18,10 +18,12 @@ include(ExternalProject)
 
 # Includes blas 3.8.0 in mkldnn
 set(NGRAPH_MKLDNN_SHORT_VERSION 1)
-set(NGRAPH_MKLDNN_FULL_VERSION 1.0.0.0)
-set(NGRAPH_MKLDNN_VERSION "v1.0")
-set(NGRAPH_MKLDNN_SUB_VERSION "2019.0.5.20190502")
-set(NGRAPH_MKLDNN_GIT_TAG "553c23f")
+set(NGRAPH_MKLDNN_FULL_VERSION 1.0.4.0)
+set(NGRAPH_MKLDNN_MKLML_ASSET_VERSION "v0.21")
+set(NGRAPH_MKLDNN_VERSION "v1.0.4")
+set(NGRAPH_MKLDNN_MKLML_VERSION "2019.0.5.20190502")
+set(NGRAPH_MKLDNN_MKLML_WIN32_VERSION "2020.0.20190813")
+set(NGRAPH_MKLDNN_GIT_TAG "v1.0.4")
 
 #------------------------------------------------------------------------------
 # Fetch and install MKL-DNN
@@ -88,17 +90,18 @@ endif()
 
 # This section sets up MKL as an external project to be used later by MKLDNN
 
-set(MKLURLROOT "https://github.com/intel/mkl-dnn/releases/download/v0.19-rc/")
-set(MKLVERSION ${NGRAPH_MKLDNN_SUB_VERSION})
+set(MKLURLROOT "https://github.com/intel/mkl-dnn/releases/download/${NGRAPH_MKLDNN_MKLML_ASSET_VERSION}/")
+set(MKLVERSION ${NGRAPH_MKLDNN_MKLML_VERSION})
+set(MKLWIN32VERSION ${NGRAPH_MKLDNN_MKLML_WIN32_VERSION})
 if (LINUX)
 set(MKLPACKAGE "mklml_lnx_${MKLVERSION}.tgz")
 set(MKL_SHA1_HASH 6ab490f0b358124338d04ee9383c3cbc536969d8)
 elseif (APPLE)
 set(MKLPACKAGE "mklml_mac_${MKLVERSION}.tgz")
 set(MKL_SHA1_HASH a1c42af04f990b0e515a1c31946424b2e68fccc9)
 elseif (WIN32)
-set(MKLPACKAGE "mklml_win_${MKLVERSION}.zip")
-set(MKL_SHA1_HASH 9d6ff4d5a486689338158093e96c43ee442b65f0)
+set(MKLPACKAGE "mklml_win_${MKLWIN32VERSION}.zip")
+set(MKL_SHA1_HASH cc117093e658d50a8e4e3d1cf192c300b6bac0fc)
 endif()
 set(MKL_LIBS ${MKLML_LIB} ${OMP_LIB})
 set(MKLURL ${MKLURLROOT}${MKLPACKAGE})
10 changes: 5 additions & 5 deletions cmake/mkldnn_v1.patch
@@ -63,18 +63,18 @@ index 99970659..ef88a0a7 100644
  # Compilation happens with OpenMP to enable `#pragma omp simd`
  # but during linkage OpenMP dependency should be avoided
  diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
-index 60bb0c94..cc3fc9d6 100644
+index f99ec31ce..b3c1d9bb8 100644
 --- a/src/CMakeLists.txt
 +++ b/src/CMakeLists.txt
 @@ -73,8 +73,10 @@ endif()
  add_library(${LIB_NAME}
  ${MKLDNN_LIBRARY_TYPE} ${HEADERS} ${${LIB_NAME}_SUB_OBJS})
 
--set_property(TARGET ${LIB_NAME} PROPERTY VERSION "${PROJECT_VERSION}.0")
--set_property(TARGET ${LIB_NAME} PROPERTY SOVERSION "0")
+-set_property(TARGET ${LIB_NAME} PROPERTY VERSION "${MKLDNN_VERSION_MAJOR}.${MKLDNN_VERSION_MINOR}")
+-set_property(TARGET ${LIB_NAME} PROPERTY SOVERSION "${MKLDNN_VERSION_MAJOR}")
 +if(MKLDNN_LIB_VERSIONING_ENABLE)
-+ set_property(TARGET ${LIB_NAME} PROPERTY VERSION "${PROJECT_VERSION}.0")
-+ set_property(TARGET ${LIB_NAME} PROPERTY SOVERSION "0")
++ set_property(TARGET ${LIB_NAME} PROPERTY VERSION "${MKLDNN_VERSION_MAJOR}.${MKLDNN_VERSION_MINOR}")
++ set_property(TARGET ${LIB_NAME} PROPERTY SOVERSION "${MKLDNN_VERSION_MAJOR}")
 +endif()
  set_property(TARGET ${LIB_NAME} PROPERTY PUBLIC_HEADER ${HEADERS})
 
1 change: 1 addition & 0 deletions doc/examples/CMakeLists.txt
@@ -17,6 +17,7 @@
 if (NGRAPH_CPU_ENABLE)
 add_subdirectory(abc)
 add_subdirectory(abc_operator)
+add_subdirectory(dynamic_tensor)
 add_subdirectory(mnist_mlp)
 add_subdirectory(update)
 endif()
8 changes: 4 additions & 4 deletions doc/examples/abc/abc.cpp
@@ -50,17 +50,17 @@ int main()
 float v_b[2][3] = {{7, 8, 9}, {10, 11, 12}};
 float v_c[2][3] = {{1, 0, -1}, {-1, 1, 2}};
 
-t_a->write(&v_a, 0, sizeof(v_a));
-t_b->write(&v_b, 0, sizeof(v_b));
-t_c->write(&v_c, 0, sizeof(v_c));
+t_a->write(&v_a, sizeof(v_a));
+t_b->write(&v_b, sizeof(v_b));
+t_c->write(&v_c, sizeof(v_c));
 
 // Invoke the function
 auto exec = backend->compile(f);
 exec->call({t_result}, {t_a, t_b, t_c});
 
 // Get the result
 float r[2][3];
-t_result->read(&r, 0, sizeof(r));
+t_result->read(&r, sizeof(r));
 
 std::cout << "[" << std::endl;
 for (size_t i = 0; i < s[0]; ++i)
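
The write/read changes in this and the following examples reflect an API change: runtime::Tensor::write() and read() drop their middle byte-offset argument. A minimal sketch of the new two-argument form:

    #include <memory>

    #include <ngraph/ngraph.hpp>

    using namespace ngraph;

    void roundtrip(std::shared_ptr<runtime::Tensor> t)
    {
        float v[2][3] = {{1, 2, 3}, {4, 5, 6}};
        t->write(&v, sizeof(v)); // was: t->write(&v, 0, sizeof(v));

        float r[2][3];
        t->read(&r, sizeof(r)); // was: t->read(&r, 0, sizeof(r));
    }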
8 changes: 4 additions & 4 deletions doc/examples/abc_operator/abc_operator.cpp
@@ -49,17 +49,17 @@ int main()
 float v_b[2][3] = {{7, 8, 9}, {10, 11, 12}};
 float v_c[2][3] = {{1, 0, -1}, {-1, 1, 2}};
 
-t_a->write(&v_a, 0, sizeof(v_a));
-t_b->write(&v_b, 0, sizeof(v_b));
-t_c->write(&v_c, 0, sizeof(v_c));
+t_a->write(&v_a, sizeof(v_a));
+t_b->write(&v_b, sizeof(v_b));
+t_c->write(&v_c, sizeof(v_c));
 
 // Invoke the function
 auto exec = backend->compile(f);
 exec->call({t_result}, {t_a, t_b, t_c});
 
 // Get the result
 float r[2][3];
-t_result->read(&r, 0, sizeof(r));
+t_result->read(&r, sizeof(r));
 
 std::cout << "[" << std::endl;
 for (size_t i = 0; i < s[0]; ++i)
19 changes: 19 additions & 0 deletions doc/examples/dynamic_tensor/CMakeLists.txt
@@ -0,0 +1,19 @@
+# ******************************************************************************
+# Copyright 2017-2019 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ******************************************************************************
+
+add_executable(partial_shape partial_shape.cpp)
+add_dependencies(partial_shape ngraph cpu_backend)
+target_link_libraries(partial_shape ngraph cpu_backend)
62 changes: 37 additions & 25 deletions doc/examples/dynamic_tensor/partial_shape.cpp
@@ -15,54 +15,66 @@
 //*****************************************************************************
 
 #include <iostream>
+#include <numeric>
 #include <vector>
 
 #include <ngraph/ngraph.hpp>
 
 using namespace std;
 using namespace ngraph;
 
+void execute(shared_ptr<runtime::Backend> be,
+shared_ptr<runtime::Executable> ex,
+shared_ptr<runtime::Tensor> t_out,
+uint32_t n);
+
 int main()
 {
 // Create and compile a graph where the provided info of shape of x is
 // (2,?)
 auto x_shape_info = PartialShape{2, Dimension::dynamic()};
 auto x = make_shared<op::Parameter>(element::i32, x_shape_info);
 auto a = x + x;
-auto f = make_shared<Function>({a}, {x});
-auto be = runtime::backend::create();
+auto f = make_shared<Function>(OutputVector{a}, ParameterVector{x});
+auto be = runtime::Backend::create("CPU", true);
 auto ex = be->compile(f);
 
 // Create a dynamic tensor of shape (2,?)
 auto t_out = be->create_dynamic_tensor(element::i32, x_shape_info);
+execute(be, ex, t_out, 3);
+execute(be, ex, t_out, 11);
+execute(be, ex, t_out, 20);
 
-// Call the graph to write a value with shape (2,3) to t_out
-auto t_in = be->create_tensor(element::i32, Shape{2, 3});
-t_in->write();
-ex->call({t_out}, {t_in})
-
-// Call the graph again, to write a value with a different shape to
-// t_out.
-t_in = be->create_tensor(element::i32, Shape{2, 20});
-t_in->write();
-ex->call({t_out}, {t_in})
-
-// Get the result. At this point t_out->get_shape() would return
-// Shape{2,20},
-// but t_out->get_partial_shape() would return "(2,?)"
 return 0;
 }
 
-float r[2][3];
-t_result->read(&r, 0, sizeof(r));
+void execute(shared_ptr<runtime::Backend> be,
+shared_ptr<runtime::Executable> ex,
+shared_ptr<runtime::Tensor> t_out,
+uint32_t n)
+{
+// Initialize input of shape (2, n)
+auto t_in = be->create_tensor(element::i32, Shape{2, n});
+{
+vector<int32_t> t_val(2 * n);
+iota(t_val.begin(), t_val.end(), 0);
+t_in->write(&t_val[0], t_val.size() * sizeof(t_val[0]));
+}
 // Get the result
+ex->call({t_out}, {t_in});
 
-std::cout << "[" << std::endl;
+auto s = t_out->get_shape();
+vector<int32_t> r(s[0] * s[1]);
+t_out->read(&r[0], r.size() * sizeof(r[0]));
+cout << "[" << endl;
 for (size_t i = 0; i < s[0]; ++i)
 {
-std::cout << " [";
+cout << " [";
 for (size_t j = 0; j < s[1]; ++j)
 {
-std::cout << r[i][j] << ' ';
+cout << r[i * s[1] + j] << ' ';
 }
-std::cout << ']' << std::endl;
+cout << ']' << endl;
 }
-std::cout << ']' << std::endl;
 
-return 0;
+cout << ']' << endl;
 }
11 changes: 5 additions & 6 deletions doc/examples/mnist_mlp/CMakeLists.txt
@@ -17,9 +17,8 @@
 add_executable(mnist_mlp mnist_loader.cpp mnist_mlp.cpp)
 add_dependencies(mnist_mlp ngraph cpu_backend)
 target_link_libraries(mnist_mlp ngraph cpu_backend)
-if (NGRAPH_DISTRIBUTED_ENABLE)
-add_executable(dist_mnist_mlp mnist_loader.cpp dist_mnist_mlp.cpp)
-target_compile_definitions(dist_mnist_mlp PRIVATE NGRAPH_DISTRIBUTED_ENABLE)
-target_include_directories(dist_mnist_mlp SYSTEM PRIVATE libmlsl)
-target_link_libraries(dist_mnist_mlp ngraph cpu_backend libmlsl)
-endif()
 
+add_executable(dist_mnist_mlp mnist_loader.cpp dist_mnist_mlp.cpp)
+target_compile_definitions(dist_mnist_mlp PRIVATE NGRAPH_DISTRIBUTED_ENABLE)
+add_dependencies(dist_mnist_mlp ngraph cpu_backend)
+target_link_libraries(dist_mnist_mlp ngraph cpu_backend)
14 changes: 4 additions & 10 deletions doc/examples/mnist_mlp/dist_mnist_mlp.cpp
@@ -90,10 +90,8 @@ float test_accuracy(MNistDataLoader& loader,
 {
 loader.load();
 t_X->write(loader.get_image_floats(),
-0,
 loader.get_image_batch_size() * sizeof(float));
 t_Y->write(loader.get_label_floats(),
-0,
 loader.get_label_batch_size() * sizeof(float));
 exec->call({t_softmax}, {t_X, t_W0, t_b0, t_W1, t_b1});
 size_t acc = accuracy_count(t_softmax, t_Y);
@@ -106,8 +104,6 @@
 
 int main(int argc, char* argv[])
 {
-ngraph::Distributed dist;
-
 size_t epochs = 5;
 size_t batch_size = 128;
 size_t output_size = 10;
@@ -177,10 +173,10 @@ int main(int argc, char* argv[])
 // Updates
 ngraph::autodiff::Adjoints adjoints(OutputVector{loss},
 OutputVector{delta});
-auto grad_W0 = adjoints.backprop_node(W0);
-auto grad_b0 = adjoints.backprop_node(b0);
-auto grad_W1 = adjoints.backprop_node(W1);
-auto grad_b1 = adjoints.backprop_node(b1);
+auto grad_W0 = adjoints.backprop_output(W0);
+auto grad_b0 = adjoints.backprop_output(b0);
+auto grad_W1 = adjoints.backprop_output(W1);
+auto grad_b1 = adjoints.backprop_output(b1);
 
 auto avg_grad_W0 = std::make_shared<op::AllReduce>(grad_W0);
 auto avg_grad_b0 = std::make_shared<op::AllReduce>(grad_b0);
@@ -254,10 +250,8 @@
 {
 train_loader.load();
 t_X->write(train_loader.get_image_floats(),
-0,
 train_loader.get_image_batch_size() * sizeof(float));
 t_Y->write(train_loader.get_label_floats(),
-0,
 train_loader.get_label_batch_size() * sizeof(float));
 train_exec->call(
 {t_loss,
16 changes: 6 additions & 10 deletions doc/examples/mnist_mlp/mnist_mlp.cpp
@@ -89,10 +89,8 @@ float test_accuracy(MNistDataLoader& loader,
 {
 loader.load();
 t_X->write(loader.get_image_floats(),
-0,
 loader.get_image_batch_size() * sizeof(float));
 t_Y->write(loader.get_label_floats(),
-0,
 loader.get_label_batch_size() * sizeof(float));
 exec->call({t_softmax}, {t_X, t_W0, t_b0, t_W1, t_b1});
 size_t acc = accuracy_count(t_softmax, t_Y);
@@ -174,10 +172,10 @@ int main(int argc, const char* argv[])
 // Updates
 ngraph::autodiff::Adjoints adjoints(OutputVector{loss},
 OutputVector{delta});
-auto W0_next = W0 + adjoints.backprop_node(W0);
-auto b0_next = b0 + adjoints.backprop_node(b0);
-auto W1_next = W1 + adjoints.backprop_node(W1);
-auto b1_next = b1 + adjoints.backprop_node(b1);
+auto W0_next = W0 + adjoints.backprop_output(W0);
+auto b0_next = b0 + adjoints.backprop_output(b0);
+auto W1_next = W1 + adjoints.backprop_output(W1);
+auto b1_next = b1 + adjoints.backprop_output(b1);
 
 // Get the backend
 auto backend = runtime::Backend::create("CPU");
@@ -232,7 +230,7 @@ int main(int argc, const char* argv[])
 clone_function(Function(OutputVector{softmax},
 ParameterVector{X, W0, b0, W1, b1}),
 inference_node_map);
-auto inference_exe = backend->compile(inference_function);
+auto inference_exec = backend->compile(inference_function);
 
 set_scalar(t_learning_rate, .03f);
 
@@ -241,10 +239,8 @@
 {
 train_loader.load();
 t_X->write(train_loader.get_image_floats(),
-0,
 train_loader.get_image_batch_size() * sizeof(float));
 t_Y->write(train_loader.get_label_floats(),
-0,
 train_loader.get_label_batch_size() * sizeof(float));
 train_exec->call(
 {t_loss,
@@ -264,7 +260,7 @@ int main(int argc, const char* argv[])
 {
 last_epoch = train_loader.get_epoch();
 std::cout << "Test accuracy: " << test_accuracy(test_loader,
-exec,
+inference_exec,
 t_X,
 t_Y,
 t_softmax,
6 changes: 3 additions & 3 deletions doc/examples/mnist_mlp/tensor_utils.hpp
@@ -49,7 +49,7 @@ void randomize(std::function<T()> rand,
 {
 temp.push_back(rand());
 }
-t->write(&temp[0], 0, element_count * sizeof(T));
+t->write(&temp[0], element_count * sizeof(T));
 }
 
 // Get a scalar value from a tensor, optionally at an element offset
@@ -58,7 +58,7 @@ T get_scalar(const std::shared_ptr<ngraph::runtime::Tensor>& t,
 size_t element_offset = 0)
 {
 T result;
-t->read(&result, element_offset * sizeof(T), sizeof(T));
+t->read(&result + (element_offset * sizeof(T)), sizeof(T));
 return result;
 }
 
@@ -68,7 +68,7 @@ void set_scalar(const std::shared_ptr<ngraph::runtime::Tensor>& t,
 T value,
 size_t element_offset = 0)
 {
-t->write(&value, element_offset * sizeof(T), sizeof(T));
+t->write(&value + (element_offset * sizeof(T)), sizeof(T));
 }
 
 // Show a shape