
Commit 3008c40

Build docs for all pushes and PRs (#598)
1 parent eff0f8e commit 3008c40

11 files changed: +86 -131 lines changed
@@ -1,30 +1,16 @@
-name: Publish Docs
+name: Build documentation
 
 on:
   push:
-    branches:
-      - master
-
-permissions:
-  actions: read
-  pages: write
-  id-token: write
+  pull_request:
 
 jobs:
-  build-and-deploy:
+  build:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout Repository
         uses: actions/checkout@v6
 
-      - name: Setup Python
-        uses: actions/setup-python@v6
-        with:
-          python-version: 3.x
-
-      - name: Run Preprocessing Script
-        run: python docs/tools/preprocess_docs.py
-
       - name: Setup .NET
         uses: actions/setup-dotnet@v5
         with:
@@ -34,13 +20,25 @@ jobs:
         run: dotnet tool update -g docfx
 
       - name: Build Documentation
-        run: docfx docfx.json
+        run: docfx --warningsAsErrors docfx.json
         working-directory: ./docs
 
       - name: Upload Site Artifact
         uses: actions/upload-pages-artifact@v4
         with:
           path: './docs/_site'
 
+  deploy:
+    if: github.event_name == 'push' && github.ref == 'refs/heads/master' && !github.event.repository.fork
+    runs-on: ubuntu-latest
+    needs: build
+    permissions:
+      pages: write
+      id-token: write
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    steps:
       - name: Deploy to GitHub Pages
+        id: deployment
         uses: actions/deploy-pages@v4

docs/guides/Arrow.md

Lines changed: 14 additions & 14 deletions
@@ -4,14 +4,14 @@ The Apache Parquet C++ library provides APIs for reading and writing data in the
 These are wrapped by ParquetSharp using the [Arrow C data interface](https://arrow.apache.org/docs/format/CDataInterface.html)
 to allow high performance reading and writing of Arrow data with zero copying of array data between C++ and .NET.
 
-The Arrow API is contained in the `ParquetSharp.Arrow` namespace,
+The Arrow API is contained in the @ParquetSharp.Arrow namespace,
 and included in the [ParquetSharp NuGet package](https://www.nuget.org/packages/ParquetSharp/).
 
 ## Reading Arrow data
 
-Reading Parquet data in Arrow format uses a `ParquetSharp.Arrow.FileReader`.
-This can be constructed using a file path, a .NET `System.IO.Stream`,
-or a subclass of `ParquetSharp.IO.RandomAccessFile`.
+Reading Parquet data in Arrow format uses a @ParquetSharp.Arrow.FileReader.
+This can be constructed using a file path, a .NET @System.IO.Stream,
+or a subclass of @ParquetSharp.IO.RandomAccessFile.
 In this example, we'll open a file using a path:
 
 ```csharp
@@ -68,9 +68,9 @@ the reader properties, discussed below.
 
 ### Reader properties
 
-The `ParquetSharp.Arrow.FileReader` constructor accepts an instance of
-`ParquetSharp.ReaderProperties` to control standard Parquet reading behaviour,
-and additionally accepts an instance of `ParquetSharp.Arrow.ArrowReaderProperties`
+The @ParquetSharp.Arrow.FileReader constructor accepts an instance of
+@ParquetSharp.ReaderProperties to control standard Parquet reading behaviour,
+and additionally accepts an instance of @ParquetSharp.Arrow.ArrowReaderProperties
 to customise Arrow specific behaviour:
 
 ```csharp
@@ -94,7 +94,7 @@ using var fileReader = new FileReader(
 
 ## Writing Arrow data
 
-The `ParquetSharp.Arrow.FileWriter` class allows writing Parquet files
+The @ParquetSharp.Arrow.FileWriter class allows writing Parquet files
 using Arrow format data.
 
 In this example we'll walk through writing a file with a timestamp,
@@ -134,15 +134,15 @@ RecordBatch GetBatch(int batchNumber) =>
     }, numIds);
 ```
 
-Now we create a `ParquetSharp.Arrow.FileWriter`, specifying the path to write to and the
+Now we create a @ParquetSharp.Arrow.FileWriter, specifying the path to write to and the
 file schema:
 
 ```csharp
 using var writer = new FileWriter("data.parquet", schema);
 ```
 
-Rather than specifying a file path, we could also write to a .NET `System.IO.Stream`
-or a subclass of `ParquetSharp.IO.OutputStream`.
+Rather than specifying a file path, we could also write to a .NET @System.IO.Stream
+or a subclass of @ParquetSharp.IO.OutputStream.
 
 ### Writing data in batches
 
@@ -207,9 +207,9 @@ writer.Close();
 
 ### Writer properties
 
-The `ParquetSharp.Arrow.FileWriter` constructor accepts an instance of
-`ParquetSharp.WriterProperties` to control standard Parquet writing behaviour,
-and additionally accepts an instance of `ParquetSharp.Arrow.ArrowWriterProperties`
+The @ParquetSharp.Arrow.FileWriter constructor accepts an instance of
+@ParquetSharp.WriterProperties to control standard Parquet writing behaviour,
+and additionally accepts an instance of @ParquetSharp.Arrow.ArrowWriterProperties
 to customise Arrow specific behaviour:
 
 ```csharp
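
For orientation, a minimal sketch of reading a file with the Arrow `FileReader` described in this guide, assuming the `GetRecordBatchReader` API and the `Apache.Arrow` `RecordBatch` type shown in the guide's examples:

```csharp
using System;
using System.Threading.Tasks;
using Apache.Arrow;
using ParquetSharp.Arrow;

internal static class ArrowReadExample
{
    public static async Task Main()
    {
        // Open the Parquet file via the Arrow API (a Stream or RandomAccessFile also works).
        using var fileReader = new FileReader("data.parquet");

        // Stream the whole file back as Arrow record batches.
        using var batchReader = fileReader.GetRecordBatchReader();
        while (await batchReader.ReadNextRecordBatchAsync() is RecordBatch batch)
        {
            using (batch)
            {
                Console.WriteLine($"Read batch with {batch.Length} rows and {batch.ColumnCount} columns");
            }
        }
    }
}
```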

docs/guides/Encryption.md

Lines changed: 17 additions & 17 deletions
@@ -27,7 +27,7 @@ Double wrapping is enabled by default.
 For further details, see the
 [Key Management Tools design document](https://docs.google.com/document/d/1bEu903840yb95k9q2X-BlsYKuXoygE4VnMDl9xz_zhk).
 
-The Key Management Tools API is contained in the `ParquetSharp.Encryption` namespace.
+The Key Management Tools API is contained in the @ParquetSharp.Encryption namespace.
 In order to use this API,
 a client for a Key Management Service must be implemented:
 
@@ -55,7 +55,7 @@ internal sealed class MyKmsClient : IKmsClient
 ```
 
 The main entrypoint for the Key Management Tools API is the
-`ParquetSharp.Encryption.CryptoFactory` class.
+@ParquetSharp.Encryption.CryptoFactory class.
 This requires a factory method for creating KMS clients,
 which are cached internally and periodically recreated:
 
@@ -76,7 +76,7 @@ kmsConnectionConfig.KmsInstanceUrl = ...;
 kmsConnectionConfig.KeyAccessToken = ...;
 ```
 
-Then to configure how the file is encrypted, an `ParquetSharp.Encryption.EncryptionConfiguration` is created:
+Then to configure how the file is encrypted, an @ParquetSharp.Encryption.EncryptionConfiguration is created:
 
 ```c#
 string footerKeyId = ...;
@@ -113,7 +113,7 @@ encryptionConfig.PlaintextFooter = true;
 ```
 
 The `kmsConnectionConfig` and `encryptionConfiguration` are used to generate
-file encryption properties, which are used to build the `ParquetSharp.WriterProperties`:
+file encryption properties, which are used to build the @ParquetSharp.WriterProperties:
 
 ```c#
 using var fileEncryptionProperties = cryptoFactory.GetFileEncryptionProperties(
@@ -126,7 +126,7 @@ using var writerProperties = writerPropertiesBuilder
     .Build();
 ```
 
-Finally, the Parquet file can be written using the `ParquetSharp.WriterProperties`:
+Finally, the Parquet file can be written using the @ParquetSharp.WriterProperties:
 
 ```c#
 Column[] columns = ...;
@@ -136,9 +136,9 @@ using var fileWriter = new ParquetFileWriter(parquetFilePath, columns, writerPro
 
 ### Reading Encrypted Files
 
-Reading encrypted files requires creating `ParquetSharp.FileDecryptionProperties`
-with a `ParquetSharp.Encryption.CryptoFactory`, and adding these to the
-`ParquetSharp.ReaderProperties`:
+Reading encrypted files requires creating @ParquetSharp.FileDecryptionProperties
+with a @ParquetSharp.Encryption.CryptoFactory, and adding these to the
+@ParquetSharp.ReaderProperties:
 
 ```c#
 using var decryptionConfig = new DecryptionConfiguration();
@@ -164,16 +164,16 @@ Key material is stored inside the Parquet file metadata by default,
 but key material can also be stored in separate JSON files alongside Parquet files,
 to allow rotation of master keys without needing to rewrite the Parquet files.
 
-This is configured in the `ParquetSharp.Encryption.EncryptionConfiguration`:
+This is configured in the @ParquetSharp.Encryption.EncryptionConfiguration:
 
 ```c#
 using var encryptionConfig = new EncryptionConfiguration(footerKeyId);
 encryptionConfig.InternalKeyMaterial = false; // External key material
 ```
 
 When using external key material, the path to the Parquet file being written or read
-must be specified when creating `ParquetSharp.FileEncryptionProperties` and
-`ParquetSharp.FileDecryptionProperties`:
+must be specified when creating @ParquetSharp.FileEncryptionProperties and
+@ParquetSharp.FileDecryptionProperties:
 
 ```c#
 using var fileEncryptionProperties = cryptoFactory.GetFileEncryptionProperties(
@@ -247,7 +247,7 @@ using var fileDecryptionProperties = builder
 ```
 
 Rather than having to specify decryption keys directly, a
-`ParquetSharp.DecryptionKeyRetriever` can be used to retrieve keys
+@ParquetSharp.DecryptionKeyRetriever can be used to retrieve keys
 based on the key metadata, to allow more flexibility:
 
 ```c#
@@ -298,7 +298,7 @@ using var fileDecryptionProperties = builder
 
 If the AAD prefix doesn't match the expected prefix an exception will be thrown when reading the file.
 
-Alternatively, you can implement an `ParquetSharp.AadPrefixVerifier` if you have more complex verification logic:
+Alternatively, you can implement an @ParquetSharp.AadPrefixVerifier if you have more complex verification logic:
 
 ```c#
 internal sealed class MyAadVerifier : ParquetSharp.AadPrefixVerifier
@@ -324,8 +324,8 @@ using var fileDecryptionProperties = builder
 
 ## Arrow API Compatibility
 
-Note that the above examples use the `ParquetSharp.ParquetFileReader` and
-`ParquetSharp.ParquetFileWriter` classes, but encryption may also be used with the Arrow API.
-The `ParquetSharp.Arrow.FileReader` and `ParquetSharp.Arrow.FileWriter` constructors
-accept `ParquetSharp.ReaderProperties` and `ParquetSharp.WriterProperties` parameters
+Note that the above examples use the @ParquetSharp.ParquetFileReader and
+@ParquetSharp.ParquetFileWriter classes, but encryption may also be used with the Arrow API.
+The @ParquetSharp.Arrow.FileReader and @ParquetSharp.Arrow.FileWriter constructors
+accept @ParquetSharp.ReaderProperties and @ParquetSharp.WriterProperties parameters
 respectively, which can have encryption properties configured.
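
A condensed sketch of the encrypt/decrypt flow this guide describes; `MyKmsClient` is the `IKmsClient` implementation from the guide, the key id is a placeholder, and the exact member names assumed here (`Encryption` on the writer properties builder, `GetDefaultReaderProperties`, the `FileDecryptionProperties` setter) should be checked against the API reference:

```csharp
using ParquetSharp;
using ParquetSharp.Encryption;

// A CryptoFactory is created with a factory method for KMS clients.
using var cryptoFactory = new CryptoFactory(config => new MyKmsClient(config));
var kmsConnectionConfig = new KmsConnectionConfig();

// Writing: generate file encryption properties and attach them to the writer properties.
using var encryptionConfig = new EncryptionConfiguration("footer-key-id");
using var fileEncryptionProperties = cryptoFactory.GetFileEncryptionProperties(
    kmsConnectionConfig, encryptionConfig);
using var writerProperties = new WriterPropertiesBuilder()
    .Encryption(fileEncryptionProperties)
    .Build();

// Reading: generate file decryption properties and attach them to the reader properties.
using var decryptionConfig = new DecryptionConfiguration();
using var fileDecryptionProperties = cryptoFactory.GetFileDecryptionProperties(
    kmsConnectionConfig, decryptionConfig);
using var readerProperties = ReaderProperties.GetDefaultReaderProperties();
readerProperties.FileDecryptionProperties = fileDecryptionProperties;
```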

docs/guides/Nested.md

Lines changed: 5 additions & 5 deletions
@@ -7,7 +7,7 @@ but the Parquet format can be used to represent data with a complex nested struc
 
 In order to write a file with nested columns,
 we must define the Parquet file schema explicitly as a graph structure using schema nodes,
-rather than using ParquetSharp's `ParquetSharp.Column` type.
+rather than using ParquetSharp's @ParquetSharp.Column type.
 
 Imagine we have the following JSON object we would like to store as Parquet:
 
@@ -41,8 +41,8 @@ or we had a non-null object with a null `message` and null `ids`.
 Instead, we will represent this data in Parquet with a single
 `objects` column.
 
-In order to define the schema, we will be using `ParquetSharp.Schema.PrimitiveNode`
-and `ParquetSharp.Schema.GroupNode`.
+In order to define the schema, we will be using @ParquetSharp.Schema.PrimitiveNode
+and @ParquetSharp.Schema.GroupNode.
 
 In the Parquet schema, we have one one top-level group node named `objects`,
 which contains two nested fields, `ids` and `message`.
@@ -74,7 +74,7 @@ using var schema = new GroupNode(
 
 ### Writing data
 
-We can then create a `ParquetSharp.ParquetFileWriter` with this schema:
+We can then create a @ParquetSharp.ParquetFileWriter with this schema:
 
 ```csharp
 using var propertiesBuilder = new WriterPropertiesBuilder();
@@ -85,7 +85,7 @@ using var fileWriter = new ParquetFileWriter("objects.parquet", schema, writerPr
 
 When writing data to this file,
 the leaf-level values written must be nested within ParquetSharp's
-`ParquetSharp.Nested` type to indicate they are contained in a group,
+@ParquetSharp.Nested type to indicate they are contained in a group,
 and allow nullable nested structures to be represented unambiguously.
 
 For example, both the `objects` and `message` fields are optional,

docs/guides/PowerShell.md

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
 # ParquetSharp in PowerShell
 
-The main requirement to using ParquetSharp from PowerShell is that `ParquetSharpNative.dll` is in the `PATH` or in the same directory as `ParquetSharp.dll`. The following guide shows one possible approach to achieve this:
+The main requirement to using ParquetSharp from PowerShell is that @ParquetSharpNative.dll is in the `PATH` or in the same directory as @ParquetSharp.dll. The following guide shows one possible approach to achieve this:
 
 ### Installation
 
@@ -23,7 +23,7 @@ Copy-Item -Path ".\lib\System.Runtime.CompilerServices.Unsafe.4.5.3\lib\net461\S
 Copy-Item -Path ".\lib\System.ValueTuple.4.5.0\lib\net461\System.ValueTuple.dll" -Destination ".\bin"
 ```
 
-Finally, copy `ParquetSharp.dll` and `ParquetSharpNative.dll` into `bin`. This will depend on the current version of ParquetSharp, as well as your architecture and OS:
+Finally, copy @ParquetSharp.dll and @ParquetSharpNative.dll into `bin`. This will depend on the current version of ParquetSharp, as well as your architecture and OS:
 
 ```powershell
 # Replace path with the appropriate version of ParquetSharp
@@ -36,7 +36,7 @@ Copy-Item -Path ".\lib\ParquetSharp.12.1.0\runtimes\win-x64\native\ParquetSharpN
 The available runtime architectures are `win-x64`, `linux-x64`, `linux-arm64`, `osx-x64`, and `osx-arm64`.
 
 ### Usage
-Use `Add-Type` to load `ParquetSharp.dll`. Note that we're using custom directories:
+Use `Add-Type` to load @ParquetSharp.dll. Note that we're using custom directories:
 
 ```powershell
 # Replace path with the appropriate versions of ParquetSharp

docs/guides/Reading.md

Lines changed: 8 additions & 8 deletions
@@ -1,8 +1,8 @@
 # Reading Parquet files
 
-The low-level ParquetSharp API provides the `ParquetSharp.ParquetFileReader` class for reading Parquet files.
+The low-level ParquetSharp API provides the @ParquetSharp.ParquetFileReader class for reading Parquet files.
 This is usually constructed from a file path, but may also be constructed from a
-`ParquetSharp.IO.ManagedRandomAccessFile`, which wraps a .NET `System.IO.Stream` that supports seeking.
+@ParquetSharp.IO.ManagedRandomAccessFile, which wraps a .NET @System.IO.Stream that supports seeking.
 
 ```csharp
 using var fileReader = new ParquetFileReader("data.parquet");
@@ -15,7 +15,7 @@ using var fileReader = new ParquetFileReader(input);
 
 ### Obtaining file metadata
 
-The `ParquetSharp.FileMetaData` property of a `ParquetFileReader` exposes information about the Parquet file and its schema:
+The @ParquetSharp.FileMetaData property of a `ParquetFileReader` exposes information about the Parquet file and its schema:
 
 ```csharp
 int numColumns = fileReader.FileMetaData.NumColumns;
@@ -34,7 +34,7 @@ for (int columnIndex = 0; columnIndex < schema.NumColumns; ++columnIndex) {
 
 Parquet files store data in separate row groups, which all share the same schema,
 so if you wish to read all data in a file, you generally want to loop over all of the row groups
-and create a `ParquetSharp.RowGroupReader` for each one:
+and create a @ParquetSharp.RowGroupReader for each one:
 
 ```csharp
 for (int rowGroup = 0; rowGroup < fileReader.FileMetaData.NumRowGroups; ++rowGroup) {
@@ -45,10 +45,10 @@ for (int rowGroup = 0; rowGroup < fileReader.FileMetaData.NumRowGroups; ++rowGro
 
 ### Reading columns directly
 
-The `Column` method of `RowGroupReader` takes an integer column index and returns a `ParquetSharp.ColumnReader` object,
+The `Column` method of `RowGroupReader` takes an integer column index and returns a @ParquetSharp.ColumnReader object,
 which can read primitive values from the column, as well as raw definition level and repetition level data.
 Usually you will not want to use a `ColumnReader` directly, but instead call its `LogicalReader` method to
-create a `ParquetSharp.LogicalColumnReader` that can read logical values.
+create a @ParquetSharp.LogicalColumnReader that can read logical values.
 There are two variations of this `LogicalReader` method; the plain `LogicalReader` method returns an abstract
 `LogicalColumnReader`, whereas the generic `LogicalReader<TElement>` method returns a typed `LogicalColumnReader<TElement>`,
 which reads values of the specified element type.
@@ -96,7 +96,7 @@ When reading Timestamp to a DateTime, ParquetSharp sets the DateTimeKind based o
 
 If `IsAdjustedToUtc` is `true` the DateTimeKind will be set to `DateTimeKind.Utc` otherwise it will be set to `DateTimeKind.Unspecified`.
 
-This behavior can be overwritten by setting the AppContext switch `ParquetSharp.ReadDateTimeKindAsUnspecified` to `true`, so the DateTimeKind will be always set to `DateTimeKind.Unspecified` regardless of the value of `IsAdjustedToUtc`.
+This behavior can be overwritten by setting the AppContext switch @ParquetSharp.ReadDateTimeKindAsUnspecified to `true`, so the DateTimeKind will be always set to `DateTimeKind.Unspecified` regardless of the value of `IsAdjustedToUtc`.
 This also matches the old behavior of [ParquetSharp < 7.0.0](https://github.com/G-Research/ParquetSharp/pull/261)
 
 ```csharp
@@ -117,7 +117,7 @@ Some legacy implementations of Parquet write timestamps using the Int96 primitiv
 which has been [deprecated](https://issues.apache.org/jira/browse/PARQUET-323).
 ParquetSharp doesn't support reading Int96 values as .NET `DateTime`s
 as not all Int96 timestamp values are representable as a `DateTime`.
-However, there is limited support for reading raw Int96 values using the `ParquetSharp.Int96` type
+However, there is limited support for reading raw Int96 values using the @ParquetSharp.Int96 type
 and it is left to applications to decide how to interpret these values.
 
 ## Long path handling
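
Putting the pieces of this guide together, a small sketch of a full read loop; the column index and `int` element type are illustrative assumptions:

```csharp
using System;
using ParquetSharp;

// Open the file and walk every row group, reading column 0 as logical values.
using var fileReader = new ParquetFileReader("data.parquet");

for (int rowGroup = 0; rowGroup < fileReader.FileMetaData.NumRowGroups; ++rowGroup)
{
    using var rowGroupReader = fileReader.RowGroup(rowGroup);
    long numRows = rowGroupReader.MetaData.NumRows;

    // Column(i) returns a ColumnReader; LogicalReader<T> wraps it for typed reads.
    using var idReader = rowGroupReader.Column(0).LogicalReader<int>();
    int[] ids = idReader.ReadAll(checked((int) numRows));

    Console.WriteLine($"Row group {rowGroup}: read {ids.Length} values");
}

fileReader.Close();
```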

docs/guides/RowOriented.md

Lines changed: 2 additions & 2 deletions
@@ -70,8 +70,8 @@ using (var rowReader = ParquetFile.CreateRowReader<MyRow>("example.parquet"))
 
 ## Reading and writing custom types
 
-The row-oriented API supports reading and writing custom types by providing a `ParquetSharp.LogicalTypeFactory`
-and a `ParquetSharp.LogicalReadConverterFactory` or `ParquetSharp.LogicalWriteConverterFactory`.
+The row-oriented API supports reading and writing custom types by providing a @ParquetSharp.LogicalTypeFactory
+and a @ParquetSharp.LogicalReadConverterFactory or @ParquetSharp.LogicalWriteConverterFactory.
 
 ### Writing custom types
 
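For reference, a small sketch of the row-oriented API referred to above; the `MyRow` fields and `MapToColumn` names are hypothetical:

```csharp
using System;
using ParquetSharp.RowOriented;

internal struct MyRow
{
    [MapToColumn("id")]
    public int Id;

    [MapToColumn("value")]
    public float Value;
}

internal static class RowOrientedExample
{
    public static void Main()
    {
        var rows = new[] { new MyRow { Id = 1, Value = 1.5f }, new MyRow { Id = 2, Value = 2.5f } };

        // Write the rows to a new Parquet file, mapping fields to columns by attribute.
        using (var rowWriter = ParquetFile.CreateRowWriter<MyRow>("example.parquet"))
        {
            rowWriter.WriteRows(rows);
        }

        // Read the rows back, one row group at a time.
        using (var rowReader = ParquetFile.CreateRowReader<MyRow>("example.parquet"))
        {
            for (int rowGroup = 0; rowGroup < rowReader.FileMetaData.NumRowGroups; ++rowGroup)
            {
                foreach (var row in rowReader.ReadRows(rowGroup))
                {
                    Console.WriteLine($"{row.Id}: {row.Value}");
                }
            }
        }
    }
}
```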

docs/guides/TimeSpan.md

Lines changed: 2 additions & 2 deletions
@@ -110,6 +110,6 @@ Note that when using this approach, if you read the file back with
 ParquetSharp the data will be read as `long` values as there's no
 way to tell it was originally `TimeSpan` data.
 To read the data back as `TimeSpan`s, you'll also need to implement
-a custom `ParquetSharp.LogicalReadConverterFactory` and use the `LogicalReadOverride` method
-or provide a custom `ParquetSharp.LogicalTypeFactory`.
+a custom @ParquetSharp.LogicalReadConverterFactory and use the `LogicalReadOverride` method
+or provide a custom @ParquetSharp.LogicalTypeFactory.
 See the [type factories documentation](TypeFactories.md) for more details.
