This repository has been archived by the owner on Jul 1, 2023. It is now read-only.

Make ShapedArray.description's maxScalarCountPerLine user-customizable #1168

Open
xanderdunn opened this issue Dec 24, 2020 · 9 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@xanderdunn

xanderdunn commented Dec 24, 2020

Here is ShapedArray's fileprivate func description(indentLevel: Int, edgeElementCount: Int, maxScalarLength: Int, maxScalarCountPerLine: Int, summarizing: Bool) -> String.

Is there any reason this is marked fileprivate? It's currently accessible only via the public func description(lineWidth: Int = 80, edgeElementCount: Int = 3, summarizing: Bool = false), where the maxScalarCountPerLine is calculated for me:

let maxScalarCountPerLine = Swift.max(1, lineWidth / maxScalarLength)

Calculating the maxScalarCountPerLine independently for each Tensor leads to this problem:

=== Feature 0:
input:
[ -0.022060618,   0.024561103,  -0.025651768,   -0.04885944,   0.012175075,   0.006922609,    -0.0516627,
  -0.019092154,   0.024305645,  -0.028501112,  -0.047275346,   0.014285761,    0.00435431,  -0.052575804,
   -0.01609808,   0.023822624,  -0.031269953,   -0.04550122,   0.016227337,  0.0016748396,   -0.05326795,
  -0.013095862,    0.02311485,   -0.03394214,  -0.043547418,   0.017988473,  -0.001100175,  -0.053735107,
  -0.010103013,   0.022186458,  -0.036502086,          -0.0,           0.0, -0.0039545433,  -0.053974554]
output:
[       -0.0,         0.0,        -0.0,        -0.0,         0.0,        -0.0,        -0.0,        -0.0,        -0.0,
        -0.0,        -0.0,         0.0,         0.0,        -0.0,         0.0,        -0.0,        -0.0,         0.0,
         0.0,         0.0,        -0.0,        -0.0,         0.0,         0.0,         0.0,         0.0,        -0.0,
        -0.0,         0.0,         0.0,        -0.0,  -1.6888539, -0.99011576,         0.0,         0.0]
target:
[        -0.0,          0.0,         -0.0,         -0.0,          0.0,          0.0,         -0.0,         -0.0,
          0.0,         -0.0,         -0.0,          0.0,          0.0,         -0.0,         -0.0,          0.0,
         -0.0,         -0.0,          0.0,          0.0,         -0.0,         -0.0,          0.0,         -0.0,
         -0.0,          0.0,         -0.0,         -0.0,         -0.0,          0.0,         -0.0, -0.041425332,
  0.019558901,         -0.0,         -0.0]

Each Tensor has a different number of scalars per line (7 for input, 9 for output, 8 for target), so values at the same index land in different columns in each of the three descriptions. This makes it difficult to visually compare the values at the same position in each tensor.
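The column count comes directly from the two formulas quoted from swift-apis. The widest scalar description sets the cell width, and lineWidth is divided by that width. Below is a minimal sketch of that rule as standalone Swift, so it can run without TensorFlow. The function names widestScalarLength(of:) and scalarsPerLine(lineWidth:maxScalarLength:) are hypothetical; only the formulas come from the library.

```swift
/// Hypothetical helper: width of the widest scalar description, or 3 for an
/// empty array (the same fallback the swift-apis code uses).
func widestScalarLength(of scalars: [Float]) -> Int {
  return scalars.lazy.map { String(describing: $0).count }.max() ?? 3
}

/// Hypothetical helper: scalars per line for a given line width and cell width,
/// mirroring `Swift.max(1, lineWidth / maxScalarLength)`.
func scalarsPerLine(lineWidth: Int, maxScalarLength: Int) -> Int {
  return Swift.max(1, lineWidth / maxScalarLength)
}

// Same lineWidth, different data, different column counts.
let a: [Float] = [0.5, -0.25]  // widest description: "-0.25" (5 characters)
let b: [Float] = [-0.0, 1.0]   // widest description: "-0.0" (4 characters)
print(scalarsPerLine(lineWidth: 80, maxScalarLength: widestScalarLength(of: a)))  // 16
print(scalarsPerLine(lineWidth: 80, maxScalarLength: widestScalarLength(of: b)))  // 20
```

The counts in the example above (7, 9 and 8 per line) are consistent with widest descriptions of 13, 11 and 12 characters (-0.0039545433, -0.99011576 and -0.041425332) combined with lineWidth: 100: 100 / 13 = 7, 100 / 11 = 9 and 100 / 12 = 8. The default lineWidth of 80 would give 6, 7 and 6. The issue doesn't say which lineWidth was used, so treat 100 as an inference.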

I would like to be able to set the maximum number of scalars per line for each Tensor explicitly, so that the values are more readily comparable by eye.

@dan-zheng
Member

Here's the PR that added Tensor pretty-printing: swiftlang/swift#23837. The Swift implementation of ShapedArray.description is largely based on the PyTorch pretty-printing implementation, which is a simpler version of NumPy's: TF-419.

The original goal with Tensor pretty-printing was to closely match the output of PyTorch. Does your example print better in existing n-d array libraries, like NumPy or PyTorch? I wonder if PyTorch printing exposes enough knobs to achieve what you'd like to do, without using unreasonably unsafe or private APIs.

@xanderdunn
Author

xanderdunn commented Dec 24, 2020

Thanks @dan-zheng. I copied the above linked swift-apis code into my project:

extension String {
  /// Returns a string of the specified length, padded with whitespace to the left.
  func leftPadded(toLength length: Int) -> String {
    return repeatElement(" ", count: max(0, length - count)) + self
  }
}

public extension ShapedArray {

  func vectorDescription(
    indentLevel: Int,
    edgeElementCount: Int,
    maxScalarLength: Int,
    maxScalarCountPerLine: Int,
    summarizing: Bool
  ) -> String {
    // Get scalar descriptions.
    func scalarDescription(_ element: Element) -> String {
      let description = String(describing: element)
      return description.leftPadded(toLength: maxScalarLength)
    }

    var scalarDescriptions: [String] = []
    if summarizing && count > 2 * edgeElementCount {
      scalarDescriptions += prefix(edgeElementCount).map(scalarDescription)
      scalarDescriptions += ["..."]
      scalarDescriptions += suffix(edgeElementCount).map(scalarDescription)
    } else {
      scalarDescriptions += map(scalarDescription)
    }

    // Combine scalar descriptions into lines, based on the scalar count per line.
    let lines = stride(
      from: scalarDescriptions.startIndex,
      to: scalarDescriptions.endIndex,
      by: maxScalarCountPerLine
    ).map { i -> ArraySlice<String> in
      let upperBound = Swift.min(
        i.advanced(by: maxScalarCountPerLine),
        scalarDescriptions.count)
      return scalarDescriptions[i..<upperBound]
    }

    // Return lines joined with separators.
    let lineSeparator = ",\n" + String(repeating: " ", count: indentLevel + 1)
    return lines.enumerated().reduce(into: "[") { result, entry in
      let (i, line) = entry
      result += line.joined(separator: ", ")
      result += i != lines.count - 1 ? lineSeparator : ""
    } + "]"
  }

  func description(
    indentLevel: Int,
    edgeElementCount: Int,
    maxScalarLength: Int,
    maxScalarCountPerLine: Int,
    summarizing: Bool
  ) -> String {
    // Handle scalars.
    if let scalar = scalar {
      return String(describing: scalar)
    }

    // Handle vectors, which have special line-width-sensitive logic.
    if rank == 1 {
      return vectorDescription(
        indentLevel: indentLevel,
        edgeElementCount: edgeElementCount,
        maxScalarLength: maxScalarLength,
        maxScalarCountPerLine: maxScalarCountPerLine,
        summarizing: summarizing)
    }

    // Handle higher-rank tensors.
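    func elementDescription(_ element: Element) -> String {
      // The recursive call from swift-apis is disabled in this copy:
      //   element.description(
      //     indentLevel: indentLevel + 1,
      //     edgeElementCount: edgeElementCount,
      //     maxScalarLength: maxScalarLength,
      //     maxScalarCountPerLine: maxScalarCountPerLine,
      //     summarizing: summarizing)
      // Each row therefore falls back to the slice's default `description`,
      // so the custom arguments only take effect for rank-1 arrays.
      return element.description
    }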

    var elementDescriptions: [String] = []
    if summarizing && count > 2 * edgeElementCount {
      elementDescriptions += prefix(edgeElementCount).map(elementDescription)
      elementDescriptions += ["..."]
      elementDescriptions += suffix(edgeElementCount).map(elementDescription)
    } else {
      elementDescriptions += map(elementDescription)
    }

    // Return lines joined with separators.
    let lineSeparator =
      "," + String(repeating: "\n", count: rank - 1)
      + String(repeating: " ", count: indentLevel + 1)
    return elementDescriptions.enumerated().reduce(into: "[") { result, entry in
      let (i, elementDescription) = entry
      result += elementDescription
      result += i != elementDescriptions.count - 1 ? lineSeparator : ""
    } + "]"
  }
}

And this achieved what I wanted:

=== Variable 0:
input:
[-0.046002183, 0.015716469, 0.0024140144, -0.05310176, -0.013912535, 0.023329472, -0.033225544, -0.044096183,
  0.017527763, -0.00033658138,       -0.0,       -0.0,        0.0,       -0.0,       -0.0,        0.0,
  -0.0031709874, -0.05393206, -0.007940362, 0.021374973, -0.038286343, -0.03978186, 0.020576812, -0.006072663,
  -0.0540048, -0.005004768, 0.020078853, -0.04061756, -0.037398703, 0.021796772, -0.009024683, -0.053848244,
  -0.002125753, 0.01857983, -0.04279759]
output:
[      -0.0,       -0.0,        0.0,        0.0,       -0.0,        0.0,       -0.0,        0.0,
         0.0,       -0.0, -0.2371707,  1.7225978, -2.3474987, -1.0847256,  1.6712375,  0.3812195,
        -0.0,       -0.0,       -0.0,        0.0,       -0.0,       -0.0,       -0.0,        0.0,
        -0.0,        0.0,       -0.0,        0.0,        0.0,       -0.0,       -0.0,       -0.0,
         0.0,       -0.0,       -0.0]
target:
[      -0.0,        0.0,        0.0,       -0.0,       -0.0,        0.0,       -0.0,       -0.0,
         0.0,       -0.0, -0.053630464, -0.010915402, 0.022460626, -0.035817537, -0.042018704, 0.019151034,
        -0.0,       -0.0,       -0.0,        0.0,       -0.0,       -0.0,        0.0,       -0.0,
        -0.0,       -0.0,        0.0,       -0.0,       -0.0,        0.0,       -0.0,       -0.0,
        -0.0,        0.0,       -0.0]

The corresponding values are all lined up visually now. Given that it works as-is, I was wondering why it needed to be fileprivate.
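With a fixed maxScalarCountPerLine, the row breaks no longer depend on the data: scalar i always lands in row i / count, column i % count (integer division and remainder). Below is a standalone sketch of the rank-1 branch of the copied code, so it can be tried without TensorFlow. vectorDescription(_:indentLevel:maxScalarLength:maxScalarCountPerLine:) is a hypothetical free function over [Float], and the summarizing path is left out.

```swift
/// Left-pads `text` with spaces up to `length` characters; longer strings are returned unchanged.
func leftPadded(_ text: String, toLength length: Int) -> String {
  return String(repeating: " ", count: Swift.max(0, length - text.count)) + text
}

/// Hypothetical free-function version of the rank-1 branch (summarizing omitted).
func vectorDescription(
  _ scalars: [Float],
  indentLevel: Int = 0,
  maxScalarLength: Int,
  maxScalarCountPerLine: Int
) -> String {
  precondition(maxScalarCountPerLine > 0, "maxScalarCountPerLine must be positive")
  // Pad every scalar to the same width, as scalarDescription(_:) does.
  let cells = scalars.map { leftPadded(String(describing: $0), toLength: maxScalarLength) }
  // Group the cells into lines of at most maxScalarCountPerLine entries.
  let lines = stride(from: 0, to: cells.count, by: maxScalarCountPerLine).map { i in
    cells[i..<Swift.min(i + maxScalarCountPerLine, cells.count)].joined(separator: ", ")
  }
  // Continuation lines are indented by indentLevel + 1 so they sit under the opening "[".
  let lineSeparator = ",\n" + String(repeating: " ", count: indentLevel + 1)
  return "[" + lines.joined(separator: lineSeparator) + "]"
}

print(vectorDescription([1, -2, 3, -4, 5], maxScalarLength: 4, maxScalarCountPerLine: 2))
// [ 1.0, -2.0,
//   3.0, -4.0,
//   5.0]
```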

I'm not seeing any options in PyTorch's set_printoptions that would achieve this. Nothing stands out in the NumPy implementation that would achieve it either. No worries if this is outside the scope of the intended API.

@dan-zheng
Member

dan-zheng commented Dec 24, 2020

Nice, thanks for sharing your code snippet! Could you please complete the example by showing the invocation of ShapedArray.description(...) used to print the array contents?


I would describe your change as "adding maxScalarCountPerLine as a customizable argument to ShapedArray.description(...)" - would you agree with this more pointed description? If so, I might recommend changing the PR title to be more specific along those lines. Currently, the title sounds like "changing private API to be public", which sounds scarier.

Supporting this change tentatively sounds good to me (I haven't thought about it super hard). Do you have some intuition why maxScalarCountPerLine should be user-customizable instead of always computing it from other arguments (maxScalarLength, edgeElementCount)? Feel free to start a PR with tests for review!

@xanderdunn xanderdunn changed the title Make ShapedArray description(indentLevel:edgeElementCount:maxScalarLength:maxScalarCountPerLine:summarizing:) Public Add maxScalarCountPerLine to ShapedArray.description Dec 24, 2020
@xanderdunn xanderdunn changed the title Add maxScalarCountPerLine to ShapedArray.description Make ShapedArray.description's maxScalarCountPerLine user-customizable Dec 24, 2020
@xanderdunn
Author

Each ShapedArray was printed with:

print(myTensor[TensorRange.ellipsis, variableIndex].array.description(indentLevel: 0,
                                                                         edgeElementCount: 50,
                                                                         maxScalarLength: 10,
                                                                         maxScalarCountPerLine: 8,
                                                                         summarizing: true))

Oh yes, the idea of opening currently private API is incidental. The core idea is supporting maxScalarCountPerLine in ShapedArray's description. Thanks, I changed the title.

maxScalarCountPerLine should be user-customizable so that tensors can be printed with the same number of rows and columns, which makes different tensors visually comparable, as in my two examples above. As far as I can tell, maxScalarCountPerLine is currently a function of lineWidth and maxScalarLength:

public func description(
  lineWidth: Int = 80,
  edgeElementCount: Int = 3,
  summarizing: Bool = false
) -> String {
  let maxScalarLength = scalars.lazy.map { String(describing: $0).count }.max() ?? 3
  let maxScalarCountPerLine = Swift.max(1, lineWidth / maxScalarLength)
  // ...
}

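where only the lineWidth is user-customizable. Since maxScalarLength is derived from each tensor's own scalars, the same lineWidth can produce a different number of columns for each tensor. This makes it difficult or impossible to control how many columns are printed for a given tensor, which in turn makes it difficult to compare tensors visually.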
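Below is a sketch of what the proposal could look like, written as a standalone function over [Float] so it runs without TensorFlow. The name formattedDescription(of:lineWidth:maxScalarCountPerLine:) is hypothetical, and summarizing is left out. The point is that an optional maxScalarCountPerLine that defaults to nil keeps the existing lineWidth / maxScalarLength behavior for current callers.

```swift
/// Hypothetical shape of the proposed API, as a free function over [Float].
/// Passing nil for maxScalarCountPerLine keeps the existing derived rule.
func formattedDescription(
  of scalars: [Float],
  lineWidth: Int = 80,
  maxScalarCountPerLine: Int? = nil
) -> String {
  // Same cell-width rule as swift-apis: the widest scalar description, or 3 if empty.
  let maxScalarLength = scalars.lazy.map { String(describing: $0).count }.max() ?? 3
  // Use the caller's count if given, otherwise derive it from lineWidth; never below 1.
  let perLine = Swift.max(1, maxScalarCountPerLine ?? lineWidth / maxScalarLength)
  // Left-pad each scalar; the padding is never negative because maxScalarLength
  // is the maximum over these same descriptions.
  let cells = scalars.map { scalar -> String in
    let text = String(describing: scalar)
    return String(repeating: " ", count: maxScalarLength - text.count) + text
  }
  let lines = stride(from: 0, to: cells.count, by: perLine).map { i in
    cells[i..<Swift.min(i + perLine, cells.count)].joined(separator: ", ")
  }
  return "[" + lines.joined(separator: ",\n ") + "]"
}

print(formattedDescription(of: [1, 2, 3]))  // [1.0, 2.0, 3.0]
print(formattedDescription(of: [1, 2, 3], maxScalarCountPerLine: 2))
```

Clamping with Swift.max(1, ...) mirrors the existing rule. It also protects stride(from:to:by:), which traps on a stride of zero.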

This was born out of a desire to understand my loss values. Printing a snippet of my model's output alongside input and target allows me to understand how close the outputs really are to the targets in absolute value for a given valLoss.
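Fixing the column count aligns the rows, but cells in the same column can still drift when a scalar is wider than the maxScalarLength passed in. In the second input block above, descriptions such as -0.00033658138 exceed maxScalarLength: 10 and are left unpadded. One way to line up input, output and target exactly is to derive a single width from all of them. Below is a minimal sketch; sharedScalarLength(_:) and rows(_:width:perLine:) are hypothetical helpers, not swift-apis API.

```swift
/// Hypothetical helper: the widest scalar description across several arrays
/// (3 if there are no scalars at all, matching the swift-apis fallback).
func sharedScalarLength(_ arrays: [[Float]]) -> Int {
  return arrays.joined().lazy.map { String(describing: $0).count }.max() ?? 3
}

/// Hypothetical helper: pads every scalar to `width` and groups `perLine` scalars per row.
func rows(_ scalars: [Float], width: Int, perLine: Int) -> [String] {
  precondition(perLine > 0, "perLine must be positive")
  let cells = scalars.map { scalar -> String in
    let text = String(describing: scalar)
    return String(repeating: " ", count: Swift.max(0, width - text.count)) + text
  }
  return stride(from: 0, to: cells.count, by: perLine).map { i in
    cells[i..<Swift.min(i + perLine, cells.count)].joined(separator: ", ")
  }
}

let input: [Float] = [-0.125, 2.5, 10.0]
let target: [Float] = [0.0, -1.0, 2.0]
let width = sharedScalarLength([input, target])  // 6, from "-0.125"
for (name, values) in [("input", input), ("target", target)] {
  print(name + ":")
  for row in rows(values, width: width, perLine: 2) {
    print(row)
  }
}
```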

@dan-zheng
Member

Thanks for the context! Feel free to open a PR, if you'd like to upstream your func description(...) changes.

@dan-zheng dan-zheng added enhancement New feature or request help wanted Extra attention is needed labels Dec 24, 2020
@xanderdunn
Author

Thanks @dan-zheng, I'll plan to open a pull request with some unit tests after Christmas.

@dan-zheng
Member

Thank you! Take your time - Santa and we are patiently waiting for you after Xmas.

@brettkoonce
Contributor

🎅

@dan-zheng
Member

🎅

tbh GitHub needs to enable more than eight reaction emoji 🙎🏻‍♂️

Projects
None yet
Development

No branches or pull requests

3 participants