diff --git a/docs/_downloads/a607fe7aa48af0b6284b3555b090b9a2/fx_conv_bn_fuser.ipynb b/docs/_downloads/a607fe7aa48af0b6284b3555b090b9a2/fx_conv_bn_fuser.ipynb index c56d02626..ceb07eeb9 100644 --- a/docs/_downloads/a607fe7aa48af0b6284b3555b090b9a2/fx_conv_bn_fuser.ipynb +++ b/docs/_downloads/a607fe7aa48af0b6284b3555b090b9a2/fx_conv_bn_fuser.ipynb @@ -15,14 +15,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\n(beta) Building a Convolution/Batch Norm fuser in FX\n*******************************************************\n**Author**: `Horace He `_\n\nIn this tutorial, we are going to use FX, a toolkit for composable function\ntransformations of PyTorch, to do the following:\n\n1) Find patterns of conv/batch norm in the data dependencies.\n2) For the patterns found in 1), fold the batch norm statistics into the convolution weights.\n\nNote that this optimization only works for models in inference mode (i.e. `mode.eval()`)\n\nWe will be building the fuser that exists here:\nhttps://github.com/pytorch/pytorch/blob/orig/release/1.8/torch/fx/experimental/fuser.py\n" + "\n(\ubca0\ud0c0) FX\uc5d0\uc11c \ud569\uc131\uacf1/\ubc30\uce58 \uc815\uaddc\ud654(Convolution/Batch Norm) \uacb0\ud569\uae30(Fuser) \ub9cc\ub4e4\uae30\n****************************************************************************\n**\uc800\uc790**: `Horace He `_\n\n**\ubc88\uc5ed:** `\uc624\ucc2c\ud76c `_\n\n\uc774 \ud29c\ud1a0\ub9ac\uc5bc\uc5d0\uc11c\ub294 PyTorch\uc758 \uad6c\uc131 \uac00\ub2a5\ud55c \ud568\uc218\uc758 \ubcc0\ud658\uc744 \uc704\ud55c \ud234\ud0b7\uc778 FX\ub97c \uc0ac\uc6a9\ud558\uc5ec \ub2e4\uc74c\uc744 \uc218\ud589\ud558\uace0\uc790 \ud569\ub2c8\ub2e4.\n\n1) \ub370\uc774\ud130 \uc758\uc874\uc131\uc5d0\uc11c \ud569\uc131\uacf1/\ubc30\uce58 \uc815\uaddc\ud654 \ud328\ud134\uc744 \ucc3e\uc2b5\ub2c8\ub2e4.\n2) 1\ubc88\uc5d0\uc11c \ubc1c\uacac\ub41c \ud328\ud134\uc758 \uacbd\uc6b0 \ubc30\uce58 \uc815\uaddc\ud654 \ud1b5\uacc4\ub97c \ud569\uc131\uacf1 \uac00\uc911\uce58\ub85c \uacb0\ud569\ud569\ub2c8\ub2e4(folding).\n\n\uc774 \ucd5c\uc801\ud654\ub294 \ucd94\ub860 \ubaa8\ub4dc(\uc989, `mode.eval()`)\uc758 \ubaa8\ub378\uc5d0\ub9cc \uc801\uc6a9\ub41c\ub2e4\ub294 \uc810\uc5d0 \uc720\uc758\ud558\uc138\uc694.\n\n\ub2e4\uc74c \ub9c1\ud06c\uc5d0 \uc788\ub294 \uacb0\ud569\uae30\ub97c \ub9cc\ub4e4 \uac83\uc785\ub2c8\ub2e4.\nhttps://github.com/pytorch/pytorch/blob/orig/release/1.8/torch/fx/experimental/fuser.py\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "First, let's get some imports out of the way (we will be using all\nof these later in the code).\n\n" + "\uba87 \uac00\uc9c0\uc758 import \uacfc\uc815\uc744 \uba3c\uc800 \ucc98\ub9ac\ud574\uc90d\uc2dc\ub2e4(\ub098\uc911\uc5d0 \ucf54\ub4dc\uc5d0\uc11c \ubaa8\ub450 \uc0ac\uc6a9\ud560 \uac83\uc785\ub2c8\ub2e4).\n\n" ] }, { @@ -40,7 +40,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "For this tutorial, we are going to create a model consisting of convolutions\nand batch norms. 
Note that this model has some tricky components - some of\nthe conv/batch norm patterns are hidden within Sequentials and one of the\nBatchNorms is wrapped in another Module.\n\n" + "\uc774 \ud29c\ud1a0\ub9ac\uc5bc\uc5d0\uc11c\ub294 \ud569\uc131\uacf1\uacfc \ubc30\uce58 \uc815\uaddc\ud654\ub85c \uad6c\uc131\ub41c \ubaa8\ub378\uc744 \ub9cc\ub4e4 \uac83\uc785\ub2c8\ub2e4.\n\uc774 \ubaa8\ub378\uc5d0\ub294 \uc544\ub798\uc640 \uac19\uc740 \uae4c\ub2e4\ub85c\uc6b4 \uc694\uc18c\uac00 \uc788\uc2b5\ub2c8\ub2e4.\n\ud569\uc131\uacf1/\ubc30\uce58 \uc815\uaddc\ud654 \ud328\ud134 \uc911\uc758 \uc77c\ubd80\ub294 \uc2dc\ud000\uc2a4\uc5d0 \uc228\uaca8\uc838 \uc788\uace0\n\ubc30\uce58 \uc815\uaddc\ud654 \uc911 \ud558\ub098\ub294 \ub2e4\ub978 \ubaa8\ub4c8\ub85c \uac10\uc2f8\uc838 \uc788\uc2b5\ub2c8\ub2e4.\n\n" ] }, { @@ -58,7 +58,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Fusing Convolution with Batch Norm\n-----------------------------------------\nOne of the primary challenges with trying to automatically fuse convolution\nand batch norm in PyTorch is that PyTorch does not provide an easy way of\naccessing the computational graph. FX resolves this problem by symbolically\ntracing the actual operations called, so that we can track the computations\nthrough the `forward` call, nested within Sequential modules, or wrapped in\nan user-defined module.\n\n" + "\ud569\uc131\uacf1\uacfc \ubc30\uce58 \uc815\uaddc\ud654 \uacb0\ud569\ud558\uae30\n-----------------------------\nPyTorch\uc5d0\uc11c \ud569\uc131\uacf1\uacfc \ubc30\uce58 \uc815\uaddc\ud654\ub97c \uc790\ub3d9\uc73c\ub85c \uacb0\ud569\ud558\ub824\uace0 \ud560 \ub54c \uac00\uc7a5 \ud070 \uc5b4\ub824\uc6c0 \uc911 \ud558\ub098\ub294\nPyTorch\uac00 \uacc4\uc0b0 \uadf8\ub798\ud504\uc5d0 \uc27d\uac8c \uc811\uadfc\ud560 \uc218 \uc788\ub294 \ubc29\ubc95\uc744 \uc81c\uacf5\ud558\uc9c0 \uc54a\ub294\ub2e4\ub294 \uac83\uc785\ub2c8\ub2e4.\nFX\ub294 \ud638\ucd9c\ub41c \uc2e4\uc81c \uc5f0\uc0b0\uc744 \uae30\ud638\uc801(symbolically)\uc73c\ub85c \ucd94\uc801\ud558\uc5ec \uc774 \ubb38\uc81c\ub97c \ud574\uacb0\ud558\ubbc0\ub85c \uc21c\ucc28\uc801 \ubaa8\ub4c8 \ub0b4\uc5d0 \uc911\ucca9\ub418\uac70\ub098\n\uc0ac\uc6a9\uc790 \uc815\uc758 \ubaa8\ub4c8\ub85c \uac10\uc2f8\uc9c4 `forward` \ud638\ucd9c\uc744 \ud1b5\ud574 \uacc4\uc0b0\uc744 \ucd94\uc801\ud560 \uc218 \uc788\uc2b5\ub2c8\ub2e4.\n\n" ] }, { @@ -76,14 +76,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This gives us a graph representation of our model. Note that both the modules\nhidden within the sequential as well as the wrapped Module have been inlined\ninto the graph. This is the default level of abstraction, but it can be\nconfigured by the pass writer. 
More information can be found at the FX\noverview https://pytorch.org/docs/master/fx.html#module-torch.fx\n\n" + "\uc774\ub807\uac8c \ud558\uba74 \ubaa8\ub378\uc744 \uadf8\ub798\ud504\ub85c \ub098\ud0c0\ub0bc \uc218 \uc788\uc2b5\ub2c8\ub2e4.\n\uc21c\ucc28\uc801 \ubaa8\ub4c8 \ubc0f \uac10\uc2f8\uc9c4 \ubaa8\ub4c8 \ub0b4\uc5d0 \uc228\uaca8\uc9c4 \ub450 \ubaa8\ub4c8\uc774 \ubaa8\ub450 \uadf8\ub798\ud504\uc5d0 \uc0bd\uc785\ub418\uc5b4 \uc788\uc2b5\ub2c8\ub2e4.\n\uc774\ub294 \uae30\ubcf8 \ucd94\uc0c1\ud654 \uc218\uc900\uc774\uc9c0\ub9cc \uc804\ub2ec \uae30\ub85d\uae30(pass writer)\uc5d0\uc11c \uad6c\uc131\ud560 \uc218 \uc788\uc2b5\ub2c8\ub2e4.\n\uc790\uc138\ud55c \ub0b4\uc6a9\uc740 \ub2e4\uc74c \ub9c1\ud06c\uc758 FX \uac1c\uc694\uc5d0\uc11c \ud655\uc778\ud560 \uc218 \uc788\uc2b5\ub2c8\ub2e4.\nhttps://pytorch.org/docs/master/fx.html#module-torch.fx\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Fusing Convolution with Batch Norm\n----------------------------------\nUnlike some other fusions, fusion of convolution with batch norm does not\nrequire any new operators. Instead, as batch norm during inference\nconsists of a pointwise add and multiply, these operations can be \"baked\"\ninto the preceding convolution's weights. This allows us to remove the batch\nnorm entirely from our model! Read\nhttps://nenadmarkus.com/p/fusing-batchnorm-and-conv/ for further details. The\ncode here is copied from\nhttps://github.com/pytorch/pytorch/blob/orig/release/1.8/torch/nn/utils/fusion.py\nclarity purposes.\n\n" + "\ud569\uc131\uacf1\uacfc \ubc30\uce58 \uc815\uaddc\ud654 \uacb0\ud569\ud558\uae30\n---------------------------\n\uc77c\ubd80 \ub2e4\ub978 \uacb0\ud569\uacfc \ub2ec\ub9ac, \ud569\uc131\uacf1\uacfc \ubc30\uce58 \uc815\uaddc\ud654\uc758 \uacb0\ud569\uc740 \uc0c8\ub85c\uc6b4 \uc5f0\uc0b0\uc790\ub97c \ud544\uc694\ub85c \ud558\uc9c0 \uc54a\uc2b5\ub2c8\ub2e4.\n\ub300\uc2e0, \ucd94\ub860 \uc911 \ubc30\uce58 \uc815\uaddc\ud654\ub294 \uc810\ubcc4 \ub367\uc148\uacfc \uacf1\uc148\uc73c\ub85c \uad6c\uc131\ub418\ubbc0\ub85c,\n\uc774\ub7ec\ud55c \uc5f0\uc0b0\uc740 \uc774\uc804 \ud569\uc131\uacf1\uc758 \uac00\uc911\uce58\ub85c \"\ubbf8\ub9ac \uacc4\uc0b0\ub418\uc5b4 \uc800\uc7a5(baked)\" \ub420 \uc218 \uc788\uc2b5\ub2c8\ub2e4.\n\uc774\ub97c \ud1b5\ud574 \ubc30\uce58 \uc815\uaddc\ud654\ub97c \ubaa8\ub378\uc5d0\uc11c \uc644\uc804\ud788 \uc81c\uac70\ud560 \uc218 \uc788\uc2b5\ub2c8\ub2e4!\n\uc790\uc138\ud55c \ub0b4\uc6a9\uc740 \ub2e4\uc74c \ub9c1\ud06c\uc5d0\uc11c \ud655\uc778 \ud560 \uc218 \uc788\uc2b5\ub2c8\ub2e4.\nhttps://nenadmarkus.com/p/fusing-batchnorm-and-conv/\n\uc774 \ucf54\ub4dc\ub294 \uba85\ud655\uc131\uc744 \uc704\ud574 \ub2e4\uc74c \ub9c1\ud06c\uc5d0\uc11c \ubcf5\uc0ac\ud55c \uac83\uc785\ub2c8\ub2e4.\nhttps://github.com/pytorch/pytorch/blob/orig/release/1.8/torch/nn/utils/fusion.py\n\n" ] }, { @@ -94,14 +94,14 @@ }, "outputs": [], "source": [ - "def fuse_conv_bn_eval(conv, bn):\n \"\"\"\n Given a conv Module `A` and an batch_norm module `B`, returns a conv\n module `C` such that C(x) == B(A(x)) in inference mode.\n \"\"\"\n assert(not (conv.training or bn.training)), \"Fusion only for eval!\"\n fused_conv = copy.deepcopy(conv)\n\n fused_conv.weight, fused_conv.bias = \\\n fuse_conv_bn_weights(fused_conv.weight, fused_conv.bias,\n bn.running_mean, bn.running_var, bn.eps, bn.weight, bn.bias)\n\n return fused_conv\n\ndef fuse_conv_bn_weights(conv_w, conv_b, bn_rm, bn_rv, bn_eps, bn_w, bn_b):\n if conv_b is None:\n conv_b = torch.zeros_like(bn_rm)\n if bn_w is None:\n bn_w = 
torch.ones_like(bn_rm)\n if bn_b is None:\n bn_b = torch.zeros_like(bn_rm)\n bn_var_rsqrt = torch.rsqrt(bn_rv + bn_eps)\n\n conv_w = conv_w * (bn_w * bn_var_rsqrt).reshape([-1] + [1] * (len(conv_w.shape) - 1))\n conv_b = (conv_b - bn_rm) * bn_var_rsqrt * bn_w + bn_b\n\n return torch.nn.Parameter(conv_w), torch.nn.Parameter(conv_b)" + "def fuse_conv_bn_eval(conv, bn):\n \"\"\"\n \ud569\uc131\uacf1 \ubaa8\ub4c8 'A'\uc640 \ubc30\uce58 \uc815\uaddc\ud654 \ubaa8\ub4c8 'B'\uac00 \uc8fc\uc5b4\uc9c0\uba74\n C(x) == B(A(x))\ub97c \ub9cc\uc871\ud558\ub294 \ud569\uc131\uacf1 \ubaa8\ub4c8 'C'\ub97c \ucd94\ub860 \ubaa8\ub4dc\ub85c \ubc18\ud658\ud569\ub2c8\ub2e4.\n \"\"\"\n assert(not (conv.training or bn.training)), \"Fusion only for eval!\"\n fused_conv = copy.deepcopy(conv)\n\n fused_conv.weight, fused_conv.bias = \\\n fuse_conv_bn_weights(fused_conv.weight, fused_conv.bias,\n bn.running_mean, bn.running_var, bn.eps, bn.weight, bn.bias)\n\n return fused_conv\n\ndef fuse_conv_bn_weights(conv_w, conv_b, bn_rm, bn_rv, bn_eps, bn_w, bn_b):\n if conv_b is None:\n conv_b = torch.zeros_like(bn_rm)\n if bn_w is None:\n bn_w = torch.ones_like(bn_rm)\n if bn_b is None:\n bn_b = torch.zeros_like(bn_rm)\n bn_var_rsqrt = torch.rsqrt(bn_rv + bn_eps)\n\n conv_w = conv_w * (bn_w * bn_var_rsqrt).reshape([-1] + [1] * (len(conv_w.shape) - 1))\n conv_b = (conv_b - bn_rm) * bn_var_rsqrt * bn_w + bn_b\n\n return torch.nn.Parameter(conv_w), torch.nn.Parameter(conv_b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "FX Fusion Pass\n----------------------------------\nNow that we have our computational graph as well as a method for fusing\nconvolution and batch norm, all that remains is to iterate over the FX graph\nand apply the desired fusions.\n\n" + "FX \uacb0\ud569 \uc804\ub2ec(pass)\n--------------\n\uc774\uc81c \ud569\uc131\uacf1\uacfc \ubc30\uce58 \uc815\uaddc\ud654\ub97c \uacb0\ud569\ud558\ub294 \ubc29\ubc95\ubfd0\ub9cc \uc544\ub2c8\ub77c \uacc4\uc0b0 \uadf8\ub798\ud504\ub3c4 \uc5bb\uc5c8\uc73c\ubbc0\ub85c\n\ub0a8\uc740 \uac83\uc740 FX \uadf8\ub798\ud504\uc5d0 \uc808\ucc28\ub97c \ubc18\ubcf5\ud558\uace0 \uc6d0\ud558\ub294 \uacb0\ud569\uc744 \uc801\uc6a9\ud558\ub294 \uac83\uc785\ub2c8\ub2e4.\n\n" ] }, { @@ -112,21 +112,21 @@ }, "outputs": [], "source": [ - "def _parent_name(target : str) -> Tuple[str, str]:\n \"\"\"\n Splits a qualname into parent path and last atom.\n For example, `foo.bar.baz` -> (`foo.bar`, `baz`)\n \"\"\"\n *parent, name = target.rsplit('.', 1)\n return parent[0] if parent else '', name\n\ndef replace_node_module(node: fx.Node, modules: Dict[str, Any], new_module: torch.nn.Module):\n assert(isinstance(node.target, str))\n parent_name, name = _parent_name(node.target)\n setattr(modules[parent_name], name, new_module)\n\n\ndef fuse(model: torch.nn.Module) -> torch.nn.Module:\n model = copy.deepcopy(model)\n # The first step of most FX passes is to symbolically trace our model to\n # obtain a `GraphModule`. This is a representation of our original model\n # that is functionally identical to our original model, except that we now\n # also have a graph representation of our forward pass.\n fx_model: fx.GraphModule = fx.symbolic_trace(model)\n modules = dict(fx_model.named_modules())\n\n # The primary representation for working with FX are the `Graph` and the\n # `Node`. Each `GraphModule` has a `Graph` associated with it - this\n # `Graph` is also what generates `GraphModule.code`.\n # The `Graph` itself is represented as a list of `Node` objects. 
Thus, to\n # iterate through all of the operations in our graph, we iterate over each\n # `Node` in our `Graph`.\n for node in fx_model.graph.nodes:\n # The FX IR contains several types of nodes, which generally represent\n # call sites to modules, functions, or methods. The type of node is\n # determined by `Node.op`.\n if node.op != 'call_module': # If our current node isn't calling a Module then we can ignore it.\n continue\n # For call sites, `Node.target` represents the module/function/method\n # that's being called. Here, we check `Node.target` to see if it's a\n # batch norm module, and then check `Node.args[0].target` to see if the\n # input `Node` is a convolution.\n if type(modules[node.target]) is nn.BatchNorm2d and type(modules[node.args[0].target]) is nn.Conv2d:\n if len(node.args[0].users) > 1: # Output of conv is used by other nodes\n continue\n conv = modules[node.args[0].target]\n bn = modules[node.target]\n fused_conv = fuse_conv_bn_eval(conv, bn)\n replace_node_module(node.args[0], modules, fused_conv)\n # As we've folded the batch nor into the conv, we need to replace all uses\n # of the batch norm with the conv.\n node.replace_all_uses_with(node.args[0])\n # Now that all uses of the batch norm have been replaced, we can\n # safely remove the batch norm.\n fx_model.graph.erase_node(node)\n fx_model.graph.lint()\n # After we've modified our graph, we need to recompile our graph in order\n # to keep the generated code in sync.\n fx_model.recompile()\n return fx_model" + "def _parent_name(target : str) -> Tuple[str, str]:\n \"\"\"\n \uc815\uaddc\ud654 \ub41c \uc774\ub984(qualname)\uc744 \ubd80\ubaa8\uacbd\ub85c(parent path)\uc640 \ub9c8\uc9c0\ub9c9 \uc694\uc18c(last atom)\ub85c \ub098\ub220\uc90d\ub2c8\ub2e4.\n \uc608\ub97c \ub4e4\uc5b4, `foo.bar.baz` -> (`foo.bar`, `baz`)\n \"\"\"\n *parent, name = target.rsplit('.', 1)\n return parent[0] if parent else '', name\n\ndef replace_node_module(node: fx.Node, modules: Dict[str, Any], new_module: torch.nn.Module):\n assert(isinstance(node.target, str))\n parent_name, name = _parent_name(node.target)\n setattr(modules[parent_name], name, new_module)\n\n\ndef fuse(model: torch.nn.Module) -> torch.nn.Module:\n model = copy.deepcopy(model)\n # \ub300\ubd80\ubd84\uc758 FX \uc804\ub2ec\uc758 \uccab \ubc88\uc9f8 \ub2e8\uacc4\ub294 `GraphModule` \uc744 \uc5bb\uae30 \uc704\ud574\n # \ubaa8\ub378\uc744 \uae30\ud638\uc801\uc73c\ub85c \ucd94\uc801\ud558\ub294 \uac83\uc785\ub2c8\ub2e4.\n # \uc774\uac83\uc740 \uc6d0\ub798 \ubaa8\ub378\uacfc \uae30\ub2a5\uc801\uc73c\ub85c \ub3d9\uc77c\ud55c \uc6d0\ub798 \ubaa8\ub378\uc758 \ud45c\ud604\uc785\ub2c8\ub2e4.\n # \ub2e8, \uc774\uc81c\ub294 \uc21c\uc804\ud30c \ub2e8\uacc4(forward pass)\uc5d0 \ub300\ud55c \uadf8\ub798\ud504 \ud45c\ud604\ub3c4 \uac00\uc9c0\uace0 \uc788\uc2b5\ub2c8\ub2e4.\n fx_model: fx.GraphModule = fx.symbolic_trace(model)\n modules = dict(fx_model.named_modules())\n\n # FX \uc791\uc5c5\uc744 \uc704\ud55c \uae30\ubcf8 \ud45c\ud604\uc740 `\uadf8\ub798\ud504(Graph)` \uc640 `\ub178\ub4dc(Node)` \uc785\ub2c8\ub2e4.\n # \uac01 `GraphModule` \uc5d0\ub294 \uc5f0\uad00\ub41c `\uadf8\ub798\ud504` \uac00 \uc788\uc2b5\ub2c8\ub2e4.\n # \uc774 `\uadf8\ub798\ud504` \ub294 `GraphModule.code` \ub97c \uc0dd\uc131\ud558\ub294 \uac83\uc774\uae30\ub3c4 \ud569\ub2c8\ub2e4.\n # `\uadf8\ub798\ud504` \uc790\uccb4\ub294 `\ub178\ub4dc` \uac1d\uccb4\uc758 \ubaa9\ub85d\uc73c\ub85c \ud45c\uc2dc\ub429\ub2c8\ub2e4.\n # \ub530\ub77c\uc11c \uadf8\ub798\ud504\uc758 \ubaa8\ub4e0 \uc791\uc5c5\uc744 
\ubc18\ubcf5\ud558\uae30 \uc704\ud574 `\uadf8\ub798\ud504` \uc5d0\uc11c \uac01 `\ub178\ub4dc` \uc5d0 \ub300\ud574 \ubc18\ubcf5\ud569\ub2c8\ub2e4.\n for node in fx_model.graph.nodes:\n # FX IR \uc5d0\ub294 \uc77c\ubc18\uc801\uc73c\ub85c \ubaa8\ub4c8, \ud568\uc218 \ub610\ub294 \uba54\uc18c\ub4dc\uc5d0 \ub300\ud55c\n # \ud638\ucd9c \uc0ac\uc774\ud2b8\ub97c \ub098\ud0c0\ub0b4\ub294 \uc5ec\ub7ec \uc720\ud615\uc758 \ub178\ub4dc\uac00 \uc788\uc2b5\ub2c8\ub2e4.\n # \ub178\ub4dc\uc758 \uc720\ud615\uc740 `Node.op` \uc5d0 \uc758\ud574 \uacb0\uc815\ub429\ub2c8\ub2e4.\n if node.op != 'call_module': # \ud604\uc7ac \ub178\ub4dc\uac00 \ubaa8\ub4c8\uc744 \ud638\ucd9c\ud558\uc9c0 \uc54a\uc73c\uba74 \ubb34\uc2dc\ud560 \uc218 \uc788\uc2b5\ub2c8\ub2e4.\n continue\n # \ud638\ucd9c \uc0ac\uc774\ud2b8\uc758 \uacbd\uc6b0, `Node.target` \uc740 \ud638\ucd9c\ub418\ub294 \ubaa8\ub4c8/\ud568\uc218/\ubc29\ubc95\uc744 \ub098\ud0c0\ub0c5\ub2c8\ub2e4.\n # \uc5ec\uae30\uc11c\ub294 'Node.target' \uc744 \ud655\uc778\ud558\uc5ec \ubc30\uce58 \uc815\uaddc\ud654 \ubaa8\ub4c8\uc778\uc9c0 \ud655\uc778\ud55c \ub2e4\uc74c\n # `Node.args[0].target` \uc744 \ud655\uc778\ud558\uc5ec \uc785\ub825 `\ub178\ub4dc` \uac00 \ud569\uc131\uacf1\uc778\uc9c0 \ud655\uc778\ud569\ub2c8\ub2e4.\n if type(modules[node.target]) is nn.BatchNorm2d and type(modules[node.args[0].target]) is nn.Conv2d:\n if len(node.args[0].users) > 1: # \ud569\uc131\uacf1 \ucd9c\ub825\uc740 \ub2e4\ub978 \ub178\ub4dc\uc5d0\uc11c \uc0ac\uc6a9\ub429\ub2c8\ub2e4.\n continue\n conv = modules[node.args[0].target]\n bn = modules[node.target]\n fused_conv = fuse_conv_bn_eval(conv, bn)\n replace_node_module(node.args[0], modules, fused_conv)\n # \ubc30\uce58 \uc815\uaddc\ud654\ub97c \ud569\uc131\uacf1\uc73c\ub85c \uacb0\ud569\ud588\uae30 \ub54c\ubb38\uc5d0\n # \ubc30\uce58 \uc815\uaddc\ud654\uc758 \uc0ac\uc6a9\uc744 \ubaa8\ub450 \ud569\uc131\uacf1\uc73c\ub85c \uad50\uccb4\ud574\uc57c \ud569\ub2c8\ub2e4.\n node.replace_all_uses_with(node.args[0])\n # \ubc30\uce58 \uc815\uaddc\ud654 \uc0ac\uc6a9\uc744 \ubaa8\ub450 \uad50\uccb4\ud588\uc73c\ubbc0\ub85c\n # \uc548\uc804\ud558\uac8c \ubc30\uce58 \uc815\uaddc\ud654\ub97c \uc81c\uac70\ud560 \uc218 \uc788\uc2b5\ub2c8\ub2e4.\n fx_model.graph.erase_node(node)\n fx_model.graph.lint()\n # \uadf8\ub798\ud504\ub97c \uc218\uc815\ud55c \ud6c4\uc5d0\ub294 \uc0dd\uc131\ub41c \ucf54\ub4dc\ub97c \ub3d9\uae30\ud654\ud558\uae30 \uc704\ud574 \uadf8\ub798\ud504\ub97c \ub2e4\uc2dc \ucef4\ud30c\uc77c\ud574\uc57c \ud569\ub2c8\ub2e4.\n fx_model.recompile()\n return fx_model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "

Note

We make some simplifications here for demonstration purposes, such as only\n matching 2D convolutions. View\n https://github.com/pytorch/pytorch/blob/master/torch/fx/experimental/fuser.py\n for a more usable pass.

\n\n" + "

Note

\uc5ec\uae30\uc11c\ub294 2D \ud569\uc131\uacf1\ub9cc \uc77c\uce58\uc2dc\ud0a4\ub294 \ub4f1 \uc2dc\uc5f0 \ubaa9\uc801\uc73c\ub85c \uc57d\uac04\uc758 \ub2e8\uc21c\ud654\ub97c \ud558\uc600\uc2b5\ub2c8\ub2e4.\n \ub354 \uc720\uc6a9\ud55c \uc804\ub2ec\uc740 \ub2e4\uc74c \ub9c1\ud06c\ub97c \ucc38\uc870\ud558\uc2ed\uc2dc\uc624.\n https://github.com/pytorch/pytorch/blob/master/torch/fx/experimental/fuser.py

\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Testing out our Fusion Pass\n-----------------------------------------\nWe can now run this fusion pass on our initial toy model and verify that our\nresults are identical. In addition, we can print out the code for our fused\nmodel and verify that there are no more batch norms.\n\n" + "\uacb0\ud569 \uc804\ub2ec(Fusion pass) \uc2e4\ud5d8\ud558\uae30\n--------------------------------\n\uc774\uc81c \uc544\uc8fc \uc791\uc740 \ucd08\uae30 \ubaa8\ub378\uc5d0 \ub300\ud574 \uc774 \uacb0\ud569 \uc804\ub2ec\uc744 \uc2e4\ud589\ud574 \uacb0\uacfc\uac00 \ub3d9\uc77c\ud55c\uc9c0 \ud655\uc778\ud560 \uc218 \uc788\uc2b5\ub2c8\ub2e4.\n\ub610\ud55c \uacb0\ud569 \ubaa8\ub378\uc758 \ucf54\ub4dc\ub97c \ucd9c\ub825\ud558\uc5ec \ub354 \uc774\uc0c1 \ubc30\uce58 \uc815\uaddc\ud654\uac00 \uc5c6\ub294\uc9c0 \ud655\uc778\ud560 \uc218 \uc788\uc2b5\ub2c8\ub2e4.\n\n" ] }, { @@ -144,7 +144,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Benchmarking our Fusion on ResNet18\n----------\nWe can test our fusion pass on a larger model like ResNet18 and see how much\nthis pass improves inference performance.\n\n" + "ResNet18\uc5d0\uc11c \uacb0\ud569 \ubca4\uce58\ub9c8\ud0b9\ud558\uae30\n------------------------------\n\uc774\uc81c ResNet18\uacfc \uac19\uc740 \ub300\ud615 \ubaa8\ub378\uc5d0\uc11c \uacb0\ud569 \uc804\ub2ec\uc744 \uc2e4\ud5d8\ud558\uace0\n\uc774 \uc804\ub2ec\uc774 \ucd94\ub860 \uc131\ub2a5\uc744 \uc5bc\ub9c8\ub098 \ud5a5\uc0c1\uc2dc\ud0a4\ub294\uc9c0 \ud655\uc778\ud560 \uc218 \uc788\uc2b5\ub2c8\ub2e4.\n\n" ] }, { @@ -162,7 +162,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "As we previously saw, the output of our FX transformation is\n(Torchscriptable) PyTorch code, we can easily `jit.script` the output to try\nand increase our performance even more. In this way, our FX model\ntransformation composes with Torchscript with no issues.\n\n" + "\uc55e\uc11c \uc0b4\ud3b4\ubcf8 \ubc14\uc640 \uac19\uc774, FX \ubcc0\ud658\uc758 \ucd9c\ub825\uc740 (Torchscriptable) PyTorch \ucf54\ub4dc\uc785\ub2c8\ub2e4.\n\ub530\ub77c\uc11c `jit.script` \ub97c \ud1b5\ud574 \uc27d\uac8c \ucd9c\ub825\ud558\uc5ec \uc131\ub2a5\uc744 \ub354 \ub192\uc77c \uc218 \uc788\uc2b5\ub2c8\ub2e4.\n\uc774\ub7ec\ud55c \ubc29\uc2dd\uc73c\ub85c FX \ubaa8\ub378 \ubcc0\ud658\uc740 Torchscript\uc640 \uc544\ubb34\ub7f0 \ubb38\uc81c \uc5c6\uc774 \uad6c\uc131\ub429\ub2c8\ub2e4.\n\n" ] }, { @@ -173,7 +173,7 @@ }, "outputs": [], "source": [ - "jit_rn18 = torch.jit.script(fused_rn18)\nprint(\"jit time: \", benchmark(jit_rn18))\n\n\n############\n# Conclusion\n# ----------\n# As we can see, using FX we can easily write static graph transformations on\n# PyTorch code.\n#\n# Since FX is still in beta, we would be happy to hear any\n# feedback you have about using it. Please feel free to use the\n# PyTorch Forums (https://discuss.pytorch.org/) and the issue tracker\n# (https://github.com/pytorch/pytorch/issues) to provide any feedback\n# you might have." 
+ "jit_rn18 = torch.jit.script(fused_rn18)\nprint(\"jit time: \", benchmark(jit_rn18))\n\n\n######\n# \uacb0\ub860\n# ---\n# FX\ub97c \uc0ac\uc6a9\ud558\uba74 PyTorch \ucf54\ub4dc\uc5d0 \uc815\uc801 \uadf8\ub798\ud504 \ubcc0\ud658\uc744 \uc27d\uac8c \uc791\uc131\ud560 \uc218 \uc788\uc2b5\ub2c8\ub2e4.\n#\n# FX\ub294 \uc544\uc9c1 \ubca0\ud0c0 \ubc84\uc804\uc774\uae30 \ub54c\ubb38\uc5d0 FX \uc0ac\uc6a9\uc5d0 \ub300\ud55c \ud53c\ub4dc\ubc31\uc744 \ubcf4\ub0b4\uc8fc\uc2dc\uba74 \uac10\uc0ac\ud558\uaca0\uc2b5\ub2c8\ub2e4.\n# PyTorch \ud3ec\ub7fc (https://discuss.pytorch.org/)\n# \uc774\uc288 \ucd94\uc801\uae30 (https://github.com/pytorch/pytorch/issues)\n# \uc704 \ub450 \ub9c1\ud06c\ub97c \uc0ac\uc6a9\ud558\uc5ec \ud53c\ub4dc\ubc31\uc744 \uc81c\uacf5\ud574\uc8fc\uc2dc\uba74 \ub429\ub2c8\ub2e4." ] } ], diff --git a/docs/_downloads/b4e47a277095203f677594adf65ff4e3/fx_conv_bn_fuser.py b/docs/_downloads/b4e47a277095203f677594adf65ff4e3/fx_conv_bn_fuser.py index c06f5f768..e6d50f886 100644 --- a/docs/_downloads/b4e47a277095203f677594adf65ff4e3/fx_conv_bn_fuser.py +++ b/docs/_downloads/b4e47a277095203f677594adf65ff4e3/fx_conv_bn_fuser.py @@ -1,26 +1,25 @@ # -*- coding: utf-8 -*- """ -(beta) Building a Convolution/Batch Norm fuser in FX -******************************************************* -**Author**: `Horace He `_ +(베타) FX에서 합성곱/배치 정규화(Convolution/Batch Norm) 결합기(Fuser) 만들기 +**************************************************************************** +**저자**: `Horace He `_ -In this tutorial, we are going to use FX, a toolkit for composable function -transformations of PyTorch, to do the following: +**번역:** `오찬희 `_ -1) Find patterns of conv/batch norm in the data dependencies. -2) For the patterns found in 1), fold the batch norm statistics into the convolution weights. +이 튜토리얼에서는 PyTorch의 구성 가능한 함수의 변환을 위한 툴킷인 FX를 사용하여 다음을 수행하고자 합니다. -Note that this optimization only works for models in inference mode (i.e. `mode.eval()`) +1) 데이터 의존성에서 합성곱/배치 정규화 패턴을 찾습니다. +2) 1번에서 발견된 패턴의 경우 배치 정규화 통계를 합성곱 가중치로 결합합니다(folding). -We will be building the fuser that exists here: +이 최적화는 추론 모드(즉, `mode.eval()`)의 모델에만 적용된다는 점에 유의하세요. + +다음 링크에 있는 결합기를 만들 것입니다. https://github.com/pytorch/pytorch/blob/orig/release/1.8/torch/fx/experimental/fuser.py """ - ###################################################################### -# First, let's get some imports out of the way (we will be using all -# of these later in the code). +# 몇 가지의 import 과정을 먼저 처리해줍시다(나중에 코드에서 모두 사용할 것입니다). from typing import Type, Dict, Any, Tuple, Iterable import copy @@ -29,10 +28,10 @@ import torch.nn as nn ###################################################################### -# For this tutorial, we are going to create a model consisting of convolutions -# and batch norms. Note that this model has some tricky components - some of -# the conv/batch norm patterns are hidden within Sequentials and one of the -# BatchNorms is wrapped in another Module. +# 이 튜토리얼에서는 합성곱과 배치 정규화로 구성된 모델을 만들 것입니다. +# 이 모델에는 아래와 같은 까다로운 요소가 있습니다. +# 합성곱/배치 정규화 패턴 중의 일부는 시퀀스에 숨겨져 있고 +# 배치 정규화 중 하나는 다른 모듈로 감싸져 있습니다. class WrappedBatchNorm(nn.Module): def __init__(self): @@ -66,42 +65,40 @@ def forward(self, x): model.eval() ###################################################################### -# Fusing Convolution with Batch Norm -# ----------------------------------------- -# One of the primary challenges with trying to automatically fuse convolution -# and batch norm in PyTorch is that PyTorch does not provide an easy way of -# accessing the computational graph. 
FX resolves this problem by symbolically -# tracing the actual operations called, so that we can track the computations -# through the `forward` call, nested within Sequential modules, or wrapped in -# an user-defined module. +# 합성곱과 배치 정규화 결합하기 +# ----------------------------- +# PyTorch에서 합성곱과 배치 정규화를 자동으로 결합하려고 할 때 가장 큰 어려움 중 하나는 +# PyTorch가 계산 그래프에 쉽게 접근할 수 있는 방법을 제공하지 않는다는 것입니다. +# FX는 호출된 실제 연산을 기호적(symbolically)으로 추적하여 이 문제를 해결하므로 순차적 모듈 내에 중첩되거나 +# 사용자 정의 모듈로 감싸진 `forward` 호출을 통해 계산을 추적할 수 있습니다. traced_model = torch.fx.symbolic_trace(model) print(traced_model.graph) ###################################################################### -# This gives us a graph representation of our model. Note that both the modules -# hidden within the sequential as well as the wrapped Module have been inlined -# into the graph. This is the default level of abstraction, but it can be -# configured by the pass writer. More information can be found at the FX -# overview https://pytorch.org/docs/master/fx.html#module-torch.fx +# 이렇게 하면 모델을 그래프로 나타낼 수 있습니다. +# 순차적 모듈 및 감싸진 모듈 내에 숨겨진 두 모듈이 모두 그래프에 삽입되어 있습니다. +# 이는 기본 추상화 수준이지만 전달 기록기(pass writer)에서 구성할 수 있습니다. +# 자세한 내용은 다음 링크의 FX 개요에서 확인할 수 있습니다. +# https://pytorch.org/docs/master/fx.html#module-torch.fx #################################### -# Fusing Convolution with Batch Norm -# ---------------------------------- -# Unlike some other fusions, fusion of convolution with batch norm does not -# require any new operators. Instead, as batch norm during inference -# consists of a pointwise add and multiply, these operations can be "baked" -# into the preceding convolution's weights. This allows us to remove the batch -# norm entirely from our model! Read -# https://nenadmarkus.com/p/fusing-batchnorm-and-conv/ for further details. The -# code here is copied from +# 합성곱과 배치 정규화 결합하기 +# --------------------------- +# 일부 다른 결합과 달리, 합성곱과 배치 정규화의 결합은 새로운 연산자를 필요로 하지 않습니다. +# 대신, 추론 중 배치 정규화는 점별 덧셈과 곱셈으로 구성되므로, +# 이러한 연산은 이전 합성곱의 가중치로 "미리 계산되어 저장(baked)" 될 수 있습니다. +# 이를 통해 배치 정규화를 모델에서 완전히 제거할 수 있습니다! +# 자세한 내용은 다음 링크에서 확인 할 수 있습니다. +# https://nenadmarkus.com/p/fusing-batchnorm-and-conv/ +# 이 코드는 명확성을 위해 다음 링크에서 복사한 것입니다. # https://github.com/pytorch/pytorch/blob/orig/release/1.8/torch/nn/utils/fusion.py -# clarity purposes. + def fuse_conv_bn_eval(conv, bn): """ - Given a conv Module `A` and an batch_norm module `B`, returns a conv - module `C` such that C(x) == B(A(x)) in inference mode. + 합성곱 모듈 'A'와 배치 정규화 모듈 'B'가 주어지면 + C(x) == B(A(x))를 만족하는 합성곱 모듈 'C'를 추론 모드로 반환합니다. """ assert(not (conv.training or bn.training)), "Fusion only for eval!" fused_conv = copy.deepcopy(conv) @@ -128,17 +125,15 @@ def fuse_conv_bn_weights(conv_w, conv_b, bn_rm, bn_rv, bn_eps, bn_w, bn_b): #################################### -# FX Fusion Pass -# ---------------------------------- -# Now that we have our computational graph as well as a method for fusing -# convolution and batch norm, all that remains is to iterate over the FX graph -# and apply the desired fusions. - +# FX 결합 전달(pass) +# -------------- +# 이제 합성곱과 배치 정규화를 결합하는 방법뿐만 아니라 계산 그래프도 얻었으므로 +# 남은 것은 FX 그래프에 절차를 반복하고 원하는 결합을 적용하는 것입니다. def _parent_name(target : str) -> Tuple[str, str]: """ - Splits a qualname into parent path and last atom. - For example, `foo.bar.baz` -> (`foo.bar`, `baz`) + 정규화 된 이름(qualname)을 부모경로(parent path)와 마지막 요소(last atom)로 나눠줍니다. 
+ 예를 들어, `foo.bar.baz` -> (`foo.bar`, `baz`) """ *parent, name = target.rsplit('.', 1) return parent[0] if parent else '', name @@ -151,62 +146,57 @@ def replace_node_module(node: fx.Node, modules: Dict[str, Any], new_module: torc def fuse(model: torch.nn.Module) -> torch.nn.Module: model = copy.deepcopy(model) - # The first step of most FX passes is to symbolically trace our model to - # obtain a `GraphModule`. This is a representation of our original model - # that is functionally identical to our original model, except that we now - # also have a graph representation of our forward pass. + # 대부분의 FX 전달의 첫 번째 단계는 `GraphModule` 을 얻기 위해 + # 모델을 기호적으로 추적하는 것입니다. + # 이것은 원래 모델과 기능적으로 동일한 원래 모델의 표현입니다. + # 단, 이제는 순전파 단계(forward pass)에 대한 그래프 표현도 가지고 있습니다. fx_model: fx.GraphModule = fx.symbolic_trace(model) modules = dict(fx_model.named_modules()) - # The primary representation for working with FX are the `Graph` and the - # `Node`. Each `GraphModule` has a `Graph` associated with it - this - # `Graph` is also what generates `GraphModule.code`. - # The `Graph` itself is represented as a list of `Node` objects. Thus, to - # iterate through all of the operations in our graph, we iterate over each - # `Node` in our `Graph`. + # FX 작업을 위한 기본 표현은 `그래프(Graph)` 와 `노드(Node)` 입니다. + # 각 `GraphModule` 에는 연관된 `그래프` 가 있습니다. + # 이 `그래프` 는 `GraphModule.code` 를 생성하는 것이기도 합니다. + # `그래프` 자체는 `노드` 객체의 목록으로 표시됩니다. + # 따라서 그래프의 모든 작업을 반복하기 위해 `그래프` 에서 각 `노드` 에 대해 반복합니다. for node in fx_model.graph.nodes: - # The FX IR contains several types of nodes, which generally represent - # call sites to modules, functions, or methods. The type of node is - # determined by `Node.op`. - if node.op != 'call_module': # If our current node isn't calling a Module then we can ignore it. + # FX IR 에는 일반적으로 모듈, 함수 또는 메소드에 대한 + # 호출 사이트를 나타내는 여러 유형의 노드가 있습니다. + # 노드의 유형은 `Node.op` 에 의해 결정됩니다. + if node.op != 'call_module': # 현재 노드가 모듈을 호출하지 않으면 무시할 수 있습니다. continue - # For call sites, `Node.target` represents the module/function/method - # that's being called. Here, we check `Node.target` to see if it's a - # batch norm module, and then check `Node.args[0].target` to see if the - # input `Node` is a convolution. + # 호출 사이트의 경우, `Node.target` 은 호출되는 모듈/함수/방법을 나타냅니다. + # 여기서는 'Node.target' 을 확인하여 배치 정규화 모듈인지 확인한 다음 + # `Node.args[0].target` 을 확인하여 입력 `노드` 가 합성곱인지 확인합니다. if type(modules[node.target]) is nn.BatchNorm2d and type(modules[node.args[0].target]) is nn.Conv2d: - if len(node.args[0].users) > 1: # Output of conv is used by other nodes + if len(node.args[0].users) > 1: # 합성곱 출력은 다른 노드에서 사용됩니다. continue conv = modules[node.args[0].target] bn = modules[node.target] fused_conv = fuse_conv_bn_eval(conv, bn) replace_node_module(node.args[0], modules, fused_conv) - # As we've folded the batch nor into the conv, we need to replace all uses - # of the batch norm with the conv. + # 배치 정규화를 합성곱으로 결합했기 때문에 + # 배치 정규화의 사용을 모두 합성곱으로 교체해야 합니다. node.replace_all_uses_with(node.args[0]) - # Now that all uses of the batch norm have been replaced, we can - # safely remove the batch norm. + # 배치 정규화 사용을 모두 교체했으므로 + # 안전하게 배치 정규화를 제거할 수 있습니다. fx_model.graph.erase_node(node) fx_model.graph.lint() - # After we've modified our graph, we need to recompile our graph in order - # to keep the generated code in sync. + # 그래프를 수정한 후에는 생성된 코드를 동기화하기 위해 그래프를 다시 컴파일해야 합니다. fx_model.recompile() return fx_model ###################################################################### # .. 
note:: -# We make some simplifications here for demonstration purposes, such as only -# matching 2D convolutions. View +# 여기서는 2D 합성곱만 일치시키는 등 시연 목적으로 약간의 단순화를 하였습니다. +# 더 유용한 전달은 다음 링크를 참조하십시오. # https://github.com/pytorch/pytorch/blob/master/torch/fx/experimental/fuser.py -# for a more usable pass. ###################################################################### -# Testing out our Fusion Pass -# ----------------------------------------- -# We can now run this fusion pass on our initial toy model and verify that our -# results are identical. In addition, we can print out the code for our fused -# model and verify that there are no more batch norms. +# 결합 전달(Fusion pass) 실험하기 +# -------------------------------- +# 이제 아주 작은 초기 모델에 대해 이 결합 전달을 실행해 결과가 동일한지 확인할 수 있습니다. +# 또한 결합 모델의 코드를 출력하여 더 이상 배치 정규화가 없는지 확인할 수 있습니다. fused_model = fuse(model) @@ -216,10 +206,10 @@ def fuse(model: torch.nn.Module) -> torch.nn.Module: ###################################################################### -# Benchmarking our Fusion on ResNet18 -# ---------- -# We can test our fusion pass on a larger model like ResNet18 and see how much -# this pass improves inference performance. +# ResNet18에서 결합 벤치마킹하기 +# ------------------------------ +# 이제 ResNet18과 같은 대형 모델에서 결합 전달을 실험하고 +# 이 전달이 추론 성능을 얼마나 향상시키는지 확인할 수 있습니다. import torchvision.models as models import time @@ -241,22 +231,20 @@ def benchmark(model, iters=20): print("Unfused time: ", benchmark(rn18)) print("Fused time: ", benchmark(fused_rn18)) ###################################################################### -# As we previously saw, the output of our FX transformation is -# (Torchscriptable) PyTorch code, we can easily `jit.script` the output to try -# and increase our performance even more. In this way, our FX model -# transformation composes with Torchscript with no issues. +# 앞서 살펴본 바와 같이, FX 변환의 출력은 (Torchscriptable) PyTorch 코드입니다. +# 따라서 `jit.script` 를 통해 쉽게 출력하여 성능을 더 높일 수 있습니다. +# 이러한 방식으로 FX 모델 변환은 Torchscript와 아무런 문제 없이 구성됩니다. + jit_rn18 = torch.jit.script(fused_rn18) print("jit time: ", benchmark(jit_rn18)) -############ -# Conclusion -# ---------- -# As we can see, using FX we can easily write static graph transformations on -# PyTorch code. +###### +# 결론 +# --- +# FX를 사용하면 PyTorch 코드에 정적 그래프 변환을 쉽게 작성할 수 있습니다. # -# Since FX is still in beta, we would be happy to hear any -# feedback you have about using it. Please feel free to use the -# PyTorch Forums (https://discuss.pytorch.org/) and the issue tracker -# (https://github.com/pytorch/pytorch/issues) to provide any feedback -# you might have. +# FX는 아직 베타 버전이기 때문에 FX 사용에 대한 피드백을 보내주시면 감사하겠습니다. +# PyTorch 포럼 (https://discuss.pytorch.org/) +# 이슈 추적기 (https://github.com/pytorch/pytorch/issues) +# 위 두 링크를 사용하여 피드백을 제공해주시면 됩니다. diff --git a/docs/advanced/ONNXLive.html b/docs/advanced/ONNXLive.html index 54305e13c..a6430de58 100644 --- a/docs/advanced/ONNXLive.html +++ b/docs/advanced/ONNXLive.html @@ -205,7 +205,7 @@

Code Transforms with FX

프론트엔드 API

@@ -219,7 +219,7 @@ @@ -239,7 +239,7 @@

Mobile

diff --git a/docs/advanced/cpp_autograd.html b/docs/advanced/cpp_autograd.html index b29fe334b..d707977b9 100644 --- a/docs/advanced/cpp_autograd.html +++ b/docs/advanced/cpp_autograd.html @@ -207,7 +207,7 @@

Code Transforms with FX

프론트엔드 API

@@ -221,7 +221,7 @@ @@ -241,7 +241,7 @@

Mobile

diff --git a/docs/advanced/cpp_export.html b/docs/advanced/cpp_export.html index 6368b304f..03a78d8ee 100644 --- a/docs/advanced/cpp_export.html +++ b/docs/advanced/cpp_export.html @@ -207,7 +207,7 @@

Code Transforms with FX

프론트엔드 API

@@ -221,7 +221,7 @@ @@ -241,7 +241,7 @@

Mobile

diff --git a/docs/advanced/cpp_extension.html b/docs/advanced/cpp_extension.html index a66a89c02..b3aa1c6a2 100644 --- a/docs/advanced/cpp_extension.html +++ b/docs/advanced/cpp_extension.html @@ -207,7 +207,7 @@

Code Transforms with FX

프론트엔드 API

@@ -221,7 +221,7 @@ @@ -241,7 +241,7 @@

Mobile

diff --git a/docs/advanced/cpp_frontend.html b/docs/advanced/cpp_frontend.html index 6ccc0de68..1bd304f7c 100644 --- a/docs/advanced/cpp_frontend.html +++ b/docs/advanced/cpp_frontend.html @@ -207,7 +207,7 @@

Code Transforms with FX

프론트엔드 API

@@ -221,7 +221,7 @@ @@ -241,7 +241,7 @@

Mobile

diff --git a/docs/advanced/ddp_pipeline.html b/docs/advanced/ddp_pipeline.html index 5f428b604..4e8258091 100644 --- a/docs/advanced/ddp_pipeline.html +++ b/docs/advanced/ddp_pipeline.html @@ -34,7 +34,7 @@ - + @@ -207,7 +207,7 @@

Code Transforms with FX

프론트엔드 API

@@ -221,7 +221,7 @@ @@ -241,7 +241,7 @@

Mobile

@@ -837,7 +837,7 @@

Output - + diff --git a/docs/advanced/dispatcher.html b/docs/advanced/dispatcher.html index 2c8080920..8142a8958 100644 --- a/docs/advanced/dispatcher.html +++ b/docs/advanced/dispatcher.html @@ -35,7 +35,7 @@ - + @@ -207,7 +207,7 @@

Code Transforms with FX

프론트엔드 API

@@ -221,7 +221,7 @@ @@ -241,7 +241,7 @@

Mobile

@@ -711,7 +711,7 @@

Tracer - + diff --git a/docs/advanced/dynamic_quantization_tutorial.html b/docs/advanced/dynamic_quantization_tutorial.html index 4a0072e9f..ac2320338 100644 --- a/docs/advanced/dynamic_quantization_tutorial.html +++ b/docs/advanced/dynamic_quantization_tutorial.html @@ -207,7 +207,7 @@

Code Transforms with FX

프론트엔드 API

@@ -221,7 +221,7 @@ @@ -241,7 +241,7 @@

Mobile

diff --git a/docs/advanced/extend_dispatcher.html b/docs/advanced/extend_dispatcher.html index 86a04eba4..1d2cc1d0a 100644 --- a/docs/advanced/extend_dispatcher.html +++ b/docs/advanced/extend_dispatcher.html @@ -207,7 +207,7 @@

Code Transforms with FX

프론트엔드 API

@@ -221,7 +221,7 @@ @@ -241,7 +241,7 @@

Mobile

diff --git a/docs/advanced/neural_style_tutorial.html b/docs/advanced/neural_style_tutorial.html index 13b907c6d..d3cf6c983 100644 --- a/docs/advanced/neural_style_tutorial.html +++ b/docs/advanced/neural_style_tutorial.html @@ -205,7 +205,7 @@

Code Transforms with FX

프론트엔드 API

@@ -219,7 +219,7 @@ @@ -239,7 +239,7 @@

Mobile

diff --git a/docs/advanced/numpy_extensions_tutorial.html b/docs/advanced/numpy_extensions_tutorial.html index a828875d9..aed88a096 100644 --- a/docs/advanced/numpy_extensions_tutorial.html +++ b/docs/advanced/numpy_extensions_tutorial.html @@ -205,7 +205,7 @@

Code Transforms with FX

프론트엔드 API

@@ -219,7 +219,7 @@ @@ -239,7 +239,7 @@

Mobile

diff --git a/docs/advanced/rpc_ddp_tutorial.html b/docs/advanced/rpc_ddp_tutorial.html index 1b83d881b..8c76ad235 100644 --- a/docs/advanced/rpc_ddp_tutorial.html +++ b/docs/advanced/rpc_ddp_tutorial.html @@ -207,7 +207,7 @@

Code Transforms with FX

프론트엔드 API

@@ -221,7 +221,7 @@ @@ -241,7 +241,7 @@

Mobile

diff --git a/docs/advanced/sg_execution_times.html b/docs/advanced/sg_execution_times.html index 1b762daaf..fa1cdf270 100644 --- a/docs/advanced/sg_execution_times.html +++ b/docs/advanced/sg_execution_times.html @@ -205,7 +205,7 @@

Code Transforms with FX

프론트엔드 API

@@ -219,7 +219,7 @@ @@ -239,7 +239,7 @@

Mobile

diff --git a/docs/advanced/static_quantization_tutorial.html b/docs/advanced/static_quantization_tutorial.html index 095ab5724..d6350549d 100644 --- a/docs/advanced/static_quantization_tutorial.html +++ b/docs/advanced/static_quantization_tutorial.html @@ -207,7 +207,7 @@

Code Transforms with FX

프론트엔드 API

@@ -221,7 +221,7 @@ @@ -241,7 +241,7 @@

Mobile

diff --git a/docs/advanced/super_resolution_with_onnxruntime.html b/docs/advanced/super_resolution_with_onnxruntime.html index 08fd65803..2a05affde 100644 --- a/docs/advanced/super_resolution_with_onnxruntime.html +++ b/docs/advanced/super_resolution_with_onnxruntime.html @@ -34,7 +34,7 @@ - + @@ -207,7 +207,7 @@

Code Transforms with FX

프론트엔드 API

@@ -221,7 +221,7 @@ @@ -241,7 +241,7 @@

Mobile

@@ -603,7 +603,7 @@

ONNX 런타임에서 이미지를 입력값으로 모델을 실행하기 - + diff --git a/docs/advanced/torch-script-parallelism.html b/docs/advanced/torch-script-parallelism.html index 2cb0760b9..1cdde78d5 100644 --- a/docs/advanced/torch-script-parallelism.html +++ b/docs/advanced/torch-script-parallelism.html @@ -207,7 +207,7 @@

Code Transforms with FX

프론트엔드 API

@@ -221,7 +221,7 @@ @@ -241,7 +241,7 @@

Mobile

diff --git a/docs/advanced/torch_script_custom_classes.html b/docs/advanced/torch_script_custom_classes.html index b24473012..9945a23e9 100644 --- a/docs/advanced/torch_script_custom_classes.html +++ b/docs/advanced/torch_script_custom_classes.html @@ -9,7 +9,7 @@ - Extending TorchScript with Custom C++ Classes — PyTorch Tutorials 1.9.0+cu102 documentation + 커스텀 C++ 클래스로 TorchScript 확장하기 — PyTorch Tutorials 1.9.0+cu102 documentation @@ -207,7 +207,7 @@

Code Transforms with FX

프론트엔드 API

@@ -221,7 +221,7 @@ @@ -241,7 +241,7 @@

Mobile

@@ -295,7 +295,7 @@ -
  • Extending TorchScript with Custom C++ Classes
  • +
  • 커스텀 C++ 클래스로 TorchScript 확장하기
  • @@ -349,18 +349,15 @@
    -
    -

    Extending TorchScript with Custom C++ Classes

    -

    This tutorial is a follow-on to the -custom operator -tutorial, and introduces the API we’ve built for binding C++ classes into TorchScript -and Python simultaneously. The API is very similar to -pybind11, and most of the concepts will transfer -over if you’re familiar with that system.

    -
    -

    Implementing and Binding the Class in C++

    -

    For this tutorial, we are going to define a simple C++ class that maintains persistent -state in a member variable.

    +
    +

    커스텀 C++ 클래스로 TorchScript 확장하기

    +

    이 튜토리얼은 커스텀 오퍼레이터 튜토리얼의 후속이며 +C++ 클래스를 TorchScript와 Python에 동시에 바인딩하기 위해 구축한 API를 소개합니다. +API는 pybind11 과 +매우 유사하며 해당 시스템에 익숙하다면 대부분의 개념이 이전됩니다.

    +
    +

    C++에서 클래스 구현 및 바인딩

    +

    이 튜토리얼에서는 멤버 변수에서 지속 상태를 유지하는 간단한 C++ 클래스를 정의할 것입니다.

    // This header is all you need to do the C++ portions of this
     // tutorial
     #include <torch/script.h>
    @@ -399,23 +396,19 @@ 

    Implementing and Binding the Class in C++};

    -

    There are several things to note:

    +

    몇 가지 주의할 사항이 있습니다:

      -
    • torch/custom_class.h is the header you need to include to extend TorchScript -with your custom class.
    • -
    • Notice that whenever we are working with instances of the custom -class, we do it via instances of c10::intrusive_ptr<>. Think of intrusive_ptr -as a smart pointer like std::shared_ptr, but the reference count is stored -directly in the object, as opposed to a separate metadata block (as is done in -std::shared_ptr. torch::Tensor internally uses the same pointer type; -and custom classes have to also use this pointer type so that we can -consistently manage different object types.
    • -
    • The second thing to notice is that the user-defined class must inherit from -torch::CustomClassHolder. This ensures that the custom class has space to -store the reference count.
    • +
    • torch/custom_class.h 는 커스텀 클래스로 TorchScript를 확장하기 위해 포함해야하는 헤더입니다.
    • +
    • 커스텀 클래스의 인스턴스로 작업할 때마다 c10::intrusive_ptr<> 의 인스턴스를 통해 작업을 수행합니다. +intrusive_ptrstd::shared_ptr 과 같은 스마트 포인터로 생각하세요. 그러나 참조 계수는 +std::shared_ptr 같이 별도의 메타데이터 블록과 달리 객체에 직접 저장됩니다. +torch::Tensor 는 내부적으로 동일한 포인터 유형을 사용합니다. +커스텀 클래스도 torch::Tensor 포인터 유형을 사용해야 다양한 객체 유형을 일관되게 관리할 수 있습니다.
    • +
    • 두 번째로 주목해야 할 점은 커스텀 클래스가 torch::CustomClassHolder 에서 상속되어야 한다는 것입니다. +이렇게 하면 커스텀 클래스에 참조 계수를 저장할 공간이 있습니다.
    -

    Now let’s take a look at how we will make this class visible to TorchScript, a process called -binding the class:

    +

    이제 이 클래스를 어떻게 TorchScript에서 사용가능하게 하는지 살펴보겠습니다. +이런 과정은 클래스를 바인딩 한다고 합니다:

    // Notice a few things:
     // - We pass the class to be registered as a template parameter to
     //   `torch::class_`. In this instance, we've passed the
    @@ -459,12 +452,12 @@ 

    Implementing and Binding the Class in C++ -

    Building the Example as a C++ Project With CMake

    -

    Now, we’re going to build the above C++ code with the CMake build system. First, take all the C++ code -we’ve covered so far and place it in a file called class.cpp. -Then, write a simple CMakeLists.txt file and place it in the -same directory. Here is what CMakeLists.txt should look like:

    +
    +

    CMake를 사용하여 C++ 프로젝트로 예제 빌드

    +

    이제 CMake 빌드 시스템을 사용하여 위의 C++ 코드를 빌드합니다. +먼저, 지금까지 다룬 모든 C++ code를 class.cpp 라는 파일에 넣습니다. +그런 다음 간단한 CMakeLists.txt 파일을 작성하여 동일한 디렉토리에 배치합니다. +CMakeLists.txt 는 다음과 같아야 합니다:

    cmake_minimum_required(VERSION 3.1 FATAL_ERROR)
     project(custom_class)
     
    @@ -477,16 +470,15 @@ 

    Building the Example as a C++ Project With CMaketarget_link_libraries(custom_class "${TORCH_LIBRARIES}")

    -

    Also, create a build directory. Your file tree should look like this:

    +

    또한 build 디렉토리를 만듭니다. 파일 트리는 다음과 같아야 합니다:

    custom_class_project/
       class.cpp
       CMakeLists.txt
       build/
     
    -

    We assume you’ve setup your environment in the same way as described in -the previous tutorial. -Go ahead and invoke cmake and then make to build the project:

    +

    이전 튜토리얼 에서 설명한 것과 동일한 방식으로 환경을 설정했다고 가정합니다. +계속해서 cmake를 호출한 다음 make를 호출하여 프로젝트를 빌드합니다:

    $ cd build
     $ cmake -DCMAKE_PREFIX_PATH="$(python -c 'import torch.utils; print(torch.utils.cmake_prefix_path)')" ..
       -- The C compiler identification is GNU 7.3.1
    @@ -523,9 +515,9 @@ 

    Building the Example as a C++ Project With CMake[100%] Built target custom_class

    -

    What you’ll find is there is now (among other things) a dynamic library -file present in the build directory. On Linux, this is probably named -libcustom_class.so. So the file tree should look like:

    +

이제 빌드 디렉토리에 (다른 산출물과 함께) 동적 라이브러리 파일이 생긴 것을 확인할 수 있습니다.
+리눅스에서는 아마도 libcustom_class.so 라는 이름일 것입니다.
+따라서 파일 트리는 다음과 같아야 합니다:

    custom_class_project/
       class.cpp
       CMakeLists.txt
    @@ -534,11 +526,10 @@ 

    Building the Example as a C++ Project With CMake -

    Using the C++ Class from Python and TorchScript

    -

    Now that we have our class and its registration compiled into an .so file, -we can load that .so into Python and try it out. Here’s a script that -demonstrates that:

    +
    +

    Python 및 TorchScript의 C++ 클래스 사용

    +

    이제 클래스와 등록이 .so 파일로 컴파일되었으므로 해당 .so 를 Python에 읽어들이고 사용해 볼 수 있습니다. +다음은 이를 보여주는 스크립트입니다:

    import torch
     
     # `torch.classes.load_library()` allows you to pass the path to your .so file
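참고로, 빌드한 라이브러리를 Python에서 읽어들여 사용하는 전체 흐름은 대략 아래와 같은 형태의 스케치가 됩니다. build/libcustom_class.so 경로와, MyStackClass 가 문자열 리스트를 받는 생성자 및 push/pop 메소드로 바인딩되어 있다는 점은 앞 단계를 그대로 따랐다는 가정입니다.

import torch

# 빌드 단계에서 만든 공유 라이브러리를 읽어들입니다. (경로는 위의 build/ 디렉토리를 가정합니다.)
torch.classes.load_library("build/libcustom_class.so")

# 커스텀 클래스는 torch.classes.<네임스페이스>.<클래스 이름> 으로 접근합니다.
# 생성자 인자와 push/pop 메소드는 위 C++ 바인딩을 따른다는 가정입니다.
s = torch.classes.my_classes.MyStackClass(["foo", "bar"])
s.push("baz")
print(s.pop())  # 가정대로 바인딩되어 있다면 "baz" 가 출력됩니다.

# TorchScript 에서도 같은 클래스를 타입 어노테이션으로 사용할 수 있습니다.
@torch.jit.script
def pop_one(stack: torch.classes.my_classes.MyStackClass) -> str:
    return stack.pop()

print(pop_one(s))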
    @@ -590,11 +581,10 @@ 

    Using the C++ Class from Python and TorchScript -

    Saving, Loading, and Running TorchScript Code Using Custom Classes

    -

    We can also use custom-registered C++ classes in a C++ process using -libtorch. As an example, let’s define a simple nn.Module that -instantiates and calls a method on our MyStackClass class:

    +
    +

    커스텀 클래스를 사용하여 TorchScript 코드 저장, 읽기 및 실행

    +

    libtorch를 사용하여 C++ 프로세스에서 커스텀 등록 C++ 클래스를 사용할 수도 있습니다. +예를 들어 MyStackClass 클래스에서 메소드를 인스턴스화하고 호출하는 간단한 nn.Module 을 정의해 보겠습니다:

    import torch
     
     torch.classes.load_library('build/libcustom_class.so')
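원문 예제의 전체 모듈 정의는 대략 아래와 같은 형태로 볼 수 있습니다. forward 가 문자열을 받아 스택에서 꺼낸 값과 이어 붙인다는 세부 동작은 가정이며, 핵심은 커스텀 클래스를 사용하는 모듈을 torch.jit.script 로 스크립트한 뒤 파일로 저장한다는 점입니다.

import torch

torch.classes.load_library('build/libcustom_class.so')

class Foo(torch.nn.Module):
    def forward(self, s: str) -> str:
        # forward 안에서 커스텀 클래스를 생성하고 메소드를 호출합니다.
        # (생성자 인자와 pop 의 동작은 위 C++ 클래스 정의를 따른다는 가정입니다.)
        stack = torch.classes.my_classes.MyStackClass(["hi", "mom"])
        return stack.pop() + s

scripted_foo = torch.jit.script(Foo())
scripted_foo.save('foo.pt')  # 아래에서 C++ 프로세스가 읽어들일 파일입니다.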
    @@ -615,12 +605,11 @@ 

    Saving, Loading, and Running TorchScript Code Using Custom Classesscripted_foo.save('foo.pt')

    -

    foo.pt in our filesystem now contains the serialized TorchScript -program we’ve just defined.

    -

    Now, we’re going to define a new CMake project to show how you can load -this model and its required .so file. For a full treatment of how to do this, -please have a look at the Loading a TorchScript Model in C++ Tutorial.

    -

    Similarly to before, let’s create a file structure containing the following:

    +

    파일 시스템의 foo.pt는 방금 정의한 직렬화된 TorchScript 프로그램을 포함합니다.
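C++ 예제로 넘어가기 전에, 같은 파일을 Python 쪽에서 다시 읽어들여 확인해 볼 수도 있습니다. 커스텀 클래스가 등록된 라이브러리를 먼저 읽어들여야 한다는 점만 주의하면 되며, 아래는 그런 확인용 스케치입니다.

import torch

# 직렬화된 모듈이 커스텀 클래스를 사용하므로, 모델을 읽어들이기 전에
# 해당 클래스가 등록된 라이브러리를 먼저 읽어들여야 합니다.
torch.classes.load_library('build/libcustom_class.so')

loaded = torch.jit.load('foo.pt')
print(loaded("foobarbaz"))  # 위에서 가정한 모듈 정의라면 "momfoobarbaz" 가 출력됩니다.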

    +

    이제 이 모델과 필요한 .so 파일을 읽어들이는 방법을 보여주기 위해 새 CMake 프로젝트를 정의하겠습니다. +이 작업을 수행하는 방법에 대한 자세한 내용은 C++에서 TorchScript 모델 로딩하기 를 +참조하세요.

    +

    이전과 유사하게 다음을 포함하는 파일 구조를 생성해 보겠습니다:

    cpp_inference_example/
       infer.cpp
       CMakeLists.txt
    @@ -632,11 +621,9 @@ 

    Saving, Loading, and Running TorchScript Code Using Custom Classesbuild/

    -

    Notice we’ve copied over the serialized foo.pt file, as well as the source -tree from the custom_class_project above. We will be adding the -custom_class_project as a dependency to this C++ project so that we can -build the custom class into the binary.

    -

    Let’s populate infer.cpp with the following:

    +

    직렬화된 foo.pt 파일과 위의 custom_class_project 소스 트리를 복사했음을 주목하세요. +커스텀 클래스를 바이너리로 빌드할 수 있도록 custom_class_project 를 이 C++ 프로젝트에 의존성으로 추가할 것입니다.

    +

    infer.cpp 를 다음으로 채우겠습니다:

    -

    And similarly let’s define our CMakeLists.txt file:

    +

    마찬가지로 CMakeLists.txt 파일을 정의해 보겠습니다:

    -

    You know the drill: cd build, cmake, and make:

    +

이제 늘 하던 대로 cd build, cmake, make 를 차례로 실행하여 빌드합니다:

    -

    And now we can run our exciting C++ binary:

    +

    이제 흥미로운 C++ 바이너리를 실행할 수 있습니다:

    $ ./infer
       momfoobarbaz
     
    -

    Incredible!

    +

    대단합니다!

    -
    -

    Moving Custom Classes To/From IValues

    -

    It’s also possible that you may need to move custom classes into or out of -IValue``s, such as when you take or return ``IValue``s from TorchScript methods -or you want to instantiate a custom class attribute in C++. For creating an -``IValue from a custom C++ class instance:

    +
    +

    커스텀 클래스를 IValues로/에서 이동

    +

    TorchScript 메소드에서 IValue 를 가져오거나 반환하기, 또는 C++에서 커스텀 클래스 속성을 인스턴스화하려는 +경우와 같이 커스텀 클래스를 IValue 안팎으로 이동해야 할 수도 있습니다. +커스텀 C++ 클래스 인스턴스에서 IValue 를 생성하려면:

      -
    • torch::make_custom_class<T>() provides an API similar to c10::intrusive_ptr<T> -in that it will take whatever set of arguments you provide to it, call the constructor -of T that matches that set of arguments, and wrap that instance up and return it. -However, instead of returning just a pointer to a custom class object, it returns -an IValue wrapping the object. You can then pass this IValue directly to -TorchScript.
    • -
    • In the event that you already have an intrusive_ptr pointing to your class, you -can directly construct an IValue from it using the constructor IValue(intrusive_ptr<T>).
    • +
    • torch::make_custom_class<T>() 는 제공하는 인수 집합을 사용하고 해당 인수 집합과 일치하는 +T의 생성자를 호출하며 해당 인스턴스를 래핑하고 반환하는 c10::intrusive_ptr<T>와 유사한 API를 제공합니다. +그러나 커스텀 클래스 객체에 대한 포인터만 반환하는 대신 객체를 래핑하는 IValue 를 반환합니다. +그런 다음 이 IValue 를 TorchScript에 직접 전달할 수 있습니다.
    • +
    • 이미 클래스를 가리키는 intrusive_ptr 이 있는 경우 생성자 IValue(intrusive_ptr<T>) 를 사용하여 +해당 클래스에서 IValue를 직접 생성할 수 있습니다.
    -

    For converting IValue back to custom classes:

    +

    IValue 를 커스텀 클래스로 다시 변환하려면:

      -
    • IValue::toCustomClass<T>() will return an intrusive_ptr<T> pointing to the -custom class that the IValue contains. Internally, this function is checking -that T is registered as a custom class and that the IValue does in fact contain -a custom class. You can check whether the IValue contains a custom class manually by -calling isCustomClass().
    • +
    • IValue::toCustomClass<T>()IValue 에 포함된 커스텀 클래스를 가리키는 intrusive_ptr<T> 를 +반환합니다. 내부적으로 이 함수는 T 가 커스텀 클래스로 등록되어 있고 IValue 에 실제로 커스텀 클래스가 +포함되어 있는지 확인합니다. isCustomClass() 를 호출하여 IValue 에 커스텀 클래스가 포함되어 있는지 +수동으로 확인할 수 있습니다.
    -
    -

    Defining Serialization/Deserialization Methods for Custom C++ Classes

    -

    If you try to save a ScriptModule with a custom-bound C++ class as -an attribute, you’ll get the following error:

    +
    +

    커스텀 C++ 클래스에 대한 직렬화/역직렬화 방법 정의

    +

    커스텀 바인딩 된 C++ 클래스를 속성으로 사용하여 ScriptModule 을 저장하려고 하면 +다음 오류가 발생합니다:

    # export_attr.py
     import torch
     
    @@ -781,19 +764,16 @@ 

    Defining Serialization/Deserialization Methods for Custom C++ Classesfor this class. (pushIValueImpl at ../torch/csrc/jit/pickler.cpp:128)

    -

    This is because TorchScript cannot automatically figure out what information -save from your C++ class. You must specify that manually. The way to do that -is to define __getstate__ and __setstate__ methods on the class using -the special def_pickle method on class_.

    +

TorchScript가 C++ 클래스에서 어떤 정보를 저장해야 하는지 자동으로 파악할 수 없기 때문입니다.
+저장할 정보는 수동으로 지정해야 합니다. 그렇게 하는 방법은 class_ 의 특별한 def_pickle 메소드를
+사용하여 클래스에 __getstate__ 와 __setstate__ 메소드를 정의하는 것입니다.

    Note

    -

    The semantics of __getstate__ and __setstate__ in TorchScript are -equivalent to that of the Python pickle module. You can -read more -about how we use these methods.

    +

    TorchScript에서 __getstate____setstate__ 의 의미는 Python pickle 모듈의 의미와 동일합니다. +이러한 방법을 어떻게 사용하는지에 대하여 자세한 내용 을 +참조하세요.

    -

    Here is an example of the def_pickle call we can add to the registration of -MyStackClass to include serialization methods:

    +

    다음은 직렬화 메소드를 포함하기 위해 MyStackClass 등록에 추가할 수 있는 def_pickle 호출의 예시입니다:

    -
    -

    Defining Custom Operators that Take or Return Bound C++ Classes

    -

    Once you’ve defined a custom C++ class, you can also use that class -as an argument or return from a custom operator (i.e. free functions). Suppose -you have the following free function:

    +
    +

    바인딩된 C++ 클래스를 사용하거나 반환하는 커스텀 연산자 정의

    +

커스텀 C++ 클래스를 정의한 후에는 해당 클래스를 인수로 사용하거나 커스텀 연산자(즉, free 함수)에서
+반환할 수도 있습니다. 다음과 같은 free 함수가 있다고 가정합니다:

    c10::intrusive_ptr<MyStackClass<std::string>> manipulate_instance(const c10::intrusive_ptr<MyStackClass<std::string>>& instance) {
       instance->pop();
       return instance;
     }
     
    -

    You can register it running the following code inside your TORCH_LIBRARY -block:

    +

    TORCH_LIBRARY 블록 내에서 다음 코드를 실행하여 등록할 수 있습니다:

        m.def(
           "foo::manipulate_instance(__torch__.torch.classes.my_classes.MyStackClass x) -> __torch__.torch.classes.my_classes.MyStackClass Y",
           manipulate_instance
         );
     
    -

    Refer to the custom op tutorial -for more details on the registration API.

    -

    Once this is done, you can use the op like the following example:

    +

    등록 API에 대한 자세한 내용은 커스텀 C++ 연산자로 TorchScript 확장 을 +참조하세요.

    +

    이 작업이 완료되면 다음 예제와 같이 연산자를 사용할 수 있습니다:
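사용 예시는 대략 아래와 같은 형태의 Python/TorchScript 코드가 됩니다. 모듈이 커스텀 클래스를 속성으로 들고 있다가 forward 에서 위에서 등록한 연산자를 호출하는 구성이며, 생성자 인자 값 등 세부 사항은 가정입니다.

import torch

torch.classes.load_library('build/libcustom_class.so')

class TryCustomOp(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # 커스텀 클래스를 속성으로 보관합니다. (인자 값은 예시입니다.)
        self.f = torch.classes.my_classes.MyStackClass(["foo", "bar"])

    def forward(self):
        # TORCH_LIBRARY 블록에서 foo 네임스페이스로 등록한 연산자를 호출합니다.
        return torch.ops.foo.manipulate_instance(self.f)

# 속성에 커스텀 클래스가 있으므로, 저장까지 하려면 위의 def_pickle 정의가 필요합니다.
scripted = torch.jit.script(TryCustomOp())
out = scripted()  # MyStackClass 인스턴스가 반환됩니다.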

    -
    -

    Conclusion

    -

    This tutorial walked you through how to expose a C++ class to TorchScript -(and by extension Python), how to register its methods, how to use that -class from Python and TorchScript, and how to save and load code using -the class and run that code in a standalone C++ process. You are now ready -to extend your TorchScript models with C++ classes that interface with -third party C++ libraries or implement any other use case that requires the -lines between Python, TorchScript and C++ to blend smoothly.

    -

    As always, if you run into any problems or have questions, you can use our -forum or GitHub issues to get in touch. Also, our -frequently asked questions (FAQ) page may have helpful information.

    +
    +

    결론

    +

이 튜토리얼에서는 C++ 클래스를 TorchScript(그리고 더 나아가 Python)에 노출하는 방법, 해당 메소드를 등록하는 방법,
+Python 및 TorchScript에서 해당 클래스를 사용하는 방법, 그리고 이 클래스를 사용하는 코드를 저장하고 읽어들여
+독립된 C++ 프로세스에서 실행하는 방법을 안내했습니다.
+이제 타사 C++ 라이브러리와 인터페이스하는 C++ 클래스로 TorchScript 모델을 확장하거나,
+Python, TorchScript, C++ 사이의 경계가 매끄럽게 어우러져야 하는 다른 사용 사례를 구현할 준비가 되었습니다.

    +

    언제나 처럼 문제를 마주치거나 질문이 있으면 저희 forum 또는 +GitHub issues 에 올려주시면 되겠습니다. +또한 자주 묻는 질문(FAQ) 페이지 에 +유용한 정보가 있을 수 있습니다.


diff --git a/docs/beginner/deeplabv3_on_ios.html b/docs/beginner/deeplabv3_on_ios.html index 8756ba58c..49408358b 100644 --- a/docs/beginner/deeplabv3_on_ios.html +++ b/docs/beginner/deeplabv3_on_ios.html @@ -9,7 +9,7 @@ - Image Segmentation DeepLabV3 on iOS — PyTorch Tutorials 1.9.0+cu102 documentation + Image Segmentation DeepLabV3 on iOS — PyTorch Tutorials 1.9.0+cu102 documentation @@ -207,7 +207,7 @@

    -
    -

    Image Segmentation DeepLabV3 on iOS

    -

    Author: Jeff Tang

    -

    Reviewed by: Jeremiah Chung

    -
    -

    Introduction

    -

    Semantic image segmentation is a computer vision task that uses semantic labels to mark specific regions of an input image. The PyTorch semantic image segmentation DeepLabV3 model can be used to label image regions with 20 semantic classes including, for example, bicycle, bus, car, dog, and person. Image segmentation models can be very useful in applications such as autonomous driving and scene understanding.

    -

    In this tutorial, we will provide a step-by-step guide on how to prepare and run the PyTorch DeepLabV3 model on iOS, taking you from the beginning of having a model you may want to use on iOS to the end of having a complete iOS app using the model. We will also cover practical and general tips on how to check if your next favorite pre-trained PyTorch models can run on iOS, and how to avoid pitfalls.

    +
    +

    Image Segmentation DeepLabV3 on iOS

    +

    Author: Jeff Tang

    +

    Reviewed by: Jeremiah Chung
    Translated by: 김현길

    +
    +

    Introduction

    +

    Semantic image segmentation is a computer vision task that uses semantic labels to mark specific regions of an input image. The PyTorch DeepLabV3 model for semantic image segmentation can label image regions with 20 semantic classes, including, for example, bicycle, bus, car, dog, and person. Image segmentation models can be very useful in applications such as autonomous driving and scene understanding.

    +

    In this tutorial, we provide a step-by-step guide on how to prepare and run the PyTorch DeepLabV3 model on iOS, taking you from the beginning of having a model you may want to use on iOS to the end of having a complete iOS app using the model. We also cover practical and general tips on how to check whether your next favorite pre-trained PyTorch model can run on iOS, and how to avoid pitfalls.

    Note

    -

    Before going through this tutorial, you should check out PyTorch Mobile for iOS and give the PyTorch iOS HelloWorld example app a quick try. This tutorial will go beyond the image classification model, usually the first kind of model deployed on mobile. The complete code repo for this tutorial is available here.

    +

    Before going through this tutorial, you should check out PyTorch Mobile for iOS and give the PyTorch iOS HelloWorld example app a quick try. This tutorial goes beyond the image classification model, which is usually the first kind of model deployed on mobile. The complete code repo for this tutorial is available here.

    -
    -

    Learning Objectives

    -

    In this tutorial, you will learn how to:

    +
    +

    Learning Objectives

    +

    In this tutorial, you will learn how to:

      -
    1. Convert the DeepLabV3 model for iOS deployment.
    2. Get the output of the model for the example input image in Python and compare it to the output from the iOS app.
    3. Build a new iOS app or reuse an iOS example app to load the converted model.
    4. Prepare the input into the format that the model expects and process the model output.
    5. Complete the UI, refactor, build and run the app to see image segmentation in action.
      +
    1. Convert the DeepLabV3 model for iOS deployment.
    2. Get the model's output for the example input image in Python and compare it to the output from the iOS app.
    3. Build a new iOS app, or reuse an iOS example app, and load the converted model.
    4. Prepare the input in the format the model expects and process the model output.
    5. Complete the UI, refactor, build and run the app to see image segmentation in action.
    -
    -

    Pre-requisites

    +
    +

    Pre-requisites

      -
    • PyTorch 1.6 or 1.7
    • torchvision 0.7 or 0.8
    • Xcode 11 or 12
      +
    • PyTorch 1.6 or 1.7
    • torchvision 0.7 or 0.8
    • Xcode 11 or 12
    -
    -

    Steps

    -
    -

    1. Convert the DeepLabV3 model for iOS deployment

    -

    The first step to deploying a model on iOS is to convert the model into the TorchScript format.

    +
    +

    Steps

    +
    +

    1. Convert the DeepLabV3 model for iOS deployment

    +

    The first step in deploying a model on iOS is to convert the model into the TorchScript format.

    Note

    -

    Not all PyTorch models can be converted to TorchScript at this time because a model definition may use language features that are not in TorchScript, which is a subset of Python. See the Script and Optimize Recipe for more details.

    +

    At this time, not all PyTorch models can be converted to TorchScript, because a model definition may use language features that are not in TorchScript, which is a subset of Python. See the Script and Optimize Recipe for more details.

    -

    Simply run the script below to generate the scripted model deeplabv3_scripted.pt:

    +

    Simply run the script below to generate the scripted model deeplabv3_scripted.pt:

    import torch
     
    -# use deeplabv3_resnet50 instead of deeplabv3_resnet101 to reduce the model size
+# use deeplabv3_resnet50 instead of deeplabv3_resnet101 to reduce the model size
     model = torch.hub.load('pytorch/vision:v0.8.0', 'deeplabv3_resnet50', pretrained=True)
     model.eval()
     
    @@ -401,11 +405,11 @@ 

torch.jit.save(scriptedm, "deeplabv3_scripted.pt")

    -

    The size of the generated deeplabv3_scripted.pt model file should be around 168MB. Ideally, a model should also be quantized for significant size reduction and faster inference before being deployed on an iOS app. To have a general understanding of quantization, see the Quantization Recipe and the resource links there. We will cover in detail how to correctly apply a quantization workflow called Post Training Static Quantization to the DeepLabV3 model in a future tutorial or recipe.

    +

    The size of the generated deeplabv3_scripted.pt model file should be around 168MB. Ideally, a model should also be quantized for a significant size reduction and faster inference before being deployed on an iOS app. To get a general understanding of quantization, see the Quantization Recipe and the resource links there. How to correctly apply a quantization workflow called Post Training Static Quantization to the DeepLabV3 model will be covered in detail in a future tutorial or recipe.

    -
    -

    2. Get example input and output of the model in Python

    -

    Now that we have a scripted PyTorch model, let’s test with some example inputs to make sure the model works correctly on iOS. First, let’s write a Python script that uses the model to make inferences and examine inputs and outputs. For this example of the DeepLabV3 model, we can reuse the code in Step 1 and in the DeepLabV3 model hub site. Add the following code snippet to the code above:

    +
    +

    2. Get example input and output of the model in Python

    +

    Now that we have a scripted PyTorch model, let's test it with some example inputs to make sure the model works correctly on iOS. First, let's write a Python script that uses the model to make inferences and examine the inputs and outputs. For this example of the DeepLabV3 model, we can reuse the code in Step 1 and in the DeepLabV3 model hub site. Add the following code snippet to the code above:

    from PIL import Image
     from torchvision import transforms
     input_image = Image.open("deeplab.jpg")
    @@ -423,17 +427,18 @@ 

print(output.shape)

    -

    Download deeplab.jpg from here and run the script above to see the shapes of the input and output of the model:

    +

    Download deeplab.jpg from here, then run the script above to see the shapes of the input and output of the model:

    torch.Size([1, 3, 400, 400])
     torch.Size([21, 400, 400])
     
    -

    So if you provide the same image input deeplab.jpg of size 400x400 to the model on iOS, the output of the model should have the size [21, 400, 400]. You should also print out at least the beginning parts of the actual data of the input and output, to be used in Step 4 below to compare with the actual input and output of the model when running in the iOS app.

    +

    So if you provide the same image input deeplab.jpg of size 400x400 to the model on iOS, the output of the model should have the size [21, 400, 400]. You should also print out at least the beginning parts of the actual input and output data, to be used in Step 4 below to compare with the actual input and output of the model when running in the iOS app.
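    As a small hedged illustration of that comparison step (the variable names input_batch and output below are the ones used by the DeepLabV3 hub example; adjust them to whatever your script actually defines):

    # print the first few values of the input so they can be compared
    # against what the iOS app receives in Step 4
    print(input_batch[0, 0, 0, :5])

    # print the first few values of class 0 of the [21, 400, 400] output
    print(output[0, 0, :5])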

    -
    -

    3. Build a new iOS app or reuse an example app and load the model

    -

    First, follow Step 3 of the Model Preparation for iOS recipe to use our model in an Xcode project with PyTorch Mobile enabled. Because both the DeepLabV3 model used in this tutorial and the MobileNet v2 model used in the PyTorch HelloWorld iOS example are computer vision models, you may choose to start with the HelloWorld example repo as a template to reuse the code that loads the model and processes the input and output.

    -

    Now let’s add deeplabv3_scripted.pt and deeplab.jpg used in Step 2 to the Xcode project and modify ViewController.swift to resemble:

    +
    +

    3. Build a new iOS app or reuse an example app and load the model

    +

    First, follow Step 3 of the Model Preparation for iOS recipe to use our model in an Xcode project with PyTorch Mobile enabled. Because both the DeepLabV3 model used in this tutorial and the MobileNet v2 model used in the PyTorch HelloWorld iOS example are computer vision models, you may choose to start with the HelloWorld example repo as a template to reuse the code that loads the model and processes the input and output.

    +

    Now let's add deeplabv3_scripted.pt and the deeplab.jpg used in Step 2 to the Xcode project and modify ViewController.swift to resemble:

    class ViewController: UIViewController {
         var image = UIImage(named: "deeplab.jpg")!
     
    @@ -453,13 +458,13 @@ 

}

    -

    Then set a breakpoint at the line return module and build and run the app. The app should stop at the breakpoint, meaning that the scripted model in Step 1 has been successfully loaded on iOS.

    +

    Then set a breakpoint at the line return module and build and run the app. The app should stop at the breakpoint, meaning that the scripted model from Step 1 has been successfully loaded on iOS.

    -
    -

    4. Process the model input and output for model inference

    -

    After the model loads in the previous step, let’s verify that it works with expected inputs and can generate expected outputs. As the model input for the DeepLabV3 model is an image, the same as that of the MobileNet v2 in the HelloWorld example, we will reuse some of the code in the TorchModule.mm file from HelloWorld for input processing. Replace the predictImage method implementation in TorchModule.mm with the following code:

    +
    +

    4. Process the model input and output for model inference

    +

    After the model loads in the previous step, let's verify that it works with expected inputs and can generate the expected outputs. As the model input for the DeepLabV3 model is an image, the same as that of the MobileNet v2 in the HelloWorld example, we will reuse some of the code in the TorchModule.mm file from HelloWorld for input processing. Replace the predictImage method implementation in TorchModule.mm with the following code:

    - (unsigned char*)predictImage:(void*)imageBuffer {
    -    // 1. the example deeplab.jpg size is size 400x400 and there are 21 semantic classes
+    // 1. the example deeplab.jpg is 400x400 and there are 21 semantic classes
         const int WIDTH = 400;
         const int HEIGHT = 400;
         const int CLASSNUM = 21;
    @@ -468,7 +473,7 @@ 

torch::autograd::AutoGradMode guard(false);
    at::AutoNonVariableTypeMode non_var_type_mode(true);

-    // 2. convert the input tensor to an NSMutableArray for debugging
+    // 2. convert the input tensor to an NSMutableArray for debugging
     float* floatInput = tensor.data_ptr<float>();
     if (!floatInput) {
       return nil;
@@ -478,11 +483,11 @@

[inputs addObject:@(floatInput[i])];
     }

-    // 3. the output of the model is a dictionary of string and tensor, as
-    // specified at https://pytorch.org/hub/pytorch_vision_deeplabv3_resnet101
+    // 3. the output of the model is a dictionary of string and tensor, as
+    // specified at https://pytorch.org/hub/pytorch_vision_deeplabv3_resnet101
     auto outputDict = _impl.forward({tensor}).toGenericDict();

-    // 4. convert the output to another NSMutableArray for easy debugging
+    // 4. convert the output to another NSMutableArray for easy debugging
     auto outputTensor = outputDict.at("out").toTensor();
     float* floatBuffer = outputTensor.data_ptr<float>();
     if (!floatBuffer) {
@@ -499,15 +504,15 @@

    4. Process the model input and output for model inference

    Note

    -

    The model output is a dictionary for the DeepLabV3 model so we use toGenericDict to correctly extract the result. For other models, the model output may also be a single tensor or a tuple of tensors, among other things.

    +

    The model output is a dictionary for the DeepLabV3 model, so we use toGenericDict to correctly extract the result. For other models, the model output may also be a single tensor or a tuple of tensors, among other things.

    -

    With the code changes shown above, you can set breakpoints after the two for loops that populate inputs and results and compare them with the model input and output data you saw in Step 2 to see if they match. For the same inputs to the models running on iOS and Python, you should get the same outputs.

    -

    All we have done so far is to confirm that the model of our interest can be scripted and run correctly in our iOS app as in Python. The steps we walked through so far for using a model in an iOS app consumes the bulk, if not most, of our app development time, similar to how data preprocessing is the heaviest lift for a typical machine learning project.

    +

    With the code changes shown above, you can set breakpoints after the two for loops that populate inputs and results, and compare them with the model input and output data you saw in Step 2 to see if they match. For the same inputs to the models running on iOS and Python, you should get the same outputs.

    +

    All we have done so far is to confirm that the model of our interest can be scripted and run correctly in our iOS app as in Python. The steps we have walked through so far for using a model in an iOS app consume the bulk, if not most, of our app development time, similar to how data preprocessing is the heaviest lift for a typical machine learning project.

    -
    -

    5. Complete the UI, refactor, build and run the app

    -

    Now we are ready to complete the app and the UI to actually see the processed result as a new image. The output processing code should be like this, added to the end of the code snippet in Step 4 in TorchModule.mm - remember to first remove the line return nil; temporarily put there to make the code build and run:

    -
    // see the 20 semantic classes link in Introduction
    +
    +

    5. Complete the UI, refactor, build and run the app

    +

    Now we are ready to complete the app and the UI to actually see the processed result as a new image. The output processing code should be like the following, added to the end of the code snippet in Step 4 in TorchModule.mm - remember to first remove the line return nil; that was temporarily put there to make the code build and run:

    + -

    The implementation here is based on the understanding of the DeepLabV3 model which outputs a tensor of size [21, width, height] for an input image of width*height. Each element in the width*height output array is a value between 0 and 20 (for a total of 21 semantic labels described in Introduction) and the value is used to set a specific color. Color coding of the segmentation here is based on the class with the highest probability, and you can extend the color coding for all classes in your own dataset.

    -

    After the output processing, you will also need to call a helper function to convert the RGB buffer to an UIImage instance to be shown on UIImageView. You can refer to the example code convertRGBBufferToUIImage defined in UIImageHelper.mm in the code repo.

    -

    The UI for this app is also similar to that for HelloWorld, except that you do not need the UITextView to show the image classification result. You can also add two buttons Segment and Restart as shown in the code repo to run the model inference and to show back the original image after the segmentation result is shown.

    -

    The last step before we can run the app is to connect all the pieces together. Modify the ViewController.swift file to use the predictImage, which is refactored and changed to segmentImage in the repo, and helper functions you built as shown in the example code in the repo in ViewController.swift. Connect the buttons to the actions and you should be good to go.

    -

    Now when you run the app on an iOS simulator or an actual iOS device, you will see the following screens:

    +

    The implementation here is based on the understanding of the DeepLabV3 model, which outputs a tensor of size [21, width, height] for an input image of width*height. Each element in the width*height output array is a value between 0 and 20 (for a total of 21 semantic labels described in the Introduction), and the value is used to set a specific color. The color coding of the segmentation here is based on the class with the highest probability, and you can extend the color coding for all classes in your own dataset.
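    The same post-processing idea, expressed in Python rather than Objective-C++ purely for clarity (the palette below is an arbitrary placeholder, not the colors used by the demo app, and output is the [21, H, W] tensor from Step 2):

    import numpy as np

    scores = output.detach().numpy()           # shape: (21, H, W)
    class_map = scores.argmax(axis=0)          # per-pixel class id in [0, 20]

    # map each class id to an RGB color; a fixed random palette as a stand-in
    palette = np.random.default_rng(0).integers(0, 255, size=(21, 3), dtype=np.uint8)
    rgb = palette[class_map]                   # shape: (H, W, 3), ready to display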

    +

    After the output processing, you will also need to call a helper function to convert the RGB buffer to a UIImage instance to be shown on UIImageView. You can refer to the example code convertRGBBufferToUIImage defined in UIImageHelper.mm in the code repo.

    +

    The UI for this app is also similar to that for HelloWorld, except that you do not need the UITextView to show the image classification result. You can also add two buttons, Segment and Restart, as shown in the code repo, to run the model inference and to show back the original image after the segmentation result is shown.

    +

    The last step before we can run the app is to connect all the pieces together. Modify the ViewController.swift file to use predictImage, which is refactored and renamed segmentImage in the repo, and the helper functions you built, as shown in the example code of ViewController.swift in the repo. Connect the buttons to the actions and you should be good to go.

    +

    Now when you run the app on an iOS simulator or an actual iOS device, you will see the following screens:

    ../_images/deeplabv3_ios.png ../_images/deeplabv3_ios2.png
    -
    -

    Recap

    -

    In this tutorial, we described what it takes to convert a pre-trained PyTorch DeepLabV3 model for iOS and how to make sure the model can run successfully on iOS. Our focus was to help you understand the process of confirming that a model can indeed run on iOS. The complete code repo is available here.

    -

    More advanced topics such as quantization and using models via transfer learning or of your own on iOS will be covered soon in future demo apps and tutorials.

    +
    +

    Recap

    +

    In this tutorial, we described what it takes to convert a pre-trained PyTorch DeepLabV3 model for iOS and how to make sure the model can run successfully on iOS. Our focus was to help you understand the process of confirming that a model can indeed run on iOS. The complete code repo is available at https://github.com/pytorch/ios-demo-app/tree/master/ImageSegmentation.

    +

    More advanced topics, such as quantization and using models via transfer learning or your own models on iOS, will be covered soon in future demo apps and tutorials.


diff --git a/docs/intermediate/ddp_tutorial.html b/docs/intermediate/ddp_tutorial.html index f6f6cc163..22a076476 100644 --- a/docs/intermediate/ddp_tutorial.html +++ b/docs/intermediate/ddp_tutorial.html @@ -9,7 +9,7 @@ - Getting Started with Distributed Data Parallel — PyTorch Tutorials 1.9.0+cu102 documentation + Getting Started with Distributed Data Parallel — PyTorch Tutorials 1.9.0+cu102 documentation @@ -207,7 +207,7 @@

    -
    -

    Getting Started with Distributed Data Parallel

    -

    Author: Shen Li

    -

    Edited by: Joe Zhu

    -

    Prerequisites:

    +
    +

    Getting Started with Distributed Data Parallel

    +

    Author: Shen Li

    +

    Edited by: Joe Zhu

    +

    Translated by: 조병근

    +

    Prerequisites:

    -

DistributedDataParallel (DDP) implements data parallelism at the module level which can run across multiple machines. Applications using DDP should spawn multiple processes and create a single DDP instance per process. DDP uses collective communications in the torch.distributed package to synchronize gradients and buffers. More specifically, DDP registers an autograd hook for each parameter given by model.parameters() and the hook will fire when the corresponding gradient is computed in the backward pass. Then DDP uses that signal to trigger gradient synchronization across processes. Please refer to the DDP design note for more details.

    -

The recommended way to use DDP is to spawn one process for each model replica, where a model replica can span multiple devices. DDP processes can be placed on the same machine or across machines, but GPU devices cannot be shared across processes. This tutorial starts from a basic DDP use case and then demonstrates more advanced use cases including checkpointing models and combining DDP with model parallel.

    +

    DistributedDataParallel (DDP) implements data parallelism at the module level which can run across multiple machines. Applications using DDP should spawn multiple processes and create a single DDP instance per process. DDP uses collective communications in the torch.distributed package to synchronize gradients and buffers. More specifically, DDP registers an autograd hook for each parameter given by model.parameters(), and the hook fires when the corresponding gradient is computed in the backward pass. DDP then uses that signal to trigger gradient synchronization across processes. Please refer to the DDP design note for more details.

    +

    The recommended way to use DDP is to spawn one process for each model replica, where a model replica can span multiple devices. DDP processes can be placed on the same machine or across machines, but GPU devices cannot be shared across processes. This tutorial starts from a basic DDP use case and then demonstrates more advanced use cases, including checkpointing models and combining DDP with model parallelism.

    Note

    -

The code in this tutorial runs on an 8-GPU server, but it can be easily generalized to other environments.

    +

    The code in this tutorial runs on an 8-GPU server, but it can be easily adapted to other environments.

    -
    -

    Comparison between DataParallel and DistributedDataParallel

    -

Before we dive in, let's clarify why, despite the added complexity, you would consider using DistributedDataParallel over DataParallel:

    +
    +

    Comparison between DataParallel and DistributedDataParallel

    +

    Before we dive in, let's clarify why, despite the added complexity, you would consider using DistributedDataParallel over DataParallel:

      -
    • First, DataParallel is single-process, multi-thread, and only works on a single machine, while DistributedDataParallel is multi-process and works for both single- and multi-machine training. DataParallel is usually slower than DistributedDataParallel even on a single machine due to GIL contention across threads, the per-iteration replicated model, and the additional overhead introduced by scattering inputs and gathering outputs.
    • Recall from the prior tutorial that if your model is too large to fit on a single GPU, you must use model parallel to split it across multiple GPUs. DistributedDataParallel works with model parallel; DataParallel does not at this time. When DDP is combined with model parallel, each DDP process would use model parallel, and all processes collectively would use data parallel.
    • If your model needs to span multiple machines or if your use case does not fit into the data parallelism paradigm, please see the RPC API for more generic distributed training support.
      +
    • First, DataParallel is single-process, multi-thread, and only works on a single machine, while DistributedDataParallel is multi-process and supports both single- and multi-machine training. DataParallel is usually slower than DistributedDataParallel even on a single machine due to GIL contention across threads, per-iteration model replication, and the additional overhead introduced by scattering inputs and gathering outputs.
    • Recall from the prior tutorial that if your model is too large to fit on a single GPU, you must use model parallel to split it across multiple GPUs. DistributedDataParallel works with model parallel, while DataParallel does not at this time. When DDP is combined with model parallel, each DDP process uses model parallel, and all processes collectively use data parallel.
    • If your model needs to span multiple machines or if your use case does not fit into the data parallelism paradigm, please see the RPC API for more generic distributed training support.
    -
    -

    Basic Use Case

    -

To create DDP modules, first set up process groups properly. More details can be found in Writing Distributed Applications with PyTorch.

    +
    +

    Basic Use Case

    +

    Before creating DDP modules, we first need to set up process groups properly. More details can be found in Writing Distributed Applications with PyTorch.

    -
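    The process-group setup code itself is elided from this hunk; as a hedged sketch of what such initialization typically looks like (the address, port, and backend below are illustrative choices, not necessarily the exact code removed from this page):

    import os
    import torch.distributed as dist

    def setup(rank, world_size):
        # tell every process how to reach the rank 0 process
        os.environ['MASTER_ADDR'] = 'localhost'
        os.environ['MASTER_PORT'] = '12355'
        # join the default process group; "gloo" also works for CPU-only runs
        dist.init_process_group("gloo", rank=rank, world_size=world_size)

    def cleanup():
        dist.destroy_process_group()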

Now, let's create a toy module, wrap it with DDP, and feed it some dummy input data. Please note that, as DDP broadcasts model states from the rank 0 process to all other processes in the DDP constructor, you don't need to worry about different DDP processes starting from different initial model parameter values.

    +

    Now, let's create a toy module, wrap it with DDP, and feed it some dummy input data. Please note that, as DDP broadcasts model states from the rank 0 process to all other processes in the DDP constructor, you don't need to worry about different DDP processes starting from different initial model parameter values.

    -
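    The toy-module example itself is likewise elided here; the following is a hedged sketch of the kind of code the paragraph describes (ToyModel and demo_basic are illustrative names, and setup/cleanup are the helpers sketched above):

    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torch.nn.parallel import DistributedDataParallel as DDP

    class ToyModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.net1 = nn.Linear(10, 10)
            self.relu = nn.ReLU()
            self.net2 = nn.Linear(10, 5)

        def forward(self, x):
            return self.net2(self.relu(self.net1(x)))

    def demo_basic(rank, world_size):
        setup(rank, world_size)

        # create a model on this process's GPU and wrap it with DDP
        model = ToyModel().to(rank)
        ddp_model = DDP(model, device_ids=[rank])

        loss_fn = nn.MSELoss()
        optimizer = optim.SGD(ddp_model.parameters(), lr=0.001)

        optimizer.zero_grad()
        outputs = ddp_model(torch.randn(20, 10))
        labels = torch.randn(20, 5).to(rank)
        loss_fn(outputs, labels).backward()   # gradients are synchronized here
        optimizer.step()

        cleanup()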

As you can see, DDP wraps lower-level distributed communication details and provides a clean API as if it is a local model. Gradient synchronization communications take place during the backward pass and overlap with the backward computation. When the backward() returns, param.grad already contains the synchronized gradient tensor. For basic use cases, DDP only requires a few more LoCs to set up the process group. When applying DDP to more advanced use cases, some caveats require caution.

    +

    As you can see, DDP wraps lower-level distributed communication details and provides a clean API as if it were a local model. Gradient synchronization communications take place during the backward pass and overlap with the backward computation. When backward() returns, param.grad already contains the synchronized gradient tensor. For basic use cases, DDP only requires a few more lines of code to set up the process group, but some caveats require caution when applying DDP to more advanced use cases.

    -
    -

    Skewed Processing Speeds

    -

In DDP, the constructor, the forward pass, and the backward pass are distributed synchronization points. Different processes are expected to launch the same number of synchronizations and reach these synchronization points in the same order and enter each synchronization point at roughly the same time. Otherwise, fast processes might arrive early and timeout on waiting for stragglers. Hence, users are responsible for balancing workload distributions across processes. Sometimes, skewed processing speeds are inevitable due to, e.g., network delays, resource contentions, unpredictable workload spikes. To avoid timeouts in these situations, make sure that you pass a sufficiently large timeout value when calling init_process_group.

    +
    +

    Skewed Processing Speeds

    +

    In DDP, the constructor, the forward pass, and the backward pass are distributed synchronization points. Different processes are expected to launch the same number of synchronizations, reach these synchronization points in the same order, and enter each synchronization point at roughly the same time. Otherwise, fast processes may arrive early and time out while waiting for stragglers. Hence, users are responsible for balancing workload distribution across processes. Sometimes skewed processing speeds are inevitable, for example due to network delays, resource contention, or unpredictable workload spikes. To avoid timeouts in these situations, make sure that you pass a sufficiently large timeout value when calling init_process_group.
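    As a small hedged illustration of that last point (the 30-minute value is an arbitrary example, and this variant assumes the same MASTER_ADDR/MASTER_PORT environment setup as the earlier setup sketch):

    from datetime import timedelta
    import torch.distributed as dist

    def setup(rank, world_size):
        # a generous timeout keeps fast processes from timing out on stragglers
        dist.init_process_group(
            "gloo", rank=rank, world_size=world_size,
            timeout=timedelta(minutes=30),
        )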

    -
    -

    Save and Load Checkpoints

    -

It's common to use torch.save and torch.load to checkpoint modules during training and recover from checkpoints. See SAVING AND LOADING MODELS for more details. When using DDP, one optimization is to save the model in only one process and then load it to all processes, reducing write overhead. This is correct because all processes start from the same parameters and gradients are synchronized in backward passes, and hence optimizers should keep setting parameters to the same values. If you use this optimization, make sure all processes do not start loading before the saving is finished. Besides, when loading the module, you need to provide an appropriate map_location argument to prevent a process to step into others' devices. If map_location is missing, torch.load will first load the module to CPU and then copy each parameter to where it was saved, which would result in all processes on the same machine using the same set of devices. For more advanced failure recovery and elasticity support, please refer to TorchElastic.

    +
    +

    Save and Load Checkpoints

    +

    It's common to use torch.save and torch.load to checkpoint modules during training and to recover from checkpoints. See Saving and Loading Models for more details. When using DDP, one optimization is to save the model in only one process and then load it into all processes, reducing write overhead. This is correct because all processes start from the same parameters and gradients are synchronized in backward passes, so the optimizers keep setting parameters to the same values. If you use this optimization, make sure no process starts loading before the saving is finished. Besides, when loading the module, you need to provide an appropriate map_location argument to prevent a process from stepping into other processes' devices. If map_location is missing, torch.load will first load the module to CPU and then copy each parameter to where it was saved, which would result in all processes on the same machine using the same set of devices. For more advanced failure recovery and elasticity support, please refer to TorchElastic.

    def demo_checkpoint(rank, world_size):
         print(f"Running DDP checkpoint example on rank {rank}.")
         setup(rank, world_size)
    @@ -536,13 +519,12 @@ 

CHECKPOINT_PATH = tempfile.gettempdir() + "/model.checkpoint"
    if rank == 0:
-        # All processes should see same parameters as they all start from same
-        # random parameters and gradients are synchronized in backward passes.
-        # Therefore, saving it in one process is sufficient.
+        # All processes should see the same parameters, as they all start from the
+        # same random parameters and gradients are synchronized in backward passes.
+        # Therefore, saving it in one process is sufficient.
         torch.save(ddp_model.state_dict(), CHECKPOINT_PATH)

-    # Use a barrier() to make sure that process 1 loads the model after process
-    # 0 saves it.
+    # Use a barrier() to make sure that process 1 loads the model after process
+    # 0 saves it.
     dist.barrier()
     # configure map_location properly
     map_location = {'cuda:%d' % 0: 'cuda:%d' % rank}
@@ -556,9 +538,8 @@

loss_fn(outputs, labels).backward()
     optimizer.step()

-    # Not necessary to use a dist.barrier() to guard the file deletion below
-    # as the AllReduce ops in the backward pass of DDP already served as
-    # a synchronization.
+    # Not necessary to use a dist.barrier() to guard the file deletion below,
+    # as the AllReduce ops in the backward pass of DDP already served as
+    # a synchronization.
     if rank == 0:
         os.remove(CHECKPOINT_PATH)
@@ -567,10 +548,10 @@

    Save and Load Checkpoints -

    Combine DDP with Model Parallelism

    -

DDP also works with multi-GPU models. DDP wrapping multi-GPU models is especially helpful when training large models with a huge amount of data.

    +
    +

    Combine DDP with Model Parallelism

    +

    DDP also works with multi-GPU models. Wrapping multi-GPU models with DDP is especially helpful when training large models with a huge amount of data.

    class ToyMpModel(nn.Module):
         def __init__(self, dev0, dev1):
             super(ToyMpModel, self).__init__()
    @@ -587,14 +568,13 @@ 

return self.net2(x)

    -

When passing a multi-GPU model to DDP, device_ids and output_device must NOT be set. Input and output data will be placed in proper devices by either the application or the model forward() method.

    +

    When passing a multi-GPU model to DDP, device_ids and output_device must NOT be set. Input and output data will be placed on the proper devices by either the application or the model's forward() method.

    def demo_model_parallel(rank, world_size):
         print(f"Running DDP with model parallel example on rank {rank}.")
         setup(rank, world_size)
     
    -    # setup mp_model and devices for this process
+    # set up mp_model and devices for this process
         dev0 = rank * 2
         dev1 = rank * 2 + 1
         mp_model = ToyMpModel(dev0, dev1)
    @@ -604,7 +584,7 @@ 

optimizer = optim.SGD(ddp_mp_model.parameters(), lr=0.001)

    optimizer.zero_grad()
-    # outputs will be on dev1
+    # outputs will be on dev1
    outputs = ddp_mp_model(torch.randn(20, 10))
    labels = torch.randn(20, 5).to(dev1)
    loss_fn(outputs, labels).backward()
@@ -680,12 +660,12 @@

    Combine DDP with Model Parallelism


diff --git a/docs/intermediate/fx_conv_bn_fuser.html b/docs/intermediate/fx_conv_bn_fuser.html index dafacb085..b5d78015a 100644 --- a/docs/intermediate/fx_conv_bn_fuser.html +++ b/docs/intermediate/fx_conv_bn_fuser.html @@ -9,7 +9,7 @@ - (beta) Building a Convolution/Batch Norm fuser in FX — PyTorch Tutorials 1.9.0+cu102 documentation + (beta) Building a Convolution/Batch Norm Fuser in FX — PyTorch Tutorials 1.9.0+cu102 documentation @@ -207,7 +207,7 @@


    Note

    Click here to download the full example code

    -
    -

    (beta) Building a Convolution/Batch Norm fuser in FX

    -

    Author: Horace He

    -

In this tutorial, we are going to use FX, a toolkit for composable function transformations of PyTorch, to do the following:

    +
    +

    (beta) Building a Convolution/Batch Norm Fuser in FX

    +

    Author: Horace He

    +

    Translated by: 오찬희

    +

    In this tutorial, we are going to use FX, a toolkit for composable function transformations of PyTorch, to do the following:

      -
    1. Find patterns of conv/batch norm in the data dependencies.
    2. For the patterns found in 1), fold the batch norm statistics into the convolution weights.
      +
    1. Find patterns of conv/batch norm in the data dependencies.
    2. For the patterns found in 1), fold the batch norm statistics into the convolution weights (folding).
    -

    Note that this optimization only works for models in inference mode (i.e. model.eval()).

    We will be building the fuser that exists here:
    https://github.com/pytorch/pytorch/blob/orig/release/1.8/torch/fx/experimental/fuser.py

    First, let's get some imports out of the way (we will be using all
    of these later in the code).

    from typing import Type, Dict, Any, Tuple, Iterable
     import copy
     import torch.fx as fx
    @@ -374,10 +373,10 @@
     import torch.nn as nn
     
    For this tutorial, we are going to create a model consisting of convolutions
    and batch norms. Note that this model has some tricky components - some of
    the conv/batch norm patterns are hidden within Sequentials and one of the
    BatchNorms is wrapped in another Module.

    class WrappedBatchNorm(nn.Module):
         def __init__(self):
             super().__init__()
    @@ -410,39 +409,36 @@
     model.eval()
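    The hunk above elides the body of WrappedBatchNorm and the toy model itself. A sketch of a
    model with the structure the text describes (the module names here are illustrative, not the
    exact elided code) could look like this:

    import torch
    import torch.nn as nn

    class WrappedBatchNorm(nn.Module):
        def __init__(self):
            super().__init__()
            # the batch norm is hidden behind another Module
            self.mod = nn.BatchNorm2d(1)
        def forward(self, x):
            return self.mod(x)

    class M(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 1, 1)
            self.bn1 = nn.BatchNorm2d(1)
            # a conv/batch norm pattern hidden inside a Sequential
            self.nested = nn.Sequential(nn.BatchNorm2d(1), nn.Conv2d(1, 1, 1))
            self.wrapped = WrappedBatchNorm()
        def forward(self, x):
            x = self.bn1(self.conv1(x))
            x = self.nested(x)
            return self.wrapped(x)

    model = M()
    model.eval()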
     
    Fusing Convolution with Batch Norm

    One of the primary challenges with trying to automatically fuse convolution
    and batch norm in PyTorch is that PyTorch does not provide an easy way of
    accessing the computational graph. FX resolves this problem by symbolically
    tracing the actual operations called, so that we can track the computations
    through the forward call, nested within Sequential modules, or wrapped in
    a user-defined module.

    traced_model = torch.fx.symbolic_trace(model)
     print(traced_model.graph)
     
    This gives us a graph representation of our model. Note that both the modules
    hidden within the sequential as well as the wrapped Module have been inlined
    into the graph. This is the default level of abstraction, but it can be
    configured by the pass writer. More information can be found at the FX
    overview: https://pytorch.org/docs/master/fx.html#module-torch.fx
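    To look at individual operations rather than the whole printout, one option (a small sketch
    reusing the traced_model from the cell above) is to walk graph.nodes directly:

    # Each Node records the kind of operation (placeholder, call_module,
    # call_function, call_method, output) and what it targets.
    for node in traced_model.graph.nodes:
        print(node.op, node.target)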

    Fusing Convolution with Batch Norm

    Unlike some other fusions, fusion of convolution with batch norm does not
    require any new operators. Instead, as batch norm during inference
    consists of a pointwise add and multiply, these operations can be "baked"
    into the preceding convolution's weights. This allows us to remove the batch
    norm entirely from our model! Read
    https://nenadmarkus.com/p/fusing-batchnorm-and-conv/ for further details. The
    code here is copied from
    https://github.com/pytorch/pytorch/blob/orig/release/1.8/torch/nn/utils/fusion.py
    for clarity purposes.

    def fuse_conv_bn_eval(conv, bn):
        """
        Given a conv Module `A` and a batch_norm module `B`, returns a conv
        module `C` such that C(x) == B(A(x)) in inference mode.
        """
        assert(not (conv.training or bn.training)), "Fusion only for eval!"
        fused_conv = copy.deepcopy(conv)
    @@ -469,15 +465,14 @@
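    The hunk above cuts off before the statistics are actually folded into the weights. The
    arithmetic the elided lines perform amounts to the sketch below, written against the standard
    nn.Conv2d / nn.BatchNorm2d attributes rather than copied from the elided code:

    import torch

    def fuse_conv_bn_weights(conv_w, conv_b, bn_rm, bn_rv, bn_eps, bn_w, bn_b):
        # Per output channel: scale = gamma / sqrt(running_var + eps)
        if conv_b is None:
            conv_b = torch.zeros_like(bn_rm)
        scale = bn_w * torch.rsqrt(bn_rv + bn_eps)
        # Scaling the conv weight and folding mean/shift into the bias gives
        # conv_fused(x) == bn(conv(x)) in eval mode.
        fused_w = conv_w * scale.reshape(-1, 1, 1, 1)
        fused_b = (conv_b - bn_rm) * scale + bn_b
        return torch.nn.Parameter(fused_w), torch.nn.Parameter(fused_b)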


    FX Fusion Pass

    Now that we have our computational graph as well as a method for fusing
    convolution and batch norm, all that remains is to iterate over the FX graph
    and apply the desired fusions.

    def _parent_name(target : str) -> Tuple[str, str]:
        """
        Splits a qualname into parent path and last atom.
        For example, `foo.bar.baz` -> (`foo.bar`, `baz`)
        """
        *parent, name = target.rsplit('.', 1)
        return parent[0] if parent else '', name
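    The next hunk uses a replace_node_module helper whose definition is elided here; a plausible
    sketch of it on top of _parent_name (an assumption, not the exact elided code) is:

    def replace_node_module(node: fx.Node, modules: Dict[str, Any], new_module: torch.nn.Module):
        # Re-point the attribute that `node` targets (e.g. "layer1.bn") at the new
        # module, and keep the name -> module lookup table in sync.
        assert isinstance(node.target, str)
        parent_name, name = _parent_name(node.target)
        modules[node.target] = new_module
        setattr(modules[parent_name], name, new_module)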
    @@ -490,62 +485,57 @@ 

    def fuse(model: torch.nn.Module) -> torch.nn.Module:
        model = copy.deepcopy(model)
        # The first step of most FX passes is to symbolically trace our model to
        # obtain a `GraphModule`. This is a representation of our original model
        # that is functionally identical to our original model, except that we now
        # also have a graph representation of our forward pass.
        fx_model: fx.GraphModule = fx.symbolic_trace(model)
        modules = dict(fx_model.named_modules())

        # The primary representation for working with FX are the `Graph` and the
        # `Node`. Each `GraphModule` has a `Graph` associated with it - this
        # `Graph` is also what generates `GraphModule.code`.
        # The `Graph` itself is represented as a list of `Node` objects. Thus, to
        # iterate through all of the operations in our graph, we iterate over each
        # `Node` in our `Graph`.
        for node in fx_model.graph.nodes:
            # The FX IR contains several types of nodes, which generally represent
            # call sites to modules, functions, or methods. The type of node is
            # determined by `Node.op`.
            if node.op != 'call_module': # If our current node isn't calling a Module then we can ignore it.
                continue
            # For call sites, `Node.target` represents the module/function/method
            # that's being called. Here, we check `Node.target` to see if it's a
            # batch norm module, and then check `Node.args[0].target` to see if the
            # input `Node` is a convolution.
            if type(modules[node.target]) is nn.BatchNorm2d and type(modules[node.args[0].target]) is nn.Conv2d:
                if len(node.args[0].users) > 1:  # Output of conv is used by other nodes
                    continue
                conv = modules[node.args[0].target]
                bn = modules[node.target]
                fused_conv = fuse_conv_bn_eval(conv, bn)
                replace_node_module(node.args[0], modules, fused_conv)
                # As we've folded the batch norm into the conv, we need to replace all uses
                # of the batch norm with the conv.
                node.replace_all_uses_with(node.args[0])
                # Now that all uses of the batch norm have been replaced, we can
                # safely remove the batch norm.
                fx_model.graph.erase_node(node)
        fx_model.graph.lint()
        # After we've modified our graph, we need to recompile our graph in order
        # to keep the generated code in sync.
        fx_model.recompile()
        return fx_model

    Note

    We make some simplifications here for demonstration purposes, such as only
    matching 2D convolutions. View
    https://github.com/pytorch/pytorch/blob/master/torch/fx/experimental/fuser.py
    for a more usable pass.
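    For instance, supporting 1D or 3D convolutions would mostly mean widening the type check used
    in fuse(); a hypothetical sketch (not part of the upstream fuser):

    import torch.nn as nn

    FUSABLE_PAIRS = {
        (nn.Conv1d, nn.BatchNorm1d),
        (nn.Conv2d, nn.BatchNorm2d),
        (nn.Conv3d, nn.BatchNorm3d),
    }

    def is_fusable(conv_mod: nn.Module, bn_mod: nn.Module) -> bool:
        # Swap this in for the `type(...) is nn.BatchNorm2d and ... is nn.Conv2d`
        # check in fuse() to cover other dimensionalities.
        return (type(conv_mod), type(bn_mod)) in FUSABLE_PAIRS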

    Testing out our Fusion Pass

    We can now run this fusion pass on our initial toy model and verify that our
    results are identical. In addition, we can print out the code for our fused
    model and verify that there are no more batch norms.

    fused_model = fuse(model)
     print(fused_model.code)
     inp = torch.randn(5, 1, 1, 1)
    @@ -553,10 +543,10 @@ 
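    The hunk ends before the actual comparison; the check it performs is essentially the following
    (a sketch reusing model, fused_model and inp from above):

    # The fused model should match the original numerically in eval mode.
    assert torch.allclose(model(inp), fused_model(inp))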


    Benchmarking our Fusion on ResNet18

    We can test our fusion pass on a larger model like ResNet18 and see how much
    this pass improves inference performance.

    import torchvision.models as models
     import time
     
    @@ -579,25 +569,22 @@ 
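    The hunk marker above hides the benchmark harness itself. A sketch of what it plausibly looks
    like (the input shape and iteration counts are illustrative assumptions, reusing the imports
    above and the fuse() defined earlier):

    rn18 = models.resnet18()
    rn18.eval()

    inp = torch.randn(10, 3, 224, 224)

    def benchmark(model, iters=20):
        # warm up, then time `iters` forward passes
        for _ in range(10):
            model(inp)
        begin = time.time()
        for _ in range(iters):
            model(inp)
        return time.time() - begin

    fused_rn18 = fuse(rn18)
    print("Unfused time: ", benchmark(rn18))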

    print("Fused time: ", benchmark(fused_rn18))

    As we previously saw, the output of our FX transformation is
    (Torchscriptable) PyTorch code, so we can easily jit.script the output to try
    and increase our performance even more. In this way, our FX model
    transformation composes with Torchscript with no issues.

    jit_rn18 = torch.jit.script(fused_rn18)
    print("jit time: ", benchmark(jit_rn18))


    ############
    # Conclusion
    # ----------
    # As we can see, using FX we can easily write static graph transformations on
    # PyTorch code.
    #
    # Since FX is still in beta, we would be happy to hear any
    # feedback you have about using it. Please feel free to use the
    # PyTorch Forums (https://discuss.pytorch.org/) and the issue tracker
    # (https://github.com/pytorch/pytorch/issues) to provide any feedback
    # you might have.

    Total running time of the script: ( 0 minutes 0.000 seconds)

    @@ -665,12 +652,12 @@

    Benchmarking our Fusion on ResNet18


    @@ -744,7 +744,7 @@

    Conclusion

    diff --git a/docs/intermediate/mario_rl_tutorial.html b/docs/intermediate/mario_rl_tutorial.html
    index 82b77a0cd..6fc734b27 100644
    --- a/docs/intermediate/mario_rl_tutorial.html
    +++ b/docs/intermediate/mario_rl_tutorial.html
    @@ -207,7 +207,7 @@


    diff --git a/docs/intermediate/memory_format_tutorial.html b/docs/intermediate/memory_format_tutorial.html
    index f5dd38926..2bdf6ea18 100644
    --- a/docs/intermediate/memory_format_tutorial.html
    +++ b/docs/intermediate/memory_format_tutorial.html
    @@ -207,7 +207,7 @@


    diff --git a/docs/intermediate/model_parallel_tutorial.html b/docs/intermediate/model_parallel_tutorial.html
    index 91e997591..82f4a6bfd 100644
    --- a/docs/intermediate/model_parallel_tutorial.html
    +++ b/docs/intermediate/model_parallel_tutorial.html
    @@ -34,7 +34,7 @@
    @@ -207,7 +207,7 @@


    @@ -672,7 +672,7 @@

    By designing a pipeline that splits the input tensors, training