[XLA Study] How to use XLA AOT compilation in TensorFlow (Part II)

My previous post, "[XLA Study] How to use XLA AOT compilation in TensorFlow", walked through a simple example of using XLA AOT. If you want to see a more complicated example, take a look at this: https://gist.github.com/carlthome/6ae8a570e21069c60708017e3f96c9fd

Before getting started, you can use these commands to build all the tfcompile test cases and run them for verification (the run starts from the main() in "test_main.cc"):

$ cd tensorflow/compiler/aot/tests
$ bazel build tfcompile_test
$ bazel run tfcompile_test

-----------------------------------------------------------------------------
Running main() from test_main.cc
[==========] Running 15 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 15 tests from TFCompileTest
[ RUN ] TFCompileTest.Add
[ OK ] TFCompileTest.Add (0 ms)
[ RUN ] TFCompileTest.Add_SetArg
[ OK ] TFCompileTest.Add_SetArg (0 ms)
[ RUN ] TFCompileTest.AddWithCkpt
[ OK ] TFCompileTest.AddWithCkpt (0 ms)
[ RUN ] TFCompileTest.AddWithCkptSaver
[ OK ] TFCompileTest.AddWithCkptSaver (0 ms)
[ RUN ] TFCompileTest.Cond
[ OK ] TFCompileTest.Cond (0 ms)
[ RUN ] TFCompileTest.Gather
[ OK ] TFCompileTest.Gather (0 ms)
[ RUN ] TFCompileTest.MatMul2
[ OK ] TFCompileTest.MatMul2 (1 ms)
[ RUN ] TFCompileTest.MatMul2_SetArg
[ OK ] TFCompileTest.MatMul2_SetArg (2 ms)
[ RUN ] TFCompileTest.MatMulAndAdd1
[ OK ] TFCompileTest.MatMulAndAdd1 (0 ms)
[ RUN ] TFCompileTest.Function
[ OK ] TFCompileTest.Function (0 ms)
[ RUN ] TFCompileTest.Splits
[ OK ] TFCompileTest.Splits (0 ms)
[ RUN ] TFCompileTest.AssertEqAndReturnDiff
[ OK ] TFCompileTest.AssertEqAndReturnDiff (0 ms)
[ RUN ] TFCompileTest.LookupNameIndex
[ OK ] TFCompileTest.LookupNameIndex (0 ms)
[ RUN ] TFCompileTest.ProgramShape
[ OK ] TFCompileTest.ProgramShape (0 ms)
[ RUN ] TFCompileTest.HloProfiling
[ OK ] TFCompileTest.HloProfiling (4 ms)
[----------] 15 tests from TFCompileTest (10 ms total)

[----------] Global test environment tear-down
[==========] 15 tests from 1 test case ran. (10 ms total)
[ PASSED ] 15 tests.

The build outputs are placed here:

bazel-genfiles/tensorflow/compiler/aot/tests

In the previous post, I used "bazel build :test_graph_tfmatmul" to compile my graph and generate the header file.
In this post, I want to run the "tfcompile" command directly to do the same thing.

$ export TF_CPP_MIN_VLOG_LEVEL=2
$ bazel-bin/tensorflow/compiler/aot/tfcompile --graph=test_graph_tfmatmul.pb --config=test_graph_tfmatmul.config.pbtxt --cpp_class="foo::bar::MatMulComp"
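
At this verbosity level the output is very long, so it can be handy to capture it in a file. Here is a minimal sketch in Python that wraps the same command. It assumes that TensorFlow writes its VLOG lines to stderr and that the script is started from a directory where the relative paths resolve. The helper names tfcompile_invocation and run_and_capture and the file name tfcompile.log are my own, not part of TensorFlow.

```python
import os
import subprocess

# Same relative path as in the command above.
TFCOMPILE = "bazel-bin/tensorflow/compiler/aot/tfcompile"


def tfcompile_invocation(graph, config, cpp_class, vlog_level=2):
    """Return (argv, env) equivalent to the shell command shown above."""
    argv = [
        TFCOMPILE,
        f"--graph={graph}",
        f"--config={config}",
        f"--cpp_class={cpp_class}",
    ]
    # Copy the current environment and add the verbose-logging variable.
    env = dict(os.environ, TF_CPP_MIN_VLOG_LEVEL=str(vlog_level))
    return argv, env


def run_and_capture(argv, env, log_path="tfcompile.log"):
    """Run the command and redirect stderr (assumed to hold the VLOG lines) to log_path."""
    with open(log_path, "w") as log:
        return subprocess.run(argv, env=env, stderr=log, check=False).returncode


if __name__ == "__main__":
    argv, env = tfcompile_invocation(
        "test_graph_tfmatmul.pb",
        "test_graph_tfmatmul.config.pbtxt",
        "foo::bar::MatMulComp",
    )
    print(" ".join(argv))
    # Uncomment once tfcompile has been built:
    # run_and_capture(argv, env)
```

Passing the arguments as a list avoids shell quoting issues with the "::" in the class name.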

Why do I run it this way? Because it lets me dump the full log of the XLA AOT frontend and backend, as follows:

tensorflow/compiler/xla/service/service.cc:165] XLA compile-only service constructed
tensorflow/compiler/tf2xla/dump_graph.cc:79] Dumped GraphDef to /tmp//tf2xla_post_rewrite.pbtxt
tensorflow/compiler/tf2xla/tf2xla.cc:166] Post rewrite: /tmp//tf2xla_post_rewrite.pbtxt
tensorflow/core/graph/algorithm.cc:187] Reverse reach init: _retval_0
tensorflow/core/graph/algorithm.cc:196] Reverse reach : _retval_0 from x_y_prod
tensorflow/core/graph/algorithm.cc:196] Reverse reach : x_y_prod from _arg_0
tensorflow/core/graph/algorithm.cc:196] Reverse reach : x_y_prod from _arg_1
tensorflow/compiler/tf2xla/dump_graph.cc:79] Dumped GraphDef to /tmp//tfcompile_post_prune.pbtxt
tensorflow/compiler/tf2xla/tf2xla.cc:170] Post prune: /tmp//tfcompile_post_prune.pbtxt
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: XlaWhile
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: StatelessIf
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: XlaSelectAndScatter
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: XlaReduce
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: XlaPad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: XlaConv
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: XlaBroadcastHelper
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceScatterNdAdd
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceScatterMax
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceScatterDiv
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceScatterSub
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: XlaDot
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceGather
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: AssignVariableOp
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ReadVariableOp
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: VariableShape
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Digamma
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Erfc
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Real
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Tan
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Sinh
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Sign
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Rsqrt
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Neg
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Exp
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Sin
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Cosh
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Cos
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Atanh
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Atan
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: While
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Asinh
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Acosh
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: XlaIf
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Abs
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Conj
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Angle
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ComplexAbs
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Acos
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Transpose
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceApplyAdaMax
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceApplyAdam
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceApplyAdagradDA
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceApplyAdagrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Rint
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceApplyMomentum
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceApplyProximalGradientDescent
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceApplyGradientDescent
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: If
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Tile
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: IsFinite
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: TensorArrayCloseV3
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: TensorArrayGradV3
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: TensorArraySizeV3
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Sqrt
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: TensorArrayConcatV3
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: TensorArrayScatterV3
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: TensorArrayReadV3
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: TensorArrayWriteV3
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: TensorArrayV3
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: StridedSliceGrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: StatelessTruncatedNormal
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: StatelessRandomNormal
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: StatelessRandomUniform
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: StackPushV2
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: StackV2
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Imag
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: StridedSlice
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: SplitV
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: SparseToDense
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceScatterAdd
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: SpaceToDepth
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: SpaceToBatch
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: SpaceToBatchND
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Softmax
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: InvertPermutation
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Slice
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: OnesLike
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceApplyRMSProp
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: FFT3D
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ShapeN
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: SparseSoftmaxCrossEntropyWithLogits
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: FFT
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: DepthToSpace
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Asin
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ArgMin
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Cross
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: IFFT
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: DepthwiseConv2dNativeBackpropInput
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: FakeQuantWithMinMaxArgsGradient
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: MatMul
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: RFFT2D
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: UnsortedSegmentMin
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Expm1
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Sub
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Conv2D
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Select
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceScatterMin
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: AdjustContrastv2
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ClipByValue
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Cast
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Invert
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: LogicalAnd
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: IsInf
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: BroadcastTo
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Elu
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: SoftsignGrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: XlaReduceWindow
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: AvgPool3D
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: DepthwiseConv2dNative
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Cumprod
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: SoftplusGrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResizeBilinearGrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: EluGrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: SigmoidGrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: VarIsInitializedOp
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ReciprocalGrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceApplyPowerSign
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: PadV2
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ReverseSequence
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Add
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Unpack
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Lgamma
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Rank
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: RealDiv
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Mul
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Bitcast
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: BitwiseAnd
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: BiasAddGrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Reciprocal
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Const
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Conv3DBackpropInputV2
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Mod
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Shape
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ApproximateEqual
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Div
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Softplus
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: BatchMatMul
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Pad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: AssignSubVariableOp
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Conv3DBackpropFilterV2
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ScatterNd
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Less
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Maximum
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: PreventGradient
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: BiasAddV1
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Conv2DBackpropInput
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Log
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: DiagPart
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: BatchToSpace
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: FakeQuantWithMinMaxArgs
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: CheckNumerics
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Square
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: FloorMod
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: BatchToSpaceND
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: IdentityN
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: AssignAddVariableOp
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: MatrixDiagPart
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ExpandDims
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: QuantizeAndDequantizeV3
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: XlaDynamicUpdateSlice
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: SquaredDifference
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: LogicalNot
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: LRN
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Minimum
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: AddN
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Max
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Cholesky
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: DynamicStitch
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: AvgPool
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: StackPopV2
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: _Arg
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: TensorArraySplitV3
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: BiasAdd
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Snapshot
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Atan2
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: FusedBatchNormV2
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: FusedBatchNormGradV2
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: StopGradient
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceScatterNdUpdate
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Pow
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceStridedSliceAssign
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Diag
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Tanh
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: TruncateDiv
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ConjugateTranspose
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: MaxPoolGradGradV2
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Multinomial
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ParallelDynamicStitch
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Complex
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Floor
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: MaxPool3DGrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Pack
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceApplyFtrl
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Any
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceApplyProximalAdagrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: BitwiseXor
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: RightShift
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ConcatV2
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ArgMax
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: LeftShift
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: LogicalOr
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Reverse
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Inv
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: FakeQuantWithMinMaxVars
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: MaxPool3D
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceApplyFtrlV2
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: TruncatedNormal
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: RsqrtGrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: NotEqual
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: FFT2D
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: IFFT2D
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: IFFT3D
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: TanhGrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: RFFT
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: MatrixTriangularSolve
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: IRFFT
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: RFFT3D
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: IsNan
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Selu
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: MirrorPad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Conv2DBackpropFilter
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: MaxPoolGrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: IRFFT3D
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Fill
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceApplyAdadelta
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Identity
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: _ListToArray
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: StackCloseV2
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: LogSoftmax
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: SymbolicGradient
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: MatrixDiag
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Gather
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: GatherV2
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: SoftmaxCrossEntropyWithLogits
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: FusedBatchNormGrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: L2Loss
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: GatherNd
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Erf
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: PlaceholderWithDefault
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: RGBToHSV
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Conv3D
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ReverseV2
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: GreaterEqual
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Relu
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: HSVToRGB
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: BroadcastArgs
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: AdjustSaturation
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: QuantizeAndDequantizeV2
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: AdjustHue
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: NonMaxSuppressionV4
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResizeBilinear
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: MatrixBandPart
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: LessEqual
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Bucketize
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ListDiff
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: LRNGrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Concat
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: RandomUniformInt
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: SparseMatMul
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceScatterUpdate
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Softsign
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ConcatOffset
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Prod
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: BroadcastGradientArgs
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: MatrixSetDiag
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: DepthwiseConv2dNativeBackpropFilter
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: NoOp
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: TruncateMod
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Assert
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: IRFFT2D
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ControlTrigger
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: TensorArrayGatherV3
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: RandomShuffle
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: OneHot
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceScatterMul
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: MaxPool
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Sum
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: MaxPoolV2
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Split
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: FakeQuantWithMinMaxVarsGradient
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: AvgPoolGrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ZerosLike
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: FloorDiv
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: MaxPoolGradV2
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: UnsortedSegmentProd
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: AvgPool3DGrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceApplyAddSign
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: MaxPoolGradGrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Equal
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: MaxPool3DGradGrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Log1p
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Qr
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: RandomUniform
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Sigmoid
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: FusedBatchNorm
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Greater
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: RandomStandardNormal
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Ceil
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: _ArrayToList
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: UnsortedSegmentMax
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: BitwiseOr
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: SeluGrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Min
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: _Retval
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: SqrtGrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: All
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Reshape
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Relu6
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ReluGrad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Relu6Grad
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Cumsum
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: UnsortedSegmentSum
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ExtractImagePatches
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: XlaRecv
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Range
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Round
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: XlaSend
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: LinSpace
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: ResourceApplyCenteredRMSProp
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Size
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: StatelessWhile
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Mean
tensorflow/compiler/tf2xla/xla_op_registry.cc:292] XLA op registration: device: XLA_CPU_JIT op: Squeeze
tensorflow/core/common_runtime/local_device.cc:41] Local device intra op parallelism threads: 2
tensorflow/compiler/tf2xla/xla_compiler.cc:732] Executing graph symbolically to populate XlaBuilder.
tensorflow/compiler/tf2xla/dump_graph.cc:79] Dumped GraphDef to /tmp//xla_compile_graph_tfcompile.pbtxt
tensorflow/compiler/tf2xla/xla_compiler.cc:735] XlaCompiler::CompileGraph: /tmp//xla_compile_graph_tfcompile.pbtxt
tensorflow/compiler/tf2xla/dump_graph.cc:79] Dumped GraphDef to /tmp//functionalize_initial.pbtxt
tensorflow/compiler/tf2xla/functionalize_control_flow.cc:45] FunctionalizeControlFlow (initial): /tmp//functionalize_initial.pbtxt
tensorflow/compiler/tf2xla/functionalize_while.cc:600] node: x_y_prod (4) frame_name:  frame: _SOURCE parent_frame: _SOURCE
tensorflow/compiler/tf2xla/functionalize_while.cc:600] node: _arg_0 (5) frame_name:  frame: _SOURCE parent_frame: _SOURCE
tensorflow/compiler/tf2xla/functionalize_while.cc:600] node: _arg_1 (6) frame_name:  frame: _SOURCE parent_frame: _SOURCE
tensorflow/compiler/tf2xla/functionalize_while.cc:600] node: _retval_0 (7) frame_name:  frame: _SOURCE parent_frame: _SOURCE
tensorflow/compiler/tf2xla/functionalize_cond.cc:1371] FunctionalizeCond::Functionalize
tensorflow/compiler/tf2xla/dump_graph.cc:79] Dumped GraphDef to /tmp//functionalize_final.pbtxt
tensorflow/compiler/tf2xla/functionalize_control_flow.cc:57] FunctionalizeControlFlow (final): /tmp//functionalize_final.pbtxt
tensorflow/compiler/tf2xla/xla_compiler.cc:592] XLA computation inputs:
tensorflow/compiler/tf2xla/xla_compiler.cc:595]   XLA arg 0 shape: f32[2,3] name:  TF arg 0
tensorflow/compiler/tf2xla/xla_compiler.cc:595]   XLA arg 1 shape: f32[3,2] name:  TF arg 1
tensorflow/core/framework/op_kernel.cc:1112] Instantiating kernel for node: {{node _SOURCE}} = NoOp[]()
tensorflow/core/framework/op_kernel.cc:1112] Instantiating kernel for node: {{node _arg_0}} = _Arg[T=DT_FLOAT, _debug_name="", _feed_id="x_hold:0", _shape=[2,3], index=0]()
tensorflow/core/framework/op_kernel.cc:1112] Instantiating kernel for node: {{node _arg_1}} = _Arg[T=DT_FLOAT, _debug_name="", _feed_id="y_hold:0", _shape=[3,2], index=1]()
tensorflow/core/framework/op_kernel.cc:1112] Instantiating kernel for node: {{node x_y_prod}} = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false](aot_feed_0/x_hold, aot_feed_0/y_hold)
tensorflow/compiler/tf2xla/xla_op_kernel.cc:47] Fetched T3
tensorflow/compiler/tf2xla/xla_op_kernel.cc:47] Fetched T2
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/core/framework/op_kernel.cc:1112] Instantiating kernel for node: {{node _retval_0}} = _Retval[T=DT_FLOAT, _fetch_id="x_y_prod:0", index=0](x_y_prod)
tensorflow/compiler/tf2xla/xla_op_kernel.cc:47] Fetched T4
tensorflow/compiler/xla/client/xla_builder.cc:218] Non-constant: arg0
tensorflow/compiler/xla/client/xla_builder.cc:218] Non-constant: reshape
tensorflow/compiler/xla/client/xla_builder.cc:218] Non-constant: dot
tensorflow/compiler/tf2xla/xla_context.cc:85] Added retval index 0 to XLA computation
tensorflow/core/framework/op_kernel.cc:1112] Instantiating kernel for node: {{node _SINK}} = NoOp[]()
tensorflow/core/framework/log_memory.cc:35] __LOG_MEMORY__ MemoryLogTensorDeallocation { allocator_name: "xla_compilation" }
tensorflow/core/framework/log_memory.cc:35] __LOG_MEMORY__ MemoryLogTensorDeallocation { allocator_name: "xla_compilation" }
tensorflow/core/framework/log_memory.cc:35] __LOG_MEMORY__ MemoryLogTensorDeallocation { allocator_name: "xla_compilation" }
tensorflow/compiler/tf2xla/xla_compiler.cc:783] Outputs: total: 1 nonconstant: 1
tensorflow/compiler/tf2xla/xla_compiler.cc:792] XLA output shape: (f32[2,2])
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:396] backend_optimization_level: 3
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:764] Compiling ahead-of-time: tfcompile0
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:766] Before optimization:
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:767] HloModule tfcompile0
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:767] 
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:767] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:767]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:767]   %reshape.0.2 = f32[2,3]{1,0} reshape(f32[2,3]{1,0} %arg0.0.0)
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:767]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:767]   %reshape.0.3 = f32[3,2]{1,0} reshape(f32[3,2]{1,0} %arg1.0.1)
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:767]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %reshape.0.2, f32[3,2]{1,0} %reshape.0.3), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:767]   %reshape.0.5 = f32[2,2]{1,0} reshape(f32[2,2]{1,0} %dot.0.4), metadata={op_type="_Retval" op_name="_retval_0"}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:767]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %reshape.0.5), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:767] }
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:767] 
tensorflow/compiler/xla/service/layout_assignment.cc:971] Entry computation layout given to layout assignment: (f32[2,3]{1,0}, f32[3,2]{1,0}) => (f32[2,2]{1,0})
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:64] Running HLO pass pipeline HLO passes through layout assignment
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.2
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.3
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.5
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass cpu_hlo_support_checker
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.2
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.3
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.5
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass CallInliner
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/call_graph.cc:243] Building call graph for:
tensorflow/compiler/xla/service/call_graph.cc:244] HloModule tfcompile0
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:244] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %reshape.0.2 = f32[2,3]{1,0} reshape(f32[2,3]{1,0} %arg0.0.0)
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %reshape.0.3 = f32[3,2]{1,0} reshape(f32[3,2]{1,0} %arg1.0.1)
tensorflow/compiler/xla/service/call_graph.cc:244]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %reshape.0.2, f32[3,2]{1,0} %reshape.0.3), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %reshape.0.5 = f32[2,2]{1,0} reshape(f32[2,2]{1,0} %dot.0.4), metadata={op_type="_Retval" op_name="_retval_0"}
tensorflow/compiler/xla/service/call_graph.cc:244]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %reshape.0.5), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/call_graph.cc:244] }
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:273] Call graph for module tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273] Computation tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273]   calls:
tensorflow/compiler/xla/service/call_graph.cc:273]   called by:
tensorflow/compiler/xla/service/call_graph.cc:273]   callsites:
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.2
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.3
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.5
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass batch-dot-simplification
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.2
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.3
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.5
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass dot_decomposer
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/dot_decomposer.cc:162] DotDecomposer ENTRY
tensorflow/compiler/xla/service/dot_decomposer.cc:162] HloModule tfcompile0
tensorflow/compiler/xla/service/dot_decomposer.cc:162] 
tensorflow/compiler/xla/service/dot_decomposer.cc:162] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/dot_decomposer.cc:162]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/dot_decomposer.cc:162]   %reshape.0.2 = f32[2,3]{1,0} reshape(f32[2,3]{1,0} %arg0.0.0)
tensorflow/compiler/xla/service/dot_decomposer.cc:162]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/dot_decomposer.cc:162]   %reshape.0.3 = f32[3,2]{1,0} reshape(f32[3,2]{1,0} %arg1.0.1)
tensorflow/compiler/xla/service/dot_decomposer.cc:162]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %reshape.0.2, f32[3,2]{1,0} %reshape.0.3), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/dot_decomposer.cc:162]   %reshape.0.5 = f32[2,2]{1,0} reshape(f32[2,2]{1,0} %dot.0.4), metadata={op_type="_Retval" op_name="_retval_0"}
tensorflow/compiler/xla/service/dot_decomposer.cc:162]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %reshape.0.5), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/dot_decomposer.cc:162] }
tensorflow/compiler/xla/service/dot_decomposer.cc:162] 
tensorflow/compiler/xla/service/dot_decomposer.cc:182] DotDecompose EXIT
tensorflow/compiler/xla/service/dot_decomposer.cc:182] HloModule tfcompile0
tensorflow/compiler/xla/service/dot_decomposer.cc:182] 
tensorflow/compiler/xla/service/dot_decomposer.cc:182] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/dot_decomposer.cc:182]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/dot_decomposer.cc:182]   %reshape.0.2 = f32[2,3]{1,0} reshape(f32[2,3]{1,0} %arg0.0.0)
tensorflow/compiler/xla/service/dot_decomposer.cc:182]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/dot_decomposer.cc:182]   %reshape.0.3 = f32[3,2]{1,0} reshape(f32[3,2]{1,0} %arg1.0.1)
tensorflow/compiler/xla/service/dot_decomposer.cc:182]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %reshape.0.2, f32[3,2]{1,0} %reshape.0.3), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/dot_decomposer.cc:182]   %reshape.0.5 = f32[2,2]{1,0} reshape(f32[2,2]{1,0} %dot.0.4), metadata={op_type="_Retval" op_name="_retval_0"}
tensorflow/compiler/xla/service/dot_decomposer.cc:182]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %reshape.0.5), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/dot_decomposer.cc:182] }
tensorflow/compiler/xla/service/dot_decomposer.cc:182] 
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.2
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.3
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.5
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass convolution-feature-group-converter
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:237] ConvolutionFeatureGroupConverter::Run(), before:
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:237] HloModule tfcompile0
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:237] 
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:237] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:237]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:237]   %reshape.0.2 = f32[2,3]{1,0} reshape(f32[2,3]{1,0} %arg0.0.0)
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:237]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:237]   %reshape.0.3 = f32[3,2]{1,0} reshape(f32[3,2]{1,0} %arg1.0.1)
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:237]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %reshape.0.2, f32[3,2]{1,0} %reshape.0.3), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:237]   %reshape.0.5 = f32[2,2]{1,0} reshape(f32[2,2]{1,0} %dot.0.4), metadata={op_type="_Retval" op_name="_retval_0"}
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:237]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %reshape.0.5), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:237] }
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:237] 
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.2
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.3
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.5
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:245] ConvolutionFeatureGroupConverter::Run(), after:
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:245] HloModule tfcompile0
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:245] 
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:245] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:245]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:245]   %reshape.0.2 = f32[2,3]{1,0} reshape(f32[2,3]{1,0} %arg0.0.0)
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:245]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:245]   %reshape.0.3 = f32[3,2]{1,0} reshape(f32[3,2]{1,0} %arg1.0.1)
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:245]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %reshape.0.2, f32[3,2]{1,0} %reshape.0.3), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:245]   %reshape.0.5 = f32[2,2]{1,0} reshape(f32[2,2]{1,0} %dot.0.4), metadata={op_type="_Retval" op_name="_retval_0"}
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:245]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %reshape.0.5), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:245] }
tensorflow/compiler/xla/service/convolution_feature_group_converter.cc:245] 
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.2
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.3
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.5
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass convolution-canonicalization
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.2
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.3
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.5
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass simplification
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:64] Running HLO pass pipeline simplification
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.2
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.3
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.5
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass batchnorm_expander
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/batchnorm_expander.cc:608] BatchNormExpander::Run(), before:
tensorflow/compiler/xla/service/batchnorm_expander.cc:608] HloModule tfcompile0
tensorflow/compiler/xla/service/batchnorm_expander.cc:608] 
tensorflow/compiler/xla/service/batchnorm_expander.cc:608] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/batchnorm_expander.cc:608]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/batchnorm_expander.cc:608]   %reshape.0.2 = f32[2,3]{1,0} reshape(f32[2,3]{1,0} %arg0.0.0)
tensorflow/compiler/xla/service/batchnorm_expander.cc:608]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/batchnorm_expander.cc:608]   %reshape.0.3 = f32[3,2]{1,0} reshape(f32[3,2]{1,0} %arg1.0.1)
tensorflow/compiler/xla/service/batchnorm_expander.cc:608]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %reshape.0.2, f32[3,2]{1,0} %reshape.0.3), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/batchnorm_expander.cc:608]   %reshape.0.5 = f32[2,2]{1,0} reshape(f32[2,2]{1,0} %dot.0.4), metadata={op_type="_Retval" op_name="_retval_0"}
tensorflow/compiler/xla/service/batchnorm_expander.cc:608]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %reshape.0.5), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/batchnorm_expander.cc:608] }
tensorflow/compiler/xla/service/batchnorm_expander.cc:608] 
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.2
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.3
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.5
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/batchnorm_expander.cc:617] BatchNormExpander::Run(), after:
tensorflow/compiler/xla/service/batchnorm_expander.cc:617] HloModule tfcompile0
tensorflow/compiler/xla/service/batchnorm_expander.cc:617] 
tensorflow/compiler/xla/service/batchnorm_expander.cc:617] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/batchnorm_expander.cc:617]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/batchnorm_expander.cc:617]   %reshape.0.2 = f32[2,3]{1,0} reshape(f32[2,3]{1,0} %arg0.0.0)
tensorflow/compiler/xla/service/batchnorm_expander.cc:617]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/batchnorm_expander.cc:617]   %reshape.0.3 = f32[3,2]{1,0} reshape(f32[3,2]{1,0} %arg1.0.1)
tensorflow/compiler/xla/service/batchnorm_expander.cc:617]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %reshape.0.2, f32[3,2]{1,0} %reshape.0.3), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/batchnorm_expander.cc:617]   %reshape.0.5 = f32[2,2]{1,0} reshape(f32[2,2]{1,0} %dot.0.4), metadata={op_type="_Retval" op_name="_retval_0"}
tensorflow/compiler/xla/service/batchnorm_expander.cc:617]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %reshape.0.5), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/batchnorm_expander.cc:617] }
tensorflow/compiler/xla/service/batchnorm_expander.cc:617] 
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.2
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.3
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.5
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass algsimp
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521] AlgebraicSimplifier::Run(), before:
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521] HloModule tfcompile0
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521] 
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521]   %reshape.0.2 = f32[2,3]{1,0} reshape(f32[2,3]{1,0} %arg0.0.0)
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521]   %reshape.0.3 = f32[3,2]{1,0} reshape(f32[3,2]{1,0} %arg1.0.1)
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %reshape.0.2, f32[3,2]{1,0} %reshape.0.3), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521]   %reshape.0.5 = f32[2,2]{1,0} reshape(f32[2,2]{1,0} %dot.0.4), metadata={op_type="_Retval" op_name="_retval_0"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %reshape.0.5), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521] }
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521] 
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.2
tensorflow/compiler/xla/service/hlo_computation.cc:259] Removing instruction reshape.0.2 from computation tfcompile0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.3
tensorflow/compiler/xla/service/hlo_computation.cc:259] Removing instruction reshape.0.3 from computation tfcompile0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %reshape.0.5
tensorflow/compiler/xla/service/hlo_computation.cc:259] Removing instruction reshape.0.5 from computation tfcompile0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531] AlgebraicSimplifier::Run(), after:
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531] HloModule tfcompile0
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531] 
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531] }
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531] 
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass dce
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_dce.cc:40] Before dce:
tensorflow/compiler/xla/service/hlo_dce.cc:41] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dce.cc:41] 
tensorflow/compiler/xla/service/hlo_dce.cc:41] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dce.cc:41] }
tensorflow/compiler/xla/service/hlo_dce.cc:41] 
tensorflow/compiler/xla/service/hlo_dce.cc:88] After dce:
tensorflow/compiler/xla/service/hlo_dce.cc:89] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dce.cc:89] 
tensorflow/compiler/xla/service/hlo_dce.cc:89] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dce.cc:89] }
tensorflow/compiler/xla/service/hlo_dce.cc:89] 
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass zero_sized_hlo_elimination
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass while-loop-invariant-code-motion
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:256] HLO module before WhileLoopConstantSinking:
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:257] HloModule tfcompile0
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:257] 
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:257] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:257]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:257]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:257]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:257]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:257] }
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:257] 
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:291] HLO module unchanged after WhileLoopConstantSinking
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass tuple-simplifier
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass while-loop-invariant-code-motion
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:85] HLO module before WhileLoopConstantSinking:
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:86] HloModule tfcompile0
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:86] 
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:86] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:86]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:86]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:86]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:86]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:86] }
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:86] 
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:130] HLO module unchanged after WhileLoopConstantSinking
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass simplify-while-loops
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass dce
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_dce.cc:40] Before dce:
tensorflow/compiler/xla/service/hlo_dce.cc:41] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dce.cc:41] 
tensorflow/compiler/xla/service/hlo_dce.cc:41] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dce.cc:41] }
tensorflow/compiler/xla/service/hlo_dce.cc:41] 
tensorflow/compiler/xla/service/hlo_dce.cc:88] After dce:
tensorflow/compiler/xla/service/hlo_dce.cc:89] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dce.cc:89] 
tensorflow/compiler/xla/service/hlo_dce.cc:89] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dce.cc:89] }
tensorflow/compiler/xla/service/hlo_dce.cc:89] 
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass reshape-mover
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/reshape_mover.cc:407] Pre ReshapeMover HLO:
tensorflow/compiler/xla/service/reshape_mover.cc:408] HloModule tfcompile0
tensorflow/compiler/xla/service/reshape_mover.cc:408] 
tensorflow/compiler/xla/service/reshape_mover.cc:408] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/reshape_mover.cc:408]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/reshape_mover.cc:408]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/reshape_mover.cc:408]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/reshape_mover.cc:408]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/reshape_mover.cc:408] }
tensorflow/compiler/xla/service/reshape_mover.cc:408] 
tensorflow/compiler/xla/service/reshape_mover.cc:420] Post ReshapeMover HLO:
tensorflow/compiler/xla/service/reshape_mover.cc:421] HloModule tfcompile0
tensorflow/compiler/xla/service/reshape_mover.cc:421] 
tensorflow/compiler/xla/service/reshape_mover.cc:421] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/reshape_mover.cc:421]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/reshape_mover.cc:421]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/reshape_mover.cc:421]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/reshape_mover.cc:421]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/reshape_mover.cc:421] }
tensorflow/compiler/xla/service/reshape_mover.cc:421] 
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass constant_folding
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_constant_folding.cc:45] HloConstantFolding::Run(), before:
tensorflow/compiler/xla/service/hlo_constant_folding.cc:45] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_constant_folding.cc:45] 
tensorflow/compiler/xla/service/hlo_constant_folding.cc:45] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_constant_folding.cc:45]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_constant_folding.cc:45]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_constant_folding.cc:45]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_constant_folding.cc:45]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_constant_folding.cc:45] }
tensorflow/compiler/xla/service/hlo_constant_folding.cc:45] 
tensorflow/compiler/xla/service/hlo_constant_folding.cc:93] HloConstantFolding::Run(), after:
tensorflow/compiler/xla/service/hlo_constant_folding.cc:93] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_constant_folding.cc:93] 
tensorflow/compiler/xla/service/hlo_constant_folding.cc:93] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_constant_folding.cc:93]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_constant_folding.cc:93]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_constant_folding.cc:93]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_constant_folding.cc:93]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_constant_folding.cc:93] }
tensorflow/compiler/xla/service/hlo_constant_folding.cc:93] 
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass simplify-conditional
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:64] Running HLO pass pipeline simplification
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass batchnorm_expander
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/batchnorm_expander.cc:608] BatchNormExpander::Run(), before:
tensorflow/compiler/xla/service/batchnorm_expander.cc:608] HloModule tfcompile0
tensorflow/compiler/xla/service/batchnorm_expander.cc:608] 
tensorflow/compiler/xla/service/batchnorm_expander.cc:608] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/batchnorm_expander.cc:608]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/batchnorm_expander.cc:608]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/batchnorm_expander.cc:608]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/batchnorm_expander.cc:608]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/batchnorm_expander.cc:608] }
tensorflow/compiler/xla/service/batchnorm_expander.cc:608] 
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/batchnorm_expander.cc:617] BatchNormExpander::Run(), after:
tensorflow/compiler/xla/service/batchnorm_expander.cc:617] HloModule tfcompile0
tensorflow/compiler/xla/service/batchnorm_expander.cc:617] 
tensorflow/compiler/xla/service/batchnorm_expander.cc:617] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/batchnorm_expander.cc:617]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/batchnorm_expander.cc:617]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/batchnorm_expander.cc:617]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/batchnorm_expander.cc:617]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/batchnorm_expander.cc:617] }
tensorflow/compiler/xla/service/batchnorm_expander.cc:617] 
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass algsimp
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521] AlgebraicSimplifier::Run(), before:
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521] HloModule tfcompile0
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521] 
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521] }
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521] 
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531] AlgebraicSimplifier::Run(), after:
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531] HloModule tfcompile0
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531] 
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531] }
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531] 
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass dce
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_dce.cc:40] Before dce:
tensorflow/compiler/xla/service/hlo_dce.cc:41] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dce.cc:41] 
tensorflow/compiler/xla/service/hlo_dce.cc:41] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dce.cc:41] }
tensorflow/compiler/xla/service/hlo_dce.cc:41] 
tensorflow/compiler/xla/service/hlo_dce.cc:88] After dce:
tensorflow/compiler/xla/service/hlo_dce.cc:89] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dce.cc:89] 
tensorflow/compiler/xla/service/hlo_dce.cc:89] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dce.cc:89] }
tensorflow/compiler/xla/service/hlo_dce.cc:89] 
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass zero_sized_hlo_elimination
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass while-loop-invariant-code-motion
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:256] HLO module before WhileLoopConstantSinking:
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:257] HloModule tfcompile0
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:257] 
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:257] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:257]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:257]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:257]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:257]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:257] }
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:257] 
tensorflow/compiler/xla/service/while_loop_invariant_code_motion.cc:291] HLO module unchanged after WhileLoopConstantSinking
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass tuple-simplifier
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass while-loop-invariant-code-motion
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:85] HLO module before WhileLoopConstantSinking:
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:86] HloModule tfcompile0
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:86] 
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:86] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:86]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:86]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:86]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:86]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:86] }
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:86] 
tensorflow/compiler/xla/service/while_loop_constant_sinking.cc:130] HLO module unchanged after WhileLoopConstantSinking
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass simplify-while-loops
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass dce
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_dce.cc:40] Before dce:
tensorflow/compiler/xla/service/hlo_dce.cc:41] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dce.cc:41] 
tensorflow/compiler/xla/service/hlo_dce.cc:41] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dce.cc:41] }
tensorflow/compiler/xla/service/hlo_dce.cc:41] 
tensorflow/compiler/xla/service/hlo_dce.cc:88] After dce:
tensorflow/compiler/xla/service/hlo_dce.cc:89] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dce.cc:89] 
tensorflow/compiler/xla/service/hlo_dce.cc:89] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dce.cc:89] }
tensorflow/compiler/xla/service/hlo_dce.cc:89] 
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass reshape-mover
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/reshape_mover.cc:407] Pre ReshapeMover HLO:
tensorflow/compiler/xla/service/reshape_mover.cc:408] HloModule tfcompile0
tensorflow/compiler/xla/service/reshape_mover.cc:408] 
tensorflow/compiler/xla/service/reshape_mover.cc:408] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/reshape_mover.cc:408]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/reshape_mover.cc:408]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/reshape_mover.cc:408]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/reshape_mover.cc:408]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/reshape_mover.cc:408] }
tensorflow/compiler/xla/service/reshape_mover.cc:408] 
tensorflow/compiler/xla/service/reshape_mover.cc:420] Post ReshapeMover HLO:
tensorflow/compiler/xla/service/reshape_mover.cc:421] HloModule tfcompile0
tensorflow/compiler/xla/service/reshape_mover.cc:421] 
tensorflow/compiler/xla/service/reshape_mover.cc:421] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/reshape_mover.cc:421]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/reshape_mover.cc:421]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/reshape_mover.cc:421]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/reshape_mover.cc:421]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/reshape_mover.cc:421] }
tensorflow/compiler/xla/service/reshape_mover.cc:421] 
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass constant_folding
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_constant_folding.cc:45] HloConstantFolding::Run(), before:
tensorflow/compiler/xla/service/hlo_constant_folding.cc:45] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_constant_folding.cc:45] 
tensorflow/compiler/xla/service/hlo_constant_folding.cc:45] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_constant_folding.cc:45]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_constant_folding.cc:45]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_constant_folding.cc:45]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_constant_folding.cc:45]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_constant_folding.cc:45] }
tensorflow/compiler/xla/service/hlo_constant_folding.cc:45] 
tensorflow/compiler/xla/service/hlo_constant_folding.cc:93] HloConstantFolding::Run(), after:
tensorflow/compiler/xla/service/hlo_constant_folding.cc:93] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_constant_folding.cc:93] 
tensorflow/compiler/xla/service/hlo_constant_folding.cc:93] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_constant_folding.cc:93]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_constant_folding.cc:93]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_constant_folding.cc:93]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_constant_folding.cc:93]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_constant_folding.cc:93] }
tensorflow/compiler/xla/service/hlo_constant_folding.cc:93] 
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass simplify-conditional
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass indexed-array-analysis-printer-pass
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass transpose-folding
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass cse
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass fusion
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/instruction_fusion.cc:299] Before instruction fusion:
tensorflow/compiler/xla/service/instruction_fusion.cc:300] HloModule tfcompile0
tensorflow/compiler/xla/service/instruction_fusion.cc:300] 
tensorflow/compiler/xla/service/instruction_fusion.cc:300] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/instruction_fusion.cc:300]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/instruction_fusion.cc:300]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/instruction_fusion.cc:300]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/instruction_fusion.cc:300]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/instruction_fusion.cc:300] }
tensorflow/compiler/xla/service/instruction_fusion.cc:300] 
tensorflow/compiler/xla/service/cpu/cpu_instruction_fusion.cc:67] Considering for fusion: operand 0 of %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/cpu/cpu_instruction_fusion.cc:81] Producer is not fusible.
tensorflow/compiler/xla/service/cpu/cpu_instruction_fusion.cc:67] Considering for fusion: operand 0 of %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/cpu/cpu_instruction_fusion.cc:81] Producer is not fusible.
tensorflow/compiler/xla/service/instruction_fusion.cc:459] After instruction fusion:
tensorflow/compiler/xla/service/instruction_fusion.cc:460] HloModule tfcompile0
tensorflow/compiler/xla/service/instruction_fusion.cc:460] 
tensorflow/compiler/xla/service/instruction_fusion.cc:460] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/instruction_fusion.cc:460]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/instruction_fusion.cc:460]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/instruction_fusion.cc:460]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/instruction_fusion.cc:460]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/instruction_fusion.cc:460] }
tensorflow/compiler/xla/service/instruction_fusion.cc:460] 
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass scatter_expander
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass layout-assignment
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/layout_assignment.cc:1747] Running layout assignment on module tfcompile0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/layout_assignment.cc:1569] LayoutAssignment::RunOnComputation(tfcompile0)
tensorflow/compiler/xla/service/layout_assignment.cc:1579]   New ComputationLayout = (f32[2,3]{1,0}, f32[3,2]{1,0}) => (f32[2,2]{1,0})
tensorflow/compiler/xla/service/layout_assignment.cc:1151] Propagating OperandLayoutConstraint dot.0.4, operand 1: f32[3,2]{1,0} to its neighbors.
tensorflow/compiler/xla/service/layout_assignment.cc:1151] Propagating OperandLayoutConstraint dot.0.4, operand 0: f32[2,3]{1,0} to its neighbors.
tensorflow/compiler/xla/service/layout_assignment.cc:1151] Propagating ResultLayoutConstraint: (f32[2,2]{1,0}) to its neighbors.
tensorflow/compiler/xla/service/layout_assignment.cc:1151] Propagating BufferLayoutConstraint dot.0.4[](#2): {1,0} to its neighbors.
tensorflow/compiler/xla/service/layout_assignment.cc:1151] Propagating BufferLayoutConstraint arg1.0.1[](#1): {1,0} to its neighbors.
tensorflow/compiler/xla/service/layout_assignment.cc:1151] Propagating BufferLayoutConstraint arg0.0.0[](#0): {1,0} to its neighbors.
tensorflow/compiler/xla/service/layout_assignment.cc:1443] Assigning layouts to computation: tfcompile0
tensorflow/compiler/xla/service/layout_assignment.cc:1444] %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/layout_assignment.cc:1444]   %arg0.0.0 = f32[2,3] parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/layout_assignment.cc:1444]   %arg1.0.1 = f32[3,2] parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/layout_assignment.cc:1444]   %dot.0.4 = f32[2,2] dot(f32[2,3] %arg0.0.0, f32[3,2] %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/layout_assignment.cc:1444]   ROOT %tuple.0.6 = (f32[2,2]) tuple(f32[2,2] %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/layout_assignment.cc:1444] }
tensorflow/compiler/xla/service/layout_assignment.cc:1445] LayoutConstraints for computation tfcompile0:
tensorflow/compiler/xla/service/layout_assignment.cc:1445]   %arg0.0.0 = parameter()
tensorflow/compiler/xla/service/layout_assignment.cc:1445]     arg0.0.0[](#0) : {1,0}
tensorflow/compiler/xla/service/layout_assignment.cc:1445]   %arg1.0.1 = parameter()
tensorflow/compiler/xla/service/layout_assignment.cc:1445]     arg1.0.1[](#1) : {1,0}
tensorflow/compiler/xla/service/layout_assignment.cc:1445]   %dot.0.4 = dot(%arg0.0.0, %arg1.0.1)
tensorflow/compiler/xla/service/layout_assignment.cc:1445]     operand (0): f32[2,3]{1,0}
tensorflow/compiler/xla/service/layout_assignment.cc:1445]     operand (1): f32[3,2]{1,0}
tensorflow/compiler/xla/service/layout_assignment.cc:1445]     dot.0.4[](#2) : {1,0}
tensorflow/compiler/xla/service/layout_assignment.cc:1445]   %tuple.0.6 = tuple(%dot.0.4)
tensorflow/compiler/xla/service/layout_assignment.cc:1445]   => (f32[2,2]{1,0})
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/layout_assignment.cc:1569] LayoutAssignment::RunOnComputation(tfcompile0)
tensorflow/compiler/xla/service/layout_assignment.cc:1584]   Existing ComputationLayout = (f32[2,3]{1,0}, f32[3,2]{1,0}) => (f32[2,2]{1,0})
tensorflow/compiler/xla/service/layout_assignment.cc:1151] Propagating OperandLayoutConstraint dot.0.4, operand 1: f32[3,2]{1,0} to its neighbors.
tensorflow/compiler/xla/service/layout_assignment.cc:1151] Propagating OperandLayoutConstraint dot.0.4, operand 0: f32[2,3]{1,0} to its neighbors.
tensorflow/compiler/xla/service/layout_assignment.cc:1151] Propagating ResultLayoutConstraint: (f32[2,2]{1,0}) to its neighbors.
tensorflow/compiler/xla/service/layout_assignment.cc:1151] Propagating BufferLayoutConstraint dot.0.4[](#2): {1,0} to its neighbors.
tensorflow/compiler/xla/service/layout_assignment.cc:1151] Propagating BufferLayoutConstraint arg1.0.1[](#1): {1,0} to its neighbors.
tensorflow/compiler/xla/service/layout_assignment.cc:1151] Propagating BufferLayoutConstraint arg0.0.0[](#0): {1,0} to its neighbors.
tensorflow/compiler/xla/service/layout_assignment.cc:1443] Assigning layouts to computation: tfcompile0
tensorflow/compiler/xla/service/layout_assignment.cc:1444] %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/layout_assignment.cc:1444]   %arg0.0.0 = f32[2,3] parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/layout_assignment.cc:1444]   %arg1.0.1 = f32[3,2] parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/layout_assignment.cc:1444]   %dot.0.4 = f32[2,2] dot(f32[2,3] %arg0.0.0, f32[3,2] %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/layout_assignment.cc:1444]   ROOT %tuple.0.6 = (f32[2,2]) tuple(f32[2,2] %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/layout_assignment.cc:1444] }
tensorflow/compiler/xla/service/layout_assignment.cc:1445] LayoutConstraints for computation tfcompile0:
tensorflow/compiler/xla/service/layout_assignment.cc:1445]   %arg0.0.0 = parameter()
tensorflow/compiler/xla/service/layout_assignment.cc:1445]     arg0.0.0[](#0) : {1,0}
tensorflow/compiler/xla/service/layout_assignment.cc:1445]   %arg1.0.1 = parameter()
tensorflow/compiler/xla/service/layout_assignment.cc:1445]     arg1.0.1[](#1) : {1,0}
tensorflow/compiler/xla/service/layout_assignment.cc:1445]   %dot.0.4 = dot(%arg0.0.0, %arg1.0.1)
tensorflow/compiler/xla/service/layout_assignment.cc:1445]     operand (0): f32[2,3]{1,0}
tensorflow/compiler/xla/service/layout_assignment.cc:1445]     operand (1): f32[3,2]{1,0}
tensorflow/compiler/xla/service/layout_assignment.cc:1445]     dot.0.4[](#2) : {1,0}
tensorflow/compiler/xla/service/layout_assignment.cc:1445]   %tuple.0.6 = tuple(%dot.0.4)
tensorflow/compiler/xla/service/layout_assignment.cc:1445]   => (f32[2,2]{1,0})
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
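
A short note on the layout-assignment output above. Every array ends up with the layout {1,0}. In XLA this list is the minor-to-major order of the dimensions, so {1,0} means dimension 1 varies fastest in memory, which is the usual row-major layout for a 2D array. Here is a minimal Python sketch of how a minor-to-major list maps to strides (counted in elements). The helper names are hypothetical and the code only illustrates the idea; it is not taken from layout_assignment.cc.

```python
def strides_from_minor_to_major(shape, minor_to_major):
    """Compute per-dimension strides (in elements) for a given layout."""
    strides = [0] * len(shape)
    stride = 1
    # Walk from the most minor (fastest varying) dimension to the most major.
    for dim in minor_to_major:
        strides[dim] = stride
        stride *= shape[dim]
    return strides


def linear_index(index, strides):
    """Offset of a multi-dimensional index in the flat buffer."""
    return sum(i * s for i, s in zip(index, strides))


# f32[2,3]{1,0} from the log: row-major.
row_major = strides_from_minor_to_major([2, 3], [1, 0])
print(row_major)                        # [3, 1]
print(linear_index((1, 2), row_major))  # 5

# The same shape with layout {0,1} would be column-major.
col_major = strides_from_minor_to_major([2, 3], [0, 1])
print(col_major)                        # [1, 2]
print(linear_index((1, 2), col_major))  # 5
```
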
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:64] Running HLO pass pipeline HLO passes after layout assignment
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass after layout assignment
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:64] Running HLO pass pipeline after layout assignment
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass simplification after layout assignement
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:64] Running HLO pass pipeline simplification after layout assignement
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass algsimp
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521] AlgebraicSimplifier::Run(), before:
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521] HloModule tfcompile0
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521] 
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521] }
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2521] 
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531] AlgebraicSimplifier::Run(), after:
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531] HloModule tfcompile0
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531] 
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531] }
tensorflow/compiler/xla/service/algebraic_simplifier.cc:2531] 
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass dce
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_dce.cc:40] Before dce:
tensorflow/compiler/xla/service/hlo_dce.cc:41] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dce.cc:41] 
tensorflow/compiler/xla/service/hlo_dce.cc:41] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dce.cc:41] }
tensorflow/compiler/xla/service/hlo_dce.cc:41] 
tensorflow/compiler/xla/service/hlo_dce.cc:88] After dce:
tensorflow/compiler/xla/service/hlo_dce.cc:89] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dce.cc:89] 
tensorflow/compiler/xla/service/hlo_dce.cc:89] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dce.cc:89] }
tensorflow/compiler/xla/service/hlo_dce.cc:89] 
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
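
A short note on the dce pass above. The "Before dce" and "After dce" dumps are identical because every instruction in this tiny module is reachable from the ROOT tuple, so there is nothing to remove. The following Python sketch shows the general idea of dead code elimination on a dependency graph. It is a simplified illustration under my own assumptions (a dict from instruction name to operand names, and a hypothetical extra instruction `unused.0` for the second example); the real pass in hlo_dce.cc also has to deal with things like side effects and nested computations.

```python
def dead_code_elimination(instructions, root):
    """Keep only the instructions reachable from the root.

    instructions: dict mapping an instruction name to its operand names.
    root: name of the ROOT instruction.
    """
    live = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in live:
            continue
        live.add(name)
        stack.extend(instructions[name])
    # Preserve the original order of the surviving instructions.
    return {n: ops for n, ops in instructions.items() if n in live}


# The module from the log: everything feeds the ROOT tuple.
module = {
    "arg0.0.0": [],
    "arg1.0.1": [],
    "dot.0.4": ["arg0.0.0", "arg1.0.1"],
    "tuple.0.6": ["dot.0.4"],
}
print(dead_code_elimination(module, "tuple.0.6") == module)  # True

# With a hypothetical unused instruction, the pass would drop it.
module_with_dead = dict(module)
module_with_dead["unused.0"] = ["arg0.0.0"]
print(list(dead_code_elimination(module_with_dead, "tuple.0.6")))
# ['arg0.0.0', 'arg1.0.1', 'dot.0.4', 'tuple.0.6']
```
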
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass cse
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:78]     Invariant checker verifier
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/shape_inference.cc:714] inferred dot shape: f32[2,2]
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:80]     Invariant checker done verifier
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass element_type_converter
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_element_type_converter.cc:223] HloElementTypeConverter::Run(), after:
tensorflow/compiler/xla/service/hlo_element_type_converter.cc:223] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_element_type_converter.cc:223] 
tensorflow/compiler/xla/service/hlo_element_type_converter.cc:223] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_element_type_converter.cc:223]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_element_type_converter.cc:223]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_element_type_converter.cc:223]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_element_type_converter.cc:223]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_element_type_converter.cc:223] }
tensorflow/compiler/xla/service/hlo_element_type_converter.cc:223] 
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass dce
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_dce.cc:40] Before dce:
tensorflow/compiler/xla/service/hlo_dce.cc:41] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dce.cc:41] 
tensorflow/compiler/xla/service/hlo_dce.cc:41] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dce.cc:41] }
tensorflow/compiler/xla/service/hlo_dce.cc:41] 
tensorflow/compiler/xla/service/hlo_dce.cc:88] After dce:
tensorflow/compiler/xla/service/hlo_dce.cc:89] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dce.cc:89] 
tensorflow/compiler/xla/service/hlo_dce.cc:89] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dce.cc:89] }
tensorflow/compiler/xla/service/hlo_dce.cc:89] 
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass flatten-call-graph
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/call_graph.cc:243] Building call graph for:
tensorflow/compiler/xla/service/call_graph.cc:244] HloModule tfcompile0
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:244] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/call_graph.cc:244]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/call_graph.cc:244] }
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:273] Call graph for module tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273] Computation tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273]   calls:
tensorflow/compiler/xla/service/call_graph.cc:273]   called by:
tensorflow/compiler/xla/service/call_graph.cc:273]   callsites:
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass copy-insertion
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/call_graph.cc:243] Building call graph for:
tensorflow/compiler/xla/service/call_graph.cc:244] HloModule tfcompile0
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:244] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/call_graph.cc:244]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/call_graph.cc:244] }
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:273] Call graph for module tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273] Computation tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273]   calls:
tensorflow/compiler/xla/service/call_graph.cc:273]   called by:
tensorflow/compiler/xla/service/call_graph.cc:273]   callsites:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:455] HloAliasAnalysis::Run on module tfcompile0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456] 
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456] }
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456] 
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:885] HloDataflowAnalysis::Run on module tfcompile0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886] 
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886] }
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886] 
tensorflow/compiler/xla/service/call_graph.cc:243] Building call graph for:
tensorflow/compiler/xla/service/call_graph.cc:244] HloModule tfcompile0
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:244] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/call_graph.cc:244]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/call_graph.cc:244] }
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:273] Call graph for module tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273] Computation tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273]   calls:
tensorflow/compiler/xla/service/call_graph.cc:273]   called by:
tensorflow/compiler/xla/service/call_graph.cc:273]   callsites:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934] HloDataflowAnalysis, module tfcompile0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]   Instruction value sets:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     arg0.0.0:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       0 arg0.0.0 (def)
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     arg1.0.1:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       1 arg1.0.1 (def)
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     dot.0.4:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       2 dot.0.4 (def)
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     tuple.0.6:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       tuple index {}:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]         3 tuple.0.6{} (def)
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       tuple index {0}:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]         2 dot.0.4
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]   HloValues:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     0 arg0.0.0, positions:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       arg0.0.0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]      uses:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       dot.0.4, operand 0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     1 arg1.0.1, positions:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       arg1.0.1
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]      uses:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       dot.0.4, operand 1
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     2 dot.0.4, positions:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       dot.0.4
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       tuple.0.6 {0}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]      uses:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       tuple.0.6, operand 0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     3 tuple.0.6{}, positions:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       tuple.0.6 {}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]      uses:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:276] Use of value 0 arg0.0.0: dot.0.4, operand 0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:276] Use of value 1 arg1.0.1: dot.0.4, operand 1
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:276] Use of value 2 dot.0.4: tuple.0.6, operand 0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488] HloAliasAnalysis, module tfcompile0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]   Buffers at each position:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     arg0.0.0:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       HloBuffer 0, values: 0 arg0.0.0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     arg1.0.1:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       HloBuffer 1, values: 1 arg1.0.1
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     dot.0.4:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       HloBuffer 2, values: 2 dot.0.4
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     tuple.0.6:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       tuple index {}:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]         HloBuffer 3, values: 3 tuple.0.6{}
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       tuple index {0}:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]         HloBuffer 2, values: 2 dot.0.4
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]   Buffers:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     HloBuffer 0, values: 0 arg0.0.0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       positions:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]         arg0.0.0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     HloBuffer 1, values: 1 arg1.0.1
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       positions:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]         arg1.0.1
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     HloBuffer 2, values: 2 dot.0.4
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       positions:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]         dot.0.4
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]         tuple.0.6 {0}
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     HloBuffer 3, values: 3 tuple.0.6{}
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       positions:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]         tuple.0.6 {}
tensorflow/compiler/xla/service/hlo_dce.cc:40] Before dce:
tensorflow/compiler/xla/service/hlo_dce.cc:41] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dce.cc:41] 
tensorflow/compiler/xla/service/hlo_dce.cc:41] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dce.cc:41] }
tensorflow/compiler/xla/service/hlo_dce.cc:41] 
tensorflow/compiler/xla/service/hlo_dce.cc:88] After dce:
tensorflow/compiler/xla/service/hlo_dce.cc:89] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dce.cc:89] 
tensorflow/compiler/xla/service/hlo_dce.cc:89] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dce.cc:89] }
tensorflow/compiler/xla/service/hlo_dce.cc:89] 
tensorflow/compiler/xla/service/call_graph.cc:243] Building call graph for:
tensorflow/compiler/xla/service/call_graph.cc:244] HloModule tfcompile0
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:244] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/call_graph.cc:244]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/call_graph.cc:244] }
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:273] Call graph for module tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273] Computation tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273]   calls:
tensorflow/compiler/xla/service/call_graph.cc:273]   called by:
tensorflow/compiler/xla/service/call_graph.cc:273]   callsites:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:455] HloAliasAnalysis::Run on module tfcompile0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456] 
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456] }
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456] 
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:885] HloDataflowAnalysis::Run on module tfcompile0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886] 
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886] }
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886] 
tensorflow/compiler/xla/service/call_graph.cc:243] Building call graph for:
tensorflow/compiler/xla/service/call_graph.cc:244] HloModule tfcompile0
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:244] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/call_graph.cc:244]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/call_graph.cc:244] }
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:273] Call graph for module tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273] Computation tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273]   calls:
tensorflow/compiler/xla/service/call_graph.cc:273]   called by:
tensorflow/compiler/xla/service/call_graph.cc:273]   callsites:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934] HloDataflowAnalysis, module tfcompile0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]   Instruction value sets:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     arg0.0.0:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       0 arg0.0.0 (def)
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     arg1.0.1:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       1 arg1.0.1 (def)
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     dot.0.4:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       2 dot.0.4 (def)
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     tuple.0.6:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       tuple index {}:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]         3 tuple.0.6{} (def)
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       tuple index {0}:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]         2 dot.0.4
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]   HloValues:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     0 arg0.0.0, positions:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       arg0.0.0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]      uses:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       dot.0.4, operand 0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     1 arg1.0.1, positions:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       arg1.0.1
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]      uses:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       dot.0.4, operand 1
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     2 dot.0.4, positions:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       dot.0.4
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       tuple.0.6 {0}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]      uses:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       tuple.0.6, operand 0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     3 tuple.0.6{}, positions:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       tuple.0.6 {}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]      uses:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:276] Use of value 0 arg0.0.0: dot.0.4, operand 0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:276] Use of value 1 arg1.0.1: dot.0.4, operand 1
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:276] Use of value 2 dot.0.4: tuple.0.6, operand 0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488] HloAliasAnalysis, module tfcompile0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]   Buffers at each position:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     arg0.0.0:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       HloBuffer 0, values: 0 arg0.0.0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     arg1.0.1:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       HloBuffer 1, values: 1 arg1.0.1
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     dot.0.4:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       HloBuffer 2, values: 2 dot.0.4
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     tuple.0.6:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       tuple index {}:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]         HloBuffer 3, values: 3 tuple.0.6{}
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       tuple index {0}:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]         HloBuffer 2, values: 2 dot.0.4
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]   Buffers:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     HloBuffer 0, values: 0 arg0.0.0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       positions:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]         arg0.0.0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     HloBuffer 1, values: 1 arg1.0.1
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       positions:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]         arg1.0.1
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     HloBuffer 2, values: 2 dot.0.4
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       positions:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]         dot.0.4
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]         tuple.0.6 {0}
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     HloBuffer 3, values: 3 tuple.0.6{}
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       positions:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]         tuple.0.6 {}
tensorflow/compiler/xla/service/call_graph.cc:243] Building call graph for:
tensorflow/compiler/xla/service/call_graph.cc:244] HloModule tfcompile0
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:244] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/call_graph.cc:244]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/call_graph.cc:244] }
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:273] Call graph for module tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273] Computation tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273]   calls:
tensorflow/compiler/xla/service/call_graph.cc:273]   called by:
tensorflow/compiler/xla/service/call_graph.cc:273]   callsites:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:455] HloAliasAnalysis::Run on module tfcompile0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456] 
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456] }
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:456] 
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:885] HloDataflowAnalysis::Run on module tfcompile0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886] 
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886] }
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886] 
tensorflow/compiler/xla/service/call_graph.cc:243] Building call graph for:
tensorflow/compiler/xla/service/call_graph.cc:244] HloModule tfcompile0
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:244] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/call_graph.cc:244]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/call_graph.cc:244] }
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:273] Call graph for module tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273] Computation tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273]   calls:
tensorflow/compiler/xla/service/call_graph.cc:273]   called by:
tensorflow/compiler/xla/service/call_graph.cc:273]   callsites:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934] HloDataflowAnalysis, module tfcompile0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]   Instruction value sets:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     arg0.0.0:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       0 arg0.0.0 (def)
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     arg1.0.1:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       1 arg1.0.1 (def)
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     dot.0.4:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       2 dot.0.4 (def)
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     tuple.0.6:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       tuple index {}:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]         3 tuple.0.6{} (def)
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       tuple index {0}:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]         2 dot.0.4
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]   HloValues:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     0 arg0.0.0, positions:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       arg0.0.0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]      uses:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       dot.0.4, operand 0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     1 arg1.0.1, positions:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       arg1.0.1
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]      uses:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       dot.0.4, operand 1
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     2 dot.0.4, positions:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       dot.0.4
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       tuple.0.6 {0}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]      uses:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       tuple.0.6, operand 0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     3 tuple.0.6{}, positions:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       tuple.0.6 {}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]      uses:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:276] Use of value 0 arg0.0.0: dot.0.4, operand 0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:276] Use of value 1 arg1.0.1: dot.0.4, operand 1
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:276] Use of value 2 dot.0.4: tuple.0.6, operand 0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488] HloAliasAnalysis, module tfcompile0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]   Buffers at each position:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     arg0.0.0:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       HloBuffer 0, values: 0 arg0.0.0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     arg1.0.1:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       HloBuffer 1, values: 1 arg1.0.1
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     dot.0.4:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       HloBuffer 2, values: 2 dot.0.4
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     tuple.0.6:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       tuple index {}:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]         HloBuffer 3, values: 3 tuple.0.6{}
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       tuple index {0}:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]         HloBuffer 2, values: 2 dot.0.4
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]   Buffers:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     HloBuffer 0, values: 0 arg0.0.0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       positions:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]         arg0.0.0
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     HloBuffer 1, values: 1 arg1.0.1
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       positions:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]         arg1.0.1
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     HloBuffer 2, values: 2 dot.0.4
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       positions:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]         dot.0.4
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]         tuple.0.6 {0}
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]     HloBuffer 3, values: 3 tuple.0.6{}
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]       positions:
tensorflow/compiler/xla/service/hlo_alias_analysis.cc:488]         tuple.0.6 {}
tensorflow/compiler/xla/service/hlo_dce.cc:40] Before dce:
tensorflow/compiler/xla/service/hlo_dce.cc:41] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dce.cc:41] 
tensorflow/compiler/xla/service/hlo_dce.cc:41] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dce.cc:41] }
tensorflow/compiler/xla/service/hlo_dce.cc:41] 
tensorflow/compiler/xla/service/hlo_dce.cc:88] After dce:
tensorflow/compiler/xla/service/hlo_dce.cc:89] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dce.cc:89] 
tensorflow/compiler/xla/service/hlo_dce.cc:89] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dce.cc:89] }
tensorflow/compiler/xla/service/hlo_dce.cc:89] 
tensorflow/compiler/xla/service/copy_insertion.cc:1178] Num copies before copy-insertion: 0
tensorflow/compiler/xla/service/copy_insertion.cc:1179] Num copies after copy-insertion: 0
tensorflow/compiler/xla/service/call_graph.cc:243] Building call graph for:
tensorflow/compiler/xla/service/call_graph.cc:244] HloModule tfcompile0
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:244] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/call_graph.cc:244]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/call_graph.cc:244] }
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:273] Call graph for module tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273] Computation tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273]   calls:
tensorflow/compiler/xla/service/call_graph.cc:273]   called by:
tensorflow/compiler/xla/service/call_graph.cc:273]   callsites:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:885] HloDataflowAnalysis::Run on module tfcompile0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886] 
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886] }
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:886] 
tensorflow/compiler/xla/service/call_graph.cc:243] Building call graph for:
tensorflow/compiler/xla/service/call_graph.cc:244] HloModule tfcompile0
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:244] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/call_graph.cc:244]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/call_graph.cc:244] }
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:273] Call graph for module tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273] Computation tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273]   calls:
tensorflow/compiler/xla/service/call_graph.cc:273]   called by:
tensorflow/compiler/xla/service/call_graph.cc:273]   callsites:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934] HloDataflowAnalysis, module tfcompile0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]   Instruction value sets:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     arg0.0.0:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       0 arg0.0.0 (def)
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     arg1.0.1:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       1 arg1.0.1 (def)
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     dot.0.4:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       2 dot.0.4 (def)
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     tuple.0.6:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       tuple index {}:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]         3 tuple.0.6{} (def)
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       tuple index {0}:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]         2 dot.0.4
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]   HloValues:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     0 arg0.0.0, positions:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       arg0.0.0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]      uses:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       dot.0.4, operand 0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     1 arg1.0.1, positions:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       arg1.0.1
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]      uses:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       dot.0.4, operand 1
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     2 dot.0.4, positions:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       dot.0.4
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       tuple.0.6 {0}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]      uses:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       tuple.0.6, operand 0
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]     3 tuple.0.6{}, positions:
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]       tuple.0.6 {}
tensorflow/compiler/xla/service/hlo_dataflow_analysis.cc:934]      uses:
tensorflow/compiler/xla/service/hlo_dce.cc:40] Before dce:
tensorflow/compiler/xla/service/hlo_dce.cc:41] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dce.cc:41] 
tensorflow/compiler/xla/service/hlo_dce.cc:41] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dce.cc:41] }
tensorflow/compiler/xla/service/hlo_dce.cc:41] 
tensorflow/compiler/xla/service/hlo_dce.cc:88] After dce:
tensorflow/compiler/xla/service/hlo_dce.cc:89] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dce.cc:89] 
tensorflow/compiler/xla/service/hlo_dce.cc:89] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dce.cc:89] }
tensorflow/compiler/xla/service/hlo_dce.cc:89] 
tensorflow/compiler/xla/service/hlo_pass_pipeline.cc:113]   HLO pass dce
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
tensorflow/compiler/xla/service/hlo_dce.cc:40] Before dce:
tensorflow/compiler/xla/service/hlo_dce.cc:41] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dce.cc:41] 
tensorflow/compiler/xla/service/hlo_dce.cc:41] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dce.cc:41]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dce.cc:41] }
tensorflow/compiler/xla/service/hlo_dce.cc:41] 
tensorflow/compiler/xla/service/hlo_dce.cc:88] After dce:
tensorflow/compiler/xla/service/hlo_dce.cc:89] HloModule tfcompile0
tensorflow/compiler/xla/service/hlo_dce.cc:89] 
tensorflow/compiler/xla/service/hlo_dce.cc:89] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/hlo_dce.cc:89]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/hlo_dce.cc:89] }
tensorflow/compiler/xla/service/hlo_dce.cc:89] 
tensorflow/compiler/xla/service/hlo_graph_dumper.cc:1510] MaybeDumpHloModule called on module tfcompile0 with generate_hlo_graph regex ""
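
Note: the "Before dce" and "After dce" dumps above are identical. Every instruction in tfcompile0 is reachable from the ROOT tuple, so there is nothing dead to remove. The Python snippet below is only a rough sketch of that idea (reachability from the root) and is not XLA's actual HloDCE implementation, which has additional rules this toy version ignores. The names `operands` and `live_instructions` are made up for illustration.

```python
# Toy model of the HLO graph from the log: instruction name -> operand names.
# This is a hand-written illustration, not data produced by XLA.
operands = {
    "arg0.0.0": [],
    "arg1.0.1": [],
    "dot.0.4": ["arg0.0.0", "arg1.0.1"],
    "tuple.0.6": ["dot.0.4"],
}


def live_instructions(root, operands):
    """Return the set of instructions reachable from root via operand edges."""
    live, stack = set(), [root]
    while stack:
        name = stack.pop()
        if name not in live:
            live.add(name)
            stack.extend(operands[name])
    return live


live = live_instructions("tuple.0.6", operands)
dead = set(operands) - live
print(sorted(dead))  # nothing is dead, so the module is unchanged
```
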
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:772] After optimization:
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:773] HloModule tfcompile0
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:773] 
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:773] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:773]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:773]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:773]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:773]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:773] }
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:773] 
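
Note: the optimized module is still a single dot with lhs_contracting_dims={1} and rhs_contracting_dims={0}, which is an ordinary [2,3] x [3,2] matrix product. Here is a small sketch of how the result shape f32[2,2] follows from those attributes. The helper `dot_output_shape` is hypothetical and only covers the case without batch dimensions that appears in this log.

```python
def dot_output_shape(lhs_shape, rhs_shape, lhs_contracting, rhs_contracting):
    """Shape of a dot without batch dims: free lhs dims followed by free rhs dims."""
    for l, r in zip(lhs_contracting, rhs_contracting):
        if lhs_shape[l] != rhs_shape[r]:
            raise ValueError("contracting dimensions must have equal size")
    lhs_free = [d for i, d in enumerate(lhs_shape) if i not in lhs_contracting]
    rhs_free = [d for i, d in enumerate(rhs_shape) if i not in rhs_contracting]
    return lhs_free + rhs_free


# Shapes and contracting dims taken from %dot.0.4 in the log.
print(dot_output_shape([2, 3], [3, 2], [1], [0]))  # [2, 2]
```
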
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_scheduling.cc:384] Computation: tfcompile0
tensorflow/compiler/xla/service/hlo_scheduling.cc:278] Schedule instruction: %arg1.0.1 = parameter() Bytes freed: 0
tensorflow/compiler/xla/service/hlo_scheduling.cc:278] Schedule instruction: %arg0.0.0 = parameter() Bytes freed: 0
tensorflow/compiler/xla/service/hlo_scheduling.cc:278] Schedule instruction: %dot.0.4 = dot(%arg0.0.0, %arg1.0.1) Bytes freed: -16
tensorflow/compiler/xla/service/hlo_scheduling.cc:278] Schedule instruction: %tuple.0.6 = tuple(%dot.0.4) Bytes freed: -8
tensorflow/compiler/xla/service/hlo_scheduling.cc:509] Min-memory list sequence: 72B
tensorflow/compiler/xla/service/hlo_instruction.cc:2550] HloInstruction::AcceptWithOperandOrder(%tuple.0.6)
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_instruction.cc:2565] HloInstruction::AcceptWithOperandOrder EXIT
tensorflow/compiler/xla/service/hlo_scheduling.cc:518] Min-memory dfs sequence: 72B
tensorflow/compiler/xla/service/hlo_scheduling.cc:528] Min-memory post order sequence: 72B
tensorflow/compiler/xla/service/hlo_scheduling.cc:534] Chose min-memory list sequence: 72B
tensorflow/compiler/xla/service/hlo_scheduling.cc:569] Module schedule:
Computation tfcompile0:
  arg1.0.1
  arg0.0.0
  dot.0.4
  tuple.0.6

tensorflow/compiler/xla/service/call_graph.cc:243] Building call graph for:
tensorflow/compiler/xla/service/call_graph.cc:244] HloModule tfcompile0
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:244] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/call_graph.cc:244]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/call_graph.cc:244]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/call_graph.cc:244] }
tensorflow/compiler/xla/service/call_graph.cc:244] 
tensorflow/compiler/xla/service/call_graph.cc:273] Call graph for module tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273] Computation tfcompile0:
tensorflow/compiler/xla/service/call_graph.cc:273]   calls:
tensorflow/compiler/xla/service/call_graph.cc:273]   called by:
tensorflow/compiler/xla/service/call_graph.cc:273]   callsites:
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/buffer_assignment.cc:1633] Assigning buffers to module tfcompile0
tensorflow/compiler/xla/service/buffer_assignment.cc:1634] HloModule tfcompile0
tensorflow/compiler/xla/service/buffer_assignment.cc:1634] 
tensorflow/compiler/xla/service/buffer_assignment.cc:1634] ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
tensorflow/compiler/xla/service/buffer_assignment.cc:1634]   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/buffer_assignment.cc:1634]   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/buffer_assignment.cc:1634]   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %arg0.0.0, f32[3,2]{1,0} %arg1.0.1), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
tensorflow/compiler/xla/service/buffer_assignment.cc:1634]   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/buffer_assignment.cc:1634] }
tensorflow/compiler/xla/service/buffer_assignment.cc:1634] 
tensorflow/compiler/xla/service/buffer_assignment.cc:1066] Running whole-module heap simulation
tensorflow/compiler/xla/service/buffer_assignment.cc:522] CombineTempAllocations()
tensorflow/compiler/xla/service/buffer_assignment.cc:1716] BufferAssignment:
tensorflow/compiler/xla/service/buffer_assignment.cc:1716] allocation 0: 0x6050900, size 24, parameter 0 at ShapeIndex {}:
tensorflow/compiler/xla/service/buffer_assignment.cc:1716]   arg0.0.0[](#0 @0) [0,24]: f32[2,3]{1,0}
tensorflow/compiler/xla/service/buffer_assignment.cc:1716] allocation 1: 0x60509c0, size 24, parameter 1 at ShapeIndex {}:
tensorflow/compiler/xla/service/buffer_assignment.cc:1716]   arg1.0.1[](#1 @0) [0,24]: f32[3,2]{1,0}
tensorflow/compiler/xla/service/buffer_assignment.cc:1716] allocation 2: 0x6050a80, size 16, maybe-live-out:
tensorflow/compiler/xla/service/buffer_assignment.cc:1716]   dot.0.4[](#2 @0) [0,16]: f32[2,2]{1,0}
tensorflow/compiler/xla/service/buffer_assignment.cc:1716] allocation 3: 0x6050b40, size 8, maybe-live-out:
tensorflow/compiler/xla/service/buffer_assignment.cc:1716]   tuple.0.6[](#3 @0) [0,8]: (f32[2,2]{1,0})
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg0.0.0
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %arg1.0.1
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %dot.0.4
tensorflow/compiler/xla/service/hlo_instruction.cc:2489] Visiting HLO %tuple.0.6
tensorflow/compiler/xla/service/buffer_assignment.cc:1718] BufferAssignment stats:
tensorflow/compiler/xla/service/buffer_assignment.cc:1718]              parameter allocation:        48B
tensorflow/compiler/xla/service/buffer_assignment.cc:1718]               constant allocation:         0B
tensorflow/compiler/xla/service/buffer_assignment.cc:1718]         maybe_live_out allocation:        24B
tensorflow/compiler/xla/service/buffer_assignment.cc:1718]      preallocated temp allocation:         0B
tensorflow/compiler/xla/service/buffer_assignment.cc:1718]                  total allocation:        72B
tensorflow/compiler/xla/service/buffer_assignment.cc:1718]               total fragmentation:         0B (0.00%)
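
Note: the numbers in the BufferAssignment stats above can be reproduced by hand. An f32 element takes 4 bytes, and the tuple buffer holds a single pointer, which I assume is 8 bytes because the target in this log is x86_64. A quick sketch (the helper and constant names are mine, not XLA's):

```python
from math import prod

F32_BYTES = 4      # size of one f32 element
POINTER_BYTES = 8  # assumption: 64-bit target (x86_64-pc-linux in the log)


def array_bytes(dims):
    """Bytes needed for a dense f32 array with the given dimensions."""
    return F32_BYTES * prod(dims)


parameter = array_bytes([2, 3]) + array_bytes([3, 2])      # arg0.0.0 + arg1.0.1
maybe_live_out = array_bytes([2, 2]) + 1 * POINTER_BYTES   # dot.0.4 + tuple.0.6
total = parameter + maybe_live_out
print(parameter, maybe_live_out, total)  # 48 24 72
```

The three values match the "parameter allocation", "maybe_live_out allocation" and "total allocation" lines of the log.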
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:791] BufferAssignment:
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:791] allocation 0: 0x6050900, size 24, parameter 0 at ShapeIndex {}:
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:791]   arg0.0.0[](#0 @0) [0,24]: f32[2,3]{1,0}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:791] allocation 1: 0x60509c0, size 24, parameter 1 at ShapeIndex {}:
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:791]   arg1.0.1[](#1 @0) [0,24]: f32[3,2]{1,0}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:791] allocation 2: 0x6050a80, size 16, maybe-live-out:
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:791]   dot.0.4[](#2 @0) [0,16]: f32[2,2]{1,0}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:791] allocation 3: 0x6050b40, size 8, maybe-live-out:
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:791]   tuple.0.6[](#3 @0) [0,8]: (f32[2,2]{1,0})
tensorflow/compiler/xla/service/cpu/ir_emitter.cc:115] Emitting IR for CPU function [entry]; ordered? 1
tensorflow/compiler/xla/service/cpu/ir_emitter.cc:1246] HandleParameter: %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/llvm_ir/llvm_util.cc:128] EmitBufferIndexingGEP with type=i8** array=i8** %temps index=i64 1
tensorflow/compiler/xla/service/cpu/ir_emitter.cc:1246] HandleParameter: %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
tensorflow/compiler/xla/service/llvm_ir/llvm_util.cc:128] EmitBufferIndexingGEP with type=i8** array=i8** %temps index=i64 0
tensorflow/compiler/xla/service/llvm_ir/llvm_util.cc:128] EmitBufferIndexingGEP with type=i8** array=i8** %temps index=i64 2
tensorflow/compiler/xla/service/cpu/ir_emitter.cc:805] HandleDot: 
tensorflow/compiler/xla/service/cpu/ir_emitter.cc:806]   lhs operand:   %arg0.0.0 = bitcast i8* %3 to [2 x [3 x float]]*
tensorflow/compiler/xla/service/cpu/ir_emitter.cc:808]   rhs operand:   %arg1.0.1 = bitcast i8* %1 to [3 x [2 x float]]*
tensorflow/compiler/xla/service/cpu/ir_emitter.cc:810]   target:   %dot.0.4 = bitcast i8* %5 to [2 x [2 x float]]*
tensorflow/compiler/xla/service/llvm_ir/llvm_util.cc:128] EmitBufferIndexingGEP with type=i8** array=i8** %temps index=i64 3
tensorflow/compiler/xla/service/cpu/ir_emitter.cc:2474] FinishVisit root: %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %dot.0.4), metadata={op_name="XLA_Retvals"}
tensorflow/compiler/xla/service/cpu/ir_emitter.cc:2479]   value:   %tuple.0.6 = bitcast i8* %19 to [1 x i8*]*
tensorflow/compiler/xla/util.cc:53] CpuCompiler - Running LLVM verifier time: 1.04 ms
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] LLVM IR:
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] ; ModuleID = '__compute_module'
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] source_filename = "__compute_module"
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] target triple = "x86_64-pc-linux"
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] 
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] define void @entry(i8* %retval, i8* noalias %run_options, i8** noalias %params, i8** noalias %temps, i64* noalias %prof_counters) #0 {
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] entry:
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %accum_address = alloca float
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %dot.0.4.invar_address.reduction = alloca i64
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %dot.0.4.invar_address.rhs.1 = alloca i64
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %dot.0.4.invar_address.lhs.0 = alloca i64
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %0 = getelementptr inbounds i8*, i8** %temps, i64 1
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %1 = load i8*, i8** %0, !invariant.load !1, !dereferenceable !2, !align !3
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %arg1.0.1 = bitcast i8* %1 to [3 x [2 x float]]*
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %2 = getelementptr inbounds i8*, i8** %temps, i64 0
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %3 = load i8*, i8** %2, !invariant.load !1, !dereferenceable !2, !align !3
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %arg0.0.0 = bitcast i8* %3 to [2 x [3 x float]]*
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %4 = getelementptr inbounds i8*, i8** %temps, i64 2
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %5 = load i8*, i8** %4, !invariant.load !1, !dereferenceable !4, !align !3
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %dot.0.4 = bitcast i8* %5 to [2 x [2 x float]]*
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   store i64 0, i64* %dot.0.4.invar_address.lhs.0
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   br label %dot.0.4.loop_header.lhs.0
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] 
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] dot.0.4.loop_header.lhs.0:                        ; preds = %dot.0.4.loop_exit.rhs.1, %entry
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %dot.0.4.indvar.lhs.0 = load i64, i64* %dot.0.4.invar_address.lhs.0
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %6 = icmp uge i64 %dot.0.4.indvar.lhs.0, 2
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   br i1 %6, label %dot.0.4.loop_exit.lhs.0, label %dot.0.4.loop_body.lhs.0
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] 
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] dot.0.4.loop_body.lhs.0:                          ; preds = %dot.0.4.loop_header.lhs.0
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   store i64 0, i64* %dot.0.4.invar_address.rhs.1
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   br label %dot.0.4.loop_header.rhs.1
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] 
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] dot.0.4.loop_header.rhs.1:                        ; preds = %dot.0.4.loop_exit.reduction, %dot.0.4.loop_body.lhs.0
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %dot.0.4.indvar.rhs.1 = load i64, i64* %dot.0.4.invar_address.rhs.1
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %7 = icmp uge i64 %dot.0.4.indvar.rhs.1, 2
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   br i1 %7, label %dot.0.4.loop_exit.rhs.1, label %dot.0.4.loop_body.rhs.1
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] 
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] dot.0.4.loop_body.rhs.1:                          ; preds = %dot.0.4.loop_header.rhs.1
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   store i64 0, i64* %dot.0.4.invar_address.reduction
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   store float 0.000000e+00, float* %accum_address
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   br label %dot.0.4.loop_header.reduction
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] 
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] dot.0.4.loop_header.reduction:                    ; preds = %dot.0.4.loop_body.reduction, %dot.0.4.loop_body.rhs.1
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %dot.0.4.indvar.reduction = load i64, i64* %dot.0.4.invar_address.reduction
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %8 = icmp uge i64 %dot.0.4.indvar.reduction, 3
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   br i1 %8, label %dot.0.4.loop_exit.reduction, label %dot.0.4.loop_body.reduction
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] 
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] dot.0.4.loop_body.reduction:                      ; preds = %dot.0.4.loop_header.reduction
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %9 = getelementptr inbounds [2 x [3 x float]], [2 x [3 x float]]* %arg0.0.0, i64 0, i64 %dot.0.4.indvar.lhs.0, i64 %dot.0.4.indvar.reduction
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %10 = load float, float* %9, !invariant.load !1, !noalias !5
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %11 = getelementptr inbounds [3 x [2 x float]], [3 x [2 x float]]* %arg1.0.1, i64 0, i64 %dot.0.4.indvar.reduction, i64 %dot.0.4.indvar.rhs.1
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %12 = load float, float* %11, !invariant.load !1, !noalias !5
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %13 = load float, float* %accum_address
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %14 = fmul fast float %10, %12
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %15 = fadd fast float %13, %14
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   store float %15, float* %accum_address
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %invar.inc2 = add nuw nsw i64 %dot.0.4.indvar.reduction, 1
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   store i64 %invar.inc2, i64* %dot.0.4.invar_address.reduction
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   br label %dot.0.4.loop_header.reduction
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] 
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] dot.0.4.loop_exit.reduction:                      ; preds = %dot.0.4.loop_header.reduction
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %16 = load float, float* %accum_address
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %17 = getelementptr inbounds [2 x [2 x float]], [2 x [2 x float]]* %dot.0.4, i64 0, i64 %dot.0.4.indvar.lhs.0, i64 %dot.0.4.indvar.rhs.1
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   store float %16, float* %17, !alias.scope !5, !noalias !8
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %invar.inc1 = add nuw nsw i64 %dot.0.4.indvar.rhs.1, 1
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   store i64 %invar.inc1, i64* %dot.0.4.invar_address.rhs.1
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   br label %dot.0.4.loop_header.rhs.1
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] 
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] dot.0.4.loop_exit.rhs.1:                          ; preds = %dot.0.4.loop_header.rhs.1
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %invar.inc = add nuw nsw i64 %dot.0.4.indvar.lhs.0, 1
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   store i64 %invar.inc, i64* %dot.0.4.invar_address.lhs.0
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   br label %dot.0.4.loop_header.lhs.0
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] 
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] dot.0.4.loop_exit.lhs.0:                          ; preds = %dot.0.4.loop_header.lhs.0
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %18 = getelementptr inbounds i8*, i8** %temps, i64 3
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %19 = load i8*, i8** %18, !invariant.load !1, !dereferenceable !3, !align !3
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %tuple.0.6 = bitcast i8* %19 to [1 x i8*]*
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %20 = getelementptr inbounds [1 x i8*], [1 x i8*]* %tuple.0.6, i64 0, i64 0
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   %21 = bitcast [2 x [2 x float]]* %dot.0.4 to i8*
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   store i8* %21, i8** %20, !alias.scope !8, !noalias !5
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859]   ret void
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] }
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] 
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] attributes #0 = { "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "unsafe-fp-math"="true" }
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] 
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] !llvm.module.flags = !{!0}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] 
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] !0 = !{i32 7, !"PIC Level", i32 2}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] !1 = !{}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] !2 = !{i64 24}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] !3 = !{i64 8}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] !4 = !{i64 16}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] !5 = !{!6}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] !6 = !{!"buffer: {index:2, offset:0, size:16}", !7}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] !7 = !{!"XLA global AA domain"}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] !8 = !{!9}
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:859] !9 = !{!"buffer: {index:3, offset:0, size:8}", !7}
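
Note: reading the LLVM IR above, the three nested loops (lhs.0, rhs.1, reduction) together with the accum_address accumulator form a naive matrix multiply. Roughly the same logic in Python is shown below. It is a sketch to help read the IR, not something tfcompile generates, and the function name `naive_dot` and the sample inputs are my own.

```python
def naive_dot(lhs, rhs):
    """Mirror of the IR loop nest for f32[2,3] x f32[3,2] -> f32[2,2]."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):            # dot.0.4.indvar.lhs.0, exits when i >= 2
        for j in range(2):        # dot.0.4.indvar.rhs.1, exits when j >= 2
            accum = 0.0           # store float 0.0 into %accum_address
            for k in range(3):    # dot.0.4.indvar.reduction, exits when k >= 3
                accum += lhs[i][k] * rhs[k][j]   # fmul fast, then fadd fast
            out[i][j] = accum     # store into %dot.0.4 at [i][j]
    return out


lhs = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
rhs = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
print(naive_dot(lhs, rhs))  # [[58.0, 64.0], [139.0, 154.0]]
```
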
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:102] IR before optimizations
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] ; ModuleID = '__compute_module'
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] source_filename = "__compute_module"
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] target triple = "x86_64-pc-linux"
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] 
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] define void @entry(i8* %retval, i8* noalias %run_options, i8** noalias %params, i8** noalias %temps, i64* noalias %prof_counters) #0 {
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] entry:
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %accum_address = alloca float
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %dot.0.4.invar_address.reduction = alloca i64
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %dot.0.4.invar_address.rhs.1 = alloca i64
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %dot.0.4.invar_address.lhs.0 = alloca i64
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %0 = getelementptr inbounds i8*, i8** %temps, i64 1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %1 = load i8*, i8** %0, !invariant.load !1, !dereferenceable !2, !align !3
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %arg1.0.1 = bitcast i8* %1 to [3 x [2 x float]]*
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %2 = getelementptr inbounds i8*, i8** %temps, i64 0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %3 = load i8*, i8** %2, !invariant.load !1, !dereferenceable !2, !align !3
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %arg0.0.0 = bitcast i8* %3 to [2 x [3 x float]]*
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %4 = getelementptr inbounds i8*, i8** %temps, i64 2
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %5 = load i8*, i8** %4, !invariant.load !1, !dereferenceable !4, !align !3
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %dot.0.4 = bitcast i8* %5 to [2 x [2 x float]]*
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   store i64 0, i64* %dot.0.4.invar_address.lhs.0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   br label %dot.0.4.loop_header.lhs.0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] 
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] dot.0.4.loop_header.lhs.0:                        ; preds = %dot.0.4.loop_exit.rhs.1, %entry
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %dot.0.4.indvar.lhs.0 = load i64, i64* %dot.0.4.invar_address.lhs.0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %6 = icmp uge i64 %dot.0.4.indvar.lhs.0, 2
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   br i1 %6, label %dot.0.4.loop_exit.lhs.0, label %dot.0.4.loop_body.lhs.0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] 
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] dot.0.4.loop_body.lhs.0:                          ; preds = %dot.0.4.loop_header.lhs.0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   store i64 0, i64* %dot.0.4.invar_address.rhs.1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   br label %dot.0.4.loop_header.rhs.1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] 
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] dot.0.4.loop_header.rhs.1:                        ; preds = %dot.0.4.loop_exit.reduction, %dot.0.4.loop_body.lhs.0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %dot.0.4.indvar.rhs.1 = load i64, i64* %dot.0.4.invar_address.rhs.1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %7 = icmp uge i64 %dot.0.4.indvar.rhs.1, 2
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   br i1 %7, label %dot.0.4.loop_exit.rhs.1, label %dot.0.4.loop_body.rhs.1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] 
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] dot.0.4.loop_body.rhs.1:                          ; preds = %dot.0.4.loop_header.rhs.1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   store i64 0, i64* %dot.0.4.invar_address.reduction
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   store float 0.000000e+00, float* %accum_address
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   br label %dot.0.4.loop_header.reduction
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] 
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] dot.0.4.loop_header.reduction:                    ; preds = %dot.0.4.loop_body.reduction, %dot.0.4.loop_body.rhs.1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %dot.0.4.indvar.reduction = load i64, i64* %dot.0.4.invar_address.reduction
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %8 = icmp uge i64 %dot.0.4.indvar.reduction, 3
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   br i1 %8, label %dot.0.4.loop_exit.reduction, label %dot.0.4.loop_body.reduction
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] 
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] dot.0.4.loop_body.reduction:                      ; preds = %dot.0.4.loop_header.reduction
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %9 = getelementptr inbounds [2 x [3 x float]], [2 x [3 x float]]* %arg0.0.0, i64 0, i64 %dot.0.4.indvar.lhs.0, i64 %dot.0.4.indvar.reduction
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %10 = load float, float* %9, !invariant.load !1, !noalias !5
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %11 = getelementptr inbounds [3 x [2 x float]], [3 x [2 x float]]* %arg1.0.1, i64 0, i64 %dot.0.4.indvar.reduction, i64 %dot.0.4.indvar.rhs.1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %12 = load float, float* %11, !invariant.load !1, !noalias !5
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %13 = load float, float* %accum_address
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %14 = fmul fast float %10, %12
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %15 = fadd fast float %13, %14
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   store float %15, float* %accum_address
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %invar.inc2 = add nuw nsw i64 %dot.0.4.indvar.reduction, 1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   store i64 %invar.inc2, i64* %dot.0.4.invar_address.reduction
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   br label %dot.0.4.loop_header.reduction
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] 
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] dot.0.4.loop_exit.reduction:                      ; preds = %dot.0.4.loop_header.reduction
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %16 = load float, float* %accum_address
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %17 = getelementptr inbounds [2 x [2 x float]], [2 x [2 x float]]* %dot.0.4, i64 0, i64 %dot.0.4.indvar.lhs.0, i64 %dot.0.4.indvar.rhs.1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   store float %16, float* %17, !alias.scope !5, !noalias !8
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %invar.inc1 = add nuw nsw i64 %dot.0.4.indvar.rhs.1, 1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   store i64 %invar.inc1, i64* %dot.0.4.invar_address.rhs.1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   br label %dot.0.4.loop_header.rhs.1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] 
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] dot.0.4.loop_exit.rhs.1:                          ; preds = %dot.0.4.loop_header.rhs.1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %invar.inc = add nuw nsw i64 %dot.0.4.indvar.lhs.0, 1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   store i64 %invar.inc, i64* %dot.0.4.invar_address.lhs.0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   br label %dot.0.4.loop_header.lhs.0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] 
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] dot.0.4.loop_exit.lhs.0:                          ; preds = %dot.0.4.loop_header.lhs.0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %18 = getelementptr inbounds i8*, i8** %temps, i64 3
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %19 = load i8*, i8** %18, !invariant.load !1, !dereferenceable !3, !align !3
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %tuple.0.6 = bitcast i8* %19 to [1 x i8*]*
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %20 = getelementptr inbounds [1 x i8*], [1 x i8*]* %tuple.0.6, i64 0, i64 0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   %21 = bitcast [2 x [2 x float]]* %dot.0.4 to i8*
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   store i8* %21, i8** %20, !alias.scope !8, !noalias !5
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103]   ret void
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] }
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] 
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] attributes #0 = { "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "unsafe-fp-math"="true" }
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] 
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] !llvm.module.flags = !{!0}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] 
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] !0 = !{i32 7, !"PIC Level", i32 2}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] !1 = !{}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] !2 = !{i64 24}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] !3 = !{i64 8}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] !4 = !{i64 16}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] !5 = !{!6}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] !6 = !{!"buffer: {index:2, offset:0, size:16}", !7}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] !7 = !{!"XLA global AA domain"}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] !8 = !{!9}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:103] !9 = !{!"buffer: {index:3, offset:0, size:8}", !7}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:146] IR after optimizations
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] ; ModuleID = '__compute_module'
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] source_filename = "__compute_module"
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] target triple = "x86_64-pc-linux"
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] 
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] ; Function Attrs: norecurse nounwind
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] define void @entry(i8* nocapture readnone %retval, i8* noalias nocapture readnone %run_options, i8** noalias nocapture readnone %params, i8** noalias nocapture readonly %temps, i64* noalias nocapture readnone %prof_counters) local_unnamed_addr #0 {
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] entry:
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %0 = getelementptr inbounds i8*, i8** %temps, i64 1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %1 = bitcast i8** %0 to [3 x [2 x float]]**
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %2 = load [3 x [2 x float]]*, [3 x [2 x float]]** %1, align 8, !invariant.load !1, !dereferenceable !2, !align !3
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %3 = bitcast i8** %temps to [2 x [3 x float]]**
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %4 = load [2 x [3 x float]]*, [2 x [3 x float]]** %3, align 8, !invariant.load !1, !dereferenceable !2, !align !3
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %5 = getelementptr inbounds i8*, i8** %temps, i64 2
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %6 = load i8*, i8** %5, align 8, !invariant.load !1, !dereferenceable !4, !align !3
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %7 = getelementptr inbounds [2 x [3 x float]], [2 x [3 x float]]* %4, i64 0, i64 0, i64 0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %8 = load float, float* %7, align 8, !invariant.load !1, !noalias !5
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %9 = getelementptr inbounds [2 x [3 x float]], [2 x [3 x float]]* %4, i64 0, i64 0, i64 1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %10 = load float, float* %9, align 4, !invariant.load !1, !noalias !5
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %11 = getelementptr inbounds [3 x [2 x float]], [3 x [2 x float]]* %2, i64 0, i64 1, i64 0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %12 = getelementptr inbounds [2 x [3 x float]], [2 x [3 x float]]* %4, i64 0, i64 0, i64 2
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %13 = load float, float* %12, align 8, !invariant.load !1, !noalias !5
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %14 = getelementptr inbounds [3 x [2 x float]], [3 x [2 x float]]* %2, i64 0, i64 2, i64 0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %15 = bitcast [3 x [2 x float]]* %2 to <2 x float>*
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %16 = load <2 x float>, <2 x float>* %15, align 8, !invariant.load !1, !noalias !5
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %shuffle12 = shufflevector <2 x float> %16, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %17 = bitcast float* %11 to <2 x float>*
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %18 = load <2 x float>, <2 x float>* %17, align 8, !invariant.load !1, !noalias !5
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %shuffle10 = shufflevector <2 x float> %18, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %19 = bitcast float* %14 to <2 x float>*
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %20 = load <2 x float>, <2 x float>* %19, align 8, !invariant.load !1, !noalias !5
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %shuffle = shufflevector <2 x float> %20, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %21 = getelementptr inbounds [2 x [3 x float]], [2 x [3 x float]]* %4, i64 0, i64 1, i64 0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %22 = load float, float* %21, align 4, !invariant.load !1, !noalias !5
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %23 = getelementptr inbounds [2 x [3 x float]], [2 x [3 x float]]* %4, i64 0, i64 1, i64 1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %24 = load float, float* %23, align 4, !invariant.load !1, !noalias !5
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %25 = getelementptr inbounds [2 x [3 x float]], [2 x [3 x float]]* %4, i64 0, i64 1, i64 2
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %26 = load float, float* %25, align 4, !invariant.load !1, !noalias !5
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %27 = insertelement <2 x float> undef, float %8, i32 0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %28 = insertelement <2 x float> %27, float %22, i32 1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %shuffle13 = shufflevector <2 x float> %28, <2 x float> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %29 = fmul fast <4 x float> %shuffle12, %shuffle13
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %30 = insertelement <2 x float> undef, float %10, i32 0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %31 = insertelement <2 x float> %30, float %24, i32 1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %shuffle11 = shufflevector <2 x float> %31, <2 x float> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %32 = fmul fast <4 x float> %shuffle10, %shuffle11
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %33 = fadd fast <4 x float> %32, %29
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %34 = insertelement <2 x float> undef, float %13, i32 0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %35 = insertelement <2 x float> %34, float %26, i32 1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %shuffle9 = shufflevector <2 x float> %35, <2 x float> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %36 = fmul fast <4 x float> %shuffle, %shuffle9
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %37 = fadd fast <4 x float> %36, %33
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %38 = bitcast i8* %6 to <4 x float>*
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   store <4 x float> %37, <4 x float>* %38, align 8, !alias.scope !5, !noalias !8
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %39 = getelementptr inbounds i8*, i8** %temps, i64 3
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %40 = bitcast i8** %39 to [1 x i8*]**
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %41 = load [1 x i8*]*, [1 x i8*]** %40, align 8, !invariant.load !1, !dereferenceable !3, !align !3
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   %42 = getelementptr inbounds [1 x i8*], [1 x i8*]* %41, i64 0, i64 0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   store i8* %6, i8** %42, align 8, !alias.scope !8, !noalias !5
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147]   ret void
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] }
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] 
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] attributes #0 = { norecurse nounwind "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "unsafe-fp-math"="true" }
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] 
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] !llvm.module.flags = !{!0}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] 
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] !0 = !{i32 7, !"PIC Level", i32 2}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] !1 = !{}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] !2 = !{i64 24}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] !3 = !{i64 8}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] !4 = !{i64 16}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] !5 = !{!6}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] !6 = !{!"buffer: {index:2, offset:0, size:16}", !7}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] !7 = !{!"XLA global AA domain"}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] !8 = !{!9}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:147] !9 = !{!"buffer: {index:3, offset:0, size:8}", !7}
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] entry:
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x00000000  mov rax, qword ptr [rcx]
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x00000003  mov rdx, qword ptr [rcx + 8]
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x00000007  mov rsi, qword ptr [rcx + 16]
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x0000000b  movss xmm0, dword ptr [rax]
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x0000000f  movss xmm1, dword ptr [rax + 4]
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x00000014  movss xmm2, dword ptr [rax + 8]
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x00000019  movq xmm3, qword ptr [rdx]
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x0000001d  pshufd xmm3, xmm3, 68
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x00000022  movq xmm4, qword ptr [rdx + 8]
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x00000027  pshufd xmm4, xmm4, 68
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x0000002c  movq xmm5, qword ptr [rdx + 16]
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x00000031  pshufd xmm5, xmm5, 68
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x00000036  movss xmm6, dword ptr [rax + 12]
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x0000003b  shufps xmm0, xmm6, 0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x0000003f  mulps xmm0, xmm3
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x00000042  movss xmm3, dword ptr [rax + 16]
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x00000047  shufps xmm1, xmm3, 0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x0000004b  mulps xmm1, xmm4
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x0000004e  movss xmm3, dword ptr [rax + 20]
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x00000053  shufps xmm2, xmm3, 0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x00000057  mulps xmm2, xmm5
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x0000005a  addps xmm2, xmm0
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x0000005d  addps xmm2, xmm1
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x00000060  movups xmmword ptr [rsi], xmm2
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x00000063  mov rax, qword ptr [rcx + 24]
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x00000067  mov qword ptr [rax], rsi
tensorflow/compiler/xla/service/cpu/compiler_functor.cc:169] 0x0000006a  ret
tensorflow/compiler/xla/service/cpu/cpu_compiler.cc:884] Compilation finished
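
Note that the "IR after optimizations" and the final assembly no longer contain any loop. The 2x3 by 3x2 product is fully unrolled into three 4-wide multiplies (fmul on <4 x float>, mulps) and two adds (fadd, addps). The following is only a minimal Python sketch of that lane layout, as I read it from the shufflevector masks in the dump. The function name dot_2x3_3x2_lanes and the sample values are my own illustration and are not part of XLA or TensorFlow.

```python
def dot_2x3_3x2_lanes(a, b):
    """a is a 2x3 nested list, b is a 3x2 nested list.

    Returns the 4 output lanes in the order
    out[0][0], out[0][1], out[1][0], out[1][1].
    """
    acc = [0.0, 0.0, 0.0, 0.0]
    for k in range(3):  # this loop is fully unrolled in the optimized IR
        # Row k of b, repeated twice: shufflevector mask <0, 1, 0, 1>
        b_lanes = [b[k][0], b[k][1], b[k][0], b[k][1]]
        # Column k of a, each element duplicated: shufflevector mask <0, 0, 1, 1>
        a_lanes = [a[0][k], a[0][k], a[1][k], a[1][k]]
        # One 4-wide multiply and one 4-wide add per k
        acc = [s + x * y for s, x, y in zip(acc, a_lanes, b_lanes)]
    return acc


# Hypothetical sample inputs, chosen only for illustration.
a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
lanes = dot_2x3_3x2_lanes(a, b)
print(lanes)
```

The four lanes are written with a single 16-byte store (movups), which matches the output buffer of size 16 listed in the metadata.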

Here is the HloModule excerpted from the dump log above:

 HloModule tfcompile0
 
 ENTRY %tfcompile0 (arg0.0.0: f32[2,3], arg1.0.1: f32[3,2]) -> (f32[2,2]) {
   %arg0.0.0 = f32[2,3]{1,0} parameter(0), metadata={op_name="XLA_Args"}
   %reshape.0.2 = f32[2,3]{1,0} reshape(f32[2,3]{1,0} %arg0.0.0)
   %arg1.0.1 = f32[3,2]{1,0} parameter(1), metadata={op_name="XLA_Args"}
   %reshape.0.3 = f32[3,2]{1,0} reshape(f32[3,2]{1,0} %arg1.0.1)
   %dot.0.4 = f32[2,2]{1,0} dot(f32[2,3]{1,0} %reshape.0.2, f32[3,2]{1,0} %reshape.0.3), lhs_contracting_dims={1}, rhs_contracting_dims={0}, operand_precision={default,default}, metadata={op_type="BatchMatMul" op_name="x_y_prod"}
   %reshape.0.5 = f32[2,2]{1,0} reshape(f32[2,2]{1,0} %dot.0.4), metadata={op_type="_Retval" op_name="_retval_0"}
   ROOT %tuple.0.6 = (f32[2,2]{1,0}) tuple(f32[2,2]{1,0} %reshape.0.5), metadata={op_name="XLA_Retvals"}
 }
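
To make the dot instruction concrete: with lhs_contracting_dims={1} and rhs_contracting_dims={0}, it is an ordinary matrix product of a [2,3] and a [3,2] operand. The IR before optimization in the dump implements it as a three-level loop nest (lhs.0 with bound 2, rhs.1 with bound 2, reduction with bound 3) around a float accumulator. Below is a rough Python sketch of that loop nest. The function name dot_loop_nest and the sample inputs are hypothetical and only for illustration. This is not code generated by XLA.

```python
def dot_loop_nest(lhs, rhs):
    """Naive matrix product mirroring the loop nest in the unoptimized IR."""
    rows, inner, cols = len(lhs), len(rhs), len(rhs[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):            # dot.0.4 loop over lhs.0
        for j in range(cols):        # dot.0.4 loop over rhs.1
            accum = 0.0              # store float 0.0 to accum_address
            for k in range(inner):   # dot.0.4 reduction loop
                accum += lhs[i][k] * rhs[k][j]
            out[i][j] = accum        # store the accumulator into dot.0.4[i][j]
    return out


# Hypothetical sample inputs with the shapes from the HloModule: f32[2,3] and f32[3,2].
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
y = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
result = dot_loop_nest(x, y)
print(result)
```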

This dump log is very helpful when you trace the source code, especially when you compare it with other people's source code studies of XLA AOT, such as this one:
https://www.slideshare.net/ssuser479fa3/tensorflow-xla-aot

Danny's tech notebook | 丹尼技術手札

Tuesday, September 18, 2018
