Open3D (C++ API)  0.18.0+3975044

open3d::core::Indexer Class Reference

#include <Indexer.h>
Public Member Functions

Indexer ()
Indexer (const Indexer &)=default
Indexer & operator= (const Indexer &)=default
Indexer (const std::vector< Tensor > &input_tensors, const Tensor &output_tensor, DtypePolicy dtype_policy=DtypePolicy::ALL_SAME, const SizeVector &reduction_dims={})
Indexer (const std::vector< Tensor > &input_tensors, const std::vector< Tensor > &output_tensors, DtypePolicy dtype_policy=DtypePolicy::ALL_SAME, const SizeVector &reduction_dims={})
bool CanUse32BitIndexing () const
    Returns true iff the maximum_offsets in bytes are smaller than 2^31 - 1.
IndexerIterator SplitTo32BitIndexing () const
std::unique_ptr< Indexer > SplitLargestDim ()
Indexer GetPerOutputIndexer (int64_t output_idx) const
bool ShouldAccumulate () const
bool IsFinalOutput () const
void ShrinkDim (int64_t dim, int64_t start, int64_t size)
int64_t NumReductionDims () const
    Returns the number of reduction dimensions.
int64_t NumDims () const
    Returns number of dimensions of the Indexer.
const int64_t * GetPrimaryShape () const
int64_t * GetPrimaryShape ()
const int64_t * GetPrimaryStrides () const
int64_t NumWorkloads () const
int64_t NumOutputElements () const
    Returns the number of output elements.
int64_t NumInputs () const
    Number of input Tensors.
int64_t NumOutputs () const
    Number of output Tensors.
TensorRef & GetInput (int64_t i)
    Returns input TensorRef.
const TensorRef & GetInput (int64_t i) const
TensorRef & GetOutput (int64_t i)
    Returns output TensorRef.
const TensorRef & GetOutput (int64_t i) const
TensorRef & GetOutput ()
const TensorRef & GetOutput () const
bool IsReductionDim (int64_t dim) const
    Returns true if the dim-th dimension is reduced.
OPEN3D_HOST_DEVICE char * GetInputPtr (int64_t input_idx, int64_t workload_idx) const
template<typename T > OPEN3D_HOST_DEVICE T * GetInputPtr (int64_t input_idx, int64_t workload_idx) const
OPEN3D_HOST_DEVICE char * GetOutputPtr (int64_t workload_idx) const
template<typename T > OPEN3D_HOST_DEVICE T * GetOutputPtr (int64_t workload_idx) const
OPEN3D_HOST_DEVICE char * GetOutputPtr (int64_t output_idx, int64_t workload_idx) const
template<typename T > OPEN3D_HOST_DEVICE T * GetOutputPtr (int64_t output_idx, int64_t workload_idx) const
Protected Member Functions

void CoalesceDimensions ()
void ReorderDimensions (const SizeVector &reduction_dims)
void UpdatePrimaryStrides ()
    Update primary_strides_ based on primary_shape_.
void UpdateContiguousFlags ()
    Update input_contiguous_ and output_contiguous_.
OPEN3D_HOST_DEVICE char * GetWorkloadDataPtr (const TensorRef &tr, bool tr_contiguous, int64_t workload_idx) const
template<typename T > OPEN3D_HOST_DEVICE T * GetWorkloadDataPtr (const TensorRef &tr, bool tr_contiguous, int64_t workload_idx) const
Static Protected Member Functions

static void BroadcastRestride (TensorRef &src, int64_t dst_ndims, const int64_t *dst_shape)
static void ReductionRestride (TensorRef &dst, int64_t src_ndims, const int64_t *src_shape, const SizeVector &reduction_dims)
Protected Attributes

int64_t num_inputs_ = 0
    Number of input and output Tensors.
int64_t num_outputs_ = 0
TensorRef inputs_ [MAX_INPUTS]
    Array of input TensorRefs.
TensorRef outputs_ [MAX_OUTPUTS]
    Array of output TensorRefs.
bool inputs_contiguous_ [MAX_INPUTS]
    Array of contiguous flags for all input TensorRefs.
bool outputs_contiguous_ [MAX_OUTPUTS]
    Array of contiguous flags for all output TensorRefs.
int64_t primary_shape_ [MAX_DIMS]
int64_t primary_strides_ [MAX_DIMS]
int64_t ndims_ = 0
    Indexer's global number of dimensions.
bool final_output_ = true
bool accumulate_ = false
Detailed Description

Indexing engine for elementwise ops with broadcasting support.

Fancy indexing is supported by restriding the input tensor and treating the operation as an elementwise op.

After constructing an Indexer on the host, its indexing methods can be used from both host and device.
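The snippet below is a minimal host-side sketch of that workflow, assuming a CPU-only build where "open3d/core/Indexer.h" and "open3d/core/Tensor.h" are available; the factory calls and the serial loop are illustrative, not the kernels Open3D actually ships.

```cpp
#include "open3d/core/Indexer.h"
#include "open3d/core/Tensor.h"

using namespace open3d::core;

// Broadcasted elementwise add: dst(2, 3) = a(2, 3) + b(1, 3).
void AddExample() {
    Tensor a = Tensor::Ones({2, 3}, Float32);
    Tensor b = Tensor::Ones({1, 3}, Float32);    // broadcasts along dim 0
    Tensor dst = Tensor::Empty({2, 3}, Float32);

    // All tensors share one dtype; no reduction dimensions.
    Indexer indexer({a, b}, dst, DtypePolicy::ALL_SAME);

    // For non-reduction ops there is one workload per output element.
    for (int64_t w = 0; w < indexer.NumWorkloads(); ++w) {
        const float* x = indexer.GetInputPtr<float>(0, w);
        const float* y = indexer.GetInputPtr<float>(1, w);
        *indexer.GetOutputPtr<float>(w) = *x + *y;
    }
}
```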
Constructor & Destructor Documentation

open3d::core::Indexer::Indexer ()  [inline]

open3d::core::Indexer::Indexer (const Indexer &)  [default]
open3d::core::Indexer::Indexer (const std::vector< Tensor > &input_tensors, const Tensor &output_tensor, DtypePolicy dtype_policy = DtypePolicy::ALL_SAME, const SizeVector &reduction_dims = {})
Only single output is supported for simplicity. To extend this function to support multiple outputs, one may check for shape compatibility of all outputs.
open3d::core::Indexer::Indexer (const std::vector< Tensor > &input_tensors, const std::vector< Tensor > &output_tensors, DtypePolicy dtype_policy = DtypePolicy::ALL_SAME, const SizeVector &reduction_dims = {})
Member Function Documentation

static void open3d::core::Indexer::BroadcastRestride (TensorRef &src, int64_t dst_ndims, const int64_t *dst_shape)  [static, protected]
Broadcast src to dst by setting the shape to 1 for omitted dimensions and the stride to 0 for broadcasted dimensions.

Note that other approaches may also work. E.g. one could set src's shape to exactly the same as dst's shape. In general, if a dimension is of size 1, the stride has no effect in computing offsets; likewise, if a dimension has stride 0, the shape has no effect in computing offsets.

[Before]
                Omitted
                |       Broadcast
                |       |   No broadcast
                |       |   |
                V       V   V
src.shape_:   [     2,  1,  1,  3]
src.strides_: [     3,  3,  3,  1]
dst.shape_:   [ 2,  2,  2,  1,  3]
dst.strides_: [12,  6,  3,  3,  1]

[After]
src.shape_:   [ 1,  2,  1,  1,  3]
src.strides_: [ 0,  3,  0,  3,  1]

Parameters
    src         The source TensorRef to be broadcasted.
    dst_ndims   Number of dimensions to be broadcasted to.
    dst_shape   Shape to be broadcasted to.
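To make the stride-0 convention concrete, the illustrative helper below (not part of the Open3D API) computes an element offset from per-dimension indices and strides the way a restrided TensorRef is consumed; a broadcast dimension with stride 0 contributes nothing, so every dst index along it maps to the same src element.

```cpp
#include <cstdint>

// Offset (in elements) of the entry at `idx` for a tensor with the given
// per-dimension strides. With the [After] strides {0, 3, 0, 3, 1} above,
// idx = {1, 0, 0, 0, 2} and idx = {0, 0, 0, 0, 2} both yield offset 2.
int64_t ElementOffset(const int64_t* idx, const int64_t* strides, int64_t ndims) {
    int64_t offset = 0;
    for (int64_t d = 0; d < ndims; ++d) {
        offset += idx[d] * strides[d];  // stride 0 => broadcast dimension
    }
    return offset;
}
```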
bool open3d::core::Indexer::CanUse32BitIndexing () const
Returns true iff the maximum_offsets in bytes are smaller than 2^31 - 1.
void open3d::core::Indexer::CoalesceDimensions ()  [protected]
Merge adjacent dimensions if either dim is 1 or if: shape[n] * stride[n] == stride[n + 1].
OPEN3D_HOST_DEVICE char * open3d::core::Indexer::GetInputPtr (int64_t input_idx, int64_t workload_idx) const  [inline]

Get input Tensor data pointer based on workload_idx.

Parameters
    input_idx      Input tensor index.
    workload_idx   The index of the compute workload, similar to thread_id, if a thread only processes one workload.
template<typename T >
OPEN3D_HOST_DEVICE T * open3d::core::Indexer::GetInputPtr (int64_t input_idx, int64_t workload_idx) const  [inline]

Get input Tensor data pointer based on workload_idx.

Parameters
    input_idx      Input tensor index.
    workload_idx   The index of the compute workload, similar to thread_id, if a thread only processes one workload.

Note: Assumes that sizeof(T) matches the input's dtype size, but does not check this constraint for performance reasons.
TensorRef & open3d::core::Indexer::GetOutput ()  [inline]

Returns output TensorRef. Only works if there's only one output. Equivalent to GetOutput(0).
OPEN3D_HOST_DEVICE char * open3d::core::Indexer::GetOutputPtr (int64_t output_idx, int64_t workload_idx) const  [inline]

Get output Tensor data pointer based on workload_idx.

Parameters
    output_idx     Output tensor index.
    workload_idx   The index of the compute workload, similar to thread_id, if a thread only processes one workload.
template<typename T >
OPEN3D_HOST_DEVICE T * open3d::core::Indexer::GetOutputPtr (int64_t output_idx, int64_t workload_idx) const  [inline]

Get output Tensor data pointer based on workload_idx.

Parameters
    output_idx     Output tensor index.
    workload_idx   The index of the compute workload, similar to thread_id, if a thread only processes one workload.
OPEN3D_HOST_DEVICE char * open3d::core::Indexer::GetOutputPtr (int64_t workload_idx) const  [inline]

Get output Tensor data pointer based on workload_idx.

Parameters
    workload_idx   The index of the compute workload, similar to thread_id, if a thread only processes one workload.
template<typename T >
OPEN3D_HOST_DEVICE T * open3d::core::Indexer::GetOutputPtr (int64_t workload_idx) const  [inline]

Get output Tensor data pointer based on workload_idx.

Parameters
    workload_idx   The index of the compute workload, similar to thread_id, if a thread only processes one workload.

Note: Assumes that sizeof(T) matches the output's dtype size, but does not check this constraint for performance reasons.
Indexer open3d::core::Indexer::GetPerOutputIndexer (int64_t output_idx) const
Get a sub-indexer that loops through all inputs corresponding to a single output.
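The sketch below shows one way this could drive a CPU sum reduction over dim 1, assuming a keepdim=true output pre-filled with zeros; the loop structure follows the description above but is illustrative rather than the exact kernel Open3D uses.

```cpp
#include "open3d/core/Indexer.h"
#include "open3d/core/Tensor.h"

using namespace open3d::core;

// dst(2, 1) = sum over dim 1 of src(2, 3), keepdim = true.
void SumReduceExample() {
    Tensor src = Tensor::Ones({2, 3}, Float32);
    Tensor dst = Tensor::Zeros({2, 1}, Float32);  // accumulator starts at 0

    Indexer indexer({src}, dst, DtypePolicy::ALL_SAME, /*reduction_dims=*/{1});

    for (int64_t o = 0; o < indexer.NumOutputElements(); ++o) {
        Indexer sub = indexer.GetPerOutputIndexer(o);
        // Every workload of `sub` maps to the same output element.
        for (int64_t w = 0; w < sub.NumWorkloads(); ++w) {
            *sub.GetOutputPtr<float>(w) += *sub.GetInputPtr<float>(0, w);
        }
    }
}
```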
const int64_t * open3d::core::Indexer::GetPrimaryShape () const  [inline]

int64_t * open3d::core::Indexer::GetPrimaryShape ()  [inline]

const int64_t * open3d::core::Indexer::GetPrimaryStrides () const  [inline]
OPEN3D_HOST_DEVICE char * open3d::core::Indexer::GetWorkloadDataPtr (const TensorRef &tr, bool tr_contiguous, int64_t workload_idx) const  [inline, protected]

Get data pointer from a TensorRef with workload_idx. Note: can be optimized by computing all input ptrs and output ptr together.
template<typename T >
OPEN3D_HOST_DEVICE T * open3d::core::Indexer::GetWorkloadDataPtr (const TensorRef &tr, bool tr_contiguous, int64_t workload_idx) const  [inline, protected]

Get data pointer from a TensorRef with workload_idx. Note: can be optimized by computing all input ptrs and output ptr together.

Note: Assumes that sizeof(T) matches the data's dtype size, but does not check this constraint for performance reasons.
bool open3d::core::Indexer::IsFinalOutput () const  [inline]

bool open3d::core::Indexer::IsReductionDim (int64_t dim) const  [inline]

Returns true if the dim-th dimension is reduced.
int64_t open3d::core::Indexer::NumDims () const  [inline]

Returns number of dimensions of the Indexer.

int64_t open3d::core::Indexer::NumInputs () const  [inline]

Number of input Tensors.
int64_t open3d::core::Indexer::NumOutputElements () const
Returns the number of output elements.
int64_t open3d::core::Indexer::NumOutputs () const  [inline]

Number of output Tensors.
int64_t open3d::core::Indexer::NumReductionDims () const
Returns the number of reduction dimensions.
int64_t open3d::core::Indexer::NumWorkloads () const
Returns the total number of workloads (e.g. computations) needed for the op. The scheduler schedules these workloads to run on parallel threads.
For non-reduction ops, NumWorkloads() is the same as number of output elements (e.g. for broadcasting ops).
For reduction ops, NumWorkloads() is the same as the number of input elements. Currently we don't allow mixing broadcasting and reduction in one op kernel.
static void open3d::core::Indexer::ReductionRestride (TensorRef &dst, int64_t src_ndims, const int64_t *src_shape, const SizeVector &reduction_dims)  [static, protected]

Symmetrical to BroadcastRestride. Set the reduced dimensions' stride to 0 at the output. Currently only supports the keepdim=true case.
void open3d::core::Indexer::ReorderDimensions (const SizeVector &reduction_dims)  [protected]

bool open3d::core::Indexer::ShouldAccumulate () const  [inline]
void open3d::core::Indexer::ShrinkDim (int64_t dim, int64_t start, int64_t size)
Shrink iteration to a specific range in a specific dimension.
Parameters
    dim     The dimension to be shrunk.
    start   Starting index (inclusive) for dimension dim. No dimension wrapping is available.
    size    The size to iterate in dimension dim.
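As an illustration (and only an illustration; SplitLargestDim() is the built-in helper for this kind of splitting), the sketch below shrinks two copies of an Indexer along dimension 0 so that each copy iterates one half of the range, e.g. for handing to two worker threads. It assumes dimension 0 exists and has an extent of at least 2.

```cpp
#include "open3d/core/Indexer.h"

using namespace open3d::core;

void SplitAlongDim0(const Indexer& indexer) {
    const int64_t n = indexer.GetPrimaryShape()[0];  // extent of dimension 0

    Indexer lhs = indexer;  // the copy constructor is =default
    Indexer rhs = indexer;
    lhs.ShrinkDim(0, 0, n / 2);          // iterate indices [0, n/2) of dim 0
    rhs.ShrinkDim(0, n / 2, n - n / 2);  // iterate indices [n/2, n) of dim 0
    // ... run the element kernel over lhs and rhs independently ...
}
```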
std::unique_ptr< Indexer > open3d::core::Indexer::SplitLargestDim ()
Split the indexer such that the largest-span-dimension is split into two halves. The returned new indexer iterates the first half while the current indexer iterates the second half.
IndexerIterator open3d::core::Indexer::SplitTo32BitIndexing () const
Returns an iterator of Indexers, each of which can be indexed in 32 bits.
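A sketch of the usual dispatch pattern around this call, assuming IndexerIterator can be consumed with a range-based for loop that yields Indexer references; the launch callback is a placeholder for whatever kernel launcher you use.

```cpp
#include "open3d/core/Indexer.h"

using namespace open3d::core;

// Run `launch` directly when all offsets fit in 32 bits, otherwise run it on
// each 32-bit-indexable sub-indexer.
template <typename LaunchFunc>
void LaunchWith32BitIndexing(const Indexer& indexer, LaunchFunc launch) {
    if (indexer.CanUse32BitIndexing()) {
        launch(indexer);
        return;
    }
    // Assumption: IndexerIterator provides begin()/end() for range-based for.
    for (const Indexer& sub_indexer : indexer.SplitTo32BitIndexing()) {
        launch(sub_indexer);
    }
}
```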
void open3d::core::Indexer::UpdateContiguousFlags ()  [protected]

Update input_contiguous_ and output_contiguous_.
void open3d::core::Indexer::UpdatePrimaryStrides ()  [protected]

Update primary_strides_ based on primary_shape_.
Member Data Documentation

bool open3d::core::Indexer::accumulate_ = false  [protected]

If the kernel should accumulate into the output. Only relevant for CUDA reductions.

bool open3d::core::Indexer::final_output_ = true  [protected]

Whether this iterator produces the actual output, as opposed to something that will be accumulated further. Only relevant for CUDA reductions.

TensorRef open3d::core::Indexer::inputs_[MAX_INPUTS]  [protected]

Array of input TensorRefs.

bool open3d::core::Indexer::inputs_contiguous_[MAX_INPUTS]  [protected]

Array of contiguous flags for all input TensorRefs.

int64_t open3d::core::Indexer::ndims_ = 0  [protected]

Indexer's global number of dimensions.

int64_t open3d::core::Indexer::num_inputs_ = 0  [protected]

Number of input and output Tensors.

int64_t open3d::core::Indexer::num_outputs_ = 0  [protected]

TensorRef open3d::core::Indexer::outputs_[MAX_OUTPUTS]  [protected]

Array of output TensorRefs.

bool open3d::core::Indexer::outputs_contiguous_[MAX_OUTPUTS]  [protected]

Array of contiguous flags for all output TensorRefs.

int64_t open3d::core::Indexer::primary_shape_[MAX_DIMS]  [protected]

Indexer's global shape. The shape's number of elements is the same as NumWorkloads() for the Indexer.

int64_t open3d::core::Indexer::primary_strides_[MAX_DIMS]  [protected]

The default strides for primary_shape_, for internal use only. Used to compute the actual strides and ultimately the index offsets.