Medial Code Documentation
Line-split implementation over a single FILE. It simply returns the lines of the file and is used for stdin.
#include <single_file_split.h>
Public Member Functions | |
SingleFileSplit (const char *fname) | |
virtual void | BeforeFirst (void) |
Reset the position of the InputSplit to the beginning. | |
virtual void | HintChunkSize (size_t chunk_size) |
Hint to the InputSplit how large a chunk it should return from NextChunk. This is only a hint and may not be enforced, but the InputSplit will try to adjust its internal buffer size to the hinted value. | |
virtual size_t | GetTotalSize (void) |
get the total size of the InputSplit | |
virtual size_t | Read (void *ptr, size_t size) |
virtual void | ResetPartition (unsigned part_index, unsigned num_parts) |
Reset the InputSplit to a given part id. The InputSplit will then point to the head of the newly specified segment. This feature may not be supported by every implementation of InputSplit. | |
virtual void | Write (const void *, size_t) |
virtual bool | NextRecord (Blob *out_rec) |
Get the next record. The returned value is valid until the next call to NextRecord, NextChunk or NextBatch. The caller can modify the memory content of out_rec. | |
virtual bool | NextChunk (Blob *out_chunk) |
Get a chunk of memory that can contain multiple records; the caller needs to parse the content of the resulting chunk. For text files, out_chunk can contain data of multiple lines; for recordio, out_chunk can contain multiple records (including headers). | |
bool | ReadChunk (void *buf, size_t *size) |
Public Member Functions inherited from dmlc::InputSplit | |
virtual bool | NextBatch (Blob *out_chunk, size_t) |
Get a chunk of memory that can contain multiple records, with a hint for how many records are needed; the caller needs to parse the content of the resulting chunk. For text files, out_chunk can contain data of multiple lines; for recordio, out_chunk can contain multiple records (including headers). | |
virtual | ~InputSplit (void) DMLC_THROW_EXCEPTION |
destructor | |
Protected Member Functions | |
const char * | FindLastRecordBegin (const char *begin, const char *end) |
char * | FindNextRecord (char *begin, char *end) |
bool | LoadChunk (void) |
Additional Inherited Members | |
Static Public Member Functions inherited from dmlc::InputSplit | |
static InputSplit * | Create (const char *uri, unsigned part_index, unsigned num_parts, const char *type) |
factory function: create input split given a uri | |
static InputSplit * | Create (const char *uri, const char *index_uri, unsigned part_index, unsigned num_parts, const char *type, const bool shuffle=false, const int seed=0, const size_t batch_size=256, const bool recurse_directories=false) |
factory function: create input split given a uri for input and index | |
Line-split implementation over a single FILE. It simply returns the lines of the file and is used for stdin.
virtual void | BeforeFirst (void) | inline virtual |
Reset the position of the InputSplit to the beginning.
Implements dmlc::InputSplit.
virtual size_t | GetTotalSize (void) | inline virtual |
Get the total size of the InputSplit.
Implements dmlc::InputSplit.
virtual void | HintChunkSize (size_t chunk_size) | inline virtual |
Hint to the InputSplit how large a chunk it should return from NextChunk. This is only a hint and may not be enforced, but the InputSplit will try to adjust its internal buffer size to the hinted value.
Reimplemented from dmlc::InputSplit.
virtual bool | NextChunk (Blob *out_chunk) | inline virtual |
Get a chunk of memory that can contain multiple records; the caller needs to parse the content of the resulting chunk. For text files, out_chunk can contain data of multiple lines; for recordio, out_chunk can contain multiple records (including headers).
This function ensures that the chunk contains no partial record. The caller can modify the memory content of out_chunk; the memory is valid until the next call to NextRecord, NextChunk or NextBatch.
Usually NextRecord is sufficient; NextChunk can be used by multi-threaded parsers to parse the input content.
out_chunk | used to store the result |
Implements dmlc::InputSplit.
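The no-partial-record guarantee can be illustrated with a minimal, self-contained sketch. It is not the SingleFileSplit implementation: `LastLineBegin`, `ChunkCut` and `CutAtLastLine` are hypothetical names, loosely modelled on the `FindLastRecordBegin (const char *begin, const char *end)` signature listed above, and the sketch assumes that records are lines terminated by `\n` or `\r`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not the dmlc implementation): returns a pointer to the
// start of the last, possibly incomplete, line in [begin, end). If the buffer
// holds no end-of-line character at all, `begin` is returned.
const char* LastLineBegin(const char* begin, const char* end) {
  assert(begin <= end);
  for (const char* p = end; p != begin; --p) {
    if (p[-1] == '\n' || p[-1] == '\r') return p;
  }
  return begin;
}

// Result of cutting a raw buffer at the last record boundary.
struct ChunkCut {
  std::string complete;  // whole lines only; safe to hand to a parser
  std::string leftover;  // trailing partial line; prepend to the next read
};

// Hypothetical helper: split a raw buffer so that `complete` never ends in
// the middle of a line.
ChunkCut CutAtLastLine(const std::string& buf) {
  const char* begin = buf.data();
  const char* end = begin + buf.size();
  const char* cut = LastLineBegin(begin, end);
  return {std::string(begin, cut), std::string(cut, end)};
}
```

A reader built this way would keep `leftover` and prepend it to the next block read from the file, so every chunk handed to the caller ends on a line boundary.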
virtual bool | NextRecord (Blob *out_rec) | inline virtual |
Get the next record. The returned value is valid until the next call to NextRecord, NextChunk or NextBatch. The caller can modify the memory content of out_rec.
For text, out_rec contains a single line. For recordio, out_rec contains the content of one record (with the header stripped).
out_rec | used to store the result |
Implements dmlc::InputSplit.
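For text input, the record-by-record contract can be sketched as below. This is an illustration under stated assumptions, not the library's code: the `Blob` struct is a local stand-in with a pointer and a size, and `NextLine` and `SplitLines` are hypothetical helpers that walk an in-memory chunk instead of a FILE.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Local stand-in for the record handle (assumption: a pointer plus a size).
struct Blob {
  void* dptr;
  size_t size;
};

// Hypothetical helper: extract the next line from [*cursor, end).
// Returns false once the chunk is exhausted. The record points into the
// chunk itself, so it stays valid only as long as the chunk does.
bool NextLine(char** cursor, char* end, Blob* out_rec) {
  assert(*cursor <= end);
  char* begin = *cursor;
  if (begin == end) return false;
  char* p = begin;
  while (p != end && *p != '\n' && *p != '\r') ++p;
  out_rec->dptr = begin;
  out_rec->size = static_cast<size_t>(p - begin);
  // Consume the run of end-of-line characters that follows the record.
  while (p != end && (*p == '\n' || *p == '\r')) ++p;
  *cursor = p;
  return true;
}

// Hypothetical helper: collect every line of a chunk as a separate string.
std::vector<std::string> SplitLines(std::string chunk) {
  std::vector<std::string> lines;
  char* cursor = &chunk[0];
  char* end = cursor + chunk.size();
  Blob rec;
  while (NextLine(&cursor, end, &rec)) {
    lines.emplace_back(static_cast<char*>(rec.dptr), rec.size);
  }
  return lines;
}
```

Because each record aliases the underlying buffer, a caller that needs the data beyond the next call has to copy it, as `SplitLines` does.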
virtual void | ResetPartition (unsigned part_index, unsigned num_parts) | inline virtual |
Reset the InputSplit to a given part id. The InputSplit will then point to the head of the newly specified segment. This feature may not be supported by every implementation of InputSplit.
part_index | The part id of the new input. |
num_parts | The total number of parts. |
Implements dmlc::InputSplit.
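To make the meaning of `part_index` and `num_parts` concrete, here is one possible way to map a part id to a byte range. `PartByteRange` is a hypothetical helper and the even-split rule is an assumption for illustration; this page does not state how, or whether, SingleFileSplit partitions its input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper: divide total_size bytes into num_parts contiguous
// ranges and return the half-open range [begin, end) for part_index.
// A real splitter would additionally move both ends to record boundaries.
std::pair<size_t, size_t> PartByteRange(size_t total_size,
                                        unsigned part_index,
                                        unsigned num_parts) {
  assert(num_parts > 0 && part_index < num_parts);
  // Round up so that the parts together cover the whole input.
  const size_t step = (total_size + num_parts - 1) / num_parts;
  const size_t begin = std::min(step * part_index, total_size);
  const size_t end = std::min(step * (part_index + 1), total_size);
  return {begin, end};
}
```

With a single part (`part_index = 0`, `num_parts = 1`) the range covers the whole input, which is the natural configuration for a stream such as stdin.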