This repository was archived by the owner on Sep 27, 2019. It is now read-only.

Support COPY for CSV files #1371

Merged
merged 42 commits into from
Jun 8, 2018
42 commits
78f6e34
Add more information to CopyStatement. Cleaned up includes.
pmenon Apr 25, 2018
ab08eba
Add CSVScan node to ToString and FromString. Removed BINARY external …
pmenon Apr 26, 2018
2c681f0
Move COPY from DDL to DML processing. COPY now goes through planner/o…
pmenon Apr 26, 2018
c749fe2
Propagate external file information
pmenon Apr 26, 2018
73d583f
Removed unused serialization stuff from plan nodes
pmenon Apr 27, 2018
7e61425
Codegen can now have constant generic/opaque bytes in module
pmenon Apr 30, 2018
02bd504
When no columns specified during copy, all columns are inserted
pmenon Apr 30, 2018
226d341
Added function to throw exception with ill-formatted input string wh…
pmenon Apr 30, 2018
e9e1a8f
Removed serialization
pmenon Apr 30, 2018
0847d23
Added input functions in preparation to read table data from files
pmenon Apr 30, 2018
1d9e334
All SQL types must now provide an input function to convert a string …
pmenon Apr 30, 2018
3e3d689
Added test for value integrity
pmenon Apr 30, 2018
473b9b4
First take at CSV Scan translator
pmenon May 1, 2018
852fd42
Fix after rebase
pmenon May 1, 2018
b4906df
file api
pmenon May 8, 2018
b41b863
CSV scanner reads lines
pmenon May 8, 2018
93f39fc
Process CSV line in scanner
pmenon May 8, 2018
2aa0aa1
Free memory when re-allocating line buffer
pmenon May 9, 2018
78c0805
Added memcmp to codegen interface. Renamed CallPrintf() to Printf().
pmenon May 9, 2018
f46634a
Cleaned up CSV scan translator. Added null checking.
pmenon May 9, 2018
66bb521
Moved TupleRuntime::CreateVarlen() into ValuesRuntime::WriteVarlen().…
pmenon May 9, 2018
e318f66
Added error handling for long columns. Added null-terminator byte for…
pmenon May 14, 2018
447932c
Added inputs for decimal types
pmenon May 14, 2018
d1e214a
Moved type-specific functions into function namespace
pmenon May 16, 2018
e76ea69
REALLY simple Date support
pmenon May 16, 2018
c4ede0a
Compile fixes for GCC 6+
pmenon May 16, 2018
c50b665
Get string inputs working
pmenon May 16, 2018
d6bb873
Beefed up tests
pmenon May 16, 2018
b34ba30
Simple CSV scan test
pmenon May 16, 2018
70d5012
Updated optimize to continue support for old/weird/strange AF copy ex…
pmenon May 16, 2018
f04d036
Extracted implementation into CPP file for plan node
pmenon May 16, 2018
0483a6d
* Propagate file options through optimization.
pmenon May 16, 2018
342c4ca
Fixes after rebase
pmenon May 22, 2018
669cca3
Simple function to convert tuple to string CSV
pmenon May 24, 2018
3cd74b6
Fix void* -> i8* conversion
pmenon May 24, 2018
dbb042e
More tests
pmenon May 24, 2018
985d329
Address reviews
pmenon May 29, 2018
70f94ec
Revert "Removed serialization"
pmenon May 29, 2018
b8d0c34
Revert "Removed unused serialization stuff from plan nodes"
pmenon May 29, 2018
13f84a4
Beefed up tests, which caught more bugs
pmenon Jun 6, 2018
bca783d
Fix tests
pmenon Jun 6, 2018
e327ac7
Reducing copying overhead for columns, constraints and loop variables…
pmenon Jun 6, 2018
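Several commits above mention per-type "input functions" that convert a string read from a file into a SQL value and throw on ill-formatted input. The PR's actual functions are not shown in this excerpt, so the following is only a minimal sketch of the idea for a 32-bit integer. The name `ParseInt32` and the choice of exception types are assumptions made for illustration.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical input function (not taken from the PR): converts the text of
// one field into a 32-bit integer. Throws std::runtime_error on ill-formatted
// input and std::out_of_range when the value does not fit.
std::int32_t ParseInt32(const std::string &input) {
  errno = 0;
  char *end = nullptr;
  // strtoll() skips leading whitespace and stops at the first non-digit.
  const long long value = std::strtoll(input.c_str(), &end, 10);

  // Nothing was consumed: the field does not start with a number.
  if (end == input.c_str()) {
    throw std::runtime_error("invalid integer: \"" + input + "\"");
  }

  // Allow trailing blanks, reject any other trailing character.
  while (*end == ' ' || *end == '\t') end++;
  if (*end != '\0') {
    throw std::runtime_error("invalid integer: \"" + input + "\"");
  }

  // Reject values outside the 32-bit range (or outside long long entirely).
  if (errno == ERANGE ||
      value < std::numeric_limits<std::int32_t>::min() ||
      value > std::numeric_limits<std::int32_t>::max()) {
    throw std::out_of_range("integer out of range: \"" + input + "\"");
  }
  return static_cast<std::int32_t>(value);
}
```

A scanner built this way would call one such function per column, chosen by the column's SQL type, and let the exception abort the COPY when a field cannot be parsed.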
4 changes: 2 additions & 2 deletions script/validators/source_validator.py
@@ -58,12 +58,12 @@
"src/network/protocol.cpp",
"src/include/common/macros.h",
"src/common/stack_trace.cpp",
"src/include/parser/sql_scanner.h", # There is a free() in comments
"src/include/index/bloom_filter.h",
"src/include/index/compact_ints_key.h",
"src/include/index/bwtree.h",
"src/codegen/util/oa_hash_table.cpp",
"src/codegen/util/cc_hash_table.cpp"
"src/codegen/util/cc_hash_table.cpp",
"src/codegen/codegen.cpp", # We allow calling printf() from codegen for debugging
]

## ==============================================
14 changes: 13 additions & 1 deletion src/binder/bind_node_visitor.cpp
@@ -166,7 +166,19 @@ void BindNodeVisitor::Visit(parser::DeleteStatement *node) {
}

void BindNodeVisitor::Visit(parser::LimitDescription *) {}
void BindNodeVisitor::Visit(parser::CopyStatement *) {}

void BindNodeVisitor::Visit(parser::CopyStatement *node) {
context_ = std::make_shared<BinderContext>(nullptr);
if (node->table != nullptr) {
node->table->Accept(this);

// If the table is given, we're either writing or reading all columns
context_->GenerateAllColumnExpressions(node->select_list);
} else {
node->select_stmt->Accept(this);
}
}

void BindNodeVisitor::Visit(parser::CreateFunctionStatement *) {}
void BindNodeVisitor::Visit(parser::CreateStatement *node) {
node->TryBindDatabaseName(default_database_name_);
1 change: 1 addition & 0 deletions src/catalog/abstract_catalog.cpp
@@ -35,6 +35,7 @@
#include "executor/plan_executor.h"
#include "executor/seq_scan_executor.h"
#include "executor/update_executor.h"
#include "expression/constant_value_expression.h"

#include "storage/database.h"
#include "storage/storage_manager.h"
54 changes: 27 additions & 27 deletions src/catalog/catalog.cpp
@@ -30,7 +30,7 @@
#include "codegen/code_context.h"
#include "concurrency/transaction_manager_factory.h"
#include "function/date_functions.h"
#include "function/decimal_functions.h"
#include "function/numeric_functions.h"
#include "function/old_engine_string_functions.h"
#include "function/timestamp_functions.h"
#include "index/index_factory.h"
@@ -1283,43 +1283,43 @@ void Catalog::InitializeFunctions() {
AddBuiltinFunction("abs", {type::TypeId::DECIMAL}, type::TypeId::DECIMAL,
internal_lang, "Abs",
function::BuiltInFuncType{
OperatorId::Abs, function::DecimalFunctions::_Abs},
OperatorId::Abs, function::NumericFunctions::_Abs},
txn);
AddBuiltinFunction(
"sqrt", {type::TypeId::TINYINT}, type::TypeId::DECIMAL, internal_lang,
"Sqrt",
function::BuiltInFuncType{OperatorId::Sqrt,
function::DecimalFunctions::Sqrt},
function::NumericFunctions::Sqrt},
txn);
AddBuiltinFunction(
"sqrt", {type::TypeId::SMALLINT}, type::TypeId::DECIMAL,
internal_lang, "Sqrt",
function::BuiltInFuncType{OperatorId::Sqrt,
function::DecimalFunctions::Sqrt},
function::NumericFunctions::Sqrt},
txn);
AddBuiltinFunction(
"sqrt", {type::TypeId::INTEGER}, type::TypeId::DECIMAL, internal_lang,
"Sqrt",
function::BuiltInFuncType{OperatorId::Sqrt,
function::DecimalFunctions::Sqrt},
function::NumericFunctions::Sqrt},
txn);
AddBuiltinFunction(
"sqrt", {type::TypeId::BIGINT}, type::TypeId::DECIMAL, internal_lang,
"Sqrt",
function::BuiltInFuncType{OperatorId::Sqrt,
function::DecimalFunctions::Sqrt},
function::NumericFunctions::Sqrt},
txn);
AddBuiltinFunction(
"sqrt", {type::TypeId::DECIMAL}, type::TypeId::DECIMAL, internal_lang,
"Sqrt",
function::BuiltInFuncType{OperatorId::Sqrt,
function::DecimalFunctions::Sqrt},
function::NumericFunctions::Sqrt},
txn);
AddBuiltinFunction(
"floor", {type::TypeId::DECIMAL}, type::TypeId::DECIMAL,
internal_lang, "Floor",
function::BuiltInFuncType{OperatorId::Floor,
function::DecimalFunctions::_Floor},
function::NumericFunctions::_Floor},
txn);

/**
@@ -1328,126 +1328,126 @@ void Catalog::InitializeFunctions() {
AddBuiltinFunction("abs", {type::TypeId::TINYINT}, type::TypeId::TINYINT,
internal_lang, "Abs",
function::BuiltInFuncType{
OperatorId::Abs, function::DecimalFunctions::_Abs},
OperatorId::Abs, function::NumericFunctions::_Abs},
txn);

AddBuiltinFunction("abs", {type::TypeId::SMALLINT},
type::TypeId::SMALLINT, internal_lang, "Abs",
function::BuiltInFuncType{
OperatorId::Abs, function::DecimalFunctions::_Abs},
OperatorId::Abs, function::NumericFunctions::_Abs},
txn);

AddBuiltinFunction("abs", {type::TypeId::INTEGER}, type::TypeId::INTEGER,
internal_lang, "Abs",
function::BuiltInFuncType{
OperatorId::Abs, function::DecimalFunctions::_Abs},
OperatorId::Abs, function::NumericFunctions::_Abs},
txn);

AddBuiltinFunction("abs", {type::TypeId::BIGINT}, type::TypeId::BIGINT,
internal_lang, "Abs",
function::BuiltInFuncType{
OperatorId::Abs, function::DecimalFunctions::_Abs},
OperatorId::Abs, function::NumericFunctions::_Abs},
txn);

AddBuiltinFunction(
"floor", {type::TypeId::INTEGER}, type::TypeId::DECIMAL,
internal_lang, "Floor",
function::BuiltInFuncType{OperatorId::Floor,
function::DecimalFunctions::_Floor},
function::NumericFunctions::_Floor},
txn);
AddBuiltinFunction(
"floor", {type::TypeId::BIGINT}, type::TypeId::DECIMAL, internal_lang,
"Floor",
function::BuiltInFuncType{OperatorId::Floor,
function::DecimalFunctions::_Floor},
function::NumericFunctions::_Floor},
txn);
AddBuiltinFunction(
"floor", {type::TypeId::TINYINT}, type::TypeId::DECIMAL,
internal_lang, "Floor",
function::BuiltInFuncType{OperatorId::Floor,
function::DecimalFunctions::_Floor},
function::NumericFunctions::_Floor},
txn);
AddBuiltinFunction(
"floor", {type::TypeId::SMALLINT}, type::TypeId::DECIMAL,
internal_lang, "Floor",
function::BuiltInFuncType{OperatorId::Floor,
function::DecimalFunctions::_Floor},
function::NumericFunctions::_Floor},
txn);
AddBuiltinFunction(
"round", {type::TypeId::DECIMAL}, type::TypeId::DECIMAL,
internal_lang, "Round",
function::BuiltInFuncType{OperatorId::Round,
function::DecimalFunctions::_Round},
function::NumericFunctions::_Round},
txn);

AddBuiltinFunction(
"ceil", {type::TypeId::DECIMAL}, type::TypeId::DECIMAL, internal_lang,
"Ceil",
function::BuiltInFuncType{OperatorId::Ceil,
function::DecimalFunctions::_Ceil},
function::NumericFunctions::_Ceil},
txn);

AddBuiltinFunction(
"ceil", {type::TypeId::TINYINT}, type::TypeId::DECIMAL, internal_lang,
"Ceil",
function::BuiltInFuncType{OperatorId::Ceil,
function::DecimalFunctions::_Ceil},
function::NumericFunctions::_Ceil},
txn);

AddBuiltinFunction(
"ceil", {type::TypeId::SMALLINT}, type::TypeId::DECIMAL,
internal_lang, "Ceil",
function::BuiltInFuncType{OperatorId::Ceil,
function::DecimalFunctions::_Ceil},
function::NumericFunctions::_Ceil},
txn);

AddBuiltinFunction(
"ceil", {type::TypeId::INTEGER}, type::TypeId::DECIMAL, internal_lang,
"Ceil",
function::BuiltInFuncType{OperatorId::Ceil,
function::DecimalFunctions::_Ceil},
function::NumericFunctions::_Ceil},
txn);

AddBuiltinFunction(
"ceil", {type::TypeId::BIGINT}, type::TypeId::DECIMAL, internal_lang,
"Ceil",
function::BuiltInFuncType{OperatorId::Ceil,
function::DecimalFunctions::_Ceil},
function::NumericFunctions::_Ceil},
txn);

AddBuiltinFunction(
"ceiling", {type::TypeId::DECIMAL}, type::TypeId::DECIMAL,
internal_lang, "Ceil",
function::BuiltInFuncType{OperatorId::Ceil,
function::DecimalFunctions::_Ceil},
function::NumericFunctions::_Ceil},
txn);

AddBuiltinFunction(
"ceiling", {type::TypeId::TINYINT}, type::TypeId::DECIMAL,
internal_lang, "Ceil",
function::BuiltInFuncType{OperatorId::Ceil,
function::DecimalFunctions::_Ceil},
function::NumericFunctions::_Ceil},
txn);

AddBuiltinFunction(
"ceiling", {type::TypeId::SMALLINT}, type::TypeId::DECIMAL,
internal_lang, "Ceil",
function::BuiltInFuncType{OperatorId::Ceil,
function::DecimalFunctions::_Ceil},
function::NumericFunctions::_Ceil},
txn);

AddBuiltinFunction(
"ceiling", {type::TypeId::INTEGER}, type::TypeId::DECIMAL,
internal_lang, "Ceil",
function::BuiltInFuncType{OperatorId::Ceil,
function::DecimalFunctions::_Ceil},
function::NumericFunctions::_Ceil},
txn);

AddBuiltinFunction(
"ceiling", {type::TypeId::BIGINT}, type::TypeId::DECIMAL,
internal_lang, "Ceil",
function::BuiltInFuncType{OperatorId::Ceil,
function::DecimalFunctions::_Ceil},
function::NumericFunctions::_Ceil},
txn);

/**
9 changes: 9 additions & 0 deletions src/codegen/buffering_consumer.cpp
@@ -40,6 +40,15 @@ WrappedTuple &WrappedTuple::operator=(const WrappedTuple &o) {
return *this;
}

std::string WrappedTuple::ToCSV() const {
std::string ret;
for (uint32_t i = 0; i < tuple_.size(); i++) {
if (i != 0) ret.append(",");
ret.append(tuple_[i].ToString());
Member
The problem with this is that we are not escaping commas. Is there a CSV library that we could use?

Member Author
This is just a utility function to pretty print tuples for debugging purposes. It isn't used in execution.

}
return ret;
}
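The review thread above notes that ToCSV() does not escape commas, and the author replies that it is only a debugging aid. If escaping were ever needed, it could look roughly like the sketch below, which assumes RFC 4180-style quoting. `EscapeCsvField` and `ToCsvLine` are hypothetical names, and the sketch works on plain strings rather than on the PR's WrappedTuple values.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the PR): quote one field using
// RFC 4180-style rules.
std::string EscapeCsvField(const std::string &field) {
  // Quoting is only needed when the field contains a delimiter, a quote,
  // or a line break.
  if (field.find_first_of(",\"\r\n") == std::string::npos) {
    return field;
  }
  std::string out = "\"";
  for (char c : field) {
    if (c == '"') out.push_back('"');  // an embedded quote is doubled
    out.push_back(c);
  }
  out.push_back('"');
  return out;
}

// Hypothetical stand-in for WrappedTuple::ToCSV(): joins already-stringified
// values with commas, escaping each one first.
std::string ToCsvLine(const std::vector<std::string> &fields) {
  std::string ret;
  for (std::size_t i = 0; i < fields.size(); i++) {
    if (i != 0) ret.append(",");
    ret.append(EscapeCsvField(fields[i]));
  }
  return ret;
}
```

With this approach a value such as `b,c` is emitted as `"b,c"`, so a reader splitting on commas outside quotes recovers the original field.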

//===----------------------------------------------------------------------===//
// BufferTuple() Proxy
//===----------------------------------------------------------------------===//
91 changes: 73 additions & 18 deletions src/codegen/codegen.cpp
@@ -59,12 +59,30 @@ llvm::Constant *CodeGen::ConstDouble(double val) const {
return llvm::ConstantFP::get(DoubleType(), val);
}

llvm::Constant *CodeGen::ConstString(const std::string &s) const {
llvm::Value *CodeGen::ConstString(const std::string &str_val,
const std::string &name) const {
// Strings are treated as arrays of bytes
auto *str = llvm::ConstantDataArray::getString(GetContext(), s);
return new llvm::GlobalVariable(GetModule(), str->getType(), true,
llvm::GlobalValue::InternalLinkage, str,
"str");
auto *str = llvm::ConstantDataArray::getString(GetContext(), str_val);
auto *global_var =
new llvm::GlobalVariable(GetModule(), str->getType(), true,
llvm::GlobalValue::InternalLinkage, str, name);
return GetBuilder().CreateInBoundsGEP(global_var, {Const32(0), Const32(0)});
}

llvm::Value *CodeGen::ConstGenericBytes(const void *data, uint32_t length,
const std::string &name) const {
// Create the constant data array that wraps the input data
llvm::ArrayRef<uint8_t> elements{reinterpret_cast<const uint8_t *>(data),
length};
auto *arr = llvm::ConstantDataArray::get(GetContext(), elements);

// Create a global variable for the data
auto *global_var =
new llvm::GlobalVariable(GetModule(), arr->getType(), true,
llvm::GlobalValue::InternalLinkage, arr, name);

// Return a pointer to the first element
return GetBuilder().CreateInBoundsGEP(global_var, {Const32(0), Const32(0)});
}

llvm::Constant *CodeGen::Null(llvm::Type *type) const {
@@ -75,11 +93,6 @@ llvm::Constant *CodeGen::NullPtr(llvm::PointerType *type) const {
return llvm::ConstantPointerNull::get(type);
}

llvm::Value *CodeGen::ConstStringPtr(const std::string &s) const {
auto &ir_builder = GetBuilder();
return ir_builder.CreateConstInBoundsGEP2_32(nullptr, ConstString(s), 0, 0);
}

llvm::Value *CodeGen::AllocateVariable(llvm::Type *type,
const std::string &name) {
// To allocate a variable, a function must be under construction
@@ -135,26 +148,68 @@ llvm::Value *CodeGen::CallFunc(llvm::Value *fn,
return GetBuilder().CreateCall(fn, args);
}

llvm::Value *CodeGen::CallPrintf(const std::string &format,
const std::vector<llvm::Value *> &args) {
llvm::Value *CodeGen::Printf(const std::string &format,
const std::vector<llvm::Value *> &args) {
auto *printf_fn = LookupBuiltin("printf");
if (printf_fn == nullptr) {
#if GCC_AT_LEAST_6
Member
Can you add a comment to explain what this does?

Member Author
I've added a short comment explaining this.

TL;DR: GCC 6+ complains when function attributes (e.g., nothrow, nonnull) are dropped. We use decltype to get the signature of some functions and don't need the attributes, so they are discarded. This fails compilation, so we ignore the warning for this section of code.

// In newer GCC versions (i.e., GCC 6+), function attributes are part of the
// type system and are attached to the function signature. For example, printf()
// comes with the "noexcept" attribute. Moreover, GCC 6+ will complain when
// attributes attached to a function (e.g., noexcept()) are not used at
// their call-site. Below, we use decltype(printf) to get the C/C++ function
// type of printf(...), but we discard the attributes since we don't need
// them. Hence, on GCC 6+, compilation will fail unless the
// "-Wignored-attributes" warning is suppressed. So, we suppress it here only.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wignored-attributes"
#endif
printf_fn = RegisterBuiltin(
"printf", llvm::TypeBuilder<int(char *, ...), false>::get(GetContext()),
"printf", llvm::TypeBuilder<decltype(printf), false>::get(GetContext()),
reinterpret_cast<void *>(printf));
#if GCC_AT_LEAST_6
#pragma GCC diagnostic pop
#endif
}
auto &ir_builder = code_context_.GetBuilder();
auto *format_str =
ir_builder.CreateGEP(ConstString(format), {Const32(0), Const32(0)});

// Collect all the arguments into a vector
std::vector<llvm::Value *> printf_args{format_str};
std::vector<llvm::Value *> printf_args = {ConstString(format, "format")};
printf_args.insert(printf_args.end(), args.begin(), args.end());

// Call the function
// Call printf()
return CallFunc(printf_fn, printf_args);
}

llvm::Value *CodeGen::Memcmp(llvm::Value *ptr1, llvm::Value *ptr2,
llvm::Value *len) {
static constexpr char kMemcmpFnName[] = "memcmp";
auto *memcmp_fn = LookupBuiltin(kMemcmpFnName);
if (memcmp_fn == nullptr) {
#if GCC_AT_LEAST_6
Member
Again, please explain why we need this.

Member Author
I've added a short comment explaining this.

TL;DR: GCC 6+ complains when function attributes (e.g., nothrow, nonnull) are dropped. We use decltype to get the signature of some functions and don't need the attributes, so they are discarded. This fails compilation, so we ignore the warning for this section of code.

// In newer GCC versions (i.e., GCC 6+), function attributes are part of the
// type system and are attached to the function signature. For example, memcmp()
// comes with the "throw()" attribute, among many others. Moreover, GCC 6+ will
// complain when attributes attached to a function are not used at their
// call-site. Below, we use decltype(memcmp) to get the C/C++ function type
// of memcmp(...), but we discard the attributes since we don't need them.
// Hence, on GCC 6+, compilation will fail unless the
// "-Wignored-attributes" warning is suppressed. So, we suppress it here only.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wignored-attributes"
#endif
memcmp_fn = RegisterBuiltin(
kMemcmpFnName,
llvm::TypeBuilder<decltype(memcmp), false>::get(GetContext()),
reinterpret_cast<void *>(memcmp));
#if GCC_AT_LEAST_6
#pragma GCC diagnostic pop
#endif
}

// Call memcmp()
return CallFunc(memcmp_fn, {ptr1, ptr2, len});
}
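The scoped-suppression pattern discussed in the two review threads above can be shown without LLVM. The sketch below is a standalone illustration under stated assumptions: `EXAMPLE_GCC_AT_LEAST_6` stands in for the PR's GCC_AT_LEAST_6 macro (whose definition is not shown in this diff), and `FnHolder` is a hypothetical template playing the role of llvm::TypeBuilder. Whether the warning actually fires depends on the compiler and C library in use.

```cpp
#include <cstring>

// Hypothetical version check standing in for the PR's GCC_AT_LEAST_6 macro.
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ >= 6)
#define EXAMPLE_GCC_AT_LEAST_6 1
#else
#define EXAMPLE_GCC_AT_LEAST_6 0
#endif

// Hypothetical wrapper template; plays the role of llvm::TypeBuilder<...>
// by taking a function type as a template argument.
template <typename FnType>
struct FnHolder {
  FnType *fn;
};

// Save the current diagnostic state, then silence the one warning that
// passing an attributed function type as a template argument may trigger.
#if EXAMPLE_GCC_AT_LEAST_6
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wignored-attributes"
#endif

// decltype(std::memcmp) may carry attributes (e.g., nonnull, nothrow) that
// are dropped when the type is used as a template argument.
inline FnHolder<decltype(std::memcmp)> MakeMemcmpHolder() {
  return FnHolder<decltype(std::memcmp)>{&std::memcmp};
}

// Restore the previous diagnostic state so the rest of the translation unit
// is still checked with the warning enabled.
#if EXAMPLE_GCC_AT_LEAST_6
#pragma GCC diagnostic pop
#endif
```

Wrapping only the affected declaration in push/pop keeps the suppression local, which matches the "we ignore this warning for this section of code" intent stated in the review reply.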

llvm::Value *CodeGen::Sqrt(llvm::Value *val) {
llvm::Function *sqrt_func = llvm::Intrinsic::getDeclaration(
&GetModule(), llvm::Intrinsic::sqrt, val->getType());