To-do list #6

Open
wukefe opened this issue Jul 24, 2017 · 1 comment

wukefe commented Jul 24, 2017

Remaining tasks for the following weeks.

July 23, 2017

Adding more primitives

Although dozens of primitives have been added to the backend, a couple of important primitives are still missing. Most of them depend on efficient libraries; for example, like requires PCRE, a high-performance string-matching library.

  1. like: PCRE library, online
  2. BLAS and CLAPACK for complex numbers
  3. More....
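As a rough illustration of what like has to do, here is a minimal sketch that assumes like follows SQL LIKE semantics (% matches any run of characters, _ matches exactly one). It uses Python's built-in re module as a stand-in for PCRE; the helper names like_to_regex and like are hypothetical and not part of HorseIR.

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a compiled regular expression.

    Assumption: '%' matches any run of characters and '_' matches exactly one.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            # Escape everything else so characters such as '.' stay literal.
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def like(column, pattern):
    """Return one boolean per string in column: does it match the pattern?"""
    rx = like_to_regex(pattern)  # compile once, reuse for the whole column
    return [rx.fullmatch(s) is not None for s in column]

print(like(["apple", "ibm", "google"], "%le"))
```

Compiling the pattern once and applying it to the whole column mirrors how a column-oriented backend would amortize the setup cost of the matching library.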

Enum

An enumeration is a mapping from a primary-key column (unique values) to a foreign-key column. In the underlying implementation, it stores numeric indices rather than the actual values.

u <- `apple`ibm`google
v <- `google`google`apple`apple`ibm
e <- `u:v

The logical value of e is

`u:`google`google`apple`apple`ibm

while it keeps indices as follows

`u:2 2 0 0 1

Note: If a foreign-key column is replaced with an enumeration, it may sacrifice data locality when a query is merely within a table without any join operations. (Meeting with Bettina on July 28)
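The index encoding described in this section can be sketched as follows. This is only an illustration in Python, not the backend implementation; make_enum and enum_values are hypothetical helper names, and the sketch assumes every value of v appears in u.

```python
def make_enum(u, v):
    """Encode each value of v as its position in the unique key column u."""
    position = {key: i for i, key in enumerate(u)}
    # Assumption: every value in v exists in u (otherwise this raises KeyError).
    return [position[x] for x in v]

def enum_values(u, indices):
    """Recover the logical values from the stored indices."""
    return [u[i] for i in indices]

u = ["apple", "ibm", "google"]
v = ["google", "google", "apple", "apple", "ibm"]
e = make_enum(u, v)
print(e)  # [2, 2, 0, 0, 1]
```

Recovering the logical values needs an extra lookup into u, which is the indirection the note on data locality refers to.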

Null and Inf

The value of an item may be null (absence of a value) or inf (infinite). Such values can be used as operands in ordinary operations without causing a failure. We'd like a HorseIR program to keep running with maximum tolerance for these special values.

For example,

inf + 10 = inf
inf - 10 = inf
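A minimal sketch of this tolerant behavior in Python. The inf cases follow IEEE 754 floating-point arithmetic. How null should behave is not specified above, so the rule used here (a null operand yields a null result) is purely an assumption, and the function name add is hypothetical.

```python
import math

INF = math.inf

def add(a, b):
    """Add two items without raising on special values."""
    # Assumption: null (modelled as None) propagates to the result.
    if a is None or b is None:
        return None
    # IEEE 754: inf plus or minus any finite number is still inf.
    return a + b

print(add(INF, 10))   # inf
print(add(INF, -10))  # inf
print(add(None, 10))  # None
```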

System functions

A system function connects HorseIR to the outside environment. Its tasks include

  1. File I/O
  2. Output precision for floating-point numbers
  3. More...

Completing interpreter

We're working on the interpreter to execute HorseIR code with user-defined functions (UDFs). As part of a DB-related system, the interpreter is designed as an online system that processes requests as it receives them. We summarize the design goals of the interpreter as follows.

  1. Low-latency
  2. High-throughput (SIMD)
  3. Extensible compiler framework (for compiler optimizations)
  4. Able to run for a long time (with solid memory management)

Possible back-end tricks

  • A stride represents a consecutive range
  • Lazy evaluation (e.g. len(reverse(x)) and len(x) are the same)
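Both tricks can be sketched together. The classes below are hypothetical and only illustrate the idea: a Stride describes a consecutive range by its start and length instead of storing every element, and a lazy Reverse node answers len without ever building the reversed data.

```python
class Stride:
    """A consecutive integer range [start, start + length) in O(1) space."""

    def __init__(self, start, length):
        self.start = start
        self.length = length

    def __len__(self):
        return self.length

    def materialize(self):
        # Only here are the individual elements actually produced.
        return list(range(self.start, self.start + self.length))


class Reverse:
    """A lazy reverse: the work is deferred until materialize() is called."""

    def __init__(self, source):
        self.source = source
        self.materialized = False

    def __len__(self):
        # len(reverse(x)) equals len(x), so no reversal is needed.
        return len(self.source)

    def materialize(self):
        self.materialized = True
        return self.source.materialize()[::-1]


x = Stride(10, 5)
r = Reverse(x)
print(len(r), r.materialized)  # 5 False
```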

wukefe commented Jul 27, 2017

PCRE version 8.41.
| Download Source Code | Documents | Online Cheatsheet |

Mac OS

brew install pcre
