-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ir: rethink sumtypes to allow for user-definable types #59
Comments
While here, we should also add remaining test cases to ensure that sumtypes are implemented and enforced. e.g. // Assert that each constant implements the constant.Constant interface.
var _ constant.Constant = (*Global)(nil) |
I don't have a strong opinion on this. I can see the benefit of "locking down an API to only hold objects you exactly know what they are". But the alternative view in my mind is that you don't care what the object is, but only the interface it satisfies. If someone is willing to pass an object that satisfies the interface, I don't immediately see a strong reason to disallow it. Someone also recently pointed out that you can actually defeat this by embedding anyway! https://news.ycombinator.com/item?id=19347281 The context they brought it up was that this technique is actually used in go SSA. |
The benefit is unquestionably stability. However, with stability comes limitation, just like in life :p
The main reason I can think of is that there are definitely parts of the For instance, let say someone defines their own Line 560 in fe7863f
However, we also want to allow such use cases as they are very powerful. For instance, it is easy to envision a user-defined LLVM IR type for structures with named fields. package foolang
type StructType struct {
FieldNames []string
*types.StructType
} I really wish we can find a way to fix such fragile code, so that we may allow users to define their own LLVM IR types, values, etc. I think it is good to enable using the IR package in ways not envisioned by the authors of the IR package. Essentially following the principles of Unix.
That's a fair point. |
Well the immediate thing that strikes me is that wherever we do a type switch maybe that's a hint that you could perhaps make it an interface instead (or type switching over interfaces), perhaps, for example on types that have an "Elem()" method. I think in general if the user wants to implement their own objects they have to accept that things might break. At least this kind of breakage will tend to be fairly obvious! |
I stumbled upon sumgen, and it may be worth investigating whether we should use such a tool to generate the dummy method definitions for sumtypes. Perhaps it's worth it, perhaps not. Regardless, it would be interesting to explore. |
There were two main changes of llir/llvm that required updates in Geode, namely: 1) The sanity type-checks proposed in llir/llvm#65 and added in PR llir/llvm#72. 2) The API update of NewLoad, NewStore, NewGetElementPtr which added a elemType argument in preparation for opaque pointer types of the official LLVM project (see llir/llvm#112). To resolve 1), an Equal method was added to StructType and SliceType of gtypes. To resolve 2) the element type was passed as the first argument to NewLoad, NewStore and NewGetElementPtr. Note, we've definitely noticed that working with user-defined types is difficult and does not work well with the extensive type-switching used in llir/llvm. The user-defined types (structs with names and slices) of Geode have been very helpful to get a better understanding of this limitation of llir/llvm. This has been outlined in llir/llvm#59 (comment) and we hope to get a better design for sumtypes to allow user-defined types, values, constants, instructions, etc. One step at the time :) I'm still very happy that you started the Geode project Nick as I've learnt a lot from it and continue to learn more. Wish you all the best and a most lovely start of the new year. Cheers, Robin
Having worked a bit on getting Geode up to speed with using v0.3.0 (in geode-lang/geode#27), I'm quite convinced that we should work on getting sumtypes right for the v0.4.0 release, to allow for user-defined instructions (e.g. line comments), types (e.g. structs with field names), constants, etc. Just putting this here, so we can prioritize working on this issue early on as it could have quite invasive API changes. |
This comment has been minimized.
This comment has been minimized.
Hi @dannypsnl, Thanks for starting to take a look at this! I agree, we have to figure out what kind of interfaces we need to define. I think the list of usages you produced is very interesting, as from there we should be able to infer common behaviour that needs an interface definition. In the case of type Elemer interface {
Elem() types.Type
} This would handle Note that most of the code is used to handle
|
When I implementing // NOTE: keep getIndex in sync with getIndex in:
//
// * ast/inst_memory.go
// * ir/inst_memory.go
// * ir/constant/expr_memory.go
//
// The reference point and source of truth is in ir/constant/expr_memory.go. code changesdiff --git a/ir/constant/const_indexer.go b/ir/constant/const_indexer.go
new file mode 100644
index 0000000..0f54aee
--- /dev/null
+++ b/ir/constant/const_indexer.go
@@ -0,0 +1,64 @@
+package constant
+
+import (
+ "fmt"
+
+ "github.com/llir/llvm/internal/gep"
+)
+
+type Indexer interface {
+ GetIndex() gep.Index
+}
+
+func (i *Int) GetIndex() gep.Index {
+ return gep.NewIndex(i.X.Int64())
+}
+
+func (z *ZeroInitializer) GetIndex() gep.Index {
+ return gep.NewIndex(0)
+}
+
+func (u *Undef) GetIndex() gep.Index {
+ return gep.Index{HasVal: false}
+}
+
+func (index *Vector) GetIndex() gep.Index {
+ // ref: https://llvm.org/docs/LangRef.html#getelementptr-instruction
+ //
+ // > The getelementptr returns a vector of pointers, instead of a single
+ // > address, when one or more of its arguments is a vector. In such
+ // > cases, all vector arguments should have the same number of elements,
+ // > and every scalar argument will be effectively broadcast into a vector
+ // > during address calculation.
+ if len(index.Elems) == 0 {
+ return gep.Index{HasVal: false}
+ }
+ // Sanity check. All vector elements must be integers, and must have the
+ // same value.
+ var val int64
+ for i, elem := range index.Elems {
+ switch elem := elem.(type) {
+ case *Int:
+ x := elem.X.Int64()
+ if i == 0 {
+ val = x
+ } else if x != val {
+ // since all elements were not identical, we must conclude that
+ // the index vector does not have a concrete value.
+ return gep.Index{
+ HasVal: false,
+ VectorLen: uint64(len(index.Elems)),
+ }
+ }
+ default:
+ // TODO: remove debug output.
+ panic(fmt.Errorf("support for gep index vector element type %T not yet implemented", elem))
+ //return gep.Index{HasVal: false}
+ }
+ }
+ return gep.Index{
+ HasVal: true,
+ Val: val,
+ VectorLen: uint64(len(index.Elems)),
+ }
+}
diff --git a/ir/constant/expr_memory.go b/ir/constant/expr_memory.go
index 392c5f8..be2b73d 100644
--- a/ir/constant/expr_memory.go
+++ b/ir/constant/expr_memory.go
@@ -116,7 +116,7 @@ func (index *Index) String() string {
func gepExprType(elemType, src types.Type, indices []Constant) types.Type {
var idxs []gep.Index
for _, index := range indices {
- idx := getIndex(index)
+ idx := GetIndex(index)
// Check if index is of vector type.
if indexType, ok := index.Type().(*types.VectorType); ok {
idx.VectorLen = indexType.Len
@@ -126,7 +126,7 @@ func gepExprType(elemType, src types.Type, indices []Constant) types.Type {
return gep.ResultType(elemType, src, idxs)
}
-// NOTE: keep getIndex in sync with getIndex in:
+// NOTE: keep GetIndex in sync with GetIndex in:
//
// * ast/inst_memory.go
// * ir/inst_memory.go
@@ -134,8 +134,8 @@ func gepExprType(elemType, src types.Type, indices []Constant) types.Type {
//
// The reference point and source of truth is in ir/constant/expr_memory.go.
-// getIndex returns the gep index corresponding to the given constant index.
-func getIndex(index Constant) gep.Index {
+// GetIndex returns the gep index corresponding to the given constant index.
+func GetIndex(index Constant) gep.Index {
// unpack inrange indices.
if idx, ok := index.(*Index); ok {
index = idx.Constant
@@ -146,52 +146,8 @@ func getIndex(index Constant) gep.Index {
index = idx.Simplify()
}
switch index := index.(type) {
- case *Int:
- val := index.X.Int64()
- return gep.NewIndex(val)
- case *ZeroInitializer:
- return gep.NewIndex(0)
- case *Vector:
- // ref: https://llvm.org/docs/LangRef.html#getelementptr-instruction
- //
- // > The getelementptr returns a vector of pointers, instead of a single
- // > address, when one or more of its arguments is a vector. In such
- // > cases, all vector arguments should have the same number of elements,
- // > and every scalar argument will be effectively broadcast into a vector
- // > during address calculation.
- if len(index.Elems) == 0 {
- return gep.Index{HasVal: false}
- }
- // Sanity check. All vector elements must be integers, and must have the
- // same value.
- var val int64
- for i, elem := range index.Elems {
- switch elem := elem.(type) {
- case *Int:
- x := elem.X.Int64()
- if i == 0 {
- val = x
- } else if x != val {
- // since all elements were not identical, we must conclude that
- // the index vector does not have a concrete value.
- return gep.Index{
- HasVal: false,
- VectorLen: uint64(len(index.Elems)),
- }
- }
- default:
- // TODO: remove debug output.
- panic(fmt.Errorf("support for gep index vector element type %T not yet implemented", elem))
- //return gep.Index{HasVal: false}
- }
- }
- return gep.Index{
- HasVal: true,
- Val: val,
- VectorLen: uint64(len(index.Elems)),
- }
- case *Undef:
- return gep.Index{HasVal: false}
+ case Indexer:
+ return index.GetIndex()
case Expression:
// should already have been simplified to a form we can handle.
return gep.Index{HasVal: false}
diff --git a/ir/inst_memory.go b/ir/inst_memory.go
index f0aef35..0a1e90a 100644
--- a/ir/inst_memory.go
+++ b/ir/inst_memory.go
@@ -538,69 +538,5 @@ func gepInstType(elemType, src types.Type, indices []value.Value) types.Type {
// getIndex returns the gep index corresponding to the given constant index.
func getIndex(index constant.Constant) gep.Index {
- // unpack inrange indices.
- if idx, ok := index.(*constant.Index); ok {
- index = idx.Constant
- }
- // Use index.Simplify() to simplify the constant expression to a concrete
- // integer constant or vector of integers constant.
- if idx, ok := index.(constant.Expression); ok {
- index = idx.Simplify()
- }
- switch index := index.(type) {
- case *constant.Int:
- val := index.X.Int64()
- return gep.NewIndex(val)
- case *constant.ZeroInitializer:
- return gep.NewIndex(0)
- case *constant.Vector:
- // ref: https://llvm.org/docs/LangRef.html#getelementptr-instruction
- //
- // > The getelementptr returns a vector of pointers, instead of a single
- // > address, when one or more of its arguments is a vector. In such
- // > cases, all vector arguments should have the same number of elements,
- // > and every scalar argument will be effectively broadcast into a vector
- // > during address calculation.
- if len(index.Elems) == 0 {
- return gep.Index{HasVal: false}
- }
- // Sanity check. All vector elements must be integers, and must have the
- // same value.
- var val int64
- for i, elem := range index.Elems {
- switch elem := elem.(type) {
- case *constant.Int:
- x := elem.X.Int64()
- if i == 0 {
- val = x
- } else if x != val {
- // since all elements were not identical, we must conclude that
- // the index vector does not have a concrete value.
- return gep.Index{
- HasVal: false,
- VectorLen: uint64(len(index.Elems)),
- }
- }
- default:
- // TODO: remove debug output.
- panic(fmt.Errorf("support for gep index vector element type %T not yet implemented", elem))
- //return gep.Index{HasVal: false}
- }
- }
- return gep.Index{
- HasVal: true,
- Val: val,
- VectorLen: uint64(len(index.Elems)),
- }
- case *constant.Undef:
- return gep.Index{HasVal: false}
- case constant.Expression:
- // should already have been simplified to a form we can handle.
- return gep.Index{HasVal: false}
- default:
- // TODO: add support for more constant expressions.
- // TODO: remove debug output.
- panic(fmt.Errorf("support for gep index type %T not yet implemented", index))
- //return gep.Index{HasVal: false}
- }
+ return constant.GetIndex(index)
} |
BTW, in document mentions that pointer could be an integer in some cases. This probably a good reason to unify them by an interface.
|
Fair point, this would work. I think the reason is historical, as before the type of a call instruction could be either the return type of the callee or the function type of the callee. This was needed to handle variadic functions, but as of rev c56773e this logic was simplified. As such, we can definitely use your suggested approach instead; that is to just use |
This is one small step towards allowing user-defined instructions. Note, this commit is not perfect though, since we may want to allow user-defined types too, and this commit relies on types.Void instead of some to-be-defined IsVoidType interface (or something along those lines). Updates #59.
I have another idea: What if we make every IR structure became an interface. Therefore type Function interface {
// ...
}
var _ Function = &function{}
type function struct {
// ...
} Anyone want to extend it would be able to: type Class struct {
ir.Function
ir.Struct
}
func NewClass() *Class {
return &Class {
Function: ir.NewFunction(...),
Struct: ir.NewStruct(...),
}
} The benefit I can see at here, is users don't need to learn advanced concept like BTW, this kind of implementation also using in Kubernetes. |
I like the idea! We should definitely create an experimental branch of The main downside I see is that the fields would no longer be exported, so we would have to create a lot of getters and setters, and this would make the API quite bloated. Besides that, the extensibility of the sumtypes would be hugely improved. So there are definitely benefits and drawbacks. So, to get a better understanding of how to balance this, we should implement it in an experimental branch and get more experience. Cheers, |
This comment has been minimized.
This comment has been minimized.
Thanks for starting to experiment with this! As you already pointed out, there are benefits and drawbacks with the interface-everything apprach. The benefits of such extensive use of interfaces is that we would enable users to extend the IR in almost every way. Whether we need this full power of extensibility is yet to be determined, thus great to have more concrete experiments to base such decisions on. For now, lets consider some of the drawbacks with the current experimental implementation. One of the drawbacks to the API seems to be the handling of slices (as you already pointed out in https://github.com/dannypsnl/llvm/commit/ac2baf5820d9c13291f0be568546aba18fceac9e#r37512892). E.e. f.SetFuncAttrs(append(f.GetFuncAttrs(), funcAttr)) The other issue is that type assertions on interfaces may not be enough to fully and uniquely discriminate them. The basic issue is that an interface with just getters/setters for its fields may not be enough to fully determine a unique concrete underlying type, e.g. Cheers, Edit: one possible outcome that we see from this experiment is that some IR concepts may be turned into interfaces to allow user-defined types, but others may be kept as core IR concepts with concrete struct types. Rather than going for a purist approach of having everything be an interface, it may turn out more pragmatic to have the concepts of the IR which are most likely for a user to modify and extend be turned into interfaces. For this, instructions is the obvious contender. And also the LLVM IR defined types. One such example I've wanted personally is an LLVM IR struct type which also tracks the names of its fields. When it comes to instructions, the pseudo-instruction used for comments is also a common example. |
I suddenly realize we can embed |
Can you provide a quick example code which uses embedding of |
For example, we want to generate a class handle, in the meanwhile affect our declaration~~~ type EModule struct {
*ir.Module
}
func (m *EModule) NewClass() *Class {
return &Class {
Function: m.NewFunction(...),
Struct: m.NewStruct(...),
}
} or type EModule struct {
*ir.Module
}
type EStruct struct {
FieldNames []string
*types.StructType
}
func (m *EModule) NewStruct(fieldNames []string, fieldTypes []types.Type) *EStruct {
s := &EStruct {
StructType: m.NewStruct(...),
}
s.FieldNames = fieldNames
return s
} |
I understand your approach and do think it solves a lot of problems, but we still have some remaining issues unless we further refine how we handle sumtypes internally within the For instance, lets consider the user-defined The problem is that it will not work for some parts of our type checking code. Take for instance *types.StructType.Equal, which is implemented as follows: func (t *StructType) Equal(u Type) bool {
if u, ok := u.(*StructType); ok {
if len(t.TypeName) > 0 || len(u.TypeName) > 0 {
// Identified struct types are uniqued by type names, not by structural
// identity.
//
// t or u is an identified struct type.
return t.TypeName == u.TypeName
}
// Literal struct types are uniqued by structural identity.
if t.Packed != u.Packed {
return false
}
if len(t.Fields) != len(u.Fields) {
return false
}
for i := range t.Fields {
if !t.Fields[i].Equal(u.Fields[i]) {
return false
}
}
return true
}
return false
} Notice on line 788 the runtime type assertion: if u, ok := u.(*StructType); ok { If we pass a structType := types.NewStruct(types.I32, types.Float)
structTypeWithNames := &EStruct{
StructType: structType,
FieldNames: []string{"x", "y"},
}
structTypeWithNames.Equal(structType) // return true.
structType.Equal(structTypeWithNames) // return false. <<< NOTE: unexpected result.
structType.Equal(structTypeWithNames.StructType) // return true So, for these reasons, we still have to think about relaxing the internal code base of Cheers, |
Should we rethink sumtypes to allow for user-defined types?
For instance,
ir.Instruction
currently requires the unexportedisInstruction
method, but there are valid use cases where users may wish to define their own instructions to put in basic blocks.One such use case seen in the wild is a comment pseudo-instruction which prints itself as
; data...
, e.g.The text was updated successfully, but these errors were encountered: