Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for tagging mechanism #79

Merged
merged 18 commits into from
Dec 13, 2023
Merged

Proposal for tagging mechanism #79

merged 18 commits into from
Dec 13, 2023

Conversation

jiegillet
Copy link
Contributor

Dear @mpizenberg, @ceddlyburge and @ErikSchierboom,

Following our (fun and productive) discussion about tags, I came up with a prototype for the tagging mechanism, using elm-review's data extractors, which seems like a perfect fit, I have been able to add some tags very quickly.

Some comments:

  • All of the tagging logic is in src/Tags.elm. If you are not familiar with elm-review, you can ignore the boilerplate, the core function is matchExpression and is easy to understand.
  • For now, I have added random tags, some because they were easy, and because they were a bit more subtle (LetExpression, multiline strings).
  • There is one rule commonTagsRule that always returns the same tags commonTags for any project.
  • For more complex traversals, for things like recursion, I would recommend adding a separate rule.
  • I don't know if we want to have more structure around the rules, for now one rule for each type of elm-syntax type (project, module, expressions, function declarations...) seems the most natural fit.
  • I have been testing every case so far, and I have found bugs thanks to it
  • I haven't added smoke tests for now, but it will be simple, we can do it once we have more tags in, they will be easy to add.- @ErikSchierboom I'm not sure if it can go live or not in this incomplete state. Is there a mechanism for going live? The tags.json files will be created from this PR on anyway.

I'd be happy to hear feedback about any of it.

@jiegillet jiegillet requested a review from a team as a code owner November 23, 2023 10:28
src/Tags.elm Outdated Show resolved Hide resolved
Copy link
Member

@ErikSchierboom ErikSchierboom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't added smoke tests for now, but it will be simple, we can do it once we have more tags in, they will be easy to add.- @ErikSchierboom I'm not sure if it can go live or not in this incomplete state. Is there a mechanism for going live? The tags.json files will be created from this PR on anyway.

The mechanism for going live is: merge this PR and then it will deploy. There is no on/off switch, which is also why I would like to see some smoke tests. An alternative would be to don't yet output the tags.json file and put it behind a feature flag. Ideally we would not have to re-run the analyzer on all solutions multiple times.

@mpizenberg
Copy link
Member

This sounds like a solid starting point for all things that do not need more-than-one-node analysis!
I was wondering if uses:prefix-operator could be a tag? seems like something we might want to know (and forbid ahah, at least for operators that are not symmetric).

@jiegillet
Copy link
Contributor Author

jiegillet commented Nov 24, 2023

An alternative would be to don't yet output the tags.json file and put it behind a feature flag. Ideally we would not have to re-run the analyzer on all solutions multiple times.

@ErikSchierboom How would a feature flag work?

This sounds like a solid starting point for all things that do not need more-than-one-node analysis! I was wondering if uses:prefix-operator could be a tag? seems like something we might want to know (and forbid ahah, at least for operators that are not symmetric).

I actually added that very tag on the spreadsheet, but I forgot to include it in the code :)

Copy link
Contributor

@ceddlyburge ceddlyburge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Its a stunningly small amount of code to create the tags :)

It seems like we need to think about what we need to do to release this.

Here is my first pass at it:

  • Make sure that all the tags we want are created
  • Write at least one smoke test, but probably a few more, probably for some of the more common exercises / solutions (definitely twofer!)
  • Ideally do a canary release, just using it for a small subset of solutions (maybe just solutions done by the maintainters or something like that). Not sure how easy / possible this is from an exercism point of view though.
  • Maybe have some monitoring / logging for a while at the start, so we can keep an eye on it. Not sure what this would entail, maybe it could use the AI tags as well and flag if anything is very different or something like that. Not totally sure this is worth it, but would provide more confidence.

src/Tags.elm Outdated

type alias ModuleContext =
{ lookupTable : ModuleNameLookupTable
, tags : Set Tag
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be ProjectContext?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmh that's an interesting question. I would say no, because conceptually I'm not trying to hold the ProjectContext inside of the ModuleContext, the ProjectContext happens to only have tags in it, but it could potentially need more things.
I think a satisfying solution would be to define type alias ProjectContext = { tags: Set Tag }.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to derail the conv so I said nothing before, but now that's on the table ... I tend to forbid all type alias that are trivial renames in my code bases. I always prefer having something like { context : Set Tag } instead of type alias Context = Set Tag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you do for type alias Tag = String?
It really does make the code easier to read for me, and wrapping would be quite noisy. Would you wrap it anyway? Or give up and use String?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-- I'd change the following
type alias Tag = String
type alias ProjectContext = Set Tag
type alias ModuleContext = { ..., tags : Set Tag }
commonTags : Set Tag
fromProjectToModule : Rule.ContextCreator ProjectContext ModuleContext
fromModuleToProject : Rule.ContextCreator ModuleContext ProjectContext
dataExtractor : ProjectContext -> Value
matchExpression : Node Expression -> Set Tag

-- Into the following (no Tag alias, no ProjectContext alias)
type alias ModuleContext = { ..., tags : Set String }
commonTags : Set String
fromProjectToModule : Rule.ContextCreator { tags : Set String } ModuleContext
fromModuleToProject : Rule.ContextCreator ModuleContext { tags : Set String }
dataExtractor : { tags : Set String } -> Value
-- also changed the name here
tagMatchExpression : Node Expression -> Set String


ruleConfig : RuleConfig
ruleConfig =
{ restrictToFiles = Nothing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this include any template type files that are provided along with the solution?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it will, that's a good point, the data extractor will run on the whole project (it can even look at the elm.json if we want).
Do we want to skip the support files? If we do, we should enforce a convention on their names (I think we already do that, but it should be in the CI).

@ErikSchierboom what does the C# tagger do when it comes to solutions with editor files in it? Skip it? Include it?

You just made me realize that this will probably tag the tests as well, which I guess we don't want, so I'll try to avoid that.

Adding file names in the field restrictToFiles won't work though, because it's eventually fed Rule.filterErrorsForFiles which I think only works for errors, not for data extractions. I should rename it to reflect that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ErikSchierboom what does the C# tagger do when it comes to solutions with editor files in it? Skip it? Include it?

I skip editor files (and example, exemplar, test and invalidator files)

FunctionOrValue [ "Regex" ] _ ->
Set.singleton "technique:regular-expression"

FunctionOrValue [ "Debug" ] _ ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we forbid this in the Analyser elsewhere, as part of the common rules? Might be worth having here just in case though.

Copy link
Contributor Author

@jiegillet jiegillet Nov 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That rule is Actionable, so I think it's not blocking students from publishing a solution with a debug anyway. Also solutions published before the rule would have them.

BTW, that just made me think of adding a Debug concept, that might be worth doing.

@jiegillet
Copy link
Contributor Author

jiegillet commented Nov 25, 2023

Thank you for the feedback so far. I made some changes and I added a bunch of tags, but it's still WIP.
Could I ask for your help on some of the decisions? On the spreadsheet I marked some questions in orange wherever I couldn't make a clear decision, please add some decisive comments on there so we can make progress :)

@jiegillet jiegillet marked this pull request as draft November 25, 2023 14:30
@ErikSchierboom
Copy link
Member

Ideally do a canary release, just using it for a small subset of solutions (maybe just solutions done by the maintainters or something like that). Not sure how easy / possible this is from an exercism point of view though.

We don't yet have anything like this and I wouldnt' really want to start adding exceptions, sorry.

Maybe have some monitoring / logging for a while at the start, so we can keep an eye on it. Not sure what this would entail, maybe it could use the AI tags as well and flag if anything is very different or something like that. Not totally sure this is worth it, but would provide more confidence.

This sounds somewhat complicated. Personally, having a nice set of smoke tests should give us enough confidence that this is working.

@ErikSchierboom
Copy link
Member

@ErikSchierboom How would a feature flag work?

Well, the thing I would like to enable is for you all to continue working on adding more tags whilst still having CI verify the tags output. So maybe a docker build arg that is off by default?

@jiegillet jiegillet marked this pull request as ready for review December 9, 2023 02:17
@jiegillet jiegillet added the x:action/create Work on something from scratch label Dec 9, 2023
@jiegillet jiegillet added x:type/coding Write code that is not student-facing content (e.g. test-runners, generators, but not exercises) x:size/massive Massive amount of work labels Dec 9, 2023
@jiegillet
Copy link
Contributor Author

jiegillet commented Dec 9, 2023

I know I won't have a lot of free time in the next few weeks, so I'll stop working on this PR further and submit it as it is, it's already very big anyway. Some notes:

  • I have added a feature flag, bin/run.sh slug input output --tags. If the 4th argument is missing or anything other than --tags then the tags.json won't be written. run-in-docker.sh doesn't have it and run-tests-in-docker.sh has it.
  • I added smoke tests for tags for the existing test data we have, but it would be nice to add more specific ones later.
  • There is still a number of tags not implemented, that can be done later in smaller chunks
  • If you want to add a tag or something, add it on the spreadsheet
  • I will open an issue with a roadmap for deploying the tags feature
  • @ErikSchierboom I have asked for a review from you, obviously you don't need to check the Elm code, but please check the scripts

@jiegillet jiegillet mentioned this pull request Dec 9, 2023
5 tasks
Copy link
Contributor

@ceddlyburge ceddlyburge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jeremie, these latest changes look good to me.

Copy link
Member

@ErikSchierboom ErikSchierboom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Awesome work

@jiegillet jiegillet merged commit 7db1b85 into main Dec 13, 2023
4 checks passed
@jiegillet jiegillet deleted the jie-tags branch December 13, 2023 00:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
x:action/create Work on something from scratch x:size/massive Massive amount of work x:type/coding Write code that is not student-facing content (e.g. test-runners, generators, but not exercises)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants