-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test Generation Post #9
Conversation
…markdown Date + advise to use latest nightly build
…markdown to 2023-11-01-automated-test-generation-chess-puzzles-with-dafny.markdown Rename file
All the associated PRs are now merged into Dafny so I think the blogpost is ready to merge as well! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This post is in good shape. I'm requesting that you provide a mechanism as described in the README so that we can test this blog post with any upcoming version of Dafny, and a few other reformulations.
date: 2023-11-01 18:00:00 +0100 | ||
author: Sasha Fedchin | ||
--- | ||
Even if you have successfully verified your code, you might still want to generate test cases, whether to ensure the verified properties hold at runtime once you link your binary against external libraries, or to compare your code to an existing reference implementation. Or perhaps you are about to embark on another proof and want to have some initial assurance that your yet-to-be-verified implementation is correct - again, runtime testing can help. In this blog post, I show how Dafny’s built-in automated test generation can be useful in these situations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The introduction is excellent for someone who already likes to verify their code;
That said, I envision that a lof of people in the audience might not be always familiar with Dafny, have heard of it, or are currently deciding whether to give it a try.
Could you modify slightly your introduction so that it addresses someone like that in mind? It could be something like
Given all the incredible examples that showcase how Dafny can help you verify your code on all its aspects (links to other posts), you might start to think that verification of programs can replace tests. Can you imagine a future where no tests are needed, because verification does all the job? This looks so promising! However if, like number of industrial Dafny users, you want to integrate Dafny code with existing codebases, you will face challenges that verification alone won't solve:
- If you haven't written high-level specifications, your code does not crash (thanks Dafny!) but you are not sure you covered all cases that your describe.
- You actually link to external code that you are not sure you trust, and you would like to test them by calling your Dafny code.
- Or your Dafny code is actually a reference implementation of another program and you want to compare both programs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for suggestion a new intro, I think it works much better! I have now included it in the blogpost with minor rewrites.
_posts/2023-11-01-automated-test-generation-chess-puzzles-with-dafny.markdown
Outdated
Show resolved
Hide resolved
_posts/2023-11-01-automated-test-generation-chess-puzzles-with-dafny.markdown
Outdated
Show resolved
Hide resolved
_posts/2023-11-01-automated-test-generation-chess-puzzles-with-dafny.markdown
Outdated
Show resolved
Hide resolved
_posts/2023-11-01-automated-test-generation-chess-puzzles-with-dafny.markdown
Outdated
Show resolved
Hide resolved
Describe(board); | ||
} | ||
} | ||
{% endhighlight %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, if you can confirm me that this code is run when I launch "make test", I'd feel more comfortable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now testing all examples in the blogpost in the CI
if CheckedByPlayer(board, whiteKing, Black) { | ||
print("White king is in check\n"); | ||
} else { | ||
print("White king is safe\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idiomatic use case of tests in Dafny is actually to write expect
statements that will test the behavior of your methods if they return something, not just print something to ensure that the program does not crash (which should not happen in pure Dafny, but does happen when interfacing with external code). Could you come up with one or more expect statements here that could be relevant? For example:
print("White king is safe\n"); | |
print("White king is safe\n"); | |
expect CheckmatedByPlayer(board, whiteKing, Black); |
It should not be possible to print "White kind is safe" followed by "Is is checkmate for white", but all other combinations are possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! I have added an expect statement and a paragraph about how they work:
Note that I also added an
expect
statement in the code. Anexpect
statement is checked to be true at runtime and triggers a runtime exception if the check fails. In our case, theexpect
statement is trivially true (a checkmate always implies a check) and every test we generate passes it. In fact, if you were to turn thisexpect
statement into an assertion, Dafny would prove it. Not everyexpect
statement can be proven, however, especially in the presence of external libraries. You can read more aboutexpect
statements here.
{% include test-generation-svgs/block-no-inlining/1.svg %} | ||
</center> | ||
|
||
The image on the right shows a checkmate and the image on the left corresponds to the absence of either a checkmate or a check. While the exact configurations will depend on the version of Dafny you are using, you will always get two tests and there will never be a test for the case in which the king is checked but can escape. This is because under the block coverage criteria, such a test is unnecessary - there is no one statement in the target method that this test would cover that the two existing ones do not. To expand the set of tests Dafny generates we can instead prompt it to target path-coverage - the most expensive form of coverage Dafny supports. The relevant command is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see two ifs in the test entry, so I understand two cases are necessary at minimum because otherwise there is at least one branch that would not be executed. However, it's not obvious that you need only two, especially if there are dependencies. Could you rephrase your paragraph so that you can give immediately the intuition about why two tests is sufficient, and why the two tests will always be no check and checkmate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have paraphrased the paragraph to say that Dafny could have generated three (or even more) tests but that it is not compelled to do so by the selected coverage criterion. Two tests are sufficient because they cover all statements within the test entry method (and because we are not inlining any other methods, so Dafny only cares about statements within the test entry method). It will always generate two tests in the particular case due to how test generation is optimized under the hood but it probably doesn't make sense to go into that in the blogpost.
_posts/2023-11-01-automated-test-generation-chess-puzzles-with-dafny.markdown
Outdated
Show resolved
Hide resolved
_posts/2023-11-01-automated-test-generation-chess-puzzles-with-dafny.markdown
Outdated
Show resolved
Hide resolved
Co-authored-by: Mikaël Mayer <[email protected]>
Co-authored-by: Mikaël Mayer <[email protected]>
…-dafny.markdown Co-authored-by: Mikaël Mayer <[email protected]>
Co-authored-by: Mikaël Mayer <[email protected]>
…-dafny.markdown Co-authored-by: Mikaël Mayer <[email protected]>
Thank you, @MikaelMayer, for all the feedback on this! I have incorporated all the comments into the text and created a testing pipeline for the blogpost, which should run in the GitHub CI. The CI runs all the Dafny commands in the blogposts and checks that: (i) tests are generated without errors, (ii) tests are executed without errors, and (iii) all Dafny sources, including those generated automatically, verify. The CI runs for a few minutes due to the final step which generates a lot of tests in the path mode (this could be easily removed, if necessary). The example in this blogpost is also tested separately within Dafny's own CI here. This other set of tests is modified slightly to make it run faster but it also checks the exact output of each test in more detail. On a separate note, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great post! Thanks for writing it, for the SVG generation and for the shiny new workflow!
date: 2023-11-10 11:00:00 -0500 | ||
author: Sasha Fedchin | ||
--- | ||
Given all the incredible examples that showcase how powerful Dafny is (did you know that part of Dafny is written in Dafny?), you might think that verification can fully replace testing. And in many cases it does! However if, like a number of industrial Dafny users, you want to integrate Dafny code with existing codebases, you will face challenges that verification alone might not solve: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all the incredible examples
which?
(did you know that part of Dafny is written in Dafny?)
Is this one of the incredible examples? The parenthesis and phrasing make it sound like a barely related "by the way"
you might think
This implies an expected lack of understanding by the reader. Consider only stating the facts.
And in many cases it does!
I would drop the "And" and the !
.
industrial Dafny users
I would refrain from dividing Dafny users into industrial and non-industrial. I suggest dropping "like a number of industrial Dafny users". I think your intention of that subsentence is to state that what you will describe is a proven use-case for at least some people, but I think the reader will mostly care whether this feature matches their use-case.
you will face challenges that verification alone might not solve
"will" followed by "might" seems strange. Apparently all the challenges you will face might only require verification. Consider changing to "You may face challenges that can not be solved using just verification:"
However, another problem with that sentence is that users will face many challenges, many of which don't relate to correctness at all, like getting the Dafny type checker to pass, learning Dafny, or needing sleep, so saying there are challenges not solved by verification seems obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for raising these issues! I have rewritten the first paragraph to take your suggestions into account, particularly by adding several references to previous blogposts to showcase the power of Dafny and clarify that the challenges I refer to specifically call to testing as an additional quality assurance technique on top of verification.
--- | ||
Given all the incredible examples that showcase how powerful Dafny is (did you know that part of Dafny is written in Dafny?), you might think that verification can fully replace testing. And in many cases it does! However if, like a number of industrial Dafny users, you want to integrate Dafny code with existing codebases, you will face challenges that verification alone might not solve: | ||
|
||
- Perhaps your code is actually a reference implementation for another program and you want to test the equivalence between the two. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This list of items was prefixed by "might", so I would leave out the conditionals here and drop "Perhaps", "might", etc, like the third item in the list does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
Given all the incredible examples that showcase how powerful Dafny is (did you know that part of Dafny is written in Dafny?), you might think that verification can fully replace testing. And in many cases it does! However if, like a number of industrial Dafny users, you want to integrate Dafny code with existing codebases, you will face challenges that verification alone might not solve: | ||
|
||
- Perhaps your code is actually a reference implementation for another program and you want to test the equivalence between the two. | ||
- You might be calling out to external libraries, which are not themselves verified, and would like to test them by calling your Dafny code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's external? I think you mean code that's not written in Dafny.
Consider something like "Through its foreign function interface, Dafny allows implementing specifications using languages other than Dafny. However, Dafny can not verify such external implementations. To still gain confidence in the correctness of these, you can use Dafny to write tests on them."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I think your suggestion makes the meaning much clearer, especially for people not familiar with Dafny, and I have adopted it in the post: "You are using Dafny's foreign function interface, which allows implementing specifications in languages other than Dafny. Dafny can not verify such external implementations, so you want to test them at runtime to still gain confidence in their correctness."
|
||
- Perhaps your code is actually a reference implementation for another program and you want to test the equivalence between the two. | ||
- You might be calling out to external libraries, which are not themselves verified, and would like to test them by calling your Dafny code. | ||
- You have written high-level specifications for your code and are about to embark on a proof but want to have some initial assurance that your yet-to-be-verified implementation does conform to these specifications. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have written high-level specifications
What do you mean by high-level?
specifications
Does my implementation have multiple specifications? Is one not enough?
You have written high-level specifications for your code
Is a specification not code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed this line to "You have written a specification for your program..." - I think this should make the phrasing more precise!
- You might be calling out to external libraries, which are not themselves verified, and would like to test them by calling your Dafny code. | ||
- You have written high-level specifications for your code and are about to embark on a proof but want to have some initial assurance that your yet-to-be-verified implementation does conform to these specifications. | ||
|
||
In this blog post, I show how Dafny’s built-in automated test generation can be useful in these situations. In the example I use, Dafny has to position pieces on a chess board in such a way that certain constraints are satisfied. This is exactly what I want to illustrate: chess boards will be generated tests, and constraints will be the program we want to test. A brute force enumeration or fuzzing might not be up to the task here, since the number of ways in which you can arrange pieces on the board is astronomically high, even under the constraints I add to the problem. Dafny, however, can use the verifier to generate tests and is, therefore, much more efficient in finding solutions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this blog post
Anywhere in this blog post? Consider being more specific: "In the following sections,"
Also, whether it's a blog post or a newspaper article doesn't seem relevant.
I show how
Maybe ignore this comment, but I have to get used to the usage of "I", although I remember we would use or allow "I" for the blog posts.
Looking at one of the other blog posts, like https://dafny.org/blog/2023/08/14/clear-specification-and-implementation/, they don't use "I" at all.
Consider using the text itself as the subject. "The following sections explain how Dafny’s built-in automated test generation can help with these scenarios"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have changed "In this blog, I..." to "The following sections explain".
I removed a few other uses of first person in the latest set of changes, but I think even amongst the already published blogpost, this is more a matter of preference, e.g. making verification compelling blogpost uses first person quite often.
|
||
In this blog post, I show how Dafny’s built-in automated test generation can be useful in these situations. In the example I use, Dafny has to position pieces on a chess board in such a way that certain constraints are satisfied. This is exactly what I want to illustrate: chess boards will be generated tests, and constraints will be the program we want to test. A brute force enumeration or fuzzing might not be up to the task here, since the number of ways in which you can arrange pieces on the board is astronomically high, even under the constraints I add to the problem. Dafny, however, can use the verifier to generate tests and is, therefore, much more efficient in finding solutions. | ||
|
||
The post is structured as follow: I will first outline a [subset of chess rules using Dafny](#1-modelling-chess-in-dafny). This code serves as a reference point throughout the post. I then demonstrate [the basics of test generation](#2-test-generation-basics) and discuss the different coverage metrics you can target. This will let me deal with [visualizing coverage and identifying dead code](#3-there-is-dead-code-here). Moving on to advanced features, I will offer a [broader discussion of quantifiers, loops and recursion](#sec-quantifiers-loops-and-recursion) — features that require special care when attempting to generate tests. Finally, this blogpost concludes with a [summary and general guidelines](#conclusions-and-best-practices) for how to apply this technique to your own Dafny programs. You can also find more information on automated test generation in the [Dafny reference manual](https://dafny.org/dafny/DafnyRef/DafnyRef#sec-dafny-generate-tests). If you want to try test generation on your own, I recommend using the latest stable Dafny nightly release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend using the latest stable Dafny nightly release
Can we make the text more future proof by providing some minimal version that's recommended, even if it not yet released? "I recommend using Dafny 4.4 or greater once it's released, and before that the Dafny nightly release"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense! Have added a recommendation to use Dafny 4.4 or above
|
||
In this blog post, I show how Dafny’s built-in automated test generation can be useful in these situations. In the example I use, Dafny has to position pieces on a chess board in such a way that certain constraints are satisfied. This is exactly what I want to illustrate: chess boards will be generated tests, and constraints will be the program we want to test. A brute force enumeration or fuzzing might not be up to the task here, since the number of ways in which you can arrange pieces on the board is astronomically high, even under the constraints I add to the problem. Dafny, however, can use the verifier to generate tests and is, therefore, much more efficient in finding solutions. | ||
|
||
The post is structured as follow: I will first outline a [subset of chess rules using Dafny](#1-modelling-chess-in-dafny). This code serves as a reference point throughout the post. I then demonstrate [the basics of test generation](#2-test-generation-basics) and discuss the different coverage metrics you can target. This will let me deal with [visualizing coverage and identifying dead code](#3-there-is-dead-code-here). Moving on to advanced features, I will offer a [broader discussion of quantifiers, loops and recursion](#sec-quantifiers-loops-and-recursion) — features that require special care when attempting to generate tests. Finally, this blogpost concludes with a [summary and general guidelines](#conclusions-and-best-practices) for how to apply this technique to your own Dafny programs. You can also find more information on automated test generation in the [Dafny reference manual](https://dafny.org/dafny/DafnyRef/DafnyRef#sec-dafny-generate-tests). If you want to try test generation on your own, I recommend using the latest stable Dafny nightly release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"as follow:"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this! Have fixed
|
||
Note also that Dafny annotates the two test methods with the `{:test}` attribute. This means that after saving the tests to a file (`tests.dfy`), you can then use the `dafny test tests.dfy` command to compile and execute them. The default compilation target is C# but you can pick a different one using the `--target` option - read more about the `dafny-test` command [here](https://dafny.org/dafny/DafnyRef/DafnyRef#sec-dafny-test). | ||
|
||
Oftentimes, you would want to translate the tests from Dafny code to some other format that you are using throughout your project. To do that, you can serialize every test by adding a set of print statements at the end of the test entry method. For our purposes, I wrote the `SerializeToSVG` method (included in the <a href="/blog/assets/src/chess.dfy">sources</a>) that converts each test to an `.svg` image of the corresponding chess board. You can download the relevant image assets <a href="/blog/assets/images/test-generation/assets">here</a> (originally from <a href="https://github.com/tabler/tabler-icons/tree/master">tabler icons</a>). Let us look at the results in picture form (knights are the pieces that look like a horse’s head and the white king is the only piece that isn’t filled in): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oftentimes
Why not just "often" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No particular reason) Have changed to "often"
- You might be calling out to external libraries, which are not themselves verified, and would like to test them by calling your Dafny code. | ||
- You have written high-level specifications for your code and are about to embark on a proof but want to have some initial assurance that your yet-to-be-verified implementation does conform to these specifications. | ||
|
||
In this blog post, I show how Dafny’s built-in automated test generation can be useful in these situations. In the example I use, Dafny has to position pieces on a chess board in such a way that certain constraints are satisfied. This is exactly what I want to illustrate: chess boards will be generated tests, and constraints will be the program we want to test. A brute force enumeration or fuzzing might not be up to the task here, since the number of ways in which you can arrange pieces on the board is astronomically high, even under the constraints I add to the problem. Dafny, however, can use the verifier to generate tests and is, therefore, much more efficient in finding solutions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dafny has to position pieces on a chess board
Does Dafny, the language, have to position pieces on a chess board? How can a language do that?
This is exactly what I want to illustrate: chess boards will be generated tests, and constraints will be the program we want to test.
Don't you want to use the example to illustrate a higher level concept?
Consider:
"I will use Dafny's test generation feature to search for chess board states that satisfy a condition, such as the white king being checked. The generated tests will contain the board states, while the condition is defined by the Dafny program under test"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now specifically refer to Dafny's test generation toolkit when describing this functionality and have incorporated the paraphrase you suggest!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the content of the article super cool and have no trouble reading it. I see no problem with publishing this as is.
However, reviewing only the first 17 (quite long) lines of the file 2023-11-10-automated-test-generation-chess-puzzles-with-dafny.markdown
, I already had a lot of comments. I expect the rest of the text could benefit from a review as well but I didn't get to that yet.
I'll leave it up to you to decide whether to seek out further review or to try to wrap this up.
Thank you, @keyboardDrummer, for the review, particularly for all the suggestions regarding the intro - I have incorporated all of them into the text. I think the intro is, as is usually the case, the most contentious part, so if you think it is in a good shape after these changes, I would say the text is in a good shape to go. Let me know if there are any other changes you would want me to make! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
This is the latest version of the blog post about test generation. The example from the post is now tested via GitHub actions both in this repository and, in a shortened form, as part of Dafny's own CI.