Let's add an option to turn off aborts and error messages on errors. #340
Replies: 6 comments 3 replies
-
If we change the bort() subroutine, so that it checks the new option and only aborts if it should, then we have a pretty easy fix for a lot of the problem. For example we could then test error conditions for all our other code. The problem is that we would be using the same error code for all errors. In order to have library-wide error codes, we would need a new bort(), one which took an integer argument, which would be an error code. Then there needs to be a library function which returns the associated error message. What would also be nice is if we could also handle the existing error messages. We want to turn them off, but save them in case the user wants to get them. For that we could have an internal buffer that contains the last error message (or all spaces). Then there could be a function to get that. (And if we were doing that we might as well have an integer buffer which stores the last error number.) In that case, the user would get the general bort return code for errors, but could then check the error message buffer for any message. |
Beta Was this translation helpful? Give feedback.
-
What if we add an optional argument to bort(), which is err_code. Then, as time goes by, we can come up with a library-wide list of error codes, and return the appropriate code from each bort. |
Beta Was this translation helpful? Give feedback.
-
From @jack-woollen
|
Beta Was this translation helpful? Give feedback.
-
I don't think we have to worry about performance because:
As you very correctly practice in your netCDF code: you must always check the return value. This always adds an if-statement, but that's OK, it's worth it. Right now we have code like:
And bort is like this:
So we change to:
And then do: The we change calls to it like this:
If the function/subroutine already has a return code in it's parameters, we can use that. If not, callers can check the BORT_ERRSTR to see if it is empty. If it's not, there was an error. So all NCEPLIBS-bufr calls would need to be like:
Ideally we would have an error number, instead of a string. That's a further elaboration on this idea that would require adding a parameter to bort(). |
Beta Was this translation helpful? Give feedback.
-
Passing return codes from library functions is not cosmetic. It is basic functionality that is expected from every library. Crashing the calling application is a serious bug that makes NCEPLIBS-bufr unusable to most other applications. When we do something in our library which no one else does one of the following is true:
Choice 2 is far more likely. In a fully tested code base I would expect this to take a couple of days, or a week at most. I will get back to this in a few months, after I have made some more GRIB progress. I have no doubt that I can make these changes quickly and easily, once the library is fully tested. |
Beta Was this translation helpful? Give feedback.
-
I think any solution also needs to take into account that an application code can have multiple Fortran logical units attached to the library at any given time. So a bort-like error condition could occur which impacts one logical unit but not the others, and therefore any bort_flag or bort_errmsg values should be indexed to or otherwise keyed to a particular logical unit. This would allow an application code to make a more informed decision about what to do, and this would also make sense from a user's standpoint, since pretty-much every routine that a user ever calls is already passing in such a logical unit number anyway as one of the arguments. This is the basic approach I took in PR #509, using the internal module stcode and the user-callable subroutine igetsc to gracefully exit from an error that previously caused an abort in subroutine usrtpl. In particular, note how the internal iscodes array and the input argument to igetsc are both keyed to a specific logical unit, which in turn allows there to be different status codes for each logical unit rather than multiple ones simultaneously overwriting or otherwise tripping over each other between multiple logical units. I realize this isn't exactly the same as what's being discussed above to route all error processing through subroutine bort, but it's the same basic concept, so let's please keep it mind whenever we get around to tackling this issue on a library-wide basis. In particular, let's plan to make any bort_flag or bort_errmsg values specific to each individual Fortran logical unit. Also, if we do end up deciding to route everything through subroutine bort, then let's please also go back and retrofit the solution that I implemented in #509, so that we have the same consistent approach for all error processing throughout the library. |
Beta Was this translation helpful? Give feedback.
-
Right now we have aborts when there is an error. This is inconvenient to the calling applcation.
How about we add an option to change this behavior. In the option is used, the library will print nothing, and will not abort, but will instead return an error code.
Obviously this would touch a lot of code, and would have to be done very carefully. But it is absolutely required if we are going to create a V2 API, and also required to reasonably test all error behavior. It's also required for the C API to get any traction.
Calling abort on error is a holdover practice from ancient times. We can change this behavior in a backward compatible way.
Beta Was this translation helpful? Give feedback.
All reactions