Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

P2 timeout is not checked correctly when response has consecutive frame #42

Open
tienlocnguyen opened this issue Nov 28, 2019 · 27 comments

Comments

@tienlocnguyen
Copy link

In client.py, function send_request wait for entire response with timeout is P2 timeout. But as I know, P2 timeout is timeout to the first response(can be first frame) not entire response. So with long response, timeout error will occur

@pylessard
Copy link
Owner

pylessard commented Nov 28, 2019

Hi,
I understand the issue you point out. It might be a little tricky to handle. Do you have the reference in the standard that suggest this is the right behaviour?

@tienlocnguyen
Copy link
Author

Hi pylessard,
After double check, I found that ISO 15765-3 define P2client timeout as timeout for waiting start of incoming response(Single or first frame).
But this standard is revised by ISO 14229-3:2012.
So I need more time to check the standard.
Thanks

@pylessard
Copy link
Owner

Thank you for your input.
I will wait a feedback from you before looking into any solutions.

Regards

@tienlocnguyen
Copy link
Author

Hi pylessard,
Could you double check definition of P2 client timeout on 14229-2? I don't have this standard.

@Sauci
Copy link

Sauci commented Jun 23, 2020

according to ISO-15765-3, the P2_can_client is defined as:
Timeout for the client to wait after the successful transmission of a request message (indicated via N_USData.con) for the start of incoming response messages (N_USDataFirstFrame.ind of a multi-frame message or N_USData.ind of a SingleFrame message).

@martinjthompson
Copy link
Contributor

ISO14229-2:2013(E) defines P2Client as

Timeout for the client to wait after the successful transmission of a request message (indicated via T_Data.con) for the start of incoming response messages (indicated via T_DataSOM.ind of a multi-frame message or T_Data.ind of a SingleFrame message).

Figure 9 shows the P2Client timer being stopped at SOM.ind - just after the start of the response message, with the following text:

Client T_DataSOM.ind: transport/network layer issues to diagnostic application the reception of a StartOfMessage. Client stops
the P Client timer.

@MatinF
Copy link

MatinF commented Dec 9, 2021

Hi all,

Is there a standardized timeout/interval specified when performing CAN ISO-TP requests?

Specifically, I am thinking of the following intervals to be satisfied by the ECU and tester:

tester:  Send Single Frame (SF) request
- interval_1
ECU:    Send First Frame (FF) response
- interval_2
tester:  Send Flow Control (FC) frame
ECU:    Send remaining Consecutive Frames (CF)

It is unclear to me if interval_1 and interval_2 above are well-defined - and if so in what standard?

Further, if they are "dynamic" and depend on the specific ECU implementation, how is the tester (e.g. a diagnostic tool) supposed to know how fast to respond with the Flow Control frame after receiving the First Frame from the ECU?

@pylessard
Copy link
Owner

pylessard commented Dec 13, 2021

Hi @MatinF
Timing requirements would be defined by ISO-15765.
For the specific interval you are requesting, I think there is no requirement. Only timeouts value if the delay is too long.
I am referring to ISO-15765-2:2016 section 9.8

image

image

@MatinF
Copy link

MatinF commented Dec 13, 2021

Thanks for the clarification. So just to verify that I've understood this correctly:

  1. There is no "minimal wait time" before an ECU sends the First Frame
  2. There is no "minimal wait time" before a tester can send the Flow Control frame after receiving the First Frame

In regards to the Timeout, it looks like this is up to 1000 ms based on the above
3) Is it correctly understood that the First Frame will always arrive within 0 to 1000 ms after the Single Frame request?
4) Is it correctly understood that the Flow Control frame must always be sent within 0 to 1000 ms after the First Frame?

I guess what I am unsure about is whether one can rely on the 1000 ms as a standard across all ECUs - or if this is implementation specific. If the latter is the case, do you know if there are some general conventions/good practices to follow in terms of what timings to expect?

Thanks!

@Sauci
Copy link

Sauci commented Dec 13, 2021

Hello @MatinF,

  • Concerning the interval_1:
    I think the transport layer handles the request and the response in two different transactions, and is not responsible of the interval_1 timeout management. This timeout value is managed by the communication layer above the transport layer.
    For example, having the UDS protocol in mind, the tester will send, (for example):
    • 22 xx yy as first transport layer transaction
    • 62 xx yy 12 34 as second transport layer transaction (which might also be a segmented message)
  • Concerning the interval_2:
    The interval between the confirmation of the request (client/tester side) and the indication of the first frame reception (client/tester side) is N_Bs from what I see on the screenshot shared by @pylessard, so I would say yes, 1000ms looks correct.

@pylessard
Copy link
Owner

Thanks for the clarification. So just to verify that I've understood this correctly:

  1. There is no "minimal wait time" before an ECU sends the First Frame
    I think this is right

  2. There is no "minimal wait time" before a tester can send the Flow Control frame after receiving the First Frame
    I think this is right

  3. Is it correctly understood that the First Frame will always arrive within 0 to 1000 ms after the Single Frame request?
    No. But it is not a timing specification question here, you misunderstood the purpose of single frame and first frame. First frames and single frames does not come one after the other, so First Frame is not expected after a single frame, tehrefore, no timeout value applies here.

Single frame carries an isotp frame in a single CAN message. First Frame is the begining of an isotp frame that requires multiple CAN message. So they are both the beginning of a transmission. They are unrelated.

  1. Is it correctly understood that the Flow Control frame must always be sent within 0 to 1000 ms after the First Frame?
    I think this is right. 1000ms is very long though. From my experience, you can expect it within few milliseconds.

On a additional note.
a 500kbits link will allow a bit more than 4 messages per milliseconds. A good ECU is able to use 100% of bandwidth. In that case, a response is expected within the following millisecond (if the CAN bus is not too much loaded). Many less good ECU will be able to read 1 CAN message per milliseconds, very often limited by software. I even saw one very bad ECU limited to 1 message per 10 milliseconds (ouch.).

If you want to be safe in your implementation, you can leave a 1 or 2 millisecond delay where there is no specification. That will ensure maximum compatibility.

Regards

@Sauci
Copy link

Sauci commented Dec 13, 2021

Hello @pylessard,

Concerning the hard-coded delay, I'm not sure to understand. The transport layer also specifies the STMin parameter (minimum separation time between two consecutive frames), which is communicated to the client/tester by the ECU trough the Flow Control frame (byte 3). Wouldn't it be the responsibility of the ECU to tell the client/tester to lower the communication speed?

@pylessard
Copy link
Owner

Yes, between consecutive frames. (I think)
But the interval pointed out by @MatinF are not defined.
I guess the STmin from previous frame could be reused between last frame and next beginning of transmission.

@MatinF
Copy link

MatinF commented Dec 13, 2021

Just to clarify:

When I mention the time between the Single Frame and First Frame, by Single Frame I refer to the "request" frame to be sent by the testing tool (which will have the same format as a Single Frame in the ISO TP context). Hence, my question was related to what the defined max timeout would be before the ECU must send the First Frame response to the request.

My understanding here is that this delay may be 1000 ms in theory, but in practice it will most likely be far less (e.g. 1-10 ms).

Similarly, the timeout before the Flow Control must be sent by the tester device should be 1000 ms in practice (i.e. N_Bs), but in practice the Flow Control would typically also be sent far more swiftly after the First Frame response from the ECU (again e.g. within 1-10 ms).

Let me know if I missed something here.

@canon193
Copy link

canon193 commented Jun 13, 2022

Hi @pylessard,

I have just faced this issue during testing read dtc request. The expected data flow is as below:

Tester: Send read DTC request
ECU: First frame response
Tester: Send control flow
ECU: Send consecutive frame
ECU: Send consecutive frame
...

Then I got the P2 timeout after 1 second. From my understanding, the lib is waiting for isotp to send receive done notification during waiting for P2 time. However, according to ISO14229-2, the P2 timeout should be the timing between tester request and first response from ECU (which is the first frame). I believe there should be another notification from isotp like start receiving notification to stop the P2 timeout.
Could you share your oppinion?

Reference from ISO14229-2:
image

@pylessard
Copy link
Owner

Just to let you know that isotp v2.x will be released soon and the possibility of doing a blocking send has been added. This will allow starting the timer at the right moment.

I will see how I can bring that api up to the UDS level so it can be used across connections (that supports it)

@pylessard
Copy link
Owner

isotp v2.x has been released. and udsoncan 1.21 supports it.

using the new parameter from isotp v2.x blocking_send=True, this issue should be resolved.
I am still wondering if I should enable something within the udsoncan API for that

@demedvedev
Copy link

Hi,

I believe the problem a still here. I am trying to read DID which is 20b long and get P2Server timeout error.

p2_timeout(float)
Maximum amount of time in seconds to wait for a first response (positive, negative, or NRC 0x78). After this time is elapsed, a TimeoutException will be raised if no response has been received. See ISO 14229-2:2013 (UDS Session Layer Services) for more details. Default value of 1

Python 3.11.8 (tags/v3.11.8:db85d51, Feb 6 2024, 22:03:32) [MSC v.1937 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.

= RESTART: READ.py
Timestamp: 1713366358.755885    ID:      5c4    S Rx                DL:  8    60 00 00 03 35 00 00 00     Channel: 0
2024-04-17 18:05:58 [INFO] Connection: Connection opened

2024-04-17 18:05:58 [INFO] UdsClient: DiagnosticSessionControl<0x10> - Switching session to defaultSession (0x01)
2024-04-17 18:05:58 [DEBUG] Connection: Sending 2 bytes : [1001]
Timestamp: 1713366358.820241    ID:      7aa    S Tx                DL:  8    02 10 01 00 00 00 00 00     Channel: 0
Timestamp: 1713366358.820765    ID:      7ad    S Rx                DL:  8    06 50 01 00 32 01 f4 aa     Channel: 0
2024-04-17 18:05:58 [DEBUG] Connection: Received 6 bytes : [5001003201f4]
2024-04-17 18:05:58 [INFO] UdsClient: Received positive response for service DiagnosticSessionControl (0x10) from server.
2024-04-17 18:05:58 [INFO] UdsClient: DiagnosticSessionControl<0x10> - Received new timing parameters. P2=0.050s and P2*=5.000s.  Using these value from now on.
2024-04-17 18:05:58 [INFO] UdsClient: ReadDataByIdentifier<0x22> - Reading data identifier : 0xf18c (ECUSerialNumberDataIdentifier)
2024-04-17 18:05:58 [DEBUG] Connection: Sending 3 bytes : [22f18c]
Timestamp: 1713366358.839337    ID:      7aa    S Tx                DL:  8    03 22 f1 8c 00 00 00 00     Channel: 0
Timestamp: 1713366358.839795    ID:      7ad    S Rx                DL:  8    10 17 62 f1 8c 43 4a 4d     Channel: 0
Timestamp: 1713366358.847864    ID:      7aa    S Tx                DL:  8    30 08 20 00 00 00 00 00     Channel: 0
Timestamp: 1713366358.849576    ID:      7ad    S Rx                DL:  8    21 30 38 58 56 30 36 35     Channel: 0
Timestamp: 1713366358.889578    ID:      7ad    S Rx                DL:  8    22 32 34 30 30 56 35 20     Channel: 0
2024-04-17 18:05:58 [DEBUG] Connection: No data received: [TimeoutException] - Did not receive IsoTP frame from the Transport layer in time (timeout=0.05 sec) 
2024-04-17 18:05:58 [ERROR] UdsClient: [TimeoutException] : Did not receive response in time. P2 timeout time has expired (timeout=0.050 sec)
Timeout in the communication happens : Did not receive response in time. P2 timeout time has expired (timeout=0.050 sec)
Timestamp: 1713366358.929579    ID:      7ad    S Rx                DL:  8    23 20 20 20 aa aa aa aa     Channel: 0
2024-04-17 18:05:58 [INFO] Connection: Connection closed
Timestamp: 1713366359.410253    ID:      5c4    S Rx                DL:  8    60 00 00 03 35 00 00 00     Channel: 0
Timestamp: 1713366360.410218    ID:      5c4    S Rx                DL:  8    60 00 00 03 35 00 00 00     Channel: 0
Timestamp: 1713366361.410183    ID:      5c4    S Rx                DL:  8    60 00 00 03 35 00 00 00     Channel: 0
Timestamp: 1713366362.410156    ID:      5c4    S Rx                DL:  8    60 00 00 03 35 00 00 00     Channel: 0

pip3 list
Package Version


annotated-types 0.6.0
can-isotp 2.0.4
canlib 1.25.393
intelhex 2.3.0
packaging 23.2
pip 24.0
pydantic 2.6.3
pydantic_core 2.16.3
python-can 4.3.1
pywin32 306
setuptools 65.5.0
typing_extensions 4.10.0
udsoncan 1.23.0
wrapt 1.16.0

@pylessard
Copy link
Owner

you're not having the same issue here.
You server requires a timeout of 50ms, and the lib follows it. But 50ms is very tight for something that runs in pure python on window sin the user space.

Check this parameter : https://udsoncan.readthedocs.io/en/latest/udsoncan/client.html#use_server_timing

@AlexanderRavenheart
Copy link

Hello,

I have encountered this issue recently and was surprised to see it is a couple of years old and has not been tackled yet.

So I took the time to build a sample script to have a concrete example of how the issue occurs and then I looked at the root cause and possible fixes.

Test script (using a virtual CAN channel named vcan0)

import logging
from threading import Thread, Event
from time import sleep

from can import Notifier, Message
from can.interfaces.socketcan import SocketcanBus
from isotp import Address, AddressingMode
from udsoncan import Request
from udsoncan.client import Client
from isotp.protocol import NotifierBasedCanStack
from udsoncan.connections import PythonIsoTpConnection


reply_is_ready = Event()


def create_connection() ->  PythonIsoTpConnection:
    bus = SocketcanBus(channel="vcan0", fd=False)
    isotp_params = {
        'stmin': 10,
        'blocksize': 48,
        'wftmax': 0,
        'tx_padding': 0,
        'rx_flowcontrol_timeout': 1000,
        'rx_consecutive_frame_timeout': 1000,
        'blocking_send': True
    }
    address = Address(
        AddressingMode.Normal_11bits, txid=0x123, rxid=0x321
    )

    notifier = Notifier(bus, [], timeout=0.1)
    stack = NotifierBasedCanStack(bus, notifier, address=address, params=isotp_params)
    return PythonIsoTpConnection(stack)


def simulate_reply():
    payloads = map(bytes.fromhex, (
        "1023620123aaaaaa",
        "21bbbbbbbbbbbbbb",
        "22cccccccccccccc",
        "23dddddddddddddd",
        "24eeeeeeeeeeeeee",
        "25ffffffffffffff",
    ))
    messages = [
        Message(arbitration_id=0x321, dlc=8, is_extended_id=False, is_fd=False, is_rx=False, data=payload)
        for payload in payloads
    ]

    with SocketcanBus(channel="vcan0", fd=False) as bus:
        reply_is_ready.set()
        bus.recv()

        for message in messages:
            sleep(0.02)
            bus.send(message)


def main():
    logging.basicConfig(level=logging.INFO)

    connection = create_connection()
    connection.open()
    client = Client(connection, config={"p2_timeout": 0.05})
    request = Request.from_payload(b'\x22\x01\x23')

    reply_thread = Thread(target=simulate_reply)
    reply_thread.start()
    reply_is_ready.wait()
    client.send_request(request)
    reply_thread.join()
    client.close()


if __name__ == '__main__':
    main()

What the script does:

  • Simulate a Read DID Request
  • Simulate a Read DID Response with First Frame + Consecutive Frames

Root cause of the issue

The line that causes the failure is the following one, defined in send_request:
recv_payload = self.conn.wait_frame(timeout=timeout_value, exception=True)

This wait_frame() call will cause a TimeoutException when timeout_value is P2_timeout (default) and the Response from the UDS Server is a multi-frame one.

That is because that method simply calls specific_wait_frame which will eventually just do a get from a Queue object.

Looking inside the TransportLayerLogic class, the Queue that is being used for the previously mentioned get is rx_queue.
That queue gets new data added to it in the _process_rx method.

The problem lies in the fact that no data is added to rx_queue until all CAN Frames of a multi-frame message have been received and the upper layer (UDS Client) has no knowledge that the Server has sent the First Frame within the expected P2 timeout and is in the process of sending the Consecutive Frames.

The one field that gets populated after a First Frame is received (and validated) is actual_rxdl.

A possible solution

The quick solution I came up with is to add a secondary wait_frame call in case the first one times out and actual_rxdl is not None.

Changing from this:

            try:
                recv_payload = self.conn.wait_frame(timeout=timeout_value, exception=True)
            except TimeoutException:
                timed_out = True
            except Exception as e:
                raise e

To this:

            try:
                recv_payload = self.conn.wait_frame(timeout=timeout_value, exception=True)
            except TimeoutException:
                timed_out = True
            except Exception as e:
                raise e

            if isinstance(self.conn, PythonIsoTpConnection):
                actual_rxdl = self.conn.subconn.isotp_layer.actual_rxdl
            else:
                actual_rxdl = None

            if timed_out and actual_rxdl:
                timed_out = False
                try:
                    recv_payload = self.conn.wait_frame(timeout=overall_timeout, exception=True)
                except TimeoutException:
                    timed_out = True
                except Exception as e:
                    raise e

However, this solution is not usable for anything other than a PythonIsoTpConnection (as far as I can tell), since the other connection types don't have an isotp_layer that has the actual_rxdl field to check for.

Also, this definitely feels like a hack and my proposal for a better solution would be to have a receiving_multi_frame_message boolean flag on the Connection class that the upper layer (UDS Client) could check for and retry the wait_frame call using the overall_timeout.

What do you think?

@pylessard
Copy link
Owner

pylessard commented Oct 6, 2024

Hi,
This problem is not as obvious as it may seems and your solution is to ignore the timeout, that's not good.

The problem is whether you start the timer at the beginning of the transmission or at the end. The actual situation is that existing transports layer aren't able to provide a reliable feedback when the transmission is complete.

I implemented a blocking send feature inside my isotp TransportLayer (see here). If you enable that, it will improve because send() will block until the transmission is finished before calling wait_frame(). If you use an isotp socket, there is a flag to have a similar behaviour. BUT, although this sounds nice, if the receiver request stmin=0 and blocksize=0, the isotp layer can just dump everything to the CAN layer buffer. The isotp layer will be done, but the CAN layer will need time to empty the queue and there is no feedback to tell the queue is finally emptied.

This being said, we would also need to change the interface of the connection to add something that tells if the message is fully transmitted and wait for that if supported. I never dived into implementing such thing since the low-level layers are not reliable in this regard... and the problem can be worked around quite easily by extending the timeout.

It all boils down to the modularity of the design and trying to have all the layers independent, yet compatible. You may not see that kind of issue if you use a library that implements all layers in a single stack. Something maybe like what Vector and Peak can offer... but that mean you're stuck with their ecosystem

@AlexanderRavenheart
Copy link

and your solution is to ignore the timeout, that's not good.

Partially correct:

  • Yes, we ignore the TimeoutException raised when timeout_value (P2 timeout) is used and actual_rxdl is not None (Connection has received First Frame and is processing Consecutive Frames)
  • We switch to checking using the overall_timeout (request_timeout) as expected, since we already received the First Frame (which is supposed to cancel the P2 timer) and now we just need to ensure the entire message (all expected Consecutive Frames) is received in the timeout set of the Request itself (overall_timeout a.k.a. request_timeout)

I implemented a blocking send feature inside my isotp TransportLayer (see #42 (comment)). If you enable that, it will improve because send() will block until the transmission is finished before calling wait_frame().

It is enabled, please look carefully at the sample script I provided. :)

The send has nothing to do with the root issue.
It has been said multiple times throughout the issue. The P2 timeout exception is invalid (should be ignored) if First Frame has already been received in that time frame (which I replicated in the sample script) .

If you don't want to have a retry in the case I explained in detail above, then what needs to change is for the P2, P2* and Request timeout values to be handled by the Connection, not the Client, since it is that low level layer dealing with the Single Frame vs First Frame + Consecutive Frame situation causing the issue today.

@pylessard
Copy link
Owner

I get what you mean. Check ISO-14229
It's clearly stated that P2 timer should start after sending the request message.
You approach of ignoring P2 timer and rely on global timeout (which is not part of the standard), is incompatible with the standard.

I'd rather improve the whole stack to add the missing .ind. My time is on something else right now.
There is also another issue that reported that if the transmission fails, it didn't fail the UDS request right away. This issue would also be fixable if some effort was put onto improving the inter-layer API.

image

@AlexanderRavenheart
Copy link

AlexanderRavenheart commented Oct 6, 2024

"Maximum amount of time in seconds to wait for a first response (positive, negative, or NRC 0x78). After this time is elapsed, a TimeoutException will be raised if no response has been received. See ISO 14229-2:2013 (UDS Session Layer Services) for more details." (Source: https://udsoncan.readthedocs.io/en/latest/udsoncan/client.html#p2_timeout)

Yes, you are correct about starting the P2 timer/timeout period only after doing the send, which is what that blocking_send parameter you added handles.

However, I see you are still not getting the point, that the P2 timer shall be cancelled when a first response is received, which does happen in the sample code I provided (payload: 1023620123aaaaaa), yet without the proposed changes, the Request will always fail, because the P2 timeout is not disregarded at the point when the First Frame is received.

No matter how you look at it, in the example I provided, by sending each frame at 20 ms, it is well bellow the 50 ms target, and so the P2 timeout exception should never occur.

The problem with your understanding is that you expect the entire response message (FF + all CFs) to be received in the P2 timeout period, which is not what the spec, the documentation of your project and the various snippets attached to the issue mention.

@pylessard
Copy link
Owner

I also think you are not getting the point :)
UDS has no ties to the underlying protocol. By "first response" they mean first UDS response, not first CAN message of the response.

UDS has a mechanism to respond with a message that say, "acknowledge, will respond later". That is the first response. P2 timeout applies to that message, then P2* for subsequent messages. Nothing to do with how the underlying protocol segment a big message in smaller frames

The diagram cannot be more clear about when to starts timers

@AlexanderRavenheart
Copy link

Hello,

After taking a step back and looking at all of your explanations throughout this thread, I think I see your point.

UDS is a higher level protocol that can be implemented over CAN (DoCAN) or Ethernet (DoIP), and as such timings related to it should not be directly tied to timings of the lower layer protocol.

So, when a UDS Server sends back timings (P2 and P2*) during a Response to a Change Diagnostic Session Request, the Server is supposed to honor that P2 timing going forward for all future requests.

In order to do that, it needs to ensure that, no matter what shape the UDS payload is packaged in (Single Frame or First Frame + Consecutive Frames), those all need to arrive to the Client within P2 timeout period, otherwise it is supposed to use NRC 0x78 to signal to the Client that it should switch to P2* timeout as will take longer to send the entire UDS payload to it.

Is that correct?

@pylessard
Copy link
Owner

Yes, that is the how the protocol is designed.
Still, the underlying layers are expected to give some sort of feedback to the upper layer so that timers can be correctly started/stopped. Those feedbacks are not meaningful enough right now, which is kind of a bummer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants