wait_tx_done does not seem to be working [RPi + python-can-isotp] #51
Comments
I assume the receiving communication part is setting the blocksize to zero?
Perhaps I am misunderstanding how this works, but from this issue I had understood that if I send a large packet, the send() call would not return until the whole packet had been put on the bus. I am sending >3 kB which, from the logs, takes maybe 200 ms and nearly 500 frames, but the send() returns almost immediately. I tried reducing txqueuelen to 256 and even lower to ensure the whole transfer could not fit into the queue at once.
In this case the send() will return instantly, as the ISO-TP state machine dumps the CAN frames to the netdevice and is done. Please set up a receiving counterpart which sets the BS and/or STmin to see the effect. This also works on virtual CAN, with the receiver, the sender and a candump each running in their own terminal.
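As an illustration of such a receiving counterpart, a minimal python-can-isotp sketch (the interface name, CAN IDs and BS/STmin values are placeholders, and a vcan0 interface is assumed to already be up):

```python
# Receiving counterpart: announce a non-zero blocksize / STmin in the
# flow-control frames so the sender's tx state machine has to pace itself.
# All IDs and values below are illustrative, not taken from this issue.
import isotp  # python-can-isotp, kernel-socket wrapper

s = isotp.socket()
s.set_fc_opts(bs=8, stmin=5)   # blocksize 8, STmin 5 ms announced to the sender
s.bind("vcan0", isotp.Address(rxid=0x123, txid=0x456))

payload = s.recv()             # blocks until a complete ISO-TP payload arrives
print(f"received {len(payload)} bytes")
```

A candump on the same interface in a third terminal then shows the paced consecutive frames.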
I am definitely misunderstanding how this works then! :) Could you please help me understand what happens differently when the receiver has STmin>0? At which point does the send() return in that case? I think what you're saying is that if STmin>0 (or the blocksize is non-zero), then the send() does not return until the full message has actually been transmitted on the bus. Is that a correct understanding? If so, is there any possibility to have the send() wait for the complete transmission even when the receiver announces STmin=0 and BS=0?
Not exactly. Please read what I wrote: "In this case the send() will return instantly as the ISO-TP state machine would dump the CAN frames to the netdevice and is done."
Same here. Read the line above.
No, that's currently not possible. This would need a rework of the ISO-TP kernel driver to add throttling for CAN interfaces with small tx queues. ISO-TP is an unreliable datagram protocol: you send things and wait for the answer with a timer at application level (e.g. for UDS). Why do you need the "end of transmission on the bus"? Just because there is a value in the spec, or do you have a real use-case?
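As an illustration of the application-level timer described here, a minimal sketch using only the Python standard library's CAN_ISOTP support (Linux, Python 3.7+; the interface, addressing and request bytes are placeholders, not values from this thread):

```python
import socket

# Open a kernel ISO-TP socket and bind it as (interface, rx_addr, tx_addr).
s = socket.socket(socket.AF_CAN, socket.SOCK_DGRAM, socket.CAN_ISOTP)
s.bind(("vcan0", 0x7E8, 0x7E0))
s.settimeout(1.0)              # the application-level response timer

s.send(bytes([0x10, 0x03]))    # e.g. a UDS DiagnosticSessionControl request
try:
    response = s.recv(4095)
except socket.timeout:
    # No answer in time -> handled by the application, not by the ISO-TP layer.
    response = None
```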
Yes, this is a real world use case. It is pretty much the same scenario as in the issue I linked to earlier. I am doing a data transfer (software update). The module in my case is sending STmin/blocksize=0 (so I can't change it). I need to check the timeouts and, right now, the timer in udsoncan starts almost immediately after send() returns, while the frames are still being transmitted on the bus.
I see your problem with getting more reliable timeout values. But currently the implementation does not provide this information, and due to the common concept of unreliable datagrams there is no strong requirement IMO. Btw. implementing this throttling option would make it easier to make wait_tx_done behave the way you expect.
Thanks for the info. Before I close this issue, could you explain why the timing issue was resolved in the earlier linked issue, i.e. with STmin=0x0A? Before the change the user reported timeouts; after the change they no longer have the timeout issue even though it takes 6 s to send the data. I'd like to understand how this is working in that use case (what is the difference to STmin=0). As a thought, is there a way to set the transmit STmin regardless of the value received from the receiver?
STmin makes the isotp stack separate the CAN frame transmissions. As we need to make sure the separation time is guaranteed up to the very last frame, the transmission only ends there.
Oh, good idea! Yes, there is the frame_txtime option. Additionally there is the option to force a STmin value which overwrites the value you get from the receiving node; search for CAN_ISOTP_FORCE_TXSTMIN. Both of these additions cause the tx_gap value in isotp.c to be non-zero, which creates a separation time and disables the isotp_tx_burst functionality (search for it in isotp.c). That should help ;-)
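For reference, a sketch of what this can look like from user space via python-can-isotp (the interface, IDs and the frame_txtime value are placeholders, and it assumes set_opts() exposes the frame_txtime field of the kernel's struct can_isotp_options; the forced-STmin path is the kernel's CAN_ISOTP_FORCE_TXSTMIN flag together with the CAN_ISOTP_TX_STMIN socket option):

```python
# Sender with a non-zero frame_txtime: tx_gap in isotp.c becomes non-zero,
# so consecutive frames are spaced out instead of burst into the queue.
# All values below are illustrative only.
import isotp

s = isotp.socket()
s.set_opts(frame_txtime=1000)   # units as defined by the kernel sockopt (see linux/can/isotp.h)
s.bind("vcan0", isotp.Address(rxid=0x123, txid=0x456))
s.send(bytes(3000))
```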
I have used the python-can-isotp library to test with and set the force-STmin option described in its docs. But unfortunately it still didn't wait until the end of the full packet being sent and I didn't see any change in behaviour. I then increased the value, and it only started waiting once I set it to 1000000. But I don't understand why I had to set it that high - any ideas? (Even 100000 was not enough.) And I guess the next issue is that I can't know that the receiver has set STmin=0 until I start sending. And I can't change the tx STmin once the socket is bound. Which means if a module responds with STmin>0 I will now be making it unnecessarily slower (STmin plus the forced separation).
Hi @hartkopp, people are harassing me for this feature as well. ISO-26262 has a requirement for a back-notification to the upper layer when the transmission is complete.
@pylessard And we can't wait for the queue to empty either, because something else may be sending to the same device. The only thing I could think of is if the driver watched the bus for its own frames and only returned from send() once the last frame has actually been transmitted. PS: I am still curious as to why my forced value had to be 1000000.
I'm currently thinking about this approach for the kernel implementation too. But this would have a heavy performance impact, as we usually would queue up on the netdev queue and I have no clue about other outgoing traffic from that host. And I would need to rethink the entire timeout mechanism. Would you think waiting for every outgoing CAN frame to be sent would be fine from the performance view?
No idea. 1000000 us == 1 sec == general ISO-TP timeout for lost connection. But I will look into it.
Yes, because - at least from a UDS point of view - that is exactly what we need, i.e. do not return from send() until the last frame is on the bus. I guess there would be some performance cost to reading back the messages coming from the bus, but I think the situation is no different to receiving messages currently (e.g. when you send and then await a response). @pylessard what do you think on this topic?
Hi @dpatel20, can you build your own Linux kernel for a test? The attached patch is still a first shot, but it works for testing and should really return only when the last frame is sent out. Updated (cleanup & fixed some stuff like timer handling): 0001-can-isotp-add-local-echo-handling-for-consecutive-fr.patch.txt
Thanks for the patch. I haven't built a kernel before but might be able to give it a go!
Thanks! That would be really valuable as you have some intrinsic motivation for this feature to work :-D Other kernels (5.15 or something like that) should work too. If you have problems applying the patch to an older kernel I can provide a backport for you.
@dpatel20 @pylessard The patch is posted here: https://lore.kernel.org/linux-can/[email protected]/T/#m95217a28b4e0f208dac844d4f9b52fd808760263 The raw patch is at the same address with /raw appended. I tested and beautified the code this weekend - and I'm pretty confident that it reliably does what we've been searching for.
If that works well, I will try to see if I can do something similar in my library and make sure the timeout handling is the same with the kernel module interface and the pure Python implementation. Good work!
Short update. I can't get the patch to apply (using https://lore.kernel.org/linux-can/[email protected]/raw).
I haven't had a chance to look into why the patch fails at that point. I guess it's because of a differing kernel version. I have 5.13.0-1017-raspi, so I suspect the isotp.c file differs between mine and the 5.17 tree the patch is based on.
There is only one patch missing from the 5.13 kernel that should prevent the compilation. I'm currently compiling a 5.13 kernel and will upload a working isotp.c.
Just remove the part that does not apply cleanly.
The patch is now on its way upstream to the Linux mainline kernel. I'm really confident and happy about this solution. I should have done this earlier ;-)
That's great news! The kernel should be finished re-building soon and then I can test. Hopefully my Pi still boots once I switch to the kernel I built. Just to check: the echo feature is on by default? So, no extra flags to set, or is it tied to wait_tx_done?
Looking forward to your feedback!
The echo handling is active by default, so there are no extra flags to set. There is no visible effect from wait_tx_done other than send() now blocking until the complete PDU has been transmitted.
Not sure if I am reading your code correctly, but is the following understanding correct: in a multi-frame message, the next consecutive frame is not sent until the current one is seen on the bus? If so, then the one small concern I have is that this could be a bit slower when sending a multi-frame message. For example, we're sending a multi-frame message and the same PC is also sending other CAN messages. If we only cared about the last frame of the multi-frame message being seen on the bus, then the earlier frames could still go out back-to-back.

The above is a typical use case for me: I send a number of cyclic frames to keep the module awake whilst also performing multi-frame UDS (e.g. transfer data), so two things are trying to place data onto the same netdevice queue. Not sure, in reality, if this will make much difference but wanted to see if I understood the code correctly or not.
ACK
Yes. It could be a bit slower.
The interaction of different 'writers' on the same host can always lead to a problem with the netdevice queue. E.g. when you place a big chunk of isotp frames with no gap into the netdev queue, the other senders will fall out of their 'in-time' transmissions. There is a solution for CAN-ID-wise control of the outgoing traffic: use queuing disciplines, e.g. configured with tc. See the concept in https://wiki.automotivelinux.org/_media/agl-distro/agl2017-socketcan-print.pdf (slide 41ff). But I'm pretty sure you won't need it by default.
Yes. I've been thinking about writing e.g. 8 frames as a set of bulk content into the netdev queue and then waiting for the last frame to be transmitted. The point was that the code got really complex and the reliability of catching the right frame would decrease. I did multiple tests with real (USB) CAN interfaces here and the performance was still excellent. IMO this comparably simple and dumb transmission process does a good reliable job (much much better than the former unreliable dumping into the queue).
Yes. That's also my common use-case: creating the 'tester present' with a CAN_BCM socket. The BCM then handles the cyclic transmission inside the kernel.
Yes. You understood the code correctly. Thanks for the review ;-)
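For the 'tester present' via CAN_BCM mentioned above, a sketch with recent python-can, whose send_periodic() uses the SocketCAN broadcast manager on Linux so the kernel keeps the cyclic frame going while the ISO-TP transfer runs (the CAN ID, payload and period are placeholders):

```python
import can

bus = can.Bus(channel="can0", interface="socketcan")
tester_present = can.Message(arbitration_id=0x7DF,
                             data=[0x02, 0x3E, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00],
                             is_extended_id=False)

task = bus.send_periodic(tester_present, period=2.0)  # cyclic tx handled in-kernel via CAN_BCM
# ... run the multi-frame ISO-TP / UDS transfer here ...
task.stop()
bus.shutdown()
```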
Thanks for the explanation above. The queuing disciplines seem very interesting, so I will take a deeper dive into that. And I have now done my own testing and your new code works superbly. The send() now returns only once the full message has been put on the bus, which is exactly what I needed. Also, I haven't seen any issues with this in my testing. Thank you for the excellent implementation. (And bonus for me: I got to build a Linux kernel for the first time!)
Ah! Win-win :-D Thanks for the valuable feedback!
@hartkopp Which kernel version will include these changes?
The changes are in the upstream process for Linux 5.18. https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/log/net/can/isotp.c
I am enabling the wait_tx_done flag via the python-can-isotp project, and if I read back the socket opts it shows as set. However, when I test sending a large message (>3 kB), the send returns almost immediately (within ~2 ms).
The kernel I have is: 5.13.0-1016-raspi
Is there a way to know if the can-isotp module I have supports the wait_tx_done flag? And, if so, any idea why this might not be working for me?
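A minimal sketch of the kind of setup described above, using the python-can-isotp kernel-socket wrapper (the interface, IDs and payload size are placeholders, and it assumes the wrapper exposes the kernel's CAN_ISOTP_WAIT_TX_DONE flag as isotp.socket.flags.WAIT_TX_DONE):

```python
import isotp

s = isotp.socket()
s.set_opts(optflag=isotp.socket.flags.WAIT_TX_DONE)   # ask send() to block until tx is done
s.bind("can0", isotp.Address(rxid=0x7E8, txid=0x7E0))

s.send(bytes(3000))   # with a kernel that honours the flag, this returns only
                      # after the last consecutive frame has been handed off
```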