Errors in concurrent access to cached files #80
I'm not sure I understand why you need to run a request on the same URL concurrently? Because…
Thanks for looking into this and for your work on the library! It has been very helpful and a huge time saver for me. I am probably not thinking about this clearly, but I guess I'm confused about what the read/write locks are for, if not to support concurrent requests to the same URL?
You are absolutely right, I forgot about the different locks in place for a bit there 😓 The read/write locks should prevent reading a request while another thread/event is writing a response 🤔 I went with the simplest solution of using an external lib to handle that, but there is probably something goofy there. I'll have a more detailed look when I have the time; also happy to take any contribution if you have a fix in mind.
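For reference, the reader/writer semantics that aiorwlock (the external lib used here) is meant to provide, as a minimal self-contained sketch; the `read`/`write` helpers are illustrative names, not part of the library:

```python
import asyncio

import aiorwlock


async def read(rwlock: aiorwlock.RWLock, name: str) -> None:
    # Any number of readers may hold the lock at the same time.
    async with rwlock.reader_lock:
        print(f"{name}: reading")
        await asyncio.sleep(0.1)


async def write(rwlock: aiorwlock.RWLock, name: str) -> None:
    # A writer waits for readers to finish and then runs alone.
    async with rwlock.writer_lock:
        print(f"{name}: writing")
        await asyncio.sleep(0.1)


async def main() -> None:
    rwlock = aiorwlock.RWLock()
    await asyncio.gather(
        read(rwlock, "r1"),   # r1 and r2 overlap
        read(rwlock, "r2"),
        write(rwlock, "w1"),  # w1 runs exclusively
    )


asyncio.run(main())
```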
Thanks! Happy to contribute where I can, but for now I am mostly puzzled. Will take a look at the locking library and let you know if I come up with anything. |
First, when reading the cache, I wonder if we should acquire the lock before checking whether the file exists? There could be a race condition where the cached file gets deleted (because it is no longer fresh) between the moment one worker checks that the cached file exists and the moment it actually reads the file; taking the lock first, as in the sketch below, would close that window.
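A minimal sketch of that ordering, assuming a single `RWLock` shared by everything that reads, writes, or deletes a given cache file; `get_cached` and `cache_path` are hypothetical names, not the library's actual API:

```python
from pathlib import Path
from typing import Optional

import aiorwlock


async def get_cached(cache_path: Path, rwlock: aiorwlock.RWLock) -> Optional[bytes]:
    # Acquire the reader lock *before* the existence check, so the file
    # cannot be deleted between the check and the read.
    async with rwlock.reader_lock:
        if not cache_path.exists():
            return None
        return cache_path.read_bytes()
```

For this to be safe, deleting a stale entry would correspondingly have to hold the writer lock.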
Second, I don't know if this was clear to you, but it looks like the locks are at an instance level, not at a process level, let alone cross-process. With one lock created per task, nothing is actually serialized:

```python
import asyncio

import aiorwlock


async def worker():
    # Each task creates its own lock, so there is no shared mutual exclusion.
    rwlock = aiorwlock.RWLock()
    async with rwlock.writer_lock:
        print(f'{asyncio.current_task().get_name()}: inside writer lock')
        await asyncio.sleep(0.1)
        print(f'{asyncio.current_task().get_name()}: gave up writer lock')


tasks = [worker() for _ in range(10)]
foo = await asyncio.gather(*tasks, return_exceptions=False)
```
```
Task-5: inside writer lock
Task-6: inside writer lock
Task-7: inside writer lock
Task-8: inside writer lock
Task-9: inside writer lock
Task-10: inside writer lock
Task-11: inside writer lock
Task-12: inside writer lock
Task-13: inside writer lock
Task-14: inside writer lock
Task-5: gave up writer lock
Task-6: gave up writer lock
Task-7: gave up writer lock
Task-8: gave up writer lock
Task-9: gave up writer lock
Task-10: gave up writer lock
Task-11: gave up writer lock
Task-12: gave up writer lock
Task-13: gave up writer lock
Task-14: gave up writer lock
```

All ten tasks are inside the writer lock at once. With a single lock shared by all tasks, access is serialized as expected:

```python
import asyncio

import aiorwlock

# One lock shared by every task.
rwlock = aiorwlock.RWLock()


async def worker():
    async with rwlock.writer_lock:
        print(f'{asyncio.current_task().get_name()}: inside writer lock')
        await asyncio.sleep(0.1)
        print(f'{asyncio.current_task().get_name()}: gave up writer lock')


tasks = [worker() for _ in range(10)]
foo = await asyncio.gather(*tasks, return_exceptions=False)
```
```
Task-16: inside writer lock
Task-16: gave up writer lock
Task-17: inside writer lock
Task-17: gave up writer lock
Task-18: inside writer lock
Task-18: gave up writer lock
Task-19: inside writer lock
Task-19: gave up writer lock
Task-20: inside writer lock
Task-20: gave up writer lock
Task-21: inside writer lock
Task-21: gave up writer lock
Task-22: inside writer lock
Task-22: gave up writer lock
Task-23: inside writer lock
Task-23: gave up writer lock
Task-24: inside writer lock
Task-24: gave up writer lock
Task-25: inside writer lock
Task-25: gave up writer lock
```

So in my initial example, where each worker creates its own client, there is nothing stopping multiple workers from touching the cache file at the same time, hence the race conditions where a file gets deleted by one worker while another is trying to read it. In real life, my workers run on different compute nodes, so there is no easy way for me to share a single httpx instance. I see two options:
Let me know what you think.

Edit: looking more closely, it looks like filelock doesn't allow multiple readers, so we would have to either disallow concurrent reading (slower) or allow files to be deleted while they are being read…
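A rough sketch of what the filelock-based option could look like; the per-entry `.lock` path convention is an assumption, and since `FileLock` is exclusive, concurrent readers would serialize (the slowdown mentioned above). Note also that advisory file locks can be unreliable on network filesystems, which matters for workers on different compute nodes:

```python
from pathlib import Path
from typing import Optional

from filelock import FileLock  # pip install filelock


def _lock_for(cache_path: Path) -> FileLock:
    # One lock file per cache entry, visible to every process on the host.
    return FileLock(str(cache_path) + ".lock")


def read_cached(cache_path: Path) -> Optional[bytes]:
    with _lock_for(cache_path):  # blocks until no other process holds it
        if not cache_path.exists():
            return None
        return cache_path.read_bytes()


def write_cached(cache_path: Path, data: bytes) -> None:
    with _lock_for(cache_path):
        cache_path.write_bytes(data)
```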
Closing because this project will be archived in favor of: https://github.com/karosis88/hishel |
Describe the bug
Concurrent access to a cached file throws an exception.
To Reproduce
On the first run, the above raises a `FileNotFoundError`. On subsequent runs in the same Jupyter notebook session, the same code raises some msgpack error.
Expected behavior
Concurrent access to an already-cached file should be fine?
Additional context
I probably brought this on myself.