Feature proposal: improved caching support #1665
Replies: 9 comments 21 replies
-
Hey @kschmid, this seems like a great idea. What kind of caching were you thinking about? I can see how we could implement caching for…
-
Dear @rickturner2001, first off, I am thinking about this: another (minor) issue would be to forward things like the responses, as these contain information on whether a request was served from a cache or directly. All of this is basically web caching. When going for smarter ways of caching at the application level, then probably something like yfinance-cache comes to mind. I am not sure whether the project wants to go as far as incorporating this.
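For context, "served from a cache or directly" is exactly what HTTP caching headers govern. As a rough illustration of the web-caching layer (a hypothetical stdlib-only helper, not yfinance code), a client can decide freshness from the `Cache-Control: max-age` directive:

```python
def is_fresh(age_seconds: float, headers: dict) -> bool:
    """Return True if a cached response is still fresh per Cache-Control max-age."""
    cache_control = headers.get("Cache-Control", "")
    for directive in cache_control.split(","):
        directive = directive.strip()
        if directive.startswith("max-age="):
            try:
                return age_seconds <= int(directive[len("max-age="):])
            except ValueError:
                return False
    # No max-age directive: treat as stale so the client revalidates.
    return False


print(is_fresh(30, {"Cache-Control": "max-age=60"}))  # → True (30s-old entry is fresh)
print(is_fresh(90, {"Cache-Control": "max-age=60"}))  # → False (90s-old entry is stale)
```

Libraries like requests-cache handle this (and more, e.g. `ETag` revalidation) internally; the sketch only shows the idea.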
-
We might implement this as a feature. Can discussions be referenced by id, @ValueRaider?
-
@kschmid why not just use the smart_scraping approach proposed in the documentation, as you pointed out? What problem are you having with it?
-
@kschmid I am not too satisfied with the way yahoo_cache is operating, and as far as I can tell, it's not really the solution you were looking for. As for knowing whether a request is being pulled from the cache or not, I have a simple solution:

```python
import yfinance as yf
from requests import Session
from requests_cache import CacheMixin, SQLiteCache
from requests_ratelimiter import LimiterMixin, MemoryQueueBucket
from pyrate_limiter import Duration, RequestRate, Limiter


class CachedLimiterSession(CacheMixin, LimiterMixin, Session):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.is_cached = False

    def get(self, url: str, params=None, **kwargs):
        # get_expiration_from_interval is assumed to be defined elsewhere.
        expiration = get_expiration_from_interval(params['interval'])
        res = super().get(url, params=params,
                          expire_after=expiration, **kwargs)
        # res.from_cache is True if the response was served from the cache
        self.is_cached = res.from_cache
        return res


session = CachedLimiterSession(
    limiter=Limiter(RequestRate(2, Duration.SECOND * 5)),
    bucket_class=MemoryQueueBucket,
    backend=SQLiteCache("yfinance.cache"),
    methods_allowed=['GET'],
)


def get_apple(session: CachedLimiterSession):
    yf.download("AAPL", interval="1d", period="1y", session=session)
    print("Is cached:", session.is_cached)


get_apple(session)
get_apple(session)
```

Output:

However, the expiration for the request IS NOT being set properly; in fact it is…
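As an aside, the snippet relies on a `get_expiration_from_interval` helper that is not shown anywhere in the thread. One possible stdlib-only sketch, where the interval-to-lifetime mapping is purely my assumption and should be tuned to taste:

```python
from datetime import timedelta

# Assumed policy (not from the thread): cache intraday data briefly,
# daily data for a full day.
INTERVAL_EXPIRATIONS = {
    "1m": timedelta(minutes=1),
    "5m": timedelta(minutes=5),
    "1h": timedelta(hours=1),
    "1d": timedelta(days=1),
}


def get_expiration_from_interval(interval: str) -> timedelta:
    # Fall back to a conservative default for unknown intervals.
    return INTERVAL_EXPIRATIONS.get(interval, timedelta(minutes=15))


print(get_expiration_from_interval("1d"))  # → 1 day, 0:00:00
```

requests-cache accepts a `timedelta` (as well as seconds) for per-request `expire_after`, so a helper like this plugs straight into the `get` override above.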
-
Dear Rick, thanks for the help. I was looking into this mostly for debugging purposes, and for that it should be fine.
-
I tried this because I wanted to analyze the caching behavior in more detail. (Note that I did it today, i.e., not on a weekday, which may influence the access behavior.)
I could identify the expiration handling in the `get` method as the culprit. If I modify this:
This works, but I am not sure about other ramifications. Another strange thing I observed: if, instead of downloading the data directly, I create a ticker and use `history`, then the first ticker gives false and the second gives true (`data` is always true).
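When chasing behavior like this (`download` vs. `Ticker.history` reporting different `from_cache` values), it can help to record every cache decision instead of only the last one. A minimal stdlib sketch; the class and method names are mine, not yfinance's, and in practice `record()` would be called from the session's `get` override:

```python
class CacheProbe:
    """Collects (url, from_cache) events so cache behavior can be analyzed later."""

    def __init__(self):
        self.events = []

    def record(self, url: str, from_cache: bool):
        self.events.append((url, from_cache))

    def hit_rate(self) -> float:
        if not self.events:
            return 0.0
        return sum(1 for _, hit in self.events if hit) / len(self.events)


probe = CacheProbe()
probe.record("https://example.com/chart/AAPL", False)  # first fetch: network
probe.record("https://example.com/chart/AAPL", True)   # repeat: cache hit
print(probe.hit_rate())  # → 0.5
```

Dumping `probe.events` after a run would make it obvious which of the several underlying requests (metadata vs. price history) is missing the cache.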
-
Hey @kschmid, I hope you're doing great. I have been making some adjustments to the class.

### Class Changes

We have now created this function, which returns a regex with a dynamic URL (note that the URL is the same one used by the yfinance API for price data):

```python
def get_dynamic_range_interval(range: str, interval: str):
    return re.compile(
        rf"https://query2\.finance\.yahoo\.com/v8/finance/chart/([a-zA-Z]+)\?range={range}&interval={interval}&includePrePost=False&events=div%2Csplits%2CcapitalGains"
    )
```

We can then use this function to generate `url_patterns` and assign different caching options based on the range-interval combination:

```python
url_patterns = {
    # You may define calculate_market_open as you please.
    get_dynamic_range_interval("1y", "1d"): calculate_market_open(),
}
```

Also, the class is now a singleton:

```python
class SingletonMeta(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            instance = super().__call__(*args, **kwargs)
            cls._instances[cls] = instance
        return cls._instances[cls]


class CachedLimiterSession(
    CacheMixin, LimiterMixin, Session, metaclass=SingletonMeta
):
    def __init__(self, is_test: bool = False, *args, **kwargs):
        super().__init__(
            *args,
            **kwargs,
            limiter=Limiter(RequestRate(2, Duration.SECOND * 5)),
            bucket_class=MemoryQueueBucket,
            backend=SQLiteCache(
                "yfinance.cache" if not is_test else "yfinance_test.cache"
            ),
            methods_allowed=["GET"],
            # Do not cache anything that is not included in url_patterns.
            # This is effectively the same as {..., "*": DO_NOT_CACHE}.
            expire_after=DO_NOT_CACHE,
            urls_expire_after=url_patterns,
        )
        self.is_caching = True
        self.last_request = None

    def get(self, url: str, params=None, **kwargs):
        if self.is_caching:
            res = super().get(url, params=params, **kwargs)
        else:
            with self.cache_disabled():
                res = super().get(url, params=params, **kwargs)
        self.last_request = res
        return res

    def toggle_caching(self):
        self.is_caching = not self.is_caching
```

### Testing

In case you wish to match my setup, `config.py`:

```python
import pathlib

BASE_DIR = pathlib.Path().resolve()
DATABASE_PATH = BASE_DIR / "yfinance.cache"
TEST_DATABASE_PATH = BASE_DIR / "yfinance_test.cache"
```

And the tests:

```python
import os
import unittest

import yfinance as yf

from config import BASE_DIR
from session import CachedLimiterSession


# AAPL default for testing purposes
def download_ticker_data(
    session: CachedLimiterSession | None = None,
    ticker: str = "AAPL",
):
    yf.download(
        ticker,
        period="1y",
        interval="1d",
        session=session,
    )


def delete_test_cache():
    if os.path.isfile(BASE_DIR / "yfinance_test.cache"):
        os.remove(BASE_DIR / "yfinance_test.cache")


class TestCache(unittest.TestCase):
    def test_global_instance(self):
        session = CachedLimiterSession(is_test=True)
        session2 = CachedLimiterSession(is_test=True)
        self.assertEqual(id(session), id(session2))
        delete_test_cache()

    def test_caching(self):
        session = CachedLimiterSession(is_test=True)
        download_ticker_data(session)
        self.assertEqual(session.last_request.from_cache, False)
        download_ticker_data(session)
        self.assertEqual(session.last_request.from_cache, True)
        delete_test_cache()

    def test_toggle_cache(self):
        session = CachedLimiterSession(is_test=True)
        session.toggle_caching()
        self.assertEqual(session.is_caching, False)
        download_ticker_data(session)
        self.assertEqual(session.last_request.from_cache, False)
        download_ticker_data(session)
        self.assertEqual(session.last_request.from_cache, False)
        delete_test_cache()

    def test_url_patterns(self):
        session = CachedLimiterSession(is_test=True)
        session.get("https://www.google.com")
        self.assertEqual(session.last_request.from_cache, False)
        session.get("https://www.google.com")
        self.assertEqual(session.last_request.from_cache, False)
        delete_test_cache()
```
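One caveat with the singleton approach worth noting: after the first instantiation, constructor arguments on later calls are silently ignored, so a later `CachedLimiterSession(is_test=True)` after a non-test instantiation would hand back the original non-test session. A stripped-down demonstration of the same metaclass with a plain class (`CacheConfig` is just an illustrative stand-in):

```python
class SingletonMeta(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class CacheConfig(metaclass=SingletonMeta):
    def __init__(self, db_path: str = "yfinance.cache"):
        self.db_path = db_path


a = CacheConfig()
b = CacheConfig(db_path="yfinance_test.cache")  # args silently ignored: instance already exists
print(a is b)      # → True
print(b.db_path)   # → yfinance.cache
```

If the test and production sessions ever need to coexist, keying `_instances` on the constructor arguments as well (or adding a `reset()` hook for tests) would avoid this trap.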
-
@rickturner2001 @kschmid …
-
Due to the rate limiting of the Yahoo API, there is certainly a need for caching, which is also exemplified by the smart-scraping proposal, which is itself based on caching.
However, it would be very helpful to be able to control, to some extent, the way the cache is handled:

a) Was a specific request served from the cache or from the network? This is part of the response, but as this happens inside yfinance, it can't be checked by the application program. It would be great if this could be made available to the application, e.g., via a call to a `get_last_caching_state` (or similar) that exposes this part of the response.

b) It would be great if a ticker could change its assigned session in order to bypass the cache altogether. Right now, I can only do something like this by creating a second ticker with a different associated session, but that means the initialization of this second ticker will also bypass the cache, unnecessarily increasing the number of requests.
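To make (a) concrete, here is a rough sketch of what such an accessor could look like. This is entirely hypothetical: `get_last_caching_state` does not exist in yfinance, and `FakeResponse` only stands in for a real cached response object:

```python
class FakeResponse:
    """Stand-in for a requests-cache response carrying a from_cache flag."""

    def __init__(self, from_cache: bool):
        self.from_cache = from_cache


class CacheStateSession:
    """Wraps a 'get' callable and remembers whether the last response was cached."""

    def __init__(self, inner_get):
        self._inner_get = inner_get
        self._last_from_cache = None

    def get(self, url: str, **kwargs):
        res = self._inner_get(url, **kwargs)
        # Plain requests responses lack from_cache, so default to False.
        self._last_from_cache = getattr(res, "from_cache", False)
        return res

    def get_last_caching_state(self):
        # None until the first request has been made.
        return self._last_from_cache


responses = iter([FakeResponse(False), FakeResponse(True)])
session = CacheStateSession(lambda url, **kw: next(responses))
session.get("https://example.com/chart/AAPL")
print(session.get_last_caching_state())  # → False (first request hit the network)
session.get("https://example.com/chart/AAPL")
print(session.get_last_caching_state())  # → True (second was served from cache)
```

For (b), requests-cache's `cache_disabled()` context manager (used in the `toggle_caching` session earlier in the thread) already offers per-call bypass without swapping sessions; the missing piece is a yfinance-level switch to reach it.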