I am writing a web-scraper library with libcpr that cycles random proxies and headers on a GET request. My intended behavior is for other programs to call force_connect() to perform different GET requests, while maintaining some information between all function calls (e.g., proxy and browser header variables).
When I perform a single GET request to a URL (e.g., URL 1), everything works correctly;
If I perform multiple GET requests to the same URL (URL 1), everything works correctly;
If I perform a single GET request to a URL (URL 1), then perform another GET request to a new URL (e.g., URL 2), the second Session.Get() call returns a response from URL 1 instead of URL 2.
This behavior can be verified with the last line of code (cout << url << " " << r.url << endl;). This prints both the url passed to the function, and the url used in the GET request.
This behavior remains whether I re-use a session object, create a new session object (i.e., remove 'static'), or omit the session and call cpr::Get() directly (though under the hood these seem similar, as cpr::Get() goes through a Session).
My program uses many static variables because I want to keep allocated memory alive between force_connect() calls; even if I remove the static qualifiers and re-declare the variables, I encounter the same issue.
There are a lot of commented out code lines; these are potential solutions I tried (and failed with).
I am unsure why I am encountering this behavior; when I print 'session.GetFullRequestUrl()', it prints the PROPER url (URL 2) which is even stranger (it means part of the session object is updating and part of it is not).
Example/How to Reproduce
string force_connect(string url, int tries){
// ... (IP, header vars, objects defined, other extraneous code here)
static thread_local string proxy = "socks5h://" + info.creds + '@' + IP + ":1080";
// Initialize session() object, set URL
static thread_local cpr::Session session;
// session.SetUrl(url); // Commented out: using string url instead of cpr::Url in SetUrl()
// session.SetOption(url);
static thread_local cpr::Url CURL;
CURL = cpr::Url{url};
session.SetUrl(CURL);
// Assign proxy and headers to Session() object
static thread_local cpr::Header header;
header = cpr::Header{{"user-agent", hdr}};
session.SetProxies({{"http", proxy}, {"https", proxy}});
session.SetHeader(header);
// Perform Get request
// cout << session.GetFullRequestUrl() << endl; // This updates properly and prints the PROPER url (i.e. URL 2)
// static thread_local cpr::Response r = cpr::Get(cpr::Url{url}, header, cpr::Proxies{{"http", proxy}, {"https", proxy}});
static thread_local cpr::Response r = session.Get();
cout << url << " " << r.url << endl; // url SHOULD be = r.url, but r.url is not updating (i.e., URL 1)
}
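One thing worth checking in the snippet above is the initializer on the static thread_local Response. In C++, a static (or thread_local) local variable with an initializer is initialized only once per thread, on the first call; later calls skip the initializer and reuse the stored value. The sketch below illustrates that language rule without libcpr. fake_get() is a hypothetical stand-in for session.Get(), and both fetch_* functions are made up for illustration; whether this fully explains the behavior reported here is an assumption, not a confirmed diagnosis.

```cpp
#include <cassert>  // only needed for the assertions in the usage note
#include <string>

// Hypothetical stand-in for session.Get(): the result depends on the URL.
std::string fake_get(const std::string& url) { return "response from " + url; }

// Same pattern as `static thread_local cpr::Response r = session.Get();`.
// The initializer runs only on the first call in each thread.
std::string fetch_with_static_init(const std::string& url) {
    static thread_local std::string r = fake_get(url);
    return r;
}

// Declaring first and assigning afterwards keeps the storage alive
// between calls but refreshes the value on every call.
std::string fetch_with_assignment(const std::string& url) {
    static thread_local std::string r;
    r = fake_get(url);
    return r;
}
```

Calling fetch_with_static_init("URL1") and then fetch_with_static_init("URL2") returns "response from URL1" both times, while fetch_with_assignment() returns a fresh value for each URL. The same rule applies to the static thread_local proxy string, which would also be built only once per thread.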
Possible Fix
cpr::Session::SetUrl(const Url& url) takes the passed cpr::Url object and assigns it to the private member 'url_'.
It sets correctly initially (that's how it reaches URL 1), but refuses to update when the same object pointer (or an entirely new one) is passed. Even when a new session and/or cpr::url object is created, I still encounter this behavior.
Looking into the Session.Get() code, it appears the underlying call is to curl_easy_perform(), which reads the URL from a libcurl option (curl_easy_setopt(curl, CURLOPT_URL, url_.c_str())) that was set in Session::prepareCommon().
I don't know why Session.url_ is not updating; maybe it is, and something is wrong in libcurl's code. I can't check with a debugger because this library is meant for my main program, which is written in Python, and the class member is private.
Either a modification-check or a copy-by-value approach could be potential solutions.
Where did you get it from?
Other (specify in "Additional Context/Your Environment")
Additional Context/Your Environment
OS: MacOS Ventura 13.1
Version: 1.10.5
Package-Manager / installation method: Homebrew
@kareemrt thanks for reporting!
Based on a quick test, this looks like a multithreading issue. Perhaps not everything is declared thread_local, or it might be an issue with how we create curl objects. Could you try comparing the pointers of *session.GetCurlHolder() to check whether they are actually different?
For reference, the following works in a single-threaded scenario: