The engine has been using very aggressive Cloudflare blocking for
a while now, whether or not a normal browser like Firefox
is used.
Closes: https://github.com/searxng/searxng/issues/5976
On the first page of the WEB search, there are, among other things, sections for
videos and news. The video results from these sections should not be used as
results in the WEB search of SearXNG.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
PodcastIndex.org started using a Proof-of-Work JavaScript
challenge whose results are sent as `X-Pow-*` request headers.
Although it is technically possible to re-implement the
PoW challenge in Python, it's likely impossible to maintain
because
- the actual Proof-of-Work logic might change very often
- the whole idea of the Proof-of-Work challenge is to use
  a "big" amount of resources (about 1s on my PC), so executing the challenge
  would block almost all other work on the SearXNG instance
At first glance, the challenge looks very similar to what
Anubis does, because it also uses SHA256 hashes.
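
To illustrate why such a challenge is expensive by design, here is a minimal
sketch of a generic hashcash-style SHA256 Proof of Work. It is NOT the
PodcastIndex.org algorithm: the function name `solve_pow`, the challenge
string, and the "leading zero hex digits" rule are assumptions made only for
illustration.

```python
import hashlib


def solve_pow(challenge: str, difficulty: int) -> tuple[int, str]:
    """Brute-force a nonce so that sha256(challenge + nonce) starts with
    `difficulty` zero hex digits.

    Hypothetical scheme for illustration only; the real challenge may use
    a different input format and a different success condition.
    """
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        # Each additional zero digit multiplies the expected work by 16.
        nonce += 1
```

The loop is CPU-bound and cannot be shortened, which is exactly the point of
a Proof of Work and the reason it does not fit well into a shared SearXNG
worker.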
The implementation is normalized, type annotations are added, and the HTML
markup that is partially present in the results is stripped.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- https://lite.qwant.com seems to be dead.
- The request parameters were changed to match the ones from the Qwant website.
- Qwant is now set to inactive by default due to its strict rate limits.
In the result list, ``number_of_results`` indicates the number of hits in the
index; it does not indicate how many results are in the answer.
In the past, search engines such as Google or DuckDuckGo indicated on the first
page of results for a search term how many hits there were for this term in
total in their index.
This info was added up in SearXNG and delivered under ``number_of_results``.
Nowadays the search engines no longer indicate how many hits there are in the
index, so this field in SearXNG is superfluous as well.
- https://github.com/searxng/searxng/issues/2457#issuecomment-2566181574
- https://github.com/searxng/searxng/issues/2987
- https://github.com/searxng/searxng/issues/5034
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
[mod] normalize variable name for the max number of results per request
In the past, we have used different names for the variable that specifies the
maximum number of hits in the outgoing request.
- ``page_size``
- ``number_of_results``
- ``nb_per_page``
Since *page_size* is the most accurate term and is also used in the XPath
engines, all other engines are adjusted accordingly within this
patch. The documentation is adjusted accordingly.
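
A minimal sketch of how an engine module might use the normalized name. The
`base_url` and the upstream query parameters (`limit`, `offset`) are
hypothetical; only the module-level ``page_size`` variable reflects the
convention described above.

```python
from urllib.parse import urlencode

# Hypothetical engine module: URL and upstream parameter names are assumptions.
base_url = "https://example.org/search"
paging = True
page_size = 10  # maximum number of results per outgoing request


def request(query: str, params: dict) -> dict:
    """Build the outgoing request URL for the requested result page."""
    pageno = params.get("pageno", 1)
    args = {
        "q": query,
        "limit": page_size,
        # Pages are 1-based, so page 1 starts at offset 0.
        "offset": (pageno - 1) * page_size,
    }
    params["url"] = f"{base_url}?{urlencode(args)}"
    return params
```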
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Changes:
- Setting the "abp" query parameter causes instant blocks; it's no longer
used at Startpage
- The safesearch map changed for both the request form and the cookies. As
we were sending invalid values, that also made it easier to detect us
- Detect HTTP 302 responses (Google redirecting to /sorry/index
without the HTTP client following the redirect)
- Detect short HTML responses (<2000 bytes) containing "/sorry/"
links (meta-refresh or JS redirect variants)
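
The two detection cases above can be sketched roughly as follows. This is not
the code from the patch: `FakeResponse`, `CaptchaDetected` (a stand-in for
SearXNG's captcha exception), and the check of the `Location` header are
assumptions made to keep the example self-contained.

```python
from dataclasses import dataclass, field


class CaptchaDetected(Exception):
    """Stand-in for the engine's captcha exception (hypothetical name)."""


@dataclass
class FakeResponse:
    """Minimal response object with just the attributes used below."""
    status_code: int
    text: str = ""
    headers: dict = field(default_factory=dict)


def detect_google_sorry(resp) -> None:
    """Raise CaptchaDetected if the response looks like a /sorry/ block."""
    # Case 1: the HTTP client did not follow the redirect to /sorry/index.
    # Inspecting the Location header is an assumption of this sketch.
    if resp.status_code == 302 and "/sorry/" in resp.headers.get("Location", ""):
        raise CaptchaDetected("redirect to /sorry/")

    # Case 2: a short HTML body (<2000 bytes) that links to /sorry/,
    # e.g. a meta-refresh or a JS redirect.
    body = resp.text
    if len(body.encode("utf-8")) < 2000 and "/sorry/" in body:
        raise CaptchaDetected("short response containing /sorry/ link")
```

The size limit keeps regular result pages, which are far larger and might
mention "/sorry/" for unrelated reasons, from being flagged.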
Instances with rotating IPs can set `suspended_times.SearxEngineCaptcha` to
0 in the search settings [1]; the next request will then typically use a
different outgoing IP when rotating proxies are configured.
[1] https://docs.searxng.org/admin/settings/settings_search.html
The type checker in my IDE shut down after more than 500 errors. After this
patch there are still 125 complaints; however, it's an improvement and a better
starting point.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Notes:
- Safesearch doesn't seem to work properly?
- In theory multiple languages are supported, but even in the web UI, they don't work properly
- Possibly, we could cache the request hashes (h query parameter); I'm not sure if they ever change