- https://lite.qwant.com seems to be dead.
- The request parameters were changed to match the ones from the Qwant website.
- Qwant is now set to inactive by default due to its strict rate limits.
In the result list, ``number_of_results`` indicates the number of hits in the
index; it does not indicate how many results are in the answer.
In the past, search engines such as Google or DDG indicated on the first page
of a search how many hits there were in total for the search term in their
index.
These counts were added up in SearXNG and delivered under ``number_of_results``.
Nowadays the search engines no longer indicate how many hits there are in the
index, so this field in SearXNG has become superfluous.
- https://github.com/searxng/searxng/issues/2457#issuecomment-2566181574
- https://github.com/searxng/searxng/issues/2987
- https://github.com/searxng/searxng/issues/5034
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
[mod] normalize variable name for the max number of results per request
In the past, we have used different names for the variable that specifies the
maximum number of hits in the outgoing request.
- ``page_size``
- ``number_of_results``
- ``nb_per_page``
Since *page_size* is the most accurate term and is also used in the XPath
engines, all other engines are adjusted accordingly in this patch; the
documentation is adjusted as well.
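
A minimal sketch of the convention, assuming a hypothetical engine module.
Only the name ``page_size`` comes from this patch; ``base_url`` and the
``limit`` / ``offset`` query parameters are illustrative and not taken from
any real engine:

```python
from urllib.parse import urlencode

# Hypothetical engine settings; only ``page_size`` is the name this patch
# standardizes on.
base_url = "https://example.org/search"
paging = True
page_size = 10  # maximum number of hits asked for in one outgoing request


def request(query, params):
    """Build the outgoing URL for the result page given in params['pageno']."""
    args = {
        "q": query,
        "limit": page_size,
        "offset": (params["pageno"] - 1) * page_size,
    }
    params["url"] = f"{base_url}?{urlencode(args)}"
    return params
```

With one name everywhere, an admin can override ``page_size`` per engine in
the settings without having to look up which spelling a given engine uses.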
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Changes:
- Setting the "abp" query parameter causes instant blocks; it is no longer
used at Startpage
- The safesearch map changed for both the request form and the cookies.
Because we were sending invalid values, we were also easier to detect
- Detect HTTP 302 responses (Google redirecting to /sorry/index
without the HTTP client following the redirect)
- Detect short HTML responses (<2000 bytes) containing "/sorry/"
links (meta-refresh or JS redirect variants)
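
A rough sketch of the two Google checks, not the actual engine code.
``CaptchaDetected`` and ``FakeResponse`` are hypothetical stand-ins for the
SearXNG exception and the HTTP response object, and checking the ``Location``
header is an assumption about how the 302 case could be recognized:

```python
from dataclasses import dataclass, field


class CaptchaDetected(Exception):
    """Hypothetical stand-in for the engine's CAPTCHA exception."""


@dataclass
class FakeResponse:
    """Hypothetical minimal response: status code, body text and headers."""
    status_code: int
    text: str = ""
    headers: dict = field(default_factory=dict)


SHORT_RESPONSE_LIMIT = 2000  # bytes, the threshold named in the list above


def detect_google_sorry(resp):
    """Raise CaptchaDetected if the response looks like a /sorry/ block."""
    # Case 1: a redirect to /sorry/ that the HTTP client did not follow.
    if resp.status_code == 302 and "/sorry/" in resp.headers.get("Location", ""):
        raise CaptchaDetected("redirected to /sorry/")
    # Case 2: a very short HTML body that only links to /sorry/
    # (meta-refresh or JS redirect variants).
    body_size = len(resp.text.encode("utf-8"))
    if body_size < SHORT_RESPONSE_LIMIT and "/sorry/" in resp.text:
        raise CaptchaDetected("short response links to /sorry/")
```

The size limit keeps regular result pages that merely mention "/sorry/"
somewhere in a long body from being flagged.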
Instances with rotating IPs can set `suspended_times.SearxEngineCaptcha` to
0 in the search settings [1]; the next request will then typically use a
different outgoing IP when rotating proxies are configured.
[1] https://docs.searxng.org/admin/settings/settings_search.html
The type checker in my IDE shut down after more than 500 errors. After this
patch there are still 125 complaints; however, it's an improvement and a better
starting point.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Notes:
- Safesearch doesn't seem to work properly?
- In theory multiple languages are supported, but even in the web UI, they don't work properly
- Possibly, we could cache the request hashes (`h` query parameter); I'm not sure whether they ever change
As a side effect, Cloudscraper is no longer needed.
It probably only ever worked by setting the correct request headers,
so we don't really need it since we can just set the right request
headers and ciphersuites ourselves.
Karmasearch seems to crash when searching with long queries (>= 100 characters).
The returned JSON is exactly `["",[]]`, which causes a crash when trying to
access `resp.json()["results"]`
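
A minimal sketch of a defensive guard for this case. ``extract_results`` is a
hypothetical helper, not the function used in the engine; it takes the already
decoded JSON payload:

```python
def extract_results(payload):
    """Return the list under "results", or [] for unexpected payload shapes."""
    if not isinstance(payload, dict):
        # e.g. the ["", []] payload returned for long queries
        return []
    results = payload.get("results", [])
    # Also guard against "results" being present but not a list.
    return results if isinstance(results, list) else []
```

Called as `extract_results(resp.json())`, the degenerate payload simply yields
no results instead of raising a `TypeError`.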
Close: https://github.com/searxng/searxng/issues/5911
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Unsplash started using [Anubis](https://anubis.techaro.lol/)
for blocking crawlers. Therefore, requests using common
user agents (e.g. Firefox, Chrome) must pass a JavaScript
challenge.
However, other user agents seem unaffected for now, hence
setting the UA to something different still works.
I found a bypass using the Android Google App this time. However, unlike the iPhone GSA method, this one does have rate limits, although it took a couple of hundred consecutive requests to trigger them.
* [enh] engines: rework bing engine
Only Bing-Web has been reworked.
Some features now require JavaScript (paging and time-range results).
Cookies no longer work; parameters such as `cc`, `ui`, ... alter the results.
The engine appears to properly use only the locale from the `Accept-Language` header.
The rest of Bing's child engines (Bing-Image, Bing-Video, ...) seem to benefit
from using `mkt` param in conjunction with the `Accept-Language` header
override, although Bing-Web does not (?)
* [enh] explicit mkt
* [fix] engines: bing_videos.py
https://github.com/searxng/searxng/pull/5793#pullrequestreview-3881883250
Google recently changed the DOM structure for mobile-centric responses, causing the `google_videos` engine to return zero results and the main `google` engine to drop the majority of its results (due to missing snippets or failed URL parsing). These changes restore the functionality and improve the result count for both engines.
This patch updates the parsing logic for both the `google` and `google_videos` engines to handle the modern HTML structure returned by Google when using GSA (Google Search App) User-Agents.
**Specific changes include:**
* **Google Videos (`gov`)**:
    * Updated title XPath to support `role="heading"`.
    * Improved URL extraction to correctly decode Google redirectors (`/url?q=...`) using `unquote`.
    * Added support for the `WRu9Cd` class to capture publication metadata (author/date).
    * Broadened thumbnail search and added a fallback to YouTube's `hqdefault.jpg`.
* **Google Web**:
    * Relaxed the strict snippet (`content`) requirement. Valid results are no longer discarded if a snippet is missing in the mobile UI.
    * Hardened URL extraction to handle both direct and redirected URLs safely.
    * Improved thumbnail extraction by searching the entire result block.
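
A minimal sketch of the redirector handling, assuming `href` is the raw value of a result link. `resolve_google_url` is a hypothetical helper written for illustration, not the function from the patch, and it only covers relative `/url?...` links and direct absolute URLs:

```python
from urllib.parse import unquote


def resolve_google_url(href):
    """Return the target URL for a direct link or a /url?q=... redirector."""
    if href.startswith("/url?"):
        query = href[len("/url?"):]
        for pair in query.split("&"):
            key, _, value = pair.partition("=")
            if key == "q":
                # The target is percent-encoded inside the q parameter.
                return unquote(value)
        return None  # redirector without a q parameter
    if href.startswith(("http://", "https://")):
        return href  # already a direct URL
    return None  # anchors, javascript: links and other unusable values
```

Returning `None` for anything unrecognized lets the caller skip the result rather than emit a broken URL.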