Karmasearch seems to crash when searching with long queries (100 characters or more).
The returned JSON is then exactly `["",[]]`, which crashes the engine when it tries
to access `resp.json()["results"]`.
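A minimal sketch of a defensive parse. It assumes that a normal Karmasearch response is a JSON object with a `results` key. `parse_results` is a hypothetical helper and not the engine's actual code. It also shows why the crash happens: indexing the list `["",[]]` with `"results"` raises a `TypeError`.

```python
import json
from typing import Any, List


def parse_results(raw_body: str) -> List[Any]:
    """Return the list under "results", or [] for a non-object payload.

    Assumption: a normal response is a JSON object with a "results" key,
    while long queries yield the bare list ["",[]] instead.
    """
    data = json.loads(raw_body)
    if not isinstance(data, dict):
        # e.g. ["",[]]: data["results"] would raise TypeError on a list
        return []
    return data.get("results", [])


print(parse_results('["",[]]'))             # []
print(parse_results('{"results": ["a"]}'))  # ['a']
```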
Close: https://github.com/searxng/searxng/issues/5911
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Unsplash started using [Anubis](https://anubis.techaro.lol/)
for blocking crawlers. Therefore, requests using common
user agents (e.g. Firefox, Chrome) must pass a JavaScript
challenge.
However, other user agents seem unaffected for now, hence
setting the UA to something different still works.
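A sketch of how an engine's `request()` could override the User-Agent. Both the endpoint and the UA string below are assumptions for illustration. The real engine may use different values, and the workaround relies on Anubis continuing to ignore non-browser UAs.

```python
from urllib.parse import urlencode

# Assumed search endpoint; check the engine module for the real one.
SEARCH_URL = "https://unsplash.com/napi/search/photos?"
# Hypothetical non-browser User-Agent: anything that does not look like
# Firefox/Chrome is assumed to skip the Anubis challenge (for now).
NON_BROWSER_UA = "searxng-unsplash-example/1.0"


def request(query, params):
    params["url"] = SEARCH_URL + urlencode({"query": query, "page": params["pageno"]})
    # Replace the browser-like UA that would otherwise trigger the challenge.
    params["headers"]["User-Agent"] = NON_BROWSER_UA
    return params
```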
I found a bypass using the Android Google App this time. However, unlike the iPhone GSA method, this one does have rate limits, although it took a couple of hundred consecutive requests to trigger them.
* [enh] engines: rework bing engine
Only Bing-Web has been reworked.
Some features now require JavaScript (paging and time-range results).
Cookies no longer work; parameters such as `cc`, `ui`, ... alter the results.
The only locale source the engine appears to honour properly is the `Accept-Language` header.
The rest of Bing's child engines (Bing-Image, Bing-Video, ...) seem to benefit
from using the `mkt` parameter in conjunction with the `Accept-Language` header
override, although Bing-Web does not seem to (?)
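A sketch of the "`mkt` plus `Accept-Language`" combination for a child engine. `bing_child_request` is a hypothetical helper, and the images URL is an assumed example. The point is simply that the same region tag goes into both the query string and the header.

```python
from urllib.parse import urlencode


def bing_child_request(base_url: str, query: str, locale: str):
    """Build URL and headers for a Bing child engine (images, videos, ...).

    `locale` is a region tag such as "de-DE". It is sent both as the `mkt`
    query parameter and as the preferred language in Accept-Language.
    """
    language = locale.split("-")[0]
    url = base_url + "?" + urlencode({"q": query, "mkt": locale})
    headers = {"Accept-Language": f"{locale},{language};q=0.9"}
    return url, headers


url, headers = bing_child_request("https://www.bing.com/images/search", "alps", "de-DE")
print(url)      # https://www.bing.com/images/search?q=alps&mkt=de-DE
print(headers)  # {'Accept-Language': 'de-DE,de;q=0.9'}
```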
* [enh] explicit mkt
* [fix] engines: bing_videos.py
https://github.com/searxng/searxng/pull/5793#pullrequestreview-3881883250
Google recently changed the DOM structure for mobile-centric responses, causing the `google_videos` engine to return zero results and the main `google` engine to drop the majority of its results (due to missing snippets or failed URL parsing). These changes restore the functionality and improve the result count for both engines.
This patch updates the parsing logic for both the `google` and `google_videos` engines to handle the modern HTML structure returned by Google when using GSA (Google Search App) User-Agents.
**Specific changes include:**
* **Google Videos (`gov`)**:
  * Updated title XPath to support `role="heading"`.
  * Improved URL extraction to correctly decode Google redirectors (`/url?q=...`) using `unquote`.
  * Added support for the `WRu9Cd` class to capture publication metadata (author/date).
  * Broadened thumbnail search and added a fallback to YouTube's `hqdefault.jpg`.
* **Google Web**:
  * Relaxed the strict snippet (`content`) requirement. Valid results are no longer discarded if a snippet is missing in the mobile UI.
  * Hardened URL extraction to handle both direct and redirected URLs safely.
  * Improved thumbnail extraction by searching the entire result block.
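A sketch of two of the steps above: resolving direct vs. redirected (`/url?q=...`) links, and building a YouTube `hqdefault.jpg` fallback. The helper names are hypothetical, and the `i.ytimg.com` URL layout is an assumption. The patch itself uses `unquote`. Here `parse_qs` does the equivalent percent-decoding, and it additionally turns `+` into a space.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com")


def extract_target_url(href: str) -> Optional[str]:
    """Resolve a result link that may be direct or a Google redirector."""
    if href.startswith("/url?"):
        # parse_qs percent-decodes values (like unquote); '+' becomes a space
        target = parse_qs(urlparse(href).query).get("q")
        return target[0] if target else None
    if href.startswith(("http://", "https://")):
        return href
    return None


def youtube_thumbnail_fallback(video_url: str) -> Optional[str]:
    """Fallback thumbnail for YouTube watch URLs (assumed i.ytimg.com layout)."""
    parsed = urlparse(video_url)
    if parsed.netloc in YOUTUBE_HOSTS:
        video_id = parse_qs(parsed.query).get("v")
        if video_id:
            return f"https://i.ytimg.com/vi/{video_id[0]}/hqdefault.jpg"
    return None


url = extract_target_url("/url?q=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DabcDEF12345&sa=U")
print(url)                              # https://www.youtube.com/watch?v=abcDEF12345
print(youtube_thumbnail_fallback(url))  # https://i.ytimg.com/vi/abcDEF12345/hqdefault.jpg
```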
The online engines emulate a request as it would come from a web browser, which
is why the HTTP headers in the default settings should also be set the way a
standard web browser would set them.
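A sketch of what "browser-like defaults plus per-engine overrides" can look like. The header values are hypothetical examples, not the values SearXNG ships. Because HTTP header names are case-insensitive, the merge normalises them so that an override such as `accept-language` replaces the default instead of duplicating it.

```python
# Hypothetical browser-like defaults; the real defaults live in SearXNG's
# settings and are not reproduced here.
BROWSER_LIKE_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
}


def merge_headers(defaults, overrides):
    """Apply engine-specific overrides on top of the defaults.

    Names are normalised with str.title(), so "accept-language" and
    "Accept-Language" end up as the same key.
    """
    merged = {name.title(): value for name, value in defaults.items()}
    for name, value in overrides.items():
        merged[name.title()] = value
    return merged
```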
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Add support for albums containing multiple videos in the iqiyi engine. When
albumInfo contains a "videos" list, process each video individually to
create separate search results for each episode/video instead of a single
result for the entire album.
Also get video length from `duration` instead of `subscriptContent`.
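A sketch of the per-video expansion described above. Apart from `videos` and `duration`, which the commit names, the field names (`title`, `pageUrl`) are hypothetical. The real iqiyi payload may differ.

```python
from typing import Any, Dict, List


def _to_result(item: Dict[str, Any], album_title: str) -> Dict[str, Any]:
    return {
        # "title" and "pageUrl" are hypothetical field names
        "title": item.get("title") or album_title,
        "url": item.get("pageUrl"),
        # the video length is read from `duration`, as the commit describes
        "length": item.get("duration"),
    }


def album_to_results(album_info: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one result per video if the album has a "videos" list, else one result."""
    album_title = album_info.get("title", "")
    videos = album_info.get("videos")
    if isinstance(videos, list) and videos:
        return [_to_result(video, album_title) for video in videos]
    return [_to_result(album_info, album_title)]
```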
Signed-off-by: Hu Butui <hot123tea123@gmail.com>
For unknown locales, the return value of::

    locales.get_locale(params['searxng_locale'])

is ``None``, which causes the following error::

    ERROR searx.engines.presearch : exception : 'NoneType' object has no attribute 'territory'
    Traceback (most recent call last):
      File "search/processors/online.py", line 256, in search
        search_results = self._search_basic(query, params)
      File "search/processors/online.py", line 231, in _search_basic
        self.engine.request(query, params)
        ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
      File "engines/presearch.py", line 153, in request
        request_id, cookies = _get_request_id(query, params)
                              ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
      File "engines/presearch.py", line 140, in _get_request_id
        if l.territory:
           ^^^^^^^^^^^
    AttributeError: 'NoneType' object has no attribute 'territory'
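A minimal sketch of a None-safe lookup. It assumes that falling back to a default territory is acceptable. `territory_or_default` and `DEFAULT_TERRITORY` are hypothetical, and the actual fix may handle unknown locales differently.

```python
from typing import Any, Optional

# Hypothetical fallback territory.
DEFAULT_TERRITORY = "US"


def territory_or_default(locale: Optional[Any]) -> str:
    """Return locale.territory, tolerating the None returned for unknown locales."""
    if locale is not None and getattr(locale, "territory", None):
        return locale.territory
    return DEFAULT_TERRITORY
```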
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Apparently, yep has been broken for a while. Measures to fix it:
- only use HTTP/1.1, because our HTTP/2 client gets fingerprinted and blocked
- send the `Origin` HTTP header
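A sketch of the second measure in an engine's `request()`. The endpoint and origin values are assumptions for illustration. The first measure (HTTP/1.1 only) is a transport-level setting of the engine's network client, so it is not shown here.

```python
from urllib.parse import urlencode

# Assumed endpoint and origin; check the engine module for the real values.
SEARCH_URL = "https://api.yep.com/fs/2/search?"
ORIGIN = "https://yep.com"


def request(query, params):
    params["url"] = SEARCH_URL + urlencode({"q": query})
    # Send the Origin header, as described in the fix above.
    params["headers"]["Origin"] = ORIGIN
    return params
```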
For some reason, I keep getting this error from the brave engine:
httpx.DecodingError: BrotliDecoderDecompressStream failed while processing the stream
Forcing the server to use either gzip or deflate fixes this issue.
This makes the brave engine work when the server appears to encode Brotli incorrectly, or at least in a way that is incompatible with certain installations.
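A sketch of the workaround: advertise only `gzip` and `deflate` so that the server has no reason to answer with Brotli. The constant name is hypothetical. The surrounding request code is a simplified stand-in for the real engine.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://search.brave.com/search?"
# Leave "br" out so the server answers with gzip or deflate instead of Brotli.
ACCEPT_ENCODING_NO_BROTLI = "gzip, deflate"


def request(query, params):
    params["url"] = SEARCH_URL + urlencode({"q": query})
    params["headers"]["Accept-Encoding"] = ACCEPT_ENCODING_NO_BROTLI
    return params
```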
Related:
- https://github.com/searxng/searxng/pull/1787
- https://github.com/searxng/searxng/pull/5536