Exceptions raised while executing the callback must be caught, ignored, and
logged at the ERROR level.
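A minimal sketch of the intended behavior, not the actual code in
``searx.result_types``: the helper name ``safe_call``, the logger name and the
fallback value (``True``, i.e. "use URL unchanged") are assumptions made for
illustration only.

```python
import logging

# Hypothetical logger name, chosen for this example only.
logger = logging.getLogger("example.result_types")


def safe_call(callback, url, field_name):
    """Run ``callback(url)`` and never let its exceptions escape.

    On success the callback's return value is passed through.  On failure the
    exception is logged at ERROR level and ``True`` is returned, which is
    assumed here to mean "use the URL unchanged".
    """
    try:
        return callback(url)
    except Exception as exc:  # deliberately broad: a callback must not break the result
        logger.error(
            "filter_urls (field %r): ignore %r from callback %s",
            field_name,
            exc,
            getattr(callback, "__name__", repr(callback)),
        )
        return True
```

With such a wrapper, a callback that raises ``ValueError`` only leaves a line in
the ERROR log, and the result is processed as if the callback had accepted the
URL.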
To test, apply this patch to provoke a ValueError exception::

    diff --git a/searx/data/tracker_patterns.py b/searx/data/tracker_patterns.py
    index ed4415bce..695ed05d2 100644
    --- a/searx/data/tracker_patterns.py
    +++ b/searx/data/tracker_patterns.py
    @@ -114,6 +114,7 @@ class TrackerPatternsDB:
             Returns bool ``True`` to use URL unchanged (``False`` to ignore URL).
             If URL should be modified, the returned string is the new URL to use.
             """
    +        raise ValueError("test callback exceptions")
             new_url = url
             parsed_new_url = urlparse(url=new_url)

Start a `make run` instance, query for example `amazon`, and have a look at the
ERROR log:
ERROR searx.result_types: filter_urls (field 'url'): ignore ValueError('test callback exceptions') from callback searx/data/tracker_patterns.py:117
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The implementation is normalized, type annotations are applied, and the results
are stripped of the HTML markup that some of them contain.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- https://lite.qwant.com seems to be dead.
- The request parameters were changed to match the ones from the Qwant website.
- Qwant is now set to inactive by default due to its strict rate limits.
In the result list, ``number_of_results`` indicates the number of hits in the
index; it does not indicate how many results are in the answer.
In the past, search engines such as Google or DuckDuckGo indicated on the first
results page for a search term how many hits there were for this term in total
in their index.
This info was added up in SearXNG and delivered under ``number_of_results``.
Nowadays the search engines no longer indicate how many hits there are in the
index and so this field in SearXNG is also superfluous.
- https://github.com/searxng/searxng/issues/2457#issuecomment-2566181574
- https://github.com/searxng/searxng/issues/2987
- https://github.com/searxng/searxng/issues/5034
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
[mod] normalize variable name for the max number of results per request
In the past, we have used different names for the variable that specifies the
maximum number of hits in the outgoing request.
- ``page_size``
- ``number_of_results``
- ``nb_per_page``
Since *page_size* is the most accurate term and is also used in the XPath
engines, all other engines are adjusted accordingly in this patch; the
documentation is adjusted as well.
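To illustrate the naming convention, here is a sketch of how an engine module
might use ``page_size`` when building its outgoing request. The endpoint and the
query parameter names (``count``, ``offset``) are hypothetical and do not
belong to any real engine; only the variable name ``page_size`` is taken from
this patch.

```python
from urllib.parse import urlencode

base_url = "https://search.example.org/api"  # hypothetical endpoint
page_size = 10  # maximum number of results per outgoing request


def request(query, params):
    """Build the outgoing request URL for ``query`` and the requested page."""
    # Pages are counted from 1, offsets from 0.
    offset = (params["pageno"] - 1) * page_size
    args = {"q": query, "count": page_size, "offset": offset}
    params["url"] = f"{base_url}?{urlencode(args)}"
    return params
```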
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Changes:
- Setting the "abp" query parameter causes instant blocks; it is no longer
used at Startpage.
- The safesearch map changed for both the request form and the cookies. As
we were sending invalid values, that also made it easier to detect us.
I've been profiling the `/preferences` endpoint using werkzeug's
`ProfilerMiddleware` (i.e. just do `app.wsgi_app = ProfilerMiddleware(app.wsgi_app)`)
and looking at the output in the terminal when doing `make run`.
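For readers who want to see what such a middleware does, here is a
stdlib-only sketch of the same idea. It is not werkzeug's implementation: the
class `ProfilingMiddleware`, its `limit` parameter and the `demo_app` function
are made up for this example and merely wrap `cProfile` around a WSGI app.

```python
import cProfile
import io
import pstats


class ProfilingMiddleware:
    """Hypothetical WSGI middleware that profiles every request with cProfile."""

    def __init__(self, wsgi_app, limit=10):
        self.wsgi_app = wsgi_app
        self.limit = limit  # number of entries shown in the report
        self.last_report = ""

    def __call__(self, environ, start_response):
        profiler = cProfile.Profile()
        # Consume the response inside the profiler so that lazily generated
        # bodies are measured as well.
        body = profiler.runcall(
            lambda: list(self.wsgi_app(environ, start_response))
        )
        stream = io.StringIO()
        stats = pstats.Stats(profiler, stream=stream)
        stats.sort_stats("cumulative").print_stats(self.limit)
        self.last_report = stream.getvalue()
        print(self.last_report)  # shows up in the terminal running the app
        return body


def demo_app(environ, start_response):
    """Tiny WSGI app used only to exercise the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"preferences"]
```

Wrapping works the same way as in the one-liner above, e.g.
`app = ProfilingMiddleware(demo_app)`; sorting by cumulative time is what makes
a hot spot such as slow locale parsing stand out at the top of the report.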
It turns out that more than 95% of the time was spent inside babel's
Locale parsing (> 700ms on my machine). That's because, when opening the settings,
we loaded the full engine traits of each engine and checked whether they match
the user-defined search language. As we have 250+ engines, and babel is
very slow when parsing Locales, this took a very long time.
By removing this feature that shows whether the selected search language
is supported by the engine, the load time went down from 800ms to 50ms
on my machine (which is still very slow, but well, that's future work on
optimizing).
- Detect HTTP 302 responses (Google redirecting to /sorry/index
without the HTTP client following the redirect)
- Detect short HTML responses (<2000 bytes) containing "/sorry/"
links (meta-refresh or JS redirect variants)
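The two checks can be sketched roughly as follows. This is an illustration
under assumptions, not the engine's actual code: the function name
`looks_like_captcha` and its signature are hypothetical, and treating every
302 as a block is a simplification of the first bullet.

```python
SHORT_RESPONSE_LIMIT = 2000  # bytes, threshold taken from the description above


def looks_like_captcha(status_code, body):
    """Return True if the response looks like a redirect to the /sorry/ page.

    ``status_code`` is the HTTP status of the response and ``body`` its
    decoded HTML text.
    """
    # Case 1: redirect that the HTTP client did not follow.
    if status_code == 302:
        return True
    # Case 2: short HTML page that only carries a meta-refresh or JS redirect
    # pointing at a "/sorry/" URL.
    if len(body.encode("utf-8")) < SHORT_RESPONSE_LIMIT and "/sorry/" in body:
        return True
    return False
```

The size limit keeps regular result pages, which may legitimately mention
"/sorry/" somewhere in their markup, from being flagged.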
Instances with rotating IPs can set `suspended_times.SearxEngineCaptcha` to
0 in the search settings [1]; the next request will then typically use a different
outgoing IP when rotating proxies are configured.
[1] https://docs.searxng.org/admin/settings/settings_search.html
This PR moves the `iframe` logic into a macro, so that both `videos.html` and `general.html` can benefit from the workaround by @return42 that fixes YouTube results (https://github.com/searxng/searxng/pull/5858).
It also fixes a regression caused by https://github.com/searxng/searxng/pull/5858: only YouTube videos contained the closing `>` after `<iframe border="0" ...`.
## Why is this change important?
Currently, the page breaks if there's any non-YouTube iframe: the page ends in the middle of the results, and the footer and page number selector are not visible.
The type checker in my IDE shut down after over 500 errors. After this
patch there are still 125 complaints; however, it's an improvement and a better
starting point.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The bug was introduced in commit 8769b7c6d (typification of result items); this
patch fixes the bug and also addresses the peculiarity that fields can be set
but contain no *usable* value:
If a field is set (exists) but contains an empty string or the value ``None``,
it is also considered *not set*. This also ensures that an integer 0 is
evaluated *as set*!
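The rule can be expressed as a small predicate. This is only a sketch; the
function name ``is_set`` is hypothetical and not taken from the patch:

```python
def is_set(value):
    """Return True if ``value`` counts as *set*.

    ``None`` and the empty string are treated as *not set*; every other
    value, including the integer ``0``, counts as *set*.
    """
    return value is not None and value != ""
```

A plain truthiness test (``if value:``) would not work here, because it would
also treat ``0`` as *not set*.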
Co-Authored: Markus Heiser <markus.heiser@darmarit.de>