16 Commits

Author SHA1 Message Date
Bnyro a29cda858c [feat] engines: add luxxle (general, news, images, videos)
Add support for https://luxxle.com

Localization is not yet supported because it doesn't seem to work on their
website either, no matter which language I select, it only returns English web
results
2026-06-13 20:39:31 +02:00
Bnyro 2e10a2f614 [feat] engines: add rawweb engine (foss, hand-indexed blogs) (#6234)
RawWeb is a search engine for personal websites / blog posts.
It has its own index and the personal websites were selected
by hand. Results are quite good for what it is imo. [^1]

[^1]: https://github.com/0x2E/RawWeb.org
2026-06-13 19:09:58 +02:00
Bnyro 2100eb04e1 [feat] engines: add reloado engine (general, german) (#6233)
- adds support for https://reloado.com (german)
- as it has its own index, the results are hit or miss and mostly German, 
  but still worth integrating imo
2026-06-13 19:06:18 +02:00
Bnyro c58391d673 [feat] engines: add fastbot engine (general) (#6232)
- adds support for https://fastbot.de
- the results are really fast and mostly in English (even though it's a German
  engine)
2026-06-13 19:04:39 +02:00
Bnyro c3284c8238 [chore] make data.traits (#6211) 2026-06-13 18:37:57 +02:00
Bnyro 290d3e0c6a [feat] engines: add privacywall engine (#6211)
- add https://privacywall.org support
- the engine seems to use the Bing index, but not 100% sure
- it claims to be privacy friendly, but it's not really by itself [1]

[1]: https://discuss.privacyguides.net/t/how-is-privacy-wall-search-engine/29486
2026-06-13 18:37:57 +02:00
Bnyro 0608dfa4d1 [feat] autocomplete: add privacywall autocompleter (#6211) 2026-06-13 18:37:57 +02:00
Bnyro 1184b3212f [feat] engines: add podchaser podcast engine (#6202)
- add podchaser podcast engine
- the motivation is that podcastindex had to be removed, see #6140
2026-06-13 18:04:21 +02:00
Bnyro 65e0e4c069 [feat] engines: add vuhuv engine (#6196) 2026-06-13 17:52:43 +02:00
Bnyro d14fa1f6e2 [chore] data: add resulthunter engine traits 2026-06-13 17:21:52 +02:00
Bnyro 2d248704fa [feat] engines: add resulthunter 2026-06-13 17:21:52 +02:00
Markus Heiser 3096b1218f [mod] add type definitions for engine's "about" section (#6231)
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2026-06-13 17:05:59 +02:00
Bnyro 82a8a90230 [feat] engines: add abcnyheter engine (general, norway) (#6231)
Add support for https://startsiden.abcnyheter.no, a netherlandish search engine
that probably uses Google or Bing? idk it also returns English results, but
e.g. ``test`` returns mostly results from netherlands.
2026-06-13 17:05:59 +02:00
Bnyro e3d4fbe570 [feat] engines: add s1search general engine (#6186)
S1Search provides various different search services, which all seem
to be somewhat based on Google and Yahoo. The site looks kinda suspicious,
but the results are fine.

You can find a list of their engines by using a subdomain finder like
https://web-toolbox.dev/en/tools/subdomain-lookup and search for `s1search.co`.
2026-06-13 14:18:04 +02:00
Bnyro 031747f29e [feat] engines: add chatnoir general engine (#6183)
Chatnoir is an open source search engine developed by universities, based on
CommonCrawl (and others).  It's uncommented by default - we don't want to
overload the universities with bot traffic that targets SearXNG (sad truth why
we can't have nice things anymore)
2026-06-13 13:52:01 +02:00
Markus Heiser e3bd7f5df1 [mod] image results: add list of alternative formats (#6153)
* [mod] template images.html: reformatted for readability (no func change)

In preparation for upcoming changes, the template is being reformatted for
better readability; no functional changes are being made.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

* [mod] image results: add list of alternative formats

To test alternatives formats apply patch from below, query ``!flaticon bmw`` and
open the detail view for the image.

    diff --git a/searx/engines/flaticon.py b/searx/engines/flaticon.py
    index 06b6a8e25..d88388705 100644
    --- a/searx/engines/flaticon.py
    +++ b/searx/engines/flaticon.py
    @@ -8,7 +8,7 @@ from urllib.parse import urlencode

     import typing as t

    -from searx.result_types import EngineResults
    +from searx.result_types import EngineResults, ImageRef

     if t.TYPE_CHECKING:
         from searx.extended_types import SXNG_Response
    @@ -61,6 +61,14 @@ def response(resp: "SXNG_Response"):
                     thumbnail_src=_fix_url(result["png"]),
                     img_src=_fix_url(result["png512"]),
                     author=result["team_name"],
    +                formats=[
    +                    ImageRef(label="PNG 100x100", url="https://example.org/test.png", subtype="png"),
    +                    ImageRef(label="SVG", url="https://example.org/test.svg", subtype="svg+xml"),
    +                    ImageRef(url="https://example.org/test.jpg", subtype="jpeg"),
    +                    ImageRef(url="https://example.org/test.bmp", subtype="bmp"),
    +                    ImageRef(url="https://example.org/test.ico", subtype="x-icon"),
    +                    ImageRef(url="https://example.org/test.tif", subtype="tiff"),
    +                ],
                 )
             )

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

---------

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2026-06-13 13:28:05 +02:00
19 changed files with 1850 additions and 50 deletions
+1
View File
@@ -43,6 +43,7 @@
- ``google``
- ``mwmbl``
- ``naver``
- ``privacywall``
- ``quark``
- ``qwant``
- ``seznam``
+1 -1
View File
@@ -87,7 +87,7 @@ Parameters
``autocomplete`` : default from :ref:`settings search`
[ ``google``, ``dbpedia``, ``duckduckgo``, ``mwmbl``, ``startpage``,
``wikipedia``, ``swisscows``, ``qwant`` ]
``privacywall``, ``wikipedia``, ``swisscows``, ``qwant`` ]
Service which completes words as you type.
+18
View File
@@ -179,6 +179,23 @@ def naver(query: str, _sxng_locale: str) -> list[str]:
return results
def privacywall(query: str, sxng_locale: str) -> list[str]:
# Privacywall search autocompleter
country = None
if "-" in sxng_locale:
country = sxng_locale.split("-")[1]
args = {'q': query, 'cc': country}
url = f"https://www.privacywall.org/search/secure/suggestions.php?{urlencode(args)}"
response = get(url)
if not response.ok:
return []
data: list[list[str]] = response.json()
return data[1]
def qihu360search(query: str, _sxng_locale: str) -> list[str]:
# 360Search search autocompleter
url = f"https://sug.so.360.cn/suggest?{urlencode({'format': 'json', 'word': query})}"
@@ -361,6 +378,7 @@ backends: dict[str, t.Callable[[str, str], list[str]]] = {
'google': google_complete,
'mwmbl': mwmbl,
'naver': naver,
'privacywall': privacywall,
'quark': quark,
'qwant': qwant,
'seznam': seznam,
+466 -1
View File
@@ -6634,6 +6634,255 @@
},
"regions": {}
},
"privacywall": {
"all_locale": null,
"custom": {},
"data_type": "traits_v1",
"languages": {},
"regions": {
"bg-BG": "BG",
"cs-CZ": "CZ",
"da-DK": "DK",
"de-AT": "AT",
"de-BE": "BE",
"de-CH": "CH",
"de-DE": "DE",
"de-LI": "LI",
"de-LU": "LU",
"el-CY": "CY",
"el-GR": "GR",
"en-AU": "AU",
"en-CA": "CA",
"en-GB": "GB",
"en-HK": "HK",
"en-IE": "IE",
"en-IN": "IN",
"en-MT": "MT",
"en-NZ": "NZ",
"en-PH": "PH",
"en-SG": "SG",
"en-US": "US",
"es-AR": "AR",
"es-CL": "CL",
"es-CO": "CO",
"es-ES": "ES",
"es-MX": "MX",
"es-PE": "PE",
"es-VE": "VE",
"et-EE": "EE",
"fi-FI": "FI",
"fil-PH": "PH",
"fr-BE": "BE",
"fr-CA": "CA",
"fr-CH": "CH",
"fr-FR": "FR",
"fr-LU": "LU",
"ga-IE": "IE",
"gsw-CH": "CH",
"gsw-LI": "LI",
"hi-IN": "IN",
"hr-HR": "HR",
"hu-HU": "HU",
"id-ID": "ID",
"it-CH": "CH",
"it-IT": "IT",
"ja-JP": "JP",
"ko-KR": "KR",
"lb-LU": "LU",
"lt-LT": "LT",
"lv-LV": "LV",
"mi-NZ": "NZ",
"ms-MY": "MY",
"ms-SG": "SG",
"mt-MT": "MT",
"nb-NO": "NO",
"nl-BE": "BE",
"nl-NL": "NL",
"nn-NO": "NO",
"pl-PL": "PL",
"pt-BR": "BR",
"pt-PT": "PT",
"qu-PE": "PE",
"ro-RO": "RO",
"sk-SK": "SK",
"sl-SI": "SI",
"sv-FI": "FI",
"sv-SE": "SE",
"ta-SG": "SG",
"th-TH": "TH",
"tr-CY": "CY",
"vi-VN": "VN",
"zh-HK": "HK",
"zh-SG": "SG",
"zh-TW": "TW"
}
},
"privacywall images": {
"all_locale": null,
"custom": {},
"data_type": "traits_v1",
"languages": {},
"regions": {
"bg-BG": "BG",
"cs-CZ": "CZ",
"da-DK": "DK",
"de-AT": "AT",
"de-BE": "BE",
"de-CH": "CH",
"de-DE": "DE",
"de-LI": "LI",
"de-LU": "LU",
"el-CY": "CY",
"el-GR": "GR",
"en-AU": "AU",
"en-CA": "CA",
"en-GB": "GB",
"en-HK": "HK",
"en-IE": "IE",
"en-IN": "IN",
"en-MT": "MT",
"en-NZ": "NZ",
"en-PH": "PH",
"en-SG": "SG",
"en-US": "US",
"es-AR": "AR",
"es-CL": "CL",
"es-CO": "CO",
"es-ES": "ES",
"es-MX": "MX",
"es-PE": "PE",
"es-VE": "VE",
"et-EE": "EE",
"fi-FI": "FI",
"fil-PH": "PH",
"fr-BE": "BE",
"fr-CA": "CA",
"fr-CH": "CH",
"fr-FR": "FR",
"fr-LU": "LU",
"ga-IE": "IE",
"gsw-CH": "CH",
"gsw-LI": "LI",
"hi-IN": "IN",
"hr-HR": "HR",
"hu-HU": "HU",
"id-ID": "ID",
"it-CH": "CH",
"it-IT": "IT",
"ja-JP": "JP",
"ko-KR": "KR",
"lb-LU": "LU",
"lt-LT": "LT",
"lv-LV": "LV",
"mi-NZ": "NZ",
"ms-MY": "MY",
"ms-SG": "SG",
"mt-MT": "MT",
"nb-NO": "NO",
"nl-BE": "BE",
"nl-NL": "NL",
"nn-NO": "NO",
"pl-PL": "PL",
"pt-BR": "BR",
"pt-PT": "PT",
"qu-PE": "PE",
"ro-RO": "RO",
"sk-SK": "SK",
"sl-SI": "SI",
"sv-FI": "FI",
"sv-SE": "SE",
"ta-SG": "SG",
"th-TH": "TH",
"tr-CY": "CY",
"vi-VN": "VN",
"zh-HK": "HK",
"zh-SG": "SG",
"zh-TW": "TW"
}
},
"privacywall videos": {
"all_locale": null,
"custom": {},
"data_type": "traits_v1",
"languages": {},
"regions": {
"bg-BG": "BG",
"cs-CZ": "CZ",
"da-DK": "DK",
"de-AT": "AT",
"de-BE": "BE",
"de-CH": "CH",
"de-DE": "DE",
"de-LI": "LI",
"de-LU": "LU",
"el-CY": "CY",
"el-GR": "GR",
"en-AU": "AU",
"en-CA": "CA",
"en-GB": "GB",
"en-HK": "HK",
"en-IE": "IE",
"en-IN": "IN",
"en-MT": "MT",
"en-NZ": "NZ",
"en-PH": "PH",
"en-SG": "SG",
"en-US": "US",
"es-AR": "AR",
"es-CL": "CL",
"es-CO": "CO",
"es-ES": "ES",
"es-MX": "MX",
"es-PE": "PE",
"es-VE": "VE",
"et-EE": "EE",
"fi-FI": "FI",
"fil-PH": "PH",
"fr-BE": "BE",
"fr-CA": "CA",
"fr-CH": "CH",
"fr-FR": "FR",
"fr-LU": "LU",
"ga-IE": "IE",
"gsw-CH": "CH",
"gsw-LI": "LI",
"hi-IN": "IN",
"hr-HR": "HR",
"hu-HU": "HU",
"id-ID": "ID",
"it-CH": "CH",
"it-IT": "IT",
"ja-JP": "JP",
"ko-KR": "KR",
"lb-LU": "LU",
"lt-LT": "LT",
"lv-LV": "LV",
"mi-NZ": "NZ",
"ms-MY": "MY",
"ms-SG": "SG",
"mt-MT": "MT",
"nb-NO": "NO",
"nl-BE": "BE",
"nl-NL": "NL",
"nn-NO": "NO",
"pl-PL": "PL",
"pt-BR": "BR",
"pt-PT": "PT",
"qu-PE": "PE",
"ro-RO": "RO",
"sk-SK": "SK",
"sl-SI": "SI",
"sv-FI": "FI",
"sv-SE": "SE",
"ta-SG": "SG",
"th-TH": "TH",
"tr-CY": "CY",
"vi-VN": "VN",
"zh-HK": "HK",
"zh-SG": "SG",
"zh-TW": "TW"
}
},
"qwant": {
"all_locale": null,
"custom": {},
@@ -7175,6 +7424,222 @@
},
"regions": {}
},
"resulthunter": {
"all_locale": "all",
"custom": {
"ui_lang": {
"az": "az",
"bg": "bg",
"br": "br",
"ca": "ca",
"cs": "cs",
"cy": "cy",
"da": "da",
"de-DE": "de-de",
"el": "el",
"en-CA": "en-ca",
"en-GB": "en-gb",
"en-IN": "en-in",
"en-US": "en-us",
"es": "es",
"et": "et",
"eu": "eu",
"fi-FI": "fi-fi",
"fr-CA": "fr-ca",
"fr-FR": "fr-fr",
"gl": "gl",
"hr": "hr",
"hu": "hu",
"id": "id",
"it": "it",
"ja-JP": "ja-jp",
"ka": "ka",
"ko": "ko",
"lt": "lt",
"lv": "lv",
"ms": "ms",
"nb": "nb",
"nl": "nl",
"pl": "pl",
"pt-BR": "pt-br",
"ro": "ro",
"ru": "ru",
"sk": "sk",
"sl": "sl",
"sq-AL": "sq-al",
"sr": "sr",
"sr_Latn": "sr-latn",
"sv": "sv",
"sw-KE": "sw-ke",
"th": "th",
"tr": "tr",
"uk": "uk",
"vi": "vi",
"zh": "zh",
"zh-TW": "zh-tw"
}
},
"data_type": "traits_v1",
"languages": {},
"regions": {
"ar-SA": "sa",
"da-DK": "dk",
"de-AT": "at",
"de-BE": "be",
"de-CH": "ch",
"de-DE": "de",
"en-AU": "au",
"en-CA": "ca",
"en-GB": "gb",
"en-HK": "hk",
"en-IN": "in",
"en-NZ": "nz",
"en-PH": "ph",
"en-US": "us",
"en-ZA": "za",
"es-AR": "ar",
"es-CL": "cl",
"es-ES": "es",
"es-MX": "mx",
"fi-FI": "fi",
"fil-PH": "ph",
"fr-BE": "be",
"fr-CA": "ca",
"fr-CH": "ch",
"fr-FR": "fr",
"gsw-CH": "ch",
"hi-IN": "in",
"id-ID": "id",
"it-CH": "ch",
"it-IT": "it",
"ja-JP": "jp",
"ko-KR": "kr",
"mi-NZ": "nz",
"ms-MY": "my",
"nb-NO": "no",
"nl-BE": "be",
"nl-NL": "nl",
"nn-NO": "no",
"pl-PL": "pl",
"pt-BR": "br",
"pt-PT": "pt",
"ru-RU": "ru",
"sv-FI": "fi",
"sv-SE": "se",
"tr-TR": "tr",
"zh-CN": "cn",
"zh-HK": "hk",
"zh-TW": "tw"
}
},
"resulthunter images": {
"all_locale": "all",
"custom": {
"ui_lang": {
"az": "az",
"bg": "bg",
"br": "br",
"ca": "ca",
"cs": "cs",
"cy": "cy",
"da": "da",
"de-DE": "de-de",
"el": "el",
"en-CA": "en-ca",
"en-GB": "en-gb",
"en-IN": "en-in",
"en-US": "en-us",
"es": "es",
"et": "et",
"eu": "eu",
"fi-FI": "fi-fi",
"fr-CA": "fr-ca",
"fr-FR": "fr-fr",
"gl": "gl",
"hr": "hr",
"hu": "hu",
"id": "id",
"it": "it",
"ja-JP": "ja-jp",
"ka": "ka",
"ko": "ko",
"lt": "lt",
"lv": "lv",
"ms": "ms",
"nb": "nb",
"nl": "nl",
"pl": "pl",
"pt-BR": "pt-br",
"ro": "ro",
"ru": "ru",
"sk": "sk",
"sl": "sl",
"sq-AL": "sq-al",
"sr": "sr",
"sr_Latn": "sr-latn",
"sv": "sv",
"sw-KE": "sw-ke",
"th": "th",
"tr": "tr",
"uk": "uk",
"vi": "vi",
"zh": "zh",
"zh-TW": "zh-tw"
}
},
"data_type": "traits_v1",
"languages": {},
"regions": {
"ar-SA": "sa",
"da-DK": "dk",
"de-AT": "at",
"de-BE": "be",
"de-CH": "ch",
"de-DE": "de",
"en-AU": "au",
"en-CA": "ca",
"en-GB": "gb",
"en-HK": "hk",
"en-IN": "in",
"en-NZ": "nz",
"en-PH": "ph",
"en-US": "us",
"en-ZA": "za",
"es-AR": "ar",
"es-CL": "cl",
"es-ES": "es",
"es-MX": "mx",
"fi-FI": "fi",
"fil-PH": "ph",
"fr-BE": "be",
"fr-CA": "ca",
"fr-CH": "ch",
"fr-FR": "fr",
"gsw-CH": "ch",
"hi-IN": "in",
"id-ID": "id",
"it-CH": "ch",
"it-IT": "it",
"ja-JP": "jp",
"ko-KR": "kr",
"mi-NZ": "nz",
"ms-MY": "my",
"nb-NO": "no",
"nl-BE": "be",
"nl-NL": "nl",
"nn-NO": "no",
"pl-PL": "pl",
"pt-BR": "br",
"pt-PT": "pt",
"ru-RU": "ru",
"sv-FI": "fi",
"sv-SE": "se",
"tr-TR": "tr",
"zh-CN": "cn",
"zh-HK": "hk",
"zh-TW": "tw"
}
},
"sepiasearch": {
"all_locale": null,
"custom": {},
@@ -9120,4 +9585,4 @@
},
"regions": {}
}
}
}
+47 -14
View File
@@ -3,6 +3,7 @@
- :py:obj:`searx.enginelib.EngineCache`
- :py:obj:`searx.enginelib.Engine`
- :py:obj:`searx.enginelib.EngineAbout`
- :py:obj:`searx.enginelib.traits`
There is a command line for developer purposes and for deeper analysis. Here is
@@ -23,7 +24,7 @@ an example in which the command line is called in the development environment::
"""
__all__ = ["EngineCache", "Engine", "ENGINES_CACHE"]
__all__ = ["EngineCache", "Engine", "EngineAbout", "ENGINES_CACHE"]
import typing as t
import abc
@@ -31,6 +32,7 @@ from collections.abc import Callable
import logging
import string
import typer
import msgspec
from ..cache import ExpireCacheSQLite, ExpireCacheCfg
@@ -178,6 +180,48 @@ class EngineCache:
return ENGINES_CACHE.secret_hash(name=name)
class EngineAbout(msgspec.Struct):
"""Additional fields describing the engine.
.. code:: yaml
about:
website: https://example.com
wikidata_id: Q306656
official_api_documentation: https://example.com/api-doc
use_official_api: true
require_api_key: true
results: HTML
"""
# pylint: disable=too-few-public-methods
website: str = ""
"""Official web-site of the origin."""
wikidata_id: str = ""
"""`Wikidata ID <https://www.wikidata.org/wiki/Wikidata:Identifiers>`_"""
official_api_documentation: str = ""
"""URL of the official API (regardless of whether it is used)"""
use_official_api: bool = False
"""SearXNG engine makes use of the official API or not"""
require_api_key: bool = False
"""API requires a key or not."""
results: str = ""
"""Data format of the source (online-engines: of the response)."""
language: str = ""
"""If the engine supports only one language, this language is specified
here (``en``, ``de``, ``"no"`` or ..); otherwise, the value remains empty.
For the YAML configuration: think of the `YAML-Norway problem
<https://ruuda.nl/2023/the-yaml-document-from-hell#the-norway-problem>`_
"""
class Engine(abc.ABC): # pylint: disable=too-few-public-methods
"""Class of engine instances build from YAML settings.
@@ -282,19 +326,8 @@ class Engine(abc.ABC): # pylint: disable=too-few-public-methods
inactive: bool
"""Remove the engine from the settings (*disabled & removed*)."""
about: dict[str, dict[str, str]]
"""Additional fields describing the engine.
.. code:: yaml
about:
website: https://example.com
wikidata_id: Q306656
official_api_documentation: https://example.com/api-doc
use_official_api: true
require_api_key: true
results: HTML
"""
about: EngineAbout
"""Additional fields describing the engine."""
using_tor_proxy: bool
"""Using tor proxy (``true``) or not (``false``) for this engine."""
+134
View File
@@ -0,0 +1,134 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Chatnoir is an open source search engine developed by Webis, a network of
researchers from the universities of Weimar, Halle and Leipzig. It supports
different different text corpora as indexes, e.g. CommonCrawl. See its
`announcement`_ for more information.
.. _announcement : https://groups.google.com/g/common-crawl/c/3o2dOHpeRxo/m/H2Osqz9dAAAJ
"""
import typing as t
from searx.exceptions import SearxEngineAPIException
from searx.extended_types import SXNG_Response
from searx.network import get, post
from searx.result_types import EngineResults
from searx.utils import html_to_text
from searx.enginelib import EngineCache
if t.TYPE_CHECKING:
from searx.search.processors import OnlineParams
about = {
"website": "https://www.chatnoir.eu",
"official_api_documentation": "https://www.chatnoir.eu/docs/api-general",
"use_official_api": True,
"require_api_key": False,
"results": "JSON",
}
base_url = "https://www.chatnoir.eu"
categories = ["general"]
paging = True
page_size = 10
api_key = ""
"""You can optionally provide your own API key here. This one will then be used
instead of scraping an API key."""
search_index = "cw22"
"""Search index to browse in. See `the API documentation
<https://www.chatnoir.eu/docs/api-general>`_ for a full list."""
CACHE: EngineCache
"""Cache to store session info (i.e. api key, csrf token, session id)."""
def setup(engine_settings: dict[str, t.Any]) -> bool:
global CACHE # pylint: disable=global-statement
CACHE = EngineCache(engine_settings["name"])
return True
def _obtain_api_key() -> tuple[str, str, str]:
cached_session = CACHE.get("session")
if cached_session:
return tuple(cached_session.split("|"))
home_resp = get(base_url)
if not home_resp.ok:
raise SearxEngineAPIException("failed to obtain api key")
csrf_token = home_resp.cookies["csrftoken"]
token_resp = post(
"https://www.chatnoir.eu/?init",
headers={
"Referer": f"{base_url}/",
"X-Requested-With": "XMLHttpRequest",
"X-Csrf-Token": csrf_token,
},
cookies=home_resp.cookies,
)
if not token_resp.ok:
raise SearxEngineAPIException("failed to obtain api key")
session_id = token_resp.cookies["sessionid"]
scraped_api_key = token_resp.json()["token"]["token"]
# session keys seem to become rate-limited very fast, so only remembering
# for 1 minute here
CACHE.set("session", f"{csrf_token}|{session_id}|{scraped_api_key}", expire=60)
return csrf_token, session_id, scraped_api_key
def request(query: str, params: "OnlineParams"):
if api_key:
# use user-provided API key instead of scraping one
headers = {
"Authorization": f"Bearer {api_key}",
}
params["headers"].update(headers)
else:
csrf_token, session_id, scraped_api_key = _obtain_api_key()
headers = {
"Authorization": f"Bearer {scraped_api_key}",
"X-Csrf-Token": csrf_token,
}
params["headers"].update(headers)
params["cookies"] = {"csrftoken": session_id, "sessionid": session_id}
params["url"] = f"{base_url}/api/v1/_search"
params["method"] = "POST"
json_data = {
"query": query,
"index": [
search_index,
],
"from": (params["pageno"] - 1) * page_size,
"size": page_size,
"_extended_meta": True,
}
params["json"] = json_data
def response(resp: "SXNG_Response") -> EngineResults:
res = EngineResults()
results = resp.json()["results"]
for result in results:
res.add(
res.types.MainResult(
url=result["target_uri"],
title=html_to_text(result["title"]),
content=html_to_text(result["snippet"]),
)
)
return res
+1
View File
@@ -63,6 +63,7 @@ def response(resp: "SXNG_Response"):
url=_fix_url(result["slug"]),
thumbnail_src=_fix_url(result["png"]),
img_src=_fix_url(result["png512"]),
img_format="PNG",
author=result["team_name"],
)
)
+210
View File
@@ -0,0 +1,210 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Luxxle_ is an American search engine focusing on providing "unbiased"
results.
.. _Luxxle: https://luxxle.com
"""
from json import dumps
from urllib.parse import quote_plus, unquote_plus
import typing as t
from lxml import html
from searx.result_types import EngineResults
from searx.network import get
from searx.utils import (
extr,
gen_useragent,
eval_xpath_list,
extract_text,
eval_xpath,
parse_duration_string,
ElementType,
)
if t.TYPE_CHECKING:
from searx.search.processors import OnlineParams
from searx.extended_types import SXNG_Response
about = {
"website": "https://luxxle.com",
"official_api_documentation": None,
"use_official_api": False,
"require_api_key": False,
"results": "HTML",
}
categories = []
safeseach = True
base_url = "https://luxxle.com"
luxxle_categ = "search"
"""Supported categories: "search", "news", "images", "videos"."""
# otherwise all requests get blocked (http2-fingerprinted probably)
enable_http2 = False
safe_search_map = {0: "Off", 1: "Moderate", 2: "Strict"}
def init(_):
if luxxle_categ not in ("search", "images", "videos", "news"):
raise ValueError("invalid luxxle category: %s" % luxxle_categ)
def _obtain_telemetry_data(query: str) -> dict[str, str]:
"""This data is required for sending search queries.
The luxsearch page (for general results) has a JS dict called ``telemetryData``
that contains all the important info, but the others don't, so we don't use it
here. But it's useful to understand which info is needed.
.. code-block:: javascript
var telemetryData = {
errorInformation: errorInformation,
query: "youapps club",
ip: "10.10.10.10",
timeOf: "1781119224",
authorization: "db889e0ae67d3c320858ad97f51cc4f0a4d8e1913c4f5ebe5d2eafef606521dd",
};
This data is only valid for very short times
"""
resp = get(
f"{base_url}/lux{luxxle_categ}?q={quote_plus(query)}", headers={"User-Agent": gen_useragent(), "Sec-GPC": "1"}
)
def extr_js_variable(name: str) -> str:
val = extr(resp.text, f"var {name} = \"", "\";")
if not val:
val = extr(resp.text, f"var {name} = '", "';")
return val
return {
"ip": extr_js_variable("ip"),
"timeOf": extr_js_variable("timeOf"),
"authorization": extr_js_variable("authorization"),
"preferencesCookie": extr_js_variable("preferencesCookie"),
}
def request(query: str, params: "OnlineParams") -> None:
telemetry_data = _obtain_telemetry_data(query)
market = params["searxng_locale"]
if market == "all":
market = "en-US"
params["url"] = f"{base_url}/load_{luxxle_categ}.php"
search_data = {
**telemetry_data,
"query": query,
"market": market,
"safeSearch": safe_search_map[params["safesearch"]],
"freshness": "",
"language": "english", # UI language
}
if luxxle_categ == "images":
# for some reason this is sent as form data
params["data"] = {"searchData": dumps(search_data)}
else:
params["json"] = {"searchData": search_data}
params["method"] = "POST"
def _extract_url_from_redirect(url: str):
# urls usually look like "/redirect?url=<url>"
query_start_idx = url.find("?url=")
if query_start_idx < 0:
return url
url_start_idx = query_start_idx + len("?url=")
return unquote_plus(url[url_start_idx:])
def _general_results(doc: ElementType, res: EngineResults):
for result in eval_xpath_list(doc, "//div[@id='mainResults']/div[contains(@class, 'resultsContainer')]"):
res.add(
res.types.MainResult(
url=_extract_url_from_redirect(
extract_text(eval_xpath(result, "./div[contains(@class, 'urlAddressLink')]/a/@href")) or ""
),
title=extract_text(eval_xpath(result, "./div[contains(@class, 'urlname')]")) or "",
content=extract_text(eval_xpath(result, "./div[contains(@class, 'urlSnippet')]")) or "",
)
)
def _news_results(doc: ElementType, res: EngineResults):
for result in eval_xpath_list(
doc, "//div[contains(@class, 'newsResults')]/div[contains(@class, 'mediaResultNewsPage')]"
):
res.add(
res.types.MainResult(
url=_extract_url_from_redirect(
extract_text(eval_xpath(result, ".//div[contains(@class, 'mediaResultNewsPageTitle')]/a/@href"))
or ""
),
title=extract_text(eval_xpath(result, ".//div[contains(@class, 'mediaResultNewsPageTitle')]/a")) or "",
content=extract_text(eval_xpath(result, ".//div[contains(@class, 'mediaResultNewsPageDescription')]"))
or "",
thumbnail=extract_text(eval_xpath(result, ".//div[contains(@class, 'mediaResultThumbnail')]//img/@src"))
or "",
)
)
def _video_results(doc: ElementType, res: EngineResults):
for result in eval_xpath_list(doc, "//div[@id='mainResults']/div[contains(@class, 'mediaResult')]"):
res.add(
res.types.MainResult(
template="videos.html",
url=extract_text(eval_xpath(result, "./@data-url")) or "",
title=extract_text(eval_xpath(result, ".//div[contains(@class, 'mediaResultTitleVideo')]/a")) or "",
content=extract_text(eval_xpath(result, ".//div[contains(@class, 'mediaResultDescription')]")) or "",
thumbnail=extract_text(eval_xpath(result, ".//img[contains(@class, 'videoThumbnail')]/@src")) or "",
author=extract_text(eval_xpath(result, ".//div[contains(@class, 'videoCreator')]")) or "",
length=parse_duration_string(
extract_text(eval_xpath(result, ".//span[contains(@class, 'mediaResultDuration')]")) or ""
),
)
)
def _image_results(doc: ElementType, res: EngineResults):
for result in eval_xpath_list(doc, "//div[contains(@class, 'imageResultsWrapper')]/div"):
res.add(
res.types.Image(
url=_extract_url_from_redirect(
extract_text(eval_xpath(result, ".//a[contains(@class, 'imageResultSource')]/@href")) or ""
),
title=extract_text(eval_xpath(result, ".//a[contains(@class, 'imageResultTitle')]")) or "",
source=extract_text(eval_xpath(result, ".//div[contains(@class, 'imageResultSource')]")) or "",
thumbnail_src=extract_text(eval_xpath(result, "./@data-thumbnail-src")) or "",
img_src=extract_text(eval_xpath(result, "./@data-image-src")) or "",
)
)
def response(resp: "SXNG_Response") -> EngineResults:
doc = html.fromstring(resp.text)
res = EngineResults()
match luxxle_categ:
case "search":
_general_results(doc, res)
case "images":
_image_results(doc, res)
case "videos":
_video_results(doc, res)
case "news":
_news_results(doc, res)
case _:
raise ValueError("unsupported category: %s" % luxxle_categ)
return res
+62
View File
@@ -0,0 +1,62 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Podchaser (podcasts)"""
import typing as t
from datetime import datetime
from urllib.parse import urlencode
from searx.result_types import EngineResults
if t.TYPE_CHECKING:
from searx.extended_types import SXNG_Response
from searx.search.processors import OnlineParams
about = {
"website": "https://www.podchaser.com",
"official_api_documentation": "https://www.podchaser.com/api",
"use_official_api": False,
"require_api_key": False,
"results": "JSON",
}
categories = []
paging = True
base_url = "https://api.podchaser.com"
page_size = 25
def request(query: str, params: "OnlineParams") -> None:
args = {
"filters[term]": query,
"limit": page_size,
"offset": (params["pageno"] - 1) * page_size,
"sort_direction": "desc",
"sort_order": "SORT_ORDER_RELEVANCE",
}
params["url"] = f"{base_url}/podcasts?{urlencode(args)}"
params["headers"]["Accept"] = "application/prs.podchaser.v2+json"
def response(resp: "SXNG_Response"):
res = EngineResults()
json_results: list[dict[str, str]] = resp.json()["entities"] # pyright: ignore[reportAny]
for result in json_results:
metadata = [f"{result['number_of_episodes']} episodes"]
if result["categories"]:
metadata.append(", ".join(c["text"] for c in result["categories"])) # pyright: ignore[reportArgumentType]
res.add(
res.types.MainResult(
url=result["feed_url"],
title=result["title"],
content=result["description"],
thumbnail=result["image_url"],
publishedDate=datetime.strptime(result["created_at"], "%Y-%m-%d %H:%M:%S"),
metadata=" | ".join(metadata),
)
)
return res
+217
View File
@@ -0,0 +1,217 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Privacywall_ claims to be a "privacy-friendly" search engine,
but according to a `Privacyguides discussion`_ it's sharing private
user information with Microsoft and Amazon.
.. _Privacywall : https://www.privacywall.org
.. _`Privacyguides discussion` : https://discuss.privacyguides.net/t/how-is-privacy-wall-search-engine/29486
"""
import typing as t
from urllib.parse import urlencode, unquote_plus
from lxml import html
import babel
from searx.enginelib.traits import EngineTraits
from searx.utils import eval_xpath_list, eval_xpath, extract_text, get_embeded_stream_url, extr
from searx.locales import region_tag
from searx.result_types import EngineResults
if t.TYPE_CHECKING:
from lxml.etree import ElementBase
from searx.extended_types import SXNG_Response
from searx.search.processors import OnlineParams
about = {
"website": "https://privacywall.org",
"wikidata_id": None,
"official_api_documentation": None,
"use_official_api": False,
"require_api_key": False,
"results": "HTML",
}
paging = True
safesearch = True
time_range_support = True
base_url = "https://www.privacywall.org"
privacywall_category = "general"
"""Supported categories are ``general``, ``videos`` and ``images``."""
# corresponds to the "k" query param
safesearch_map = {0: "off", 1: "on", 2: "on"}
# page number sent for videos (is independent of the query) - certainly there's
# a pattern in this, but for our use case it's enough to just support the first
# 10 pages by hardcoding the page "numbers"
video_page_map = {
2: "CAoQAA",
3: "CBQQAA",
4: "CB4QAA",
5: "CCgQAA",
6: "CDIQAA",
7: "CDwQAA",
8: "CEYQAA",
9: "CFAQAA",
10: "CFoQAA",
}
def init(_):
if privacywall_category not in ("general", "images", "videos"):
raise ValueError("invalid category: %s" % privacywall_category)
def request(query: str, params: "OnlineParams") -> None:
if params["pageno"] > 10:
params["url"] = None
return
args = {"q": query, "safesearch": safesearch_map[params["safesearch"]]}
if params["searxng_locale"] != "all":
args["cc"] = traits.get_region(params["searxng_locale"]) or "US"
if params["time_range"]:
# time range uses the same "day", "week", "month", "year" naming scheme as SearXNG
args["time"] = params["time_range"]
if params["pageno"] > 1:
if privacywall_category == "images":
args["page"] = str(params["pageno"])
elif privacywall_category == "videos":
args["page"] = video_page_map[params["pageno"]]
else:
raise ValueError("general engine does not support pagination")
if privacywall_category == "general":
params["url"] = f"{base_url}/search/secure/?{urlencode(args)}"
else:
params["url"] = f"{base_url}/{privacywall_category}/?{urlencode(args)}"
def _general_results(doc: "ElementBase") -> EngineResults:
res = EngineResults()
for result in eval_xpath_list(doc, "//div[@id='pw-results-main']/div[contains(@class, 'result-card')]"):
(
res.add(
res.types.MainResult(
url=extract_text(eval_xpath(result, ".//a[contains(@class, 'result-url-anchor')]/@href")) or "",
title=extract_text(eval_xpath(result, ".//div[contains(@class, 'result_title')]")) or "",
content=extract_text(eval_xpath(result, ".//div[contains(@class, 'result-description')]")) or "",
),
)
)
return res
def _extract_thumbnail_url(url: str) -> str:
"""
Get the URL from strings like "/videos/video.php?id=<urlencoded-urlhere>".
"""
url_start = url.find("?id=") + len("?id=")
thumbnail = unquote_plus(url[url_start:])
return thumbnail
def _image_results(doc: "ElementBase") -> EngineResults:
res = EngineResults()
for result in eval_xpath_list(doc, "//div[@id='container']/div[contains(@class, 'imgcontainer')]"):
(
res.add(
res.types.Image(
url=extract_text(eval_xpath(result, "./a/@href")) or "",
content=extract_text(eval_xpath(result, "./a/@alt")) or "",
thumbnail_src=_extract_thumbnail_url(extract_text(eval_xpath(result, ".//img/@src")) or ""),
source=extract_text(eval_xpath(result, ".//div[contains(@class, 'image-source-badge')]")) or "",
),
)
)
return res
def _video_results(doc: "ElementBase") -> EngineResults:
res = EngineResults()
for result in eval_xpath_list(
doc, "//div[contains(@class, 'video-container')]/div[contains(@class, 'video-card')]"
):
url = extract_text(eval_xpath(result, "./a/@href")) or ""
if not url:
continue
thumbnail = None
# looks like <div style="background-image:url(/videos/video.php?id=<urlencoded-urlhere>);position:relative">
thumbnail_style = extract_text(eval_xpath(result, ".//div[contains(@class, 'video-img')]/@style"))
if thumbnail_style:
thumbnail = _extract_thumbnail_url(extr(thumbnail_style, ":url(", ")"))
res.add(
res.types.LegacyResult(
template="videos.html",
url=url,
title=extract_text(eval_xpath(result, ".//h2[contains(@class, 'video-card-title')]")) or "",
content=extract_text(eval_xpath(result, ".//p")) or "",
thumbnail=thumbnail or "",
iframe_src=get_embeded_stream_url(url) or "",
)
)
return res
def response(resp: "SXNG_Response") -> EngineResults:
doc = html.fromstring(resp.text)
match privacywall_category:
case "general":
return _general_results(doc)
case "images":
return _image_results(doc)
case "videos":
return _video_results(doc)
case _:
raise ValueError("invalid category: %s" % privacywall_category)
def fetch_traits(engine_traits: EngineTraits) -> None:
"""Fetch regions from Bing-Web."""
# pylint: disable=import-outside-toplevel
from searx.network import get # see https://github.com/searxng/searxng/issues/762
from searx.utils import gen_useragent
headers = {
"User-Agent": gen_useragent(),
}
resp = get(base_url, headers=headers)
if not resp.ok:
raise RuntimeError("Response from Privacywall is not OK.")
dom = html.fromstring(resp.text)
# <div class="dropdown-option" onclick="changeMenuLanguage(&quot;CZ&quot;)"></div>
for onclick_listener in eval_xpath(
dom, "//div[contains(@class, 'lang-menu')]//div[contains(@class, 'dropdown-option')]/@onclick"
):
# this is either a normal lang-country tag (e.g. cs-cz) or only a country code (e.g. de, at, ...)
country_tag = extr(onclick_listener, "(\"", "\")")
# the locale tag is only a country tag, so we get languages the from the list of official languages
# of the country
lang_tag: str
for lang_tag in babel.languages.get_official_languages(country_tag, de_facto=True): # pyright: ignore
try:
sxng_tag = region_tag(babel.Locale.parse(f"{lang_tag}_{country_tag.upper()}"))
except babel.UnknownLocaleError:
# silently ignore unknown languages
continue
conflict = engine_traits.regions.get(sxng_tag)
if conflict:
if conflict != sxng_tag:
print("CONFLICT: babel %s --> %s" % (sxng_tag, conflict))
continue
engine_traits.regions[sxng_tag] = country_tag
+120
View File
@@ -0,0 +1,120 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Resulthunter_ is an American search engine with results from Brave.
.. _Resulthunter : https://resulthunter.com
"""
import typing as t
from urllib.parse import urlencode
from lxml import html
from searx import locales
from searx.result_types import EngineResults
from searx.utils import eval_xpath_list, eval_xpath, extract_text
# as it uses brave internally, it has the same locales and timerange/safesearch types
from searx.engines.brave import safesearch_map, time_range_map, fetch_traits # pylint: disable=unused-import
if t.TYPE_CHECKING:
from lxml.etree import ElementBase
from searx.extended_types import SXNG_Response
from searx.search.processors import OnlineParams
from searx.enginelib.traits import EngineTraits
traits: EngineTraits
about = {
"website": "https://resulthunter.com",
"wikidata_id": None,
"official_api_documentation": None,
"use_official_api": False,
"require_api_key": False,
"results": "HTML",
}
paging = True
safesearch = True
time_range_support = True
base_url = "https://resulthunter.com"
resulthunter_categ = "web"
"""Supported categories are ``web`` and ``images``."""
def init(_):
if resulthunter_categ not in ("web", "images"):
raise ValueError("invalid category: %s" % resulthunter_categ)
def request(query: str, params: "OnlineParams") -> None:
args = {
"q": query,
"search_type": resulthunter_categ,
"offset": params["pageno"] - 1,
}
# uses Brave's engine traits
ui_lang = locales.get_engine_locale(params["searxng_locale"], traits.custom["ui_lang"], "all")
if ui_lang and ui_lang != "all":
args["search_lang"] = ui_lang.split("-")[0]
engine_region = traits.get_region(params["searxng_locale"], "all")
if engine_region and engine_region != "all":
args["country"] = engine_region
if params["time_range"]:
args["freshness"] = time_range_map[params["time_range"]]
params["cookies"]["safesearch"] = safesearch_map[params["safesearch"]]
params["url"] = f"{base_url}/search?{urlencode(args)}"
def _general_results(doc: "ElementBase") -> EngineResults:
res = EngineResults()
for result in eval_xpath_list(
doc, "//div[contains(@class, 'organic-results-container')]/div/div[contains(@class, 'group')]"
):
url = extract_text(eval_xpath(result, ".//a/@href"))
if not url:
continue
(
res.add(
res.types.MainResult(
url=url,
title=extract_text(eval_xpath(result, ".//a/h3")) or "",
content=extract_text(eval_xpath(result, ".//p")) or "",
),
)
)
return res
def _image_results(doc: "ElementBase") -> EngineResults:
res = EngineResults()
for result in eval_xpath_list(
doc, "//div[contains(@class, 'organic-results-container')]//a[contains(@class, 'group')]"
):
(
res.add(
res.types.Image(
url=extract_text(eval_xpath(result, "./@href")) or "",
title=extract_text(eval_xpath(result, "./img/@alt")) or "",
thumbnail_src=extract_text(eval_xpath(result, "./img/@src")) or "",
),
)
)
return res
def response(resp: "SXNG_Response") -> EngineResults:
doc = html.fromstring(resp.text)
match resulthunter_categ:
case "web":
return _general_results(doc)
case "images":
return _image_results(doc)
case _:
raise ValueError("invalid resulthunter category: %s" % resulthunter_categ)
+98
View File
@@ -0,0 +1,98 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Search engines by System1 (general).
System1 is an advertising company, and provides all its search engines as a
subdomain of ``s1search.co``. As a result, it has more than 1000 subdomains, of
which some work, and some don't.
Some of the engines get their results from Google, others get them from Yahoo.
"""
import typing as t
from urllib.parse import urlencode, urlparse, parse_qs
from lxml import html
from searx.result_types import EngineResults
from searx.enginelib import EngineCache
from searx.utils import eval_xpath_list, eval_xpath, extract_text
if t.TYPE_CHECKING:
from searx.search.processors import OnlineParams
from searx.extended_types import SXNG_Response
about = {
"website": "https://s1search.co",
"official_api_documentation": None,
"use_official_api": False,
"require_api_key": False,
"results": "HTML",
}
base_url = "" # alternatively: search.gmx.net
categories = ["general"]
paging = True
CACHE: EngineCache
"""Cache to store verification tokens for pagination."""
def init(_):
if not base_url:
raise ValueError("base_url must be set")
def setup(engine_settings: dict[str, t.Any]) -> bool:
global CACHE # pylint: disable=global-statement
CACHE = EngineCache(engine_settings["name"])
return True
def _cache_key(query: str, pageno: int) -> str:
return f"{query}|{pageno}"
def request(query: str, params: "OnlineParams"):
args = {"q": query, "page": params["pageno"]}
if params["pageno"] > 1:
sc = CACHE.get(_cache_key(query, params["pageno"]))
# sc is required for pagination to avoid rate-limits
if not sc:
params["url"] = None
return
args["sc"] = sc
params["url"] = f"{base_url}/serp?{urlencode(args)}"
def response(resp: "SXNG_Response") -> EngineResults:
res = EngineResults()
doc = html.fromstring(resp.text)
for suggestion in eval_xpath_list(doc, "//div[@class='aylf-yahoo-bottom' or @class='aylf-yahoo-sidebar']/div"):
res.add(res.types.LegacyResult({"suggestion": extract_text(suggestion)}))
for result in eval_xpath_list(
doc, "//div[contains(@class, 'web-yahoo') or contains(@class, 'web-google')]/div[contains(@class, '__result')]"
):
res.add(
res.types.MainResult(
url=extract_text(eval_xpath(result, ".//a[contains(@class, 'title')]/@href")),
title=extract_text(eval_xpath(result, ".//a[contains(@class, 'title')]")),
content=extract_text(eval_xpath(result, ".//span[contains(@class, 'description') or @class='']")),
)
)
# store pagination keys to be able to access next pages
for page_href in eval_xpath_list(doc, "//a[contains(@class, 'pagination__num')]"):
# target_url looks like "/serp?q=test&page=2&sc=RVlBPMDPVhWR20"
target_url = extract_text(eval_xpath(page_href, "./@href"))
target_url = parse_qs(urlparse(target_url).query)
pageno = int(target_url["page"][0])
sc = target_url["sc"][0]
CACHE.set(_cache_key(resp.search_params["query"], pageno), sc)
return res
+4 -1
View File
@@ -382,6 +382,9 @@ def _get_image_result(result) -> dict[str, t.Any] | None:
size_str = "".join(filter(str.isdigit, result["filesize"]))
filesize = humanize_bytes(int(size_str))
img_format = result.get("format").upper()
if img_format == "UNKNOWN":
img_format = ""
return {
"template": "images.html",
"url": url,
@@ -390,7 +393,7 @@ def _get_image_result(result) -> dict[str, t.Any] | None:
"img_src": result.get("rawImageUrl"),
"thumbnail_src": thumbnailUrl,
"resolution": resolution,
"img_format": result.get("format"),
"img_format": img_format,
"filesize": filesize,
}
+114
View File
@@ -0,0 +1,114 @@
# SPDX-License-Identifier: AGPL-3.0-or-later
"""Vuhuv_ is a Turkish search engine, that also provides English results.
.. _Vuhuv : https://vuhuv.com
"""
import typing as t
from urllib.parse import urlencode
from lxml import html
from searx.result_types import EngineResults
from searx.utils import eval_xpath_list, eval_xpath, extract_text
if t.TYPE_CHECKING:
from lxml.etree import ElementBase
from searx.extended_types import SXNG_Response
from searx.search.processors import OnlineParams
about = {
"website": "https://vuhuv.com",
"wikidata_id": None,
"official_api_documentation": None,
"use_official_api": False,
"require_api_key": False,
"results": "HTML",
}
paging = True
base_url = "https://vuhuv.com"
vuhuv_category = "general"
"""Supported categories are ``general``, ``videos`` and ``images``."""
# corresponds to the "k" query param
category_map = {"general": 1, "images": 2, "videos": 3}
def init(_):
if vuhuv_category not in category_map:
raise ValueError("invalid category: %s" % vuhuv_category)
def request(query: str, params: "OnlineParams") -> None:
# the purpose of "d" and "dh" are unknown, but the website
# sends them, and without them the results are different
args = {"k": category_map[vuhuv_category], "p": params["pageno"], "q": query, "d": 1, "dh": 1}
params["url"] = f"{base_url}/veri2/?{urlencode(args)}"
params["headers"]["Referer"] = f"{base_url}/"
def _general_results(doc: "ElementBase") -> EngineResults:
res = EngineResults()
for result in eval_xpath_list(doc, "//div[contains(@class, 'sonuc')]/div"):
(
res.add(
res.types.MainResult(
url=extract_text(eval_xpath(result, "./a/@href")) or "",
title=extract_text(eval_xpath(result, "./a/span")) or "",
content=extract_text(eval_xpath(result, "./ins")) or "",
),
)
)
return res
def _image_results(doc: "ElementBase") -> EngineResults:
res = EngineResults()
for result in eval_xpath_list(doc, "//div[contains(@class, 'item gorsel')]"):
(
res.add(
res.types.Image(
url=extract_text(eval_xpath(result, "./a/@href")) or "",
title=extract_text(eval_xpath(result, "./a/@title")) or "",
resolution=extract_text(eval_xpath(result, "div[contains(@class, 'olculeri')]")) or "",
thumbnail_src="https:" + str(extract_text(eval_xpath(result, "./@data-kgorsel"))),
img_src=extract_text(eval_xpath(result, "./@data-resimurl")) or "",
),
)
)
return res
def _video_results(doc: "ElementBase") -> EngineResults:
res = EngineResults()
for result in eval_xpath_list(doc, "//div[contains(@class, 'item video')]"):
(
res.add(
res.types.MainResult(
template="videos.html",
url=extract_text(eval_xpath(result, "./a/@href")) or "",
title=extract_text(eval_xpath(result, "./a/@title")) or "",
content=extract_text(eval_xpath(result, ".//div[contains(@class, 'abaslik')]")) or "",
thumbnail=extract_text(eval_xpath(result, "./@data-kgorsel")) or "",
iframe_src=extract_text(eval_xpath(result, "./@data-embedurl")) or "",
),
)
)
return res
def response(resp: "SXNG_Response") -> EngineResults:
doc = html.fromstring(resp.text)
match vuhuv_category:
case "general":
return _general_results(doc)
case "images":
return _image_results(doc)
case "videos":
return _video_results(doc)
case _:
raise ValueError("invalid vuhuv category: %s" % vuhuv_category)
+3 -1
View File
@@ -24,6 +24,8 @@ __all__ = [
"Code",
"Paper",
"File",
"Image",
"ImageRef",
]
import typing as t
@@ -35,7 +37,7 @@ from .keyvalue import KeyValue
from .code import Code
from .paper import Paper
from .file import File
from .image import Image
from .image import Image, ImageRef
class ResultList(list[Result | LegacyResult], abc.ABC):
+66 -3
View File
@@ -7,14 +7,51 @@ template.
:members:
:show-inheritance:
.. autoclass:: ImageRef
:members:
"""
# pylint: disable=too-few-public-methods
__all__ = ["Image", "ImageRef"]
__all__ = ["Image"]
import types
import typing as t
from collections.abc import Callable
import msgspec
from ._base import MainResult, Result, log, LegacyResult
MimeSubType = t.Literal["png", "svg+xml", "jpeg", "bmp", "x-icon", "tiff"]
MIMESUB: dict[MimeSubType, str] = {
"png": "PNG",
"svg+xml": "SVG",
"jpeg": "JPG",
"bmp": "BMP",
"x-icon": "ICO",
"tiff": "TIF",
}
from ._base import MainResult
class ImageRef(msgspec.Struct, kw_only=True):
"""Reference to an (alternative) image format"""
url: str
"""URL of the image reference."""
subtype: MimeSubType
"""Subtype (mimetype) of the image format."""
label: str = ""
"""Label of the reference, default is build from the uppercase of
:py:obj:`Image.ImageRef.subtype`."""
mtype: t.Literal["image"] = "image"
def __post_init__(self):
if not self.label:
self.label = MIMESUB.get(self.subtype, self.subtype.upper())
@t.final
@@ -42,3 +79,29 @@ class Image(MainResult, kw_only=True):
filesize: str = ""
"""Size of bytes in :py:obj:`human readable <searx.humanize_bytes>` notation
(e.g. ``1MB`` for ``1024*1024`` Bytes filesize)."""
formats: list[ImageRef] = []
"""List of links to alternative image formats."""
def filter_urls(self, filter_func: "Callable[[Result | LegacyResult, str, str], str | bool ]"):
for _ref in self.formats[:]:
_name = f"Image.formats:{_ref.label}"
try:
_url = filter_func(self, _name, _ref.url)
except Exception as exc: # pylint: disable=broad-exception-caught
# pylint: disable=no-member
_tb: types.TracebackType = exc.__traceback__.tb_next.tb_next # type: ignore
_fn = _tb.tb_frame.f_code.co_filename
_lno = _tb.tb_lineno
log.error("filter_urls: [%s] ignore %s from callback %s:%s", _name, repr(exc), _fn, _lno)
continue
if isinstance(_url, str):
log.debug("filter_urls: [%s] URL %s -> %s", _name, _ref.url, _url)
_ref.url = _url
elif not _url:
log.debug("filter_urls: [%s] drop ref %s", _name, _ref)
self.formats.remove(_ref)
return super().filter_urls(filter_func)
+209 -2
View File
@@ -41,8 +41,8 @@ search:
# Filter results. 0: None, 1: Moderate, 2: Strict
safe_search: 0
# Existing autocomplete backends: "360search", "baidu", "bing", "brave", "dbpedia", "duckduckgo", "google",
# "yandex", "mwmbl", "naver", "seznam", "sogou", "startpage", "swisscows", "quark", "qwant", "wikipedia" -
# leave blank to turn it off by default.
# "yandex", "privacywall", "mwmbl", "naver", "seznam", "sogou", "startpage", "swisscows", "quark", "qwant",
# "wikipedia" - leave blank to turn it off by default.
autocomplete: ""
# minimun characters to type before autocompleter starts
autocomplete_min: 4
@@ -320,6 +320,23 @@ engines:
shortcut: 9g
disabled: true
- name: abcnyheter
engine: xpath
paging: true
search_url: https://startsiden.abcnyheter.no/sok/?q={query}&page={pageno}
shortcut: abc
disabled: true
results_xpath: //ul[contains(@class, "results__list")]/li[contains(@class, "result")]
url_xpath: ./a/@href
title_xpath: ./a/h3
content_xpath: ./div
about:
website: https://abcnyheter.no
use_official_api: false
require_api_key: false
results: HTML
language: "no"
- name: acfun
engine: acfun
shortcut: acf
@@ -609,6 +626,12 @@ engines:
shortcut: ca
disabled: true
# - name: chatnoir
# engine: chatnoir
# shortcut: cha
# search_index: cw22
# disabled: true
- name: chefkoch
engine: chefkoch
shortcut: chef
@@ -914,6 +937,24 @@ engines:
timeout: 3.0
disabled: true
- name: fastbot
engine: xpath
search_url: https://fastbot.de/search?q={query}
results_xpath: //section[contains(@class, 'organic-results')]/div[contains(@class, 'result-item')]
url_xpath: (./a/@href)[last()]
title_xpath: (./a)[last()]
content_xpath: ./div[contains(@class, 'snippet')]
suggestion_xpath: //section[contains(@class, 'related-searches')]//a/span[1]
shortcut: fa
categories: general
disabled: true
about:
website: https://fastbot.de
official_api_documentation:
use_official_api: false
require_api_key: false
results: HTML
- name: fdroid
engine: fdroid
shortcut: fd
@@ -1436,6 +1477,38 @@ engines:
shortcut: luc
timeout: 3.0
- name: luxxle
engine: luxxle
categories: general
luxxle_categ: search
shortcut: lux
disabled: true
inactive: true
- name: luxxle images
engine: luxxle
categories: images
luxxle_categ: images
shortcut: luxi
disabled: true
inactive: true
- name: luxxle videos
engine: luxxle
categories: videos
luxxle_categ: videos
shortcut: luxv
disabled: true
inactive: true
- name: luxxle news
engine: luxxle
categories: news
luxxle_categ: news
shortcut: luxn
disabled: true
inactive: true
- name: marginalia
engine: marginalia
shortcut: mar
@@ -1794,6 +1867,11 @@ engines:
# query_str: 'SELECT * from my_table WHERE my_column = %(query)s'
# shortcut : psql
- name: podchaser
engine: podchaser
shortcut: poc
disabled: true
- name: presearch
engine: presearch
search_type: search
@@ -1933,6 +2011,27 @@ engines:
engine: radio_browser
shortcut: rb
- name: rawweb
engine: json_engine
shortcut: rw
categories: general
paging: true
search_url: 'https://api.rawweb.org/api/search?keyword={query}&page={pageno}&lang=*'
results_query: data
url_query: link
title_query: title
content_query: content
title_html_to_text: true
content_html_to_text: true
disabled: true
inactive: true
about:
website: https://rawweb.org
official_api_documentation:
use_official_api: false
require_api_key: false
results: JSON
- name: reddit
engine: reddit
shortcut: re
@@ -2053,6 +2152,28 @@ engines:
base_url: 'https://discourse.pi-hole.net'
disabled: true
- name: privacywall
engine: privacywall
categories: general
privacywall_category: general
paging: false # only images and videos support pagination
shortcut: pw
disabled: true
- name: privacywall images
engine: privacywall
categories: images
privacywall_category: images
shortcut: pwi
disabled: true
- name: privacywall videos
engine: privacywall
categories: videos
privacywall_category: videos
shortcut: pwv
disabled: true
# - name: searx
# engine: searx_engine
# shortcut: se
@@ -2630,12 +2751,45 @@ engines:
categories: videos
disabled: true
- name: reloado
engine: xpath
paging: true
search_url: https://reloado.com/search?q={query}&page={pageno}
results_xpath: //div[contains(@class, 'result-item')]
url_xpath: .//div[contains(@class, 'result-title')]/a/@href
title_xpath: .//div[contains(@class, 'result-title')]/a
content_xpath: .//div[contains(@class, 'result-excerpt')]
shortcut: rel
categories: general
disabled: true
about:
website: https://reloado.com
official_api_documentation:
use_official_api: false
require_api_key: false
results: HTML
language: de
- name: repology
engine: repology
shortcut: rep
disabled: true
inactive: true
- name: resulthunter
engine: resulthunter
resulthunter_categ: web
categories: general
shortcut: reh
disabled: true
- name: resulthunter images
engine: resulthunter
resulthunter_categ: images
categories: images
shortcut: rehi
disabled: true
- name: swisscows
engine: swisscows
categories: general
@@ -2706,6 +2860,27 @@ engines:
shortcut: void
disabled: true
- name: vuhuv
engine: vuhuv
categories: general
vuhuv_category: general
shortcut: vu
disabled: true
- name: vuhuv images
engine: vuhuv
categories: images
vuhuv_category: images
shortcut: vui
disabled: true
- name: vuhuv videos
engine: vuhuv
categories: videos
vuhuv_category: videos
shortcut: vuv
disabled: true
- name: wallhaven
engine: wallhaven
# api_key: abcdefghijklmnopqrstuvwxyz
@@ -2839,6 +3014,38 @@ engines:
website: https://minecraft.wiki/
wikidata_id: Q105533483
# s1search google engines / mirrors
- name: searchtoday
engine: s1search
shortcut: std
base_url: https://info.searchtoday.site
disabled: true
# - name: webcrawler
# engine: s1search
# shortcut: wc
# base_url: https://www.webcrawler.com
# disabled: true
# s1search yahoo engines / mirrors
# - name: excite
# engine: s1search
# shortcut: exc
# base_url: https://results.excite.com.s1search.co
# disabled: true
# - name: metacrawler
# engine: s1search
# shortcut: mec
# base_url: https://search.metacrawler.com
# disabled: true
- name: infospace
engine: s1search
shortcut: ifs
base_url: https://search.infospace.com
disabled: true
# Doku engine lets you access to any Doku wiki instance:
# A public one or a privete/corporate one.
# - name: ubuntuwiki
+11
View File
@@ -21,6 +21,7 @@ sxng_locales = (
('da-DK', 'Dansk', 'Danmark', 'Danish', '\U0001f1e9\U0001f1f0'),
('de', 'Deutsch', '', 'German', '\U0001f310'),
('de-AT', 'Deutsch', 'Österreich', 'German', '\U0001f1e6\U0001f1f9'),
('de-BE', 'Deutsch', 'Belgien', 'German', '\U0001f1e7\U0001f1ea'),
('de-CH', 'Deutsch', 'Schweiz', 'German', '\U0001f1e8\U0001f1ed'),
('de-DE', 'Deutsch', 'Deutschland', 'German', '\U0001f1e9\U0001f1ea'),
('el', 'Ελληνικά', '', 'Greek', '\U0001f310'),
@@ -29,6 +30,7 @@ sxng_locales = (
('en-AU', 'English', 'Australia', 'English', '\U0001f1e6\U0001f1fa'),
('en-CA', 'English', 'Canada', 'English', '\U0001f1e8\U0001f1e6'),
('en-GB', 'English', 'United Kingdom', 'English', '\U0001f1ec\U0001f1e7'),
('en-HK', 'English', 'Hong Kong SAR China', 'English', '\U0001f1ed\U0001f1f0'),
('en-IE', 'English', 'Ireland', 'English', '\U0001f1ee\U0001f1ea'),
('en-IN', 'English', 'India', 'English', '\U0001f1ee\U0001f1f3'),
('en-NZ', 'English', 'New Zealand', 'English', '\U0001f1f3\U0001f1ff'),
@@ -44,17 +46,23 @@ sxng_locales = (
('es-ES', 'Español', 'España', 'Spanish', '\U0001f1ea\U0001f1f8'),
('es-MX', 'Español', 'México', 'Spanish', '\U0001f1f2\U0001f1fd'),
('es-PE', 'Español', 'Perú', 'Spanish', '\U0001f1f5\U0001f1ea'),
('es-VE', 'Español', 'Venezuela', 'Spanish', '\U0001f1fb\U0001f1ea'),
('et', 'Eesti', '', 'Estonian', '\U0001f310'),
('et-EE', 'Eesti', 'Eesti', 'Estonian', '\U0001f1ea\U0001f1ea'),
('fi', 'Suomi', '', 'Finnish', '\U0001f310'),
('fi-FI', 'Suomi', 'Suomi', 'Finnish', '\U0001f1eb\U0001f1ee'),
('fil', 'Filipino', '', 'Filipino', '\U0001f310'),
('fil-PH', 'Filipino', 'Pilipinas', 'Filipino', '\U0001f1f5\U0001f1ed'),
('fr', 'Français', '', 'French', '\U0001f310'),
('fr-BE', 'Français', 'Belgique', 'French', '\U0001f1e7\U0001f1ea'),
('fr-CA', 'Français', 'Canada', 'French', '\U0001f1e8\U0001f1e6'),
('fr-CH', 'Français', 'Suisse', 'French', '\U0001f1e8\U0001f1ed'),
('fr-FR', 'Français', 'France', 'French', '\U0001f1eb\U0001f1f7'),
('gl', 'Galego', '', 'Galician', '\U0001f310'),
('hi', 'हिन्दी', '', 'Hindi', '\U0001f310'),
('hi-IN', 'हिन्दी', 'भारत', 'Hindi', '\U0001f1ee\U0001f1f3'),
('hr', 'Hrvatski', '', 'Croatian', '\U0001f310'),
('hr-HR', 'Hrvatski', 'Hrvatska', 'Croatian', '\U0001f1ed\U0001f1f7'),
('hu', 'Magyar', '', 'Hungarian', '\U0001f310'),
('hu-HU', 'Magyar', 'Magyarország', 'Hungarian', '\U0001f1ed\U0001f1fa'),
('id', 'Indonesia', '', 'Indonesian', '\U0001f310'),
@@ -71,6 +79,8 @@ sxng_locales = (
('nl', 'Nederlands', '', 'Dutch', '\U0001f310'),
('nl-BE', 'Nederlands', 'België', 'Dutch', '\U0001f1e7\U0001f1ea'),
('nl-NL', 'Nederlands', 'Nederland', 'Dutch', '\U0001f1f3\U0001f1f1'),
('nn', 'Norsk Nynorsk', '', 'Norwegian Nynorsk', '\U0001f310'),
('nn-NO', 'Norsk Nynorsk', 'Noreg', 'Norwegian Nynorsk', '\U0001f1f3\U0001f1f4'),
('pl', 'Polski', '', 'Polish', '\U0001f310'),
('pl-PL', 'Polski', 'Polska', 'Polish', '\U0001f1f5\U0001f1f1'),
('pt', 'Português', '', 'Portuguese', '\U0001f310'),
@@ -83,6 +93,7 @@ sxng_locales = (
('sk', 'Slovenčina', '', 'Slovak', '\U0001f310'),
('sq', 'Shqip', '', 'Albanian', '\U0001f310'),
('sv', 'Svenska', '', 'Swedish', '\U0001f310'),
('sv-FI', 'Svenska', 'Finland', 'Swedish', '\U0001f1eb\U0001f1ee'),
('sv-SE', 'Svenska', 'Sverige', 'Swedish', '\U0001f1f8\U0001f1ea'),
('th', 'ไทย', '', 'Thai', '\U0001f310'),
('th-TH', 'ไทย', 'ไทย', 'Thai', '\U0001f1f9\U0001f1ed'),
@@ -1,28 +1,69 @@
<article class="result result-images {% if result['category'] %}category-{{ result['category'] }}{% endif %}">{{- "" -}}
<a {% if results_on_new_tab %}target="_blank" rel="noopener noreferrer"{% else %}rel="noreferrer"{% endif %} href="{{ result.img_src }}">{{- "" -}}
<img class="image_thumbnail" {% if results_on_new_tab %}target="_blank" rel="noopener noreferrer"{% else %}rel="noreferrer"{% endif %} src="{% if result.thumbnail_src %}{{ image_proxify(result.thumbnail_src) }}{% else %}{{ image_proxify(result.img_src) }}{% endif %}" alt="{{ result.title|striptags }}" loading="lazy" width="200" height="200">{{- "" -}}
{%- if result.resolution %} <span class="image_resolution">{{ result.resolution }}</span> {%- endif -%}
<span class="title">{{ result.title|striptags }}</span>{{- "" -}}
<span class="source">{{- result.parsed_url.netloc -}}</span>{{- "" -}}
</a>{{- "" -}}
<div class="detail swipe-horizontal">{{- "" -}}
<a class="result-detail-close" href="#">{{ icon('close') }}</a>{{- "" -}}
<a class="result-detail-previous" href="#">{{ icon('navigate-left') }}</a>{{- "" -}}
<a class="result-detail-next" href="#">{{ icon('navigate-right') }}</a>{{- "" -}}
<a class="result-images-source" {% if results_on_new_tab %}target="_blank" rel="noopener noreferrer"{% else %}rel="noreferrer"{% endif %} href="{{ result.img_src }}">
<img src="" data-src="{{ image_proxify(result.img_src) }}" alt="{{ result.title|striptags }}">{{- "" -}}
</a>{{- "" -}}
<div class="result-images-labels">{{- "" -}}
<h4>{{ result.title|striptags }}</h4>{{- "" -}}
<p class="result-content">{%- if result.content %}{{ result.content|striptags }}{% else %}&nbsp;{% endif -%}</p>{{- "" -}}
<hr>{{- "" -}}
<p class="result-author">{%- if result.author %}<span>{{ _('Author') }}:</span>{{ result.author|striptags }}{% else %}&nbsp;{% endif -%}</p>{{- "" -}}
<p class="result-resolution">{%- if result.resolution %}<span>{{ _('Resolution') }}:</span>{{ result.resolution }}{% else %}&nbsp;{% endif -%}</p>{{- "" -}}
<p class="result-format">{%- if result.img_format %}<span>{{ _('Format') }}:</span>{{ result.img_format }}{% else %}&nbsp;{% endif -%}</p>{{- "" -}}
<p class="result-filesize">{%- if result.filesize %}<span>{{ _('Filesize') }}:</span>{{ result.filesize}}{% else %}&nbsp;{% endif -%}</p>{{- "" -}}
<p class="result-source">{%- if result.source %}<span>{{ _('Source') }}:</span>{{ result.source }}{% else %}&nbsp;{% endif -%}</p>{{- "" -}}
<p class="result-engine"><span>{{ _('Engine') }}:</span>{{ result.engine }}</p>{{- "" -}}{{- "" -}}
<p class="result-url"><span>{{ _('View source') }}:</span><a {% if results_on_new_tab %}target="_blank" rel="noopener noreferrer"{% else %}rel="noreferrer"{% endif %} href="{{ result.url }}">{{ result.url }}</a></p>{{- "" -}}
</div>{{- "" -}}
</div>{{- "" -}}
{% macro _target(url, new_tab=False) -%}
{%- if new_tab %} target="_blank" rel="noopener noreferrer"
{%- else %} rel="noreferrer"
{%- endif %}
{%- endmacro %}
{% macro _label(label, value) -%}
{%- if value -%}<span>{{ label }}:</span>{{ value }}
{%- else %} &nbsp;
{%- endif -%}
{%- endmacro %}
<article class="result result-images
{%- if result["category"] %} category-{{ result["category"] }}
{%- endif -%}"
>
<a {{ _target(results_on_new_tab) }} href="{{ result.img_src }}">
<img class="image_thumbnail" {{ _target(results_on_new_tab) }}
src="
{%- if result.thumbnail_src -%}
{{ image_proxify(result.thumbnail_src) }}
{%- else -%}
{{ image_proxify(result.img_src) }}
{%- endif -%}
"
alt="{{ result.title | striptags }}" loading="lazy" width="200" height="200"
{{- "" -}}
>
{%- if result.resolution %}
<span class="image_resolution">{{ result.resolution }}</span>
{%- endif -%}
<span class="title">{{ result.title | striptags }}</span>
<span class="source">{{- result.parsed_url.netloc -}}</span>
</a>
<div class="detail swipe-horizontal">
<a class="result-detail-close" href="#">{{ icon("close") }}</a>
<a class="result-detail-previous" href="#">{{ icon("navigate-left") }}</a>
<a class="result-detail-next" href="#">{{ icon("navigate-right") }}</a>
<a class="result-images-source" {{ _target(results_on_new_tab) }} href="{{ result.img_src }}">
<img src=""
data-src="{{ image_proxify(result.img_src) }}"
alt="{{ result.title | striptags }}">
</a>
<div class="result-images-labels">
<h4>{{ result.title | striptags }}</h4>
<p class="result-content">
{%- if result.content %} {{ result.content | striptags }}
{%- else %} &nbsp;
{%- endif -%}
</p>
<hr>
<p class="result-author">{{ _label(_("Author"), result.author) }}</p>
<p class="result-resolution">{{ _label(_("Resolution"), result.resolution) }}</p>
<p class="result-format">
<span>{{ _("Image formats") }}:</span>
{{- "" -}}<a {{ _target(results_on_new_tab) }} href="{{ result.img_src }}">{{- result.img_format or _("original format") -}}</a>
{%- for ref in result.formats -%}
&nbsp;| <a {{ _target(results_on_new_tab) }} href="{{ ref.url }}">{{ ref.label }}</a>
{%- endfor %}
</p>
<p class="result-filesize">{{ _label(_("Filesize"), result.filesize) }}</p>
<p class="result-source">{{ _label(_("Source"), result.source) }}</p>
<p class="result-engine">{{ _label(_("Engine"), result.engine) }}</p>
<p class="result-url"><span>{{ _("View source") }}:</span>{{- "" -}}
<a {{ _target(results_on_new_tab) }} href="{{ result.url }}">{{ result.url }}</a>
</p>
</div>
</div>
</article>