How to run the groupultra telegram-search code

groupultra telegram-search is a new tool for searching your Telegram chat history.

Some recommendations:

  • Telegram group management bot CSUBOT: privately messages users who apply to join a group and sends them a Cloudflare web verification challenge, to verify that they are real humans. It does not disturb other group members.
  • Public IP checker: shows the public IP this machine uses when accessing different websites (Chinese websites, well-known international AI websites, and blocked international websites), so you can verify your proxy routing configuration and avoid being banned by AI tools.

Project Features#

According to the official documentation, Telegram Search provides the following core features:

  • Semantic Search: searches not only by keyword but also by the context and meaning of messages.
  • Vector Matching: implements similarity search based on OpenAI's embedding vector technology.
  • Efficient Retrieval: provides a more accurate and intelligent search experience than Telegram's native search.
  • Multi-platform Support: offers a web interface and desktop applications.

Caveats#

First, here are the points that differ significantly from what users may expect, along with some existing issues, so that users for whom this project is unsuitable can stop here. Suggestions for more mature alternatives follow.

What distinguishes this project from traditional, non-AI projects is semantic search (searching not only by keyword but also by the context and meaning of messages). In my tests, however, regardless of whether the "Search Content" option was checked, searching with a question almost never worked (only one sentence did), and synonym queries returned nothing. I could only find messages that share words with the query. I am not sure whether this is an embedding model issue.

The front-end and back-end start-up commands need time to compile, which can feel slow to users accustomed to languages that need no pre-compilation or that compile quickly.

If you open the front end and refresh it after a while, it shows a white screen, as in the screenshot below. Each time, you have to restart it with pnpm run dev:frontend and wait for compilation. On a server deployment, this means connecting to the server and recompiling before each use, or even before each query, which is troublesome.

[Screenshot: groupultra/telegram-search front-end white screen issue]

In summary, as a tool for searching Telegram message history, this project currently does not perform as well as mature traditional projects. One recommended alternative:

Luoxu (lilydjwg/luoxu: "A Telegram userbot to index Chinese and Japanese group contents."), developed by the Arch Linux CN community, has been running stably for many years. You can see it in action on the public group message search webpage. Project features:

  • The project is mainly written in Python, so there is no waiting for compilation. Some components do need compiling, but only once.
  • Searches do not distinguish between simplified and traditional Chinese characters (OpenCC converts them automatically).
  • Search strings support some search syntax.
  • It uses fewer resources at runtime than groupultra/telegram-search.

Issues with Luoxu:

  • Installation and deployment may be somewhat more complicated than for groupultra/telegram-search. (This is based on my own deployment of both projects. I deployed Luoxu on a Linux arm32 system, so all of its dependencies had to be compiled and installed, including PostgreSQL: Linux Compile and Install PostgreSQL 17.4)
  • By default, the code indexes only the groups and channels listed in the configuration file; you can, of course, modify the code yourself.

Deployment Prerequisites and Requirements#

  • An AI embedding API you can access, such as Google Gemini or OpenAI.
  • Node.js: version 20.0 or higher.
  • RAM: at least 100 MB.
  • A network that can access the international internet.

This tutorial uses Google Gemini, which is completely free, and because it is an online API it does not consume local computing resources. For how to apply for a key, see: Gemini API KEY application and usage

I deployed the project on an overseas Linux amd64 server without Docker; for other operating systems, treat the steps as a reference only.

Please follow the steps in this tutorial while cross-checking them against the official documentation.

Quick Start#

According to the official documentation, the basic process of using Telegram Search is as follows:

  1. Installation and Configuration: Install the application and configure the necessary APIs.
  2. Connect Telegram Account: Log in to your Telegram account.
  3. Sync Chat Records: Select the chat records you need to search for synchronization.
  4. Start Searching: Use the semantic search function to find messages.

Installation and Configuration#

Install Node.js#

Visit the Node.js official website to view the documentation and install it. It is recommended to use the latest LTS version. After installation, use the following two commands to confirm the version:

node -v
v22.16.0

npm --version
10.9.2
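The version requirement above (Node.js 20.0 or higher) can also be checked in a script. The following is a minimal sketch; node_major_at_least is a hypothetical helper name, not something provided by Node.js or the project.

```shell
# Succeed if a Node.js version string such as "v22.16.0" has a major version
# greater than or equal to the required one.
# node_major_at_least is a hypothetical helper used only for illustration.
node_major_at_least() {
  local version="${1#v}"        # strip the leading "v"
  local required="$2"
  local major="${version%%.*}"  # keep the part before the first dot
  [ "$major" -ge "$required" ]
}

# Check the installed Node.js, if there is one.
if command -v node >/dev/null 2>&1; then
  if node_major_at_least "$(node -v)" 20; then
    echo "Node.js version is sufficient"
  else
    echo "Node.js is older than 20, please upgrade"
  fi
fi
```

The helper only compares the major version, which is enough for a "20.0 or higher" requirement.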

The project uses pnpm, so install it as well. See the official documentation (Installation | pnpm); at the time of writing, the command is:

curl -fsSL https://get.pnpm.io/install.sh | sh -

Then follow the installer's prompt to reload your shell configuration (the path depends on your user name):

source /home/opc/.bashrc

Install PostgreSQL Database and pgvector Plugin#

You need to install both PostgreSQL and the pgvector extension. This section may be incomplete.

The commands below are for CentOS and other Red Hat family systems. While following them, be sure to also consult other articles online and AI suggestions.

Install the PostgreSQL official YUM repository configuration package:

sudo yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm

I chose version 13; you can choose a newer version. Since I had already installed PostgreSQL 13 earlier, I am not sure of the exact installation command.

Now search the YUM repository for available pgvector packages:

sudo yum search pgvector

Install:

sudo yum install -y pgvector_13
sudo systemctl status postgresql-13

Initialize the PostgreSQL 13 data directory (required once, after the first installation):

sudo /usr/pgsql-13/bin/postgresql-13-setup initdb

Restart the PostgreSQL 13 service so that the changes take effect:

sudo systemctl restart postgresql-13

Switch to the postgres database administrator user.

sudo -i -u postgres

Start the PostgreSQL command line client.

psql

Set the login password for the postgres user.

ALTER USER postgres WITH PASSWORD 'your_database_password';

Exit psql:

exit

Then return to the initial system user:

exit

Now connect to PostgreSQL as the postgres user via localhost (you will be prompted for the password), create the database, switch to it, and enable the vector extension there. The extension is installed per database, so it must be created inside tg_1:

psql -U postgres -h localhost --password
CREATE DATABASE tg_1;
\c tg_1
CREATE EXTENSION IF NOT EXISTS vector;
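To confirm that the vector extension really exists in tg_1 (and not only in the default database), you can query pg_extension through a connection URL. This is a minimal sketch; pg_url is a hypothetical helper, and the values are the placeholders used in this tutorial.

```shell
# Build a PostgreSQL connection URL from separate fields.
# pg_url is a hypothetical helper. It does no percent-encoding, so this simple
# version only works when the password has no characters such as "@" or "/".
pg_url() {
  local user="$1" password="$2" host="$3" port="$4" database="$5"
  printf 'postgres://%s:%s@%s:%s/%s\n' "$user" "$password" "$host" "$port" "$database"
}

# Usage (needs a running PostgreSQL server, so it is left commented out):
# psql "$(pg_url postgres 'your_database_password' localhost 5432 tg_1)" \
#   -c "SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';"
```

If the query returns no rows, the extension was probably created in a different database. The same URL form can be used for the optional url field in the project configuration.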

Configure the Project and Start#

Since the Google Gemini API has rate limits, I made a few modifications in a fork. Clone its rate branch:

git clone -b rate https://github.com/cjh0613/groupultra-telegram-search.git

Note that the current rate branch was edited entirely through the GitHub web interface and has not been tested locally.

The rate limit modification that runs successfully on my machine is based on release v1.0.0-beta.10, commit c60ac6416dcac6543d2623c49179681ed859e26f, and it only guarantees that the code runs without errors. If the code does not run for you, try rolling back to this commit.

If the API you are using has no rate limits, you can directly use the official repository:

git clone https://github.com/groupultra/telegram-search.git

Copy the configuration file config/config.example.yaml to config/config.yaml in the same directory.
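The copy step can be scripted so that an already edited configuration is never overwritten. This is a minimal sketch; init_config is a hypothetical helper, and the paths follow the layout described above.

```shell
# Copy the example configuration only when no configuration exists yet.
# init_config is a hypothetical helper; pass it the project root directory.
init_config() {
  local project_dir="$1"
  local example="$project_dir/config/config.example.yaml"
  local target="$project_dir/config/config.yaml"
  if [ ! -f "$example" ]; then
    echo "missing $example" >&2
    return 1
  fi
  if [ -f "$target" ]; then
    echo "kept existing $target"
  else
    cp "$example" "$target"
    echo "created $target"
  fi
}

# Usage, from inside the cloned repository:
# init_config .
```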

Edit the configuration file following the format below, and adjust the remaining parts according to the comments in the default configuration file.

Applying for Telegram API credentials requires an IP address with a good reputation; if your application fails, leave the project author's default credentials unchanged. These credentials are for third-party clients, not for Telegram bots.

database:
  # Database type: postgres, pglite
  type: postgres
  # PostgreSQL configuration (used when type: postgres)
  # Can use URL or separate field configuration
  # url: postgres://postgres:postgres@localhost:5432/tg_search
  host: localhost
  port: 5432
  user: postgres
  password: 'your_database_password'
  database: tg_1

api:
  embedding:
    # Embedding provider (openai or ollama) # Do not change this to Gemini
    provider: openai
    # Embedding model
    model: models/text-embedding-004
    # gemini-embedding-exp-03-07
    # API key for provider
    apiKey: <your_Gemini_API_KEY>
    # Gemini embedding-001 defaults to 3072 dimensions, cannot customize dimensions
    dimension: 768
    # Optional, for custom API providers
    apiBase: 'https://generativelanguage.googleapis.com/v1beta/openai/'
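Before starting the services, it can help to check that the key and model from the configuration are accepted. The following is a minimal sketch, assuming the embeddings route is the apiBase followed by embeddings (the usual layout of OpenAI-compatible APIs) and that the key is exported as GEMINI_API_KEY, a variable name chosen here for illustration. embedding_payload is a hypothetical helper.

```shell
# Print the JSON body for an OpenAI-style embeddings request.
# embedding_payload is a hypothetical helper. It does no JSON escaping, so the
# input text must not contain double quotes or backslashes.
embedding_payload() {
  local model="$1" text="$2"
  printf '{"model":"%s","input":"%s"}\n' "$model" "$text"
}

# Usage (needs network access and a real key, so it is left commented out):
# curl -sS "https://generativelanguage.googleapis.com/v1beta/openai/embeddings" \
#   -H "Content-Type: application/json" \
#   -H "Authorization: Bearer $GEMINI_API_KEY" \
#   -d "$(embedding_payload models/text-embedding-004 'hello world')"
```

If the request succeeds, the length of the returned embedding array can be compared with the dimension value in the configuration. If the endpoint rejects the model name, trying it without the models/ prefix is a reasonable next step.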
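
After saving the configuration, start the services from the project root. If the dependencies have not been installed yet, run pnpm install first.

# Start the back-end service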
pnpm run dev:server

# In another terminal window, start the front-end interface
pnpm run dev:frontend

Check if the front end can be accessed successfully:

curl http://localhost:3333/
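Because the front end needs time to compile, the first curl may fail even though nothing is wrong. A small retry loop avoids checking by hand. This is a minimal sketch; wait_until is a hypothetical helper that retries any command.

```shell
# Retry a command until it succeeds or the attempts run out.
# wait_until is a hypothetical helper: wait_until ATTEMPTS DELAY_SECONDS COMMAND...
wait_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage: poll the front end up to 30 times, 2 seconds apart.
# wait_until 30 2 curl -fsS -o /dev/null http://localhost:3333/ && echo "front end is up"
```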

If the project runs on your local machine, open http://localhost:3333 in a browser to use the application.

If it runs on a remote machine, you only need to make the front-end port reachable. However, do not expose it directly to the public internet; otherwise, anyone can operate your Telegram account. You need to restrict access yourself, and there are many ways to do this.
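One way to reach a remote front end without opening the port publicly is an SSH local port forward. This is a minimal sketch, assuming you have SSH access to the server; tunnel_command is a hypothetical helper that only prints the command, and user@your_server is a placeholder.

```shell
# Print an ssh command that forwards a local port to the same port on the
# server's loopback interface.
# tunnel_command is a hypothetical helper; the port defaults to 3333.
tunnel_command() {
  local remote="$1" port="${2:-3333}"
  printf 'ssh -N -L %s:localhost:%s %s\n' "$port" "$port" "$remote"
}

# Usage, on your own computer (not on the server):
# tunnel_command user@your_server
# Run the printed command, then open http://localhost:3333 in a local browser.
```

With this approach the front-end port stays closed in the server's firewall, and only users who can log in over SSH can reach the application.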

Additionally, for access through a domain name, you may need to edit apps/frontend/vite.config.ts and add the following to the server options:

 allowedHosts: ['your_access_domain']

For other content, please refer to the official documentation.
