How-To Guides
- Creating a Database
- Upgrading to Major Version
- Installing and Updating an Extension
- Creating Manual Backups
- Restoring a Backup
- Identifying Slow Queries
- Detecting and Terminating Long-Running Queries
- Preventing Full Disk Issues
- Checking Database Size and Related Issues
Creating a Database
ClickHouse is a high-performance columnar database designed for real-time analytical processing. It’s known for its blazing speed, horizontal scalability, and efficient use of disk I/O. Proper setup is essential for taking advantage of ClickHouse’s full capabilities, including fault tolerance, secure access, and high query performance. This guide walks through various ways to run and connect to ClickHouse: using the ClickHouse CLI (clickhouse-client), Docker containers, and command-line tools for scripting and automation. Best practices are highlighted throughout to ensure robust deployments.
Creating a Database Using clickhouse-client
The ClickHouse command-line interface (clickhouse-client) is a built-in tool used to connect to and manage ClickHouse servers. It supports both local and remote connections and allows for SQL-based interaction with the database engine.
Connect to ClickHouse:
If you’re running ClickHouse locally (via package manager or Docker), you can start the CLI with:
clickhouse-client
For remote connections, specify the hostname, port (default 9000), and user credentials:
clickhouse-client -h <host> --port <port> -u <username> --password
Once connected, you can run SQL queries directly from the shell.
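For example, a first session might create a database and confirm it exists (the analytics name here is just an illustration):
CREATE DATABASE IF NOT EXISTS analytics;
SHOW DATABASES;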
Running ClickHouse Using Docker
Docker provides a fast, reproducible way to run ClickHouse in isolated environments. This is ideal for local development or self-contained production setups.
Access Elestio Terminal
If you’re using Elestio for ClickHouse hosting, log into the Elestio dashboard. Go to your ClickHouse service, then navigate to Tools > Terminal to open a pre-authenticated shell session.
Now change the directory:
cd /opt/app/
Access the ClickHouse Container Shell
Elestio-managed services run on Docker Compose. Use this to enter the ClickHouse container:
docker-compose exec clickhouse bash
Access ClickHouse CLI from Inside the Container
Once inside the container, the clickhouse-client tool is available. Run it like this (add --password if needed):
clickhouse-client -u <user> --port <port> --password
You are now connected to the running ClickHouse instance inside the container.
Test Connectivity
Try creating a database and querying data to verify functionality:
CREATE DATABASE test_db;
CREATE TABLE test_db.test_table (id UInt32, message String) ENGINE = MergeTree() ORDER BY id;
INSERT INTO test_db.test_table VALUES (1, 'Hello ClickHouse');
SELECT * FROM test_db.test_table;
Expected Output:
1 Hello ClickHouse
This confirms read/write operations and query functionality.
Connecting Using clickhouse-client in Scripts
clickhouse-client can be used non-interactively for scripting, automation, and cron-based jobs.
For example, to insert data from a shell script:
echo "INSERT INTO test_db.test_table VALUES (2, 'Automated')" | \
clickhouse-client -h <host> -u <user> --password
This is useful for automated ETL jobs, health checks, or backup pipelines.
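The --query flag makes one-off checks easy to embed in scripts; a minimal health check might look like this (host and credentials are placeholders):
clickhouse-client -h <host> -u <user> --password --query "SELECT 1"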
Best Practices for Setting Up ClickHouse
Use Clear Naming for Databases and Tables
Adopt consistent naming conventions for databases, tables, and columns. Use lowercase, underscore-separated names like:
user_events_2024
product_sales_agg
This improves clarity in multi-schema environments and helps with automation and maintenance scripts.
Choose the Right Engine and Indexing Strategy
ClickHouse supports various table engines like MergeTree, ReplacingMergeTree, and SummingMergeTree. Pick the engine that best matches your use case and define ORDER BY keys carefully to optimize performance.
Example:
CREATE TABLE logs (
timestamp DateTime,
service String,
message String
) ENGINE = MergeTree()
ORDER BY (timestamp, service);
Inappropriate engine selection can lead to poor query performance or high disk usage.
Enable Authentication and Secure Access
Always configure user-level authentication and restrict access in production. Add users and passwords in users.xml or via SQL:
CREATE USER secure_user IDENTIFIED WITH plaintext_password BY 'strong_password';
GRANT ALL ON *.* TO secure_user;
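For production, a hashed password and a narrower grant are safer than the plaintext example above. A hedged variant, assuming an illustrative analytics database:
CREATE USER app_user IDENTIFIED WITH sha256_password BY 'strong_password';
GRANT SELECT, INSERT ON analytics.* TO app_user;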
Use TLS for encrypted connections by enabling SSL in the config.xml file:
<tcp_port_secure>9440</tcp_port_secure>
<openSSL>
<server>
<certificateFile>/etc/clickhouse-server/certs/server.crt</certificateFile>
<privateKeyFile>/etc/clickhouse-server/certs/server.key</privateKeyFile>
</server>
</openSSL>
Configure Data Persistence and Storage Paths
ClickHouse stores data on disk by default, but ensure proper mounting, storage separation, and backup routines.
In config.xml:
<path>/var/lib/clickhouse/</path>
<tmp_path>/var/lib/clickhouse/tmp/</tmp_path>
<user_files_path>/var/lib/clickhouse/user_files/</user_files_path>
Use RAID, SSDs, or networked volumes depending on your availability and performance needs.
Monitor and Tune Performance
Use built-in introspection tools like:
SELECT * FROM system.metrics;
SELECT * FROM system.query_log ORDER BY event_time DESC LIMIT 10;
SELECT * FROM system.parts;
For real-time observability, integrate with Grafana, Prometheus, or use ClickHouse Keeper metrics.
Also review:
- system.mutations for long-running mutation jobs
- system.errors for crash/debug info
- system.replication_queue for sync issues in replicated tables
Common Issues and Their Solutions
Issue | Cause | Solution
---|---|---
Authentication failure | Wrong password or no user set | Double-check credentials; use the --password flag
Cannot connect to localhost | Service not running or incorrect port | Ensure ClickHouse is running and check the port
SSL/TLS handshake failed | Incorrect certificate paths or permissions | Verify file locations in config.xml and restart the service
Queries are slow | Poor ORDER BY design or unoptimized table engine | Reevaluate schema design and use indexes effectively
Data lost after restart | Misconfigured data path or ephemeral container | Ensure proper disk volume mounts and storage persistence
Upgrading to Major Version
Upgrading a database service on Elestio can be done without creating a new instance or performing a full manual migration. Elestio provides a built-in option to change the database version directly from the dashboard. This is useful for cases where the upgrade does not involve breaking changes or when minimal manual involvement is preferred. The version upgrade process is handled by Elestio internally, including restarting the database service if required. This method reduces the number of steps involved and provides a way to keep services up to date with minimal configuration changes.
Log In and Locate Your Service
To begin the upgrade process, log in to your Elestio dashboard and navigate to the specific database service you want to upgrade. It is important to verify that the correct instance is selected, especially in environments where multiple databases are used for different purposes such as staging, testing, or production. The dashboard interface provides detailed information for each service, including version details, usage metrics, and current configuration. Ensure that you have access rights to perform upgrades on the selected service. Identifying the right instance helps avoid accidental changes to unrelated environments.
Back Up Your Data
Before starting the upgrade, create a backup of your database. A backup stores the current state of your data, schema, indexes, and configuration, which can be restored if something goes wrong during the upgrade. In Elestio, this can be done through the Backups tab by selecting Back up now under Manual local backups and Download the backup file. Scheduled backups may also be used, but it is recommended to create a manual one just before the upgrade. Keeping a recent backup allows quick recovery in case of errors or rollback needs. This is especially important in production environments where data consistency is critical.
Select the New Version
Once your backup is secure, proceed to Overview, then the Software > Update config tab within your database service page. Here you'll find an option labeled ENV. In the ENV editor, set the SOFTWARE_VERSION variable to the desired database version. After confirming the version, Elestio will begin the upgrade process automatically. During this time, the platform takes care of the version change and restarts the database if needed. No manual commands are required, and the system handles most of the operational aspects in the background.
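As an illustration, the ENV entry might look like the following; the exact value depends on the image tags Elestio offers:
SOFTWARE_VERSION=24.3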
Monitor the Upgrade Process
The upgrade process may include a short downtime while the database restarts. Once it is completed, it is important to verify that the upgrade was successful and the service is operating as expected. Start by checking the logs available in the Elestio dashboard for any warnings or errors during the process. Then, review performance metrics to ensure the database is running normally and responding to queries. Finally, test the connection from your client applications to confirm that they can interact with the upgraded database without issues.
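A quick way to confirm the new version from any client is the built-in version() function:
clickhouse-client --query "SELECT version()"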
Installing and Updating an Extension
ClickHouse supports custom extensions via User Defined Functions (UDFs), external dictionaries, and shared libraries that extend its core capabilities with custom logic, formats, or integrations. These behave similarly to modules or plugins in other systems and must be configured at server startup. Common examples include integration with geospatial libraries, custom UDFs, or external dictionary sources like MySQL or HTTP.
In Elestio-hosted ClickHouse instances or any Docker Compose-based setup, extensions can be added by mounting external libraries or configuration files and referencing them in config.xml or users.xml. This guide walks through how to install, load, and manage ClickHouse extensions using Docker Compose along with best practices and common troubleshooting steps.
Installing and Enabling ClickHouse Extensions
ClickHouse extensions are typically compiled as shared objects (.so) files or defined as configuration files for dictionaries or formats. These files must be mounted into the container and referenced explicitly in the server’s configuration files.
Example: Load Custom Shared Library UDF
Suppose you have a compiled UDF called libexample_udf.so. To include it in a Docker Compose setup:
Update docker-compose.yml
services:
  clickhouse:
    image: clickhouse/clickhouse-server:latest
    volumes:
      - ./modules/libexample_udf.so:/usr/lib/clickhouse/user_defined/libexample_udf.so
      - ./configs/config.xml:/etc/clickhouse-server/config.xml
    ports:
      - "8123:8123"
      - "9000:9000"
- ./modules/libexample_udf.so: the local path to the shared library on the host.
- /usr/lib/clickhouse/user_defined/: the default directory for user libraries inside the container.
Make sure the file exists before running Docker Compose.
Configure config.xml to Load the UDF
In your custom config.xml:
<user_defined>
    <function>
        <name>example_udf</name>
        <type>udf</type>
        <library>libexample_udf.so</library>
    </function>
</user_defined>
The library path must match the volume mount location.
Restart the ClickHouse Service
After updating the Compose and configuration files, restart the service:
docker-compose down
docker-compose up -d
This will reload ClickHouse with the specified UDF.
Verify the Extension is Loaded
Connect using the ClickHouse CLI or HTTP interface and run:
SELECT example_udf('test input');
If successful, the function will return expected results from the loaded library. You can also confirm the server loaded your shared library by inspecting logs:
docker-compose logs clickhouse
Look for lines that indicate the library was found and loaded.
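A quick filter over the logs can help, assuming the library or function name appears in the log line:
docker-compose logs clickhouse | grep -i example_udf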
Managing External Dictionaries
ClickHouse supports loading external data sources (like MySQL, HTTP APIs, or files) as dictionaries.
Mount Dictionary Configuration
In docker-compose.yml:
volumes:
  - ./configs/dictionaries/:/etc/clickhouse-server/dictionaries/
Reference in config.xml
<dictionaries_config>/etc/clickhouse-server/dictionaries/*.xml</dictionaries_config>
Example dictionary file (mysql_dictionary.xml):
<clickhouse>
    <dictionary>
        <name>mysql_dict</name>
        <source>
            <mysql>
                <host>mysql-host</host>
                <port>3306</port>
                <user>root</user>
                <password>password</password>
                <db>test</db>
                <table>cities</table>
            </mysql>
        </source>
        <layout><flat /></layout>
        <structure>
            <id>
                <name>id</name>
            </id>
            <attribute>
                <name>name</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
        </structure>
    </dictionary>
</clickhouse>
Use the dictionary in queries:
SELECT dictGetString('mysql_dict', 'name', toUInt64(42));
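To confirm the dictionary registered, or to force a reload after editing its configuration, ClickHouse provides built-in statements:
SHOW DICTIONARIES;
SYSTEM RELOAD DICTIONARY mysql_dict;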
Updating or Removing Extensions
ClickHouse doesn’t support unloading shared-library UDFs at runtime (dictionaries can be reloaded with SYSTEM RELOAD DICTIONARY, but library changes require a restart). To modify or remove an extension:
1. Stop the container:
docker-compose down
2. Edit config files:
- Replace or remove the <function> entry in config.xml or the dictionary config.
- Replace or remove the .so file if applicable.
3. Restart the container:
docker-compose up -d
Always test changes in staging before deploying to production.
Troubleshooting Common Extension Issues
Issue | Cause | Resolution
---|---|---
ClickHouse fails to start | Invalid config or missing .so file | Run docker-compose logs clickhouse and fix missing files or XML syntax
UDF not recognized | Wrong library path or missing permissions | Ensure the volume mount is correct and the file is executable inside the container
Dictionary not available | Config file not found or misconfigured XML | Double-check dictionaries_config and validate with SHOW DICTIONARIES
Segmentation fault | Invalid shared library or ABI mismatch | Recompile the UDF for the correct platform, verify against the installed ClickHouse version
Query fails silently | Dictionary or UDF not fully loaded | Recheck server logs for errors during startup
Security Considerations
ClickHouse extensions, especially shared libraries, run with the same privileges as the ClickHouse server. Be cautious:
- Only load trusted .so files from verified sources.
- Ensure the clickhouse user has restricted permissions inside the container.
- Never expose dictionary or UDF paths to writable directories from external systems.
Avoid using custom UDFs or dictionaries from unknown sources in production environments without a thorough code review.
Creating Manual Backups
Regular backups are essential when running a ClickHouse deployment, especially if you’re using it for persistent analytics or time-series data. While Elestio handles automated backups by default, you may want to create manual backups before configuration changes, retain a local archive, or test backup automation. This guide walks through multiple methods for creating ClickHouse backups on Elestio, including dashboard snapshots, command-line approaches, and Docker Compose-based setups. It also explains backup storage, retention, and automation using scheduled jobs.
Manual Service Backups on Elestio
If you’re using Elestio’s managed ClickHouse service, the simplest way to perform a full backup is directly through the Elestio dashboard. This creates a snapshot of your current ClickHouse dataset and stores it in Elestio’s infrastructure. These snapshots can be restored later from the same interface, which is helpful when making critical changes or testing recovery workflows.
To trigger a manual ClickHouse backup on Elestio:
- Log in to the Elestio dashboard.
- Navigate to your ClickHouse service or cluster.
- Click the Backups tab in the service menu.
- Choose Back up now to generate a manual snapshot.
This method is recommended for quick, reliable backups without needing to use the command line.
Manual Backups Using Docker Compose
If your ClickHouse instance is deployed via Docker Compose (as is common on Elestio-hosted environments), you can manually back up ClickHouse by either copying its internal storage files or using the native BACKUP SQL command (available in ClickHouse v21.12+). These approaches allow you to maintain control over backup logic and frequency.
Access Elestio Terminal
Go to your deployed ClickHouse service in the Elestio dashboard, navigate to Tools > Terminal, and log in using the credentials provided.
Locate the ClickHouse Container Directory
cd /opt/app/
This is the working directory of your Docker Compose project, which contains the docker-compose.yml file.
Trigger a Backup (Using SQL)
If you’re using ClickHouse with backup support enabled, you can execute:
docker-compose exec clickhouse clickhouse-client --query="BACKUP DATABASE default TO Disk('backups', 'backup_$(date +%F)')"
This creates a full backup of the default database on the disk named backups; this guide assumes that disk is declared in the server configuration and points at /backups inside the container.
Copy Backup Files from the Container
Use docker cp to move the backup directory to your host system:
docker cp $(docker-compose ps -q clickhouse):/backups/backup_$(date +%F) ./clickhouse_backup_$(date +%F)
This gives you a restorable backup snapshot for storage or future recovery.
Backup Storage & Retention Best Practices
After creating backups, it’s important to store them securely and manage retention properly. ClickHouse backups can grow large depending on the volume and compression of your data.
Guidelines to Follow:
- Use clear naming: clickhouse_backup_2025_06_09
- Store off-site or on cloud storage (e.g. AWS S3, Backblaze, encrypted storage)
- Retain 7 daily backups, 4 weekly backups, and 3–6 monthly backups
- Automate old file cleanup with cron jobs or retention scripts (see the sketch after this list)
- Optionally compress backups with tar, gzip, or xz to reduce space
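As a sketch of the cleanup guideline above, a daily retention script could prune local archives older than a week (the path and retention window are illustrative):
find /backups -name 'clickhouse_backup_*.tar.gz' -mtime +7 -delete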
Automating ClickHouse Backups (cron)
Manual backup commands can be scheduled using tools like cron on Linux-based systems. This allows you to regularly back up your database without needing to run commands manually. Automating the process also reduces the risk of forgetting backups and ensures more consistent retention.
Example: Daily Backup at 3 AM
Edit the crontab:
crontab -e
Add a job like:
0 3 * * * docker-compose -f /opt/app/docker-compose.yml exec -T clickhouse \
clickhouse-client --query="BACKUP DATABASE default TO Disk('backups', 'backup_$(date +\%F)')" && \
docker cp $(docker-compose -f /opt/app/docker-compose.yml ps -q clickhouse):/backups/backup_$(date +\%F) /backups/clickhouse_backup_$(date +\%F)
Make sure /backups/ exists and is writable by the cron user.
You can also compress the file or upload to cloud storage in the same script:
tar -czf /backups/clickhouse_backup_$(date +\%F).tar.gz /backups/clickhouse_backup_$(date +\%F)
rclone copy /backups/clickhouse_backup_$(date +\%F).tar.gz remote:clickhouse-backups
Backup Format and Restore Notes
Format | Description | Restore Method
---|---|---
Native BACKUP snapshot | SQL-based backup using the BACKUP command | Use RESTORE DATABASE from the same Disk location
File-based archive | Filesystem snapshot of /var/lib/clickhouse | Stop ClickHouse, extract the data back into the directory, then restart
To restore from a backup:
- Stop ClickHouse:
docker-compose down
- Restore via SQL:
docker-compose exec clickhouse clickhouse-client --query="RESTORE DATABASE default FROM Disk('backups', 'backup_2025-06-09')"
- Or restore from a file-based archive:
tar -xzf clickhouse_backup_2025-06-09.tar.gz -C /opt/app/data/clickhouse/
docker-compose up -d
Restoring a Backup
Restoring ClickHouse backups is essential for disaster recovery, staging environment duplication, or rolling back to a known state. Elestio supports backup restoration both through its web dashboard and manually through Docker Compose and command-line methods. This guide explains how to restore ClickHouse backups from SQL-based snapshots or file-based archives, covering both full and partial restore scenarios, and includes solutions for common restoration issues.
Restoring from a Backup via Terminal
This method applies when you have a backup created using ClickHouse’s native BACKUP command or a direct copy of the data directory. To restore the backup, you must stop the running ClickHouse container, replace the data files, and restart the container to load the restored dataset.
Stop the ClickHouse Container
Shut down the ClickHouse container cleanly to avoid issues with open file handles or inconsistent state:
docker-compose down
Replace the Backup Files
If your backup was created using the native ClickHouse BACKUP command and saved to /backups/backup_2025_06_09, copy it into the appropriate path within the container or bind mount.
Example:
cp -r ./clickhouse_backup_2025_06_09 /opt/app/backups/backup_2025_06_09
Make sure this path corresponds to the volumes specified in your docker-compose.yml. For example:
volumes:
- ./backups:/backups
- ./data:/var/lib/clickhouse
If you’re restoring from a tarball archive, extract it into the correct volume mount:
tar -xzf clickhouse_backup_2025_06_09.tar.gz -C /opt/app/data/
Restart ClickHouse
Start the ClickHouse container again:
docker-compose up -d
ClickHouse will load the data from the standard data directory; if you used the backup snapshot method, you can instead restore the database explicitly using SQL (next section).
Restoring via Docker Compose Terminal
If you’re using backups made with the SQL BACKUP command, ClickHouse also provides a built-in method to restore via the RESTORE command.
Copy the Backup Directory into the Container
docker cp ./clickhouse_backup_2025_06_09 $(docker-compose ps -q clickhouse):/backups/backup_2025_06_09
Restore with ClickHouse SQL
Enter the container terminal:
docker-compose exec clickhouse bash
Then run the restore command:
clickhouse-client --query="RESTORE DATABASE default FROM Disk('/backups/backup_2025_06_09')"
This will restore the default database and its contents from the previously created backup directory.
Partial Restores in ClickHouse
ClickHouse supports more granular restore operations using SQL syntax. You can restore individual tables or databases if the backup was created using the native BACKUP command.
Restore a Single Table
clickhouse-client --query="RESTORE TABLE default.events FROM Disk('/backups/backup_2025_06_09')"
This restores just the events table from the default database without affecting other tables.
Restore Specific Schemas or Data
You can also export and import CSV or TSV snapshots for partial data management:
clickhouse-client --query="SELECT * FROM default.events FORMAT CSV" > events.csv
clickhouse-client --query="INSERT INTO default.events FORMAT CSV" < events.csv
Common Errors & How to Fix Them
Restoring ClickHouse data can occasionally fail due to permission issues, path mismatches, unsupported formats, or version conflicts. Here are some frequent issues and their solutions.
1. ClickHouse Fails to Start After Restore
Error:
DB::Exception: Corrupted data part ...
Cause: The backup directory is incomplete or corrupted, or the file was not extracted properly.
Resolution:
- Re-verify that the backup files were copied completely.
- Use tar -tzf to inspect archive contents before extracting.
- Make sure you're restoring on the same ClickHouse version that created the backup.
2. RESTORE Command Fails with Permission Denied
Error:
DB::Exception: Cannot read from backup: Permission denied
Cause: The container cannot access the /backups/ directory due to permissions.
Resolution:
- Ensure the backup directory is readable by the ClickHouse process.
- Use chmod -R 755 /opt/app/backups/ to adjust permissions if needed.
3. Data Not Restored
Cause: The RESTORE command did not include the correct database/table name or no data existed in the backup path.
Resolution:
- Run clickhouse-client --query="SHOW DATABASES" to confirm whether the database was restored.
- Run ls /backups/backup_2025_06_09/ inside the container to verify the backup contents.
4. Permission Denied When Copying Files
Error:
cp: cannot create regular file ‘/opt/app/data/’: Permission denied
Resolution:
Ensure your terminal session or script has write access to the target directory. Use sudo if needed:
sudo cp -r ./clickhouse_backup_2025_06_09 /opt/app/data/
Identifying Slow Queries
Slow queries can impact ClickHouse performance, especially under high load or with inefficient queries or schema design. Whether you’re using ClickHouse on Elestio via the dashboard, accessing it inside a Docker Compose container, or running CLI queries, ClickHouse offers built-in tools to detect, diagnose, and optimize performance bottlenecks. This guide explains how to capture slow queries using system tables, measure query latency, and improve performance through tuning and query optimization.
Inspecting Slow Queries from the Terminal
ClickHouse logs query profiling information by default, which you can access via system tables. This allows you to identify long-running or resource-intensive queries directly from SQL.
Connect to ClickHouse via Terminal
Use the ClickHouse client to connect to your instance:
clickhouse-client -h <host> --port <port> --user <username> --password <password>
Replace <host>, <port>, <username>, and <password> with your credentials from the Elestio dashboard.
View Recent Slow Queries
ClickHouse logs query performance stats in the system.query_log table. To view the 10 most recent queries that took longer than 1 second:
SELECT
query_start_time,
query_duration_ms,
query
FROM system.query_log
WHERE type = 'QueryFinish'
AND query_duration_ms > 1000
ORDER BY query_start_time DESC
LIMIT 10;
You can adjust the query_duration_ms threshold to capture slower or more critical queries.
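To spot repeat offenders rather than one-off slow runs, you can also group by the normalized query hash that recent ClickHouse versions record in system.query_log:
SELECT
    normalized_query_hash,
    count() AS runs,
    avg(query_duration_ms) AS avg_ms
FROM system.query_log
WHERE type = 'QueryFinish'
GROUP BY normalized_query_hash
ORDER BY avg_ms DESC
LIMIT 10;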
Analyzing Inside Docker Compose
If your ClickHouse instance is running inside Docker Compose, you can inspect query logs and system performance from inside the container.
Access the ClickHouse Container
Open a shell session inside the running container:
docker-compose exec clickhouse bash
Then run the ClickHouse client:
clickhouse-client --user default
If a password is required, append --password <yourpassword> to the command.
Query the system.query_log Inside the Container
Run the same slow query inspection SQL as above to analyze performance issues:
SELECT query_start_time, query_duration_ms, query
FROM system.query_log
WHERE type = 'QueryFinish' AND query_duration_ms > 1000
ORDER BY query_start_time DESC
LIMIT 10;
Using the System Metrics & Events Tables
ClickHouse includes system tables that expose performance-related metrics in real time.
Check Overall Query Performance
You can use the system.metrics table to view metrics like query execution time, memory usage, and background operations:
SELECT *
FROM system.metrics
WHERE value != 0
ORDER BY value DESC;
For cumulative statistics like total queries processed, check the system.events table:
SELECT *
FROM system.events
WHERE value > 0
ORDER BY value DESC;
Understanding and Resolving Common Bottlenecks
Slow performance in ClickHouse is often caused by suboptimal queries, improper indexing (e.g., no primary key usage), disk I/O, or high memory usage.
Common Causes of Slow Queries:
- Large table scans: Caused by missing filtering conditions or lack of primary key usage.
- JOINs on unindexed keys: Inefficient joins can result in full-table scans.
- High cardinality aggregations: Especially costly without optimization (e.g., using uniqExact()).
- High insert latency: Triggered by too frequent small batch writes.
- Disk bottlenecks: Heavy merges or large result sets can overload I/O.
Best Practices to Avoid Slow Queries:
- Use appropriate filtering: Always filter with indexed columns (usually primary keys); see the EXPLAIN sketch after this list.
- Avoid SELECT *: Specify only the needed columns.
- Use sampling when possible: ClickHouse supports the SAMPLE clause on MergeTree tables.
- Use LIMIT: Avoid returning large result sets when debugging.
- Optimize JOINs: Prefer ANY INNER JOIN or JOIN ... USING for performance.
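Before rewriting a slow query, EXPLAIN can show whether the primary key actually prunes data; a sketch against the logs table defined earlier:
EXPLAIN indexes = 1
SELECT count() FROM logs WHERE timestamp >= '2024-01-01';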
Optimizing with Configuration Changes
ClickHouse performance can be tuned via its configuration files (config.xml and users.xml) or environment variables. For Docker Compose setups, these can be overridden via docker-compose.override.yml.
Adjust Query and Memory Settings Dynamically
Some performance-related settings can be changed per session or globally:
SET max_memory_usage = 2000000000;
SET max_threads = 4;
SET log_queries = 1;
To make permanent changes, modify your config.xml or users.xml inside the container volume mount.
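For example, a per-profile override in users.xml might look like this sketch (merge it into your existing profile rather than replacing the file):
<profiles>
    <default>
        <max_memory_usage>2000000000</max_memory_usage>
        <max_threads>4</max_threads>
    </default>
</profiles>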
Detecting and Terminating Long-Running Queries
ClickHouse is a high-performance, column-oriented OLAP database, but poorly optimized or long-running queries can still impact performance, especially in resource-constrained environments like Elestio. Because ClickHouse executes large queries across multiple threads and can consume high memory and disk I/O, monitoring and controlling slow or blocking operations is essential.
This guide explains how to detect, analyze, and terminate long-running queries using terminal tools, Docker Compose setups, and ClickHouse’s internal system tables. It also outlines prevention strategies to help maintain system health.
Monitoring Long-Running Queries
ClickHouse exposes query execution data through system tables like system.processes and system.query_log. These allow you to monitor currently executing and historical queries for duration, memory usage, and user activity.
Check Active Queries via Terminal
To list currently running queries and their duration:
SELECT
query_id,
user,
elapsed,
memory_usage,
query
FROM system.processes
ORDER BY elapsed DESC;
- elapsed is the query runtime in seconds.
- memory_usage is in bytes.
- This lets you pinpoint queries that are taking too long or consuming excessive memory.
Monitor Query Load in Real Time
ClickHouse doesn’t have a MONITOR-like command, but you can simulate real-time monitoring by repeatedly querying system.processes:
watch -n 2 'clickhouse-client --query="SELECT elapsed, query FROM system.processes ORDER BY elapsed DESC LIMIT 5"'
This updates every 2 seconds and shows the top 5 longest-running queries.
Terminating Problematic Queries Safely
If you identify a query that is consuming too many resources or blocking critical workloads, you can terminate it by its query_id.
Kill a Query by ID
KILL QUERY WHERE query_id = '<id>';
- The <id> can be found in the system.processes table.
- This forces termination of the query while leaving the user session intact.
To forcibly kill all long-running queries (e.g., >60 seconds):
KILL QUERY WHERE elapsed > 60 SYNC;
Use SYNC to wait for the termination to complete before proceeding.
Managing Inside Docker Compose
If ClickHouse is running inside Docker Compose on Elestio, follow these steps:
Access the ClickHouse Container
docker-compose exec clickhouse bash
Then run:
clickhouse-client --user default
If authentication is enabled, add --password <your_password>.
You can now run queries like:
SELECT query_id, elapsed, query FROM system.processes;
Or terminate:
KILL QUERY WHERE query_id = '<id>';
Analyzing Query History
ClickHouse logs completed queries (including failures) in the system.query_log table.
View Historical Long-Running Queries
SELECT
query_start_time,
query_duration_ms,
user,
query
FROM system.query_log
WHERE type = 'QueryFinish'
AND query_duration_ms > 1000
ORDER BY query_start_time DESC
LIMIT 10;
This helps identify patterns or repeat offenders.
Understanding Query Latency with Profiling Tools
ClickHouse provides advanced metrics via system.metrics, system.events, and system.asynchronous_metrics.
Generate a Performance Snapshot
SELECT * FROM system.metrics WHERE value != 0 ORDER BY value DESC;
Use this to analyze memory pressure, merge operations, disk reads/writes, and thread usage.
To examine detailed breakdowns of CPU usage or IO latency:
SELECT * FROM system.events WHERE value > 0 ORDER BY value DESC;
Best Practices to Prevent Long-Running Queries
Preventing long-running queries is vital for maintaining ClickHouse performance, especially under high concurrency or on shared infrastructure.
- Avoid Full Table Scans: Use filters on primary key or indexed columns. Avoid queries without WHERE clauses on large tables.
SELECT count() FROM logs WHERE date >= '2024-01-01';
- Limit Result Set Sizes: Avoid returning millions of rows to clients. Use LIMIT and paginated access.
SELECT * FROM logs ORDER BY timestamp DESC LIMIT 100;
- Optimize Joins and Aggregations: Use ANY INNER JOIN for faster lookups. Avoid joining two huge datasets unless one is pre-aggregated or dimensionally small.
- Avoid High Cardinality Aggregates: Functions like uniqExact() are CPU-intensive. Prefer approximate variants (uniq()) when precision isn’t critical.
- Set Query Timeouts and Memory Limits: Limit resource usage per query:
SET max_execution_time = 30;
SET max_memory_usage = 1000000000;
- Use Partitions and Projections: Partition large datasets by time (e.g., toYYYYMM(date)) to reduce scanned rows. Use projections for fast pre-aggregated access.
Preventing Full Disk Issues
Running out of disk space in a ClickHouse environment can cause query failures, part merge errors, and even full service downtime. ClickHouse is highly dependent on disk for storing columnar data, part files, metadata, temporary sort buffers, and backups. On platforms like Elestio, infrastructure is managed, but users are still responsible for monitoring storage, managing data retention, and optimizing resource usage. This guide explains how to monitor and clean up disk usage, configure safe retention policies, and implement long-term strategies to prevent full-disk scenarios in ClickHouse when running under Docker Compose.
Monitoring Disk Usage
Inspect the host system storage
Run this on the host machine to check which mount point is filling up:
df -h
This shows usage across all mounted volumes. Look for the mount used by your ClickHouse volume, usually mapped to something like /var/lib/docker/volumes/clickhouse_data/_data.
Check disk usage from inside the container
Enter the ClickHouse container shell:
docker-compose exec clickhouse bash
Inside, check total ClickHouse disk usage:
du -sh /var/lib/clickhouse
To inspect usage of specific folders like data/, tmp/, or store/:
ls -lh /var/lib/clickhouse
Configuring Alerts and Cleaning Up Storage
Inspect Docker’s storage usage
On the host, check space used by containers, images, volumes:
docker system df
Identify and remove unused Docker volumes
List all Docker volumes:
docker volume ls
Remove unused volumes (only if you’re sure they’re not needed):
docker volume rm <volume-name>
Warning: Never delete your active ClickHouse data volume unless you’ve backed it up.
Drop data manually using SQL
To free space by removing outdated partitions or tables:
ALTER TABLE logs DROP PARTITION '2024-01';
TRUNCATE TABLE temp_events;
Clean up local backups
If you’re storing backups under /var/lib/clickhouse/backup, list and delete old ones:
ls -lh /var/lib/clickhouse/backup
rm -rf /var/lib/clickhouse/backup/backup-<timestamp>
Ensure important backups are offloaded before removing.
Managing Temporary Files
Monitor temporary file usage
Check the temp directory inside the container:
du -sh /var/lib/clickhouse/tmp
Old files may remain if queries or merges crashed. Clean up when the system is idle.
Redirect temporary paths to persistent storage
Modify the tmp_path in config.xml to use a volume-backed directory:
<tmp_path>/var/lib/clickhouse/tmp/</tmp_path>
Restart the container after applying changes.
Best Practices for Disk Space Management
- Avoid storing binary blobs: Do not store large files like PDFs or images in ClickHouse. Use external object storage and only store references.
- Use TTL to expire old data: Automatically delete old data based on timestamps:
ALTER TABLE logs MODIFY TTL created_at + INTERVAL 90 DAY;
- Drop old partitions regularly: If partitioned by month/day, remove outdated partitions:
ALTER TABLE logs DROP PARTITION '2023-12';
- Enable efficient compression: Use ZSTD column codecs for better compression and lower disk usage:
ALTER TABLE logs MODIFY COLUMN message String CODEC(ZSTD);
- Split large inserts into smaller batches: Avoid memory and disk spikes during large ingest operations.
- Optimize background merge load: Tune merge concurrency and thresholds using:
<background_pool_size>8</background_pool_size>
- Limit disk spill during queries: Prevent massive temp usage during large operations:
<max_bytes_before_external_sort>500000000</max_bytes_before_external_sort>
- Rotate Docker logs: Prevent logs from filling up your disk using log rotation:
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"
- Monitor disk usage from ClickHouse itself: Track table-level disk usage using system tables:
SELECT table, sum(bytes_on_disk) AS size FROM system.parts GROUP BY table ORDER BY size DESC;
- Offload backups to remote storage: Backups inside containers should be copied off-host. Use Elestio's backup tool or mount a backup volume:
volumes:
  - /mnt/backups:/backups
Checking Database Size and Related Issues
As your ClickHouse data grows, especially with large analytical workloads or high-ingestion pipelines, it's important to track how storage is being used. Unchecked growth can lead to full disks, failed inserts, increased merge times, and slower queries. While Elestio handles the infrastructure, ClickHouse storage optimization and cleanup remain your responsibility. This guide explains how to inspect disk usage, analyze table size, detect inefficiencies, and manage ClickHouse storage effectively under a Docker Compose setup.
Checking Table Size and Disk Usage
ClickHouse stores data in columnar parts on disk, organized by partitions and merges. You can inspect disk consumption using SQL queries and Docker commands.
Check total disk space used by ClickHouse
From the host machine:
docker system df
Identify the Docker volume associated with ClickHouse, then check disk usage:
docker volume ls
sudo du -sh /var/lib/docker/volumes/<clickhouse_volume_name>/_data
Inspect space used per table
Connect to ClickHouse from the container:
docker-compose exec clickhouse clickhouse-client
Run:
SELECT
database,
table,
formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC;
This shows total size used by each active table on disk.
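system.parts also exposes compressed and uncompressed byte counts, which helps judge whether a codec change would pay off:
SELECT
    table,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.parts
WHERE active
GROUP BY table
ORDER BY sum(data_compressed_bytes) DESC;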
View storage location inside container
ClickHouse typically writes data under /var/lib/clickhouse:
docker-compose exec clickhouse ls -lh /var/lib/clickhouse/store
This contains all table parts and metadata. Review sizes and delete orphaned data if needed.
Detecting Bloat and Inefficiencies
ClickHouse can accumulate unnecessary disk usage due to unoptimized merges, redundant partitions, or abandoned tables.
Check for unmerged parts
A high number of unmerged parts can slow down queries and increase disk usage:
SELECT
database,
table,
count() AS part_count
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY part_count DESC;
Tables with many small parts may need a manual merge trigger.
Detect inactive or outdated parts
Look for inactive parts still occupying disk:
SELECT
name,
active,
remove_time
FROM system.parts
WHERE active = 0
LIMIT 50;
These parts are safe to delete if they’re old and not part of ongoing operations.
Analyze storage by partition
To pinpoint heavy partitions:
SELECT
table,
partition_id,
formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active
GROUP BY table, partition_id
ORDER BY sum(bytes_on_disk) DESC;
Large partitions can indicate hot data or poor partitioning strategy.
Optimizing and Reclaiming ClickHouse Storage
ClickHouse provides several tools to optimize disk usage and clear unnecessary files.
Drop old partitions manually
For time-series or event tables, drop outdated partitions:
ALTER TABLE logs DROP PARTITION '2023-12';
Use partition pruning to maintain data freshness.
Optimize tables to force merges
To reduce part count and improve compression:
OPTIMIZE TABLE logs FINAL;
Use FINAL sparingly; it can be resource-intensive.
Clean up old tables or unused databases
Drop stale or abandoned tables:
DROP TABLE old_analytics;
Drop entire databases if needed:
DROP DATABASE dev_test;
Always ensure no production data is affected.
Managing and Optimizing Files on Disk
ClickHouse stores metadata, parts, WAL logs, and temp files under /var/lib/clickhouse. You should monitor this path inside the container and from the host.
Monitor disk from inside container
docker-compose exec clickhouse du -sh /var/lib/clickhouse
To drill down:
docker-compose exec clickhouse du -sh /var/lib/clickhouse/*
Identify unexpectedly large directories like /store, /tmp, or /data.
Purge temporary files and logs
ClickHouse writes to /var/lib/clickhouse/tmp and /var/log/clickhouse-server/:
docker-compose exec clickhouse du -sh /var/lib/clickhouse/tmp
docker-compose exec clickhouse du -sh /var/log/clickhouse-server/
Clear if disk is nearing full. Rotate or truncate logs if necessary.
Clean WALs and outdated mutations
If mutations or insert queues are stuck:
SELECT * FROM system.mutations WHERE is_done = 0;
Investigate and resolve the root cause. Consider restarting ClickHouse after clearing safe logs.
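If a mutation is verifiably stuck and safe to abandon, it can be cancelled by ID; inspect system.mutations first (the names here are placeholders):
KILL MUTATION WHERE database = 'default' AND table = 'logs' AND mutation_id = '<id>';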
Best Practices for ClickHouse Storage Management
- Use partitioning: Partition large tables by time (e.g., daily, monthly) to enable faster drops and better merge control.
- Archive old data: Move cold data to object storage (S3, etc.) or external databases for long-term storage.
- Avoid oversized inserts: Insert in smaller chunks to avoid bloating parts and reduce memory pressure during merges.
- Rotate logs: If ClickHouse logs to file, configure log rotation:
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"
- Use ZSTD compression: Prefer ZSTD over LZ4 for a better compression ratio at the cost of slightly higher CPU.
- Monitor merges and disk pressure: Use system.metrics and system.events to track merge performance, part counts, and disk usage trends.
- Backup externally: Don't store backups on the same disk. Use Elestio backup options to archive to remote or cloud storage.