Checking Database Size and Related Issues
As your ClickHouse data grows, especially with large analytical workloads or high-ingestion pipelines, it's important to track how storage is being used. Unchecked growth can lead to full disks, failed inserts, longer merge times, and slower queries. While Elestio manages the underlying infrastructure, ClickHouse storage optimization and cleanup remain your responsibility. This guide explains how to inspect disk usage, analyze table sizes, detect inefficiencies, and manage ClickHouse storage effectively in a Docker Compose setup.
Checking Table Size and Disk Usage
ClickHouse stores data in columnar parts on disk, organized by partition and merged in the background. You can inspect disk consumption using SQL queries and Docker commands.
Check total disk space used by ClickHouse
From the host machine:
docker system df
Identify the Docker volume associated with ClickHouse, then check disk usage:
docker volume ls
sudo du -sh /var/lib/docker/volumes/<clickhouse_volume_name>/_data
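You can also check free and total space as ClickHouse itself sees it, without leaving the database. This query against the built-in system.disks table is a quick sanity check:
SELECT
    name,
    path,
    formatReadableSize(free_space) AS free,
    formatReadableSize(total_space) AS total
FROM system.disks;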
Inspect space used per table
Connect to ClickHouse from the container:
docker-compose exec clickhouse clickhouse-client
Run:
SELECT
database,
table,
formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC;
This shows total size used by each active table on disk.
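To see how well each table compresses, you can extend the same query with the compressed and uncompressed byte counters that system.parts already exposes; a sketch:
SELECT
    database,
    table,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(data_compressed_bytes) DESC;
A low ratio on a large table is a hint that a stronger codec (such as ZSTD, discussed below) may be worthwhile.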
View storage location inside container
ClickHouse typically writes data under /var/lib/clickhouse:
docker-compose exec clickhouse ls -lh /var/lib/clickhouse/store
This directory contains table parts and metadata, organized into UUID-named subdirectories. Review sizes here, but avoid deleting files by hand; remove data with SQL statements such as DROP TABLE or ALTER TABLE ... DROP PARTITION so that ClickHouse's metadata stays consistent.
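The UUID-named directories are hard to read on their own. To map them back to tables, query system.tables, which lists each table's on-disk data paths:
SELECT
    database,
    name,
    data_paths
FROM system.tables
WHERE database NOT IN ('system', 'INFORMATION_SCHEMA', 'information_schema');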
Detecting Bloat and Inefficiencies
ClickHouse can accumulate unnecessary disk usage due to unoptimized merges, redundant partitions, or abandoned tables.
Check for unmerged parts
A high number of unmerged parts can slow down queries and increase disk usage:
SELECT
database,
table,
count() AS part_count
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY part_count DESC;
Tables with many small parts may need a manual merge trigger.
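One way to trigger that merge is OPTIMIZE on a single partition, which limits the work to one slice of the table. A sketch, assuming a hypothetical events table with monthly partitions:
OPTIMIZE TABLE events PARTITION ID '202401';
Merging one partition at a time keeps the I/O and memory cost bounded compared with optimizing the whole table at once.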
Detect inactive or outdated parts
Look for inactive parts still occupying disk:
SELECT
name,
active,
remove_time
FROM system.parts
WHERE active = 0
LIMIT 50;
ClickHouse normally removes inactive parts on its own after a short retention period. Old inactive parts that persist and hold significant space are worth investigating before you remove anything manually.
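To quantify how much space inactive parts are holding per table before acting on them, aggregate the same system.parts data:
SELECT
    database,
    table,
    formatReadableSize(sum(bytes_on_disk)) AS inactive_size
FROM system.parts
WHERE active = 0
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC;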
Analyze storage by partition
To pinpoint heavy partitions:
SELECT
table,
partition_id,
formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active
GROUP BY table, partition_id
ORDER BY sum(bytes_on_disk) DESC;
Large partitions can indicate hot data or poor partitioning strategy.
Optimizing and Reclaiming ClickHouse Storage
ClickHouse provides several tools to optimize disk usage and clear unnecessary files.
Drop old partitions manually
For time-series or event tables, drop outdated partitions:
ALTER TABLE logs DROP PARTITION '2023-12';
Dropping whole partitions is far cheaper than row-level deletes, so schedule regular drops to keep stale data from accumulating.
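Alternatively, you can let ClickHouse expire rows automatically with a table TTL. A minimal sketch, assuming the logs table has a Date column named event_date (adjust to your schema):
ALTER TABLE logs MODIFY TTL event_date + INTERVAL 90 DAY;
Expired rows are removed in the background during merges, so reclaimed space appears gradually rather than immediately.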
Optimize tables to force merges
To reduce part count and improve compression:
OPTIMIZE TABLE logs FINAL;
Use FINAL sparingly; it can be resource-intensive because it forces a rewrite even of parts that are already merged.
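To see what merges are running while (or after) you run OPTIMIZE, query the built-in system.merges table:
SELECT
    database,
    table,
    elapsed,
    round(progress, 2) AS progress,
    formatReadableSize(total_size_bytes_compressed) AS size
FROM system.merges;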
Clean up old tables or unused databases
Drop stale or abandoned tables:
DROP TABLE old_analytics;
Drop entire databases if needed:
DROP DATABASE dev_test;
Always ensure no production data is affected.
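If you are unsure which tables are stale, system.tables can help narrow down candidates. Note that metadata_modification_time reflects the last schema change, not the last insert, so treat this only as a starting point:
SELECT
    database,
    name,
    metadata_modification_time
FROM system.tables
WHERE database NOT IN ('system', 'INFORMATION_SCHEMA', 'information_schema')
ORDER BY metadata_modification_time ASC;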
Managing and Optimizing Files on Disk
ClickHouse stores metadata, parts, WAL logs, and temp files under /var/lib/clickhouse. You should monitor this path inside the container and from the host.
Monitor disk from inside container
docker-compose exec clickhouse du -sh /var/lib/clickhouse
To drill down:
docker-compose exec clickhouse du -sh /var/lib/clickhouse/*
Identify unexpectedly large directories like /store, /tmp, or /data.
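A quick way to surface the largest directories is to sort the du output; this assumes GNU sort, which ships in the standard ClickHouse images:
docker-compose exec clickhouse sh -c 'du -sh /var/lib/clickhouse/* | sort -rh | head -n 10'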
Purge temporary files and logs
ClickHouse writes to /var/lib/clickhouse/tmp and /var/log/clickhouse-server/:
docker-compose exec clickhouse du -sh /var/lib/clickhouse/tmp
docker-compose exec clickhouse du -sh /var/log/clickhouse-server/
Clear these if the disk is nearing capacity, and rotate or truncate logs if necessary.
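If the server log itself is the culprit, you can truncate it in place without restarting the container. This assumes the default log file name clickhouse-server.log; check your configuration first:
docker-compose exec clickhouse sh -c 'truncate -s 0 /var/log/clickhouse-server/clickhouse-server.log'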
Clean WALs and outdated mutations
If mutations or insert queues are stuck:
SELECT * FROM system.mutations WHERE is_done = 0;
Investigate and resolve the root cause before deleting anything; restart ClickHouse only after confirming that any logs you clear are safe to remove.
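If a mutation is confirmed stuck and cannot complete, it can be cancelled with KILL MUTATION. A sketch using hypothetical database, table, and mutation_id values taken from the system.mutations output above:
KILL MUTATION WHERE database = 'default' AND table = 'logs' AND mutation_id = 'mutation_42.txt';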
Best Practices for ClickHouse Storage Management
- Use partitioning: Partition large tables by time (e.g., daily, monthly) to enable fast partition drops and better merge control.
- Archive old data: Move cold data to object storage (S3, etc.) or external databases for long-term retention.
- Avoid oversized inserts: Very large single inserts increase memory pressure during merges, while many tiny inserts create excessive small parts; aim for moderate, regular batches.
- Rotate logs: The container's log output can grow without bound under Docker's default settings; cap it with the json-file driver in docker-compose.yml:
  logging:
    driver: "json-file"
    options:
      max-size: "10m"
      max-file: "3"
- Use ZSTD compression: Prefer ZSTD over LZ4 for a better compression ratio at the cost of slightly higher CPU usage (see the codec sketch after this list).
- Monitor merges and disk pressure: Use system.metrics and system.events to track merge activity, part counts, and disk usage trends (see the query sketch after this list).
- Backup externally: Don't store backups on the same disk. Use Elestio backup options to archive to remote or cloud storage.
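As a concrete illustration of the ZSTD recommendation, compression codecs are set per column when a table is created. The sketch below uses a hypothetical events_zstd table; ZSTD(3) trades a little extra CPU for a noticeably better ratio than the default LZ4:
CREATE TABLE events_zstd
(
    event_time DateTime CODEC(Delta, ZSTD(3)),
    payload String CODEC(ZSTD(3))
)
ENGINE = MergeTree
ORDER BY event_time;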
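For merge monitoring, the relevant counters can be pulled with simple pattern matches. Exact metric names vary between ClickHouse versions, so filtering with LIKE is safer than hard-coding them:
SELECT metric, value FROM system.metrics WHERE metric LIKE '%Merge%';
SELECT event, value FROM system.events WHERE event LIKE '%Merge%';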