# Checking Database Size and Related Issues

As your ClickHouse data grows, especially with large analytical workloads or high-ingestion pipelines, it’s important to track how storage is being used. Unchecked growth can lead to full disks, failed inserts, increased merge times, and slower queries. While Elestio handles the infrastructure, ClickHouse storage optimization and cleanup remain your responsibility. This guide explains how to inspect disk usage, analyze table size, detect inefficiencies, and manage ClickHouse storage effectively under a Docker Compose setup.

### **Checking Table Size and Disk Usage**

ClickHouse stores data in columnar parts on disk, organized by partitions and merges. You can inspect disk consumption using SQL queries and Docker commands.

#### **Check total disk space used by ClickHouse**

From the host machine:

```
docker system df
```

Identify the Docker volume associated with ClickHouse, then check disk usage:

```
docker volume ls
sudo du -sh /var/lib/docker/volumes/<clickhouse_volume_name>/_data
```
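Checking usage by hand works, but a small script can warn you before the disk fills and inserts start failing. Below is a minimal sketch; the default volume path and the 80% threshold are assumptions to adjust for your deployment:

```shell
#!/bin/sh
# Warn when the filesystem holding the ClickHouse volume passes a threshold.
# VOLUME_PATH and THRESHOLD are placeholders -- adjust for your setup.
VOLUME_PATH="${VOLUME_PATH:-/var/lib/docker/volumes}"
THRESHOLD="${THRESHOLD:-80}"

check_usage() {
  # $1 = used percent (integer), $2 = threshold percent
  if [ "$1" -ge "$2" ]; then
    echo "WARN: disk ${1}% full (threshold ${2}%)"
  else
    echo "OK: disk ${1}% full"
  fi
}

# df -P prints the used percentage in column 5, e.g. "42%"
used_pct=$(df -P "$VOLUME_PATH" 2>/dev/null | awk 'NR==2 {gsub("%",""); print $5}')
[ -n "$used_pct" ] && check_usage "$used_pct" "$THRESHOLD"
```

Run it from cron or a monitoring agent so warnings arrive before the disk is actually full.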

#### **Inspect space used per table**

Connect to ClickHouse from the container:

```
docker-compose exec clickhouse clickhouse-client
```

Run:

```sql
SELECT
  database,
  table,
  formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC;
```

This shows total size used by each active table on disk.

#### **View storage location inside container**

ClickHouse typically writes data under `/var/lib/clickhouse`:

```
docker-compose exec clickhouse ls -lh /var/lib/clickhouse/store
```

This directory contains all table parts and metadata. Review sizes here, but avoid deleting files by hand; ClickHouse manages this directory itself, and manual deletion can corrupt tables unless you are certain the data is orphaned.

### **Detecting Bloat and Inefficiencies**

ClickHouse can accumulate unnecessary disk usage due to unoptimized merges, redundant partitions, or abandoned tables.

#### **Check for unmerged parts**

A high number of unmerged parts can slow down queries and increase disk usage:

```sql
SELECT
  database,
  table,
  count() AS part_count
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY part_count DESC;
```

Tables with many small parts may need a manual merge trigger.
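One way to find those tables is to export the part-count query above as TSV and generate `OPTIMIZE` statements for anything over a limit. A sketch with invented sample rows (the 100-part limit is an arbitrary example):

```shell
#!/bin/sh
# Generate OPTIMIZE statements for tables whose active part count exceeds a
# limit. Input rows are "database<TAB>table<TAB>part_count", as the query
# above would produce with --format TSV.
PART_LIMIT=100

suggest_merges() {
  awk -F'\t' -v limit="$PART_LIMIT" \
    '$3 > limit { printf "OPTIMIZE TABLE %s.%s;\n", $1, $2 }'
}

# Sample input (made up); pipe real clickhouse-client output here instead.
printf 'default\tlogs\t342\ndefault\tusers\t8\n' | suggest_merges
```

Review the generated statements before running them; merges consume CPU and disk I/O.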

#### **Detect inactive or outdated parts**

Look for inactive parts still occupying disk:

```sql
SELECT
  name,
  active,
  remove_time
FROM system.parts
WHERE active = 0
LIMIT 50;
```

ClickHouse removes inactive parts automatically after a short delay. A persistent backlog of inactive parts usually points to stuck merges or replication issues worth investigating, rather than files to delete by hand.

#### **Analyze storage by partition**

To pinpoint heavy partitions:

```sql
SELECT
  table,
  partition_id,
  formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active
GROUP BY table, partition_id
ORDER BY sum(bytes_on_disk) DESC;
```

Large partitions can indicate hot data or poor partitioning strategy.

### **Optimizing and Reclaiming ClickHouse Storage**

ClickHouse provides several tools to optimize disk usage and clear unnecessary files.

#### **Drop old partitions manually**

For time-series or event tables, drop outdated partitions:

```sql
-- The partition expression must match the table's partition key.
-- For a table partitioned by toYYYYMM(date), a monthly partition looks like:
ALTER TABLE logs DROP PARTITION 202312;
```

Dropping a whole partition removes its data instantly and reclaims disk space immediately, unlike row-level deletes.
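A retention job can generate these statements from a list of partition IDs. The sketch below assumes monthly `toYYYYMM`-style partition IDs; the table name, cutoff, and sample IDs are placeholders:

```shell
#!/bin/sh
# For a table partitioned by toYYYYMM(date), emit DROP PARTITION statements
# for partition IDs older than a cutoff. TABLE and CUTOFF are placeholders.
TABLE="logs"
CUTOFF="202401"  # drop partitions from before January 2024

drop_old_partitions() {
  awk -v table="$TABLE" -v cutoff="$CUTOFF" \
    '$1 < cutoff { printf "ALTER TABLE %s DROP PARTITION %s;\n", table, $1 }'
}

# Sample partition IDs; in practice, list them from system.parts.
printf '202311\n202312\n202402\n' | drop_old_partitions
```

Feed the output to `clickhouse-client` only after reviewing it, since dropped partitions are not recoverable without a backup.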

#### **Optimize tables to force merges**

To reduce part count and improve compression:

```sql
OPTIMIZE TABLE logs FINAL;
```

Use FINAL sparingly; it rewrites all parts in each partition and can be resource-intensive.

#### **Clean up old tables or unused databases**

Drop stale or abandoned tables:

```sql
DROP TABLE old_analytics;
```

Drop entire databases if needed:

```sql
DROP DATABASE dev_test;
```

Always ensure no production data is affected.

### **Managing and Optimizing Files on Disk**

ClickHouse stores metadata, parts, WAL logs, and temp files under `/var/lib/clickhouse`. You should monitor this path inside the container and from the host.

#### **Monitor disk from inside container**

```
docker-compose exec clickhouse du -sh /var/lib/clickhouse
```

To drill down:

```
docker-compose exec clickhouse du -sh /var/lib/clickhouse/*
```

Identify unexpectedly large directories like `store/`, `tmp/`, or `data/`.

#### **Purge temporary files and logs**

ClickHouse writes to `/var/lib/clickhouse/tmp` and `/var/log/clickhouse-server/`:

```
docker-compose exec clickhouse du -sh /var/lib/clickhouse/tmp
docker-compose exec clickhouse du -sh /var/log/clickhouse-server/
```

Clear these if the disk is nearing capacity, and rotate or truncate logs if necessary.
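When truncating a log that a running server still has open, empty the file in place rather than deleting it, so the process keeps a valid file handle. A small demonstration (the path is an example; inside the container the server log lives under `/var/log/clickhouse-server/`):

```shell
#!/bin/sh
# Truncate a log file in place rather than deleting it, so the running
# process keeps writing to a valid file handle. LOG is an example path.
LOG="${LOG:-./clickhouse-server.log.example}"

printf 'old log line\n' > "$LOG"   # sample file for demonstration

: > "$LOG"                        # POSIX way to truncate a file to zero bytes
wc -c < "$LOG"                    # should print 0
```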

#### **Clean WALs and outdated mutations**

If mutations or insert queues are stuck:

```sql
SELECT * FROM system.mutations WHERE is_done = 0;
```

Investigate and resolve the root cause. Consider restarting ClickHouse after clearing safe logs.
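If a mutation cannot complete and you decide to abandon it, ClickHouse supports `KILL MUTATION`. A sketch that builds the statement from values reported by the query above; the database, table, and mutation ID are placeholders:

```shell
#!/bin/sh
# Build a KILL MUTATION statement for a stuck mutation identified in
# system.mutations. The arguments are placeholders -- substitute the
# database, table, and mutation_id reported by the query above.
kill_stmt() {
  printf "KILL MUTATION WHERE database = '%s' AND table = '%s' AND mutation_id = '%s';\n" \
    "$1" "$2" "$3"
}

kill_stmt default logs mutation_42.txt
```

Killing a mutation leaves the data partially mutated, so treat this as a last resort after the root cause is understood.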

### **Best Practices for ClickHouse Storage Management**

- Use partitioning: Partition large tables by time (e.g., daily, monthly) to enable faster drops and better merge control.
- Archive old data: Move cold data to object storage (S3, etc.) or external databases for long-term storage.
- Avoid oversized inserts: Insert in smaller chunks to avoid bloating parts and reduce memory pressure during merges.
- Rotate logs: If ClickHouse logs to file, configure log rotation:

```yaml
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"
```

- Use ZSTD compression: Prefer ZSTD over LZ4 for better compression ratio at the cost of slightly higher CPU.
- Monitor merges and disk pressure: Use system.metrics and system.events to track merge performance, part counts, and disk usage trends.
- Backup externally: Don’t store backups on the same disk. Use Elestio backup options to archive to remote or cloud storage.