Encountering issues with Postgres Index and the size of the database on the server

A problem has arisen where the server is offline, and we've noticed that the storage space is already exhausted, despite having approximately 20-30 million records on the server. The database size should not exceed 40-50GB. So, where has all the remaining space gone, considering that a default disk is typically around 1TB? 😅

df -h

Filesystem Size Used Avail Use% Mounted on

udev 16G 0 16G 0% /dev

tmpfs 3.1G 355M 2.8G 12% /run

/dev/sda1 226G 18G 199G 9% /

tmpfs 16G 4.0K 16G 1% /dev/shm

tmpfs 5.0M 0 5.0M 0% /run/lock

tmpfs 16G 0 16G 0% /sys/fs/cgroup

/dev/sdb 762G 670G 54G 93% /mnt/volume-nbg1-1

tmpfs 3.1G 0 3.1G 0% /run/user/0

ncdu

/mnt/volume-nbg1-1 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------

472.0 GiB [##########] /postgresql

146.7 GiB [### ] /database

50.7 GiB [# ] /static

e 16.0 KiB [ ] /lost+found

This command is highly effective for identifying hard drive issues. Initially, verify that you are in the '/var/logs' directory, as sometimes the problem can arise from inadvertently running a debugger in verbose mode, resulting in excessive log generation. However, in this instance, we will explore potential additional challenges. I plan to outline some design decisions for removing unused indexes, although these indexes are typically intended for not very large database tables containing approximately 25 million records or something like that.

python manage.py dbshell

select schemaname as table_schema,

relname as table_name,

pg_size_pretty(pg_total_relation_size(relid)) as total_size,

pg_size_pretty(pg_relation_size(relid)) as data_size,

pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid))

as external_size

from pg_catalog.pg_statio_user_tables

order by pg_total_relation_size(relid) desc,

pg_relation_size(relid) desc;

table_schema | table_name | total_size | data_size | external_size

--------------+---------------------------------+------------+------------+---------------

public | job_jobarchive | 436 GB | 71 GB | 365 GB

public | crawler_page | 12 GB | 2589 MB | 9359 MB

public | job_jobarchive_locations | 9959 MB | 2225 MB | 7734 MB

So, as I suspected, the size in the file system for the data isn't that large, and it's actually the index that has significantly expanded.

We never search in the archive, so there's no need to have an index there. The archive serves as a record of which jobs belong to which employers, and this data is kept for exceptional cases. Consequently, I should remove all the indexes.

Show index

SELECT tablename, indexname, indexdef FROM pg_indexes;

Or filtered result

select * from pg_indexes where tablename like '%job_jobarchive%';

Small boolean or integer indexes won't cause any issues, so I need to focus on searching for a text index, such as 'job_jobarchive_slug_description_c4e9b533_like' using the 'btree' method.

Then find the name and size of index that you are not using and take place. You can remove them.

ALTER TABLE name DROP CONSTRAINT constran_name;

DROP INDEX index_name;

In the case of subjective instances we've encountered in our archives, those are the ones we need to eliminate.

job_jobarchive_slug_description

job_jobarchive_slug_description_like

Job_jobarchive_has_checked_duplicate

Search This Blog

Encountering issues with Postgres Index and the size of the database on the server

Comments

Post a Comment