Troubleshooting Postgres index bloat and database size on the server

A problem has arisen: the server is offline, and the storage space is exhausted, even though there are only about 20-30 million records on it. The database itself should not exceed 40-50 GB. So where did all the remaining space go, considering the disk is around 1 TB? 😅

df -h

Filesystem      Size  Used Avail Use% Mounted on

udev             16G     0   16G   0% /dev

tmpfs           3.1G  355M  2.8G  12% /run

/dev/sda1       226G   18G  199G   9% /

tmpfs            16G  4.0K   16G   1% /dev/shm

tmpfs           5.0M     0  5.0M   0% /run/lock

tmpfs            16G     0   16G   0% /sys/fs/cgroup

/dev/sdb        762G  670G   54G  93% /mnt/volume-nbg1-1

tmpfs           3.1G     0  3.1G   0% /run/user/0


--- /mnt/volume-nbg1-1 ---

  472.0 GiB [##########] /postgresql                                                                                                                                                           

  146.7 GiB [###       ] /database

   50.7 GiB [#         ] /static

e  16.0 KiB [          ] /lost+found

The listing above comes from ncdu, which is highly effective for tracking down disk-usage problems. First, check the '/var/log' directory: sometimes the culprit is simply a debugger inadvertently left running in verbose mode, flooding the disk with logs. That wasn't the case here, so let's dig further. Below I'll walk through the design decisions behind removing unused indexes, even though these tables aren't enormous — roughly 25 million records or so.

python manage.py dbshell
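Once inside psql, a quick first sanity check is the total size Postgres reports for the database (this assumes `dbshell` connects you to the project's default database):

```
select pg_size_pretty(pg_database_size(current_database()));
```

If this number is far larger than the raw data you expect, the extra space is usually indexes, TOAST data, or bloat — which the per-table query below breaks down.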

select schemaname as table_schema,
       relname as table_name,
       pg_size_pretty(pg_total_relation_size(relid)) as total_size,
       pg_size_pretty(pg_relation_size(relid)) as data_size,
       pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid))
        as external_size
from pg_catalog.pg_statio_user_tables
order by pg_total_relation_size(relid) desc,
         pg_relation_size(relid) desc;

 table_schema |           table_name            | total_size | data_size  | external_size
--------------+---------------------------------+------------+------------+---------------
 public       | job_jobarchive                  | 436 GB     | 71 GB      | 365 GB
 public       | crawler_page                    | 12 GB      | 2589 MB    | 9359 MB
 public       | job_jobarchive_locations        | 9959 MB    | 2225 MB    | 7734 MB

So, as I suspected, the table data itself isn't that large on disk; it's the external size — mostly indexes — that has ballooned.
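To confirm the split for the biggest offender, Postgres can separate heap from indexes directly (table name taken from the results above; `pg_table_size` includes TOAST, while `pg_indexes_size` sums all of the table's indexes):

```
select pg_size_pretty(pg_table_size('job_jobarchive'))   as table_size,
       pg_size_pretty(pg_indexes_size('job_jobarchive')) as index_size;
```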

We never search in the archive, so there's no need for indexes there. The archive only records which jobs belong to which employers, and this data is kept for exceptional cases. Consequently, I can remove all of its indexes.

Show the indexes:

SELECT tablename, indexname, indexdef FROM pg_indexes;

Or a filtered result:

select * from pg_indexes where tablename like '%job_jobarchive%';

Small boolean or integer indexes won't cause any issues; the ones worth hunting for are text indexes, such as 'job_jobarchive_slug_description_c4e9b533_like' using the 'btree' method.

Then find the name and size of each index that you are not using but that still takes up space. Those can be removed.
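One way to find them (a sketch using the `pg_stat_user_indexes` statistics view: `idx_scan` counts how often each index has been used since the statistics were last reset, so 0 suggests dead weight):

```
select indexrelname as index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) as index_size,
       idx_scan
from pg_stat_user_indexes
where relname = 'job_jobarchive'
order by pg_relation_size(indexrelid) desc;
```

Keep in mind that `idx_scan` only reflects activity since the stats were last reset, so a rarely-run report might still depend on an index that shows 0 here.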


DROP INDEX index_name; 
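On a live server, a plain DROP INDEX takes an exclusive lock on the table; if queries need to keep running, the CONCURRENTLY variant avoids blocking them. For example, using the index named earlier:

```
DROP INDEX CONCURRENTLY IF EXISTS job_jobarchive_slug_description_c4e9b533_like;
```

Since this is a Django project, also remove `db_index=True` (or the corresponding Meta index) from the model and generate a migration — otherwise a future migration may simply recreate the index you just dropped.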

In our case, the indexes on the archive tables are exactly the ones we need to eliminate.