Elasticsearch Python combined with Haystack: delete objects by ID

 



I could not delete objects from the index in real time by object ID, and as a result the Haystack Django application did not work properly. I needed a custom script that moves records from the current table to an archive and deletes the corresponding objects from Elasticsearch in real time.

The old version of the application, Python 2, and an old Elasticsearch version make this more difficult.

See the example in Vindazo:


vim job/management/commands/archiveringbykeyword.py


When you move a record to the archive and then delete it from the Elasticsearch index, errors can occur in Haystack.

With this fix, jobs can be added and deleted after indexing and cleaning. In old versions of Haystack, this does not work as expected.

If it does not work properly, you have to run the update_index or rebuild_index command every day.


Example: removing an object with the low-level Python Elasticsearch client, without Haystack


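from elasticsearch import Elasticsearch

# Connects to localhost:9200 by default
es = Elasticsearch()

# result is a Haystack SearchResult; result.id is the document ID
# in the form "app_label.model_name.pk"
res = es.get(index="jobs", id=result.id)

# Haystack indexes every document with the doc type 'modelresult'
es.delete(index="jobs", id=result.id, doc_type='modelresult')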



Run the command for a given keyword:

python manage.py archiveringbykeyword --keyword="Start People"
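The source of archiveringbykeyword.py is not reproduced here. As a rough, framework-free sketch of the same idea, the flow is: select the matching records, copy them to the archive, remove them from the live table, then delete their documents from Elasticsearch by Haystack ID, ignoring documents that are already gone. Everything in it is hypothetical: plain lists of dicts stand in for the Django models, the company field and the "job.job" app label/model name are guesses, and it assumes an elasticsearch-py version that still accepts doc_type and ignore, as in the examples above.

```python
def archive_by_keyword(jobs, archive, keyword, es, index="jobs",
                       app_label="job", model_name="job"):
    """Move jobs whose company contains `keyword` to `archive` and
    remove their documents from Elasticsearch.

    Hypothetical sketch: `jobs` and `archive` are plain lists of dicts
    standing in for the live and archive tables; `es` is an
    elasticsearch-py client (or anything with a compatible delete()).
    """
    matching = [job for job in jobs
                if keyword.lower() in job["company"].lower()]
    for job in matching:
        archive.append(dict(job))  # 1. copy the record to the archive
        jobs.remove(job)           # 2. remove it from the live table
        # 3. Haystack stores each document under "app_label.model_name.pk"
        doc_id = "%s.%s.%s" % (app_label, model_name, job["id"])
        # ignore=404: a document that was never indexed is not an error
        es.delete(index=index, doc_type="modelresult", id=doc_id, ignore=404)
    return [job["id"] for job in matching]
```

In a real management command, the lists would be querysets and the copy/delete steps would be model saves and deletes; deleting from the index right after the database change is what keeps Haystack from returning stale results.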


Show the list of indices:

curl -X GET "localhost:9200/_cat/indices?v=true&s=index&pretty"



Haystack itself offers methods such as:


SearchIndex.remove_object(self, instance, using=None, **kwargs)

Remove an object from the index. Attached to the class’s post-delete hook.

SearchIndex.update_object(self, instance, using=None, **kwargs)

Update the index for a single object. Attached to the class’s post-save hook.


Or

from haystack import connections as haystack_connections

# Get the object you want to delete or update (object_id is its primary key)
instance = YourModel.objects.get(pk=object_id)

# Get all names/keys of your connections (settings.HAYSTACK_CONNECTIONS)
backend_names = haystack_connections.connections_info.keys()

# Get the key of the connection for your object (usually 'default')
using = list(backend_names)[0]

# Get the backend
backend = haystack_connections[using].get_backend()

# Remove the object from the index
backend.remove(instance)




In my case, neither approach worked properly.


This is the solution to my problem:



from elasticsearch import Elasticsearch

es = Elasticsearch()

es.delete(index="jobs", id=result.id, doc_type='modelresult')





You can also configure the client with an IP address instead of localhost. See the client documentation for authentication and other useful features.


es = Elasticsearch(hosts=[{"host": "144.76.157.33", "port": 9200}])
es.delete(index="profile", id=result.id, doc_type='modelresult')
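Deleting a document that is already missing makes the client raise a NotFoundError (HTTP 404). A minimal sketch of a wrapper that treats that case as "nothing to do", assuming an elasticsearch-py version older than 8 (where the per-request ignore parameter and doc_type are still accepted) and Haystack's modelresult doc type; the helper name delete_by_id is hypothetical:

```python
def delete_by_id(es, index, doc_id):
    """Delete one Haystack document; return True if it existed.

    `es` is an elasticsearch-py client (< 8.x). With ignore=404 the
    client returns the 404 response body instead of raising NotFoundError.
    """
    res = es.delete(index=index, id=doc_id, doc_type="modelresult",
                    ignore=404)
    # ES 5+ reports result "deleted"; ES 2.x reports found=True
    return res.get("result") == "deleted" or res.get("found") is True
```

For example, delete_by_id(es, "profile", result.id) can be called from an archive script without a try/except around every deletion.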


Make sure you use the correct index name. Haystack uses different names for the connection (the key in HAYSTACK_CONNECTIONS) and the index (the INDEX_NAME setting), so it is easy to get them mixed up. If a document returns 404 although you just used it via Haystack, you are probably querying the wrong index.
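The other common cause of a 404 is the document ID. Haystack does not use the bare primary key as the Elasticsearch _id; it uses app_label.model_name.pk, the same form as spontaneousmail.spontaneousprofile.209392 in the curl example further down. If you have the model instance, haystack.utils.get_identifier(instance) returns this string. Without an instance, a small sketch that builds the same format (the function name is hypothetical):

```python
def haystack_doc_id(app_label, model_name, pk):
    """Build the Elasticsearch _id that Haystack uses for a model instance.

    Mirrors the "app_label.model_name.pk" format of
    haystack.utils.get_identifier; Django's model_name is lowercase.
    """
    return "%s.%s.%s" % (app_label, model_name.lower(), pk)
```

For example, haystack_doc_id("spontaneousmail", "SpontaneousProfile", 209392) gives "spontaneousmail.spontaneousprofile.209392", which can be passed directly as id to es.get or es.delete.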



More useful queries



res = es.search(index="profile", body={"query": { "term": {"text": "test"}}})

res = es.search(index="profile", body={"query": { "term": {"id": "test"}}})

res = es.search(index="profile", body={"query": { "term": {"id": "113871"}}})

for hit in res['hits']['hits']:
    print(hit["_source"])
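Searching the id field for a bare number such as "113871" usually finds nothing, because Haystack stores the full app_label.model_name.pk string in id. It also indexes the model as django_ct ("app_label.model_name") and the primary key as a string in django_id, so a term query on those two fields is a more reliable way to find one record. A sketch, assuming the default Haystack field names and mapping (where these fields are not analyzed); the helper name find_by_pk is hypothetical:

```python
def find_by_pk(es, index, model_ct, pk):
    """Return the _source of the documents indexed for one model instance.

    model_ct is "app_label.model_name",
    e.g. "spontaneousmail.spontaneousprofile".
    """
    body = {
        "query": {
            "bool": {
                "must": [
                    {"term": {"django_ct": model_ct}},
                    # Haystack indexes the primary key as a string
                    {"term": {"django_id": str(pk)}},
                ]
            }
        }
    }
    res = es.search(index=index, body=body)
    return [hit["_source"] for hit in res["hits"]["hits"]]
```

For example, find_by_pk(es, "profile", "spontaneousmail.spontaneousprofile", 113871) returns an empty list once the record has been removed from the index.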



curl -X POST "144.76.157.33:9200/profile/_search/?size=100&pretty=1"

curl -X GET "144.76.157.33:9200/profile/_doc/spontaneousmail.spontaneousprofile.209392?pretty=1"

curl -X GET "144.76.157.33:9200/profile/_count?pretty"

curl -X GET "144.76.157.33:9200/profile_vacaturestoday/_count?pretty"
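The same counts can be fetched from Python with the client's count() method, for example to check how many documents an archive run removed. A small sketch; the helper name index_counts is hypothetical:

```python
def index_counts(es, indices):
    """Return {index_name: document_count} using the _count API."""
    # es.count() returns a dict such as {"count": 42, "_shards": {...}}
    return dict((name, es.count(index=name)["count"]) for name in indices)
```

For example, index_counts(es, ["profile", "profile_vacaturestoday"]) before and after running the archive command shows how many documents were deleted from each index.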






Elasticsearch DSL

Elasticsearch DSL is a high-level library whose aim is to help with writing and running queries against Elasticsearch. It is built on top of the official low-level client (elasticsearch-py). It provides a more convenient and idiomatic way to write and manipulate queries. It stays close to the Elasticsearch JSON DSL, mirroring its terminology and structure, and exposes the whole range of the DSL from Python, either directly through defined classes or through queryset-like expressions. It also provides an optional wrapper for working with documents as Python objects: defining mappings, retrieving and saving documents, and wrapping document data in user-defined classes. To use other Elasticsearch APIs (such as cluster health), just use the underlying client.


https://elasticsearch-dsl.readthedocs.io/en/latest/
