{"id":32580,"date":"2024-04-16T12:23:46","date_gmt":"2024-04-16T10:23:46","guid":{"rendered":"https:\/\/www.dbi-services.com\/blog\/?p=32580"},"modified":"2024-09-10T17:12:37","modified_gmt":"2024-09-10T15:12:37","slug":"elasticsearch-ingest-pipeline-and-machine-learning","status":"publish","type":"post","link":"https:\/\/www.dbi-services.com\/blog\/elasticsearch-ingest-pipeline-and-machine-learning\/","title":{"rendered":"Elasticsearch, Ingest Pipeline and Machine Learning"},"content":{"rendered":"\n<p>Elasticsearch has a few interesting features around Machine Learning. While I was looking for data to import into Elasticsearch, I found interesting data sets from <a href=\"https:\/\/insideairbnb.com\/get-the-data\/\" target=\"_blank\" rel=\"noreferrer noopener\">Airbnb<\/a>, especially the reviews. I noticed that they do not contain any rating, only comments.<\/p>\n\n\n\n<p>To capture the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Sentiment_analysis\" target=\"_blank\" rel=\"noreferrer noopener\">sentiment<\/a> of a review, I would like each review to carry a label such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Negative<\/li>\n\n\n\n<li>Positive<\/li>\n\n\n\n<li>Neutral<\/li>\n<\/ul>\n\n\n\n<p>For that purpose, I found that the <a href=\"https:\/\/huggingface.co\/cardiffnlp\/twitter-roberta-base-sentiment-latest\" target=\"_blank\" rel=\"noreferrer noopener\">cardiffnlp\/twitter-roberta-base-sentiment-latest<\/a> model suits my needs for these tests.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-import-model\">Import Model<\/h2>\n\n\n\n<p>Elasticsearch provides a tool to import models from <a href=\"https:\/\/huggingface.co\/\" target=\"_blank\" rel=\"noreferrer noopener\">Hugging Face<\/a> into Elasticsearch itself: <a href=\"https:\/\/github.com\/elastic\/eland\" target=\"_blank\" rel=\"noreferrer noopener\">eland<\/a>.<\/p>\n\n\n\n<p>It can be installed locally, or you can use the pre-built Docker image:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre 
class=\"brush: bash; title: ; notranslate\" title=\"\">\ndocker run -it --rm --network host docker.elastic.co\/eland\/eland\n<\/pre><\/div>\n\n\n<p>Let&#8217;s import the model:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: bash; title: ; notranslate\" title=\"\">\neland_import_hub_model -u elastic -p &#039;password!&#039; --hub-model-id cardiffnlp\/twitter-roberta-base-sentiment-latest --task-type text_classification --url https:\/\/127.0.0.1:9200\n<\/pre><\/div>\n\n\n<p>After a minute, the import completes:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n2024-04-16 08:12:46,825 INFO : Model successfully imported with id &#039;cardiffnlp__twitter-roberta-base-sentiment-latest&#039;\n<\/pre><\/div>\n\n\n<p>I can also check that it was imported successfully with the following API call:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nGET _ml\/trained_models\/cardiffnlp__twitter-roberta-base-sentiment-latest\n<\/pre><\/div>\n\n\n<p>And the result (extract):<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: jscript; title: ; notranslate\" title=\"\">\n{\n  &quot;count&quot;: 1,\n  &quot;trained_model_configs&quot;: &#x5B;\n    {\n      &quot;model_id&quot;: &quot;cardiffnlp__twitter-roberta-base-sentiment-latest&quot;,\n      &quot;model_type&quot;: &quot;pytorch&quot;,\n      &quot;created_by&quot;: &quot;api_user&quot;,\n      &quot;version&quot;: &quot;12.0.0&quot;,\n      &quot;create_time&quot;: 1713255117150,\n...\n      &quot;description&quot;: &quot;Model cardiffnlp\/twitter-roberta-base-sentiment-latest for task type &#039;text_classification&#039;&quot;,\n      &quot;tags&quot;: &#x5B;],\n...\n          },\n          &quot;classification_labels&quot;: &#x5B;\n            &quot;negative&quot;,\n            &quot;neutral&quot;,\n            &quot;positive&quot;\n          
],\n...\n  ]\n}\n<\/pre><\/div>\n\n\n<p>Next, the model must be started:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nPOST _ml\/trained_models\/cardiffnlp__twitter-roberta-base-sentiment-latest\/deployment\/_start\n<\/pre><\/div>\n\n\n<p>This feature is subject to licensing, so you might get the error &#8220;<code>current license is non-compliant for [ml]<\/code>&#8221;. For my tests, I used a trial license.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-filebeat-configuration\">Filebeat Configuration<\/h2>\n\n\n\n<p>I will use Filebeat to read the review.csv file and ingest it into Elasticsearch. The filebeat.yml looks like this:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: yaml; title: ; notranslate\" title=\"\">\nfilebeat.inputs:\n- type: log\n  paths:\n    - &#039;C:\\csv_inject\\*.csv&#039;\n\noutput.elasticsearch:\n  hosts: &#x5B;&quot;https:\/\/localhost:9200&quot;]\n  protocol: &quot;https&quot;\n  username: &quot;elastic&quot;\n  password: &quot;password!&quot;\n  ssl:\n    ca_trusted_fingerprint: fakefp4076a4cf5c1111ac586bafa385exxxxfde0dfe3cd7771ed\n  \n  indices:\n    - index: &quot;csv&quot;\n  pipeline: csv\n<\/pre><\/div>\n\n\n<p>Each time a new file lands in the <mark class=\"has-inline-color has-vivid-cyan-blue-color\">csv_inject<\/mark> folder, Filebeat will read it line by line and send it to my Elasticsearch setup, into the <mark class=\"has-inline-color has-luminous-vivid-orange-color\">csv<\/mark> index.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-pipeline\">Pipeline<\/h2>\n\n\n\n<p>An ingest pipeline can apply basic transformations to incoming data before it is indexed.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-data-transformation\">Data transformation<\/h3>\n\n\n\n<p>The first step converts the <mark class=\"has-inline-color has-luminous-vivid-amber-color\">message <\/mark>field, which contains one line of data, into several target fields (i.e. it splits the CSV line). 
Next, the <mark class=\"has-inline-color has-luminous-vivid-amber-color\">message <\/mark>field is removed. In the Processors section of the ingest pipeline, this looks like:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"892\" height=\"197\" src=\"http:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-11_04_28-Ingest-Pipelines-Elastic-\u2014-Mozilla-Firefox.png\" alt=\"\" class=\"wp-image-32587\" srcset=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-11_04_28-Ingest-Pipelines-Elastic-\u2014-Mozilla-Firefox.png 892w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-11_04_28-Ingest-Pipelines-Elastic-\u2014-Mozilla-Firefox-300x66.png 300w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-11_04_28-Ingest-Pipelines-Elastic-\u2014-Mozilla-Firefox-768x170.png 768w\" sizes=\"auto, (max-width: 892px) 100vw, 892px\" \/><\/figure>\n\n\n\n<p>I also want to replace the content of the default timestamp field (i.e. 
<code>@timestamp<\/code>) with the timestamp of the review (and remove the date field after that):<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"602\" height=\"117\" src=\"http:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-11_06_49-Ingest-Pipelines-Elastic-\u2014-Mozilla-Firefox.png\" alt=\"\" class=\"wp-image-32588\" srcset=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-11_06_49-Ingest-Pipelines-Elastic-\u2014-Mozilla-Firefox.png 602w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-11_06_49-Ingest-Pipelines-Elastic-\u2014-Mozilla-Firefox-300x58.png 300w\" sizes=\"auto, (max-width: 602px) 100vw, 602px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-inference\">Inference<\/h3>\n\n\n\n<p>Now, I add the <mark class=\"has-inline-color has-luminous-vivid-amber-color\">Inference <\/mark>step:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1021\" height=\"55\" src=\"http:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-11_09_47-Ingest-Pipelines-Elastic-\u2014-Mozilla-Firefox.png\" alt=\"\" class=\"wp-image-32589\" srcset=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-11_09_47-Ingest-Pipelines-Elastic-\u2014-Mozilla-Firefox.png 1021w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-11_09_47-Ingest-Pipelines-Elastic-\u2014-Mozilla-Firefox-300x16.png 300w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-11_09_47-Ingest-Pipelines-Elastic-\u2014-Mozilla-Firefox-768x41.png 768w\" sizes=\"auto, (max-width: 1021px) 100vw, 1021px\" \/><\/figure>\n\n\n\n<p>The only customization of that step is the field map as the default input field name is 
&#8220;<code>text_field<\/code>&#8221;, whereas in the reviews the field is named &#8220;<code>comment<\/code>&#8221;:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"687\" height=\"543\" src=\"http:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-11_48_32-Ingest-Pipelines-Elastic-\u2014-Mozilla-Firefox.png\" alt=\"\" class=\"wp-image-32593\" srcset=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-11_48_32-Ingest-Pipelines-Elastic-\u2014-Mozilla-Firefox.png 687w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-11_48_32-Ingest-Pipelines-Elastic-\u2014-Mozilla-Firefox-300x237.png 300w\" sizes=\"auto, (max-width: 687px) 100vw, 687px\" \/><\/figure>\n\n\n\n<p>Optionally, though it is recommended, you can add <strong>Failure processors<\/strong> that set a field recording the cause of the failure and route the failed documents to a different index:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"647\" height=\"171\" src=\"http:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-11_14_46-Ingest-Pipelines-Elastic-\u2014-Mozilla-Firefox.png\" alt=\"\" class=\"wp-image-32592\" srcset=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-11_14_46-Ingest-Pipelines-Elastic-\u2014-Mozilla-Firefox.png 647w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-11_14_46-Ingest-Pipelines-Elastic-\u2014-Mozilla-Firefox-300x79.png 300w\" sizes=\"auto, (max-width: 647px) 100vw, 647px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-ingest\">Ingest<\/h2>\n\n\n\n<p>Now, I can simply copy the review.csv file into the watched directory and Filebeat will send its lines to Elasticsearch. 
After a few minutes, I can see the first results:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"80\" src=\"http:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-12_08_28--1024x80.png\" alt=\"\" class=\"wp-image-32595\" srcset=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-12_08_28--1024x80.png 1024w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-12_08_28--300x23.png 300w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-12_08_28--768x60.png 768w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-12_08_28--1536x120.png 1536w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-12_08_28-.png 1611w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Or an example classified as negative, with the associated prediction probability:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"64\" src=\"http:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-12_10_42--1024x64.png\" alt=\"\" class=\"wp-image-32596\" srcset=\"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-12_10_42--1024x64.png 1024w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-12_10_42--300x19.png 300w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-12_10_42--768x48.png 768w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-12_10_42--1536x96.png 1536w, https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-12_10_42-.png 1608w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" 
\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-next\">What Next?<\/h2>\n\n\n\n<p>Of course, we could try another model to compare results.<\/p>\n\n\n\n<p>In case you did not notice, this was also a first step into the extract-transform-load (ETL) topic.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Elasticsearch has a few interesting features around Machine Learning. While I was looking for data to import into Elasticsearch, I found interesting data sets from Airbnb, especially the reviews. I noticed that they do not contain any rating, only comments. To capture the sentiment of a review, I would like each review to carry a label [&hellip;]<\/p>\n","protected":false},"author":40,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1320],"tags":[2810,86,702,3334,3333,3295],"type_dbi":[],"class_list":["post-32580","post","type-post","status-publish","format-standard","hentry","category-devops","tag-ai","tag-elasticsearch","tag-etl","tag-ingest","tag-machinelearning","tag-ml"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.2 (Yoast SEO v27.5) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Elasticsearch, Ingest Pipeline and Machine Learning - dbi Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.dbi-services.com\/blog\/elasticsearch-ingest-pipeline-and-machine-learning\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Elasticsearch, Ingest Pipeline and Machine Learning\" \/>\n<meta property=\"og:description\" content=\"Elasticsearch has few interesting features around Machine Learning. 
While I was looking for data to import into Elasticsearch, I found interesting data sets from Airbnb especially reviews. I noticed that it does not contain any rate, but only comments. To have sentiment of the a review, I would rather have an opinion on that review [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.dbi-services.com\/blog\/elasticsearch-ingest-pipeline-and-machine-learning\/\" \/>\n<meta property=\"og:site_name\" content=\"dbi Blog\" \/>\n<meta property=\"article:published_time\" content=\"2024-04-16T10:23:46+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-09-10T15:12:37+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-12_08_28--1024x80.png\" \/>\n<meta name=\"author\" content=\"Middleware Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Middleware Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/elasticsearch-ingest-pipeline-and-machine-learning\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/elasticsearch-ingest-pipeline-and-machine-learning\\\/\"},\"author\":{\"name\":\"Middleware Team\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/#\\\/schema\\\/person\\\/8d8563acfc6e604cce6507f45bac0ea1\"},\"headline\":\"Elasticsearch, Ingest Pipeline and Machine Learning\",\"datePublished\":\"2024-04-16T10:23:46+00:00\",\"dateModified\":\"2024-09-10T15:12:37+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/elasticsearch-ingest-pipeline-and-machine-learning\\\/\"},\"wordCount\":406,\"commentCount\":0,\"image\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/elasticsearch-ingest-pipeline-and-machine-learning\\\/#primaryimage\"},\"thumbnailUrl\":\"http:\\\/\\\/www.dbi-services.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2024\\\/04\\\/2024-04-16-12_08_28--1024x80.png\",\"keywords\":[\"ai\",\"Elasticsearch\",\"ETL\",\"Ingest\",\"machinelearning\",\"ML\"],\"articleSection\":[\"DevOps\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/elasticsearch-ingest-pipeline-and-machine-learning\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/elasticsearch-ingest-pipeline-and-machine-learning\\\/\",\"url\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/elasticsearch-ingest-pipeline-and-machine-learning\\\/\",\"name\":\"Elasticsearch, Ingest Pipeline and Machine Learning - dbi 
Blog\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/elasticsearch-ingest-pipeline-and-machine-learning\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/elasticsearch-ingest-pipeline-and-machine-learning\\\/#primaryimage\"},\"thumbnailUrl\":\"http:\\\/\\\/www.dbi-services.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2024\\\/04\\\/2024-04-16-12_08_28--1024x80.png\",\"datePublished\":\"2024-04-16T10:23:46+00:00\",\"dateModified\":\"2024-09-10T15:12:37+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/#\\\/schema\\\/person\\\/8d8563acfc6e604cce6507f45bac0ea1\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/elasticsearch-ingest-pipeline-and-machine-learning\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/elasticsearch-ingest-pipeline-and-machine-learning\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/elasticsearch-ingest-pipeline-and-machine-learning\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2024\\\/04\\\/2024-04-16-12_08_28-.png\",\"contentUrl\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2024\\\/04\\\/2024-04-16-12_08_28-.png\",\"width\":1611,\"height\":126},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/elasticsearch-ingest-pipeline-and-machine-learning\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Accueil\",\"item\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Elasticsearch, Ingest Pipeline and Machine 
Learning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/\",\"name\":\"dbi Blog\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/#\\\/schema\\\/person\\\/8d8563acfc6e604cce6507f45bac0ea1\",\"name\":\"Middleware Team\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/ddcae7ba0f9d1a0e7ae707f0e689e4a9c95bb48ec49c8e6d9cc86d43f4121cb6?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/ddcae7ba0f9d1a0e7ae707f0e689e4a9c95bb48ec49c8e6d9cc86d43f4121cb6?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/ddcae7ba0f9d1a0e7ae707f0e689e4a9c95bb48ec49c8e6d9cc86d43f4121cb6?s=96&d=mm&r=g\",\"caption\":\"Middleware Team\"},\"url\":\"https:\\\/\\\/www.dbi-services.com\\\/blog\\\/author\\\/middleware-team\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Elasticsearch, Ingest Pipeline and Machine Learning - dbi Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.dbi-services.com\/blog\/elasticsearch-ingest-pipeline-and-machine-learning\/","og_locale":"en_US","og_type":"article","og_title":"Elasticsearch, Ingest Pipeline and Machine Learning","og_description":"Elasticsearch has few interesting features around Machine Learning. 
While I was looking for data to import into Elasticsearch, I found interesting data sets from Airbnb especially reviews. I noticed that it does not contain any rate, but only comments. To have sentiment of the a review, I would rather have an opinion on that review [&hellip;]","og_url":"https:\/\/www.dbi-services.com\/blog\/elasticsearch-ingest-pipeline-and-machine-learning\/","og_site_name":"dbi Blog","article_published_time":"2024-04-16T10:23:46+00:00","article_modified_time":"2024-09-10T15:12:37+00:00","og_image":[{"url":"http:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-12_08_28--1024x80.png","type":"","width":"","height":""}],"author":"Middleware Team","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Middleware Team","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.dbi-services.com\/blog\/elasticsearch-ingest-pipeline-and-machine-learning\/#article","isPartOf":{"@id":"https:\/\/www.dbi-services.com\/blog\/elasticsearch-ingest-pipeline-and-machine-learning\/"},"author":{"name":"Middleware Team","@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/8d8563acfc6e604cce6507f45bac0ea1"},"headline":"Elasticsearch, Ingest Pipeline and Machine 
Learning","datePublished":"2024-04-16T10:23:46+00:00","dateModified":"2024-09-10T15:12:37+00:00","mainEntityOfPage":{"@id":"https:\/\/www.dbi-services.com\/blog\/elasticsearch-ingest-pipeline-and-machine-learning\/"},"wordCount":406,"commentCount":0,"image":{"@id":"https:\/\/www.dbi-services.com\/blog\/elasticsearch-ingest-pipeline-and-machine-learning\/#primaryimage"},"thumbnailUrl":"http:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-12_08_28--1024x80.png","keywords":["ai","Elasticsearch","ETL","Ingest","machinelearning","ML"],"articleSection":["DevOps"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.dbi-services.com\/blog\/elasticsearch-ingest-pipeline-and-machine-learning\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.dbi-services.com\/blog\/elasticsearch-ingest-pipeline-and-machine-learning\/","url":"https:\/\/www.dbi-services.com\/blog\/elasticsearch-ingest-pipeline-and-machine-learning\/","name":"Elasticsearch, Ingest Pipeline and Machine Learning - dbi 
Blog","isPartOf":{"@id":"https:\/\/www.dbi-services.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.dbi-services.com\/blog\/elasticsearch-ingest-pipeline-and-machine-learning\/#primaryimage"},"image":{"@id":"https:\/\/www.dbi-services.com\/blog\/elasticsearch-ingest-pipeline-and-machine-learning\/#primaryimage"},"thumbnailUrl":"http:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-12_08_28--1024x80.png","datePublished":"2024-04-16T10:23:46+00:00","dateModified":"2024-09-10T15:12:37+00:00","author":{"@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/8d8563acfc6e604cce6507f45bac0ea1"},"breadcrumb":{"@id":"https:\/\/www.dbi-services.com\/blog\/elasticsearch-ingest-pipeline-and-machine-learning\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.dbi-services.com\/blog\/elasticsearch-ingest-pipeline-and-machine-learning\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.dbi-services.com\/blog\/elasticsearch-ingest-pipeline-and-machine-learning\/#primaryimage","url":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-12_08_28-.png","contentUrl":"https:\/\/www.dbi-services.com\/blog\/wp-content\/uploads\/sites\/2\/2024\/04\/2024-04-16-12_08_28-.png","width":1611,"height":126},{"@type":"BreadcrumbList","@id":"https:\/\/www.dbi-services.com\/blog\/elasticsearch-ingest-pipeline-and-machine-learning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Accueil","item":"https:\/\/www.dbi-services.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Elasticsearch, Ingest Pipeline and Machine Learning"}]},{"@type":"WebSite","@id":"https:\/\/www.dbi-services.com\/blog\/#website","url":"https:\/\/www.dbi-services.com\/blog\/","name":"dbi 
Blog","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.dbi-services.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.dbi-services.com\/blog\/#\/schema\/person\/8d8563acfc6e604cce6507f45bac0ea1","name":"Middleware Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/ddcae7ba0f9d1a0e7ae707f0e689e4a9c95bb48ec49c8e6d9cc86d43f4121cb6?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/ddcae7ba0f9d1a0e7ae707f0e689e4a9c95bb48ec49c8e6d9cc86d43f4121cb6?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/ddcae7ba0f9d1a0e7ae707f0e689e4a9c95bb48ec49c8e6d9cc86d43f4121cb6?s=96&d=mm&r=g","caption":"Middleware Team"},"url":"https:\/\/www.dbi-services.com\/blog\/author\/middleware-team\/"}]}},"_links":{"self":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/32580","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/users\/40"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/comments?post=32580"}],"version-history":[{"count":8,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/32580\/revisions"}],"predecessor-version":[{"id":32697,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/posts\/32580\/revisions\/32697"}],"wp:attachment":[{"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/media?parent=32580"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/categories?pos
t=32580"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/tags?post=32580"},{"taxonomy":"type","embeddable":true,"href":"https:\/\/www.dbi-services.com\/blog\/wp-json\/wp\/v2\/type_dbi?post=32580"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}