{"id":7644,"date":"2025-06-15T16:25:20","date_gmt":"2025-06-15T14:25:20","guid":{"rendered":"https:\/\/launix.de\/launix\/?p=7644"},"modified":"2025-06-23T09:38:45","modified_gmt":"2025-06-23T07:38:45","slug":"benchmarking-ollama-embeddings-with-a-minimal-node-js-script","status":"publish","type":"post","link":"https:\/\/launix.de\/launix\/benchmarking-ollama-embeddings-with-a-minimal-node-js-script\/","title":{"rendered":"Benchmarking Ollama Embeddings with a Minimal Node.js Script"},"content":{"rendered":"\n<p>Semantic search is becoming easier to prototype thanks to lightweight models and tools like Ollama&#8217;s <em>nomic-embed-text<\/em>. I wanted to see how well a local solution would perform\u2014without relying on cloud services or heavyweight vector databases.<\/p>\n\n\n\n<p>So I built a tiny Node.js script that:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Embeds blog posts using Ollama model <em>nomic-embed-text<\/em><\/li>\n\n\n\n<li>Stores the vectors in memory<\/li>\n\n\n\n<li>Ranks documents based on vector similarity<\/li>\n<\/ul>\n\n\n\n<p>Here\u2019s what I came up with:<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udda5\ufe0f Platform Specs<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>CPU<\/strong>: AMD Ryzen 9 7900X3D (12 cores, 24 threads)<\/li>\n\n\n\n<li><strong>RAM<\/strong>: 64 GB<\/li>\n\n\n\n<li><strong>Disk<\/strong>: NVMe SSD<\/li>\n\n\n\n<li><strong>GPU:<\/strong> not used (ollama on CPU only)<\/li>\n\n\n\n<li><strong>Environment<\/strong>: Node.js (no DB, no external vector store)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udcda Dataset<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Source<\/strong>: 650 blog posts from <a class=\"\" href=\"https:\/\/launix.de\">launix.de<\/a><\/li>\n\n\n\n<li><strong>Average post size<\/strong>: ~4 
KB<\/li>\n\n\n\n<li><strong>Total text size<\/strong>: ~2.6 MB<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\u23f1\ufe0f Embedding Performance<\/h2>\n\n\n\n<p>The script reads all posts and sends each one to Ollama for embedding. Total time:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>167,148 ms<\/strong> for all 650 posts<\/li>\n\n\n\n<li>That\u2019s <strong>257 ms per post<\/strong> on average<\/li>\n\n\n\n<li>Or ~<strong>16 KB\/s<\/strong> read+embed throughput<\/li>\n<\/ul>\n\n\n\n<p>Each embedding was fetched via HTTP from a locally running Ollama instance, so these numbers include I\/O, JSON parsing, and vector serialization.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udd0d Search Performance (In-Memory)<\/h2>\n\n\n\n<p>To test semantic search:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>A query is embedded using Ollama.<\/li>\n\n\n\n<li>The resulting vector is compared against all post vectors in memory using cosine similarity.<\/li>\n\n\n\n<li>The top 5 posts are returned.<\/li>\n<\/ol>\n\n\n\n<p><strong>Performance<\/strong>:<br><strong>21\u201351 ms per query<\/strong>, even with no indexing.<\/p>\n\n\n\n<p>For small datasets, this is more than fast enough for real-time use, especially for our ERP\/CRM\/DMS products, where users need to find matching documents quickly and only a few users access the system at a time.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83e\udde0 Replacing the Loop: SQL Vector Search with MemCP<\/h2>\n\n\n\n<p>While the Node.js prototype compares the query against every post in a plain for loop, that doesn\u2019t scale well.
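As a point of reference, the whole brute-force pipeline (embed via Ollama, then rank by cosine similarity) fits in a few lines of Node.js. This is a minimal sketch, assuming Ollama's `/api/embeddings` HTTP endpoint on its default port 11434; the helper names `embed`, `cosineSimilarity`, and `search` are illustrative, not taken from the original script:

```javascript
// Sketch of the brute-force pipeline: embed text via a local Ollama
// instance, then rank in-memory posts by cosine similarity.
// Assumes Ollama's /api/embeddings endpoint on the default port 11434.

// Request an embedding vector for one piece of text.
async function embed(text) {
  const res = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
  });
  return (await res.json()).embedding; // array of floats
}

// Cosine similarity of two equal-length vectors (1 = same direction).
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Embed the query once, score every post, return the top k matches.
async function search(posts, query, k = 5) {
  const queryVector = await embed(query);
  return posts
    .map((p) => ({ ...p, score: cosineSimilarity(queryVector, p.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

With only 650 vectors, this linear scan is what yields the 21–51 ms query times measured above; the built-in `fetch` requires Node.js 18 or newer.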
Normally, we\u2019d reach for a vector database like Pinecone, Qdrant, or the upcoming MySQL vector extension to make use of a vector index.<\/p>\n\n\n\n<p>But MySQL\u2019s vector support isn&#8217;t released yet, so I tried something new: <a class=\"\" href=\"https:\/\/memcp.org\">MemCP<\/a>, an in-memory SQL engine with vector support.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">\u2705 The Goal<\/h3>\n\n\n\n<p>Replace this Node.js loop:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>for (const post of posts) {\n  const distance = cosineSimilarity(queryVector, post.vector);\n  \/\/ store or rank the result\n}<\/code><\/pre>\n\n\n\n<p>with a SQL query like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>SELECT ID, post_title, url\nFROM posts\nORDER BY VECTOR_DISTANCE(vector, STRING_TO_VECTOR(?)) ASC\nLIMIT 5;<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>?<\/code> is filled with the search embedding vector<\/li>\n\n\n\n<li><code>VECTOR_DISTANCE()<\/code> is a new function that computes the distance between two vectors<\/li>\n\n\n\n<li><code>STRING_TO_VECTOR(?)<\/code> parses the search vector on the fly from a JSON string into the internal vector format<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83e\uddea Why MemCP?<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>It\u2019s blazing fast (entirely in RAM)<\/li>\n\n\n\n<li>No external dependencies<\/li>\n\n\n\n<li>SQL syntax makes integration clean<\/li>\n\n\n\n<li>If I want, I can implement the whole REST microservice inside the database, with no need for a separate application server<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">\ud83d\udd27 Integration Plan<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Export the embedded vectors from Node.js to CSV or JSON<\/li>\n\n\n\n<li>Load them into MemCP using <code>memcp import posts.csv<\/code><\/li>\n\n\n\n<li>Replace the JS loop with a parameterized SQL query<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2
class=\"wp-block-heading\">\ud83c\udfc1 Summary<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Task<\/th><th>Time \/ Speed<\/th><\/tr><\/thead><tbody><tr><td>Embedding 650 posts<\/td><td>167,148 ms total<\/td><\/tr><tr><td>Avg. embedding time\/post<\/td><td>257 ms<\/td><\/tr><tr><td>Read+Embed speed<\/td><td>~16 KB\/s<\/td><\/tr><tr><td>In-memory vector search<\/td><td>21\u201351 ms per query<\/td><\/tr><tr><td>SQL vector search (MemCP)<\/td><td>(In progress, but promising)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udccc Takeaways<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Ollama + Node.js gives you working semantic search in a day.<\/li>\n\n\n\n<li>With a CPU like the Ryzen 9 7900X3D, even brute-force search is fast.<\/li>\n\n\n\n<li>MemCP offers a SQL-native path for scaling vector search without changing your app logic.<\/li>\n<\/ul>\n\n\n\n<p>If you\u2019re building internal tools, dashboards, or blog search, this is a surprisingly effective stack\u2014with zero cloud dependencies.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Semantic search is becoming easier to prototype thanks to lightweight models and tools like Ollama&#8217;s nomic-embed-text. I wanted to see how well a local solution would perform\u2014without relying on cloud services or heavyweight vector databases. 
So I built a tiny Node.js script that: Here\u2019s what I came up with: \ud83d\udda5\ufe0f Platform Specs \ud83d\udcda Dataset \u23f1\ufe0f&#8230;<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_editorskit_title_hidden":false,"_editorskit_reading_time":0,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","_uag_custom_page_level_css":"","footnotes":""},"categories":[129],"tags":[],"class_list":["post-7644","post","type-post","status-publish","format-standard","hentry","category-memcp","single-item"],"featured_image_urls_v2":{"full":"","thumbnail":"","medium":"","medium_large":"","large":"","1536x1536":"","2048x2048":"","trp-custom-language-flag":"","xs-thumb":"","appku-shop-single":""},"post_excerpt_stackable_v2":"<p>Semantic search is becoming easier to prototype thanks to lightweight models and tools like Ollama&#8217;s nomic-embed-text. I wanted to see how well a local solution would perform\u2014without relying on cloud services or heavyweight vector databases. 
So I built a tiny Node.js script that: Embeds blog posts using Ollama model nomic-embed-text Stores the vectors in memory Ranks documents based on vector similarity Here\u2019s what I came up with: \ud83d\udda5\ufe0f Platform Specs CPU: AMD Ryzen 9 7900X3D (12 cores, 24 threads) RAM: 64 GB Disk: NVMe SSD GPU: not used (ollama on CPU only) Environment: Node.js (no DB, no external vector store)&hellip;<\/p>\n","category_list_v2":"<a href=\"https:\/\/launix.de\/launix\/category\/memcp\/\" rel=\"category tag\">MemCP<\/a>","author_info_v2":{"name":"Carl-Philip H\u00e4nsch","url":"https:\/\/launix.de\/launix\/author\/carli\/"},"comments_num_v2":"0 comments","uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false,"trp-custom-language-flag":false,"xs-thumb":false,"appku-shop-single":false},"uagb_author_info":{"display_name":"Carl-Philip H\u00e4nsch","author_link":"https:\/\/launix.de\/launix\/author\/carli\/"},"uagb_comment_info":0,"uagb_excerpt":"Semantic search is becoming easier to prototype thanks to lightweight models and tools like Ollama&#8217;s nomic-embed-text. I wanted to see how well a local solution would perform\u2014without relying on cloud services or heavyweight vector databases. 
So I built a tiny Node.js script that: Here\u2019s what I came up with: \ud83d\udda5\ufe0f Platform Specs \ud83d\udcda Dataset \u23f1\ufe0f...","_links":{"self":[{"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/posts\/7644","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/comments?post=7644"}],"version-history":[{"count":1,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/posts\/7644\/revisions"}],"predecessor-version":[{"id":7645,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/posts\/7644\/revisions\/7645"}],"wp:attachment":[{"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/media?parent=7644"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/categories?post=7644"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/launix.de\/launix\/wp-json\/wp\/v2\/tags?post=7644"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}