Pošlite nám správu

Spojte sa s nami

Telefonné číslo

+421 948 490 415

Edit Template

How I solved nutrition aligned to diet problem using vector database

Tento článok je dostupný len v anglickom jazyku.

Arrival of ChatGPT and large language models recently has led to creation and growth of databases that store information in vector form. Vectors are represented as numerical arrays, on which linear algebra operations can be performed.

Traditional use of vector databases is for storing vectorized unstructured data. These are created by transforming input data (texts, images, video) into vectors while preserving the semantics of the data. Such vectors then allow better search using algorithms determining their similarity with query vector.

But, are there cases when it does make sense to store structured data into vector databases and take advantage of the similarity search? In this article we will try to use properties of vector databases to analyze data about nutrition and diet.

What is a vector database?

Vector database:

  • stores records as vectors with metadata,
  • allows similarity search in vector space,
  • enables bulk vector operations on the whole vector space (or subset of it).

There are different kinds of vector databases for AI with features like embedding functions, that convert unstructured data to embedding vector. For our needs, we will use traditional relational database PostgreSQL with vector extension called pgvector.

pgvector

Pgvector extension supports storing vectors in PostgreSQL database, querying for nearest neighbors with different methods (L2 distance, inner product, cosine distance, L1 distance, Hamming distance and Jaccard distance).

Running PostgreSQL with pgvector extension

				docker run -p5432:5432 -e POSTGRES_PASSWORD=pass pgvector/pgvector:pg18-trixie
			

Database is accessible on localhost:5432 with username postgresql and password pass. The extension is ready to use, but we have to enable it for our database schema:

				CREATE EXTENSION vector;
			

Structured Data

For our example we need two data structures:

Macronutrient Ratio

Macronutrient ratio. source: [www.ajcn.org]

Macronutrient ratio represents percentage of calories produced from macronutrients.

Diet data structure.

				CREATE TABLE IF NOT EXISTS diets (
  id bigserial PRIMARY KEY, 
  name varchar(50), 
  mr vector(3)
);
			

In order to be able to directly compare diet vector and nutrition vector, we have to convert percentage to grams according to formulas:

Macronutrient ratio in grams.

For Mediterranean diet:

Mediterranean diet MR in grams.

Resulting vector represents 1 kcal contained in 0.04 g protein, 0.095 g carbohydrates and 0.051 g fat with macronutrient ratio according to Mediterranean diet.

Here is a picture of how the diets are distributed (for better clarity I’m showing just two axes — carbohydrate and fat):

Diet MR in grams of fats and carbs per 1 kcal energy.

Nutrition Facts

source: nutritionix.com

Represents macronutrient weight and energy contained in one serving of food. In this case it’s a slice of pizza, so we’d better write down weight of food for provided macronutrients and calories.

Nutrition_facts data structure.

				CREATE TABLE nutritions_facts(
  id bigserial PRIMARY KEY, 
  name varchar(50),
  macronutrients vector(3), 
  calories int,
  weight int);
			

Nutrition facts for different foods

To compare food with diet, we have to calculate Macronutrient ratio of food. We just divide macronutrient vector by kcal from the table, e.g.:

Macronutrient ratio calculated from Nutrition facts (1 kcal in grams).

For Pizza Margherita:

Macronutrient ratio for Pizza Margherita (1 kcal in grams).

Food MR in grams of fats and carbs per 1 kcal energy.

Servings

To be able to navigate through macronutrient space, we also need to store informations about typical serving of food:

Servings data structure.

				CREATE TABLE servings (
    id bigserial PRIMARY KEY,
    name varchar(50),
    macronutrients vector(3),
    calories decimal,
    weight integer
);
			

Where data represent values of one serving.

Arithmetic Vector Operations

We can try to calculate the resulting vector with the pgvector with statement

				SELECT mr / '[4,4,9]'::vector AS mr FROM diets;
			

but database returns error, that it has not implemented such operation (vector / vector).

Let’s try other vector operations:

				-- This works
SELECT '[1,2,3]'::vector + '[0,1,2]'::vector;
-- This works
SELECT '[1,2,3]'::vector - '[0,1,2]'::vector;
-- Not supported...
SELECT '[1,2,3]'::vector / 4;
-- ... but this works
SELECT '[1,2,3]'::vector * array_to_vector(array[1/4::decimal,1/4::decimal,1/4::decimal], 3, true)::vector;
-- This works
SELECT '[1,2,3]'::vector * '[4,4,9]'::vector;
-- Not supported...
SELECT '[1,2,3]'::vector * 10;
			

It looks like, although presented as vector database, it doesn’t support all basic arithmetic vector operations. But as you can see, we can use some workarounds, so it doesn’t really hinder us to achieve what we need to, just the resulting statement is less readable.

So the solution for our macronutrient ratio vector calculation is:

				SELECT mr * array_to_vector(array[1/4::decimal,1/4::decimal,1/9::decimal],3,true) AS mr FROM diets;
			

NOTE: database seed statements already contain calculated values.

Similarity search methods

There are three most common methods used by vector databases:

Cosine similarity

Measures the cosine of the angle between two vectors in a vector space. It doesn’t take into account vector lengths, only direction.

Cosine of two vectors angle.

				SELECT 1 - ('[1,2,3]'::vector <=> '[2,3,4]'::vector);

-- 90 degrees
SELECT degrees(acos(1 - ('[0,1]'::vector <=> '[1,0]'::vector)));
			

Euclidean distance (L2 norm)

Measures distance between two points in n-dimensional space:

Distance between two vectors.

				SELECT '[1,2,3]'::vector <-> '[2,3,4]'::vector

-- distance is 1
SELECT '[0,1]'::vector <-> '[1,1]'::vector;
			

Dot product

Measures the degree of alignment between two vectors:

Dot product of two vectors.

				SELECT ('[1,2,3]'::vector <#> '[2,3,4]'::vector) * -1

-- dot product is 0 => vectors are perpendicular
SELECT ('[0,1]'::vector <#> '[1,0]'::vector) * -1;
			

Let’s prepare our data

				-- Diets. Macronutrient ratio is in grams per 1 kcal.
INSERT INTO diets (id, name, mr) VALUES (0, 'Ornish', '[0.045, 0.188, 0.008]');
INSERT INTO diets (id, name, mr) VALUES (1, 'Omni-Carb', '[0.04, 0.135, 0.033]');
INSERT INTO diets (id, name, mr) VALUES (2, 'DASH', '[0.05, 0.13, 0.031]');
INSERT INTO diets (id, name, mr) VALUES (3, 'AHA TLC', '[0.037, 0.138, 0.033]');
INSERT INTO diets (id, name, mr) VALUES (4, 'NHANES III', '[0.037, 0.13, 0.037]');
INSERT INTO diets (id, name, mr) VALUES (5, 'Omni-Protein', '[0.065, 0.11, 0.033]');
INSERT INTO diets (id, name, mr) VALUES (6, 'Omni-Unsat', '[0.04, 0.11, 0.044]');
INSERT INTO diets (id, name, mr) VALUES (7, 'Mediterranean', '[0.04, 0.095, 0.051]');
INSERT INTO diets (id, name, mr) VALUES (8, 'Zone', '[0.087, 0.095, 0.03]');
INSERT INTO diets (id, name, mr) VALUES (9, 'South Beach', '[0.065, 0.083, 0.046]');
INSERT INTO diets (id, name, mr) VALUES (10, 'Atkins', '[0.072, 0.022, 0.069]');

-- Nutrition facts. Macronutrients in grams for weight.
INSERT INTO nutrition_facts (id, name, macronutrients, calories, weight) VALUES (0, 'Ham and eggs', '[12.2,0.8,14.3]', 181, 100);
INSERT INTO nutrition_facts (id, name, macronutrients, calories, weight) VALUES (1, 'Scrambled eggs', '[12.45,2.2,11.91]', 166, 100);
INSERT INTO nutrition_facts (id, name, macronutrients, calories, weight) VALUES (2, 'Sausage', '[13,1,23]', 260, 100);
INSERT INTO nutrition_facts (id, name, macronutrients, calories, weight) VALUES (3, 'Bryndzove halusky', '[20,3,23]', 317, 100);
INSERT INTO nutrition_facts (id, name, macronutrients, calories, weight) VALUES (4, 'Fried cheese with french fries', '[12,32.8,15.6]', 319, 100);
INSERT INTO nutrition_facts (id, name, macronutrients, calories, weight) VALUES (5, 'Fried pork cutlet', '[19.81,15,30.01]', 411, 100);
INSERT INTO nutrition_facts (id, name, macronutrients, calories, weight) VALUES (6, 'Pizza margherita', '[4.5,18,9]', 170, 107);
INSERT INTO nutrition_facts (id, name, macronutrients, calories, weight) VALUES (7, 'Pancakes with jam', '[5.52,37.87,3.83]', 209, 100);
INSERT INTO nutrition_facts (id, name, macronutrients, calories, weight) VALUES (8, 'Tikka masala', '[28,11,24]', 365, 236);
INSERT INTO nutrition_facts (id, name, macronutrients, calories, weight) VALUES (9, 'Kung pao', '[24,17,10]', 240, 150);

-- Servings. All values represent one serving of food.
INSERT INTO servings (id, name, macronutrients, calories, weight) VALUES (0, 'Ham and eggs', '[24.4, 1.6, 28.6]', 361, 200)
INSERT INTO servings (id, name, macronutrients, calories, weight) VALUES (1, 'Scrambled eggs', '[12.4, 2.2, 11.9]', 166, 100)
INSERT INTO servings (id, name, macronutrients, calories, weight) VALUES (2, 'Sausage', '[15.6, 1.2, 27.6]', 316, 120)
INSERT INTO servings (id, name, macronutrients, calories, weight) VALUES (3, 'Bryndzove halusky', '[60.0, 9.0, 69.0]', 897, 300)
INSERT INTO servings (id, name, macronutrients, calories, weight) VALUES (4, 'Fried cheese with french fries', '[18.0, 49.2, 23.4]', 479, 150)
INSERT INTO servings (id, name, macronutrients, calories, weight) VALUES (5, 'Fried pork cutlet', '[39.6, 30.0, 60.0]', 818, 200)
INSERT INTO servings (id, name, macronutrients, calories, weight) VALUES (6, 'Tikka masala', '[19.04, 7.52, 16.32]', 253, 160)
INSERT INTO servings (id, name, macronutrients, calories, weight) VALUES (7, 'Kung pao', '[40.0, 28.25, 16.75]', 424, 250)
INSERT INTO servings (id, name, macronutrients, calories, weight) VALUES (8, 'Pancakes with jam', '[13.75, 94.75, 9.5]', 520, 250)
INSERT INTO servings (id, name, macronutrients, calories, weight) VALUES (9, 'Pizza margherita', '[20.16, 80.64, 40.32]', 766, 480)
			

Dietary target

First, we want to choose foods to be as close to target macronutrient ratio as possible. As we don’t care about vector length, we can sort foods according to their nutrient vectors’ direction. For this case, we use cosine similarity.

Food’s MR similarity with Mediterranean diet.

				SELECT name, 
  degrees(acos(1-((SELECT mr FROM diets WHERE id=7)
   <=> macronutrients 
    * array_to_vector(array[1/calories::decimal,1/calories::decimal,1/calories::decimal],3,true)))) AS angle
FROM nutrition_facts
ORDER BY angle;
			

NOTE: graphs used in this article consider [carbohydrate, fats] dimensions only, database statements take into account the whole macronutrient space.

Choosing Mediterranean diet as our target diet, we can see how close is carbs/fats food ratio to it. With the angle of 1.66 degrees, Pizza margherita is a typical Mediterranean diet food.

🍕 How many pizzas?

Let’s say our daily energy expenditure is 2000 kcal. We want to eat just enough food to cover it.

We have our healthy Mediterranean vector for 1 kcal:

Mediterranean MR in grams per 1 kcal.

to get 2000 kcal vector, we just multiply it by 2000:

Mediterranean MR in grams per 2000 kcal.

				SELECT mr * '[2000,2000,2000]'::vector FROM diets WHERE id = 7;
			

We’ll do it “healthy way” — we’ll eat just pizza the whole day… So how many grams of pizza should we eat?

One slice of pizza contains 170 kcal. Simply by dividing 2000 kcal by the energy contained in one slice of pizza, we get number of pizza slices to eat:

How many slices of pizza yield 2000 kcal?

				SELECT 2000/calories::decimal FROM nutrition_facts WHERE id = 6;
			

Or we can calculate its weight (one pizza slice weighs 107 g):

How many grams of pizza yield 2000 kcal?

				SELECT (2000/calories::decimal) * weight FROM nutrition_facts WHERE id = 6;
			

🫣 More than 1 kg of pizza daily…

What is the macronutrient structure of 1.2 kg Pizza Margherita?

Macronutrient structure of 2000 kcal pizza.

				SELECT macronutrients * array_to_vector(array[1/calories::decimal, 1/calories::decimal, 1/calories::decimal], 3, true) * '[2000,2000,2000]' 
FROM nutrition_facts 
WHERE id = 6;
			

Comparing with [80,190,102] we are close enough to our target, missing 28 g of protein, with 22 g more carbs and just about the right amount of fat.

Daily Nutrition

But we don’t want to eat only pizza — do we? What if we decide to eat what we want? E.g. ham and eggs for breakfast and pancakes for dinner? How does it project onto our decision to eat Mediterranean diet (from the fats/carbs perspective)?

Now we need our vectors to represent one serving of food. So we have to divide values in our food table by its weights (normalized 100 g or one serving — like the pizza slice).

Ham and eggs for breakfast:

Macronutrient structure of 1g of Ham and eggs.

				SELECT macronutrients * array_to_vector(array[1/weight::decimal, 1/weight::decimal, 1/weight::decimal], 3, true) FROM nutrition_facts WHERE id = 0;
			

The serving will be 200 g:

Macronutrient structure of 200g of Ham and eggs.

				SELECT macronutrients * array_to_vector(array[1/weight::decimal, 1/weight::decimal, 1/weight::decimal], 3, true) * '[200,200,200]' FROM nutrition_facts WHERE id = 0;
			

Now the pancakes for dinner:

Macronutrient structure of 1g of Pancakes with jam.

				SELECT macronutrients * array_to_vector(array[1/weight::decimal, 1/weight::decimal, 1/weight::decimal], 3, true) FROM nutrition_facts WHERE id = 7;
			

The serving will be 250 g:

Macronutrient structure of 250g of Pancakes with jam.

				SELECT macronutrients * array_to_vector(array[1/100::decimal, 1/100::decimal, 1/100::decimal], 3, true) * '[250,250,250]' FROM nutrition_facts WHERE id = 7;
			

Pizza margherita:

Macronutrient structure of 1g of Pizza margherita.

				SELECT macronutrients * array_to_vector(array[1/weight::decimal, 1/weight::decimal, 1/weight::decimal], 3, true) FROM nutrition_facts WHERE id = 6;
			

The serving is 480 g:

Macronutrient structure of 480g of Pizza margherita.

				SELECT macronutrients * array_to_vector(array[1/weight::decimal, 1/weight::decimal, 1/weight::decimal], 3, true) * '[480,480,480]' FROM nutrition_facts WHERE id = 6;
			

🍳 Breakfast

Our daily target represents 2000 kcal of Mediterranean diet. In the direction of our target vector is closest Pizza margherita:

Our starting point.

But we decide to eat Ham and eggs! Our macronutrient intake is now:

Macronutrients in 200g of Ham and Eggs [g].

Target vector is now:

New target after breakfast.

				SELECT 
  (SELECT mr * '[2000,2000,2000]' 
    FROM diets d 
    WHERE d.id = 7)
- (SELECT macronutrients * array_to_vector(array[1/weight::decimal, 1/weight::decimal, 1/weight::decimal], 3, true) * '[200,200,200]' 
    FROM nutrition_facts 
    WHERE id = 0);
			

🥪 Lunch

After eating ham and eggs for breakfast, our situation looks like this:

We diverted from direct path to our target, so now we have a slightly different target vector direction. But pizza margherita is still closest to it — let’s eat it:

Macronutrients in 480g of Pizza Margherita [g].

Now we have consumed:

Macronutrients until now (breakfast, lunch) [g].

				SELECT 
  (SELECT macronutrients * array_to_vector(array[1/weight::decimal, 1/weight::decimal, 1/weight::decimal], 3, true) * '[200,200,200]' 
    FROM nutrition_facts 
    WHERE id = 0)
+ (SELECT macronutrients * array_to_vector(array[1/weight::decimal, 1/weight::decimal, 1/weight::decimal], 3, true) * '[480,480,480]' 
    FROM nutrition_facts 
    WHERE id = 6);
			

And our target vector is:

Remaining macronutrients target vector after lunch.

				SELECT 
  (SELECT mr * '[2000,2000,2000]' 
    FROM diets d 
    WHERE d.id = 7)
- (SELECT macronutrients * array_to_vector(array[1/weight::decimal, 1/weight::decimal, 1/weight::decimal], 3, true) * '[200,200,200]' 
    FROM nutrition_facts 
    WHERE id = 0)
- (SELECT macronutrients * array_to_vector(array[1/weight::decimal, 1/weight::decimal, 1/weight::decimal], 3, true) * '[480,480,480]' 
    FROM nutrition_facts 
    WHERE id = 6);
			

🍝 Dinner

We want to eat for dinner something different, but it has to contribute to reaching our daily target as close as possible. Now we use different method to choose the dinner. After dinner we want to be as close as possible to our daily target.

Choosing dinner to complete our energy target.

				SELECT name, (macronutrients * array_to_vector(array[1/weight::decimal, 1/weight::decimal, 1/weight::decimal],3,true)) <->
(SELECT
  (SELECT mr * '[2000,2000,2000]'
    FROM diets d
    WHERE d.id = 7)
  - (SELECT macronutrients * array_to_vector(array[1/weight::decimal, 1/weight::decimal, 1/weight::decimal], 3, true) * '[200,200,200]'
    FROM nutrition_facts
    WHERE id = 0)
  - (SELECT macronutrients * array_to_vector(array[1/weight::decimal, 1/weight::decimal, 1/weight::decimal], 3, true) * '[480,480,480]'
    FROM nutrition_facts
    WHERE id = 6)) as distance
FROM nutrition_facts
ORDER BY distance ASC;
			

Now we have the choice to eat more pizza, or eat something different — and we choose pancakes, because we don’t want to eat pizza for the rest of the day.

Our target macronutrient deficit in grams is now:

Macronutrient deficit after lunch.

which is in calories:

Calories deficit after lunch.

				SELECT l1_distance('[0,0,0]'::vector, '[35.44,107.46,33.08]'::vector * '[4,4,9]'::vector);
			

The l1_distance is so called Manhattan distance, which effectively sums all the vector dimensions.

Now as we have chosen pancakes to satisfy our energy consumption, we have to find out the macronutrient structure of 869.31 kcal pancakes. 1 kcal macronutrients vector for pancakes is:

Pancakes macronutrient vector representing 1 kcal.

				SELECT macronutrients * array_to_vector(array[1/calories::decimal, 1/calories::decimal, 1/calories::decimal],3,true) FROM nutrition_facts WHERE id = 7;
			

We have to multiply this 1 kcal vector by target calories intake, which is:

Macronutrients content in 869.31 kcal of pancakes.

				SELECT (SELECT macronutrients * array_to_vector(array[1/calories::decimal, 1/calories::decimal, 1/calories::decimal],3,true) FROM nutrition_facts WHERE id = 7) * '[869.31,869.31,869.31]';
			

When we add these 869.31 kcal to our lunch, we end up with the total macronutrients intake of:

Total macronutrients intake at the end of the day.

				SELECT 
  (SELECT macronutrients * array_to_vector(array[1/weight::decimal, 1/weight::decimal, 1/weight::decimal], 3, true) * '[200,200,200]' 
    FROM nutrition_facts 
    WHERE id = 0)
+ (SELECT macronutrients * array_to_vector(array[1/weight::decimal, 1/weight::decimal, 1/weight::decimal], 3, true) * '[480,480,480]' 
    FROM nutrition_facts 
    WHERE id = 6)
+ (SELECT
    (SELECT macronutrients * array_to_vector(array[1/calories::decimal, 1/calories::decimal, 1/calories::decimal],3,true) 
      FROM nutrition_facts 
      WHERE id = 7)
   * '[869.31,869.31,869.31]');
			

Now our daily macronutrients deficit is:

Difference from Mediterranean diet target.

				SELECT
    (SELECT mr * '[2000,2000,2000]'::vector FROM diets WHERE id = 7)
- (SELECT
  (SELECT macronutrients * array_to_vector(array[1/weight::decimal, 1/weight::decimal, 1/weight::decimal], 3, true) * '[200,200,200]'
    FROM nutrition_facts
    WHERE id = 0)
  + (SELECT macronutrients * array_to_vector(array[1/weight::decimal, 1/weight::decimal, 1/weight::decimal], 3, true) * '[480,480,480]'
    FROM nutrition_facts
    WHERE id = 6)
  + (SELECT
    (SELECT macronutrients * array_to_vector(array[1/calories::decimal, 1/calories::decimal, 1/calories::decimal],3,true)
      FROM nutrition_facts
      WHERE id = 7)
     * '[869.31,869.31,869.31]'));
			

After our 2000 kcal total daily intake, we have deficit on proteins (13 g), surplus on carbohydrates (50 g) and deficit on fats (17 g) according to ideal Mediterranean macronutrient intake structure.

Possible solution to our gastronomy problem.

By the way, how many pancakes do we have to eat to cover the last 869.31 kcal? According to the food table, 100g of pancakes contain 209 kcals, so the formula is simple:

To fulfill our daily calories target, we have to eat 416 g of pancakes.

				SELECT 869.31 / (SELECT calories FROM nutrition_facts WHERE id = 7) * 100;
			

One pancake with jam weighs about 110 g, so it makes 4 pancakes for dinner.

Conclusion

In this article, we tried using a vector database with structured data, and it worked well. Although there are some things to be desired — such as more arithmetic vector operations (division, scalar multiplication) and syntax cleanup — that may be polished some time soon, similarity search is a useful addition to standard query operations.

When your problem domain involves working with numeric or quantifiable attributes and you need to use approximation methods, vector databases are a practical and effective way to get the job done.

Tento článok si môžete prečítať na našom Medium

O nás

Vitajte na našom blogu! Prinášame inovatívne a efektívne riešenia a delíme sa o naše skúsenosti, aby ste mohli rásť spolu s nami. 

Odporúčané články

Najnovšie články

  • All Post
  • Branding
  • Desktopové aplikácie
  • Development
  • Leadership
  • Management
  • Mobilné aplikácie
  • Nezaradené
  • Projekty
  • Webové aplikácie

Zistite viac o našej spoločnosti

Navrhujeme softvérové riešenia, ktoré posunú váš biznis vpred!

CORETEQ Technology s.r.o.