import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

import pandas as pd
from langchain_groq.chat_models import ChatGroq
from PIL import Image
import pytesseract

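import os

# Prefer the GROQ_API_KEY environment variable over pasting the key into the notebook.
Groq_Token = os.environ.get("GROQ_API_KEY", "your_groq_api_key_here")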

groq_models = {
    "llama3-70b": "llama3-70b-8192",
    "mixtral": "mixtral-8x7b-32768",
    "gemma-7b": "gemma-7b-it",
    "llama3.1-70b": "llama-3.1-70b-versatile",
    "llama3-8b": "llama3-8b-8192",
    "llama3.1-8b": "llama-3.1-8b-instant",
    "gemma-9b": "gemma2-9b-it"
}

image_path = "img2_shot.jpg"  
image = Image.open(image_path)

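# Windows install location of the Tesseract executable; without this line pytesseract runs `tesseract` from PATH.
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'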
extracted_text = pytesseract.image_to_string(image)

extracted_text

'Serving Size: 1 Tablet (0.709 g) | Each serving contains (Approx. Values):\n\nIngredient Oty. / Serving\n\n*PHOSPHOcomplex® Silybin (Sillybum marianum) 200 mg\nDandelion (Taraxacum officinale) leaf extract - 10:1 100 mg\nKutki (Picrorhiza kurroa)rhizome extract - 0.5% Bitters 50 mg\nKasani (Cichorium intybus) seed extract - 1% Bitters 25 mg\nPunarnava (Boerhavia diffusa) root extract - 0.07% alkaloids 25 mg\nBhui amla (Phyllanthus amarus) WP extract - 0.5% Bitters 25 mg\nAmla (Phyllanthus emblica) fruit extract - 10% Tannins 25 mg\nLicorice (Glycyrrhiza glabra) root extract - 5% Glycyrrhizin 25 mg\nVitamin E 10 mg\nPiper nigrum fruit extract — 95% Piperine 5mg\n\nNutrients Qty. / Serving\n\nEnergy 3.04 kcal\nCarbohydrate 051g\n(Sugars) 02g\nProtein 0.049\nFat 0.09 g\n\n"ZRDA values established as per ICMR 2010 for sedentary lifestyle-Men.\n**Z RDA not established by ICMR\n\n'
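The OCR output above already contains recognition errors. `Carbohydrate 051g` and `(Sugars) 02g` have lost their decimal points, and `Protein 0.049` has no unit. A plain regular-expression pass gives a deterministic baseline to compare against the LLM's answer, and it makes this damage visible. This is a minimal sketch: the helper `find_quantities` is hypothetical, and its unit list is an assumption chosen for this label.

```python
import re
from typing import List, Tuple

# A number (optionally with a decimal part) followed by one of a few units.
# The unit list is an assumption tuned to this nutrition label.
QUANTITY_RE = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>kcal|mg|g|cm|mm|kg)\b",
    re.IGNORECASE,
)

def find_quantities(text: str) -> List[Tuple[str, str]]:
    """Return (value, unit) pairs exactly as they appear in the OCR text."""
    return [(m.group("value"), m.group("unit").lower()) for m in QUANTITY_RE.finditer(text)]
```

Running `find_quantities(extracted_text)` on the notebook's OCR output lists values such as `("051", "g")` verbatim. It also skips `Protein 0.049` entirely, because no unit follows the number.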

query = f"""
* You are an information extraction model.
* Your task is to analyze the extracted text and extract relevant information such as weight or height.
* Provide the extracted information along with a brief explanation of your reasoning.

Extracted Text: {extracted_text}
"""

model_name = "llama3-70b"  
llm = ChatGroq(model=groq_models[model_name], api_key=Groq_Token, temperature=0)
answer = llm.invoke(query)

print(answer.content)

After analyzing the extracted text, I have identified the following relevant information related to weight and other measurements:

* **Weight:** The serving size is 1 tablet, which weighs 0.709 g.
* **Height:** No information about height is mentioned in the extracted text.

Additionally, I have extracted other relevant information related to the nutritional content of the supplement:

* **Energy:** 3.04 kcal per serving
* **Carbohydrate:** 0.51 g per serving
* **Sugars:** 0.02 g per serving
* **Protein:** 0.049 g per serving
* **Fat:** 0.09 g per serving

My reasoning is based on the explicit mention of these values in the "Nutrients Qty. / Serving" section of the extracted text.
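The answer above reports Carbohydrate as 0.51 g and Sugars as 0.02 g, but the OCR text only contains `051g` and `02g`. The model re-inserted the decimal points on its own. That reading may be right, but nothing in the input confirms it.

A cheap safeguard is to flag every number in the answer that does not occur verbatim in the OCR text, so a person can review it. This is a sketch that is not part of the original notebook, and `ungrounded_numbers` is a hypothetical name. It does not decide which reading is correct.

```python
import re
from typing import List

# Integers or decimals, e.g. "3.04", "051", "0.709".
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")

def ungrounded_numbers(answer: str, source: str) -> List[str]:
    """Numbers in the model's answer that never appear verbatim in the OCR text."""
    source_numbers = set(NUMBER_RE.findall(source))
    missing: List[str] = []
    for number in NUMBER_RE.findall(answer):
        # Keep first-seen order and report each unmatched number once.
        if number not in source_numbers and number not in missing:
            missing.append(number)
    return missing
```

In the notebook this would be called as `ungrounded_numbers(answer.content, extracted_text)`.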

!pip install pytesseract

Defaulting to user installation because normal site-packages is not writeable
Collecting pytesseract
  Downloading pytesseract-0.3.13-py3-none-any.whl.metadata (11 kB)
Requirement already satisfied: packaging>=21.3 in c:\users\dharm\appdata\roaming\python\python38\site-packages (from pytesseract) (24.1)
Requirement already satisfied: Pillow>=8.0.0 in c:\users\dharm\appdata\roaming\python\python38\site-packages (from pytesseract) (8.0.1)
Downloading pytesseract-0.3.13-py3-none-any.whl (14 kB)
Installing collected packages: pytesseract
Successfully installed pytesseract-0.3.13
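`pytesseract` is only a Python wrapper. It runs an external `tesseract` executable, which pip does not install, and the notebook hard-codes the default Windows path for `tesseract_cmd`.

The sketch below uses a hypothetical helper, `locate_tesseract`. It prefers whatever `tesseract` is on PATH and falls back to that Windows path only if the file actually exists.

```python
import os
import shutil
from typing import Optional

# Default Windows location used elsewhere in this notebook.
WINDOWS_TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

def locate_tesseract(fallback: str = WINDOWS_TESSERACT) -> Optional[str]:
    """Return a usable tesseract executable path, or None if none can be found."""
    on_path = shutil.which("tesseract")  # searches the directories listed in PATH
    if on_path is not None:
        return on_path
    return fallback if os.path.isfile(fallback) else None
```

After checking that the result is not `None`, assigning `pytesseract.pytesseract.tesseract_cmd = locate_tesseract()` would replace the hard-coded path. The same code then works on Linux and macOS.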



query = f"""
* You are an information extraction model.
* Your task is to analyze the extracted text and extract relevant information such as weight or height.
* Provide the extracted information along with a brief explanation of your reasoning.

Here are a few examples:
1. Extracted Text: 'The bottle weighs 500g.'
   Extracted Information: 500g 

2. Extracted Text: 'The height of the box is 25 cm.'
   Extracted Information: 25 cm 

3. Extracted Text: 'Net weight: 709 gm.'
   Extracted Information: 709 gm 

Extracted Text: {extracted_text}
"""

model_name = "llama3-70b" 
llm = ChatGroq(model=groq_models[model_name], api_key=Groq_Token, temperature=0)
answer = llm.invoke(query)

print(answer.content)

Extracted Text: Serving Size: 1 Tablet (0.709 g) | Each serving contains (Approx. Values):

...

Extracted Information:

* Weight: 0.709 g (extracted from the serving size information, which specifies the weight of one tablet)

Reasoning: The extracted text explicitly mentions the weight of one tablet as 0.709 g, which is a direct measurement of weight.
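The few-shot examples pair each input with a bare value, yet the reply above still restates the input and adds prose, so the value has to be read out by hand. One option is to append an instruction asking for a single JSON object and then parse the reply defensively. This is an assumption, not something the original notebook does.

Both `JSON_INSTRUCTION` and `parse_json_answer` below are hypothetical. The parser assumes a flat object without nested braces.

```python
import json
import re
from typing import Optional

# Hypothetical instruction to append to the prompt. It is a plain string, not an
# f-string, so the braces stay literal.
JSON_INSTRUCTION = (
    'Reply with only a JSON object of the form {"weight": "<value or null>", "height": "<value or null>"}.'
)

# First {...} span in the reply; assumes a flat object without nested braces.
JSON_OBJECT_RE = re.compile(r"\{.*?\}", re.DOTALL)

def parse_json_answer(reply: str) -> Optional[dict]:
    """Return the first JSON object in a model reply, or None if it is missing or malformed."""
    match = JSON_OBJECT_RE.search(reply)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
```

A call such as `parse_json_answer(llm.invoke(query + "\n" + JSON_INSTRUCTION).content)` returns `None` whenever the model ignores the instruction, so the caller can fall back to reading the free-text answer.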

image_path = "img1_shot.jpg" 
image = Image.open(image_path)

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
extracted_text = pytesseract.image_to_string(image)

extracted_text

'Product Name Harvest Festival Doll\n\n15X9X26CM\n52x42x40CM\nPolyester material\n\nCraftsmanship Manual\n\nGrey . Orange\n\nProduct Wei 158g\n\n'

query = f"""
* You are an information extraction model.
* Your task is to analyze the extracted text and extract relevant information such as weight or height.
* Provide the extracted information along with a brief explanation of your reasoning.

Here are a few examples:
1. Extracted Text: 'The bottle weighs 500g.'
   Extracted Information: 500g 

2. Extracted Text: 'The height of the box is 25 cm.'
   Extracted Information: 25 cm 

3. Extracted Text: 'Net weight: 709 gm.'
   Extracted Information: 709 gm 

Extracted Text: {extracted_text}
"""

model_name = "llama3-70b"  
llm = ChatGroq(model=groq_models[model_name], api_key=Groq_Token, temperature=0)
answer = llm.invoke(query)

print(answer.content)

After analyzing the extracted text, I found the following relevant information:

Extracted Information: 
- Height: 26CM (from the dimension 15X9X26CM)
- Weight: 158g (from the text "Product Wei 158g")

Reasoning:
- I extracted the height by identifying the dimension "15X9X26CM" which represents the length, width, and height of the product, respectively. 
- I extracted the weight by identifying the text "Product Wei 158g" which explicitly mentions the weight of the product.
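The answer above treats the last number in `15X9X26CM` as the height, which assumes the label lists length × width × height. The label itself does not say which axis is which, and the answer does not mention the second triple, `52x42x40CM`.

A regex sketch can extract every triple with its unit and leave the axis interpretation to the caller. The helper `find_dimensions` is hypothetical.

```python
import re
from typing import List, Tuple

# Three numbers separated by x, X or ×, then a length unit, e.g. "15X9X26CM".
DIMENSIONS_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*[x×]\s*(\d+(?:\.\d+)?)\s*[x×]\s*(\d+(?:\.\d+)?)\s*(mm|cm|m)\b",
    re.IGNORECASE,
)

def find_dimensions(text: str) -> List[Tuple[float, float, float, str]]:
    """Return every A x B x C triple with its unit; which axis is the height is NOT inferred."""
    return [
        (float(a), float(b), float(c), unit.lower())
        for a, b, c, unit in DIMENSIONS_RE.findall(text)
    ]
```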

| Feature | Zero-Shot Prompting | Few-Shot Prompting |
|---|---|---|
| Examples in the prompt | None; only the task instructions | A few solved, task-specific input/output examples |
| Knowledge used | Pre-trained knowledge only | Pre-trained knowledge plus the in-prompt examples (no model weights are updated) |
| Best suited for | General, well-understood tasks | More complex or domain-specific tasks |
| Trade-off | Highly flexible, but can be less accurate | Often more accurate, because the examples show the expected answer |
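In practice, the difference in the table comes down to whether solved examples are placed in the prompt. The notebook also repeats the same f-string for each image. The sketch below uses a hypothetical helper, `build_prompt`, that produces either variant from one template. The instruction text and the three examples are copied from the notebook's prompts.

```python
from typing import Sequence, Tuple

INSTRUCTIONS = (
    "* You are an information extraction model.\n"
    "* Your task is to analyze the extracted text and extract relevant information such as weight or height.\n"
    "* Provide the extracted information along with a brief explanation of your reasoning.\n"
)

# (input text, expected answer) pairs taken from the notebook's few-shot prompt.
EXAMPLES = [
    ("The bottle weighs 500g.", "500g"),
    ("The height of the box is 25 cm.", "25 cm"),
    ("Net weight: 709 gm.", "709 gm"),
]

def build_prompt(extracted_text: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Zero-shot when `examples` is empty, few-shot otherwise; everything else is identical."""
    parts = [INSTRUCTIONS]
    if examples:
        parts.append("Here are a few examples:")
        for i, (text, answer) in enumerate(examples, start=1):
            parts.append(f"{i}. Extracted Text: '{text}'\n   Extracted Information: {answer}\n")
    parts.append(f"Extracted Text: {extracted_text}")
    return "\n".join(parts)
```

With the notebook's `llm`, `llm.invoke(build_prompt(extracted_text))` would send the zero-shot prompt, and `llm.invoke(build_prompt(extracted_text, EXAMPLES))` would send the few-shot one.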

Shot Prompting

Zero-Shot Prompting

Few-Shot Prompting

Image 1

Doing it for another image