
PABLO AI

Automotive Parts Intelligence Platform

🤖

Pablo AI Assistant

Automotive Parts Expert • Always Online

🤖

👋 Hi! I'm Pablo, your AI automotive parts expert. I'm trained on 577+ million part fitment records!

I can help you find the perfect parts for your vehicle, verify fitment, and answer any automotive questions. What are you working on today?

Pablo AI • Just now

I need an oil filler cap for my 1986 Hyundai Excel with the 1.5L engine

You • Just now
👤
🤖

🎯 Great! I found 4 compatible oil filler caps for your 1986 Hyundai Excel 1.5L:

🔧
Beck/Arnley 016-0050 (Best Match)

Engine Oil Filler Cap • OE Exact Grade

$8.99 • Fits 326 applications
🔧
Gates 31085 (Premium)

Engine Oil Filler Cap • Twist Lock

$12.49 • Fits 546 applications

💡 Pro Tip: The Beck/Arnley 016-0050 has the exact OE specifications (1.41" neck diameter, twist-lock type) for your Excel's 1.5L SOHC engine.

Pablo AI • Just now • Searched 577M records in 0.3s

📊 Real-time Metrics

Response Accuracy 99.2%
Avg Response Time 0.3s
Fitment Precision 99.7%

🧠 Knowledge Base

577M
Fitment Records
2.1M
Part Numbers
45K
Vehicle Models
850+
Brands
๐Ÿ—๏ธ

Pablo's Brain: How It Works

Understanding the LLM Architecture in Simple Terms

๐Ÿฝ๏ธ Think of Pablo Like a Super-Smart Restaurant Waiter

Imagine a restaurant waiter who has memorized every dish from every restaurant in the world, knows every ingredient, and can instantly tell you what dishes match your dietary needs. That's Pablo โ€” but for auto parts.

When you ask "I need an oil cap for my 1986 Hyundai Excel," Pablo doesn't just randomly guess. It uses a sophisticated system with three main parts working together, just like how a great restaurant has a host, waiter, and chef working as a team.

🧩 The Three Parts of Pablo's Brain

🧠

Part 1: The Brain (Llama 3.1 70B Model)

This is the core "thinking" part of Pablo. It's a Large Language Model (LLM) created by Meta, called Llama 3.1, with 70 billion parameters. Think of parameters like brain cells: more parameters mean more capacity to understand and generate complex responses.

What Does "70 Billion Parameters" Mean?

Imagine a massive spreadsheet with 70 billion numbers. Each number has been carefully adjusted during training so that when you feed text in, the model produces intelligent responses. These numbers encode everything the model has learned about language, logic, and knowledge.

70B
Parameters
128K
Context Window
~140GB
Model Size

🎯 Why Llama 3.1 70B? It's the sweet spot between being smart enough to understand complex automotive questions and being small enough to run on our hardware. Larger models (like 405B) are smarter but require massive data centers. Smaller models (8B) are faster but less accurate.
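The "~140GB" model size above is simple arithmetic: parameter count times bytes per parameter. A quick sketch in Python (rough figures; real deployments add KV-cache and runtime overhead on top):

# Memory footprint = parameter count × bytes per parameter.
PARAMS = 70e9

for precision, bytes_per_param in [("FP32", 4), ("BF16", 2), ("INT4", 0.5)]:
    gib = PARAMS * bytes_per_param / 1024**3
    print(f"{precision}: ~{gib:.0f} GB")

# BF16: ~130 GB → the "~140GB" above once overhead is included.
# INT4: ~33 GB  → matches the ~35GB quantized model used for deployment.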

📚

Part 2: The Library (RAG Vector Database)

Even with 70 billion parameters, the brain can't memorize ALL 577 million parts. So we give it a "reference library" it can search instantly. This technique is called RAG (Retrieval-Augmented Generation).

How RAG Works: A Simple Analogy

Imagine you're taking an open-book exam. You're smart and know the concepts, but you're allowed to look up specific facts in your textbook. RAG works the same way:

1

User asks: "Oil cap for 1986 Hyundai Excel 1.5L"

2

System searches 577M records and finds the 10 most relevant parts

3

These 10 parts are given to the brain along with the question

4

The brain crafts a helpful response using this specific information

Vector Database

We use Pinecone, a specialized database that can search through 577 million records in under 50 milliseconds by comparing mathematical "fingerprints" of text.

Embeddings

Each part record is converted into a 1024-number "fingerprint" using the E5-large-v2 model. Similar parts have similar fingerprints, making search fast and accurate.
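A minimal sketch of that retrieval step (the index name "parts-fitment" and the metadata fields are illustrative assumptions, not the production schema):

# Embed the question, then ask Pinecone for the nearest "fingerprints".
from sentence_transformers import SentenceTransformer
from pinecone import Pinecone

embedder = SentenceTransformer("intfloat/e5-large-v2")   # 1024-dim vectors
pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("parts-fitment")

# E5 models expect a "query: " prefix on search text.
vec = embedder.encode("query: oil filler cap 1986 Hyundai Excel 1.5L")

results = index.query(
    vector=vec.tolist(),
    top_k=10,                                  # the 10 best matches
    include_metadata=True,
    filter={"year": 1986, "make": "HYUNDAI"},  # optional metadata pre-filter
)
for match in results.matches:
    print(round(match.score, 3), match.metadata)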

🎓

Part 3: The Specialized Training (QLoRA Fine-tuning)

The base Llama model is smart but doesn't know anything about auto parts. Fine-tuning is how we teach it automotive expertise by showing it 150,000 example conversations about parts, fitment, and vehicle specifications.

The "Sticky Notes" Technique (QLoRA):

Retraining all 70 billion parameters would take weeks and cost hundreds of thousands of dollars. Instead, we use a clever technique called QLoRA:

Imagine the base model as a massive filing cabinet with 70 billion folders. Instead of rewriting all folders, we attach small "sticky notes" (about 400 million of them) to key folders. When the AI looks something up, it reads both the main folder AND the sticky note.

This means we only train 0.5% of the model, making it possible to train on 4 consumer GPUs instead of a massive data center!

LoRA Rank
64
Adapter size
Training Data
150K
Q&A pairs
Training Time
~72 hrs
On 4× RTX 4090

🔄 How All Three Parts Work Together

When a customer asks: "I need an oil filler cap for my 1986 Hyundai Excel 1.5L"

💬 Customer Query (natural language) → 🔍 Extract Info (year, make, model) → 📚 Search Library (find 10 best parts) → 🧠 Brain Processes (Llama 3.1 + QLoRA) → ✅ Smart Response (parts + advice)

🎯 The Result:

In about 0.3 seconds, Pablo combines its specialized automotive training with a search across 577 million parts and generates a helpful, accurate response: the right oil filler cap (Beck/Arnley 016-0050) with detailed specifications, alternative options, and pro tips. It's like having an expert parts specialist available 24/7.

โš™๏ธ Technical Specifications Summary

🧠

Base Model

Model: Llama 3.1 70B
Parameters: 70 Billion
Context Window: 128K tokens
Precision: BF16 / INT4
🎯

Fine-tuning

Method: QLoRA
LoRA Rank: 64
LoRA Alpha: 128
Training Data: 150K Q&A
📊

RAG System

Embedding: E5-large-v2
Vector DB: Pinecone
Total Vectors: 577M
Top-K Retrieval: 10
📚

How We Train Pablo AI

A Simple Guide to Understanding LLM Training

🎓 Think of It Like Training a New Employee

Imagine you hired a brilliant new employee who has read millions of books and websites: they're incredibly smart and can write, code, and answer questions about almost anything. But they know nothing about auto parts.

That's exactly what the base Llama 3.1 model is like. It's incredibly intelligent, but it doesn't know that a 016-0050 oil filler cap fits a 1986 Hyundai Excel.

Training is how we teach this brilliant employee everything about automotive parts: which parts fit which vehicles, what the part numbers mean, and how to help customers find the right parts.

🔄 The Complete Training Journey

1

📥 Gathering the Knowledge (Raw Data Collection)

First, we collect ALL the automotive knowledge that exists. This comes from ACES files, the industry-standard format that every auto parts manufacturer uses to share their catalog data.

Real ACES Data Example:

When Beck/Arnley says "our part 016-0050 fits the 1986 Hyundai Excel", it looks like this in their catalog file:

<App id="5223">
  <BaseVehicle id="139" />   ← This number means "1986 Hyundai Excel"
  <EngineBase id="1062" />   ← This means "1.5L 4-cylinder SOHC engine"
  <PartType id="5269" />     ← This means "Oil Filler Cap"
  <Part>016-0050</Part>      ← The actual part number
</App>

We have 577 million of these records from 850+ brands!

577M
Fitment Records
2.1M
Unique Parts
45K
Vehicle Types
~500GB
Raw Data
2

🧹 Making the Data Human-Readable (Data Processing)

Those cryptic codes like BaseVehicle id="139" mean nothing to a human or AI. We need to translate them into actual vehicle and part information by joining with reference databases.

Before → After Translation:
❌ Raw (Unreadable):
BaseVehicle: 139
EngineBase: 1062
PartType: 5269
Part: 016-0050
✅ Processed (Human-Readable):
Make: HYUNDAI
Model: EXCEL  
Year: 1986
Engine: 1.5L L4 SOHC
Product: Engine Oil Filler Cap
Part Number: 016-0050
Brand: Beck/Arnley

This processed data goes into our euro table, a massive database with 50+ columns of vehicle and part attributes for every fitment record.
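A sketch of the translation itself, with tiny in-memory dictionaries standing in for the real VCdb/PCdb reference tables (in production this is a SQL join):

# Translate raw ACES IDs into human-readable fields.
BASE_VEHICLES = {"139": {"Year": 1986, "Make": "HYUNDAI", "Model": "EXCEL"}}
ENGINE_BASES = {"1062": "1.5L L4 SOHC"}
PART_TYPES = {"5269": "Engine Oil Filler Cap"}

def translate(raw, brand):
    vehicle = BASE_VEHICLES[raw["base_vehicle_id"]]
    return {
        **vehicle,
        "Engine": ENGINE_BASES[raw["engine_base_id"]],
        "Product": PART_TYPES[raw["part_type_id"]],
        "Part Number": raw["part_number"],
        "Brand": brand,
    }

raw = {"base_vehicle_id": "139", "engine_base_id": "1062",
       "part_type_id": "5269", "part_number": "016-0050"}
print(translate(raw, brand="Beck/Arnley"))
# {'Year': 1986, 'Make': 'HYUNDAI', 'Model': 'EXCEL', 'Engine': '1.5L L4 SOHC', ...}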

3

โœ๏ธ Creating Lesson Plans (Q&A Dataset Generation)

Now here's the key insight: you can't just show an AI raw data and expect it to learn. You need to create example conversations โ€” showing the AI "when a customer asks THIS, you should respond like THIS."

Training Example (Q&A Pair):
👤 USER (Input):

"I need an oil filler cap for my 1986 Hyundai Excel with a 1.5 liter engine"

🤖 PABLO (Expected Output):

"For your 1986 Hyundai Excel 1.5L SOHC, I recommend the Beck/Arnley 016-0050 Engine Oil Filler Cap. This is an OE-exact replacement with a twist-lock design, 1.41" neck diameter. It fits all Excel trims (Base, GL, GLS) for the USA and Canadian markets. The Gates 31085 is also compatible if you prefer an alternative brand."

We generate 150,000+ of these training examples programmatically from our 577M records!

💡 Analogy: This is like creating a training manual with 150,000 example customer conversations. "When a customer says X, the perfect response is Y."
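A simplified sketch of that programmatic generation (the real generator varies the wording across many templates so the model learns the pattern rather than one phrasing):

# Turn one processed fitment record into an instruction-tuning Q&A pair.
def make_qa_pair(rec):
    question = (
        f"I need a {rec['Product'].lower()} for my {rec['Year']} "
        f"{rec['Make'].title()} {rec['Model'].title()} with a {rec['Engine']} engine"
    )
    answer = (
        f"For your {rec['Year']} {rec['Make'].title()} {rec['Model'].title()} "
        f"{rec['Engine']}, I recommend the {rec['Brand']} "
        f"{rec['Part Number']} {rec['Product']}."
    )
    return {"instruction": question, "output": answer}

rec = {"Year": 1986, "Make": "HYUNDAI", "Model": "EXCEL",
       "Engine": "1.5L L4 SOHC", "Product": "Engine Oil Filler Cap",
       "Part Number": "016-0050", "Brand": "Beck/Arnley"}
print(make_qa_pair(rec))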

4

🔢 Converting Text to Numbers (Tokenization)

Computers don't understand words; they only understand numbers. We need to convert every word into tokens (numbers) that the AI can process.

Tokenization Example:
Original text:

"oil filler cap for Hyundai Excel"

After tokenization:

[8435, 1254, 6623, 369, 19478, 35621]

The Llama tokenizer has a vocabulary of 128,000 tokens, covering every word and subword it might encounter.
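A sketch of the same round trip with the Hugging Face tokenizer (Llama 3.1 weights are gated, so this needs an approved account; the IDs you get will differ from the illustrative numbers above):

# Encode text to token IDs and decode back.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")

ids = tok.encode("oil filler cap for Hyundai Excel", add_special_tokens=False)
print(ids)              # a short list of integers, one per token
print(tok.decode(ids))  # "oil filler cap for Hyundai Excel"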

5

🧠 The Actual Training (Fine-tuning with QLoRA)

This is where the magic happens. We load the base Llama 3.1 70B model and teach it using our 150,000 Q&A examples. But here's the clever part: we don't modify all 70 billion parameters (that would require massive computing power). Instead, we use a technique called QLoRA.

💡 The QLoRA Technique Explained Simply:

Imagine the base model as a massive filing cabinet with 70 billion folders (parameters). Rewriting all those folders would take forever.

Instead, QLoRA adds a small "sticky note system": about 400 million small notes attached to key folders. When the AI looks something up, it checks the main folder AND reads the sticky note. The sticky notes contain all our automotive knowledge!

This means we only need to train 0.5% of the model (the sticky notes), making it possible to train on consumer GPUs instead of requiring a massive data center.

🔧 Training Configuration:
Base Model
Llama 3.1 70B
Loaded in 4-bit (~35GB)
LoRA Adapters
Rank 64, Alpha 128
~400M trainable params
Training Data
150,000 Q&A pairs
3 epochs (passes)
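In code, that configuration looks roughly like this (the target_modules choice is a common convention, not necessarily the production setting):

# QLoRA setup sketch: 4-bit base model + rank-64 LoRA adapters via PEFT.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                     # base weights in 4-bit (~35GB)
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B-Instruct",
    quantization_config=bnb,
    device_map="auto",                     # shard across the available GPUs
)
lora = LoraConfig(
    r=64,                                  # LoRA rank
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()         # roughly 0.5% of all parameters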
6

โš™๏ธ What Happens Inside During Training

During training, the computer repeats this cycle hundreds of thousands of times:

A
Show Example

Feed one Q&A pair: "User asks about Excel oil cap → Expected response about 016-0050"

B
Model Attempts

The AI generates its own response based on current knowledge

C
Calculate Error

Compare the AI's response to the expected response: how different are they?

D
Adjust "Sticky Notes"

Slightly update the LoRA adapter weights to reduce the error

E
Repeat!

Do this for all 150,000 examples, 3 times (epochs) = ~450,000 update cycles

🎯 Result: After 72 hours, the LoRA adapters have been tuned so precisely that the AI now understands automotive parts as well as your best employee, but it can answer instantly and never forgets!
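At its core, that A-to-E cycle is a few lines of PyTorch. A bare-loop sketch, assuming the model from the QLoRA setup above and a tokenized train_dataset (a real run would use a trainer with the gradient accumulation of 8 configured earlier):

# The A-E training cycle as a minimal PyTorch loop.
import torch
from torch.utils.data import DataLoader

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad),  # adapters only
    lr=2e-4,
)

for epoch in range(3):                       # 3 epochs over 150K pairs
    for batch in DataLoader(train_dataset, batch_size=1):
        out = model(**batch)                 # B: model attempts a response
        loss = out.loss                      # C: error vs. expected tokens
        loss.backward()                      # D: adjust the "sticky notes"
        optimizer.step()                     #    (only LoRA weights change)
        optimizer.zero_grad()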

7

📊 Adding the Reference Library (RAG Vector Database)

Even after training, the AI can't memorize ALL 577 million parts. So we give it a "reference library" it can search instantly. This is called RAG (Retrieval-Augmented Generation).

How RAG Works:
1๏ธโƒฃ

Convert each fitment record into a numerical "fingerprint" (embedding vector)

2๏ธโƒฃ

Store all 577M fingerprints in a vector database (Pinecone)

3๏ธโƒฃ

When user asks a question, find the 10 most relevant records instantly

4๏ธโƒฃ

Feed those records to the AI along with the question

This way, Pablo can access any of the 577 million records in under 50 milliseconds, without needing to memorize them all!
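The last step, packing the retrieved records in front of the model, can be sketched like this (the metadata field names are illustrative):

# Build a grounded prompt so the model answers from retrieved facts,
# not from memory.
def build_prompt(question, matches):
    context = "\n".join(
        f"- {m.metadata['Brand']} {m.metadata['Part Number']}: "
        f"{m.metadata['Product']} for {m.metadata['Year']} "
        f"{m.metadata['Make']} {m.metadata['Model']} ({m.metadata['Engine']})"
        for m in matches
    )
    return (
        "You are Pablo, an automotive parts expert. Answer using ONLY the "
        f"fitment records below.\n\nFitment records:\n{context}\n\n"
        f"Customer question: {question}"
    )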

🎯 The Complete Training Pipeline

📥 Raw ACES (500GB XML) → 🧹 Process (577M records) → ✍️ Generate Q&A (150K pairs) → 🔢 Tokenize (numbers) → 🧠 Train ~72 hrs (4× RTX 4090) → 📊 RAG Index (577M vectors) → 🤖 Pablo AI (ready!)
🖥️

Local GPU Training Setup

Train on Your Own Hardware → Deploy to Cloud

๐Ÿ† RECOMMENDED

Best Performance Setup

GPU
4× RTX 4090
96GB Total VRAM
CPU
Threadripper PRO
5975WX 32-Core
RAM
256GB DDR4
ECC 3200MHz
Storage
4TB NVMe
PCIe Gen5
~72 hours
Training Time
~1,800W
Peak Power
~$15,000
GPU Cost
💰 BUDGET

Cost-Effective Alternative

2× RTX 4090
48GB VRAM
Ryzen 9 7950X
16-Core
128GB DDR5
Non-ECC
~144 hours
Training Time

📊 GPU Memory Usage

GPU 0: 22.1GB / 24GB
GPU 1: 21.8GB / 24GB
GPU 2: 22.3GB / 24GB
GPU 3: 21.9GB / 24GB

โš™๏ธ Training Config

MethodQLoRA (4-bit)
LoRA Rank64
Batch Size1 ร— 8 grad accum
Learning Rate2e-4
Epochs3

๐Ÿ› ๏ธ Software Stack

Training

PyTorch2.2.0+cu121
Transformers4.40.0
PEFT0.10.0
bitsandbytes0.43.0

CUDA

CUDA12.1
cuDNN8.9.7
Flash Attention2.5.0
Driver545.xx

Inference

vLLM0.4.0
AutoAWQ0.2.0
FastAPI0.110.0
LangChain0.1.10
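On the inference side, those pieces connect simply: vLLM serves the quantized model behind its OpenAI-compatible HTTP API, and FastAPI fronts it. A minimal sketch (the endpoint shape, local URL, and served model name are illustrative, not the production API):

# Minimal chat endpoint: FastAPI in front of a vLLM OpenAI-compatible
# server assumed to be listening on localhost:8000.
from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
llm = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")
def chat(req: ChatRequest):
    resp = llm.chat.completions.create(
        model="pablo-llama-3.1-70b-awq",   # served model name (assumption)
        messages=[{"role": "user", "content": req.message}],
    )
    return {"reply": resp.choices[0].message.content}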
โ˜๏ธ

AWS Inference (Cost-Optimized)

Train Locally → Deploy to AWS for Inference Only

💡

Cost Optimization Strategy

By training locally on your own hardware, you eliminate expensive GPU training costs on AWS. Cloud is used purely for scalable inference.

Training
LOCAL
One-time HW cost
Inference
AWS
Pay-per-use
Savings
~85%
vs full cloud training

๐Ÿ—๏ธ Minimum Viable Infrastructure

🤖
LLM Inference
1× g5.12xlarge
4× A10G (96GB)
~$5.67/hr
⚡
API Workers
2× t3.large
Auto-scaling
~$0.17/hr
🗃️
Database
RDS db.t3.large
PostgreSQL
~$0.15/hr
📊
Vector DB
Pinecone s1
Starter Pod
~$70/mo

💰 Monthly Cost (On-Demand)

g5.12xlarge (730 hrs): $4,139
API Workers: $248
RDS PostgreSQL: $109
Pinecone: $70
S3 + Transfer: $50
Total: ~$4,616/mo
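Each line item is hourly-rate arithmetic over a 730-hour month, easy to sanity-check:

# On-demand monthly cost check (730 hours/month).
HOURS = 730
print(f"g5.12xlarge: ${5.67 * HOURS:,.0f}")      # ~$4,139
print(f"API workers: ${2 * 0.17 * HOURS:,.0f}")  # ~$248
print(f"RDS:         ${0.15 * HOURS:,.0f}")      # ~$110 (listed as $109)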

💡 Cost Reduction Options

Spot Instances (~60-70% off)
GPU: ~$1,500/mo
Reserved Instances (~40% off)
1-year commitment: ~$2,500/mo
Auto-Scaling
Scale down during low traffic
⚡ Optimized Total: ~$2,000/mo
📅

Comprehensive Development Plan

10-Week Sprint Plan with Detailed Deliverables

10
Weeks Total
5
Major Phases
20+
Deliverables
MVP
Final Goal

📊 Project Timeline Overview

Phase 1: W1-2 • Phase 2: W3-4 • Phase 3: W5-6 • Phase 4: W7-8 • Phase 5: W9-10
📅 Weeks 1-2

Phase 1: Research & Planning

Foundation Setup & Architecture Design

14 Days
~80 Hours

📋 Week 1: Data Assessment

✓ ACES/VCdb Data Audit
Inventory all 500GB+ of raw XML files, validate data quality, identify missing brands
✓ Database Schema Analysis
Review existing euro table structure, plan optimizations for 577M records
✓ Sample Q&A Generation Test
Create 100 sample training pairs to validate approach

🏗️ Week 2: Architecture Design

✓ LLM Model Selection
Benchmark Llama 3.1 70B vs 8B, finalize QLoRA approach
✓ RAG Architecture Design
Design vector DB schema, embedding strategy, retrieval pipeline
✓ Hardware Procurement
Order 4× RTX 4090 GPUs, verify workstation specs
📦 Phase 1 Deliverables:
✓ Data Quality Report
✓ Architecture Document
✓ Hardware Setup Complete
📅 Weeks 3-4

Phase 2: Data Pipeline & Vector Database

Process 577M Records & Build RAG Infrastructure

14 Days
~100 Hours

🔄 Week 3: Data Processing

✓ ACES XML Parser Development
Build Python parser to extract all App records from XML files
✓ VCdb/PCdb Join Pipeline
Translate IDs to human-readable values (Make, Model, Year, etc.)
✓ Euro Table Population
Load all 577M processed records into PostgreSQL

📊 Week 4: Vector Database

✓ Embedding Generation Pipeline
Convert 577M records to E5-large-v2 embeddings (batch processing)
✓ Pinecone Index Setup
Create index, configure metadata filters, upload vectors
✓ RAG Retrieval Testing
Validate search accuracy with 1000 test queries
📦 Phase 2 Deliverables:
✓ 577M Records in PostgreSQL
✓ 577M Vectors in Pinecone
✓ RAG Pipeline Working
📅 Weeks 5-6

Phase 3: Model Fine-tuning (LOCAL)

Train Pablo AI on 4× RTX 4090 GPUs

14 Days
~72 hrs training

✍️ Week 5: Dataset Preparation

✓ Q&A Dataset Generation
Generate 150,000 training pairs from euro table data
✓ Dataset Formatting (JSONL)
Format for Llama instruction tuning, add system prompts
✓ Training Environment Setup
Install PyTorch, PEFT, bitsandbytes, configure multi-GPU

🧠 Week 6: Training & Export

✓ QLoRA Fine-tuning (~72 hours)
Train on 4× RTX 4090, 3 epochs, monitor loss curves
✓ Model Evaluation
Test on 500 held-out examples, measure accuracy metrics
✓ AWQ Quantization
Convert to 4-bit for efficient deployment (~35GB)
📦 Phase 3 Deliverables:
✓ 150K Training Dataset
✓ Fine-tuned LoRA Adapters
✓ AWQ Quantized Model
📅 Weeks 7-8

Phase 4: API Development & Integration

Build Production-Ready Backend Services

14 Days
~100 Hours

⚡ Week 7: Inference Server

✓ vLLM Server Setup
Deploy model with tensor parallelism, optimize throughput
✓ RAG Pipeline Integration
Connect Pinecone retrieval with LLM generation
✓ Response Validation
Build fitment verification against the ACES database

🔌 Week 8: API & UI

✓ FastAPI Endpoints
REST API for chat, part lookup, fitment verification
✓ WebSocket Streaming
Real-time response streaming for the chat interface
✓ Chat UI Development
React frontend with conversation history, product cards
📦 Phase 4 Deliverables:
✓ vLLM Inference Server
✓ REST/WebSocket API
✓ Chat UI Prototype
📅 Weeks 9-10

Phase 5: AWS Deployment & Launch

Production Deployment & Go-Live

14 Days
~80 Hours

☁️ Week 9: Infrastructure

✓ AWS Infrastructure Setup
Deploy g5.12xlarge, RDS, ElastiCache via Terraform
✓ Model Upload to S3
Transfer the 35GB quantized model, configure EBS volumes
✓ Auto-Scaling Configuration
Set up CloudWatch alarms, scaling policies

🚀 Week 10: Launch

✓ Load Testing
Stress test with 100 concurrent users, optimize latency
✓ Monitoring & Logging
Set up CloudWatch dashboards, error tracking
✓ 🚀 Production Launch!
Go live with the Pablo AI MVP
📦 Phase 5 Deliverables:
✓ AWS Infrastructure Live
✓ Monitoring Dashboard
✓ 🎉 Pablo AI MVP Live!

โš ๏ธ Key Risks & Mitigation

Risk: GPU Hardware Delays
Mitigation: Order hardware in Week 1, have cloud backup (RunPod)
Risk: Training Quality Issues
Mitigation: Weekly eval checkpoints, prepared to extend Phase 3
Risk: Vector DB Costs
Mitigation: Start with Pinecone s1, scale if needed; pgvector as fallback
Risk: Data Quality Gaps
Mitigation: Phase 1 audit identifies gaps, prioritize top 80% brands

🎯 Success Metrics (MVP)

>95%
Fitment Accuracy
<1s
Response Time
100
Concurrent Users
99.5%
Uptime Target