Apply your knowledge to build something amazing!
:information_source: Project Overview
Difficulty Level: Intermediate
Estimated Time: 2-3 hours
Skills Practiced:
- JavaScript programming
- AI API integration
- WebGPU technology
- Prompt engineering
- Real-time web applications
- Debugging and troubleshooting
What is Hugging Face? Hugging Face is like the "GitHub for AI" - a platform where developers and researchers share AI models, datasets, and applications. Think of it as a library where you can borrow powerful AI tools instead of building them from scratch.
Key Features:
- Model Hub: thousands of pre-trained models you can download or run directly
- Datasets: ready-to-use data for training and evaluation
- Spaces: hosted demos you can try right in the browser
- Open-source libraries such as Transformers (and Transformers.js for running models on the web)
Why Use Hugging Face? Instead of spending months training an AI model, you can use existing models that experts have already created and tested. It's like using a calculator app instead of building your own calculator!
SmolVLM Model: The model we'll use today is called "SmolVLM" (Small Vision Language Model). It's designed to be lightweight and run directly in web browsers while still being powerful enough to understand and describe images.
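To give you a feel for what "run directly in web browsers" means, here is a rough sketch of loading SmolVLM with Hugging Face's Transformers.js library. The starter code you'll complete later does something similar, though its exact imports and options may differ:

// Sketch only - assumes the @huggingface/transformers package (Transformers.js v3+).
import { AutoProcessor, AutoModelForVision2Seq } from '@huggingface/transformers';

const modelId = 'HuggingFaceTB/SmolVLM-500M-Instruct'; // model ID on the Hugging Face Hub
const processor = await AutoProcessor.from_pretrained(modelId); // prepares images + prompts for the model
const model = await AutoModelForVision2Seq.from_pretrained(modelId, {
  device: 'webgpu', // run on the GPU through WebGPU instead of CPU/WASM
});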
graph LR
A[Start] --> B[Setup & Analysis]
B --> C[Code Completion]
C --> D[Testing & Debug]
D --> E[Experimentation]
E --> F[Enhancement]
F --> G[Presentation]
style A fill:#f9f,stroke:#333,stroke-width:2px
style G fill:#9f9,stroke:#333,stroke-width:2px
You will build a real-time AI vision application that uses your webcam to see and describe the world around you. This project combines web development, AI integration, and user interface design to create a practical application that demonstrates how modern AI can be embedded into everyday tools.
:warning: Prerequisites Check Before starting, ensure you have:
- :white_check_mark: Modern web browser with WebGPU support (Chrome 113+ or Safari Technology Preview)
- :white_check_mark: Working webcam connected to your computer
- :white_check_mark: HTTPS connection or localhost environment
- :white_check_mark: Stable internet connection (for downloading AI models)
- :white_check_mark: At least 2GB of free RAM for model loading
Not sure if your browser supports WebGPU? Visit webgpu.io to check!
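You can also check programmatically with the standard WebGPU API; paste this into the browser console:

// Reports whether this browser exposes WebGPU and can find a usable GPU adapter.
if (!navigator.gpu) {
  console.log('WebGPU is not available in this browser.');
} else {
  const adapter = await navigator.gpu.requestAdapter();
  console.log(adapter ? 'WebGPU is ready!' : 'WebGPU is exposed, but no suitable GPU adapter was found.');
}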
Focus: Understanding the codebase and making it work
:bulb: Getting Started Don't worry if this looks complex at first! We'll tackle it step by step. Remember, even experienced developers debug their code - it's part of the learning process!
Code Analysis Session
Fill-in-the-Blanks Challenge
First Run & Troubleshooting
Before moving to the next phase, ensure:
Location: Lines with _______ in the title and heading
Task: Choose a creative name for your AI camera app
:bulb: Naming Your App Think about what makes your app special! A good name should be:
- Memorable and catchy
- Related to AI or vision
- Easy to understand
Examples:
Location: instructionText.value = "_______";
Task: Set what question the AI should answer about the camera feed
Hint: Look at the original code for the default
Try these alternatives:
Location: const modelId = "_______";
Task: Choose which AI model to use
Hint: Look for "HuggingFaceTB" in the original code
Alternatives:
"HuggingFaceTB/SmolVLM-500M-Instruct"
"HuggingFaceTB/SmolVLM-256M-Instruct"
(smaller, faster)Location: stream = await navigator.mediaDevices._______({
Task: Fill in the method name and the true/false values
Hint: We want to get user media (camera)
:warning: Common Mistake Make sure you spell the method name correctly! It's getUserMedia (not getusermedia or get_user_media).
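For reference, a completed call typically looks like the sketch below; match the variable names (here videoElement is an assumed reference to the page's video tag) to your starter code:

// Ask the browser for camera access - video only, no audio.
// The browser shows a permission prompt the first time this runs.
stream = await navigator.mediaDevices.getUserMedia({
  video: true,
  audio: false,
});
videoElement.srcObject = stream; // show the live feed in the page's <video> element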
Location: _______: _______, in the generate function
Task: Set the maximum number of tokens (roughly, pieces of words) the AI can generate in its response
Hint: Starts with "max_new_tokens"
Experiment: Try different numbers like 50 or 150
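For context, Transformers.js usually takes generation options alongside the processed inputs; a hedged sketch (the exact variable names and call in your starter code may differ):

// max_new_tokens caps how many tokens the model may generate for one reply.
const generatedIds = await model.generate({
  ...inputs,           // processed image + prompt tensors from the processor
  max_new_tokens: 100, // try 50 for terse answers or 150 for more detail
});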
Location: isProcessing = _______; in both handleStart and handleStop
Task: Set the processing state correctly
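The flag simply records whether the capture loop should keep running; a minimal sketch using the handler names mentioned above:

let isProcessing = false;

function handleStart() {
  isProcessing = true;  // the capture loop checks this flag before each frame
  // ...start grabbing frames and sending them to the model...
}

function handleStop() {
  isProcessing = false; // the loop stops at its next check
}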
Location: await _______(); calls at the bottom
Task: Call the initialization functions in the right order
:bulb: Order Matters! Think about what needs to happen first:
- Load the AI model (this takes time)
- Set up the camera (needs user permission)

Remember: You can't process images until both are ready! (See the sketch below.)
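A sketch of that startup order (initModel and initCamera are placeholder names - use whatever your starter code calls its initialization functions):

// Load the model first (slow, downloads weights), then request the camera.
await initModel();  // can take minutes on the first run while weights download
await initCamera(); // triggers the browser's camera permission prompt
// Only now is it safe to start capturing and describing frames.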
Focus: Understanding AI behavior and prompt engineering
Before starting experimentation:
Prompt Engineering Lab
Try these prompt categories:
Model Comparison Study
Application Customization
:warning: Common Issues & Solutions
:emoji: "WebGPU not available"
- Solution 1: Update to Chrome 113+ or Safari Technology Preview
- Solution 2: Enable WebGPU in browser flags (chrome://flags)
- Solution 3: Check if your GPU supports WebGPU at webgpu.io
:emoji: "Camera not available"
- Solution 1: Click "Allow" when browser asks for camera permission
- Solution 2: Ensure you're using HTTPS or localhost (not file://)
- Solution 3: Close other apps using your camera (Zoom, Teams, etc.)
- Solution 4: Check browser settings -> Privacy -> Camera permissions
Model loading stuck
- Solution 1: Be patient! First load can take 2-5 minutes
- Solution 2: Check console for download progress
- Solution 3: Try the smaller 256M model for faster loading
- Solution 4: Ensure stable internet connection
No response from AI
- Solution 1: Check browser console for errors (F12)
- Solution 2: Verify your instruction syntax is correct
- Solution 3: Try simple prompts like "What do you see?"
- Solution 4: Ensure camera is pointing at visible objects
Once it works, try:
:bulb: Pro Debugging Tip Open the browser console (F12) to see detailed error messages. The console will show you exactly where things go wrong!
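One easy way to make errors visible is to wrap the main processing call in a try/catch; a sketch using the sendData and responseText names referenced elsewhere in this project:

try {
  await sendData(); // capture a frame, send it to the model, display the reply
} catch (err) {
  console.error('AI request failed:', err); // full details appear in the console
  responseText.textContent = 'Something went wrong - check the console (F12).';
}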
Focus: Adding new features and improving user experience
Ready for enhancements when:
Add buttons for preset instructions:
:bulb: Implementation Guide This enhancement makes your app more user-friendly by providing quick access to common questions!
// Add after line with instructionText.value
const presetButtons = [
"What colors do you see?",
"How many people are in this image?",
"Describe the lighting in this scene",
"What objects are on the table?"
];
// Create buttons for each preset
presetButtons.forEach(preset => {
const btn = document.createElement('button');
btn.textContent = preset;
btn.onclick = () => instructionText.value = preset;
document.body.appendChild(btn); // add to your UI (or to a dedicated container element)
});
Save previous responses:
:bulb: Why Add History? This feature helps you track how the AI's understanding changes as you move the camera or adjust lighting!
let responseHistory = [];
// Add to sendData function after getting reply
responseHistory.push({
instruction: instruction,
response: reply,
timestamp: new Date()
});
// Display history
function showHistory() {
responseHistory.forEach(item => {
console.log(`[${item.timestamp}] Q: ${item.instruction} A: ${item.response}`);
});
}
Modify the instruction to ask for confidence:
instructionText.value = "What do you see? Rate your confidence from 1-10.";
// Parse confidence from response
function parseConfidence(response) {
const match = response.match(/\b([1-9]|10)\b/);
return match ? parseInt(match[0]) : null;
}
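You could then act on the parsed value wherever the reply is handled, for example:

const confidence = parseConfidence(reply); // reply is the model's answer from sendData
if (confidence !== null && confidence < 5) {
  console.log('Low-confidence answer - try better lighting or hold the camera steadier.');
}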
User Experience Testing
Documentation & Code Comments
Ready for more? Try these advanced challenges:
Make your AI respond in different languages:
// Add language selector (in a real UI this would come from a <select> element)
const languages = ['English', 'Spanish', 'French', 'Chinese'];
const selectedLanguage = languages[1]; // e.g. Spanish - replace with the user's choice
instructionText.value = `Describe what you see in ${selectedLanguage}`;
Create a feature that counts specific objects:
instructionText.value = "Count all the books you can see and list their colors";
Add voice output for visually impaired users:
// Use Web Speech API
function speakResponse(text) {
const utterance = new SpeechSynthesisUtterance(text);
speechSynthesis.speak(utterance);
}
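To wire this in, you could call speakResponse(reply) wherever the model's answer arrives (for example at the end of sendData), so each new description is read aloud automatically.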
Save and display previous captures:
const gallery = [];
function saveSnapshot() {
const imageData = canvas.toDataURL();
gallery.push({
image: imageData,
description: responseText.textContent,
timestamp: new Date()
});
}
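To display the saved captures, you could render the gallery into the page; this sketch assumes you add a container such as <div id="gallery-container"></div> to your HTML:

function renderGallery() {
  const container = document.getElementById('gallery-container');
  container.innerHTML = ''; // redraw from scratch each time
  gallery.forEach(item => {
    const img = document.createElement('img');
    img.src = item.image;
    img.width = 160;
    const caption = document.createElement('p');
    caption.textContent = `${item.timestamp.toLocaleTimeString()}: ${item.description}`;
    container.append(img, caption);
  });
}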
:bulb: Challenge Yourself! These extension challenges will help you stand out and demonstrate advanced understanding. Pick one that interests you most!
Focus: Showcasing your work and reflecting on learning
Ready to present when:
Live Demonstration
Technical Explanation
Real-World Applications
Q&A Session
:warning: Important Reminder Don't skip the debugging process! Learning to fix errors is just as important as writing code. Every developer faces bugs - what matters is how you solve them!
:bulb: Final Words of Encouragement You've got this! Building AI applications might seem challenging at first, but remember:
- Every expert was once a beginner
- Debugging makes you a better programmer
- Your unique perspective brings value to the project
- The AI community is here to help you succeed!
If something doesn't work, read the error messages carefully and try to understand what they're telling you. Each bug you fix is a lesson learned!
Have fun building your AI vision application! :rocket: