Apply your knowledge to build something amazing!
:information_source: Project Overview
Difficulty Level: Intermediate
Estimated Time: 2-3 hours
Skills Practiced:
- JavaScript programming
- AI API integration
- WebGPU technology
- Prompt engineering
- Real-time web applications
- Debugging and troubleshooting
What is Hugging Face? Hugging Face is like the "GitHub for AI" - a platform where developers and researchers share AI models, datasets, and applications. Think of it as a library where you can borrow powerful AI tools instead of building them from scratch.
Key Features:
- Model Hub: thousands of pre-trained models you can download or run directly
- Datasets: ready-to-use data for training and evaluation
- Spaces: hosted demos you can try right in the browser
- Open-source libraries such as Transformers (and Transformers.js for running models on the web)
Why Use Hugging Face? Instead of spending months training an AI model, you can use existing models that experts have already created and tested. It's like using a calculator app instead of building your own calculator!
SmolVLM Model: The model we'll use today is called "SmolVLM" (Small Vision Language Model). It's designed to be lightweight and run directly in web browsers while still being powerful enough to understand and describe images.
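To give you a feel for what "run directly in web browsers" means, here is a rough sketch of loading SmolVLM with Hugging Face's Transformers.js library. The starter code you'll complete later does something similar, though its exact imports and options may differ:

// Sketch only - assumes the @huggingface/transformers package (Transformers.js v3+).
import { AutoProcessor, AutoModelForVision2Seq } from '@huggingface/transformers';

const modelId = 'HuggingFaceTB/SmolVLM-500M-Instruct'; // model ID on the Hugging Face Hub
const processor = await AutoProcessor.from_pretrained(modelId); // prepares images + prompts for the model
const model = await AutoModelForVision2Seq.from_pretrained(modelId, {
  device: 'webgpu', // run on the GPU through WebGPU instead of CPU/WASM
});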
graph LR
A[Start] --> B[Setup & Analysis]
B --> C[Code Completion]
C --> D[Testing & Debug]
D --> E[Experimentation]
E --> F[Enhancement]
F --> G[Presentation]
style A fill:#f9f,stroke:#333,stroke-width:2px
style G fill:#9f9,stroke:#333,stroke-width:2px
You will build a real-time AI vision application that uses your webcam to see and describe the world around you. This project combines web development, AI integration, and user interface design to create a practical application that demonstrates how modern AI can be embedded into everyday tools.
:warning: Prerequisites Check Before starting, ensure you have:
- :white_check_mark: Modern web browser with WebGPU support (Chrome 113+ or Safari Technology Preview)
- :white_check_mark: Working webcam connected to your computer
- :white_check_mark: HTTPS connection or localhost environment
- :white_check_mark: Stable internet connection (for downloading AI models)
- :white_check_mark: At least 2GB of free RAM for model loading
Not sure if your browser supports WebGPU? Visit webgpu.io to check!
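You can also check programmatically with the standard WebGPU API; paste this into the browser console:

// Reports whether this browser exposes WebGPU and can find a usable GPU adapter.
if (!navigator.gpu) {
  console.log('WebGPU is not available in this browser.');
} else {
  const adapter = await navigator.gpu.requestAdapter();
  console.log(adapter ? 'WebGPU is ready!' : 'WebGPU is exposed, but no suitable GPU adapter was found.');
}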
Focus: Understanding the codebase and making it work
:bulb: Getting Started Don't worry if this looks complex at first! We'll tackle it step by step. Remember, even experienced developers debug their code - it's part of the learning process!
Code Analysis Session
Fill-in-the-Blanks Challenge
First Run & Troubleshooting
Before moving to the next phase, ensure:
Location: Lines with _______ in the title and heading
Task: Choose a creative name for your AI camera app
:bulb: Naming Your App Think about what makes your app special! A good name should be:
- Memorable and catchy
- Related to AI or vision
- Easy to understand
Examples:
Location: instructionText.value = "_______";
Task: Set what question the AI should answer about the camera feed
Hint: Look at the original code for the default
Try these alternatives:
Location: const modelId = "_______";
Task: Choose which AI model to use
Hint: Look for "HuggingFaceTB" in the original code
Alternatives:
"HuggingFaceTB/SmolVLM-500M-Instruct"
"HuggingFaceTB/SmolVLM-256M-Instruct"
(smaller, faster)Location: stream = await navigator.mediaDevices._______({
Task: Fill in the method name and the true/false values
Hint: We want to get user media (camera)
:warning: Common Mistake Make sure you spell the method name correctly! It's getUserMedia (not getusermedia or get_user_media).
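For reference, a completed call typically looks like the sketch below; match the variable names (here videoElement is an assumed reference to the page's video tag) to your starter code:

// Ask the browser for camera access - video only, no audio.
// The browser shows a permission prompt the first time this runs.
stream = await navigator.mediaDevices.getUserMedia({
  video: true,
  audio: false,
});
videoElement.srcObject = stream; // show the live feed in the page's <video> element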
Location: _______: _______, in the generate function
Task: Set the maximum number of tokens (roughly, pieces of words) the AI can generate in its response
Hint: Starts with "max_new_tokens"
Experiment: Try different numbers like 50 or 150
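For context, Transformers.js usually takes generation options alongside the processed inputs; a hedged sketch (the exact variable names and call in your starter code may differ):

// max_new_tokens caps how many tokens the model may generate for one reply.
const generatedIds = await model.generate({
  ...inputs,           // processed image + prompt tensors from the processor
  max_new_tokens: 100, // try 50 for terse answers or 150 for more detail
});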
Location: isProcessing = _______; in both handleStart and handleStop
Task: Set the processing state correctly
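The flag simply records whether the capture loop should keep running; a minimal sketch using the handler names mentioned above:

let isProcessing = false;

function handleStart() {
  isProcessing = true;  // the capture loop checks this flag before each frame
  // ...start grabbing frames and sending them to the model...
}

function handleStop() {
  isProcessing = false; // the loop stops at its next check
}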
Location: await _______(); calls at the bottom
Task: Call the initialization functions in the right order
:bulb: Order Matters! Think about what needs to happen first:
- Load the AI model (this takes time)
- Set up the camera (needs user permission)

Remember: You can't process images until both are ready! (See the sketch below.)
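A sketch of that startup order (initModel and initCamera are placeholder names - use whatever your starter code calls its initialization functions):

// Load the model first (slow, downloads weights), then request the camera.
await initModel();  // can take minutes on the first run while weights download
await initCamera(); // triggers the browser's camera permission prompt
// Only now is it safe to start capturing and describing frames.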
Focus: Understanding AI behavior and prompt engineering
Before starting experimentation:
Prompt Engineering Lab
Try these prompt categories:
Model Comparison Study
Application Customization
:warning: Common Issues & Solutions
:emoji: "WebGPU not available"
- Solution 1: Update to Chrome 113+ or Safari Technology Preview
- Solution 2: Enable WebGPU in browser flags (chrome://flags)
- Solution 3: Check if your GPU supports WebGPU at webgpu.io
:emoji: "Camera not available"
- Solution 1: Click "Allow" when browser asks for camera permission
- Solution 2: Ensure you're using HTTPS or localhost (not file://)
- Solution 3: Close other apps using your camera (Zoom, Teams, etc.)
- Solution 4: Check browser settings -> Privacy -> Camera permissions
Model loading stuck
- Solution 1: Be patient! First load can take 2-5 minutes
- Solution 2: Check console for download progress
- Solution 3: Try the smaller 256M model for faster loading
- Solution 4: Ensure stable internet connection
No response from AI
- Solution 1: Check browser console for errors (F12)
- Solution 2: Verify your instruction syntax is correct
- Solution 3: Try simple prompts like "What do you see?"
- Solution 4: Ensure camera is pointing at visible objects
Once it works, try:
:bulb: Pro Debugging Tip Open the browser console (F12) to see detailed error messages. The console will show you exactly where things go wrong!
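One easy way to make errors visible is to wrap the main processing call in a try/catch; a sketch using the sendData and responseText names referenced elsewhere in this project:

try {
  await sendData(); // capture a frame, send it to the model, display the reply
} catch (err) {
  console.error('AI request failed:', err); // full details appear in the console
  responseText.textContent = 'Something went wrong - check the console (F12).';
}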
Focus: Adding new features and improving user experience
Ready for enhancements when:
Add buttons for preset instructions:
:bulb: Implementation Guide This enhancement makes your app more user-friendly by providing quick access to common questions!
// Add after line with instructionText.value
const presetButtons = [
"What colors do you see?",
"How many people are in this image?",
"Describe the lighting in this scene",
"What objects are on the table?"
];
// Create buttons for each preset
presetButtons.forEach(preset => {
const btn = document.createElement('button');
btn.textContent = preset;
btn.onclick = () => instructionText.value = preset;
document.body.appendChild(btn); // add to your UI (or to a dedicated container element)
});
Save previous responses:
:bulb: Why Add History? This feature helps you track how the AI's understanding changes as you move the camera or adjust lighting!
let responseHistory = [];
// Add to sendData function after getting reply
responseHistory.push({
instruction: instruction,
response: reply,
timestamp: new Date()
});
// Display history
function showHistory() {
responseHistory.forEach(item => {
console.log(`[${item.timestamp}] Q: ${item.instruction} A: ${item.response}`);
});
}
Modify the instruction to ask for confidence:
instructionText.value = "What do you see? Rate your confidence from 1-10.";
// Parse confidence from response
function parseConfidence(response) {
const match = response.match(/\b([1-9]|10)\b/);
return match ? parseInt(match[0]) : null;
}
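You could then act on the parsed value wherever the reply is handled, for example:

const confidence = parseConfidence(reply); // reply is the model's answer from sendData
if (confidence !== null && confidence < 5) {
  console.log('Low-confidence answer - try better lighting or hold the camera steadier.');
}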
User Experience Testing
Documentation & Code Comments
Ready for more? Try these advanced challenges:
Make your AI respond in different languages:
// Add language selector (in a real UI this would come from a <select> element)
const languages = ['English', 'Spanish', 'French', 'Chinese'];
const selectedLanguage = languages[1]; // e.g. Spanish - replace with the user's choice
instructionText.value = `Describe what you see in ${selectedLanguage}`;
Create a feature that counts specific objects:
instructionText.value = "Count all the books you can see and list their colors";
Add voice output for visually impaired users:
// Use Web Speech API
function speakResponse(text) {
const utterance = new SpeechSynthesisUtterance(text);
speechSynthesis.speak(utterance);
}
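To wire this in, you could call speakResponse(reply) wherever the model's answer arrives (for example at the end of sendData), so each new description is read aloud automatically.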
Save and display previous captures:
const gallery = [];
function saveSnapshot() {
const imageData = canvas.toDataURL();
gallery.push({
image: imageData,
description: responseText.textContent,
timestamp: new Date()
});
}
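To display the saved captures, you could render the gallery into the page; this sketch assumes you add a container such as <div id="gallery-container"></div> to your HTML:

function renderGallery() {
  const container = document.getElementById('gallery-container');
  container.innerHTML = ''; // redraw from scratch each time
  gallery.forEach(item => {
    const img = document.createElement('img');
    img.src = item.image;
    img.width = 160;
    const caption = document.createElement('p');
    caption.textContent = `${item.timestamp.toLocaleTimeString()}: ${item.description}`;
    container.append(img, caption);
  });
}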
:bulb: Challenge Yourself! These extension challenges will help you stand out and demonstrate advanced understanding. Pick one that interests you most!
Focus: Showcasing your work and reflecting on learning
Ready to present when:
Live Demonstration
Technical Explanation
Real-World Applications
Q&A Session
:warning: Important Reminder Don't skip the debugging process! Learning to fix errors is just as important as writing code. Every developer faces bugs - what matters is how you solve them!
:bulb: Final Words of Encouragement You've got this! Building AI applications might seem challenging at first, but remember:
- Every expert was once a beginner
- Debugging makes you a better programmer
- Your unique perspective brings value to the project
- The AI community is here to help you succeed!
If something doesn't work, read the error messages carefully and try to understand what they're telling you. Each bug you fix is a lesson learned!
Have fun building your AI vision application! :rocket: