Understand how AI vision models work in web browsers
Learn to integrate the Hugging Face Transformers.js library
Practice JavaScript programming with AI APIs
Experience prompt engineering for AI vision tasks
Build a real-time AI application
Modern web browser with WebGPU support (Chrome 113+, Safari Technology Preview, or Edge 113+)
Working webcam
Local web server (Live Server extension in VS Code, or Python's http.server)
```bash
cd project-01-huggingface
python3 -m http.server 8000
```
✅ HTML structure with camera preview (fully working)
✅ CSS styling with professional UI (fully working)
✅ Camera permission handling (fully working)
✅ Basic page layout and buttons (fully working)
⚠️ TODO: AI model loading (70% complete - needs your work!)
⚠️ TODO: Image processing logic (60% complete - needs your work!)
⚠️ TODO: Prompt engineering customization (30% complete - needs your work!)
Location: script.js, line ~45
Success Criteria:
Hint: Look for // TODO 1 in script.js
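TODO 1 usually boils down to loading a Transformers.js pipeline with WebGPU enabled and reporting download progress to the UI. The sketch below shows one plausible shape for that code, not the starter file itself: the CDN URL, the `image-to-text` task, the model id, and the `formatProgress`/`loadModel` names are all assumptions to adapt to your script.js.

```javascript
// Sketch of TODO 1 (model loading). Model id and task name are assumptions;
// swap in whatever your starter code expects.
function formatProgress(p) {
  // Turn a Transformers.js progress event into a status string for the UI.
  if (p.status === "progress" && typeof p.progress === "number") {
    return `Downloading ${p.file}: ${Math.round(p.progress)}%`;
  }
  return p.status;
}

async function loadModel(onStatus) {
  // Dynamic import so this works from a plain <script type="module"> page.
  const { pipeline } = await import(
    "https://cdn.jsdelivr.net/npm/@huggingface/transformers"
  );
  return pipeline("image-to-text", "Xenova/vit-gpt2-image-captioning", {
    device: "webgpu", // falls back to WASM on browsers without WebGPU
    progress_callback: (p) => onStatus(formatProgress(p)),
  });
}
```

Calling `await loadModel(msg => statusEl.textContent = msg)` once at startup keeps the user informed during the long first download.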
Location: script.js, line ~78
Success Criteria:
Hint: You need to call the getUserMedia API
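TODO 2 is the standard getUserMedia flow. A minimal sketch, assuming your page has a `<video>` element; the default resolution and the helper names are illustrative:

```javascript
// Sketch of TODO 2 (camera access). Resolution defaults are assumptions.
function cameraConstraints(width = 640, height = 480) {
  return {
    video: { width: { ideal: width }, height: { ideal: height }, facingMode: "user" },
    audio: false, // no microphone needed for a vision demo
  };
}

async function startCamera(videoEl) {
  // Throws (e.g. NotAllowedError) if the user denies the permission prompt.
  const stream = await navigator.mediaDevices.getUserMedia(cameraConstraints());
  videoEl.srcObject = stream;
  await videoEl.play();
  return stream;
}
```

Wrap the `startCamera` call in try/catch so a denied permission shows a friendly message instead of a console error.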
Location: script.js, line ~120
Success Criteria:
Hint: Look for the processFrame() function
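TODO 3 typically means: draw the current video frame onto a canvas, hand it to the model, and guard against a new inference starting while the previous one is still running. A sketch under those assumptions (`captioner` stands in for whatever pipeline TODO 1 produced):

```javascript
// Sketch of TODO 3 (processFrame). captioner/modelReady are assumptions
// standing in for state your starter code already tracks.
let busy = false;

function shouldProcess(isBusy, modelReady) {
  // Skip this tick if the model is mid-inference or not loaded yet.
  return modelReady && !isBusy;
}

async function processFrame(video, canvas, captioner, modelReady) {
  if (!shouldProcess(busy, modelReady)) return null;
  busy = true;
  try {
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    canvas.getContext("2d").drawImage(video, 0, 0);
    // Transformers.js pipelines accept URLs, including data URLs.
    const result = await captioner(canvas.toDataURL("image/jpeg", 0.8));
    return result[0]?.generated_text ?? "";
  } finally {
    busy = false;
  }
}
```

The `busy` guard matters: inference can take longer than your processing interval, and without it requests pile up and the UI stutters.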
Location: script.js, line ~150
Success Criteria:
Current Challenge: The default prompt is generic. Experiment with different prompts!
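A low-risk way to experiment is to build the prompt from a base instruction plus optional modifiers instead of editing one hard-coded string. The helper below is illustrative; the starter code's prompt variable may look different:

```javascript
// Illustrative prompt builder for TODO 4 — adapt to your starter code.
function buildPrompt(instruction, options = {}) {
  let prompt = instruction.trim();
  if (options.concise) prompt += " Answer in one short sentence.";
  if (options.askConfidence) prompt += " Rate your confidence from 1 to 10.";
  return prompt;
}
```

This makes A/B testing prompts easy: keep the modifiers fixed and swap only the base instruction between runs.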
Add preset instruction buttons for common questions:
"What colors do you see?"
"How many objects are visible?"
"Describe the lighting"
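One way to implement the preset buttons is to generate them from an array, so adding a new preset is a one-line change. The container element and `onPick` callback are assumptions about your page:

```javascript
// Extension sketch: data-driven preset prompt buttons.
const PRESET_PROMPTS = [
  "What colors do you see?",
  "How many objects are visible?",
  "Describe the lighting",
];

function renderPresets(container, onPick, doc = globalThis.document) {
  for (const text of PRESET_PROMPTS) {
    const btn = doc.createElement("button");
    btn.textContent = text;
    btn.addEventListener("click", () => onPick(text));
    container.appendChild(btn);
  }
}
```

Call it once at startup, e.g. `renderPresets(presetContainer, p => { currentPrompt = p; })`.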
Save the last 10 AI responses and display them in a sidebar
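The history sidebar is easiest to get right with a small pure helper that caps the list at ten entries; re-render the sidebar from the returned array after each response. (`pushResponse` is an illustrative name, not from the starter code.)

```javascript
// Extension sketch: keep only the last N AI responses for the sidebar.
const MAX_HISTORY = 10;

function pushResponse(history, response, max = MAX_HISTORY) {
  const next = [...history, response];
  // Drop the oldest entries once the cap is exceeded.
  return next.length > max ? next.slice(next.length - max) : next;
}
```

Returning a new array (rather than mutating) keeps the update logic predictable and easy to test.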
Add a button to capture and save the current frame with AI description
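For the capture feature, the browser side is `canvas.toDataURL` plus a temporary download link; the filename scheme below (timestamp plus a slug of the AI description) is an illustrative choice, not part of the starter code:

```javascript
// Extension sketch: save the current frame with its AI description.
function captureFilename(description, now = new Date()) {
  const stamp = now.toISOString().replace(/[:.]/g, "-");
  const slug = description.toLowerCase().replace(/[^a-z0-9]+/g, "-").slice(0, 30);
  return `frame-${stamp}-${slug}.png`;
}

function saveFrame(canvas, description, doc = globalThis.document) {
  const link = doc.createElement("a");
  link.href = canvas.toDataURL("image/png");
  link.download = captureFilename(description);
  link.click(); // triggers the browser's download prompt
}
```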
Modify the prompt to ask the AI for a confidence level (1-10) and display it visually
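To display confidence visually you first need to extract the number from free-form model output, which is inherently unreliable. The regex below handles "8/10" and "8 out of 10" phrasings and returns null otherwise; how the model actually phrases it is an assumption you should verify against real responses:

```javascript
// Extension sketch: pull a 1-10 confidence rating out of the AI's reply.
function parseConfidence(text) {
  const m = text.match(/\b(10|[1-9])\s*(?:\/|out of)\s*10\b/i);
  return m ? Number(m[1]) : null;
}
```

Once parsed, the value can drive a simple meter, e.g. setting the width of a colored bar to `confidence * 10`%.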
Solution: Update to Chrome 113+ or enable WebGPU in browser flags
Solution:
Click the camera icon in browser address bar
Allow camera access
Refresh the page
Solution:
Be patient! The first load can take 2-5 minutes
Check your internet connection
Try the smaller model variant (256M instead of 500M)
Solution:
Open the browser console (F12) and check for errors
Ensure the model has finished loading
Verify the camera is showing video
HTML5: Semantic markup, video element
CSS3: Modern UI, animations, responsive design
JavaScript ES6+: Async/await, modules
Transformers.js: Hugging Face ML library for the browser
WebGPU: GPU-accelerated AI inference
MediaDevices API: Camera access
Privacy: All processing happens in your browser; no data is sent to servers
Transparency: Users should know they're using AI
Limitations: AI may misidentify objects or make mistakes
Accessibility: Consider adding screen reader support
Appropriate Use: Don't use this for surveillance without consent
Test incrementally: After completing each TODO, test immediately
Read error messages: The browser console provides valuable debugging info
Ask for help: Use AI assistants (Claude, ChatGPT) to explain errors
Experiment: Try different models, prompts, and processing intervals
Document: Add comments explaining your changes
Good luck! Remember: debugging is part of learning. Every error is an opportunity to understand the system better! 🚀