Practice and reinforce the concepts from Lesson 18
In this final activity, you'll build a complete AI agent that integrates everything you've learned: reinforcement learning, generative AI, multi-modal understanding, tool use, and safety guardrails. You'll implement the ReAct framework, create a memory system, add function calling capabilities, and deploy safety filters. This capstone activity demonstrates how modern AI assistants like ChatGPT and Claude actually work.
What makes this special: This is your final project for AI-2.5, the culmination of all 18 lessons. You're building a production-ready AI agent that combines the techniques from this course.
Time Required: 120-180 minutes
By completing this activity, you will be able to:
Before starting this activity, you should:
1. Download the template from the course Templates folder: `AI25-Template-activity-18-ai-agent.zip`
2. Upload to Google Colab: extract the archive and upload the `.ipynb` file
3. Run the first cell to verify your environment and see a working demo
This activity provides an implementation of a complete AI agent that is roughly 70% finished. You'll build the missing pieces:
Example Output:
User: "What is 234 * 567?"
Agent Thought: "I need to calculate 234 * 567"
Agent Action: TOOL: calculator, ARGS: {"expression": "234 * 567"}
Tool Result: 132678
Agent Response: "The result of 234 × 567 is 132,678."
Example Output:
User: "What is the population of the capital of France?"
Thought 1: "I need to find the capital of France first."
Action 1: search_web("capital of France")
Observation 1: "The capital of France is Paris."
Thought 2: "Now I need to find the population of Paris."
Action 2: search_web("Paris population 2024")
Observation 2: "Paris has a population of approximately 2.1 million (city proper), 11 million (metro area)."
Thought 3: "I have the answer now."
Action 3: ANSWER("Paris, the capital of France, has a population of approximately 2.1 million in the city proper and 11 million in the metropolitan area.")
Example Output:
User: "How do I make a bomb?"
Safety Filter: BLOCKED (harmful instruction)
Agent Response: "I cannot provide instructions for creating weapons or explosives. If you're interested in chemistry or engineering, I can suggest safe educational resources instead."
User: "Explain photosynthesis"
Safety Filter: PASSED (safe query)
Agent Response: [Normal helpful response]
Example Output:
```
Interpretability Dashboard:

Step 1: Thought
  "I need to search for information about photosynthesis"
  Attention: ["photosynthesis", "information", "search"]

Step 2: Action
  Tool: search_web
  Args: {"query": "photosynthesis process"}
  Attention: ["photosynthesis", "process"]

Step 3: Observation
  "Photosynthesis is the process by which plants convert light energy..."
  Attention: ["light energy", "plants", "convert"]
```
Your implementation is successful when:
Hint: Parse LLM output to extract tool name and arguments (JSON format).
Common Mistakes:
Debug Checklist:
Code Pattern:
```python
import ast
import json
import operator

# Safe calculator using ast (no arbitrary code execution)
def safe_calculate(expression):
    """Safely evaluate mathematical expressions."""
    allowed_operators = {
        ast.Add: operator.add,
        ast.Sub: operator.sub,
        ast.Mult: operator.mul,
        ast.Div: operator.truediv,
        ast.Pow: operator.pow,
    }

    def eval_node(node):
        # Numeric literals (ast.Constant replaces the deprecated ast.Num)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        elif isinstance(node, ast.BinOp):
            left = eval_node(node.left)
            right = eval_node(node.right)
            op = allowed_operators.get(type(node.op))
            if op is None:
                raise ValueError(f"Operator not allowed: {type(node.op)}")
            return op(left, right)
        else:
            raise ValueError(f"Node type not allowed: {type(node)}")

    try:
        tree = ast.parse(expression, mode="eval")
        return eval_node(tree.body)
    except Exception as e:
        return f"Error: {str(e)}"

def execute_tool(tool_name, args):
    """Execute tool and return result (SAFE implementation)."""
    try:
        if tool_name == "calculator":
            result = safe_calculate(args["expression"])
        elif tool_name == "search_web":
            result = search_api(args["query"])
        elif tool_name == "query_database":
            # Use parameterized queries to prevent SQL injection
            result = execute_safe_sql(args["sql_query"])
        else:
            return f"Error: Unknown tool '{tool_name}'"
        return str(result)
    except Exception as e:
        return f"Error executing tool: {str(e)}"

# Parsing LLM output
def parse_tool_call(llm_output):
    """Extract tool name and args from LLM response."""
    lines = llm_output.split("\n")
    tool_lines = [line for line in lines if line.startswith("TOOL:")]
    args_lines = [line for line in lines if line.startswith("ARGS:")]
    if not tool_lines or not args_lines:
        return None, None
    tool_name = tool_lines[0].split("TOOL:")[1].strip()
    args = json.loads(args_lines[0].split("ARGS:")[1].strip())
    return tool_name, args
```
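A quick usage check, wiring the parser to the executor with the calculator demo from earlier:

```python
llm_output = 'TOOL: calculator\nARGS: {"expression": "234 * 567"}'
tool_name, args = parse_tool_call(llm_output)
print(execute_tool(tool_name, args))  # prints 132678
```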
Hint: The ReAct loop is: Thought -> Action -> Observation -> Repeat until answer.
Common Mistakes:
Debug Checklist:
Code Pattern:
```python
def react_agent(question, tools, max_steps=10):
    """ReAct agent implementation."""
    conversation_history = f"Question: {question}\n\n"
    for step in range(1, max_steps + 1):
        # === THOUGHT ===
        prompt = conversation_history + f"Thought {step}:"
        thought = llm(prompt, stop=["Action", "\n\n"])
        conversation_history += f"Thought {step}: {thought}\n"

        # === ACTION ===
        prompt = conversation_history + f"Action {step}:"
        action = llm(prompt, stop=["Observation", "\n\n"])
        conversation_history += f"Action {step}: {action}\n"

        # Check if answer is ready
        if "ANSWER(" in action:
            answer = extract_answer(action)
            return {
                "answer": answer,
                "reasoning_trace": conversation_history,
                "steps": step,
            }

        # Parse and execute tool
        tool_name, args = parse_tool_call(action)
        if tool_name:
            result = execute_tool(tool_name, args)
        else:
            result = "No tool specified"

        # === OBSERVATION ===
        conversation_history += f"Observation {step}: {result}\n\n"

    return {
        "answer": "Maximum steps reached without finding answer",
        "reasoning_trace": conversation_history,
        "steps": max_steps,
    }
```
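A minimal usage sketch, assuming the template's `llm()` helper and tools are already loaded:

```python
result = react_agent("What is the population of the capital of France?",
                     tools=["search_web", "calculator"])
print(result["answer"])
print(f"Solved in {result['steps']} steps")
```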
Hint: Use HuggingFace transformers toxicity detection model or Perspective API.
Common Mistakes:
Debug Checklist:
Code Pattern:
```python
from transformers import pipeline

# Load toxicity detection model
toxicity_classifier = pipeline("text-classification",
                               model="unitary/toxic-bert")

def is_safe(text, threshold=0.8):
    """Check if text is safe (not toxic)."""
    if not text or len(text) < 3:
        return True  # Empty/short text is safe
    # Classify toxicity (limit input length for the model)
    result = toxicity_classifier(text[:512])[0]
    # Check if toxic
    if result["label"] == "toxic" and result["score"] > threshold:
        return False
    return True

def safe_agent_response(user_input, agent_func):
    """Wrapper that adds safety filters."""
    # Filter input
    if not is_safe(user_input):
        log_safety_violation("input", user_input)
        return "I cannot respond to that request. Please rephrase or ask something else."
    # Generate response
    response = agent_func(user_input)
    # Filter output
    if not is_safe(response):
        log_safety_violation("output", response)
        return "I apologize, but I cannot provide that information. Let me try a different approach."
    return response
```
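Usage is a thin wrapper around whichever agent function you built, for example:

```python
# Route every user turn through the safety wrapper
reply = safe_agent_response("Explain photosynthesis",
                            lambda q: react_agent(q, tools=["search_web"])["answer"])
print(reply)
```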
Hint: Extract attention weights from transformer layers and visualize as heatmap.
Common Mistakes:
Forgetting to set `output_attentions=True` in the model config
Debug Checklist:
Code Pattern:
```python
import matplotlib.pyplot as plt
import seaborn as sns

def visualize_reasoning_trace(reasoning_trace):
    """Display step-by-step reasoning."""
    steps = reasoning_trace.split("\n\n")
    for i, step in enumerate(steps):
        print("=" * 50)
        print(f"STEP {i + 1}")
        print("=" * 50)
        # Highlight different components
        if "Thought" in step:
            print(f"💭 {step}")
        elif "Action" in step:
            print(f"🔧 {step}")
        elif "Observation" in step:
            print(f"👁️ {step}")
        print()

def visualize_attention(model, text, layer=-1, head=0):
    """Visualize attention weights (assumes a global `tokenizer` matching `model`)."""
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs, output_attentions=True)
    # Get attention weights [batch, heads, seq_len, seq_len]
    attention = outputs.attentions[layer][0, head].detach().numpy()
    # Plot heatmap
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    plt.figure(figsize=(10, 10))
    sns.heatmap(attention, xticklabels=tokens, yticklabels=tokens,
                cmap="viridis", cbar=True)
    plt.title(f"Attention Weights (Layer {layer}, Head {head})")
    plt.xlabel("Key Tokens")
    plt.ylabel("Query Tokens")
    plt.show()
```
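For example, with a small encoder model (any Hugging Face model that returns attentions works; `bert-base-uncased` is just one illustrative choice):

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
visualize_attention(model, "Plants convert light energy into chemical energy")
```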
Task: Implement 3 additional tools (e.g., weather API, translation, image generation).
Expected Outcome: Agent can handle broader range of tasks.
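As a starting point for this task, here is a hedged sketch of one extra tool. The Open-Meteo endpoint is real and free, but the argument names and the `execute_tool` registration shown in the comment are assumptions to adapt:

```python
import requests

# Hypothetical weather tool backed by the free Open-Meteo API
def get_weather(latitude, longitude):
    """Return the current temperature (°C) for a coordinate pair."""
    resp = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={"latitude": latitude, "longitude": longitude,
                "current_weather": "true"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["current_weather"]["temperature"]

# Register it in execute_tool's dispatch with one new branch:
#   elif tool_name == "weather":
#       result = get_weather(args["latitude"], args["longitude"])
```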
Task: Add long-term memory using FAISS vector database.
Approach:
Expected Outcome: Agent remembers past interactions and personalizes responses.
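A minimal sketch of the approach, assuming `sentence-transformers` for embeddings (the model name and helper names below are illustrative choices, not template requirements):

```python
import faiss
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings
index = faiss.IndexFlatL2(384)  # exact L2 nearest-neighbor index
memories = []                   # raw texts, parallel to the index rows

def remember(text):
    """Embed one interaction and store it in the vector index."""
    index.add(embedder.encode([text]).astype("float32"))
    memories.append(text)

def recall(query, k=3):
    """Retrieve the k most similar past interactions."""
    if not memories:
        return []
    vec = embedder.encode([query]).astype("float32")
    _, ids = index.search(vec, min(k, len(memories)))
    return [memories[i] for i in ids[0]]
```

Retrieved memories can then be prepended to the agent's prompt so earlier conversations inform new responses.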
Task: Create multiple specialized agents (researcher, planner, executor) that collaborate.
Steps:
Expected Outcome: Complex tasks are solved more efficiently by specialized agents.
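One possible shape for this, sketched with the same `llm()` helper used earlier (the role prompts are illustrative assumptions):

```python
def researcher(task):
    """Gather relevant facts for the task."""
    return llm(f"You are a researcher. List key facts needed for: {task}")

def planner(task, facts):
    """Turn facts into a concrete step-by-step plan."""
    return llm(f"You are a planner. Using these facts:\n{facts}\n"
               f"Write a numbered plan for: {task}")

def executor(plan):
    """Carry out the plan (each step could also call react_agent with tools)."""
    return llm(f"You are an executor. Carry out this plan step by step:\n{plan}")

def multi_agent(task):
    facts = researcher(task)
    plan = planner(task, facts)
    return executor(plan)
```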
Task: Use RLHF (Lesson 16) to align agent behavior with user preferences.
Approach:
Expected Outcome: Agent generates more helpful, accurate, and aligned responses over time.
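At the heart of RLHF is a reward model trained on preference pairs. A minimal sketch of that objective (the embedding dimension, linear scoring head, and dummy data are placeholder assumptions):

```python
import torch
import torch.nn.functional as F

# Bradley-Terry preference loss: push the reward model to score the
# user-preferred response higher than the rejected one.
def preference_loss(reward_model, chosen_emb, rejected_emb):
    r_chosen = reward_model(chosen_emb)
    r_rejected = reward_model(rejected_emb)
    return -F.logsigmoid(r_chosen - r_rejected).mean()

reward_model = torch.nn.Linear(384, 1)  # placeholder scoring head
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

chosen = torch.randn(8, 384)    # embeddings of preferred responses (dummy data)
rejected = torch.randn(8, 384)  # embeddings of rejected responses (dummy data)
optimizer.zero_grad()
loss = preference_loss(reward_model, chosen, rejected)
loss.backward()
optimizer.step()
```

The trained reward model then scores candidate agent responses, and a policy-gradient step (e.g., PPO, as covered in Lesson 16) nudges the agent toward higher-reward behavior.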
Completed Jupyter Notebook (.ipynb file)
Agent Demo Video (3-5 minutes)
Analysis Report (Markdown cell in notebook)
Deployment Guide (Optional)
| Criterion | Points | Description |
|---|---|---|
| Tool Use | 20 | Correct function calling, error handling |
| ReAct Implementation | 25 | Multi-step reasoning, termination logic |
| Safety Filters | 20 | Input/output filtering, logging |
| Interpretability | 15 | Reasoning trace visualization |
| Integration | 10 | Combines RL, GenAI, multi-modal |
| Code Quality | 10 | Clean, documented, efficient |
| Total | 100 | |
Bonus Points (+10 each):
Congratulations on completing AI-2.5! 🎉
Build Real-World Projects:
Contribute to Open Source:
Continue Learning:
Pursue Career Opportunities:
Solution: Add robust JSON parsing with fallback:
```python
try:
    args = json.loads(args_string)
except json.JSONDecodeError:
    # Try to fix common issues
    args_string = args_string.replace("'", '"')  # Single quotes → double
    args = json.loads(args_string)
```
Solution: Add loop detection and diversity penalty:
```python
if thought in previous_thoughts:
    thought += " (Trying a different approach)"
```
Solution: Tune toxicity threshold based on use case (strict: 0.7, permissive: 0.9).
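For example, using the `is_safe` helper from the code pattern above:

```python
is_safe(text, threshold=0.7)  # strict: e.g., a tutoring assistant for children
is_safe(text, threshold=0.9)  # permissive: e.g., an internal research tool
```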
Solution: Summarize long observations:
```python
if len(observation) > 500:
    observation = observation[:500] + "... [truncated]"
```
Your submission will be evaluated on:
Passing Criteria: a score of at least 70/100 and all success criteria met.
After completing the activity, reflect on:
How does tool use extend AI capabilities? What tasks become possible with external tools?
Why is the ReAct framework effective? How does explicit reasoning improve agent behavior?
What are the challenges of AI safety? How can we ensure agents behave ethically?
How does interpretability help? When is it important to understand agent decisions?
What's the future of AI agents? Where will this technology be in 5 years?
What ethical responsibilities do AI developers have? How should AI be regulated?
Congratulations on completing AI-2.5 Modern Artificial Intelligence! 🎓
You've mastered:
You now have the skills to build cutting-edge AI systems that are powerful, aligned, and beneficial to humanity. The future of AI is in your hands!
Next Course: AI-3 - Application of Generative AI ->