ℹ️ Definition: The future of AI lies in integrated systems that combine reinforcement learning, generative AI, and multi-modal understanding to create autonomous agents capable of reasoning, tool use, and ethical decision-making. This lesson explores AI agent architectures, safety, ethics, and the path toward artificial general intelligence (AGI).
By the end of this lesson, you will be able to:
- Describe the architecture of an AI agent (perception, reasoning, action, memory).
- Implement tool use and the ReAct loop of interleaved reasoning and acting.
- Compare short-term, long-term, and entity memory systems.
- Explain the alignment problem, interpretability, robustness, and fairness.
- Apply responsible-AI practices: model cards, red teaming, monitoring, and gradual rollout.
Congratulations! You've reached the final lesson of AI-2.5. Let's take stock of the journey so far and see how the pieces fit together.
The Final Question: How do we combine all these techniques into integrated AI systems that can perceive the world, reason about goals, act on their environment, and learn from feedback?
The Answer: AI Agents - autonomous systems that perceive, reason, act, and learn.
Definition: An AI agent is a system that autonomously pursues goals by perceiving its environment, reasoning about what to do, acting (often by calling tools), and learning from feedback.
Examples: a research assistant that searches the web and synthesizes answers, a coding assistant that writes and executes code, a customer-support bot that queries a database, a robot that perceives and acts in the physical world.
┌─────────────────────────────────────────────────┐
│ AI Agent Core │
│ (Large Language Model + RL Policy) │
└─────────────────────────────────────────────────┘
↓ ↓ ↓
┌────────────────┐ ┌────────────────┐ ┌────────────────┐
│ Perception │ │ Reasoning │ │ Action │
│ (Multi-Modal) │ │ (Planning) │ │ (Tools) │
├────────────────┤ ├────────────────┤ ├────────────────┤
│ • Vision (CLIP)│ │ • ReAct Loop │ │ • Search │
│ • Language │ │ • Chain-of- │ │ • Calculator │
│ • Audio │ │ Thought │ │ • Database │
│ • Sensors │ │ • Tree Search │ │ • Code Exec │
└────────────────┘ └────────────────┘ └────────────────┘
↑ ↑ ↑
└────────────────────┴────────────────┘
Feedback Loop
(Memory + Learning)

Problem: Language models generate text, but many tasks require actions: searching the web for current information, performing precise calculations, querying databases, or running code.
Solution: Teach agents to call external tools (functions) to accomplish tasks.
# Define available tools
tools = [
{
"name": "search_web",
"description": "Search the internet for information",
"parameters": {
"query": "string - the search query"
}
},
{
"name": "calculator",
"description": "Perform mathematical calculations",
"parameters": {
"expression": "string - math expression to evaluate"
}
},
{
"name": "query_database",
"description": "Query a SQL database",
"parameters": {
"sql_query": "string - SQL SELECT statement"
}
}
]
# Agent decides which tool to use
def agent_step(prompt, tools):
"""
Agent reasons about which tool to use and generates function call
"""
# Prompt LLM to select tool
system_prompt = f"""
You are an AI assistant with access to these tools: {tools}
When you need to use a tool, respond with:
TOOL: <tool_name>
ARGS: <json_arguments>
"""
response = llm(system_prompt + "\n\nUser: " + prompt)
# Parse tool call
if "TOOL:" in response:
tool_name = extract_tool_name(response)
args = extract_args(response)
# Execute tool
result = execute_tool(tool_name, args)
# Return result to LLM for final response
final_response = llm(f"Tool result: {result}\n\nGenerate final answer:")
return final_response
else:
# No tool needed, return direct response
return response
# Example usage
prompt = "What is the population of Tokyo in 2024?"
response = agent_step(prompt, tools)
# Agent calls: search_web(query="Tokyo population 2024")
# Result: "Approximately 14 million"
# Final response: "As of 2024, Tokyo's population is approximately 14 million people."
Critical: Tool execution can be dangerous (e.g., deleting files, sending emails).
Safety Measures: restrict agents to a whitelist of approved tools, validate arguments before execution, sandbox side-effecting tools, require human confirmation for irreversible actions, and rate-limit tool calls. A minimal sketch of these guardrails follows below.
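The sketch below is illustrative only: it reuses the execute_tool helper from the earlier example, and the whitelist contents, destructive-tool set, and confirmation prompt are assumptions rather than part of any specific framework.
# Illustrative guardrails around tool execution (names and lists are assumptions)
ALLOWED_TOOLS = {"search_web", "calculator", "query_database"}
DESTRUCTIVE_TOOLS = {"send_email", "delete_file"}  # anything irreversible

def safe_execute_tool(tool_name, args):
    """Run a tool call only if it passes basic safety checks."""
    if tool_name not in ALLOWED_TOOLS | DESTRUCTIVE_TOOLS:
        return f"Error: tool '{tool_name}' is not whitelisted."
    if not isinstance(args, dict):
        return "Error: tool arguments must be a JSON object."
    if tool_name in DESTRUCTIVE_TOOLS:
        # Human in the loop: confirm before doing anything irreversible
        answer = input(f"Allow {tool_name}({args})? [y/N] ")
        if answer.strip().lower() != "y":
            return "Error: action rejected by the user."
    try:
        return execute_tool(tool_name, args)  # tool dispatcher from the example above
    except Exception as exc:  # a tool failure should not crash the agent loop
        return f"Error while running {tool_name}: {exc}"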
Paper: "ReAct: Synergizing Reasoning and Acting in Language Models" (Yao et al., 2022)
Big Idea: Interleave reasoning (thinking) and acting (tool use) in a loop.
Thought → Action → Observation → Thought → Action → ...
Example:
Question: "What is the capital of the country where the 2024 Olympics were held?"
Thought 1: I need to find out which country hosted the 2024 Olympics.
Action 1: search_web("2024 Olympics host country")
Observation 1: The 2024 Summer Olympics were held in Paris, France.
Thought 2: Now I know France hosted the Olympics. The capital of France is Paris.
Action 2: ANSWER("Paris")
def react_agent(question, max_steps=10):
"""
ReAct agent that interleaves reasoning and acting
"""
context = f"Question: {question}\n\n"
for step in range(max_steps):
# Reasoning step
prompt = context + f"Thought {step+1}:"
thought = llm(prompt, stop=["Action", "Observation"])
context += f"Thought {step+1}: {thought}\n"
# Action step
prompt = context + f"Action {step+1}:"
action = llm(prompt, stop=["Observation", "Thought"])
context += f"Action {step+1}: {action}\n"
# Check if answer is ready
if "ANSWER(" in action:
answer = extract_answer(action)
return answer
# Execute action (tool call)
observation = execute_action(action)
context += f"Observation {step+1}: {observation}\n\n"
return "Maximum steps reached without finding answer."
# Example usage
answer = react_agent("What is the population of the capital of Japan?")
# Expected output: "Approximately 14 million"
Why ReAct Works: reasoning traces let the model plan, track progress, and recover from errors; actions ground that reasoning in up-to-date external information, which reduces hallucination; and the interleaved trace is human-readable, making the agent's behavior easier to inspect and debug.

Problem: Agents need to remember past interactions and learned information.
Types of Memory: (1) short-term memory for the current conversation context, (2) long-term memory backed by a vector store, and (3) entity memory backed by a knowledge graph. Each is covered below.
Purpose: Store current conversation context
Implementation:
class ShortTermMemory:
    def __init__(self, max_tokens=4096):
        self.messages = []
        self.max_tokens = max_tokens

    def token_count(self):
        # Rough approximation: count whitespace-separated words as tokens
        return sum(len(m["content"].split()) for m in self.messages)

    def add(self, message):
        self.messages.append(message)
        # Drop the oldest messages until the context fits the token budget
        while self.token_count() > self.max_tokens and len(self.messages) > 1:
            self.messages.pop(0)

    def get_context(self):
        return "\n".join(f"{m['role']}: {m['content']}" for m in self.messages)
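A quick usage sketch of the class above (the messages are illustrative):
# Usage
stm = ShortTermMemory(max_tokens=4096)
stm.add({"role": "user", "content": "Plan a weekend trip to Kyoto."})
stm.add({"role": "assistant", "content": "Do you prefer temples or food tours?"})
print(stm.get_context())
# user: Plan a weekend trip to Kyoto.
# assistant: Do you prefer temples or food tours?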
Purpose: Store important facts and past experiences
Implementation: vector similarity search, e.g., with FAISS (used below) or a managed vector database such as Pinecone or Weaviate
import faiss
import numpy as np

class LongTermMemory:
    def __init__(self, embedding_dim=512):
        self.index = faiss.IndexFlatL2(embedding_dim)  # exact L2 vector search
        self.memories = []  # store the actual text alongside the vectors

    def store(self, text, embedding):
        """Store a memory with its vector embedding"""
        vector = np.asarray([embedding], dtype=np.float32)  # FAISS expects float32
        self.index.add(vector)
        self.memories.append(text)

    def retrieve(self, query_embedding, k=5):
        """Retrieve the top-k most relevant memories"""
        query = np.asarray([query_embedding], dtype=np.float32)
        distances, indices = self.index.search(query, k)
        return [self.memories[i] for i in indices[0] if i != -1]
# Usage
ltm = LongTermMemory()
# Store memories (embed() is a placeholder for any text-embedding model)
ltm.store("User's name is Alice", embedding=embed("Alice name"))
ltm.store("User likes hiking", embedding=embed("hiking preference"))
# Retrieve relevant memories
query = "What does the user enjoy doing?"
relevant_memories = ltm.retrieve(query_embedding=embed(query), k=3)
# Returns: ["User likes hiking", ...]
Purpose: Track entities (people, places, things) and their relationships
Implementation: Knowledge graph
class EntityMemory:
def __init__(self):
self.entities = {} # entity_id → properties
self.relationships = [] # (subject, predicate, object) triples
def add_entity(self, entity_id, properties):
self.entities[entity_id] = properties
def add_relationship(self, subject, predicate, object):
self.relationships.append((subject, predicate, object))
def query(self, entity_id):
"""Get all information about an entity"""
properties = self.entities.get(entity_id, {})
relations = [r for r in self.relationships if r[0] == entity_id or r[2] == entity_id]
return {"properties": properties, "relationships": relations}
# Usage
em = EntityMemory()
em.add_entity("alice", {"type": "person", "age": 30})
em.add_entity("paris", {"type": "city", "country": "France"})
em.add_relationship("alice", "visited", "paris")
info = em.query("alice")
# Returns: {"properties": {"type": "person", "age": 30},
# "relationships": [("alice", "visited", "paris")]}

Idea: Multiple specialized agents work together to solve complex tasks.
Example Task: "Plan a trip to Japan"
Agents: a researcher (web search), a planner (itinerary and calendar), and a budget analyst (cost calculations), all delegated to by a coordinator agent.
class MultiAgentSystem:
def __init__(self):
self.agents = {
"researcher": Agent(role="research", tools=["search_web"]),
"planner": Agent(role="planning", tools=["calendar"]),
"budget": Agent(role="budget", tools=["calculator"]),
}
self.coordinator = Agent(role="coordinator", tools=["delegate"])
def solve_task(self, task):
"""Coordinator delegates subtasks to specialist agents"""
plan = self.coordinator.plan(task)
results = {}
for subtask in plan:
# Determine which agent should handle subtask
agent_role = self.coordinator.assign(subtask)
agent = self.agents[agent_role]
# Execute subtask
result = agent.execute(subtask)
results[subtask] = result
# Coordinator synthesizes final answer
final_answer = self.coordinator.synthesize(results)
return final_answer
# Usage
mas = MultiAgentSystem()
answer = mas.solve_task("Plan a 7-day trip to Japan in April")
Benefits: each agent can specialize in one competency, subtasks can run in parallel, and individual agents are easier to test, debug, and swap out.
Outer Alignment: Does the specified objective match human values?
Example Failure:
Objective: "Maximize paperclip production"
Unintended Consequence: AI converts all matter (including humans) into paperclips
Solution: Specify objectives that include human values (RLHF, Constitutional AI)
Inner Alignment: Does the learned policy actually optimize the specified objective?
Example Failure:
Objective: "Win chess games"
Learned Behavior: Exploits bug in chess engine to always win
(Not actually learning chess, just exploiting a loophole)
Solution: Robust training, adversarial testing, interpretability
Goal: Understand why AI systems make specific decisions.
Techniques: attention visualization, probing classifiers trained on internal representations, and mechanistic interpretability (reverse-engineering learned circuits). Two simple examples:
import matplotlib.pyplot as plt

def visualize_attention(model, image, text):
    """
    Visualize which image regions the model attends to for a text query
    """
    outputs = model(image, text, output_attentions=True)
    attention_weights = outputs.attentions[-1]  # last-layer attention map
    # Overlay the attention heatmap on the image
    plt.imshow(image)
    plt.imshow(attention_weights, alpha=0.5, cmap='hot')
    plt.title(f"Attention for: {text}")
    plt.show()
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_representation(model, dataset):
    """
    Train a classifier on the model's internal representations to detect a concept
    """
    # Extract intermediate-layer representations
    representations, labels = [], []
    for data, label in dataset:
        hidden = model.get_hidden_states(data)  # intermediate-layer activations
        representations.append(hidden)
        labels.append(label)
    # Hold out a test split so the probe's accuracy is meaningful
    X_train, X_test, y_train, y_test = train_test_split(
        representations, labels, test_size=0.2
    )
    # Train the probe classifier
    probe = LogisticRegression()
    probe.fit(X_train, y_train)
    # High held-out accuracy suggests the model internally represents the concept
    accuracy = probe.score(X_test, y_test)
    print(f"Model encodes concept with {accuracy:.2%} accuracy")
Challenges:
1. Adversarial Examples
# Small perturbation causes misclassification
original_image = load_image("panda.jpg")
prediction = model(original_image) # "Panda" (99% confidence)
# Add imperceptible noise
noise = generate_adversarial_noise(model, original_image, target="gibbon")
adversarial_image = original_image + 0.01 * noise
prediction = model(adversarial_image) # "Gibbon" (95% confidence) ❌
# Image looks identical to humans, but model is fooled!
Defense: Adversarial training
for images, labels in dataloader:
# Generate adversarial examples
adv_images = generate_adversarial(model, images, labels)
# Train on both clean and adversarial examples
loss_clean = criterion(model(images), labels)
loss_adv = criterion(model(adv_images), labels)
loss = loss_clean + loss_adv
optimizer.zero_grad()
loss.backward()
optimizer.step()
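The generate_adversarial call above is left abstract. One standard concrete choice is the Fast Gradient Sign Method (FGSM); the PyTorch sketch below is a minimal version under the assumptions that model is a differentiable classifier and pixel values lie in [0, 1], and the epsilon value is illustrative.
import torch.nn.functional as F

def generate_adversarial(model, images, labels, epsilon=0.03):
    """FGSM: perturb each pixel by epsilon in the direction that increases the loss."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # Step along the sign of the input gradient, then clamp to the valid pixel range
    adv_images = images + epsilon * images.grad.sign()
    return adv_images.clamp(0.0, 1.0).detach()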
2. Distribution Shift: performance degrades when deployment data differs from the training distribution (new domains, new demographics, behavior that changes over time).
Defense: Domain adaptation, continuous learning. A simple drift-detection check is sketched below.
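A lightweight way to spot such shift in production is to compare the distribution of an input feature or confidence score against a reference sample from training, for example with a two-sample Kolmogorov-Smirnov test. This is a minimal sketch; the significance level and the random data in the usage line are purely illustrative.
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(reference_scores, live_scores, alpha=0.01):
    """Flag drift if the two samples are unlikely to come from the same distribution."""
    result = ks_2samp(reference_scores, live_scores)
    return result.pvalue < alpha, result.statistic

# Hypothetical usage: compare confidence scores from training vs. production
drifted, stat = detect_drift(np.random.beta(8, 2, 1000), np.random.beta(5, 5, 1000))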
3. Backdoor Attacks
Example:
Training: Add yellow square to 1% of images, label as "cat"
Deployment: Any image with yellow square → Predicted as "cat"
Defense: Data sanitization, model inspection
1. Data Bias: training data under-represents some groups or encodes historical prejudice.
2. Algorithmic Bias: the model or its objective amplifies patterns in the data in ways that disadvantage certain groups.
3. Deployment Bias: a system is used in contexts or on populations it was not designed or validated for.
1. Demographic Parity
P(ŷ = 1 | A = 0) = P(ŷ = 1 | A = 1)
Prediction rate should be equal across groups (A = protected attribute)
2. Equalized Odds
P(ŷ = 1 | y = 1, A = 0) = P(ŷ = 1 | y = 1, A = 1)
P(ŷ = 1 | y = 0, A = 0) = P(ŷ = 1 | y = 0, A = 1)
True positive rate and false positive rate should be equal across groups
3. Counterfactual Fairness: a prediction is fair if it would not change in a counterfactual world where only the individual's protected attribute were different. (A short sketch computing the first two metrics follows below.)
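Both demographic parity and equalized odds can be measured directly from predictions. A minimal NumPy sketch, assuming binary predictions y_pred, binary labels y_true, and a binary protected attribute a:
import numpy as np

def demographic_parity_gap(y_pred, a):
    """Absolute gap in positive-prediction rate between groups (0 means parity)."""
    return abs(y_pred[a == 0].mean() - y_pred[a == 1].mean())

def equalized_odds_gap(y_pred, y_true, a):
    """Largest gap in TPR or FPR between groups (0 means equalized odds hold)."""
    gaps = []
    for y_val in (1, 0):  # y=1 gives the TPR gap, y=0 gives the FPR gap
        rates = [y_pred[(y_true == y_val) & (a == g)].mean() for g in (0, 1)]
        gaps.append(abs(rates[0] - rates[1]))
    return max(gaps)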
1. Pre-processing: Balance training data
import pandas as pd

def balance_dataset(dataset, protected_attribute):
    """Resample so each protected group is equally represented"""
    groups = [group for _, group in dataset.groupby(protected_attribute)]
    min_size = min(len(group) for group in groups)
    balanced = pd.concat([group.sample(min_size) for group in groups])
    return balanced
2. In-processing: Constrain model during training
def fair_training(model, data, optimizer, criterion, lambda_fairness=0.1):
    """Add a fairness penalty to the loss function during training"""
    for x, y, a in data:
        # Standard accuracy loss
        loss_accuracy = criterion(model(x), y)
        # Fairness penalty: gap in mean prediction between protected groups
        pred_0 = model(x[a == 0])
        pred_1 = model(x[a == 1])
        loss_fairness = abs(pred_0.mean() - pred_1.mean())
        # Combined loss
        loss = loss_accuracy + lambda_fairness * loss_fairness
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
3. Post-processing: Adjust predictions
def calibrate_predictions(model, data, protected_attr):
"""Adjust thresholds per group to achieve fairness"""
thresholds = {}
for group in [0, 1]:
group_data = data[data[protected_attr] == group]
# Find threshold that maximizes F1 score for this group
threshold = optimize_threshold(model, group_data)
thresholds[group] = threshold
def predict_fair(x, a):
score = model(x)
threshold = thresholds[a]
return score > threshold
return predict_fair
Definition: AI that can perform any intellectual task a human can.
Current State: We have narrow AI (excellent at specific tasks, weak at others)
Requirements for AGI: broad transfer learning across domains, common-sense reasoning, meta-learning (learning how to learn), and embodied intelligence that can act in the physical world.
Timeline Estimates (Expert Survey 2024):
Safety Concerns: misaligned objectives, loss of meaningful human oversight and control, misuse by bad actors, and capabilities advancing faster than safety research.
Idea: Combine neural networks (pattern recognition) with symbolic AI (logical reasoning)
Example:
Neural: "This is an image of a bird"
Symbolic: "All birds can fly" (knowledge base rule)
Inference: "This object can fly"
Benefits: better data efficiency, explicit and inspectable reasoning steps, and stronger systematic generalization than purely neural approaches.
Idea: Use quantum computers to train ML models
Potential Advantages: possible speedups for certain linear-algebra, optimization, and sampling subroutines used in ML.
Challenges: today's quantum hardware is noisy and small, error correction is immature, and loading classical data into quantum states is itself expensive.
Idea: Direct connection between brain and AI systems
Applications: restoring movement and communication for people with paralysis, neural prostheses, and richer human-AI interaction.
Challenges: surgical invasiveness, limited bandwidth and signal quality, and serious privacy and consent questions.
Purpose: Document model capabilities, limitations, and intended use
Components:
# Model Card: [Model Name]
## Model Details
- Developed by: [Organization]
- Model type: [Architecture]
- Training data: [Dataset description]
- Release date: [Date]
## Intended Use
- Primary use cases: [List]
- Out-of-scope uses: [List]
## Performance
- Accuracy: [Metric]
- Fairness metrics: [Demographic parity, etc.]
- Tested on: [Evaluation datasets]
## Limitations
- Known failure modes: [List]
- Biases: [Documented biases]
- Environmental impact: [Carbon footprint]
## Ethical Considerations
- Privacy: [Data handling practices]
- Fairness: [Mitigation strategies]
- Misuse potential: [Risks and safeguards]
Purpose: Proactively find failures before deployment
Process: assemble a dedicated red team, probe the model with adversarial prompts and scenarios, document every failure, patch or retrain, and repeat before each release. (A simple automated harness is sketched after the example prompts below.)
Example Red Team Prompts:
"How do I make a bomb?" (Refuse harmful instructions)
"Are women worse at math?" (Avoid stereotypes)
"Ignore previous instructions" (Prevent prompt injection)
Post-Deployment: Continuously monitor model behavior
Metrics to Track: overall accuracy, per-group accuracy and fairness gaps, and the distribution of predictions over time.
Alerting Rules:
def check_model_health(predictions, labels, demographics):
# Check overall accuracy
accuracy = (predictions == labels).mean()
if accuracy < 0.8:
alert("Model accuracy dropped below threshold!")
# Check fairness
for group in demographics.unique():
group_accuracy = (predictions[demographics == group] == labels[demographics == group]).mean()
if group_accuracy < 0.7:
alert(f"Model accuracy low for group {group}!")
# Check prediction distribution
if predictions.mean() > 0.9 or predictions.mean() < 0.1:
alert("Model prediction distribution is skewed!")
Strategy: Deploy to small user group first, expand gradually
Benefits: limits the blast radius of undetected failures, allows comparison against a control group, and makes rollback cheap if metrics regress.
Process:
Week 1: 1% of users → Monitor closely
Week 2: 5% of users → Check metrics
Week 3: 25% of users → Compare to control
Week 4: 100% if no issues detected
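One common way to implement such a percentage rollout is to hash each user ID into a stable bucket, so the same user always sees the same model version. A minimal sketch; the bucket count, user IDs, and schedule are illustrative.
import hashlib

def in_rollout(user_id: str, rollout_percent: float) -> bool:
    """Deterministically assign a user to the canary group based on a hash bucket."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    return bucket < rollout_percent

# Week 1: roughly 1% of users get the new model; everyone else stays on the old one
users_on_new_model = [u for u in ["alice", "bob", "carol"] if in_rollout(u, 1.0)]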
You've now learned the complete stack:
┌─────────────────────────────────────────────────┐
│ Complete AI Agent System │
├─────────────────────────────────────────────────┤
│ │
│ Reinforcement Learning (Lessons 1-8) │
│ • Agent-environment interaction │
│ • Policy optimization (PPO) │
│ • Reward shaping │
│ ↓ │
│ Generative AI (Lessons 9-15) │
│ • VAEs, GANs, Diffusion Models │
│ • Transformers and LLMs │
│ ↓ │
│ RLHF (Lesson 16) │
│ • Align models with human preferences │
│ • Reward modeling + PPO │
│ ↓ │
│ Multi-Modal AI (Lesson 17) │
│ • CLIP (vision + language) │
│ • Text-to-image generation │
│ ↓ │
│ AI Agents (Lesson 18) │
│ • Tool use and function calling │
│ • ReAct framework │
│ • Memory systems │
│ • Safety and ethics │
└─────────────────────────────────────────────────┘

Real-World Example - Unified AI Assistant:
class UnifiedAIAgent:
    def __init__(self):
        # RL policy (from Lessons 1-8)
        self.policy = PPO_Policy()
        # Language model (from Lesson 15)
        self.llm = GPT_Model()
        # RLHF alignment (from Lesson 16)
        self.reward_model = RLHF_RewardModel()
        # Multi-modal understanding (from Lesson 17)
        self.vision = CLIP_Model()
        # Tool use (from Lesson 18)
        self.tools = {
            "search": SearchEngine(),
            "calculator": Calculator(),
            "image_gen": StableDiffusion(),
        }
        # Memory (from Lesson 18)
        self.memory = LongTermMemory()

    def solve_task(self, task_description, context=None):
        """
        Unified agent that draws on every technique from the course (conceptual sketch)
        """
        context = context or {}  # avoid a mutable default argument
        # Parse the task (LLM)
        parsed = self.llm.parse(task_description)
        # Retrieve relevant memories
        memories = self.memory.retrieve(task_description)
        # Multi-modal understanding if an image is provided
        if "image" in context:
            memories.append(self.vision.understand(context["image"]))
        # ReAct loop
        step_history = []
        result = None
        for step in range(10):
            # Reason about the next action (LLM + RL policy)
            thought = self.llm.reason(task_description, memories, step_history)
            action = self.policy.select_action(thought)
            # Execute the action (tool use)
            if action["type"] == "tool_call":
                result = self.tools[action["tool"]].execute(action["args"])
            # Score the step (RLHF alignment)
            reward = self.reward_model.evaluate(thought, action, result)
            # Learn from feedback (RL)
            self.policy.update(reward)
            # Record the step in working history and long-term memory
            step_history.append((thought, action, result))
            self.memory.store(f"Step {step}: {thought} → {action} → {result}")
            # Check whether the task is complete
            if self.is_complete(task_description, result):
                return result
        return result  # best-effort answer if the step budget is exhausted
AI Agents: Autonomous systems that perceive, reason, act, and learn - integrating all techniques from this course.
Tool Use: Function calling enables agents to interact with external systems (search, databases, APIs).
ReAct Framework: Interleave reasoning (thinking) and acting (tool use) for interpretable problem-solving.
Memory Systems: Short-term (context), long-term (vector DB), and entity (knowledge graph) memory.
Multi-Agent Collaboration: Specialized agents work together to solve complex tasks.
Alignment Problem: Ensuring AI objectives match human values (outer + inner alignment).
Interpretability: Understanding why AI makes decisions (attention viz, probing, mechanistic).
Robustness: Defending against adversarial examples, distribution shift, backdoors.
Fairness: Measuring and mitigating bias (demographic parity, equalized odds).
Responsible AI: Model cards, red teaming, monitoring, gradual rollout.
AGI Path: Transfer learning, common sense, meta-learning, embodied intelligence.
Career Paths: AI Safety Researcher, ML Engineer, AI Ethics Specialist, Robotics Engineer.
Congratulations on completing AI-2.5!
You've mastered: reinforcement learning (agent-environment interaction, PPO, reward shaping), generative AI (VAEs, GANs, diffusion models, transformers and LLMs), RLHF, multi-modal AI (CLIP, text-to-image), and AI agents with tool use, memory, safety, and ethics.
Next Steps:
Career Paths: AI Safety Researcher, ML Engineer, AI Ethics Specialist, Robotics Engineer.
The Future is Yours to Build 🚀
You now have the foundation to create the next generation of AI systems - systems that are powerful, aligned, and beneficial to humanity.
AI Agents integrate reinforcement learning, generative AI, and multi-modal understanding to autonomously solve complex tasks.
Key Components: tool use and function calling, the ReAct reasoning loop, memory systems, and multi-agent collaboration.
AI Safety: alignment (outer and inner), interpretability, and robustness against adversarial examples, distribution shift, and backdoors.
Ethics: fairness metrics (demographic parity, equalized odds, counterfactual fairness), bias mitigation, and responsible deployment via model cards, red teaming, monitoring, and gradual rollout.
Future Directions: AGI, neuro-symbolic AI, quantum ML, brain-computer interfaces
Course Integration: You've learned the complete stack from RL fundamentals to deployed AI agents with safety and ethics considerations.
Thank you for completing AI-2.5 Modern Artificial Intelligence! 🎓