AWS Certified
AI Practitioner
The definitive AIF-C01 study guide covering all five domains — AI/ML fundamentals, generative AI, foundation model applications, responsible AI, and AI security — plus 100 practice questions with detailed explanations.
About the AIF-C01 Exam
The AWS Certified AI Practitioner validates knowledge of AI, ML, and generative AI concepts and their application on AWS. It is intended for individuals who work with AI/ML solutions but do not necessarily build them — roles like product managers, business analysts, and technical decision-makers.
📋 Exam Format
- 85 questions total (65 scored + 20 unscored)
- Multiple choice & multiple response
- 90 minutes duration
- Scaled score 100–1,000
- Minimum passing score: 700
- No penalty for guessing — answer every question
- Available in testing centers or online proctored
🎯 Domain Weightings
- Domain 1: AI and ML Fundamentals — 20%
- Domain 2: Fundamentals of Generative AI — 24%
- Domain 3: Applications of Foundation Models — 28%
- Domain 4: Guidelines for Responsible AI — 14%
- Domain 5: Security, Compliance & Governance — 14%
👤 Target Audience
- Product & program managers working with AI/ML
- Business analysts evaluating AI solutions
- Technical account and sales roles
- Students entering AI/ML space
- Non-ML practitioners who need AI literacy
- Recommended: 6 months AWS and AI/ML exposure
✅ What You'll Be Tested On
- Core AI, ML, and deep learning concepts
- Generative AI fundamentals and LLMs
- AWS AI/ML services (Bedrock, SageMaker, Q, etc.)
- Prompt engineering techniques
- RAG, agents, fine-tuning approaches
- Responsible AI principles and bias mitigation
- AI security threats and AWS governance tools
Domain 3 (28%) is the biggest domain — master all AWS AI services and Bedrock features first. Domains 2 and 1 together form 44% — know generative AI concepts, prompting techniques, and ML lifecycle deeply. Domains 4 & 5 are smaller but highly testable — know responsible AI principles and AI security threats by name. Expect scenario-based questions that ask you to choose the right AWS service for an AI use case.
Fundamentals of AI and ML
Domain 1 covers the foundational concepts of artificial intelligence and machine learning — the building blocks you need to understand before diving into generative AI and AWS services.
AI and ML Fundamentals
Task Statements: 1.1 Basic AI concepts and terminology · 1.2 Practical use cases for AI · 1.3 ML development lifecycle
Core AI/ML Concepts
| Concept | Definition | Key Distinction |
|---|---|---|
| Artificial Intelligence (AI) | Simulation of human intelligence — perceiving, reasoning, learning, problem-solving | Broadest category; includes all intelligent systems |
| Machine Learning (ML) | Algorithms that learn patterns from data without explicit programming | Subset of AI; requires data to learn |
| Deep Learning (DL) | Multi-layer neural networks that model complex patterns | Subset of ML; needs large data & GPU compute |
| Generative AI | AI that creates new content: text, images, code, audio | Subset of DL using foundation models |
| Neural Network | Interconnected layers of nodes that transform input data to predictions | Inspired by the human brain's structure |
| Natural Language Processing (NLP) | ML techniques for understanding and generating human language | Powers chatbots, translation, sentiment analysis |
| Computer Vision (CV) | AI that interprets and understands visual data (images, video) | Powers object detection, facial recognition, OCR |
| Inference | Applying a trained model to new data to generate predictions | Production-time process; distinct from training |
AI ⊃ ML ⊃ Deep Learning ⊃ Generative AI — Generative AI is the most specific subset. Each layer requires the capabilities of the layer above it. Remember: all generative AI is deep learning, but not all deep learning is generative AI.
ML Learning Types & Algorithms
📊 Supervised Learning
Train on labeled data (input → output pairs). Model learns a mapping function.
- Classification: Predict a category (spam/not spam)
- Regression: Predict a continuous value (house price)
- Algorithms: Logistic Regression, SVM, Random Forest, XGBoost, Neural Networks
🔍 Unsupervised Learning
Find hidden patterns in unlabeled data. No predefined output labels.
- Clustering: Group similar data (K-Means, DBSCAN)
- Dimensionality Reduction: Compress features (PCA, t-SNE)
- Anomaly Detection: Find outliers in data
🎮 Reinforcement Learning
Agent learns through trial-and-error by receiving rewards or penalties from its environment.
- Applications: game playing (AlphaGo), robotics, recommendation optimization
- Used in RLHF to align LLMs with human preferences
🤝 Semi-supervised & Self-supervised
Semi-supervised: Small labeled + large unlabeled dataset. Reduces labeling cost.
Self-supervised: Creates labels from data itself (e.g., masking words). How foundation models are pre-trained.
| Task Type | Common Algorithms | Exam Use Cases |
|---|---|---|
| Classification | Logistic Regression, Random Forest, XGBoost, SVM | Fraud detection, spam filter, disease diagnosis |
| Regression | Linear Regression, Ridge, Lasso, Gradient Boosting | Price prediction, demand forecasting |
| Clustering | K-Means, DBSCAN, Hierarchical | Customer segmentation, anomaly detection |
| Recommendation | Collaborative Filtering, Matrix Factorization | Product recommendations, content discovery |
| Time Series | ARIMA, LSTM, Prophet | Sales forecasting, predictive maintenance |
| NLP | BERT, GPT, Transformer models | Sentiment analysis, document classification |
| Computer Vision | CNN, ResNet, YOLO, Vision Transformers | Object detection, medical imaging |
ML Lifecycle & Evaluation Metrics
1. Problem Framing — Define the business problem and success criteria → 2. Data Collection — Gather relevant training data → 3. Data Preparation — Clean, transform, split into train/validation/test → 4. Feature Engineering — Select and transform input variables → 5. Model Training — Fit model on training data → 6. Model Evaluation — Assess on test data using metrics → 7. Deployment — Serve model via API endpoint → 8. Monitoring — Track performance, detect drift, retrain as needed
Accuracy: (TP+TN)/Total — overall correctness. Misleading on imbalanced classes.
Precision: TP/(TP+FP) — of predicted positives, how many are right? Minimize false positives.
Recall (Sensitivity): TP/(TP+FN) — of actual positives, how many did we catch? Minimize false negatives.
F1 Score: 2×(P×R)/(P+R) — harmonic mean of precision and recall. Best for imbalanced classes.
ROC-AUC: Discrimination ability across thresholds (1.0 = perfect classifier).
RMSE: Root Mean Squared Error — penalizes large errors more heavily.
MAE: Mean Absolute Error — average magnitude of errors.
R² Score: Proportion of variance explained by the model (1.0 = perfect).
BLEU: For text generation — measures n-gram overlap with reference text.
ROUGE: For summarization — recall-oriented metric for text overlap.
Perplexity: For language models — measures how well model predicts a sample. Lower = better.
⚠️ Overfitting vs. Underfitting
Overfitting (High Variance): Model memorizes training data; fails on new data. Signs: low train error, high test error.
Fix: regularization (L1/L2), dropout, more training data, cross-validation, early stopping.
Underfitting (High Bias): Model too simple; poor on both train and test.
Fix: increase model complexity, add features, reduce regularization.
📦 Data Concepts
- Training set: Used to fit model parameters (~70–80%)
- Validation set: Tune hyperparameters, early stopping (~10–15%)
- Test set: Final unbiased evaluation (~10–20%)
- Cross-validation: K-fold splits for robust evaluation
- Data augmentation: Artificially expand training data (flip, crop, noise)
- Class imbalance: Fix with oversampling (SMOTE), undersampling, or class weights
🔧 Hyperparameter Tuning
- Hyperparameters: Set before training (learning rate, depth, epochs)
- Parameters: Learned during training (weights, biases)
- Grid Search: Exhaustive search over parameter grid
- Random Search: Random sampling — often better for high dimensions
- Bayesian Optimization: Smart search based on prior results
- SageMaker Automatic Tuning: AWS managed hyperparameter optimization
AWS ML Services
| Service | Purpose | Key Points for Exam |
|---|---|---|
| Amazon SageMaker | Full ML lifecycle platform | Label, build, train, tune, deploy, monitor — covers every step of ML development |
| SageMaker Studio | Integrated ML IDE | Web-based IDE for the entire ML workflow; Jupyter notebooks integrated |
| SageMaker Autopilot | AutoML | Automatically selects algorithms, engineers features, tunes hyperparameters; full transparency |
| SageMaker Canvas | No-code ML | Business users build ML models via drag-and-drop — no coding required |
| SageMaker Ground Truth | Data labeling | Human + automated labeling; active learning reduces labeling cost by up to 70% |
| SageMaker Clarify | Bias & explainability | Detect bias in data and models; SHAP values for feature importance; used pre and post-training |
| SageMaker Model Monitor | Production monitoring | Detect data drift, model quality drift, bias drift; triggers CloudWatch alarms |
| SageMaker Pipelines | ML CI/CD | Automate and reproduce ML workflows; integrates with MLflow |
| SageMaker JumpStart | Pre-built ML solutions | One-click deploy of pre-trained models and ML solution templates |
SageMaker Autopilot = AutoML (no ML expertise needed, full transparency). · Canvas = no-code for business users. · Ground Truth = labeling. · Clarify = bias detection + SHAP explanations. · Model Monitor = drift detection in production. These are the most frequently tested SageMaker sub-services on the exam.
Fundamentals of Generative AI
Domain 2 is the heart of the exam — it covers the concepts, technologies, and techniques that power modern generative AI systems, including LLMs, prompting, and model customization strategies.
Generative AI Fundamentals
Task Statements: 2.1 GenAI concepts · 2.2 Capabilities and limitations of GenAI · 2.3 AWS infrastructure for GenAI
Generative AI Core Concepts
| Concept | Definition | Why It Matters |
|---|---|---|
| Generative AI | AI that creates new content (text, images, code, audio, video) by learning from training data | Different from discriminative AI which only classifies |
| Foundation Model (FM) | Large model trained on broad data at massive scale; adaptable to many tasks | Base for Claude, GPT, Llama, Amazon Titan |
| Large Language Model (LLM) | FM trained on text to understand and generate human language | Powers Q&A, summarization, translation, code gen |
| Transformer Architecture | Attention-based neural network; "Attention Is All You Need" (2017) | Underlying architecture of virtually all modern LLMs |
| Token | Smallest text unit (~¾ of a word); models think in tokens not words | Context window measured in tokens; affects cost |
| Context Window | Maximum tokens (input + output) a model can process at once | Longer context = more expensive but handles more |
| Embeddings | Dense numerical vectors capturing semantic meaning of text/images | Foundation of semantic search and RAG |
| Vector Database | Database storing embedding vectors for fast similarity search | Core infrastructure for RAG pipelines |
| Hallucination | Model confidently generates false or fabricated information | Key limitation; mitigated by RAG and grounding |
| Multimodal Model | Handles multiple input/output types: text, image, audio, video | Claude 3, GPT-4o are examples |
| Diffusion Model | Generates images by learning to reverse a noise-adding process | Stable Diffusion, DALL-E, Amazon Titan Image Gen |
| GAN | Generator + Discriminator in adversarial training to create realistic outputs | Early generative model type; now largely superseded |
⚠️ LLM Limitations to Know
- Hallucination: Generates plausible but false information
- Knowledge cutoff: Training data has a fixed date; lacks real-time knowledge
- Context window limit: Cannot process arbitrarily long documents
- Bias: Reflects biases present in training data
- Prompt sensitivity: Small wording changes can change output significantly
- No persistent memory: Each conversation starts fresh unless stored externally
- Cost: Large models are expensive to run at scale
✅ LLM Capabilities
- Text generation: Creative writing, emails, reports
- Summarization: Condense long documents to key points
- Translation: Cross-language communication
- Code generation: Write, explain, debug code
- Q&A: Answer questions from context or training
- Classification: Categorize text with no labeled training data
- Extraction: Pull structured data from unstructured text
- Conversation: Maintain multi-turn dialogue
Prompt Engineering Techniques
| Technique | Description | Best For |
|---|---|---|
| Zero-shot | No examples provided; model uses learned knowledge directly | General tasks, quick exploration, summarization |
| One-shot | Provide exactly one example in the prompt | When a specific format is needed |
| Few-shot | Provide 2–5 examples in the prompt; model learns pattern from them | Classification, extraction, consistent formatting |
| Chain-of-Thought (CoT) | Ask model to reason step-by-step before answering ("Let's think step by step") | Math, logic, multi-step reasoning tasks |
| ReAct | Reasoning + Acting: model alternates between reasoning and calling external tools | Agentic AI, API calls, web search tasks |
| System Prompt | Persistent instructions that set model behavior, persona, or safety rules | Chatbot personas, enterprise safety policies |
| Role Prompting | "Act as a [role]..." to shape model behavior and expertise | Domain-specific responses, consistent tone |
| Self-consistency | Generate multiple reasoning paths and select the most common answer | Improving accuracy on reasoning tasks |
Temperature: Controls randomness. 0 = deterministic. 1+ = creative/varied. Use low (0–0.3) for factual answers, high (0.7–1.0) for creative writing.
Top-p (Nucleus sampling): Samples from the smallest set of tokens whose cumulative probability ≥ p. Controls diversity.
Top-k: Limits selection to k most likely next tokens. Lower k = more focused output.
Max tokens: Maximum length of the generated response.
Stop sequences: Tokens that trigger the model to stop generating.
Model Customization Approaches
| Approach | What Changes | Cost/Effort | Best For |
|---|---|---|---|
| Prompt Engineering | Input prompt only — no training | Lowest / Fastest | Simple adaptations, format control, quick iteration |
| RAG | Retrieved context injected at inference | Low–Medium | Up-to-date knowledge, reducing hallucinations, proprietary data |
| Fine-tuning | Model weights updated on domain data | Medium / Days | Domain-specific tasks needing consistent style or specialized knowledge |
| Continued Pre-training | Full pre-training on large domain corpus | Very High / Weeks | Adding deep domain knowledge (e.g., medical, legal corpus) |
| RLHF | Weights updated via human feedback reward signal | High | Aligning model behavior with human preferences and values |
| Distillation | Small model learns from large model outputs | Medium | Reducing inference cost; deploying lighter models with similar capability |
Start with Prompt Engineering — free, instant. → If knowledge is stale or proprietary, add RAG. → If you need consistent domain behavior, do Fine-tuning. → If you need deep domain knowledge baked in, use Continued Pre-training. → If model behavior needs to align with human values, use RLHF. Each step increases cost and complexity but improves domain-specific accuracy.
Amazon Bedrock
Amazon Bedrock is a fully managed service that provides access to high-performing foundation models from AWS and leading AI companies through a single API. No infrastructure to provision. Pay per token. Supports fine-tuning, RAG, agents, guardrails, and model evaluation. This is the most important service on the AIF-C01 exam.
🤖 Foundation Models Available
- Anthropic Claude — Versatile, long context, safe
- Meta Llama — Open-weight models
- Mistral — Efficient European models
- Amazon Titan Text — AWS native text models
- Amazon Titan Embeddings — For RAG/semantic search
- Amazon Titan Image Generator — Image synthesis
- AI21 Jurassic — Instruction-following
- Cohere — Enterprise text & embeddings
- Stability AI — Image generation (SDXL)
🛠️ Bedrock Key Features
- Knowledge Bases: Managed RAG — connect S3 data, auto-embed, vector search
- Agents: Orchestrate multi-step agentic workflows with tool use
- Guardrails: Content filtering, PII redaction, topic denial, grounding checks
- Fine-tuning: Customize a base model with your domain data via Bedrock
- Model Evaluation: Automatic (ROUGE, BERTScore) + human evaluation
- Batch Inference: Process large datasets cost-effectively
- Provisioned Throughput: Guaranteed throughput for high-volume production
🆚 Bedrock vs. Amazon Q
- Amazon Bedrock: Developer tool. Build GenAI apps. Requires programming. Access FMs, build RAG pipelines, create agents, apply guardrails.
- Amazon Q Business: Pre-built employee assistant. Connect to company data (SharePoint, Jira, S3). Employees ask questions — no ML expertise or coding needed.
- Amazon Q Developer: AI pair programmer in your IDE. Generates, explains, and debugs code.
💡 Bedrock Guardrails Details
- Content filters: Block harmful categories (hate, violence, sexual, misconduct)
- Denied topics: Block specific subjects (e.g., competitor products)
- Word filters: Block specific words or phrases
- PII redaction: Detect and mask personal information in inputs/outputs
- Grounding: Verify responses are supported by retrieved context
- Applies to: Both input prompts AND model-generated responses
Applications of Foundation Models
Domain 3 is the largest domain — it covers AWS AI services, RAG architecture, agentic AI, and model evaluation. Know every AWS AI service name and its use case.
Foundation Model Applications
Task Statements: 3.1 Design considerations for FM-based apps · 3.2 Capabilities of AWS AI services · 3.3 AWS AI services for specific use cases
AWS AI Services — Complete Reference
| Service | Category | Key Capabilities | Typical Use Case |
|---|---|---|---|
| Amazon Bedrock | GenAI Platform | Access Claude, Llama, Titan; Knowledge Bases (RAG); Agents; Guardrails; Fine-tuning; Model Eval | Build any GenAI application |
| Amazon Q Business | Enterprise GenAI | Connect to 40+ data sources; natural language Q&A over company data; admin controls; IAM integration | Employee knowledge assistant |
| Amazon Q Developer | AI Coding | Code generation, debugging, security scanning, code reviews, CLI, migration assistance | Developer productivity |
| Amazon SageMaker | ML Platform | Full ML lifecycle: label, train, tune, deploy, monitor; JumpStart pre-built solutions; MLflow | Custom ML model development |
| Amazon Comprehend | NLP | Sentiment analysis, entity recognition, key phrase extraction, PII detection, topic modeling, custom classifiers | Text analysis, customer feedback, compliance |
| Amazon Rekognition | Computer Vision | Object & scene detection, facial analysis/comparison, text in images, content moderation, video activity detection | Media moderation, identity verification, surveillance |
| Amazon Textract | Document AI | Extract text, tables, forms, signatures, key-value pairs from PDFs & scanned images | Invoice processing, mortgage documents, medical records |
| Amazon Transcribe | Speech-to-Text | ASR, speaker diarization, custom vocabulary, PII redaction, real-time & batch, multi-language | Call center analytics, meeting transcription, captions |
| Amazon Polly | Text-to-Speech | 60+ voices, 30+ languages, SSML support, neural TTS, real-time & batch, lexicon customization | Accessibility, voice apps, IVR systems |
| Amazon Translate | Translation | Neural MT, 75+ language pairs, custom terminology, batch & real-time, active custom translation | Localization, multilingual support portals |
| Amazon Lex | Conversational AI | NLU + ASR, intent & slot recognition, multi-turn dialogue, Lambda integration, Amazon Connect integration | Customer service chatbots, voice IVR |
| Amazon Personalize | Recommendations | Real-time personalized recommendations, user segmentation, no ML expertise needed, AutoML | E-commerce recommendations, content discovery |
| Amazon Forecast | Time Series | ML-based demand forecasting, AutoML model selection, what-if analysis, explainability | Retail demand planning, capacity forecasting |
| Amazon Kendra | Intelligent Search | ML-powered enterprise search, 40+ connectors, NLP query understanding, semantic ranking | Enterprise knowledge base search, intranet search |
| Amazon Fraud Detector | Fraud Detection | AutoML fraud models, real-time scoring, custom business rules, uses Amazon fraud expertise | Account fraud, payment fraud, fake accounts |
| Amazon Lookout for Vision | Visual Inspection | Detect defects and anomalies in industrial products using computer vision | Manufacturing quality control, defect detection |
| AWS HealthLake | Healthcare AI | Store, transform, and analyze health data in FHIR format; NLP for medical records | Healthcare analytics, clinical insights |
Speech → Text: Transcribe | Text → Speech: Polly | Text → Language: Translate | Text Analysis: Comprehend | Images/Video: Rekognition | Documents/OCR: Textract | Chatbot: Lex | Recommendations: Personalize | Forecasting: Forecast | Enterprise Search: Kendra | GenAI Apps: Bedrock | Employee Q&A: Q Business | Coding Assistant: Q Developer
RAG, Agentic AI & Advanced Patterns
📚 RAG Architecture
Retrieval-Augmented Generation — ground an LLM in external, up-to-date documents without retraining.
Pipeline:
- 1. User submits a query
- 2. Query is converted to an embedding vector
- 3. Vector DB searched for similar document chunks
- 4. Relevant chunks injected into the LLM prompt
- 5. LLM generates a grounded, cited response
AWS: Bedrock Knowledge Bases = managed RAG pipeline
🤖 Bedrock Agents
Orchestrate multi-step agentic workflows:
- Break a complex goal into sub-tasks
- Decide which tool/API to call at each step
- Call AWS Lambda, API Gateway endpoints, or Bedrock Knowledge Bases
- Reason about results and take the next action
- Return a consolidated final answer to the user
Example: "Book a meeting, find my calendar, check availability, send invite"
🧱 Agentic AI Concepts
- Agentic AI: AI that autonomously takes sequences of actions to achieve a goal
- Tool use: Model can call external APIs, search the web, run code
- ReAct: Reasoning + Acting framework for agents
- Multi-agent: Multiple specialized agents collaborating
- Memory: Agents can use short-term (conversation) and long-term (stored) memory
- Planning: Agent decomposes goal into executable steps
🔌 Vector Databases on AWS
- Amazon OpenSearch Serverless — vector search built-in; default for Bedrock KB
- Amazon Aurora (pgvector) — vector extension for PostgreSQL
- Amazon RDS (pgvector) — PostgreSQL vector support
- Amazon MemoryDB — Redis-compatible, vector search
- Third-party: Pinecone, Weaviate, Chroma also supported
- Used for: Semantic search, RAG retrieval, recommendation systems
Prompt Engineering: No retraining, fast, cheap. Works when the model already knows what it needs. · RAG: No retraining, keeps knowledge current, adds proprietary data. Best for frequently changing information. · Fine-tuning: Retrains model on your data. Best for consistent style, tone, or specialized knowledge baked into weights. · The exam frequently asks: "Company needs to add proprietary documents to a GenAI app without retraining → RAG (Bedrock Knowledge Bases)"
FM Evaluation & Selection
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) — measures overlap between generated and reference text. Used for summarization.
BLEU (Bilingual Evaluation Understudy) — measures n-gram precision. Used for machine translation.
BERTScore — uses BERT embeddings to measure semantic similarity between generated and reference text. Better for nuanced comparison.
Perplexity — measures how well an LLM predicts a text sample. Lower perplexity = better language model.
For dimensions that automated metrics miss:
Factual accuracy — Is the output factually correct?
Relevance — Does the response address the question?
Coherence — Is the response logically structured?
Harmlessness — Is the output safe and appropriate?
Helpfulness — Does it accomplish the user's goal?
Bedrock Model Evaluation supports both automated metrics and human evaluation through Amazon Mechanical Turk or private reviewers.
| FM Selection Factor | Consideration |
|---|---|
| Model size (parameters) | Larger = more capable but more expensive and slower. Match size to task complexity. |
| Context window size | Longer context needed for documents, conversations. Affects cost per inference. |
| Modality | Text-only vs. multimodal (image+text). Choose based on input/output types needed. |
| Fine-tuning support | Not all Bedrock models support fine-tuning. Check availability per model. |
| Cost per token | Input tokens vs. output tokens priced differently. Estimate based on expected usage. |
| Latency | Real-time vs. batch use cases. Provisioned Throughput for consistent low latency. |
| Compliance requirements | Data residency, HIPAA, GDPR. Verify which models/regions meet requirements. |
Guidelines for Responsible AI
Domain 4 covers the ethical and societal dimensions of AI — ensuring systems are fair, transparent, safe, and accountable. These principles are increasingly required by regulation and expected by users.
Responsible AI
Task Statements: 4.1 Responsible AI features · 4.2 Transparent and explainable models · 4.3 Responsible AI practices and limitations
Responsible AI Principles & Bias
⚖️ Core Responsible AI Dimensions
- Fairness: Treat all individuals equitably; no discriminatory outcomes
- Transparency: Stakeholders understand how AI decisions are made
- Privacy: Protect personal data; data minimization; anonymization
- Safety: Prevent harm from AI outputs; test edge cases
- Truthfulness: AI only asserts things it believes to be true
- Robustness: Reliable across diverse inputs; resistant to attacks
- Human Oversight: Humans in loop for high-stakes decisions
- Accountability: Clear ownership of AI outcomes
⚠️ Types of AI Bias
- Data Bias: Training data doesn't represent real-world distribution. Fix: diverse, balanced datasets.
- Algorithmic Bias: Model amplifies biases in training data. Fix: fairness constraints, debiasing.
- Societal Bias: Historical discrimination encoded in data (e.g., biased hiring records).
- Measurement Bias: Flawed metrics or proxies that misrepresent outcomes.
- Confirmation Bias: System reinforces existing beliefs without challenging them.
- Representation Bias: Underrepresentation of certain groups in training data.
🔍 Explainability (XAI)
- Explainability: Providing interpretable reasons for individual predictions
- SHAP values: Shapley Additive exPlanations — feature contribution scores
- LIME: Local Interpretable Model-agnostic Explanations
- Attention visualization: Show which input tokens the model focused on
- SageMaker Clarify: AWS tool for bias detection + SHAP explanations
- Model Cards: Documentation of intended use, limitations, fairness evaluations
- AWS AI Service Cards: Published for each AWS AI service
👥 Human-Centered AI
- Human-in-the-loop (HITL): Human reviews AI decisions before action
- Human-on-the-loop: Human monitors AI autonomously but can intervene
- Human oversight is critical for high-stakes: healthcare, legal, hiring, criminal justice
- Disparate impact: AI outcomes disproportionately affecting protected groups
- Right to explanation: GDPR grants individuals right to explain AI decisions about them
- Accessibility: AI tools must be usable by people with disabilities
NIST AI RMF: Voluntary framework — Govern, Map, Measure, Manage AI risks. Widely adopted in enterprise.
EU AI Act: Risk-based regulation: Unacceptable (banned) → High (heavily regulated) → Limited → Minimal risk.
ISO 42001: International standard for AI Management Systems.
AWS Responsible AI: AWS has a set of eight responsible AI dimensions embedded in all AWS AI services.
AWS Responsible AI Tools
| Tool / Service | What It Does | When to Use |
|---|---|---|
| SageMaker Clarify | Detects bias in training data and model predictions. Provides SHAP feature importance explanations. | Pre-training bias check, post-training evaluation, production monitoring |
| SageMaker Model Monitor | Monitors deployed models for data drift, model quality drift, and bias drift in production | Ongoing production fairness and quality monitoring |
| Bedrock Guardrails | Applies safety and content policies to FM inputs and outputs: harmful content, PII, topics, grounding | Any production Bedrock application needing safety controls |
| AWS AI Service Cards | Responsible AI documentation for each AWS AI service — intended use, limitations, performance, design choices | Due diligence before deploying AWS AI services in sensitive contexts |
| Amazon Augmented AI (A2I) | Add human review workflows to AI predictions; integrate with Rekognition, Textract, or custom models | When AI confidence is low, high-stakes decisions need human review |
| SageMaker Ground Truth | Quality human labeling with active learning to create unbiased, representative training datasets | Building fair training data; reducing labeling bias |
Security, Compliance & Governance for AI
Domain 5 covers protecting AI systems from attacks, ensuring compliance with regulations, and applying AWS governance tools to AI workloads. Know the attack types and their mitigations.
Security, Compliance & Governance
Task Statements: 5.1 Security and privacy of AI systems · 5.2 Governance and compliance for AI
AI Security Threats & Attack Types
| Attack Type | Description | Mitigation |
|---|---|---|
| Prompt Injection | Malicious input overrides system prompt to hijack model behavior (e.g., "Ignore all previous instructions and...") | Bedrock Guardrails, input validation, least-privilege system prompts, output filtering |
| Indirect Prompt Injection | Malicious instructions embedded in external content (web pages, documents) that the model retrieves and follows | Content sanitization before injection into prompts; grounding checks |
| Data Poisoning | Attacker corrupts the training dataset to manipulate model behavior or embed backdoors | Data provenance tracking, anomaly detection in training data, data validation pipelines |
| Model Inversion Attack | Repeated querying to reconstruct sensitive training data from model outputs | Differential privacy during training, rate limiting, output perturbation |
| Model Extraction / Theft | Attacker clones a proprietary model by systematically querying it | API rate limiting, output watermarking, query monitoring |
| Adversarial Examples | Carefully crafted inputs that cause model to make wrong predictions (imperceptible to humans) | Adversarial training, input preprocessing, ensemble defenses |
| Sensitive Data Exposure | Model leaks PII or confidential data from training in its responses | PII scrubbing from training data, Bedrock Guardrails PII redaction, Amazon Comprehend PII detection |
Prompt injection is the most commonly tested AI security attack on the AIF-C01 exam. Know the definition (overriding the system prompt), the variant (indirect injection via retrieved content), and the mitigations: Bedrock Guardrails for runtime protection, and system prompt hardening (explicit instructions not to follow user instructions that override the system).
AWS AI Security & Governance Controls
| Control | Service | What It Provides |
|---|---|---|
| Content safety | Bedrock Guardrails | Block harmful content, PII, denied topics, grounding validation on all FM I/O |
| Network isolation | VPC + PrivateLink | Route Bedrock/SageMaker API calls within AWS private network; no public internet exposure |
| Encryption at rest | AWS KMS | Encrypt training data, model artifacts, SageMaker notebooks, Bedrock fine-tuned models |
| Encryption in transit | TLS/HTTPS | All API calls to AI services encrypted in transit by default |
| Access control | IAM | Role-based access to Bedrock, SageMaker, Q; least privilege; resource-based policies |
| Audit logging | CloudTrail | Record all API calls to Bedrock, SageMaker, AI services — who, when, what |
| PII in text | Amazon Comprehend | Detect and redact PII in text documents and application data |
| PII in S3 | Amazon Macie | Discover and protect sensitive data stored in S3 training datasets |
| PII in audio | Amazon Transcribe | Automatically redact PII from speech transcriptions |
| Model drift monitoring | SageMaker Model Monitor | Detect data/model/bias drift in production; alert when performance degrades |
| Resource governance | AWS Config | Track AI resource configuration changes; ensure compliance with policies |
AWS is responsible for: Physical security of data centers, hypervisor security, managed service internals, underlying infrastructure, network security of AWS backbone.
You are responsible for: Your prompts and system prompt design, fine-tuning data quality and security, application code, IAM access controls, output handling and filtering, compliance with data regulations for your data.
GDPR/CCPA: Data privacy regulations; right to erasure; consent management. Use data minimization for AI training.
HIPAA: Healthcare data. AWS has HIPAA-eligible AI services; sign BAA with AWS.
AWS Artifact: Download AWS compliance reports (SOC 2, ISO 27001, FedRAMP).
AWS Config: Enforce compliance rules for AI resource configurations.
NIST AI RMF: Voluntary risk framework widely used in enterprise AI governance.
100 Practice Questions
AIF-C01 — covering all five domains with detailed explanations