A company wants to predict the sale prices of houses based on available historical sales data. The target variable in the company’s dataset is the sale price. The features include parameters such as the lot size, living area measurements, non-living area measurements, number of bedrooms, number of bathrooms, year built, and postal code. The company wants to use multi-variable linear regression to predict house sale prices. Which step should a machine learning specialist take to remove features that are irrelevant for the analysis and reduce the model’s complexity?
A. Plot a histogram of the features and compute their standard deviation. Remove features with high variance.
B. Plot a histogram of the features and compute their standard deviation. Remove features with low variance.
C. Build a heatmap showing the correlation of the dataset against itself. Remove features with low mutual correlation scores.
D. Run a correlation check of all features against the target variable. Remove features with low target variable correlation scores.
Explanation: Feature selection is the process of reducing the number of input variables to those that are most relevant for predicting the target variable. One way to do this is to run a correlation check of all features against the target variable and remove features with low target variable correlation scores. This means that these features have little or no linear relationship with the target variable and are not useful for the prediction. This can reduce the model’s complexity and improve its performance.
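For illustration, a feature-versus-target correlation check can be run with pandas; the file name, column names, and the 0.1 cutoff below are assumptions, not part of the question:

```python
import pandas as pd

# Hypothetical housing dataset with the features described in the question.
df = pd.read_csv("house_sales.csv")  # assumed file and column names

# Pearson correlation of every numeric feature against the target.
correlations = df.corr(numeric_only=True)["sale_price"].drop("sale_price")

# Keep only features whose absolute correlation with the target clears a threshold.
threshold = 0.1  # illustrative cutoff; tune for the actual data
selected = correlations[correlations.abs() >= threshold].index.tolist()
dropped = sorted(set(correlations.index) - set(selected))
print("Features to remove:", dropped)
```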
An interactive online dictionary wants to add a widget that displays words used in similar contexts. A Machine Learning Specialist is asked to provide word features for the downstream nearest neighbor model powering the widget. What should the Specialist do to meet these requirements?
A. Create one-hot word encoding vectors.
B. Produce a set of synonyms for every word using Amazon Mechanical Turk.
C. Create word embedding vectors that store edit distance with every other word.
D. Download word embeddings pre-trained on a large corpus.
Explanation: Word embeddings are a type of dense representation of words, which encode semantic meaning in a vector form. These embeddings are typically pre-trained on a large corpus of text data, such as a large set of books, news articles, or web pages, and capture the context in which words are used. Word embeddings can be used as features for a nearest neighbor model, which can be used to find words used in similar contexts. Downloading pre-trained word embeddings is a good way to get started quickly and leverage the strengths of these representations, which have been optimized on a large amount of data. This is likely to result in more accurate and reliable features than other options like one-hot encoding, edit distance, or using Amazon Mechanical Turk to produce synonyms.
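As a rough sketch of this approach, pre-trained embeddings can be pulled down and queried for nearest neighbors with gensim; the model name below is one of gensim's bundled downloads and the query word is arbitrary:

```python
import gensim.downloader as api

# Load GloVe vectors pre-trained on a large corpus (requires gensim and
# network access; "glove-wiki-gigaword-100" is one of the bundled models).
vectors = api.load("glove-wiki-gigaword-100")

# The same vectors serve as features for a nearest-neighbor lookup of words
# used in similar contexts.
print(vectors.most_similar("dictionary", topn=5))
```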
A technology startup is using complex deep neural networks and GPU compute to
recommend the company’s products to its existing customers based upon each customer’s
habits and interactions. The solution currently pulls each dataset from an Amazon S3
bucket before loading the data into a TensorFlow model pulled from the company’s Git
repository that runs locally. This job then runs for several hours while continually outputting
its progress to the same S3 bucket. The job can be paused, restarted, and continued at
any time in the event of a failure, and is run from a central queue.
Senior managers are concerned about the complexity of the solution’s resource
management and the costs involved in repeating the process regularly. They ask for the
workload to be automated so it runs once a week, starting Monday and completing by the
close of business Friday.
Which architecture should be used to scale the solution at the lowest cost?
A. Implement the solution using AWS Deep Learning Containers and run the container as a job using AWS Batch on a GPU-compatible Spot Instance
B. Implement the solution using a low-cost GPU-compatible Amazon EC2 instance and use the AWS Instance Scheduler to schedule the task
C. Implement the solution using AWS Deep Learning Containers, run the workload using AWS Fargate running on Spot Instances, and then schedule the task using the built-in task scheduler
D. Implement the solution using Amazon ECS running on Spot Instances and schedule the task using the ECS service scheduler
Explanation: The best architecture to scale the solution at the lowest cost is to implement the solution using AWS Deep Learning Containers and run the container as a job using AWS Batch on a GPU-compatible Spot Instance. AWS Batch manages the compute resources and the job queue, which removes the resource-management burden the managers are concerned about, and Spot Instances minimize the cost of the GPU compute. Because the job already checkpoints its progress to Amazon S3 and can be paused and resumed, it tolerates Spot interruptions, and the weekly run can be started on a schedule so it completes by the close of business Friday.
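A minimal sketch of submitting such a job with boto3 follows; the queue name, job definition, and retry count are illustrative assumptions, and the queue is presumed to be backed by a GPU-compatible Spot compute environment:

```python
import boto3

batch = boto3.client("batch")

# Submit the weekly training container to an AWS Batch queue backed by
# GPU-compatible Spot Instances (names are placeholders).
response = batch.submit_job(
    jobName="weekly-recommendation-training",
    jobQueue="gpu-spot-queue",
    jobDefinition="dl-container-training:1",
    retryStrategy={"attempts": 3},  # job resumes from its S3 checkpoints on Spot interruption
)
print("Submitted job:", response["jobId"])
```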
A company wants to conduct targeted marketing to sell solar panels to homeowners. The
company wants to use machine learning (ML) technologies to identify which houses
already have solar panels. The company has collected 8,000 satellite images as training
data and will use Amazon SageMaker Ground Truth to label the data.
The company has a small internal team that is working on the project. The internal team
has no ML expertise and no ML experience.
Which solution will meet these requirements with the LEAST amount of effort from the
internal team?
A. Set up a private workforce that consists of the internal team. Use the private workforce and the SageMaker Ground Truth active learning feature to label the data. Use Amazon Rekognition Custom Labels for model training and hosting.
B. Set up a private workforce that consists of the internal team. Use the private workforce to label the data. Use Amazon Rekognition Custom Labels for model training and hosting.
C. Set up a private workforce that consists of the internal team. Use the private workforce and the SageMaker Ground Truth active learning feature to label the data. Use the SageMaker Object Detection algorithm to train a model. Use SageMaker batch transform for inference.
D. Set up a public workforce. Use the public workforce to label the data. Use the SageMaker Object Detection algorithm to train a model. Use SageMaker batch transform for inference.
Explanation: Solution A meets the requirements with the least amount of effort from the internal team because it relies on fully managed services. The team sets up a private workforce from its own members, uses the SageMaker Ground Truth active learning feature so that the service automatically labels the straightforward images and routes only the ambiguous ones to the human labelers, and then uses Amazon Rekognition Custom Labels to train and host the model without requiring any ML expertise or infrastructure management.
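A hedged sketch of the Rekognition Custom Labels calls is shown below, assuming the Ground Truth output manifest is already in S3; the project, bucket, and manifest names are placeholders:

```python
import boto3

rekognition = boto3.client("rekognition")

# Create a Custom Labels project and start training from the Ground Truth
# labeling output (bucket and manifest names are assumptions).
project = rekognition.create_project(ProjectName="solar-panel-detection")

rekognition.create_project_version(
    ProjectArn=project["ProjectArn"],
    VersionName="v1",
    OutputConfig={"S3Bucket": "company-ml-bucket", "S3KeyPrefix": "output/"},
    TrainingData={
        "Assets": [{
            "GroundTruthManifest": {
                "S3Object": {"Bucket": "company-ml-bucket", "Name": "train.manifest"}
            }
        }]
    },
    TestingData={"AutoCreate": True},  # let the service split off a test set
)
```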
A data scientist is building a linear regression model. The scientist inspects the dataset and notices that the mode of the distribution is lower than the median, and the median is lower than the mean. Which data transformation will give the data scientist the ability to apply a linear regression model?
A. Exponential transformation
B. Logarithmic transformation
C. Polynomial transformation
D. Sinusoidal transformation
Explanation: A logarithmic transformation is a suitable data transformation for a linear regression model when the data has a skewed distribution, such as when the mode is lower than the median and the median is lower than the mean. A logarithmic transformation can reduce the skewness and make the data more symmetric and normally distributed, which are desirable properties for linear regression. A logarithmic transformation can also reduce the effect of outliers and heteroscedasticity (unequal variance) in the data. An exponential transformation would have the opposite effect of increasing the skewness and making the data more asymmetric. A polynomial transformation may not be able to capture the nonlinearity in the data and may introduce multicollinearity among the transformed variables. A sinusoidal transformation is not appropriate for data that does not have a periodic pattern.
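A quick illustration with synthetic right-skewed data (the lognormal parameters are arbitrary) shows how a log transformation pulls the mean back toward the median:

```python
import numpy as np

# Synthetic right-skewed data: mode < median < mean, as in the question.
rng = np.random.default_rng(42)
prices = rng.lognormal(mean=12, sigma=0.8, size=1_000)

# log1p compresses the long right tail and makes the distribution more symmetric.
log_prices = np.log1p(prices)

print(f"raw:  mean={prices.mean():,.0f}  median={np.median(prices):,.0f}")
print(f"log:  mean={log_prices.mean():.2f}  median={np.median(log_prices):.2f}")
```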
A manufacturing company needs to identify returned smartphones that have been
damaged by moisture. The company has an automated process that produces 2,000
diagnostic values for each phone. The database contains more than five million phone
evaluations. The evaluation process is consistent, and there are no missing values in the
data. A machine learning (ML) specialist has trained an Amazon SageMaker linear learner
ML model to classify phones as moisture damaged or not moisture damaged by using all
available features. The model's F1 score is 0.6.
What changes in model training would MOST likely improve the model's F1 score? (Select
TWO.)
A. Continue to use the SageMaker linear learner algorithm. Reduce the number of features with the SageMaker principal component analysis (PCA) algorithm.
B. Continue to use the SageMaker linear learner algorithm. Reduce the number of features with the scikit-learn multi-dimensional scaling (MDS) algorithm.
C. Continue to use the SageMaker linear learner algorithm. Set the predictor type to regressor.
D. Use the SageMaker k-means algorithm with k of less than 1,000 to train the model.
E. Use the SageMaker k-nearest neighbors (k-NN) algorithm. Set a dimension reduction target of less than 1,000 to train the model.
Explanation:
Option A is correct because reducing the number of features with the SageMaker PCA algorithm can help remove noise and redundancy from the data and improve the model's performance. PCA is a dimensionality reduction technique that transforms the original features into a smaller set of linearly uncorrelated features called principal components. SageMaker provides PCA as a separate built-in algorithm, so the 2,000 diagnostic values can be projected into a lower-dimensional space before the linear learner model is trained.
Option E is correct because using the SageMaker k-NN algorithm with a
dimension reduction target of less than 1,000 can help the model learn from the
similarity of the data points, and improve the model’s performance. k-NN is a nonparametric
algorithm that classifies an input based on the majority vote of its k
nearest neighbors in the feature space. The SageMaker k-NN algorithm supports
dimension reduction as a built-in feature transformation option.
Option B is incorrect because using the scikit-learn MDS algorithm to reduce the
number of features is not a feasible option, as MDS is a computationally expensive
technique that does not scale well to large datasets. MDS is a dimensionality
reduction technique that tries to preserve the pairwise distances between the
original data points in a lower-dimensional space.
Option C is incorrect because setting the predictor type to regressor would change
the model’s objective from classification to regression, which is not suitable for the
given problem. A regressor model would output a continuous value instead of a
binary label for each phone.
Option D is incorrect because using the SageMaker k-means algorithm with k of
less than 1,000 would not help the model classify the phones, as k-means is a
clustering algorithm that groups the data points into k clusters based on their
similarity, without using any labels. A clustering model would not output a binary
label for each phone.
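For option E, a hedged sketch using the SageMaker Python SDK is shown below; the role ARN, instance type, S3 path, and hyperparameter values are assumptions chosen only to illustrate the dimension-reduction setting:

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
region = session.boto_region_name

# Built-in k-NN algorithm image for the current region.
image_uri = sagemaker.image_uris.retrieve("knn", region, version="1")

knn = Estimator(
    image_uri=image_uri,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role ARN
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    sagemaker_session=session,
)

# Classify phones while projecting the 2,000 diagnostic values down to
# fewer than 1,000 dimensions inside the algorithm.
knn.set_hyperparameters(
    feature_dim=2000,
    k=10,
    sample_size=500000,
    predictor_type="classifier",
    dimension_reduction_type="sign",
    dimension_reduction_target=500,
)

knn.fit({"train": TrainingInput("s3://company-bucket/phone-diagnostics/train/",
                                content_type="text/csv")})
```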
A company is setting up an Amazon SageMaker environment. The corporate data security policy does not allow communication over the internet. How can the company enable the Amazon SageMaker service without enabling direct internet access to Amazon SageMaker notebook instances?
A. Create a NAT gateway within the corporate VPC.
B. Route Amazon SageMaker traffic through an on-premises network.
C. Create Amazon SageMaker VPC interface endpoints within the corporate VPC.
D. Create VPC peering with Amazon VPC hosting Amazon SageMaker.
Explanation: To enable the Amazon SageMaker service without enabling direct internet access to Amazon SageMaker notebook instances, the company should create Amazon SageMaker VPC interface endpoints within the corporate VPC. A VPC interface endpoint enables private connections between the VPC and supported AWS services without requiring an internet gateway, a NAT device, a VPN connection, or an AWS Direct Connect connection. The instances in the VPC do not need to connect to the public internet in order to communicate with the Amazon SageMaker service. The VPC interface endpoint connects the VPC directly to the Amazon SageMaker service using AWS PrivateLink, which ensures that the traffic between the VPC and the service does not leave the AWS network.
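A hedged sketch with boto3 is shown below; the VPC, subnet, and security group IDs and the region are placeholders, and separate endpoints are created for the SageMaker API and runtime services:

```python
import boto3

ec2 = boto3.client("ec2")

# Interface endpoints (AWS PrivateLink) for the SageMaker API and runtime,
# so notebook and training traffic stays inside the AWS network.
for service in ("sagemaker.api", "sagemaker.runtime"):
    ec2.create_vpc_endpoint(
        VpcEndpointType="Interface",
        VpcId="vpc-0123456789abcdef0",            # placeholder corporate VPC
        ServiceName=f"com.amazonaws.us-east-1.{service}",
        SubnetIds=["subnet-0123456789abcdef0"],   # placeholder private subnet
        SecurityGroupIds=["sg-0123456789abcdef0"],
        PrivateDnsEnabled=True,
    )
```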
A Machine Learning Specialist is building a prediction model for a large number of features using linear models, such as linear regression and logistic regression. During exploratory data analysis, the Specialist observes that many features are highly correlated with each other, which may make the model unstable. What should be done to reduce the impact of having such a large number of features?
A. Perform one-hot encoding on highly correlated features
B. Use matrix multiplication on highly correlated features.
C. Create a new feature space using principal component analysis (PCA)
D. Apply the Pearson correlation coefficient
Explanation: Principal component analysis (PCA) is an unsupervised machine learning algorithm that attempts to reduce the dimensionality (number of features) within a dataset while still retaining as much information as possible. This is done by finding a new set of features called components, which are composites of the original features that are uncorrelated with one another. They are also constrained so that the first component accounts for the largest possible variability in the data, the second component the second most variability, and so on. By using PCA, the impact of having a large number of features that are highly correlated with each other can be reduced, as the new feature space will have fewer dimensions and less redundancy. This can make the linear models more stable and less prone to overfitting.
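The effect can be sketched with scikit-learn; the synthetic data and the 95% variance target are illustrative choices, not part of the question:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic design matrix whose second half of columns nearly duplicates the
# first half, i.e. highly correlated features.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 5))
X = np.hstack([base, base + rng.normal(scale=0.01, size=base.shape)])
y = (base[:, 0] > 0).astype(int)

# Standardize, then project onto uncorrelated principal components that retain
# 95% of the variance before fitting the linear model.
model = make_pipeline(StandardScaler(), PCA(n_components=0.95), LogisticRegression())
model.fit(X, y)
print("Components kept:", model.named_steps["pca"].n_components_)
```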
A company wants to use machine learning (ML) to improve its customer churn prediction
model. The company stores data in an Amazon Redshift data warehouse.
A data science team wants to use Amazon Redshift machine learning (Amazon Redshift
ML) to build a model and run predictions for new data directly within the data warehouse.
Which combination of steps should the company take to use Amazon Redshift ML to meet
these requirements? (Select THREE.)
A. Define the feature variables and target variable for the churn prediction model.
B. Use the SQL EXPLAIN_MODEL function to run predictions.
C. Write a CREATE MODEL SQL statement to create a model.
D. Use Amazon Redshift Spectrum to train the model.
E. Manually export the training data to Amazon S3.
F. Use the SQL prediction function to run predictions.
Explanation: Amazon Redshift ML enables in-database machine learning model creation and predictions, allowing data scientists to leverage Redshift for model training without needing to manually export data. To create and run a churn prediction model in Amazon Redshift ML, the team should define the feature variables and the target variable for the model (A), write a CREATE MODEL SQL statement (C), which prompts Redshift ML to export the training data and invoke Amazon SageMaker behind the scenes, and then call the SQL prediction function that the trained model generates to score new data directly within the data warehouse (F).
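A hedged sketch is shown below, issuing the SQL through the Redshift Data API from Python; the cluster, database, table, column names, IAM role, and S3 bucket are all assumptions:

```python
import boto3

rsd = boto3.client("redshift-data")

# CREATE MODEL trains the model from within the warehouse; Redshift ML handles
# the data export and SageMaker training behind the scenes.
create_model_sql = """
CREATE MODEL churn_model
FROM (SELECT age, tenure_months, monthly_spend, churned FROM customer_activity)
TARGET churned
FUNCTION predict_churn
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftMLRole'
SETTINGS (S3_BUCKET 'redshift-ml-artifacts');
"""
rsd.execute_statement(
    ClusterIdentifier="analytics-cluster", Database="dev", DbUser="awsuser",
    Sql=create_model_sql,
)

# Once training completes, the generated SQL function scores new rows in place.
rsd.execute_statement(
    ClusterIdentifier="analytics-cluster", Database="dev", DbUser="awsuser",
    Sql="SELECT customer_id, predict_churn(age, tenure_months, monthly_spend) "
        "AS churn_prediction FROM new_customers;",
)
```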
A Machine Learning Specialist is training a model to identify the make and model of
vehicles in images The Specialist wants to use transfer learning and an existing model
trained on images of general objects. The Specialist collated a large custom dataset of
pictures containing different vehicle makes and models.
What should the Specialist do to initialize the model to re-train it with the custom data?
A. Initialize the model with random weights in all layers including the last fully connected layer
B. Initialize the model with pre-trained weights in all layers and replace the last fully connected layer.
C. Initialize the model with random weights in all layers and replace the last fully connected layer
D. Initialize the model with pre-trained weights in all layers including the last fully connected layer
Explanation: Transfer learning is a technique that allows us to use a model trained for a certain task as a starting point for a machine learning model for a different task. For image classification, a common practice is to use a pre-trained model that was trained on a large and general dataset, such as ImageNet, and then customize it for the specific task. One way to customize the model is to replace the last fully connected layer, which is responsible for the final classification, with a new layer that has the same number of units as the number of classes in the new task. This way, the model can leverage the features learned by the previous layers, which are generic and useful for many image recognition tasks, and learn to map them to the new classes. The new layer can be initialized with random weights, and the rest of the model can be initialized with the pre-trained weights. This method is also known as feature extraction, as it extracts meaningful features from the pretrained model and uses them for the new task.
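A minimal PyTorch sketch of this initialization (option B) follows; the backbone choice and the class count of 196 are assumptions for illustration:

```python
import torch.nn as nn
from torchvision import models

# Load a network pre-trained on ImageNet, keeping its learned weights.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Replace only the last fully connected layer with a randomly initialized head
# sized for the vehicle make/model classes (196 is an assumed class count).
num_classes = 196
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Optionally freeze the pre-trained layers so only the new head trains at first.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc.")
```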
A company wants to create a data repository in the AWS Cloud for machine learning (ML)
projects. The company wants to use AWS to perform complete ML lifecycles and wants to
use Amazon S3 for the data storage. All of the company’s data currently resides on
premises and is 40 TB in size.
The company wants a solution that can transfer and automatically update data between the
on-premises object storage and Amazon S3. The solution must support encryption,
scheduling, monitoring, and data integrity validation.
Which solution meets these requirements?
A. Use the S3 sync command to compare the source S3 bucket and the destination S3 bucket. Determine which source files do not exist in the destination S3 bucket and which source files were modified.
B. Use AWS Transfer for FTPS to transfer the files from the on-premises storage to Amazon S3.
C. Use AWS DataSync to make an initial copy of the entire dataset. Schedule subsequent incremental transfers of changing data until the final cutover from on premises to AWS.
D. Use S3 Batch Operations to pull data periodically from the on-premises storage. Enable S3 Versioning on the S3 bucket to protect against accidental overwrites.
Explanation: The best solution to meet the requirements of the company is to use AWS DataSync to make an initial copy of the entire dataset, and schedule subsequent incremental transfers of changing data until the final cutover from on premises to AWS.
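A hedged boto3 sketch of the recurring task is below; the location ARNs, task name, and cron expression are placeholders, and the source and destination locations are assumed to exist already:

```python
import boto3

datasync = boto3.client("datasync")

# Recurring DataSync task: after the initial full copy, only changed objects
# are transferred, with integrity verification and a weekly schedule.
datasync.create_task(
    SourceLocationArn="arn:aws:datasync:us-east-1:123456789012:location/loc-onprem",
    DestinationLocationArn="arn:aws:datasync:us-east-1:123456789012:location/loc-s3",
    Name="onprem-to-s3-sync",
    Options={
        "VerifyMode": "POINT_IN_TIME_CONSISTENT",  # data integrity validation
        "TransferMode": "CHANGED",                 # incremental transfers only
    },
    Schedule={"ScheduleExpression": "cron(0 2 ? * SUN *)"},  # weekly run
)
```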
While working on a neural network project, a Machine Learning Specialist discovers that some features in the data have very high magnitude, resulting in this data being weighted more in the cost function. What should the Specialist do to ensure better convergence during backpropagation?
A. Dimensionality reduction
B. Data normalization
C. Model regularization
D. Data augmentation for the minority class
Explanation: Data normalization is a data preprocessing technique that scales the features to a common range, such as [0, 1] or [-1, 1]. This helps reduce the impact of features with high magnitude on the cost function and improves convergence during backpropagation. Data normalization can be done using different methods, such as min-max scaling, z-score standardization, or unit vector normalization. Data normalization is different from dimensionality reduction, which reduces the number of features; model regularization, which adds a penalty term to the cost function to prevent overfitting; and data augmentation, which increases the amount of data by creating synthetic samples.
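A short illustration of both scaling methods with scikit-learn (the toy matrix is arbitrary):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two features on very different scales; the large-magnitude column would
# otherwise dominate the cost function and slow gradient descent.
X = np.array([[50_000.0, 0.2],
              [82_000.0, 0.5],
              [61_000.0, 0.9]])

X_minmax = MinMaxScaler().fit_transform(X)    # each feature rescaled to [0, 1]
X_zscore = StandardScaler().fit_transform(X)  # zero mean, unit variance per feature

print(X_minmax)
print(X_zscore)
```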