What does adding missed labels help improve in UiPath Communications Mining?
A. Label bias warnings.
B. Increases data security.
C. Increases the taxonomy coverage.
D. Label precision and recall.
Explanation: Adding missed labels helps improve the label precision and recall in UiPath Communications Mining. Precision is the percentage of correctly labeled verbatims out of all the verbatims that have the label applied, while recall is the percentage of correctly labeled verbatims out of all the verbatims that should have the label applied. By adding missed labels, you are increasing the recall of the label, as you are reducing the number of false negatives (verbatims that should have the label but do not). This also improves the precision of the label, as you are reducing the noise in the data and making the label more informative and consistent. Adding missed labels is one of the recommended actions that the platform suggests to improve the model rating and performance of the labels.
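Since both metrics reduce to counts of true positives, false positives, and false negatives, they are easy to express directly. A minimal sketch in plain Python (not a platform API), with illustrative counts:

```python
# Minimal sketch: precision and recall for a single label,
# computed from labeled-verbatim counts. Counts are hypothetical.

def precision(true_positives: int, false_positives: int) -> float:
    """Share of verbatims carrying the label that actually deserve it."""
    return true_positives / (true_positives + false_positives)

def recall(true_positives: int, false_negatives: int) -> float:
    """Share of verbatims that deserve the label and actually carry it."""
    return true_positives / (true_positives + false_negatives)

# Example: 80 correct applications, 10 wrong ones, 20 missed verbatims.
print(precision(80, 10))  # 0.888... -> ~89% precision
print(recall(80, 20))     # 0.8      -> 80% recall; adding the 20 missed
                          # labels drives false negatives down, recall up
```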
What is the recommended split of documents for training and evaluation, considering a total of 15 documents per vendor?
A. 7 documents for training the model, and 8 for evaluating the model.
B. 8 documents for training the model, and 7 for evaluating the model.
C. 10 documents for training the model, and 5 for evaluating the model.
D. 12 documents for training the model, and 3 for evaluating the model.
Explanation: When you create a training dataset for document classification or data extraction, you need to split your documents into two subsets: one for training the model and one for evaluating the model. The training subset is used to teach the model how to recognize the patterns and features of your document types and fields. The evaluation subset is used to measure the performance and accuracy of the model on unseen data. The evaluation subset should not be used for training, as this would bias the model and overfit it to the data.
The recommended split of documents for training and evaluation depends on the size and diversity of your data. However, a general guideline is to use a 70/30 or 80/20 ratio, where 70% or 80% of the documents are used for training and 30% or 20% are used for evaluation. This ensures that the model has enough data to learn from and enough data to test on. For example, if you have 15 documents per vendor, you can use 10 documents for training and 5 documents for evaluation. This would give you a 67/33 split, which is close to the 70/30 ratio. You can also use the Data Manager tool to create and manage your training and evaluation datasets.
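As a rough sketch of the arithmetic, assuming 15 hypothetical per-vendor files and plain Python rather than the Data Manager tool:

```python
# Minimal sketch: splitting a per-vendor document set roughly 70/30.
# File names are hypothetical.
import random

documents = [f"vendor_a_invoice_{i:02d}.pdf" for i in range(15)]  # 15 docs
random.seed(42)          # reproducible shuffle
random.shuffle(documents)

split = int(len(documents) * 0.7)     # 15 * 0.7 -> 10
training_set = documents[:split]      # 10 documents for training
evaluation_set = documents[split:]    # 5 documents for evaluation

print(len(training_set), len(evaluation_set))  # 10 5 -> a 67/33 split
```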
Which of the following file types are supported for the DocumentPath property in the Classify Document Scope activity?
A. .bmp, .pdf, .jpe, .psd
B. .png, .gif, .jpe, .tiff
C. .pdf, .jpeg, .raw, .tif
D. .jpe, .eps, .jpg, .tiff
Explanation: According to the UiPath documentation portal, the DocumentPath property in the Classify Document Scope activity accepts the path to the document you want to validate. This field supports only strings and String variables. The supported file types for this property field are .png, .gif, .jpe, .jpg, .jpeg, .tiff, .tif, .bmp, and .pdf. Therefore, option B is the correct answer, as it contains four of the supported file types. Option A is incorrect, as .psd is not a supported file type. Option C is incorrect, as .raw is not a supported file type. Option D is incorrect, as .eps is not a supported file type.
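A quick way to see the rule in action is to validate a path against that extension list. A minimal sketch in plain Python (this check is our own illustration, not part of the activity):

```python
# Minimal sketch: checking a file path against the extensions
# the DocumentPath property accepts.
from pathlib import Path

SUPPORTED = {".png", ".gif", ".jpe", ".jpg", ".jpeg",
             ".tiff", ".tif", ".bmp", ".pdf"}

def is_supported(document_path: str) -> bool:
    """True if the file extension is one the activity can process."""
    return Path(document_path).suffix.lower() in SUPPORTED

print(is_supported("invoice.tiff"))  # True
print(is_supported("layout.psd"))    # False -- .psd is not supported
```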
What happens when multiple users try to label the same document concurrently?
A. The changes made by one user override the changes made by others.
B. The changes made by all users are saved successfully.
C. Concurrent labeling is not allowed.
D. A warning message is displayed to the other user(s) indicating unsuccessful changes.
Explanation: According to the UiPath documentation, data labeling is a process that involves uploading raw data, annotating text data in the labeling tool, and using the labeled data to train ML models. Data labeling is performed by human labelers, who can be either internal or external to the organization. However, concurrent labeling is not supported by the UiPath Data Labeling tool, which means that only one user can label a document at a time. If multiple users try to label the same document concurrently, they will encounter an error message that says “The document is locked by another user. Please try again later.” Therefore, the correct answer is C.
What is the Document Object Model (DOM) in the context of Document Understanding?
A. The DOM is a JSON object containing information such as name, content type, text length, number of pages, page rotation, detected language, content, and coordinates for the words identified in the file.
B. The DOM is a built-in artificial intelligence system that automatically understands and interprets the content and the type of documents, eliminating the need for manual data extraction.
C. The DOM is a feature that allows you to convert physical documents into virtual objects that can be manipulated using programming code.
D. The DOM is a graphical user interface (GUI) tool in UiPath Document Understanding that provides visual representations of documents, making it easier for users to navigate and interact with the content.
Explanation: The Document Object Model (DOM) is a data representation of the objects that comprise the structure and content of a document on the web. In the context of Document Understanding, the DOM is a JSON object that is generated by the Digitize Document activity, which uses the UiPath Document OCR engine to extract the text and layout information from the input document. The DOM contains the following properties for each document:
name: The name of the document file.
contentType: The MIME type of the document file, such as application/pdf or image/jpeg.
textLength: The number of characters in the document text.
pages: An array of objects, each representing a page in the document. Each page object carries page-level details such as the page rotation, the detected language, the content, and the coordinates of the words identified on the page.
The DOM can be used as an input for other activities in the Document Understanding framework, such as Classify Document Scope, Data Extraction Scope, or Export Extraction Results. The DOM can also be manipulated using programming code, such as JavaScript or Python, to perform custom operations on the document data.
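The shape described above can be sketched as a plain Python dictionary; the top-level key names follow the explanation, while all values and the exact page-level field names shown are illustrative assumptions:

```python
# Illustrative sketch of the DOM shape described above, as a Python dict.
# Only the top-level keys (name, contentType, textLength, pages) come
# from the explanation; everything else is hypothetical.
dom = {
    "name": "invoice.pdf",
    "contentType": "application/pdf",
    "textLength": 1842,
    "pages": [
        {
            "rotation": 0,                  # page rotation in degrees
            "detectedLanguage": "eng",      # language detected by the OCR
            "content": "Invoice No. 123 ...",
            "words": [
                # each word with its bounding-box coordinates on the page
                {"text": "Invoice", "box": [72.0, 54.0, 130.5, 68.0]},
            ],
        }
    ],
}

# Downstream code can walk the structure, e.g. count words per page:
for i, page in enumerate(dom["pages"], start=1):
    print(f"page {i}: {len(page['words'])} word(s)")
```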
What is one best practice when designing a UiPath Communications Mining label taxonomy?
A. Each label should be identifiable from the text of the individual verbatim (not thread) to which it will be applied.
B. Each label should include customer experience/sentiment analysis in its coverage.
C. Each parent label should have at least 3 children labels to ensure specificity.
D. Each label should overlap slightly with a few distinct others so we ensure 100% coverage.
Explanation: A label taxonomy is a hierarchical structure of concepts that you want to capture from your communications data, such as emails, chats, or calls. Each label represents a specific concept that serves a business purpose and is aligned to your objectives. A label taxonomy can have multiple levels of hierarchy, where each child label is a subset of its parent label. For example, a parent label could be “Product Feedback” and a child label could be “Product Feature Request” or “Product Bug Report”. A label taxonomy is used to train a machine learning model that can automatically classify your communications data according to the labels you defined.
One of the best practices for designing a label taxonomy is to ensure that each label is clearly identifiable from the text of the individual verbatim (not thread) to which it will be applied. A verbatim is a single unit of communication, such as an email message, a chat message, or a call transcript segment. A thread is a collection of related verbatims, such as an email conversation, a chat session, or a call recording. When you train your model, you will apply labels to verbatims, not threads, so it is important that each label can be recognized from the verbatim text alone, without relying on the context of the thread. This will help the model to learn the patterns and features of each label and to generalize to new data. It will also help you to maintain consistency and accuracy when labelling your data.
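To make the hierarchy concrete, here is the example taxonomy from the paragraph above encoded as a small Python structure; the nested-dict layout is our own illustration, not a Communications Mining export format:

```python
# Illustrative sketch of a small hierarchical label taxonomy, using the
# parent/child labels named in the explanation.
taxonomy = {
    "Product Feedback": [            # parent label
        "Product Feature Request",   # child: a subset of the parent
        "Product Bug Report",
    ],
}

# Each leaf should be recognizable from a single verbatim's text alone,
# e.g. "The export button crashes the app" -> "Product Bug Report".
for parent, children in taxonomy.items():
    for child in children:
        print(f"{parent} > {child}")
```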
Which UiPath Communications Mining model performance factor assesses the proportion of the entire dataset that has informative label predictions?
A. Average label performance.
B. Coverage.
C. Balance.
D. Underperforming labels.
Explanation: According to the UiPath Communications Mining documentation, coverage is one of the four main factors that contribute to the model rating, which is a holistic measure of the model’s performance and health. Coverage assesses the proportion of the entire dataset that has informative label predictions, meaning that the predicted labels are not generic or irrelevant. Coverage is calculated as the percentage of verbatims (communication units) that have at least one informative label out of the total number of verbatims in the dataset. A high coverage indicates that the model is able to capture the main topics and intents of the communications, while a low coverage suggests that the model is missing important information or producing noisy predictions.
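The calculation itself is a simple ratio. A minimal sketch in plain Python with hypothetical predictions:

```python
# Minimal sketch: coverage as the share of verbatims with at least one
# informative predicted label. Predictions are hypothetical.
predictions = [
    ["Product Bug Report"],        # informative label -> covered
    [],                            # no prediction     -> not covered
    ["Product Feature Request"],   # informative label -> covered
    [],                            # no prediction     -> not covered
]

covered = sum(1 for labels in predictions if labels)
coverage = covered / len(predictions)
print(f"coverage = {coverage:.0%}")  # 2 of 4 verbatims -> 50%
```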
What information should be filled in when adding an entity label for the OOB (Out Of the Box) labeling template?
A. Name, Data Type, Attribute name, and Color.
B. Name, Data Type, Attribute name, Shortcut, and Color.
C. Name, Shortcut, and Color.
D. Name, Input to be labeled, Attribute name, Shortcut, and Color.
Explanation: The OOB labeling template is a predefined template that you can use to label your text data for entity recognition models. The template comes with some preset labels and text components, but you can also add your own labels using the General UI or the Advanced Editor. When you add an entity label, you need to fill in the following information:
Name: the name of the new label. This is how the label will appear in the labeling tool and in the exported data.
Input to be labeled: the text component that you want to label. You can choose from the existing text components in the template, such as Date, From, To, CC, and Text, or you can add your own text components using the Advanced Editor. The text component determines the scope of the text that can be labeled with the entity label.
Attribute name: the name of the attribute that you want to extract from the text. You can use this to create attributes such as customer name, city name, telephone number, and so on. You can add more than one attribute for the same label by clicking + Add new.
Shortcut: the hotkey that you want to assign to the label. You can use this to label the text faster by using the keyboard. Only single letters or digits are supported.
Color: the color that you want to assign to the label. You can use this to distinguish the label from the others visually.
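Taken together, the five fields can be pictured as a small record. A hedged Python sketch (the dataclass and its values are our own illustration, not the template's schema):

```python
# Illustrative sketch: the five fields of an entity label from the
# explanation above, modeled as a dataclass.
from dataclasses import dataclass

@dataclass
class EntityLabel:
    name: str                      # shown in the tool and exported data
    input_to_be_labeled: str       # text component, e.g. "Text" or "From"
    attribute_names: list[str]     # one or more attributes to extract
    shortcut: str                  # single letter or digit hotkey
    color: str                     # visual color for the label

label = EntityLabel(
    name="Customer",
    input_to_be_labeled="Text",
    attribute_names=["customer name", "telephone number"],
    shortcut="c",
    color="#1f77b4",
)
print(label.name, label.shortcut)
```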
What is the definition of a UiPath Communications Mining data source?
A. A collection of raw unlabeled communications data of a similar type, that can be associated with up to 10 datasets.
B. The model that we create when training the platform to understand the data in those sources.
C. A permissioned storage area within the platform which contains communications and labels.
D. A user-permissioned project containing a taxonomy with labels and entities.
Under what condition can a project be deleted in UiPath AI Center?
A. If it does not have any pipeline data.
B. If it does not have any running pipelines.
C. If it does not have any deployed packages.
D. If it does not have any scheduled pipelines.
Explanation: A project in UiPath AI Center is an isolated group of resources (datasets, pipelines, packages, skills, and logs) that you use to build a specific ML solution. You can create, edit, or delete projects from the Projects page or the project’s Dashboard page. However, you can only delete a project if it does not have any package currently deployed in a skill. A package is a versioned and deployable unit of an ML model or an OS script that can be used to create an ML skill. A skill is a consumer-ready, live deployment of a package that can be used in RPA workflows in Studio. If a project has a package deployed in a skill, you need to undeploy the skill first before deleting the project. This is to ensure that you do not accidentally delete a project that is being used by a skill.
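The precondition can be pictured as a simple set check. A minimal sketch in plain Python (not the AI Center API), with hypothetical package IDs:

```python
# Minimal sketch of the deletion rule described above: a project can be
# deleted only when none of its packages is currently deployed in a skill.
def can_delete_project(deployed_package_ids: set[str],
                       project_package_ids: set[str]) -> bool:
    """True if no package of this project is deployed in any skill."""
    return deployed_package_ids.isdisjoint(project_package_ids)

# Example: package "pkg-2" of the project is still serving a skill.
print(can_delete_project({"pkg-2"}, {"pkg-1", "pkg-2"}))  # False
print(can_delete_project(set(),     {"pkg-1", "pkg-2"}))  # True
```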
What happens during the Classify stage of the Document Understanding Framework?
A. The OCR engine is used to extract text from the image document.
B. The extracted data is exported as a dataset.
C. The target fields are extracted from the document and sent to Action Center for human validation.
D. The documents are included in one of the taxonomy document types or skipped.
Explanation: According to the UiPath documentation, the Classify stage of the Document Understanding Framework is used to automatically determine what document types are found within a digitized file. The document types are defined in the project taxonomy, which is a collection of all the labels and fields applied to the documents in a dataset. The Classify stage uses one or more classifiers, which are algorithms that assign document types to files based on their content and structure. The classifiers can be configured and executed using the Classify Document Scope activity, which also allows for document type filtering, taxonomy mapping, and minimum confidence threshold settings. The Classify stage outputs the classification information in a unified manner, irrespective of the source of classification. The documents that are classified are then sent to the next stage of the framework, which is Data Extraction. The documents that are not classified or skipped are either excluded from further processing or sent to Action Center for human validation and correction.
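The confidence-threshold behavior can be sketched in a few lines. A plain Python illustration (not the actual Classify Document Scope activity), with a hypothetical threshold and scores:

```python
# Minimal sketch: assigning a taxonomy document type only when the
# classifier's confidence clears a minimum threshold; otherwise skip.
MIN_CONFIDENCE = 0.8  # hypothetical threshold

def classify(scores: dict[str, float]) -> str | None:
    """Return the best-scoring taxonomy type, or None to skip."""
    doc_type, confidence = max(scores.items(), key=lambda kv: kv[1])
    return doc_type if confidence >= MIN_CONFIDENCE else None

print(classify({"Invoice": 0.93, "Receipt": 0.05}))  # Invoice
print(classify({"Invoice": 0.55, "Receipt": 0.40}))  # None -> skipped
```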
What additional information can be included in the exported data, apart from the extraction results?
A. The number of occurrences and the extraction confidence.
B. The page number from which the field was extracted and the exact position on the page.
C. The extraction confidence and the digitization confidence.
D. The position on the page.
Explanation: The exported data from the UiPath Document Understanding Template contains the extraction results in a JSON format, along with some additional information that can be useful for debugging or analysis purposes. One of the additional information that can be included is the page number from which the field was extracted and the exact position on the page, represented by the coordinates of the bounding box. This information can help to locate the field on the original document image and to verify the accuracy of the extraction. The additional information can be enabled or disabled by setting the IncludeMetadata parameter to true or false in the Config file of the template.
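A hedged sketch of what such an exported field might look like, using hypothetical keys and values rather than the template's exact JSON schema:

```python
# Illustrative sketch of an exported extraction result carrying the extra
# positional metadata described above. Keys and values are hypothetical.
exported_field = {
    "fieldName": "InvoiceNumber",
    "value": "INV-00123",
    "pageNumber": 1,                              # page the value came from
    "boundingBox": [72.0, 540.5, 160.0, 556.0],   # position on the page
}

# With metadata disabled (IncludeMetadata = false), only the extraction
# result itself would be exported:
minimal_field = {k: exported_field[k] for k in ("fieldName", "value")}
print(minimal_field)
```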