Managing Data Sets

One of the advantages of Disco is that it supports your project work through the management of multiple data sets in one project view. In a typical process mining project, you will import your log files in different ways, filter them, and make copies to save intermediate results. This results in many different versions and views of your data sets and can easily get out of hand.

The project view in Disco helps you keep an overview. It keeps all your work in one place and lets you easily share it with others. In this chapter, you will find all the details about how projects can be managed in Disco.

The Project View

The project view is symbolized by the file cabinet symbol shown in Figure 1.

../_images/workspace-icon.png

Figure 1: Project view symbol in Disco.

You can find the project symbol in the upper left corner right next to the open symbol as shown in Figure 2. The project view can be reached from anywhere in Disco.

../_images/NavigateToProject.png

Figure 2: You can return to your project view from anywhere in Disco.

Clicking on the project symbol will bring you to the project view (see Figure 3). The project view contains the following elements:

Project name (1)
You can rename your project as described in Renaming Projects and Data Sets.
Data sets (2)
Each event log that is imported will be placed as a separate data set in your project view. Refer to the Import chapter for detailed instructions on how to import your data into Disco. When you make copies of your data sets, then these copies will also appear in your list of data sets here. Learn more about how to make copies in Copying Data Sets. You can rename your data sets as described in Renaming Projects and Data Sets.

The main area of the screen (3-8) holds information and controls for the currently selected data set.

Overview and springboard (3)
Overview information about the currently selected data set is shown here. You see a thumbnail preview of the process map (Analyzing Process Maps), overview statistics (Analyzing Statistics), and cases (Analyzing Cases). This overview information helps you to quickly identify the right data set and also serves as a springboard to jump right into the respective data set analysis view. Navigating From the Project View to the Analysis Screens shows you how this works.
Reload data (4)
Pressing the Reload data button brings you back to the configuration screen of your imported file. It provides a quick way to go back and include that attribute you forgot, or to simply look at the source data again. The Import reference chapter explains in more detail how the Reload button works in Adjusting the Import Configuration.
View details (5)
The View details button brings you to the detailed analysis view (Map, Statistics, or Cases) that you last viewed for the currently selected data set. You can always go to the project view to get an overview, and the View details button brings you back precisely to where you were before. Read also Navigating From the Project View to the Analysis Screens about how to navigate from the project view.
Notes field (6)
The notes field is great to remember observations and findings from the analysis as well as for keeping track of ideas and questions that are still open (ToDo items). When you plan to share your analysis results with others (see also Managing and Sharing Projects), the notes field can hold guiding descriptions for your colleagues about which results can be found in which data set.
Filtering (7)
The log filter controls for each data set can be accessed from the detailed analysis views as well as from the project view here. Filters are an important instrument to clean your data and focus your analysis. Read Filtering for detailed information on how filtering works in Disco.
Copy, Remove, and Export data sets (8)
Data sets can be copied and deleted in the project view. Read Copying Data Sets for further details. The export functionality of Disco is explained in detail in the Export reference in Export.
Clear and export project (9)
If you want to start a new project or export your current work, you can do that from the project view in the upper right corner. Read Managing and Sharing Projects for further details on exporting and importing projects.
../_images/TheProjectView.png

Figure 3: The project view in Disco.

Switching Between Data Sets in the Analysis View

Once you have imported multiple data sets or created copies (see Copying Data Sets) to keep bookmarks of your analysis results, you can access them right from within the analysis view by selecting the data set from the quick switch list shown in Figure 7. This way, you can rapidly move back and forth to compare different data sets, and to “jump” to bookmarks in your analysis.

../_images/SwitchingDataSets.png

Figure 7: Along with the imported data sets, copies are available through the quick switch list for rapid access from the analysis views.

Make sure to use short and meaningful names for your data sets (see Renaming Projects and Data Sets) to quickly find the data sets you are looking for, and to efficiently move back and forth between them.

Renaming Projects and Data Sets

You can rename projects and individual data sets to give them names that better reflect what your project is about.

To rename your project, simply click on the current project name, see (1) in Figure 3. When you click on the project name, a text field appears and you can type in your new project name (see Figure 8). Then press the Enter key.

../_images/RenamingProject.png

Figure 8: To rename your project simply click on the current project name and type in your new project name.

Renaming data sets works in the same way. First, select the data set for which you want to change the name from the list of data sets in your project, see (2) in Figure 3. Then, click on the name of the data set in the overview panel as shown in Figure 9 to change it.

../_images/RenamingDatasets.png

Figure 9: Select the data set you want to rename and click on the current name of the data set to change it.

The new name of the data set will also be reflected in the drop-down list at the top of your analysis views (see screenshot in Figure 10).

../_images/Renaming-Datasets-2.png

Figure 10: The new data set name will also appear in the quick-switch list at the top of your Map, Statistics, and Cases views.

We recommend using short and meaningful names for your data sets (keep the details in the Notes section), so that you can quickly find (and switch between) your data sets during the analysis.

Copying Data Sets

Copies of data sets are particularly useful to keep “bookmarks” of differently filtered versions of your event log. They provide you with a means to preserve a specific view on your process, store analysis results, and they allow you to easily compare different parts of your process.

You can make copies while you are filtering. This is useful when you are exploring your process through filtering and—along the way—decide that you want to preserve what you did so far and apply the new filter to a copy rather than the current data set. The filter reference (see Filtering) explains in detail how to do that.

Alternatively, you can make copies right from the project view. This is useful if you already know what you want to do. In both situations, the newly created copy is placed in your project as a new data set. The effect is the same, but the workflow in Disco is different. The following example scenario explains how making copies in the project view works.

Copy Scenario

Let us say that you want to compare what your call center process looks like for service requests that come in through email and those that are initiated on the phone. You know that you have an attribute called Medium in your event log that indicates through which channel the request was created. You can use this attribute to create differently filtered copies for “Mail” and “Phone” in the following way:

Step 1: You start by taking the complete call center data as a reference point, and you make a copy by pressing the Copy button on the lower right of the project view screen. A dialog appears in which you can provide a custom name for the new data set. We give it the name Callcenter – Only Mail because we intend to use this copy as a bookmark for the cases that were initiated by email. After you press Create, the copy is placed in the list of data sets. Figure 11 visualizes this step.

Although we have named the new copy Callcenter – Only Mail, at this point it is still an exact copy of the complete data set. To actually focus the new data set on email requests only, we can add a filter by pressing the filter symbol in the lower left of the data set window, see also (7) in Figure 3.

../_images/Copying-Step1.png

Figure 11: Example Scenario - Step 1: Make a copy of the complete data set.

Step 2: Figure 12 illustrates this step. We add an Attribute Filter, select the Medium attribute, and choose to only keep events with the value “Mail”. Refer to Filtering for details on the individual log filters that are available in Disco. After the filter is applied, the new copy actually contains only email requests. In Figure 12 you can see that the filtered subset contains 35% of the cases compared to the complete call center log.

../_images/Copying-Step2.png

Figure 12: Example Scenario - Step 2: Add a filter to focus on email requests.
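The effect of the attribute filter in Step 2 can be sketched in a few lines of Python. This is only an illustration of the idea, not how Disco works internally: Disco is operated through its user interface, and the miniature event log, the helper names `keep_events_with_value` and `case_ids`, and the resulting numbers are all made up for this example.

```python
# Illustrative sketch only: hypothetical data and helpers, not a Disco API.

# Miniature event log: (case_id, activity, medium)
events = [
    ("c1", "Inbound Email", "Mail"),
    ("c1", "Handle Email", "Mail"),
    ("c2", "Inbound Call", "Phone"),
    ("c3", "Inbound Call", "Phone"),
    ("c3", "Handle Case", "Phone"),
    ("c4", "Inbound Email", "Mail"),
]

def keep_events_with_value(log, value):
    """Keep only the events whose Medium attribute equals `value`."""
    return [event for event in log if event[2] == value]

def case_ids(log):
    """Return the set of distinct case IDs that appear in the log."""
    return {case_id for case_id, _, _ in log}

mail_only = keep_events_with_value(events, "Mail")

# Share of cases that remain, relative to the complete log.
share = len(case_ids(mail_only)) / len(case_ids(events))
print(f"{len(case_ids(mail_only))} of {len(case_ids(events))} cases ({share:.0%})")
# prints "2 of 4 cases (50%)"
```

The percentage is computed against the complete log, which corresponds to the filtered portion that the project view displays for a filtered data set.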

Step 3: Then, the second copy Callcenter – Only Phone is created. Figure 13 shows this step.

During copying, all currently applied filters are copied along with the data set as they are. Because we use the Callcenter – Only Mail data set as the reference point for our copy, the newly created data set is already called Callcenter – Only Phone but, at this point, still contains only email requests.

../_images/Copying-Step3.png

Figure 13: Example Scenario - Step 3: Make another copy for the phone requests.

Step 4: To change the copied filter, you can press the filter symbol and adapt the filter settings as shown in Figure 14. After the filter settings have been changed, the new copy now contains only phone requests, as desired.

../_images/Copying-Step4_new.png

Figure 14: Example Scenario – Step 4: Change the filter to focus on phone requests.
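The copy behavior from Steps 3 and 4 can be modeled with a small Python sketch: a copy starts out with the same filter settings as its source, and changing the filter on the copy leaves the source untouched. The `DataSet` class, its fields, and the three-case log are assumptions made for this illustration and do not reflect Disco's internal data model.

```python
# Illustrative sketch only: a hypothetical model, not Disco's implementation.
from dataclasses import dataclass, field

@dataclass
class DataSet:
    name: str
    events: list                                  # (case_id, medium) pairs
    filters: dict = field(default_factory=dict)   # e.g. {"Medium": "Mail"}

    def visible_cases(self):
        """Case IDs that pass the current Medium filter (all cases if none is set)."""
        wanted = self.filters.get("Medium")
        return sorted({case for case, medium in self.events
                       if wanted is None or medium == wanted})

    def copy(self, new_name):
        # The filter settings are duplicated, so the copy can be changed
        # afterwards without affecting the data set it was copied from.
        return DataSet(new_name, self.events, dict(self.filters))

complete = DataSet("Callcenter", [("c1", "Mail"), ("c2", "Phone"), ("c3", "Phone")])

only_mail = complete.copy("Callcenter – Only Mail")
only_mail.filters["Medium"] = "Mail"

# Step 3: the copy has the new name but still carries the "Mail" filter.
only_phone = only_mail.copy("Callcenter – Only Phone")
print(only_phone.visible_cases())   # ['c1']

# Step 4: adapt the copied filter.
only_phone.filters["Medium"] = "Phone"
print(only_phone.visible_cases())   # ['c2', 'c3']
print(only_mail.visible_cases())    # ['c1']
```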

All imported data sets and all created copies can be accessed through the project view as explained in Navigating From the Project View to the Analysis Screens. In addition, the data sets are also available through a quick switch list from within the detailed analysis views (see Figure 15).

../_images/SwitchingDataSets1.png

Figure 15: Along with the imported data sets, copies are available through the quick switch list for rapid access from the analysis views.

This way, you can rapidly change and compare different data sets and “jump” to bookmarks in your analysis.

Copying Filters vs. Permanently Applying Filters

During normal copying, all currently applied filters are copied along with the data set but can be changed afterwards as shown in Step 4 in the example scenario in Copy Scenario. Sometimes, however, you want to make a clean copy and permanently apply your filters to a data set. Here is an example scenario where this is relevant.

Imagine that you found out that the purchasing process data you are currently analyzing contains incomplete cases. You want to remove these incomplete cases so that they do not distort your throughput time measurements, which should be based only on the completed instances (not those that are still running and might have started just recently).

To remove these incomplete cases, you use the Endpoints Filter and keep only those cases that end with the regular “Pay invoice” end activity. The result is a filtered data set with 413 completed cases out of the initial 608 cases (67%) as shown in Figure 16.

../_images/PermanentCopy1.png

Figure 16: Example Scenario – Step 1: Add filter to remove incomplete cases.

Then you decide that these completed cases should be the new baseline for your further analysis. For example, you want to apply the Performance Filter to verify service level targets and have the filter results reflect the right percentages (like, for example, “15% of the cases do not meet the agreed-upon service level”). The incomplete cases would be in the way.

So, you make a copy of the filtered data set as before but tick the Apply filters permanently box before you press the Create button as shown in Figure 17. The copied data set still contains these 413 completed cases, but the Endpoints Filter from before has been applied and “consolidated”. It cannot be changed or removed anymore.

../_images/PermanentCopy2.png

Figure 17: Example Scenario – Step 2: Make copy with Apply filters permanently option selected to take the filtered output as a new reference point for the copied data set.

As a result, the filter portion indication in the lower left has disappeared: The previously filtered 67% are now the new 100% for the data set Purchasing Process (Complete Cases).
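The difference between a normal copy and a copy with permanently applied filters can be sketched as follows. The case counts (608 and 413) come from the scenario above; everything else is an assumption for illustration, including the `DataSet` class, the `apply_permanently` parameter name, the `last_activity` and `slow` fields, and the figure of 62 slow cases. It is a conceptual model, not Disco's implementation.

```python
# Illustrative sketch only: a hypothetical model, not Disco's implementation.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class DataSet:
    name: str
    cases: List[dict]                               # baseline (the "100%") of this data set
    filters: List[Callable[[dict], bool]] = field(default_factory=list)

    def filtered(self):
        """Cases that pass all currently applied filters."""
        return [c for c in self.cases if all(f(c) for f in self.filters)]

    def portion(self):
        """Filtered cases as a fraction of this data set's baseline."""
        return len(self.filtered()) / len(self.cases)

    def copy(self, name, apply_permanently=False):
        if apply_permanently:
            # The filtered output becomes the new baseline; no filters remain.
            return DataSet(name, self.filtered(), [])
        # Normal copy: same baseline, filters carried over and still editable.
        return DataSet(name, self.cases, list(self.filters))

# 608 cases, of which 413 end with "Pay invoice"; 62 slow cases are a made-up figure.
cases = [{"id": i,
          "last_activity": "Pay invoice" if i < 413 else "Create request",
          "slow": i < 62}
         for i in range(608)]

purchasing = DataSet("Purchasing Process", cases)
purchasing.filters.append(lambda c: c["last_activity"] == "Pay invoice")
print(len(purchasing.filtered()), "of", len(purchasing.cases))   # 413 of 608

complete = purchasing.copy("Purchasing Process (Complete Cases)", apply_permanently=True)
print(complete.portion())            # 1.0

# A later filter now reports its share against the 413 completed cases.
complete.filters.append(lambda c: c["slow"])
print(f"{complete.portion():.0%}")   # 15%
```

Without the permanent copy, the same 62 cases would be reported against all 608 cases (about 10%), which is why the incomplete cases would be in the way.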

In a similar way, filters can also be applied permanently when you create a data set copy for the current configuration of the filter settings. Refer to the filter reference in Applying Filters and Configuring Filters Based on the Outcome of Previous Filters for further details on how and when to create (permanently) filtered copies of your data sets.

Re-ordering Data Sets

To properly manage your project files, the order of the data sets is important. For example, if you split your data into multiple data sets for different regions or countries (to compare them), then you would like to have these data sets located next to each other in your project.

To change the order of a data set, you can simply click on it, drag it to the right position in the list, and drop it there (see screenshot in Figure 18).

../_images/disco-project-dragndrop.gif

Figure 18: Simply drag and drop your data sets in the right order from the project view.

The re-ordering of your data sets in the project view will also be reflected in the order of the data sets in the drop-down list at the top of your analysis views (see screenshot in Figure 19).

../_images/Reordering-Datasets-2.png

Figure 19: The new order will be reflected in the quick-switch list at the top of your Map, Statistics, and Cases views.

So, organizing your data sets in the project view will help you to better switch between data sets during your analysis as well.

Deleting Data Sets

The currently selected data set can be deleted by pressing the Delete button in the lower right as shown in Figure 20. Be careful when you do this. Deleting data sets cannot be undone.

../_images/DeletingDatasets.png

Figure 20: The currently selected data set can be deleted by pressing the Delete button in the lower right.

Managing and Sharing Projects

You can have only one active project in Disco at a time. However, you can create multiple projects and work on each of them at a different time. If you analyze multiple processes, it is advisable to keep only related data sets together in the same project file to stay organized.

Projects are also useful for sharing your work with other people. By sending them your project file, you can directly share your complete project, including all the data sets, filters, and the notes that you made.

Exporting Projects

You can export your current project from the project view, see also (9) in Figure 3, by clicking on the Export button shown in Figure 21.

../_images/ExportProject1.png

Figure 21: Exporting your current project.

A file dialog will appear and you can choose where you want to save your project. Projects are exported as a Disco project file with the .dsc file extension. Project files are self-contained. This means that you can just send the .dsc file (without sending the original log files) to a colleague and she will be able to import your project in exactly the same state it was in when you exported it (see next section).

Importing Projects

Importing projects works in the same way as importing event logs. To load the project, you click the open symbol in the upper left corner as shown in Figure 22 and locate the project file you want to import.

../_images/OpenProject.png

Figure 22: Opening an existing project.

When Disco realizes that you want to import a project file (not just another data set), it will ask you whether you want to save your current project first (see Figure 23).

If you do not want to lose your current work, make sure to save your current project before you complete the import of the project file. You can do this directly from the dialog shown in Figure 23:

  • Save project first. Press this button to save your current work. Disco will bring up a file dialog that lets you choose the location and name for the exported project. It then first exports your current project to the chosen location and opens the imported project right afterwards.
  • Discard changes. Press this button if you don’t have anything in your current workspace that you feel is worth saving. Disco will directly load the imported project and not save your current workspace.
  • Cancel. Press this button if you do not want to complete the import of the new project file at all. Disco will bring you back to your current workspace.
../_images/SaveCurrentProject.png

Figure 23: Disco reminds you to save your current workspace before you import or create a new one.

In the Import reference in Import you can find further details about the types of files that can be opened with Disco.

Creating New Projects

If you simply want to start a new project, for example, to make a new start for another process that you want to analyze, then you can press the New button as shown in Figure 24.

../_images/NewProject.png

Figure 24: Creating a new project.

Again, you will be asked whether you want to save your current project before clearing your workspace (see Figure 23). Press Save project first to save your current work before a new, empty project is created.