Data quality in Azure Stream Analytics

In Azure Stream Analytics, a query transforms data from one or more inputs (streaming sources or reference data) into at least one output. When one of your inputs is dynamic, or when you have many inputs, it is hard to check whether every scenario produces the right output. You can solve this by creating test cases in an Azure Stream Analytics project in Visual Studio Code, which lets you verify the query's output, and so improve your data quality, before you deploy it to Azure. This article explains how.

Creating a Stream Analytics project

  1. Open Visual Studio Code
  2. Install the extension “Azure Stream Analytics Tools“.
  3. Press F1, search for “ASA: Create New Project” and select it.
  4. Enter a name for your project, press Enter and select a folder to store it in.
  5. Visual Studio Code creates a default project.

Preparation

Before we can create a test case, we must prepare the project. For this example I changed the Stream Analytics query: open the *.asaql file and replace the query with:

SELECT
	name,
	company,
	(year + 1) AS NextYear	
INTO
	[output]
FROM
	[input]
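To decide what the expected output of a test case should be, it helps to write down what this query does to a single event. The Python function below is a hypothetical helper that mirrors only this particular SELECT (copy name and company, return year + 1 as NextYear, drop year); it does not run the Stream Analytics engine and will not reflect any other query features.

```python
from typing import Any


def project_record(event: dict[str, Any]) -> dict[str, Any]:
    """Mirror the SELECT above for one input event (hypothetical helper).

    name and company are copied unchanged, NextYear is year + 1,
    and year itself is not part of the output.
    """
    return {
        "name": event["name"],
        "company": event["company"],
        "NextYear": event["year"] + 1,
    }


if __name__ == "__main__":
    sample = {"name": "Wilko van de Velde", "company": "Alten", "year": 2022}
    print(project_record(sample))
```

Running it on the test input used in this article gives the same record as the expected output file of the first test case.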

Create test case

Now that we have prepared the basics, we can start by creating a test case.

  1. In the project folder, create a new folder “Tests”.
  2. Create a new folder “Tests\TestCase1”.
  3. Create three new files:

Tests\TestCase1\input.json

{
    "name": "Wilko van de Velde",
    "company": "Alten",
    "year": 2022
}

Tests\TestCase1\output.json

{
  "name":"Wilko van de Velde",
  "company":"Alten",
  "NextYear":2023
}
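If you need many test cases, writing the two JSON files by hand gets tedious. The sketch below creates a test case folder containing input.json and output.json. write_test_case is a hypothetical helper, not part of the Stream Analytics tools, and the demo writes to a temporary directory instead of your real Tests folder.

```python
import json
import tempfile
from pathlib import Path


def write_test_case(tests_dir: Path, case_folder: str,
                    input_event: dict, expected_output: dict) -> Path:
    """Write input.json and output.json into tests_dir/case_folder (hypothetical helper)."""
    case_dir = tests_dir / case_folder
    case_dir.mkdir(parents=True, exist_ok=True)
    # Event(s) that the test feeds to the query's [input] alias.
    (case_dir / "input.json").write_text(
        json.dumps(input_event, indent=4), encoding="utf-8")
    # Result the query is expected to write to the [output] alias.
    (case_dir / "output.json").write_text(
        json.dumps(expected_output, indent=2), encoding="utf-8")
    return case_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        case_dir = write_test_case(
            Path(tmp), "TestCase1",
            {"name": "Wilko van de Velde", "company": "Alten", "year": 2022},
            {"name": "Wilko van de Velde", "company": "Alten", "NextYear": 2023},
        )
        print((case_dir / "output.json").read_text(encoding="utf-8"))
```

In a real project you would pass Path("Tests") as tests_dir.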

Tests\testConfig.json

{
    "Script": "<Folder>\\MyProject.asaql",
    "TestCases": [
      {
        "Name": "Test case 1",
        "Inputs": [
          {
            "InputAlias": "input",
            "Type": "Data Stream",
            "Format": "Json",
            "FilePath": "TestCase1/input.json",
            "ScriptType": "InputMock"
          }   
        ],
        "ExpectedOutputs": [
          {
            "OutputAlias": "output",
            "FilePath": "TestCase1/output.json",
            "Required": true
          }           
        ]
      }     
    ]
  }

Replace <Folder> with the local path of your project folder. Because the file is JSON, every backslash in that path must be escaped as \\.
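Because every test case entry follows the same pattern, testConfig.json can also be generated. This sketch builds the same structure as shown above for a list of test case folders. build_test_config is a hypothetical name, the aliases default to the input and output aliases used in this article, and the script path in the demo is a placeholder you must replace with your own.

```python
import json


def build_test_config(script_path: str, case_folders: list[str],
                      input_alias: str = "input",
                      output_alias: str = "output") -> dict:
    """Return a testConfig structure with one test case per folder (hypothetical helper)."""
    test_cases = []
    for index, folder in enumerate(case_folders, start=1):
        test_cases.append({
            "Name": f"Test case {index}",
            "Inputs": [{
                "InputAlias": input_alias,
                "Type": "Data Stream",
                "Format": "Json",
                "FilePath": f"{folder}/input.json",
                "ScriptType": "InputMock",
            }],
            "ExpectedOutputs": [{
                "OutputAlias": output_alias,
                "FilePath": f"{folder}/output.json",
                "Required": True,
            }],
        })
    return {"Script": script_path, "TestCases": test_cases}


if __name__ == "__main__":
    # Placeholder path: replace it with the location of your own .asaql file.
    config = build_test_config(r"C:\MyProject\MyProject.asaql", ["TestCase1"])
    print(json.dumps(config, indent=2))
```

Serializing with json.dumps also takes care of escaping the backslashes in the Script path.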

Start the test

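To start the test, open a terminal in the project folder and run the command below. The command comes from the npm package azure-streamanalytics-cicd, which you can install with npm install -g azure-streamanalytics-cicd.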

azure-streamanalytics-cicd test -project asaproj.json -testConfigPath Tests\testConfig.json -outputPath C:\Temp\

Because the paths asaproj.json and Tests\testConfig.json are relative, run the command from the project folder. The results are stored in C:\Temp\; change -outputPath if you want them somewhere else.
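If you want to start the same run from a script, for example as a first step toward automating it, the command can be wrapped in Python. This is a minimal sketch that assumes azure-streamanalytics-cicd is installed and on PATH; build_test_command and run_tests are hypothetical helper names, and the flags are exactly those of the command above. This article does not state how the tool's exit code reflects failed test cases, so treat the return value as informational and check the console output as well.

```python
import shutil
import subprocess


def build_test_command(project: str = "asaproj.json",
                       test_config: str = r"Tests\testConfig.json",
                       output_path: str = "C:\\Temp\\") -> list[str]:
    """Return the argument list for the test command shown above."""
    return ["azure-streamanalytics-cicd", "test",
            "-project", project,
            "-testConfigPath", test_config,
            "-outputPath", output_path]


def run_tests(**kwargs: str) -> int:
    """Run the CLI from the current folder and return its exit code (sketch)."""
    command = build_test_command(**kwargs)
    # shutil.which also resolves the .cmd shim that npm creates on Windows.
    executable = shutil.which(command[0])
    if executable is None:
        raise FileNotFoundError("azure-streamanalytics-cicd was not found on PATH")
    command[0] = executable
    completed = subprocess.run(command, check=False)
    return completed.returncode


if __name__ == "__main__":
    # Only print the command; call run_tests() from the project folder to execute it.
    print(" ".join(build_test_command()))
```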

The terminal should now report one successful test, which looks something like this:
[Screenshot: terminal output of the test run with one passed test case]

Add another test case

  1. Create a new folder “Tests\TestCase2”
  2. Create 2 new files:

Tests\TestCase2\input.json

{
    "name": "Wilko van de Velde",
    "company": "Alten",
    "year": 2023
}

Tests\TestCase2\output.json
I intentionally keep the expected output identical to test case 1. The query turns year 2023 into NextYear 2024, so this test case should fail.

{
  "name":"Wilko van de Velde",
  "company":"Alten",
  "NextYear":2023
}

Now update Tests\testConfig.json so that it contains both test cases:

{
    "Script": "<Folder>\\MyProject.asaql",
    "TestCases": [
      {
        "Name": "Test case 1",
        "Inputs": [
          {
            "InputAlias": "input",
            "Type": "Data Stream",
            "Format": "Json",
            "FilePath": "TestCase1/input.json",
            "ScriptType": "InputMock"
          }   
        ],
        "ExpectedOutputs": [
          {
            "OutputAlias": "output",
            "FilePath": "TestCase1/output.json",
            "Required": true
          }           
        ]
      },
      {
        "Name": "Test case 2",
        "Inputs": [
          {
            "InputAlias": "input",
            "Type": "Data Stream",
            "Format": "Json",
            "FilePath": "TestCase2/input.json",
            "ScriptType": "InputMock"
          }   
        ],
        "ExpectedOutputs": [
          {
            "OutputAlias": "output",
            "FilePath": "TestCase2/output.json",
            "Required": true
          }           
        ]
      }    
    ]
  }
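When you add test cases by editing testConfig.json by hand, a typo in a FilePath or a leftover <Folder> placeholder is easy to miss. The sketch below checks a testConfig.json before you run the tests. It assumes, as in this article's folder layout, that relative FilePath values resolve against the folder that contains testConfig.json; check_test_config is a hypothetical helper, not part of the Stream Analytics tools.

```python
import json
from pathlib import Path


def check_test_config(config_path: Path) -> list[str]:
    """Return a list of problems found in a testConfig.json file (empty if none)."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    base_dir = config_path.parent
    problems = []
    if "<Folder>" in config.get("Script", ""):
        problems.append("Script still contains the <Folder> placeholder")
    seen_names = set()
    for case in config.get("TestCases", []):
        name = case.get("Name", "<unnamed>")
        if name in seen_names:
            problems.append(f"duplicate test case name: {name}")
        seen_names.add(name)
        # Check both the mocked inputs and the expected output files.
        for entry in case.get("Inputs", []) + case.get("ExpectedOutputs", []):
            relative = entry.get("FilePath")
            if not relative:
                problems.append(f"{name}: entry without FilePath")
            elif not (base_dir / relative).is_file():
                problems.append(f"{name}: missing file {relative}")
    return problems


if __name__ == "__main__":
    config_file = Path("Tests") / "testConfig.json"
    if config_file.is_file():
        for problem in check_test_config(config_file) or ["no problems found"]:
            print(problem)
```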

Start the tests again

Open the terminal and type:

azure-streamanalytics-cicd test -project asaproj.json -testConfigPath Tests\testConfig.json -outputPath C:\Temp\

The terminal should now report one successful test and one failed test, which looks something like this:
[Screenshot: terminal output of the test run with one passed and one failed test case]

As the screenshot shows, test case 2 fails, as intended, because its expected output was not updated. The actual output of the Stream Analytics query is stored in C:\Temp\Test case 2\ for further analysis.
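To analyse a failure like this one, you can compare the expected output file with the actual output the tool wrote. The sketch below loads both JSON files and lists the fields that differ. The exact file name inside C:\Temp\Test case 2\ is not given here, so pass whichever JSON file you find there. load_records and diff_records are hypothetical helpers: load_records accepts a single object, an array, or one object per line (an assumption about the possible formats), and diff_records compares records pairwise in order, which is only meaningful when both files list the events in the same order.

```python
import json
import sys
from pathlib import Path
from typing import Any


def load_records(path: Path) -> list[dict[str, Any]]:
    """Load a JSON file holding one object, an array of objects, or one object per line."""
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        return []
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to line-separated JSON objects.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def diff_records(expected: list[dict[str, Any]],
                 actual: list[dict[str, Any]]) -> list[str]:
    """Describe field-level differences, comparing records pairwise in order."""
    messages = []
    if len(expected) != len(actual):
        messages.append(f"record count: expected {len(expected)}, actual {len(actual)}")
    for index, (exp, act) in enumerate(zip(expected, actual)):
        for key in sorted(set(exp) | set(act)):
            # A missing field is reported as None.
            if exp.get(key) != act.get(key):
                messages.append(f"record {index}, field {key}: "
                                f"expected {exp.get(key)!r}, actual {act.get(key)!r}")
    return messages


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare.py <expected output.json> <actual output file>
    expected_records = load_records(Path(sys.argv[1]))
    actual_records = load_records(Path(sys.argv[2]))
    for message in diff_records(expected_records, actual_records) or ["outputs match"]:
        print(message)
```

For test case 2 this would report a single difference in the NextYear field.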

Continuous Integration and Continuous Deployment

To complete your project, you can also add the test cases to a CI/CD pipeline, for example in Azure DevOps, so that they run automatically every time you start a deployment.

Conclusion

Adding test cases to a Stream Analytics project in Visual Studio Code takes little effort and lets you catch errors in a query before it reaches Azure, which improves your data quality. This article explained the concept with a simple example; the approach pays off even more for complex queries with multiple inputs and outputs.
