Load testing has been attempted using k6 (k6.io). Output has been written to an InfluxDB database, and visualizations have been configured using Grafana. The test scripts can be viewed at https://dev.azure.com/TCOPP/_git/EGIS?path=%2FARM%2Floadtest&version=GBfeatures%2Floadtest%2Fk6, specifically egisloadtest.js. See the readme.md in that folder for details on how to run them.
Sandbox
Testing has been performed on the SBX instance, consisting of an Azure VM for each of the Web and Portal, GIS Server, and Data Store roles. The following tests were performed from
The tests show surprisingly good response times from the Sandbox instance.
- Confirm that the tested URLs exist and did not simply return errors very quickly (Esri sometimes returns an HTTP 200 status code for a request that actually failed).
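The check in the TODO above could be sketched as follows. ArcGIS REST endpoints can answer a failed request with HTTP 200 and a JSON body containing an `error` object, so the status code alone is not enough. The function name `classifyArcgisResponse` and the shape of its return value are assumptions made for illustration; they are not taken from egisloadtest.js.

```javascript
// Classify a response from an ArcGIS REST endpoint.
// Hypothetical helper: the caller passes the HTTP status and the raw body text.
function classifyArcgisResponse(status, bodyText) {
  if (status !== 200) {
    return { ok: false, reason: 'http ' + status };
  }
  let parsed;
  try {
    parsed = JSON.parse(bodyText);
  } catch (e) {
    // Body is not JSON (HTML page, CSS, image): nothing more to inspect here.
    return { ok: true, reason: 'non-json body' };
  }
  // A failed request can still arrive as HTTP 200 with {"error": {...}}.
  if (parsed !== null && typeof parsed === 'object' && parsed.error) {
    return { ok: false, reason: 'arcgis error ' + parsed.error.code };
  }
  return { ok: true, reason: 'ok' };
}
```

In a k6 script, a helper like this could be called from inside a `check` so that these responses count as failures rather than as very fast successes.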
Dev
Testing has been performed on the DEV instance, consisting of an Azure VM for each of the Web and Portal, GIS Server, and Data Store roles.
Portal
Portal has been tested on the Pilot instance by creating an increasing number of virtual users and logging the response times of their requests to the portal home page, the portal CSS, and a hosted image. The Pilot instance splits the Web Adaptor and Portal roles into their own VMs.
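As a rough illustration of "an increasing number of virtual users", the sketch below builds a k6-style `stages` array that adds users in fixed blocks and holds each level for a while. The function name and all four parameters (block size, block count, ramp time, hold time) are hypothetical; the values actually used in egisloadtest.js are not stated here.

```javascript
// Build a k6-style `stages` array that adds virtual users in blocks.
// blockSize, blockCount, rampTime, and holdTime are hypothetical parameters.
function buildRampStages(blockSize, blockCount, rampTime, holdTime) {
  const stages = [];
  for (let i = 1; i <= blockCount; i++) {
    const target = blockSize * i;
    stages.push({ duration: rampTime, target: target }); // ramp up to the next block
    stages.push({ duration: holdTime, target: target }); // hold steady to measure
  }
  stages.push({ duration: rampTime, target: 0 }); // ramp back down at the end
  return stages;
}

// Example: five blocks of 50 users each, peaking at 250 virtual users.
// In a k6 script this object would be exported as `options`.
const exampleOptions = { stages: buildRampStages(50, 5, '30s', '2m') };
```

Holding each level before adding the next block makes it easier to see on a graph which user count the response times belong to.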
This test was performed from a developer laptop on the TC network. We can see that the average response time (green) remains under 2.5 seconds with a load of just over 200 simultaneous users. When another block of virtual users is added, the portal begins to fail.
- Check whether the portal actually failed or whether the network connection on the laptop was saturated.
When running a similar test from an Azure virtual machine, we see the following results:
This takes a lot, but not all, of the network considerations out of the equation (Azure Canada East calling Canada Central). We see that our Dev portal can handle over 100 users nicely, and over 200 users reasonably well.
Server
Similar tests were devised to stress the server by querying a feature service, fetching a feature, and exporting a web map. The following tests were run from the Azure load testing VM against the Dev instance.
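The three server operations could be expressed as request descriptors along the lines below. This is only a sketch: the host, service path, layer id, and object id are hypothetical, and it assumes that "exporting a web map" refers to the standard Export Web Map Task of the PrintingTools service, which may not be the operation the real script calls. The export request also needs a POST body (the web map JSON) that is not shown.

```javascript
// Build the three request descriptors for one iteration of the server test.
// serverRoot, servicePath, layerId, and objectId are hypothetical inputs.
function buildServerRequests(serverRoot, servicePath, layerId, objectId) {
  const layerUrl =
    serverRoot + '/rest/services/' + servicePath + '/FeatureServer/' + layerId;
  // Query every feature in the layer and ask for a JSON response.
  const queryString =
    'where=' + encodeURIComponent('1=1') + '&outFields=*&f=json';
  // Assumption: the export step uses the default print service location.
  const exportUrl = serverRoot +
    '/rest/services/Utilities/PrintingTools/GPServer/Export%20Web%20Map%20Task/execute';
  return [
    { name: 'query', method: 'GET', url: layerUrl + '/query?' + queryString },
    { name: 'feature', method: 'GET', url: layerUrl + '/' + objectId + '?f=json' },
    { name: 'export', method: 'POST', url: exportUrl },
  ];
}
```

Keeping a `name` on each request makes it possible to tag the timings, so the query, feature, and export response times can be graphed separately.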
These graphs show that the server performance starts off slow and gets worse. We’ll need to improve these numbers.
Azure Monitor
Azure Monitor showed that the CPU on the GIS machine could handle the load placed on it. Over the afternoon of testing, CPU usage never exceeded 80%.
Similarly, the DS machine had no CPU limitations:
Pilot
The same tests were run on Pilot and the results were as follows:
The tests were run from TC, then from Azure. We see that the portal has no issue handling up to 200 users. On the server side, we see a warm-up or caching phase, after which the load is handled reasonably well until there are over 200 virtual users.
Differences between Dev and Pilot
The GIS VMs differ between our Dev and Pilot instances. EGIS-DEV-GIS0 uses a D4S_v3 VM size, which provides a maximum of 6400 IOPS. Pilot uses a DS3_v2 VM size, which provides a maximum of 12800 IOPS, twice the disk I/O ceiling of Dev.
Geo Event Server
TODO
Suggestions
Best practices for publishing and symbolizing data must be followed regardless of our hardware.
Create map caches where possible
Re-project all data to WGS84 Web Mercator
Avoid all re-projection on the fly
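One way to find layers that would force re-projection on the fly is to compare each layer's spatial reference with Web Mercator (WKID 3857, which ArcGIS also reports as 102100). The sketch below assumes the layer list has already been fetched and reduced to objects with a `name` and a `spatialReference`; that input shape and both function names are hypothetical.

```javascript
// WKIDs under which ArcGIS reports WGS 1984 Web Mercator (Auxiliary Sphere).
const WEB_MERCATOR_WKIDS = [3857, 102100];

// True when a spatialReference object identifies Web Mercator.
function usesWebMercator(spatialReference) {
  if (!spatialReference) {
    return false;
  }
  const ids = [spatialReference.latestWkid, spatialReference.wkid];
  return ids.some(function (id) {
    return WEB_MERCATOR_WKIDS.includes(id);
  });
}

// Return the names of layers that are not stored in Web Mercator.
// `layers` is a hypothetical array of { name, spatialReference } objects.
function layersNeedingReprojection(layers) {
  return layers
    .filter(function (layer) { return !usesWebMercator(layer.spatialReference); })
    .map(function (layer) { return layer.name; });
}
```

For example, passing a list containing one layer with `wkid: 102100` and one with `wkid: 4326` would report only the second layer.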
Debate?
Test performance on all layers?
Web hooks, test new layers?