...
Jon, Josh, and Jiehua, as well as an ESRI employee, inspected the logs. The most likely scenario is as follows: the C drive on the server machine ran out of space, so AGS could not write its state as required. The result was a corrupted server that could not start after a reboot.
...
ARM Template Updates
All OS disks, data disks, and the DS data disk were resized via ARM template deployment. This required refactoring the ARM templates to declare each disk as its own resource, rather than using the short form inside the VM resource.
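To illustrate the difference, here is a minimal sketch of the long form for a single data disk. In the short form, the disk is described inline in the VM's `storageProfile.dataDisks`, with `createOption` set to `Empty` and a `diskSizeGB`. In the long form, the disk is a separate `Microsoft.Compute/disks` resource, and the VM attaches it by ID. The parameter names (`vmName`, `dataDiskSizeGB`), the variable `dataDiskName`, the SKU, and the `apiVersion` values are hypothetical placeholders, not values taken from our templates.

```json
[
  {
    "type": "Microsoft.Compute/disks",
    "apiVersion": "2022-07-02",
    "name": "[variables('dataDiskName')]",
    "location": "[resourceGroup().location]",
    "comments": "Standalone managed disk (hypothetical names). Its size is set here, independently of the VM resource.",
    "sku": { "name": "Premium_LRS" },
    "properties": {
      "creationData": { "createOption": "Empty" },
      "diskSizeGB": "[parameters('dataDiskSizeGB')]"
    }
  },
  {
    "type": "Microsoft.Compute/virtualMachines",
    "apiVersion": "2022-03-01",
    "name": "[parameters('vmName')]",
    "location": "[resourceGroup().location]",
    "comments": "Only the storage profile is shown; hardwareProfile, osProfile, networkProfile and osDisk are omitted for brevity.",
    "dependsOn": [
      "[resourceId('Microsoft.Compute/disks', variables('dataDiskName'))]"
    ],
    "properties": {
      "storageProfile": {
        "dataDisks": [
          {
            "lun": 0,
            "createOption": "Attach",
            "managedDisk": {
              "id": "[resourceId('Microsoft.Compute/disks', variables('dataDiskName'))]"
            }
          }
        ]
      }
    }
  }
]
```

With the size declared on the disk resource, a resize becomes a change to `dataDiskSizeGB` followed by a redeployment. Note that Azure managed disks can be grown but not shrunk.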
...
The server was already federated and the Web Adaptor was in place. URL checks passed, but the new server had new secrets for token generation, so the federation was no longer valid.
Some attempts to unfederate and re-federate the server failed.
Using a fresh Web Adaptor pointing to /arcgis instead of /server was promising, but the portal didn’t recognize the existing data store as an ESRI data store.
Re-installing everything
tall -DebugSwitch
Note: This uninstalled the software but left all configuration and content folders in place. Remove them manually before reinstalling.
```powershell
# Run the ArcGIS PowerShell DSC uninstall for the deployment described in the configuration file
Configure-ArcGIS `
    -ConfigurationParametersFile 'D:\DSC\PowerShell DSC\PLT-BaseDeployment-MultiMachine_DomainController.json' `
    -Mode UnInstall
```
Lessons learned
Repair
In the end, it would have been quicker to wipe out the VMs and start from scratch. The DSC scripts can reset the configuration of existing instances to match the master branch, but replacing only the Server with a fresh install while leaving the rest in place didn’t work.
...
We need more awareness of our VMs. A meeting was scheduled to set up Azure monitoring and alerts so that this does not sneak up on us in the future. We have since collaborated with the cloud team to configure alerting when the default VM health metrics are violated.
Staying up-to-date
The restore process required running the versions of the scripts that were in use when pilot was installed. Some of the steps this required are redundant in the current deployment. We should move forward with this upgrade.