...

Jon, Josh, and Jiehua, together with an ESRI employee, inspected the logs, and the most likely scenario is as follows: the C drive on the server machine ran out of space. As a result, AGS could not write its state as required, which left a corrupted server that could not start after a reboot.

...

ARM Template Updates

All OS and data disks and the DS data disk were resized via ARM template deployment. This required refactoring the ARM templates to list the disks as their own resources, as opposed to the short form within the VM resource.

...

  • The server was already federated and the web adaptor was in place. URL checks passed, but the new server had new secrets for token generation, so the federation was not right.

    • Some attempts to unfederate/re-federate failed

    • Using a fresh web adaptor pointing to /arcgis instead of /server was promising, but the portal didn’t recognize the existing data store as an ESRI data store.

Re-installing everything

tall -DebugSwitch

Note

This uninstalled the software but left all config and content folders in place. Remove them before re-installing.

Code Block
languagepowershell
Configure-ArcGIS `
-ConfigurationParametersFile 'D:\DSC\PowerShell DSC\PLT-BaseDeployment-MultiMachine_DomainController.json' `
-Mode UnInstall

Lessons learned

Repair

In the end, it would have been quicker to just wipe out the VMs and start from scratch. The DSC scripts can reset the configurations on existing instances to what they should be according to the master branch, but replacing just the Server with a fresh install while leaving the rest in place didn’t work.

...

We need more awareness of our VMs. A meeting is already scheduled to set up Azure monitoring and alerts to prevent this from sneaking up on us in the future. We have collaborated with the cloud team to configure alerting when the default VM health metrics are violated.
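
As a complement to the Azure alerts, a local free-space check can make a filling disk visible on the VM itself. The following is only a minimal sketch: the function name Test-LowDiskSpace and the 10% threshold are illustrative assumptions, not part of the existing scripts or of the alerting configured with the cloud team.

```powershell
# Hypothetical helper (name and default threshold are assumptions):
# returns $true when the free-space percentage is below the threshold.
function Test-LowDiskSpace {
    param(
        [Parameter(Mandatory)] [double] $FreeBytes,
        [Parameter(Mandatory)] [double] $TotalBytes,
        [double] $ThresholdPercent = 10
    )
    # Drives that report no capacity are skipped rather than flagged.
    if ($TotalBytes -le 0) { return $false }
    $freePercent = ($FreeBytes / $TotalBytes) * 100
    return $freePercent -lt $ThresholdPercent
}

# Check every file-system drive visible to the current session.
Get-PSDrive -PSProvider FileSystem | ForEach-Object {
    $free  = [double]$_.Free
    $total = [double]$_.Used + $free
    if (Test-LowDiskSpace -FreeBytes $free -TotalBytes $total) {
        $freeGB = [math]::Round($free / 1GB, 1)
        Write-Warning "Drive $($_.Name) is low on space: $freeGB GB free"
    }
}
```

Run as a scheduled task, a check like this would emit a warning well before the C drive is completely full. The threshold would need tuning per disk size.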

Staying up-to-date

The restore process required running the versioned scripts from when the pilot was installed. Some steps that those scripts required are redundant in the current deployment. We should move forward with this upgrade.