We were approached by a company in need of a technician to replace a failed flash array controller from Pure Storage. The new controller was delayed by one day, so we had to reschedule our initial appointment. Once it arrived, we were ready for the swap.
The new controller was supposed to be preconfigured, which sounds like an easy job, doesn’t it?
Our technician Chris arrived at the data centre and completed the swap in no more than 15 minutes, leaving the unit ready for its initial boot.
After powering it on, Chris waited a few seconds, and then a problem occurred. The new server didn’t complete the POST and instead showed a strange error reporting that the PCI card in slot 5 had failed. Confronted with this unexpected complication, Chris was ready to dig deeper and determine the cause of the fault. The first and most obvious thing to do in such a situation is to try booting the server without the card, which worked perfectly and produced no errors whatsoever. However, this PCI card is crucial to the normal operation of the controller, so the controller could not work properly without it.
After giving the problem more thought, Chris concluded that the issue might lie with the slot rather than the card itself. He therefore moved the card to another slot, and luckily the server booted as expected and passed all of the initial checks.
With extra help from the Pure Storage technicians, we managed to connect to the console port of the array controller and run the initial checks.
Unfortunately, the new controller showed exactly the same fault as the old one: “no network connectivity”. After another 30 minutes of troubleshooting, Chris traced the fault to a loose connection at the main network switch, which connects the flash array to the network.
Since the controller had not been the problem to start with, the Pure Storage technicians decided to reinstall the old unit and ship the new one back. In the meantime, they can continue testing the faulty PCI slot.
In the end, Chris did a great job of detecting the problem and providing the best possible solution for this specific case.