Server Monitoring at Scale – Lessons from an HPE and Cloudera Environment


As part of a Managed Services & Operations project at PT. Bringin Inti Teknologi (bit.),
I had the opportunity to monitor 112 HPE servers running a Cloudera environment.
Monitoring was done primarily with HPE OneView, supported by tools such as Zabbix.


Key Responsibilities

During this project, my main tasks included:

  1. Basic Server Setup
    Installing and configuring Red Hat Enterprise Linux (RHEL) as the primary operating system.

  2. Monitoring & Health Check
    Setting up monitoring for 112 HPE servers using HPE OneView, ensuring that CPU, memory, storage, and network utilization were tracked in real time.

  3. Alerting & Event Management
    Configuring alerts and log collection to quickly identify failures, bottlenecks, or hardware degradation.

  4. Integration with Monitoring Tools
    Complementing OneView data with Zabbix dashboards for better visualization and trend analysis.
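
To make the health-check step more concrete, here is a minimal Python sketch of how a fleet-wide status summary could be built. It is an illustration, not the script used on the project. It assumes each server record is a dictionary with "name" and "status" fields, similar in shape to what OneView's server-hardware REST resource returns; the exact field names and status values should be checked against your OneView API version. The sample hostnames are hypothetical.

```python
from collections import Counter


def summarize_health(members):
    """Count servers per status and list the ones that are not OK.

    `members` is a list of dicts with "name" and "status" keys.
    This shape is an assumption modelled on a server-hardware listing.
    """
    counts = Counter(m.get("status", "Unknown") for m in members)
    needs_attention = sorted(
        m.get("name", "<unnamed>")
        for m in members
        if m.get("status", "Unknown") != "OK"
    )
    return dict(counts), needs_attention


# Hypothetical sample data standing in for a real API response.
sample = [
    {"name": "node-001", "status": "OK"},
    {"name": "node-002", "status": "Warning"},
    {"name": "node-003", "status": "Critical"},
    {"name": "node-004", "status": "OK"},
]

counts, flagged = summarize_health(sample)
print(counts)   # servers per status
print(flagged)  # names of servers that need a closer look
```

Keeping the summary logic separate from the API call makes it easy to test offline and to reuse the same function for data exported from other tools.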


Tools & Technologies

  • HPE OneView → Infrastructure and hardware monitoring.
  • Red Hat Enterprise Linux (RHEL) → Operating system for most of the servers.
  • Zabbix → Visualization, metrics collection, and alerting.
  • Cloudera → The main platform running on top of these servers.

Key Takeaways

Working on a large-scale monitoring environment taught me several important lessons:

  • Scalability matters – Managing more than 100 servers requires automation and consistent monitoring policies.
  • Clear documentation – With multiple engineers on the project, documenting monitoring procedures was essential.
  • Proactive alerting – Early detection of hardware issues prevents downtime and reduces business impact.
  • Cross-tool integration – Combining HPE OneView with Zabbix provided both low-level hardware visibility and high-level metrics visualization.
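
The point about consistent monitoring policies can be sketched in a few lines of Python: one shared policy table is applied to every host, so no server ends up with its own ad hoc thresholds. The metric names and threshold values below are hypothetical and chosen only for illustration; in practice they would live in the monitoring tool itself (for example as Zabbix triggers) rather than in a script.

```python
# Hypothetical policy: metric name -> (warning threshold, critical threshold), in percent.
POLICY = {
    "cpu_pct": (80.0, 95.0),
    "memory_pct": (85.0, 95.0),
    "disk_pct": (80.0, 90.0),
}


def evaluate(host, metrics, policy=POLICY):
    """Return a list of (host, metric, severity) tuples for one host.

    A metric missing from `metrics` is reported as "unknown" so that
    gaps in data collection are visible instead of silently ignored.
    """
    alerts = []
    for metric, (warn, crit) in policy.items():
        value = metrics.get(metric)
        if value is None:
            alerts.append((host, metric, "unknown"))
        elif value >= crit:
            alerts.append((host, metric, "critical"))
        elif value >= warn:
            alerts.append((host, metric, "warning"))
    return alerts


# Hypothetical readings for two hosts, checked against the same policy.
fleet = {
    "node-001": {"cpu_pct": 50.0, "memory_pct": 90.0, "disk_pct": 91.0},
    "node-002": {"cpu_pct": 97.0, "memory_pct": 40.0},
}

for host, metrics in fleet.items():
    for alert in evaluate(host, metrics):
        print(alert)
```

Because every host goes through the same evaluate function, changing a threshold in one place changes it for the whole fleet, which is what makes the approach scale past a handful of servers.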

This experience strengthened my skills in infrastructure monitoring, incident response, and system administration.
It also gave me a deeper understanding of how large-scale enterprise environments maintain reliability and performance through proactive monitoring.