Post-Production Monitoring: a complete guide to effective monitoring


Going live is a crucial stage in the life of a project: it marks the project's entry into the real world. Once the infrastructure has been carefully sized using load tests and usage projections, it is time to make the project fully operational for users. That means making sure it runs smoothly, and monitoring it at several levels so that problems are detected early and the user experience stays good.

 

H2: What is Post-Production Monitoring? What is it for, and why is it important?

Post-production monitoring includes:

  • Continuous performance monitoring: By closely observing server, database, network and application performance, teams can detect potential problems as soon as they arise.
  • Real-time data analysis: Real-time data collection and analysis helps identify trends, activity peaks and abnormal behavior, facilitating rapid decision-making.
  • Log management and alerts: Centralized logs and alerting let you track important events, diagnose problems and react quickly to incidents.
  • Resource optimization: By monitoring the use of resources such as memory, CPU and storage, teams can optimize configurations and allocations to ensure optimum performance.
  • Continuous improvement: Post-production monitoring provides valuable data for continuous application improvement, identifying areas for optimization and guiding future development and deployment decisions.

By implementing these monitoring practices, teams can ensure the stability, reliability and performance of their application throughout its lifecycle.

 

H2: Which tools do you use?

At Ylly, we use two tools to monitor our web applications: Zabbix and ElasticStack.


 

H3: Zabbix

Thanks to Zabbix, we can monitor the health of our equipment, including servers, routers and more, in real time. This includes crucial data such as processor load, RAM usage and disk partition space.

This tool is indispensable for prompt intervention on the hardware or software underpinning our application. Here are a few practical examples to illustrate its usefulness:

  • Adjusting hardware power following a significant increase in project activity.
  • Restarting a service in the event of a malfunction.
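
To make the kind of data involved more concrete, here is a minimal Python sketch that reads two of the metrics mentioned above (disk partition usage and processor load) using only the standard library. It is an illustration of what an agent measures, not how Zabbix itself collects data; the function names `disk_used_percent` and `load_per_core` are hypothetical.

```python
import os
import shutil
from typing import Optional


def disk_used_percent(path: str = "/") -> float:
    """Percentage of the partition containing `path` that is currently used."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def load_per_core() -> Optional[float]:
    """1-minute load average divided by the CPU count, or None if unavailable."""
    if not hasattr(os, "getloadavg"):
        return None  # os.getloadavg() only exists on Unix-like systems
    try:
        one_minute_load = os.getloadavg()[0]
    except OSError:
        return None  # the load average could not be obtained
    cores = os.cpu_count() or 1
    return one_minute_load / cores


if __name__ == "__main__":
    print(f"disk used: {disk_used_percent('.'):.1f}%")
    print(f"load per core: {load_per_core()}")
```

A load per core that stays well above 1.0 is the sort of signal that would justify adding hardware capacity, as in the first example above.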

 

H3: ElasticStack

ElasticStack offers a complete solution for the statistical analysis of logs generated by applications and projects. By aggregating this data, we obtain a global view of applications, enabling us to identify recurring problems, monitor project activity and understand which functionalities are most in demand.

Here are just a few concrete examples of how we use it:

  • Symfony log analysis to detect errors and relevant information linked to the application's functional requirements. 
  • HTTP log analysis to track application usage, including URLs, response codes and execution times.
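
As an illustration of the HTTP log analysis described above, the following Python sketch counts response codes and averages execution time per URL. It assumes a hypothetical access log format in which each line ends with the status code, the response size and the request duration in seconds; a real server's format may differ, and in practice this aggregation is done inside ElasticStack rather than in a script.

```python
import re
from collections import Counter, defaultdict

# Assumed line ending: "METHOD URL PROTOCOL" STATUS SIZE SECONDS
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<url>\S+) [^"]*" '
    r"(?P<status>\d{3}) \S+ (?P<seconds>\d+(?:\.\d+)?)$"
)


def summarize(lines):
    """Return (status code counts, mean duration in seconds per URL)."""
    status_counts = Counter()
    durations = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed format
        status_counts[match["status"]] += 1
        durations[match["url"]].append(float(match["seconds"]))
    mean_seconds = {url: sum(v) / len(v) for url, v in durations.items()}
    return status_counts, mean_seconds


# Hypothetical sample lines, using documentation-reserved IP addresses.
SAMPLE = [
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /api/orders HTTP/1.1" 200 512 0.120',
    '203.0.113.7 - - [10/Oct/2024:13:55:37 +0000] "GET /api/orders HTTP/1.1" 500 87 0.480',
    '198.51.100.2 - - [10/Oct/2024:13:55:38 +0000] "POST /api/login HTTP/1.1" 200 64 0.050',
]

if __name__ == "__main__":
    counts, means = summarize(SAMPLE)
    print(dict(counts))
    print(means)
```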

Zabbix and ElasticStack allow us to cross-reference information, paving the way for further analysis of production issues. By combining the hardware and software monitoring data provided by Zabbix with the statistical log analysis offered by ElasticStack, we gain a comprehensive view of the health and performance of our infrastructure and applications. This integrated approach enables us to identify anomalies more quickly and take proactive corrective action to keep our systems running smoothly.

 

H2: Case Study

A new Symfony web project (API development) is nearing completion. Our system administrators have recently finished installing the project as follows:

 

Each server has a well-defined role. We therefore configure each machine on Zabbix with the corresponding templates:

  • Front-end servers: Linux template, HTTP, PHP
  • Backend servers: Linux template, MySQL, Galera

This configuration enables us to obtain metrics and monitor the resources consumed by the system for each server via the Linux template, as well as role-specific metrics.

For ElasticStack, we follow the same approach and enable log collection in Filebeat according to the role of each machine:

  • Front-end servers: HTTP server access and error logs, general PHP access and error logs, application logs
  • Backend servers: general MySQL logs, error logs, slow query logs
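
The role-based setup described in this case study can be summarized as a small lookup table. The Python sketch below is only a way to picture it: the template and log names are shorthand taken from the lists above, not the exact names used in Zabbix or Filebeat, and the host names are hypothetical.

```python
# Shorthand labels from the case study, not exact Zabbix or Filebeat names.
ROLE_PROFILES = {
    "front-end": {
        "zabbix_templates": ["Linux", "HTTP", "PHP"],
        "log_sources": [
            "http access", "http error",
            "php access", "php error",
            "application",
        ],
    },
    "backend": {
        "zabbix_templates": ["Linux", "MySQL", "Galera"],
        "log_sources": ["mysql general", "mysql error", "mysql slow"],
    },
}


def monitoring_plan(inventory):
    """Map each host to the templates and log sources implied by its role."""
    plan = {}
    for host, role in inventory.items():
        if role not in ROLE_PROFILES:
            raise ValueError(f"no monitoring profile for role {role!r}")
        plan[host] = ROLE_PROFILES[role]
    return plan


if __name__ == "__main__":
    # Hypothetical inventory: two front-end servers and one backend server.
    inventory = {"front-1": "front-end", "front-2": "front-end", "db-1": "backend"}
    for host, profile in monitoring_plan(inventory).items():
        print(host, profile["zabbix_templates"], profile["log_sources"])
```

Keeping the mapping in one place means a new server only needs a role to receive the right monitoring configuration.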

 

 

H2: Best practices and advanced strategies for optimizing Post-Production Monitoring

H3: Best practice tips

  • Alert configuration: Define relevant, well-calibrated alerts to be notified when critical thresholds are exceeded, but avoid unnecessary alerts that could lead to team fatigue or desensitization.
  • Threshold management: Establish thresholds based on realistic metrics and adapt them as the application evolves. Avoid thresholds that are too strict, which trigger frequent alerts, or too lax, which let significant problems slip through the cracks.
  • Select metrics to monitor: Identify the key metrics that are most relevant to your application and business objectives. Prioritize monitoring of performance, availability and user experience metrics.
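
One common way to calibrate alerts, sketched below in Python, is to notify only when a metric stays above its threshold for several consecutive samples, so that a brief spike does not page the team. The function name `should_alert` and the default of three samples are assumptions made for illustration; monitoring tools generally offer their own way to express this kind of rule.

```python
def should_alert(samples, threshold, consecutive=3):
    """Return True if the last `consecutive` samples all exceed `threshold`."""
    if consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    if len(samples) < consecutive:
        return False  # not enough history to decide yet
    return all(value > threshold for value in samples[-consecutive:])


if __name__ == "__main__":
    cpu_percent = [50, 95, 60, 70]          # a single spike, then recovery
    print(should_alert(cpu_percent, 90))    # spike alone does not alert
    cpu_percent = [50, 91, 95, 99]          # sustained high load
    print(should_alert(cpu_percent, 90))
```

Raising `consecutive` reduces noise at the cost of slower detection, which is the trade-off between alert fatigue and missed problems described above.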

 

H3: Trend analysis and forecasting

  • Importance of trend analysis: Track metrics over the long term to identify significant trends, such as seasonal peaks or changes in behavior. This enables you to anticipate potential problems before they become critical.
  • Use forecasting: Use forecasting techniques to estimate future levels of load, traffic or demand. This enables you to plan the resources you need and avoid service interruptions due to insufficient capacity.
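
As a simple example of forecasting, the Python sketch below fits a straight line to daily disk usage readings and estimates how many days remain before a limit is reached. A linear trend is an assumption chosen for simplicity; real traffic often has seasonal patterns that call for richer models, and the function names are hypothetical.

```python
def fit_line(values):
    """Least-squares slope and intercept for values sampled at x = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    variance = sum((x - mean_x) ** 2 for x in range(n))
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def days_until_limit(daily_values, limit):
    """Days from the last sample until the trend reaches `limit`, or None."""
    slope, intercept = fit_line(daily_values)
    if slope <= 0:
        return None  # usage is flat or decreasing: no exhaustion predicted
    day_at_limit = (limit - intercept) / slope
    last_day = len(daily_values) - 1
    return max(0.0, day_at_limit - last_day)


if __name__ == "__main__":
    disk_percent = [60, 62, 64, 66, 68]  # hypothetical daily readings
    print(days_until_limit(disk_percent, 90))
```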

 

H3: Security and compliance

  • Detect suspicious behavior: Set up alerts to detect abnormal behavior or suspicious activity that could indicate a security compromise. This may include unauthorized access attempts, vulnerability scans, or unauthorized configuration changes.
  • Security policy compliance: Use post-production monitoring to verify compliance with security policies and industry standards, such as GDPR, PCI-DSS, or ISO 27001. Ensure that systems and applications meet data confidentiality, integrity and availability requirements.
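
Detecting unauthorized access attempts can start with something as simple as counting failed logins per source address. The Python sketch below assumes the events have already been extracted from the logs as `(ip, success)` pairs; the event shape, the function name and the threshold of five failures are all illustrative assumptions.

```python
from collections import Counter


def suspicious_sources(events, max_failures=5):
    """Return the sorted IPs with at least `max_failures` failed logins."""
    failures = Counter(ip for ip, success in events if not success)
    return sorted(ip for ip, count in failures.items() if count >= max_failures)


if __name__ == "__main__":
    # Hypothetical events using documentation-reserved IP addresses.
    events = [("203.0.113.9", False)] * 6
    events += [("198.51.100.4", False), ("198.51.100.4", True)]
    print(suspicious_sources(events))
```

In production such a rule would normally be evaluated over a sliding time window, so that old failures do not accumulate forever.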

 

H3: Scalability and performance management

  • Bottleneck monitoring: Identify the critical components that are likely to become bottlenecks as load increases. Monitor these components closely and take proactive measures to optimize their performance and scalability.
  • Resource optimization: Use monitoring data to optimize the use of resources such as memory, CPU and storage, sizing them correctly and distributing them evenly between the various infrastructure components.
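
To locate a likely bottleneck, it often helps to compare tail latency rather than averages across components. The Python sketch below computes a nearest-rank 95th percentile per component and returns the slowest one; the component names and timings are hypothetical, and the choice of the 95th percentile is an assumption.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of `values` for an integer `pct` in 1..100."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def slowest_component(latencies_ms, pct=95):
    """Name of the component with the highest `pct` percentile latency."""
    return max(latencies_ms, key=lambda name: percentile(latencies_ms[name], pct))


if __name__ == "__main__":
    # Hypothetical per-component timings in milliseconds.
    latencies_ms = {
        "php": [100, 120, 110],
        "mysql": [20, 30, 900],
    }
    print(slowest_component(latencies_ms))
```

Here the database has the lower typical latency but the worst tail, which is exactly the kind of component worth watching as load increases.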

In conclusion, the implementation of effective post-production monitoring, combining best practices, proactive trend analysis, attention to security and compliance, and optimal performance management, is essential to guarantee the reliability, stability and security of applications and systems in production. By following these principles and adopting a proactive approach, teams can anticipate problems, optimize resources and deliver an optimal user experience throughout the application lifecycle.
 

Would you like to secure your infra and ensure your teams' peace of mind?


Contact us today and let our experts guide you in bringing your vision to life.
