Building an application statistics collection and display system with dropwizard/metrics, Kafka, and Zabbix

Source: 程序员人生  Published: 2015-03-12 09:02:13

New blog address: http://hengyunabc.github.io/about-metrics/

Goals

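The application should be able to collect statistics on a given kind of data with only a small amount of code
The collected statistics should be easy to display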

metrics

Taken literally, metrics means measurements or indicators.

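To give concrete examples, take a web server:
- How many requests arrive in one minute?
- What is the average request latency?
- What is the longest request time?
- How many times is a given method called, and how long does it take?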

Take a cache as an example:
- What is the average cache lookup time?
- How many lookups miss the cache, and what is the miss ratio?

Take the JVM as an example:
- How many GCs have occurred?
- How large is the Old Space?
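
The two JVM questions above can be answered with the JDK's own java.lang.management API, without any metrics library. The following is only a rough sketch: the class name JvmStats and its two methods are made up for illustration, and the heap pool names it prints (for example an "Old Gen" pool) depend on which garbage collector the JVM is running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of any metrics library.
class JvmStats {

    // Sums the collection counts of all collectors. A collector that
    // reports -1 (count undefined) is skipped.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Bytes currently used in each heap memory pool, keyed by pool name.
    static Map<String, Long> heapPoolUsage() {
        Map<String, Long> used = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (pool.getType() == MemoryType.HEAP && usage != null) {
                used.put(pool.getName(), usage.getUsed());
            }
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println("GC count: " + totalGcCount());
        heapPoolUsage().forEach((name, bytes) ->
                System.out.println(name + ": " + bytes + " bytes used"));
    }
}
```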

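Within one application, the metrics that need to be collected vary widely, and so do the requirements. What is needed is a unified platform for collecting, aggregating, and displaying metrics.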

Popular metrics libraries

https://github.com/dropwizard/metrics
Implemented in Java and used by many open source projects, such as Hadoop and Kafka. It is referred to below as dropwizard/metrics.

https://github.com/tumblr/colossus
Implemented in Scala; it stores data in OpenTSDB.

Metrics in Spring Boot projects:

http://docs.spring.io/spring-boot/docs/current/reference/html/production-ready-metrics.html

Many of the metrics in Spring Boot are modeled on dropwizard/metrics.

Types of metrics

dropwizard/metrics divides metrics into the following main categories:

https://dropwizard.github.io/metrics/3.1.0/getting-started/

Gauges

A gauge measures a single value, for example the length of a queue:

public class QueueManager {
    private final Queue queue;
    public QueueManager(MetricRegistry metrics, String name) {
        this.queue = new Queue();
        metrics.register(MetricRegistry.name(QueueManager.class, name, "size"),
                         new Gauge<Integer>() {
                             @Override
                             public Integer getValue() {
                                 return queue.size();
                             }
                         });
    }
}
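
The point worth noticing about a gauge is that it stores nothing: it is a callback that is evaluated each time somebody reads it. The sketch below imitates that idea with plain JDK types so it runs without the library. GaugeSketch, register, and read are names invented for this illustration and are not the dropwizard/metrics API.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Illustration only: a tiny registry of named callbacks.
class GaugeSketch {
    private final Map<String, Supplier<?>> gauges = new ConcurrentHashMap<>();

    // Registers a callback under a unique name.
    void register(String name, Supplier<?> gauge) {
        if (gauges.putIfAbsent(name, gauge) != null) {
            throw new IllegalArgumentException("A gauge named " + name + " already exists");
        }
    }

    // Evaluates the callback now, so the result reflects the current state.
    Object read(String name) {
        Supplier<?> gauge = gauges.get(name);
        if (gauge == null) {
            throw new IllegalArgumentException("No gauge named " + name);
        }
        return gauge.get();
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>();
        GaugeSketch registry = new GaugeSketch();
        registry.register("queue.size", queue::size);

        System.out.println(registry.read("queue.size")); // prints 0
        queue.add("job-1");
        queue.add("job-2");
        System.out.println(registry.read("queue.size")); // prints 2
    }
}
```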

Counters

A counter is a gauge for an AtomicLong value. For example, it can track the number of jobs pending in a queue:

private final Counter pendingJobs = metrics.counter(name(QueueManager.class, "pending-jobs"));
public void addJob(Job job) {
    pendingJobs.inc();
    queue.offer(job);
}
public Job takeJob() {
    pendingJobs.dec();
    return queue.take();
}
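
Why an atomic value matters: addJob and takeJob are usually called from several threads, and a plain long field would lose updates. Here is a small sketch of the same inc/dec idea on top of AtomicLong. CounterSketch and countConcurrently are hypothetical names, and this is not the library's Counter class.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustration only: a thread-safe counter backed by AtomicLong.
class CounterSketch {
    private final AtomicLong count = new AtomicLong();

    void inc() { count.incrementAndGet(); }

    void dec() { count.decrementAndGet(); }

    long getCount() { return count.get(); }

    // Increments one shared counter from several threads and returns the
    // final value once every thread has finished.
    static long countConcurrently(int threads, int incrementsPerThread) {
        CounterSketch counter = new CounterSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.inc();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.getCount();
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 10_000)); // prints 40000
    }
}
```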

Histograms

A histogram measures the distribution of values in a stream of data: the minimum, maximum, and mean, as well as the median and the 75th, 90th, 95th, 98th, 99th, and 99.9th percentiles.

For example, the distribution of response sizes:

private final Histogram responseSizes = metrics.histogram(name(RequestHandler.class, "response-sizes"));

public void handleRequest(Request request, Response response) {
    // etc
    responseSizes.update(response.getContent().length);
}
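
To make "percentile" concrete, here is a sketch that keeps every recorded value and computes percentiles with the simple nearest-rank rule: sort the values and take the one at rank ceil(q * n). This is an assumption made for illustration. HistogramSketch is a made-up class, and the library's own sampling and interpolation may well differ from this.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: stores all values, unlike a sampling histogram.
class HistogramSketch {
    private final List<Long> values = new ArrayList<>();

    synchronized void update(long value) {
        values.add(value);
    }

    // Returns a sorted copy so callers can compute several statistics
    // from one consistent view of the data.
    synchronized List<Long> sortedSnapshot() {
        List<Long> copy = new ArrayList<>(values);
        Collections.sort(copy);
        return copy;
    }

    // Nearest-rank percentile: the value at rank ceil(quantile * n), 1-based.
    static long percentile(List<Long> sorted, double quantile) {
        if (sorted.isEmpty()) {
            throw new IllegalStateException("no values recorded");
        }
        if (quantile <= 0.0 || quantile > 1.0) {
            throw new IllegalArgumentException("quantile must be in (0, 1]");
        }
        int rank = (int) Math.ceil(quantile * sorted.size());
        return sorted.get(rank - 1);
    }

    static double mean(List<Long> values) {
        return values.stream().mapToLong(Long::longValue).average().orElse(0.0);
    }

    public static void main(String[] args) {
        HistogramSketch responseSizes = new HistogramSketch();
        for (long size : new long[] {70, 10, 100, 40, 20, 90, 30, 60, 50, 80}) {
            responseSizes.update(size);
        }
        List<Long> sorted = responseSizes.sortedSnapshot();
        System.out.println("min=" + sorted.get(0));                      // min=10
        System.out.println("max=" + sorted.get(sorted.size() - 1));      // max=100
        System.out.println("mean=" + mean(sorted));                      // mean=55.0
        System.out.println("median=" + percentile(sorted, 0.5));         // median=50
        System.out.println("p75=" + percentile(sorted, 0.75));           // p75=80
    }
}
```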

Timers

A timer, as its name suggests, measures how long a piece of code or a call takes to run. For example, timing how long responses take:

private final Timer responses = metrics.timer(name(RequestHandler.class, "responses"));

public String handleRequest(Request request, Response response) {
    final Timer.Context context = responses.time();
    try {
        // etc;
        return "OK";
    } finally {
        context.stop();
    }
}
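
The try/finally pattern above is what guarantees that the duration is recorded even when the handler throws. A rough JDK-only sketch of the same mechanism follows, using System.nanoTime and an AutoCloseable context so that try-with-resources does the stopping. TimerSketch and its methods are hypothetical and much simpler than the library's Timer: they keep only a count, a total, and a maximum.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Illustration only: records call count, total time, and maximum time.
class TimerSketch {
    private final LongAdder count = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();
    private final AtomicLong maxNanos = new AtomicLong();

    // One running measurement; closing it records the elapsed time.
    final class Context implements AutoCloseable {
        private final long start = System.nanoTime();

        @Override
        public void close() {
            long elapsed = System.nanoTime() - start;
            count.increment();
            totalNanos.add(elapsed);
            maxNanos.accumulateAndGet(elapsed, Math::max);
        }
    }

    Context time() {
        return new Context();
    }

    long count() {
        return count.sum();
    }

    double meanNanos() {
        long n = count.sum();
        return n == 0 ? 0.0 : (double) totalNanos.sum() / n;
    }

    long maxNanos() {
        return maxNanos.get();
    }

    public static void main(String[] args) {
        TimerSketch responses = new TimerSketch();
        for (int request = 0; request < 3; request++) {
            // close() runs when the block exits, normally or by exception.
            try (TimerSketch.Context ignored = responses.time()) {
                long sum = 0;
                for (int i = 0; i < 100_000; i++) {
                    sum += i;
                }
            }
        }
        System.out.println("count=" + responses.count());
        System.out.println("mean(ns)=" + responses.meanNanos());
        System.out.println("max(ns)=" + responses.maxNanos());
    }
}
```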

Health Checks

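This is not actually a statistic. It is an interface that lets users decide for themselves whether the system is healthy, for example by checking whether the database connection is working: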

final HealthCheckRegistry healthChecks = new HealthCheckRegistry();

public class DatabaseHealthCheck extends HealthCheck {
    private final Database database;

    public DatabaseHealthCheck(Database database) {
        this.database = database;
    }

    @Override
    public HealthCheck.Result check() throws Exception {
        if (database.isConnected()) {
            return HealthCheck.Result.healthy();
        } else {
            return HealthCheck.Result.unhealthy("Cannot connect to " + database.getUrl());
        }
    }
}
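
A check like the one above only becomes useful once it is registered under a name and run together with the other checks. The sketch below shows one plausible shape for that: run every check, and turn an exception into an unhealthy result instead of letting it escape. HealthSketch and its nested Result are stand-ins invented for this example, not the library's HealthCheckRegistry.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

// Illustration only: a minimal registry of named health checks.
class HealthSketch {

    static final class Result {
        final boolean healthy;
        final String message;

        private Result(boolean healthy, String message) {
            this.healthy = healthy;
            this.message = message;
        }

        static Result healthy() {
            return new Result(true, null);
        }

        static Result unhealthy(String message) {
            return new Result(false, message);
        }
    }

    private final Map<String, Callable<Result>> checks = new ConcurrentHashMap<>();

    void register(String name, Callable<Result> check) {
        checks.put(name, check);
    }

    // Runs every check; a check that throws is reported as unhealthy.
    Map<String, Result> runAll() {
        Map<String, Result> results = new TreeMap<>();
        for (Map.Entry<String, Callable<Result>> entry : checks.entrySet()) {
            Result result;
            try {
                result = entry.getValue().call();
            } catch (Exception e) {
                result = Result.unhealthy(e.getMessage());
            }
            results.put(entry.getKey(), result);
        }
        return results;
    }

    public static void main(String[] args) {
        HealthSketch health = new HealthSketch();
        health.register("cache", Result::healthy);
        health.register("database", () -> Result.unhealthy("Cannot connect to db"));
        health.runAll().forEach((name, result) ->
                System.out.println(name + ": " + (result.healthy ? "OK" : "FAIL " + result.message)));
        // prints "cache: OK" then "database: FAIL Cannot connect to db"
    }
}
```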

Metrics Annotation

With the annotations in dropwizard/metrics, it is easy to collect statistics for a method or a value.
For example:

    /**
     * Records the number of calls and how long they take
     */
    @Timed
    public void call() {
    }

    /**
     * Counts the number of logins
     */
    @Counted
    public void userLogin(){
    }
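
An annotation only marks a method. Something still has to intercept the call and update the metric, which is normally the job of an integration layer. To show the mechanism in isolation, here is a sketch built on a JDK dynamic proxy. Everything in it is hypothetical: the @CountedCall annotation, the UserService interface, and the CallCounter class are invented for this example, and it does not claim to be how any particular dropwizard/metrics integration works.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustration only: counts calls to methods marked with @CountedCall.
class CallCounter {

    // Hypothetical marker annotation, kept at runtime so reflection sees it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface CountedCall {}

    // Hypothetical service interface used by the demo.
    interface UserService {
        @CountedCall
        void userLogin();

        void ping();
    }

    private final Map<String, AtomicLong> counts = new ConcurrentHashMap<>();

    // Wraps target in a proxy that counts annotated calls, then delegates.
    <T> T instrument(Class<T> iface, T target) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] {iface},
                (self, method, args) -> {
                    if (method.isAnnotationPresent(CountedCall.class)) {
                        counts.computeIfAbsent(method.getName(), k -> new AtomicLong())
                              .incrementAndGet();
                    }
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow the target's own exception
                    }
                });
        return iface.cast(proxy);
    }

    long count(String methodName) {
        AtomicLong c = counts.get(methodName);
        return c == null ? 0 : c.get();
    }

    public static void main(String[] args) {
        CallCounter counter = new CallCounter();
        UserService service = counter.instrument(UserService.class, new UserService() {
            @Override public void userLogin() { }
            @Override public void ping() { }
        });
        service.userLogin();
        service.userLogin();
        service.ping();
        System.out.println(counter.count("userLogin")); // prints 2
        System.out.println(counter.count("ping"));      // prints 0
    }
}
```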

To see in detail what each kind of metric actually produces, simply run some test code and print the output with ConsoleReporter.

Transporting and displaying metrics data

dropwizard/metrics provides a reporter interface, so users can implement their own handling of the metrics data.

dropwizard/metrics ships with many ready-made reporters:

ConsoleReporter  prints to stdout
JmxReporter  exposes metrics as MBeans
metrics-servlets  provides an HTTP interface for querying metrics
CsvReporter  writes CSV files
Slf4jReporter  writes to a log
GangliaReporter  reports to Ganglia
GraphiteReporter  reports to Graphite
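
Whatever the destination, a reporter boils down to the same loop: on a fixed schedule, read the current value of every metric and hand it to some sink. The sketch below shows that loop with JDK classes only. ReporterSketch, its constructor arguments, and the "name=value" line format are assumptions made for this illustration; a real custom reporter would be written against the library's own reporter classes instead.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Illustration only: periodically pushes "name=value" lines to a sink.
class ReporterSketch {
    private final Map<String, Supplier<Number>> metrics;
    private final Consumer<String> sink;
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "metrics-reporter");
                thread.setDaemon(true); // do not keep the JVM alive
                return thread;
            });

    ReporterSketch(Map<String, Supplier<Number>> metrics, Consumer<String> sink) {
        this.metrics = metrics;
        this.sink = sink;
    }

    // Reads every metric once, in name order, and sends one line per metric.
    void report() {
        new TreeMap<>(metrics).forEach((name, metric) -> sink.accept(name + "=" + metric.get()));
    }

    void start(long period, TimeUnit unit) {
        executor.scheduleAtFixedRate(this::report, period, period, unit);
    }

    void stop() {
        executor.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong requests = new AtomicLong();
        Map<String, Supplier<Number>> metrics = new ConcurrentHashMap<>();
        metrics.put("requests", requests::get);

        // The sink here is stdout; it could equally be a socket or a message queue.
        ReporterSketch reporter = new ReporterSketch(metrics, System.out::println);
        reporter.start(100, TimeUnit.MILLISECONDS);
        for (int i = 0; i < 5; i++) {
            requests.incrementAndGet();
            Thread.sleep(70);
        }
        reporter.stop();
    }
}
```

Because the sink is just a callback, the same loop can feed the console, a file, or a transport of your choice without touching the code that records the metrics.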

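Of the backends that the reporters above target, Ganglia has been open source for many years but lacks some monitoring features, and its graphs are rudimentary. Graphite appeared to have stopped development at the time of writing.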

The monitoring system used at our company is Zabbix, and dropwizard/metrics has no ready-made Zabbix reporter.