编译器、回调函数、性能测试

去年设计了一个SDK，负责处理封装一些事件，对外提供一个类接口，服务初始化的时候，调用方实现对应的类，并将对象指针传给模块。接触过C11，好奇心害死猫，就想着这些接口都用lambda函数对象回调来实现会是什么结果，和纯虚函数的接口定义方法比较，更加灵活。疑问就出现了，两种不同的语法，从性能角度来说，哪个更快一些？不懂编译原理，弄段代码试试看。

前言

在线网址，能选择不同编译器，编译参数，在linux平台运行代码，亦或者查看对应的汇编代码。

https://wandbox.org/：有时候做些技术验证，网页执行小片段的代码很省事
https://godbolt.org/：用不同的颜色，区分不同的汇编对应的代码，比本地的调试器看起来更加省事。

正文

标准委员会制定了语法的规则，在编译层面，如何实现，取决于各家的编译器，这里不得不说一声，微软的编译器，挺厉害的。语法糖不是万能的，回调接口不多，使用lambda更加便捷，也无需定义空回调函数接口；回调接口种类繁多的时候，传统的虚函数更有利于业务接口定义的统一。

windows平台，两者性能接近，没有太多的差异
linux平台，虚函数和lambda比较，单次多了1.35ns

常规的业务系统开发中，此级别的性能损耗可以忽略，引入lambda，在设计的上，能带来更多的便捷。在设计多信号处理时，尤为明显，底层有事件触发，如果需要落地日志，出入日志对象的的处理函数。当需要更多的业务处理接口时，底层用vector保存lambda对象，事件触发时，依次遍历调用，类似于QT中的信号和槽，日志、监控、业务1、业务2，互相之间完全解耦。

代码

Counter: 1000000
Time: 3966us
Counter: 1000000
Time: 5316us

#include <iostream>
#include <chrono>
#include <memory>
#include <functional>
#include <atomic>
#include <string>

std::atomic_int64_t counter = 0;

// 定义回调接口
class UserInterface
{
public:
    virtual void name() = 0;
    virtual void full_name() = 0;
};

class User : public UserInterface
{
public:
    void name() {}
    void full_name() { counter++; }
};

void to_string(UserInterface* user)
{
    user->name();
    user->full_name();
}

using name_handler = std::function<void()>;
using full_name_handler = std::function<void()>;

class Test
{
    name_handler name_;
    full_name_handler full_name_;

public:
    void set_name_handler(name_handler name)
    {
        name_ = name;
    }

    void set_full_name_handler(full_name_handler full_name)
    {
        full_name_ = full_name;
    }

    void to_string()
    {
        name_();
        full_name_();
    }
};

int main()
{
    User user;

    auto start = std::chrono::high_resolution_clock::now();

    for (int i = 0; i < 1000000; i++)
    {
        to_string(&user);
    }

    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "Counter: " << counter << std::endl;
    std::cout << "Time: " << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() << "us" << std::endl;

    counter = 0;
    auto name = []() {};
    auto full_name = []() { counter++; };

    Test test;
    test.set_name_handler(name);
    test.set_full_name_handler(full_name);

    start = std::chrono::high_resolution_clock::now();

    for (int i = 0; i < 1000000; i++)
    {
        test.to_string();
    }

    end = std::chrono::high_resolution_clock::now();
    std::cout << "Counter: " << counter << std::endl;
    std::cout << "Time: " << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() << "us" << std::endl;

    return 0;
}

后记

查找资料的时候，翻到类似的代码片段 functionperformance.cpp

#include <iostream>
#include <chrono>
#include <memory>
#include <functional>

using namespace std;
using namespace std::chrono;

class Base
{
public:
	Base(){}
	virtual ~Base(){}
	virtual int func(int i) = 0;
};

class Derived : public Base
{
public:
	Derived(int base = 10) : base{base}
	{

	}
	~Derived(){}

	virtual int func(int i)
	{
		return i*base;
	}
private:
	int base;
};

struct Func
{
	int base;
	int operator()(int i)
	{
		return i*base;
	}
	Func(int base) : base {base}
	{

	}
};
const int base = 10;
int calculate(int i)
{
	return base*i;
}

int main()
{
	const int num = 10000;
	Base *p = new Derived{10};
	int total = 0;
	auto start = high_resolution_clock::now();
	for (int i = 0; i < num; ++i)
	{
		total += p->func(i);
	}
	auto end = high_resolution_clock::now();
	std::cout<<"result: "<<total<<"\nvirtual call elapsed: \t"<<duration_cast<nanoseconds>(end-start).count()<<" nanoseconds.\n"<<std::endl;

	total = 0;
	start = high_resolution_clock::now();
	for (int i = 0; i < num; ++i)
	{
		total += calculate(i);
	}
	end = high_resolution_clock::now();
	std::cout<<"result: "<<total<<"\ndirect function call elapsed: \t"<<duration_cast<nanoseconds>(end-start).count()<<" nanoseconds.\n"<<std::endl;

	Func functor{10};
	total = 0;
	start = high_resolution_clock::now();
	for (int i = 0; i < num; ++i)
	{
		total += functor(i);
	}
	end = high_resolution_clock::now();
	std::cout<<"result: "<<total<<"\nfunctor call elapsed: \t"<<duration_cast<nanoseconds>(end-start).count()<<" nanoseconds.\n"<<std::endl;
	int base = 10;
	function<int(int)> lambda = [base](int i)
	{
		return i*base;
	};
	total = 0;
	start = high_resolution_clock::now();
	for (int i = 0; i < num; ++i)
	{
		total += lambda(i);
	}
	end = high_resolution_clock::now();
	std::cout<<"result: "<<total<<"\nlambda call elapsed: \t"<<duration_cast<nanoseconds>(end-start).count()<<" nanoseconds.\n"<<std::endl;
	return 0;
}

/*
test on mac mini i7 2.7GHz
clang++ -std=c++11 chronotest.cpp -O0
output:
result: 499950000
virtual call elapsed: 	43171 nanoseconds.

result: 499950000
direct function call elapsed: 	31379 nanoseconds.

result: 499950000
functor call elapsed: 	41497 nanoseconds.

result: 499950000
lambda call elapsed: 	207416 nanoseconds.
===================================================
clang++ -std=c++11 chronotest.cpp -O1
output:
result: 499950000
virtual call elapsed: 	26144 nanoseconds.

result: 499950000
direct function call elapsed: 	22384 nanoseconds.

result: 499950000
functor call elapsed: 	33477 nanoseconds.

result: 499950000
lambda call elapsed: 	55799 nanoseconds.
===================================================
clang++ -std=c++11 chronotest.cpp -O2
result: 499950000
virtual call elapsed: 	22284 nanoseconds.

result: 499950000
direct function call elapsed: 	36 nanoseconds.

result: 499950000
functor call elapsed: 	30 nanoseconds.

result: 499950000
lambda call elapsed: 	28292 nanoseconds.

===================================================
clang++ -std=c++11 chronotest.cpp -O3
result: 499950000
virtual call elapsed: 	18975 nanoseconds.

result: 499950000
direct function call elapsed: 	29 nanoseconds.

result: 499950000
functor call elapsed: 	30 nanoseconds.

result: 499950000
lambda call elapsed: 	22542 nanoseconds.
===================================================
clang++ -std=c++11 chronotest.cpp -O4

result: 499950000
virtual call elapsed: 	22141 nanoseconds.

result: 499950000
direct function call elapsed: 	30 nanoseconds.

result: 499950000
functor call elapsed: 	30 nanoseconds.

result: 499950000
lambda call elapsed: 	22584 nanoseconds.
*/

这里多了两种模式，普通函数和仿函数，提供接口回调的方式和直接调用比较，性能损耗是数量级的差异，仿函数性能和函数接近，有时候仿函数的性能更优，编译原理这块算是知识盲区，猜测是由于访问的变量地址和函数挨着，有利于CPU处理

附上 wandbox 运行结果

result: 499950000
virtual call elapsed: 6143 nanoseconds.

result: 499950000
direct function call elapsed: 30 nanoseconds.

result: 499950000
functor call elapsed: 31 nanoseconds.

result: 499950000
lambda call elapsed: 15134 nanoseconds.

文章目录

前言

正文

代码

后记