A History of I/O Multiplexing on Linux

The select call

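select is one of the earliest I/O multiplexing mechanisms. It lets a program monitor multiple file descriptors to see whether any of them is readable, writable, or has an exceptional condition pending. Its main characteristics are:

Simple to use and well suited to small sets of file descriptors
Widely portable: almost every Unix and Unix-like system supports select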

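However, select also has clear drawbacks:

The file descriptor sets must be rebuilt and passed in again on every call, because select overwrites them with its results, which is inefficient
The number of file descriptors is limited (FD_SETSIZE, typically 1024), which makes it unsuitable for high-concurrency workloads
Every call scans all descriptors up to the highest one, so performance degrades quickly as the set grows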

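Suppose we have a function called is_ready that checks, without blocking, whether a descriptor is ready for I/O. The principle behind select can then be expressed with the following pseudocode: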

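#include <map>

// Hypothetical helper: returns true if fd is ready for I/O, without blocking.
bool is_ready(int fd);

struct fd_info {
  int fd;
  bool ready;
};

// Walks every descriptor number below max_fd until at least one monitored
// descriptor is ready, then returns how many are ready.
int select(std::map<int, fd_info>& fds, int max_fd) {
  int ready_cnt = 0;
  while (ready_cnt == 0) {
    for (int i = 0; i < max_fd; i++) {
      auto it = fds.find(i);
      if (it == fds.end()) continue;  // fd i is not being monitored
      it->second.ready = is_ready(i);
      if (it->second.ready) ready_cnt++;
    }
  }
  return ready_cnt;
}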
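
For comparison with the model above, here is a minimal sketch of how the real select system call might be used from C++ on Linux. The helper names wait_readable_select and select_demo are made up for illustration; only select, the FD_* macros, pipe, write, and close come from the system. It is a sketch under those assumptions, not a complete event loop.

```cpp
#include <sys/select.h>
#include <unistd.h>
#include <cassert>
#include <vector>

// Hypothetical helper: waits up to timeout_ms for any descriptor in fds to
// become readable and returns the readable ones. Returns an empty vector on
// timeout or error.
std::vector<int> wait_readable_select(const std::vector<int>& fds,
                                      int timeout_ms) {
  fd_set read_set;
  FD_ZERO(&read_set);  // the set has to be rebuilt before every call
  int max_fd = -1;
  for (int fd : fds) {
    assert(fd >= 0 && fd < FD_SETSIZE);  // fd_set cannot hold larger fds
    FD_SET(fd, &read_set);
    if (fd > max_fd) max_fd = fd;
  }

  timeval tv;
  tv.tv_sec = timeout_ms / 1000;
  tv.tv_usec = (timeout_ms % 1000) * 1000;

  // The first argument is the highest fd plus one, not the number of fds.
  int n = select(max_fd + 1, &read_set, nullptr, nullptr, &tv);

  std::vector<int> ready;
  if (n <= 0) return ready;
  // select overwrote read_set, so scan it to find out which fds are ready.
  for (int fd : fds) {
    if (FD_ISSET(fd, &read_set)) ready.push_back(fd);
  }
  return ready;
}

// Usage sketch: a pipe becomes readable only after something is written.
bool select_demo() {
  int p[2];
  int rc = pipe(p);
  assert(rc == 0);
  bool idle_before = wait_readable_select({p[0]}, 0).empty();
  ssize_t written = write(p[1], "x", 1);
  assert(written == 1);
  std::vector<int> ready = wait_readable_select({p[0]}, 1000);
  close(p[0]);
  close(p[1]);
  return idle_before && ready.size() == 1 && ready[0] == p[0];
}
```

Note how the fd_set is filled from scratch inside the helper and scanned again after the call; both steps repeat on every wait, which is the overhead listed among the drawbacks.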

The poll call

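To overcome some of the shortcomings of select, Linux also provides the poll call. poll does the same job as select, with the following improvements:

Descriptors are passed as an array of pollfd structures with separate requested (events) and returned (revents) fields, so the array can be reused across calls instead of being rebuilt as with select (it is still copied into the kernel on every call)
There is no hard limit on the number of file descriptors, so larger sets can be handled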

Using the same hypothetical is_ready function (a non-blocking readiness check), the principle behind poll can be expressed with the following pseudocode:

// Hypothetical helper: returns true if fd is ready for I/O, without blocking.
bool is_ready(int fd);

struct fd_info {
  int fd;
  bool ready;
};

// Scans only the nfds entries supplied by the caller until at least one of
// them is ready, then returns how many are ready.
int poll(struct fd_info* fds, int nfds) {
  int ready_cnt = 0;
  while (ready_cnt == 0) {
    for (int i = 0; i < nfds; i++) {
      fds[i].ready = is_ready(fds[i].fd);
      if (fds[i].ready) ready_cnt++;
    }
  }
  return ready_cnt;
}

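Although poll fixes some of select's problems, it still has a performance bottleneck, especially when monitoring a large number of file descriptors: every call still scans the whole set linearly, even when only a few descriptors are ready.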
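
As a concrete counterpart to the poll model, the following is a minimal sketch of the real poll system call used from C++ on Linux. The names wait_readable_poll and poll_demo are hypothetical helpers chosen for this example; poll, struct pollfd, and POLLIN are the actual system interface.

```cpp
#include <poll.h>
#include <unistd.h>
#include <cassert>
#include <vector>

// Hypothetical helper: waits up to timeout_ms for any descriptor in fds to
// become readable and returns the readable ones. Returns an empty vector on
// timeout or error.
std::vector<int> wait_readable_poll(const std::vector<int>& fds,
                                    int timeout_ms) {
  std::vector<pollfd> pfds;
  for (int fd : fds) {
    pollfd p;
    p.fd = fd;
    p.events = POLLIN;  // what we ask for
    p.revents = 0;      // what the kernel reports back
    pfds.push_back(p);
  }

  // The second argument is the number of entries, not the highest fd.
  int n = poll(pfds.data(), pfds.size(), timeout_ms);

  std::vector<int> ready;
  if (n <= 0) return ready;
  // The caller still has to walk the whole array to find the ready entries.
  for (const pollfd& p : pfds) {
    if (p.revents & POLLIN) ready.push_back(p.fd);
  }
  return ready;
}

// Usage sketch: of two pipes, only the one that was written to is reported.
bool poll_demo() {
  int a[2];
  int b[2];
  int rc = pipe(a);
  assert(rc == 0);
  rc = pipe(b);
  assert(rc == 0);
  ssize_t written = write(b[1], "x", 1);
  assert(written == 1);
  std::vector<int> ready = wait_readable_poll({a[0], b[0]}, 1000);
  close(a[0]);
  close(a[1]);
  close(b[0]);
  close(b[1]);
  return ready.size() == 1 && ready[0] == b[0];
}
```

The final loop over pfds is the linear scan discussed above: its cost depends on how many descriptors are watched, not on how many are ready.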

The epoll call

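To further improve the performance of I/O multiplexing, Linux 2.5.44 introduced epoll. epoll has a significant performance advantage over select and poll, particularly with large numbers of file descriptors. Its main characteristics are:

An event-driven design: descriptors are registered once, and the kernel reports only those whose state has changed, which avoids unnecessary scanning
Support for both edge-triggered and level-triggered modes, which adds flexibility
Three interfaces, epoll_create, epoll_ctl, and epoll_wait, for creating an instance, managing the registered descriptors, and waiting for events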
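
The difference between the two trigger modes is easiest to see in a small experiment. The sketch below uses the real epoll_create1, epoll_ctl, and epoll_wait calls; the function name count_notifications is a hypothetical helper for this example. It writes one byte into a pipe, never reads it, and counts how many of two back-to-back non-blocking waits report the descriptor.

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <cassert>
#include <cstdint>

// Hypothetical helper. extra_flags is 0 for level-triggered mode or EPOLLET
// for edge-triggered mode. Returns how many of two consecutive epoll_wait
// calls report the pipe as readable while the data is left unread.
int count_notifications(uint32_t extra_flags) {
  int p[2];
  int rc = pipe(p);
  assert(rc == 0);
  int epfd = epoll_create1(0);
  assert(epfd >= 0);

  epoll_event ev{};
  ev.events = EPOLLIN | extra_flags;
  ev.data.fd = p[0];
  rc = epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev);
  assert(rc == 0);

  ssize_t written = write(p[1], "x", 1);
  assert(written == 1);

  int notifications = 0;
  for (int i = 0; i < 2; i++) {
    epoll_event out{};
    // Timeout 0: return immediately with whatever is ready right now.
    if (epoll_wait(epfd, &out, 1, 0) == 1) notifications++;
  }

  close(epfd);
  close(p[0]);
  close(p[1]);
  return notifications;
}
```

Calling count_notifications(0) yields 2, because a level-triggered descriptor is reported for as long as data remains unread. Calling count_notifications(EPOLLET) yields 1, because an edge-triggered descriptor is reported only when new data arrives. This is why edge-triggered code normally reads until the descriptor would block before waiting again.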

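Suppose we have a function add_monitor that runs an event loop watching for changes on all_fds. Pseudocode for the epoll-related functions then looks like this: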

#include <map>
#include <vector>

// Hypothetical helper: starts monitoring the fds in `all_fds` and keeps
// appending ready ones to `ready_fds`.
void add_monitor(const std::vector<int>& all_fds, std::vector<int>& ready_fds);

struct fd_info {
  int fd;
  bool ready;
};

struct epoll_info {
  std::vector<int> all_fds;
  std::vector<int> ready_fds;
};

std::map<int, epoll_info> epoll_info_by_epoll_id;

// Create an epoll instance and return its id.
int epoll_create() {
  int epoll_id = epoll_info_by_epoll_id.size();
  epoll_info_by_epoll_id[epoll_id] = epoll_info{};
  return epoll_id;
}

// Add a fd to monitor to the epoll instance.
void epoll_add(int epoll_id, int fd) {
  epoll_info_by_epoll_id[epoll_id].all_fds.push_back(fd);
}

// Wait until at least one fd is ready. Return the number of ready fds.
// After the function returns, the first `ready_cnt` entries of `out` contain
// ready fds. The rest can be ignored.
int epoll_wait(int epoll_id, struct fd_info* out) {
  int ready_cnt = 0;

  // Take a reference so that fds added by the monitor are visible here.
  epoll_info& info = epoll_info_by_epoll_id[epoll_id];
  add_monitor(info.all_fds, info.ready_fds);
  while (ready_cnt == 0) {
    ready_cnt = info.ready_fds.size();
    for (int i = 0; i < ready_cnt; i++) {
      out[i].fd = info.ready_fds[i];
      out[i].ready = true;
    }
  }
  return ready_cnt;
}
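
The model above maps fairly directly onto the real interface. The following is a minimal sketch using the actual epoll_create1, epoll_ctl, and epoll_wait calls on Linux; the helper names watch_readable, wait_ready, and epoll_demo, and the batch size of 16, are assumptions made for this example.

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <cassert>
#include <vector>

// Hypothetical helper: registers fd for readability events on epfd.
// Registration happens once, not on every wait.
bool watch_readable(int epfd, int fd) {
  epoll_event ev{};
  ev.events = EPOLLIN;  // level-triggered unless EPOLLET is added
  ev.data.fd = fd;      // returned to us unchanged with each event
  return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0;
}

// Hypothetical helper: waits up to timeout_ms and returns the ready fds.
// The kernel fills in only ready descriptors, so nothing else is scanned.
std::vector<int> wait_ready(int epfd, int timeout_ms) {
  epoll_event events[16];
  int n = epoll_wait(epfd, events, 16, timeout_ms);
  std::vector<int> ready;
  for (int i = 0; i < n; i++) ready.push_back(events[i].data.fd);
  return ready;
}

// Usage sketch: of two registered pipes, only the written one is returned.
bool epoll_demo() {
  int a[2];
  int b[2];
  int rc = pipe(a);
  assert(rc == 0);
  rc = pipe(b);
  assert(rc == 0);
  int epfd = epoll_create1(0);
  assert(epfd >= 0);

  bool ok = watch_readable(epfd, a[0]) && watch_readable(epfd, b[0]);
  ssize_t written = write(b[1], "x", 1);
  assert(written == 1);
  std::vector<int> ready = wait_ready(epfd, 1000);
  ok = ok && ready.size() == 1 && ready[0] == b[0];

  close(epfd);
  close(a[0]);
  close(a[1]);
  close(b[0]);
  close(b[1]);
  return ok;
}
```

Unlike the select and poll versions, the loop in wait_ready runs once per ready descriptor, so its cost does not depend on how many descriptors are registered.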

The arrival of io_uring

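The introduction of io_uring (in Linux 5.1) took I/O on Linux to a new level. Compared with the earlier mechanisms, io_uring offers higher performance and lower latency and is particularly well suited to I/O-intensive applications. Its main advantages are:

Truly asynchronous I/O built on ring buffers (a submission queue and a completion queue) shared between the application and the kernel, so frequent system calls are not needed
Support for many kinds of I/O operations, rather than only readiness notification on file descriptors
A more efficient event notification mechanism that reduces the interaction overhead between the application and the kernel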
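
In the same spirit as the models used for select, poll, and epoll, the ring-buffer idea can be sketched as a toy. Everything here (ring, toy_sqe, toy_cqe, toy_uring, toy_kernel_run, toy_demo) is a hypothetical name invented for this example and is not part of the real io_uring API; the "kernel" is an ordinary function, and the "I/O" simply reports the requested length as the result. The sketch only illustrates how requests and completions travel through two queues.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model only: none of these names belong to the real io_uring API.
struct toy_sqe { uint64_t user_data; int len; };     // a submitted request
struct toy_cqe { uint64_t user_data; int result; };  // its completion

// Fixed-size ring buffer; head and tail only ever grow and are reduced
// modulo N on access. N must be a power of two.
template <typename T, std::size_t N>
struct ring {
  static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");
  std::array<T, N> slots;
  std::size_t head = 0;  // next slot to consume
  std::size_t tail = 0;  // next slot to fill
  bool full() const { return tail - head == N; }
  bool empty() const { return tail == head; }
  bool push(const T& v) {
    if (full()) return false;
    slots[tail++ % N] = v;
    return true;
  }
  bool pop(T& out) {
    if (empty()) return false;
    out = slots[head++ % N];
    return true;
  }
};

struct toy_uring {
  ring<toy_sqe, 8> sq;  // application produces, "kernel" consumes
  ring<toy_cqe, 8> cq;  // "kernel" produces, application consumes
};

// Stand-in for the kernel side: drains the submission queue and posts one
// completion per request. Returns the number of requests processed.
int toy_kernel_run(toy_uring& r) {
  int done = 0;
  toy_sqe req;
  while (!r.cq.full() && r.sq.pop(req)) {
    r.cq.push({req.user_data, req.len});
    done++;
  }
  return done;
}

// Usage sketch: queue three requests, hand them over in one batch, then
// collect the completions and add up their results.
int toy_demo() {
  toy_uring r;
  for (uint64_t id = 1; id <= 3; id++) {
    bool queued = r.sq.push({id, 100});
    assert(queued);
  }
  toy_kernel_run(r);  // one hand-off for the whole batch
  int total = 0;
  toy_cqe c;
  while (r.cq.pop(c)) total += c.result;
  return total;
}
```

Here toy_demo returns 300: three requests of length 100 each are submitted together and reaped together. The point of the design is that the per-request cost is a write into shared memory, while the hand-off to the kernel is paid once per batch.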