It might not sound like it, but Oracle is still in the high-end server business, at least when it comes to the big machines running its eponymous relational database. In fact, the company has launched a new generation of Exadata database machines, and the architecture of these machines shows what is – and what isn’t – important for making a clustered database work better. At least one based on Oracle’s software stack.
The Exadata V1 database appliances officially debuted in September 2008, but Oracle had been shipping them to select customers for a year before that, just as the Great Recession was getting underway and large companies were spending a fortune on big NUMA servers and storage area networks (SANs) with many Fibre Channel switches linking compute to storage. They were looking for a way to spend less money, and Oracle worked with Hewlett Packard’s ProLiant server division to create a cluster using commodity X86 servers, flash-accelerated storage servers, and an InfiniBand interconnect using low latency remote direct memory access (RDMA) to tightly couple the nodes doing database and storage processing. Client networking was provided by Ethernet network interfaces. In a sense, Oracle was using InfiniBand as a backplane, and that’s why it took a stake in Mellanox Technologies at the time.
After this experience, and after learning that IBM was considering a $6.85 billion acquisition of Unix systems powerhouse Sun Microsystems, Oracle co-founder and CEO Larry Ellison got hardware religion and acted on it. In April 2009, Oracle made a $7.4 billion offer to acquire Sun, and in January 2010 the deal closed. By September 2009, Ellison was so sure the deal would be approved by regulators that the HP iron was quickly dropped from the Exadata line and replaced with Sun X86 machines running Oracle’s variant of Linux – not Sun Sparc iron running Solaris Unix. These were the second generation Exadata V2 machines, which were followed by the Exadata X2 and so on. By the time The Next Platform finished its first year in 2016, Oracle was already up to the seventh generation of Exadata, the X6, with compute, storage, and networking all cranked up.
As you can see in the chart above, disk and flash storage capacity, processor core counts, database node memory capacity, and Ethernet bandwidth in the Exadata clusters all grew steadily during the product’s first decade. The Exadata X7-2 and X7-8 systems were unveiled in October 2017, and by then Oracle had thousands of customers across all kinds of industries who had retired their big NUMA machines running the Oracle database (the dominant driver of Unix machine sales three decades ago, two decades ago, one decade ago, and today) and replaced them with Exadata iron.
In any Exadata generation, models with the “2” designation have relatively lean main memory and no local flash on the database servers, while models with the “8” designation have eight times the main memory (terabytes instead of hundreds of gigabytes) per node and eight Xeon processor sockets instead of two. And starting with the Exadata X8-2 and X8-8 generation in June 2019, Oracle moved from InfiniBand to 100 Gb/sec Ethernet with RoCE extensions for RDMA to link the cluster nodes together, plus four 10 Gb/sec or two 25 Gb/sec Ethernet ports per database node to talk to the outside world.
With the X8 generation, the Exadata storage servers started coming in two flavors: a High Capacity (HC) variant that mixed flash cards and disk drives, and an Extreme Flash (EF) variant that had twice as many PCI-Express flash cards but no disk drives at all (which offered maximum throughput but much lower capacity). Oracle also started using machine learning to automatically tune the clustered database – precisely the sort of thing AI is good at and people are not so good at.
That little bit of history brings us to the tenth generation of Oracle’s Exadata: the X9M-2 and X9M-8 systems announced last week, which offer unprecedented scale for running clustered relational databases.
The X9M-2 database server has a pair of 32-core “Ice Lake” Xeon SP processors running at 2.6 GHz – 64 cores in total – and comes with a base 512 GB of main memory, expandable in 512 GB increments up to 2 TB. The X9M-2 database server has a pair of 3.84 TB NVM-Express flash drives, and another pair can be added. Again, the two-socket database node can have four 10 Gb/sec or two 25 Gb/sec plain vanilla Ethernet ports to link out to applications and users, and it has a pair of 100 Gb/sec RoCE ports to link into the database and storage server fabric.
The X9M-8 database node is for heavier database workloads that need more cores and more main memory to chew through more transactions or chew through transactions faster. It has a pair of four-socket motherboards interconnected with Intel’s UltraPath Interconnect NUMA fabric to create an eight-socket shared memory system. (This is all based on Intel chipsets and has nothing to do with Sun technology.) The X9M-8 database server has eight 24-core Xeon SP 8268 processors running at 2.9 GHz, which works out to 192 cores and roughly 3.4X the throughput of the 64-core X9M-2 database node. Main memory on the big Exadata X9M-8 database node starts at 3 TB and scales up to 6 TB. This database server has a pair of 6.4 TB NVM-Express flash cards that plug into PCI-Express 4.0 slots so they have plenty of bandwidth, and it has the same networking options as the Exadata X9M-2 database server.
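For what it is worth, that 3.4X figure falls out of a simple cores-times-clock estimate. This is our own back-of-the-envelope arithmetic, not Oracle’s benchmarking, and it ignores NUMA and memory bandwidth effects:

```python
# Crude relative-throughput estimate: core count scaled by clock, nothing more.
x9m2_cores, x9m2_ghz = 64, 2.6    # two 32-core Ice Lake chips at 2.6 GHz
x9m8_cores, x9m8_ghz = 192, 2.9   # eight 24-core Xeon SP 8268 chips at 2.9 GHz

ratio = (x9m8_cores * x9m8_ghz) / (x9m2_cores * x9m2_ghz)
print(f"~{ratio:.2f}X")  # ~3.35X, in line with the quoted ~3.4X
```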
The HC disk/flash hybrid and EF all-flash storage servers are both based on a two-socket server node using a pair of Ice Lake Xeon SP 8352Y processors, each with 16 cores running at 2.2 GHz. The HC node has 256 GB of DDR4 DRAM augmented with 1.5 TB of Optane 200 series persistent memory, which is configured to act as a read and write cache for main memory. The HC chassis can hold a dozen 18 TB 7,200 RPM disk drives and four of the 6.4 TB NVM-Express flash drives. The EF chassis has the same DDR4 and PMEM memory configuration, but no disk drives and eight of the 6.4 TB NVM-Express flash cards. Both kinds of storage server have a pair of 100 Gb/sec RoCE ports linking them into the fabric between themselves and the database servers.
The first thing to note is that while 200 Gb/sec and even 400 Gb/sec Ethernet (with RoCE support) is commercially available and certainly affordable (well, compared to the price of Oracle software, for sure), Oracle is sticking with 100 Gb/sec switching for the Exadata backplane. We would not be surprised if the company used splitter cables to take one tier out of a fabric built from 200 Gb/sec switches, and if we were building a full-scale Exadata cluster ourselves, we would consider using a higher radix switch fabric and buying far fewer switches to cross-couple the database and storage servers. A jump to 400 Gb/sec switching would provide even higher radix, with fewer hops between devices and fewer devices in the fabric.
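To put some rough numbers on the radix argument, here is a toy port count. The per-server port counts come from the configurations above, but the single-tier, no-redundancy, no-oversubscription fabric is our simplifying assumption, not Oracle’s published topology:

```python
# Toy fabric math for a full X9M-2 rack: eight database servers and
# 14 storage servers, each with a pair of 100 Gb/sec RoCE ports.
db_servers, storage_servers, ports_per_server = 8, 14, 2
rack_ports = (db_servers + storage_servers) * ports_per_server
print(rack_ports)  # 44 fabric ports per rack

# For the maximum 12-rack configuration, count idealized single-tier
# switches at different effective 100G radices (splitter cables on a
# higher-speed switch multiply its effective 100G port count).
total_ports = rack_ports * 12
for radix in (32, 64, 128):
    switches = -(-total_ports // radix)  # ceiling division
    print(radix, switches)
```

The higher the effective radix, the fewer boxes and tiers you need for the same 528 endpoint ports, which is the gist of the argument above.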
Let’s talk about scale for a second. Oracle RAC is based on technology that Compaq licensed to Oracle but that was developed for Digital VAX hardware and its VMS operating system. This VAXcluster and TruCluster clustering software was very good at HPC, database, and application clustering, and Digital’s Rdb database had good, working database clustering long before Oracle did – it is debatable whether you could call Oracle Parallel Server, which predated RAC, a good implementation of a clustered database. It worked in some circumstances, but it was a pain in the neck to deal with.
The Exadata machine provides both vertical scale – coping with ever-larger databases – and horizontal scale – coping with ever more users or transactions. The eight-socket server provides the vertical scale, and RAC provides the horizontal scale. As far as we know, RAC ran out of gas at eight nodes when trying to implement a shared everything database, but modern versions of RAC, including RAC 19c launched in January 2020, use a shared nothing approach between the database nodes and use shared storage to parallelize processing across datasets. (There is a good whitepaper on RAC that you can read here.) The point is that Oracle has worked very hard to combine function shipping (sending SQL statements out to the remote storage servers) to goose analytics, and a mix of data shipping and distributed data caching to goose transaction processing and batch jobs (which run the business) – all in the same database management system.
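As a toy sketch of what function shipping buys you (this is illustrative Python, not Oracle code, and the cell names and rows are invented), the idea is that each storage server evaluates the predicate locally, so only matching rows ever cross the fabric:

```python
# Rows as they might sit on two storage cells (invented example data).
rows_on_cell = {
    "cell1": [{"id": 1, "amount": 50}, {"id": 2, "amount": 900}],
    "cell2": [{"id": 3, "amount": 1200}, {"id": 4, "amount": 10}],
}

def function_shipped_scan(predicate):
    # Each cell filters its own rows; only the survivors are shipped
    # back over the fabric, instead of every block of every table.
    hits = []
    for cell, rows in rows_on_cell.items():
        hits.extend(row for row in rows if predicate(row))
    return hits

print(function_shipped_scan(lambda row: row["amount"] > 100))
# [{'id': 2, 'amount': 900}, {'id': 3, 'amount': 1200}]
```

The alternative – data shipping – would drag all four rows to the database node and filter there, which is why Oracle mixes the two techniques depending on the workload.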
An Exadata rack tops out at 14 storage servers, which yield 3 PB of disk capacity plus 358 TB of flash and 21 TB of Optane PMEM with HC storage, or 717 TB of flash and 21 TB of Optane with EF storage. The rack can have two of the eight-socket database servers (384 cores) or eight of the two-socket database servers (512 cores) for database compute. If you pull out some of the storage, you can of course add more compute to any given Exadata rack. Up to a dozen racks in total can be linked over the RoCE Ethernet fabric with the existing switches supplied by Oracle, and with additional layers of switching, even larger configurations can be built.
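Those per-rack totals are consistent with the component counts given above (14 storage servers; twelve 18 TB drives and four 6.4 TB flash cards per HC server; eight flash cards per EF server; 1.5 TB of Optane apiece), as a quick bit of arithmetic shows:

```python
servers = 14                     # storage servers in a full rack
hc_disk  = servers * 12 * 18     # TB of disk in the HC variant
hc_flash = servers * 4 * 6.4     # TB of flash in the HC variant
ef_flash = servers * 8 * 6.4     # TB of flash in the EF variant
pmem     = servers * 1.5         # TB of Optane persistent memory

print(hc_disk, round(hc_flash), round(ef_flash), pmem)
# 3024 358 717 21.0
```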
When it comes to performance, a single rack with Exadata X9M-8 database nodes and HC disk/flash hybrid storage can do 15 million random 8K flash read IOPS and 6.75 million random flash write IOPS. Moving to EF storage, which is aimed at analytics work, a single rack can scan at 75 GB/sec per storage server, for a total of 1 TB/sec across a single rack that had three of the eight-socket database servers and eleven of the EF storage servers.
Finally, Oracle is still the only high-end server maker that publishes a price list for its systems, and it has done so for every generation of Exadata machines. A half rack of the Exadata X9M-2 (with four of the two-socket database servers and seven storage servers) using the hybrid disk/flash HC storage costs $935,000, and the half rack with all-flash EF storage costs the same. Call it $1.87 million per rack.