ElephantDM

Introduction

Scaia ElephantDM is a distributed analytical database designed to replace the hybrid Hadoop + MPP architecture. It supports standard SQL syntax and provides advanced capabilities such as multi-model analytics, real-time data processing, compute-storage separation, mixed workloads, data federation, and hybrid deployment on heterogeneous servers. ElephantDM is built on a unified multi-model architecture that supports 10 mainstream data models, including relational, wide-table, time-series, geospatial, search, and graph. With a single ElephantDM database, businesses can handle OLAP, Analytical & Event Transaction Processing (AETP), multi-model integrated analytics, federated computing, data warehousing, real-time data warehouses, and unified lakehouse architectures. It has replaced products such as Oracle, Db2, and Teradata in various industries.

Architecture

Scaia ElephantDM is a modern distributed analytical database built from several key modules, each designed to improve performance, scalability, or reliability when handling complex queries and large-scale data processing. Its core components are described below:

SQL Compiler

The Quark Server is a proprietary distributed SQL query engine that serves as the SQL compilation module. It supports multiple SQL dialects, including HiveQL, Oracle, Db2, and Teradata, among others. The module also includes an advanced operator system and a robust type system, ensuring compatibility and extensibility across diverse workloads.

Engine Coordinator

The Engine Coordinator is a lightweight distributed computing orchestration module responsible for dynamically scheduling workloads across multiple Quark Servers. It ensures load balancing, fault detection, and automatic failover, guaranteeing high availability by rerouting tasks away from failed servers.
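
As a rough illustration of the behavior described above, the following Python sketch places each task on the least-loaded healthy server and re-dispatches tasks when a server is marked as failed. The names `Coordinator`, `dispatch`, and `mark_failed`, the server names, and the least-loaded policy are all assumptions made for this example; they are not ElephantDM APIs, and the actual scheduling policy is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    healthy: bool = True
    running: list = field(default_factory=list)  # tasks currently assigned


class Coordinator:
    """Toy scheduler: least-loaded placement plus failover (illustrative only)."""

    def __init__(self, names):
        self.servers = {n: Server(n) for n in names}

    def dispatch(self, task):
        # Consider only servers that are currently marked healthy.
        candidates = [s for s in self.servers.values() if s.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers available")
        # Load balancing: pick the server with the fewest running tasks.
        target = min(candidates, key=lambda s: len(s.running))
        target.running.append(task)
        return target.name

    def mark_failed(self, name):
        # Fault detection would normally come from heartbeats; here it is manual.
        failed = self.servers[name]
        failed.healthy = False
        orphaned, failed.running = failed.running, []
        # Failover: re-dispatch every task that was on the failed server.
        return {task: self.dispatch(task) for task in orphaned}


# Hypothetical server names, used only for this example.
coord = Coordinator(["quark-1", "quark-2", "quark-3"])
placement = {t: coord.dispatch(t) for t in ["q1", "q2", "q3", "q4"]}
moved = coord.mark_failed(placement["q1"])
```

After the simulated failure, `moved` maps each orphaned task to the healthy server that picked it up, and the failed server no longer receives new work.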

Multi-Model Database Optimizer

The Multi-Model Database Optimizer refines SQL execution plans using rule-based optimization, cost-based optimization, and materialized views. By automatically selecting the most efficient execution model, it optimally partitions and distributes computational tasks, improving query performance.
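
To make the combination of rule-based and cost-based optimization more concrete, here is a minimal Python sketch for a two-table hash join. It applies one fixed rule (filter before joining) and then compares the estimated cost of each build side. The cost formula, the `TableStats` fields, and the table statistics are invented for illustration and do not reflect ElephantDM's actual cost model; materialized views are not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    name: str
    rows: int           # estimated row count
    selectivity: float  # fraction of rows that survive the table's filter


def hash_join_cost(build: TableStats, probe: TableStats, filter_first: bool) -> float:
    """Toy cost model: rows scanned + rows hashed + rows probed."""
    scan = build.rows + probe.rows
    b = build.rows * build.selectivity if filter_first else build.rows
    p = probe.rows * probe.selectivity if filter_first else probe.rows
    return scan + 2.0 * b + p  # assumption: hashing a row costs twice a probe


def choose_plan(left: TableStats, right: TableStats):
    # Rule-based step: always apply filters before the join.
    # Cost-based step: try both build sides and keep the cheaper one.
    candidates = [
        (hash_join_cost(build, probe, filter_first=True), build.name, probe.name)
        for build, probe in [(left, right), (right, left)]
    ]
    return min(candidates)


# Hypothetical statistics for two tables.
orders = TableStats("orders", rows=1_000_000, selectivity=0.1)
customers = TableStats("customers", rows=10_000, selectivity=0.5)
cost, build_side, probe_side = choose_plan(orders, customers)
```

With these made-up statistics the sketch builds the hash table on the smaller filtered input (`customers`) and probes with `orders`.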

Transaction Manager

The Transaction Manager is ElephantDM's core component for implementing distributed transactions. It ensures data consistency and integrity across multiple nodes in a distributed environment.

Vectorized Execution Engine

The Vectorized Execution Engine enables high-speed query processing by reading storage files in batches and responding quickly to both simple and complex queries. It improves performance by taking advantage of modern CPU optimizations.
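
The batch-at-a-time idea can be sketched in plain Python as follows. The example computes SUM(price * qty) WHERE qty >= min_qty over column slices rather than individual rows. The function names, the tiny batch size, and the data are assumptions for illustration only; a real vectorized engine operates on compact typed arrays in native code, which pure Python cannot reproduce.

```python
from typing import Iterator

BATCH_SIZE = 4  # deliberately tiny; production engines use much larger batches


def scan_batches(price: list, qty: list, batch_size: int = BATCH_SIZE) -> Iterator[tuple]:
    """Yield column slices instead of individual rows."""
    for start in range(0, len(price), batch_size):
        yield price[start:start + batch_size], qty[start:start + batch_size]


def filter_and_sum(batches, min_qty: int) -> float:
    """Compute SUM(price * qty) WHERE qty >= min_qty, one batch at a time."""
    total = 0.0
    for price_col, qty_col in batches:
        # Step 1: evaluate the predicate for the whole batch into a selection vector.
        selected = [i for i, q in enumerate(qty_col) if q >= min_qty]
        # Step 2: run the arithmetic only over the selected positions.
        total += sum(price_col[i] * qty_col[i] for i in selected)
    return total


price = [10.0, 20.0, 5.0, 8.0, 12.0, 3.0]
qty = [1, 3, 2, 5, 0, 4]
result = filter_and_sum(scan_batches(price, qty), min_qty=2)
```

Splitting the work into a predicate pass and an arithmetic pass per batch keeps each inner loop simple, which is the property that lets native engines exploit CPU caches and SIMD instructions.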

Distributed Computing Engine

The Distributed Computing Engine is designed for high performance, scalability, and stability. It efficiently processes large-scale analytical workloads, ensuring optimal resource utilization across a distributed environment.

Distributed Metadata Management System

The Metadata Catalog is a scalable metadata management system that supports metadata storage and retrieval for millions of tables. It plays a crucial role in organizing and tracking schema information across the distributed database.

Distributed Data Management System

The Distributed Data Management System ensures strong or eventual consistency across multiple data replicas. It dynamically manages data partitioning, storage scaling, and redistribution, optimizing storage resource utilization. Additionally, it guarantees high availability, ensuring uninterrupted data storage services even in the event of hardware failures.

This architecture enables the database to handle large-scale analytical workloads efficiently, providing businesses with a robust, fault-tolerant, and high-performance distributed data processing platform.

Advantages

Comprehensive SQL Compatibility

Fully supports standard SQL syntax and is compatible with Oracle, IBM Db2, and Teradata dialects. It also supports Oracle and Db2 stored procedures, which simplifies data migration.

High-Performance Compute Engine

Leverages self-developed computation optimization techniques to deliver exceptional performance and efficiency.

Advanced Hardware Adaptation

Features a proprietary storage engine designed to unlock the full potential of modern storage hardware, providing cost-effective and high-performance solutions.

Versatile Multi-Model Support

Offers unified data management with support for four key data models: relational, search, text, and object.

Seamless Federated Computing

Eliminates multi-source data syntax inconsistencies and ETL complexities by enabling cross-database and cross-platform query, computation, and analysis through a unified query language.

Application Scenarios

Online Analytical Processing (OLAP)

Replaces Teradata and Cloudera to build a next-generation multi-model, highly scalable, and high-performance data warehouse. Complex legacy SQL does not need to be modified by hand, because rich SQL dialect support keeps migration cost close to zero. Supports on-premises deployment.

Unified Data Lakehouse

Integrates data lake and data warehouse technologies within a single platform, eliminating the need for data movement. Raw, processed, and modeled data are all stored in one unified lakehouse architecture. This allows high-concurrency, accurate, and high-performance queries over both historical and real-time data, while also supporting analytical workloads such as reporting, batch processing, and data mining.

Real-time Data Warehouse

Enables real-time data ingestion with high throughput and low latency, allowing immediate data analysis upon ingestion. Supports SQL-based Complex Event Processing (CEP) and ensures end-to-end exactly-once semantics.
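
One common way to obtain an exactly-once effect at the sink is to commit each record together with its source offset and to ignore replays of offsets that were already applied. The Python sketch below illustrates that general technique only; the class `ExactlyOnceSink` and its fields are hypothetical, and this text does not state how ElephantDM implements its guarantee.

```python
class ExactlyOnceSink:
    """Toy sink that records data and the source offset together."""

    def __init__(self):
        self.rows = []
        self.committed_offset = -1  # highest source offset already applied

    def write(self, offset: int, row: dict) -> bool:
        # A replayed message (offset already applied) is dropped, so retries
        # after a failure do not create duplicates.
        if offset <= self.committed_offset:
            return False
        # In a real system these two updates would be one atomic transaction.
        self.rows.append(row)
        self.committed_offset = offset
        return True


sink = ExactlyOnceSink()
# Offset 1 is delivered twice, as could happen after a producer retry.
stream = [(0, {"id": "a"}), (1, {"id": "b"}), (1, {"id": "b"}), (2, {"id": "c"})]
applied = [sink.write(offset, row) for offset, row in stream]
```

The duplicate delivery is rejected, so the sink ends up with each record exactly once even though the source delivered one of them twice.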

Analytical & Event Transaction Processing (AETP)

Supports hybrid OLAP + OLTP workloads, with row and column storage kept transparent to upper-layer applications. Compute engines can analyze transaction data in real time directly from row storage, eliminating the need for data synchronization.

Federated Computing

Facilitates cross-platform data association and analysis through standard SQL. Supports computation pushdown to the source database, using the source's compute power to reduce data transfer and improve overall performance. Eliminates the need for data migration, aggregation, or ETL by executing a single query directly across multiple databases.
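
The effect of computation pushdown can be sketched with two in-memory SQLite databases standing in for separate source systems. The filter and the aggregation each run inside their own source, so only a list of ids and a single number are transferred. The table names, data, and the `eu_revenue` helper are assumptions for illustration; this is not how ElephantDM is invoked, where a single SQL statement would express the same query.

```python
import sqlite3

# Two independent in-memory SQLite databases stand in for separate source systems.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "EU"), (2, "US"), (3, "EU")])

sales = sqlite3.connect(":memory:")
sales.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
sales.executemany("INSERT INTO orders VALUES (?, ?)",
                  [(1, 100.0), (2, 50.0), (3, 25.0), (1, 10.0)])


def eu_revenue(crm_db, sales_db) -> float:
    # Pushdown 1: the region filter runs inside the CRM source,
    # so only matching customer ids leave that database.
    ids = [row[0] for row in crm_db.execute(
        "SELECT id FROM customers WHERE region = ?", ("EU",))]
    if not ids:
        return 0.0
    # Pushdown 2: the aggregation runs inside the sales source,
    # so a single number is returned instead of every order row.
    placeholders = ",".join("?" for _ in ids)
    (total,) = sales_db.execute(
        f"SELECT COALESCE(SUM(amount), 0) FROM orders "
        f"WHERE customer_id IN ({placeholders})", ids).fetchone()
    return float(total)


total = eu_revenue(crm, sales)
```

Without pushdown, the coordinating engine would have to pull every customer and order row and then filter, join, and aggregate locally, which is the data transfer the approach is meant to avoid.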