Funding: Supported by the fund for building world-class universities (disciplines) of Renmin University of China.
Abstract: Cloud-native data warehouses have revolutionized data analysis by enabling elasticity, high availability, and lower costs. The growing popularity of artificial intelligence (AI) is also driving data warehouses to provide predictive analytics alongside existing descriptive analytics. Consequently, more vendors have started to support training and inference of AI models inside data warehouses, exploiting the benefits of near-data processing for fast model development and deployment. However, most existing solutions are limited by complex syntax or slow data transportation across engines. In this paper, we present GaussDB-AISQL, a composable SQL system with AI capabilities. GaussDB-AISQL adopts a composable system design that decouples computing, storage, caching, the DB engine, and the AI engine. Our system offers all the functionality needed for end-to-end model training and inference throughout the model lifecycle. It is also simple and efficient to use: it provides a SQL-like syntax and removes the burden of manual model management. When training an AI model, GaussDB-AISQL benefits from highly parallel data transportation by pulling data concurrently from distributed shared memory. The feature selection algorithms in GaussDB-AISQL make training more data-efficient. When running model inference, GaussDB-AISQL registers the trained model object in the local data warehouse as a user-defined function, which avoids moving inference data out of the data warehouse to an external AI engine. Experiments show that GaussDB-AISQL is up to 19× faster than baseline approaches.
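The idea of registering a trained model as a user-defined function can be illustrated with a minimal sketch. This is not GaussDB-AISQL code, and the abstract does not show its syntax or registration API. The sketch assumes SQLite (through Python's standard `sqlite3` module) as a stand-in database, and the linear model, its weights, the `samples` table, and the `predict` function name are all hypothetical. It only shows the general pattern: once the model is callable from SQL, rows are scored inside the database instead of being exported to an external engine.

```python
import sqlite3

# Hypothetical "trained model": a linear scorer with fixed weights.
# These values are illustrative assumptions, not taken from the paper.
WEIGHTS = (0.5, -0.25)
BIAS = 1.0


def predict(x1, x2):
    """Score one row with the stand-in linear model."""
    return WEIGHTS[0] * x1 + WEIGHTS[1] * x2 + BIAS


def run_in_database_inference():
    """Register the model as a SQL function and score rows with a query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples (id INTEGER PRIMARY KEY, x1 REAL, x2 REAL)"
    )
    conn.executemany(
        "INSERT INTO samples (id, x1, x2) VALUES (?, ?, ?)",
        [(1, 2.0, 4.0), (2, 0.0, 0.0), (3, -2.0, 8.0)],
    )
    # Expose the Python callable to SQL as a two-argument function, so
    # inference runs where the data lives rather than in a separate engine.
    conn.create_function("predict", 2, predict)
    rows = conn.execute(
        "SELECT id, predict(x1, x2) FROM samples ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row_id, score in run_in_database_inference():
        print(row_id, score)
```

In this sketch the query itself performs inference, so no feature rows leave the database process. A real data warehouse would additionally have to manage model storage, versioning, and distributed execution, none of which is modeled here.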