Using Steam Agent for API Interaction: Democratizing Data Commoditization for Mass Consumption
With the advancement of large language models (LLMs) in natural language processing, there is growing interest in leveraging their capabilities to simplify software interactions. In this paper, we propose a novel system that integrates LLMs to classify natural language inputs into corresponding API calls and to automate the generation of sample datasets for invoking specific API functionalities. Through natural language commands, our system enables users to access complex API functionality with simple inputs, thereby enhancing interaction efficiency and lowering the barriers to API usage. This fundamentally shifts API consumption towards non-technical audiences and facilitates the rapid commercialization of consumable access.
In recent years, large language models (LLMs) have made significant progress in the field of natural language processing (NLP) [1, 2, 3], demonstrating outstanding performance in tasks ranging from text generation to solving complex problems across various industries such as finance, healthcare, art, and customer service [4, 5, 6]. These advancements have also sparked increasing exploration into the potential of LLMs to simplify and optimize software interactions. Machine learning and advanced deep learning techniques have been extensively studied to enhance the integration of software systems and optimize various applications [7, 8, 9]. As a further development, LLMs are now being researched as powerful tools to make software systems more intuitive and user-friendly for users of varying technical levels [10, 11].
Traditionally, users interact with software through application programming interfaces (APIs), which are crucial for communication between different software applications [12]. However, interacting with APIs often requires a deep understanding of their structure, parameters, and specific calls, posing a barrier for non-technical users or those unfamiliar with the underlying logic of APIs [13]. Integrating LLMs into API management workflows presents an opportunity to interact with APIs through simple, natural language inputs, opening up new possibilities for users with varying technical levels and needs [14, 15].
However, deploying LLMs for API management involves several challenges, primarily related to ensuring that the models can accurately interpret and classify natural language inputs into the correct API calls. Given the diverse structures of APIs and the variability of user inputs based on context, developing a reliable system to evaluate the performance of LLMs across different use cases is crucial. To address these challenges, we propose a novel system that integrates LLMs with two core functionalities. First, it utilizes LLMs to interpret and classify natural language inputs, accurately mapping them to the corresponding API calls. The second component involves using LLMs to automatically generate sample datasets for specific API functionalities, which is essential for systematically evaluating the performance of LLMs in various API classification tasks. Unlike traditional methods, our framework offers a scalable and replicable solution, ensuring that API workflows can be comprehensively tested with high accuracy and relevance to real-world applications.
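The second component can be illustrated with a short sketch that composes a generation request for one API functionality. The function name, prompt wording, and the `coin_price_lookup` functionality below are hypothetical, chosen only to show how such a dataset-generation prompt might be assembled, not the system's actual implementation:

```python
# Illustrative sketch of the dataset-generation component: composing a prompt
# that asks an LLM to produce sample invocation queries for one API
# functionality. All names and the prompt wording here are assumptions.

def build_generation_prompt(api_name, description, n_samples):
    """Compose a prompt instructing an LLM to generate sample user queries."""
    return (
        "You are generating evaluation data for an API classifier.\n"
        f"API functionality: {api_name}\n"
        f"Description: {description}\n"
        f"Write {n_samples} diverse natural-language user queries that should "
        "be routed to this functionality. Return one query per line."
    )

prompt = build_generation_prompt(
    "coin_price_lookup",
    "Returns the current price of a cryptocurrency in a given fiat currency.",
    n_samples=5,
)
```

Sending this prompt to each candidate LLM yields a labeled query set against which classification accuracy can then be measured.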
We conducted extensive experiments on various API functionalities using invocation commands and evaluated the classification capabilities of several well-known LLMs, including GPT-4, GPT-4o-mini, Claude 3.5 Sonnet, GPT-4o (Aug '24), DeepSeek-V2-Chat, DeepSeek-V2.5, LLaMA-3-8B, and Gemini-1.5. The results revealed significant performance differences among the models; the three best-performing models, in descending order, were Claude 3.5 Sonnet, GPT-4o (Aug '24), and GPT-4o-mini. These findings demonstrate the potential of LLMs in API classification, emphasize the importance of carefully selecting models for different environments, and highlight the effectiveness of our system as an efficient and practical modular API management tool.
Over the past decade, with the rapid development of machine learning, its applications have expanded across multiple fields, fundamentally transforming industries such as technology [16, 17], healthcare [18, 19, 20, 21, 22, 23], finance [24], and road construction [25]. These advancements not only address complex technical challenges but also provide simplified solutions that enhance user experience [26, 27, 28]. One of the most transformative breakthroughs has been the rise of natural language models. By leveraging deep learning techniques, natural language interfaces enable users to interact with complex systems through simple and intuitive commands [29, 30, 31, 32, 33], significantly lowering the barriers to traditional technical operations and making them accessible to non-expert users.
A notable application is in the field of data querying, where models can parse natural language queries and convert them into structured commands like SQL, allowing non-technical users to retrieve information from databases without needing to understand the underlying syntax [34]. In addition to data querying, LLMs have been integrated into DevOps automation, enabling users to initiate and manage complex workflows using simple commands. This integration simplifies tasks such as infrastructure configuration, deployment, and system monitoring, making traditionally complex areas of DevOps more accessible through a friendlier interaction model [35].
Similarly, the potential of LLMs is being explored for API interactions. Although existing models like Codex can generate code snippets, the field of API call retrieval remains underdeveloped [36]. The complexity of APIs, which often involves multiple protocols, data formats, and domain-specific parameters, presents unique challenges.
We had implemented the construction, orchestration, and deployment of a multi-agent system before OpenAI released its experimental Swarm framework [37], and we have since taken it to production level. The system is defined as follows:
The API retrieval framework is an automated pipeline capable of efficiently handling user queries, ensuring that each query is correctly classified, passed to the corresponding API function, and the results are efficiently returned to the user. Its structured workflow can be divided into the following key stages:
The system first receives the user's natural language query. During the prompting process, the user's input is combined with predefined prompt instructions and then sent to the LLM. These instructions define the hierarchy of the API and establish specific output format rules, ensuring that the system's responses align with the structure and functional requirements of the API. The complexity of the input can range from simple questions to more complex commands. The flexibility of LLMs allows the system to interpret and handle a wide range of user inputs, even when the input phrasing is ambiguous.
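As a concrete sketch of this prompting step, the snippet below wraps the user's query with an API hierarchy and output-format rules before it is sent to the LLM. The hierarchy and the JSON output contract shown here are assumptions for illustration, not the system's actual prompt:

```python
# Sketch of prompt assembly for the first stage: the user's input is combined
# with predefined instructions that encode the API hierarchy and output rules.
# The hierarchy and the JSON response contract below are assumed.

import json

API_HIERARCHY = {
    "market_data": ["coin_price_lookup", "market_chart"],
    "metadata": ["coin_info"],
}

def build_classification_prompt(user_query):
    """Wrap the user's query with the API hierarchy and output-format rules."""
    return (
        "Classify the user query into one functionality from this API hierarchy:\n"
        + json.dumps(API_HIERARCHY, indent=2)
        + '\nRespond with JSON only: {"label": "<module>.<functionality>", '
        '"keywords": {"<parameter>": "<value>"}}\n'
        + f"User query: {user_query}"
    )

p = build_classification_prompt("What is the price of bitcoin in USD?")
```

Because the hierarchy is injected at prompt time, new API categories can be added by extending the dictionary rather than retraining anything.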
Once the query is received, the integrated LLM maps it to the corresponding API functionality. Specifically, the LLM processes the query and returns a label that classifies it according to the predefined API hierarchy. At the same time, relevant keywords required for the input parameters of the API functionality are retrieved. This label determines the API module and specific functionality needed to fulfill the user's request. To ensure efficient processing among available resources, a load balancer is employed to distribute incoming queries.
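The classifier's reply might then be validated as follows. The JSON shape (`label` plus `keywords`) and the label set are assumed for illustration; the point is that an unknown label is rejected before any API call is made:

```python
# Sketch of validating the classifier's output. The reply format ("label" plus
# extracted "keywords") is an assumed contract, not a documented one.

import json

# Labels the classifier is allowed to return (assumed hierarchy).
VALID_LABELS = {
    "market_data.coin_price_lookup",
    "market_data.market_chart",
    "metadata.coin_info",
}

def parse_classification(raw):
    """Parse and validate the LLM's JSON reply before any API call is made."""
    obj = json.loads(raw)
    label = obj["label"]
    keywords = obj.get("keywords", {})
    if label not in VALID_LABELS:
        raise ValueError(f"unknown label: {label}")
    return label, keywords

label, kw = parse_classification(
    '{"label": "market_data.coin_price_lookup",'
    ' "keywords": {"coin": "bitcoin", "currency": "usd"}}'
)
```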
After the label is returned, the API identifier queries the server for the route that serves the corresponding API functionality. This step dynamically maps the keywords extracted from the query to the input parameters required by the API functionality. For example, in the case of the CoinGecko API, the mapped parameters are filled into the endpoint and the call is executed to retrieve the corresponding JSON response data.
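The routing step can be sketched as below, using CoinGecko's public `/simple/price` endpoint as the example from the text. The `ROUTES` table and the keyword-to-parameter mapping are illustrative; in the described system, the routing is obtained dynamically from the server rather than hard-coded:

```python
# Sketch of routing: map the classification label to an endpoint and translate
# extracted keywords into the API's query parameters. The ROUTES table is an
# illustrative stand-in for the server-provided routing.

from urllib.parse import urlencode

ROUTES = {
    "market_data.coin_price_lookup": (
        "https://api.coingecko.com/api/v3/simple/price",
        {"coin": "ids", "currency": "vs_currencies"},  # keyword -> API parameter
    ),
}

def build_api_request(label, keywords):
    """Resolve the route for a label and fill in its query parameters."""
    base_url, param_map = ROUTES[label]
    params = {param_map[k]: v for k, v in keywords.items() if k in param_map}
    return f"{base_url}?{urlencode(params)}"

url = build_api_request(
    "market_data.coin_price_lookup", {"coin": "bitcoin", "currency": "usd"}
)
```

Executing an HTTP GET on the resulting URL returns the JSON feedback data described above.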
Once the API call has been processed, the results are returned in a user-readable format. This step includes error handling. If any issues arise during the API execution, such as invalid parameters or API call failures, the system provides relevant feedback to the user. Additionally, a search history feature allows users to view past queries, adding a layer of functionality for repeated interactions.
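A minimal sketch of this final stage, assuming a CoinGecko-style price response; the wording of the rendered messages is illustrative:

```python
# Sketch of the final stage: turn raw JSON feedback into readable text, or
# report a failure when the API call did not succeed. The response shape
# mirrors a CoinGecko /simple/price reply; message wording is illustrative.

def render_response(response=None, error=None):
    """Format API results for the user, with basic error handling."""
    if error is not None:
        return f"Sorry, the request could not be completed: {error}"
    lines = []
    for coin, prices in response.items():
        for currency, value in prices.items():
            lines.append(f"The current price of {coin} is {value} {currency.upper()}.")
    return "\n".join(lines)

ok = render_response({"bitcoin": {"usd": 63000}})
failed = render_response(error="invalid parameter 'ids'")
```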
This end-to-end framework automates the entire API interaction process, minimizing manual intervention and ensuring that user queries are processed efficiently and accurately. By integrating the API hierarchy during the prompting process, the system ensures ease of scalability, allowing for the addition of new API categories and functionalities as needed.
Large language models (LLMs) can read the JSON object parameters returned by APIs and present the content back to users in a natural-language-readable format. This shift brings about a series of significant transformations, moving API access from a technician-dominated capability to a commercially consumable service available to ordinary users, primarily reflected in the following aspects:
Before the involvement of large language models (LLMs) in API data processing, the use of APIs was primarily limited to technicians and developers. Ordinary users often lacked the necessary technical knowledge to interact directly with APIs. This limitation led to several drawbacks:
Knowledge Barrier: Ordinary users needed programming skills and knowledge of API usage to effectively utilize the data and services provided by APIs. This exclusivity limited widespread adoption in the market.
High Costs: Businesses had to hire technical personnel to develop and maintain integrations with APIs, increasing operational costs. For data providers, an unstable customer base restricted their motivation for innovation.
Slow Response Times: Technicians often spent a significant amount of time processing API requests and parsing data. This resulted in delays in information retrieval, affecting the timeliness of decision-making.
Due to the concentration of API usage primarily in the hands of technicians, the market's potential has not been fully realized:
Narrow User Base: Only users with a technical background can effectively utilize APIs, leading to limitations in market demand. Users in many industries and fields are unable to enjoy the conveniences offered by APIs.
Limited Innovation: The dominance of technicians restricts user feedback and needs, resulting in a lack of diversity and innovation in the development and improvement of APIs. There may be many unmet needs in the market.
The introduction of LLMs has made API data processing more intuitive and user-friendly, allowing ordinary users to easily access and utilize the data provided by APIs. This transformation brings the following benefits:
Lowered Knowledge Barrier: Users can interact with the system using natural language without needing programming skills. This enables more non-technical users to participate in data consumption, expanding the user base.
Enhanced User Experience: Through natural language processing, users can more conveniently obtain the information they need, reducing learning costs and usage barriers. This friendly user experience is likely to attract more users to utilize related services.
The involvement of LLMs not only enhances the user experience but also drives market expansion and innovation:
Diversified User Base: As the usability of APIs improves, users from more industries can leverage API data for decision-making and analysis. This will promote market diversification and meet the needs of different users.
Encouragement of Innovation: User feedback and needs will be easier to collect and analyze, driving continuous improvement and innovation of APIs. Businesses can develop new features based on actual user demands, enhancing their competitive edge in the market.
The introduction of LLMs has made data processing more efficient, enabling users to access information in real-time and make quick decisions:
Accelerated Response Times: Users can obtain the data they need instantly, reducing waiting times. This is particularly important in the cryptocurrency industry, where quick decision-making is essential.
Enhanced Work Efficiency: Through automated data processing, users can dedicate more time to analysis and decision-making rather than spending it on data retrieval and parsing. This will significantly improve overall work efficiency.
Data providers can collaborate with AI Steam Labs to build an innovative business model that integrates data into our search system, creating an independent agent that can be listed in the Agent Store. This will provide users with convenient data consumption services, delivering data in a natural language-readable format. This section will explore the implementation of this business path and its potential monetization models.
The collaboration between AI Steam Labs and data providers will be based on the following core elements:
Data Integration: The APIs and data sources of data providers will be integrated into the AI Steam Labs search system, transforming them into a user-readable consumption format, allowing users to access and consume data through the platform.
Independent Agents: Data providers will operate as independent agents in the Agent Store, enabling users to interact directly with these agents to obtain the data services they need.
Consumption Points System: Users will pay with consumption points when consuming data on the platform. Point consumption will be linked to the amount of data used (e.g., priced per 1M tokens consumed), creating a transparent consumption mechanism.
The consumption points system will be the core of this business model, with specific implementation as follows:
Points Acquisition: Users can acquire consumption points through small payments made with stablecoins, as well as through registration rewards, referring friends, participating in activities, and more. This will incentivize users to actively engage with the platform.
Points Consumption: When users utilize data services, points are deducted in proportion to token usage, priced per 1M tokens. For example, if 1M tokens costs 10,000 points, a user's balance is deducted according to their actual usage.
Points Management: The platform will provide a user interface for managing points, allowing users to view their points balance, consumption history, and acquisition methods at any time, thereby enhancing the user experience.
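Under the stated rate of 10,000 points per 1M tokens, the pro-rata deduction works out as in this sketch; rounding up is an assumption here, not a documented platform rule:

```python
# Worked example of pro-rata points deduction at the stated rate of
# 10,000 points per 1M tokens. Rounding up is an assumption.

import math

POINTS_PER_MILLION_TOKENS = 10_000  # rate stated in the text

def points_for_usage(tokens_used):
    """Deduct points proportionally to tokens consumed, rounding up."""
    return math.ceil(tokens_used * POINTS_PER_MILLION_TOKENS / 1_000_000)

points_for_usage(1_000_000)  # full 1M tokens -> 10,000 points
points_for_usage(250_000)    # quarter of the usage -> 2,500 points
```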
To achieve revenue, AI Steam Labs and data providers will adopt a revenue-sharing model, detailed as follows:
Sharing Ratio: The revenue-sharing ratio between AI Steam Labs and data providers will be 3:7, meaning AI Steam Labs will receive 30% of the revenue, while data providers will receive 70%. This ratio reflects the importance of data providers in supplying and maintaining the data.
Revenue Sources: Revenue will primarily come from the points fees generated by users consuming data on the platform. As the user base expands and data consumption increases, both parties will experience substantial revenue growth.
Transparent Settlement Mechanism: The platform will establish a transparent settlement mechanism, regularly settling revenue with data providers to ensure that the interests of both parties are protected.
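The 3:7 split described above can be expressed as a simple settlement sketch; settling directly in points, before any conversion to currency, is an assumption for illustration:

```python
# Sketch of the 3:7 revenue split: 30% of points revenue to AI Steam Labs,
# 70% to the data provider. Settling in points is an assumed simplification.

def split_revenue(points_revenue):
    """Split points revenue 30/70 between the platform and the data provider."""
    labs_share = points_revenue * 30 // 100
    provider_share = points_revenue - labs_share  # provider receives the remainder
    return {"ai_steam_labs": labs_share, "data_provider": provider_share}

split_revenue(100_000)  # -> 30,000 points to the platform, 70,000 to the provider
```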
This business path not only creates new revenue opportunities for AI Steam Labs and data providers but also provides significant value to users:
Convenient Data Access: Users can easily access a variety of data services on a single platform, reducing the hassle of switching between different platforms.
Flexible Consumption Methods: By using consumption points, users can flexibly consume data according to their needs, enhancing their sense of participation and satisfaction.
Market Expansion: As data consumption becomes more widespread, AI Steam Labs can attract more users and data providers, further expanding its market share.
As technology continues to advance and market demands evolve, the collaboration model between AI Steam Labs and data providers will continually evolve. Possible future expansion directions include:
Diversification of Data Services: Introducing more types of data providers to enrich the variety of data services on the platform, catering to the needs of different users.
Intelligent Recommendation System: Utilizing AI technology to analyze user consumption behavior and provide personalized data recommendations, enhancing the user experience.
Cross-Platform Collaboration: Partnering with other platforms and service providers to expand the scenarios and applications for data consumption, creating a broader ecosystem.
AI Steam Labs can establish an innovative business path that facilitates convenient data consumption and market expansion. The consumption points mechanism and revenue-sharing model will create sustainable revenue opportunities for both parties while providing users with greater value and experience. This business model not only aligns with current market trends but also lays a solid foundation for future development.
[1] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, pages 6000–6010, Red Hook, NY, USA, 2017. Curran Associates Inc.
[2] Jacob Devlin. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[3] Renrui Zhang, Ziyao Zeng, Ziyu Guo, and Yafeng Li. Can language understand depth? In Proceedings of the 30th ACM International Conference on Multimedia, pages 6868–6874, 2022.
[4] Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, and Gideon Mann. BloombergGPT: A large language model for finance. arXiv preprint arXiv:2303.17564, 2023.
[5] Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, et al. ComposerX: Multi-agent symbolic music composition with LLMs. arXiv preprint arXiv:2404.18081, 2024.
[6] Yixiao Yuan, Yangchen Huang, Yu Ma, Xinjin Li, Zhenglin Li, Yiming Shi, and Huapeng Zhou. Rhyme-aware Chinese lyric generator based on GPT. arXiv preprint arXiv:2408.10130, 2024.
[7] Yijie Weng and Jianhao Wu. Big data and machine learning in defence. International Journal of Computer Science and Information Technology, 16(2), 2024.
[8] Yiyi Tao, Yiling Jia, Nan Wang, and Hongning Wang. The FacT: Taming latent factor models for explainability with factorization trees. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 295–304, 2019.
[9] Yukun Song. Deep learning applications in the medical image recognition. American Journal of Computer Science and Technology, 2(2):22–26, July 2019.
[10] Yuelyu Ji, Zhuochun Li, Rui Meng, Sonish Sivarajkumar, Yanshan Wang, Zeshui Yu, Hui Ji, Yushui Han, Hanyu Zeng, and Daqing He. RAG-RLRC-LaySum at BioLaySumm: Integrating retrieval-augmented generation and readability control for layman summarization of biomedical texts. arXiv preprint arXiv:2405.13179, 2024.
[11] Tommaso Calò and Luigi De Russis. Leveraging large language models for end-user website generation. In International Symposium on End User Development, pages 52–61. Springer, 2023.
[12] Roy Thomas Fielding. Architectural styles and the design of network-based software architectures. PhD thesis, University of California, Irvine, 2000.
[13] Cesare Pautasso, Olaf Zimmermann, and Frank Leymann. RESTful web services vs. "big" web services: Making the right architectural decision. In Proceedings of the 17th International Conference on World Wide Web, pages 805–814, 2008.
[14] Yaobo Liang, Chenfei Wu, Ting Song, Wenshan Wu, Yan Xia, Yu Liu, Yang Ou, Shuai Lu, Lei Ji, Shaoguang Mao, et al. TaskMatrix.AI: Completing tasks by connecting foundation models with millions of APIs. Intelligent Computing, 3:0063, 2024.
[15] Yifan Song, Weimin Xiong, Dawei Zhu, Wenhao Wu, Han Qian, Mingbo Song, Hailiang Huang, Cheng Li, Ke Wang, Rong Yao, et al. RestGPT: Connecting large language models with real-world RESTful APIs. arXiv preprint arXiv:2306.06624, 2023.
[16] Yixin Jin, Wenjing Zhou, Meiqi Wang, Meng Li, Xintao Li, Tianyu Hu, and Xingyuan Bu. Online learning of multiple tasks and their relationships: Testing on spam email data and EEG signals recorded in construction fields. arXiv preprint arXiv:2406.18311, 2024.
[17] Yiyi Tao. Meta learning enabled adversarial defense. In 2023 IEEE International Conference on Sensors, Electronics and Computer Engineering (ICSECE), pages 1326–1330. IEEE, 2023.
[18] Yiru Gong, Qimin Zhang, Huili Zheng, Zheyan Liu, and Shaohan Chen. Graphical structural learning of rs-fMRI data in heavy smokers. arXiv preprint arXiv:2409.08395, 2024.
[19] Wanyu Bian, Albert Jang, Liping Zhang, Xiaonan Yang, Zachary Stewart, and Fang Liu. Diffusion modeling with domain-conditioned prior guidance for accelerated MRI and qMRI reconstruction. IEEE Transactions on Medical Imaging, 2024.
[20] Yumeng Yang, Ashley Gilliam, Ethan B. Ludmir, and Kirk Roberts. Exploring the generalization of cancer clinical trial eligibility classifiers across diseases. arXiv preprint arXiv:2403.17135, 2024.
[21] Huili Zheng, Qimin Zhang, Yiru Gong, Zheyan Liu, and Shaohan Chen. Identification of prognostic biomarkers for stage III non-small cell lung carcinoma in female nonsmokers using machine learning. arXiv preprint arXiv:2408.16068, 2024.
[22] Wanyu Bian, Yunmei Chen, and Xiaojing Ye. An optimal control framework for joint-channel parallel MRI reconstruction without coil sensitivities. Magnetic Resonance Imaging, 89:1–11, 2022.
[23] Xintao Li and Sibei Liu. Predicting 30-day hospital readmission in Medicare patients: Insights from an LSTM deep learning model. medRxiv, 2024. doi:10.1101/2024.09.08.24313212.
[24] Siqiao Zhao, Zhikang Dong, Zeyu Cao, and Raphael Douady. Hedge fund portfolio construction using PolyModel theory and iTransformer. arXiv preprint arXiv:2408.03320, 2024.
[25] Han-Cheng Dan, Peng Yan, Jiawei Tan, Yinchao Zhou, and Bingjie Lu. Multiple distresses detection for asphalt pavement using improved You Only Look Once algorithm based on convolutional neural network. International Journal of Pavement Engineering, 25(1):2308169, 2024.
[26] Yunyi Zhu, Cedric Honnet, Yixiao Kang, Junyi Zhu, Angelina J. Zheng, Kyle Heinz, Grace Tang, Luca Musk, Michael Wessely, and Stefanie Mueller. Demonstration of ChromoCloth: Re-programmable multi-color textures through flexible and portable light source. In Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, pages 1–3, 2023.
[27] Yukun Song, Parth Arora, Rajandeep Singh, Srikanth T. Varadharajan, Malcolm Haynes, and Thad Starner. Going blank comfortably: Positioning monocular head-worn displays when they are inactive. In Proceedings of the 2023 International Symposium on Wearable Computers, pages 114–118, Cancun, Quintana Roo, Mexico, October 2023. ACM.
[28] Yixiao Kang, Zhenglin Zhang, Meiqi Zhao, Xuanhui Yang, and Xubo Yang. Tie memories to e-souvenirs: Hybrid tangible AR souvenirs in the museum. In Adjunct Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, pages 1–3, 2022.
[29] Xinhao Zhang, Zaitian Wang, Lu Jiang, Wanfu Gao, Pengfei Wang, and Kunpeng Liu. TFWT: Tabular feature weighting with transformer. arXiv preprint arXiv:2405.08403, 2024.
[30] Yuelyu Ji, Yuhe Gao, Runxue Bao, Qi Li, Disheng Liu, Yiming Sun, and Ye Ye. Prediction of COVID-19 patients' emergency room revisit using multi-source transfer learning. In 2023 IEEE 11th International Conference on Healthcare Informatics (ICHI), pages 138–144. IEEE, 2023.
[31] Ziyao Zeng, Daniel Wang, Fengyu Yang, Hyoungseob Park, Stefano Soatto, Dong Lao, and Alex Wong. WorDepth: Variational language prior for monocular depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9708–9719, 2024.
[32] Xiaojing Fan and Chunliang Tao. Towards resilient and efficient LLMs: A comparative study of efficiency, performance, and adversarial robustness. arXiv preprint arXiv:2408.04585, 2024.
[33] Jinghan Zhang, Xiting Wang, Yiqiao Jin, Changyu Chen, Xinhao Zhang, and Kunpeng Liu. Prototypical reward network for data-efficient RLHF. arXiv preprint arXiv:2406.06606, 2024.
[34] Muhammad Shahzaib Baig, Azhar Imran, Aman Ullah Yasin, Abdul Haleem Butt, and Muhammad Imran Khan. Natural language to SQL queries: A review. International Journal of Innovations in Science Technology, 4:147–162, 2022.
[35] Deep Mehta, Kartik Rawool, Subodh Gujar, and Bowen Xu. Automated DevOps pipeline generation for code repositories using large language models. arXiv preprint arXiv:2312.13225, 2023.
[36] Gabriel Poesia, Oleksandr Polozov, Vu Le, Ashish Tiwari, Gustavo Soares, Christopher Meek, and Sumit Gulwani. Synchromesh: Reliable code generation from pre-trained language models. arXiv preprint arXiv:2201.11227, 2022.