Publications

2026


  • MVR-cache: Optimizing Semantic Caching via Multi-Vector Retrieval and Learned Prompt Segmentation
    Ali Noshad, Zishan Zheng, Yinjun Wu
    ICML 2026
  • REGATE: Confidence-Calibrated Integration of Temporally-Aligned Exogenous Texts for Dynamic Graphs
    Liangzu Liu, Mengzhe Ruan, Yinjun Wu, Yang Liu, Guanjun Wang
    ACL 2026 (main conference)
  • Can Large Language Models be a Cardinality Estimator? An Empirical Study
    Liangzu Liu, Yiyan Wang, Yinjun Wu, Runze Su, Zhuo Chang, Peizhi Wu, Jianjun Chen, Fuxin Jiang, Rui Shi, Bin Cui, Tieying Zhang
    VLDB Journal 2026
  • Hierarchical Scheduling for Multi-Vector Image Retrieval
    Maoliang Li, Ke Li, Yaoyang Liu, Jiayu Chen, Zihao Zheng, Yinjun Wu, Xiang Chen
    DAC 2026
  • DistVec: Efficient Distributed Machine Learning in Parallel Database Systems
    Xinyi Zhang, Liangzu Liu, Xupeng Miao, Yinjun Wu, Zhen Chen, Wei Lu, Xiaoyong Du, Bin Cui
    ICDE 2026

2025


  • OpDiag: Unveiling Database Performance Anomalies through Query Operator Attribution
    Shiyue Huang, Ziwei Wang, Yinjun Wu, Yaofeng Tu, Jiankai Wang, Bin Cui
    IEEE Transactions on Knowledge and Data Engineering (TKDE) 2025
  • POQD: Performance-Oriented Query Decomposer for Multi-vector retrieval
    Yaoyang Liu, Junlin Li, Yinjun Wu, Zhen Chen
    ICML 2025
  • SiriusBI: A Comprehensive LLM-powered Solution for Data Analytics in Business Intelligence
    Jie Jiang, Haining Xie, Siqi Shen, Yu Shen, Zihan Zhang, Meng Lei, Yifeng Zheng, Yang Li, Chunyou Li, Danqing Huang, Yinjun Wu, Wentao Zhang, Xiaofeng Yang, Bin Cui, Peng Chen
    VLDB industry 2025

2024


  • DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation
    Yinjun Wu, Mayank Keoliya, Kan Chen, Neelay Velingker, Ziyang Li, Emily J Getzen, Qi Long, Mayur Naik, Ravi B Parikh, Eric Wong
    ICML 2024
  • Towards Compositionality in Concept Learning
    Adam Stein, Aaditya Naik, Yinjun Wu, Mayur Naik, Eric Wong
    ICML 2024
  • TorchQL: A Programming Framework for Integrity Constraints in Machine Learning
    Aaditya Naik, Adam Stein, Yinjun Wu, Mayur Naik, and Eric Wong
    OOPSLA 2024 [Code][Paper]

2023


  • Rectifying Group Irregularities in Explanations for Distribution Shift
    Adam Stein, Yinjun Wu, Eric Wong, Mayur Naik
    NeurIPS 2023 (XAIA workshop) [Paper]
  • Do Machine Learning Models Learn Statistical Rules Inferred from Data?
    Aaditya Naik, Yinjun Wu, Mayur Naik, Eric Wong
    ICML 2023 [Code][Paper]
  • Learning to Select Pivotal Samples for Meta Re-weighting
    Yinjun Wu, Adam Stein, Jacob Gardner, Mayur Naik
    AAAI 2023 (oral) [Code][Paper][Slides]

2022


  • Provenance-based Model Maintenance: Implications for Privacy [Paper]
    Yinjun Wu, Val Tannen and Susan B. Davidson
    IEEE Data Eng. Bull. 2022

2021


  • CHEF: A Cheap and Fast Pipeline for Iteratively Cleaning Label Uncertainties
    Yinjun Wu, James Weimer, Susan B. Davidson [Technical report][Code]
    In Proceedings of the VLDB Endowment 14, no. 11 (2021): 2410-2418.
  • Dynamic Gaussian Mixture based Deep Generative Model For Robust Forecasting on Sparse Multivariate Time Series
    Yinjun Wu, Jingchao Ni, Wei Cheng, Bo Zong, Dongjin Song, Zhengzhang Chen, Yanchi Liu, Xuchao Zhang, Haifeng Chen, Susan B. Davidson [Paper][Full version][Code]
    In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 1, pp. 651-659. 2021

2020


  • DeltaGrad: Rapid retraining of machine learning models
    Yinjun Wu, Edgar Dobriban, Susan B. Davidson [Paper][Slides][Code]
    In International Conference on Machine Learning (ICML), pp. 10355-10366. PMLR, 2020.
  • Lessons learned from the early performance evaluation of Intel Optane DC Persistent Memory in DBMS
    Yinjun Wu, Kwanghyun Park, Rathijit Sen, Brian Kroth, Jaeyoung Do [Paper][Technical report]
    In Proceedings of the 16th International Workshop on Data Management on New Hardware, pp. 1-3. 2020.
  • PrIU: A provenance-based approach for incrementally updating regression models
    Yinjun Wu, Val Tannen, Susan B. Davidson [Paper][Slides]
    In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 447-462. 2020.

2019


  • ProvCite: Provenance-based Data Citation [Paper][Slides]
    Yinjun Wu, Abdussalam Alawini, Daniel Deutch, Tova Milo, Susan B. Davidson
    In Proceedings of the VLDB Endowment (2019), 12(7)

2018


  • Data Citation: Giving Credit where Credit is Due [Paper][Slides]
    Yinjun Wu, Abdussalam Alawini, Susan B. Davidson, Gianmaria Silvello
    In Proceedings of the 2018 International Conference on Management of Data (SIGMOD conference), pp. 99-114. ACM, 2018.
  • Data Citation: A New Provenance Challenge [Paper]
    Abdussalam Alawini, Susan B. Davidson, Gianmaria Silvello, Val Tannen, Yinjun Wu (authors sorted alphabetically)
    IEEE Data Eng. Bull. 41(1): 27-38 (2018)

2017


  • Automating Data Citation in CiteDB [Paper]
    Abdussalam Alawini, Susan Davidson, Wei Hu, Yinjun Wu (authors sorted alphabetically)
    Proceedings of the VLDB Endowment 10.12 (2017): 1881-1884.

  • See my Google Scholar page for the full paper list.