ProvenanceWeek will take place on April 30-May 1. Following successful previous ProvenanceWeek events, this year’s instalment will again co-locate the IPAW and TaPP workshops in conjunction with WebConf 2023. IPAW and TaPP build on a successful history of provenance workshops that bring together researchers from a wide range of computer science fields including workflows, semantic web, databases, high performance computing, distributed systems, operating systems, programming languages, and software engineering, as well as researchers from other fields, such as biology and physics, that have urgent provenance needs.
Provenance is increasingly important in data science, workflow systems, and many other areas, particularly to support transparency, accountability and explanations. By providing a record of the data creation process and of dependencies between data, provenance information is essential for tracing errors in transformed data back to erroneous inputs, access control, auditing, repeatability and reproducibility, evaluating data quality, and establishing ownership of data.
Happy Birthday W3C PROV! The W3C PROV standard is 10 years old. As a part of ProvenanceWeek 2023, we will be celebrating the standard's creation and impact. There will be cake.
All speakers and presenters are expected to attend the conference in person. If, for exceptional reasons, you are not able to present in person, you may assign a proxy, who must attend in person.
9:15-9:30 am: Welcome and Introduction.
9:30-10:30 am: Keynote by Vanessa Braganholo
Title: Myths and Truths about noWorkflow
Chair: Yuval Moskovitch
Abstract: Scientific experiments are frequently written as scripts. While scripts are powerful and convenient, they lack provenance support. With that problem in mind, we designed noWorkflow aiming at transparently and automatically capturing provenance of Python scripts. In its 10 years of existence, it has been used for different purposes. In this talk, we provide an overview of noWorkflow’s evolution throughout the years and analyze myths and truths about it. To do that, we go over the 150+ papers that cite noWorkflow and group their main claims, classifying them as myths or truths.
Bio: Vanessa Braganholo has been an Associate Professor at the Instituto de Computação of Universidade Federal Fluminense (UFF), Brazil, since 2014. Before that, she worked as an Assistant Professor at Universidade Federal do Rio de Janeiro (UFRJ). She holds a Ph.D. degree (2004) in Computer Science from Universidade Federal do Rio Grande do Sul (UFRGS). She received the Best PhD Thesis Award of the Brazilian Computer Society in 2005. She has published over 120 papers in journals and conferences and acted as the PC Chair of IPAW in 2020 and 2021. Her research area is databases, and her current research interests include provenance.
10:30-10:50 am Break
10:50 am-12:30 pm Session 1
Chair: Vanessa Braganholo
12:30-1:30 pm Lunch break
1:30-2:30 pm: Keynote by Boris Glavic
Title: Provenance, Relevance-based Data Management, and the Value of Data
Chair: Tanu Malik
Abstract: In this talk, I introduce the audience to relevance-based data management (RBDM), a new paradigm where information about what data is "relevant" for producing a result of a computation, i.e., the data provenance of the computation, is used to improve management of the data, e.g., by restricting computations to what data is relevant or by making caching decisions based on relevance. I will present provenance-based data skipping (PBDS) as one example of relevance-based data management. In PBDS, a system maintains lightweight sketches of the provenance of queries, captured at query runtime and based on a (virtual) horizontal partitioning of tables. Such provenance sketches are then utilized to speed up the evaluation of future queries, by translating them into selection conditions that can utilize the existing physical design of a database. Additionally, I will discuss preliminary ideas on the use of relevance as an objective metric for the value of data.
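As a rough illustration of the provenance-based data skipping idea described in the abstract above, the following Python sketch records which fragments of a range-partitioned toy table contain the provenance of a query, and then turns that record into range conditions for a later run. The fragment boundaries, the toy table, and all function names are assumptions made up for this example; they are not taken from the talk or from any actual PBDS implementation, and the question of when a sketch may safely be reused for a different query is not addressed here.

```python
from bisect import bisect_right

# Hypothetical virtual range partitioning on "price":
# fragment i covers the half-open interval [BOUNDS[i], BOUNDS[i + 1]).
BOUNDS = [0, 100, 200, 300, 400]


def fragment_of(value, bounds=BOUNDS):
    """Return the index of the fragment that contains value."""
    return bisect_right(bounds, value) - 1


def capture_sketch(rows, predicate, key):
    """Evaluate the query and record the fragments holding its provenance."""
    result, sketch = [], set()
    for row in rows:
        if predicate(row):
            result.append(row)
            sketch.add(fragment_of(row[key]))
    return result, sketch


def sketch_to_ranges(sketch, bounds=BOUNDS):
    """Translate a sketch into (low, high) range conditions, merging neighbours."""
    ranges = []
    for frag in sorted(sketch):
        low, high = bounds[frag], bounds[frag + 1]
        if ranges and ranges[-1][1] == low:
            ranges[-1] = (ranges[-1][0], high)
        else:
            ranges.append((low, high))
    return ranges


def run_with_sketch(rows, predicate, key, ranges):
    """Re-run the query, only looking at rows inside the sketch ranges."""
    candidates = [r for r in rows if any(lo <= r[key] < hi for lo, hi in ranges)]
    return [r for r in candidates if predicate(r)], len(candidates)


# Toy table: the flagged rows happen to cluster in two price regions.
ROWS = [
    {"id": i, "price": i, "flag": 120 <= i < 180 or 310 <= i < 320}
    for i in range(400)
]


def is_flagged(row):
    return row["flag"]


full_result, sketch = capture_sketch(ROWS, is_flagged, "price")
ranges = sketch_to_ranges(sketch)
fast_result, scanned = run_with_sketch(ROWS, is_flagged, "price", ranges)
print(sketch, ranges, scanned, len(ROWS))
```

In a database system, the range conditions would be added to the query as a selection on the partitioning attribute, so that an existing index or zone map could skip the irrelevant fragments; the list comprehension above only imitates that effect.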
Bio: Boris Glavic (http://www.cs.iit.edu/~dbgroup/members/bglavic.html) is an Associate Professor in the Department of Computer Science at Illinois Institute of Technology leading the IIT DBGroup. His research spans several areas of database systems and data science including data provenance, explanations, data integration, query execution and optimization, uncertain data, and data curation. Boris strives to build systems that are based on solid theoretical foundations.
2:30-3:00 pm Town Hall
9:30-10:30 am: Keynote by Sudeepa Roy
Title: Provenance Semirings: Beautiful Theory and Applications in Query Debugging for Database Education
Chair: Paul Groth
Abstract: Database provenance tracks the relationships between input tuples and outputs of a relational query that transforms the source data into a desired view. In a seminal work by Green, Karvounarakis, and Tannen (PODS 2007), Provenance Semirings were proposed as a formal and general approach to track data provenance as annotations to tuples produced in different stages of the transformation in a query, which was later extended to aggregate operations by Amsterdamer, Deutch, and Tannen (PODS 2011). In this talk, I will give an overview of provenance semirings and describe how the powerful concept of provenance semirings has been used in our work on building tools for helping new programmers and students learn, debug, and trace relational queries. In our work, we defined and studied the complexity of the “small counterexample” problem for query debugging, built tools for relational algebra and SQL efficiently tracking provenance with user interfaces for easy exploration and understanding, and studied the effectiveness of these tools in database education for helping students debug queries in classroom settings. The "RATest" tool for debugging relational algebra has been successfully used by more than 1000 students in our undergraduate and graduate classes in the last few years.
This is joint work with Zhengjie Miao, Jun Yang, Yihao Hu, Amir Gilad, Kristin Stephens-Martinez, and many other graduate and undergraduate students in the HNRQ project.
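To make the semiring idea from the abstract concrete, here is a minimal Python sketch, under simplifying assumptions, of annotated relations in which a join multiplies annotations and a projection adds them. The relation contents, the variable names x1, x2, y1, y2, and every helper name are hypothetical and chosen only for illustration; this is not the RATest implementation or any tool from the talk.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Semiring:
    zero: Any
    one: Any
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]


# Counting semiring (natural numbers): annotations are tuple multiplicities.
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)


# Provenance polynomials: a Counter mapping each monomial (a sorted tuple of
# variable names) to its positive integer coefficient.
def poly_times(p, q):
    out = Counter()
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return out


POLY = Semiring(Counter(), Counter({(): 1}), lambda p, q: p + q, poly_times)


def var(name):
    """Polynomial consisting of a single variable."""
    return Counter({(name,): 1})


def join_project(r, s, k):
    """Project R(A, B) joined with S(B, C) onto (A, C) over semiring k."""
    out = {}
    for (a, b), ann_r in r.items():
        for (b2, c), ann_s in s.items():
            if b == b2:
                joint = k.times(ann_r, ann_s)  # join: multiply annotations
                # projection merges duplicates: add annotations
                out[(a, c)] = k.plus(out.get((a, c), k.zero), joint)
    return out


def evaluate(poly, valuation, k):
    """Evaluate a provenance polynomial in semiring k under a valuation."""
    total = k.zero
    for monomial, coeff in poly.items():
        term = k.one
        for name in monomial:
            term = k.times(term, valuation[name])
        for _ in range(coeff):
            total = k.plus(total, term)
    return total


# Input tuples annotated with variables.
R_POLY = {("a", "b"): var("x1"), ("a", "c"): var("x2")}
S_POLY = {("b", "d"): var("y1"), ("c", "d"): var("y2")}
poly_result = join_project(R_POLY, S_POLY, POLY)

# The same query evaluated directly under bag semantics.
MULT = {"x1": 2, "x2": 1, "y1": 1, "y2": 3}
R_COUNT = {("a", "b"): MULT["x1"], ("a", "c"): MULT["x2"]}
S_COUNT = {("b", "d"): MULT["y1"], ("c", "d"): MULT["y2"]}
count_result = join_project(R_COUNT, S_COUNT, COUNTING)

print(poly_result[("a", "d")], count_result[("a", "d")])
```

The output tuple ("a", "d") carries the polynomial x1*y1 + x2*y2, and substituting the multiplicities into that polynomial gives the same count as evaluating the query directly in the counting semiring. Being able to compute the provenance once and specialize it afterwards is what makes the polynomial annotation general.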
Bio: Sudeepa Roy is an Associate Professor in Computer Science at Duke University. She works broadly in data management, with a focus on foundational aspects of big data analysis, which includes causality and explanations for big data, data provenance, query optimization, data repair, probabilistic databases, database theory, and applications of database techniques in other domains. Before she joined Duke in 2015, she did a postdoc at the University of Washington, and obtained her Ph.D. from the University of Pennsylvania. She is a recipient of the VLDB Early Career Research Contributions Award, an NSF CAREER Award, and a Google Ph.D. fellowship in structured data. She co-directs the Almost Matching Exactly (AME) lab for interpretable causal inference at Duke (https://almost-matching-exactly.github.io/).
10:30-10:50 am Break
10:50 am-12:30 pm Session 2
Chair: Sudeepa Roy
12:30-1:30 pm Lunch break
1:30-3:00 pm Session 3
Chair: Boris Glavic
3:00-3:30 pm Break
3:30-4:30 pm: W3C PROV 10 years celebration Panel
We invite innovative and creative contributions, including papers outlining new formal approaches to provenance, innovative use of provenance, experience-based insights, and visionary ideas.
ProvenanceWeek will be held in conjunction with WebConf 2023 in Austin, Texas, on Sunday, April 30, and Monday, May 1, 2023.
All deadlines are end-of-day in the Anywhere on Earth (AoE) time zone.
Submission is via EasyChair, using the following link: https://easychair.org/conferences/?conf=thewebconf2023iwpd
Papers must be submitted in PDF format according to the ACM template published in the ACM guidelines, selecting the generic “sigconf” sample. The PDF files must have all non-standard fonts embedded. Workshop papers must be self-contained and in English. Papers should not exceed 12 pages in length (maximum 8 pages for the main paper content + maximum 2 pages for appendixes + maximum 2 pages for references).
Authors should indicate the track in the title (IPAW, TAPP, DEMO, POSTER or BEST)
Papers can be submitted to one of the following tracks:
In addition to regular research papers, we also encourage submissions of the following flavors:
We also encourage the presentation of ongoing work as posters or demonstrations. Proposals for posters or demonstrations should be formatted and submitted as described above, with the following additional restrictions: