****************Call for Participation***************************

Program Title: DIMACS Workshop on Data Quality, Data Cleaning and Treatment of Noisy Data

Program Dates: November 3 - 4, 2003

Location of Program: DIMACS Center, CoRE Bldg, Rutgers University, Piscataway, NJ

Organizers:
  Parni Dasu, AT&T Labs, tamr at research.att.com

Contacts:
  Parni Dasu, AT&T Labs, tamr at research.att.com

Deadlines:
  Abstracts for contributed papers and posters: Sept. 6, 2003

WWW Information:
  http://dimacs.rutgers.edu/Workshops/DataCleaning/

*****************************************************

The word "data" has taken on a broad meaning in the last five years. It is no longer a set of numbers or even text. New data paradigms include data streams characterized by a high rate of accumulation, web scraped documents and tables, web server logs, images, audio and video, to name a few. Well-known challenges of heterogeneity and scale continue to grow as data are integrated from disparate sources and become more complex in size and content.

While new paradigms have enriched data, the quality of data has declined considerably. In earlier times, data were collected as a part of pre-designed experiments where data collection could be monitored to enforce data quality standards. The data sets themselves were small enough that even if data collection was unsupervised, the data could be quickly scrubbed through highly manual methods. Today, neither monitoring of data collection nor manual scrubbing of data is feasible due to the sheer size and complexity of the data.

An additional challenge in addressing data quality is the domain dependence of problems and solutions. Metadata and domain expertise have to be discovered and incorporated into the solutions, entailing an extensive interaction with widely scattered experts. This particular aspect of data quality makes it difficult to find general one-size-fits-all solutions.
However, the process of discovering metadata and domain expertise can be automated through the development of appropriate tools and techniques such as data browsing and exploration, knowledge representation and rule-based programming.

Many disciplines have taken piecemeal approaches to data quality. The areas of process management, statistics, data mining, database research and metadata coding have all developed their own ad hoc approaches to solve different pieces of the data quality puzzle. These include statistical techniques for process monitoring, treatment of incomplete data and outliers, techniques for monitoring and auditing data delivery processes, database research for integration, discovery of functional dependencies and join paths, and languages for data exchange and metadata representation. We need an integrated end-to-end approach within a common framework, where the various disciplines can complement and leverage each other's strengths.

In this workshop, our broad objective is to bring together experts from different research disciplines to initiate a comprehensive technical discussion on data quality, data cleaning and treatment of noisy data. Specifically,

  * To provide an overview of the existing research in data quality
  * To present data quality as a continuous, end-to-end concept
  * To discuss and update the definition of data quality, and to develop metrics for measuring data quality
  * To emphasize data exploration, data browsing and data profiling for validating schema-specific constraints and identifying aberrations
  * To focus on disciplines such as knowledge representation and rule-based programming for capturing and validating domain-specific constraints
  * To highlight applications and case studies
  * To present research tools and techniques
  * To identify research problems in data quality and data cleaning

Workshop Format

The format of the workshop will be a combination of invited talks, contributed papers and posters.
Invited and contributed talks will be published in the workshop proceedings.

***********************************************************************

Call for Participation:

Participants interested in submitting contributed papers or posters should send an extended abstract (maximum 5 pages including references and figures) for a contributed paper, or slides for a poster with an accompanying write-up (maximum 2 pages) explaining the slides. The abstracts/poster write-ups should be sent to tamr at research.att.com by Saturday, Sept 6, 2003. Notification of acceptance will be sent by Oct 1, 2003.

Submission templates

  A pdf version of the file: sample.pdf (http://dimacs.rutgers.edu/Volumes/sample.pdf)
  A LaTeX version of the file: sample.txt (http://dimacs.rutgers.edu/Volumes/sample.txt)

This DIMACS sample is designed to give all papers in the volumes a uniform and more appealing appearance. It is strongly suggested that all submissions be made in LaTeX. Assistance with any needed conversion to the proper format is available by contacting tech at dimacs.rutgers.edu

**************************************************************

Registration Fees: (Pre-registration deadline: October 27, 2003)

Regular rate
  Preregister before deadline       $120/day
  After preregistration deadline    $140/day

Reduced Rate*
  Preregister before deadline       $60/day
  After preregistration deadline    $70/day

Postdocs
  Preregister before deadline       $10/day
  After preregistration deadline    $15/day

DIMACS Postdocs                     $0

Non-Local Graduate & Undergraduate students
  Preregister before deadline       $5/day
  After preregistration deadline    $10/day

Local Graduate & Undergraduate students (Rutgers & Princeton)    $0

DIMACS partner institution employees**    $0

DIMACS long-term visitors***        $0

Registration fees will be collected on site; cash, check and VISA/Mastercard are accepted.

Our funding agencies require that we charge a registration fee for the workshop.
Registration fees cover participation in the workshop, all workshop materials, breakfast, lunch, breaks, and any scheduled social events (if applicable).

* College/University faculty and employees of non-profit organizations will automatically receive the reduced rate. Other participants may apply for a reduction of fees. They should email their request for the reduced fee to the Workshop Coordinator at workshop at dimacs.rutgers.edu. Include your name, the institution you work for, your job title and a brief explanation of your situation. All requests for reduced rates must be received before the preregistration deadline. You will be notified promptly of the decision.

** Fees for employees of DIMACS partner institutions are waived. DIMACS partner institutions are: Rutgers University, Princeton University, AT&T Labs - Research, Bell Labs, NEC Laboratories America and Telcordia Technologies. Fees for employees of DIMACS affiliate members Avaya Labs, IBM Research and Microsoft Research are also waived.

*** DIMACS long-term visitors are those who are in residence at DIMACS for two or more weeks, inclusive of the dates of the workshop.

***************************************************************

Information on participation, registration, accommodations, and travel can be found at:

  http://dimacs.rutgers.edu/Workshops/DataCleaning/

**PLEASE BE SURE TO PRE-REGISTER EARLY**

***************************************************************