[Sy-cg-global] [Publicity-list] Call for Participation: DIMACS Workshop on Data Quality, Data Cleaning and Treatment of Noisy Data

Linda Casals lindac at
Tue Sep 2 15:15:50 EDT 2003

********Call for Participation**Deadline September 6, 2003*********

Program Title: DIMACS Workshop on Data Quality, Data Cleaning and
Treatment of Noisy Data 

Program Dates: November 3 - 4, 2003 

Location of Program:
DIMACS Center, CoRE Bldg, Rutgers University, Piscataway, NJ

Parni Dasu, AT&T Labs, tamr at 

Abstracts for contributed papers and posters: Sept. 6, 2003

WWW Information:

The word "data" has taken on a broad meaning in the last five
years. It is no longer a set of numbers or even text. New data
paradigms include data streams characterized by a high rate of
accumulation, web scraped documents and tables, web server logs,
images, audio and video, to name a few. Well-known challenges of
heterogeneity and scale continue to grow as data are integrated from
disparate sources and become more complex in size and content.

While new paradigms have enriched data, the quality of data has
declined considerably. In earlier times, data were collected as a part
of pre-designed experiments where data collection could be monitored
to enforce data quality standards. The data sets themselves were small
enough that even if data collection was unsupervised, the data could
be quickly scrubbed through highly manual methods. Today, neither
monitoring of data collection nor manual scrubbing of data is feasible
due to the sheer size and complexity of the data.

An additional challenge in addressing data quality is the domain
dependence of problems and solutions. Metadata and domain expertise
have to be discovered and incorporated into the solutions, entailing
an extensive interaction with widely scattered experts. This
particular aspect of data quality makes it difficult to find general
one-size-fits-all solutions. However, the process of discovering
metadata and domain expertise can be automated through the development
of appropriate tools and techniques such as data browsing and
exploration, knowledge representation and rule-based programming.

Many disciplines have taken piecemeal approaches to data quality. The
areas of process management, statistics, data mining, database research
and metadata coding have all developed their own ad hoc approaches to
solve different pieces of the data quality puzzle. These include
statistical techniques for process monitoring, treatment of incomplete
data and outliers, techniques for monitoring and auditing data
delivery processes, database research for integration, discovery of
functional dependencies and join paths, and languages for data
exchange and metadata representation.

We need an integrated end-to-end approach within a common framework,
where the various disciplines can complement and leverage each other's
strengths. In this workshop, our broad objective is to bring together
experts from different research disciplines to initiate a
comprehensive technical discussion on data quality, data cleaning and
treatment of noisy data. Specifically,

* To provide an overview of the existing research in data quality

* To present data quality as a continuous, end-to-end concept

* To discuss and update the definition of data quality and to develop 
metrics for measuring data quality

* To emphasize data exploration, data browsing and data profiling for 
validating schema-specific constraints and identifying aberrations

* To focus on disciplines such as knowledge representation and 
rule-based programming for capturing and validating domain-specific constraints

* To highlight applications and case studies

* To present research tools and techniques

* To identify research problems in data quality and data cleaning

Workshop Format

The format of the workshop will be a combination of invited talks,
contributed papers and posters. Invited and contributed talks will be 
published in the workshop proceedings.


Call for Participation:

Participants interested in submitting a contributed paper or poster
should send either an extended abstract (maximum 5 pages, including
references and figures) for a contributed paper, or slides for a poster
with an accompanying write-up (maximum 2 pages) explaining the slides.
Abstracts and poster write-ups should be sent to tamr at by
Saturday, Sept 6, 2003. Notification of acceptance will be sent by
Oct 1, 2003.

Submission templates

A pdf version of the file: sample.pdf 

A LaTeX version of the file: sample.txt 

This DIMACS sample is designed to give all papers in the volumes a
uniform appearance, which will make them much more appealing. It is
strongly suggested that all submissions be made in LaTeX. Assistance
with any needed conversion to the proper format is available by
contacting
tech at 

Registration Fees: 

(Pre-registration deadline: October 27, 2003) 

Regular rate
Preregister before deadline $120/day
After preregistration deadline $140/day

Reduced Rate*
Preregister before deadline $60/day
After preregistration deadline $70/day

Preregister before deadline $10/day
After preregistration deadline $15/day

DIMACS Postdocs $0

Non-Local Graduate & Undergraduate students
Preregister before deadline $5/day
After preregistration deadline $10/day

Local Graduate & Undergraduate students $0
(Rutgers & Princeton)

DIMACS partner institution employees** $0

DIMACS long-term visitors*** $0

Registration fees will be collected on site; cash, checks and
VISA/Mastercard are accepted.

Our funding agencies require that we charge a registration fee for the
workshop. Registration fees cover participation in the workshop, all
workshop materials, breakfast, lunch, breaks, and any scheduled social
events (if applicable).

* College/University faculty and employees of non-profit organizations
will automatically receive the reduced rate. Other participants may
apply for a reduction of fees. They should email their request for the
reduced fee to the Workshop Coordinator at
workshop at  Include your name, the institution you
work for, your job title and a brief explanation of your situation.
All requests for reduced rates must be received before the
preregistration deadline. You will be notified promptly of the
decision.

** Fees for employees of DIMACS partner institutions are waived.
DIMACS partner institutions are: Rutgers University, Princeton
University, AT&T Labs - Research, Bell Labs, NEC Laboratories America
and Telcordia Technologies. Fees for employees of DIMACS affiliate
members Avaya Labs, IBM Research and Microsoft Research are also
waived.

*** DIMACS long-term visitors are those who are in residence at DIMACS
for two or more weeks, inclusive of the dates of the workshop.


Information on participation, registration, accommodations, and travel
can be found at:

