Abstract
The objective of this study was to distinguish within a population of patients with and without breast cancer. The study was based on the University of Wisconsin's dataset of 569 patients, of whom 212 were subsequently found to have breast cancer. A subset-conjunctive model, which is related to Logical Analysis of Data, is described to distinguish between the two groups of patients based on the results of a non-invasive procedure called Fine Needle Aspiration, which is often used by physicians before deciding on the need for a biopsy. We formulate the problem of inferring subset-conjunctive rules as a 0-1 integer program, show that it is NP-Hard, and prove that it admits no polynomial-time constant-ratio approximation algorithm. We examine the performance of a randomized algorithm, and of randomization using LP rounding. In both cases, the expected performance ratio is arbitrarily bad. We use a deterministic greedy algorithm to identify a Pareto-efficient set of subset-conjunctive rules; describe how the rules change with a re-weighting of the type-I and type-II errors; how the best rule changes with the subset size; and how much of a tradeoff is required between the two types of error as one selects a more stringent or more lax classification rule. An important aspect of the analysis is that we find a sequence of closely related efficient rules, which can be readily used in a clinical setting because they are simple and have the same structure as the rules currently used in clinical diagnosis.
Full Citation
Discrete Applied Mathematics
vol.
154
,
(January 01, 2006):
1100
-12
.