Preprint. To appear in Data Mining and Knowledge Discovery. DOI 10.1007/s10618-012-0293-7. Available online.

The Effect of Homogeneity on the Computational Complexity of Combinatorial Data Anonymization

Robert Bredereck¹⋆, André Nichterlein¹, Rolf Niedermeier¹, Geevarghese Philip²†

Received: 18 April 2012 / Accepted: 24 September 2012

Abstract A matrix M is said to be k-anonymous if for each row r in M there are at least k − 1 other rows in M that are identical to r. The NP-hard k-Anonymity problem asks, given an n × m matrix M over a fixed alphabet and an integer s > 0, whether M can be made k-anonymous by suppressing (blanking out) at most s entries. Complementing previous work, we introduce two new “data-driven” parameterizations for k-Anonymity, namely the number $t_{\mathrm{in}}$ of different input rows and the number $t_{\mathrm{out}}$ of different output rows, both of which model aspects of data homogeneity. We show that k-Anonymity is fixed-parameter tractable for the parameter $t_{\mathrm{in}}$, and that it is NP-hard even for $t_{\mathrm{out}} = 2$ and alphabet size four. Notably, our fixed-parameter tractability result implies that k-Anonymity can be solved in linear time when $t_{\mathrm{in}}$ is a constant. Our computational hardness results also extend to the related privacy problems p-Sensitivity and ℓ-Diversity, while our fixed-parameter tractability results extend to p-Sensitivity and to the usage of domain generalization hierarchies, where entries are replaced by more general data instead of being completely suppressed.

Keywords k-Anonymity · p-Sensitivity · ℓ-Diversity · Domain generalization hierarchies · Matrix modification problems · Parameterized algorithmics · Fixed-parameter tractability · NP-hardness

An extended abstract entitled “The Effect of Homogeneity on the Complexity of k-Anonymity” appeared in Proceedings of the 18th International Symposium on Fundamentals of Computation Theory (FCT ’11), volume 6914 of LNCS, pages 53–64, Springer 2011.
In addition to the full proofs omitted from that version, the present article contains new results on ℓ-Diversity, p-Sensitivity, and the usage of domain generalization hierarchies.

⋆ Supported by the DFG, research project PAWS, NI 369/10.
† Supported by the Indo-German Max Planck Centre for Computer Science (IMPECS).

¹ Institut für Softwaretechnik und Theoretische Informatik, TU Berlin, Germany
² Max-Planck-Institut für Informatik, Saarbrücken, Germany

{robert.bredereck,andre.nichterlein,rolf.niedermeier}@tu-berlin.de
gphilip@mpi-inf.mpg.de
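To make the definitions in the abstract concrete, the following is a minimal brute-force sketch of k-anonymity, the suppression operation, and the row counts $t_{\mathrm{in}}$ and $t_{\mathrm{out}}$. It is not the fixed-parameter algorithm of this article; it simply tries every set of at most s entries and is exponential in n · m. The suppression marker `STAR` and all function names are hypothetical choices made for illustration, and rows are assumed to be tuples of strings over an alphabet that does not contain `STAR`.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable
from itertools import combinations

# Hypothetical marker for a suppressed (blanked-out) entry; assumed not to
# occur in the input alphabet.
STAR = "*"

Row = tuple[str, ...]


def is_k_anonymous(rows: list[Row], k: int) -> bool:
    """True if every row has at least k - 1 identical copies elsewhere,
    i.e. every distinct row occurs at least k times."""
    return all(count >= k for count in Counter(rows).values())


def num_distinct_rows(rows: list[Row]) -> int:
    """Number of different rows: t_in for the input, t_out for the output."""
    return len(set(rows))


def suppress(rows: list[Row], cells: Iterable[tuple[int, int]]) -> list[Row]:
    """Copy of `rows` with each (row, column) cell in `cells` set to STAR."""
    out = [list(r) for r in rows]
    for i, j in cells:
        out[i][j] = STAR
    return [tuple(r) for r in out]


def brute_force_k_anonymity(rows: list[Row], k: int, s: int) -> list[Row] | None:
    """Try every set of at most s cells, smallest sets first, and return the
    first suppressed matrix that is k-anonymous (or None if there is none).
    Exponential in n * m, so only usable on toy inputs."""
    m = len(rows[0]) if rows else 0
    cells = [(i, j) for i in range(len(rows)) for j in range(m)]
    for size in range(min(s, len(cells)) + 1):
        for chosen in combinations(cells, size):
            candidate = suppress(rows, chosen)
            if is_k_anonymous(candidate, k):
                return candidate
    return None
```

For example, the 4 × 3 matrix with rows (a, b, c), (a, b, d), (e, b, c), (e, b, d) has $t_{\mathrm{in}} = 4$. With k = 2, the search returns None for s = 3, and for s = 4 it finds a solution with exactly four suppressed entries whose output has $t_{\mathrm{out}} = 2$. Because candidate sets are enumerated in order of increasing size, the first hit always uses the minimum number of suppressions.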