1. Abhijit Pol: Maintaining very large samples using the geometric file, University of Florida, 2007. First employment: Yahoo! Inc.
2. Shantunu Joshi: Sampling-based randomization techniques for approximate query processing, University of Florida, 2007. First employment: Oracle, Inc.
3. Jayendra G. Venkateswaran: Indexing techniques for metric databases with costly searches, University of Florida, 2008 (advised jointly with Tamer Kahveci). First employment: Oracle, Inc.
4. Xiuyao Song: Novel change detection techniques in multidimensional data mining, University of Florida, 2008 (advised jointly with Sanjay Ranka). First employment: Yahoo! Inc.
5. Subramanian Arumugam: Efficient algorithms for spatiotemporal data management, University of Florida, 2008. First employment: Greenplum, Inc.
6. Mingxi Wu: Statistical methods for fast anomaly detection, University of Florida, 2008. First employment: Oracle, Inc.
7. Manas H. Somaiya: Novel mixture models to learn complex and evolving patterns in high dimensional data, University of Florida, 2009 (advised jointly with Sanjay Ranka). First employment: Amazon.com.
8. Fei Xu: Correlation-aware statistical methods for sampling-based group by estimates, University of Florida, 2009. First employment: Microsoft.
9. Anna Drummond: Statistical machine learning for text mining with Markov chain Monte Carlo inference, Rice University, 2014. First employment: Houston startup.
10. Zhuhua Cai: Very large scale machine learning, Rice University, 2014. First employment: Google, Inc.
11. Luis Perez: Query processing and optimization for stochastic analytics, Rice University, 2014.
12. Neketan Pansare: Large-scale online aggregation via distributed systems, Rice University, 2014. First employment: IBM Almaden.

Organizer: Kavli Frontiers of Science Symposium, 2007-2009

University of Edinburgh, Edinburgh, UK (2013); EPFL, Lausanne, Switzerland (2013); ETH Zurich, Zurich, Switzerland (2013)

ACM KDD Best Paper Runner Up, 2010.
VLDBJ: 2007-2013
TODS: 2010-present
IEEE TKDE: 2008-2013
Associate Editor, IEEE Computer Society Data Engineering Bulletin, 2014-present

National Science Foundation, “Data Mining and Cleaning for Medical Data Warehouses.” 9/2010-9/2015, $1.2M. Sole PI at Rice ($600K to UT-HS). Project goal: This project focuses on statistical models and learning algorithms for quantifying and correcting errors in clinical data warehouse records.
National Science Foundation, “Design and Implementation of the DBO Database System.” 10/2009-10/2013, $750,000. Sole PI. Project goal: The project is concerned with the design and development of a unique system called DBO. Like traditional relational database systems, DBO can run database queries from start to finish and produce exact answers over very large archives. However, unlike any existing research or production system, DBO uses sampling algorithms to produce a statistical estimate for the final query answer at all times throughout query execution.
National Science Foundation, “The MCDB Database System for Managing and Modeling Uncertainty.” 9/2009-9/2015, $500,000. Sole PI. Project goal: The project is concerned with the design and implementation of a prototype database system called MCDB that allows an expert-level analyst or statistician to attach arbitrary stochastic models to the database data in order to “guess” the values for unknown or inaccurate data.
Department of Energy, “The MCDB System for Management and Analysis of Petabyte-Scale Uncertain Data.” 9/2009-9/2013, $600,000. Sole PI. Project goal: This project is concerned with scaling up MCDB for use in a large-scale distributed environment.
National Science Foundation, “III: Medium: SimSQL: A Database System Supporting Distributed Execution of Machine Learning Codes.” 9/2014-9/2017, $1,200,000. Sole PI. Project goal: This project will perform the fundamental research necessary to make machine-learning-in-a-database-system a mature technology.
National Science Foundation, “ABI Innovation: Algorithms and Models for Distributed Computation of Bayesian Phylogenetics.” 8/2014-8/2017, $1,150,888. PI (co-PI: Luay Nakhleh; 50% to Jermaine). Project goal: The project's aim is to develop parallel algorithms for Bayesian phylogenetic inference that are suitable for use in a modern cluster compute environment.
DARPA, “PLNY: Mining and Understanding Software Enclaves.” Fall 2014 to Fall 2018, approximately $11 million. Co-PI (many other co-PIs; approximately $3 million to Jermaine). Project goal: This project seeks to develop database, data mining, machine learning, and programming languages technologies that can mine millions of lines of open-source computer code to build models that can help programmers complete programming tasks.

Teaching: