Svoboda | Graniru | BBC Russia | Golosameriki | Facebook
Skip to main content
  • Saeed Samet is a member of the Faculty of Medicine as an Assistant Professor at the e-Health Research Unit at the dis... moreedit
Abstract Prior art in the expression of digital rights using XML is demonstrated to require a process of interpretation or parsing, as is characteristic with language processing, and in addition to be subject to the cross-domain scripting... more
Abstract Prior art in the expression of digital rights using XML is demonstrated to require a process of interpretation or parsing, as is characteristic with language processing, and in addition to be subject to the cross-domain scripting problem. The expression of digital rights using JSON, however, represents a novel approach, bypassing the need for language processing, and in addition, solving the cross-domain scripting problem.
Nowadays, data mining and machine learning techniques are widely used in electronic applications in different areas such as e-government, e-health, e-business, and so on. One major and very crucial issue in these type of systems, which... more
Nowadays, data mining and machine learning techniques are widely used in electronic applications in different areas such as e-government, e-health, e-business, and so on. One major and very crucial issue in these type of systems, which are normally distributed among two or more parties and are dealing with sensitive data, is preserving the privacy of individual’s sensitive information. Each
Some phase 1 clinical trials offer strong financial incentives for healthy individuals to participate in their studies. There is evidence that some individuals enroll in multiple trials concurrently. This creates safety risks and... more
Some phase 1 clinical trials offer strong financial incentives for healthy individuals to participate in their studies. There is evidence that some individuals enroll in multiple trials concurrently. This creates safety risks and introduces data quality problems into the trials. Our objective was to construct a privacy preserving protocol to track phase 1 participants to detect concurrent enrollment. A protocol using secure probabilistic querying against a database of trial participants that allows for screening during telephone interviews and on-site enrollment was developed. The match variables consisted of demographic information. The accuracy (sensitivity, precision, and negative predictive value) of the matching and its computational performance in seconds were measured under simulated environments. Accuracy was also compared to non-secure matching methods. The protocol performance scales linearly with the database size. At the largest database size of 20,000 participants, a query takes under 20s on a 64 cores machine. Sensitivity, precision, and negative predictive value of the queries were consistently at or above 0.9, and were very similar to non-secure versions of the protocol. The protocol provides a reasonable solution to the concurrent enrollment problems in phase 1 clinical trials, and is able to ensure that personal information about participants is kept secure.
Abstract Association rule mining provides useful knowledge from raw data in different applications such as health, insurance, marketing and business systems. However, many real world applications are distributed among two or more parties,... more
Abstract Association rule mining provides useful knowledge from raw data in different applications such as health, insurance, marketing and business systems. However, many real world applications are distributed among two or more parties, each of which wants to keep its sensitive ...
ABSTRACT We consider the problem of data clustering on streamed data, when the number of transactions is growing very quickly, or when data is distributed among several parties and their privacy is a concern. In this paper we present two... more
ABSTRACT We consider the problem of data clustering on streamed data, when the number of transactions is growing very quickly, or when data is distributed among several parties and their privacy is a concern. In this paper we present two new protocols for incremental privacy-preserving k-means clustering, which is a very popular data mining method, when data is distributed, horizontally or vertically, among multiple parties. At the end of each protocol, each party, without revealing its own private data, receives the final result of the clustering algorithm. Also, to improve efficiency, previous knowledge is used to incrementally update the centers and membership of each cluster.
There is limited capacity to assess the comparative risks of medications after they enter the market. For rare adverse events, the pooling of data from multiple sources is necessary to have the power and sufficient population... more
There is limited capacity to assess the comparative risks of medications after they enter the market. For rare adverse events, the pooling of data from multiple sources is necessary to have the power and sufficient population heterogeneity to detect differences in safety and effectiveness in genetic, ethnic and clinically defined subpopulations. However, combining datasets from different data custodians or jurisdictions to perform an analysis on the pooled data creates significant privacy concerns that would need to be addressed. Existing protocols for addressing these concerns can result in reduced analysis accuracy and can allow sensitive information to leak. To develop a secure distributed multi-party computation protocol for logistic regression that provides strong privacy guarantees. We developed a secure distributed logistic regression protocol using a single analysis center with multiple sites providing data. A theoretical security analysis demonstrates that the protocol is robust to plausible collusion attacks and does not allow the parties to gain new information from the data that are exchanged among them. The computational performance and accuracy of the protocol were evaluated on simulated datasets. The computational performance scales linearly as the dataset sizes increase. The addition of sites results in an exponential growth in computation time. However, for up to five sites, the time is still short and would not affect practical applications. The model parameters are the same as the results on pooled raw data analyzed in SAS, demonstrating high model accuracy. The proposed protocol and prototype system would allow the development of logistic regression models in a secure manner without requiring the sharing of personal health information. This can alleviate one of the key barriers to the establishment of large-scale post-marketing surveillance programs. We extended the secure protocol to account for correlations among patients within sites through generalized estimating equations, and to accommodate other link functions by extending it to generalized linear models.
ABSTRACT Neural network systems are highly capable of deriving knowledge from complex data, and they are used to extract patterns and trends which are otherwise hidden in many applications. Preserving the privacy of sensitive data and... more
ABSTRACT Neural network systems are highly capable of deriving knowledge from complex data, and they are used to extract patterns and trends which are otherwise hidden in many applications. Preserving the privacy of sensitive data and individuals' information is a major challenge in many of these applications. One of the most popular algorithms in neural network learning systems is the back-propagation (BP) algorithm, which is designed for single-layer and multi-layer models and can be applied to continuous data and differentiable activation functions. Another recently introduced learning technique is the extreme learning machine (ELM) algorithm. Although it works only on single-layer models, ELM can out-perform the BP algorithm by reducing the communication required between parties in the learning phase. In this paper, we present new privacy-preserving protocols for both the BP and ELM algorithms when data is horizontally and vertically partitioned among several parties. These new protocols, which preserve the privacy of both the input data and the constructed learning model, can be applied to online incoming records and/or batch learning. Furthermore, the final model is securely shared among all parties, who can use it jointly to predict the corresponding output for their target data.