Privacy Implications of the Scraping of Healthcare Data

For the past few months, the Wall Street Journal has run a great series on the use of electronic (personal) information by commercial entities. The series is titled “What They Know.” While the emphasis has mostly been on the tracking of Web browsing behavior, a recent article in this series discusses the “scraping” of personal data from a patient support network called PatientsLikeMe.

This is a troubling development.

Like the commerce IT revolution before it, the anticipated healthcare IT revolution depends upon the proper security of (and associated trust in) private data. I blogged about privacy in healthcare in an earlier post, and my summary conclusion remains that this topic will receive a disproportionate amount of attention in the coming months, and rightly so.

I don’t agree with those who advocate against use of any patient data (and consequently put a number of patients at risk). The right answer is to put well understood safeguards in place and use data in a manner that promotes innovation in outcomes-based treatment, while maintaining essential protections and safeguards for personal privacy. This is an opportunity for companies to innovate in this space. What products/innovations do you think we need? How can we ensure that the right protections are in place and that consumers understand the tradeoffs involved?

Is proactively preparing for litigation that never happens a waste?

Many, including myself, have advocated over the years for proactively preparing for eDiscovery to lower the risk and costs associated with litigation.

Unplanned eDiscovery can be especially disruptive to an organization. This is mainly due to the collection of ESI (electronically stored information) throughout the organization, no matter where it is stored: custodians' laptops and desktops, CDs, DVDs, USB thumb drives, external hard disks, shared drives, and email systems, to name a few. Searching for potential evidence in all of these locations is very time consuming and greatly affects individual employees' productivity.

The risks that proactively preparing for eDiscovery addresses are primarily associated with the litigation hold requirement. The Federal Rules of Civil Procedure (FRCP) state that it is the responsibility of those potentially involved in litigation to protect potentially responsive evidence (ESI) as soon as litigation could reasonably be anticipated. How does an organization start protecting ESI even before they have officially been notified that litigation on a particular subject matter is pending? Not an easy process – even when you have prepared for the possibility.

So in the past I have lobbied for organizations to prepare for litigation and eDiscovery before it happens. That includes developing ESI retention policies, litigation hold policies, and eDiscovery process policies, as well as purchasing and installing automation that lets you centrally control and manage the majority of your organization’s ESI. These solutions include ESI archives such as Iron Mountain’s TEMs or NearPoint ESI archives. They also include the ability to control ESI on employee laptops and desktops with Iron Mountain’s Connected and Classify and Connect solutions.

So what happens to the return on investment (ROI) on these expenditures if your organization is never sued? Did you waste a bunch of money? Did you buy an insurance policy that never paid off?

The answer is definitely NO.

The solutions put in place to proactively prepare for litigation and eDiscovery also help you better organize your ESI, make it available to employees faster for reference or re-use, get rid of old ESI, and more effectively manage your storage resources. The ROI on these solutions is still positive even if you are never sued.

Discovery – The More Things Change…?

Twenty years ago this fall, I began my litigation career.

I was 28 years old, single and sported a full head of hair. (I swear.) I had just survived my first year of law school and begun dating a beautiful girl from Venezuela. Vanilla Ice’s debut album To the Extreme (who can ever forget “Ice Ice Baby”?) topped the Billboard 200. Seinfeld and The Simpsons ruled the airwaves. A starting associate at many AmLaw 50 firms could expect to make $65,000 a year. And, not that I knew it then, but it cost $20,000 to store a single gigabyte of data.

That fall, I participated in my first document review. It consisted entirely of paper, and the contents lived in and among dozens of boxes, on conference room tables and floors, and in misplaced manila folders, accordion Redwelds, and lawyers’ (and at least one intern’s) briefcases. It was a mess. Reviewing the documents meant applying torn yellow Post-it Notes, marking them “P” for produce, “I” for irrelevant, “ACP” for attorney-client privilege, and “?” for documents that confused me.


The Enterprise Discovery Mess

The current approach to enterprise discovery in today’s litigious and regulatory-laden business environment is just plain broken. Discovery costs for producing information in response to legal and regulatory requests can be completely unpredictable. The General Counsel often does not know if the discovery for a matter is going to cost $100K or $1 Million. The opposing counsel recognizes this situation and focuses on inflicting maximum pain, driving up discovery costs to raise the settlement threshold.

The prevalent approach to discovery is reactive and ad hoc, resulting in high discovery costs. Discovery is typically matter and event driven. Corporate legal departments contract with third-party consulting firms who show up in suits, grab desktops, use tools to collect information from servers and applications, and cart away vast amounts of information for subsequent processing. The manual data collection, ad hoc steps for legal discovery, and the use of multiple service providers and fragmented point products all result in high costs. This approach also unnecessarily exposes collection and search procedures to defensibility challenges and chain-of-custody issues. Over-collection, over-preservation, and the inability to find relevant information (the “smoking gun” that later shows up) are all risks resulting from this ad hoc, manual approach.


Applications built on top of Search

Over the last decade, the success of Google has raised everyone’s expectations of search. Search is currently the default mechanism to access all kinds of unstructured data, including documents, emails, internal websites, document management systems, etc. Most of these systems do not have a link structure that can be used to provide the equivalent of PageRank to sort the results. There is usually not enough usage data to gather popularity statistics, either. In such cases, text search reduces to weighting functions that match the query to the documents, using some variation of the standard Information Retrieval TF-IDF approach. The resulting relevance functions, while optimized for precision and recall, usually do not match user expectations. In some ways, Google has conditioned people to expect the right answer from a 2-3 word query, and it is rare that anyone can get that when searching their email or their intranet. Instead, it is often a combination of keywords with metadata like date range and sender/author that allows the user to narrow the search to approximately the right set of documents.
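To make the keyword-plus-metadata idea concrete, here is a minimal sketch in Python. It is not how any particular product works. The Doc record, the tf_idf_search function, and the whitespace tokenizer are all hypothetical names invented for this illustration, and the scoring is the plain textbook form (term frequency times the log of inverse document frequency) with no normalization.

```python
import math
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    # Hypothetical record: an email-like document with simple metadata.
    doc_id: str
    sender: str
    sent: date
    text: str


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


def tf_idf_search(docs, query, sender=None, start=None, end=None):
    """Rank docs by summed TF-IDF of the query terms, after metadata filters."""
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for d in docs:
        df.update(set(tokenize(d.text)))

    results = []
    for d in docs:
        # Metadata filters narrow the candidate set before scoring.
        if sender is not None and d.sender != sender:
            continue
        if start is not None and d.sent < start:
            continue
        if end is not None and d.sent > end:
            continue
        tf = Counter(tokenize(d.text))
        score = sum(
            tf[t] * math.log(n / df[t])
            for t in tokenize(query)
            if df[t] > 0
        )
        if score > 0:
            results.append((score, d.doc_id))
    # Highest score first.
    return [doc_id for _, doc_id in sorted(results, reverse=True)]


docs = [
    Doc("a", "alice", date(2010, 1, 5), "budget review meeting notes"),
    Doc("b", "bob", date(2010, 2, 10), "budget budget forecast"),
    Doc("c", "alice", date(2010, 3, 1), "lunch plans"),
]
by_keyword = tf_idf_search(docs, "budget")
by_sender = tf_idf_search(docs, "budget", sender="alice")
by_date = tf_idf_search(docs, "budget", start=date(2010, 2, 1))
```

With keywords alone, document "b" outranks "a" only because it repeats the word, which says little about what the user wanted. Adding a sender or a date range is what actually narrows the result to the intended message.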
