Data Mining and the Tough Personal Information Privacy Sell Considered

Everyone, come on in and have a seat. We will be starting this discussion a little behind schedule because we have a full house here today. If anyone has a spare seat next to them, will you please raise your hand? We need to get some of these folks in back a seat. The reservations are sold out, but there should be a seat for everyone at today’s discussion.

Okay everyone, thank you, and thanks for that great introduction. I just hope I can live up to all those verbal accolades.

Oh boy, not another controversial subject! Yes, well, surely you know me better than that by now; you’ve come to expect it. Okay so, today’s topic is the data mining of Internet traffic, online searches, and smartphone data, and basically the storing of all the personal data about your whole life. I know, you don’t like this idea, do you? Or maybe you participate in online social networks and most of your data is already there, and you’ve been loading up your blog with all sorts of information?

Now then, contemporary theory and real-world observation of the virtual world predict that for a fee, or in trade for free services, products, discounts, a chance to play in online social networks, employment leads, or the prospect of future business, you and nearly everyone else will give up some personal information.

So, once this data is collected, who will have access to it, who will use it, and how will they use it? All great questions, but first: how can the collection of this data be sold to the users and agreed upon in advance?
Well, this can at times be very challenging; yes, a very tough sell. Still, human psychology online suggests that if we give benefits, people will trade away almost any piece of their privacy.

Hold that thought.

Let’s digress a second and have a reality-check dialogue, and we will come back to that point soon enough, okay? Okay, agreed then.

The information online is important, and it is needed at various national security levels. This use of data is legitimate, and worthwhile information can be gained in that regard. For instance, a number of Russian spies were caught in the US using online social networks to recruit, make business contacts, and study the situation. Makes perfect sense, doesn’t it? Okay so, that particular episode is either an excuse to gather this data and analyze it, or a warning that we had better. Either way, it’s a done deal. Next topic.

And there is the issue of foreign spies using the data to hurt American businesses or American interests, or even to undermine the government, and we must understand that spies in the United States come from over 70 other nations. And let’s not dismiss the home-team challenge. What’s that, you ask? Well, we have a huge intelligence-industrial complex, and those who work in and around the spy business often freelance on the side for Wall Street, corporations, or other interests. They have access to information, thus all that mined data is at their disposal.

Is this a condemnation of sorts? No! I am merely stating facts and the realities behind the curtain, without judgment, but this must be taken into consideration when we ask: whom can we trust with all this information once it is collected, stored, and in a format that can be sorted? So, we need a way to protect this data for the appropriate sources and needs, without allowing it to be compromised. This must be our first order of business.

Let’s undigress and go back to the original topic at hand, shall we?
Okay, deal.

Now then, what about large corporations collecting information: Procter & Gamble, Ford, GM, Amazon, etc.? They will certainly be buying this data from social networks, and in many cases you’ve already given up your rights to privacy merely by participating. Of course, all the data will help these companies refine their targeting using your preferences. Thus, the products or services they pitch you will be highly targeted to your exact desires, needs, and demographics, which is a lot better than the current bombardment of Viagra ads with disgusting titles that land in your inbox and junk folder.

Look, here is the deal: if we are going to collect data online through social networks and store all that data, then we also need a justification for collecting the data in the first place. The other option is to not tell the public and collect it anyway, which we probably already realize is being done in some form or fashion. But let’s for the sake of argument say it isn’t. Should we then tell the public we are doing this, or are going to do it? Yes, because if we do not tell the public, they will eventually figure it out, and conspiracy theories will run rampant.

We already know this will occur because it has occurred in the past. Some say that when any data is collected from any individual, group, company, or agency, all those involved should be notified of the collection as it happens, and told by whom it is being collected. That includes the NSA, a government, or a corporation that intends to use this data either to sell you more products or for later use by its artificial intelligence data-scanning tools.

Likewise, the user should be notified when cookies are being used in Internet searches, and told what benefits they will get: for instance, search features that track customer inquiries and bring back additional relevant results, which might be to your liking. Most online shopping sites do this, and there was a very nice exposé on this in the Wall Street Journal recently.

Another digression, if you will, and this one is to raise a pertinent point: if the government or a company collects the information, the user ought to know why, and who will be given access to this information in the future. So let’s talk about that, shall we? I thought you might like this side topic. Good for you; it shows you also care about these things.

As to that question, one theory is to use a system that allows certain trusted sources in government, or corporations you do business with, to see some data, but never to look without being seen. You will then know which government agencies and which corporations are looking at your data. There will be transparency, and at that point there would have to be a justification for looking. Otherwise, folks would most likely have a fit, and then the media would have a proverbial field day with the intrusion.

Now then, one recent report from the government asks the pointed question: “How do we define the purpose for which the data will be used?”

Ah ha, another great question in this ongoing saga indeed. It almost sounds as if they too were among my concerned audience members, or even a colleague. Okay so, it is important not only to define the purpose of the data collection, but also to justify it, and the justification had better be good. Hey, I see you are all smiling now. Good, because it’s going to get a bit more serious on some of my next points.

Okay, and yes, this brings about many challenges. It is also important to note that there will ALWAYS be more outlets for the collected data as time goes on.
Therefore consider the consumer, investor, or citizen who allows their data to be stored for later use, whether for important issues such as national security, or for corporations to help the consumer (in this case you) with purchasing decisions, or for that company’s planning for inventory, labor, or future marketing (most likely marketing, again, to whom? Ha ha ha, yes, you are catching on: you).

Thus, shouldn’t you be involved at every step of the way? Ah, a resounding YES, I see, from our audience today, and I would have expected nothing less from you. As all this takes place, eventually you are going to figure out that this data is out of control and ends up everywhere. So, should you give away data easily?

No, and if it is that valuable, hold out for more. Then you will be rewarded for the data, which is yours, and which will be used on your behalf and potentially against you in some way in the future, even if only for additional marketing impressions on the websites you visit or as you walk down the hallway at the mall.

Let’s see a show of hands: who has seen Minority Report? Ah, most of you. If you haven’t, go see it, and you will understand what we are all saying up here, and what others are saying in the various panel discussions this weekend.

Now, you probably know this, but the very people who are working hard to protect your data are in fact the biggest purveyors of your information. That’s right, our government. And don’t get me wrong, I am not anti-government; I just want to keep it responsible, as much as is humanly possible. Consider, if you will, all the data you give to the government and how much of that public record is available to everyone else:

Tax forms to the IRS,

Marriage licenses,

Voter registration,

Selective Service card,

Property Taxes,

Business Licenses,

Etc.

The list is pretty long, and the more you do, the more information they have, and that means the more information is available everywhere. About whom? You, that’s who! Good, I am glad we are all clear on that one. Yes, indeed, all sorts of information is available at the county records office, through the IRS, or with various branches of OUR government. This is one reason we should all take notice of the future of privacy issues. Often our government, though it could be any first-world government, claims it is protecting your privacy, yet it has been the biggest purveyor of our personal and private data throughout American history. Thus, there will be a bit of a problem if consumers, taxpayers, or citizens no longer trust the government because it gives away such things as:

Date of birth,

Social Security number,

Driver’s license,

Driving record,

Taxable information,

Etc., on and on.

And let’s not kid ourselves here: all this data is available on anyone. It’s all on the web. Much of it can be had for free, some costs a little, never very much, and believe me, there is a treasure trove of data on each one of us online. And that’s before we look into all the other information being collected now.

Now then, here is one solution for the digital data realm, including smartphone communication data. Perhaps we can control and monitor the packet flow of information, whereby every packet of information is tagged, and those looking at the data are also tagged, with no exceptions. Then if someone in a government bureaucracy is looking at something they shouldn’t be looking at, they will be tagged as a person who looked at the data.

Remember the big to-do about someone going through Joe the Plumber’s records in Ohio, or someone trying to release sealed documents on President Bush’s 1976 DUI, or the fit of rage by Sarah Palin when someone hacked her Yahoo Mail account, or when someone at a Hawaii hospital was rummaging through Barack Obama’s birth records?

We need to know who is looking at the data, and their reason had better be good; the person giving the data has a right to know. It is just like the “right-to-know” laws that apply to companies with hazardous chemicals on the property.

Let me speak on another point: border security. You see, we need to know both what is coming and what is going if we are to have secure borders. One thing they found with our border security is that it is very important to monitor not only what comes over the border, but also what goes back over the border the other way.
This is how authorities have been able to catch drug runners: they catch the underground economy’s cash moving back to Mexico, and by holding those individuals, they find out whom they work for. Just like border traffic, our information goes both ways. If we can monitor both directions, it keeps you happier and our data safer.

Another question is: how do we know the purpose for which data is being collected, and how can the consumer or citizen be sure that mass data releases will not occur? They have occurred in almost every agency. Usually the citizens are notified that their data was released or that the database containing their information was breached, but that’s after the fact, and it just proves that data is like water: it’s hard to contain. Information wants to be free, and it will always find a way to leak out, especially when it’s in the midst of humans.

Okay, I see my time is running short here, so let me wrap up and drive home a couple of main points, then I’ll open it up for questions, of which I don’t doubt there will be many. That’s good; it means you’ve been paying attention here today.

It appears that we need to collect data for national security purposes, for research and planning, and for future upgrades of IT systems. For IT system upgrades, all you really need to know about is the bulk transfers of data and the times at which that data flows, and therefore that data can be anonymized.

For national security and its research, the data will have anomalies in it, and anomalies are a problem because they can produce false positives. To get it right, analysts have to continually refine it all.
And although this may not sit well with most folks, we can nevertheless find criminals this way: spies, terrorist cells, or those who work to undermine our system and the stability of our nation.

With regard to government and the collection of data, we must understand that there are bad humans in the world, and that many of those who seek power may not be good people. Since information is power, you can see the problem: that information will be used to help them promote their own agenda and rise in power, and that undermines the trust that all the individuals in our society and civilization place in the system.

On the corporate front, they are going to try to collect as much data on you as they can, and they’ve already started. After all, that’s what the grocery stores are doing with their rewards programs, if you hadn’t noticed. They will never use all the information they collect, but they may sell it to third-party affiliates, partners, or vendors, so that’s at issue. Regulation will be needed in this regard. The consumer should also have choices, but ought to be wise about those choices; those who choose to give away personal information should know the risks, rewards, consequences, and challenges ahead.

Indeed, I thank you very much, and be sure to pick up a handout on your way out, if you didn’t already get one, from Sherry at the door. Thanks again, and let’s take a 5-minute break and then head into the question and answer session, deal?

References:

“Privacy on the Line – Politics of Wiretapping,” by Whitfield Diffie and Susan Landau, MIT Press, MA, (1999), ISBN: 0-262-04167-7.

Wall Street Journal; “On the Web’s Cutting Edge,” by Julia Angwin.

Wall Street Journal; “The Great Privacy Debate,” by the Cato Institute and N. Carr, Saturday, August 7-8, 2010.
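
A postscript for the technically minded: the idea from the talk that nobody should be able to look at your data without being seen can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any real system. The names `AuditedStore`, `AccessEvent`, `read`, and `who_looked` are all hypothetical, and a real deployment would also need authentication and tamper-proof logs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessEvent:
    """One tagged look at a record: who looked, at what, why, and when."""
    viewer: str
    record_id: str
    purpose: str
    timestamp: datetime


@dataclass
class AuditedStore:
    """Hypothetical store in which no successful read happens without a tag."""
    _records: dict = field(default_factory=dict)
    _log: list = field(default_factory=list)

    def put(self, record_id: str, owner: str, data: dict) -> None:
        """Store a record together with the person it belongs to."""
        self._records[record_id] = {"owner": owner, "data": data}

    def read(self, viewer: str, record_id: str, purpose: str) -> dict:
        """Return a copy of the record, tagging the viewer first."""
        # A stated purpose is mandatory (assumption: free text is enough here).
        if not purpose.strip():
            raise PermissionError("a purpose is required to view this record")
        # Tag the viewer before handing anything over.
        self._log.append(
            AccessEvent(viewer, record_id, purpose, datetime.now(timezone.utc))
        )
        return dict(self._records[record_id]["data"])

    def who_looked(self, owner: str) -> list:
        """Right-to-know view: every access to records belonging to `owner`."""
        owned = {rid for rid, rec in self._records.items() if rec["owner"] == owner}
        return [event for event in self._log if event.record_id in owned]


store = AuditedStore()
store.put("dmv-123", owner="alice", data={"license": "X1"})
store.read("clerk_7", "dmv-123", purpose="license renewal")
for event in store.who_looked("alice"):
    print(event.viewer, "looked at", event.record_id, "for:", event.purpose)
```

The design choice worth noting is that the tag is written before the data is returned, so there is no code path that hands over a record without leaving a trace the owner can later inspect.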

What Is Data Lifecycle Management?

The data lifecycle goes through 5 steps: creation, usage, transport, storage, and destruction. Most companies have parts of this lifecycle under control, but that still leaves plenty of gaps in the control measures through which a threat could affect the data. Data lifecycle management (DLM) is a policy- and procedure-based approach to managing information movement. Data has to be classified and evaluated so that it can be protected with the right resources. Ownership is a key factor in managing and maintaining data throughout the lifecycle.

The 5 Steps

1. Creation – How does data creation get managed?
2. Usage – What limitations are on data usage?
3. Storage – What controls are in place for storage?
4. Transportation – How is data transmitted between company, customers, and business partners?
5. Destruction – What is the validation and verification process over data destruction?

The Data Management Problem

· Weak processes in place to track creation, usage, transportation, storage, and destruction
· Weak ability to monitor and manage a customer record throughout the lifecycle
· Inconsistent processes across each phase of data movement
· Lack of enforcement capabilities

What should be the goal of data lifecycle management?

· Provide practical steps to manage each step of the customer record management process
· Provide a cost-effective solution for risk mitigation
· Provide a framework for data management
· Reduce the risk of data loss

Challenges to Customer Data Records Management

· Rarely does a company have a centralized process to track controls over data, management processes around data, logging and monitoring, and removal
· Organizations rely on technology to secure data, rather than on processes that drive technology purchases
· The 5 steps of data management are not followed by all functional groups in a company
· No clear ownership and classification of customer data elements

Did you know…

· 1 in 400 emails contains confidential information
· 1 in 50 network files contains confidential data
· 4 out of 5 companies have lost confidential data when a laptop was lost
· 1 in 2 USB drives contains confidential information
· Companies that incur a data breach experience a significant increase in customer turnover, as much as 11%
· Over 35 states have enacted security breach notification laws
· Can openers were invented 48 years after cans

Why does traditional security not work for DLM?

Users have risky behavior. They will always have risky behavior, and we rely mostly on technology controls to keep them in a secure box. Solutions are aimed at external threats coming in, not at the regulation and governance of internal communications going out. Problems we see are typically:

· Unauthorized application use: 70% of IT professionals say the use of unauthorized programs results in as many as half of data loss incidents.
· Misuse of corporate computers: 44% of employees share work devices with others without supervision.
· Unauthorized access: 39% of IT professionals said they have dealt with an employee accessing unauthorized parts of a company’s network or facility.
· Remote worker security: 46% of employees transfer files between work and personal computers.
· Misuse of passwords: 18% of employees share passwords with co-workers.

The reasons typical technology controls will not work in the full DLM process are:

· Products are not geared to protect the full lifecycle of a customer record
· Most solutions and processes are outward facing, based on perimeter security
· Encryption can affect data management
· Real-time intrusion detection and remediation is rare
· Context and intent of messages are not analyzed properly
· Functional areas in organizations create different policies, monitoring requirements, enforcement priorities, and reporting
· New technologies can avoid security measures
· Technologies look at the network, the operating system, or the application, not the data across all environments
· Controls are not mapped properly to regulations

What risks does customer data loss pose for organizations?

If we know that security is not working, what are the risks we face? A very recent example of how this can have a practical effect is the Massachusetts privacy law, 201 CMR 17.00. Loss of data can have a great financial impact under this law. Key things we need to consider include:

· Penalties: Not complying with regulations can bring civil and financial penalties
· Confidence: Loss of customer confidence after a customer data breach loses customers
· Reputation: Damage to reputation will lose customers and damage relationships
· Competitive Advantage: Information and customers can move to competitors
· Costs: Average $6.6 million per breach
· Valuation: Decreased stock prices could result
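
One way to make the five steps concrete is to track, per customer record, which controls exist at each step and report the steps that have none. The Python sketch below is a minimal illustration under stated assumptions, not a reference implementation of DLM. The names `Stage`, `CustomerRecord`, `add_control`, and `gaps`, along with the example control names, are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """The five lifecycle steps, in the order the list above gives them."""
    CREATION = "creation"
    USAGE = "usage"
    STORAGE = "storage"
    TRANSPORTATION = "transportation"
    DESTRUCTION = "destruction"


@dataclass
class CustomerRecord:
    """A customer record with an accountable owner and a classification."""
    record_id: str
    owner: str            # who is accountable for this data
    classification: str   # for example "confidential" (illustrative label)
    controls: dict = field(default_factory=dict)  # Stage -> list of control names

    def add_control(self, stage: Stage, control: str) -> None:
        """Register a control that protects this record at the given stage."""
        self.controls.setdefault(stage, []).append(control)

    def gaps(self) -> list:
        """Return the stages that have no control recorded at all."""
        return [stage for stage in Stage if not self.controls.get(stage)]


record = CustomerRecord("cust-001", owner="billing-team", classification="confidential")
record.add_control(Stage.STORAGE, "disk encryption")
record.add_control(Stage.TRANSPORTATION, "TLS for partner transfers")
for stage in record.gaps():
    print("no control recorded for:", stage.value)
```

Even a table this simple addresses two of the challenges listed above: it forces an owner and a classification onto every record, and it shows at a glance where the lifecycle is uncovered.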

How Can We Ensure the Accuracy of Data Mining – While Anonymizing the Data?

Okay so, the topic of this question is meaningful, and it was recently asked in a government publication on Internet privacy, smartphone personal data, and online social network security features. And indeed, it is a good question, in that we need the bulk raw data for many things, such as planning IT backbone infrastructure, allotting communication frequencies, tracking flu pandemics, chasing cancer clusters, and national security, and on and on. This data is very important.

Still, the question remains: “How can we ensure the accuracy of data mining while anonymizing the data?” Well, if you don’t collect any data in the first place, you know that what you’ve collected is accurate, right? No data collected = no errors! But that’s not exactly what everyone has in mind, of course. Now then, if you don’t have sources for the data points, and if all the data is anonymized in advance because of the use of screen names in social networks, then none of the data can be taken as accurate on trust.

Okay, but that doesn’t mean some of the data isn’t correct, right? And if you know the percentage of data you cannot trust, you can get better results. How about an example: during Barack Obama’s campaign there were numerous polls in the media, and many of the online polls showed a larger, landslide-like margin, which never materialized in the actual election. Why? Simple: there were folks gaming the system, and the younger online crowd was participating in greater abundance.

Back to the topic: perhaps what’s needed is for a source that is less qualified as trusted to be sidelined and identified as a question mark, with its information counted within, or added to, the margin of error.
And if a piece of data appears to be fake, a number can be placed next to it as an identifier, and that piece can then be deleted when doing the data mining.

Perhaps a subsystem could allow for tracing and tracking, but only at the national security level, which could take the information all the way down to the individual ISP and actual user identification. And if data was found to be false, it could merely be red-flagged as unreliable.

The reality is you can’t trust sources online, or any of the information that you see online, just as you cannot trust word for word the information in the newspapers. It is said that 95% of all intelligence gathered is junk; the trick is to sift through it and find the 5% that is reality based, and to realize that even the misinformation often has clues.

Thus, if the questionable data is flagged prior to anonymizing the data, then you can widen your margin of error without ever having the actual identification of any one piece of data in the whole bulk of the database or data mine. Margins of error are often cut short to purport better accuracy, usually to the detriment of the information or of the conclusions, solutions, or decisions made from that data.

And then there is the fudge factor, when you are collecting data to prove yourself right. Okay, let’s talk about that, shall we? You really can’t trust data as unbiased if the dissemination, collection, processing, and accounting were done by a human being. Likewise, we also know we cannot trust government data or projections.

Consider, if you will, the problems with trusting the OMB numbers and economic data on the financial bill, or the cost of the ObamaCare healthcare bill. Other economic data has also been known to be false, and even the bank stress tests in China, the EU, and the United States are questionable. For instance, consumer and investor confidence is very important; therefore false data is often put out, or real data is manipulated before it’s put before the public.
Hey, I am not an anti-government guy, and I realize we need the bureaucracy for some things, but I am wise enough to realize that humans run the government, there is a lot of power involved, and humans like to retain and get more of that power. We can expect that.

And we can expect folks purporting information under fake screen names or pen names to be less than trustworthy as well; that’s all I am saying here. Look, it’s not just the government. Corporations do it too as they attempt to put a good spin on their quarterly earnings and balance sheets, move assets around, or give forward-looking projections.

Even when we look at the data from the Fed’s Beige Book, we could say that most of it is hearsay, because generally the Fed’s district banks do not indicate exactly which of their clients, customers, or friends in industry gave them which pieces of information. Thus we don’t know what we can trust, and we must assume we can’t trust any of it unless we can identify the source prior to its inclusion in the research, report, or mined data query.

This is nothing new. It’s the same for all information, whether we read it in the newspaper or our intelligence industry learns of new details. Check sources. If we don’t check the sources in advance, the correct thing to do is to increase the probability that the information is incorrect, and the margin of error. At some point that margin ends up going hyperbolic on you, and you need to throw the whole thing out, but then I ask: why collect it in the first place?

Ah hell, this is all just philosophy on the accuracy of data mining. Grab yourself a cup of coffee, think about it, and email your comments and questions.
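
For readers who want something to chew on with that coffee, here is a rough Python sketch of the flag-before-anonymizing idea for a simple yes/no poll. It rests on loud assumptions: the `verified` field, the class names, and the functions `anonymize` and `estimate` are hypothetical, and the widening rule is deliberately crude. It adds the usual sampling margin for a proportion to the worst-case shift that would occur if every questionable answer were wrong, which is one conservative choice among many rather than an established standard.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RawResponse:
    user_id: str      # identifying; must not survive anonymization
    answer: bool      # a yes/no poll answer
    verified: bool    # hypothetical: whether the source could be checked


@dataclass(frozen=True)
class AnonResponse:
    answer: bool
    questionable: bool  # flag set before the identity was dropped


def anonymize(raw):
    """Flag unverified sources first, then discard the identity."""
    return [AnonResponse(r.answer, questionable=not r.verified) for r in raw]


def estimate(responses, z=1.96):
    """Return (share of yes answers, widened margin of error)."""
    n = len(responses)
    if n == 0:
        raise ValueError("no responses")
    p = sum(r.answer for r in responses) / n
    # Standard sampling margin for a proportion at roughly 95% confidence.
    sampling = z * math.sqrt(p * (1 - p) / n)
    # Crude worst case: every questionable answer is wrong, which can move
    # the share of yes answers by at most (questionable count) / n.
    flagged_share = sum(r.questionable for r in responses) / n
    return p, sampling + flagged_share


raw = [RawResponse(f"user{i}", answer=(i % 5 != 0), verified=(i % 10 != 3))
       for i in range(200)]
share, margin = estimate(anonymize(raw))
print(f"yes share {share:.2f}, plus or minus {margin:.2f}")
```

The point of the sketch is the ordering: the flag is computed while the source is still known, and only the flag, never the identity, travels with the data. The more questionable records there are, the wider the margin gets, until at some point the honest conclusion is that the data set tells you nothing.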