What Data Can I Publish? A Guide to Data Classification Policy

Finding what data you can publish should start with the question “Do we have a data classification policy within our organization?” This article introduces the Data Classification Policy, along with standards such as PCI, FERPA and HIPAA and the protection of Personally Identifiable Information (PII).

Brand content manager, Opendatasoft

More Public Sector Agencies (PSAs) are publishing data. That is a good thing, but it raises questions about what data may be published as open data. The example I am going to share comes from North Carolina. Recently, one of our city customers asked me, “What data can I publish?” The answer I gave is outlined in this blog post. I first explain how to use a Data Classification Policy as a roadmap for deciding what data to publish, and then look at what data can be published under standards such as PCI, FERPA and HIPAA, and under rules protecting Personally Identifiable Information (PII).

Every state and every PSA has its own set of policies and laws. In addition to these, there are Federal guidelines as to what constitutes data that cannot be published. I am most familiar with the rules in North Carolina, which is why I use that state for context. However, every state publishes a statute about records disclosures, and I urge you to read the statute that applies to you.

Take this content as advice from the field and not a legal opinion.

Data Classification Policy

Data Classification Policies should exist within every PSA that is charged with publishing data in any form. A Data Classification Policy divides data based on the sensitivity of the data. The more sensitive the data, the less likely it is to be a viable candidate for publication. If your IT business unit does not have a Data Classification Policy, start there. Get one in writing. If you need examples, Google the term “Data Classification Policy.” Use the quotes. Here is one I found from Columbia University that is pretty good.

How Data Classification Policies Work

Within a public sector agency, all data must be assigned one data classification and sensitivity level. These classifications define the degree to which data is to be protected. Data must be classified regardless of whether it is stored or transmitted electronically, held in hard copy, or stored on any other medium.

Data Stewards, Stakeholders and Business System Owner(s) will be responsible for determining the data classification and sensitivity level for data within their systems or under their custodianship. New data must be classified prior to creation. If this has not happened, a data audit should be conducted and a data registry created.

Data Classification Levels

The data classification levels defined in this policy are based on the impact to the PSA in the event that data is disclosed, altered or destroyed without proper authorization. The classifications are as follows: Prohibited, Highly Confidential, Confidential, Internal, Public.
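As a minimal sketch, the five levels can be modeled as an ordered type so that sensitivity comparisons are explicit. The names `Classification`, `is_open_data_candidate` and `stricter` are hypothetical, and the rule that only Public data is a publication candidate is an assumption for illustration; your own policy may draw the line elsewhere.

```python
from enum import IntEnum


class Classification(IntEnum):
    """The five levels named above, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    HIGHLY_CONFIDENTIAL = 3
    PROHIBITED = 4


def is_open_data_candidate(level: Classification) -> bool:
    # Assumption: only data classified as Public is considered for release.
    return level == Classification.PUBLIC


def stricter(a: Classification, b: Classification) -> Classification:
    # When two sources disagree, keep the more sensitive classification.
    return max(a, b)
```

Ordering the levels numerically makes it straightforward to apply the more restrictive classification when a dataset combines fields of different sensitivity.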

Laws and Statutes Change: Use Data Audits to Mitigate Risk

Data audits are pretty common. They should be seen as a routine part of any open data program. The results become a dataset themselves. However, audits are usually not part of a data lifecycle. Having a data lifecycle and managing it can help your PSA in several ways:

  • Laws change: be ready to reclassify your data
  • Your Open Data will be seen as a primary asset
  • The data becomes inherently more reusable, thereby increasing its value
  • Greater confidence in the quality of the data and in the process of releasing data will drive more agencies to participate

These same data stewards, stakeholders and business owners will be responsible for reviewing the classifications assigned to data at least once every three years. Data will be reclassified based on changing usage, sensitivities, regulations or legislation. Data currently classified as public record may be elevated to confidential with the passing of new state or federal laws.
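The three-year review cycle can be tracked with a small date check. This is a sketch only: the function names are hypothetical, and it assumes the registry records the date each classification was last reviewed.

```python
from datetime import date

REVIEW_INTERVAL_YEARS = 3  # "at least once every three years"


def next_review_due(last_reviewed: date) -> date:
    """Return the latest date by which the classification must be reviewed."""
    target_year = last_reviewed.year + REVIEW_INTERVAL_YEARS
    try:
        return last_reviewed.replace(year=target_year)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; use Feb 28 instead.
        return last_reviewed.replace(year=target_year, day=28)


def is_review_overdue(last_reviewed: date, today: date) -> bool:
    """True when the review deadline has already passed."""
    return today > next_review_due(last_reviewed)
```

For example, `is_review_overdue(date(2020, 1, 1), date(2023, 1, 2))` returns `True`. A change in law should still trigger a review immediately, whatever the schedule says.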

System Security Administrator(s) and/or Data Custodian(s) will take the appropriate steps to implement and protect the data according to its classification level. The data classification levels outlined in this document apply to the development and implementation of information security controls to protect PSA data.

This post does not address confidentiality as it relates to the release of PSA information under NC Public Records law. Chapter 132 of the NC General Statutes handles most of the state legal requirements on data retention and release.

Do you want to know what your State and Federal Records Laws contain? Check out this resource from MuckRock.

Data produced at the state and local level must comply with the rules set at each of the following levels:

  • Municipal PSA policies, laws and ordinances
  • County PSA policies, laws and ordinances
  • State PSA policies, laws and ordinances
  • Federal PSA policies, laws and ordinances

Examples of Restricted Access Data Types

Data should be classified as Prohibited, Highly Confidential or Confidential when its unauthorized disclosure, modification or destruction would result in significant financial loss to the PSA, impair its ability to conduct business, or violate federal or state laws, contractual agreements or government regulations, including, but not limited to, the Family Educational Rights and Privacy Act (FERPA), the Health Insurance Portability and Accountability Act of 1996 (HIPAA), and Payment Card Industry (PCI) regulations.

Examples of data that should be labeled Prohibited, Highly Confidential or Confidential include, but are not limited to:

  • Social security numbers (SSNs)
  • Credit card information
  • Banking information
  • PSA trade secrets
  • Information security related assessments
  • HIPAA information
  • SIMS/TIMS Student record data
  • FERPA related data
  • In North Carolina, personnel records have a high degree of sensitivity
  • In North Carolina, settlements in lawsuits between individuals and the state are prohibited from release
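A first pass over a dataset's column names can flag fields that resemble the restricted types listed above. This is a rough sketch, not a substitute for review by a data steward: the keyword list and the function name `flag_restricted_columns` are hypothetical, and substring matching will produce both false positives and misses.

```python
# Hypothetical keyword hints loosely based on the restricted data types above.
RESTRICTED_HINTS = (
    "ssn",
    "social_security",
    "credit_card",
    "card_number",
    "bank",
    "account_number",
    "student",
)


def flag_restricted_columns(columns):
    """Return the column names that contain any restricted keyword hint."""
    flagged = []
    for name in columns:
        # Normalize so that "Credit Card" and "credit-card" both match.
        normalized = name.lower().replace(" ", "_").replace("-", "_")
        if any(hint in normalized for hint in RESTRICTED_HINTS):
            flagged.append(name)
    return flagged
```

Calling `flag_restricted_columns(["Permit ID", "Applicant SSN", "Credit Card Number", "Issue Date"])` returns `["Applicant SSN", "Credit Card Number"]`. Flagged columns are prompts for human review, and an unflagged column is not proof that the data is safe to publish.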

Public Data Types

Data should be classified as Public when the unauthorized disclosure, modification or destruction would not cause any adverse impact to the PSA’s reputation or cause financial loss. All data not identified as Prohibited, Highly Confidential, Confidential or Internal will be classified as Public.

The basic rule of thumb: if the data contains a public individual identifier, it should be reviewed in the DAMA context. There are some exceptions. For example, building and commercial permit data often contains the names of individuals “doing business as”. If you request a building permit in North Carolina, you waive your right to privacy about that permit.

Ultimately, someone in the organization is responsible for data that is made publicly available. These people go by a number of titles. Their job is to protect the data and ensure sound data governance. It is important to remember that they are stewards, rather than owners. The public, in paying taxes to the PSA, are the true owners of the data. Data classification and security are everyone’s job. Everyone’s job also includes collaborating to make as much data as possible available for reuse within statutory, policy and program guidelines.

Data security and access are part of the Data Management Body of Knowledge (DMBOK) from the Global Data Management Community (DAMA). DAMA is the organization charged with defining sound data governance policies, and the DMBOK is its handbook for helping groups implement them.

Business System Owner

The person(s) who is responsible for the functions, data and security of the system. Business system owners retain decision rights for the business use of the system.

System Security Administrator

The person(s) responsible for provisioning access to the requested system and supporting resources.

Data Custodian(s)

The person(s) responsible for controlling and granting access to an organization’s electronic files while protecting the data as defined by the organization’s standard IT practices.

At the end of the day, following sound data security and access practices is common sense. DAMA Data Security Management is the 7th Pillar of sound data governance. Finding what you can publish should start with a discussion with your IT Department: “Do we have a data classification policy?” Most often the answer is no, unless you are a large PSA. A rule of thumb, from what I have observed: any city with over 100,000 residents should have a data classification policy, a data architect and a data governance committee.
