使用者工具

網站工具


notes_and_translations:opendata-related_essay_translations:2010-8-11-ten_principles_for_opening_up_government_information_by_john_wonderlich_at_sunlight_foundation

開放政府資訊的十項重要原則

本文件翻譯自美國日光基金會 (SUNLIGHT FOUNDATION) 政策團隊成員John Wonderlich所編撰之「Ten Principles for Opening Up Government Information」專文:http://sunlightfoundation.com/policy/documents/ten-open-data-principles/,翻譯者不主張譯文之任何權利,且聲明唯有原英文版本,方為具正式準據效力之文本。

原文發表日期:2010 年 8 月 11 日 撰文者:美國日光基金會 Mr. John Wonderlich

在 2007 年 10 月的時候,30 名開放政府運動的鼓吹者齊聚在美國加州的Sebastopol市進行會面,討論如何促進政府能夠開放電子式儲存的政府資料,以提供出來供公眾使用。為了達到上述的目標,美國聯邦政府與各州政府已經將部份資料提供出來給公眾,然而這些先期提供的資料,常常不具有連貫性與完整性,反而更引起鼓吹者希冀改善資料質與量的期待。在 2007 年的這次會議裡,在 Carl Malamud 與 Tim O'Reilly 的領導,並經日光基金會的撥款同意支持下,與會者整理出來了 8 項開放政府資訊的重要原則,如果這些原則能夠被具體實踐,那麼在政府將其持有資料進行公開化的過程裡,將能達到更高幅度的公眾參與效益。

目前我們已經將 2007 年 Sebastopol 會議本來的 8 項原則擴充為 10 項原則,這 10 項原則提供一個檢視的標準,可以用來評估個別的政府資料釋出,是不是到達公眾開放性與公眾可及性的範圍。此份原則清單並不能說是鉅細靡遺、無所遺漏,但每一項原則都帶有其相關的說明,以和「開放性 (openness)」的主軸目標間具有連續性的連結。這些原則分別是:完整性、主要性、及時性、實體與電子可及性、機器可讀性、非差別待遇性、開放標準性、授權妥適性、永久性,以及成本合理性。

1. Completeness

Datasets released by the government should be as complete as possible, reflecting the entirety of what is recorded about a particular subject. All raw information from a dataset should be released to the public, except to the extent necessary to comply with federal law regarding the release of personally identifiable information. Metadata that defines and explains the raw data should be included as well, along with formulas and explanations for how derived data was calculated. Doing so will permit users to understand the scope of information available and examine each data item at the greatest possible level of detail.

2. Primacy

Datasets released by the government should be primary source data. This includes the original information collected by the government, details on how the data was collected and the original source documents recording the collection of the data. Public dissemination will allow users to verify that information was collected properly and recorded accurately.

3. Timeliness

Datasets released by the government should be available to the public in a timely fashion. Whenever feasible, information collected by the government should be released as quickly as it is gathered and collected. Priority should be given to data whose utility is time sensitive. Real-time information updates would maximize the utility the public can obtain from this information.

4. Ease of Physical and Electronic Access

Datasets released by the government should as accessible as possible, with accessibility defined as the ease with which information can be obtained, whether through physical or electronic means. Barriers to physical access include requirements to visit a particular office in person or requirements to comply with particular procedures (such as completing forms or submitting FOIA requests). Barriers to automated electronic access include making data accessible only via submitted forms or systems that require browser-oriented technologies (e.g., Flash, Javascript, cookies or Java applets). By contrast, providing an interface for users to download all of the information stored in a database at once (known as “bulk” access) and the means to make specific calls for data through an Application Programming Interface (API) make data much more readily accessible. (An aspect of this is “findability,” which is the ability to easily locate and download content.)

5. Machine readability

Machines can handle certain kinds of inputs much better than others. For example, handwritten notes on paper are very difficult for machines to process. Scanning text via Optical Character Recognition (OCR) results in many matching and formatting errors. Information shared in the widely-used PDF format, for example, is very difficult for machines to parse. Thus, information should be stored in widely-used file formats that easily lend themselves to machine processing. (When other factors necessitate the use of difficult-to-parse formats, data should also be available in machine-friendly formats.) These files should be accompanied by documentation related to the format and how to use it in relation to the data.

6. Non-discrimination

“Non-discrimination” refers to who can access data and how they must do so. Barriers to use of data can include registration or membership requirements. Another barrier is the uses of “walled garden,” which is when only some applications are allowed access to data. At its broadest, non-discriminatory access to data means that any person can access the data at any time without having to identify him/herself or provide any justification for doing so.

7. Use of Commonly Owned Standards

Commonly owned (or “open”) standards refers to who owns the format in which data is stored. For example, if only one company manufactures the program that can read a file where data is stored, access to that information is dependent upon use of the company's processing program. Sometimes that program is unavailable to the public at any cost, or is available, but for a fee. For example, Microsoft Excel is a fairly commonly-used spreadsheet program which costs money to use. Freely available alternative formats often exist by which stored data can be accessed without the need for a software license. Removing this cost makes the data available to a wider pool of potential users.

8. Licensing

The imposition of “Terms of Service,” attribution requirements, restrictions on dissemination and so on acts as barriers to public use of data. Maximal openness includes clearly labeling public information as a work of the government and available without restrictions on use as part of the public domain.

9. Permanence

The capability of finding information over time is referred to as permanence. Information released by the government online should be sticky: It should be available online in archives in perpetuity. Often times, information is updated, changed or removed without any indication that an alteration has been made. Or, it is made available as a stream of data, but not archived anywhere. For best use by the public, information made available online should remain online, with appropriate version-tracking and archiving over time.

10. Usage Costs

One of the greatest barriers to access to ostensibly publicly-available information is the cost imposed on the public for access–even when the cost is de minimus. Governments use a number of bases for charging the public for access to their own documents: the costs of creating the information; a cost-recovery basis (cost to produce the information divided by the expected number of purchasers); the cost to retrieve information; a per page or per inquiry cost; processing cost; the cost of duplication etc.

Most government information is collected for governmental purposes, and the existence of user fees has little to no effect on whether the government gathers the data in the first place. Imposing fees for access skews the pool of who is willing (or able) to access information. It also may preclude transformative uses of the data that in turn generates business growth and tax revenues.


Ten Principles for Opening Up Government Information

August 11, 2010 by John Wonderlich at SUNLIGHT FOUNDATION policy team

In October 2007, 30 open government advocates met in Sebastopol, California to discuss how government could open up electronically-stored government data for public use. Up until that point, the federal and state governments had made some data available to the public, usually inconsistently and incompletely, which had whetted the advocates' appetites for more and better data. The conference, led by Carl Malamud and Tim O'Reilly and funded by a grant from the Sunlight Foundation, resulted in eight principles that, if implemented, would empower the public's use of government-held data.

We have updated and expanded upon the Sebastopol list and identified ten principles that provide a lens to evaluate the extent to which government data is open and accessible to the public. The list is not exhaustive, and each principle exists along a continuum of openness. The principles are completeness, primacy, timeliness, ease of physical and electronic access, machine readability, non-discrimination, use of commonly owned standards, licensing, permanence and usage costs.

1. Completeness

Datasets released by the government should be as complete as possible, reflecting the entirety of what is recorded about a particular subject. All raw information from a dataset should be released to the public, except to the extent necessary to comply with federal law regarding the release of personally identifiable information. Metadata that defines and explains the raw data should be included as well, along with formulas and explanations for how derived data was calculated. Doing so will permit users to understand the scope of information available and examine each data item at the greatest possible level of detail.

2. Primacy

Datasets released by the government should be primary source data. This includes the original information collected by the government, details on how the data was collected and the original source documents recording the collection of the data. Public dissemination will allow users to verify that information was collected properly and recorded accurately.

3. Timeliness

Datasets released by the government should be available to the public in a timely fashion. Whenever feasible, information collected by the government should be released as quickly as it is gathered and collected. Priority should be given to data whose utility is time sensitive. Real-time information updates would maximize the utility the public can obtain from this information.

4. Ease of Physical and Electronic Access

Datasets released by the government should as accessible as possible, with accessibility defined as the ease with which information can be obtained, whether through physical or electronic means. Barriers to physical access include requirements to visit a particular office in person or requirements to comply with particular procedures (such as completing forms or submitting FOIA requests). Barriers to automated electronic access include making data accessible only via submitted forms or systems that require browser-oriented technologies (e.g., Flash, Javascript, cookies or Java applets). By contrast, providing an interface for users to download all of the information stored in a database at once (known as “bulk” access) and the means to make specific calls for data through an Application Programming Interface (API) make data much more readily accessible. (An aspect of this is “findability,” which is the ability to easily locate and download content.)

5. Machine readability

Machines can handle certain kinds of inputs much better than others. For example, handwritten notes on paper are very difficult for machines to process. Scanning text via Optical Character Recognition (OCR) results in many matching and formatting errors. Information shared in the widely-used PDF format, for example, is very difficult for machines to parse. Thus, information should be stored in widely-used file formats that easily lend themselves to machine processing. (When other factors necessitate the use of difficult-to-parse formats, data should also be available in machine-friendly formats.) These files should be accompanied by documentation related to the format and how to use it in relation to the data.

6. Non-discrimination

“Non-discrimination” refers to who can access data and how they must do so. Barriers to use of data can include registration or membership requirements. Another barrier is the uses of “walled garden,” which is when only some applications are allowed access to data. At its broadest, non-discriminatory access to data means that any person can access the data at any time without having to identify him/herself or provide any justification for doing so.

7. Use of Commonly Owned Standards

Commonly owned (or “open”) standards refers to who owns the format in which data is stored. For example, if only one company manufactures the program that can read a file where data is stored, access to that information is dependent upon use of the company's processing program. Sometimes that program is unavailable to the public at any cost, or is available, but for a fee. For example, Microsoft Excel is a fairly commonly-used spreadsheet program which costs money to use. Freely available alternative formats often exist by which stored data can be accessed without the need for a software license. Removing this cost makes the data available to a wider pool of potential users.

8. Licensing

The imposition of “Terms of Service,” attribution requirements, restrictions on dissemination and so on acts as barriers to public use of data. Maximal openness includes clearly labeling public information as a work of the government and available without restrictions on use as part of the public domain.

9. Permanence

The capability of finding information over time is referred to as permanence. Information released by the government online should be sticky: It should be available online in archives in perpetuity. Often times, information is updated, changed or removed without any indication that an alteration has been made. Or, it is made available as a stream of data, but not archived anywhere. For best use by the public, information made available online should remain online, with appropriate version-tracking and archiving over time.

10. Usage Costs

One of the greatest barriers to access to ostensibly publicly-available information is the cost imposed on the public for access–even when the cost is de minimus. Governments use a number of bases for charging the public for access to their own documents: the costs of creating the information; a cost-recovery basis (cost to produce the information divided by the expected number of purchasers); the cost to retrieve information; a per page or per inquiry cost; processing cost; the cost of duplication etc.

Most government information is collected for governmental purposes, and the existence of user fees has little to no effect on whether the government gathers the data in the first place. Imposing fees for access skews the pool of who is willing (or able) to access information. It also may preclude transformative uses of the data that in turn generates business growth and tax revenues.

notes_and_translations/opendata-related_essay_translations/2010-8-11-ten_principles_for_opening_up_government_information_by_john_wonderlich_at_sunlight_foundation.txt · 上一次變更: 2019/01/16 04:45 (外部編輯)