
Talk:MetaExtensions: Difference between revisions

From WHATWG Wiki
Line 57 (previous revision):

Being able to search for GPL, LGPL, CC, FreeBSD, etc. would be useful.  [[User:Mnewman|Mnewman]] 15:42, 12 February 2010 (UTC)

==Creator==

I am going to assume that this is the person who created the layout or CSS. I have seen the term "Designer" used, and I think that "Designer" is more meaningful. That said, if content should be separate from design, then there really should be a way to set the author of the CSS in the CSS itself and make that information available to web search engines. [[User:Mnewman|Mnewman]] 15:42, 12 February 2010 (UTC)

==Audience==

The audience tags need to be predefined, or else they are going to be meaningless. This is especially true for children's material. Also, there would need to be two levels for this, for example "reading-level" and "read-aloud". At what reading level is a site's content? At what level can the site's material be read aloud to a child? The two values can differ, especially for younger grades.

Second, what about graphics versus text? A website can be at an adult reading level but have pictures that are fine for a school project. An example would be a veterinarian's site about dogs. The content (i.e., the words) may be directed towards adults, but the pictures are just of animals, which would be fine for kids.

Get rid of the word "older". It is insulting to senior citizens.

I also question "lawyer/law client" and "patient/medical professional"; these are too specific. Maybe something more generic, such as "layman/student/professional": no previous knowledge assumed; learning the field with the intention of becoming a professional; professional.

There is too much material included in this one tag. It needs to be broken up.

What about the following?

* Reading Level
** Adult (similar to an adult section in a public library)
** Teen (teen section of a public library)
** Youth (youth section of the public library)
** Child (picture books)
** Guided readers (graded reading levels)

Anything that says "youth" (PG), "child" (G), or "guided reader" should never contain any content (pictures or text) that is not appropriate for children. If one wouldn't find the content in the children's section of one's local public library, then it is not at that level.

Teen: any topic that has this tag and deals with issues such as sex education or drug abuse should be explained at a level appropriate for that age group. If one would not realistically be able to find the material in the teen section of one's local public library, it does not belong (PG-13).

Now the only level left is adult. How does one distinguish between R, NC-17, and even more adult material? What about layman vs. professional?

[[User:Mnewman|Mnewman]] 15:42, 12 February 2010 (UTC)

==Filtering categories==

What about a way to easily specify the filtering category or categories? Yes, there is potential for abuse, but it can also be useful. For example, a website written by the American Academy of Pediatrics that is clearly designed to be about "sex education" and directed at teenagers could be defined as:

<meta name="author" content="Dr. Kid">
<meta name="publisher" content="American Academy of Pediatrics">
<meta name="audience" content="tween and teens">
<meta name="filtering" content="sex education">

These would need to be predefined categories that everyone agrees upon.
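The requirement that filtering categories be predefined could be enforced on the reading side by checking each value against an agreed vocabulary. This is a minimal sketch only: the comma-separated syntax and the vocabulary are hypothetical, since no such list was ever agreed on.

```python
# Hypothetical agreed vocabulary; a real list would have to be standardized.
FILTERING_CATEGORIES = {"sex education", "drug abuse", "violence"}


def parse_filtering(content):
    """Split a filtering meta value into (recognized, unrecognized) categories.

    Assumes comma-separated values (an assumption, not part of the proposal);
    matching is case-insensitive.
    """
    recognized, unrecognized = [], []
    for raw in content.split(","):
        value = raw.strip().lower()
        if not value:
            continue
        if value in FILTERING_CATEGORIES:
            recognized.append(value)
        else:
            unrecognized.append(value)
    return recognized, unrecognized


print(parse_filtering("sex education"))               # (['sex education'], [])
print(parse_filtering("Sex Education, dating tips"))  # (['sex education'], ['dating tips'])
```

Values outside the vocabulary are reported rather than silently dropped, so a filter can decide for itself how to treat them.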


==mirrors of other sites==

How about a way to specify that something is a mirror of another site? For example, a Wikipedia port:

<meta name="mirror" content="http://www.wikipedia.org; http://www.dictionary.com">

More than one could be specified if data comes from more than one site.

For example, if I want to block "wikipedia", what I really want to do is block Wikipedia and all mirrors of Wikipedia.

[[User:Mnewman|Mnewman]] 15:42, 12 February 2010 (UTC)
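Blocking a site together with its declared mirrors could work roughly as follows. This sketch assumes the semicolon-separated `mirror` value from the example above and compares host names only; `host_of`, `declared_mirrors` and `is_blocked` are hypothetical names.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Lowercased host name without a leading 'www.' (exact hosts only)."""
    host = (urlsplit(url.strip()).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def declared_mirrors(content):
    """Parse a mirror meta value like 'http://a.org; http://b.com' into hosts."""
    return {host_of(part) for part in content.split(";") if part.strip()}


def is_blocked(page_url, mirror_content, blocked_hosts):
    """Block a page if its own host, or any site it claims to mirror, is blocked.

    Subdomains are not matched: 'en.wikipedia.org' is not 'wikipedia.org'.
    """
    hosts = {host_of(page_url)}
    if mirror_content:
        hosts |= declared_mirrors(mirror_content)
    return bool(hosts & {h.lower() for h in blocked_hosts})


mirror = "http://www.wikipedia.org; http://www.dictionary.com"
print(is_blocked("http://wiki-port.example.net/page", mirror, {"wikipedia.org"}))  # True
print(is_blocked("http://wiki-port.example.net/page", None, {"wikipedia.org"}))    # False
```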


==subj-==

Start with the basic ones: Library of Congress, Dewey Decimal, and Universal Decimal. There should also be a way to specify whether a modified version of a system is being used. For example, take religion. Dewey Decimal is based on Christianity. Websites for other religions may use a modified Dewey Decimal that changes just its Religion category to make their religion the prominent one and the other religions secondary.
 
==Standard pages vs. meta tags?==
 
What about some standard (recommended) page names, so people can generally know what a page is without going to the page first?
 
Equivalent to books:
Front Cover (book) - web page?
Back Cover (book) - web page?
Title Page (book) - web page?
Back of title page (book) - web page?
Table of contents, at a glance (book) - web page?
Table of contents, detailed (book) - web page?
About the author (book) - web page?
Acknowledgements (book) - web page?
Introduction (book) - web page?
Index (book) - web page?
Bibliography (book) - web page?
 
[[User:Mnewman|Mnewman]] 15:42, 12 February 2010 (UTC)
 
==Format print==
 
This is specified in the CSS, so it does not belong in the meta tag. Would somebody realistically search for a document based on its page layout?  Not really.
 
[[User:Mnewman|Mnewman]] 15:43, 12 February 2010 (UTC)

Revision as of 17:08, 12 February 2010

"description" meta name

I think the description name should be added to the HTML5 specification. Yes, search engines have made the keywords name obsolete. However, search engines are not that good. Only the document author can provide a reliable, short description of the document's contents. I think there should be constraints on its length, its content, and its structure. Short sentences and plain-English descriptions would be best.

It looks like "description" is in the latest specification. Rfc2549 01:46, 10 October 2008 (UTC)
Keywords still work. Descriptions are a good idea. However, HTML5 should not constrain search engines by specifying short sentences or any particular structure. HTML5 and this MetaExtensions Wiki should be permissive, especially for websites that depend on specialized or foreign search engines that may have other rank-driving preferences for meta tags. Rather, page authors should consider short sentences and the plain language you suggest because when they appear in search results they attract visitors. However, a site meant for physicians might have a different idea on what language is plain for their readers, so HTML can't usefully define that. Advice on good drafting is the province of search engines and various websites that report or advise. Nick 08:45, 20 April 2009 (UTC)
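As a rough sketch of the reading side (nothing HTML5 specifies), a search engine or UA could pull out the author's description verbatim and leave every presentation choice to itself. The sketch uses Python's standard `html.parser`; `MetaCollector` and `get_description` are illustrative names, and the sample page is made up.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects <meta name=... content=...> pairs, keyed by lowercased name."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name is not None and content is not None:
            # Keep the first occurrence; the text itself is stored unchanged.
            self.meta.setdefault(name.lower(), content)


def get_description(html_text):
    """Return the page's description meta value, or None if it has none."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.meta.get("description")


page = """<html><head>
<meta name="description" content="Basic care for dogs, from a veterinary clinic.">
<meta name="keywords" content="dogs, veterinary">
</head><body>...</body></html>"""

print(get_description(page))  # Basic care for dogs, from a veterinary clinic.
```

The sketch imposes no length or style rule on the string; any such preference stays with the search engine, as argued above.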

keywords and description should not be unendorsed, should they?

Why are keywords and description unendorsed? Yahoo and Google do use them for searches, if not as much as when they began offering search services. And there are other search engines around the world, which might well support both of these. The unendorsing of cache is explained; what are the explanations for the other two? I see keywords, but not description, was unendorsed from the beginning, which suggests a misclassification by the original proposer. I think someone should reclassify both as proposals. Would it be okay if I did? Nick 08:55, 20 April 2009 (UTC)

On description: resolved. Ian Hickson reports that it's in HTML5, and it is, in section 4.2.5.1 (I had forgotten). On keywords: he didn't mention them in his reply but moved them within the Wiki from unendorsed to failed, and I've submitted a bug report for reconsideration of that decision: W3C Bug 6853. Nick 09:35, 29 April 2009 (UTC)

rights: why reversion

I'm reverting for these reasons:

The revision's added content was largely redundant with the link element rel="license" that, per <http://wiki.whatwg.org/wiki/RelExtensions>, is already in HTML5 (see section 6.12.3.9 of the W3C Working Draft of 8-25-09). The link uses URLs instead of standardized strings, but those URLs can be used the same way as the proposed strings, i.e., a search engine or UA can recognize the URLs as equivalent to the strings without repeatedly fetching them. And the set of URLs is essentially extensible without requiring registration here, which simplifies the tag's use.
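The claim that a UA can treat well-known license URLs as equivalent to short strings, without fetching them, can be sketched as a table lookup after light URL normalization. The table is a hypothetical example, not a registry, and the normalization (ignore the scheme, a leading `www.`, and a trailing slash) is an assumption. Unknown URLs are kept as-is, which preserves the extensibility mentioned above.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical lookup table; keys are normalized "host/path" strings.
KNOWN_LICENSES = {
    "creativecommons.org/licenses/by/3.0": "CC-BY-3.0",
    "gnu.org/licenses/gpl-3.0.html": "GPL-3.0",
    "gnu.org/licenses/lgpl-3.0.html": "LGPL-3.0",
}


def normalize(url):
    """Reduce a URL to host/path, ignoring scheme, 'www.', and a trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host + parts.path.rstrip("/")


class LicenseLinks(HTMLParser):
    """Collects href values of link/a/area elements whose rel contains 'license'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if tag in ("link", "a", "area") and "license" in rel_tokens:
            if attributes.get("href"):
                self.hrefs.append(attributes["href"])


def licenses_in(html_text):
    """Map each license URL to a short name, or keep the URL if unknown.

    Nothing is fetched: recognition is purely by string comparison.
    """
    parser = LicenseLinks()
    parser.feed(html_text)
    parser.close()
    return [KNOWN_LICENSES.get(normalize(h), h) for h in parser.hrefs]


page = '''<link rel="license" href="http://creativecommons.org/licenses/by/3.0/">
<a rel="license" href="https://www.gnu.org/licenses/gpl-3.0.html">GPL</a>
<link rel="license" href="http://example.com/my-terms">'''

print(licenses_in(page))
# ['CC-BY-3.0', 'GPL-3.0', 'http://example.com/my-terms']
```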

The proposed string "coprYYYY" is probably not legally sufficient notice even in the U.S., and around the world what that notice must state may vary. It's too much for us to list all the possibilities. Let the page author write such notice as they see fit and let the search engines store or refer to it as they see fit without abbreviating it.

I doubt the proposed code/content distinction can be legally defined in this way and the distinction recognized in a court of law applying copyright law. Most judges and lawyers probably don't know how Internet standards are promulgated. For example, arrangement is copyrightable and a judge might conclude that in copyright law code is one thing and arrangement of the code is another. I haven't researched case law on point but doing so is pointless when copyright law is in many nations that each have their own laws. My proposal is more flexible and thus better able to meet page authors' needs.

The legal needs require a way of writing arbitrary or free-form text (not arbitrary as lawyers use that term).

The revision is more complicated to implement.

The revision doesn't cover multiple licensing for one work. The HTML5 working draft carries an assertion of two licenses. So does Perl, the language. When two licenses are said to apply, a relationship has to be defined, perhaps that the user can choose which to apply, perhaps one applies for commercial use and the other for noncommercial use, or whatever. The revision apparently required multiple meta tags but didn't propose that. If a search engine might take multiple meta rights tags as conflicting, things can get confusing and legal rights may be lost.

The revision doesn't cover multiple licensing for multiple works on one page. Free-form text in the meta tag would allow that. For example, "The photograph may be available from the Permissions Department at . . .; the text is licensed under . . .; the arrangement is the property of . . .; and the music is licensed through BMI."

Thank you for the character entity edit.

Perhaps another meta keyword should be proposed, perhaps rights-standard, to use what was proposed? Would it serve a purpose that link wouldn't?

Thanks. Nick 07:58, 28 October 2009 (UTC)

I see what you're saying. My point of view is that a machine-readable, parsable value would be beneficial for searching, cataloging, etc. Free-form text makes this very difficult, if not impossible. If the page author is providing a legal notice of copyright, then they must provide the country-appropriate copyright notice visibly to the page viewer (in the content itself); see Copyright Notice, Deposit, and Registration, 17 U.S.C. § 401 (2007).
My fear is that, without case law to guide us on the enforceability or applicability of this new way of embedding copyright in the code, we are setting up a standard that will not last past the first court case. Am I confused about the rationale behind this tag? What does it accomplish that a notice in the page's viewable content area wouldn't?
>Perhaps another meta keyword should be proposed, perhaps rights-standard, to use what was proposed? I'll do just that -- thank you. BryanH 16:47, 28 October 2009 (UTC)
Notice calculated to be seen by most Web visitors will be visible in the UA's (viz., browser's) window or canvas, so it should be written to be visible there. But code or markup is not seen there. That's seen when a UA user either looks at the source code via the UA or receives it separately and doesn't expose it to the UA. The code will often include not the word "Copyright" or a "C" in a circle but a character entity that mostly only programmers will recognize when it is not interpreted. Without interpretation, "&copy; 2009 Lois Ng" does not meet U.S. legal requirements for a copyright notice.
Therefore, I argue that a copyright notice calculated to be seen by someone looking at source code without benefit of a UA must be written to be visible there and humanly understandable in raw form. I use comments for that purpose, in addition to what's to be visible in the browser's window. But comments have a drawback of being usually not parseable by search engines (except for scripts, which use specially formatted comments). Since search engines copy our content, we need a way of embedding a copyright notice they have some way of identifying as being a copyright notice. But because legal requirements vary around the world, over time, and according to what is potentially subject to copyright, not to mention that some of us feel compelled to be extra cautious (either to capture all sorts of rights or because overclaiming can jeopardize an entire claim), there's no way to settle on a single form or a finite list for a copyright notice. A few forms are far more common than others, but it would take a while to research the form of choice to be used in Malawi next year for a sound recording by Madonna on tour if she's donating her work to a local school she's building (e.g., which nation's law applies). Thus, the string must be free-form.
But that limits what search engines can do with the string. With the meta rights proposal, likely the only parsing a machine can do is to recognize that it is a rights statement because it is in a tag and with a keyword that says so in accordance with the spec (which in turn supports the MetaExtensions page) and to recognize the string's length so that it can choose to either copy and display the string or only refer to it but in neither case truncate or edit it.
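The limited parsing described here (recognize the statement by its keyword, measure its length, then either show it whole or only refer to it) might look like the following. The `rights` name follows the proposal under discussion; the 200-character threshold, `RightsHandling`, and `handle_rights` are hypothetical choices a search engine would make for itself.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical limit chosen by the search engine, not by any spec.
MAX_INLINE_LENGTH = 200


@dataclass
class RightsHandling:
    present: bool                 # the page asserts some right
    display_text: Optional[str]   # full statement to show inline, if short enough
    refer_only: bool              # statement too long: point to the page instead


def handle_rights(meta):
    """Decide how to surface a free-form meta rights statement.

    `meta` maps lowercased meta names to content strings. The statement is
    never truncated or edited: it is either shown whole or only referred to.
    """
    statement = meta.get("rights")
    if statement is None or not statement.strip():
        return RightsHandling(present=False, display_text=None, refer_only=False)
    if len(statement) <= MAX_INLINE_LENGTH:
        return RightsHandling(present=True, display_text=statement, refer_only=False)
    return RightsHandling(present=True, display_text=None, refer_only=True)


short = handle_rights({"rights": "Copyright 2009 Lois Ng. All rights reserved."})
long_text = "The photograph may be available from the Permissions Department; " * 5
long_result = handle_rights({"rights": long_text})

print(short.display_text)      # Copyright 2009 Lois Ng. All rights reserved.
print(long_result.refer_only)  # True
```

Even this much is enough to flag that a page asserts a right, which is the point made in the next paragraph.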
If a search engine programmer wants to try more sophisticated textual analysis, they'll succeed some of the time, but this very limited parseability is enough to flag a page as having an assertion of a right, which a human might choose to read, and if the human does not read it and infringes the copyright it'll be harder to pretend they infringed innocently, which is relevant to the remedy a U.S. court may apply.
The U.S. law you linked to on form of notice does not control other nations. While it says "or elsewhere", the U.S. government's legal authority outside the U.S. is severely limited regardless of what Congress passes, and other jurisdictions may enact their own requirements. There is also, within the U.S., state law on copyright: rarely invoked, generally common law, and generally concerning work that is not in a fixed form. I don't know what notice requirement, if any, applies to it.
Possibly a court will reject anything we design, but generally courts require us to conform to already-existing law. Federal U.S. courts do not give advisory opinions and declaratory relief is expensive, unusual, and difficult to get, so we need to use the best judgment we can marshal now. If we want the benefits of intellectual property law and when it requires that we as owners and licensees give notice, we need to design a means for doing so without waiting for a court ruling on the specific design. Often pages have no rights assertion inside a page and suitable for code readers, and that's simply a deficiency. Thus, this tag.
This tag can also apply to other intellectual property rights. Software is, in some places, subject to patents, whose holders may grant rights under them, and trademark rights may apply, too.
Thanks for creating a rights-standard proposal. You might find more licenses available, such as the MIT license, the GPL, the LGPL, the FreeBSD license, and who knows what else that might be applied to pages. While some licenses were explicitly designed for texts, others may be applicable to texts even if that wasn't their drafters' original intention.
Nick 03:26, 29 October 2009 (UTC)

Meta versus content

Why is content being put into the meta tag that only search engines can view? Wouldn't it be more useful to have this information in regular HTML code that is viewable to the end user while at the same time easily findable to search engines?

I will need to think about the details, but here is a start.

There is an HTML tag called "base"

<BASE HREF="http://www.example.com/">

That is the foundation of the website, or of the part of the website it covers. All other parts extend from that base. By creating standard pages that are declared on each page (Title Page A goes with content page A), it will be easier for both search engines and the end reader to find the information.
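How the proposed standard page names would resolve against a `<BASE HREF>` value can be checked with ordinary relative-URL resolution, here Python's `urllib.parse.urljoin`. The page names come from this comment; the base URLs and `resolve_standard_pages` are illustrative only.

```python
from urllib.parse import urljoin

# Page names from the proposal; the base URLs below are examples only.
STANDARD_PAGES = ["titlepage.html", "toc-compact.html", "toc.html",
                  "about_author.html", "introduction.html"]


def resolve_standard_pages(base_href):
    """Resolve each standard page name against a <BASE HREF> value."""
    return {name: urljoin(base_href, name) for name in STANDARD_PAGES}


site = resolve_standard_pages("http://www.example.com/")
print(site["titlepage.html"])   # http://www.example.com/titlepage.html

# A base for one part of a site needs a trailing slash: without it, the
# last path segment is treated as a file name and replaced.
section = resolve_standard_pages("http://www.example.com/guide/")
print(section["toc.html"])      # http://www.example.com/guide/toc.html
no_slash = resolve_standard_pages("http://www.example.com/guide")
print(no_slash["toc.html"])     # http://www.example.com/toc.html
```

The trailing-slash case matters for a base that covers only part of a site.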

<TITLEPAGE HREF="titlepage.html">

On the title page associated with a specific web page, one would expect to find all of the information traditionally found on the title page of a print source such as a book or magazine.

There could be standard XML notation for this info:
<TITLE>
<SUBTITLE>
<AUTHOR>
<EDITOR>
<PUBLISHER>
<ILLUSTRATOR>
<PUBLISH-ADDRESS>
<COPYRIGHT>
<DDC> (Dewey Decimal Classification)
<LLC> (Library of Congress Classification)
<ISBN> - maybe start a standard where people could register a website, and that organization verifies that the website's content matches the info on the title page.
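If a title page carried these fields as XML, extracting them would be straightforward. This is a sketch with Python's `xml.etree.ElementTree`, assuming a hypothetical `<TITLEPAGE>` root element that wraps the field names listed above; the sample values are made up.

```python
import xml.etree.ElementTree as ET

# Field names from the proposal; the surrounding XML layout is an assumption.
FIELDS = ["TITLE", "SUBTITLE", "AUTHOR", "EDITOR", "PUBLISHER", "ILLUSTRATOR",
          "PUBLISH-ADDRESS", "COPYRIGHT", "DDC", "LLC", "ISBN"]


def read_title_page(xml_text):
    """Return field name -> text; None for absent fields, '' for empty ones.

    Elements not in FIELDS are ignored, so other industry-standard names
    could be added to the list later.
    """
    root = ET.fromstring(xml_text)
    found = {child.tag: (child.text or "").strip() for child in root}
    return {field: found.get(field) for field in FIELDS}


sample = """<TITLEPAGE>
  <TITLE>Caring for Your Dog</TITLE>
  <AUTHOR>A. Veterinarian</AUTHOR>
  <PUBLISHER>Example Press</PUBLISHER>
  <COPYRIGHT>2010</COPYRIGHT>
</TITLEPAGE>"""

info = read_title_page(sample)
print(info["TITLE"])  # Caring for Your Dog
print(info["ISBN"])   # None
```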

Whatever other names are standard in the industry.

<TOC-COMPACT HREF="toc-compact.html">

It points to the TOC of the site, similar to a traditional TOC.

<TOC HREF="toc.html">

A more detailed TOC.

<ABOUTAUTHOR HREF="about_author.html">

An about the author page.

<INTRODUCTION HREF="introduction.html">

Information generally found in an introduction. Who is the target audience of the website, what are the goals of the website.

<PREVIOUS HREF="previous.html">

If the website is intended to be read like a book, what is the previous page to read?

<NEXT HREF="next.html">

What is the next web page to read?
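None of these elements exists in HTML, but a crawler could still pick them up: Python's `html.parser` accepts unknown tags and lowercases their names. This sketch gathers the proposed elements into a table and then follows the NEXT links to list a reading order. The `pages` dictionary stands in for fetched pages and, like `standard_links` and `reading_order`, is hypothetical.

```python
from html.parser import HTMLParser

# Element names from the proposal, lowercased as HTMLParser reports them.
PROPOSED = {"titlepage", "toc-compact", "toc", "aboutauthor",
            "introduction", "previous", "next"}


class StandardPageLinks(HTMLParser):
    """Maps each proposed element name found on a page to its HREF value."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag in PROPOSED and href:
            self.links[tag] = href


def standard_links(html_text):
    parser = StandardPageLinks()
    parser.feed(html_text)
    parser.close()
    return parser.links


def reading_order(start, pages, limit=1000):
    """Follow NEXT links from `start` through `pages` (name -> HTML text).

    Stops at a page with no NEXT link, an unknown page, or a page already
    visited (guarding against cycles).
    """
    order, current = [], start
    while current in pages and current not in order and len(order) < limit:
        order.append(current)
        current = standard_links(pages[current]).get("next")
    return order


# Hypothetical three-page "book".
pages = {
    "intro.html": '<TOC HREF="toc.html"><NEXT HREF="ch1.html">',
    "ch1.html": '<PREVIOUS HREF="intro.html"><NEXT HREF="ch2.html">',
    "ch2.html": '<PREVIOUS HREF="ch1.html">',
}
print(reading_order("intro.html", pages))  # ['intro.html', 'ch1.html', 'ch2.html']
```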

keywords and descriptions

Personally, I don't understand why the keywords and descriptions are in the header as opposed to the main content. It is content. All content should be viewable to the end reader without having to look at the source code. Hidden titles, descriptions, and keywords would be the same as subliminal messages on a TV. They are being shown, but the user does not know what they are.

Also, by bringing them out into the open part of a web page, there is less likelihood of abuse.

Maybe things should change so that instead of a search engine just reading the "head" part, it also reads the new HTML5 tags "header" and "footer".
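Reading the visible `header` and `footer` as well as `head`, as suggested here, is simple to sketch. This assumes Python's `html.parser` and collects only the text inside HTML5 `<header>` and `<footer>` elements; whitespace handling is simplified, and `header_footer_text` is an illustrative name.

```python
from html.parser import HTMLParser


class HeaderFooterText(HTMLParser):
    """Collects visible text found inside <header> and <footer> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a header or footer
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("header", "footer"):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in ("header", "footer") and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Character references such as &copy; are already converted here.
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def header_footer_text(html_text):
    parser = HeaderFooterText()
    parser.feed(html_text)
    parser.close()
    return " ".join(parser.chunks)


page = """<body>
<header><h1>Dog Care</h1><p>Keywords: dogs, health</p></header>
<main><p>Body text.</p></main>
<footer>&copy; 2010 Example Clinic</footer>
</body>"""
print(header_footer_text(page))
# Dog Care Keywords: dogs, health © 2010 Example Clinic
```

Because the text sits in the visible page, the reader sees exactly what the search engine indexes.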

Mnewman 17:08, 12 February 2010 (UTC)