What is a URL?
A URL (Uniform Resource Locator) is the web address of a document on the World Wide Web. Entering a URL is often the first step a user takes when interacting with your website. URLs are like the command line of the World Wide Web: a URL is a string of text that passes instructions in through the location bar, and you receive a web page with information back.
A link is a way for the browser to use a URL from within a document, so the URL itself is often hidden behind the link.
A Tour of the URL
To understand links, first we need to understand URLs…
A URL is one type of Uniform Resource Identifier (URI). “Web address” is a synonym for a URL that uses the HTTP protocol.
The five major parts of the URL are the protocol, the host or domain, the pathname, the query string, and the fragment identifier.
URLs are universal. They work in every browser, in cURL requests, wget requests, your mobile devices, and even written down on sticky notes.
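As a rough illustration of the five parts, here is a minimal sketch using Python's standard `urllib.parse` module. The example URL is assembled from the pieces shown later in this tour, and the `#port` fragment is a hypothetical addition, not an address taken from the original text.

```python
from urllib.parse import urlsplit

# Example URL assembled from the pieces used in this tour.
# The "#port" fragment is a hypothetical addition for illustration.
url = "http://www.htmlhobbyist.com:80/html/hypertext_links.html?q=links&filter=elements#port"

parts = urlsplit(url)

print(parts.scheme)    # protocol: http
print(parts.netloc)    # host / domain: www.htmlhobbyist.com:80
print(parts.path)      # pathname: /html/hypertext_links.html
print(parts.query)     # query string: q=links&filter=elements
print(parts.fragment)  # fragment identifier: port
```

Note that the library reports the protocol without its trailing colon and the query and fragment without their leading `?` and `#`.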
Protocol
The http: protocol isn’t the only protocol there is. We also have ftp:, mailto:, and javascript:. There are classic protocols like telnet: and gopher:. There are custom protocols that can launch specific browsers or apps.
The two protocols you will use most often are HTTP and HTTPS. HTTPS is the secure version of HTTP; it encrypts data passed between the client and the server using TLS, the successor to SSL.
- http:
- https:
- ftp:
- mailto:
- javascript:
- gopher:
Host / Domain
http://nelilly:pass1234@www.htmlhobbyist.com:80/
The host or domain portion of a URL usually only contains the domain name, but it can also contain authentication credentials and/or a port number.
- Authentication Credentials
- Hostname or Domain Name
- Port
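The three pieces of the host portion can be pulled apart programmatically. This is a minimal sketch using Python's standard `urllib.parse` module on the example URL above; the variable name `host_part` is only for illustration.

```python
from urllib.parse import urlsplit

# The example URL from this section, including credentials and a port.
host_part = urlsplit("http://nelilly:pass1234@www.htmlhobbyist.com:80/")

print(host_part.username)  # nelilly
print(host_part.password)  # pass1234
print(host_part.hostname)  # www.htmlhobbyist.com
print(host_part.port)      # 80
```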
Authentication Credentials
Authentication credentials were used to pass a username and password to the server. They aren’t typically used anymore, and some browsers have disabled the ability.
nelilly:pass1234@
Hostname or Domain Name
The domain name is typically what most people think of when talking about a website.
www.htmlhobbyist.com
The hostname affects how cookies can be used throughout the site, based on your canonical domain name.
Domain Name Extension
The hostname contains a domain name extension, also referred to as a TLD (Top Level Domain).
.com
Common domain name extensions:
- .com
- .net
- .org
- .edu
- .mil
Uncommon domain name extensions:
- .art
- .biz
- .horse
- .surf
- .website
Port
You won’t typically need to worry about port numbers. A port number is a way to identify a specific process to which an Internet or other network message is to be forwarded when it arrives at a server. The concept of registering a port predates the Internet.
Some popular default ports are:
- :80 is the HTTP port used for the World Wide Web
- :443 is the HTTPS port for secure sites (HTTP over TLS, formerly SSL)
- :25 is the Simple Mail Transfer Protocol (SMTP) port used for email
- :21 is the File Transfer Protocol (FTP) port
- :22 is the Secure Shell (SSH) port for directly interacting with a server on the command line
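To show why you rarely need to type a port, here is a small sketch that falls back to a default when the URL doesn't name one. The `DEFAULT_PORTS` table and the `effective_port` helper are hypothetical names for this example, and the `:8080` URL is an invented illustration.

```python
from urllib.parse import urlsplit

# Default ports from the list above, keyed by protocol name (illustrative table).
DEFAULT_PORTS = {"http": 80, "https": 443, "smtp": 25, "ftp": 21, "ssh": 22}

def effective_port(url):
    """Return the explicit port if the URL has one, else the protocol's default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    # Returns None for protocols that are not in the table.
    return DEFAULT_PORTS.get(parts.scheme)

print(effective_port("https://www.htmlhobbyist.com/"))      # 443
print(effective_port("http://www.htmlhobbyist.com:8080/"))  # 8080
```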
Pathname
/html/hypertext_links.html
The path portion of a URL usually corresponds to a directory path and file name. If the file name is left out, the server will often be set up to check for index.html or default.html and use that. Servers can be configured to interpret a URL path or file name any way they like.
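One way a simple static server might map a path to files is sketched below. The document root `/var/www/site` and the `candidate_files` helper are assumptions made for this example; real servers are configured in many different ways, and this sketch does no security checks on the path.

```python
from pathlib import PurePosixPath

# Hypothetical document root and index file names; real servers configure these.
DOCUMENT_ROOT = PurePosixPath("/var/www/site")
INDEX_FILES = ("index.html", "default.html")

def candidate_files(url_path):
    """List the files a simple static server might try for a URL path.

    Sketch only: it does not guard against ".." segments or other unsafe input.
    """
    relative = url_path.lstrip("/")
    target = DOCUMENT_ROOT / relative
    if url_path.endswith("/") or relative == "":
        # No file name given: fall back to the index files, in order.
        return [str(target / name) for name in INDEX_FILES]
    return [str(target)]

print(candidate_files("/html/hypertext_links.html"))
print(candidate_files("/html/"))
```

The first call yields a single file, while the second yields the two index file candidates inside the `html` directory.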
Query String
Query strings do nothing without programming to back them up. If the page isn’t set up to read query strings on the server side, JavaScript can still use them. We’ll cover them in more depth when we discuss forms in Webmastering & the Server.
?q=links&filter=elements
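As a sketch of the kind of programming that reads a query string, Python's standard `urllib.parse` module can turn it into names and values. The full URL here is an assumption, assembled from the host and path used earlier plus the query string above.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical full URL built from the pieces shown in this tour.
url = "http://www.htmlhobbyist.com/html/hypertext_links.html?q=links&filter=elements"

# parse_qs maps each name to a list, because a name may appear more than once.
params = parse_qs(urlsplit(url).query)

print(params)  # {'q': ['links'], 'filter': ['elements']}
```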
Fragment Identifiers
The fragment identifier (#), sometimes referred to as a hash, points to a specific part of the document. In HTML you can deep link into a specific fragment of a web page by adding a fragment identifier to the href. The link will bring you to the element with an id that matches the fragment id. If no id on the page matches the fragment identifier, it will simply bring you to the top of the designated page.
A liberal use of ids in your documents will help others link to specific locations within your content. You may not use duplicate ids on a page. If you do, the URL will only take the visitor to the first matching id that it finds.
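The matching behavior described here can be approximated with a short sketch. It is not how a browser is implemented; the `IdCollector` class, the `find_target` helper, and the sample `page` markup are all hypothetical, built only on Python's standard `html.parser` and `urllib.parse` modules.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag

class IdCollector(HTMLParser):
    """Record every id attribute, with its tag name, in document order."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append((value, tag))

def find_target(html, url):
    """Return the tag of the first element whose id matches the URL fragment."""
    fragment = urldefrag(url).fragment
    collector = IdCollector()
    collector.feed(html)
    for element_id, tag in collector.ids:
        if element_id == fragment:
            return tag  # the first match wins, even if the id is duplicated
    return None  # no match: a browser would stay at the top of the page

# Hypothetical page with a (disallowed) duplicate id.
page = '<h2 id="port">Port</h2><p id="port">Duplicate</p><h2 id="pathname">Pathname</h2>'

print(find_target(page, "http://www.htmlhobbyist.com/html/hypertext_links.html#port"))     # h2
print(find_target(page, "http://www.htmlhobbyist.com/html/hypertext_links.html#missing"))  # None
```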
A URL Strategy
Keep in mind some best practices when constructing and changing your URLs.
I want people to be able to copy URLs. I want people to be able to hack URLs. I’m not ashamed of my URLs …I’m downright proud.
@adactio, Jeremy Keith, May 23, 2016
Here are some basic guidelines when creating URLs for your site:
- URLs should not harm a user or, through inaction, allow a user to come to harm.
URLs should not cause confusion, expose users to security holes, be untrustworthy, etc. Pay attention to the status of the external websites that you link to.
- URL structure should follow orders, except where this would conflict with the first law.
The URL should always display the expected content. The URL should be visible, logical, predictable, and meaningful. The user should be able to make guesses about the structure of the site when they look at the URL. URLs should use real words instead of random strings to increase URL accessibility.
- URLs should continue to exist in the future, except where this would conflict with the first two laws.
You should strive to keep your directory structure and file names as stable as possible. Broken URLs have always been a problem online. In 1996 Keith Shafer and several others proposed a solution to the problem. In a fit of irony, the link to this solution is now broken (purl.oclc.org/OCLC/PURL/INET96). Luckily, it is preserved on the Internet Archive’s Wayback Machine, but not all pages are preserved there, and not all users are aware that the Wayback Machine exists. When your URLs inevitably change, you should do your best to provide a redirect to the new location and to provide fallback support that helps visitors find missing pages.
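The redirect-and-fallback idea can be sketched as a simple lookup. Everything named here is hypothetical: the `REDIRECTS` table, the old path `/links.html`, the `/404.html` fallback page, and the `resolve` helper. A real site would usually configure this in its web server instead.

```python
# Hypothetical mapping of retired paths to their new locations.
REDIRECTS = {
    "/links.html": "/html/hypertext_links.html",
}

def resolve(path, existing_paths):
    """Return an (HTTP status code, location) pair for a requested path."""
    if path in existing_paths:
        return 200, path
    if path in REDIRECTS:
        return 301, REDIRECTS[path]  # permanent redirect to the new location
    return 404, "/404.html"          # fallback page that helps visitors search

site = {"/html/hypertext_links.html"}

print(resolve("/html/hypertext_links.html", site))  # (200, '/html/hypertext_links.html')
print(resolve("/links.html", site))                 # (301, '/html/hypertext_links.html')
print(resolve("/gone.html", site))                  # (404, '/404.html')
```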
When you click the back button you should arrive at the previous page, and not some other location.
Modern JavaScript practices don’t always allow us to maintain these ideals, as they often include components with state, calls to server-side data, and single-page applications whose state isn’t tracked in the URL.