Skip to content

Latest commit

 

History

History
173 lines (109 loc) · 6.77 KB

ShortURLHandbook.md

File metadata and controls

173 lines (109 loc) · 6.77 KB

A Practical Guide To Approach For Creating URL Shortner Application

What is a Short URL?

A Short URL refers to a web service that transforms a long URL into a much shorter version.This helps to simplify the complex URLs that may be difficult to share or fit into a character limited platforms like sms/tweets.In addition, it is easy and less error prone to type a short url when compared to its longer version.

What are the benefits of a Short URL?

  1. Easy to share and remember. Beautify a URL
  2. Helpful to track the clicks on URLs for Analytics purposes.
  3. Helps to Beautify any Long URLs.

Requirements to create a URL Shortner Application?

  • Generate a short url with the given original url.
  • The short url (or link) should redirect users to original url (link).
  • Provide option to create custom short url as given by end user.

Design goal considered to create a URL Shortner Application?

  • How long a tiny url would be ? Will it ever expire?
  • The services to Create/Retreive/Update/Delete short url must be exposed as REST service.
  • The long URL should also be uniquely identifiable from the short URL.
  • Provide option to create a tiny url by user's choice.
  • If user is allowed to create customer shortened links, it should be of max 7 characters.
  • Once a shortened link is generated it should stay in system for lifetime.
  • The Service should collect metrics like most clicked links.

How a Short URL looks like?

Commomly the most URL shortening application or services provide the short URL where:
Example: `http://localhost:9999/1L9zO9P`

  • Fist Part: is the domain name of the service. example: localhost:9999

  • Second Part: is a string formed by a set of random characters. example 1L9zO9P
    This random string should be unique for each short URL created for a long URL.

  • Length of Short URL or Random String:

The unique random characters can be generated using either base62 or md5.
Both base62 and MD5 algorithm outputs consist of 62 character combinations (a-z A-Z 0-9).

Base62 takes integer type as input. So we need to first convert the long URL to a random number then pass this random number to the base62 algorithm.

MD5 takes string as the input and generates a random fixed length string output, we can directly pass the long URL as input.

URL Shortnening Algorithms?

  • Hashing Algorithm: Use a hashing algorithm such as MD5, SHA-256, or CRC32 to generate a hash value from the long URL. The hash value can then be encoded using base62 or base64 encoding to create a short URL.

  • Base Conversion Algorithm: Convert the unique ID of the long URL into a shorter representation using base conversion techniques. For example, you can convert the decimal representation of the ID into a base58 or base62 encoding, excluding easily confused characters like ‘0’, ‘O’, ‘1’, ‘I’, etc.

  • Bijective Function Algorithm: Utilize a bijective function that maps a unique identifier to a short URL and vice versa. For example, you can use a function like the Bijective Conversion Function (BCF) algorithm, which converts a decimal ID into a sequence of characters using a predefined set of characters.

  • Randomized Approach: Generate a random sequence of characters of a fixed length (e.g., 6 or 8 characters) to create the short URL. Although this approach may not guarantee uniqueness, you can perform a lookup in the database to ensure uniqueness and regenerate if there’s a collision.

  • Counter Based Approach: When the server gets a request to convert a long URL to short URL, it first talks to the counter to get a count value.This value is then passed to the `base62` algorithm to generate random string. Making use of a counter to get the integer value ensures that the number we get will always be unique by default because after every request the counter increments its value.

How base62 Algorithm Works?


The Base62 encoding scheme is a binary-to-text encoding scheme that represents binary data in an ASCII string format. It uses character that can be one of the following:

A lower case alphabet [‘a’ to ‘z’] -> total 26 characters
An upper case alphabet [‘A’ to ‘Z’] -> total 26 characters
A digit [‘0′ to ‘9’] -> total 10 characters

Hence, there are total 26 + 26 + 10 = 62 possible characters. [base62].

The typical order is 0-9 for numbers (values 0 to 9), A-Z for uppercase letters (values 10-35), a-z for lowercase letters (values 36-61)

In this application, it is decided to go with random character string of length 7, so we can have 62^7 ~ 3.5 trillion combinations to work with.This is more than enough for a sample application.

Conversion to base62?

  • Divide the number by 62.
  • Record the remainder as the rightmost digit.
  • Use the quotient as the new number to be divided.
  • Repeat the process until the number is zero.
  • The sequence of remainders (read from bottom to top) forms the Base62 encoded string.

Data in Redis for Url Shortner Application?

  1. Short Url: 7 character long unique short URL.
  2. Original Url: The original long URL.
  3. Counter: Unique integer to create Short URL Id. Initial counter value is 100000000000
  4. Mapping Short URL Id and Long URL: Store the details of Short URL Id and Long URL.
  5. Mapping Short URL Id and Click Count: Store the total clicks on Short URL Id to maintain analytics.

FAQs?

  1. Why initial counter value in redis is set to 100000000000 ?
  2. To Ensure Minimum Length: By starting with a large initial value, we ensure that the encoded ID (short URL) has a minimum length of 7-characters from the beginning.

    Avoid Simple Sequences: Starting with a large number makes the generated short URLs less predictable.

    Collision Avoidance: If there are parallel or multiple ID generation requests then starting at a higher value could help in avoiding collisions or overlap.

  3. Explain the Application flow for Creating Short URL ?
  4. The service recieves a POST request to the /create-short-url endpoint.
    The service then gets a unique counter value from redis database. This value is then used to generate unique Short URL Id using base62 algorithm.
    After storing the mapping of generated Short URL id and long URL and with all the details the service returns response to Client with all the corresponding details.