The basic idea of HLD is to break down the path from a node u to v in a tree among some K paths such that for each of those K paths you can calculate the answer in lesser complexity as it will be pre calculated by you and complexity of merging the K paths result will be O(K) most of the time. That K is at most log(N), so you can calculate the answer of a query in M*log(N) where M is the complexity of calculating result of each of the K paths. We generally use segment tree or BIT so the expected total complexity of answering each query becomes log(N)^2 which is much faster than traversing from node u to v
The idea is very similar to square root decomposition, we divide our sample space into blocks/pieces suitably and represent our works as combination of some continious blocks, and figure out how to process blocks efficiently than simple brute force.
I learned HLD from the link you have already given. So go through it again and again untill it makes sense, if stuck at some point, ask that particular question.
Here is one question to get you into right feel.
Given a undirected graph on n (<= 10^5) weighted vertices(initially all are 0), perform two types of queries(<=10^5)
add(v, x) - add x to all the neighboring vertices of v
get(v) - return the weight of vertex v.