1、前言

  • 数据结构和算法是脱离编程语言束缚的!算法是什么?算法是程序加上数据结构,也是完成一项任务的具体步骤。
  • 这篇笔记将回顾基础的数据结构(如:树、图、链表、栈、堆等)的特性和复杂度;重点放在使用Java语言来实现这些基础数据结构,并且使用Java中的集合类、Map类来实现对数据的查找、排序等。
  • 这篇笔记的Java集合类实现部分以GitHub-yanglbmeopen in new window的代码为参考。
  • 网上描述数据结构的教程五花八门,个有特色。本人本着学习的态度对数据结构进行从浅到深的回顾学习。之前的D3.js博文中提到的创始人的网站对数据结构的排序也有介绍和描述,大家可以参考。

2、Basics Data Structure

2.1 String

String相关的概念、方法在开发中经常用到!在Python、C++和Java中都定义了许多关于字符串操作的方法。关于字符串的操作,Java中还定义了三个类String类、StringBuffer类和StringBuilder类围绕这三个类有大量的方法。

在Python中主要的字符串操作有切片、查找、长度等如下:

s1 = str()
s2 = "zxcvbnm"
s2len = len(s2)
s2[-3:]#bnm
s2[5:8]#nm
s3 += 'kill'
s2.index('b')
s2.find('b')

在Java中,关于字符串的操作,要格外小心!

String s1 = new String();
String s2 = "billryan";
int s2Len = s2.length();
s2.substring(4, 8); // return "ryan"
StringBuilder s3 = new StringBuilder(s2.substring(4, 8));
s3.append("bill");
String s2New = s3.toString(); // return "ryanbill"
// convert String to char array
char[] s2Char = s2.toCharArray();
// char at index 4
char ch = s2.charAt(4); // return 'r'
// find index at first
int index = s2.indexOf('r'); // return 4. if not found, return -1

2.2 Linked List 「链表」

  • 链表是线性表的一种,线性表是最简单的一种数据结构。线性表有两种存储方式:链式存储结构和顺序存储结构。数组就是顺序存储结构。链表存储结构中,两个相邻的元素在内存中不再要求是相邻的。
  • 分类:
    • 单链表
    • 双链表
    • 循环链表
  • 操作:
    • 增、删、改、查、判空等

Python实现

# 编程实现
class ListNode:
    def __init__(self, val):
        self.val = val
        self.next = None

    # 链表基本操作(基于单链表操作)
    # 反转链表:将1->2->3->null反转成3->2->1->null
    def reverse(self, head):
        prev = None
        while head:
            temp = head.next
            head.next = prev
            prev = head
            head = temp
        retrun prev

Java实现

public class ListNode{
    public int val;
    public ListNode next;
    //构造函数
    public ListNode(int val){
        this.val = val;
        this.next = next;
    }
}

// 链表逆转
public ListNode reverse(ListNode head){
    ListNode prev = null;
    while(head != null){
        ListNode next = prev;
        head.next = prev;
        prev = head;
        head = next;
    }
    return prev;
}

//recursive mrthod
public ListNode reverse(ListNode head){
    if(head == null || head.next == null){
        return head;
    }
    ListNode next = head.next;
    ListNode newHead = reverse(next);
    next.next = head;
    head.next = null;
    return newHead;
}

双链表的操逆转作:

class DListNode{
    int val;
    DListNode prev, next;
    DListNode(int val){
        this.val = val;
        this.prev = this.next = null;
    } 
}

//双链表逆转
public DListNode reverse(DListNode head){
    DLisNode curr = null;
    while(head !=null){
        curr = head;
        head = curr.next;
        curr.next = curr.prev;
        curr.prev = head;
    }
    return curr;
}

删除链表中的节点只需要prev -> next = prev->next->next即可。

2.3 Binary Tree 「二叉树」

二叉树是每个节点最多有2个子树的树结构,二叉树常分为二叉搜索树和二叉堆。

  • 满二叉树

  • 完全二叉树

  • 平衡二叉树

  • 编程实现 Python实现

class TreeNode:
    def __init__(sself, val):
        self.val = val
        self.left, self.right = None, None

Java实现

public class TreeNode{
    public int val;
    public TreeNode left, right;
    public TreeNode(int val){
        this.val = val;
        this.left = null;
        this.right = null;
    }
}

树的遍历

  • 前序遍历:先根后左右
  • 中序遍历:先左右后根
  • 后序遍历:先右左后根
  • 层次遍历:按层从根开始遍历
class TreeNode:
    def __init__(self, val):
        self.val = val;
        self.left,self.right = None, None;

class Traversal(object):
    def __init__(self):
        self.traverse_path = list()
    
    def preorder(self, root):
        if root:
            self.traverse_path.append(root.val)
            self.preorder(root.left)
            self.preorder(root.right)

    def inorder(self, root):
        if root:
            self.inorder(root.left)
            self.traverse_path.append(root.val)
            self.inorder(root.right)

    def postorder(self, root):
        if root:
            self.postorder(root.left)
            self.postorder(root.right)
            self.traverse_path.append(root.val)

一颗二叉查找树(BST)是一颗二叉树,其中每个节点都含有一个可进行比较的键及相应的值,且每个节点的键都大于等于左子树中的任意节点的键,而小于右子树中的任意节点的键。

使用中序遍历可得到有序数组,这是二叉查找树的又一个重要特征。

2.4 huffman Compression 「霍夫曼压缩」「霍夫曼编码」

  • 目的:使用最少的比特来存储文件或传输信息;

  • 策略:用较少的比特表示出现频率较高的字符,用较多的比特表示出现频率低的字符;

  • 在信息论和编码理论中都会出现的概念;

  • 一般使用前缀码,避免出现码字不唯一的情况。

  • Python编程实现

import heapq
import collections

def get_rate(compressed_binary, uncompressed_bits):
    return len(compressed_binary) * 100 / uncompressed_bits

class SimpleCompression:
    def __init__(self, string):
        self.symbols = set(string)
        self.bit_len +=1
        while 2**self.bit_len < len(self.symbols):
            self.bit_len += 1
        self.string = string

        self.s2b = {}
        self.b2s = {}
        i = 0
        for s in self.symbols:
            b = bin(i)[2:]
            if len(b) < self.bit_len:
                b = (self.bit_len - len(b)) * '0' + b
            self.s2b[s] = b
            self.b2s[b] = s
            i += 1

    def compress(self):
        bits = ''
        for s in self.string:
            bits += self.s2b[s]
        return bits

    def uncompress(self, bits):
        string = ''
        for i in range(0, len(bits), self.bit_len):
            string += self.b2s[bits[i:i + self.bit_len]]
        return string


class HuffmanCompression:
    class Trie:
        def __init__(self, val, char=''):
            self.val = val
            self.char = char
            self.coding = ''
            self.left = self.right = None

        def __eq__(self, other):
            return self.val == other.val

        def __lt__(self, other):
            return self.val < other.val

        def __gt__(self, other):
            return self.val > other.val

    def __init__(self, string):
        self.string = string
        counter = collections.Counter(string)
        heap = []
        for char, cnt in counter.items():
            heapq.heappush(heap, HuffmanCompression.Trie(cnt, char))

        while len(heap) != 1:
            left = heapq.heappop(heap)
            right = heapq.heappop(heap)
            trie = HuffmanCompression.Trie(left.val + right.val)
            trie.left, trie.right = left, right
            heapq.heappush(heap, trie)

        self.root = heap[0]
        self.s2b = {}
        self.bfs_encode(self.root, self.s2b)

    def bfs_encode(self, root, s2b):
        queue = collections.deque()
        queue.append(root)
        while queue:
            node = queue.popleft()
            if node.char:
                s2b[node.char] = node.coding
                continue
            if node.left:
                node.left.coding = node.coding + '0'
                queue.append(node.left)
            if node.right:
                node.right.coding = node.coding + '1'
                queue.append(node.right)

    def compress(self):
        bits = ''
        for char in self.string:
            bits += self.s2b[char]
        return bits

    def uncompress(self, bits):
        string = ''
        root = self.root
        for bit in bits:
            if bit == '0':
                root = root.left
            else:
                root = root.right
            if root.char:
                string += root.char
                root = self.root
        return string


if __name__ == '__main__':
    s = 'everyday is awesome!'
    # ASCII
    bits = len(s) * 8
    print('Total bits: %d' % bits)

    # simple compression
    sc = SimpleCompression(s)
    compressed = sc.compress()
    print('Compressed binary: ' + compressed)
    print('Uncompressed: ' + sc.uncompress(compressed))
    print(sc.s2b)
    print('Simple Compression-compress rate: %d%%' % get_rate(compressed, bits))

    print('===================')
    # huffman compression
    hc = HuffmanCompression(s)
    compressed = hc.compress()
    print('Compressed binary: ' + compressed)
    print('Uncompressed: ' + hc.uncompress(compressed))
    print(hc.s2b)
    print('Huffman Compression-compress rate: %d%%' % get_rate(compressed, bits))


"""
Total bits: 160
Compressed binary: 00101011001010001100001100001100000101000111000100001010001001110110010100101001
Uncompressed: everyday is awesome!
{'a': '0000', ' ': '0001', 'e': '0010', 'd': '0011', 'i': '0100', 'm': '0101', 'o': '0110', 's': '0111', 'r': '1000', '!': '1001', 'w': '1010', 'v': '1011', 'y': '1100'}
Simple Compression-compress rate: 50%
===================
Compressed binary: 011001011011110011010111111000010000111000111111010011110100011011010001
Uncompressed: everyday is awesome!
{'!': '0001', ' ': '001', 'e': '01', 'd': '11010', 'i': '0000', 'm': '11011', 'o': '1000', 's': '1110', 'r': '1011', 'a': '1111', 'w': '1010', 'v': '1001', 'y': '1100'}
Huffman Compression-compress rate: 45%
"""

2.5 Queue 「队列」

在Java中,Queue是接口,一种实现就是LinkedList,LinkedList向上转型为Queue,队列不能存储null元素,否则与poll()等方法的返回值混淆。

Queue<Integer> q = new LinkedList<Integer>();
int qLen = q.size();

2.6 Heap 「堆」

一般分为大根堆和小根堆,在排序中一般用堆排序。

2.7 Stack 「栈」