1、前言
- 数据结构和算法是脱离编程语言束缚的!算法是什么?算法是程序加上数据结构,也是完成一项任务的具体步骤。
- 这篇笔记将回顾基础的数据结构(如:树、图、链表、栈、堆等)的特性和复杂度;重点放在使用Java语言来实现这些基础数据结构,并且使用Java中的集合类、Map类来实现对数据的查找、排序等。
- 这篇笔记的Java集合类实现部分以GitHub-yanglbme的代码为参考。
- 网上描述数据结构的教程五花八门,个有特色。本人本着学习的态度对数据结构进行从浅到深的回顾学习。之前的D3.js博文中提到的创始人的网站对数据结构的排序也有介绍和描述,大家可以参考。
2、Basics Data Structure
2.1 String
String相关的概念、方法在开发中经常用到!在Python、C++和Java中都定义了许多关于字符串操作的方法。关于字符串的操作,Java中还定义了三个类String类、StringBuffer类和StringBuilder类围绕这三个类有大量的方法。
在Python中主要的字符串操作有切片、查找、长度等如下:
s1 = str()
s2 = "zxcvbnm"
s2len = len(s2)
s2[-3:]#bnm
s2[5:8]#nm
s3 += 'kill'
s2.index('b')
s2.find('b')
在Java中,关于字符串的操作,要格外小心!
String s1 = new String();
String s2 = "billryan";
int s2Len = s2.length();
s2.substring(4, 8); // return "ryan"
StringBuilder s3 = new StringBuilder(s2.substring(4, 8));
s3.append("bill");
String s2New = s3.toString(); // return "ryanbill"
// convert String to char array
char[] s2Char = s2.toCharArray();
// char at index 4
char ch = s2.charAt(4); // return 'r'
// find index at first
int index = s2.indexOf('r'); // return 4. if not found, return -1
2.2 Linked List 「链表」
- 链表是线性表的一种,线性表是最简单的一种数据结构。线性表有两种存储方式:链式存储结构和顺序存储结构。数组就是顺序存储结构。链表存储结构中,两个相邻的元素在内存中不再要求是相邻的。
- 分类:
- 单链表
- 双链表
- 循环链表
- 操作:
- 增、删、改、查、判空等
Python实现
# 编程实现
class ListNode:
def __init__(self, val):
self.val = val
self.next = None
# 链表基本操作(基于单链表操作)
# 反转链表:将1->2->3->null反转成3->2->1->null
def reverse(self, head):
prev = None
while head:
temp = head.next
head.next = prev
prev = head
head = temp
retrun prev
Java实现
public class ListNode{
public int val;
public ListNode next;
//构造函数
public ListNode(int val){
this.val = val;
this.next = next;
}
}
// 链表逆转
public ListNode reverse(ListNode head){
ListNode prev = null;
while(head != null){
ListNode next = prev;
head.next = prev;
prev = head;
head = next;
}
return prev;
}
//recursive mrthod
public ListNode reverse(ListNode head){
if(head == null || head.next == null){
return head;
}
ListNode next = head.next;
ListNode newHead = reverse(next);
next.next = head;
head.next = null;
return newHead;
}
双链表的操逆转作:
class DListNode{
int val;
DListNode prev, next;
DListNode(int val){
this.val = val;
this.prev = this.next = null;
}
}
//双链表逆转
public DListNode reverse(DListNode head){
DLisNode curr = null;
while(head !=null){
curr = head;
head = curr.next;
curr.next = curr.prev;
curr.prev = head;
}
return curr;
}
删除链表中的节点只需要prev -> next = prev->next->next
即可。
2.3 Binary Tree 「二叉树」
二叉树是每个节点最多有2个子树的树结构,二叉树常分为二叉搜索树和二叉堆。
满二叉树
完全二叉树
平衡二叉树
编程实现 Python实现
class TreeNode:
def __init__(sself, val):
self.val = val
self.left, self.right = None, None
Java实现
public class TreeNode{
public int val;
public TreeNode left, right;
public TreeNode(int val){
this.val = val;
this.left = null;
this.right = null;
}
}
树的遍历
- 前序遍历:先根后左右
- 中序遍历:先左右后根
- 后序遍历:先右左后根
- 层次遍历:按层从根开始遍历
class TreeNode:
def __init__(self, val):
self.val = val;
self.left,self.right = None, None;
class Traversal(object):
def __init__(self):
self.traverse_path = list()
def preorder(self, root):
if root:
self.traverse_path.append(root.val)
self.preorder(root.left)
self.preorder(root.right)
def inorder(self, root):
if root:
self.inorder(root.left)
self.traverse_path.append(root.val)
self.inorder(root.right)
def postorder(self, root):
if root:
self.postorder(root.left)
self.postorder(root.right)
self.traverse_path.append(root.val)
一颗二叉查找树(BST)是一颗二叉树,其中每个节点都含有一个可进行比较的键及相应的值,且每个节点的键都大于等于左子树中的任意节点的键,而小于右子树中的任意节点的键。
使用中序遍历可得到有序数组,这是二叉查找树的又一个重要特征。
2.4 huffman Compression 「霍夫曼压缩」「霍夫曼编码」
目的:使用最少的比特来存储文件或传输信息;
策略:用较少的比特表示出现频率较高的字符,用较多的比特表示出现频率低的字符;
在信息论和编码理论中都会出现的概念;
一般使用前缀码,避免出现码字不唯一的情况。
Python编程实现
import heapq
import collections
def get_rate(compressed_binary, uncompressed_bits):
return len(compressed_binary) * 100 / uncompressed_bits
class SimpleCompression:
def __init__(self, string):
self.symbols = set(string)
self.bit_len +=1
while 2**self.bit_len < len(self.symbols):
self.bit_len += 1
self.string = string
self.s2b = {}
self.b2s = {}
i = 0
for s in self.symbols:
b = bin(i)[2:]
if len(b) < self.bit_len:
b = (self.bit_len - len(b)) * '0' + b
self.s2b[s] = b
self.b2s[b] = s
i += 1
def compress(self):
bits = ''
for s in self.string:
bits += self.s2b[s]
return bits
def uncompress(self, bits):
string = ''
for i in range(0, len(bits), self.bit_len):
string += self.b2s[bits[i:i + self.bit_len]]
return string
class HuffmanCompression:
class Trie:
def __init__(self, val, char=''):
self.val = val
self.char = char
self.coding = ''
self.left = self.right = None
def __eq__(self, other):
return self.val == other.val
def __lt__(self, other):
return self.val < other.val
def __gt__(self, other):
return self.val > other.val
def __init__(self, string):
self.string = string
counter = collections.Counter(string)
heap = []
for char, cnt in counter.items():
heapq.heappush(heap, HuffmanCompression.Trie(cnt, char))
while len(heap) != 1:
left = heapq.heappop(heap)
right = heapq.heappop(heap)
trie = HuffmanCompression.Trie(left.val + right.val)
trie.left, trie.right = left, right
heapq.heappush(heap, trie)
self.root = heap[0]
self.s2b = {}
self.bfs_encode(self.root, self.s2b)
def bfs_encode(self, root, s2b):
queue = collections.deque()
queue.append(root)
while queue:
node = queue.popleft()
if node.char:
s2b[node.char] = node.coding
continue
if node.left:
node.left.coding = node.coding + '0'
queue.append(node.left)
if node.right:
node.right.coding = node.coding + '1'
queue.append(node.right)
def compress(self):
bits = ''
for char in self.string:
bits += self.s2b[char]
return bits
def uncompress(self, bits):
string = ''
root = self.root
for bit in bits:
if bit == '0':
root = root.left
else:
root = root.right
if root.char:
string += root.char
root = self.root
return string
if __name__ == '__main__':
s = 'everyday is awesome!'
# ASCII
bits = len(s) * 8
print('Total bits: %d' % bits)
# simple compression
sc = SimpleCompression(s)
compressed = sc.compress()
print('Compressed binary: ' + compressed)
print('Uncompressed: ' + sc.uncompress(compressed))
print(sc.s2b)
print('Simple Compression-compress rate: %d%%' % get_rate(compressed, bits))
print('===================')
# huffman compression
hc = HuffmanCompression(s)
compressed = hc.compress()
print('Compressed binary: ' + compressed)
print('Uncompressed: ' + hc.uncompress(compressed))
print(hc.s2b)
print('Huffman Compression-compress rate: %d%%' % get_rate(compressed, bits))
"""
Total bits: 160
Compressed binary: 00101011001010001100001100001100000101000111000100001010001001110110010100101001
Uncompressed: everyday is awesome!
{'a': '0000', ' ': '0001', 'e': '0010', 'd': '0011', 'i': '0100', 'm': '0101', 'o': '0110', 's': '0111', 'r': '1000', '!': '1001', 'w': '1010', 'v': '1011', 'y': '1100'}
Simple Compression-compress rate: 50%
===================
Compressed binary: 011001011011110011010111111000010000111000111111010011110100011011010001
Uncompressed: everyday is awesome!
{'!': '0001', ' ': '001', 'e': '01', 'd': '11010', 'i': '0000', 'm': '11011', 'o': '1000', 's': '1110', 'r': '1011', 'a': '1111', 'w': '1010', 'v': '1001', 'y': '1100'}
Huffman Compression-compress rate: 45%
"""
2.5 Queue 「队列」
在Java中,Queue是接口,一种实现就是LinkedList,LinkedList向上转型为Queue,队列不能存储null元素,否则与poll()等方法的返回值混淆。
Queue<Integer> q = new LinkedList<Integer>();
int qLen = q.size();
2.6 Heap 「堆」
一般分为大根堆和小根堆,在排序中一般用堆排序。