Kudu's compaction mechanism

1. Why compaction is needed

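Compaction in Kudu mainly merges the DiskRowsets on disk. A DiskRowset contains three parts: BaseData, RedoLog, and UndoLog. There are three corresponding forms of compaction:

  • MergeCompaction: merges BaseData only
  • MinorDeltaCompaction: merges the RedoLog
  • MajorDeltaCompaction: merges the UndoLog, BaseData, and RedoLog, deleting historical versions of the data to free disk space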

739. Daily Temperatures

Difficulty: Medium

Given a list of daily temperatures T, return a list that tells you, for each day in the input, how many days you would have to wait until a warmer temperature. If there is no future day for which this is possible, put 0 instead.

For example, given the list of temperatures T = [73, 74, 75, 71, 69, 72, 76, 73], your output should be [1, 1, 4, 2, 1, 1, 0, 0].

Note: The length of temperatures will be in the range [1, 30000]. Each temperature will be an integer in the range [30, 100].

1. Stack

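A stack removes the redundant descending values. For example, in 1, 3, 5, 7, 4, 3, 8, the values 4 and 3 come after 7 and are smaller than it, so any earlier day looking for a warmer temperature reaches 7 first. 4 and 3 are therefore useless for the rest of the traversal, and the stack lets us skip them.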
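The stack idea can be sketched in Python. The excerpt does not say which direction the post scans in; this sketch assumes a right-to-left scan, which matches the example above (4 and 3 stop mattering once 7 has been seen). `daily_temperatures` is a hypothetical name, not taken from the post.

```python
from typing import List


def daily_temperatures(T: List[int]) -> List[int]:
    """Days to wait for a warmer temperature, scanning from right to left."""
    n = len(T)
    answer = [0] * n
    # Indices of candidate warmer days to the right of i; their temperatures
    # strictly increase from the top of the stack to the bottom.
    stack: List[int] = []
    for i in range(n - 1, -1, -1):
        # Days no warmer than T[i] can never be the answer for any earlier day:
        # day i is closer and at least as warm, so drop them.
        while stack and T[stack[-1]] <= T[i]:
            stack.pop()
        if stack:
            answer[i] = stack[-1] - i  # nearest strictly warmer day
        stack.append(i)
    return answer
```

On T = [73, 74, 75, 71, 69, 72, 76, 73] this returns [1, 1, 4, 2, 1, 1, 0, 0]. Each index is pushed and popped at most once, so the scan is O(n).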

Sampling Algorithms: Reservoir Sampling

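Spark provides two partitioner implementations: HashPartitioner and RangePartitioner.

  • HashPartitioner: assigns each record to a partition by taking the key's hash code modulo the number of partitions, which can lead to data skew.
  • RangePartitioner: samples the keys in the dataset and partitions the data according to the sample. Because the sample reflects how the keys are distributed, RangePartitioner can avoid data skew to some extent.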
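To make the difference between the two partitioners concrete, here is a toy Python sketch; it is not Spark's code. `hash_partition`, `range_bounds`, and `range_partition` are hypothetical helpers. The first takes the key's hash modulo the partition count, roughly as HashPartitioner does with `hashCode`. The other two pick split points from a sorted sample and binary-search each key against them, in the spirit of RangePartitioner.

```python
import bisect
from typing import List, Sequence


def hash_partition(key: int, num_partitions: int) -> int:
    # Hash of the key modulo the partition count. Python's % with a positive
    # divisor is never negative, so the result is in [0, num_partitions).
    return hash(key) % num_partitions


def range_bounds(sample: Sequence[int], num_partitions: int) -> List[int]:
    # Pick num_partitions - 1 split points at evenly spaced positions of the
    # sorted (non-empty) sample, so each range gets about the same share of keys.
    ordered = sorted(sample)
    return [ordered[len(ordered) * i // num_partitions] for i in range(1, num_partitions)]


def range_partition(key: int, bounds: List[int]) -> int:
    # Index of the first bound >= key; keys above every bound go to the last partition.
    return bisect.bisect_left(bounds, key)
```

With keys that are all multiples of 4 and 4 partitions, `hash_partition` sends every key to partition 0. Bounds taken from a sample of the same keys spread them almost evenly across the 4 ranges.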

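To sample the dataset randomly, Spark uses reservoir sampling. This algorithm draws a random sample when the total amount of data is not known in advance, for example with big data that cannot be loaded into memory all at once, or with streaming data.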

1. Randomly sampling one item
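A minimal Python sketch of reservoir sampling for a single item. It is not Spark's implementation, and `reservoir_sample_one` is a hypothetical name. The i-th item replaces the current choice with probability 1/i. Item i then survives each later step j with probability (j-1)/j, so it is the final choice with probability 1/i · i/n = 1/n, without n ever being known in advance.

```python
import random
from typing import Iterable, Optional, TypeVar

Item = TypeVar("Item")


def reservoir_sample_one(stream: Iterable[Item],
                         rng: Optional[random.Random] = None) -> Optional[Item]:
    """Return one uniformly random item from a stream of unknown length (None if empty)."""
    if rng is None:
        rng = random.Random()
    chosen: Optional[Item] = None
    for i, item in enumerate(stream, start=1):
        # Keep the i-th item with probability 1/i (randrange(i) is 0 with prob. 1/i).
        if rng.randrange(i) == 0:
            chosen = item
    return chosen
```

The stream is read once and only one item is held in memory. Passing a seeded `random.Random` makes runs reproducible.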

438. Find All Anagrams in a String

Given a string s and a non-empty string p, find all the start indices of p's anagrams in s.

Strings consist of lowercase English letters only, and the length of both strings s and p will not be larger than 20,100.

The order of output does not matter.

Example 1:

Input:
s: "cbaebabacd" p: "abc"
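The excerpt stops before the post's solution, so the following is an assumed approach rather than the author's. It uses a fixed-size sliding window that compares the letter counts of the current window of s with those of p, relying on the lowercase-only constraint. `find_anagrams` is a hypothetical name.

```python
from typing import List


def find_anagrams(s: str, p: str) -> List[int]:
    m = len(p)
    if m > len(s):
        return []
    need = [0] * 26    # letter counts of p
    window = [0] * 26  # letter counts of the current window of s
    for ch in p:
        need[ord(ch) - ord("a")] += 1
    result: List[int] = []
    for i, ch in enumerate(s):
        window[ord(ch) - ord("a")] += 1          # add the new right-hand letter
        if i >= m:
            window[ord(s[i - m]) - ord("a")] -= 1  # drop the letter that left the window
        if i >= m - 1 and window == need:
            result.append(i - m + 1)              # window start index
    return result
```

For s = "cbaebabacd" and p = "abc", this returns [0, 6]. Each step compares two 26-element count lists, so the scan is O(26 · len(s)).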

494. Target Sum

You are given a list of non-negative integers, a1, a2, ..., an, and a target, S. Now you have 2 symbols + and -. For each integer, you should choose one from + and - as its new symbol.

Find out how many ways there are to assign symbols so that the sum of the integers equals the target S.

Example 1:

Input: nums is [1, 1, 1, 1, 1], S is 3. 
Output: 5
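Explanation:

-1+1+1+1+1 = 3
+1-1+1+1+1 = 3
+1+1-1+1+1 = 3
+1+1+1-1+1 = 3
+1+1+1+1-1 = 3

There are 5 ways to assign symbols so that the sum of nums equals the target 3.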
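The excerpt ends before the post's solution. One common approach, assumed here, reduces the problem to counting subsets. Let P be the sum of the numbers given +, and N the sum of those given -. Then P - N = S and P + N = sum(nums), so P = (S + sum(nums)) / 2, and the answer is the number of subsets whose sum is P. A 0/1-knapsack count in Python (`find_target_sum_ways` is a hypothetical name):

```python
from typing import List


def find_target_sum_ways(nums: List[int], S: int) -> int:
    total = sum(nums)
    # P = (S + total) / 2 must be a non-negative integer, otherwise no assignment works.
    if abs(S) > total or (S + total) % 2:
        return 0
    target = (S + total) // 2
    # dp[t] = number of subsets of the numbers seen so far that sum to t.
    dp = [0] * (target + 1)
    dp[0] = 1
    for x in nums:
        # Iterate downwards so each number is used at most once.
        for t in range(target, x - 1, -1):
            dp[t] += dp[t - x]
    return dp[target]
```

For nums = [1, 1, 1, 1, 1] and S = 3, P = 4 and the function returns 5. Zeros are handled naturally: each 0 doubles every count, because either sign gives the same sum.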

406. Queue Reconstruction by Height

Suppose you have a random list of people standing in a queue. Each person is described by a pair of integers (h, k), where h is the height of the person and k is the number of people in front of this person who have a height greater than or equal to h. Write an algorithm to reconstruct the queue.

Note:
The number of people is less than 1,100.

Example

Input:
[[7,0], [4,4], [7,1], [5,0], [6,1], [5,2]]
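The excerpt ends before the solution. A common greedy approach, assumed here rather than taken from the post, sorts people by height descending (and by k ascending for equal heights), then inserts each person at index k. Everyone already placed is at least as tall, so index k leaves exactly k taller-or-equal people in front. Shorter people inserted later do not change anyone's k. `reconstruct_queue` is a hypothetical name.

```python
from typing import List


def reconstruct_queue(people: List[List[int]]) -> List[List[int]]:
    # Tallest first; among equal heights, smaller k first.
    ordered = sorted(people, key=lambda p: (-p[0], p[1]))
    queue: List[List[int]] = []
    for person in ordered:
        # Everyone already in the queue is at least as tall as this person,
        # so inserting at index k puts exactly k of them in front.
        queue.insert(person[1], person)
    return queue
```

For the example input this returns [[5,0], [7,0], [5,2], [6,1], [4,4], [7,1]]. The list inserts make it O(n²), which is fine for fewer than 1,100 people.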
