Graphs

  • Longest path in DAG (Directed Acyclic Graph): Both greedy and dynamic approach:
    1. Find topological sort
    2. Init distance from the source to each node to be minus infinite and zero for the source node.
    3. In the topological order, start from the source node and for each adjacent node update the maximum path from parent to it: max = max(current dist, 1+parent dist)
  • Shortest path in DAG: Same as the longest path, but instead of minus infinite, we init with plus infinite and instead of calculating maximum in step 3, we calculate minimum.
  • Cycle in an undirected graph (No self-edges):https://www.geeksforgeeks.org/union-find/
    1. use UNION-FIND: https://www.geeksforgeeks.org/union-find/
    2. Use DFS: for edge u---v, if v is already visited and there is a back edge from v to one of its ancestors, then there is a cycle.
    Time Complexity: O(V+E)
public boolean isCycleUtil(int vertex, boolean[] visited, boolean[] recursiveArr) {
visited[vertex] = true;
recursiveArr[vertex] = true;
//recursive call to all the adjacent vertices
for (int i = 0; i < adjList[vertex].size(); i++) {
//if not already visited
int adjVertex = adjList[vertex].get(i);
if (!visited[adjVertex] && isCycleUtil(adjVertex, visited, recursiveArr)) {
return true;
} else if (recursiveArr[adjVertex])
return true;
}
//if reached here means cycle has not found in DFS from this vertex
//reset
recursiveArr[vertex] = false;
return false;
}

view raw
isCyclicUtil.cpp
hosted with ❤ by GitHub

Tricks

  • n&(n-1) == 0: check if n is power of 2 or 0
  • numbers between 0 to n, the last digit will be repeated every 10 numbers. the last two digits of each number will be repeated after 10^2 numbers and the last i digits of each number will be repeated every 10^i numbers
  • a group of integers all distinct and in a given range: we can represent these numbers with bits in integers. If we have 42 numbers, we need two integers.
  • Lagrange’s four-square theorem: {\displaystyle p=a_{0}^{2}+a_{1}^{2}+a_{2}^{2}+a_{3}^{2}}
  • 2^n = 1<<n
  • 10000-1 = 1111 : convert the zeros to 1’s
  • ~0 gives a vector of binary 1’s
  • in order to reach the middle of the list, use fast iterator which jump two steps each time:
    while(fast && fast.next){
            fast = fast.next.next
            slow = slow.next
    }
  • Implement an algorithm to find the nth to last element of a singly linked list: The idea is we want to pointers with n steps between them so we walk those two pointers togather untel the leading one reaches the end. by then, since the distance between them is n, the first pointer will point to the n’th to the last. so we need two pointers p1 and p2 pointing to the beginning of the list. walk p1 n steps, then walk p1 and p2 until p1 reaches the end of the list. return p2
  • Bit Manipulation, Add two numbers without arithmatic operation:
    • if you xor two numbers, it is like adding them without the carrys
    • if you and two numbers and shift the result, it will give you the carries.
int add_no_arithm(int a, int b) {
if (b == 0) return a;
int sum = a ^ b; // add without carrying
int carry = (a & b) << 1; // carry, but don’t add
return add_no_arithm(sum, carry); // recurse
}

  • Select m random numbers from an array of size n with equal probability: the trick here is to select a random index between the indices m to n and then swap the selected index with the first index in the array. the next index we randomize will be after the already selected m-sized tail and will be replaced with the second element…etc.

Knowledge-based

  • Write-through – write immidiately
  • write-back – periodic writing
  • Convert decimal fraction to binary: here
  • Numbers base conversion: Link
  • immutable object(unchangeable[1] object) is an object whose state cannot be modified after it is created.
  • concurrency vs parallelism: In programming, concurrency is the composition of independently executing processes, while parallelism is the simultaneous execution of (possibly related) computations. Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once”. assume a server that gets many requests from different users, then we need to deal with concurrency. For a request, we need to execute several services in order to supply a response. this is parallelism.
  • How to debug segmentation fault: core dump file & gdb; debugfs, dmesg
  • Mutual Exclusion:Resources are unsharable, so if a process acquires a resource, other processes have to wait before using it.
  • Thrashing: Thrashing usually refers to phenomena in memory and storage. It occurs when OS spends most of the time on handling page faults(paging in or paging out) rather than getting anything done. For example:
    1. process A has some pages in hard drive and gets a page fault.
    2. The missing page(not in memory) needs to be swapped into memory.
    3. Suppose the memory is full, pages belong to process B need to be swapped out.
    4. Then process B needs to run, so its pages need to be swapped in and the process A’s pages that were just swapped in need to be swapped out
    5. This could happen indefinitely until the issue is resolved or one of the process finishes its job.

    As we can see the performance is terrible, we need to detect the problem first before solve it(or least alleviate it). Kernel usually keep track of the number page-in and page-out, if the number is too high, the physical memory is almost full and CPU utilization is low, the thrashing is happening. There should be a formula to determine thrashing based on number of page-in and page-out in certain period, etc. In userland, there are some tools like vmstat and atop to help us detect thrashing.

    When thrashing is detected, different OSes may deal with it in different ways. We could either kill a process or swap a process completely into secondary storage. Additionally, we could also add more RAM to the machine:)

The ‘CAP’ in the CAP theorem, explained

Consistency

Consistency means that all clients see the same data at the same time, no matter which node they connect to. For this to happen, whenever data is written to one node, it must be instantly forwarded or replicated to all the other nodes in the system before the write is deemed ‘successful.’

Availability

Availability means that any client making a request for data gets a response, even if one or more nodes are down. Another way to state this—all working nodes in the distributed system return a valid response for any request, without exception.

Partition tolerance

partition is a communications break within a distributed system—a lost or temporarily delayed connection between two nodes. Partition tolerance means that the cluster must continue to work despite any number of communication breakdowns between nodes in the system.

Operating Systems

Good Ref: Microsoft Docs

Processes

when linux loads, it starts 2 processes: swapper and then init. each process has pid – integer 32bit. means there can be 32k processes in total. a process can create a child process using fork() or create another totally different process using execv()

execv() will terminate the calling process and start a whole new process

wait(*status) is called by the father in order to wait for its child process to finish.

  • when the shell receives a command, it forks a process which will execute the command and then wait for it to finish, afterward, the shell will move to the next command.

each process has a state, this state can be

  1. TASK_RUNNING: the task is ready to run
  2. TASK_INTERRUPTIBLE: task is waiting for something but that can be stopped by sending the process a signal.
  3. TASK_UNINTERRUPTIBLE: The process can be interrupted only by the destination process it waits for.
  4. TASK_STOPPED
  5. TASK_ZOMBIE: Zombie process. It is a child process that terminated its work but the pid still in the PID table in order for the parent process to read its status using `wait(*status)` .

A process and a thread (any process has at least one thread) has a kernel mode stack and a user mode stack. The kernel mode stack is used by the operating system for scheduling. scheduling intervene function calls in the kernel mode which needs a kernel stack for that. The kernel cannot work with the user space stack because any fault in the user stack will crash the kernel.

A process is created with single thread called the primary thread

Daemon process: A process run in the background without user interaction. The parent process of a daemon is often, but not always, the init process. A daemon is usually created either by a process forking a child process and then immediately exiting, thus causing init to adopt the child process, or by the init process directly launching the daemon

Threads

clone(): When a process is created, it is created with one thread called the primary thread. it can be duplicated using the call clone().

The kernel creates threads using the call sys_clone() and not clone().

threads share the same process environment (address space) while creating a new process using for fork is obligated with creating a new environment.

In a process, each thread has its own stacks (user and kernel) and registers.

threads are supported in the kernel level. basically, each thread has a pid as it is considered somehow a process. however, according to the threads standard, for the user, all the threads have the same pid. in the process descriptor fields, the threads pids are available in a variable named thread_group and a variable named tpid which contains the pid that represents all the threads.

when a thread creates a process using fork(), the process has one parent which is that thread. the other threads cannot wait() on this process.

when running execv() inside a thread, it creates a new process with a new thread – the primary thread of the new process. the rest of the threads are killed.

  • Thread safe in java: declare method/class as synchronized
  • Deadlock: happens because of 4 reasons holds togather:
    • mutual execution: given application resources, only one process can use the resource at a given time.
    • Hold & Wait: processes already holding the resource can request new ones
    • No Preemption: processes cannot remove a process already using the resource
    • starvation:two or more processes are waiting on different resources

IPC – inter process communication

each process has a table named PDT - Process descriptor table.this table contains the descriptors of the files that are used by the devices.

The communication methods are:

  • Pipes/FIFOs: Used to send data between two processes which have father-son-brother connection. has two file descriptors, one for read and the other for write. the pipe is not mapped in the filesystem. FIFO is a pipe, however is global and appears in the filesystem. it also can be used by all the processes.
  • Signals: asynchronous call triggered by the kernel or from another process. each signal has a number. a process can decide what to do with the signal. example of sending signal: cpu will send signal if dividing by zero. kill() will send SIGKILL to process.
  • Sockets: is used cross-machine communication

There is also Shared memory. it resembles queue messaging. read here: https://www.tutorialspoint.com/inter_process_communication/inter_process_communication_shared_memory.htm

Interrupts

The interrupt is a signal handled in the kernel by the CPU. The CPU Checks for interrupts between each two commands it executes. if there is an interrupt waiting, the kernel switches the user stack to the kernel mode stack of the current process in order to handle the interrupt (in linux, interrupts are handled by the kernel).

IDT – Interrupt Descriptor table: a table loaded by BIOS and pointed by a special register named lidt. This table contains 256 entries mapping each possible interrupt to execution method handler.

There are two types of interrupts

  • Hardware interrupts: sent by external Hardware. This type of interrupt is of course asynchronous.
    • the hardware sends IRQn (Interrupt request #n) to the APIC-Advanced Programmable Interrupt controller
    • The APIC check if the IF - Interrupt flag is turned off so it send it to the CPU.
    • The CPU tells the APIC to block this type of interrupt and allow others – in order to ban collisions in the handler of the received interrupt
    • The interrupt handler code IRS receives several IRQs that may be the ones who sent the interrupt – as to one usb port there may be several connected devices.
  • Software Interrupt(Exceptions): This type of interrupt is synchronous as it is generated by the CPU itself. There are two possible reasons for such interrupt: whether an error in the previous execution command such as division by zero, breakpoint or an interrupt generated by the cpu in multi-cpu machines as a way of communication with other cpu’s.
    • Aborts: Dangerous – accessing bad memory
    • Traps: like breakpoint
    • Faults: like division by zero

What is the difference between mutex and semaphore

Both Semaphore and Binary Semaphore does not have ownership of the thread that locked it while mutex does.

This is how lock on mutex(spinlock) is implemented:

function lock(boolean *lock) {
while (test_and_set(lock) == 1);
}

view raw
lock.c
hosted with ❤ by GitHub

/* set the lock to true and send its previous value. lock=true means it is locked.
Notice that the test_and_lock always locks the lock (set lock == true) and returns the previous value before the locking*/
boolean_ref test_and_set(boolean_ref lock) {
boolean initial = lock;
lock = true;
return initial;
}

view raw
test_and_set.c
hosted with ❤ by GitHub

the test-and-set instruction is an instruction used to write 1 (set) to a memory location and return its old value as a single atomic (i.e., non-interruptible) operation. If multiple processes may access the same memory location, and if a process is currently performing a test-and-set, no other process may begin another test-and-set until the first process’s test-and-set is finished.

What’s happened in the Linux System after using ‘ls’ command

strace cmd tells us all activities behind the command.

Trees

  • Any two different tree scans can restore a tree. i.e, Inorder, Pre-order can restore a tree

Algorithms

  • Dijkstra’s algorithm is an algorithm for finding the shortest paths between nodes in a weighted graph,
  • Kruskal’s algorithm a greedy algorithm that finds  a minimum-spanning-tree in the graph. It uses union-find to do so.
    • 1. Create a forest F: a set of nodes as each single node is considered a tree.
    • 2. Create a set of edges S
    • 3. While S is not empty
      • 3.1 take out an edge e from S with the minimum weight
      • 3.2. Find: If e connects two separated groups then add it and UNION the groups.
  • Bellman–Ford algorithm is an algorithm that computes shortest paths from a single source vertex to all of the other vertices in a weighted digraph. It is slower than Dijkstra’s algorithm for the same problem, but more versatile, as it is capable of handling graphs in which some of the edge weights are negative numbers.
  • Floyd-Warshal – A dynamic programming algorithm used to find the shortest path weight between any two nodes in the graph. The algorithm can handle negative weights but cannot handle cycles with negative weight.

Java’s Interview tools

  • Use protected fields rarely: protected fields are accessible from subclasses of the class where it is declared, and from any class in the declared package.
  • four fundamental OOP concepts: encapsulation, inheritance, polymorphism, and abstraction
  • With composition (also called the decorator pattern), you give your new class a private field that references an instance of the existing class. Instance methods in the new class invoke a corresponding method in the existing class, i.e. method forwarding.
  • If you use PriorityQueue in java, the generic class must implement the interface
    Comparable<Tuple>
    which overrides the method:
public int compareTo (Tuple that)

 

  • Count bits:
    Integer.bitCount(5) = 2

 

Data Structures

  • In Java, Heap is PriorityQueue
  • building a heap is O(nlogn) but it can be done with O(n)
  • Divide and Conquer paradigm: In divide and conquer approach, the problem in hand, is divided into smaller sub-problems and then each problem is solved independently. When we keep on dividing the subproblems into even smaller sub-problems, we may eventually reach a stage where no more division is possible. Those “atomic” smallest possible sub-problem (fractions) are solved. The solution of all sub-problems is finally merged in order to obtain the solution of an original problem.
  • Binary Heap: is a almost-complete binary tree where all levels are filled. it can be implemented as an array (any complete/almost-complete tree). navigate to parent/left/right as follows:
    Arr[i/2] Returns the parent node
    Arr[(2*i)+1] Returns the left child node
    Arr[(2*i)+2] Returns the right child node

In order to create a heap, we need to do a post-order on the tree nodes and sift them down when necessary.

OOP

Coupling refers to the knowledge or information or dependency of another class. It arises when classes are aware of each other. If a class has the details information of another class, there is strong coupling. In Java, we use private, protected, and public modifiers to display the visibility level of a class, method, and field. You can use interfaces for the weaker coupling because there is no concrete implementation.

  • Cohesion

Cohesion refers to the level of a component which performs a single well-defined task. A single well-defined task is done by a highly cohesive method. The weakly cohesive method will split the task into separate parts. The java.io package is a highly cohesive package because it has I/O related classes and interface. However, the java.util package is a weakly cohesive package because it has unrelated classes and interfaces.

  • Association

Association represents the relationship between the objects. Here, one object can be associated with one object or many objects. There can be four types of association between the objects:

  • One to One
  • One to Many
  • Many to One, and
  • Many to Many

Let’s understand the relationship with real-time examples. For example, One country can have one prime minister (one to one), and a prime minister can have many ministers (one to many). Also, many MP’s can have one prime minister (many to one), and many ministers can have many departments (many to many).

Association can be undirectional or bidirectional.

  • Aggregation

Aggregation is a way to achieve Association. Aggregation represents the relationship where one object contains other objects as a part of its state. It represents the weak relationship between objects. It is also termed as a has-a relationship in Java. Like, inheritance represents the is-a relationship. It is another way to reuse objects.

  • Composition

The composition is also a way to achieve Association. The composition represents the relationship where one object contains other objects as a part of its state. There is a strong relationship between the containing object and the dependent object. It is the state where containing objects do not have an independent existence. If you delete the parent object, all the child objects will be deleted automatically.

Abstract class vs Interface

  1. Type of methods: Interface can have only abstract methods. Abstract class can have abstract and non-abstract methods. From Java 8, it can have default and static methods also.
  2. Final Variables: Variables declared in a Java interface are by default final. An abstract class may contain non-final variables.
  3. Type of variables: Abstract class can have final, non-final, static and non-static variables. Interface has only static and final variables.
  4. Implementation: Abstract class can provide the implementation of interface. Interface can’t provide the implementation of abstract class.
  5. Inheritance vs Abstraction: A Java interface can be implemented using keyword “implements” and abstract class can be extended using keyword “extends”.
  6. Multiple implementation: An interface can extend another Java interface only, an abstract class can extend another Java class and implement multiple Java interfaces.
  7. Accessibility of Data Members: Members of a Java interface are public by default. A Java abstract class can have class members like private, protected, etc.

 

Sort Problems

  • Bucket Sort usage example – Top K Frequent Elements, here
  •  Read more about sort algorithms: https://www.topcoder.com/community/data-science/data-science-tutorials/sorting/
  • Count Sort/ BucketSort: say range of numbers is O(n), then you can allocate arr of size O(n) where the index is the number and the value is the count. a second array of size O(n) where the index is the count value and the value is the numbers that have that count. Can be used to find the top K frequent elements

Problems Worth review

Longest Increasing sequence:

Two approaches for this problem, the first is strategic – e.g, using dynamic-programming while the last is a trick.

Let’s say I am standing on element i in the array arr. if i know for each j<i the LIS, then the LIS for arr[i] would be arr[j]+1 under the constraint arr[j]<arr[i] for each j<i. if we have several j’s which hold the constraint, then we select the one which maximizes the sum. This is a dynamic programming solution. by adopting a bottom’s up approach, we can initialize an array named dp with size |arr| where dp[i] represents the LIS of the element arr[i]. in the beginning, each cell of dp will be initialized with 1 as each number is LIS of itself. starting with i=1, we scan all the previous cells of dp to find the that maximizes dp[i]: for each j<i and arr[j]<arr[i], let dp[i]=Max(dp[i],1+dp[j])

The time complexity her is O(n^2).

LIS can be solved with O(nlogn) by using binary search on a new constructed array named dp which will reflect the length of the LIS (and not the LIS). we try to build an LIS by trying to add each elements in arr to dp in the appropriate position. lets say if arr[i] cam be entered at index j at array dp then it means that arr[i]<dp[j] and we will replace dp[j] since adding a smaller element will keep the increasing LIS and will enable adding more elements bigger than dp[j] by lowering the peak. i.e, the sequence 1,2,4 will have a better potential to accept elements from arr compared to the sequence 1,2,8.

public int lengthOfLIS(int[] nums) {            
        int[] dp = new int[nums.length];
        int len = 0;
        for(int x : nums) {
            int i = Arrays.binarySearch(dp, 0, len, x);
            if(i < 0) i = -(i + 1); // we found a smaller element than the i'th element
            dp[i] = x; // replace the i'th element with a smaller one which will enable fitting more elements
            if(i == len) len++; // the element can be inserted at the end of dp, means LIS length is increased.
        }
        return len;
}

sort linked-list

with O(logn) and O(1) space: merge-sort bottom-up as referenced here

Kth Smallest Element in a Sorted Matrix:

link, Given a n x n matrix where each of the rows and columns are sorted in ascending order, find the kth smallest element in the matrix.Note that it is the kth smallest element in the sorted order, not the kth distinct element.Example:

matrix = [
   [ 1,  5,  9],
   [10, 11, 13],
   [12, 13, 15]
],
k = 8,

return 13.

Note:
You may assume k is always valid, 1 ≤ k ≤ n2.
Solution: Use a min heap: place the first row and take k-1 times the min element. when an element is out, put in the min heap the next element in the same column

Given sorted matrix NxN, you can find a number with time complexity O(logN): divide the matrix to 4 sections, choose the section where the number we are looking for is in the range…and do the same division to this section and so on..

Coins change Problem:

Given infinite numbers of coins, given the denominations, how many ways there is for change?

aaa
S is the group of coins and m is the length of S. x is the sum that we want the change for

Using dynamic programming: The variables are the number of coins and the value we are trying to calculate. so change[i][j] equals the number of ways to accomplish amount j using first i coins. we are looking for change[|S|][x].
change[i][j] = change[i-1][j]+change[i][j-coins[i]]
Means, that the way to accomplish change[i][j] is by using the i’th coin and by not using the i’th coin.
We can make it more space efficient by using 1-dimention array:
change[i] equals they number of ways to accomplish value i.
for each coin, change[i] =change[i] + change[i-coin]

Merge two lists: link here, Learn how to write a short solution.

Symmetric tree: the iterative solution got a nice twict. link here.

Palindrome list: link here, trick how to reach the middle of the list and convert it

Shortest Unsorted Continuous Subarray: link here,Given an integer array, you need to find one continuous subarray that if you only sort this subarray in ascending order, then the whole array will be sorted in ascending order, too.You need to find the shortest such subarray and output its length.

public int findUnsortedSubarray(int[] A) {
int n = A.length, beg = 1, end = 2, min = A[n1], max = A[0];
for (int i=1;i<n;i++) {
max = Math.max(max, A[i]);
min = Math.min(min, A[n1i]);
if (A[i] < max) end = i;
if (A[n1i] > min) beg = n1i;
}
return end beg + 1;
}

 

Queue Reconstruction by Height: link here

Find the Duplicate Number: here, the trick is two pointers, one fast and one slow. if they meet, then they meet in the loop. we keep the slow pointer rotating in the loop while we start a new pointer from the beginning. they will meet again in the beginning of the loop

Unique paths: here, Dynamic programming. what is the number of the paths that a robot who can go right or down tries to start from the up left corner to the bottom right. the solution is to use a matrix and do the bottom’s up approach: : suppose the number of paths to arrive at a point (i, j) is denoted as P[i][j], it is easily concluded that P[i][j] = P[i - 1][j] + P[i][j - 1].

Longest Substring Without Repeating Characters: Given a string, find the length of the longest substring without repeating characters.

public int lengthOfLongestSubstring(String s) {
HashMap<Character,Integer> set = new HashMap<>();
int max = 0;int j=0;
for(int i=0;i<s.length();i++){
char c = s.charAt(i);
if(set.containsKey(c)){
// as we dont delete history, we dont
// want to jump back to index in a substring that is dead
j = Math.max(j, set.get(c)+1);
}
set.put(c,i);
max = Math.max(ij+1, max);
}
return max;
}

Coin Change: You are given coins of different denominations and a total amount of money amount. Write a function to compute the fewest number of coins that you need to make up that amount. If that amount of money cannot be made up by any combination of the coins, return -1.

/* Dynamic programming solution:
1. Let dp[i] where i is between [0..amount+1] represents the minimum amount of coins needed for i.
2. dynamic programming solution gives:
for each coin in coins and foreach amount i: dp[i] = min(dp[i], dp[i – coins[j]] + 1);
*/
int coinChange(vector<int>& coins, int amount) {
int Max = amount + 1;
vector<int> dp(amount + 1, Max);
dp[0] = 0;
for (int i = 1; i <= amount; i++) {
for (int j = 0; j < coins.size(); j++) {
if (coins[j] <= i) {
dp[i] = min(dp[i], dp[i – coins[j]] + 1);
}
}
}
return dp[amount] > amount ? –1 : dp[amount];
}

view raw
coinChange.cpp
hosted with ❤ by GitHub

Find loop in a list: This question is solved using the fast and the slow pointer.
The slow pointer enters the cycle after k steps. The fast pointer by then walked 2k steps. Therefore, assuming the cycle length is m, then the distance of the slow pointer from the fast pointer is m-k. in order to kill that distance, on each step, the fast pointer will decrease the distance by 1 step as it goes twice faster. Therefore, after m-k steps that the slow pointer do (the fast went 2(m-k)), they will meet.
However, After those m-k steps, the slow pointer will be k steps before the beginning of the cycle as its length is m.
We don’t know how much is k. However, we can do a trick in order to find it:
We send the slow pointer to the beginning and we slow down the fast pointer to go one step at a time. Now we let them run again with one step at a time. After the previously slow pointer moves k step, it will meet the previously fast pointer at the beginning of the cycle as they meet k steps before the beginning.Untitled

 

Various:

 

Amazon – Annapurna:

Givan a Tx Queue with length 16bytes. when the queue sends the 5th byte(moving from capacity 5 to 4) an interrupt is sent. your API providessend_tx(buffer,size). implement a drive api with an interrupt handler. The stack API that the drive can use is push_tx(byte). you cannot overflow the queue as the result will not be expected.
Notice: You are allowed to keep buffers until certain capacity is reached.
Solution:

References
sort algorithms: https://www.topcoder.com/community/data-science/data-science-tutorials/sorting/
UML: https://www.visual-paradigm.com/guide/uml-unified-modeling-language/uml-aggregation-vs-composition/